Longterm Wiki

can worsen with model size


This OpenReview paper examines the scaling behavior of alignment techniques, which bears on debates about whether larger models are automatically safer and whether alignment interventions like RLHF become more costly or less effective at scale. The page was temporarily unavailable at the time of analysis.

Metadata

Importance: 55/100 · conference paper · primary source

Summary

This paper investigates how alignment techniques such as RLHF may exhibit scaling problems, in which safety-relevant behaviors or alignment costs worsen rather than improve as models grow larger. Because the page itself could not be fetched, details beyond the title are inferred from context.

Key Points

  • Alignment properties or costs may not improve monotonically with model scale, potentially degrading with larger models
  • RLHF and human feedback-based training may introduce unexpected scaling challenges
  • Larger models could exhibit worse alignment behavior in certain metrics despite improved general capabilities
  • Results suggest caution about assuming scale automatically improves safety or alignment outcomes
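The non-monotonicity the bullets warn about can be made concrete with a small check over evaluation scores at increasing model scales (a minimal sketch; all scores and model sizes below are hypothetical illustrations, not results from the paper):

```python
# Sketch: test whether an alignment metric improves monotonically with
# model scale, as the key points caution it may not.

def is_monotonically_improving(scores):
    """Return True if each successive score is >= the previous one
    (higher score = better aligned)."""
    return all(b >= a for a, b in zip(scores, scores[1:]))

# Hypothetical alignment scores for models of increasing parameter count.
scores_by_scale = {
    "1B": 0.62,
    "10B": 0.71,
    "100B": 0.68,  # hypothetical regression at the largest scale
}

print(is_monotonically_improving(list(scores_by_scale.values())))  # → False
```

A `False` result at the largest scale is exactly the pattern the paper's title suggests: capability gains from scale do not guarantee alignment gains.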

Cited by 1 page

| Page | Type | Quality |
|------|------|---------|
| RLHF | Research Area | 63.0 |

Cached Content Preview

HTTP 200 · Fetched Feb 26, 2026 · 0 KB

#### The requested page could not be loaded.

```
OpenReview is temporarily unavailable. Please try again later.
```

If you'd like to report this error to the developers, please send an email
to [info@openreview.net](mailto:info@openreview.net).
Resource ID: 7712afe39f75a44c | Stable ID: NzJhMjI2Mj