can worsen with model size
openreview.net/pdf?id=bx24KpJ4Eb
This OpenReview paper examines the scaling behavior of alignment techniques, which bears on debates about whether larger models are automatically safer, or whether interventions like RLHF become more costly or less effective at scale. The page was temporarily unavailable at the time of analysis.
Metadata
Importance: 55/100 · conference paper · primary source
Summary
This paper investigates how alignment techniques such as RLHF may exhibit scaling problems, in which safety-relevant behaviors or alignment costs worsen rather than improve as models grow larger. The work likely examines the relationship between model scale and alignment properties.
Key Points
- Alignment properties or costs may not improve monotonically with model scale, potentially degrading in larger models
- RLHF and other human feedback-based training may introduce unexpected scaling challenges
- Larger models could exhibit worse alignment behavior on certain metrics despite improved general capabilities
- Results suggest caution about assuming that scale automatically improves safety or alignment outcomes
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| RLHF | Research Area | 63.0 |
Cached Content Preview
HTTP 200 · Fetched Feb 26, 2026 · 0 KB
The requested page could not be loaded: "OpenReview is temporarily unavailable. Please try again later." The error page directs reports to [info@openreview.net](mailto:info@openreview.net).
Resource ID: 7712afe39f75a44c | Stable ID: NzJhMjI2Mj