AI Value Learning
Training AI systems to infer and adopt human values from observation and interaction
This page is a stub. Content needed.
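Until fuller content lands, here is a minimal, hypothetical sketch of the core idea behind value learning: maintain a posterior over candidate value functions and update it from observed human choices under a Boltzmann-rational choice model. The feature names, candidate weights, and observations below are invented for illustration; they are not from any particular system.

```python
import numpy as np

# Illustrative sketch only: infer which candidate value function best explains
# observed human choices, assuming a Boltzmann-rational (softmax) choice model.
candidate_values = {
    "values_comfort": np.array([1.0, 0.1]),   # weights over (comfort, speed)
    "values_speed":   np.array([0.1, 1.0]),
    "balanced":       np.array([0.5, 0.5]),
}

# Each observation: the human chose option A over option B.
# Options are described by made-up feature vectors (comfort, speed).
observations = [
    (np.array([0.9, 0.2]), np.array([0.3, 0.8])),
    (np.array([0.8, 0.1]), np.array([0.2, 0.9])),
]

beta = 5.0  # how noise-free the human is assumed to be
posterior = {name: 1.0 / len(candidate_values) for name in candidate_values}

for chosen, rejected in observations:
    for name, w in candidate_values.items():
        u_chosen, u_rejected = w @ chosen, w @ rejected
        # P(human picks `chosen` | this value function), softmax over the two options
        likelihood = np.exp(beta * u_chosen) / (
            np.exp(beta * u_chosen) + np.exp(beta * u_rejected)
        )
        posterior[name] *= likelihood

total = sum(posterior.values())
posterior = {name: p / total for name, p in posterior.items()}
print(posterior)  # probability mass shifts toward "values_comfort"
```

Real value-learning approaches (inverse reinforcement learning, preference learning) generalize this toy update to continuous reward parameters and sequential behavior rather than a handful of hand-written candidates.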
Related pages:

- AI systems exploit reward signals in unintended ways, from the CoastRunners boat looping for points instead of racing, to OpenAI's o3 modifying eva...
- RLHF and Constitutional AI are the dominant techniques for aligning language models with human preferences. InstructGPT (1.3B) is preferred over GP... (see the reward-model sketch after this list)
- UC Berkeley research center founded by Stuart Russell developing cooperative AI frameworks and preference learning approaches to ensure AI systems ...
- Technical approaches to ensuring AI systems pursue intended goals and remain aligned with human values throughout training and deployment. Current ...
- Analysis of major AI safety research agendas comparing approaches from Anthropic ($100M+ annual safety budget, 37-39% team growth), DeepMind (30-50...
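As a companion to the RLHF entry above, the following is a hedged sketch of the pairwise (Bradley-Terry style) reward-model loss that RLHF-style preference learning typically uses. The embedding dimension, the small scoring head, and the random "response embeddings" are stand-ins chosen for illustration; a real reward model scores full language-model responses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of a reward model trained on pairwise human preferences.
class RewardModel(nn.Module):
    def __init__(self, embed_dim: int = 64):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(embed_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        # Map a response representation to a scalar reward.
        return self.score(response_embedding).squeeze(-1)

def preference_loss(model, preferred, rejected):
    # -log sigmoid(r(preferred) - r(rejected)): raises the score of the
    # human-preferred response relative to the rejected one.
    return -F.logsigmoid(model(preferred) - model(rejected)).mean()

# Toy training step on random embeddings standing in for LM features.
model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
preferred, rejected = torch.randn(8, 64), torch.randn(8, 64)

opt.zero_grad()
loss = preference_loss(model, preferred, rejected)
loss.backward()
opt.step()
print(float(loss))
```

In a full RLHF pipeline, a reward model like this then supplies the training signal for a policy-optimization stage (e.g., PPO); Constitutional AI replaces much of the human labeling with model-generated critiques guided by a written set of principles.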