AI Value Learning
Training AI systems to infer and adopt human values from observation and interaction
This page is a stub. Content needed.
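Until fuller content lands, here is a minimal, hypothetical sketch of the core idea behind value learning: maintain a posterior over candidate value functions and update it from observed human choices under a Boltzmann-rational choice model. The feature names, candidate weights, and observations below are invented for illustration; they are not from any particular system.

```python
import numpy as np

# Illustrative sketch only: infer which candidate value function best explains
# observed human choices, assuming a Boltzmann-rational (softmax) choice model.
candidate_values = {
    "values_comfort": np.array([1.0, 0.1]),   # weights over (comfort, speed)
    "values_speed":   np.array([0.1, 1.0]),
    "balanced":       np.array([0.5, 0.5]),
}

# Each observation: the human chose option A over option B.
# Options are described by made-up feature vectors (comfort, speed).
observations = [
    (np.array([0.9, 0.2]), np.array([0.3, 0.8])),
    (np.array([0.8, 0.1]), np.array([0.2, 0.9])),
]

beta = 5.0  # how noise-free the human is assumed to be
posterior = {name: 1.0 / len(candidate_values) for name in candidate_values}

for chosen, rejected in observations:
    for name, w in candidate_values.items():
        u_chosen, u_rejected = w @ chosen, w @ rejected
        # P(human picks `chosen` | this value function), softmax over the two options
        likelihood = np.exp(beta * u_chosen) / (
            np.exp(beta * u_chosen) + np.exp(beta * u_rejected)
        )
        posterior[name] *= likelihood

total = sum(posterior.values())
posterior = {name: p / total for name, p in posterior.items()}
print(posterior)  # probability mass shifts toward "values_comfort"
```

Real value-learning approaches (inverse reinforcement learning, preference learning) generalize this toy update to continuous reward parameters and sequential behavior rather than a handful of hand-written candidates.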
Related pages:

- AI systems exploit reward signals in unintended ways, from the CoastRunners boat looping for points instead of racing, to OpenAI's o3 modifying eva...
- RLHF and Constitutional AI are the dominant techniques for aligning language models with human preferences. InstructGPT (1.3B) is preferred over GP... (see the reward-model sketch after this list)
- UC Berkeley research center founded by Stuart Russell developing cooperative AI frameworks and preference learning approaches to ensure AI systems ...
- Technical approaches to ensuring AI systems pursue intended goals and remain aligned with human values throughout training and deployment. Current ...
- Analysis of major AI safety research agendas comparing approaches from Anthropic ($100M+ annual safety budget, 37-39% team growth), DeepMind (30-50...
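As a companion to the RLHF entry above, the following is a hedged sketch of the pairwise (Bradley-Terry style) reward-model loss that RLHF-style preference learning typically uses. The embedding dimension, the small scoring head, and the random "response embeddings" are stand-ins chosen for illustration; a real reward model scores full language-model responses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of a reward model trained on pairwise human preferences.
class RewardModel(nn.Module):
    def __init__(self, embed_dim: int = 64):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(embed_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        # Map a response representation to a scalar reward.
        return self.score(response_embedding).squeeze(-1)

def preference_loss(model, preferred, rejected):
    # -log sigmoid(r(preferred) - r(rejected)): raises the score of the
    # human-preferred response relative to the rejected one.
    return -F.logsigmoid(model(preferred) - model(rejected)).mean()

# Toy training step on random embeddings standing in for LM features.
model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
preferred, rejected = torch.randn(8, 64), torch.randn(8, 64)

opt.zero_grad()
loss = preference_loss(model, preferred, rejected)
loss.backward()
opt.step()
print(float(loss))
```

In a full RLHF pipeline, a reward model like this then supplies the training signal for a policy-optimization stage (e.g., PPO); Constitutional AI replaces much of the human labeling with model-generated critiques guided by a written set of principles.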