CAIA Speaker Event: Sandhini Agarwal, OpenAI @ Caltech
Who: Sandhini Agarwal, OpenAI (Virtual)
When: March 7, 4-5 pm PT
Where: Broad 100
Zoom: https://rit.zoom.us/j/95446915834
Talk: Collective Alignment: Public Input on OpenAI's Model Spec
This talk walks through OpenAI's "collective alignment" experiment: surveying 1,000+ people globally about value-sensitive model behavior, comparing those preferences to the Model Spec, and turning points of disagreement into concrete Spec updates through internal review. It will cover what kinds of prompts were used, how preferences were translated into actionable rules, which proposals were adopted and which were deferred, and the main limitations of this approach.
About the speaker:
Sandhini Agarwal is an AI Policy Researcher at OpenAI on the Deployment Planning team, focused on enabling safe and responsible deployment of OpenAI models. Her prior work at OpenAI includes leading the bias analysis for GPT-3, building OpenAI's first content filter, leading research and strategy on the deployment implications of im2text multimodal models (CLIP), and building safety-focused evaluations for large-scale language models.
Everyone is welcome: no specific technical background is required, and questions are encouraged.
And yes, we will have pizza and boba.
