H. B. Keller Colloquium
Annenberg 105
Recent Advances in AI Safety and Robustness
As AI systems become more powerful, it is increasingly important that developers be able to strictly enforce desired policies on these systems. Unfortunately, techniques such as adversarial attacks have traditionally made it possible to circumvent model policies, allowing bad actors to manipulate LLMs for unintended and potentially harmful purposes. In this talk, I will highlight several recent lines of work making progress on these challenges, including methods for robustness to jailbreaks, safety pre-training, and techniques for preventing undesirable model distillation. I will also discuss the areas I believe are most crucial for future work in the field.
For more information, please contact Narin Seraydarian by phone at (626) 395-6580 or by email at [email protected].
Event Series
H. B. Keller Colloquium Series