CAISI Research Program at CIFAR
Building Trust & Fairness
Introduction
Actively embedding human values and equity into AI systems helps ensure that AI serves the public good and does not perpetuate existing societal harms such as bias and discrimination.
Intelligent Ideas with Yoshua Bengio
How do we design AIs that will not harm people? Canada CIFAR AI Chair Yoshua Bengio (Mila, Université de Montréal, LawZero) explores the various ways that AI could harm society and why he remains optimistic about the future.
Spotlight
Building Safe AI Through More Reliable Reasoning
How can we trust an AI that knows more than we do? AI systems draw on immense data, leaving users unable to verify their outputs. While AI debate was proposed to solve this by forcing models to cite evidence, current systems fail to argue reliably — misinterpreting facts and taking sources out of context. To earn trust, an AI must learn to construct and support justifications that can be taken at face value.
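To make the debate idea concrete, the sketch below shows one minimal way such a protocol could be structured: two debaters alternate turns, and every claim must be backed by a verbatim quote from a shared source that a judge can check without reading the whole document. This is an illustrative sketch only, assuming a simple substring check for citation validity; the class and field names (Debate, DebateTurn, add_turn) are hypothetical and do not come from the project described here.

```python
# Minimal sketch of a two-debater protocol with verifiable citations.
# All names here are illustrative placeholders, not a published API.

from dataclasses import dataclass, field

@dataclass
class DebateTurn:
    debater: str   # "pro" or "con"
    claim: str     # the argument advanced this turn
    evidence: str  # a quoted span from the shared source document

@dataclass
class Debate:
    question: str
    source: str                       # document both debaters must cite
    transcript: list = field(default_factory=list)

    def add_turn(self, debater: str, claim: str, evidence: str) -> None:
        # Simple verifiability check (an assumption of this sketch):
        # cited evidence must appear verbatim in the shared source,
        # so a judge can trust quotes without reading the full document.
        if evidence not in self.source:
            raise ValueError(f"{debater} cited evidence not found in source")
        self.transcript.append(DebateTurn(debater, claim, evidence))

# Usage: debaters alternate grounded turns; a (human or model) judge
# then reads only the transcript and its verified quotes.
doc = "The 2024 audit found the model misclassified 12% of dialect samples."
debate = Debate(question="Is the model reliable on dialect data?", source=doc)
debate.add_turn("con", "The model is unreliable on dialects.",
                "misclassified 12% of dialect samples")
```

The design choice worth noting is the verbatim-quote constraint: it is what lets a non-expert judge take the cited evidence at face value, which is exactly where current debate systems fall short.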
One research team aims to address this issue by developing an AI Debate Framework. Gillian Hadfield (Johns Hopkins University, University of Toronto (status only), Vector Institute) and her team, including Maria Ryskina, are pursuing this goal in their 2025 AI Safety Catalyst project, funded by CIFAR.
“Our goal is to teach language models to construct reasonable justifications and to support them with valid evidence from available sources of information,” says Maria Ryskina, a CIFAR AI Safety Postdoctoral Fellow at Vector Institute. “Such models will be more worthy of users’ trust as they can be reliably overseen by non-experts.”
Trust Between Humans and AI Systems
Drawing on insights from law, economics, cultural evolution, and political science, the AI Debate Framework project aims to equip AI with normative reasoning: making choices informed by the rules that coordinate behaviour, much as people's actions are shaped by their societies' norms and laws. In doing so, the project seeks to make AI systems more trustworthy, resilient, and better aligned with human social structures.
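One simple way to picture normative reasoning is as constrained choice: an agent ranks actions by task utility but discards any that violate shared norms. The sketch below is a toy illustration under that assumption; the norms, actions, and the choose function are invented for this example and are not artifacts of the CIFAR project.

```python
# Illustrative-only sketch: normative reasoning as norm-constrained choice.
# Norms act as filters on candidate actions before utility maximization.

from typing import Callable

Norm = Callable[[str], bool]  # returns True if the action complies

norms: list[Norm] = [
    lambda action: "deceive" not in action,   # norm: no deception
    lambda action: "private" not in action,   # norm: no use of private data
]

# Candidate actions mapped to task utility (toy values).
candidate_actions = {
    "summarize public report": 0.7,
    "quote private email": 0.9,
    "deceive user to finish faster": 1.0,
}

def choose(actions: dict[str, float], norms: list[Norm]) -> str:
    # Keep only norm-compliant actions, then maximize utility among them,
    # mirroring how laws and norms constrain which choices are acceptable.
    permitted = {a: u for a, u in actions.items()
                 if all(norm(a) for norm in norms)}
    return max(permitted, key=permitted.get)

print(choose(candidate_actions, norms))  # -> "summarize public report"
```

The point of the toy: the highest-utility actions are ruled out by the norms, so the agent's choice is shaped by the rules that coordinate behaviour rather than by raw task performance alone.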
The team plans to deliver several key technical assets: novel AI agent architectures, a new dataset of disciplined normative reasoning, and a debate-based framework to test and enforce stable, shared norms in AI agent interactions.
The end goal is to make AI systems safer by raising their awareness of the unwritten rules and complex trade-offs of our diverse human society.
“The unconstrained growth of artificial intelligence has already started to have major ripple effects on people's lives. I am proud to be part of an effort to steer the trajectory of AI development and usage towards a more positive future,” added Ryskina.
“The highly interdisciplinary nature of this collaboration helps reframe the researchers' thinking about AI as not just a technical matter, but also a social, political and economic one. Learning from experts in other fields as well as your own is crucial for making meaningful progress in today’s AI research.”
Funded Projects
Solution Network: Mitigating Dialect Bias
- Laleh Seyyed-Kalantari (York University)
- Blessing Ogbuokiri (Brock University)
AI Safety Catalyst Project: Sampling Latent Explanations From LLMs for Safe and Interpretable Reasoning
- Yoshua Bengio (Canada CIFAR AI Chair, Mila, Université de Montréal, LawZero)
AI Safety Catalyst Project: Adversarial Robustness of LLM Safety
- Gauthier Gidel (Canada CIFAR AI Chair, Mila, Université de Montréal)
AI Safety Catalyst Project: Advancing AI Alignment Through Debate and Shared Normative Reasoning
- Gillian Hadfield (Johns Hopkins University, University of Toronto)
- Maria Ryskina (CIFAR AI Safety Postdoctoral Fellow, University of Toronto)
AI Safety Catalyst Project: Formalizing Constraints For Assessing and Mitigating Agentic Risk
- Sheila McIlraith (Canada CIFAR AI Chair, Vector Institute, University of Toronto)
Renowned computer scientist Deborah Raji (Mozilla) speaks about the sociotechnical dimensions of AI safety at the inaugural CAISI Research Program Annual Meeting in October 2025.