CAISI Research Program at CIFAR
Building Trust & Fairness
Introduction
Actively embedding human values and equity into AI systems helps ensure that AI serves the public good and does not perpetuate existing societal harms such as bias and discrimination.
Intelligent Ideas with Yoshua Bengio
How do we design AIs that will not harm people? Canada CIFAR AI Chair Yoshua Bengio (Mila, Université de Montréal, LawZero) explores the various ways that AI could harm society and why he remains optimistic about the future.
Spotlight
Building Safe AI Through More Reliable Reasoning
How can we trust an AI that knows more than we do? AI systems draw on immense data, leaving users unable to verify their outputs. While AI debate was proposed to solve this by forcing models to cite evidence, current systems fail to argue reliably — misinterpreting facts and taking sources out of context. To earn trust, an AI must learn to construct and support justifications that can be taken at face value.
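To make the debate idea concrete, the sketch below shows one minimal way such a protocol could be structured: two debaters alternate turns, and every claim must be backed by a verbatim quote from a shared source that a judge can check without reading the whole document. This is an illustrative sketch only, assuming a simple substring check for citation validity; the class and field names (Debate, DebateTurn, add_turn) are hypothetical and do not come from the project described here.

```python
# Minimal sketch of a two-debater protocol with verifiable citations.
# All names here are illustrative placeholders, not a published API.

from dataclasses import dataclass, field

@dataclass
class DebateTurn:
    debater: str   # "pro" or "con"
    claim: str     # the argument advanced this turn
    evidence: str  # a quoted span from the shared source document

@dataclass
class Debate:
    question: str
    source: str                       # document both debaters must cite
    transcript: list = field(default_factory=list)

    def add_turn(self, debater: str, claim: str, evidence: str) -> None:
        # Simple verifiability check (an assumption of this sketch):
        # cited evidence must appear verbatim in the shared source,
        # so a judge can trust quotes without reading the full document.
        if evidence not in self.source:
            raise ValueError(f"{debater} cited evidence not found in source")
        self.transcript.append(DebateTurn(debater, claim, evidence))

# Usage: debaters alternate grounded turns; a (human or model) judge
# then reads only the transcript and its verified quotes.
doc = "The 2024 audit found the model misclassified 12% of dialect samples."
debate = Debate(question="Is the model reliable on dialect data?", source=doc)
debate.add_turn("con", "The model is unreliable on dialects.",
                "misclassified 12% of dialect samples")
```

The design choice worth noting is the verbatim-quote constraint: it is what lets a non-expert judge take the cited evidence at face value, which is exactly where current debate systems fall short.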
One research team aims to address this issue by developing an AI Debate Framework. Gillian Hadfield (Johns Hopkins University, University of Toronto (status only), Vector Institute) and her team, including Maria Ryskina, are pursuing this goal in their 2025 AI Safety Catalyst project, funded by CIFAR.
“Our goal is to teach language models to construct reasonable justifications and to support them with valid evidence from available sources of information,” says Maria Ryskina, a CIFAR AI Safety Postdoctoral Fellow at Vector Institute. “Such models will be more worthy of users’ trust as they can be reliably overseen by non-experts.”
Trust Between Humans and AI Systems
Drawing on insights from law, economics, cultural evolution, and political science, the AI Debate Framework project aims to equip AI with normative reasoning: making choices informed by the rules that coordinate behaviour, much as people's actions are shaped by their societies' norms and laws. In doing so, the project seeks to make AI systems more trustworthy, resilient, and better aligned with human social structures.
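One simple way to picture normative reasoning is as constrained choice: an agent ranks actions by task utility but discards any that violate shared norms. The sketch below is a toy illustration under that assumption; the norms, actions, and the choose function are invented for this example and are not artifacts of the CIFAR project.

```python
# Illustrative-only sketch: normative reasoning as norm-constrained choice.
# Norms act as filters on candidate actions before utility maximization.

from typing import Callable

Norm = Callable[[str], bool]  # returns True if the action complies

norms: list[Norm] = [
    lambda action: "deceive" not in action,   # norm: no deception
    lambda action: "private" not in action,   # norm: no use of private data
]

# Candidate actions mapped to task utility (toy values).
candidate_actions = {
    "summarize public report": 0.7,
    "quote private email": 0.9,
    "deceive user to finish faster": 1.0,
}

def choose(actions: dict[str, float], norms: list[Norm]) -> str:
    # Keep only norm-compliant actions, then maximize utility among them,
    # mirroring how laws and norms constrain which choices are acceptable.
    permitted = {a: u for a, u in actions.items()
                 if all(norm(a) for norm in norms)}
    return max(permitted, key=permitted.get)

print(choose(candidate_actions, norms))  # -> "summarize public report"
```

The point of the toy: the highest-utility actions are ruled out by the norms, so the agent's choice is shaped by the rules that coordinate behaviour rather than by raw task performance alone.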
The team plans to deliver several key technical assets: novel AI agent architectures, a new dataset of disciplined normative reasoning, and a debate-based framework to test and enforce stable, shared norms in AI agent interactions.
The end goal is to make AI systems safer by raising their awareness of the unwritten rules and complex trade-offs of our diverse human society.
“The unconstrained growth of artificial intelligence has already started to have major ripple effects on people's lives. I am proud to be part of an effort to steer the trajectory of AI development and usage towards a more positive future,” added Ryskina.
“The highly interdisciplinary nature of this collaboration helps reframe the researchers' thinking about AI as not just a technical matter, but also a social, political and economic one. Learning from experts in other fields as well as your own is crucial for making meaningful progress in today’s AI research.”
Funded Projects
Solution Network: Mitigating Dialect Bias
- Laleh Seyyed-Kalantari (York University)
- Blessing Ogbuokiri (Brock University)
AI Safety Catalyst Project: Sampling Latent Explanations From LLMs for Safe and Interpretable Reasoning
- Yoshua Bengio (Canada CIFAR AI Chair, Mila, Université de Montréal, LawZero)
AI Safety Catalyst Project: Adversarial Robustness of LLM Safety
- Gauthier Gidel (Canada CIFAR AI Chair, Mila, Université de Montréal)
AI Safety Catalyst Project: Advancing AI Alignment Through Debate and Shared Normative Reasoning
- Gillian Hadfield (Johns Hopkins University, University of Toronto)
- Maria Ryskina (CIFAR AI Safety Postdoctoral Fellow, University of Toronto)
AI Safety Catalyst Project: Formalizing Constraints For Assessing and Mitigating Agentic Risk
- Sheila McIlraith (Canada CIFAR AI Chair, Vector Institute, University of Toronto)
Renowned computer scientist Deborah Raji (Mozilla) speaks about the sociotechnical dimensions of AI safety at the inaugural CAISI Research Program Annual Meeting in October 2025.