CAISI Research Program at CIFAR
Securing Critical Systems
Introduction
Developing rigorous tools to evaluate the safety, accuracy, and reliability of frontier AI is essential for securing critical systems and enabling responsible industry innovation.
Intelligent Ideas with Nicolas Papernot
How do we build safe, trustworthy AI systems? Nicolas Papernot’s research explores how machine learning models can learn from data without compromising users’ personal information. Securing critical systems is essential for building trust among Canadians and keeping their information safe while accelerating the adoption of AI.
Spotlight
Landmark Evaluation Study Measures Safety and Reliability of Frontier AI
In a landmark benchmark study, the Vector Institute conducted an independent evaluation measuring the safety and reliability of the world’s leading large language models (LLMs).
Led by Vector’s AI Engineering team, the project assessed 11 prominent frontier AI models from around the world. The evaluation examined both open- and closed-source systems, including the January 2025 release of DeepSeek-R1, testing each against a comprehensive suite of 16 benchmarks.
This project marks a critical milestone in AI safety research. As the global AI race intensifies and increasingly powerful LLMs emerge, trusted and widely accepted benchmarks are essential. This research provides a crucial tool for helping researchers, developers, and policymakers understand how these models perform in terms of accuracy, reliability, and fairness.
Enabling Safe and Reliable AI Adoption
“This study gives people and organizations a clear, independent picture of how these models actually behave so that AI can be adopted safely and with confidence,” says Deval Pandya, VP of AI Engineering, Vector Institute.
To promote transparency and accountability, the Vector Institute has open-sourced the entire study, including its results and underlying code. “By releasing our work openly, we are giving everyone the ability to verify, learn and build on it, which supports smarter and safer use of AI across the country,” he added.
The evaluation features powerful benchmarks such as MMLU-Pro, MMMU, and OS-World, developed by Wenhu Chen and Victor Zhong, University of Waterloo professors and Canada CIFAR AI Chairs at the Vector Institute. Their research on benchmark design is improving how the field evaluates models, and these benchmarks are now widely adopted by major companies, including OpenAI, Google, and Anthropic.
“I am excited that we are helping create a future where AI earns trust because people can see how it works and test it for themselves. By open-sourcing our evaluations and tools, we are enabling a broader community to replicate the results, spot gaps and improve on them. This is how we support safe adoption, not just in labs but in hospitals, classrooms, businesses and public services.”
Deval Pandya, VP of AI Engineering, Vector Institute
Funded Projects
AI Safety Catalyst Project: Safe Autonomous Chemistry Labs
- Alán Aspuru-Guzik (Canada CIFAR AI Chair, Vector Institute, University of Toronto)
AI Safety Catalyst Project: Adversarial Robustness in Knowledge Graphs
- Ebrahim Bagheri (University of Toronto)
- Jian Tang (Canada CIFAR AI Chair, Mila, HEC Montréal & Université de Montréal)
- Benjamin Fung (Mila, McGill University)
AI Safety Catalyst Project: Maintaining Meaningful Control: Navigating Agency and Oversight in AI-Assisted Coding
- Jackie Chi Kit Cheung (Canada CIFAR AI Chair, Mila, McGill University)
- Jin Guo (McGill University)
AI Safety Catalyst Project: Safety Assurance and Engineering for Multimodal Foundation Model-enabled AI Systems
- Foutse Khomh (Canada CIFAR AI Chair, Mila, Polytechnique Montréal)
- Lei Ma (Canada CIFAR AI Chair, Amii, University of Alberta)
Blessing Ogbuokiri (Brock University), Co-director of the Mitigating Dialect Bias Solution Network attends the inaugural CAISI Research Program Annual Meeting in October 2025.