
Mila
Quebec AI institute founded by Yoshua Bengio. World-leading academic research in deep learning and AI for social good.
mila.quebec
Recent Papers
This paper introduces Constrained Group Relative Policy Optimization (C-GRPO), a Lagrangian-based extension of Group Relative Policy Optimization (GRPO) for constrained policy optimization with indicator cost functions. The authors identify and formally characterize a pathology in naive multi-component advantage estimation that corrupts the Lagrangian signal due to mismatched component-wise standard deviations. They propose a scalarized advantage construction to address this issue, demonstrating improved constraint satisfaction and task success on both a toy gridworld and robotics tasks.
Derives a scalarized advantage construction for constrained policy optimization within the GRPO framework to mitigate issues arising from mismatched component-wise standard deviations in multi-component advantage estimation.
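To make the normalization pathology concrete, here is a minimal NumPy sketch contrasting naive per-component advantage normalization with a scalarized construction. The function names and the exact form of the fix are illustrative, inferred only from the summary above, not taken from the paper.

```python
import numpy as np

def naive_advantage(rewards, costs, lam, eps=1e-8):
    # Pathology: each component is normalized by its own group std, so
    # the effective Lagrange multiplier becomes lam * (std_r / std_c),
    # silently rescaling the constraint pressure.
    adv_r = (rewards - rewards.mean()) / (rewards.std() + eps)
    adv_c = (costs - costs.mean()) / (costs.std() + eps)
    return adv_r - lam * adv_c

def scalarized_advantage(rewards, costs, lam, eps=1e-8):
    # Fix: scalarize the Lagrangian first, then normalize once,
    # preserving the reward/cost trade-off encoded by lam.
    scalar = rewards - lam * costs
    return (scalar - scalar.mean()) / (scalar.std() + eps)
```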
This study establishes SSL as a promising paradigm for ECG analysis, particularly in settings with limited annotated data, enhancing accessibility, generalizability, and fairness in AI-driven cardiac diagnostics across diverse clinical environments and questions.
The paper introduces GPT-5, a unified system comprising a fast, general-purpose model and a deeper reasoning model, managed by a real-time router trained on user feedback and performance metrics. GPT-5 demonstrates improved performance on benchmarks, faster response times, and enhanced utility for real-world queries, with significant reductions in hallucinations, improved instruction following, and minimized sycophancy. The system incorporates "safe-completions" for safety and is treated as High capability in the Biological and Chemical domain under OpenAI's Preparedness Framework, triggering associated safeguards.
Introduces a unified GPT-5 system with a real-time router that dynamically selects between a fast, general-purpose model and a deeper reasoning model based on query characteristics, optimizing for speed and accuracy.
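The routing internals are not public; the following is a purely schematic Python sketch of the dispatch idea, with `needs_reasoning` and `generate` as hypothetical method names.

```python
def respond(query: str, router, fast_model, reasoning_model) -> str:
    # Schematic only: route on the predicted benefit of deeper
    # reasoning (hypothetical API; the real router is trained on
    # user feedback and performance signals).
    if router.needs_reasoning(query):            # hypothetical method
        return reasoning_model.generate(query)   # slower, deeper model
    return fast_model.generate(query)            # fast general model
```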
The International AI Safety Report 2025's Second Key Update analyzes the current state of AI risk management and technical mitigations employed by researchers, companies, and governments. It highlights advancements in training safer models and monitoring outputs while acknowledging uncertainties in the effectiveness of these measures and their variability across applications. The report aims to inform policymakers, researchers, and the public about progress and remaining gaps in AI safety.
Synthesizes recent developments in AI risk management and technical risk mitigation strategies, identifying both progress and persistent gaps in ensuring the safety of general-purpose AI systems.
This work integrates small-molecule high-throughput screening with a deep-learning-based virtual screening approach to uncover new antibacterial compounds, illustrating a 90-fold improved hit rate over the high-throughput screening experiment used for training.
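As a rough illustration of the screening loop (not the authors' pipeline), the sketch below trains a classifier on HTS outcomes and ranks an unscreened virtual library. Synthetic fingerprints and a random forest stand in for the real data and the paper's deep model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Synthetic stand-ins: binary fingerprints for screened compounds
# (with measured hit labels) and for a much larger virtual library.
X_hts = rng.integers(0, 2, size=(5000, 1024))
y_hts = rng.integers(0, 2, size=5000)            # 1 = experimental hit
X_library = rng.integers(0, 2, size=(50000, 1024))

model = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=0)
model.fit(X_hts, y_hts)
scores = model.predict_proba(X_library)[:, 1]    # predicted hit probability
shortlist = np.argsort(scores)[::-1][:100]       # prioritize for wet-lab assay
```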
The paper introduces Recursive Self-Aggregation (RSA), a novel test-time scaling method for LLMs that iteratively refines a population of reasoning chains by aggregating subsets of solutions. RSA leverages information from intermediate reasoning steps to bootstrap from partially correct chains of thought, combining parallel and sequential scaling benefits. Empirical results demonstrate that RSA significantly improves performance across various tasks and models, enabling smaller models like Qwen3-4B to compete with larger reasoning models.
Introduces Recursive Self-Aggregation (RSA), a novel inference-time scaling method that recursively aggregates and refines reasoning chains to improve LLM performance.
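A minimal sketch of the RSA loop as described above, with `solve` and `aggregate` as hypothetical LLM calls: in each round, every new chain is produced by aggregating a random subset of the current population.

```python
import random

def rsa(problem, llm, population=8, subset_size=4, rounds=3):
    # Round 0: sample an initial population of reasoning chains.
    chains = [llm.solve(problem) for _ in range(population)]
    for _ in range(rounds):
        # Each new chain aggregates a random subset of the current
        # population, so useful steps from partially correct chains
        # can propagate into better solutions.
        chains = [
            llm.aggregate(problem, random.sample(chains, subset_size))
            for _ in range(population)
        ]
    return chains  # a final answer is then selected from this population
```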
This paper introduces a dual-task framework to inject structural knowledge into protein language models (pLMs) by aligning residue representations with protein graph neural networks (pGNNs) and predicting structure tokens. A residue loss selection module is used to focus training on reliable structural information. Post-training ESM2 and AMPLIFY with this method yields significant improvements in deep mutational scanning fitness prediction and contact prediction, demonstrating robustness across model sizes.
Introduces a dual-task framework for structure alignment that effectively incorporates both inter- and intra-protein structural knowledge into pLMs via contrastive learning with pGNNs and structure token prediction.
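A hedged PyTorch sketch of what such a dual-task objective could look like; the InfoNCE form, the loss weighting, and the tensor layout are assumptions for illustration, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def dual_task_loss(plm_repr, gnn_repr, token_logits, struct_tokens,
                   reliable_mask, tau=0.07, alpha=0.5):
    # plm_repr, gnn_repr: (n_residues, d) residue embeddings from the
    # pLM and the pGNN; token_logits: (n_residues, vocab) predictions
    # over discrete structure tokens; reliable_mask: (n_residues,)
    # floats in {0, 1} from the residue loss selection module.
    plm = F.normalize(plm_repr, dim=-1)
    gnn = F.normalize(gnn_repr, dim=-1)
    # Task 1 (alignment): InfoNCE pairing each residue's pLM embedding
    # with its own pGNN embedding against all others in the batch.
    logits = plm @ gnn.t() / tau
    targets = torch.arange(plm.size(0), device=plm.device)
    align_loss = F.cross_entropy(logits, targets)
    # Task 2: per-residue structure token prediction, keeping only
    # residues flagged as carrying reliable structural signal.
    tok_loss = F.cross_entropy(token_logits, struct_tokens, reduction="none")
    tok_loss = (tok_loss * reliable_mask).sum() / reliable_mask.sum().clamp(min=1.0)
    return alpha * align_loss + (1.0 - alpha) * tok_loss
```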
The paper introduces Command A, a large language model designed for enterprise applications, featuring agent optimization, multilingual support (23 languages), and a hybrid architecture. The model leverages a decentralized training approach with self-refinement and model merging to achieve strong RAG capabilities, grounding, and tool use for automating business processes. Evaluations across enterprise tasks and public benchmarks demonstrate excellent performance and efficiency, with weights released for research.
Introduces Command A, an enterprise-focused LLM with agent optimization, 23-language multilingual support, and strong RAG and tool-use capabilities, and details its training and evaluation.
The paper introduces a method for transferring fine-tuning updates between different versions of large language models by extracting and applying the "diff vector" representing weight changes from fine-tuning one model to another. This approach addresses the inefficiency of retraining models from scratch with each new base model release, especially for domain-specific or multilingual tasks. Experiments demonstrate significant performance improvements on tasks like IFEval, LiveCodeBench, and Global MMLU by transferring fine-tuning updates, even surpassing the performance of the target model's instruction-tuned version without additional training.
Demonstrates that fine-tuning updates, represented as diff vectors, can be effectively transferred between different versions of large language models, leading to significant performance gains on various tasks.
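The core operation is simple to state: subtract the old base weights from the fine-tuned weights, then add that difference to the new base. A minimal sketch over PyTorch-style state dicts, assuming the two base versions are architecturally compatible (same parameter names and shapes):

```python
def transfer_diff_vector(base_old, finetuned_old, base_new):
    # diff vector: the weight changes induced by fine-tuning the old
    # base model. Adding it to the new base transfers the fine-tuning
    # behaviour without retraining. Works on any mapping of name ->
    # tensor/array with matching keys and shapes.
    return {
        name: w + (finetuned_old[name] - base_old[name])
        for name, w in base_new.items()
    }
```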
The paper introduces DT-GFN, a method for learning decision tree ensembles by formulating decision tree construction as a sequential planning problem solved via a deep reinforcement learning policy (GFlowNet). This approach addresses challenges in scaling and generalizing decision tree models for tabular data by learning to sample decision trees from the Bayesian posterior. DT-GFN outperforms state-of-the-art decision tree and deep learning methods on classification benchmarks, demonstrates robustness to distribution shifts, and produces interpretable models with shorter description lengths.
Introduces a novel approach, DT-GFN, that leverages GFlowNets to learn a generative model for sampling decision trees from the Bayesian posterior, enabling amortized structure inference for decision tree ensembles.
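To illustrate "tree construction as sequential decision making", here is a schematic sampler in which a policy grows a tree one split at a time. The policy API is hypothetical, and the GFlowNet training objective that makes sampling proportional to the Bayesian posterior is omitted.

```python
def sample_tree(policy, max_depth=3):
    # State: a partial tree; each action either splits a leaf on a
    # (feature, threshold) pair or leaves it terminal. A GFlowNet
    # objective (omitted) would train `policy` so that complete trees
    # are sampled with probability proportional to their posterior.
    root = {"leaf": True}
    frontier = [(root, 0)]
    while frontier:
        node, depth = frontier.pop()
        if depth >= max_depth:
            continue  # depth cap: node stays a leaf
        action = policy.sample_action(node, depth)  # hypothetical API
        if action == "stop":
            continue
        feature, threshold = action  # e.g. ("x3", 0.5)
        node.update(leaf=False, feature=feature, threshold=threshold,
                    left={"leaf": True}, right={"leaf": True})
        frontier += [(node["left"], depth + 1), (node["right"], depth + 1)]
    return root
```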
The authors developed and compared two open-source foundation models for ECG interpretation: DeepECG-SSL, a self-supervised model pretrained with contrastive learning and masked lead modeling, and DeepECG-SL, a supervised model. Both models were trained on over 1 million ECGs to predict 77 cardiac conditions and were evaluated on multiple datasets for ECG interpretation and digital biomarker tasks. DeepECG-SSL outperformed DeepECG-SL on digital biomarker tasks with limited labeled data, demonstrating the potential of self-supervised learning for ECG analysis, while both models showed minimal performance disparities across age and gender.
Demonstrates the efficacy of self-supervised learning for ECG analysis, particularly in low-data regimes, by developing and evaluating DeepECG-SSL, an open-source foundation model that outperforms its supervised counterpart on digital biomarker tasks.
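As one concrete piece of the pretraining recipe, masked lead modeling hides whole leads of the 12-lead signal and asks the encoder to reconstruct them from the visible ones. A minimal sketch of just the masking step, with an assumed tensor layout:

```python
import torch

def mask_random_leads(ecg, n_masked=3):
    # ecg: (batch, n_leads, n_samples), e.g. a 12-lead recording.
    # Zero out randomly chosen leads; a reconstruction loss is then
    # applied only where the mask is True.
    batch, n_leads, _ = ecg.shape
    masked = ecg.clone()
    mask = torch.zeros(batch, n_leads, dtype=torch.bool)
    for i in range(batch):
        leads = torch.randperm(n_leads)[:n_masked]
        masked[i, leads] = 0.0
        mask[i, leads] = True
    return masked, mask
```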
This paper compares amortized point estimators (trained via maximum likelihood or MAP) with amortized posterior inference methods (using normalizing flows, score-based diffusion, or diagonal Gaussian approximations) in the context of in-context learning. The study rigorously evaluates both in-distribution and out-of-distribution generalization across various problem settings, including linear models and shallow neural networks. The key finding is that amortized point estimators generally outperform posterior inference methods, although posterior inference remains competitive in low-dimensional problems.
Empirically demonstrates that amortized point estimators are generally superior to amortized posterior inference methods for in-context learning across a range of tasks.
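The distinction between the two families can be reduced to the output head, as in the illustrative PyTorch module below (the architecture is assumed, not from the paper): a point estimator emits a single parameter vector, while the simplest posterior variant emits the mean and log-variance of a diagonal Gaussian.

```python
import torch
import torch.nn as nn

class AmortizedInContextEstimator(nn.Module):
    def __init__(self, x_dim, theta_dim, posterior=False):
        super().__init__()
        self.posterior = posterior
        self.encoder = nn.Sequential(nn.Linear(x_dim, 128), nn.ReLU(),
                                     nn.Linear(128, 128))
        # Point head: one theta-hat. Posterior head: mean and
        # log-variance of a diagonal Gaussian over theta.
        self.head = nn.Linear(128, theta_dim * (2 if posterior else 1))

    def forward(self, context):
        # context: (n_points, x_dim); mean pooling gives a
        # permutation-invariant summary of the in-context dataset.
        summary = self.encoder(context).mean(dim=0)
        out = self.head(summary)
        if self.posterior:
            mu, log_var = out.chunk(2)
            return mu, log_var
        return out  # point estimate of theta
```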
The paper introduces EarthView, a large-scale remote sensing dataset comprising 15 terapixels of multi-source imagery (NEON, Sentinel, Satellogic) spanning 2017-2022, designed for self-supervised learning. To leverage this dataset, the authors develop Earth-MAE, a Masked Autoencoder variant tailored for remote sensing data, capable of handling diverse modalities like hyperspectral, multispectral, and topographical data. Experiments demonstrate that pre-training Earth-MAE on Satellogic data within EarthView improves performance on downstream Earth monitoring tasks.
Introduces EarthView, a large-scale, multi-source remote sensing dataset, and Earth-MAE, a self-supervised Masked Autoencoder, to facilitate and improve deep learning for Earth monitoring.
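MAE-style pretraining hinges on reconstructing masked patches from a small visible subset. A minimal sketch of the random patch masking, assuming tokenized tiles of shape (batch, patches, dim); Earth-MAE's handling of multiple modalities is not reproduced here.

```python
import torch

def random_patch_mask(tokens, mask_ratio=0.75):
    # tokens: (batch, n_patches, dim) patch embeddings of one tile.
    # Keep a random subset of patches visible; the decoder is trained
    # to reconstruct the masked remainder.
    b, n, d = tokens.shape
    n_keep = int(n * (1 - mask_ratio))
    perm = torch.rand(b, n).argsort(dim=1)       # random patch order
    keep_idx = perm[:, :n_keep]                  # indices kept visible
    visible = torch.gather(
        tokens, 1, keep_idx.unsqueeze(-1).expand(-1, -1, d))
    return visible, keep_idx
```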

