
Mila
Quebec AI institute founded by Yoshua Bengio. World-leading academic research in deep learning and AI for social good.
mila.quebec
Recent Papers
This paper introduces Constrained Group Relative Policy Optimization (C-GRPO), a Lagrangian-based extension of Group Relative Policy Optimization (GRPO) for constrained policy optimization with indicator cost functions. The authors identify and formally characterize a pathology in naive multi-component advantage estimation that corrupts the Lagrangian signal due to mismatched component-wise standard deviations. They propose a scalarized advantage construction to address this issue, demonstrating improved constraint satisfaction and task success on both a toy gridworld and robotics tasks.
Derives a scalarized advantage construction for constrained policy optimization within the GRPO framework to mitigate issues arising from mismatched component-wise standard deviations in multi-component advantage estimation.
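To make the normalization pathology concrete, here is a minimal NumPy sketch contrasting naive per-component advantage normalization with a scalarized construction. The function names and the exact form of the fix are illustrative, inferred only from the summary above, not taken from the paper.

```python
import numpy as np

def naive_advantage(rewards, costs, lam, eps=1e-8):
    # Pathology: each component is normalized by its own group std, so
    # the effective Lagrange multiplier becomes lam * (std_r / std_c),
    # silently rescaling the constraint pressure.
    adv_r = (rewards - rewards.mean()) / (rewards.std() + eps)
    adv_c = (costs - costs.mean()) / (costs.std() + eps)
    return adv_r - lam * adv_c

def scalarized_advantage(rewards, costs, lam, eps=1e-8):
    # Fix: scalarize the Lagrangian first, then normalize once,
    # preserving the reward/cost trade-off encoded by lam.
    scalar = rewards - lam * costs
    return (scalar - scalar.mean()) / (scalar.std() + eps)
```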
This study establishes SSL as a promising paradigm for ECG analysis, particularly in settings with limited annotated data, enhancing accessibility, generalizability, and fairness in AI-driven cardiac diagnostics across diverse clinical environments and questions.
The paper introduces GPT-5, a unified system comprising a fast, general-purpose model and a deeper reasoning model, managed by a real-time router trained on user feedback and performance metrics. GPT-5 demonstrates improved performance on benchmarks, faster response times, and enhanced utility for real-world queries, with significant reductions in hallucinations, improved instruction following, and minimized sycophancy. The system incorporates "safe-completions" for safety and is treated as High capability in the Biological and Chemical domain under OpenAI's Preparedness Framework, triggering associated safeguards.
Introduces a unified GPT-5 system with a real-time router that dynamically selects between a fast, general-purpose model and a deeper reasoning model based on query characteristics, optimizing for speed and accuracy.
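The routing internals are not public; the following is a purely schematic Python sketch of the dispatch idea, with `needs_reasoning` and `generate` as hypothetical method names.

```python
def respond(query: str, router, fast_model, reasoning_model) -> str:
    # Schematic only: route on the predicted benefit of deeper
    # reasoning (hypothetical API; the real router is trained on
    # user feedback and performance signals).
    if router.needs_reasoning(query):            # hypothetical method
        return reasoning_model.generate(query)   # slower, deeper model
    return fast_model.generate(query)            # fast general model
```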
The International AI Safety Report 2025's Second Key Update analyzes the current state of AI risk management and technical mitigations employed by researchers, companies, and governments. It highlights advancements in training safer models and monitoring outputs while acknowledging uncertainties in the effectiveness of these measures and their variability across applications. The report aims to inform policymakers, researchers, and the public about progress and remaining gaps in AI safety.
Synthesizes recent developments in AI risk management and technical risk mitigation strategies, identifying both progress and persistent gaps in ensuring the safety of general-purpose AI systems.
This work integrates small-molecule high-throughput screening with a deep-learning-based virtual screening approach to uncover new antibacterial compounds, illustrating a 90-fold improved hit rate over the high-throughput screening experiment used for training.
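As a rough illustration of the screening loop (not the authors' pipeline), the sketch below trains a classifier on HTS outcomes and ranks an unscreened virtual library. Synthetic fingerprints and a random forest stand in for the real data and the paper's deep model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Synthetic stand-ins: binary fingerprints for screened compounds
# (with measured hit labels) and for a much larger virtual library.
X_hts = rng.integers(0, 2, size=(5000, 1024))
y_hts = rng.integers(0, 2, size=5000)            # 1 = experimental hit
X_library = rng.integers(0, 2, size=(50000, 1024))

model = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=0)
model.fit(X_hts, y_hts)
scores = model.predict_proba(X_library)[:, 1]    # predicted hit probability
shortlist = np.argsort(scores)[::-1][:100]       # prioritize for wet-lab assay
```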
The paper introduces Recursive Self-Aggregation (RSA), a novel test-time scaling method for LLMs that iteratively refines a population of reasoning chains by aggregating subsets of solutions. RSA leverages information from intermediate reasoning steps to bootstrap from partially correct chains of thought, combining parallel and sequential scaling benefits. Empirical results demonstrate that RSA significantly improves performance across various tasks and models, enabling smaller models like Qwen3-4B to compete with larger reasoning models.
Introduces Recursive Self-Aggregation (RSA), a novel inference-time scaling method that recursively aggregates and refines reasoning chains to improve LLM performance.
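A minimal sketch of the RSA loop as described above, with `solve` and `aggregate` as hypothetical LLM calls: in each round, every new chain is produced by aggregating a random subset of the current population.

```python
import random

def rsa(problem, llm, population=8, subset_size=4, rounds=3):
    # Round 0: sample an initial population of reasoning chains.
    chains = [llm.solve(problem) for _ in range(population)]
    for _ in range(rounds):
        # Each new chain aggregates a random subset of the current
        # population, so useful steps from partially correct chains
        # can propagate into better solutions.
        chains = [
            llm.aggregate(problem, random.sample(chains, subset_size))
            for _ in range(population)
        ]
    return chains  # a final answer is then selected from this population
```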
This paper introduces a dual-task framework to inject structural knowledge into protein language models (pLMs) by aligning residue representations with protein graph neural networks (pGNNs) and predicting structure tokens. A residue loss selection module is used to focus training on reliable structural information. Post-training ESM2 and AMPLIFY with this method yields significant improvements in deep mutational scanning fitness prediction and contact prediction, demonstrating robustness across model sizes.
Introduces a dual-task framework for structure alignment that effectively incorporates both inter- and intra-protein structural knowledge into pLMs via contrastive learning with pGNNs and structure token prediction.
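A hedged PyTorch sketch of what such a dual-task objective could look like; the InfoNCE form, the loss weighting, and the tensor layout are assumptions for illustration, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def dual_task_loss(plm_repr, gnn_repr, token_logits, struct_tokens,
                   reliable_mask, tau=0.07, alpha=0.5):
    # plm_repr, gnn_repr: (n_residues, d) residue embeddings from the
    # pLM and the pGNN; token_logits: (n_residues, vocab) predictions
    # over discrete structure tokens; reliable_mask: (n_residues,)
    # floats in {0, 1} from the residue loss selection module.
    plm = F.normalize(plm_repr, dim=-1)
    gnn = F.normalize(gnn_repr, dim=-1)
    # Task 1 (alignment): InfoNCE pairing each residue's pLM embedding
    # with its own pGNN embedding against all others in the batch.
    logits = plm @ gnn.t() / tau
    targets = torch.arange(plm.size(0), device=plm.device)
    align_loss = F.cross_entropy(logits, targets)
    # Task 2: per-residue structure token prediction, keeping only
    # residues flagged as carrying reliable structural signal.
    tok_loss = F.cross_entropy(token_logits, struct_tokens, reduction="none")
    tok_loss = (tok_loss * reliable_mask).sum() / reliable_mask.sum().clamp(min=1.0)
    return alpha * align_loss + (1.0 - alpha) * tok_loss
```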
The paper introduces Command A, a large language model designed for enterprise applications, featuring agent optimization, multilingual support (23 languages), and a hybrid architecture. The model leverages a decentralized training approach with self-refinement and model merging to achieve strong RAG capabilities, grounding, and tool use for automating business processes. Evaluations across enterprise tasks and public benchmarks demonstrate excellent performance and efficiency, with weights released for research.
Introduces Command A, an enterprise-focused LLM with agent optimization, 23-language multilingual support, and strong RAG and tool-use capabilities, and details its training and evaluation.
The paper introduces a method for transferring fine-tuning updates between different versions of large language models by extracting and applying the "diff vector" representing weight changes from fine-tuning one model to another. This approach addresses the inefficiency of retraining models from scratch with each new base model release, especially for domain-specific or multilingual tasks. Experiments demonstrate significant performance improvements on tasks like IFEval, LiveCodeBench, and Global MMLU by transferring fine-tuning updates, even surpassing the performance of the target model's instruction-tuned version without additional training.
Demonstrates that fine-tuning updates, represented as diff vectors, can be effectively transferred between different versions of large language models, leading to significant performance gains on various tasks.
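The core operation is simple to state: subtract the old base weights from the fine-tuned weights, then add that difference to the new base. A minimal sketch over PyTorch-style state dicts, assuming the two base versions are architecturally compatible (same parameter names and shapes):

```python
def transfer_diff_vector(base_old, finetuned_old, base_new):
    # diff vector: the weight changes induced by fine-tuning the old
    # base model. Adding it to the new base transfers the fine-tuning
    # behaviour without retraining. Works on any mapping of name ->
    # tensor/array with matching keys and shapes.
    return {
        name: w + (finetuned_old[name] - base_old[name])
        for name, w in base_new.items()
    }
```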
The paper introduces DT-GFN, a method for learning decision tree ensembles by formulating decision tree construction as a sequential planning problem solved via a deep reinforcement learning policy (GFlowNet). This approach addresses challenges in scaling and generalizing decision tree models for tabular data by learning to sample decision trees from the Bayesian posterior. DT-GFN outperforms state-of-the-art decision tree and deep learning methods on classification benchmarks, demonstrates robustness to distribution shifts, and produces interpretable models with shorter description lengths.
Introduces a novel approach, DT-GFN, that leverages GFlowNets to learn a generative model for sampling decision trees from the Bayesian posterior, enabling amortized structure inference for decision tree ensembles.
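To illustrate "tree construction as sequential decision making", here is a schematic sampler in which a policy grows a tree one split at a time. The policy API is hypothetical, and the GFlowNet training objective that makes sampling proportional to the Bayesian posterior is omitted.

```python
def sample_tree(policy, max_depth=3):
    # State: a partial tree; each action either splits a leaf on a
    # (feature, threshold) pair or leaves it terminal. A GFlowNet
    # objective (omitted) would train `policy` so that complete trees
    # are sampled with probability proportional to their posterior.
    root = {"leaf": True}
    frontier = [(root, 0)]
    while frontier:
        node, depth = frontier.pop()
        if depth >= max_depth:
            continue  # depth cap: node stays a leaf
        action = policy.sample_action(node, depth)  # hypothetical API
        if action == "stop":
            continue
        feature, threshold = action  # e.g. ("x3", 0.5)
        node.update(leaf=False, feature=feature, threshold=threshold,
                    left={"leaf": True}, right={"leaf": True})
        frontier += [(node["left"], depth + 1), (node["right"], depth + 1)]
    return root
```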
The authors developed and compared two open-source foundation models for ECG interpretation: DeepECG-SSL, a self-supervised model pretrained with contrastive learning and masked lead modeling, and DeepECG-SL, a supervised model. Both models were trained on over 1 million ECGs to predict 77 cardiac conditions and were evaluated on multiple datasets for ECG interpretation and digital biomarker tasks. DeepECG-SSL outperformed DeepECG-SL on digital biomarker tasks with limited labeled data, demonstrating the potential of self-supervised learning for ECG analysis, while both models showed minimal performance disparities across age and gender.
Demonstrates the efficacy of self-supervised learning for ECG analysis, particularly in low-data regimes, by developing and evaluating DeepECG-SSL, an open-source foundation model that outperforms its supervised counterpart on digital biomarker tasks.
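As one concrete piece of the pretraining recipe, masked lead modeling hides whole leads of the 12-lead signal and asks the encoder to reconstruct them from the visible ones. A minimal sketch of just the masking step, with an assumed tensor layout:

```python
import torch

def mask_random_leads(ecg, n_masked=3):
    # ecg: (batch, n_leads, n_samples), e.g. a 12-lead recording.
    # Zero out randomly chosen leads; a reconstruction loss is then
    # applied only where the mask is True.
    batch, n_leads, _ = ecg.shape
    masked = ecg.clone()
    mask = torch.zeros(batch, n_leads, dtype=torch.bool)
    for i in range(batch):
        leads = torch.randperm(n_leads)[:n_masked]
        masked[i, leads] = 0.0
        mask[i, leads] = True
    return masked, mask
```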
This paper compares amortized point estimators (trained via maximum likelihood or MAP) with amortized posterior inference methods (using normalizing flows, score-based diffusion, or diagonal Gaussian approximations) in the context of in-context learning. The study rigorously evaluates both in-distribution and out-of-distribution generalization across various problem settings, including linear models and shallow neural networks. The key finding is that amortized point estimators generally outperform posterior inference methods, although posterior inference remains competitive in low-dimensional problems.
Empirically demonstrates that amortized point estimators are generally superior to amortized posterior inference methods for in-context learning across a range of tasks.
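The distinction between the two families can be reduced to the output head, as in the illustrative PyTorch module below (the architecture is assumed, not from the paper): a point estimator emits a single parameter vector, while the simplest posterior variant emits the mean and log-variance of a diagonal Gaussian.

```python
import torch
import torch.nn as nn

class AmortizedInContextEstimator(nn.Module):
    def __init__(self, x_dim, theta_dim, posterior=False):
        super().__init__()
        self.posterior = posterior
        self.encoder = nn.Sequential(nn.Linear(x_dim, 128), nn.ReLU(),
                                     nn.Linear(128, 128))
        # Point head: one theta-hat. Posterior head: mean and
        # log-variance of a diagonal Gaussian over theta.
        self.head = nn.Linear(128, theta_dim * (2 if posterior else 1))

    def forward(self, context):
        # context: (n_points, x_dim); mean pooling gives a
        # permutation-invariant summary of the in-context dataset.
        summary = self.encoder(context).mean(dim=0)
        out = self.head(summary)
        if self.posterior:
            mu, log_var = out.chunk(2)
            return mu, log_var
        return out  # point estimate of theta
```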
The paper introduces EarthView, a large-scale remote sensing dataset comprising 15 terapixels of multi-source imagery (NEON, Sentinel, Satellogic) spanning 2017-2022, designed for self-supervised learning. To leverage this dataset, the authors develop Earth-MAE, a Masked Autoencoder variant tailored for remote sensing data, capable of handling diverse modalities like hyperspectral, multispectral, and topographical data. Experiments demonstrate that pre-training Earth-MAE on Satellogic data within EarthView improves performance on downstream Earth monitoring tasks.
Introduces EarthView, a large-scale, multi-source remote sensing dataset, and Earth-MAE, a self-supervised Masked Autoencoder, to facilitate and improve deep learning for Earth monitoring.
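MAE-style pretraining hinges on reconstructing masked patches from a small visible subset. A minimal sketch of the random patch masking, assuming tokenized tiles of shape (batch, patches, dim); Earth-MAE's handling of multiple modalities is not reproduced here.

```python
import torch

def random_patch_mask(tokens, mask_ratio=0.75):
    # tokens: (batch, n_patches, dim) patch embeddings of one tile.
    # Keep a random subset of patches visible; the decoder is trained
    # to reconstruct the masked remainder.
    b, n, d = tokens.shape
    n_keep = int(n * (1 - mask_ratio))
    perm = torch.rand(b, n).argsort(dim=1)       # random patch order
    keep_idx = perm[:, :n_keep]                  # indices kept visible
    visible = torch.gather(
        tokens, 1, keep_idx.unsqueeze(-1).expand(-1, -1, d))
    return visible, keep_idx
```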

