Mimicking human cognition, FLAIR lets dialogue models "think while listening," boosting performance without adding latency.
Ditch the cross-world counterfactuals: Sequential Transport offers a DAG-aware, optimal transport approach to causal mediation analysis, providing deterministic counterfactual mediators and fine-grained attribution.
Forget brute-force scaling: Tiny Aya proves a 3B parameter model can achieve state-of-the-art multilingual performance with clever training and region-aware specialization.
Forget contrastive learning: LLM2Vec-Gen learns text embeddings by representing the *response* an LLM would generate, unlocking safety and reasoning abilities for embedding tasks.
One in four initial posts on a major cybercrime forum contains explicit crime-related content, revealing a surprisingly high baseline of open criminal activity.
Diagonal SSMs, despite their empirical success, provably fail to track states of non-Abelian groups, revealing fundamental limitations in their expressive power.
Forget full fine-tuning: this dynamic routing strategy lets you adapt dense retrieval to new domains while using just 2% of the parameters.
Achieve state-of-the-art dynamic graph anomaly detection with limited labels by learning a robust decision boundary around normal data, outperforming methods that overfit to scarce anomalies.
Attention-based re-ranking gets a boost: ReAttn's post-hoc re-weighting tames over-concentration and lexical bias, leading to more accurate and interpretable results without extra training.
LLMs struggle to balance rational financial decisions with mimicking noisy user behavior, often overfitting to short-term market trends instead of aligning with long-term investment goals.
Coreference benchmarks may be overstating language models' NLU abilities, as even small changes to evaluation contexts reveal a failure to generalize.
Cybercriminals are actively exploring AI's potential for both enhancing existing attacks and creating novel illicit tools, but harbor significant doubts about its real-world effectiveness and impact on their operations.
Dramatically improve protein language models by simply post-training them to align with protein graphs, yielding a 59% increase in contact prediction accuracy.
Forget retraining from scratch: port fine-tuning updates between LLM versions and get up to a 47% performance boost on tasks like instruction following, even surpassing fully fine-tuned models.
Forget Bayesian bells and whistles: in-context learning shines brightest with simple point estimators, outperforming complex posterior approximations in most scenarios.