9 papers from Mila on Architecture Design (Transformers, SSMs, MoE)
Forget brute-force scaling: Tiny Aya proves a 3B parameter model can achieve state-of-the-art multilingual performance with clever training and region-aware specialization.
Ditch the text: WavSLM shows you can train a competitive speech language model using only distilled WavLM representations, unlocking a simpler, single-stream generative pretraining paradigm for speech.
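The summary leaves the pretraining recipe implicit, so here is a minimal sketch of what single-stream generative pretraining on discrete speech units can look like: continuous WavLM-style features are quantized to a unit vocabulary, and a causal model is trained with next-unit prediction. The vocabulary size, the tiny GRU model, and the random stand-in data are illustrative assumptions, not the paper's setup.

```python
import torch
import torch.nn as nn

NUM_UNITS = 500          # assumed unit vocabulary (e.g. k-means over speech features)
EMB, HID = 256, 512

class UnitLM(nn.Module):
    """A tiny causal language model over discrete speech units."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(NUM_UNITS, EMB)
        self.rnn = nn.GRU(EMB, HID, num_layers=2, batch_first=True)
        self.head = nn.Linear(HID, NUM_UNITS)

    def forward(self, units):            # units: (batch, time) int64
        h, _ = self.rnn(self.embed(units))
        return self.head(h)              # (batch, time, NUM_UNITS)

model = UnitLM()
units = torch.randint(0, NUM_UNITS, (8, 200))   # stand-in for discretized speech
logits = model(units[:, :-1])                   # predict each next unit
loss = nn.functional.cross_entropy(
    logits.reshape(-1, NUM_UNITS), units[:, 1:].reshape(-1))
loss.backward()
```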
Diagonal SSMs, despite their empirical success, provably fail to track states of non-Abelian groups, revealing fundamental limitations in their expressive power.
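To make the claim concrete, here is what "tracking the state of a non-Abelian group" means as a task, using S3 permutations as a small example (the task construction is a common probe for this limitation, not code from the paper): given a sequence of group elements, the model must output the running product at every step, and because the group is non-Abelian the answer depends on input order.

```python
import itertools
import random

S3 = list(itertools.permutations(range(3)))      # the 6 elements of S3

def compose(p, q):
    """Apply q first, then p (composition of permutations)."""
    return tuple(p[q[i]] for i in range(3))

def make_example(length=10):
    seq = [random.choice(S3) for _ in range(length)]
    state, targets = (0, 1, 2), []               # start at the identity
    for g in seq:
        state = compose(g, state)
        targets.append(state)                    # running product = the "state"
    return seq, targets

seq, targets = make_example()
# Order sensitivity: swapping two inputs generally changes the product.
a, b = (1, 0, 2), (0, 2, 1)
print(compose(a, b), compose(b, a))              # (1, 2, 0) vs (2, 0, 1)
```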
Takeuchi's Information Criterion (TIC) accurately predicts DNN generalization gaps, but only when models operate near the Neural Tangent Kernel (NTK) regime.
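For reference, the TIC estimate of the generalization gap is tr(J⁻¹K)/n, where J is the average per-example Hessian of the loss and K the second moment of per-example gradients, both at the fitted parameters. The sketch below computes it for ordinary least squares, where both matrices have closed forms; the data and constants are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.5 * rng.normal(size=n)

theta = np.linalg.lstsq(X, y, rcond=None)[0]     # fitted parameters
resid = X @ theta - y

# Per-example squared-error loss l_i = 0.5 * (x_i @ theta - y_i)^2
grads = X * resid[:, None]                       # per-example gradients, shape (n, d)
J_hat = X.T @ X / n                              # average per-example Hessian
K_hat = grads.T @ grads / n                      # gradient second-moment matrix

tic_gap = np.trace(np.linalg.solve(J_hat, K_hat)) / n
print(f"TIC-estimated generalization gap: {tic_gap:.4f}")
```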
Attention-based re-ranking gets a boost: ReAttn's post-hoc re-weighting tames over-concentration and lexical bias, leading to more accurate and interpretable results without extra training.
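The summary does not spell out ReAttn's exact re-weighting, so the following is a hypothetical sketch of the general idea of post-hoc attention re-weighting: soften over-concentrated attention with a temperature and damp lexically frequent tokens with IDF-style weights before renormalizing. The temperature, the IDF damping, and the function name are assumptions, not the paper's method.

```python
import numpy as np

def reweight_attention(attn, idf, temperature=2.0):
    """attn: (num_docs, num_tokens) nonnegative attention weights per document;
    idf: (num_tokens,) inverse-document-frequency score for each token."""
    logits = np.log(attn + 1e-9) / temperature    # soften peaks (over-concentration)
    logits = logits + np.log(idf + 1e-9)          # counter lexical bias
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)

attn = np.array([[0.90, 0.05, 0.05],              # one token dominates
                 [0.40, 0.35, 0.25]])
idf = np.array([0.2, 1.0, 1.0])                   # the first token is very common
print(reweight_attention(attn, idf).round(3))
```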
Boost macrocycle generation rates from 1% to 99% by guiding diffusion models with persistent homology, opening new avenues for drug discovery.
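The topological signal involved can be sketched concretely: H1 persistence detects whether a set of 3D atom coordinates forms a large ring (a macrocycle-like loop). The paper presumably feeds such a signal back into the diffusion sampler as guidance; the sketch below only uses it to score and filter generated conformations, and the gudhi library, thresholds, and toy coordinates are assumptions for illustration.

```python
import numpy as np
import gudhi

def max_h1_persistence(coords, max_edge=10.0):
    """Largest 1-dimensional persistence in a Vietoris-Rips filtration."""
    rips = gudhi.RipsComplex(points=coords, max_edge_length=max_edge)
    st = rips.create_simplex_tree(max_dimension=2)
    h1 = [min(death, max_edge) - birth
          for dim, (birth, death) in st.persistence() if dim == 1]
    return max(h1, default=0.0)

# Toy "molecules": a 12-atom ring versus a 12-atom chain.
theta = np.linspace(0, 2 * np.pi, 12, endpoint=False)
ring = np.stack([3 * np.cos(theta), 3 * np.sin(theta), np.zeros(12)], axis=1)
chain = np.stack([np.linspace(0, 11, 12), np.zeros(12), np.zeros(12)], axis=1)

print(max_h1_persistence(ring))    # large: a persistent 1-cycle (the macrocycle)
print(max_h1_persistence(chain))   # ~0: no ring
# A guidance-by-filtering loop would keep only samples whose H1 persistence
# exceeds a threshold before further refinement.
```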
Dramatically improve protein language models by simply post-training them to align with protein graphs, yielding a 59% increase in contact prediction accuracy.
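The summary only says the language model is post-trained to "align with protein graphs"; one plausible reading, given here as an assumption rather than the paper's recipe, is a contrastive objective that pulls each residue's language-model embedding toward the same residue's embedding from a structure or contact-graph encoder. Both encoders and the dimensions below are stand-ins.

```python
import torch
import torch.nn.functional as F

def residue_alignment_loss(plm_emb, graph_emb, temperature=0.07):
    """plm_emb, graph_emb: (num_residues, dim) embeddings of the same residues
    from the protein language model and from the graph encoder."""
    z1 = F.normalize(plm_emb, dim=-1)
    z2 = F.normalize(graph_emb, dim=-1)
    logits = z1 @ z2.T / temperature              # residue-to-residue similarity
    labels = torch.arange(z1.shape[0])            # residue i matches residue i
    # Symmetric InfoNCE: match in both directions.
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.T, labels))

plm_emb = torch.randn(128, 256, requires_grad=True)   # stand-in PLM residue embeddings
graph_emb = torch.randn(128, 256)                      # stand-in graph encoder output
loss = residue_alignment_loss(plm_emb, graph_emb)
loss.backward()                                        # only the PLM side is tuned here
```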
Command A shows how to build an enterprise-grade LLM that balances performance, efficiency, and multilingual capabilities using decentralized training and model merging.
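As a minimal sketch of the model-merging half: expert checkpoints trained separately (for instance on different domains or in different training silos) are combined parameter-by-parameter. Uniform averaging below is a stand-in; the summary does not describe the actual merge weights or procedure.

```python
import torch

def merge_checkpoints(state_dicts, weights=None):
    """Average a list of state_dicts parameter-by-parameter."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return merged

# Usage with toy "experts" sharing one architecture:
experts = [torch.nn.Linear(16, 16).state_dict() for _ in range(3)]
merged_model = torch.nn.Linear(16, 16)
merged_model.load_state_dict(merge_checkpoints(experts))
```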
Ditch the greedy heuristics: GFlowNets can learn to sample decision trees from the Bayesian posterior, outperforming standard methods and scaling consistently with ensemble size.
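The summary gives no construction details, so this only sketches the generic GFlowNet trajectory-balance objective such an approach relies on: a policy builds a decision tree one split at a time, and the loss matches the trajectory's forward probability (scaled by a learned partition function Z) against its reward, here the unnormalized Bayesian posterior of the finished tree, scaled by the backward probability. All tensors below are stand-ins.

```python
import torch

log_Z = torch.zeros((), requires_grad=True)        # learned log-partition function

def trajectory_balance_loss(log_pf_steps, log_pb_steps, log_reward):
    """log_pf_steps / log_pb_steps: per-action log-probabilities of the forward
    policy (choosing the next split) and the backward policy along one trajectory;
    log_reward: log posterior (up to a constant) of the completed tree."""
    lhs = log_Z + log_pf_steps.sum()
    rhs = log_reward + log_pb_steps.sum()
    return (lhs - rhs) ** 2

# One fake trajectory of 4 tree-building actions:
log_pf = torch.log(torch.tensor([0.5, 0.3, 0.6, 0.9]))
log_pb = torch.log(torch.tensor([1.0, 0.5, 0.5, 1.0]))
loss = trajectory_balance_loss(log_pf, log_pb, log_reward=torch.tensor(-3.2))
loss.backward()                                    # updates log_Z (and, in practice, the policy)
```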