19 papers from Meta AI (FAIR) on Architecture Design (Transformers, SSMs, MoE)
Elastic-Sketch's performance hinges on stream characteristics and eviction thresholds, but this work derives closed-form expressions for its limiting behavior under stationary random streams, enabling near-optimal configuration.
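For intuition, here is a minimal Python sketch of the heavy-part insertion whose limiting behavior those expressions characterize; the bucket layout and eviction rule follow the original ElasticSketch design, `lam` is the threshold being tuned, and the light part is stubbed as a plain dict:

```python
class ElasticSketchHeavyPart:
    """Toy heavy part of ElasticSketch; `lam` is the eviction threshold."""
    def __init__(self, num_buckets: int, lam: float = 8.0):
        self.lam = lam                       # evict when vote- / vote+ >= lam
        self.buckets = [None] * num_buckets  # each bucket: [flow, vote+, vote-]
        self.light = {}                      # stand-in for the CM-sketch light part

    def insert(self, key) -> None:
        i = hash(key) % len(self.buckets)
        b = self.buckets[i]
        if b is None:
            self.buckets[i] = [key, 1, 0]    # empty bucket: claim it
        elif b[0] == key:
            b[1] += 1                        # same flow: positive vote
        else:
            b[2] += 1                        # other flow: negative vote
            if b[2] / b[1] >= self.lam:
                # Evict the incumbent's count to the light part, install newcomer.
                self.light[b[0]] = self.light.get(b[0], 0) + b[1]
                self.buckets[i] = [key, 1, 1]
```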
Forget exotic attention mechanisms – MobileLLM-Flash achieves up to 1.8x faster LLM prefill on mobile CPUs by smartly pruning and adapting existing architectures for on-device use.
Current AI's dependence on curated data may be eased by a new architecture inspired by human cognition that flexibly switches between observation, active behavior, and meta-control.
Self-supervised video models can now learn dense features rivaling supervised methods, unlocking a 20-point jump in robot grasping success.
Forget imbalanced LoRA usage: ReMix leverages reinforcement learning to route effectively among LoRAs, boosting performance in parameter-efficient fine-tuning.
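A toy REINFORCE-style router over adapters makes the idea concrete; the policy architecture, reward, and update rule here are illustrative assumptions, not ReMix's actual algorithm:

```python
import torch
import torch.nn as nn

class LoRARouter(nn.Module):
    """Samples one of K LoRA adapters per input; trained with REINFORCE."""
    def __init__(self, hidden_dim: int, num_adapters: int):
        super().__init__()
        self.policy = nn.Linear(hidden_dim, num_adapters)

    def forward(self, pooled_prompt: torch.Tensor):
        dist = torch.distributions.Categorical(logits=self.policy(pooled_prompt))
        choice = dist.sample()                 # which adapter to apply
        return choice, dist.log_prob(choice)

router = LoRARouter(hidden_dim=768, num_adapters=4)
opt = torch.optim.Adam(router.parameters(), lr=1e-4)

x = torch.randn(8, 768)          # pooled prompt representations (placeholder)
choice, logp = router(x)
reward = torch.rand(8)           # stand-in for a per-example task metric
loss = -(logp * (reward - reward.mean())).mean()  # baseline-subtracted REINFORCE
opt.zero_grad()
loss.backward()
opt.step()
```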
A surprisingly simple change to the motion latent space—representing each body joint with its own token—dramatically improves text-to-motion generation quality, outperforming monolithic latent vector approaches.
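The change itself is essentially a re-tokenization of the latent, roughly like this (shapes are illustrative, not the paper's):

```python
import torch

T, J, D = 60, 22, 6                 # frames, joints, per-joint features (assumed)
motion = torch.randn(T, J, D)

# Monolithic latent: one vector per frame -> T tokens of width J*D.
monolithic = motion.reshape(T, J * D)

# Per-joint tokens (the change described): each joint keeps its own token,
# giving the motion model T*J finer-grained tokens of width D.
per_joint = motion.reshape(T * J, D)
```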
Pre-normalization in Transformers is the culprit behind the mysterious link between massive activation outliers and attention sinks, but decoupling them reveals their distinct functions: global parameterization vs. local attention modulation.
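A quick demo of why pre-norm lets massive activations persist: normalization sits on the branch, not on the residual stream, so an outlier rides the skip connection untouched (a minimal illustration, not the paper's experiment):

```python
import torch
import torch.nn as nn

d = 16
ln, ff = nn.LayerNorm(d), nn.Linear(d, d)

x = torch.zeros(1, d)
x[0, 0] = 1e4                    # inject a massive activation outlier

pre = x + ff(ln(x))              # pre-norm: LN only sees the branch
print(pre[0, 0])                 # outlier survives, still ~1e4

post = ln(x + ff(x))             # post-norm: LN sees the whole stream
print(post[0, 0])                # outlier squashed to O(1)
```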
Forget scaling compute – the future of AI hinges on a 1000x leap in energy efficiency via tight AI+Hardware co-design over the next decade.
Forget same-family constraints: you can compress prompts for LLaMA with a Qwen draft model and still get 90-100% of the original performance.
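A minimal surprisal-based compressor in that spirit: score prompt tokens with a small draft model from another family, keep the most informative fraction, and hand the shortened prompt to the target LLM. The model choice and scoring rule below are assumptions, not the paper's exact method:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

draft_name = "Qwen/Qwen2-0.5B"   # illustrative small draft model
tok = AutoTokenizer.from_pretrained(draft_name)
draft = AutoModelForCausalLM.from_pretrained(draft_name).eval()

def compress(prompt: str, keep_ratio: float = 0.5) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = draft(ids).logits
    # Per-token surprisal under the draft model (higher = harder to predict).
    logp = torch.log_softmax(logits[:, :-1], dim=-1)
    surprisal = -logp.gather(-1, ids[:, 1:, None]).squeeze(-1)[0]
    k = max(1, int(keep_ratio * surprisal.numel()))
    keep = surprisal.topk(k).indices.sort().values + 1  # preserve token order
    return tok.decode(ids[0, keep])
```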
Vision models are far more data-hungry than language models, but Mixture-of-Experts can reconcile this asymmetry, enabling truly unified multimodal models.
Instruction-following in large reasoning models gets a serious upgrade with RAIN-Merging, a gradient-free technique that merges in instruction-tuned capabilities without wrecking the model's ability to think step-by-step.
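Gradient-free merging of this sort can be as simple as task arithmetic; a hedged sketch (RAIN-Merging's actual recipe may weight or mask parameters differently to protect reasoning behavior):

```python
def merge_instruction_delta(reasoning_sd, instruct_sd, base_sd, alpha=0.5):
    """Add the instruction-tuning delta (instruct - base) into the reasoning
    model's state dict, scaled by alpha: no gradients, no retraining."""
    return {name: w + alpha * (instruct_sd[name] - base_sd[name])
            for name, w in reasoning_sd.items()}
```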
Forget quadratic complexity: ULTRA-HSTU achieves 21x faster inference and 4-8% better engagement in large-scale recommendation by co-designing input sequences, sparse attention, and model topology.
Achieve zero-collision embedding tables in production recommenders without sacrificing training speed, unlocking better personalization via fresher and higher-quality item embeddings.
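The core idea, sketched: replace `hash(id) % table_size` with an exact id-to-row map so distinct items never share (and corrupt) a row. The capacity handling below is a placeholder assumption, not the paper's eviction policy:

```python
import torch
import torch.nn as nn

class ZeroCollisionEmbedding(nn.Module):
    def __init__(self, capacity: int, dim: int):
        super().__init__()
        self.table = nn.Embedding(capacity, dim)
        self.id_to_row = {}                    # exact mapping: item id -> row

    def forward(self, item_ids: torch.Tensor) -> torch.Tensor:
        rows = []
        for i in item_ids.tolist():
            if i not in self.id_to_row:
                if len(self.id_to_row) >= self.table.num_embeddings:
                    raise RuntimeError("table full: evict stale ids here")
                self.id_to_row[i] = len(self.id_to_row)  # fresh, private row
            rows.append(self.id_to_row[i])
        return self.table(torch.tensor(rows, device=self.table.weight.device))
```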
Ditch ANN search altogether: MFLI learns a hierarchical index alongside item embeddings, boosting recall by up to 11.8% and cold-content delivery by 57.29% in large-scale recommender systems.
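Retrieval then becomes beam search down the learned tree instead of an ANN lookup; a toy version (the tree layout and inner-product scoring are assumptions about the general approach, not MFLI's specifics):

```python
import torch

def beam_retrieve(query, node_embs, children, root=0, beam=4, depth=3):
    """Walk a learned tree index: at each level, score the current beam's
    children against the query and keep the top `beam`."""
    frontier = [root]
    for _ in range(depth):
        cands = [c for n in frontier for c in children[n]]
        scores = node_embs[cands] @ query      # inner-product relevance
        top = scores.topk(min(beam, len(cands))).indices
        frontier = [cands[i] for i in top.tolist()]
    return frontier                            # leaves = retrieved items
```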
Finally, a streaming ASR model matches Whisper's offline transcription quality while maintaining sub-second latency.
Achieve state-of-the-art UAV detection by swapping transformers for Mamba, yielding a faster and more accurate multimodal detector.
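The speed win comes from Mamba's linear-time recurrence in place of quadratic attention; a stripped-down, non-selective scan for intuition (real Mamba makes A, B, C input-dependent):

```python
import torch

def ssm_scan(x, A, B, C):
    """h_t = A * h_{t-1} + B * x_t ;  y_t = <C, h_t>.  Diagonal A, shared
    parameters; purely illustrative of the O(seq) recurrence."""
    batch, seq, d = x.shape
    h = torch.zeros(batch, d)
    ys = []
    for t in range(seq):
        h = A * h + B * x[:, t]      # O(d) per step -> O(seq) overall
        ys.append((C * h).sum(-1))
    return torch.stack(ys, dim=1)    # (batch, seq)
```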
Achieve up to 39.6% FLOP reduction in LLM inference without retraining or architectural changes using QuickSilver's dynamic token-level optimizations.
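One such optimization is token halting: stop spending compute on tokens whose hidden state has converged. A toy version of the criterion follows; QuickSilver's actual mechanisms are not shown here, and real FLOP savings require kernels that genuinely skip frozen tokens rather than masking them:

```python
import torch

def forward_with_token_halting(layers, x, eps=1e-3):
    """Freeze tokens whose hidden state moves less than `eps` between layers."""
    active = torch.ones(x.shape[:2], dtype=torch.bool)    # (batch, seq)
    for layer in layers:
        new_x = layer(x)
        delta = (new_x - x).norm(dim=-1)                  # per-token change
        x = torch.where(active.unsqueeze(-1), new_x, x)   # frozen tokens keep state
        active = active & (delta > eps)                   # halt converged tokens
    return x
```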
Ditch the pre-trained models: PAST directly learns speech tokens from phonetic data, outperforming existing methods in representation and reconstruction.
Edit the bassline, drums, or other instruments of any song with this new open-source multi-stem music generation model.