16 papers published across 2 labs.
Forget buying new GPUs – clever context-length routing can boost your LLM inference energy efficiency by 2.5x, dwarfing the 1.7x gain from upgrading to a B200.
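A minimal sketch of the routing idea, assuming two hypothetical serving pools with made-up context limits and per-token energy costs (the paper's actual policy and hardware numbers are not reproduced here): each request goes to the cheapest pool that can fit its context length.

```python
# Sketch of context-length routing. Pool names, limits, and energy costs
# are illustrative assumptions, not measured data from the paper.
from dataclasses import dataclass

@dataclass
class Pool:
    name: str                 # hypothetical serving pool
    max_context: int          # longest prompt this pool serves efficiently
    joules_per_token: float   # illustrative energy cost per generated token

POOLS = [
    Pool("short-context-pool", max_context=2_048, joules_per_token=0.4),
    Pool("long-context-pool", max_context=32_768, joules_per_token=1.1),
]

def route(prompt_tokens: int) -> Pool:
    """Send the request to the cheapest pool whose context window fits it."""
    eligible = [p for p in POOLS if prompt_tokens <= p.max_context]
    return min(eligible, key=lambda p: p.joules_per_token)

if __name__ == "__main__":
    for n in (512, 8_000):
        print(n, "->", route(n).name)
```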
Optimizing multilingual training? Shapley values reveal the hidden cross-lingual transfer effects that current scaling laws miss, leading to better language mixture ratios.
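As an illustration of the Shapley-value idea only (not the paper's setup), the toy sketch below computes exact Shapley attributions over a hypothetical three-language mixture; the utility function is a made-up stand-in for trained-model performance, with a synthetic transfer bonus between two languages.

```python
# Toy exact Shapley attribution over language subsets. The utility values
# and the en/de synergy term are invented for illustration; a real study
# would train models on each mixture and measure downstream performance.
from itertools import combinations
from math import factorial

LANGS = ["en", "de", "zh"]

def utility(subset: frozenset) -> float:
    """Hypothetical score of a model trained on the given language subset."""
    base = {"en": 0.50, "de": 0.20, "zh": 0.25}
    score = sum(base[lang] for lang in subset)
    if {"en", "de"} <= subset:
        score += 0.10  # illustrative cross-lingual transfer bonus
    return score

def shapley(lang: str) -> float:
    """Exact Shapley value of one language under the toy utility."""
    others = [l for l in LANGS if l != lang]
    n = len(LANGS)
    total = 0.0
    for k in range(len(others) + 1):
        for combo in combinations(others, k):
            s = frozenset(combo)
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            total += weight * (utility(s | {lang}) - utility(s))
    return total

for lang in LANGS:
    print(lang, round(shapley(lang), 3))
```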
Forget quadratic attention: FEAT achieves state-of-the-art performance on structured data with linear complexity and 40x faster inference.
Masked diffusion language models can now achieve 21.8x better compute efficiency than autoregressive models, thanks to binary encoding and index shuffling.
Mamba-3 delivers a 1.8 point accuracy boost over competing models in downstream language tasks, proving that SSM-inspired techniques can unlock substantial performance gains without sacrificing inference efficiency.
LLMs' true power lies in the "unexplainable" – capabilities that exceed rule-based systems, challenging the pursuit of full interpretability.
Forget trial-and-error: this paper derives hyperparameter scaling laws for modern optimizers directly from convergence bounds, potentially automating hyperparameter tuning.
Forget scaling laws: smaller, domain-adapted AI systems can mathematically outperform massive generalist models in real-world institutional settings, thanks to a non-monotonic relationship between model size and "institutional fitness."
Forget simple scaling laws: the compute-optimal number of parallel rollouts in LLM RL plateaus, revealing distinct mechanisms for easy vs. hard problems.
Re-training LLMs on their own generated content can fundamentally limit what they can learn, but only under specific, theoretically defined conditions related to generation quality.
Forget brute-force scaling: the secret to better educational AI agents lies in carefully structuring their roles, skills, and tools.
Nanofilaments can paradoxically aggregate due to entropic forces, defying the conventional wisdom that entropy always favors disaggregation at the nanoscale.
Language models seem to prefer truth not because they're seeking it, but because correct information is often easier to compress and more internally consistent.
RAG with small language models (<8B parameters) can be a net negative, as they often ignore retrieved context and even "forget" existing knowledge.
Prompt-based jailbreak attacks aren't just effective; they're shockingly efficient, outperforming optimization-based methods by navigating the prompt space more effectively.
AI electricity demand won't necessarily explode as AI scales: whether it does hinges on sustained efficiency improvements outpacing income-driven demand growth.