24 papers published across 3 labs.
Looping a language model block four times gives you the effective capacity of only 1.4 additional unique blocks, yet costs as much to train as 2.4.
Forget pruning by variance: high-variance activations in transformers are surprisingly uncorrelated with predictive power.
Depth in neural networks isn't just about the final output; this work shows how each intermediate layer can be a progressively refined approximation, with error explicitly tied to the layer's geometric scale.
Despite architectural differences, language models exhibit convergent evolution by learning similar periodic features for number representation, but achieving geometric separability depends on subtle training factors.
Image generators aren't just for making pretty pictures; they're secretly state-of-the-art vision learners, rivaling specialized models in tasks from segmentation to depth estimation.
Forget training from scratch: Nexusformer lets you scale Transformers by nonlinearly expanding attention, inheriting knowledge and slashing compute by up to 41.5%.
Forget scaling laws: strategically equipping small language models with tools delivers a better performance/cost tradeoff than simply scaling up or deploying multi-agent systems.
Upcycling MoE models can achieve the same performance as larger fixed-size models while cutting GPU costs by 32%.
LLMs waste compute on tokens that have already "figured it out" – DASH selectively skips these tokens during prefill, speeding things up without retraining or sacrificing accuracy.
Unveiling the "topological dual of a dataset" provides a Rosetta Stone for neuro-symbolic AI, promising to unlock mechanistic interpretability and overcome scaling bottlenecks.
Training on mixed complexity datasets can yield up to 5x sample efficiency in low data regimes, challenging conventional wisdom about data quantity in LLM fine-tuning.
Forget expensive compression trials – a simple spectral statistic can accurately predict how much your LLM will degrade *before* you even compress it.
Fine-tuned small language models can reliably generalize to larger and structurally distinct graphs, maintaining strong performance in graph property estimation.
TriMix reveals that prioritizing small, specialized models can dramatically improve low-resource language adaptation, overturning the assumption that bigger models always lead the way.
LLMs' surprising grammatical struggles aren't due to inherent limitations, but rather a lack of exposure to specific linguistic structures in their training data – a problem fixable with just a tiny amount of targeted data augmentation.
LLM-based ASR can be shrunk to 2.3B parameters and still beat larger models in real-world scenarios by carefully delineating encoder and LLM roles and using a multi-stage training approach.
Decomposing LLMs doesn't have to mean sacrificing inference speed: DeInfer unlocks efficient parallel inference for these models.
RankUp tackles representation collapse in deep recommender systems, unlocking significant GMV gains in real-world deployments by strategically boosting the effective rank of token representations.
GSQ closes the accuracy gap in low-precision quantization, achieving results comparable to complex vector methods while remaining easy to implement.
LLMs can achieve up to 2x inference speedup without retraining by intelligently sharing KV cache states during early exit, sidestepping the usual performance bottlenecks.
By embedding attention within a recurrent state, Sessa unlocks power-law memory decay and selective retrieval capabilities previously unattainable by either Transformers or Mamba-style models alone.
Generative AI's "black box" nature isn't a bug; it's a feature stemming from a fundamental mismatch between user expectations and the technology's statistical foundations.
LLM agent systems can achieve up to 76% speedups and significantly reduced hotspot miss rates by intelligently caching logits and scheduling compute resources based on agent behavior.
LLM scaling bottlenecks demand a shift towards cloud-native architectures and distributed systems, unlocking potential gains from serverless inference and quantum computing.