10 papers published across 0 labs.
Achieve perfect train-test error tracking with a new training algorithm, Decoupled Descent, that eliminates the need for validation sets in certain stylized settings.
Machine learning can turn sparse simulation data into a complete phase diagram for collective motion models, revealing nuanced phase boundaries.
Subword tokenization's secret sauce isn't just vocabulary size – it's the boosted training throughput and the subtle linguistic priors baked into subword boundaries.
Language diffusion models aren't just generative; they also act as associative memories, with a sharp memorization-to-generalization transition detectable via conditional entropy.
Forget scaling laws: this study reveals a detailed empirical map of *when* and *why* transformers succeed or fail at in-context learning, highlighting the crucial interplay of dimensionality, signal strength, and contextual information.
Chain-of-Thought reasoning in Transformers hits a surprising expressivity ceiling when generalizing to longer sequences, unless you let your vocabulary grow with the problem size and use "signpost" tokens.
Unstructured pruning isn't just about shrinking LLMs; it can actually *boost* their reasoning abilities during test-time scaling, outperforming even the full, unpruned models.
LLMs from different vendors and sizes secretly speak the same statistical language, enabling a blazing-fast, model-agnostic output verification method.
Probabilistic Transformers can now scale to 0.4B parameters and beat standard Transformers of the same size, thanks to a hyperparameter transfer trick.
Forget training from scratch: HyLo lets you breathe new (long-context) life into your existing Transformer LLMs, achieving 32x context extension and 90% KV-cache reduction.