Novel neural network architectures including transformer variants, state space models, mixture of experts, and attention mechanisms.
Achieve near-lossless 60% attention latency reduction in video editing by exploiting query sharpness to dynamically route attention.
Transformers may succeed at time series forecasting without relying on the complex superposition that drives their power in NLP, challenging the assumption that these models are leveraging rich compositional representations.
Outlier tokens in Diffusion Transformers aren't just extreme values; they corrupt local patch semantics, and can be tamed with Dual-Stage Registers to boost image generation quality.
Transformers can be explicitly designed to perform nonlinear regression in-context by leveraging attention as a featurizer, offering a theoretical understanding of how these models learn complex relationships from prompts.
Forget scaling laws – the real bottleneck in associative memory isn't storage, it's retrieval: forcing a single "winner" costs you a logarithmic factor in capacity compared to allowing a ranked list.
Skip the sampling: accurately predict the behavior of wide, random MLPs with a fraction of the compute, especially when assessing rare, high-stakes outcomes.
GMD algorithms, previously seen as a novel generative framework, can be understood as directly targeting fixed points of Wasserstein Gradient Flows, offering a new perspective on their optimization process.
Modeling 10,000+ correlated outputs is now tractable: T-LVMOGP offers a scalable alternative to restrictive low-rank MOGPs by learning a flexible deep kernel in a shared embedding space.
Forget rigid memory structures: Memini lets your LLM's external knowledge evolve organically, learning and forgetting like a brain.
Infinite-width approximations, a cornerstone of neural network theory, crumble much faster in recurrent models than previously thought, failing beyond a depth of order $\sqrt{n}$.
Doubly sparse regression gets a boost: this method avoids predictor duplication, saving compute, by projecting directly onto the intersection of selected groups.
Training MoE models just got a whole lot faster: Piper achieves up to 3.5x higher MFU by intelligently scheduling pipeline parallelism and optimizing communication.
Long-context models face a provable "impossibility triangle": you can't have efficiency, compactness, and unbounded recall *at the same time*.
Forget trying to shoehorn hypergraphs into pairwise representations – this diffusion model directly generates them from incidence matrices, unlocking more realistic and complex structures.
Scale multi-agent RL diversity metrics to hundreds of agents without sacrificing accuracy: Graph-SND offers a drop-in replacement for quadratic SND calculations, achieving near-identical results with order-of-magnitude speedups.
LLMs can now generate high-performance CUDA attention kernels that outperform hand-optimized code, thanks to a novel lift-transfer-lower approach that leverages expert knowledge.
Forget fine-tuning: "skill neologisms"—new soft tokens—let you inject skills into LLMs without weight updates, composing them zero-shot for flexible knowledge expansion.
Inverting time-domain marine electromagnetic data, a traditionally computationally intensive task, can now be done 21,000x faster with a deep learning model that also outperforms traditional optimization methods.
Geometric continuity in deep networks isn't just a byproduct of depth, but an actively sculpted property arising from the interplay of residual connections and symmetry-breaking activations.
Granular Mixture-of-Experts can now be efficient: AIR-MoE's two-stage routing slashes routing costs without sacrificing performance.
Batch normalization's power comes from reshaping the geometry of neural network decision boundaries on a per-batch basis, not just from optimization benefits.
Neural networks can now discover previously unknown behavior in hard PDE problems, revealing that Strichartz extremizers for the critical Airy equation are not attained but approached by mKdV breathers.
LLMs can be efficiently post-trained by updating only half the parameters, slashing memory costs without sacrificing performance.
LLMs can now generate neural architectures with 75% less code and higher accuracy by learning to write code "diffs" instead of building from scratch.
Token embedding geometry isn't just abstract math—it directly mirrors how language models internally represent and reason about the world, as shown by its alignment with board state and piece importance in chess.
Symmetric spectral analysis of attention is fundamentally blind to information flow direction, but a simple asymmetry coefficient can restore the signal.
Graph models can now generalize to entirely new datasets with different input features, thanks to a simple projection into a shared random space.
Neural operators can stably and accurately correct the structured truncation errors of classical numerical solvers for dispersive PDEs, even with rough data.
Physics-informed neural operators can now learn continually without forgetting, thanks to a simple replay strategy that preserves past knowledge while rapidly adapting to new out-of-distribution data.
Diffusion models' reliance on global information isn't just a quirk – it's fundamentally linked to the moment they commit to a specific semantic outcome.
By explicitly modeling literal polarity in SAT formulas, GNNs can more accurately predict unsatisfiable cores.
Approximate computing can break MoEs in unexpected ways, with dense networks sometimes proving more robust, but careful retraining can unlock surprising efficiency gains in specific architectures.
Control-dependent latent dynamics, achieved with a surprisingly small parameter increase, unlock robust MPC performance in time-varying environments where standard Koopman methods falter.
Unlock white-box inference for SOC-ICNNs by directly reading out geometric primitives like Hessians from the optimal dual variables, bypassing black-box differentiation.
Forget opaque transformers: Gyan offers SOTA language modeling with full interpretability, lower compute, and human-like compositional understanding.
Ditch the attention: ConvRec proves convolutional networks can beat Transformers in sequential recommendation while slashing compute and memory costs.
MoEs, despite their scaling advantages, suffer from a surprising "spectral plasticity loss" in continual RL, but a simple Parseval penalty can recover performance.
By embedding whole-slide images in a hybrid hyperbolic-Euclidean space, BatMIL unlocks superior classification performance compared to traditional Euclidean-only methods, revealing the importance of geometric awareness in capturing complex tissue organization.
Hallucinations in diffusion models aren't just mode interpolation gone wrong, but instabilities on the model's manifold, and squashing its local intrinsic dimension can fix them.
Stop brittle, undeployable AI-generated code: this retrieval-augmented scaffolding method bakes in architectural constraints from the start.
A single vision-language foundation model, DART, can perform a full rope inspection workflow, including damage classification, severity estimation, and few-shot recognition, all without task-specific fine-tuning.
Shuffling activations, a popular defense in secure Transformer inference, crumbles under a new alignment attack that recovers model weights for just $1.
Ditch the vector DB – this new agent architecture achieves SOTA memory recall by storing everything verbatim and optimizing retrieval, all in a single SQLite file.
Transformers with average attention can natively execute arithmetic circuits, suggesting a new architectural direction for reasoning and computation.
Unlock scalable, high-quality singing voice synthesis by directly generating structured musical scores from audio, outperforming existing systems on multiple datasets.
HeterSEED achieves state-of-the-art performance on heterophilic heterogeneous graphs by decoupling semantic and structural information, offering a more robust approach than relying on feature similarity alone.
Freezing your VAE and permuting high-frequency visual signals together unlock a new SOTA for VLM prompt learning, boosting harmonic-mean accuracy to 81.51%.
CNN-BiLSTM beats AutoML for Indonesian hate speech detection, but the gains are modest, suggesting the dataset's limitations are a bigger bottleneck than model architecture.
Even state-of-the-art multilingual models struggle to tag parts of speech in Tajik when trained on isolated words, highlighting the critical role of syntactic context.
UniVer achieves state-of-the-art speculative decoding by jointly optimizing multi-step and multi-draft verification, outperforming existing methods by up to 8.5% in acceptance length.