Search papers, labs, and topics across Lattice.
100 papers published across 3 labs.
Bicycle robots can now do front-flips, thanks to a reinforcement learning method that bootstraps from dynamically infeasible reference motions.
Tucker Attention squeezes an order of magnitude more parameter efficiency out of attention layers, while unifying and simplifying Group Query Attention, Multi-Head Latent Attention, and standard Multi-Head Attention.
Forget hand-crafted features: DistilBERT can automatically identify parallelizable loops in code with >99% accuracy, opening the door to more efficient automatic parallelization.
Quantum chemistry's density matrix approach reveals interpretable early warning signals of phase transitions in deep learning, from grokking to emergent misalignment.
Chess transformers trained solely on move sequences face a "dual-capability bottleneck" where excelling at both state tracking and decision-making requires carefully balancing data diversity and quality, a tension that simple scaling cannot resolve.
Multimodal AI models learn to be lazy, often ignoring entire modalities, and current active learning methods don't fix the problem.
Radically simpler train loading plans are now possible by implicitly modeling rehandle costs, slashing the complexity of optimization problems.
By mixing flows and using a teacher-student approach, MMAE learns to classify encrypted traffic more accurately than previous masked autoencoders.
By disentangling headers and payloads with a Mixture-of-Experts architecture, TrafficMoE achieves state-of-the-art encrypted traffic classification, proving that heterogeneity-aware modeling is crucial for extracting discriminative features from noisy, encrypted data.
Target networks don't have to be a necessary evil: aligning online and target network estimates can actually *accelerate* RL convergence.
Forget ensembles and retraining: estimate LLM uncertainty with just a single forward-backward pass by assuming parameter covariance isotropy.
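The single-pass idea admits a compact sketch for a linearized model. Everything here is illustrative rather than the paper's method: the logistic model, the `prior_var` hyperparameter, and the delta-method variance formula are stand-ins, assuming an isotropic parameter covariance Cov(θ) ≈ prior_var · I.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def isotropic_laplace_uncertainty(theta, x, prior_var=1.0):
    """One forward-backward uncertainty estimate for a logistic model.

    Under Cov(theta) ~ prior_var * I, linearizing the logit
    f(x) = theta @ x gives Var[f(x)] ~ prior_var * ||grad_theta f(x)||^2.
    """
    p = sigmoid(theta @ x)                      # forward pass
    grad = x                                    # backward pass: d(theta @ x)/d(theta) = x
    logit_var = prior_var * float(grad @ grad)  # isotropy collapses the covariance
    return p, logit_var
```

The same recipe applies to an LLM by replacing `grad` with the per-parameter gradient from one backward pass, so the cost is a single forward-backward instead of an ensemble.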
You can shrink a spacecraft anomaly detection model by 97% and still catch almost all the problems.
Real-time vocal denoising is now possible with deep learning, achieving significant SNR improvements at under 10ms latency.
Grokking isn't just about local circuits or optimization tricks, but a global structural collapse of redundant model manifolds, revealing a deep connection between compression and generalization.
Forget expensive finetuning: DUME dynamically combines existing expert LLMs into a powerful MoE *without* additional training, unlocking multi-domain performance at minimal cost.
LLMs can better capture human semantic similarity by predicting sets of related concepts instead of single next tokens.
Now, clients can actually *verify* that their data has been removed from a federated learning model, even when the server is untrusted.
LLMs aren't the only path to vulnerability detection: a GNN-based model achieves near-parity with 100x less overhead.
Single-pixel imaging gets a deep learning boost: SISTA-Net leverages learned sparsity and hybrid CNN-VSSM architectures to achieve state-of-the-art reconstruction quality, even in noisy underwater environments.
By directly optimizing clinical dose-volume histogram (DVH) metrics, this method produces 3D dose predictions that more closely align with clinical treatment planning criteria than traditional voxel-wise approaches.
Forget expensive labels: CoRe-DA leverages contrastive learning and self-training to achieve state-of-the-art surgical skill assessment across diverse surgical environments without requiring target domain annotations.
Diffusion models can beat discriminative classifiers at facial expression recognition, but only with a dynamically adjusted margin loss that accounts for per-sample difficulty.
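A margin loss with a per-sample difficulty term can be sketched in a few lines. This is one plausible reading, not the paper's exact formulation: here the margin on the true-class logit is scaled by the model's current confidence, so ambiguous (hard) expressions receive a smaller margin; the direction of that scaling is an assumption.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def adaptive_margin_ce(logits, labels, base_margin=0.5):
    """Cross-entropy with a difficulty-scaled margin on the true-class logit.

    Easy samples (high true-class probability) get close to the full margin;
    hard samples get a smaller one, so they are not over-penalized.
    """
    idx = np.arange(len(labels))
    p_true = softmax(logits)[idx, labels]       # per-sample difficulty proxy
    adj = logits.copy()
    adj[idx, labels] -= base_margin * p_true    # tighten the decision boundary
    q = softmax(adj)
    return float(-np.log(q[idx, labels] + 1e-12).mean())
```

With `base_margin=0` this reduces to plain cross-entropy; increasing the margin strictly increases the loss, which is what pushes the decision boundary away from easy samples.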
Stop averaging prototypes blindly: FedDBP uses Fisher information to intelligently fuse local prototypes, significantly boosting performance in heterogeneous federated learning.
Passive iFIR filters learned from just three minutes of robot data can dramatically outperform optimized PID controllers in velocity tracking tasks, offering a fast and stable alternative for robot control.
By optimizing PID gains with MPPI, this method achieves comparable performance to conventional MPPI with significantly fewer samples, offering a more sample-efficient approach to learning-based control.
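The search over PID gains can be sketched as a standard MPPI loop in gain space. This is a generic sketch under assumptions: `rollout_cost`, the noise scale `sigma`, the temperature `lam`, and the nonnegativity clamp are all illustrative choices, not details from the paper.

```python
import numpy as np

def mppi_pid_gains(rollout_cost, init_gains, n_samples=64, sigma=0.2,
                   lam=0.1, iters=30, seed=0):
    """MPPI-style search over PID gain vectors (Kp, Ki, Kd).

    Samples gain perturbations, evaluates each via `rollout_cost`,
    and updates the mean with exponentiated-cost weights.
    """
    rng = np.random.default_rng(seed)
    mean = np.asarray(init_gains, dtype=float)
    for _ in range(iters):
        noise = rng.normal(0.0, sigma, size=(n_samples, mean.size))
        samples = np.clip(mean + noise, 0.0, None)      # keep gains nonnegative
        costs = np.array([rollout_cost(g) for g in samples])
        weights = np.exp(-(costs - costs.min()) / lam)  # low cost -> high weight
        weights /= weights.sum()
        mean = weights @ samples                        # soft-greedy mean update
    return mean
```

In practice `rollout_cost` would simulate the closed-loop system with the candidate gains and return accumulated tracking error; the sample efficiency comes from searching the low-dimensional gain space rather than the full control-sequence space.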
Get kilohertz-level dexterous hand teleoperation *with* formal safety guarantees, thanks to a new convex optimization approach.
Quantum circuit compilation, a major bottleneck, can be sped up by over 15x with minimal overhead using a new parallelization technique validated on 8000 large-scale, configurable random circuits.
Dataflow networks can achieve significant energy savings without sacrificing throughput by strategically powering down actors during idle periods, a balance efficiently discovered using a novel "Hop and Skip" exploration strategy.
Pinpointing performance bottlenecks in large-scale AI training just got 100x faster, thanks to a new system that watches the whole stack without slowing things down.
Achieve up to 4.17x speedup in DRL training by intelligently partitioning tasks across CPUs, FPGAs, and AI Engines on AMD Versal ACAP, demonstrating the power of hardware-aware algorithm design.
Unlock 600,000x faster TSV design by replacing computationally expensive full-wave simulations with physics-informed graph neural networks.
Forget the cold start: training transformers for protein structure prediction peaks at intermediate temperatures, revealing a sweet spot in the loss landscape.
Calculating excited states of molecules with thousands of atoms, previously a computational bottleneck, is now practical on a single GPU thanks to a new implementation of TDDFT-risp.
Scanning every token to focus attention is now passé: HISA prunes irrelevant context blocks *before* token-level scoring, slashing compute without sacrificing selection fidelity.
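The two-stage structure can be sketched as follows. The block scorer here (mean-pooled keys dotted with the query) is an assumption for illustration, not necessarily HISA's actual scoring function, and the sequence length is assumed divisible by the block size.

```python
import numpy as np

def block_pruned_attention(q, K, V, block_size, keep_blocks):
    """Two-stage attention: prune context blocks first, score tokens second.

    Stage 1 scores each block with one dot product against its mean-pooled
    key; stage 2 runs ordinary softmax attention over surviving tokens only.
    """
    n, d = K.shape                                   # n must be divisible by block_size
    block_ids = np.arange(n).reshape(-1, block_size)
    block_scores = np.array([q @ K[b].mean(axis=0) for b in block_ids])
    kept = block_ids[np.argsort(block_scores)[-keep_blocks:]].ravel()
    logits = K[kept] @ q / np.sqrt(d)                # token-level scoring, kept blocks only
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ V[kept]
```

When every block is kept this reduces exactly to full softmax attention; the savings come from stage 1 costing one dot product per block instead of one per token.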
Forget backpropagation through time: recurrent networks already have temporal credit baked into their forward pass.
Forget painstaking hyperparameter tuning: this hypersphere parameterization lets you transfer a single learning rate across model sizes, depths, and even MoE architectures, slashing compute costs by 1.58x.
Forget heuristics – this work gives provable conditions for *when* and *how* auxiliary data actually improve generalization in transfer learning.
Correcting errors early in the diffusion process matters more than fixing them later: Stepwise-Flow-GRPO leverages this insight to dramatically improve RL-based flow model training.
Unlock √N regret in offline policy learning, even with complex policy classes, by trading off policy and environment complexity.
Backpropagation-free test-time adaptation can be both accurate and efficient: PACE achieves state-of-the-art accuracy while slashing runtime by over 50%.
Models can dynamically grow their own capacity during continual learning, adding parameters only when and where they're needed, without human intervention.
Actor-critic methods can achieve state-of-the-art sample complexity in linear MDPs *without* relying on computationally expensive implicit policies or strong assumptions about exploration.
Narrow ResNets can struggle to represent critical points in input-output mappings, effectively pushing them to infinity and hindering accurate function approximation.
Scaling laws work so well because they capture the essence of computation, not the specifics of implementation, leading to a persistent efficiency arms race.
Escape the tyranny of ill-conditioned optimization landscapes: Yau's Affine Normal Descent offers provably robust convergence by intrinsically adapting to anisotropic curvature through volume-preserving affine invariance.
Higher-order neural networks don't need hypergraphs: SHONNs unlock their power for general-purpose feedforward architectures by sidestepping stability and scaling issues.
Neural networks can turbocharge classical optimization for high-dimensional matrix estimation, achieving faster convergence without sacrificing theoretical guarantees.
Classical models of hydrogen storage in geological formations fall apart when applied to diverse samples, but this physics-informed neural network nails it, achieving R² = 0.9544.
Second-order federated learning can be made robust and practical: FedRCO overcomes instability issues and outperforms first-order methods in non-IID settings.
Forget smooth sailing: FI-KAN's fractal bases let neural networks conquer non-smooth functions and PDEs with up to 79x better accuracy.
LLMs can reason more accurately and concisely when RL is guided by token-level entropy, pinpointing and exploring "forks in the road" during the reasoning process.
Row/column normalization *before* orthogonalization can significantly boost convergence and reduce validation perplexity in LLaMA2 pretraining, outperforming the base Muon optimizer.
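The pipeline is small enough to sketch. Assumptions to note: this uses the textbook cubic Newton–Schulz coefficients (1.5, −0.5) rather than Muon's tuned quintic polynomial, and shows only the row-normalization variant.

```python
import numpy as np

def row_normalize(G, eps=1e-8):
    """Scale each gradient row to unit norm before orthogonalization."""
    return G / (np.linalg.norm(G, axis=1, keepdims=True) + eps)

def newton_schulz_orthogonalize(G, steps=30):
    """Iterate X <- 1.5 X - 0.5 (X X^T) X toward the nearest semi-orthogonal factor."""
    X = G / (np.linalg.norm(G) + 1e-8)   # bring singular values into (0, sqrt(3))
    for _ in range(steps):
        X = 1.5 * X - 0.5 * (X @ X.T) @ X
    return X

def normalized_orthogonal_update(G):
    """Muon-style update direction: normalize rows, then orthogonalize."""
    return newton_schulz_orthogonalize(row_normalize(G))
```

The normalization step equalizes row scales before the iteration, which is the claimed source of the convergence and perplexity gains over the base Muon update.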
Transformers can now predict with an explicit internal structure of uncertainty, enabling stronger probabilistic evaluation and a more informative analysis of model behavior.
Differentiable Power-Flow unlocks scalable, gradient-based optimization for power grid management, outperforming traditional methods and enabling new applications like real-time contingency analysis.
Unconstrained bandit linear optimization can be surprisingly reduced to standard online linear optimization using a perturbation approach, unlocking new regret guarantees and high-probability bounds.
Federated learning can overcome data sparsity and privacy concerns to improve livestock growth prediction using real-world farm data.
Forget expensive verification: training networks to be *trivially* verifiable yields state-of-the-art Lipschitz bounds and adversarial robustness.
Agentic RL rollouts are bottlenecked by long-tail trajectory generation, but Heddle's trajectory-centric approach achieves 2.5x higher throughput.
Dataset condensation, already vulnerable to backdoor attacks, now faces a far stealthier threat: InkDrop leverages decision boundary uncertainty to hide malicious triggers, making detection significantly harder.
A lightweight transformer can forecast optical amplifier failures in real-time, paving the way for self-healing networks.
Random weight initialization is a major source of instability in deep learning, especially for rare classes, but this work shows how to eliminate it entirely with structured orthogonal initialization.
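A seeded orthogonal initializer, the generic baseline the paper's structured variant builds on, can be sketched in a few lines; the QR construction and sign fix here are standard practice, not the paper's specific scheme.

```python
import numpy as np

def orthogonal_init(rows, cols, gain=1.0, seed=0):
    """Seeded orthogonal weight initialization via QR decomposition.

    The result has orthonormal rows (if rows <= cols) or orthonormal
    columns (if rows >= cols), so activation scale is controlled from
    step zero and is reproducible across runs.
    """
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((max(rows, cols), min(rows, cols)))
    Q, R = np.linalg.qr(A)
    Q = Q * np.sign(np.diag(R))          # fix QR sign ambiguity for determinism
    W = Q[:rows, :cols] if rows >= cols else Q[:cols, :rows].T
    return gain * W
```

Fixing the seed and the QR signs removes run-to-run variance from initialization entirely, which is exactly the instability source the blurb targets for rare classes.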
Achieve near state-of-the-art OCR accuracy with 95% less compute by decoupling character detection from language correction and training the language model on synthetic noise alone.
FedDES achieves instance-level personalization in federated learning by dynamically selecting and weighting peer models with a GNN, leading to significant performance gains in heterogeneous environments.
Disentangling perception and reasoning with role-specific rewards in multimodal LLMs boosts accuracy by 7 points, revealing a critical bottleneck in existing joint optimization approaches.
Adversarial training unlocks domain-invariant prompts for CLIP, boosting zero-shot generalization beyond standard prompt tuning.
Achieve state-of-the-art segmentation accuracy on drivable-area and lane segmentation tasks with a model under 5M parameters, demonstrating that high performance doesn't always require massive architectures.
Quantum circuits can match classical MLPs on EEG classification tasks while using 50x fewer parameters, thanks to differentiable quantum architecture search that automatically optimizes circuit topology.
RecycleLoRA reveals that strategically targeting minor subspace directions in VFMs with LoRA adapters can unlock surprisingly robust domain generalization in semantic segmentation.
A new swarm-based optimization algorithm, inspired by dogfighting but built on kinematic equations, achieves state-of-the-art performance across diverse benchmark and real-world engineering problems.
Autonomous architecture search for molecular transformers is surprisingly fruitless: you're better off just tuning learning rates.
Demystifying LLMs for the masses might be as simple as turning their mechanics into a game.
Forget retraining the whole model when adding a new image degradation type – this modular routing approach lets you plug in a new expert with minimal overhead.
Wavelet decomposition offers a surprisingly effective way to disentangle anatomical structure from domain-specific noise in fundus images, leading to state-of-the-art generalization performance.
Simple fine-tuning with a parallel decoder and smart learning rate schedule lets you beat more complex meta-learning approaches in cross-domain few-shot object detection.
Unlock 5x faster autoregressive image generation by using a single entropy signal to simultaneously optimize draft prediction and enable single-step diffusion decoding.
Overcome reward sparsity in medical visual grounding by dynamically tightening reward criteria based on model performance, leading to improved localization accuracy and training stability.
Event cameras, fused with traditional frames using an energy-aware approach, can significantly boost the accuracy of autonomous vehicle steering prediction.
Forget painstakingly tuning MPC controllers by hand: this method learns optimal humanoid locomotion policies by aligning MPC cost functions with high-fidelity RL data.
Achieve FP16-level LLM accuracy at 3-bit quantization, unlocking 1.5x faster inference than 4-bit methods on consumer GPUs.
Squeezing loop control down to <10% of array resources unlocks near-zero-overhead parallel loop acceleration on Tightly Coupled Processor Arrays.
Injecting carefully-selected, reverse-ordered behavioral curricula into generative recommendation models can significantly boost conversion rates, as demonstrated by a 2% lift in online advertising revenue.
Forget painstakingly tuning data mixture ratios for continual pre-training: OptiMer lets you train individual models and then *optimize* their combination weights *afterward*, cutting search costs by up to 35x.
Efficiency is the key bottleneck preventing video generation models from becoming general-purpose world simulators, and this paper provides a taxonomy of techniques to overcome it.
LLMs can now automatically evolve and optimize GPU kernels, beating hand-tuned baselines and kernels generated by proprietary models like Gemini and Claude.
Squeezing the most out of your MLLM's visual budget is now possible: ResAdapt learns to allocate visual tokens intelligently *before* encoding, boosting performance by 15% while processing 16x more frames at the same cost.
Hadamard rotations unlock near-lossless 5-bit quantization for LLMs, outperforming standard techniques without calibration data.
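The rotate-quantize-unrotate trick can be sketched directly. The Sylvester Hadamard construction, per-tensor scale, and round-to-nearest quantizer below are generic choices for illustration (the weight dimension must be a power of two), not the paper's exact recipe.

```python
import numpy as np

def hadamard(n):
    """Orthonormal Walsh-Hadamard matrix (Sylvester construction; n a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def quantize_symmetric(x, bits=5):
    """Round-to-nearest symmetric quantization with a single per-tensor scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

def rotate_quantize(W, bits=5):
    """Rotate weights to spread outliers across coordinates, quantize, rotate back."""
    H = hadamard(W.shape[1])
    return quantize_symmetric(W @ H, bits) @ H.T
```

Because the rotation is orthogonal it preserves the quantization error norm, but it flattens outliers so the shared scale is no longer dominated by a single extreme weight; no calibration data is needed.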
By cleverly repurposing an unused sign bit, IF4 achieves superior quantization performance compared to NVFP4 without increasing bit-width.
Automating the messy process of post-training quantization, OneComp lets you compress generative AI models with a single line of code.
Forget fine-tuning: merging language-specific weights into instruction-tuned LLMs unlocks surprisingly effective instruction following in low-resource languages.
Blockchain-based federated learning can be made practical by using multi-task peer prediction to overcome the computational bottleneck of contribution measurement.
Slash malware detection labeling costs by 90% using combined active and semi-supervised learning, without sacrificing performance.
Forget expensive full fine-tuning: this training-free data selection method uses in-context learning to slash MLLM training costs while maintaining performance.
Zero-shot visuotactile policies trained in a fast, parallelized simulator can directly control real robots in contact-rich tasks.
Forget fixed memory budgets: dynamically allocating exemplar storage across federated clients boosts performance in class-incremental learning for heterogeneous medical data.
Intra-warp load imbalance, a major bottleneck in GPU-accelerated Electronic Design Automation, can be eliminated through warp-level parallel orchestration, leading to significant speedups in static timing analysis.
Save time and resources: predict federated learning performance *before* deployment by quantifying dataset and client complexity.
Differentiable optimization can supercharge classical ILP solvers, slashing runtime by 10x on combinatorial scheduling problems.
Training large models without communication overhead is now plausible: OptINC uses optical interconnects to perform gradient averaging and quantization directly in the network.
Achieve up to 32.1% energy-delay product improvement in high-speed adders by co-optimizing prefix topology and standard cell mapping, outperforming commercial synthesis tools.
You can boost ranking model performance in low-traffic recommendation systems by directly distilling knowledge from a large-scale, but different, domain like video recommendations.