100 papers published across 5 labs.
Stop wasting RL on easy problems: a difficulty-aware curriculum for SFT and RL unlocks better reasoning in LLMs.
LMMs can slash FLOPs by 89% without sacrificing accuracy, thanks to a frequency-modulated visual restoration technique that preserves crucial visual semantics even with fewer tokens.
LLMs can now synthesize high-performance kernels for niche hardware like NPUs, even with limited data, thanks to a self-evolving agent that bootstraps and refines code via value-driven reinforcement learning.
By intelligently suppressing boundary outliers before quantization, BS-KMQ slashes quantization error by 3x and boosts energy efficiency by 24x in in-memory computing.
Stop wasting compute on easy and impossible examples: PACED distillation focuses your student model's training on the sweet spot where it actually learns.
Achieve real-time photorealistic image enhancement without sacrificing visual quality or semantic consistency, thanks to a novel hybrid training strategy for GANs.
Forget slow native FP64: this work unlocks efficient double-precision matrix multiplication on modern GPUs by adapting the Ozaki-II scheme to run on faster FP8 hardware.
LoRA fine-tuning can significantly boost the voice cloning capabilities of LLM-based TTS systems, but only if the training data is acoustically diverse enough.
By combining differentiable indexing with isotropic geometric optimization, DGI achieves state-of-the-art generative retrieval, especially for long-tail items that are often missed by other methods.
Achieve up to 12x greater sample efficiency in reasoning tasks by relaxing strict imitation constraints in on-policy distillation, enabling smaller models to match the performance of much larger ones.
Humanoid robots can now reliably transport objects on a tray in the real world, thanks to a hierarchical RL approach that isolates and cancels gait-induced disturbances.
Forget hand-tuning rollout budgets: $V_{0.5}$ dynamically allocates compute to sparse RL rollouts based on a real-time statistical test of a generalist value model's prior, slashing variance and boosting performance.
AI electricity demand won't necessarily explode as AI scales – whether it does hinges on sustained efficiency improvements outpacing income-driven demand.
Forget ZKPs: this federated learning scheme uses "self-destructing" backdoors to verify aggregation integrity, achieving 1000x speedups over traditional crypto.
Achieve the seemingly impossible: ASTER uses RL to enable cable-suspended quadrotors to perform autonomous inverted flight.
Forget training on massive datasets: this new diffusion policy learns human-like 3D scanning strategies that generalize to unseen objects while being robust to noise.
Training embodied intelligence models just got 40x faster thanks to a thousand-GPU cloud platform and a suite of optimizations spanning data pipelines, model architecture, and infrastructure.
By injecting teacher demonstrations only when the policy fails, HAPO overcomes the limitations of both pure RL and mixed-policy optimization in sparse-reward RLVR, enabling models to surpass static teacher forcing.
Subtracting the mean from activations unlocks stable FP4 training for LLMs, closing the performance gap with BF16 without complex spectral methods.
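For intuition, here is a minimal NumPy sketch of the centering idea, assuming an E2M1-style FP4 value grid and per-row scaling; the paper's exact format, scaling granularity, and training recipe may differ.

```python
import numpy as np

# Illustrative E2M1-style FP4 magnitude grid (an assumption, not the paper's exact format).
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_LEVELS = np.concatenate([-FP4_GRID[::-1], FP4_GRID])

def fake_quant_fp4(x, axis=-1, center=True):
    """Simulate FP4 quantization of activations, optionally mean-centering first."""
    mu = x.mean(axis=axis, keepdims=True) if center else 0.0
    xc = x - mu                                        # centering shrinks the dynamic range
    scale = np.abs(xc).max(axis=axis, keepdims=True) / FP4_GRID[-1] + 1e-12
    idx = np.abs(xc[..., None] / scale[..., None] - FP4_LEVELS).argmin(axis=-1)
    return FP4_LEVELS[idx] * scale + mu                # dequantize and restore the mean

x = np.random.randn(4, 1024) * 0.1 + 3.0               # activations with a large shared offset
err_centered = np.abs(fake_quant_fp4(x) - x).mean()
err_plain = np.abs(fake_quant_fp4(x, center=False) - x).mean()
print(err_centered, err_plain)                          # centering typically gives a much smaller error
```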
Ditch the heuristic latent spaces: Geometric Autoencoders offer a principled way to inject VFM priors into diffusion models, yielding state-of-the-art image generation with better compression and semantic depth.
Forget catastrophic forgetting: this imitation learning framework remembers up to 65% more while improving AUC by 10-17 points on the LIBERO benchmark.
Forget hand-crafted rewards: this new method learns dexterous manipulation by encouraging the robot hand to explore diverse contact patterns on objects, leading to impressive real-world transfer.
SMEs can slash carbon emissions by 37% and costs by 3.6% simply by using Aceso's carbon-aware microservice placement, even with regionally limited infrastructure.
Forget first-order gradients: Geo-ADAPT-VQE slashes energy error by up to 100x in quantum chemistry calculations by intelligently navigating the quantum state space geometry.
Unlock superior trajectories in complex environments with a new ADMM-based solver that jointly optimizes spatial and temporal domains, eliminating the need for complex warm starting.
Ditch discrete visual tokens: UniCom achieves SOTA multimodal generation by compressing continuous semantic representations, unlocking better controllability and consistency in image editing.
Trajectory optimization just got a whole lot faster and more energy-efficient: a GPU-native solver achieves 4x speedup and halves energy consumption compared to optimized CPU baselines.
A compact 0.9B multimodal model, GLM-OCR, achieves state-of-the-art document understanding by predicting multiple tokens at once, boosting decoding throughput without blowing up memory.
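To illustrate the multi-token idea in the abstract, here is a toy sketch with several output heads, each predicting the token a fixed number of steps ahead, so one decode step emits several tokens; the sizes, head design, and any verification or acceptance step are assumptions, not GLM-OCR's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab, n_heads = 64, 1000, 4                  # illustrative sizes only

# One shared hidden state per position, plus n_heads small output heads; head k
# predicts the token k steps ahead (a generic multi-token-prediction sketch).
W_heads = [rng.standard_normal((d_model, vocab)) * 0.02 for _ in range(n_heads)]

def predict_next_k(hidden_state):
    """Return greedy predictions for the next `n_heads` tokens from one hidden state,
    letting the decoder emit several tokens per forward pass."""
    return [int(np.argmax(hidden_state @ W)) for W in W_heads]

h = rng.standard_normal(d_model)
print(predict_next_k(h))                                # e.g. 4 token ids from one decode step
```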
Differentiable physics enables high-resolution 3D tomography of subsurface defects by enforcing thermodynamic laws as hard constraints, outperforming traditional methods and PINNs.
Jointly training layered Gaussian splats boosts reconstruction quality by up to 2.6 dB, proving that coordinating optimization across layers is key for progressive 2D Gaussian splatting.
You can slash ASR error rates in low-resource languages by over 60% with a simple continued pretraining recipe.
A single meta-RL policy can now handle 66% mass variations and 70% rotor thrust losses in quadrotors, achieving zero-shot sim-to-real transfer for agile maneuvers.
Forget fine-tuning: this method adapts robots to changing environments by learning a low-dimensional "Trend ID" embedding, keeping the core model fixed.
Ditch the slow diffusion grind: Marigold-SSD delivers zero-shot depth completion in a single step, rivaling discriminative models in speed while retaining diffusion's accuracy.
CD-Raft slashes distributed consensus latency by nearly 50% in cross-domain settings, offering a significant speedup for data-intensive AI workloads.
Forget contrastive learning: LLM2Vec-Gen learns text embeddings by representing the *response* an LLM would generate, unlocking safety and reasoning abilities for embedding tasks.
A single Bayesian Optimization loop can now handle minimization, single-point saddle searches, and double-ended saddle searches on potential energy surfaces, thanks to a unified framework leveraging Gaussian Processes.
Dynamically selecting QR factorization based on condition number estimates dramatically improves the performance of the ChASE library for solving eigenproblems.
Vision-language models can significantly enhance language models through knowledge distillation, even without direct textual understanding, challenging conventional KD paradigms.
Straighter flows, better generations: COT-FM carves up complex generative tasks into simpler, cluster-specific flows, leading to faster and more reliable sampling.
By respecting the intrinsic geometry of the probability simplex, $\alpha$-GaBO significantly outperforms standard Bayesian optimization in tasks involving probabilities and mixtures.
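As a rough illustration of "respecting the simplex geometry", here is the classical Fisher-Rao distance on the simplex (via the square-root map to the sphere) plugged into an RBF-style kernel; $\alpha$-GaBO's actual $\alpha$-geometry and kernel construction may differ.

```python
import numpy as np

def fisher_rao_distance(p, q, eps=1e-12):
    """Geodesic distance between points of the probability simplex under the
    Fisher-Rao metric: the square-root map sends the simplex to the positive
    orthant of the unit sphere, where geodesics are great-circle arcs."""
    p = np.clip(p, eps, None); p = p / p.sum()
    q = np.clip(q, eps, None); q = q / q.sum()
    cos = np.clip(np.sqrt(p * q).sum(), -1.0, 1.0)
    return 2.0 * np.arccos(cos)

def simplex_rbf_kernel(p, q, lengthscale=0.5):
    """A geometry-aware RBF-style kernel on the simplex (illustrative only)."""
    d = fisher_rao_distance(p, q)
    return np.exp(-0.5 * (d / lengthscale) ** 2)

print(simplex_rbf_kernel(np.array([0.7, 0.2, 0.1]), np.array([0.1, 0.2, 0.7])))
```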
Generative drifting's empirical success is no longer a mystery: it's secretly score matching, but with frequency-dependent convergence bottlenecks that explain the preference for Laplacian kernels.
Make your transformers more robust to noise and improve training dynamics with a surprisingly simple, lightweight "pseudo-projector" module inspired by multigrid methods.
Row-normalized optimizers can match Muon's performance on large language models while being faster in large-token and low-loss regimes, offering a practical alternative for pre-training.
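A minimal sketch of the row-normalization idea, assuming a momentum-SGD-style update in which each row of the smoothed gradient is rescaled to unit norm; the paper's exact optimizer and hyperparameters may differ.

```python
import numpy as np

def row_normalized_update(W, grad, momentum, lr=0.02, beta=0.9, eps=1e-8):
    """One step of a row-normalized optimizer on a weight matrix W (illustrative sketch).
    Each row of the momentum-smoothed gradient is rescaled to unit L2 norm, so every
    output neuron receives an update of the same magnitude; Muon instead
    orthogonalizes the whole update matrix."""
    momentum = beta * momentum + (1.0 - beta) * grad
    row_norms = np.linalg.norm(momentum, axis=1, keepdims=True) + eps
    W -= lr * momentum / row_norms
    return W, momentum

W = np.random.randn(8, 16) * 0.1
m = np.zeros_like(W)
g = np.random.randn(8, 16)
W, m = row_normalized_update(W, g, m)
```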
Source-free test-time adaptation for image regression gets a boost with Predictive Spectral Calibration, which aligns target features within the source predictive support and calibrates residual spectral slack, leading to significant performance gains under distribution shifts.
Unlock calibrated uncertainty in Mixture-of-Experts Transformers with VMoER, a Bayesian routing method that slashes calibration error by 94% while barely impacting FLOPs.
Physics-based dynamics models can make or break sim-to-real reinforcement learning, boosting real-world success by 50% in industrial control tasks where simplified models fail.
DendroNNs offer a 4x energy efficiency boost over existing neuromorphic hardware by mimicking dendritic computation and training via a gradient-free rewiring mechanism.
Steer clear of catastrophic forgetting in VLMs with EvoPrompt, a new method that evolves prompts by preserving learned semantic directions while adapting their magnitude.
Forget gradients: this new sampler learns complex distributions, even with discrete parameters, by enforcing time-reversibility and comparing forward and backward Markov trajectories.
Accurately predict material phase diagrams at low temperatures with minimal computational cost by combining classical thermodynamics with modern free energy techniques.
FP64 tensor cores, previously untapped for large-scale scientific computing, now unlock 2x speedups and 83% energy savings in finite element simulations on NVIDIA's latest GPUs.
Skip the expensive proxy model training: this training-free method boosts VLLM performance by up to 4.8% using only 10-15% of the data, simply by measuring how much the question *changes* the model's view of the answer.
Forget manual hyperparameter tuning: OptEMA achieves near-optimal deterministic convergence in zero-noise stochastic optimization, adapting automatically.
LLMs can learn new tasks without forgetting old ones, thanks to a memory-aware replay strategy that selectively rehearses important examples.
Distributing SciML models with hardware and physics awareness slashes latency and energy consumption by over 8x and 33x, respectively, while paradoxically *improving* reconstruction fidelity.
Regularizing Lipschitz constants in MLPs within neural oscillators provably and practically enhances generalization, offering a path to more robust learning of complex dynamical systems.
Spatial audio cues and directional priors can be jointly learned end-to-end to significantly boost keyword spotting accuracy in noisy environments, outperforming traditional cascaded approaches.
Achieve up to 7.24% code-size reduction by identifying and extracting idempotent backward slices, enabling the merging of non-contiguous instruction sequences within and across functions.
Forget laboriously sifting through layers or datasets for PEFT: GAST co-optimizes both, adaptively picking the most impactful data for each layer based on gradient alignment.
Achieve near-FP32 image restoration performance with an Int8 model that runs at 442 FPS on NVIDIA Jetson Orin, all thanks to a quantization-aware distillation framework that avoids decoder distillation.
Forget waiting hours: this MORL framework achieves 270x speedups on robotics tasks thanks to GPU-native parallelization.
Ditch finicky gradient descent: this paper recasts Transformer training as an optimal control problem, guaranteeing global optimality and robustness.
Get 6x the RLHF alignment for your LLM with a new active learning pipeline that focuses on annotating the most informative response pairs.
Tighter privacy guarantees and higher utility in language models are simultaneously achievable via a principled parameter clipping strategy for Nonparametric Variational Differential Privacy.
Forget parameter counts – the true memorization capacity of deep ReLU networks is fundamentally bounded by the product of squared width and squared depth, $W^2L^2$, scaling linearly with data size.
Finally, analog joint source-channel coding can be deployed on standard digital transceivers, unlocking the potential of semantic communication on existing infrastructure.
ConvNets strike back: a ConvNeXt-based diffusion model matches Transformer performance at half the FLOPs and 7x faster training, all on just 4 GPUs.
TMFGs can now scale to millions of data points thanks to a-TMFG, which approximates the correlation matrix on-the-fly using kNN graphs and clever memory management.
Chamfer distance, the workhorse loss for point cloud tasks, can actually *increase* when you optimize it, unless you use non-local coupling to avoid gradient collapse.
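For reference, here is the standard symmetric Chamfer distance being critiqued, in NumPy; gradients flow only through each point's single nearest neighbour, which is the locality the non-local coupling is meant to break.

```python
import numpy as np

def chamfer_distance(P, Q):
    """Standard symmetric Chamfer distance between point clouds P (n,3) and Q (m,3).
    Each point is matched only to its single nearest neighbour, so gradients flow
    through purely local pairings; the paper argues this locality can make the loss
    increase during optimization unless non-local coupling is added."""
    d2 = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(-1)   # (n, m) squared distances
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

P = np.random.rand(128, 3)
Q = np.random.rand(256, 3)
print(chamfer_distance(P, Q))
```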
GP Thompson Sampling's reliance on probability $\delta$ dooms it to polynomial regret, a stark contrast to GP-UCB's more favorable bounds.
Get up to 24x faster sine/cosine calculations on ESP32 microcontrollers by dynamically switching between fixed-point and floating-point precision.
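A rough Python sketch of the switching idea, assuming a quarter-wave Q15 lookup table as the fixed-point path; the table size, number format, and the actual switching criterion used on the ESP32 are assumptions here.

```python
import math

# Quarter-wave sine lookup table in Q15 fixed point (one common fixed-point scheme).
TABLE_BITS = 8
TABLE = [round(math.sin(i * (math.pi / 2) / (1 << TABLE_BITS)) * 32767)
         for i in range((1 << TABLE_BITS) + 1)]

def sin_q15(angle_q15):
    """sin() of an angle given as a Q15 fraction of a full turn (0..32767 ~ 0..2*pi),
    returned as a Q15 value; uses quarter-wave symmetry and the lookup table."""
    quadrant = (angle_q15 >> 13) & 3                # top 2 bits select the quadrant
    frac = angle_q15 & 0x1FFF                       # 13-bit position within the quadrant
    idx = frac >> (13 - TABLE_BITS)
    if quadrant in (1, 3):                          # mirror in the descending quadrants
        idx = (1 << TABLE_BITS) - idx
    val = TABLE[idx]
    return val if quadrant < 2 else -val            # negate in the lower half-wave

def fast_sin(x, need_high_precision):
    """Dynamically pick the implementation: full float math.sin when precision matters,
    the cheap fixed-point path otherwise (the switching rule itself is application-specific)."""
    if need_high_precision:
        return math.sin(x)
    angle_q15 = int((x % (2 * math.pi)) / (2 * math.pi) * 32768) & 0x7FFF
    return sin_q15(angle_q15) / 32767.0

print(fast_sin(math.pi / 3, False), math.sin(math.pi / 3))
```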
Muon's "one-size-fits-all" spectral update is holding back your models: Mousse adapts to curvature and cuts training time by 12%.
Ditching Gaussian and Poisson noise assumptions in NMF can dramatically improve model fit and feature recovery, especially when using Tweedie and Negative Binomial distributions for overdispersed data.
Forget fine-tuning: this training-free method boosts retrieval accuracy for tricky negation queries by up to 10% using clever embedding optimization.
Achieve higher accuracy and faster convergence in split learning by intelligently pruning communication channels based on label awareness.
RiO-DETR makes real-time oriented object detection with transformers a reality by cleverly decoupling angle estimation and injecting angular diversity into dense supervision.
Task specialization in robot swarms doesn't always improve efficiency, especially when you're on a tight optimization budget.
Pretrained ALiBi transformers suffer from a widespread attention collapse that can be surgically repaired to yield a 25% perplexity improvement, suggesting that standard pretraining leaves performance on the table.
Tensor-based PEFT methods like LoRETTA can dramatically reduce catastrophic forgetting in sequential learning by capturing richer structural information within compact parameter budgets.
Forget return curves – a simple measure of neuron activation patterns (OUI) at just 10% of training can predict PPO performance better than existing methods, enabling early pruning of bad runs.
Achieve up to 23% better prediction accuracy in manufacturing surrogate modeling by jointly modeling inter-task similarity and data fidelity using a hierarchical Bayesian approach.
By explicitly modeling mid-to-high frequency patterns often ignored by existing methods, FreqCycle unlocks state-of-the-art time series forecasting accuracy while maintaining faster inference.
Achieve comparable speech restoration quality with conditional diffusion models using 10x fewer neural network evaluations via a novel iSDE solver.
Quantifying uncertainty in physics-informed neural networks for medical imaging boosts accuracy and reliability, leading to better stroke assessment.
Stop CIL models from catastrophically forgetting by explicitly minimizing causal incompleteness within tasks and maximizing separability between tasks.
FrameDiT achieves state-of-the-art video generation by ditching token-level attention for a novel matrix-based attention that operates directly on entire frames.
Forget shaving yaks – this new protocol slashes communication costs in distributed expert learning while *improving* regret bounds.
Robots can now achieve superior surface coverage with precise end-effector poses thanks to a new SE(3)-aware Stein Variational Gradient Descent method that outperforms existing trajectory optimization techniques.
Time series anomaly detection gets a boost from temporal-conditioned normalizing flows that capture complex temporal dynamics and uncertainty.
Gordon's comparison theorem bridges the gap between complex ML training dynamics and tractable surrogate systems, offering a path to more accurate non-asymptotic analysis.
Transformers get a surprising boost in language modeling performance by simply ignoring "themselves" during attention.
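A minimal NumPy sketch of the "ignore yourself" idea: standard scaled dot-product attention with the diagonal masked out; causal masking and the paper's exact variant are omitted.

```python
import numpy as np

def self_excluded_attention(Q, K, V):
    """Scaled dot-product attention in which each token is forbidden from attending
    to its own position (illustrative sketch of the 'ignore yourself' idea)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                     # (T, T) attention logits
    np.fill_diagonal(scores, -np.inf)                 # mask out self-attention
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

T, d = 6, 16
Q = np.random.randn(T, d); K = np.random.randn(T, d); V = np.random.randn(T, d)
print(self_excluded_attention(Q, K, V).shape)         # (6, 16)
```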
Stop catastrophic forgetting in continual learning by better aligning your classifiers to your feature backbone with a new loss function.
Even when paraphrasing content that explicitly contradicts a teacher's preferences, language models can still subliminally learn those preferences, raising serious concerns about bias propagation in self-training scenarios.
Bridging the gap between deep learning and neuroscience, this work presents a biologically plausible alternative to backpropagation through time, potentially unlocking new avenues for brain-inspired AI.
Forget parameter conflicts: representational incompatibility is the real culprit behind LLM merging failures, setting fundamental limits on which tasks can be successfully combined.
Forget ensembling or retraining: model merging lets you Frankenstein LLMs for specialized skills at minimal cost.
Achieve up to two orders of magnitude reduction in semantic communication rate by strategically incorporating common randomness in a privacy-preserving distributed computation framework.
Forget SLAM, ReCoSplat uses a "Render-and-Compare" module to autoregressively refine Gaussian Splatting reconstructions, even from unposed video, achieving SOTA novel view synthesis.
Quadruped robots can now learn to navigate complex, real-world environments in minutes, not hours, thanks to a new RL framework that prioritizes safety and efficient exploration.