100 papers published across 4 labs.
Forget imbalanced LoRA usage: ReMix leverages reinforcement learning to route effectively among LoRAs, boosting performance in parameter-efficient fine-tuning.
G-STAR tackles long-form, multi-speaker ASR by giving Speech-LLMs time-aware speaker tracking, enabling robust identity linking across chunks.
Exploit the surprisingly stable, yet heterogeneous, sparsity patterns across attention heads to slash LLM attention latency by 2.88x without sacrificing quality.
By modeling contextual relationships between DNS queries, DNS-GT significantly improves domain name embedding quality, leading to better performance in botnet detection and domain classification.
Achieve real-time photorealistic image enhancement without sacrificing visual quality or semantic consistency, thanks to a novel hybrid training strategy for GANs.
By combining differentiable indexing with isotropic geometric optimization, DGI achieves state-of-the-art generative retrieval, especially for long-tail items that are often missed by other methods.
Hyper-redundant robots get a 75% accuracy boost thanks to a neural network that adaptively blends learned behavior with kinematic priors.
Diffusion Transformers can be accelerated by up to 7x with nearly lossless performance using a training-free method that selectively computes on sparse anchor tokens, outperforming existing temporal acceleration techniques.
Explicitly aligning audio and video streams in a multimodal Transformer boosts emotion recognition, showing that ignoring frame-rate differences hurts performance.
Ditch slow, multi-step sampling for target speaker extraction: AlphaFlowTSE achieves faster, one-step generation with improved speaker similarity and real-world generalization.
Ditch the heuristic latent spaces: Geometric Autoencoders offer a principled way to inject VFM priors into diffusion models, yielding state-of-the-art image generation with better compression and semantic depth.
Quantum-Centric Supercomputers promise to break down the barriers between quantum and classical computing, enabling seamless hybrid algorithms and accelerating discovery across applications.
Get faster long-context LLM inference without sacrificing accuracy: LookaheadKV predicts KV cache importance, outperforming costly draft generation methods by 14.5x.
Quantum computers and molecular clocks just got a boost: researchers have achieved coherent control of forbidden vibrational transitions in single nitrogen molecular ions.
Representing graphs as strings with a guaranteed-valid instruction set unlocks language model-based approaches for graph similarity, generation, and conditioned modeling.
Quantifying the overhead of post-quantum cryptography reveals exactly where the performance bottlenecks lie in real-world TLS 1.3 transactions.
A GCN model trained on static analysis reports can achieve near-perfect accuracy in distinguishing true vulnerabilities from false positives, even uncovering genuine security weaknesses missed by the original SAST tools.
This new OCR model beats Gemini-3.1-Pro and Qwen3-VL-235B on key information extraction, thanks to its clever "Layout-as-Thought" process that recovers layout grounding in end-to-end OCR.
Ditch discrete visual tokens: UniCom achieves SOTA multimodal generation by compressing continuous semantic representations, unlocking better controllability and consistency in image editing.
A compact 0.9B multimodal model, GLM-OCR, achieves state-of-the-art document understanding by predicting multiple tokens at once, boosting decoding throughput without blowing up memory.
Differentiable physics enables high-resolution 3D tomography of subsurface defects by enforcing thermodynamic laws as hard constraints, outperforming traditional methods and PINNs.
A single LLM can now handle both non-streaming and streaming ASR, opening the door to more flexible and efficient speech recognition systems.
Jointly training layered Gaussian splats boosts reconstruction quality by up to 2.6 dB, proving that coordinating optimization across layers is key for progressive 2D Gaussian splatting.
A pipelined FPGA architecture slashes the power consumption of JPEG XS's Intra Pattern Copy displacement vector search, enabling practical hardware deployment for low-latency image compression.
A single system now rivals or beats specialized models across ASR, voice activity detection, language ID, and punctuation, setting a new bar for industrial-grade speech processing.
Ditch the interleaved item-action token mess: new architectures slash sequence complexity by 50% in generative recommenders, boosting performance and cutting training time.
Straighter flows, better generations: COT-FM carves up complex generative tasks into simpler, cluster-specific flows, leading to faster and more reliable sampling.
Backdoor triggers in ViTs leave a surprisingly clear signature: a linear direction in activation space that can be directly manipulated to activate or deactivate the backdoor.
A 4B-parameter model, InternVL-U, outperforms 14B-parameter models in multimodal generation and editing, proving that size isn't everything.
Generative drifting's empirical success is no longer a mystery: it's secretly score matching, but with frequency-dependent convergence bottlenecks that explain the preference for Laplacian kernels.
Make your transformers more robust to noise and improve training dynamics with a surprisingly simple, lightweight "pseudo-projector" module inspired by multigrid methods.
Row-normalized optimizers can match Muon's performance on large language models while being faster in large-token and low-loss regimes, offering a practical alternative for pre-training.
Unlock calibrated uncertainty in Mixture-of-Experts Transformers with VMoER, a Bayesian routing method that slashes calibration error by 94% while barely impacting FLOPs.
DendroNNs offer a 4x energy efficiency boost over existing neuromorphic hardware by mimicking dendritic computation and training via a gradient-free rewiring mechanism.
By injecting geological priors into the attention mechanism, GIAT achieves state-of-the-art lithology identification while also improving the interpretability of the model's predictions.
On-device LLM inference can be sped up by an order of magnitude with a flexible TrustZone-based system that selectively protects memory and the NPU.
State-of-the-art skeleton-based action recognition is now possible through a game-theoretic contrastive learning framework that maximizes action-relevant information while minimizing encoding redundancy.
ZipPIR delivers SimplePIR-level throughput without the massive client-side storage, finally making high-performance private information retrieval practical for resource-constrained devices.
On-device LLM inference with PIM is now more practical: PIM-SHERPA resolves memory inconsistencies, slashing memory capacity needs by ~50% without sacrificing performance.
Ditch the latency tax of traditional scheduling: this new approach delivers data "just-in-time" for safety-critical systems, boosting performance without sacrificing reliability.
By strategically increasing hash collisions, Nemo slashes write amplification in flash caches for tiny objects, a persistent bottleneck even with advanced SSDs.
A virtualized XRootD frontend can sustain over 50 Gb/s throughput in real-world large-scale WAN transfers, challenging assumptions about virtualization overhead in high-performance data systems.
BinaryAttention proves you can more than halve the runtime of attention in vision and diffusion transformers without sacrificing accuracy, simply by using the sign of queries and keys.
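The summary only names the trick, so here is a minimal NumPy sketch of what scoring attention with the sign of queries and keys could look like; the function name and every detail below are illustrative assumptions, not the BinaryAttention implementation.

```python
import numpy as np

def sign_attention(Q, K, V):
    """Hypothetical sketch: score each query-key pair by sign agreement
    instead of a full-precision dot product, then apply the usual
    softmax-weighted sum over the values."""
    d = Q.shape[-1]
    scores = np.sign(Q) @ np.sign(K).T / np.sqrt(d)  # integer agreement counts, rescaled
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Toy usage: 8 tokens with 16-dimensional queries, keys, and values.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 16)) for _ in range(3))
out = sign_attention(Q, K, V)  # shape (8, 16)
```

Because sign(Q) and sign(K) are just ±1, the score matrix could in principle be formed with bitwise operations and popcounts in an optimized kernel, which is where runtime savings of this kind would come from.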
Forget manual hyperparameter tuning: OptEMA achieves near-optimal deterministic convergence in zero-noise stochastic optimization, adapting automatically.
A hierarchical graph attention network beats traditional machine learning models by 21% in predicting spectrum demand, offering a more reliable approach to spectrum management.
A complete, GPU-accelerated bimanual mobile manipulation platform can be built for under $1300, opening up robotics research and education to a wider audience.
Regularizing Lipschitz constants in MLPs within neural oscillators provably and practically enhances generalization, offering a path to more robust learning of complex dynamical systems.
Spatial audio cues and directional priors can be jointly learned end-to-end to significantly boost keyword spotting accuracy in noisy environments, outperforming traditional cascaded approaches.
Forget blurry sketch-to-image outputs: this method uses component-aware self-attention and coordinate-preserving fusion to generate photorealistic images with unprecedented fidelity and spatial accuracy.
By computing the *difference* between attention maps, DCAU-Net achieves state-of-the-art medical image segmentation while dramatically reducing computational cost compared to standard self-attention.
Ignoring CSI phase information in robotic activity recognition is a mistake: fusing it with amplitude data in a novel gated BiLSTM architecture significantly boosts accuracy and robustness.
Nezha shatters I/O bottlenecks in distributed key-value stores by decoupling key-value persistence within Raft, yielding up to 4.6x throughput gains.
Physics-informed neural operators can drastically improve the accuracy and stability of phase-field modeling, outperforming standard neural operators in complex materials simulations.
Forget interference as just noise: correlated features in neural networks can constructively superpose to form semantic clusters, especially with weight decay.
By recombining subgraphs from sparse models without retraining, "model stitching" creates a diverse set of model variants that significantly improves the efficiency of multi-DNN inference on edge SoCs.
Ditch finicky gradient descent: this paper recasts Transformer training as an optimal control problem, guaranteeing global optimality and robustness.
Forget parameter counts: the true memorization capacity of deep ReLU networks is fundamentally bounded by the product of squared width and squared depth, $W^2L^2$, which scales linearly with the amount of data that can be memorized.
ConvNets strike back: a ConvNeXt-based diffusion model matches Transformer performance at half the FLOPs and 7x faster training, all on just 4 GPUs.
TMFGs can now scale to millions of data points thanks to a-TMFG, which approximates the correlation matrix on-the-fly using kNN graphs and clever memory management.
A robot can now achieve 90% success in peg-in-hole tasks, even with only 0.1mm clearance, by intelligently fusing vision and tactile feedback when visual occlusion occurs.
Double the emotion conversion accuracy in voice conversion models with a simple prefix that jointly controls sequence modulation and acoustic realization.
Muon's "one-size-fits-all" spectral update is holding back your models: Mousse adapts to curvature and cuts training time by 12%.
Achieve RAG efficiency without sacrificing accuracy: LooComp prunes context by identifying and retaining only the most critical sentences for answering a query.
Unlock full-duplex speech-to-speech dialogue without VAD limitations using chunk-wise micro-turns and special control tokens to steer LLM behavior in a cascaded pipeline.
RiO-DETR makes real-time oriented object detection with transformers a reality by cleverly decoupling angle estimation and injecting angular diversity into dense supervision.
DRIFT achieves state-of-the-art object detection performance on 4D radar point clouds by fusing local and global contexts with a novel dual-representation transformer architecture.
Pretrained ALiBi transformers suffer from a widespread attention collapse that can be surgically repaired to yield a 25% perplexity improvement, suggesting that standard pretraining leaves performance on the table.
Tensor-based PEFT methods like LoRETTA can dramatically reduce catastrophic forgetting in sequential learning by capturing richer structural information within compact parameter budgets.
By learning visual representations from scene-level semantics down to pixel-level details, C2FMAE overcomes the limitations of both contrastive learning and masked image modeling.
By explicitly modeling mid-to-high frequency patterns often ignored by existing methods, FreqCycle unlocks state-of-the-art time series forecasting accuracy while maintaining faster inference.
Prompt engineering is dead; long live context engineering—the key to scaling multi-agent AI systems lies in carefully designing the agent's informational environment, not just individual prompts.
Quantifying uncertainty in physics-informed neural networks for medical imaging boosts accuracy and reliability, leading to better stroke assessment.
Stop CIL models from catastrophically forgetting by explicitly minimizing causal incompleteness within tasks and maximizing separability between tasks.
FrameDiT achieves state-of-the-art video generation by ditching token-level attention for a novel matrix-based attention that operates directly on entire frames.
Time series anomaly detection gets a boost from temporal-conditioned normalizing flows that capture complex temporal dynamics and uncertainty.
Gordon's comparison theorem bridges the gap between complex ML training dynamics and tractable surrogate systems, offering a path to more accurate non-asymptotic analysis.
$P^2$GNN's plug-and-play prototype approach boosts GNN performance by injecting global context and denoising local neighborhoods, achieving state-of-the-art results across diverse datasets.
Transformers get a surprising boost in language modeling performance by simply ignoring "themselves" during attention.
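The mechanism is only gestured at above, so here is a hedged NumPy sketch of one plausible reading, masking each token's own position out of the attention scores; the paper's exact formulation may differ.

```python
import numpy as np

def attention_without_self(Q, K, V):
    """Hypothetical illustration: standard scaled dot-product attention,
    except the diagonal of the score matrix is masked so that no token
    attends to its own key. Assumes at least two tokens."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    np.fill_diagonal(scores, -np.inf)             # each token ignores "itself"
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```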
Bridging the gap between deep learning and neuroscience, this work presents a biologically plausible alternative to backpropagation through time, potentially unlocking new avenues for brain-inspired AI.
Forget parameter conflicts: representational incompatibility is the real culprit behind LLM merging failures, setting fundamental limits on which tasks can be successfully combined.
YOLO architecture search can now be sped up dramatically: a new surrogate benchmark lets you evaluate designs without full training, and it's good enough to find architectures that beat YOLOv12.
State-of-the-art language models might be too sophisticated: simpler n-gram statistics better explain human reading times.
Forget confidence scores: a modality-aware early exit strategy for spoken language models slashes decoding costs without sacrificing accuracy or perceptual quality, revealing that speech tokens require specialized handling compared to text.
Forget SLAM: ReCoSplat uses a "Render-and-Compare" module to autoregressively refine Gaussian Splatting reconstructions, even from unposed video, achieving SOTA novel view synthesis.
Achieve a 277x speedup in autoregressive video generation by distilling diffusion models with a novel "diagonal distillation" approach that leverages temporal context and mitigates error propagation.
Autonomous racecars can now learn tire dynamics 71% faster and with 60% higher accuracy by "seeing" the road surface and remembering past driving behavior.
Don't fully retrain your draft model after fine-tuning your LLM: EDA restores speculative decoding performance with significantly less compute by adapting only a small, private component and regenerating training data.
Mimicking human eye movements with a Vision Transformer's attention maps yields a surprisingly effective and efficient image classification strategy.
Beat the state-of-the-art in radio signal separation by 122x using a transformer trained on cross-entropy loss, and the same architecture could work for gravitational waves.
Achieve more efficient reasoning in Transformers without increasing test-time cost by using training-only techniques that guide attention and dynamically adjust sharpness.
Noise in photonic quantum systems severely limits the performance of quantum machine learning algorithms, demanding robust noise mitigation strategies for practical implementations.
Forget gradient descent: this new method routes transformer activations through a Hopfield-inspired memory in a single forward pass to achieve state-of-the-art online continual learning.
Optimal transport provides a surprisingly tight and efficiently computable bound on transductive generalization in graph node classification, revealing how GNN depth impacts representation geometry.
Imperfect code from LLMs can still teach AI to understand circuit structure, unlocking a scalable path to netlist representation learning without expensive, clean datasets.
Forget black-box audio synthesis: this differentiable engine sound model gives you interpretable knobs to control physical parameters like valve dynamics and exhaust resonances.
LLMs suffer from a severe gradient bottleneck in the output layer, suppressing 95-99% of the gradient norm and crippling training.
Mamba-2's efficiency doesn't require custom CUDA kernels: XLA's compiler optimizations are enough to unlock near-optimal performance across diverse hardware.
Achieve better video editing without retraining by dynamically locking background features based on a "hallucination metric" that detects when the diffusion model is about to go astray.
Mixture-of-Experts models might be hiding more of their reasoning than we thought, thanks to a newly quantified "opaque serial depth" metric.
Forget wavelets: transformers with Koopman operator-derived features unlock superior ECG classification, especially in complex multi-class scenarios.