Search papers, labs, and topics across Lattice.
100 papers published across 2 labs.
Tucker Attention squeezes an order of magnitude more parameter efficiency out of attention layers, while unifying and simplifying Group Query Attention, Multi-Head Latent Attention, and standard Multi-Head Attention.
Forget hand-crafted features: DistilBERT can automatically identify parallelizable loops in code with >99% accuracy, opening the door to more efficient automatic parallelization.
LLMs' skewed matrix shapes need not hamstring systolic array performance: SISA's partitioned architecture achieves up to 8.52x speedup and 93% EDP reduction compared to monolithic arrays.
Forget privacy concerns: you can train high-performing deep learning models for dynamic MRI reconstruction using *synthetic* fractal data.
Chess transformers trained solely on move sequences face a "dual-capability bottleneck" where excelling at both state tracking and decision-making requires carefully balancing data diversity and quality, a tension that simple scaling cannot resolve.
LLMs spontaneously organize into brain-like functional units where the whole is greater than the sum of its parts, and destroying these synergistic cores cripples reasoning.
Image generation models can now achieve state-of-the-art fidelity with up to 64x fewer tokens, thanks to a novel masking strategy that prevents latent space collapse.
Human brains and neural networks may converge on similar "Platonic" representations for linguistic constructions, suggesting universal principles guide efficient language abstraction.
By mixing flows and using a teacher-student approach, MMAE learns to classify encrypted traffic more accurately than previous masked autoencoders.
By disentangling headers and payloads with a Mixture-of-Experts architecture, TrafficMoE achieves state-of-the-art encrypted traffic classification, proving that heterogeneity-aware modeling is crucial for extracting discriminative features from noisy, encrypted data.
Forget attention: Metriplectic dynamics offer a surprisingly effective and parameter-efficient route to neural computation, outperforming standard architectures in several domains.
Forget tedious poster design – iPoster lets you sketch your vision and then uses a smart diffusion model to instantly generate polished, content-aware layouts that respect your constraints.
Quantum-inspired architectures can significantly improve 3D cloud forecasting by better capturing nonlocal dependencies, outperforming classical methods like ConvLSTM and Transformers.
You can shrink a spacecraft anomaly detection model by 97% and still catch almost all the problems.
Real-time vocal denoising is now possible with deep learning, achieving significant SNR improvements at under 10ms latency.
Grokking isn't just about local circuits or optimization tricks, but a global structural collapse of redundant model manifolds, revealing a deep connection between compression and generalization.
Formalizing speculative execution vulnerabilities with compositional semantics allows for automated detection and verification, moving beyond ad-hoc countermeasures.
LLMs aren't the only path to vulnerability detection: a GNN-based model achieves near-parity with 100x less overhead.
Achieve structured IPC and practical message movement in modular services with CNS, a lightweight hybrid event fabric that bridges in-process and inter-node communication with minimal overhead.
Proving that erasing "erasable" function arguments preserves program behavior opens the door to more efficient and verifiable code optimization.
Single-pixel imaging gets a deep learning boost: SISTA-Net leverages learned sparsity and hybrid CNN-VSSM architectures to achieve state-of-the-art reconstruction quality, even in noisy underwater environments.
Video Transformers can achieve near-full attention accuracy with significantly less compute by focusing only on informative vertical vectors.
Masked motion generators struggle with complex movements because they treat all frames the same – until now.
Diffusion models can beat discriminative classifiers at facial expression recognition, but only with a dynamically adjusted margin loss that accounts for per-sample difficulty.
A training-free feature adjustment pipeline unlocks the power of Visual Geometry Grounded Transformers for stereo vision, achieving state-of-the-art results on KITTI.
Rendering artifacts in feed-forward 3D Gaussian Splatting? Solved: AA-Splat delivers a whopping 7dB PSNR boost by fixing screen-space dilation filters.
Forget blurry averages – DMA unlocks sharp, realistic concept prototypes directly within diffusion models, offering a new lens into model understanding and bias.
Forget expensive training: FlexMem unlocks SOTA long-video MLLM performance on a single GPU by cleverly mimicking human memory recall.
LLMs can maintain conversational stability and improve retrieval accuracy in long-running interactions by adaptively compressing context, leading to reduced token usage and faster inference.
Dialogue agents can now remember what you told them six turns ago with 57% accuracy, thanks to a new memory architecture that selectively forgets less important details.
Unlock rapid UAV design iteration with MetaMorpher's modular, nonlinear flight dynamics model that accurately simulates diverse wing configurations and flight modes.
World models can achieve state-of-the-art video prediction and emergent object decomposition by combining object-centric slots, hierarchical temporal dynamics, and learned causal interaction graphs.
Dataflow networks can achieve significant energy savings without sacrificing throughput by strategically powering down actors during idle periods, a balance efficiently discovered using a novel "Hop and Skip" exploration strategy.
Finally, a gem5-integrated simulator that accurately models CXL memory expansion for LLMs, capturing real-world effects like cache pollution.
Achieve up to 4.17x speedup in DRL training by intelligently partitioning tasks across CPUs, FPGAs, and AI Engines on AMD Versal ACAP, demonstrating the power of hardware-aware algorithm design.
Forget the cold start: training transformers for protein structure prediction peaks at intermediate temperatures, revealing a sweet spot in the loss landscape.
Twisted bilayer graphene enables the creation of parallel and configurable logic gates by exploiting layer-selective hydrogenation and proton transport.
Ditching mel-spectrograms unlocks surprisingly better text-to-speech, as LongCat-AudioDiT proves that waveform latent diffusion can beat the state-of-the-art in zero-shot voice cloning.
By disentangling speakers earlier in the process, SR-CorrNet avoids the information bottleneck that plagues existing speech separation models, leading to improved performance in challenging acoustic environments.
Generative recommendation models can adapt to evolving user behavior without catastrophic forgetting by selectively updating item tokens based on a novel drift-detection mechanism.
Brain-inspired AI gets a boost: a new graph neural network fuses structural and functional brain data to predict cognitive function better than ever before.
Unleashing creative potential in text-to-image models just got easier: on-the-fly repulsion in the contextual space lets you steer diffusion transformers towards richer diversity without sacrificing image quality or blowing your compute budget.
Scanning every token to focus attention is now passé: HISA prunes irrelevant context blocks *before* token-level scoring, slashing compute without sacrificing selection fidelity.
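A minimal NumPy sketch of that two-stage idea: score coarse blocks first, then run token-level attention only inside the surviving blocks. Function and parameter names here are illustrative, not HISA's actual API:

```python
import numpy as np

def hierarchical_sparse_attention(q, K, V, block=4, top_blocks=2):
    """Two-stage sparse attention sketch: prune context blocks via
    mean-pooled key scores, then score tokens only in kept blocks."""
    n, d = K.shape
    n_blocks = n // block
    # Stage 1: coarse block-level scores from mean-pooled keys.
    K_pooled = K[: n_blocks * block].reshape(n_blocks, block, d).mean(axis=1)
    block_scores = K_pooled @ q
    keep = np.sort(np.argsort(block_scores)[-top_blocks:])
    # Stage 2: token-level softmax attention restricted to kept blocks.
    idx = np.concatenate([np.arange(b * block, (b + 1) * block) for b in keep])
    logits = K[idx] @ q / np.sqrt(d)
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ V[idx]
```

With `top_blocks` covering every block, this reduces exactly to full softmax attention, which makes the pruned variant easy to sanity-check.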
Forget backpropagation through time: recurrent networks already have temporal credit baked into their forward pass.
Forget painstaking hyperparameter tuning: this hypersphere parameterization lets you transfer a single learning rate across model sizes, depths, and even MoE architectures, slashing compute costs by 1.58x.
Forget heuristics – this work gives provable conditions for *when* and *how* auxiliary data actually improve generalization in transfer learning.
Backpropagation-free test-time adaptation can be both accurate and efficient: PACE achieves state-of-the-art accuracy while slashing runtime by over 50%.
Models can dynamically grow their own capacity during continual learning, adding parameters only when and where they're needed, without human intervention.
Narrow ResNets can struggle to represent critical points in input-output mappings, effectively pushing them to infinity and hindering accurate function approximation.
Ditching Markovian constraints unlocks surprisingly better discrete generation, with simplex denoising outperforming diffusion and flow-matching on graphs.
Higher-order neural networks don't need hypergraphs: SHONNs unlock their power for general-purpose feedforward architectures by sidestepping stability and scaling issues.
The shift from traditional simulation to deep learning for network performance modeling brings new opportunities, but also requires careful consideration of evaluation methodologies to ensure fair comparison.
Spectral analysis of graph neighborhoods reveals a surprisingly effective and efficient way to boost anomaly detection, consistently outperforming existing GNN-based methods.
Forget smooth sailing: FI-KAN's fractal bases let neural networks conquer non-smooth functions and PDEs with up to 79x better accuracy.
Multi-resolution decomposition and diffusion models can boost time series forecasting accuracy by up to 10% over existing methods.
KAN-PCA beats classical PCA in capturing variance in asset returns by learning nonlinear relationships, especially when markets get weird.
Row/column normalization *before* orthogonalization can significantly boost convergence and reduce validation perplexity in LLaMA2 pretraining, outperforming the base Muon optimizer.
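The recipe in miniature: rescale the gradient's rows and columns, then orthogonalize. A hedged NumPy sketch using the Newton-Schulz iteration from the public Muon implementation; the exact normalization order below is an assumption, not the paper's procedure:

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5):
    """Approximate the nearest semi-orthogonal matrix to G with the
    cubic Newton-Schulz iteration used by Muon-style optimizers."""
    X = G / (np.linalg.norm(G) + 1e-7)   # crude spectral-norm proxy
    a, b, c = 3.4445, -4.7750, 2.0315    # coefficients from Muon
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X

def normalize_then_orthogonalize(G, eps=1e-7):
    """The teaser's idea: unit-normalize rows and columns of the
    gradient *before* handing it to the orthogonalization step."""
    G = G / (np.linalg.norm(G, axis=1, keepdims=True) + eps)  # rows
    G = G / (np.linalg.norm(G, axis=0, keepdims=True) + eps)  # columns
    return newton_schulz_orthogonalize(G)
```

The pre-normalization equalizes the scale of the matrix before the iteration, which is plausibly why it helps convergence; the iteration itself drives all singular values toward 1.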
Transformers can now predict with an explicit internal structure of uncertainty, enabling stronger probabilistic evaluation and a more informative analysis of model behavior.
Transformers can now dynamically adapt expert weighting in online learning, achieving state-of-the-art dynamic regret in non-stationary environments.
Finally, a framework that unifies dynamic graph models, topological learning, and multimodal fusion to decompose health risk into interpretable components.
Steer Stable Diffusion's attention like an equalizer, sculpting image details without retraining by simply tweaking the frequency spectrum of cross-attention.
A lightweight transformer can forecast optical amplifier failures in real-time, paving the way for self-healing networks.
Reconstructing high-resolution turbulence from extremely coarse data is now possible with SIMR-NO, which not only beats existing methods in accuracy but also respects the underlying physics.
Diffusion Maps alone cannot directly recover low-dimensional charts; multiple eigenmodes must be combined, challenging their common perception as a drop-in dimensionality reduction technique.
FedDES achieves instance-level personalization in federated learning by dynamically selecting and weighting peer models with a GNN, leading to significant performance gains in heterogeneous environments.
Correlated diffusion, enabled by probabilistic computers, surpasses independent diffusion in generative modeling by exploiting structured probabilistic sampling.
Runaway compute costs for diffusion models on GPUs? EdgeDiT slashes parameters by 30% and latency by 40% while maintaining image quality, all on your phone.
LLMs and Stable Diffusion aren't just cool tools; they're the twin pillars of a new era where AI agents can conduct "deep research" rivaling top human scientists.
Diffusion models can now predict driver attention with state-of-the-art accuracy by incorporating LLM-enhanced semantic reasoning.
Latent planning for reasoning can actually *hurt* performance due to decoder distribution shift, highlighting a critical challenge in bridging neural and symbolic reasoning.
Achieve state-of-the-art segmentation accuracy on drivable-area and lane segmentation tasks with a model under 5M parameters, demonstrating that high performance doesn't always require massive architectures.
Quantum circuits can match classical MLPs on EEG classification tasks while using 50x fewer parameters, thanks to differentiable quantum architecture search that automatically optimizes circuit topology.
RecycleLoRA reveals that strategically targeting minor subspace directions in VFMs with LoRA adapters can unlock surprisingly robust domain generalization in semantic segmentation.
Autonomous architecture search for molecular transformers is surprisingly fruitless: you're better off just tuning learning rates.
Forget pruning or quantization: MPO decomposition lets you compress a transformer by 13x while retaining 97% accuracy.
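For intuition, here is a minimal two-site version of MPO-style compression: reshape a weight matrix into a four-index tensor and truncate one SVD. Real MPO decompositions chain more sites; the shapes and names below are illustrative, not the paper's configuration:

```python
import numpy as np

def mpo_decompose_2site(W, shape_in=(8, 8), shape_out=(8, 8), rank=4):
    """Split a weight matrix into two MPO cores: reshape (in, out) into
    (i1, i2, o1, o2), group (i1,o1) vs (i2,o2), and truncate one SVD."""
    i1, i2 = shape_in
    o1, o2 = shape_out
    T = W.reshape(i1, i2, o1, o2).transpose(0, 2, 1, 3).reshape(i1 * o1, i2 * o2)
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    core1 = U[:, :rank] * s[:rank]   # (i1*o1, rank), singular values absorbed
    core2 = Vt[:rank, :]             # (rank, i2*o2)
    return core1, core2

def mpo_reconstruct(core1, core2, shape_in=(8, 8), shape_out=(8, 8)):
    """Contract the two cores back into the original (in, out) matrix."""
    i1, i2 = shape_in
    o1, o2 = shape_out
    T = (core1 @ core2).reshape(i1, o1, i2, o2).transpose(0, 2, 1, 3)
    return T.reshape(i1 * i2, o1 * o2)
```

At full rank the reconstruction is exact; truncating the rank trades a small reconstruction error for a large drop in stored parameters, which is the compression lever the teaser refers to.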
LVLM inference is ripe for optimization, but current acceleration techniques only scratch the surface.
Syntactic NMT decoders don't have to be bottom-up: a top-down tree generation strategy can drastically improve translation of long, rare sequences.
Quantum-proofing your 5G core doesn't have to break the bank: a sidecar proxy can add post-quantum cryptography with a predictable 50ms latency hit.
A task-specific, lightweight transformer can outperform state-of-the-art reasoning LLMs and commercial tools in C code vulnerability detection, at a fraction of the inference cost.
Global context and confidence-guided refinement can unlock state-of-the-art optical flow estimation, even in challenging scenarios with large displacements and occlusions.
Forget retraining the whole model when adding a new image degradation type – this modular routing approach lets you plug in a new expert with minimal overhead.
SSMs struggle to segment thin structures because they propagate information across, not along, the target, but this frequency-aware approach realigns serialization to trace the geometry.
Achieve significantly better structure preservation in text-guided image editing by injecting structure-related features into visual autoregressive models, guided by reinforcement learning.
Simple fine-tuning with a parallel decoder and smart learning rate schedule lets you beat more complex meta-learning approaches in cross-domain few-shot object detection.
Unlock 5x faster autoregressive image generation by using a single entropy signal to simultaneously optimize draft prediction and enable single-step diffusion decoding.
Lightweight DisCNNs offer a surprisingly efficient route to object detection by exploiting monotonic relationships between network outputs and feature presence.
A single model can now achieve state-of-the-art semantic segmentation across diverse sensor modalities like thermal, depth, and polarization, eliminating the need for modality-specific architectures.
Neurosurgeons gain a compact, sterilizable RCM joint with near-isotropic stiffness, minimizing unwanted motion during delicate procedures.
RDMA failover can be made significantly more efficient and correct by selectively retransmitting only the requests that were actually lost during a link failure, avoiding redundant retransmissions and semantic violations.
Achieve FP16-level LLM accuracy at 3-bit quantization, unlocking 1.5x faster inference than 4-bit methods on consumer GPUs.
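Low-bit weight-only quantization usually starts from group-wise uniform quantization like the sketch below; the actual 3-bit method is certainly more sophisticated, so treat this as background rather than its recipe:

```python
import numpy as np

def quantize_groupwise(w, bits=3, group=4):
    """Symmetric uniform quantization with one scale per small group
    of weights, the common baseline for low-bit LLM weight storage."""
    qmax = 2 ** (bits - 1) - 1           # e.g. 3 for signed 3-bit
    w = w.reshape(-1, group)
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0              # guard all-zero groups
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q.astype(np.int8), scale

def dequantize(q, scale):
    """Recover approximate FP weights from int codes and group scales."""
    return (q * scale).reshape(-1)
```

Per-group scales bound the round-off error by half a quantization step within each group, which is why smaller groups recover accuracy at the cost of storing more scales.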
Squeezing loop control down to <10% of array resources unlocks near-zero-overhead parallel loop acceleration on Tightly Coupled Processor Arrays.
Forget CPUs and GPUs: MCPT-Solver uses spintronics and Bayesian inference to create a hardware random number generator that dramatically accelerates Monte Carlo particle transport simulations.
Achieve 40% better visual fidelity in multimodal face generation by deeply fusing text and spatial priors within a unified diffusion transformer.
Efficiency is the key bottleneck preventing video generation models from becoming general-purpose world simulators, and this paper provides a taxonomy of techniques to overcome it.
DINOv3's self-supervised features are surprisingly good at zero-shot in-context segmentation, beating fine-tuned models with a fraction of the parameters.
Ditch the coordinate system: VLMs can point *way* better by directly selecting visual tokens, leading to SOTA results and improved sample efficiency.
LLMs can achieve human-like efficiency in long-term interactions by structuring memory around emotional valence, prioritizing automatic retrieval, and actively encoding information based on curiosity and feedback.
Simple factorization beats BERT at generalizing to unseen combinations of intents, but only if you evaluate it the right way.
Forget slow rotations: IsoQuant's quaternion-based approach outperforms RotorQuant at LLM KV cache compression, delivering up to 6x speedups on synthetic data.
By projecting text into a complex semantic space, this model achieves SOTA on aspect-based sentiment analysis by disentangling sentiment polarity from semantic intensity.