Novel neural network architectures including transformer variants, state space models, mixture of experts, and attention mechanisms.
Tucker Attention squeezes an order of magnitude more parameter efficiency out of attention layers, while unifying and simplifying Group Query Attention, Multi-Head Latent Attention, and standard Multi-Head Attention.
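The unification claim can be illustrated with a minimal NumPy sketch of grouped-query attention, in which the number of shared key/value heads is a dial: `n_kv_groups == n_heads` recovers standard multi-head attention and `n_kv_groups == 1` recovers multi-query attention. This is background context only, not Tucker Attention's actual factorization; the names `grouped_query_attention` and `n_kv_groups` are illustrative.

```python
import numpy as np

def attention(q, k, v):
    # Scaled dot-product attention over the last two axes.
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return w @ v

def grouped_query_attention(x, wq, wk, wv, n_heads, n_kv_groups):
    # x: (seq, d_model). Queries get n_heads heads; keys/values get
    # n_kv_groups heads, each shared across n_heads // n_kv_groups
    # query heads. GQA, MQA, and MHA are all points on this dial.
    seq, d = x.shape
    hd = d // n_heads
    q = (x @ wq).reshape(seq, n_heads, hd).transpose(1, 0, 2)
    k = (x @ wk).reshape(seq, n_kv_groups, hd).transpose(1, 0, 2)
    v = (x @ wv).reshape(seq, n_kv_groups, hd).transpose(1, 0, 2)
    # Broadcast each KV group across its share of query heads.
    rep = n_heads // n_kv_groups
    k, v = np.repeat(k, rep, axis=0), np.repeat(v, rep, axis=0)
    out = attention(q, k, v)            # (n_heads, seq, hd)
    return out.transpose(1, 0, 2).reshape(seq, d)
```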
Forget hand-crafted features: DistilBERT can automatically identify parallelizable loops in code with >99% accuracy, opening the door to more efficient automatic parallelization.
LLMs' skewed matrix shapes need not hamstring systolic array performance: SISA's partitioned architecture achieves up to 8.52x speedup and 93% EDP reduction compared to monolithic arrays.
Forget privacy concerns: you can train high-performing deep learning models for dynamic MRI reconstruction using *synthetic* fractal data.
Chess transformers trained solely on move sequences face a "dual-capability bottleneck" where excelling at both state tracking and decision-making requires carefully balancing data diversity and quality, a tension that simple scaling cannot resolve.
LLMs spontaneously organize into brain-like functional units where the whole is greater than the sum of its parts, and destroying these synergistic cores cripples reasoning.
Image generation models can now achieve state-of-the-art fidelity with up to 64x fewer tokens, thanks to a novel masking strategy that prevents latent space collapse.
Human brains and neural networks may converge on similar "Platonic" representations for linguistic constructions, suggesting universal principles guide efficient language abstraction.
By mixing flows and using a teacher-student approach, MMAE learns to classify encrypted traffic more accurately than previous masked autoencoders.
By disentangling headers and payloads with a Mixture-of-Experts architecture, TrafficMoE achieves state-of-the-art encrypted traffic classification, proving that heterogeneity-aware modeling is crucial for extracting discriminative features from noisy, encrypted data.
Forget attention: Metriplectic dynamics offer a surprisingly effective and parameter-efficient route to neural computation, outperforming standard architectures in several domains.
Forget tedious poster design – iPoster lets you sketch your vision and then uses a smart diffusion model to instantly generate polished, content-aware layouts that respect your constraints.
Quantum-inspired architectures can significantly improve 3D cloud forecasting by better capturing nonlocal dependencies, outperforming classical methods like ConvLSTM and Transformers.
You can shrink a spacecraft anomaly detection model by 97% and still catch almost all the problems.
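As a rough intuition for how 97% compression is even plausible, here is a magnitude-pruning sketch: zero out the smallest-magnitude fraction of weights and keep the rest. This is a generic stand-in, not the paper's actual compression pipeline, which the summary does not specify.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.97):
    # Zero out the smallest-magnitude `sparsity` fraction of weights.
    # Generic illustration only; the paper's method may differ.
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]
    return np.where(np.abs(weights) > threshold, weights, 0.0)
```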
Real-time vocal denoising is now possible with deep learning, achieving significant SNR improvements at under 10ms latency.
Grokking isn't just about local circuits or optimization tricks, but a global structural collapse of redundant model manifolds, revealing a deep connection between compression and generalization.
Formalizing speculative execution vulnerabilities with compositional semantics allows for automated detection and verification, moving beyond ad-hoc countermeasures.
LLMs aren't the only path to vulnerability detection: a GNN-based model achieves near-parity with 100x less overhead.
Achieve structured IPC and practical message movement in modular services with CNS, a lightweight hybrid event fabric that bridges in-process and inter-node communication with minimal overhead.
Guaranteeing that erasing "erasable" function arguments provably preserves program behavior opens the door to more efficient and verifiable code optimization.
Single-pixel imaging gets a deep learning boost: SISTA-Net leverages learned sparsity and hybrid CNN-VSSM architectures to achieve state-of-the-art reconstruction quality, even in noisy underwater environments.
Video Transformers can achieve near-full attention accuracy with significantly less compute by focusing only on informative vertical vectors.
Masked motion generators struggle with complex movements because they treat all frames the same – until now.
Diffusion models can beat discriminative classifiers at facial expression recognition, but only with a dynamically adjusted margin loss that accounts for per-sample difficulty.
A training-free feature adjustment pipeline unlocks the power of Visual Geometry Grounded Transformers for stereo vision, achieving state-of-the-art results on KITTI.
Rendering artifacts in feed-forward 3D Gaussian Splatting? Solved: AA-Splat delivers a whopping 7dB PSNR boost by fixing screen-space dilation filters.
Forget blurry averages – DMA unlocks sharp, realistic concept prototypes directly within diffusion models, offering a new lens into model understanding and bias.
Forget expensive training: FlexMem unlocks SOTA long-video MLLM performance on a single GPU by cleverly mimicking human memory recall.
LLMs can maintain conversational stability and improve retrieval accuracy in long-running interactions by adaptively compressing context, leading to reduced token usage and faster inference.
Dialogue agents can now remember what you told them six turns ago with 57% accuracy, thanks to a new memory architecture that selectively forgets less important details.
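Selective forgetting can be sketched as a fixed-capacity store that evicts the lowest-importance turns first. This toy class is a hypothetical illustration of the idea, not the paper's memory architecture; how importance scores are computed is the hard part and is omitted here.

```python
import heapq

class SelectiveMemory:
    """Keep only the `capacity` highest-importance turns.
    Toy illustration of selective forgetting, not the paper's model."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.heap = []      # min-heap of (importance, turn_id, text)
        self.counter = 0

    def add(self, text, importance):
        heapq.heappush(self.heap, (importance, self.counter, text))
        self.counter += 1
        if len(self.heap) > self.capacity:
            heapq.heappop(self.heap)   # forget the least important turn

    def recall(self):
        # Return surviving turns, most important first.
        return [t for _, _, t in sorted(self.heap, reverse=True)]
```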
Unlock rapid UAV design iteration with MetaMorpher's modular, nonlinear flight dynamics model that accurately simulates diverse wing configurations and flight modes.
World models can achieve state-of-the-art video prediction and emergent object decomposition by combining object-centric slots, hierarchical temporal dynamics, and learned causal interaction graphs.
Dataflow networks can achieve significant energy savings without sacrificing throughput by strategically powering down actors during idle periods, a balance efficiently discovered using a novel "Hop and Skip" exploration strategy.
Finally, a gem5-integrated simulator that accurately models CXL memory expansion for LLMs, capturing real-world effects like cache pollution.
Achieve up to 4.17x speedup in DRL training by intelligently partitioning tasks across CPUs, FPGAs, and AI Engines on AMD Versal ACAP, demonstrating the power of hardware-aware algorithm design.
Forget the cold start: training transformers for protein structure prediction peaks at intermediate temperatures, revealing a sweet spot in the loss landscape.
Twisted bilayer graphene enables the creation of parallel and configurable logic gates by exploiting layer-selective hydrogenation and proton transport.
Ditching mel-spectrograms unlocks surprisingly better text-to-speech, as LongCat-AudioDiT proves that waveform latent diffusion can beat the state-of-the-art in zero-shot voice cloning.
By disentangling speakers earlier in the process, SR-CorrNet avoids the information bottleneck that plagues existing speech separation models, leading to improved performance in challenging acoustic environments.
Generative recommendation models can adapt to evolving user behavior without catastrophic forgetting by selectively updating item tokens based on a novel drift-detection mechanism.
Brain-inspired AI gets a boost: a new graph neural network fuses structural and functional brain data to predict cognitive function better than ever before.
Unleashing creative potential in text-to-image models just got easier: on-the-fly repulsion in the contextual space lets you steer diffusion transformers towards richer diversity without sacrificing image quality or blowing your compute budget.
Scanning every token to focus attention is now passé: HISA prunes irrelevant context blocks *before* token-level scoring, slashing compute without sacrificing selection fidelity.
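The two-stage pattern the summary describes (coarse block pruning, then exact token-level attention over the survivors) can be sketched in a few lines. This is a generic illustration of block-then-token sparse attention, not HISA's actual scoring rule; mean-pooling keys per block is an assumption.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def two_stage_sparse_attention(q, k, v, block_size, top_blocks):
    # q: (dk,); k, v: (seq, dk). Stage 1: score mean-pooled key blocks
    # and keep the top-`top_blocks` blocks. Stage 2: run exact softmax
    # attention only over tokens inside the surviving blocks.
    seq, dk = k.shape
    n_blocks = seq // block_size
    k_blocks = k[: n_blocks * block_size].reshape(n_blocks, block_size, dk)
    block_scores = k_blocks.mean(axis=1) @ q          # (n_blocks,)
    keep = np.sort(np.argsort(block_scores)[-top_blocks:])
    idx = np.concatenate([np.arange(b * block_size, (b + 1) * block_size)
                          for b in keep])
    w = softmax(k[idx] @ q / np.sqrt(dk))
    return w @ v[idx], idx
```

With `top_blocks` equal to the total block count, this reduces exactly to full attention, which makes the fidelity/compute trade-off explicit.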
Forget backpropagation through time: recurrent networks already have temporal credit assignment baked into their forward pass.
Forget painstaking hyperparameter tuning: this hypersphere parameterization lets you transfer a single learning rate across model sizes, depths, and even MoE architectures, slashing compute costs by 1.58x.
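One way to see why a hypersphere parameterization decouples the learning rate from weight scale: if every weight row is projected back onto the unit sphere after each step, the update acts as a rotation whose size depends only on the learning rate. This is a minimal sketch of that general idea (as in normalized-network schemes), not necessarily this paper's exact parameterization.

```python
import numpy as np

def sphere_sgd_step(w, grad, lr):
    # One SGD step followed by projection of each row back onto the
    # unit hypersphere, so the effective step size is governed by lr
    # alone, independent of the scale the weights would otherwise drift to.
    w = w - lr * grad
    return w / np.linalg.norm(w, axis=-1, keepdims=True)
```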
Forget heuristics – this work gives provable conditions for *when* and *how* auxiliary data actually improve generalization in transfer learning.
Backpropagation-free test-time adaptation can be both accurate and efficient: PACE achieves state-of-the-art accuracy while slashing runtime by over 50%.
Models can dynamically grow their own capacity during continual learning, adding parameters only when and where they're needed, without human intervention.
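A minimal sketch of growth-on-demand, under the assumption that "when needed" is detected as a loss plateau: widen a hidden layer only when recent losses stop improving, initializing the new output rows to zero so the network's function is unchanged at the moment of growth. The trigger and growth rule here are illustrative, not the paper's mechanism.

```python
import numpy as np

def maybe_grow(w_in, w_out, losses, patience=3, grow_by=4):
    # If the best loss in the last `patience` steps is no better than
    # the best loss before them, widen the hidden layer by `grow_by`
    # units. New w_out rows are zero, so outputs are unchanged at first.
    plateaued = (len(losses) > patience
                 and min(losses[-patience:]) >= min(losses[:-patience]))
    if plateaued:
        rng = np.random.default_rng(0)
        new_cols = 0.01 * rng.standard_normal((w_in.shape[0], grow_by))
        w_in = np.hstack([w_in, new_cols])
        w_out = np.vstack([w_out, np.zeros((grow_by, w_out.shape[1]))])
    return w_in, w_out
```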
Narrow ResNets can struggle to represent critical points in input-output mappings, effectively pushing them to infinity and hindering accurate function approximation.
Ditching Markovian constraints unlocks surprisingly better discrete generation, with simplex denoising outperforming diffusion and flow-matching on graphs.