100 papers published across 4 labs.
Requirements volatility doesn't just delay projects; it directly undermines software architecture, leading to technical debt and scheduling nightmares.
Ditch the feature extraction pipeline: GenMask directly generates segmentation masks with a diffusion transformer, achieving SOTA results by harmonizing mask and image generation in a single model.
Cost volumes might be overkill: WAFT-Stereo proves you can ditch them for a warping-based approach and still dominate stereo matching benchmarks with significantly improved efficiency.
A compact masked diffusion model can rival multi-billion parameter models in a morphologically rich language like Turkish, challenging the assumption that bigger is always better.
Representation-Pivoted Autoencoders enable diffusion models to generate and edit images with higher fidelity by learning a compressed latent space that preserves the semantics of pre-trained visual representations.
Achieve state-of-the-art time series forecasting accuracy with significantly reduced memory usage and faster inference by using a sparse attention mechanism that fuses multi-modal embeddings.
Explicitly reconstructing 3D scenes with Gaussian Splatting unlocks state-of-the-art BEV perception, proving that geometric understanding is key to accurate spatial reasoning.
Fine-tuning a visual geometry transformer with SEAR unlocks surprisingly accurate RGB-Thermal 3D reconstruction, even surpassing SOTA methods despite training on significantly less multimodal data.
Unlock 4-15% faster Gaussian Splatting on your existing trained scenes, with no retraining, by swapping in a polynomial kernel.
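For intuition, here is a minimal sketch of what swapping the splat falloff for a polynomial could look like; the cutoff and exponent below are illustrative assumptions, not the paper's exact kernel.

```python
# Minimal sketch (not the paper's exact kernel): replacing the Gaussian
# falloff used when rasterizing a splat with a compact polynomial that
# approximates it near the center and avoids the per-pixel exp().
import numpy as np

def gaussian_falloff(r2):
    # standard 3DGS-style opacity falloff, r2 = squared Mahalanobis distance
    return np.exp(-0.5 * r2)

def polynomial_falloff(r2, cutoff=9.0, power=4):
    # hypothetical compact polynomial kernel: close to the Gaussian near r2=0
    # and exactly zero beyond the cutoff, so far pixels can be skipped
    t = np.clip(1.0 - r2 / cutoff, 0.0, 1.0)
    return t ** power

r2 = np.linspace(0.0, 12.0, 7)
print(np.round(gaussian_falloff(r2), 3))
print(np.round(polynomial_falloff(r2), 3))
```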
Orthogonal constraints can rescue sparse embeddings in recommender systems from representation collapse, unlocking significant performance gains in large-scale industrial deployments.
LLMs can maintain reasoning boundaries with >99% reliability under adversarial attacks when equipped with explicit process-control layers, a massive improvement over standard RLHF.
By enforcing graph isomorphism across counterfactual inputs, UGID reveals that debiasing LLMs can be achieved by directly manipulating internal representations and attention mechanisms.
By mimicking the brain's "global workspace," MANAR achieves linear-time attention without sacrificing performance, offering a drop-in replacement for standard attention that's both faster and potentially more creative.
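As a rough illustration of the global-workspace bottleneck (a generic two-stage attention sketch, not MANAR's actual architecture): tokens never attend to each other directly; a handful of shared slots read from all tokens and broadcast back, so the cost is O(n*m) rather than O(n^2).

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def workspace_attention(tokens, slots):
    # stage 1: slots gather information from all tokens   (m x n attention)
    write = softmax(slots @ tokens.T) @ tokens
    # stage 2: tokens read the updated workspace back     (n x m attention)
    read = softmax(tokens @ write.T) @ write
    return read

n, m, d = 1024, 8, 64
out = workspace_attention(np.random.randn(n, d), np.random.randn(m, d))
print(out.shape)  # (1024, 64)
```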
Cross-lingual alignment can actually *hurt* transfer learning performance because aligning embeddings doesn't necessarily help with the downstream task.
CNNs still reign supreme in Burmese handwritten digit recognition, but physics-inspired PETNNs are hot on their heels, outperforming Transformers and KANs.
Naive fine-tuning leads to catastrophic forgetting, but combining replay-based and parameter isolation strategies can actually *improve* performance over joint training in continual learning for intent classification.
Achieve state-of-the-art single image reflection removal by explicitly guiding a diffusion model with spatial intensity and high-frequency priors derived directly from the input image.
Diffusion language models can achieve up to 26x inference speedups with almost no accuracy loss, thanks to a clever entropy-based KV caching strategy that avoids costly full forward passes.
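A toy sketch of entropy-gated cache reuse, under the stated assumption that low-entropy positions are stable between denoising steps; the threshold and interface here are hypothetical, not the paper's algorithm.

```python
# Toy illustration: between consecutive denoising steps of a diffusion LM,
# positions whose predicted distribution has low entropy reuse their cached
# key/value activations; only high-entropy positions get a fresh forward pass.
import numpy as np

def entropy(p, eps=1e-12):
    return -(p * np.log(p + eps)).sum(axis=-1)

def positions_to_recompute(probs, threshold=1.0):
    # probs: (seq_len, vocab) distributions predicted at the previous step
    return np.where(entropy(probs) > threshold)[0]

probs = np.random.dirichlet(np.ones(50), size=16)
print(positions_to_recompute(probs))
```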
LLMs can maintain generation quality in long-context scenarios while using significantly less context, simply by adaptively allocating context based on uncertainty.
Forget brittle, hand-coded robot assembly routines: ATG-MoE learns complex, multi-skill manipulation directly from visual and language inputs, achieving impressive success rates in both simulation and real-world industrial tasks.
Ditch manual huge page configuration: TurboMem's lock-free design and transparent huge page auto-merging can boost packet throughput by up to 28% in DPDK.
Current methods to protect satellites from radiation drain batteries and interrupt service, but a new routing protocol minimizes both the battery drain and the downtime.
Forget massive SRAMs: this work shows that clever data streaming and compute/transfer overlap can yield 22x speedups for transformer inference, even with standard PCIe interconnects.
Refining generative models with discriminator guidance provably improves generalization, offering a theoretical justification for techniques like score-based diffusion.
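For context, the standard discriminator-guidance correction from prior work is sketched below (the paper's contribution is the generalization analysis, not this formula; notation is illustrative).

```latex
% A discriminator d_\phi trained to separate real from generated samples
% supplies an additive correction to the learned score s_\theta:
\[
  \tilde{s}_\theta(x, t) \;=\; s_\theta(x, t)
  \;+\; \nabla_x \log \frac{d_\phi(x, t)}{1 - d_\phi(x, t)}.
\]
```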
Unlock faster diffusion model analysis: Neural Galerkin Normalizing Flows offer a cost-effective surrogate for transition probability density functions, outperforming direct PDE solving.
Get continuous level-of-detail rendering in 3D Gaussian Splatting without sacrificing top-end quality – no architectural changes needed.
LLMs can now write the code to solve your combinatorial optimization problems, thanks to a new GPU-accelerated framework accessible through a pure-Python API.
Unlock the power of interpretable AI: SINDy-KANs distills complex neural networks into sparse equations, revealing the underlying dynamics of systems.
By jointly optimizing onboard computing and data routing, iSatCR slashes data transmission needs in LEO satellite networks, outperforming traditional routing-only approaches, especially under heavy load.
Foundation models for EEG can now be 377x more efficient and handle 12x longer sequences, thanks to a novel Mamba-based architecture that also cracks the code for handling variable electrode setups.
Confidential databases can be 78x faster by ditching crypto in the query path.
Unlocking new high-probability differentials in SIMON32 cracks open avenues for more efficient cryptanalysis, pushing past current state-of-the-art round limits.
LLMs can automatically discover novel, practical green AI tactics directly from code repositories, revealing hidden strategies for sustainable ML.
Autoregressive generative classifiers can beat diffusion models at image classification, but only if you marginalize over token order.
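A hedged sketch of what order marginalization means here, with a hypothetical any-order likelihood hook: Bayes' rule p(y|x) ∝ p(x|y)p(y), with the class-conditional likelihood estimated as a Monte Carlo average over random token orders rather than a single raster order.

```python
import numpy as np

def classify(x_tokens, log_likelihood_fn, num_classes, num_orders=8, rng=None):
    # log_likelihood_fn is a hypothetical hook into an any-order
    # autoregressive model: log p(x | y, generation order)
    rng = rng or np.random.default_rng(0)
    scores = np.zeros(num_classes)
    for y in range(num_classes):
        lls = [log_likelihood_fn(x_tokens, y, rng.permutation(len(x_tokens)))
               for _ in range(num_orders)]
        scores[y] = np.mean(lls)       # Monte Carlo marginalization over orders
    return int(np.argmax(scores))      # uniform class prior assumed

# toy usage with a dummy likelihood
dummy_ll = lambda x, y, order: -np.sum((np.asarray(x)[order] - y) ** 2)
print(classify([0, 1, 2, 1], dummy_ll, num_classes=3))
```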
Compact ViTs can now rival or surpass CNN-based architectures like YOLO for edge-based object detection, instance segmentation, and pose estimation, thanks to task-specialized distillation.
Ditch the training: SVOO achieves up to 1.93x speedup in video generation with sparse attention by exploiting the intrinsic, layer-specific sparsity patterns of attention without any fine-tuning.
CNNs still reign supreme for medical image segmentation on heterogeneous datasets, beating out hybrid transformer models despite the latter's theoretical advantages.
Automating motor insurance from vehicle damage analysis to claims evaluation is now possible with a vertically integrated AI paradigm.
Tapered backbones in 3D-printed continuum robots unlock enhanced compliance and manipulability, all while slashing costs and assembly time.
State Space Models can outperform Vision Transformers as vision encoders in VLMs, particularly when model size is a constraint.
Forget uniform weighting: the Exponentially Weighted Signature lets you inject temporal context and richer memory dynamics into path representations.
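One plausible way to discretize the idea, assuming a simple exponential decay over path increments (the paper's exact weighting and signature depth may differ).

```python
# Sketch: exponentially weighted first- and second-level signature terms of a
# discrete path, so recent increments contribute more than old ones.
import numpy as np

def exp_weighted_signature(path, lam=0.5):
    inc = np.diff(path, axis=0)                    # increments dx_t, shape (T, d)
    T, d = inc.shape
    w = np.exp(-lam * (T - 1 - np.arange(T)))      # newest increment -> weight ~1
    level1 = (w[:, None] * inc).sum(axis=0)        # weighted first-level term
    level2 = np.zeros((d, d))                      # weighted iterated (second-level) term
    for s in range(T):
        for t in range(s + 1, T):
            level2 += np.outer(w[s] * inc[s], w[t] * inc[t])
    return level1, level2

l1, l2 = exp_weighted_signature(np.cumsum(np.random.randn(20, 3), axis=0))
print(l1.shape, l2.shape)  # (3,) (3, 3)
```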
Edge devices can now run MoEs in real-time thanks to a dynamic quantization scheme that prioritizes important experts and critical layers.
Discrete diffusion models can now generate more diverse text without sacrificing quality, thanks to a new decoding method that explicitly optimizes for diversity during beam search.
Achieve fast and effective generalized symmetric matrix factorization by exploiting exact penalty and relaxation properties, enabling efficient solutions for a broad class of problems.
Random projections in continual learning don't have to be random: carefully guiding them with target-aligned data beats the SOTA.
Discovering hierarchical structure in sequential data is now tractable, thanks to a new model that learns online without supervision.
Forget static models: this adaptive framework slashes stock price prediction error by dynamically routing data through specialized pathways based on real-time market regime detection.
Spectral GNNs' purported spectral advantages for node classification are illusory; their performance actually hinges on their underlying MPNN structure, debunking the "graph Fourier transform" narrative.
MRI reconstruction can be made dramatically more robust to clinical domain shifts by eliminating the need for explicit coil sensitivity map estimation.
EWC, a classic method for continual learning, has been underperforming because it suffers from gradient vanishing and protects the wrong parameters – but a simple "Logits Reversal" trick fixes both.
Transformers can nail in-context learning for regression even when the data is a mess of non-Gaussian noise, heavy tails, and non-i.i.d. distributions, outperforming classical estimators.
Gradient misalignment across devices in parallel split learning can be tamed with a novel gradient alignment strategy, leading to faster convergence and higher accuracy in heterogeneous federated learning.
LVLMs can gain a surprising amount of spatial reasoning ability by explicitly generating segmentation and depth tokens before answering questions.
Supervised learning models can reliably outperform widely-used commercial AI text detectors, even across different languages and specialized domains like mental health.
Robots can learn faster and generalize better by encoding dynamics directly into their neural network architecture, outperforming standard transformers and GNNs.
Ditch the power-hungry actuators: this passive elastic-folding mechanism lets you stack and airdrop sensors that reliably self-deploy into 3D structures.
Ditch the mask decoder: a single segmentation token can unlock competitive image segmentation directly from MLLMs.
Diffusion models can generate segmentations that rival discriminative methods, but only if you reshape their vector fields with a distance-aware correction term that combats gradient vanishing.
Representing complex 3D biomedical graphs as learned tokens unlocks generative modeling and efficient analysis of anatomical structures.
End-to-end quantum image generation is now possible, even with limited qubits, thanks to a new method that bridges the gap between quantum circuits and pixel intensities.
Finally, a neural interatomic potential that accurately models long-range electrostatic interactions without sacrificing SO(3) equivariance or energy-force consistency.
By combining CNNs and State Space Models, DA-Mamba achieves efficient global-local feature alignment for domain adaptive object detection, outperforming prior CNN-only and Transformer-based approaches.
High-dimensional discrete tokens, previously out of reach for generative models, can now be directly generated, unlocking a unified token prediction paradigm for multimodal architectures.
Text-to-3D generation gets a semantic upgrade: DreamPartGen creates 3D objects with parts that not only look right but also understand their relationships and align with textual descriptions.
Ditch slow, unstable AR estimation: neural nets offer a 12x speed boost and better convergence, without sacrificing interpretability.
Schrödinger Bridges elegantly unify diffusion models, score-based models, and flow matching under a single, powerful framework.
Spatial awareness is the secret ingredient to unlocking better visual in-context learning, boosting performance across diverse vision tasks.
The chaos of multivariate time-series anomaly detection (MTSAD) research gets a little tamer with a new taxonomy that exposes the field's hidden convergence on Transformers and reconstruction, hinting at where the next breakthroughs will come from.
Predict thermal warpage in chiplet designs 200x faster than FEM simulations using a physics-aware graph neural network that learns directly from floorplans.
By recasting attention as a cooperative game and a statistical physics system, NeuroGame Transformer captures higher-order token dependencies, outperforming standard pairwise attention mechanisms.
Injecting "historical attention" into vision transformers boosts accuracy by over 1% with minimal architectural changes, suggesting that current ViTs underutilize information learned in earlier layers.
AdaMuS overcomes the bias towards high-dimensional data in multi-view learning by adaptively pruning redundant parameters and sparsely fusing views, leading to improved performance on dimensionally unbalanced data.
LLMs aren't just better tools; they're forcing us to rethink the very nature of information, knowledge, and meaning in system design.
The field of video understanding is rapidly shifting from isolated pipelines to unified models capable of adapting to diverse downstream tasks, demanding a re-evaluation of current approaches.
Achieve controllable and scalable speech generation with MOSS-TTS, enabling zero-shot voice cloning and long-form synthesis.
Forget finetuning – Kumiho's graph-native memory lets you swap in a better LLM and instantly double your agent's reasoning accuracy on complex cognitive tasks.
Video diffusion transformers exhibit a hidden "magnitude hierarchy" in their activations that can be exploited for training-free quality improvements via a simple steering method.
Forget geometric LODs: tokenizing 3D shapes by semantic salience unlocks SOTA reconstruction and efficient autoregressive generation with 10x-1000x fewer tokens.
Forget scaling laws: dropout robustness in transformers is a lottery, with smaller models sometimes showing perfect stability while larger models crumble under stochastic inference.
Generate consistent stereo videos directly from RGB data, bypassing depth estimation and monocular-to-stereo conversion, with StereoWorld's novel camera-aware attention mechanisms.
Unlock faster, more accurate interlinear glossing for low-resource languages by treating morphemes as atomic units, outperforming existing methods and enabling user-guided lexicon expansion without retraining.
Generate realistic, atom-level molecular dynamics trajectories orders of magnitude faster with a novel State Space Model that captures long-range dependencies in biomolecular systems.
Ditch costly PIDE integration: RHYME-XT learns the flow map directly, offering a continuous-time, discretization-invariant representation that beats state-of-the-art neural operators.
LLMs can get a massive multilingual boost, especially in low-resource languages, by offloading translation to specialized models and carefully aligning their representations.
Attention sinks aren't just a forward-pass phenomenon; they actively warp the training landscape by creating "gradient sinks" that drive massive activations.
Achieve single-pass alignment of multi-talker speech – a feat previously impossible – by modeling overlaps as shuffles.
Achieve near-optimal waveform optimization with 98.8% spectral efficiency using a 5-layer, AutoML-tuned unrolled proximal gradient descent network trained on just 100 samples.
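A generic unrolled proximal-gradient forward pass for reference (LISTA-style, with illustrative per-layer step sizes and thresholds; the paper's waveform-specific objective and AutoML tuning are not reproduced here).

```python
# Each "layer" is one proximal gradient step x <- prox(x - eta_k * A^T(Ax - y))
# with per-layer parameters eta_k, theta_k that would be learned from data.
import numpy as np

def soft_threshold(x, theta):
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def unrolled_pgd(A, y, etas, thetas):
    x = np.zeros(A.shape[1])
    for eta, theta in zip(etas, thetas):       # one layer per unrolled iteration
        grad = A.T @ (A @ x - y)               # gradient of 0.5 * ||Ax - y||^2
        x = soft_threshold(x - eta * grad, theta)
    return x

A = np.random.randn(30, 50)
y = A @ (np.random.randn(50) * (np.random.rand(50) < 0.1))     # sparse ground truth
x_hat = unrolled_pgd(A, y, etas=[0.02] * 5, thetas=[0.01] * 5)  # 5 unrolled layers
print(x_hat.shape)
```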
Software architecture, a critical but underspecified domain, finally gets a unified benchmarking platform with ArchBench, enabling standardized evaluation of LLMs on complex system design tasks.
Injecting "beneficial noise" into cross-attention mechanisms can significantly improve unsupervised domain adaptation by forcing models to focus on content rather than style distractions.
Ruyi2.5 achieves comparable performance to Qwen3-VL on general multimodal benchmarks while significantly outperforming it in privacy-constrained surveillance, demonstrating the effectiveness of its edge-cloud architecture.
Synthesizing realistic 6-DOF object manipulation trajectories in complex 3D environments just got a whole lot better with GMT, a multimodal transformer that substantially outperforms existing methods.
By disentangling semantic and contextual cues in vision-language models, PCA-Seg achieves state-of-the-art open-vocabulary segmentation with only 0.35M additional parameters per block.
Achieve up to 2.4x speedup over OpenBLAS on RISC-V by using MLIR and xDSL to generate optimized RVV code, finally unlocking the potential of RISC-V vector extensions.
Training video diffusion models with pixel-wise losses just got a whole lot cheaper: ChopGrad reduces memory complexity from linear in video length to constant.
Graph transformers avoid oversmoothing in deep layers by structurally preserving community information, a theoretical advantage over GCNs revealed through Gaussian process limits.
Cycle consistency training unlocks stable and accurate inverse kinematics for wearable soft robots, even with their inherent nonlinearities and hysteresis.
Convolutional Neural Operators (CNOs) surprisingly excel at capturing translated dynamics in the FitzHugh-Nagumo model, despite other architectures achieving lower training error or faster inference.
Forget prompt engineering: this new region proposal network spots objects across diverse datasets without *any* text or image prompts.
Infinite-width neural nets can be sparse: this paper proves that total variation regularization yields sparse solutions in infinite-width shallow ReLU networks, with sparsity bounds tied to the geometry of the data.
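The standard formulation behind results of this kind, stated with illustrative notation that may differ from the paper's: the infinite-width network is a measure over neurons, and the total variation norm of that measure is penalized.

```latex
\[
  f_\mu(x) \;=\; \int \sigma\!\big(\langle w, x\rangle + b\big)\, d\mu(w,b),
  \qquad
  \min_{\mu}\; \sum_{i=1}^{n} \big(f_\mu(x_i) - y_i\big)^2 \;+\; \lambda\,\|\mu\|_{\mathrm{TV}}.
\]
% Representer-theorem-style arguments then yield an optimal \mu that is a finite
% sum of atoms, i.e. a sparse finite-width network, with the number of atoms
% bounded in terms of the data.
```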
Ditch the feature engineering: Baguan-TS lets you use raw time series sequences directly for in-context forecasting, outperforming traditional methods.