100 papers published across 6 labs.
Forget reinforcement learning; this algorithm learns in real time without any feedback at all.
Unlock face recognition with just one labeled example and a flood of unlabeled data, achieving state-of-the-art accuracy in a practical authentication scenario.
Image editing gets a reasoning upgrade: a chain-of-thought verifier model beats powerful VLMs at judging edits and boosts editing model performance.
Decision trees and diffusion models are secretly doing the same thing: optimizing a shared objective called Global Trajectory Score Matching.
LVLMs are better at spotting their own mistakes than generating correct answers in the first place, and this self-awareness can be exploited to reduce hallucinations.
Jointly training the tokenizer and autoregressive model slashes ImageNet FID to 1.48, finally making end-to-end autoregressive image generation competitive.
Instead of training separate video diffusion models for each multimodal task, UniVidX learns a single model that handles diverse pixel-aligned video generation problems.
Forget grid layouts: Map2World lets you generate consistent 3D worlds from arbitrary segment maps, offering unprecedented control and scalability.
Ditch the complex multimodal pre-training pipelines: GenLIP proves a simple language modeling objective can effectively align vision encoders with LLMs, achieving strong performance with less data.
LVLMs can maintain sharper visual focus during long-form generation by adding a lightweight, learnable memory module that bypasses attention dilution.
Quantum autoencoders can purify adversarial examples, boosting the robustness of quantum classifiers by up to 68% without adversarial training.
Stop blurring the details: structure-aware Gaussian Splatting densification uses frequency analysis to resolve high-frequency textures faster and with higher quality.
Simply detecting distribution shifts in visual MBRL is easy; the real challenge is applying the right action-level corrections, which this paper tackles with a novel local expert growth strategy.
Architectural diversity offers surprisingly little defense against adversarial attacks on VLMs for autonomous driving, with physical patches transferring effectively across different models.
Segmenting tiny brain arteries just got a whole lot better: a new loss function boosts Dice scores by up to 10% on these critical but challenging structures.
Forget tedious, brittle automation scripts: RL-powered GUI agents are showing signs of "System 2" reasoning without explicit supervision, hinting at a future of truly intelligent digital inhabitants.
Unsupervised knowledge injection via fuzzy logic lets image classifiers reason about concepts they were never explicitly trained on, boosting accuracy and generalization.
Even the most advanced vision-language models struggle to accurately identify anatomical structures in medical images, raising serious concerns about their reliability in clinical settings.
Ignoring language-specific structure in scene-text captioning is a recipe for disaster in tonal languages like Vietnamese, but a new graph framework leveraging phonological attention can help.
Even GPT-5.1 struggles to distinguish AI-generated academic images from real ones, achieving only 48.8% accuracy, revealing a significant gap between generative and forensic AI capabilities.
The hidden cost of rapidly iterating on AI-enabled perception systems? A growing "Requirements Debt" that threatens auditability, reliability, and certification readiness.
A 48-camera system finally unlocks real-time, room-scale multi-human, multi-robot interaction research in realistic home environments.
By unifying specialized detectors with MLLMs in an agentic framework, Echo-α achieves state-of-the-art ultrasound interpretation, suggesting a path to more accurate, interpretable, and transferable medical AI.
Ditch the static image: this method generates realistic talking avatars by learning from *videos* of the subject in completely different scenes.
CNNs are surprisingly fragile to even single-pixel shifts, but strategically placed global average pooling can fix this with a 98% parameter reduction and no accuracy loss.
A lightweight CNN can achieve 97% accuracy in classifying mango leaf diseases, offering a practical solution for early disease detection in agriculture.
Today's best vision-language models are surprisingly bad at reading scientific figures, failing to match expert-level reasoning on a new benchmark of experimental images.
A new test split for DeepSpaceYoloDataset helps push the boundaries of automated astronomical object detection by providing a more diverse and challenging evaluation benchmark.
By explicitly aligning image features with the hierarchical structure of radiology reports, RIHA generates more clinically accurate and coherent reports than models that treat reports as flat sequences.
CNN classifiers don't just select from cleaned features; they actively cancel out shared background information via destructive interference, rewriting our understanding of how these networks actually "see".
Forget task-specific architectures: Uni-HOI uses a unified framework with LLMs to jointly model text, human motion, and object motion, enabling strong performance across diverse HOI tasks.
EdgeFM delivers production-grade VLM/LLM inference performance on edge devices, outperforming vendor-specific toolchains by up to 49% while remaining open-source and cross-platform.
Stop those blurry edges: Softmax-GS uses learnable competition between Gaussians to sharpen 3D Gaussian Splatting, achieving state-of-the-art performance in novel view synthesis.
Achieve high-fidelity 3D rendering from sparse, unconstrained real-world images by intelligently synthesizing novel views with diffusion models and Gaussian replication.
Real-time glottis segmentation during nasotracheal intubation just got a whole lot faster and more accurate, thanks to a new network that's both lightweight and scale-robust.
Achieve faster, more accurate hyperspectral image classification by decoupling pixel clustering from classification, yielding region-level consistency and boundary alignment.
Forget fully connected relation graphs: CasLayout's sparse relation modeling unlocks enhanced controllability and realism in 3D indoor scene synthesis.
Achieve state-of-the-art gait recognition by dynamically fusing body shape and motion features, even when people are wearing coats.
Simple, artist-friendly quad meshes can now be automatically generated on 3D shapes using a diffusion model trained on a continuous surface representation, sidestepping the complexity of discrete mesh optimization.
Ditching PCA for spectral reduction can yield state-of-the-art performance in multisource remote sensing image classification while slashing computational costs.
Achieve up to 2.5x faster video object removal by focusing DiT computations only on the essential tokens dictated by the mask.
A single self-supervised model trained on millions of unlabeled brain MRI slices can generalize across diverse neuroimaging tasks, rivaling or exceeding specialized models, even with limited labeled data.
Volumetric videoconferencing doesn't have to freeze and stutter: ReVo recovers up to 32% of lost RGB data and slashes video freezes by 95% using a cross-layer approach.
Unlock a baby's-eye view: Reconstruct and replay infant movements on robots to simulate their sensory experiences, offering unprecedented insights into early development.
Current image forensics fall flat when faced with the subtle manipulations now possible in 3D Gaussian Splatting scenes, highlighting a critical gap in content authenticity assessment.
Teaching VLMs to "look back" and "look ahead" with lightweight spatial reasoning tasks unlocks surprisingly strong navigation performance.
Simple frequency masking and gated injection can dramatically improve the generalization of AI-generated image detectors, even against unseen generative models.
Ditch the costly sampling: Noise2Map turns diffusion models into fast, end-to-end semantic segmentation and change detection machines by directly predicting maps from noise.
Existing synthetic image detectors fail to generalize because they're trained on biased data, but HiMix overcomes this with artifact-aware representations and mixup augmentation, achieving state-of-the-art generalization to unseen generators.
Current DeepFake detectors can be fooled by semantically inconsistent real audio and video, highlighting a critical blind spot in their ability to assess realistic manipulations.
Unlock bandwidth-adaptive point cloud transmission with TAFA-GSGC, a single-model codec that delivers up to 9 quality levels from a single bitstream.
Expert-level video aesthetics can be captured and improved using a hierarchical rubric and reward models trained with a progressive learning scheme.
By explicitly modeling uncertainty in hypergraph refinement, UHR-Net achieves more accurate segmentation of challenging lesions in medical images.
Achieve clinically relevant accuracy in dynamic bronchoscopy without breath-hold protocols by modeling patient-specific respiratory deformation within a Gaussian splatting framework.
Forget per-scene optimization: GenWildSplat achieves state-of-the-art 3D reconstruction from sparse, unposed images in real-time using a purely feed-forward approach.
Discovering reusable, semantic "Action Motifs" from human movement data unlocks significant gains in action recognition, motion prediction, and interpolation.
Achieve state-of-the-art open-vocabulary occupancy prediction without any training data, outperforming supervised and self-supervised methods by a large margin.
Control over physical properties like friction and restitution in generated videos is now possible, paving the way for more realistic and controllable video synthesis.
Fréchet Distance, previously deemed impractical for training, unlocks surprisingly high-fidelity image generation when optimized in representation space with decoupled batch sizes.
Today's visual generation models are often evaluated on the wrong things, leading to inflated performance claims that mask critical failures in spatial reasoning, temporal consistency, and causal understanding.
Reconstructing real-world scenes in Minecraft unlocks a customizable embodied AI playground, but only if we can solve the occupancy prediction bottleneck – and this new dataset shows we're not there yet.
Forget painstakingly programming robot interactions – ExoActor uses video generation to hallucinate plausible behaviors, then translates them into robot actions.
Ditch the clunky inverse kinematics: MoCapAnything V2 learns to predict character rotations directly from video, slashing error rates and boosting speed by 20x.
Diffusion models struggle with multi-object generation not because of imbalanced concept representation, but primarily due to scene complexity and a surprising difficulty in counting, especially when training data is limited.
HERMES++ achieves state-of-the-art performance in both future point cloud prediction and 3D scene understanding by unifying these tasks within a single driving world model.
By jointly embedding spatial biology, histology, and clinical data, Haiku lets you ask "what if" questions about disease progression, revealing molecular shifts linked to clinical outcomes.
Robots can now better anticipate your actions thanks to a new method that understands the "sub-actions" within your movements.
Hierarchical scene graph matching, learned end-to-end, unlocks fast and accurate robot localization by grounding real-time sensor data against prior architectural maps.
Automated vehicles can achieve fail-operational capabilities by using a hierarchical monitoring framework that combines functional consistency checks with anomaly detection to handle system failures and unfamiliar scenarios.
Hyperspherical latent spaces unlock better 3D scene understanding from vision transformers, especially when bandwidth is constrained.
Ditch the training data: this method uses a pre-trained diffusion model to jointly compress and transmit images, outperforming classic techniques without any task-specific training.
Expert imbalance can cripple learning-to-defer systems, but a novel cost-sensitive margin-based loss function can restore performance.
Imagine a Pokemon TCG where every card is uniquely yours, dynamically generated by AI to reflect your playstyle and preferences.
Forget noisy starts – ABC diffusion models leverage the inherent structure of continuous processes, generating future states from already-close previous states for more realistic dynamics.
MLLMs can ace circuit-to-code generation by cheating with identifier semantics, even when the circuit diagram is blank.
Injecting optical flow into VLMs lets them spot subtle video transitions that other methods miss, opening the door to more robust video understanding.
Achieve detailed tunnel defect inspection without any training by visually recalibrating foundation model proposals to overcome tunnel-specific interference.
Automated segmentation of radiological Peritoneal Cancer Index (rPCI) regions from CT scans is now feasible, potentially replacing invasive surgical assessment for peritoneal metastases.
You can now get real-time (825 FPS) crack detection on UAVs without sacrificing accuracy, thanks to a new attention-enhanced lightweight CNN.
A carefully crafted synthetic data pipeline and rubric-guided RL let a 4B-parameter model nearly match Gemini-3-Flash on wafer defect analysis, suggesting that data quality and targeted training can trump sheer model size.
Overcome the chaos of classroom behavior recognition with ALC-YOLOv8s, achieving state-of-the-art detection of dense, occluded, and imbalanced student actions.
Night photography can now look stunningly realistic, thanks to a new rendering technique that beats existing methods on perceptual quality and color accuracy.
Despite advances in deep learning, manufacturing-focused 3D reconstruction still struggles with reflective surfaces and dynamic environments, highlighting the need for robust hybrid systems.
Existing 3D human mesh recovery systems fall apart for individuals with limb loss, but ResiHMR explicitly reconstructs residual-limb surfaces and performs topology-adaptive optimization, opening the door to more inclusive and accurate human modeling.
Guaranteeing topological consistency in image segmentation is now possible within deep learning frameworks thanks to a novel differentiable simple point computation method applicable to continuous-valued images.
Controllable 3D generation takes a leap forward with 3D-ReGen, a framework that leverages an initial 3D shape for tasks like enhancement and editing, outperforming existing methods.
Ditch the garment masks: a simple human mask is all you need to nail video virtual try-on in the wild.
Achieve robust ionogram track separation, even under disturbed ionospheric conditions with unknown track numbers, by integrating physical models into fuzzy clustering.
Seemingly innocuous augmentations like blur can cripple self-supervised learning for fine-grained tasks like plant identification, but domain-aware choices unlock surprisingly strong performance.
Despite the promise of VLMs, current models still struggle to grasp the nuances of climate change discourse in social media videos, highlighting the need for more specialized approaches.
NeRFs get a boost in video reconstruction quality by explicitly modeling inter- and intra-ray similarities with a novel transformer architecture.
Unlock generalist robots by learning manipulation skills directly from the abundance of human activity videos, bypassing the robot data bottleneck.
Initializing prompts in flatter regions of the loss landscape dramatically improves calibration and performance in test-time prompt tuning for vision-language models.
By explicitly modeling relationships between multiple relevant video segments, ClipTBP significantly improves video moment retrieval, especially when queries are ambiguous.
Stop wasting compute pre-training on domain-specific datasets; this simple strategy lets you pre-train on ImageNet and still achieve state-of-the-art results on diverse remote sensing segmentation tasks.
Achieve superior CT-MRI cervical spine registration by adaptively fusing Mamba-based global context with Swin Transformer-based local detail.
Ditch the post-capture processing bottleneck: FUN achieves real-time hyperspectral object detection by jointly learning reconstruction and detection in a single, efficient network.
LVLMs leak visual text style into semantic inference, meaning the font of a word can change the attributes a model associates with the concept it represents.
Achieve high-fidelity CBCT reconstructions from ultra sparse-view data by decoupling geometry and texture in 3D Gaussian Splatting, enabling physically consistent residual detail compensation.
Seemingly strong segmentation models can fail at clinically critical tumor-vessel interfaces, highlighting the need for uncertainty-aware AI in pancreatic cancer staging.