Mamba's superior sequence modeling lets you generate longer, more realistic dance sequences than clunky Transformers ever could.
Panoramic depth perception and differentiable physics unlock surprisingly robust collision avoidance, even generalizing to unseen simulation environments.
Accelerate video generation by 45% without retraining, simply by pruning redundant latent patches and cleverly recovering attention scores.
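The pruning-plus-recovery idea can be sketched in a few lines. This is a minimal toy, not the paper's method: it keeps the most-attended 55% of latent patches (matching the ~45% reduction), then folds each pruned patch's attention mass onto its nearest kept neighbor before renormalizing. The nearest-index proximity rule and the `keep_ratio` parameter are illustrative assumptions.

```python
import numpy as np

def prune_and_recover(tokens, attn, keep_ratio=0.55):
    """Drop low-attention latent patches, then approximate the pruned
    patches' attention mass by redistributing it onto kept ones.
    `attn` is a row-stochastic (n, n) attention matrix."""
    n = tokens.shape[0]
    k = max(1, int(n * keep_ratio))
    importance = attn.sum(axis=0)              # how much each patch is attended to
    keep = np.sort(np.argsort(importance)[-k:])
    pruned = np.setdiff1d(np.arange(n), keep)
    sub_attn = attn[np.ix_(keep, keep)].copy()
    for p in pruned:
        # hypothetical recovery rule: fold the pruned column onto the
        # nearest kept patch by index (a real method would use similarity)
        j = np.argmin(np.abs(keep - p))
        sub_attn[:, j] += attn[keep, p]
    sub_attn /= sub_attn.sum(axis=1, keepdims=True)  # restore row-stochasticity
    return tokens[keep], sub_attn
```

Because the recovery step reassigns rather than discards attention mass, the reduced matrix stays a valid (row-stochastic) attention map over the surviving patches.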
Achieve real-time traffic analytics across city-scale camera networks by offloading DNN inference to edge devices and using cloud-based GNNs for forecasting, all while dynamically adapting to changing conditions with federated learning.
Unsupervised discovery of object keypoints and dynamics directly from video unlocks state-of-the-art world models applicable to decision-making.
Visual artists are overwhelmingly resisting generative AI in the workplace, deploying active "refusal" strategies against pressure from clients and bosses.
By disentangling camera-space estimation from world-space refinement via dual diffusion models, DuoMo achieves state-of-the-art human motion reconstruction from noisy video, bypassing the limitations of parametric models.
Forget fine-tuning: prompting MLLMs with a dynamic interval-based decoding strategy lets them generate surprisingly human-like, pause-aware real-time game commentary.
Stain normalization and decoupled learning can dramatically improve the robustness of white blood cell classification, even in the face of significant staining variations and class imbalances.
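Stain normalization in its simplest (Reinhard-style) form just matches per-channel color statistics to a reference slide. The sketch below operates directly in RGB for brevity; real pipelines typically convert to LAB first, and the target statistics here are illustrative placeholders.

```python
import numpy as np

def normalize_stain(image, target_mean, target_std, eps=1e-6):
    """Reinhard-style stain normalization sketch: shift and scale each
    color channel so its mean/std match a reference slide's statistics.
    Usually done in LAB space; plain RGB here to keep the example short."""
    img = image.astype(np.float64)
    flat = img.reshape(-1, 3)
    mean = flat.mean(axis=0)
    std = flat.std(axis=0) + eps       # eps guards against flat channels
    out = (img - mean) / std * np.asarray(target_std) + np.asarray(target_mean)
    return np.clip(out, 0, 255).astype(np.uint8)
```

After normalization, cells stained under different lab protocols land in a shared color distribution, which is what lets the downstream classifier's decoupled features stay robust.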
Achieve 7x accuracy gains in real-world collaborative SLAM by using a robust, distributed optimization algorithm resilient to communication limits and noisy data.
Forget monolithic models: pMoE shows that ensembling diverse expert prompts within a single model framework yields surprisingly large gains in visual adaptation across a wide range of tasks.
Forget language and appearance: CAD models can now directly prompt accurate instance segmentation of industrial objects, even with diverse surface properties.
Unlabeled monocular videos can now be used to train state-of-the-art 3D/4D reconstruction systems, thanks to a factored flow prediction approach that disentangles geometry and pose learning.
Forget cloud GPUs – a new model brings unified multimodal understanding and generation to your iPhone, running 6x faster than alternatives.
Image-to-image editors silently weaken or ignore your edit instructions based on the subject's race, gender, and age, revealing surprising demographic biases.
Forget clunky skeletons: this new model lets you prompt your way to accurate 3D human meshes from single images, even in the wildest poses.
Stop treating generated images like real ones: GMAIL aligns them as separate modalities in a shared latent space, unlocking significant gains in vision-language tasks.
VLMs that ace RGB images completely fail at thermal imagery, revealing a critical gap in their ability to reason about temperature and physical properties.
Forget rigid game environments – PAN lets you simulate open-world scenarios with language-specified actions and long-term visual coherence, opening the door to more realistic AI training.
Synthetic data generated by fine-tuning Stable Diffusion on multi-region satellite imagery boosts small object detection accuracy by 20%, even when real labeled data is scarce.
Forget tedious manual annotation: FlexDataset crafts customized, high-fidelity annotated datasets 5x faster, using a composition-to-data approach.
Achieve semantically coherent image compositions by mixing layout-focused and appearance-focused visual representations in a diffusion model's cross-attention.
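The mixing idea can be illustrated with a toy blended cross-attention: queries attend separately to a layout-focused token stream and an appearance-focused one, and the two outputs are combined. The single scalar `alpha` blend is an assumption for illustration; the actual mixing scheme in the paper may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def blended_cross_attention(q, layout_kv, appearance_kv, alpha=0.5):
    """Toy sketch: attend to layout tokens and appearance tokens
    independently, then blend the two attention outputs with `alpha`
    (hypothetical mixing rule, not the paper's exact scheme)."""
    k_l, v_l = layout_kv
    k_a, v_a = appearance_kv
    d = q.shape[-1]
    attn_l = softmax(q @ k_l.T / np.sqrt(d))   # where things go (layout)
    attn_a = softmax(q @ k_a.T / np.sqrt(d))   # what things look like (appearance)
    return alpha * (attn_l @ v_l) + (1 - alpha) * (attn_a @ v_a)
```

Setting `alpha=1.0` recovers pure layout conditioning and `alpha=0.0` pure appearance conditioning, so the scalar interpolates between spatial structure and visual style.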