A generative model of human physiology not only beats existing clinical risk scores at predicting disease, but also accurately simulates the effects of clinical interventions, paving the way for personalized medicine.
Vision-and-language navigation (VLN) agents can navigate more accurately in zero-shot settings by "looking forward, now, and backward," mimicking human navigational strategies.
Multimodal models can now achieve state-of-the-art performance on real-world tasks like document understanding and audio-video comprehension with significantly reduced inference latency, thanks to novel token-reduction techniques.
Audio-language models can now reason about 30-minute-long audio clips with timestamp-grounded intermediate steps, unlocking a new level of fine-grained understanding.
Finally, a method disentangles dynamic egocentric scenes into background, hand, and object components, enabling fine-grained understanding and editing.
Replace slow, one-token-at-a-time generation in VLMs with block-diffusion decoding for a 6x speed boost without sacrificing quality, using a surprisingly simple direct conversion.
Finally, a video generation model lets you puppeteer objects and their reactions independently, all while freely moving the camera.
Forget simulated manipulation: ManipulationNet offers a global infrastructure for benchmarking robots in the real world, complete with standardized hardware and software, to finally measure progress toward general manipulation.
Forget monolithic LoRAs: LoRWeB dynamically mixes a basis set of LoRAs to unlock SOTA generalization in visual analogy tasks.
Forget synthetic data that looks like it came from a PS2 game: NVIDIA's new Cosmos-Predict2.5 generates high-fidelity videos for training embodied AI, opening the door to more realistic and reliable simulations.