13 papers from Berkeley AI Research (BAIR) on Computer Vision
Teaching robots to manipulate objects just got easier: OCRA learns directly from human demonstration videos by focusing on object interactions and incorporating tactile feedback.
MLLMs can now handle 4K videos up to 100x faster thanks to AutoGaze, which selectively attends to only the most informative patches.
Ditch the clunky controllers: this hand-shadowing pipeline lets you teleoperate a robot arm with just an RGB-D camera and some clever inverse kinematics.
Get 2x faster video generation from diffusion transformers without sacrificing quality, thanks to a clever parameter-free error compensation technique.
Achieve globally consistent 3D reconstruction over sequences exceeding 19,000 frames by combining test-time training with sliding window attention, outperforming prior state-of-the-art methods by over 74% on ATE on KITTI.
Forget tedious manual calibration: D-REX automatically builds high-fidelity digital twins by identifying object mass directly from real-world grasping data.
Unlock autonomous driving with YouTube: a new label-free pretraining method learns driving representations directly from unposed in-the-wild videos, outperforming LiDAR baselines with only a single monocular camera.
Ditching explicit 3D geometry, RAYNOVA achieves SOTA multi-view video generation by modeling spatio-temporal relationships directly with a dual-causal autoregressive framework and Plücker-ray positional encoding.
Human-level 3D perception can emerge from a surprisingly simple, scalable learning objective using multi-view images, finally closing the gap between AI and human performance on this fundamental visual task.
Forget synthetic data: scaling up human egocentric video by 20x unlocks surprisingly effective dexterous robot manipulation, even transferring to robots with different hand configurations.
Forget clunky skeletons: this new model lets you prompt your way to accurate 3D human meshes from single images, even in the wildest poses.
Denoising diffusion models can significantly outperform discriminative methods in learning-to-rank, suggesting a new path for improving information retrieval.
An end-to-end learned robotic system can now clean your kitchen in a completely new house, thanks to a novel co-training approach on diverse data.