Meta AI (FAIR)

20 papers from Meta AI (FAIR) on Computer Vision

Jul 5, 2026

1w ago·also Meta AI

SceneFrom3D: Geometry-Conditioned Outdoor 3D Scene Generation via View Scheduling with Object-Level Control

SceneFrom3D uses automated view scheduling to generate outdoor 3D scenes conditioned on geometry, with object-level control over appearance and geometry.

Geon-Yeong Kim, J. Park, Nuri Ryu +2

Computer Vision Multimodal Models World Models & Planning

Jul 1, 2026

Meta AI·1w ago·also Rochester Institute of Technology

Information-Regularized Attention for Visual-Centric Reasoning

Stochastic attention isn't just a regularizer; it fundamentally transforms how visual information is learned in VLMs, leading to more stable and reliable models.

Guohao Sun, Xiaofang Wang, Yash Patel +3

Computer Vision Multimodal Models

Jun 30, 2026

Meta AI·1w ago·also Codec Avatars Lab, HKUST

LUNA: Learning Universal 3D Human Animation Beyond Skinning

LUNA achieves realistic 3D human animation from 2D inputs without the limitations of traditional skinning methods, enabling unprecedented flexibility and expressivity.

Peng Li, Rawal Khirodkar, Junxuan Li +5

Computer Vision Multimodal Models

Jun 23, 2026

NVIDIA·2w ago·also Meta AI, Codec Avatars Lab, D Vision (, D) +1

FiCA: Feed-forward instant Gaussian Codec Avatars from a Single Portrait Image

FiCA generates photorealistic avatars from a single image, achieving unprecedented visual quality and identity fidelity without the need for individual optimization.

Kim Youwang, Zhengyu Yang, Liuhao Ge +8

Computer Vision Multimodal Models

Jun 14, 2026

Meta AI·Jun 14, 2026·also NYU

You Don't Need Strong Assumptions: Visual Representation Learning via Temporal Differences

Relying on causal relationships rather than strong inductive biases, TDV achieves state-of-the-art performance in visual representation learning, challenging the status quo of self-supervised methods.

Ninad Daithankar, Alexi Gladstone, Yann LeCun +1

Computer Vision Multimodal Models

May 28, 2026

BAIR·May 28, 2026·also Meta AI

Beyond 3D VQAs: Injecting 3D Spatial Priors into Vision-Language Models for Enhanced Geometric Reasoning

Forget task-specific fine-tuning – teaching VLMs basic geometry yields a +29% boost on spatial reasoning benchmarks.

Chun-Hsiao Yeh, Shengyi Qian, Manchen Wang +4

Computer Vision Multimodal Models Reasoning & Chain-of-Thought

May 27, 2026

Meta AI·May 27, 2026·also Ecole Normale Supérieure, PSL University

Misalignment Between Backpropagation and the Hierarchy of Brain Responses to Images

Backpropagation's gradients, while predictive of high-level visual cortex activity, march to a different hierarchical beat than the brain itself, challenging its status as a biologically plausible learning mechanism.

Joséphine Raugel, Maximilian Seitzer, Marc Szafraniec +6

Computer Vision Interpretability & Mechanistic Interp

May 25, 2026

Meta AI·May 25, 2026·also ETH

Global Structure-from-Motion Meets Feedforward Reconstruction

Classical SfM can get stuck, and feedforward reconstruction can be brittle, but combining them creates a system that's both robust and accurate.

Linfei Pan, Johannes Schönberger, Marc Pollefeys

Computer Vision Robotics & Embodied AI

May 21, 2026

Meta AI·May 21, 2026

GazePrior: Zero-Shot AR/VR Eye Tracking via Learned 3D Gaze Reconstruction

Skip the costly data collection for new eye-tracking devices: GazePrior synthesizes realistic training data by learning a 3D prior of human eyes, enabling zero-shot transfer.

Corentin Dumery, David Colmenares, Alexander Fix +2

Computer Vision Data Curation & Synthetic Data Robotics & Embodied AI

Meta AI·May 21, 2026·also BAIR, NYU

Cambrian-P: Pose-Grounded Video Understanding

Camera pose, largely ignored in video LLMs, unlocks significant gains in spatial reasoning and even improves general video QA when used as a lightweight supervisory signal.

Jihan Yang, Zifan Zhao, Xichen Pan +3

Computer Vision Multimodal Models Robotics & Embodied AI

Apr 30, 2026

Meta AI·Apr 30, 2026

3D-ReGen: A Unified 3D Geometry Regeneration Framework

Controllable 3D generation takes a leap forward with 3D-ReGen, a framework that leverages an initial 3D shape for tasks like enhancement and editing, outperforming existing methods.

Geon Yeong Park, Roman Shapovalov +7

Architecture Design (Transformers, SSMs, MoE) Computer Vision Multimodal Models

Apr 28, 2026

Meta AI·Apr 28, 2026·also Brown, UIUC, UT Arlington

IAM: Identity-Aware Human Motion and Shape Joint Generation

Human motion generation gets a dose of reality: IAM shows that explicitly modeling body morphology and identity leads to more realistic and consistent movements.

Wenqi Jia, Zekun Li +14

Computer Vision Multimodal Models Natural Language Processing +1

Apr 9, 2026

Apr 9, 2026·also Meta AI, NYU

EgoEverything: A Benchmark for Human Behavior Inspired Long Context Egocentric Video Understanding in AR Environment

Current egocentric video benchmarks miss the mark: EgoEverything uses human gaze to create questions that actually reflect how people behave, not just what they see.

Qiance Tang, Ziqi Wang, Jieyu Lin +3

Computer Vision Eval Frameworks & Benchmarks Multimodal Models

Meta AI·Apr 9, 2026

SceneScribe-1M: A Large-Scale Video Dataset with Comprehensive Geometric and Semantic Annotations

A million videos with paired depth, camera pose, and 3D point tracks could unlock a new wave of 3D-aware video models.

Yunnan Wang, Kecheng Zheng, Jianyuan Wang +9

Computer Vision Data Curation & Synthetic Data Multimodal Models

Apr 8, 2026

Apr 8, 2026·also CMU ML, Meta AI, MIT CSAIL, Stanford HAI +5

EgoVerse: An Egocentric Human Dataset for Robot Learning from Around the World

Scaling robot learning with human data isn't a simple "more is better" equation; alignment with robot learning objectives is key.

Ryan Punamiya, Simar Kareer, Josh Citron +26

Computer Vision Data Curation & Synthetic Data Robotics & Embodied AI

Apr 8, 2026·also ETH, Meta AI

GenLCA: 3D Diffusion for Full-Body Avatars from In-the-Wild Videos

Training 3D avatar diffusion models on millions of in-the-wild videos is now possible, thanks to a clever 3D tokenization and visibility-aware training strategy that overcomes partial observability.

Yiqian Wu, Rawal Khirodkar, Egor Zakharov +7

Architecture Design (Transformers, SSMs, MoE) Computer Vision Multimodal Models

Apr 6, 2026

Apr 6, 2026·also Meta AI, Stanford HAI, Fudan, Rochester

GLANCE: A Global-Local Coordination Multi-Agent Framework for Music-Grounded Non-Linear Video Editing

Music-grounded video editing can now produce significantly more coherent timelines thanks to a novel global-local coordination mechanism that resolves cross-segment conflicts.

Zhiyang Xu, Siyao Dai, Huanjie Dong +2

Computer Vision Multimodal Models Speech & Audio

Apr 1, 2026

Meta AI·Apr 1, 2026·also Northwestern

Autoregressive Appearance Prediction for 3D Gaussian Avatars

Stop avatar appearance from jittering across frames: this method uses autoregressive prediction of appearance latents to create temporally stable and high-fidelity 3D Gaussian avatars.

Michael Steiner, Zhang Chen, Alexander Richard +3

Architecture Design (Transformers, SSMs, MoE) Computer Vision

Feb 25, 2026

Meta AI·Feb 25, 2026

Decoding the Hook: A Multimodal LLM Framework for Analyzing the Hooking Period of Video Ads

Unlocking the secrets of viral video ads: a new MLLM framework reveals which initial moments hook viewers and drive conversions.

Poppy Zhang, Shawndra Hill, Amel Awadelkarim

Computer Vision Multimodal Models Natural Language Processing

Feb 24, 2026

Feb 24, 2026·also Meta AI

Causal Decoding for Hallucination-Resistant Multimodal Large Language Models

By surgically intervening in MLLM decoding, this work cuts hallucination rates without sacrificing descriptive quality, a feat prior methods struggled to achieve.

Shiwei Tan, Hengyi Wang, Weiyi Qin

Computer Vision Multimodal Models Reasoning & Chain-of-Thought
