OmniSONAR halves cross-lingual search error on FLORES and cuts it 15-fold on BIBLE, suggesting that truly universal sentence embeddings across thousands of languages and modalities are now within reach.
Pixel-space diffusion models get a serious boost: V-Co reveals a simple recipe for visual co-denoising that outperforms existing methods on ImageNet-256 with fewer training epochs.
Ditch the text prompts: AC-Foley uses reference audio to synthesize video sound effects with unprecedented control, enabling timbre transfer and zero-shot generation.
Self-supervised video models can now learn dense features rivaling supervised methods, unlocking a 20-point jump in robot grasping success.
A surprisingly simple change to the motion latent space—representing each body joint with its own token—dramatically improves text-to-motion generation quality, outperforming monolithic latent vector approaches.
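A minimal sketch of the per-joint tokenization idea, assuming a 22-joint skeleton with 3D joint coordinates; the class name, dimensions, and projection scheme are illustrative assumptions, not the paper's API:

```python
import torch
import torch.nn as nn

class PerJointTokenizer(nn.Module):
    """Sketch: map each body joint to its own latent token instead of
    compressing the whole pose into one monolithic vector. All names
    and sizes here are assumptions, not taken from the paper."""

    def __init__(self, num_joints: int = 22, joint_dim: int = 3, token_dim: int = 256):
        super().__init__()
        # Shared linear projection applied to each joint's coordinates.
        self.proj = nn.Linear(joint_dim, token_dim)
        # Learned per-joint embeddings so tokens remain identifiable.
        self.joint_embed = nn.Parameter(torch.zeros(num_joints, token_dim))

    def forward(self, poses: torch.Tensor) -> torch.Tensor:
        # poses: (batch, frames, num_joints, joint_dim)
        tokens = self.proj(poses) + self.joint_embed  # (B, T, J, D)
        # Flatten joints into the sequence axis: one token per joint per frame,
        # giving a downstream transformer fine-grained access to each joint.
        b, t, j, d = tokens.shape
        return tokens.reshape(b, t * j, d)

tokens = PerJointTokenizer()(torch.randn(2, 16, 22, 3))
print(tokens.shape)  # torch.Size([2, 352, 256])
```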
Vision models are far more data-hungry than language models, but Mixture-of-Experts layers can balance this asymmetry, enabling truly unified multimodal models.
Multimodal models often exhibit lower confidence than their unimodal counterparts when they're about to fail, and this work leverages that insight to build a better failure detector.
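The underlying signal is easy to picture with a standard confidence heuristic. Below is a toy sketch that flags low-confidence predictions via maximum softmax probability; the threshold is an arbitrary illustration, not a value from the paper:

```python
import torch

def flag_likely_failures(logits: torch.Tensor, threshold: float = 0.6) -> torch.Tensor:
    """Sketch of confidence-based failure detection: treat a low maximum
    softmax probability as a signal the model is about to fail."""
    confidence = logits.softmax(dim=-1).max(dim=-1).values
    return confidence < threshold  # True where a prediction looks unreliable

logits = torch.randn(4, 1000)  # e.g. one batch of classifier outputs
print(flag_likely_failures(logits))
```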
Unlocking the secrets of viral video ads: a new MLLM framework reveals which initial moments hook viewers and drive conversions.
By surgically intervening in MLLM decoding, this work cuts hallucination rates without sacrificing descriptive quality, a feat prior methods struggled to achieve.
Existing safety guardrails for text-to-image models can backfire, inadvertently amplifying other types of harm, but this new method adaptively steers generation to resolve these conflicts and reduce overall harmful content.
Unlock robot learning with hidden knowledge: TOPReward extracts surprisingly accurate task progress signals directly from VLM token probabilities, bypassing the need for explicit reward engineering.
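A hedged sketch of the token-probability-as-reward idea: pose a yes/no progress question to a VLM and read the answer probabilities off its next-token distribution. The `vlm` callable, prompt wording, and token keys below are hypothetical stand-ins, not TOPReward's actual interface:

```python
import math

def progress_reward(vlm, image, task: str) -> float:
    """Sketch: convert VLM next-token log-probs for "Yes"/"No" into a
    scalar task-progress reward in [0, 1]. `vlm` is a hypothetical
    callable returning a dict of {token: log-prob}."""
    prompt = f"Question: Has the robot made progress on the task '{task}'? Answer:"
    logprobs = vlm(image=image, text=prompt)
    p_yes = math.exp(logprobs.get(" Yes", float("-inf")))
    p_no = math.exp(logprobs.get(" No", float("-inf")))
    # Normalize over the two answers so the reward lies in [0, 1].
    return p_yes / (p_yes + p_no) if (p_yes + p_no) > 0 else 0.5

# Toy stand-in model so the sketch runs end to end.
dummy_vlm = lambda image, text: {" Yes": math.log(0.7), " No": math.log(0.2)}
print(progress_reward(dummy_vlm, image=None, task="pick up the mug"))  # ~0.78
```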
Forget ImageNet: Xray-Visual sets a new SOTA for multimodal vision models by scaling to billions of social media data points with a novel three-stage training pipeline.
Forget clunky skeletons: this new model lets you prompt your way to accurate 3D human meshes from single images, even in the wildest poses.
Unlock superhuman visual reasoning in multimodal models by simply giving them the ability to think step-by-step at test time.
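For intuition, the test-time change can be as small as eliciting intermediate reasoning before the final answer. This is generic chain-of-thought prompting, not the paper's exact recipe:

```python
def cot_prompt(question: str) -> str:
    """Toy illustration of test-time step-by-step reasoning: append an
    instruction to reason before answering. Wording is an assumption."""
    return (
        f"{question}\n"
        "Look at the image and think step by step, then state your final answer."
    )

print(cot_prompt("How many red blocks are stacked on the table?"))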
Achieve state-of-the-art UAV detection by swapping transformers for Mamba, yielding a faster and more accurate multimodal detector.
Edit the bassline, drums, or other instruments of any song with this new open-source multi-stem music generation model.