Google Research

Computer Vision

18 papers from Google Research on Computer Vision

Jul 6, 2026

FM-ChangeNet: Learning Change through Pathwise Feature Transport

A pathwise approach to change detection reveals that continuous transport in feature space significantly enhances the model's ability to capture and interpret temporal changes.

Roie Kazoom, George Leifman, Genady Beryozkin

Computer Vision

Jun 30, 2026

ETH · also Google Research

AugSplat: Radiance Field-Informed Gaussian Splatting for Sparse-View Settings

AugSplat boosts reconstruction quality in sparse-view 3D vision by leveraging synthetic views from neural radiance fields, achieving real-time performance without sacrificing accuracy.

Lorenzo Lazzaroni, Riccardo Bollati, Daniel Barath +1

Computer Vision

Jun 29, 2026

Google Research

Explainability-Aware Frustum Attack: Exposing Structural Vulnerabilities in LiDAR-Based 3D Object Detectors

LiDAR-based 3D object detectors can be compromised by targeting just a few critical spatial regions, revealing a significant structural vulnerability.

Chengzeng You, Binbin Xu, Soteris Demetriou

Computer Vision · Red-Teaming & Adversarial Robustness

Jun 23, 2026

Google Research · also Oxford, TU Munich

FLAT: Feedforward Latent Triangle Splatting for Geometrically Accurate Scene Generation

Triangle splats from video diffusion latents yield superior geometric accuracy and visual quality, challenging the dominance of volumetric 3D Gaussians in scene generation.

Orest Kupyn, Goutam Bhat, Philipp Henzler

Computer Vision · Multimodal Models · World Models & Planning

May 28, 2026

Google Research · also Bar-Ilan, HUJI

LiveSVG: Zero-Shot SVG Animation via Video Generation

Ditch the brittle code synthesis and noisy gradients: LiveSVG unlocks high-quality SVG animations by directly fitting vector graphics to reference videos generated from motion prompts.

Matan Levy, R. Margolin, Bar Cavia +6

Code Generation & Program Synthesis · Computer Vision · Multimodal Models

May 27, 2026

Also Google Research, Saarland Informatics Campus

EgoRelight: Egocentric Human Capture and Illumination Recovery for Relightable and Photoreal Avatar Rendering

Imagine telepresence where your avatar convincingly blends into any environment, relit in real time based on the scene's actual lighting, all from a single headset.

Jianchun Chen, Rohit Pandey, Thabo Beeler +2

Computer Vision · Multimodal Models · Robotics & Embodied AI

May 22, 2026

Also Google Research, Vector

Good Token Hunting: A Hitchhiker's Guide to Token Selection for Visual Geometry Transformers

Intelligently pruning redundant tokens across frames and within layers can slash the compute cost of visual geometry transformers by 85% without sacrificing accuracy.

Shuhong Zheng, Erik Sandström, Marie-Julie Rakotosaona +1

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Training Efficiency & Optimization

Apr 27, 2026

Google Research · also Beihang, ByteDance

Co-Director: Agentic Generative Video Storytelling

Forget handcrafted prompts: a hierarchical multi-agent framework turns diffusion models into coherent storytelling engines by globally optimizing for semantic coherence.

Yale Song, Yiwen Song +27

Computer Vision · Multimodal Models · Tool Use & Agents

Apr 22, 2026

Google Research

RSRCC: A Remote Sensing Regional Change Comprehension Benchmark Constructed via Retrieval-Augmented Best-of-N Ranking

Current remote sensing change captioning datasets miss fine-grained localized semantic reasoning, but RSRCC fills this gap with 126k change-specific questions.

Roie Kazoom, Yotam Gigi, George Leifman +2

Computer Vision · Eval Frameworks & Benchmarks · Multimodal Models

Google Research · also Max Planck, VIA Research Center

R-CoV: Region-Aware Chain-of-Verification for Alleviating Object Hallucinations in LVLMs

LVLMs can self-detect and correct object hallucinations by focusing on specific image regions, offering a simple, training-free fix.

Nathalie Rauschmayr, Bernt Schiele

Computer Vision · Multimodal Models · Reasoning & Chain-of-Thought

Apr 15, 2026

Also Google Research

CANVAS: Continuity-Aware Narratives via Visual Agentic Storyboarding

Generating consistent visual narratives is now possible: CANVAS outperforms existing methods by explicitly planning character, background, and scene continuity across multiple shots.

Ishani Mondal, Mihir Parmar +3

Computer Vision · Multimodal Models · Tool Use & Agents

Apr 14, 2026

Google Research · also Max Planck

Grasp in Gaussians: Fast Monocular Reconstruction of Dynamic Hand-Object Interactions

Reconstructing dynamic hand-object interactions from monocular video can be 6x faster and significantly more accurate by ditching heavy neural representations for a revived Sum-of-Gaussians approach.

Ayce Idil Aytekin, Zhengyang Shen, Thabo Beeler +2

Computer Vision · Robotics & Embodied AI

Mar 27, 2026

Google Research · also CREST-ENSAE, KU, Oxford

VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward

Achieve world-consistent video generation by directly optimizing geometry in the latent space of pre-trained video diffusion models, sidestepping costly RGB-space operations and architectural changes.

Zhaochong An, Orest Kupyn, Théo Uscidda +5

Computer Vision · Multimodal Models · World Models & Planning

Mar 11, 2026

Google Research · also BUPT, Columbia, UMich

MoXaRt: Audio-Visual Object-Guided Sound Interaction for XR

Imagine an XR experience where you can selectively isolate and enhance individual sound sources in real time, making chaotic audio environments crystal clear.

Tianyu Xu, Sieun Kim, Qianhuizhi Zheng +5

Computer Vision · Multimodal Models · Speech & Audio

Mar 9, 2026

Google Research · also UC Santa Cruz

CAST: Modeling Visual State Transitions for Consistent Video Retrieval

Forget local semantic alignment: CAST unlocks temporally coherent video retrieval and generation by explicitly modeling visual state transitions.

Yanqing Liu, Yingcheng Liu, Fanghong Dong +4

Computer Vision · Multimodal Models · Recommendation & Information Retrieval

Mar 6, 2026

Also Google Research, A*STAR, Saarbrücken Research Center for Visual Computing, SUTD

Physical Simulator In-the-Loop Video Generation

AI-generated videos can now respect physics, thanks to a framework that uses a physical simulator to guide diffusion models, resulting in more realistic and coherent motion.

Lin Geng Foo, Mark He Huang, Alexandros Lattas +2

Computer Vision · Multimodal Models · World Models & Planning

Feb 19, 2026

DeepMind · also Google Research, KU, UZH

Tree crop mapping of South America reveals links to deforestation and conservation

Existing deforestation monitoring maps misclassify smallholder agroforestry as "forest," risking unfair penalties under regulations like the EUDR.

Yuchang Jiang, Anton Raichuk, Xiaoye Tong +6

Computer Vision · Multimodal Models · Scientific Discovery & Drug Design

Feb 17, 2026

NVIDIA · also ETH, Google Research

RaCo: Ranking and Covariance for Practical Learned Keypoints

Forget complex architectures: RaCo achieves SOTA keypoint matching and repeatability by cleverly combining ranking and covariance estimation in a lightweight network, trained without covisible image pairs.

Abhiram Shenoi, Philipp Lindenberger, Paul-Edouard Sarlin

Computer Vision · Recommendation & Information Retrieval · Robotics & Embodied AI
