Dongfu Jiang

University of Waterloo

Papers on Lattice

Total citations

Topics

h-index

Publication activity (papers/week, last 8 weeks)

Research focus

Tool Use & Agents (5) · Eval Frameworks & Benchmarks (4) · Multimodal Models (3) · Architecture Design (Transformers, SSMs, MoE) (2)

Frequent co-authors

Boxin Wang (3) · Wenliang Dai (3) · M. Shoeybi (3) · Mohammad Shoeybi (3)

Papers (10)

Jul 6, 2026

NVIDIA · 2w ago · also HKUST, Waterloo

Unified Audio Intelligence Without Regressing on Text Intelligence

Audex achieves state-of-the-art audio understanding and generation while maintaining the reasoning prowess of its text-only foundation, all through a unified architecture.

Zhifeng Kong, Sang-gil Lee, JaeHyeon Kim +17

Multimodal Models · Speech & Audio

Jun 12, 2026

AI2 · Jun 12, 2026 · also NVIDIA, NJU, TU Darmstadt, University of California +1

Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Achieving six times the inference throughput of current LLMs while maintaining accuracy, Nemotron 3 Ultra redefines performance benchmarks for agentic reasoning tasks.

NVIDIA, Aaron Blakeman, Aaron Thomas +322

Architecture Design (Transformers, SSMs, MoE) · Scaling Laws & Emergent Abilities · Tool Use & Agents

Jun 12, 2026 · also SJTU, Texas A&M, UCSD, UofT +1

Dr-DCI: Scaling Direct Corpus Interaction via Dynamic Workspace Expansion

DR-DCI achieves a remarkable 73.3% accuracy in agentic search tasks while efficiently scaling from 100K to 10M documents, outperforming traditional methods.

Yi Lu, Zhuofeng Li, Ping Nie +6

Recommendation & Information Retrieval · Tool Use & Agents

Jun 3, 2026

UW · Jun 3, 2026 · also MIT CSAIL, Stanford HAI, Notre Dame, Princeton +1

AutoLab: Can Frontier Models Solve Long-Horizon Auto Research and Engineering Tasks?

Success in long-horizon tasks hinges more on an agent's iterative persistence than on the quality of its initial solution.

Zhangchen Xu, Junda Chen, Yue Huang +14

Eval Frameworks & Benchmarks · Scalable Oversight & Alignment Theory · Scientific Discovery & Drug Design

Jun 1, 2026

NVIDIA · Jun 1, 2026 · also BAIR, Galbot, Georgia Tech, HKUST +9

Cosmos 3: Omnimodal World Models for Physical AI

Cosmos 3 sets a new benchmark for omnimodal models, outperforming existing state-of-the-art in both Text-to-Image and Image-to-Video tasks.

Aditi, Niket Agarwal, Arslan Ali +285

Multimodal Models · Robotics & Embodied AI · World Models & Planning

Apr 15, 2026

Apr 15, 2026 · also JHU, NJU, SJTU, UCSD +1

ReviewGrounder: Improving Review Substantiveness with Rubric-Guided, Tool-Integrated Agents

A clever two-stage agent using smaller models can produce better, more substantive peer reviews than brute-force application of the largest LLMs.

Zhuofeng Li, Yi Lu, Dongfu Jiang +5

Eval Frameworks & Benchmarks · Natural Language Processing · Tool Use & Agents

Apr 14, 2026

AI2 · Apr 14, 2026 · also NVIDIA, NJU, Waterloo

Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Nemotron 3 Super proves you can achieve comparable accuracy to existing 120B models, but with significantly higher inference throughput, by combining Mamba, Attention, and Mixture-of-Experts.

Aakshita Chandiramani, Aaron Blakeman, Abdullahi Olaoye +272

Architecture Design (Transformers, SSMs, MoE) · Inference & Quantization · Tool Use & Agents

Apr 9, 2026

CMU ML · Apr 9, 2026 · also Tsinghua AI, NJU, Waterloo

ClawBench: Can AI Agents Complete Everyday Online Tasks?

Today's best AI agents can only complete 33% of common online tasks like booking appointments or filling out job applications, revealing a significant gap between current capabilities and real-world utility.

Yuxuan Zhang, Yubo Wang, Yipeng Zhu +19

Eval Frameworks & Benchmarks · Natural Language Processing · Tool Use & Agents

Apr 6, 2026

Watch Before You Answer: Learning from Visually Grounded Post-Training

Current video understanding benchmarks and post-training datasets are riddled with linguistic biases, meaning VLMs might be acing tests without actually "watching" the video.

Eunjeong Hwang, Huaisong Zhang, Penghui Du +7

Computer Vision · Eval Frameworks & Benchmarks · Multimodal Models

Mar 19, 2026

NVIDIA · Mar 19, 2026 · also HKUST, Samsung, Waterloo

Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

A 30B MoE model can now achieve Gold Medal-level performance in IMO, IOI, and ICPC, rivaling frontier models with 20x more parameters.

Zhuoling Yang, Zhuolin Yang, Yang Chen +23

Code Generation & Program Synthesis · Reasoning & Chain-of-Thought · RLHF & Preference Learning
