Lattice AI Research

Research focus

Eval Frameworks & Benchmarks (2) · Tool Use & Agents (2) · Code Generation & Program Synthesis (1) · Robotics & Embodied AI (1)

Frequent co-authors

Yipeng Ouyang (1) · Bingjie Liu (1) · Zhongchun Zheng (1) · Yuhao Gu (1)

Papers (2)

May 26, 2026

Yipeng Ouyang +5 · May 26, 2026

Benchmarks are Not Enough: RAMP for Runtime Assessing of Agentic Models in Production Systems

RAMP uncovers that agentic models can lose up to 80% of their effectiveness in complex, real-world workflows, a stark contrast to their performance in isolated benchmarks.

Yipeng Ouyang, Xinmiao Huang, Bingjie Liu +3

Code Generation & Program Synthesis · Eval Frameworks & Benchmarks · Tool Use & Agents

Yifan Sui +14 · May 26, 2026 · also StepFun

AndroidDaily: A Verifiable Benchmark for Mobile GUI Agents on Real-World Closed-Source Applications

Current mobile GUI agents are surprisingly inept at everyday smartphone tasks, achieving only 62% success on a new benchmark of real-world Android apps.

Yifan Sui, Xinmiao Huang, Hongbing Li +12

Eval Frameworks & Benchmarks · Robotics & Embodied AI · Tool Use & Agents


Xinmiao Huang
