Harbin Institute of Technology, Shenzhen
Today's best GUI agents can barely handle real-world professional workflows: on tasks that require reasoning across just three applications, their success rates fall below 21%.
Reasoning across languages doesn't have to break the bank: a new framework slashes token costs by over 50% while maintaining accuracy, especially boosting performance in low-resource languages.
LLMs still struggle to reason in context when cultural and linguistic nuances are involved, achieving only 44% accuracy on a new grounded benchmark spanning 14 languages.
LLMs can now navigate the ever-expanding universe of external tools with significantly improved accuracy and generalization, thanks to a new agentic framework that proactively retrieves and grounds tool execution.
Traditional text embedding benchmarks fail to capture the nuances of long-horizon memory retrieval. This new benchmark shows that bigger models don't always win, and that performance on standard tasks doesn't guarantee success in complex, context-dependent memory scenarios.