Tsinghua AI
Tsinghua University's AI research group. Leading Chinese institution in NLP, knowledge graphs, and large language models.
ml.cs.tsinghua.edu.cn
Recent Papers
FunReason-MT is presented, a novel data synthesis framework for real-world multi-turn tool use that resolves the complexity barrier in multi-turn function-calling (FC) data by employing 1) Environment-API Graph Interactions to gather varied high-quality trajectories, 2) Advanced Tool-Query Synthesis to simplify hard query construction, and 3) Guided Iterative Chain for sophisticated CoT generation.
The paper introduces "analytical search" as a new search paradigm tailored for complex analytical information needs, addressing the limitations of relevance-based ranking and retrieval-augmented generation (RAG) in tasks requiring trend analysis, causal inference, and verifiable conclusions. It proposes a system framework that integrates query understanding, recall-oriented retrieval, reasoning-aware fusion, and adaptive verification to support structured, multi-step inference. The authors argue that analytical search offers improved control over reasoning, evidence usage, and verifiability, leading to more accountable and utility-driven results compared to existing search paradigms.
Introduces and formalizes the concept of "analytical search" as a distinct search paradigm designed to address complex analytical information needs by emphasizing evidence-governed, process-oriented workflows.
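The summary above describes a four-stage workflow (query understanding, recall-oriented retrieval, reasoning-aware fusion, adaptive verification). Below is a minimal sketch of how such a pipeline could be wired together; every class, function name, and heuristic here is an illustrative assumption, not the paper's actual interface.

```python
# Illustrative sketch of an "analytical search" pipeline; all names are hypothetical.
from dataclasses import dataclass, field


@dataclass
class AnalyticalResult:
    claim: str
    evidence: list = field(default_factory=list)
    verified: bool = False


def understand_query(query: str) -> list[str]:
    """Decompose an analytical question into sub-questions (stub)."""
    return [query]  # a real system would produce multiple facets


def recall_oriented_retrieval(sub_question: str, corpus: list[str]) -> list[str]:
    """Favor recall over precision: return every document that might matter."""
    tokens = sub_question.lower().split()
    return [doc for doc in corpus if any(tok in doc.lower() for tok in tokens)]


def reasoning_aware_fusion(sub_question: str, docs: list[str]) -> AnalyticalResult:
    """Fuse retrieved evidence into an intermediate claim (stub for an LLM call)."""
    return AnalyticalResult(claim=f"Claim for: {sub_question}", evidence=docs)


def adaptive_verification(result: AnalyticalResult) -> AnalyticalResult:
    """Accept a claim only if it is backed by at least one piece of evidence."""
    result.verified = len(result.evidence) > 0
    return result


def analytical_search(query: str, corpus: list[str]) -> list[AnalyticalResult]:
    results = []
    for sub_q in understand_query(query):
        docs = recall_oriented_retrieval(sub_q, corpus)
        results.append(adaptive_verification(reasoning_aware_fusion(sub_q, docs)))
    return results
```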
The paper introduces PatientHub, a unified framework to standardize the creation, composition, and deployment of simulated patients for training counselors and scaling therapeutic assessment using Large Language Models. PatientHub addresses the fragmentation in existing patient simulation approaches by providing standardized data formats, prompts, and evaluation metrics, thus improving reproducibility and enabling fair comparisons. The authors demonstrate PatientHub's utility through case studies, showcasing standardized cross-method evaluation, seamless integration of custom evaluation metrics, and the prototyping of new simulator variants.
Introduces PatientHub, a modular framework that unifies patient simulation by standardizing data formats, prompts, and evaluation metrics to facilitate reproducibility and fair comparison of different methods.
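To make the standardization idea above concrete, here is a minimal sketch of what a shared simulated-patient record and a pluggable metric registry could look like; the field names and registry design are assumptions for illustration, not PatientHub's actual schema or API.

```python
# Hypothetical standardized patient record plus a metric registry so that
# different simulators can be evaluated under one harness.
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PatientProfile:
    patient_id: str
    presenting_problem: str
    history: list[str] = field(default_factory=list)
    system_prompt: str = ""  # standardized prompt fed to the simulator LLM


METRICS: dict[str, Callable[[list[str]], float]] = {}


def register_metric(name: str):
    """Register a custom evaluation metric for cross-method comparison."""
    def decorator(fn):
        METRICS[name] = fn
        return fn
    return decorator


@register_metric("turn_count")
def turn_count(dialogue: list[str]) -> float:
    return float(len(dialogue))
```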
The paper identifies limitations in current Vision-Language-Action (VLA) models stemming from inadequate visual representations learned through language-image contrastive learning or image-based self-supervised learning. It proposes JEPA-VLA, a method that integrates video predictive embeddings (specifically V-JEPA 2) into VLAs to improve environment understanding and policy priors. Experiments on benchmarks like LIBERO and real-robot tasks demonstrate that JEPA-VLA significantly improves performance by leveraging the ability of video predictive embeddings to encode task-relevant temporal dynamics.
Introduces JEPA-VLA, a novel approach that adaptively integrates video predictive embeddings into existing VLAs to enhance environment understanding and policy priors.
The paper introduces MiniCPM-SALA, a 9B-parameter hybrid architecture that combines sparse attention (InfLLM-V2) and linear attention (Lightning Attention) to improve long-context modeling efficiency. A layer selection algorithm interleaves the two attention mechanisms in a 1:3 ratio and pairs them with a hybrid positional encoding (HyPE) to preserve performance while improving efficiency. The paper also presents a cost-effective continual training framework that converts pre-trained Transformer models into hybrid models, cutting training costs by 75%; the resulting model achieves 3.5x faster inference at a 256K sequence length and supports context lengths up to 1M tokens on a single NVIDIA A6000D GPU.
Introduces a hybrid sparse and linear attention architecture, MiniCPM-SALA, that achieves efficient long-context modeling with minimal performance degradation compared to full-attention models.
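As a small illustration of the 1:3 sparse-to-linear mixing described above, the sketch below builds a layer schedule with a simple round-robin rule; the actual paper uses a dedicated layer selection algorithm, so this rule is purely an assumption.

```python
# Sketch: assemble a layer schedule mixing sparse and linear attention 1:3.
def build_layer_schedule(num_layers: int, sparse_every: int = 4) -> list[str]:
    """Every `sparse_every`-th layer uses sparse attention; the rest use linear attention."""
    return [
        "sparse_attention" if i % sparse_every == 0 else "linear_attention"
        for i in range(num_layers)
    ]


schedule = build_layer_schedule(32)
assert schedule.count("sparse_attention") * 3 == schedule.count("linear_attention")
print(schedule[:8])
```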
The paper introduces LAVES, a hierarchical LLM-based multi-agent system that generates high-quality instructional videos from educational problems by decomposing the generation workflow into specialized agents for problem-solving, visualization, and narration. LAVES addresses the limitations of end-to-end video generation models in scenarios that demand logical rigor and precise knowledge representation. By constructing a structured, executable video script that is compiled into synchronized visuals and narration, the system achieves a throughput of over one million videos per day at a 95% cost reduction relative to industry standards while maintaining a high acceptance rate.
Introduces a hierarchical LLM-based multi-agent system (LAVES) that decomposes educational video generation into specialized agents, enabling automated end-to-end production with high throughput and cost efficiency.
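The "structured executable video script" mentioned above can be pictured as a list of segments, each pairing a visual directive with narration and a duration; the schema below is an illustrative assumption, not LAVES's actual format.

```python
# Hypothetical script segment format and a compiler into a synchronized timeline.
from dataclasses import dataclass


@dataclass
class Segment:
    visual: str     # e.g. a plotting/animation command from the visualization agent
    narration: str  # text emitted by the narration agent
    seconds: float


def compile_script(segments: list[Segment]) -> list[tuple[float, str, str]]:
    """Turn segments into a (start_time, visual, narration) timeline so visuals
    and narration stay synchronized."""
    timeline, t = [], 0.0
    for seg in segments:
        timeline.append((t, seg.visual, seg.narration))
        t += seg.seconds
    return timeline
```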
The paper introduces NarraScore, a hierarchical framework for generating soundtracks for long-form videos by leveraging emotion as a compressed representation of narrative logic. It uses frozen Vision-Language Models (VLMs) to extract Valence-Arousal trajectories from video and employs a Dual-Branch Injection strategy, consisting of a Global Semantic Anchor and a Token-Level Affective Adapter, to control musical dynamics. Experiments show that NarraScore achieves state-of-the-art consistency and narrative alignment with minimal computational cost.
Introduces a hierarchical framework, NarraScore, that leverages VLMs and a dual-branch injection strategy to generate narrative-aligned soundtracks for long-form videos.
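The two control signals described above (a global semantic anchor and a token-level affective signal) can be sketched as simple pooling and smoothing over frame-wise valence-arousal estimates; the smoothing window and pooling choices below are assumptions, not NarraScore's actual operators.

```python
# Sketch: derive a global anchor and a per-frame trajectory from VA estimates.
def moving_average(xs: list[float], window: int = 5) -> list[float]:
    half = window // 2
    out = []
    for i in range(len(xs)):
        chunk = xs[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def affective_controls(valence: list[float], arousal: list[float]):
    """Return (global_anchor, per_frame_trajectory) from frame-wise VA estimates."""
    anchor = (sum(valence) / len(valence), sum(arousal) / len(arousal))
    trajectory = list(zip(moving_average(valence), moving_average(arousal)))
    return anchor, trajectory
```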
The paper introduces EvoMDT, a self-evolving multi-agent system designed to improve structured clinical decision-making in multi-cancer multidisciplinary tumor boards (MDTs). EvoMDT uses a self-evolution loop to dynamically update prompts, consensus weights, and retrieval scope based on expert feedback and outcome signals, enhancing robustness and traceability. Evaluated on oncology QA benchmarks and real-world datasets, EvoMDT outperformed LLM baselines, achieving higher guideline concordance, semantic alignment with expert plans, and comparable decision quality to human MDTs with reduced response time.
Introduces a self-evolving multi-agent system, EvoMDT, that adaptively refines its decision-making process for cancer treatment recommendations based on expert feedback and outcome signals.
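One piece of the self-evolution loop described above is updating consensus weights from expert feedback; the multiplicative update below is a minimal sketch under that reading, not EvoMDT's actual rule.

```python
# Sketch: nudge consensus weights toward agents whose proposals matched expert feedback.
def evolve_weights(weights: dict[str, float],
                   agreed_with_expert: dict[str, bool],
                   lr: float = 0.1) -> dict[str, float]:
    updated = {
        agent: w * (1 + lr) if agreed_with_expert.get(agent, False) else w * (1 - lr)
        for agent, w in weights.items()
    }
    total = sum(updated.values())
    return {agent: w / total for agent, w in updated.items()}  # renormalize to sum to 1


weights = {"radiology": 0.34, "pathology": 0.33, "oncology": 0.33}
weights = evolve_weights(weights, {"radiology": True, "pathology": False, "oncology": True})
```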
The paper introduces GPT-5, a unified system comprising a fast, general-purpose model and a deeper reasoning model, managed by a real-time router trained on user feedback and performance metrics. GPT-5 demonstrates improved performance on benchmarks, faster response times, and enhanced utility for real-world queries, with significant reductions in hallucinations, improved instruction following, and minimized sycophancy. The system incorporates "safe-completions" for safety and is treated as High capability in the Biological and Chemical domain under OpenAI's Preparedness Framework, triggering associated safeguards.
Introduces a unified GPT-5 system with a real-time router that dynamically selects between a fast, general-purpose model and a deeper reasoning model based on query characteristics, optimizing for speed and accuracy.
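The routing idea above can be illustrated with a toy dispatcher that sends a query to either a fast model or a reasoning model; the keyword heuristic and threshold below are purely illustrative, since the actual router is trained on user feedback and measured performance rather than hand-written rules.

```python
# Toy sketch of query routing between a fast model and a deeper reasoning model.
def route(query: str, reasoning_keywords=("prove", "step by step", "debug", "analyze")) -> str:
    hard = len(query.split()) > 80 or any(k in query.lower() for k in reasoning_keywords)
    return "reasoning_model" if hard else "fast_model"


print(route("What's the capital of France?"))                                   # fast_model
print(route("Prove that the sum of two even numbers is even, step by step."))  # reasoning_model
```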
The International AI Safety Report 2025's Second Key Update analyzes the current state of AI risk management and technical mitigations employed by researchers, companies, and governments. It highlights advancements in training safer models and monitoring outputs while acknowledging uncertainties in the effectiveness of these measures and their variability across applications. The report aims to inform policymakers, researchers, and the public about progress and remaining gaps in AI safety.
Synthesizes recent developments in AI risk management and technical risk mitigation strategies, identifying both progress and persistent gaps in ensuring the safety of general-purpose AI systems.
The paper introduces CycleChemist, a dual-pronged machine learning framework for organic photovoltaic (OPV) material discovery that addresses the limitation of existing methods focusing on either donor or acceptor materials in isolation. The authors curate the Organic Photovoltaic Donor Acceptor Dataset (OPV2D), containing 2,000 experimentally characterized donor-acceptor pairs, and develop a hierarchical graph neural network (OPVC) that predicts OPV behavior with multi-task learning and explicit donor-acceptor interaction modeling. The framework also includes MatGPT, a generative transformer that produces synthetically accessible organic semiconductors, guided by reinforcement learning to optimize material properties.
Introduces CycleChemist, a novel dual machine learning framework that integrates predictive modeling with generative molecular design for the data-driven discovery of high-performance OPV materials.
The paper introduces a Dual Loop Data Cleaning (DLDC) method to automatically generate high-quality remote sensing image-text training data by leveraging contrastive multimodal quality evaluations. DLDC uses an external generation loop (EGL) based on a multimodal foundation model for layout description and an internal evaluation loop (IEL) based on contrastive learning metrics to assess image-text matching. Fine-tuning T2I models with the cleaned dataset yields significant improvements in image generation quality, evidenced by substantial reductions in FID, increases in CLIP and RemoteCLIP scores, and improved downstream segmentation performance.
Introduces a dual-loop data cleaning method (DLDC) that automatically generates high-quality remote sensing image-text training data, eliminating the need for manual annotation.
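The internal evaluation loop above amounts to keeping an image-text pair only when a contrastive matching score clears a threshold; the sketch below shows that filtering step generically, with the scoring function (e.g. a CLIP or RemoteCLIP cosine similarity) passed in and the threshold chosen arbitrarily.

```python
# Sketch of contrastive-score-based filtering of image-text training pairs.
from typing import Callable


def clean_pairs(pairs: list[tuple[str, str]],
                score_fn: Callable[[str, str], float],
                threshold: float = 0.25) -> list[tuple[str, str]]:
    """Return the (image_path, caption) pairs whose image-text match score passes the threshold."""
    return [(img, cap) for img, cap in pairs if score_fn(img, cap) >= threshold]
```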
The paper introduces FunReason-MT, a novel data synthesis framework designed to generate high-quality, multi-turn training data for function calling in large language models, addressing limitations in existing methods like random sampling and multi-agent role-playing. FunReason-MT employs Environment-API Graph Interactions, Advanced Tool-Query Synthesis, and Guided Iterative Chain to overcome challenges in targeted data synthesis, hard query construction, and multi-turn logical dependency. Experiments on BFCLv3 and BFCLv4 show that models trained on FunReason-MT data achieve state-of-the-art performance among comparable-sized models, demonstrating the framework's effectiveness in agentic learning.
Introduces FunReason-MT, a data synthesis framework that generates high-quality, multi-turn function calling data by integrating Environment-API Graph Interactions, Advanced Tool-Query Synthesis, and Guided Iterative Chain.
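Environment-API graph interaction, as described above, can be pictured as walking a directed graph whose edges mean "the output of API A can feed API B" to obtain a multi-turn trajectory; the graph and walk policy below are illustrative assumptions, not FunReason-MT's actual procedure.

```python
# Sketch: sample a multi-turn tool-use trajectory by walking an API dependency graph.
import random


def sample_trajectory(api_graph: dict[str, list[str]], start: str, max_turns: int = 4) -> list[str]:
    trajectory, current = [start], start
    for _ in range(max_turns - 1):
        successors = api_graph.get(current, [])
        if not successors:
            break
        current = random.choice(successors)
        trajectory.append(current)
    return trajectory


api_graph = {"search_flights": ["book_flight"], "book_flight": ["send_confirmation"]}
print(sample_trajectory(api_graph, "search_flights"))
```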
The paper introduces SAMR, a Spatial-Augmented Mixed Reality method, to improve Vision-Language Model (VLM) performance in 3D scene understanding within mixed reality environments. SAMR uses FastSAM-based segmentation to generate object-level meshes from HMD images, maps feature points to 3D coordinates via ray casting and triangular facet fitting, and integrates multimodal interactions (gestures, gaze, voice) for prompt annotation. Experiments across six application scenarios (object identification, relationship analysis, etc.) demonstrate SAMR's effectiveness in enhancing VLMs for 3D scene interpretation.
Introduces SAMR, a novel spatial-augmented mixed reality method, that enhances VLMs by incorporating spatial context and multimodal interaction for improved 3D scene understanding.
The paper introduces dVLA, a diffusion-based Vision-Language-Action model that unifies visual perception, language reasoning, and robotic control under a single diffusion objective. dVLA incorporates a multimodal chain-of-thought to improve cross-modal reasoning and generalization. The model achieves state-of-the-art performance on the LIBERO benchmark (96.4% success rate) and demonstrates robust real-world performance on a Franka robot, including a challenging bin-picking task.
Introduces dVLA, a diffusion-based VLA model that unifies perception, language, and action with a multimodal chain-of-thought, achieving strong performance and generalization in robotic tasks.
The paper introduces Structured-MoE STL Planner (S-MSP), a differentiable framework for end-to-end task and motion planning from multi-view camera observations and STL specifications. S-MSP integrates STL constraints directly into the training loop using a composite loss function that combines trajectory reconstruction and STL robustness. The core innovation is a structure-aware Mixture-of-Experts (MoE) model that enables horizon-aware specialization by projecting sub-tasks into temporally anchored embeddings, leading to improved STL satisfaction and trajectory feasibility in factory-logistics scenarios.
Introduces a differentiable end-to-end framework, S-MSP, that directly maps multi-view camera observations and STL specifications to feasible trajectories using a structure-aware Mixture-of-Experts model.
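The composite loss described above combines trajectory reconstruction with STL robustness; the sketch below shows that combination for a simple "always stay above a bound" specification, where robustness is the worst-case margin over the horizon. The penalty form and the weight lambda_stl are assumptions for illustration.

```python
# Sketch of a reconstruction + STL-robustness composite loss for G(x >= bound).
def always_above(trajectory: list[float], bound: float) -> float:
    """STL robustness of G(x >= bound): the worst-case margin over the horizon."""
    return min(x - bound for x in trajectory)


def composite_loss(pred: list[float], target: list[float],
                   bound: float, lambda_stl: float = 1.0) -> float:
    reconstruction = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    stl_penalty = max(0.0, -always_above(pred, bound))  # zero when the spec is satisfied
    return reconstruction + lambda_stl * stl_penalty
```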
The paper introduces Constructive Safety Alignment (CSA), a paradigm shift from refusal-based safety mechanisms in LLMs to a more human-centric approach that actively guides vulnerable users towards safe and helpful outcomes. CSA incorporates game-theoretic anticipation of user reactions, fine-grained risk boundary discovery, and interpretable reasoning control. The implementation, Oyster-I (Oy1), demonstrates state-of-the-art safety among open models while maintaining high general capabilities, exhibiting strong constructive engagement and robustness against jailbreaks.
This paper pioneers Constructive Safety Alignment (CSA), a novel paradigm that transforms LLM safety from reactive refusal to proactive guidance, specifically addressing the needs of vulnerable users.
This paper introduces an automated recommendation system for Sea-River-Inland Waterway Intermodal Transport (SRIIT) that optimizes dry bulk cargo transport by considering port capacity, path throughput, and time windows. A dual-mode scheduling algorithm (fair/priority-based) resolves resource competition using throughput-aware simulation and daily resource recovery. The system, validated on Yancheng's network, demonstrates reduced transport time, prioritized delivery during congestion, and customizable cost-time-carbon balancing.
Introduces a weight-driven optimization framework for intermodal transport that balances cost, time, and carbon emissions based on shipper preferences.
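The customizable cost-time-carbon balancing mentioned above can be illustrated as scoring each candidate route with a shipper-chosen weighted sum of normalized cost, transport time, and emissions; the normalization scales, weights, and example numbers below are assumptions, not values from the paper.

```python
# Sketch: weight-driven scoring of candidate intermodal routes.
def route_score(cost: float, time_h: float, carbon_kg: float,
                weights=(0.4, 0.4, 0.2),
                scales=(1000.0, 48.0, 500.0)) -> float:
    w_cost, w_time, w_carbon = weights
    s_cost, s_time, s_carbon = scales
    return w_cost * cost / s_cost + w_time * time_h / s_time + w_carbon * carbon_kg / s_carbon


candidates = {"river_then_sea": (820.0, 36.0, 310.0), "direct_inland": (990.0, 20.0, 420.0)}
best = min(candidates, key=lambda name: route_score(*candidates[name]))
print(best)
```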
This paper provides a theoretical analysis of the performance differences between RLHF and DPO, decomposing the gap into explicit (optimization) and implicit (finite sample) representation gaps. The analysis characterizes how the relative capacities of reward and policy model classes impact policy quality under model misspecification, revealing scenarios where RLHF, DPO, or online DPO can outperform each other. Furthermore, the paper demonstrates a statistical advantage for RLHF in settings with implicitly sparse ground-truth rewards, requiring fewer samples to learn an effective reward model.
Decomposes the performance gap between RLHF and DPO into explicit and implicit representation gaps, providing a nuanced understanding of their relative strengths and weaknesses under varying model misspecifications and sample complexities.
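For reference, the DPO objective whose implicit reward representation the paper analyzes has the standard textbook form shown below; this is a generic sketch (using PyTorch), not code from the paper.

```python
# Standard DPO loss: -log sigmoid(beta * ((log pi(y_w) - log pi_ref(y_w))
#                                        - (log pi(y_l) - log pi_ref(y_l))))
import torch
import torch.nn.functional as F


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta: float = 0.1):
    """All inputs are per-example sequence log-probabilities (1-D tensors)."""
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```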
InternVL3 is a new multimodal model trained from scratch using a native multimodal pre-training paradigm, jointly learning from multimodal and text data, thus avoiding the alignment issues of adapting text-only LLMs. The model incorporates variable visual position encoding (V2PE) for longer contexts and uses post-training techniques like SFT and MPO, along with test-time scaling. InternVL3-78B achieves state-of-the-art performance among open-source MLLMs, scoring 72.2 on MMMU and rivaling proprietary models while maintaining strong language proficiency.
Introduces a native multimodal pre-training paradigm that jointly learns multimodal and linguistic capabilities from scratch, eliminating the need to adapt text-only LLMs for multimodal tasks.
The paper introduces Adaptive Gradient-Masked Reinforcement (AGMR) Attack, a novel white-box adversarial attack method designed to effectively target deep reinforcement learning (DRL) agents in robotic control by addressing the limitations of existing supervised learning-based attacks. AGMR uses a gradient-based soft masking mechanism to dynamically identify and selectively perturb critical state dimensions, optimizing the adversarial policy for maximum impact on long-term rewards. Experiments demonstrate that AGMR significantly outperforms existing adversarial attack methods in reducing victim agent performance and improving robustness through adversarial training.
Introduces a novel white-box adversarial attack, AGMR, that selectively perturbs critical state dimensions in DRL agents using a gradient-based soft masking mechanism to maximize impact on long-term rewards.
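A gradient-based soft masking step of the kind described above can be sketched as follows: the gradient magnitude of the victim's action-value with respect to the state is turned into a soft mask, and the perturbation is concentrated on the dimensions the mask weights most. The temperature, budget, and function signature are illustrative assumptions, not AGMR's actual hyperparameters or interface.

```python
# Sketch of gradient-masked state perturbation (PyTorch).
import torch


def masked_perturbation(q_value_fn, state: torch.Tensor,
                        epsilon: float = 0.05, temperature: float = 0.5) -> torch.Tensor:
    state = state.clone().requires_grad_(True)
    q_value_fn(state).backward()                           # scalar value from the victim's critic
    saliency = state.grad.abs()
    mask = torch.softmax(saliency / temperature, dim=-1)   # soft mask over state dimensions
    delta = -epsilon * mask * state.grad.sign()            # push the value down, mostly on masked dims
    return (state + delta).detach()
```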
The paper introduces Confidence-Reward driven Preference Optimization (CRPO), a novel preference optimization method for machine translation that addresses the challenge of low-quality preference data in Direct Preference Optimization (DPO). CRPO enhances data selection by incorporating model confidence alongside reward scores, focusing on challenging sentence pairs where the model exhibits uncertainty or poor performance. Experiments demonstrate that CRPO outperforms existing preference optimization techniques, including RS-DPO, RSO, and MBR score, in both translation accuracy and data efficiency across LLMs and encoder-decoder models like NLLB.
Introduces a confidence-reward driven preference optimization (CRPO) method that improves the quality of preference data used in direct preference optimization by selecting challenging sentence pairs based on model uncertainty and reward scores.
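The data-selection idea above (keep pairs where the model is uncertain or performs poorly) can be sketched as filtering on the model's log-probability margin between the higher- and lower-reward translations; the dictionary keys and threshold below are assumptions for illustration, not CRPO's actual selection rule.

```python
# Sketch: keep preference pairs where the model's margin between the better
# and worse translation is small (uncertainty) or negative (poor performance).
def select_pairs(pairs: list[dict], margin_threshold: float = 0.5) -> list[dict]:
    """pairs: dicts with 'logp_better' and 'logp_worse', the model's sequence
    log-probabilities for the higher- and lower-reward translations."""
    return [p for p in pairs if p["logp_better"] - p["logp_worse"] < margin_threshold]
```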

