Tsinghua AI

54 papers from Tsinghua AI on Natural Language Processing

May 6, 2026

Tsinghua AI·2w ago·also SEU, Siemens AI

Breaking the Quality-Privacy Tradeoff in Tabular Data Generation via In-Context Learning

Tabular data synthesis no longer needs to sacrifice privacy for quality: pretraining on diverse datasets lets models generalize from limited context, breaking the traditional tradeoff.

Xinyan Han, Yan Lu, Xiaoyu Lin +5

Data Curation & Synthetic Data Natural Language Processing

May 3, 2026

2w ago·also Tsinghua AI, AgiBot

Spoken Language Identification with Pre-trained Models and Margin Loss

Margin loss fine-tuning of ECAPA-TDNNs slashes the error rate in spoken language identification by over 50%, highlighting the power of discriminative representation learning.

Zhihua Fang, Liang He, Weiwu Jiang

Architecture Design (Transformers, SSMs, MoE) Natural Language Processing Speech & Audio

Apr 30, 2026

Tsinghua AI·3w ago

DPN-LE: Dual Personality Neuron Localization and Editing for Large Language Models

LLMs can have their personalities surgically altered by tweaking just 0.5% of their neurons, preserving general capabilities while achieving competitive control.

Lifan Zheng, Xue Yang, Jiawei Chen +5

Interpretability & Mechanistic Interp Natural Language Processing

Tsinghua AI·3w ago·also SEU

FedHarmony: Harmonizing Heterogeneous Label Correlations in Federated Multi-Label Learning

Federated learning can overcome data silos, but struggles when clients have different label relationships; FedHarmony shows how to harmonize these differences, leading to better performance.

Zhi Kou, Zhiqiang Kou, Junxiang Wu +11

Data Curation & Synthetic Data Distributed Systems & Hardware Natural Language Processing

Tsinghua AI·3w ago

From Context to Skills: Can Language Models Learn from Context Skillfully?

Forget manual skill annotation: Ctx2Skill lets language models teach themselves to master complex contexts, unlocking better reasoning without human intervention.

Shuzheng Si, Haozhe Zhao, Yu Lei +11

Natural Language Processing Reasoning & Chain-of-Thought Tool Use & Agents

Apr 29, 2026

Tsinghua AI·3w ago·also CAS, Fudan, HFUT, Pengcheng Laboratory +2

Walk With Me: Long-Horizon Social Navigation for Human-Centric Outdoor Assistance

Robots can now navigate complex outdoor environments using only high-level human instructions and readily available GPS/map data, bypassing the need for expensive HD maps or limited short-horizon policies.

Lingfeng Zhang, Xiaoshuai Hao, Xizhou Bu +10

Natural Language Processing Robotics & Embodied AI

Tsinghua AI·3w ago·also Tencent AI

SPG-Codec: Exploring the Role and Boundaries of Semantic Priors in Ultra-Low-Bitrate Neural Speech Coding

Semantic priors in neural speech codecs hit a wall: their benefits plateau beyond 6 kbps, revealing a fundamental limit to improving intelligibility at higher bitrates.

Mingyu Zhao, Zijian Lin, Kun Wei +2

Architecture Design (Transformers, SSMs, MoE) Inference & Quantization Natural Language Processing +1

Tsinghua AI·3w ago

CL-bench Life: Can Language Models Learn from Real-Life Context?

Today's best language models can barely make sense of your messy group chats and fragmented digital life, achieving only 19% accuracy on a new benchmark of real-world reasoning.

Shihan Dou, Yujiong Shen, Chenhao Huang +33

Eval Frameworks & Benchmarks Natural Language Processing

Tsinghua AI·3w ago

Decoupling Knowledge and Task Subspaces for Composable Parametric Retrieval Augmented Generation

Untangling task-solving skills from factual knowledge in PRAG adapters makes them play better together, boosting performance when you combine multiple documents.

Weihang Su, Hanwen Zhang, Qingyao Ai +1

Architecture Design (Transformers, SSMs, MoE) Natural Language Processing Recommendation & Information Retrieval

Apr 28, 2026

Tsinghua AI·3w ago·also ZJU

MAIC-UI: Making Interactive Courseware with Generative UI

Educators can now create interactive STEM courseware without coding, and see a ~10-point improvement in student STEM outcomes.

Shangqing Tu, Yanjia Li, Keyu Chen +9

Code Generation & Program Synthesis Natural Language Processing Tool Use & Agents

Apr 27, 2026

3w ago·also Tsinghua AI, The Key Laboratory of Road and Traffic Engineering, UCF

Towards Lawful Autonomous Driving: Deriving Scenario-Aware Driving Requirements from Traffic Laws and Regulations

LLMs can now generate driving rules from traffic laws with significantly improved accuracy by grounding their reasoning in structured traffic scenarios.

Bowen Jian, Rongjie Yu, Hong Wang +2

Constitutional AI & AI Ethics Natural Language Processing Robotics & Embodied AI

Apr 23, 2026

Tsinghua AI·Apr 23, 2026

Provably Secure Steganography Based on List Decoding

Unlock higher-capacity covert communication with LLMs: a new steganography scheme uses list decoding to substantially outperform existing methods without sacrificing security or efficiency.

Kaiyi Pang, Minhao Bai

Natural Language Processing

Apr 22, 2026

Tsinghua AI·Apr 22, 2026

From Scene to Object: Text-Guided Dual-Gaze Prediction

LLMs can now predict where drivers look with uncanny human-like accuracy, thanks to a new dataset and architecture that grounds attention in objects, not just scenes.

Zehong Ke, Yanbo Jiang, Jinhao Li +4

Computer Vision Multimodal Models Natural Language Processing

Apr 21, 2026

Tsinghua AI·Apr 21, 2026

HoWToBench: Holistic Evaluation for LLM's Capability in Human-level Writing using Tree of Writing

LLM-as-a-judge can be made far more reliable by explicitly modeling the aggregation weights of sub-features in a tree structure, achieving near-human agreement on complex writing tasks.

Andrew Feng, Cunxiang Wang, Yu-Wei Luo +5

Eval Frameworks & Benchmarks Natural Language Processing Reasoning & Chain-of-Thought

Tsinghua AI·Apr 21, 2026·also UCL, UT Austin

Large language models perceive cities through a culturally uneven baseline

LLMs don't see cities neutrally; their perception is skewed towards a culturally uneven baseline, favoring Western perspectives.

Rong Zhao, Wanqi Liu, Zhizhou Sha +2

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Natural Language Processing

Tsinghua AI·Apr 21, 2026·also CUHK

Towards Streaming Target Speaker Extraction via Chunk-wise Interleaved Splicing of Autoregressive Language Model

Autoregressive generative models, previously unsuitable for real-time target speaker extraction, can now achieve offline-level performance in streaming scenarios thanks to a novel chunk-wise splicing technique.

Shuhai Peng, Hui Lu, Jinjiang Liu +8

Architecture Design (Transformers, SSMs, MoE) Natural Language Processing Speech & Audio

Apr 20, 2026

Tsinghua AI·Apr 20, 2026·also Fudan

SELF-EMO: Emotional Self-Evolution from Recognition to Consistent Expression

LLMs can significantly boost their emotional intelligence simply by role-playing conversations with themselves, iteratively refining their ability to both recognize and express emotions.

Shaowei Zhang, Faqiang Qian, Yan Chen +5

Eval Frameworks & Benchmarks Natural Language Processing

Tsinghua AI·Apr 20, 2026·also Kyoto

Understanding the Prompt Sensitivity

LLMs disperse similar prompts instead of clustering them, leading to significant prompt sensitivity that challenges stability and reliability.

Yang Liu, Chenhui Chu

Interpretability & Mechanistic Interp Natural Language Processing

Apr 17, 2026

Tsinghua AI·Apr 17, 2026·also BIGAI, University of Science and Technology

Revisiting a Pain in the Neck: A Semantic Reasoning Benchmark for Language Models

LLMs still struggle to understand the meaning of common phrases, idioms, and compound words, revealing critical gaps in semantic reasoning.

Yang Liu, Hongming Li, Melissa Xiaohui Qin +2

Eval Frameworks & Benchmarks Natural Language Processing

Apr 16, 2026

School of Information Science·Apr 16, 2026·also Tsinghua AI, Shanghai University of Finance and Economics

SPAGBias: Uncovering and Tracing Structured Spatial Gender Bias in Large Language Models

LLMs don't just reflect gender bias in public vs. private spaces; they encode nuanced, micro-level mappings that substantially exceed real-world distributions, shaping spatial gender narratives in unexpected ways.

Binxian Su, Haoye Lou +8

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Natural Language Processing

Apr 15, 2026

Apr 15, 2026·also Tsinghua AI, China Telecom Corporation Limited, Southwestern University of Finance and Economics, SYSU

RPS: Information Elicitation with Reinforcement Prompt Selection

RL can teach LLMs to be better interviewers, adaptively prompting users to reveal hidden information in dialogue.

Jingyao Lu, Xibo Wang, Haonan Huang +4

Natural Language Processing RLHF & Preference Learning Tool Use & Agents

Apr 14, 2026

Humanoid Robot (Shanghai) Co.·Apr 14, 2026·also Tsinghua AI, KCL, PKU, UAlberta +1

LLMs Are Not a Silver Bullet: A Case Study on Software Fairness

LLMs underperform traditional ML methods in software fairness tasks, challenging the assumption that they offer a silver bullet solution for bias mitigation.

Xinyue Li, Sixuan Li, Ying Xiao +4

Code Generation & Program Synthesis Constitutional AI & AI Ethics Natural Language Processing

Tsinghua AI·Apr 14, 2026

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

OPD's "free lunch" of dense token-level reward may be an illusion, as teacher novelty, not just higher scores, drives successful distillation.

Yuxin Zuo, Bingxiang He +8

Inference & Quantization Natural Language Processing Training Efficiency & Optimization

Tsinghua AI·Apr 14, 2026

TimeMark: A Trustworthy Time Watermarking Framework for Exact Generation-Time Recovery from AIGC

Achieve 100% accurate and forgery-proof time watermarks in LLM-generated text, finally making AI watermarking reliable enough for legal disputes.

Shangkun Che, Silin Du, Ge Gao

Constitutional AI & AI Ethics Natural Language Processing

Apr 13, 2026

Tsinghua AI·Apr 13, 2026·also HIT, Nankai University

C-ReD: A Comprehensive Chinese Benchmark for AI-Generated Text Detection Derived from Real-World Prompts

Current Chinese AI-generated text detection benchmarks are too homogeneous; C-ReD fixes this with real-world prompts and diverse LLMs, enabling better generalization.

Chenxi Qing, Junxi Wu, Yixiang Qiu +3

Data Curation & Synthetic Data Eval Frameworks & Benchmarks Natural Language Processing

Apr 13, 2026·also Tsinghua AI, NJU

HistLens: Mapping Idea Change across Concepts and Corpora

See how ideas like "democracy" or "freedom" have subtly shifted their meaning across different news sources and time periods, all within a single, comparable framework.

Yi Jing, Weiyun Qiu, Yihang Peng +1

Data Curation & Synthetic Data Natural Language Processing

Apr 13, 2026·also Tsinghua AI, ZJU

Bridging the RGB-IR Gap: Consensus and Discrepancy Modeling for Text-Guided Multispectral Detection

By explicitly modeling both consensus and discrepancy between RGB and IR data, this text-guided multispectral object detector significantly boosts performance on multispectral benchmarks.

Zhen Wang, Enhao Huang, Kangqing Shen +1

Computer Vision Multimodal Models Natural Language Processing

Tsinghua AI·Apr 13, 2026

The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping

LLMs can learn to avoid repeating mistakes by remembering and penalizing frequently recurring error patterns in past rollouts.

Enxi Wang, Yufei Gao, Weixin Zhang +3

Natural Language Processing RLHF & Preference Learning Training Efficiency & Optimization

Tsinghua AI·Apr 13, 2026

MimicLM: Zero-Shot Voice Imitation through Autoregressive Modeling of Pseudo-Parallel Speech Corpora

Forget complex disentanglement architectures or low-quality synthetic targets: MimicLM achieves superior voice imitation by cleverly using synthetic speech as the *source* and real speech as the *target* in a pseudo-parallel training setup.

Tao Feng, Yuancheng Wang, Xueyao Zhang +4

Data Curation & Synthetic Data Natural Language Processing Speech & Audio

Apr 11, 2026

Tsinghua AI·Apr 11, 2026·also HKU, Huawei, LongCat Team, Ohio State +3

Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation

Attention Sink, where Transformers fixate on seemingly irrelevant tokens, is more than just a quirk – it's a fundamental challenge impacting training, inference, and even causing hallucinations, demanding a systematic approach to understanding and mitigating its effects.

Zunhai Su, Hengyuan Zhang, Yifan Zhang +12

Architecture Design (Transformers, SSMs, MoE) Interpretability & Mechanistic Interp Natural Language Processing

Apr 9, 2026

Tsinghua AI·Apr 9, 2026·also Department of Computer Science and Technology, Penn State

Twitch Third-Party Developers' Support Seeking and Provision Practices on Discord

Twitch developers' reliance on Discord for support creates a form of "platform labor" as they bridge the gap between formal platform support and informal community assistance.

Jie Cai, Yueyan Liu, John M. Carroll +2

Code Generation & Program Synthesis Natural Language Processing Recommendation & Information Retrieval +1

CMU ML·Apr 9, 2026·also Tsinghua AI, Waterloo

ClawBench: Can AI Agents Complete Everyday Online Tasks?

Today's best AI agents can only complete 33% of common online tasks like booking appointments or filling out job applications, revealing a significant gap between current capabilities and real-world utility.

Yuxuan Zhang, Yubo Wang, Yipeng Zhu +19

Eval Frameworks & Benchmarks Natural Language Processing Tool Use & Agents

Apr 7, 2026

Apr 7, 2026·also Tsinghua AI, Aarhus University

QA-MoE: Towards a Continuous Reliability Spectrum with Quality-Aware Mixture of Experts for Robust Multimodal Sentiment Analysis

Existing multimodal sentiment analysis models crumble under real-world noise, but QA-MoE leverages uncertainty to dynamically route inputs, achieving robust performance across a continuous spectrum of data quality.

Yitong Zhu, Yuxuan Jiang, Guanxuan Jiang +3

Architecture Design (Transformers, SSMs, MoE) Multimodal Models Natural Language Processing

Tsinghua AI·Apr 7, 2026·also HKUST, SJTU, UESTC

ActivityEditor: Learning to Synthesize Physically Valid Human Mobility

Synthesizing realistic human mobility in data-scarce regions is now possible thanks to a dual-LLM-agent framework that learns physical constraints via reinforcement learning.

Chenjie Yang, Yutian Jiang, Anqi Liang +5

Data Curation & Synthetic Data Natural Language Processing Tool Use & Agents

Apr 6, 2026

Tsinghua AI·Apr 6, 2026·also Beijing Sport University

BoxComm: Benchmarking Category-Aware Commentary Generation and Narration Rhythm in Boxing

Current multimodal models can't handle the rapid-fire tactical analysis required for boxing commentary, as revealed by a new dataset and evaluation framework.

Kaiwen Wang, Rongrong Deng, Yiming Shi +2

Eval Frameworks & Benchmarks Multimodal Models Natural Language Processing

Apr 2, 2026

Tsinghua AI·Apr 2, 2026

Towards Position-Robust Talent Recommendation via Large Language Models

LLMs can now recommend talent without falling prey to position bias, thanks to a new architecture that understands candidate relationships.

Siling Du, Hongyan Liu

Natural Language Processing Recommendation & Information Retrieval

Mar 31, 2026

Tsinghua AI·Mar 31, 2026·also ByteDance, Rice

From Natural Alignment to Conditional Controllability in Multimodal Dialogue

Current multimodal dialogue systems can't capture the subtle expressiveness of human interaction, as revealed by a new benchmark dataset of movie and TV dialogues.

Zeyu Jin, Songtao Zhou, Ming Tian +4

Multimodal Models Natural Language Processing Speech & Audio

Mar 26, 2026

Tsinghua AI·Mar 26, 2026

Natural-Language Agent Harnesses

Stop burying your agent harness logic in code: NLAHs let you express it in natural language, making it portable, editable, and analyzable.

Lin Pan, Lexiao Zou, Shuo Guo +2

Eval Frameworks & Benchmarks Natural Language Processing Tool Use & Agents

Mar 19, 2026

Mar 19, 2026·also Tsinghua AI, HKU

SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing

Instruction-guided video editing can achieve impressive zero-shot performance simply by pre-training on motion-centric video restoration tasks *before* fine-tuning on paired editing data.

Xinyao Zhang, Wenkai Dong, Yuxin Song +11

Computer Vision Multimodal Models Natural Language Processing

Mar 17, 2026

Tsinghua AI·Mar 17, 2026

Parametric Social Identity Injection and Diversification in Public Opinion Simulation

LLM-based simulations of public opinion suffer from "Diversity Collapse," but injecting explicit social identity representations into hidden states can fix it.

Hexi Wang, Yujia Zhou, Bangde Du +1

Constitutional AI & AI Ethics Natural Language Processing World Models & Planning

Tsinghua AI·Mar 17, 2026·also Tencent AI

AdaMem: Adaptive User-Centric Memory for Long-Horizon Dialogue Agents

LLM agents can now leverage a unified memory framework that dynamically adapts to different question types, enabling more coherent and user-centric long-horizon dialogues.

Shannan Yan, Jingchen Ni, Leqi Zheng +7

Natural Language Processing Recommendation & Information Retrieval Tool Use & Agents

Mar 12, 2026

Mar 12, 2026·also Tsinghua AI

QChunker: Learning Question-Aware Text Chunking for Domain RAG via Multi-Agent Debate

Forget brittle retrieval: QChunker uses a question-aware multi-agent debate to restructure RAG from retrieval-augmentation to *understanding*-retrieval-augmentation, boosting performance across diverse domains.

Jihao Zhao, Daixuan Li, Shuaishuai Zu +2

Natural Language Processing Recommendation & Information Retrieval Tool Use & Agents

Mar 5, 2026

Tsinghua AI·Mar 5, 2026·also China Southern

Retrieval-Augmented Generation with Covariate Time Series

RAG4CTS achieves state-of-the-art time-series forecasting by ditching static embeddings for a hierarchical, physics-informed retrieval approach that leverages raw historical regimes.

Kenny Ye Liang, Zhongyi Pei, Huan Zhang +2

Natural Language Processing Recommendation & Information Retrieval

Mar 4, 2026

Tsinghua AI·Mar 4, 2026·also ECNU, Hebei University of Science and Technology, Yale

Evolutionary Multimodal Reasoning via Hierarchical Semantic Representation for Intent Recognition

Achieve state-of-the-art multimodal intent recognition by structuring semantics into progressively abstracted levels and dynamically refining representations through MLLM feedback.

Qianrui Zhou, Hua Xu, Yunjin Gu +4

Multimodal Models Natural Language Processing Reasoning & Chain-of-Thought

Feb 26, 2026

Microsoft Research·Feb 26, 2026·also Tsinghua AI, Beihang, CAS, Shanghai AI Lab +1

MoDora: Tree-Based Semi-Structured Document Analysis System

LLMs can now more accurately answer questions on complex documents thanks to a new system that understands layout and hierarchical relationships between document components.

Bangrui Xu, Qihang Yao +10

Computer Vision Natural Language Processing Recommendation & Information Retrieval

Feb 25, 2026

Independent Researcher·Feb 25, 2026·also Tsinghua AI

When AI Writes, Whose Voice Remains? Quantifying Cultural Marker Erasure Across World English Varieties in Large Language Models

LLMs scrub away up to 20% of culturally specific language, even while preserving the core meaning, revealing a "Semantic Preservation Paradox" that threatens linguistic diversity.

Satyam Kumar Navneet, Joydeep Chandra +1

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Natural Language Processing

Feb 23, 2026

Feb 23, 2026·also Tsinghua AI, BUPT, Shanghai University, Xiangtan

Hyper-KGGen: A Skill-Driven Knowledge Extractor for High-Quality Knowledge Hypergraph Generation

Domain-specific knowledge hypergraphs can now be extracted with significantly improved quality by dynamically learning and applying extraction skills, outperforming static few-shot learning.

Rizhuo Huang, Yifan Feng +9

Data Curation & Synthetic Data Natural Language Processing Reasoning & Chain-of-Thought +1

Feb 23, 2026·also DAMO, Tsinghua AI, Hunan, National Technology Innovation Center

FuzzySQL: Uncovering Hidden Vulnerabilities in DBMS Special Features with LLM-Driven Fuzzing

LLMs can uncover previously hidden vulnerabilities in database management systems by intelligently fuzzing obscure, system-level features that traditional fuzzers miss.

Yongxin Chen, Zhiyuan Jiang +10

Code Generation & Program Synthesis Natural Language Processing Red-Teaming & Adversarial Robustness

Feb 15, 2026

Tsinghua AI·Feb 15, 2026

HiVid: LLM-Guided Video Saliency For Content-Aware VOD And Live Streaming

LLMs can now guide video streaming optimization, outperforming traditional saliency models and human annotation in predicting content importance for both VOD and live streams.

Lianchen Jia, Tianchi Huang, Lifeng Sun

Computer Vision Multimodal Models Natural Language Processing

Feb 12, 2026

Tsinghua AI·Feb 12, 2026

PatientHub: A Unified Framework for Patient Simulation

PatientHub finally offers a standardized, reproducible framework for patient simulation, streamlining development and benchmarking across diverse methods and models.

Sahand Sabour, NG TszYam

Eval Frameworks & Benchmarks Natural Language Processing Open-Source Models & Weights
