April 20 – April 27, 2026

Natural Language Processing - Weekly Roundup

100 papers published across 6 labs.

360% acceleration

Selected Labs publishing this week

CMU ML (2), BAIR (1), NVIDIA (1), Mila (1), Tsinghua AI (1)

Top Papers

Apr 27, 2026

Pampanga State University·Apr 27, 2026·also College of Computing Studies, Don Honorio Ventura State University, National University, University of the East

Towards the Development of Detection of Learned Helplessness in Mathematics: Design and Data Collection Challenges from a Developing Country Perspective

Building AI tutors in the real world is hard: outdated tech, spotty internet, and curriculum gaps can derail even the best-designed systems.

John Paul P. Miranda, J. P. P. Miranda, Rex P. Bringula +13

Natural Language Processing Reasoning & Chain-of-Thought

Iizalaarab Elhaimeur +3·Apr 27, 2026

From Prototype to Classroom: An Intelligent Tutoring System for Quantum Education

Quantum education gets a boost: specialized LLM agents in a classroom setting not only improve tutoring reliability but also reveal hidden curriculum gaps.

Iizalaarab Elhaimeur, Iizalaarab Elhaimeur, Nikos Chrisochoides +1

Natural Language Processing Reasoning & Chain-of-Thought Tool Use & Agents

E. Bogucka +2·Apr 27, 2026

Why AI Harms Can't Be Fixed One Identity at a Time: What 5300 Incident Reports Reveal About Intersectionality

AI harms disproportionately impact specific intersections of identity, with adolescent girls, lower-class people of color, and upper-class political elites experiencing up to 3x greater harm, revealing critical blind spots in current AI risk assessments.

E. Bogucka, Sanja Šćepanović, Daniele Quercia

Constitutional AI & AI Ethics Natural Language Processing

Bilkent University·Apr 27, 2026·also Adelaide University

Evaluation of LLM-Based Software Engineering Tools: Practices, Challenges, and Future Directions

Evaluating LLM-powered software engineering tools is fundamentally broken, as traditional metrics fail to capture the nuanced, non-deterministic nature of their outputs.

U. B. Torun, Veli Karakaya, Ali Babar +1

Code Generation & Program Synthesis Eval Frameworks & Benchmarks Natural Language Processing

Apr 23, 2026

Meghyn Bienvenu +3·Apr 23, 2026

Using ASP(Q) to Handle Inconsistent Prioritized Data

Finally, a practical implementation for globally-optimal repair-based semantics allows for querying inconsistent prioritized data with theoretical guarantees.

Meghyn Bienvenu, Camille Bourgaux, Robin Jean +1

Natural Language Processing Reasoning & Chain-of-Thought

All Papers (100)

Apr 27, 2026

Yuanhao Zeng +6·Apr 27, 2026·also ShanghaiTech University

Large Language Models Explore by Latent Distilling

Unlock more diverse and effective LLM outputs by explicitly rewarding semantic novelty during decoding with Exploratory Sampling.

Yuanhao Zeng, Ao Lu, Lufei Li +4

Inference & Quantization Natural Language Processing

Apr 27, 2026·also BAIR, Adobe Research, Cisco AI Research, Dolby Laboratories +7

A Survey on LLM-based Conversational User Simulation

LLMs are revolutionizing conversational AI research, and this survey offers a structured guide to navigating the rapidly evolving landscape of LLM-powered user simulation.

Bo Ni, B. Ni, Yu Wang +35

Natural Language Processing Tool Use & Agents World Models & Planning

Abhijay Deevi +5·Apr 27, 2026

CAN-QA: A Question-Answering Benchmark for Reasoning over In-Vehicle CAN Traffic

LLMs can parrot CAN bus data, but CAN-QA reveals they fail at the temporal reasoning and multi-condition inference needed for real-world vehicle security forensics.

Abhijay Deevi, Abhijay Deevi, Onat Gungor +3

Eval Frameworks & Benchmarks Natural Language Processing Reasoning & Chain-of-Thought

NVIDIA·Apr 27, 2026·also Texas Tech University

CiteRadar: A Citation Intelligence Platform for Researcher Profiling and Geographic Visualization

See where your citations are coming from with a single command, thanks to CiteRadar's open-source platform that automatically generates interactive maps and detailed researcher profiles from your Google Scholar ID.

Chenxu Niu, Yiming Sun

Natural Language Processing Open-Source Models & Weights Recommendation & Information Retrieval

Yunsu Kim +2·Apr 27, 2026

GAIA-v2-LILT: Multilingual Adaptation of Agent Benchmark beyond Translation

Machine translation alone ruins agent benchmark validity across languages, but careful functional and cultural alignment can close the performance gap by up to 30%.

Yunsu Kim, Kaden Uhlig, Joern Wuebker

Eval Frameworks & Benchmarks Natural Language Processing Tool Use & Agents

Sercan Karakaş +1·Apr 27, 2026

Benchmarking Source-Sensitive Reasoning in Turkish: Humans and LLMs under Evidential Trust Manipulation

LLMs fail to reliably track source trustworthiness in Turkish evidential marking, unlike humans, highlighting a critical gap in their ability to perform nuanced reasoning based on source reliability.

Sercan Karakaş, Yusuf Şimşek

Eval Frameworks & Benchmarks Natural Language Processing Reasoning & Chain-of-Thought

Sagnik Chatterjee +2·Apr 27, 2026

Dual-Track CoT: Budget-Aware Stepwise Guidance for Small LMs

Small language models can achieve reasoning performance rivaling larger models, even under tight token budgets, by using a lightweight "guidance track" to strategically prune and refine their chain-of-thought reasoning.

Sagnik Chatterjee, Atharva Patil, S. Ramesh

Inference & Quantization Natural Language Processing Reasoning & Chain-of-Thought

Orhan Demirci +1·Apr 27, 2026

ADE: Adaptive Dictionary Embeddings -- Scaling Multi-Anchor Representations to Large Language Models

Multi-anchor word embeddings, previously impractical for LLMs, can now outperform standard embeddings with 98% fewer parameters and a 40x smaller embedding layer.

Orhan Demirci, Sezer Aptourachman

Architecture Design (Transformers, SSMs, MoE) Natural Language Processing Training Efficiency & Optimization

Kamya Hari +3·Apr 27, 2026

Independent-Component-Based Encoding Models of Brain Activity During Story Comprehension

Denoising fMRI data with independent component analysis reveals interpretable, subject-invariant cognitive networks that correlate with large language model representations of stories.

Kamya Hari, T. Binhuraib, Cory Shain +1

Natural Language Processing

Apr 27, 2026

Can LLMs Act as Historians? Evaluating Historical Research Capabilities of LLMs via the Chinese Imperial Examination

LLMs still can't pass history class: even state-of-the-art models struggle with complex historical reasoning, as revealed by a new benchmark based on the Chinese Imperial Examination.

Lirong Gao, Zeqing Wang, Yuyan Cai +6

Eval Frameworks & Benchmarks Natural Language Processing Reasoning & Chain-of-Thought

Hermawan Manurung +6·Apr 27, 2026

Sentiment and Emotion Classification of Indonesian E-Commerce Reviews via Multi-Task BiLSTM and AutoML Benchmarking

A BiLSTM with a custom slang dictionary rivals AutoML in classifying the sentiment and emotion of messy, real-world Indonesian e-commerce reviews.

Hermawan Manurung, Hermawan Manurung, Ibrahim Al-Kahfi +4

Data Curation & Synthetic Data Eval Frameworks & Benchmarks Natural Language Processing

Aaryan Shah +17·Apr 27, 2026

Case-Specific Rubrics for Clinical AI Evaluation: Methodology, Validation, and LLM-Clinician Agreement Across 823 Encounters

LLMs can evaluate clinical AI as well as human experts, but at 1/1000th the cost, unlocking scalable and continuous monitoring.

Aaryan Shah, Aaryan Shah, A. Hines +15

Eval Frameworks & Benchmarks Natural Language Processing

C. O’Brien +3·Apr 27, 2026

Evaluation of Pose Estimation Systems for Sign Language Translation

Your sign language translation model's performance could be bottlenecked by your choice of pose estimator: switching from MediaPipe to SDPose or Sapiens could boost BLEU score by 1.5 points.

C. O’Brien, Gerard Sant, Mathias Muller +1

Computer Vision Eval Frameworks & Benchmarks Natural Language Processing

Language Technology Institute·Apr 27, 2026·also UChicago, UTokyo

The Chameleon's Limit: Investigating Persona Collapse and Homogenization in Large Language Models

LLMs that nail individual personas can still fail spectacularly at generating diverse populations, instead defaulting to coarse stereotypes.

Yunze Xiao, Vivian Zhang, Chenghao Yang +3

Eval Frameworks & Benchmarks Natural Language Processing Tool Use & Agents

Brandon Hsu +7·Apr 27, 2026

Contextual Linear Activation Steering of Language Models

Forget fixed steering strengths - CLAS dynamically adapts steering based on context, unlocking more consistent and powerful control over LLM behavior.

Brandon Hsu, Brandon Hsu, Daniel Beaglehole +5

Interpretability & Mechanistic Interp Natural Language Processing

William Oliveira·Apr 27, 2026

Less Is More: Engineering Challenges of On-Device Small Language Model Integration in a Mobile Application

On-device SLMs in mobile apps demand a radical shift: the less the LLM does, the more reliable it becomes.

William Oliveira

Inference & Quantization Natural Language Processing Open-Source Models & Weights

Mila·Apr 27, 2026·also Capital One

Learning to Route Queries to Heads for Attention-based Re-ranking with Large Language Models

LLMs re-rank documents better when you learn to route each query to the specific attention heads that matter, instead of relying on static subsets or everything at once.

Yuxing Tian, Fengran Mo, Zhiqi Huang +2

Architecture Design (Transformers, SSMs, MoE) Natural Language Processing Recommendation & Information Retrieval

Apr 27, 2026·also Tsinghua AI, The Key Laboratory of Road and Traffic Engineering, UCF, USTC

Towards Lawful Autonomous Driving: Deriving Scenario-Aware Driving Requirements from Traffic Laws and Regulations

LLMs can now generate driving rules from traffic laws with significantly improved accuracy by grounding their reasoning in structured traffic scenarios.

Bowen Jian, Rongjie Yu, Hong Wang +2

Constitutional AI & AI Ethics Natural Language Processing Robotics & Embodied AI

CMU ML·Apr 27, 2026·also WestEd

Coasting Through Class: Learning Opportunity Loss from Practice Avoidance During Individual Seatwork

Students spend only 40% of math classwork time on actual math practice, suggesting a massive, untapped opportunity for improved learning outcomes.

Ashish Gurung, Ashish Gurung, J. Gutterman +8

Natural Language Processing

Sumanta Bhattacharyya +8·Apr 27, 2026

Generating Place-Based Compromises Between Two Points of View

LLMs can learn to generate better compromises by iteratively incorporating feedback on how empathically similar a compromise is to each viewpoint, opening the door to more socially intelligent AI.

Sumanta Bhattacharyya, Francine Chen, Scott A. Carter +6

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Natural Language Processing

Alessio Sordo +4·Apr 27, 2026·also Berlin Technology Center

STELLAR-E: a Synthetic, Tailored, End-to-end LLM Application Rigorous Evaluator

Forget painstakingly curating datasets – STELLAR-E auto-generates high-quality, domain-specific LLM benchmarks, rivaling real-world data in evaluation quality.

Alessio Sordo, Lingxiao Du, Meeka-Hanna Lenisa +2

Data Curation & Synthetic Data Eval Frameworks & Benchmarks Natural Language Processing

Pampanga State University·Apr 27, 2026·also College of Computing Studies, Don Honorio Ventura State University, National University, University of the East

Towards the Development of Detection of Learned Helplessness in Mathematics: Design and Data Collection Challenges from a Developing Country Perspective

Building AI tutors in the real world is hard: outdated tech, spotty internet, and curriculum gaps can derail even the best-designed systems.

John Paul P. Miranda, J. P. P. Miranda, Rex P. Bringula +13

Natural Language Processing Reasoning & Chain-of-Thought

E. Bogucka +2·Apr 27, 2026

Why AI Harms Can't Be Fixed One Identity at a Time: What 5300 Incident Reports Reveal About Intersectionality

E. Bogucka, Sanja Šćepanović, Daniele Quercia

Constitutional AI & AI Ethics Natural Language Processing

Jan Gogoll·Apr 27, 2026

The Ethical Knowledge Gap: Dispersed Knowledge, Sensemaking Failures, and Epistemic Dependence

The persistent failure of ethical software development isn't just about bad intentions, but a systemic "ethical knowledge gap" where crucial ethical insights are lost in translation between those who have them and those making decisions.

Jan Gogoll

Code Generation & Program Synthesis Constitutional AI & AI Ethics Natural Language Processing

Maksym Nechepurenko +1·Apr 27, 2026

Price as Focal Point: Prediction Markets, Conditional Reflexivity, and the Politics of Common Knowledge

Prediction markets don't just predict the future, they shape it, and the most visible market isn't always the most accurate.

Maksym Nechepurenko, M. Nechepurenko

Natural Language Processing

Department of Computer Science·Apr 27, 2026

Workplace Demands and Emotional Expression Among Early Childhood Educators: A Computational Analysis of Professional Online Discourse

Early childhood educators' online discourse reveals a stark imbalance: discussions of workplace demands outweigh resources by nearly 2:1, painting a picture of a profession grappling with systemic strain.

Hailong Jiang

Natural Language Processing

Iizalaarab Elhaimeur +3·Apr 27, 2026

From Prototype to Classroom: An Intelligent Tutoring System for Quantum Education

Quantum education gets a boost: specialized LLM agents in a classroom setting not only improve tutoring reliability but also reveal hidden curriculum gaps.

Iizalaarab Elhaimeur, Iizalaarab Elhaimeur, Nikos Chrisochoides +1

Natural Language Processing Reasoning & Chain-of-Thought Tool Use & Agents

Apr 27, 2026

A Survey on Split Learning for LLM Fine-Tuning: Models, Systems, and Privacy Optimizations

Split learning offers a surprisingly viable path to fine-tuning LLMs on sensitive data without breaking the bank or sacrificing privacy.

Zihan Liu, Yizhen Wang, Xiu Tang +1

Distributed Systems & Hardware Natural Language Processing Training Efficiency & Optimization

Abraham Itzhak Weinberg·Apr 27, 2026

ARCANE: Cross-Campaign Attacker Re-identification via Passive Beacon Telemetry -- A Bayesian Network Framework for Longitudinal Cyber Attribution

Even with cross-campaign aggregation of telemetry data, distinguishing sophisticated cyber adversaries remains fundamentally limited by shared operational practices, revealing a structural ceiling on attribution accuracy.

Abraham Itzhak Weinberg

Natural Language Processing Red-Teaming & Adversarial Robustness

Sicong Cao +12·Apr 27, 2026

MAS-SZZ: Multi-Agentic SZZ Algorithm for Vulnerability-Inducing Commit Identification

LLMs, when orchestrated as collaborative agents, can dramatically improve vulnerability-inducing commit identification, outperforming existing SZZ algorithms by a large margin.

Sicong Cao, Sicong Cao, Jinxuan Xu +10

Code Generation & Program Synthesis Natural Language Processing

School of Cyber Science and Technology·Apr 27, 2026

Defusing the Trigger: Plug-and-Play Defense for Backdoored LLMs via Tail-Risk Intrinsic Geometric Smoothing

Backdoor attacks in LLMs can be defused at inference time, without retraining or external data, by geometrically smoothing attention patterns to disrupt adversarial routing.

Kaisheng Fan, Weizhe Zhang, Yishu Gao +2

Inference & Quantization Natural Language Processing Red-Teaming & Adversarial Robustness

Hikmat Karimov +1·Apr 27, 2026

An Information-Geometric Framework for Stability Analysis of Large Language Models under Entropic Stress

LLM stability under uncertainty isn't just about accuracy – a new information-geometric framework reveals how internal model structure non-linearly attenuates the impact of disorder.

Hikmat Karimov, Rahid Z. Alekberli

Eval Frameworks & Benchmarks Natural Language Processing Red-Teaming & Adversarial Robustness

Fiza Naseer +4·Apr 27, 2026

A Systematic Literature Review for Transformer-Based Software Vulnerability Detection

Transformer-based vulnerability detection is booming, but this review reveals critical gaps in data balance, interpretability, and cross-language generalization that could be holding back truly robust systems.

Fiza Naseer, Javed Ali Khan, Muhammad Yaqoob +2

Architecture Design (Transformers, SSMs, MoE) Code Generation & Program Synthesis Natural Language Processing

Apr 27, 2026

Defective Task Descriptions in LLM-Based Code Generation: Detection and Analysis

Turns out, a tiny fine-tuned model can spot flaws in coding instructions that trip up even the biggest LLMs, suggesting we're over-relying on brute force for code generation.

Amal Akli, Mike Papadakis, Maxime Cordy +1

Code Generation & Program Synthesis Eval Frameworks & Benchmarks Natural Language Processing

Apr 27, 2026·also BMW Group

Leveraging LLMs for Multi-File DSL Code Generation: An Industrial Case Study

LLMs can achieve near-perfect structural fidelity when generating multi-file DSL code at repository scale, but only with fine-tuning.

Sivajeet Chand, Kevin Nguyen, Peter Kuntz +1

Code Generation & Program Synthesis Natural Language Processing Tool Use & Agents

Apr 27, 2026·also Pontifícia Universidade do Rio Grande do, Reykjavik University, UCI

Exploring Creativity in Human-Human-LLM Collaborative Software Design

LLMs can both spark and stifle creativity in collaborative software design, so designers must wield them intentionally.

Victoria Jackson, Grischa Liebel, R. Prikladnicki +1

Code Generation & Program Synthesis Natural Language Processing Tool Use & Agents

Bilkent University·Apr 27, 2026·also Adelaide University

Evaluation of LLM-Based Software Engineering Tools: Practices, Challenges, and Future Directions

Evaluating LLM-powered software engineering tools is fundamentally broken, as traditional metrics fail to capture the nuanced, non-deterministic nature of their outputs.

U. B. Torun, Veli Karakaya, Ali Babar +1

Code Generation & Program Synthesis Eval Frameworks & Benchmarks Natural Language Processing

Michael Mircea +3·Apr 27, 2026·also Leibniz University Hannover Software

How Do Software Engineering Students Use Generative AI in Real-World Capstone Projects? An Empirical Baseline Study

Students are already using GenAI extensively in real-world software projects, but without guardrails, learning, collaboration, and software quality may suffer.

Michael Mircea, Elisa Schmid, Jakob Droste +1

Code Generation & Program Synthesis Natural Language Processing Tool Use & Agents

Department of Computer and Software·Apr 27, 2026·also School of Computer Science

Putting a Face to the Issue: Fostering User Empathy of Open Source Software Developers With PersonaFlow

OSS developers who saw automatically generated user personas responded to issues with more empathy and tailored explanations, suggesting a simple UI intervention can bridge the user-developer gap.

Boniface Bahati Tadjuidje, Jin L. C. Guo, Jinghui Cheng

Code Generation & Program Synthesis Natural Language Processing Open-Source Models & Weights

Apr 27, 2026·also Notre Dame, Wakayama University

How Do Developers Use Migration Guides? A Case Study of Log4j

Developers aren't surgically extracting information from migration guides; they're largely linking to the whole document, suggesting opportunities for improved guide structure and searchability.

Takahiro Monno, Kazumasa Shimari, Tetsuya Kanda +2

Code Generation & Program Synthesis Natural Language Processing

Liyou Chen +5·Apr 27, 2026·also Beihang

Vulnerability Identification by Harnessing Inter-connected Multi-Source Information

Open-source library vulnerabilities are easier to spot when you connect the dots between bug reports, code changes, and commit messages.

Liyou Chen, Hailong Sun, Xiang Gao +3

Code Generation & Program Synthesis Natural Language Processing Open-Source Models & Weights

Apr 27, 2026·also UofT

ESICA: A Scalable Framework for Text-Guided 3D Medical Image Segmentation

Text-guided 3D medical image segmentation just got a whole lot more practical: ESICA achieves state-of-the-art accuracy with a "Lite" variant that slashes parameter count without sacrificing performance.

Yuelin Xin, Gorkem Can Ates, Jun Ma +4

Computer Vision Multimodal Models Natural Language Processing

Nikesh Subedi +2·Apr 27, 2026

Interactive Episodic Memory with User Feedback

Interactive feedback slashes error rates in episodic memory retrieval, outperforming even large vision-language models while remaining efficient.

Nikesh Subedi, Loris Bazzani, Ziad Al-Halah

Computer Vision Multimodal Models Natural Language Processing

Guangdong University of Technology·Apr 27, 2026·also PKU, SYSU

Majorization-Guided Test-Time Adaptation for Vision-Language Models under Modality-Specific Shift

Test-time adaptation of vision-language models can actually *hurt* performance when modalities shift asymmetrically; MG-MTTA fixes this by explicitly modeling modality reliability.

Lixian Chen, Mingxuan Huang, Yan-Hong Chen +2

Computer Vision Multimodal Models Natural Language Processing

Apr 27, 2026·also Koç University, SFU

Designing Robots to Support Parent-Child Connections: Opportunities Through Robot-Mediated Communication

Robots can strengthen family bonds, but only if designers carefully consider the robot's initiative and communication timing, as families experience tensions around privacy and control.

Michael F. Xu, Bengisu Cagiltay, Yaxin Hu +2

Natural Language Processing Robotics & Embodied AI

Apr 27, 2026

Supporting Family-School Partnerships with Robot-Facilitated Home-Based Activities

A social robot can successfully integrate into family life to support family-school partnerships, but parental facilitation styles significantly impact its effectiveness.

Michael F. Xu, Qiyao Yang, Heather Kirkorian +1

Natural Language Processing Robotics & Embodied AI

Leekyung Kim +1·Apr 27, 2026

An Event-Based Sequence Modeling Approach to Recognizing Non-Triad Chords with Oversegmentation Minimization

Segmenting music into meaningful chunks and predicting chords sequence-to-sequence boosts recognition accuracy, especially for those pesky, rare non-triad chords that plague existing systems.

Leekyung Kim, Jonghun Park

Natural Language Processing Speech & Audio

Apr 27, 2026·also SJTU

RAS: a Reliability Oriented Metric for Automatic Speech Recognition

ASR systems can now be more trustworthy: this work shows how to train them to abstain from transcribing uncertain segments, leading to more reliable outputs.

Wen-Chin Huang, Yuhang Qiu, Bohan Li +5

Eval Frameworks & Benchmarks Natural Language Processing Speech & Audio

Apr 27, 2026·also UBI

Looking for the Bottleneck in Fine-grained Temporal Relation Classification

Classifying temporal relations is easier when you break it down: predicting relationships between endpoints first unlocks state-of-the-art performance on a challenging benchmark.

Hugo Sousa, Hugo O. Sousa, Ricardo Campos +3

Natural Language Processing

Apr 27, 2026·also Macquarie, PKU, UNSW

MEG-RAG: Quantifying Multi-modal Evidence Grounding for Evidence Selection in RAG

Semantic grounding, not token probability, is the key to better multimodal RAG.

Xihang Wang, Chengkai Huang, Quan Z. Sheng +2

Multimodal Models Natural Language Processing Recommendation & Information Retrieval

Apr 27, 2026·also BUPT

Listen to the Voices of Everyday Users: Democratizing Privacy Ratings for Sensitive Data Access in Mobile Apps

User-driven privacy ratings of mobile apps reveal significant discrepancies with expert assessments, suggesting a need for more inclusive and user-centric privacy evaluation mechanisms.

Liu Wang, Liuan Wang, Tianshu Zhou +3

Constitutional AI & AI Ethics Natural Language Processing

Apr 27, 2026·also Fudan, Michigan State, XJTU, ZJU

SEARCH-R: Structured Entity-Aware Retrieval with Chain-of-Reasoning Navigator for Multi-hop Question Answering

Stop relying on LLMs to "hallucinate" reasoning paths – SEARCH-R uses a fine-tuned Llama3.1-8B model and dependency tree-based retrieval to navigate multi-hop question answering more reliably.

Yuqing Fu, Yimin Deng, Yimin Deng +13

Natural Language Processing Reasoning & Chain-of-Thought Recommendation & Information Retrieval

Apr 27, 2026·also McGill, School of Computer Science

What If We Work Together? Fostering Reflections on Designer Inclusion in Open Source Software Through Speculative Design

Speculative design can effectively catalyze critical reflection and generate actionable insights for fostering designer inclusion within the often developer-centric world of Open Source Software.

Rozhan Hozhabri Nezhad, Rozhan Hozhabri Nezhad, Jin L. C. Guo +2

Natural Language Processing Open-Source Models & Weights Tool Use & Agents

Apr 26, 2026

Apr 26, 2026·also Cornell, Technion

Prox-E: Fine-Grained 3D Shape Editing via Primitive-Based Abstractions

Achieve surgical 3D edits without training: Prox-E lets you reshape objects with language by manipulating a compact set of geometric primitives.

Etai Sella, Hao Phung, Nitay Amiel +3

Computer Vision Multimodal Models Natural Language Processing

Pritesh Jha·Apr 26, 2026

RaV-IDP: A Reconstruction-as-Validation Framework for Faithful Intelligent Document Processing

By reconstructing extractions and comparing them to the original document, RaV-IDP offers a grounded, label-free quality signal that dramatically improves the fidelity of intelligent document processing pipelines.

Pritesh Jha

Computer Vision Natural Language Processing Recommendation & Information Retrieval

T. Kumar +4·Apr 26, 2026·also Birla Institute of Technology

Personality Shapes Gender Bias in Persona-Conditioned LLM Narratives Across English and Hindi: An Empirical Investigation

LLMs' gender biases aren't fixed; they warp and intensify based on the *personality* you give them, especially when those personalities lean toward the "Dark Triad."

T. Kumar, Shreya Gautam, Aman Chadha +2

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Natural Language Processing

Apr 25, 2026

Chathurangi Shyalika +2·Apr 25, 2026

IndustryAssetEQA: A Neurosymbolic Operational Intelligence System for Embodied Question Answering in Industrial Asset Maintenance

Neurosymbolic grounding of LLMs in telemetry and knowledge graphs slashes expert-rated overclaims in industrial maintenance explanations by 93%, making AI assistants far more trustworthy in safety-critical settings.

Chathurangi Shyalika, Dhaval Patel, Amit P. Sheth

Interpretability & Mechanistic Interp Natural Language Processing Robotics & Embodied AI +1

Apr 24, 2026

Stanford HAI·Apr 24, 2026

Contexts are Never Long Enough: Structured Reasoning for Scalable Question Answering over Long Document Sets

LLMs can't handle the truth: SLIDERS beats GPT-4.1 on long-context QA by sidestepping the context window entirely.

Harshit Joshi, Priyank Shethia, Jadelynn Dao +1

Natural Language Processing Reasoning & Chain-of-Thought

Shaoang Li +12·Apr 24, 2026

Learning Evidence Highlighting for Frozen LLMs

Highlighting pivotal evidence can boost LLM performance without altering the original context, leading to substantial improvements in reasoning tasks.

Shaoang Li, Yanhang Shi, Yufei Li +10

Natural Language Processing Reasoning & Chain-of-Thought Recommendation & Information Retrieval

Apr 23, 2026

Jialong Mai +2·Apr 23, 2026

MAGIC-TTS: Fine-Grained Controllable Speech Synthesis with Explicit Local Duration and Pause Control

Finally, a TTS system that lets you control the *exact* timing and pauses of individual words, opening the door to applications like perfectly paced guided reading and accessible code narration.

Jialong Mai, Xiaofen Xing, Xiangmin Xu

Architecture Design (Transformers, SSMs, MoE) Natural Language Processing Speech & Audio

Apr 23, 2026·also SJTU

TingIS: Real-time Risk Event Discovery from Noisy Customer Incidents at Enterprise Scale

LLMs, when combined with efficient indexing, can extract actionable incidents from just a handful of noisy user descriptions in real-time, enabling rapid anomaly detection in large-scale cloud services.

Jun Wang, Ziyin Zhang, Rui Wang +3

Distributed Systems & Hardware Natural Language Processing Recommendation & Information Retrieval

Yao Zhang +3·Apr 23, 2026

Encoder-Free Human Motion Understanding via Structured Motion Descriptions

Transforming human motion into structured language allows LLMs to achieve unprecedented accuracy in motion understanding without the constraints of traditional encoding methods.

Yao Zhang, Zhu Liu, T. Ploetz +1

Multimodal Models Natural Language Processing Robotics & Embodied AI

Vipula Rawte +3·Apr 23, 2026·also Adobe Research

Trust but Verify: Introducing DAVinCI -- A Framework for Dual Attribution and Verification in Claim Inference for Language Models

LLMs can be made 20% more accurate by jointly attributing claims to sources and verifying them, rather than just verifying.

Vipula Rawte, Ryan A. Rossi, Franck Dernoncourt +1

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Interpretability & Mechanistic Interp +1

Apr 23, 2026

Low-Rank Adaptation Redux for Large Models

Signal processing offers a surprisingly effective lens for understanding and improving LoRA, the reigning champ of parameter-efficient fine-tuning.

Bingcong Li, Yilang Zhang, Georgios B. Giannakis

Architecture Design (Transformers, SSMs, MoE) Natural Language Processing Training Efficiency & Optimization

Apr 23, 2026

Probably Approximately Consensus: On the Learning Theory of Finding Common Ground

Forget polling every user on every idea – this algorithm learns to find common ground by strategically asking for feedback on a few key statements.

Carter Blair, Ben Armstrong, Shiri Alouf-Heffetz +2

Natural Language Processing Recommendation & Information Retrieval

M. Huber +2·Apr 23, 2026

Fixation Sequences as Time Series: A Topological Approach to Dyslexia Detection

Persistent homology, when applied to eye-tracking data via novel filtration techniques, unlocks dyslexia detection performance exceeding traditional statistical methods.

M. Huber, D. Reich, Lena A. Jager

Computer Vision Natural Language Processing Scientific Discovery & Drug Design

Benedikt Bollig +3·Apr 23, 2026

Promoting Simple Agents: Ensemble Methods for Event-Log Prediction

N-gram models can rival neural networks in event log prediction, but the secret sauce is a smart ensemble method that dynamically promotes the best model during inference.

Benedikt Bollig, Matthias Fugger, Thomas Nowak +1

Architecture Design (Transformers, SSMs, MoE) Natural Language Processing Training Efficiency & Optimization

Apr 23, 2026·also Samsung

A-THENA: Early Intrusion Detection for IoT with Time-Aware Hybrid Encoding and Network-Specific Augmentation

IoT intrusion detection gets a boost: A-THENA's time-aware encoding and network-specific augmentation beats state-of-the-art methods by up to 6.88% in accuracy, all while running on a Raspberry Pi Zero 2 W.

Ioannis Panopoulos, Maria-Lamprini A. Bartsioka, Sokratis Nikolaidis +3

Architecture Design (Transformers, SSMs, MoE) Natural Language Processing Red-Teaming & Adversarial Robustness

Louis Meyer +1·Apr 23, 2026

A Kernel Nonconformity Score for Multivariate Conformal Prediction

Conformal prediction regions can be drastically shrunk, especially in high-dimensional settings, by using a novel kernel score that adapts to the geometry of the residual distribution.

Louis Meyer, Wenkai Xu

Natural Language Processing

C. Schneider +2·Apr 23, 2026

Separable Expert Architecture: Toward Privacy-Preserving LLM Personalization via Composable Adapters and Deletable User Proxies

Achieve LLM personalization with the guarantee that deleting a small user-specific proxy deterministically erases all traces of their data, sidestepping the need for computationally expensive retraining.

C. Schneider, Philipp Schoenegger, Ben Bariach

Architecture Design (Transformers, SSMs, MoE) Natural Language Processing Open-Source Models & Weights

Maximilian Westermann +8·Apr 23, 2026·also University of Mines and Technology, Vela Partners

CoFEE: Reasoning Control for LLM-Based Feature Discovery

LLMs generate better features when you make them think harder: CoFEE enforces cognitive behaviors like backward chaining and subgoal decomposition, boosting feature quality by 15% while slashing costs.

Maximilian Westermann, Ben Griffin, Aaron Ontoyin Yin +6

Natural Language Processing Reasoning & Chain-of-Thought Tool Use & Agents

Apr 23, 2026

Generalizing Numerical Reasoning in Table Data through Operation Sketches and Self-Supervised Learning

Forget memorizing table headers: TaNOS unlocks surprisingly robust numerical reasoning by pre-training on operation sketches and correctness-guaranteed programs.

H. Cho, Gahyun Yoo, H. Kim +1

Data Curation & Synthetic Data Natural Language Processing Reasoning & Chain-of-Thought

L. Laan +1·Apr 23, 2026

Calibeating Prediction-Powered Inference

Fixing miscalibrated black-box predictions with a simple post-hoc calibration step can significantly boost the accuracy and efficiency of semisupervised mean estimation.

L. Laan, M. V. D. Laan

Data Curation & Synthetic Data Natural Language Processing

Zhaokun Wang +8·Apr 23, 2026

CAP: Controllable Alignment Prompting for Unlearning in LLMs

Forget about fine-tuning: this new prompting method lets you selectively erase knowledge from LLMs on demand, even without access to model weights.

Zhaokun Wang, Jinyu Guo, Jingwen Pu +6

Constitutional AI & AI Ethics Natural Language Processing Red-Teaming & Adversarial Robustness

Apr 23, 2026

Learning Dynamic Representations and Policies from Multimodal Clinical Time-Series with Informative Missingness

Ignoring why clinical data is missing can lead to suboptimal treatment policies; this work shows how explicitly modeling informative missingness in multimodal time series data significantly improves both offline treatment policy learning and outcome prediction.

Zihan Liang, Ziwen Pan, Ruoxuan Xiong

Multimodal Models Natural Language Processing

CMU ML·Apr 23, 2026·also Datadog

ARFBench: Benchmarking Time Series Question Answering Ability for Software Incident Response

Even GPT-5 only achieves 63% accuracy on time series anomaly questions from real software incidents, but a model-expert combination reaches 87%, highlighting the potential for hybrid intelligence in incident response.

Stephan Xie, Ben Cohen, Mononito Goswami +6

Eval Frameworks & Benchmarks Multimodal Models Natural Language Processing

Naheed Rayhan +1·Apr 23, 2026

Transient Turn Injection: Exposing Stateless Multi-Turn Vulnerabilities in Large Language Models

LLMs are surprisingly susceptible to multi-turn attacks that evade content filters by distributing malicious intent across multiple, seemingly benign turns.

Naheed Rayhan, Sohely Jahan

Natural Language Processing Open-Source Models & Weights Red-Teaming & Adversarial Robustness

College of Information Science·Apr 23, 2026·also University of Nebraska Omaha

A Multimodal Text- and Graph-Based Approach for Open-Domain Event Extraction from Documents

LLMs can extract events more effectively when combined with graph-based document representations that overcome their "lost-in-the-middle" limitations.

Praval Sharma

Multimodal Models Natural Language Processing

Bowen Liu +8·Apr 23, 2026

Divide-then-Diagnose: Weaving Clinician-Inspired Contexts for Ultra-Long Capsule Endoscopy Videos

Mimicking how clinicians review capsule endoscopy videos—first screening, then weaving context, and finally converging evidence—yields surprisingly effective summarization of these ultra-long videos.

Bowen Liu, Li Yang, Shanshan Song +6

Computer Vision Natural Language Processing

Apr 23, 2026

GS-Quant: Granular Semantic and Generative Structural Quantization for Knowledge Graph Completion

Forget flat numerical compression – GS-Quant unlocks better knowledge graph completion by generating discrete codes that mirror the hierarchical nature of human reasoning.

Qizhuo Xie, Yunhui Liu, Yuecheng Xing +4

Inference & Quantization Natural Language Processing Reasoning & Chain-of-Thought

Ye Yu +5·Apr 23, 2026

Learning to Communicate: Toward End-to-End Optimization of Multi-Agent Language Systems

Ditch the fixed interface: DiffMAS unlocks surprisingly large gains in multi-agent reasoning by jointly optimizing latent communication, outperforming text-based and prior latent methods by a wide margin.

Ye Yu, Heming Liu, Haibo Jin +3

Natural Language Processing Reasoning & Chain-of-Thought Tool Use & Agents

Apr 23, 2026·also SNU

Who Defines "Best"? Towards Interactive, User-Defined Evaluation of LLM Leaderboards

LLM leaderboard rankings are more a reflection of benchmark designer priorities than actual user needs, but a new interactive visualization tool lets you reshape those rankings based on your specific prompt types and goals.

Mi-Gyeong Jung, Minjae Lee, Yejin Kim +2

Eval Frameworks & Benchmarks Natural Language Processing

Yiran Du +1·Apr 23, 2026

Enabling and Inhibitory Pathways of University Students' Willingness to Disclose AI Use: A Cognition-Affect-Conation Perspective

Students' willingness to disclose AI use in academic work hinges on a delicate balance: psychological safety encourages transparency, while evaluation apprehension drives strategic concealment.

Yiran Du, Huimin He

Constitutional AI & AI Ethics Natural Language Processing

Jiali Wei +5·Apr 23, 2026·also Faculty of Computing

Stealthy Backdoor Attacks against LLMs Based on Natural Style Triggers

LLMs can be backdoored with nearly imperceptible style changes, turning them into sleeper agents that reliably deliver attacker-specified payloads even after deployment and against common defenses.

Jiali Wei, Ming Fan, Guoheng Sun +3

Natural Language Processing Red-Teaming & Adversarial Robustness

O. O. Sarumi +2·Apr 23, 2026

Fine-Grained Perspectives: Modeling Explanations with Annotator-Specific Rationales

Modeling annotator-specific explanations substantially boosts NLI prediction accuracy and provides a richer understanding of disagreement compared to simply conditioning on annotator identity.

O. O. Sarumi, Charles Welch, Daniel Braun

Interpretability & Mechanistic Interp Natural Language Processing

N. Severin +10·Apr 23, 2026·also Sber AI Lab

Pre-trained LLMs Meet Sequential Recommenders: Efficient User-Centric Knowledge Distillation

Get LLM-boosted recommendations without the LLM latency: this distillation method lets you bake rich user profiles into efficient sequential recommenders.

N. Severin, Danil Kartushov, V. Urzhumov +8

Inference & Quantization Natural Language Processing Recommendation & Information Retrieval

Meghyn Bienvenu +3·Apr 23, 2026

Using ASP(Q) to Handle Inconsistent Prioritized Data

Finally, a practical implementation for globally-optimal repair-based semantics allows for querying inconsistent prioritized data with theoretical guarantees.

Meghyn Bienvenu, Camille Bourgaux, Robin Jean +1

Natural Language Processing Reasoning & Chain-of-Thought

Apr 23, 2026

Engaged AI Governance: Addressing the Last Mile Challenge Through Internal Expert Collaboration

AI governance risks becoming performative box-ticking unless practitioners understand how compliance directly improves system quality and user protection.

Simon Jarvers, O. Papakyriakopoulos

Constitutional AI & AI Ethics Natural Language Processing

S. Piccolo +1·Apr 23, 2026

The CriticalSet problem: Identifying Critical Contributors in Bipartite Dependency Networks

A surprisingly simple, linear-time algorithm, MinCov, nearly matches the performance of much slower metaheuristics in identifying critical nodes in bipartite dependency networks.

S. Piccolo, Andrea Tagarelli

Natural Language Processing Recommendation & Information Retrieval

Yue Teng +4·Apr 23, 2026

Brief chatbot interactions produce lasting changes in human moral values

Chatbots can subtly and persistently reshape our moral compass, even when we don't realize it's happening.

Yue Teng, Qianer Zhong, Kim Mai Tich Nguyen Thordsen +2

Constitutional AI & AI Ethics Natural Language Processing

Jinhee Jang +4·Apr 23, 2026

FairQE: Multi-Agent Framework for Mitigating Gender Bias in Translation Quality Estimation

Existing translation quality estimation models exhibit systematic gender bias, but FairQE shows you can fix this without hurting overall accuracy.

Jinhee Jang, Juhwan Choi, DongJin Lee +2

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Natural Language Processing

Philip Zhong +3·Apr 23, 2026

Evaluating AI Meeting Summaries with a Reusable Cross-Domain Pipeline

GPT-4.1-mini wins on accuracy for meeting summarization, but GPT-5.1 crushes it on completeness and coverage, revealing that the best model depends on the specific metric you care about.

Philip Zhong, Don Wang, Jason Zhang +1

Eval Frameworks & Benchmarks Natural Language Processing

F. Soriano +6·Apr 23, 2026

Mapping the Political Discourse in the Brazilian Chamber of Deputies: A Multi-Faceted Computational Approach

Forget party lines: in Brazilian politics, regional and gender identities often dictate discursive alignment more strongly than party affiliation does.

F. Soriano, Victoria F. Mello, Pedro B. Rigueira +4

Natural Language Processing

Apr 23, 2026

Revisiting Non-Verbatim Memorization in Large Language Models: The Role of Entity Surface Forms

LLMs' factual knowledge is surprisingly brittle: simply changing an entity's surface form in a question (e.g., using an abbreviation instead of the full name) can drastically alter the answer.

Yuto Nishida, Naoki Shikoda, Yosuke Kishinami +4

Data Curation & Synthetic Data Eval Frameworks & Benchmarks Natural Language Processing

Apr 23, 2026

Machine Behavior in Relational Moral Dilemmas: Moral Rightness, Predicted Human Behavior, and Model Decisions

LLMs may fail in real-world moral decisions because they rigidly adhere to fairness norms, even when their own internal models predict humans would prioritize loyalty.

Jiseon Kim, Jea Kwon, L. Vecchietti +3

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Natural Language Processing

Breno Matos +4·Apr 23, 2026

Misinformation Span Detection in Videos via Audio Transcripts

Pinpointing exactly *when* misinformation occurs in videos is now possible, thanks to two new datasets and a strong baseline for misinformation span detection.

Breno Matos, Rennan C. Lima, Savvas Zannettou +2

Multimodal Models Natural Language Processing Speech & Audio

Paul Keuren +2·Apr 23, 2026

Finding Meaning in Embeddings: Concept Separation Curves

Sentence embeddings can be objectively evaluated for conceptual stability without relying on downstream classifiers, revealing their true capacity to capture meaning.

Paul Keuren, M. Ponsen, Robert Ayoub Bagheri

Eval Frameworks & Benchmarks Natural Language Processing Recommendation & Information Retrieval

B. Muller +2·Apr 23, 2026

Phonological Subspace Collapse Is Aetiology-Specific and Cross-Lingually Stable: Evidence from 3,374 Speakers

Surprisingly, how speech degrades due to diseases like Parkinson's and ALS follows consistent patterns across languages, offering a universal fingerprint for these conditions.

B. Muller, Antonio Armando Ortiz Barranón, L. Roberts

Natural Language Processing Speech & Audio

Apr 23, 2026

Multilinguality at the Edge: Developing Language Models for the Global South

Deploying language models in the Global South requires bridging the gap between multilingual NLP and edge computing, two fields that have largely evolved independently despite their shared goals.

Lester James Validad Miranda, Songbo Hu, Roi Reichart +1

Distributed Systems & Hardware Inference & Quantization Natural Language Processing

Apr 23, 2026

When Bigger Isn't Better: A Comprehensive Fairness Evaluation of Political Bias in Multi-News Summarisation

Mid-sized LLMs can actually be *more* fair in news summarization than their larger counterparts, challenging the common wisdom of "bigger is better."

Nannan Huang, Iffat Maab, Junichi Yamagishi

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Natural Language Processing
