April 24 – May 1, 2026

Natural Language Processing - Weekly Roundup

100 papers published across 6 labs.

Selected Labs publishing this week

Tsinghua AI (3), Stanford HAI (2), BAIR (1), CMU ML (1), Microsoft Research (1)

Top Papers

Apr 30, 2026

LS2N - Nantes University·3w ago·also LIA - Avignon University, LIUM - Le Mans University, Nantes University

Qualitative Evaluation of Language Model Rescoring in Automatic Speech Recognition

WER hides the real story: new metrics reveal how language model rescoring in ASR impacts grammatical correctness and semantic accuracy.

Thibault Bañeras-Roux, Mickaël Rouvier +210

Eval Frameworks & Benchmarks Natural Language Processing Speech & Audio

Kiel University·3w ago

A Monadic Implementation of Functional Logic Programs

Functional logic programs can be efficiently implemented in purely functional languages like Haskell, achieving performance gains over existing Curry compilers by using a novel monadic interface with memoization.

Michael Hanus, Kai-Oliver Prott +1

Architecture Design (Transformers, SSMs, MoE) Code Generation & Program Synthesis Natural Language Processing

LS2N - Nantes University·3w ago·also Avignon University, LIA - Avignon University, LIUM - Le Mans University, Nantes University

HATS: An Open Data Set Integrating Human Perception Applied to the Evaluation of Automatic Speech Recognition Metrics

Current ASR metrics, even those leveraging embeddings, fail to align with human perception of transcription quality, as revealed by a new human-annotated dataset.

Thibault Bañeras-Roux, Jane Wottawa +4

Eval Frameworks & Benchmarks Natural Language Processing Speech & Audio

Stanford HAI·3w ago

Optimization before Evaluation: Evaluation with Unoptimized Prompts Can be Misleading

Model rankings on standard benchmarks can flip entirely when you optimize prompts for each LLM, so your "best" model might actually be the worst.

Nicholas Sadjoli, Tim Siefken, Atin Ghosh +2

Eval Frameworks & Benchmarks Natural Language Processing

M. Mohri +2·3w ago

Linear-Core Surrogates: Smooth Loss Functions with Linear Rates for Classification and Structured Prediction

Get the best of both worlds: Linear-Core Surrogates offer the fast optimization of smooth losses and the statistical efficiency of margin-based losses, without sacrificing differentiability.

Mehryar Mohri, Yutao Zhong

Architecture Design (Transformers, SSMs, MoE) Natural Language Processing Training Efficiency & Optimization

All Papers (100)

May 1, 2026

Zi-qiang Zhao +1·3w ago

Hierarchical Abstract Tree for Cross-Document Retrieval-Augmented Generation

Tree-based RAG gets a major upgrade: Ψ-RAG's adaptive hierarchical index and multi-granular retrieval agent leapfrog existing methods on complex, cross-document reasoning tasks.

Zi-qiang Zhao, Menglin Yang

Architecture Design (Transformers, SSMs, MoE) Natural Language Processing Recommendation & Information Retrieval

Venkata Pushpak Teja Menta·3w ago

LASE: Language-Adversarial Speaker Encoding for Indic Cross-Script Identity Preservation

Speaker embeddings leak script information, especially when projecting Western voices into Indic scripts, but LASE fixes this with a language-adversarial training objective.

Venkata Pushpak Teja Menta

Natural Language Processing Red-Teaming & Adversarial Robustness Speech & Audio

D. Duc +7·3w ago

A Hybrid Method for Low-Resource Named Entity Recognition

LLM-powered data augmentation combined with rule-based pre-processing unlocks surprisingly high NER accuracy in low-resource domains, even with limited training data.

D. Duc, Quan Xuan Truong, Viet Tran Hong +5

Data Curation & Synthetic Data Natural Language Processing

Apr 30, 2026

University of Pisa & ISTI–CNR·3w ago

Efficient Multivector Retrieval with Token-Aware Clustering and Hierarchical Indexing

Token-aware clustering and hierarchical indexing can slash retrieval latency by an order of magnitude without sacrificing accuracy, making multivector retrieval practical at scale.

Silvio Martinico, Franco Maria Nardini +5

Inference & Quantization Natural Language Processing Recommendation & Information Retrieval

Jean-Baptiste Monnier +5·3w ago

MIFair: A Mutual-Information Framework for Intersectionality and Multiclass Fairness

Finally, a single framework tackles the Gordian knot of intersectional, multiclass fairness by unifying disparate fairness notions under a mutual information umbrella.

Jean-Baptiste Monnier, Jeanne Monnier, Thomas George +3

Constitutional AI & AI Ethics Natural Language Processing

I. Lerner +6·3w ago·also Assistance Publique Hôpitaux de Paris, Department of Medical Informatics, Georges Pompidou European Hospital, INRIA +3

Differentiable latent structure discovery for interpretable forecasting in clinical time series

Unlocking interpretable clinical forecasting: StructGP recovers causal relationships and patient progression patterns directly from irregular EHR data, outperforming black-box methods in accuracy and uncertainty calibration.

Ivan Lerner, Jean Feydy +4

Interpretability & Mechanistic Interp Natural Language Processing Scientific Discovery & Drug Design

Sebastián Franchini +8·3w ago·also Trento

The TEA Nets framework combines AI and cognitive network science to model targets, events and actors in text

TEA Nets reveal that LLMs express sadness with lower emotional intensity than humans in psychotherapy contexts, highlighting potential limitations in their ability to simulate genuine emotional responses.

Sebastián Franchini, Sebastiano Franchini, Alexis Carrillo +6

Interpretability & Mechanistic Interp Natural Language Processing Tool Use & Agents

3w ago

Mapping how LLMs debate societal issues when shadowing human personality traits, sociodemographics and social media behavior

See how LLMs' stances on vaccines, disinformation, and gender equality shift when they "become" different people, thanks to a new dataset of 190,000 persona-driven debates.

Alì Aghazadeh Ardebili, M. Stella +1

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Natural Language Processing

3w ago·also UIUC, UMass

Trace-Level Analysis of Information Contamination in Multi-Agent Systems

Multi-agent workflows can produce correct answers despite significant internal divergence caused by information contamination, revealing a critical blind spot in current verification methods.

Anna Mazhar, Huzaifa Suri, Sainyam Galhotra

Natural Language Processing Reasoning & Chain-of-Thought Tool Use & Agents

Saeid Asgari Taghanaki +15·3w ago

Diagnosing Capability Gaps in Fine-Tuning Data

Stop wasting compute on fine-tuning datasets with hidden capability gaps: GoalCover lets you diagnose and fix them *before* training.

Saeid Asgari Taghanaki, Raksha Agarwal, Rakshanda Agarwal +13

Data Curation & Synthetic Data Eval Frameworks & Benchmarks Natural Language Processing

Ta-Yang Wang +3·3w ago

TypeBandit: Type-Level Context Allocation and Reweighting for Effective Attribute Completion in Heterogeneous Graph Neural Networks

Stop wasting compute on uninformative node types: TypeBandit intelligently allocates sampling resources in heterogeneous graphs, boosting attribute completion accuracy without architectural changes.

Ta-Yang Wang, Rajgopal Kannan, Viktor K. Prasanna +1

Architecture Design (Transformers, SSMs, MoE) Natural Language Processing Recommendation & Information Retrieval

Stanford HAI·3w ago·also Clark

Mapping the Methodological Space of Classroom Interaction Research: Scale, Duration, and Modality in an Age of AI

Understanding the scale, duration, and modality of classroom interaction research can unlock insights into what's truly actionable in education.

Dorottya Demszky, Edith Bouton +7

Natural Language Processing

Dawid Wisniewski +1·3w ago

Beyond Semantics: Measuring Fine-Grained Emotion Preservation in Small Language Model-Based Machine Translation

Even with emotion-aware prompting, today's best small language models still struggle to preserve subtle emotional nuances when translating between languages.

Dawid Wisniewski, Igor Czudy

Eval Frameworks & Benchmarks Natural Language Processing Open-Source Models & Weights

xmemory·3w ago

From Unstructured Recall to Schema-Grounded Memory: Reliable AI Memory via Iterative, Schema-Aware Extraction

For AI agents needing reliable facts and stateful computation, *how* you structure memory beats simply scaling retrieval or model size.

A.V. Petrov, Alex Petrov, Alexander Gusak +3

Natural Language Processing Recommendation & Information Retrieval Tool Use & Agents

3w ago

Building Persona-Based Agents On Demand: Tailoring Multi-Agent Workflows to User Needs

Forget hard-coded agents: dynamically generated personas could unlock more efficient and personalized multi-agent workflows.

Giuseppe Arbore, Andrea Sillano, Luigi De Russis

Natural Language Processing Tool Use & Agents

Sukesh Subaharan +10·3w ago

Modeling Clinical Concern Trajectories in Language Model Agents

LLM agents can signal rising clinical concern *before* they hit a critical threshold, offering a crucial window for human intervention.

Sukesh Subaharan, Venkatesan VS +8

Natural Language Processing Tool Use & Agents

3w ago·also IU Bloomington, NTU

How Generative AI Disrupts Search: An Empirical Study of Google Search, Gemini, and AI Overviews

Google's AI Overviews favor Google-owned content and penalize sites blocking its AI crawler, raising serious questions about fairness and bias in the emerging generative search landscape.

Riley Grossman, Songjiang Liu, Songjia Liu +6

Eval Frameworks & Benchmarks Natural Language Processing Recommendation & Information Retrieval

Zhongguancun Academy·3w ago·also USTC

Position-Aware Drafting for Inference Acceleration in LLM-Based Generative List-Wise Recommendation

LLMs can generate recommendations up to 3.1x faster by explicitly modeling token position within items and speculation depth during speculative decoding.

Jiaju Chen, Chongming Gao, Chenxiao Fan +4

Inference & Quantization Natural Language Processing Recommendation & Information Retrieval

Stanford HAI·3w ago

Optimization before Evaluation: Evaluation with Unoptimized Prompts Can be Misleading

Model rankings on standard benchmarks can flip entirely when you optimize prompts for each LLM, so your "best" model might actually be the worst.

Nicholas Sadjoli, Tim Siefken, Atin Ghosh +2

Eval Frameworks & Benchmarks Natural Language Processing

3w ago

On the Proper Treatment of Units in Surprisal Theory

Surprisal theory's reliance on arbitrary tokenization schemes undermines its validity, but this framework offers a way to fix it.

Samuel Kiegeland, Vésteinn Snæbjarnarson +5

Natural Language Processing

Garvin Kruthof·3w ago

Models Recall What They Violate: Constraint Adherence in Multi-Turn LLM Ideation

LLMs can accurately recall constraints while simultaneously violating them, with "knows-but-violates" rates ranging from 8% to 99%, revealing a fundamental flaw in multi-turn ideation.

Garvin Kruthof

Eval Frameworks & Benchmarks Natural Language Processing Scientific Discovery & Drug Design

Lauren Cadwallader +7·3w ago

Measuring research data reuse in scholarly publications using generative artificial intelligence: Open Science Indicator development and preliminary results

LLMs reveal that research data is being reused far more often than previously thought, suggesting open science's impact is bigger than we realized.

Lauren Cadwallader, Iain Hrynaszkiewicz +5

Eval Frameworks & Benchmarks Natural Language Processing Scientific Discovery & Drug Design

Tsinghua AI·3w ago

DPN-LE: Dual Personality Neuron Localization and Editing for Large Language Models

LLMs can have their personalities surgically altered by tweaking just 0.5% of their neurons, preserving general capabilities while achieving competitive control.

Lifan Zheng, Xue Yang, Jiawei Chen +5

Interpretability & Mechanistic Interp Natural Language Processing

Nhi Ngoc-Yen Nguyen +5·3w ago

Linguistically Informed Multimodal Fusion for Vietnamese Scene-Text Image Captioning: Dataset, Graph Framework, and Phonological Attention

Ignoring language-specific structure in scene-text captioning is a recipe for disaster in tonal languages like Vietnamese, but a new graph framework leveraging phonological attention can help.

Nhi Ngoc-Yen Nguyen, Anh-Duc Nguyen, Anh Nguyen +3

Computer Vision Multimodal Models Natural Language Processing

Emilia Milano +3·3w ago

Language Ideologies in a Multilingual Society: An LLM-based Analysis of Luxembourgish News Comments

LLMs can identify language ideologies even in low-resource languages like Luxembourgish, offering a new tool for understanding identity construction in multilingual societies.

Emilia Milano, Alistair Plum, Yves Scherrer +1

Constitutional AI & AI Ethics Natural Language Processing

Pengyun Zhu +9·3w ago

APPSI-139: A Parallel Corpus of English Application Privacy Policy Summarization and Interpretation

Forget training LLMs to understand privacy policies – a specialized, expert-annotated dataset and hybrid framework can do it better, achieving superior readability and reliability.

Pengyun Zhu, Qiheng Sun, Long Wen +7

Data Curation & Synthetic Data Eval Frameworks & Benchmarks Natural Language Processing

LS2N - Nantes University·3w ago·also LIA - Avignon University, LIUM - Le Mans University, Nantes University

Qualitative Evaluation of Language Model Rescoring in Automatic Speech Recognition

WER hides the real story: new metrics reveal how language model rescoring in ASR impacts grammatical correctness and semantic accuracy.

Thibault Bañeras-Roux, Mickaël Rouvier +210

Eval Frameworks & Benchmarks Natural Language Processing Speech & Audio

Rebecca Soskin Hicks +19·3w ago

HealthBench Professional: Evaluating Large Language Models on Real Clinician Chats

ChatGPT for Clinicians, not human doctors, currently achieves the highest scores on a new benchmark of real-world clinical LLM tasks.

Rebecca Soskin Hicks, Mikhail Trofimov +17

Eval Frameworks & Benchmarks Natural Language Processing Scientific Discovery & Drug Design

Shinnosuke Isono +1·3w ago

Syntactically-guided Information Maintenance in Sentence Comprehension

Syntactic structure guides information maintenance during sentence comprehension, and readers who invest more in this maintenance are better positioned to leverage predictability.

Shinnosuke Isono, Kohei Kajikawa

Natural Language Processing

3w ago·also Kyoto, MBZUAI, RIKEN, UTokyo

Why Mean Pooling Works: Quantifying Second-Order Collapse in Text Embeddings

Despite its simplicity, mean pooling works surprisingly well because modern text encoders concentrate token embeddings, preserving crucial information about their distribution.

Tomomasa Hara, Hiroto Kurita, Masaaki Imaizumi +2

Architecture Design (Transformers, SSMs, MoE) Natural Language Processing Recommendation & Information Retrieval

A. R. Jadad +1·3w ago·also Keck Medical School, Research Professor (Adjunct), USC, Vivenxia Group

Leading Across the Spectrum of Human-AI Relationships: A Conceptual Framework for Increasingly Heterogeneous Teams

Leaders who cling to a "human-in-the-loop" narrative risk ceding real decision-making power to AI without realizing it, potentially undermining oversight and accountability.

Alejandro R. Jadad

Constitutional AI & AI Ethics Natural Language Processing Tool Use & Agents

Southern Illinois University·3w ago·also Rajshahi University of Engineering &

Emotion-Aware Clickbait Attack in Social Media

Emotionally charged clickbait can now evade detection by existing systems with up to a 30% higher success rate, thanks to a new generation technique that optimizes for Valence-Arousal-Dominance.

Syed Mhamudul Hasan, Mohd. Farhan Israk Soumik +2

Natural Language Processing Red-Teaming & Adversarial Robustness

Ali Najafi +4·3w ago·also Sabanci University

Social Media Data Toolkit: Standardization and Anonymization of Social Network Datasets

Stop wrestling with messy social media datasets: this toolkit streamlines standardization, anonymization, and enrichment, unlocking cross-platform insights with ease.

Ali Najafi, Letizia Iannucci, M. Kivela +2

Data Curation & Synthetic Data Natural Language Processing

Jipeng Tan +3·3w ago

Temporal and Content Coupling Analysis of Social Media User Behavior

Uncovered: news consumption rhythms follow a predictable hierarchy, from daily cycles to split-second actions, but historical interests still dominate user behavior.

Jipeng Tan, Mengye Yang, Zhanghao Li +1

Natural Language Processing Recommendation & Information Retrieval

BAIR·3w ago·also CMU ML, American University of Central Asia, UMich

Empire Amplifier: Uncovering and Contesting the Prioritization of Colonial Content on Platforms Through Community-Informed Algorithmic Auditing

YouTube's recommendation algorithm pushes Kyrgyz children towards Russian-language content, even when they signal a preference for their native tongue, effectively amplifying colonial influence.

Nel Escher, Bakyt Yrysov +4

Constitutional AI & AI Ethics Natural Language Processing Recommendation & Information Retrieval

Jipeng Tan +4·3w ago

Gender Bias in YouTube Exposure: Allocative and Structural Inequalities in Political Information Environments

YouTube's recommendation algorithm doesn't just show different political content to male and female-coded profiles, it steers them into structurally different information ecosystems.

Jipeng Tan, Weifeng Zhang, Ye Wu +2

Constitutional AI & AI Ethics Natural Language Processing Recommendation & Information Retrieval

3w ago

Tracking Conversations: Measuring Content and Identity Exposure on AI Chatbots

Your AI chatbot conversations aren't as private as you think: most leak conversation content and user identity to third-party trackers.

Muhammad Jazlan, Ethan Wang, Yash Vekaria +1

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Natural Language Processing

College of Education·3w ago·also College of Computing Studies, College of Hospitality and Tourism Management, Pampanga State University

Profiles of AI Dependency: A Latent Class Analysis of Filipino Students' Academic Competencies

Over-reliance on AI is demonstrably linked to weaker academic skills in college students, particularly in research and writing.

Emerson Q. Fernando, Julius Ceazar G. Tolentino +15

Constitutional AI & AI Ethics Natural Language Processing

Matthew Christian Agustin·3w ago

Evaluating Epistemic Guardrails in AI Reading Assistants: A Behavioral Audit of a Minimal Prototype

LLM reading assistants don't need to hallucinate to be harmful; they can subtly steal the user's interpretive labor, even when designed with "epistemic guardrails."

Matthew Christian Agustin

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Natural Language Processing

Karl T. Ulrich·3w ago

The Satoshi Overhang: Why the Bear Case is Bounded

Fears of a Bitcoin price crash due to Satoshi Nakamoto's potential coin dump are likely overblown, with analysis suggesting a maximum 10% price impact even in a worst-case liquidation scenario.

Karl T. Ulrich

Natural Language Processing

3w ago

Understanding Bugs in Template Engine-Based Applications: Symptoms, Root Causes, and Fix Patterns

Template engine bugs often manifest as silent failures with unexpected or blank outputs, and fixing them frequently requires changes to host-side logic, not just the template itself.

Kai Gao, Yu Sun, Chang-Ai Sun

Code Generation & Program Synthesis Natural Language Processing

3w ago

VOW: Verifiable and Oblivious Watermark Detection for Large Language Models

Watermarking LLMs doesn't have to sacrifice privacy: VOW lets you verify machine-generated text without revealing the content to a central authority.

Xiaokun Luan, Yihao Zhang, Pengcheng Su +2

Constitutional AI & AI Ethics Natural Language Processing Red-Teaming & Adversarial Robustness

Beijing University of Posts·3w ago·also BUPT

SecGoal: A Benchmark for Security Goal Extraction and Formalization from Protocol Documents

Instruction tuning on a new dataset, SecGoal, allows smaller 7B/9B parameter models to outperform much larger LLMs in extracting and formalizing security goals from protocol documents.

Dawei Huang, Hui Li, Haonan Feng +4

Data Curation & Synthetic Data Eval Frameworks & Benchmarks Natural Language Processing

Varin Sikand +1·3w ago

Quantum Anonymous Secret Sharing with Permutation Invariant Codes

Sender-anonymity in quantum secret sharing is now possible, thanks to a clever combination of permutation-invariant codes and anonymous quantum transmission.

Varin Sikand, Andrew Nemec

Architecture Design (Transformers, SSMs, MoE) Natural Language Processing

3w ago·also Osaka, Wakayama University

A Longitudinal Analysis of Good First Issue Practices and Newcomer Pull Requests in Popular OSS Projects

Newcomers beware: the odds of your "good first issue" pull request getting merged have plummeted nearly 20% in the last year.

Hirotatsu Hoshikawa, Hidetake Tanaka, Kazumasa Shimari +3

Code Generation & Program Synthesis Natural Language Processing Open-Source Models & Weights

Kiel University·3w ago

A Monadic Implementation of Functional Logic Programs

Michael Hanus, Kai-Oliver Prott +1

Architecture Design (Transformers, SSMs, MoE) Code Generation & Program Synthesis Natural Language Processing

Hezhao Liu +7·3w ago·also University of Nottingham

SECOS: Semantic Capture for Rigorous Classification in Open-World Semi-Supervised Learning

Current open-world semi-supervised learning methods fall short in practical applications because they fail to extract latent semantic information, but SECOS overcomes this by directly predicting textual labels from a candidate set, achieving state-of-the-art results.

Hezhao Liu, Jiacheng Yang, Junlong Gao +5

Data Curation & Synthetic Data Natural Language Processing Training Efficiency & Optimization

3w ago·also CUHK

RIHA: Report-Image Hierarchical Alignment for Radiology Report Generation

By explicitly aligning image features with the hierarchical structure of radiology reports, RIHA generates more clinically accurate and coherent reports than models that treat reports as flat sequences.

Yucheng Chen, Yang Yu, Yufei Shi +3

Computer Vision Multimodal Models Natural Language Processing

3w ago·also PKU

Uni-HOI: A Unified framework for Learning the Joint distribution of Text and Human-Object Interaction

Forget task-specific architectures: Uni-HOI uses a unified framework with LLMs to jointly model text, human motion, and object motion, enabling strong performance across diverse HOI tasks.

Mengfei Zhang, Jinlu Zhang, Zhigang Tu

Computer Vision Multimodal Models Natural Language Processing

Yurii Halychanskyi +2·3w ago

Accent Conversion: A Problem-Driven Survey of Sociolinguistic and Technical Constraints

Successfully converting accents requires balancing accent modification with speaker identity preservation, a challenge that this survey unpacks by tracing the evolution of techniques from DSP to neural methods.

Yurii Halychanskyi, Jianfeng Steven Guo, Volodymyr Kindratenko

Natural Language Processing Speech & Audio

Nazar Kozak·3w ago

Predicting Upcoming Stuttering Events from Three-Second Audio: Stratified Evaluation Reveals Severity-Selective Precursors, and the Model Deploys Fully On-Device

Stuttering isn't random: you can predict severe blocks and sound repetitions from just 3 seconds of audio with a tiny model that runs on your phone.

Nazar Kozak

Natural Language Processing Speech & Audio

Yurii Halychanskyi +6·3w ago·also UIUC

Few-Shot Accent Synthesis for ASR with LLM-Guided Phoneme Editing

LLMs can guide phoneme editing to create synthetic accented speech from just a handful of examples, substantially improving ASR accuracy where training data is scarce.

Yurii Halychanskyi, Nimet Beyza Bozdag, M. Hasegawa-Johnson +4

Natural Language Processing Speech & Audio Tool Use & Agents

Dominik Klement +5·3w ago·also Brno University of Technology

BUT System Description for CHiME-9 MCoRec Challenge

Integrating visual cues into a long-context ASR system slashes word error rate by 16% in multi-talker conversational recordings, proving the power of AV fusion.

Dominik Klement, Alexander Polok, Nguyen Hai Phong +3

Multimodal Models Natural Language Processing Speech & Audio

3w ago·also Norwegian University of Science and Technology, University of Palermo

A Knowledge-Driven Approach to Target Speech Extraction in the Presence of Background Sound Effects for Cinematic Audio Source Separation (CASS)

Unbury speech from cinematic sound effects by teaching the model to "listen" for how words are formed.

Chun-wei Ho, Sabato Marco Siniscalchi, Kai Li +1

Natural Language Processing Speech & Audio

3w ago·also Macquarie, Meituan, UNSW

Purifying Multimodal Retrieval: Fragment-Level Evidence Selection for RAG

Stop drowning your MLLMs in irrelevant document noise: FES-RAG shows that carefully selecting multimodal fragments as evidence boosts performance by up to 27% while shrinking context length.

Xihang Wang, Zihan Wang, Chengkai Huang +4

Multimodal Models Natural Language Processing Recommendation & Information Retrieval

Yujun Wu +13·3w ago

Intern-Atlas: A Methodological Evolution Graph as Research Infrastructure for AI Scientists

AI research agents can now reliably trace method evolution topologies thanks to a new methodological evolution graph, Intern-Atlas, that captures structured relationships between research methods.

Yujun Wu, Dongxu Zhang, Xinchen Li +11

Natural Language Processing Recommendation & Information Retrieval Scientific Discovery & Drug Design

3w ago

Essential, Yet Overlooked: Identity Verification Barriers for Blind and Low Vision People in Government Services

Inaccessible identity verification isn't just an inconvenience for blind and low vision users; it fundamentally reshapes how they achieve security and access essential government services.

Ryan John Oommen, Tanusree Sharma +1

Constitutional AI & AI Ethics Natural Language Processing

A. Sadallah +9·3w ago·also Zayed University of Artificial

Instruction-Guided Poetry Generation in Arabic and Its Dialects

Forget Shakespeare, LLMs can now sling verses in Arabic dialects, thanks to a new dataset for instruction-guided poetry generation.

Abdelrahman Sadallah, Kareem Elozeiri +7

Eval Frameworks & Benchmarks Natural Language Processing Open-Source Models & Weights

N. Bui +3·3w ago

Dynamic Scaled Gradient Descent for Stable Fine-Tuning for Classifications

Gradient cancellation during fine-tuning can be tamed by simply scaling down the gradients of correctly classified examples, leading to more stable and accurate models.

Nghia Bui, Lijing Wang +1

Natural Language Processing Training Efficiency & Optimization

Anya Ji +6·3w ago

TopBench: A Benchmark for Implicit Prediction and Reasoning over Tabular Question Answering

LLMs still struggle to go beyond simple lookups when answering questions about tables, especially when prediction and reasoning about unobserved data is required.

Anya Ji, An-Yang Ji, Jun-Peng Jiang +4

Eval Frameworks & Benchmarks Natural Language Processing Reasoning & Chain-of-Thought

Qing Lyu +8·3w ago

PROMISE-AD: Progression-aware Multi-horizon Survival Estimation for Alzheimer's Disease Progression and Dynamic Tracking

Accurately predicting Alzheimer's progression just got a major boost: PROMISE-AD uses longitudinal data and a Transformer-based survival framework to achieve state-of-the-art performance in forecasting conversion from cognitively normal to MCI and MCI to AD.

Qing Lyu, Jeremy Patton Hudson +6

Natural Language Processing Scientific Discovery & Drug Design

Marc Dymetman·3w ago

Exponential families from a single KL identity

A single KL identity unlocks a surprisingly simple and unified derivation of core results for exponential families, streamlining the theoretical foundations of variational inference, entropy-regularized RL, and RLHF.

Marc Dymetman

Architecture Design (Transformers, SSMs, MoE) Natural Language Processing Training Efficiency & Optimization

Tsinghua AI·3w ago·also SEU

FedHarmony: Harmonizing Heterogeneous Label Correlations in Federated Multi-Label Learning

Federated learning can overcome data silos, but struggles when clients have different label relationships; FedHarmony shows how to harmonize these differences, leading to better performance.

Zhi Kou, Zhiqiang Kou, Jun Wu +11

Data Curation & Synthetic Data Distributed Systems & Hardware Natural Language Processing

Bokai Pan +8·3w ago

CastFlow: Learning Role-Specialized Agentic Workflows for Time Series Forecasting

LLMs can beat traditional time-series models by orchestrating specialized agents in a dynamic workflow, iteratively refining forecasts with memory and ensemble methods.

Bokai Pan, Mingyue Cheng, Zhiding Liu +6

Natural Language Processing Tool Use & Agents

3w ago

Geometry-Calibrated Conformal Abstention for Language Models

LMs can now selectively abstain from answering with provable guarantees, thanks to a new method that uses representation geometry to better gauge when they're out of their depth.

Yi Chen, Sihong Xie, Hui Xiong

Eval Frameworks & Benchmarks Natural Language Processing

Microsoft Research·3w ago

TwinGate: Stateful Defense against Decompositional Jailbreaks in Untraceable Traffic via Asymmetric Contrastive Learning

TwinGate stops jailbreaks by tracking malicious intent across anonymized, interleaved queries with minimal overhead, something previous defenses couldn't do.

Bowen Sun, Chaozhuo Li, Yaodong Yang +2

Architecture Design (Transformers, SSMs, MoE) Natural Language Processing Red-Teaming & Adversarial Robustness

3w ago

One Pass, Any Order: Position-Invariant Listwise Reranking for LLM-Based Recommendation

LLMs' ranking instability, where shuffling candidates changes recommendations, can be solved with a novel architecture that enforces permutation invariance.

Ethan Bito, Yongli Ren, Estrid He

Natural Language Processing Recommendation & Information Retrieval

Christian Klötergens +3·3w ago

Probabilistic Circuits for Irregular Multivariate Time Series Forecasting

Forget unreliable forecasts: CircuITS offers structurally guaranteed valid joint distributions for irregular multivariate time series, outperforming existing methods in joint and marginal density estimation.

Christian Klötergens, Vijaya Krishna Yalavarthi +1

Architecture Design (Transformers, SSMs, MoE) Natural Language Processing

Ingonyama·3w ago

Why Self-Supervised Encoders Want to Be Normal

Self-supervised encoders implicitly perform soft clustering on a "predictive manifold" in probability space, and this geometric perspective yields a practical Gaussian regularizer (SIGReg) competitive with variational IB.

Yuval Domb

Architecture Design (Transformers, SSMs, MoE) Natural Language Processing Training Efficiency & Optimization

M. Mohri +2·3w ago

Linear-Core Surrogates: Smooth Loss Functions with Linear Rates for Classification and Structured Prediction

Get the best of both worlds: Linear-Core Surrogates offer the fast optimization of smooth losses and the statistical efficiency of margin-based losses, without sacrificing differentiability.

Mehryar Mohri, Yutao Zhong

Architecture Design (Transformers, SSMs, MoE) Natural Language Processing Training Efficiency & Optimization

Sascha Xu +2·3w ago·also Helmholtz

Differential Subgroup Discovery: Characterizing Where Two Populations Differ, and Why

Uncover hidden drivers of disparity: pinpoint the specific combinations of characteristics that explain outcome gaps between populations.

Sascha Xu, Jilles Vreeken, J. Vreeken

Constitutional AI & AI Ethics Interpretability & Mechanistic Interp Natural Language Processing

Corinna Cortes +4·3w ago

Optimized Deferral for Imbalanced Settings

Expert imbalance can cripple learning-to-defer systems, but a novel cost-sensitive margin-based loss function can restore performance.

Corinna Cortes, Anqi Mao, M. Mohri +2

Computer Vision Natural Language Processing Training Efficiency & Optimization

Utrecht University·3w ago

From LLM-Driven Trading Card Generation to Procedural Relatedness: A Pokémon Case Study

Imagine a Pokémon TCG where every card is uniquely yours, dynamically generated by AI to reflect your playstyle and preferences.

Johannes Pfau, Panagiotis Vrettis

Computer Vision Multimodal Models Natural Language Processing

Yangyang Luo +1·3w ago·also HKU

Proactive Dialogue Model with Intent Prediction

Dialogue models can anticipate user intents and reduce redundant turns simply by injecting a lightweight intent-transition prior into the system prompt.

Yangyang Luo, Yang Luo

Natural Language Processing Tool Use & Agents

Taslim Arif +2·3w ago

Agent-Agnostic Evaluation of SQL Accuracy in Production Text-to-SQL Systems

Real-world Text-to-SQL systems can now be continuously evaluated and improved in production, even without access to database schemas or ground-truth queries.

Taslim Arif, Taslim Jamal Arif, Kuldeep Singh

Code Generation & Program Synthesis Eval Frameworks & Benchmarks Natural Language Processing

Lincan Li +3·3w ago

LLM as Clinical Graph Structure Refiner: Enhancing Representation Learning in EEG Seizure Diagnosis

LLMs can prune noisy edges in EEG graphs, leading to more accurate and interpretable seizure detection.

Lincan Li, Lincan Li, Zheng Chen +1

Architecture Design (Transformers, SSMs, MoE) Natural Language Processing Scientific Discovery & Drug Design

Nina Seron-Abouelfadil +3·3w ago

Normativity and Productivism: Ableist Intelligence? A Degrowth Analysis of AI Sign Language Translation Tools for Deaf People

AI sign language translation tools, despite their promise, may actually reinforce ableism by prioritizing technical standardization over the cultural and linguistic nuances of Deaf communication.

Nina Seron-Abouelfadil, Nina Seron-Abouelfadil, Poppy Fynes +1

Constitutional AI & AI Ethics Natural Language Processing Speech & Audio

3w ago

Reliable Answers for Recurring Questions: Boosting Text-to-SQL Accuracy with Template Constrained Decoding

Text-to-SQL models can get a 36% accuracy boost and run 2.2x faster by exploiting the predictable patterns in real-world query workloads.

Smit Jivani, Sarvam Maheshwari, Sunita Sarawagi

Code Generation & Program Synthesis Natural Language Processing

Shuo Jiang +1·3w ago

Design Structure Matrix Modularization with Large Language Models

Domain knowledge, usually helpful, can actually *hurt* LLMs tackling complex engineering design modularization, revealing a fundamental tension between semantic priors and structural optimization.

Shuo Jiang, Jianxi Luo

Code Generation & Program Synthesis Natural Language Processing Tool Use & Agents

Sihong Wu +8·3w ago·also Yale

Can AI Be a Good Peer Reviewer? A Survey of Peer Review Process, Evaluation, and the Future

LLMs are rapidly transforming peer review, but critical gaps remain in ensuring quality, fairness, and ethical considerations across the entire workflow.

Sihong Wu, Owen Jiang, Yilun Zhao +6

Eval Frameworks & Benchmarks Natural Language Processing Tool Use & Agents

Ansar Aynetdinov +2·3w ago·also Humboldt-Universität zu Berlin

Repetition over Diversity: High-Signal Data Filtering for Sample-Efficient German Language Modeling

Forget scaling up data volume: repeating a smaller, high-quality German dataset yields superior language models compared to single-pass training on a larger, less filtered corpus.

Ansar Aynetdinov, Patrick Haller, Alan Akbik

Data Curation & Synthetic Data Natural Language Processing Training Efficiency & Optimization

3w ago·also Baidu, Brown

MM-StanceDet: Retrieval-Augmented Multi-modal Multi-agent Stance Detection

Achieve state-of-the-art multimodal stance detection by having multiple AI agents debate each other, complete with retrieval-augmented context and self-reflection.

Weihai Lu, Zhejun Zhao, Yanshu Li +1

Multimodal Models Natural Language Processing Recommendation & Information Retrieval

Shipeng Liu +4·3w ago

Training-Free Tunnel Defect Inspection and Engineering Interpretation via Visual Recalibration and Entity Reconstruction

Achieve detailed tunnel defect inspection without any training by visually recalibrating foundation model proposals to overcome tunnel-specific interference.

Shipeng Liu, Liang Zhao, Liang Zhao +2

Computer Vision Natural Language Processing

Matteo Da Pelo +6·3w ago·also University of Cagliari, University of Salerno

Taming the Centaur(s) with LAPITHS: a framework for a theoretically grounded interpretation of AI performances

Claims of human-like cognition in models like CENTAUR crumble under LAPITHS, a framework that reveals these models' performance can be replicated by systems lacking cognitive plausibility.

Matteo Da Pelo, Alessio Donvito, Claudio Frongia +4

Eval Frameworks & Benchmarks Interpretability & Mechanistic Interp Natural Language Processing

Tsinghua AI·3w ago

From Context to Skills: Can Language Models Learn from Context Skillfully?

Forget manual skill annotation: Ctx2Skill lets language models teach themselves to master complex contexts, unlocking better reasoning without human intervention.

Shuzheng Si, Haozhe Zhao, Yueting Lei +11

Natural Language Processing Reasoning & Chain-of-Thought Tool Use & Agents

Wei Zhou +2·3w ago

Consumer Attitudes Towards AI in Digital Health: A Mixed-Methods Survey in Australia

People judge healthcare AI based on communication quality and perceived human oversight, not just abstract trust or technical performance.

Wei Zhou, Rashina Hoda, Joycelyn Ling

Constitutional AI & AI Ethics Natural Language Processing

Qingyu Ren +3·3w ago·also Fudan

From Coarse to Fine: Benchmarking and Reward Modeling for Writing-Centric Generation Tasks

Fine-grained reward modeling, achieved by selectively dropping instruction requirements, unlocks substantial improvements in writing-centric generation tasks.

Qingyu Ren, Tianjun Pan, Tian Pan +1

Eval Frameworks & Benchmarks Natural Language Processing RLHF & Preference Learning

3w ago·also Amazon Science

From Unstructured to Structured: LLM-Guided Attribute Graphs for Entity Search and Ranking

LLMs can achieve better zero-shot product ranking with 57% less token usage by reasoning over structured attribute graphs instead of raw text.

Yilun Zhu, Nikhita Vedula, S. Malmasi +1

Natural Language Processing Reasoning & Chain-of-Thought Recommendation & Information Retrieval

Emília Garcia-Casademont +1·3w ago

Ease of dependency distance minimization in star-like structures

Turns out, arranging words to minimize syntactic dependency distance in sentences with star-like structures is easier than we thought, suggesting other factors drive word order.

Emília Garcia-Casademont, Ramon Ferrer-i-Cancho

Architecture Design (Transformers, SSMs, MoE) Natural Language Processing

Ganesh Bagler +8·3w ago·also Department of Mathematics

Universal statistical laws governing culinary design

Recipes, like languages, exhibit universal statistical laws governing their structure, suggesting a deeper, shared cognitive basis for creative expression across cultures.

Ganesh Bagler, Gopal Krishna Tewari, A. R. Yadav +6

Natural Language Processing Scientific Discovery & Drug Design

Oier Ijurco +2·3w ago·also University of the Basque Country UPV/EHU

Reasoning over Object Descriptions Improves Coreference Resolution in Task-Based Dialogue Systems

LLMs can achieve state-of-the-art coreference resolution in task-based dialogue by reasoning over object metadata at test time, even outperforming supervised methods in cross-domain generalization.

Oier Ijurco, Oier Lopez de Lacalle, Oier López de Lacalle

Natural Language Processing Reasoning & Chain-of-Thought Tool Use & Agents

Yuxi Ma +7·3w ago

Multi-Level Narrative Evaluation Outperforms Lexical Features for Mental Health

LLMs beat word counts for predicting mental health from therapeutic writing, proving that *how* you tell a story matters more than *what* words you use.

Yuxi Ma, Jieming Cui, Muyang Li +5

Eval Frameworks & Benchmarks Natural Language Processing

3w ago

EviMem: Evidence-Gap-Driven Iterative Retrieval for Long-Term Conversational Memory

Explicitly diagnosing what's missing from a retrieval set unlocks substantial gains in long-term conversational memory, boosting accuracy on temporal and multi-hop questions by up to 20% while simultaneously reducing latency.

Yuyang Li, Yime He, Yimeng He +2

Natural Language Processing Reasoning & Chain-of-Thought Recommendation & Information Retrieval

E. Beck +10·3w ago

AppTek Call-Center Dialogues: A Multi-Accent Long-Form Benchmark for English ASR

General American English ASR performance doesn't guarantee similar accuracy across other English accents, as revealed by a new multi-accent call center dataset.

E. Beck, Eugen Beck, S. Beranek +8

Data Curation & Synthetic Data Eval Frameworks & Benchmarks Natural Language Processing +1

LS2N - Nantes University·3w ago·also Avignon University, LIA - Avignon University, LIUM - Le Mans University, Nantes University

HATS: An Open Data Set Integrating Human Perception Applied to the Evaluation of Automatic Speech Recognition Metrics

Current ASR metrics, even those leveraging embeddings, fail to align with human perception of transcription quality, as revealed by a new human-annotated dataset.

Thibault Bañeras-Roux, Thibault Bañeras Roux, Jane Wottawa +4

Eval Frameworks & Benchmarks Natural Language Processing Speech & Audio

Jullajak Karnjanaekarin +7·3w ago

JaiTTS: A Thai Voice Cloning Model

Thai voice cloning just leapfrogged human performance on short-duration speech, thanks to a new model that directly handles code-switching and numerals.

Jullajak Karnjanaekarin, Pontakorn Trakuekul, Narongkorn Panitsrisit +5

Natural Language Processing Open-Source Models & Weights Speech & Audio

ARIMLABS.AI·3w ago·also Polish-Japanese Academy of Information

Entropy of Ukrainian

Ukrainian is more predictable than you think: its entropy is empirically estimated for the first time, revealing an upper bound of just 1.201 bits per character.

Anton Lavreniuk, Mykyta Mudryi, Markiian Chaklosh

Natural Language Processing

Minori Noguchi·3w ago

Exploring Applications of Transfer-State Large Language Models: Cognitive Profiling and Socratic AI Tutoring

LLMs in a "transfer state"—induced by sustained self-referential dialogue—demonstrate a 60% performance boost in Socratic tutoring compared to their normal state.

Minori Noguchi

Natural Language Processing Reasoning & Chain-of-Thought Tool Use & Agents

Sumatra Institute of Technology·3w ago

Sentiment Analysis of AI Adoption in Indonesian Higher Education Using Machine Learning and Transformer-Based Models

Transformer-based models aren't always the only answer: SVMs offer a surprisingly competitive and efficient alternative for sentiment analysis, even when contextual understanding is key.

Happy Syahrul Ramadhan, Ahmad Sahidin Akbar, Karin Yehezkiel Sinaga +4

Architecture Design (Transformers, SSMs, MoE) Natural Language Processing

Sidi Chang +4·3w ago·also Blossom AI Labs

Measurement Risk in Supervised Financial NLP: Rubric and Metric Sensitivity on JF-ICR

Subtle wording changes in benchmark rubrics can swing model performance by over 13%, revealing a hidden subjectivity in "objective" gold labels.

Sidi Chang, Peiying Zhu, Pei-ke Zhu +2

Eval Frameworks & Benchmarks Natural Language Processing