Search papers, labs, and topics across Lattice.
100 papers published across 7 labs.
Autonomous coding agents can now produce attention kernels that outperform expert-engineered ones on NVIDIA's latest Blackwell GPUs, discovering optimizations that eluded human experts.
LMs can learn to generate multiple plausible answers in a single forward pass, outperforming traditional single-answer models on tasks requiring distributional reasoning and offering a compute-efficient alternative to best-of-k sampling.
A compact masked diffusion model can rival multi-billion parameter models in a morphologically rich language like Turkish, challenging the assumption that bigger is always better.
Unlock the potential of full-duplex speech language models with Sommelier, a new open-source pipeline that tackles the messy reality of multi-speaker conversations.
Stop relying on brittle classifiers: SEAR uses LLM reasoning and a unified SQL query layer to evaluate, route, and explain decisions in LLM gateways.
LLMs' temporal reasoning crumbles in low-resource languages and rarer calendar formats, not due to a lack of reasoning ability, but because poor tokenization fragments dates and times.
Linear classification, a cornerstone of machine learning, is provably harder than we thought in high dimensions.
Forget static model averaging: dynamically weighting ensembles based on empirical performance can significantly boost accuracy and interpretability in financial loan default prediction.
Unsupervised phoneme discovery from self-supervised speech models is surprisingly viable, but language-specific challenges remain a significant hurdle.
By enforcing graph isomorphism across counterfactual inputs, UGID reveals that debiasing LLMs can be achieved by directly manipulating internal representations and attention mechanisms.
Agentic Business Process Management offers a blueprint for aligning AI agents with organizational goals, moving beyond simple automation to a framework of constrained autonomy.
Unlock automated health literacy assessment from clinical notes with HEALIX, the first publicly available dataset of its kind.
Scale up offline policy training for diffusion LLMs without breaking the bank: dTRPO slashes trajectory computation costs while boosting performance by up to 9.6% on STEM tasks.
By mimicking the brain's "global workspace," MANAR achieves linear-time attention without sacrificing performance, offering a drop-in replacement for standard attention that's both faster and potentially more creative.
Cross-lingual alignment can actually *hurt* transfer learning performance because aligning embeddings doesn't necessarily help with the downstream task.
LLM-generated survey responses can be statistically accurate yet still miss the option most preferred by humans, highlighting a critical flaw in current evaluation methods.
Skip annotating image rationales: this method transfers text-based rationales to images for explainable crisis classification, saving annotation effort while boosting performance.
Control LLMs without retraining: pinpointing just a few key neurons lets you steer outputs more reliably than attribution methods.
Automating web data integration for expert querying is now possible: SODIUM-Agent achieves a 2x accuracy boost over existing systems on a new benchmark of 105 real-world tasks.
Unleashing an LLM's inner creativity or laser-sharp logic is now as simple as turning a knob, thanks to a new distribution-matching method that avoids heuristic rewards.
Naive fine-tuning leads to catastrophic forgetting, but combining replay-based and parameter isolation strategies can actually *improve* performance over joint training in continual learning for intent classification.
Ditch one-hot vectors: representing facial action units as natural language unlocks more realistic and nuanced facial expression synthesis, especially when dealing with conflicting muscle movements.
LLMs can maintain generation quality in long-context scenarios while using significantly less context, simply by adaptively allocating context based on uncertainty.
A peer-like social robot can effectively augment literacy tutor support for newcomer children, offering personalized language and cultural learning in resource-constrained community settings.
Forget comparing models with benchmarks – mapping them by prompt-response likelihoods reveals hidden relationships between architecture, training data, and even how prompts compose.
Instruction-guided video editing can achieve impressive zero-shot performance simply by pre-training on motion-centric video restoration tasks *before* fine-tuning on paired editing data.
Open-source LLMs, when carefully prompted with representative examples, can rival or even surpass smaller commercial models like GPT-3.5-nano in resume screening tasks, offering a privacy-preserving alternative.
Hypergraph modeling of patient visits, coupled with contrastive pre-training, significantly boosts medication recommendation accuracy and safety by capturing complex relationships missed by traditional graph-based approaches.
You can predict how engaged viewers are by a video lecture, and how attracted they are to it, just by analyzing the speaker's face and voice, no audience data needed.
VLMs can now better detect when they're seeing something they shouldn't, even as the world changes around them, thanks to a new method that dynamically fuses visual and textual cues.
ChatGPT's geographic reasoning can be surprisingly brittle, with minor syntactic changes causing significant output variations and task composition revealing unexpected distributional shifts.
Get faithful and plausible natural language explanations for chest X-rays with as few as 5 human-annotated examples per diagnosis, and even boost classification accuracy in the process.
Multilingual embeddings just got a whole lot smaller and faster, with F2LLM-v2 models outperforming larger counterparts while supporting over 200 languages.
Achieve fairness without sacrificing accuracy: this post-processing ensemble method boosts fairness across diverse tasks and models.
LLMs aren't just regurgitating facts; they're actually better at generating high-quality, relation-preserving word analogies than humans.
Proactive VideoLLMs can finally be both accurate AND efficient thanks to a novel propose-match framework that decouples semantic understanding from streaming perception.
LLMs understand your intent better when you structure your prompts with "who, what, when, where, why, how, how much, and how many," but only if you present them in natural language, not raw JSON.
Despite the hype, AI decision aids have had surprisingly little impact on actual judicial decisions, revealing a critical gap between algorithmic potential and real-world application.
LLMs can introspect on their own internal emotive states during conversations with surprising accuracy, opening a new avenue for monitoring and influencing their behavior.
Citation-grounded supervised fine-tuning slashes hallucination rates to zero in encoder-decoder models, proving that explicit citation mechanisms are a potent tool for factual accuracy in dialogue systems.
Language learners find that Duolingo's general lessons are great for building a foundation, but personalized, work-related scenarios are key to achieving professional fluency.
AI-mediated video calls erode trust and confidence, even though they don't actually make people worse at spotting lies.
Forget months of manual coding: AutORAN lets you build and deploy O-RAN xApps from natural language in minutes.
Stop shoehorning ideology into a left/right box: this framework lets you model complex belief systems as interconnected networks of concepts, revealing hidden relationships in social discourse.
Human oversight can be systematically integrated into LLM-based text generation to improve accessibility, creating a traceable and auditable process.
Forget expensive multilingual annotations: this framework lets you evaluate LLMs in new languages by transferring knowledge from English, with surprisingly strong results.
Forget fixed decoding strategies – RL can learn a lightweight policy to adapt LLM sampling *at test time*, boosting summarization quality by up to 88% without retraining the LLM.
RAG systems can achieve state-of-the-art performance by explicitly preserving document topology, outperforming LLM-based chunking while simultaneously reducing token overhead.
AI can now handle the tedious copywriting and real-time Q&A for live-streaming commerce, freeing up human streamers to focus on engagement.
Learning from ranked preferences alone can be surprisingly difficult: even with access to the full ranking of actions, standard online learning guarantees break down unless the environment is sufficiently stable.
GenAI terms of service make you solely responsible for your AI's outputs, even though you have no control over how the model works.
AI washing isn't just a marketing problem; it actively harms corporate green innovation, especially for smaller players in competitive markets.
Ditch the finetuning: this training-free method uses attention scores to generate rare concepts in images with more precision and control than LLM-guided approaches.
Navigating the maze of differentially private graph release methods just got easier: a new framework helps practitioners choose the right approach, avoid common pitfalls, and make sound evaluations.
Phishing detectors, despite near-perfect accuracy, crumble under budget-constrained attacks that exploit a handful of low-cost features, revealing a critical vulnerability in real-world deployment.
Forget uniform weighting: the Exponentially Weighted Signature lets you inject temporal context and richer memory dynamics into path representations.
Stop guessing how much to pretrain vs. specialize your language model – scaling laws can now tell you the optimal compute allocation for maximizing performance on downstream tasks.
Discrete diffusion models can now generate more diverse text without sacrificing quality, thanks to a new decoding method that explicitly optimizes for diversity during beam search.
Random projections in continual learning don't have to be random: carefully guiding them with target-aligned data beats the SOTA.
Discovering hierarchical structure in sequential data is now tractable, thanks to a new model that learns online without supervision.
Spectral GNNs' purported spectral advantages for node classification are illusory; their performance actually hinges on their underlying MPNN structure, debunking the "graph Fourier transform" narrative.
LLMs in a group Turing Test still make tell-tale mistakes that betray their AI origins, even when their language skills are otherwise convincing.
Greedy off-policy learning, optimal in theory, can fail spectacularly when supplies are limited, but a simple fix—prioritizing items with high *relative* reward—can restore performance.
EWC, a classic method for continual learning, has been underperforming because it suffers from gradient vanishing and protects the wrong parameters – but a simple "Logits Reversal" trick fixes both.
Transformers can nail in-context learning for regression even when the data is a mess of non-Gaussian noise, heavy tails, and non-i.i.d. distributions, outperforming classical estimators.
Low-resource language models can get a major boost in translation quality and tokenization efficiency by using reinforcement learning to directly enforce structural constraints like sequence length and linguistic well-formedness during training.
Humans get a creativity boost from random analogies, but LLMs are already so creative that the same trick doesn't help—unless you make the analogy really, really weird.
A snapshot of the cutting-edge research uniting Theory of Mind and AI, all in one open-access collection.
Chain-of-Thought prompting can reduce LLM bias against African-American English, but only if you pick the right model.
LLMs penalize informal language in essays so severely that it's like marking a B+ down to a C+, even when explicitly told to ignore writing style.
LLMs, when used to annotate social media for human values, systematically overestimate "Openness to Change" compared to human experts, revealing a potential bias in automated value detection.
Supervised learning models can reliably outperform widely-used commercial AI text detectors, even across different languages and specialized domains like mental health.
Language model text is detectable because it misses the "long tail" of human word choice, not because it's less intelligent.
AI's attempts to provide support in online health communities can backfire by inappropriately conforming to, or outright violating, established community norms.
Move over, topic models: this method discovers functional text categories like "courtroom cross-examination" and "lyrical meditation" by learning what text *does*, not just what it's *about*.
Overstating AI capabilities in fintech erodes trust and hinders digital financial inclusion among farmers, particularly those lacking strong social networks.
Forget struggling with cryptic SQL: a new LLM fine-tuned with human preferences generates comments so good, they beat Qwen3-14B by up to 13% on standard metrics.
Escape the scripted feel of simulated conversations: Interplay trains independent user and recommender LLMs that interact in real-time, without pre-defined target items, for more realistic and diverse conversational recommendation data.
Prompting language significantly impacts the accuracy and coherence of LLM responses for maternal health queries in Telugu, with GeminiAI favoring English prompts and Perplexity AI preferring Telugu.
Stop prompt injections cold: PCFI's priority-aware runtime defense intercepts all attacks in testing with zero false positives and negligible overhead.
Aligning covariates across RCTs and observational studies via calibrated embeddings dramatically improves treatment effect estimation, especially when dealing with nonlinear relationships where traditional imputation struggles.
Stop retrieving background noise: HCQR refines RAG by generating targeted queries that seek evidence to directly support or refute candidate answers.
The chaos of MTSAD research gets a little tamer with a new taxonomy that exposes the field's hidden convergence on Transformers and reconstruction, hinting at where the next breakthroughs will come from.
LLMs can orchestrate complex wireless communication optimization tasks by translating natural language intent into actionable spatial constraints, enabling gradient-based solvers to outperform traditional methods without requiring domain-specific fine-tuning.
The crucial difference between "Human-in-the-Loop" and "Human-on-the-Loop" isn't *where* the human is, but *how* their involvement causally shapes the AI's decisions.
LLMs still struggle to reason about financial time-series data, even when they ace the textual fundamentals.
Multilingual question answering is harder than you think: even state-of-the-art RAG systems stumble when dealing with questions and knowledge in multiple languages.
Men and women see AI's impact very differently, with implications for how we teach ethics to future AI developers.
By recasting attention as a cooperative game and a statistical physics system, NeuroGame Transformer captures higher-order token dependencies, outperforming standard pairwise attention mechanisms.
LLM watermarks can now survive fine-tuning, quantization, and distillation thanks to a new method that embeds them in a stable functional subspace.
LLMs beat traditional metrics at judging PDF table extraction quality, finally offering a way to evaluate semantic correctness, not just structural similarity.
ChatGPT-4o-mini can spot design discussions in code repositories better than other models, offering a new path to automatically surfacing valuable context for software engineers.
LLMs aren't just better tools; they're forcing us to rethink the very nature of information, knowledge, and meaning in system design.
Current AI safety filters can't tell a joke from a threat, especially when humor relies on cultural context – this new benchmark exposes that blind spot.
LLMs can be actively trained to master specific knowledge domains with 50% less data and computation by focusing on what they *don't* know, not what they already do.
AI career coaches can boost short-term goal progress not just through reflection, but by making users feel more socially accountable.
Unlock the power of your favorite classifier for ordinal data: Classifier Pooling consistently beats standard methods, especially when data is scarce or categories are numerous.
Forget static embeddings: this paper shows how modeling scientific concepts as evolving complex networks reveals surprising connections between conceptual change and network topology.
Teaching LLMs to say "I don't know" is now possible via targeted SFT, slashing hallucination rates without sacrificing performance on other tasks.
LLMs can extract consistent, multidimensional semantic information directly from the phonological structure of language, revealing a non-arbitrary relationship between sound and meaning.