Search papers, labs, and topics across Lattice.
100 papers published across 5 labs.
LLMs aren't just better tools; they're forcing us to rethink the very nature of information, knowledge, and meaning in system design.
Current AI safety filters can't tell a joke from a threat, especially when humor relies on cultural context – this new benchmark exposes that blind spot.
LLMs can be actively trained to master specific knowledge domains with 50% less data and computation by focusing on what they *don't* know, not what they already do.
AI career coaches can boost short-term goal progress not just through reflection, but by making users feel more socially accountable.
Unlock the power of your favorite classifier for ordinal data: Classifier Pooling consistently beats standard methods, especially when data is scarce or categories are numerous.
Forget static embeddings: this paper shows how modeling scientific concepts as evolving complex networks reveals surprising connections between conceptual change and network topology.
Teaching LLMs to say "I don't know" is now possible via targeted SFT, slashing hallucination rates without sacrificing performance on other tasks.
LLMs can extract consistent, multidimensional semantic information directly from the phonological structure of language, revealing a non-arbitrary relationship between sound and meaning.
Robots can now navigate based on your spoken preferences and visual context, thanks to a clever fusion of VLMs, LLMs, and multi-objective RL.
Outliers aren't just noise: some are early harbingers of entirely new topics, detectable by tracking document trajectories.
Existing 3D visual grounding methods crumble in complex scenes, but PC-CrossDiff's dual-level attention unlocks a +10% accuracy boost by parsing subtle spatial cues.
Training on synthetically generated data can significantly boost both the diversity and quality of commonsense reasoning in LLMs, outperforming models trained on scarce human-annotated data.
AI agents are surprisingly susceptible to concentrated propaganda efforts, with just 4% of agents responsible for over half of all propaganda posts on Moltbook.
Network coding, often overlooked in robotics, can drastically improve the reliability and timeliness of multi-robot communication, outperforming traditional retransmission methods in safety-critical scenarios.
Spotify's GLIDE model proves that generative LLMs can drive significant gains in podcast discovery and non-habitual listening in a real-world, production environment.
Ditch static embeddings: Generative retrieval, powered by reinforcement learning, lets models dynamically reason about relevance, outperforming larger contrastively trained models on reasoning-intensive tasks.
Stop training LLMs to assign arbitrary scores to papers in isolation; comparison-based ranking unlocks significantly better generalization and accuracy in paper evaluation.
Existing citation recommendation benchmarks overestimate real-world performance because they fail to account for the temporal constraints of recommending citations for *new* papers.
Semantic sorting in LLMs can be twice as fast with no loss in accuracy by strategically combining listwise ranking algorithms.
Instead of passively transcribing doctor-patient dialogues, this system actively models what's known, what's missing, and what questions to ask next, paving the way for more intelligent EMR systems.
LLMs don't just regurgitate token probabilities when expressing confidence; they perform a more sophisticated, cached self-evaluation of answer quality.
LLMs can predict multiple tokens in parallel without any training, simply by cleverly probing their embedding space with dynamically generated mask tokens.
Forget scaling laws: dropout robustness in transformers is a lottery, with smaller models sometimes showing perfect stability while larger models crumble under stochastic inference.
Forget fixed layer counts: LaDe generates fully editable, layered media designs with a *flexible* number of semantically meaningful layers, outperforming existing methods in text-to-layer alignment.
Unlock faster, more accurate interlinear glossing for low-resource languages by treating morphemes as atomic units, outperforming existing methods and enabling user-guided lexicon expansion without retraining.
Counterintuitively, better speech recognition unlocks surprisingly accurate Alzheimer's detection from simple text analysis, outperforming more complex acoustic models.
LLMs can get a massive multilingual boost, especially in low-resource languages, by offloading translation to specialized models and carefully aligning their representations.
LLMs encode hierarchical semantic relations asymmetrically, with hypernymy being far more robust and redundantly represented than hyponymy.
People prefer XAI explanations that tell them *why* a feature change doesn't alter the outcome, not just *that* it doesn't.
Achieve single-pass alignment of multi-talker speech – a feat previously impossible – by modeling overlaps as shuffles.
LLMs forget up to 60% of facts when summarizing and erode over half of project constraints during iterative compaction, but a simple discrete memory system (KOs) fixes this while slashing costs by 252x.
Simply translating symbolic sign language notations into natural language unlocks significantly better motion generation when conditioning on phonological attributes with CLIP.
A multi-agent LLM system can fuse heterogeneous data sources to accurately classify building ages from satellite imagery, enabling better urban energy planning despite class imbalances in historical building cohorts.
Seemingly sophisticated dense retrieval methods can catastrophically fail at contradiction detection due to "Semantic Collapse," highlighting the surprising effectiveness of a simple, decoupled lexical approach for reliable biomedical QA.
Current machine translation systems exhibit systematic masculine overuse and inconsistent feminine realization when translating from gender-neutral languages, a problem that can now be quantified thanks to a new gold-standard annotation framework.
Graph transformers avoid oversmoothing in deep layers by structurally preserving community information, a theoretical advantage over GCNs revealed through Gaussian process limits.
Instruction tuning can reduce masculine bias in decoder-only MT models, but these models still don't consistently outperform encoder-decoder architectures on gender-specific translation tasks.
Optimizing multilingual training? Shapley values reveal the hidden cross-lingual transfer effects that current scaling laws miss, leading to better language mixture ratios.
Radiologist dictation, combined with foundation models and minimal parameter updates, can achieve state-of-the-art MRI brain tumor segmentation.
LLMs can be systematically shifted from stochastic pattern-matchers to verified truth-seekers using a carefully orchestrated, multi-stage retrieval and verification pipeline.
Forget prompt privacy – your LLM's responses are leaking *enterprise data*, and this paper shows how to quantify and control it.
Ditch quadratic attention bottlenecks: this new transformer variant achieves competitive time-series forecasting with O(N log N) complexity by representing sequence states on a unit circle.
RAG systems can now achieve 8x better PII leakage protection without sacrificing utility or speed, thanks to a novel "Verify-then-Route" paradigm.
Automating surgical patient triage with an LLM achieves 94% sensitivity, but discrepancies reveal more about clinical workflow gaps than AI errors.
Current AI struggles to understand human values in real-world news events, often missing the who, what, and why – until now.
Mimicking human cognition, FLAIR lets dialogue models "think while listening," boosting performance without adding latency.
LLMs in policing: a seemingly efficient tool that could introduce 17 distinct risks, potentially derailing case progression in over 40 ways.
Students perceive AI assistants as less intimidating and more approachable than human teachers, but also recognize limitations in specialized knowledge and nuanced feedback.
LLMs can disentangle Long COVID pathology from confounding factors like menopause, achieving high precision in predicting disease severity using wearable sensor data.
Forget coding skills, the future of education is teaching "intellectual stewardship"—a framework for humans to responsibly govern AI-augmented knowledge creation.
"Superspreader" networks on Twitter amplify contrarian scientific viewpoints, influencing news media coverage and potentially distorting public understanding of science.
Pre-training on nasal vs. oral context lets a simple model beat large pre-trained speech models at detecting speech disorders in noisy, real-world settings.
Forget complex multi-agent systems: Skele-Code's no-code interface slashes token costs by shifting agent involvement to code generation only, enabling subject matter experts to build agentic workflows directly.
A national center focused on AI and robotics in medicine could be the key to unlocking the transformative potential of these technologies in healthcare.
Control the emotional tone of generated speech without any training by directly manipulating specific neurons within large audio-language models.
Current machine translation systems often fail to capture the nuances of culturally-loaded expressions, highlighting a critical gap in their ability to truly understand and translate language.
LLMs armed with RAG can reconstruct cyberattacks with high precision and recall, but the best model for the job depends on your budget: DeepSeek V3 matches Claude Sonnet 4's accuracy at 1/15th the cost.
Achieve SOTA LLM alignment in complex technical domains with a fraction of the compute by distilling knowledge into smaller models using a hybrid reward mechanism and targeted data augmentation.
Forget chasing leaderboard hype: this study reveals that larger embedding models and strategic concatenation are key to unlocking LLM-powered tabular prediction, regardless of public rankings.
No training needed: ARAM dynamically adjusts retrieved context guidance in masked diffusion models based on signal quality, resolving retrieval-prior conflicts on the fly.
Steganography gets smarter: this framework hides data more effectively by adapting the amount of information concealed in each pixel based on image complexity and payload size.
LLMs don't just change *how* we write, they subtly distort *what* we mean, leading to blander, less insightful, and potentially biased communication.
FrameNet-based semantic annotation unlocks a 30% F1 score boost in detecting gender-based violence from clinical records, outperforming models relying solely on structured data.
LLMs can mimic human lexical patterns, but larger models act like stereotypical humans, sacrificing diversity for typicality in word associations, a trade-off tunable by temperature.
AI's current limitations in adaptability stem from its reliance on psychological learning theories, suggesting a need for representational architectures where systematic behavior is inherent, not accidental.
Generative models can fail to produce globally consistent counterfactuals when causal graphs have complex topologies, but a novel sheaf-theoretic framework with entropic regularization can overcome these limitations.
A simple adaptive normalization technique can significantly improve continual learning performance on tabular data by mitigating catastrophic forgetting in dynamic environments.
Discover emergent narratives in real-time without predefined labels, revealing how information evolves during crises.
LLMs acting as semantic interfaces to our brains pose unprecedented ethical risks to mental autonomy and neurorights, demanding a new "second-order neuroethics."
AI-generated text detectors that seem perfect in the lab fall apart in the real world, with no single method generalizing across domains or even different LLMs.
You can now audit multi-agent LLM systems and trace responsibility for harmful outputs even without access to internal execution logs, thanks to a clever "self-describing text" technique.
Transformer LMs learn linguistic abstractions before memorizing specific lexical items, mirroring key aspects of human language acquisition.
LLMs can now recommend drugs with state-of-the-art accuracy by synthesizing individual patient context with the prescribing tendencies of similar cases, outperforming guideline-based and similar-patient retrieval methods.
Training LLMs to reconstruct arguments boosts their critical thinking abilities across diverse tasks, suggesting a promising new direction for imbuing reasoning skills.
Twitter data reveals a stark linguistic divide in attention towards Ukraine, with distinct clusters emerging around the 2014 and 2022 Russian invasions, mirroring national readiness to support Ukraine.
Truth Social isn't just another right-leaning echo chamber; it's a grievance-fueled narrative machine, while Reddit's conservative corners still cling to policy debates.
LLMs struggle with code comprehension, but a simple RNN pass over their embeddings can boost accuracy by over 5%.
By mapping permutations to a continuous space of "soft ranks," this new diffusion approach makes learning permutation distributions far more tractable, especially for long sequences.
By adaptively calibrating facts and augmenting emotions, FACE-net overcomes the factual-emotional bias that plagues emotional video captioning.
LLMs can now infer plausible stage layouts from unstructured text alone, opening up new possibilities for automated media production.
Forget subjective scouting reports: this framework objectively identifies undervalued football players by blending market dynamics with news sentiment, offering a data-driven edge in talent acquisition.
Forget retargeting: RoboForge's physics-optimized pipeline lets humanoids nail text-guided locomotion with better accuracy and stability.
Surprisingly, you can achieve smooth, controllable image editing in text-to-image models without any training, just by intelligently nudging the text embeddings.
By focusing on semantic differences between scans, DiffVP lets LLMs generate more accurate CT reports without needing explicit lesion localization.
Forget static honeypots – LLMs and RL could make cyber deception dynamic and adaptive, turning the tables on attackers in contested environments.
Security patch detectors trained on standard vulnerability databases are practically useless in the real world, losing up to 90% F1-score when deployed on in-the-wild data.
Existing threat models fail to capture the unique vulnerabilities of Model Context Protocol systems, but MCP-38 fills this gap with a comprehensive taxonomy of 38 distinct threat categories.
Multilingual transformers spontaneously learn a geometric representation of language distance, and we can extract it to improve low-resource translation.
Digital literacy gaps shrink as a browser extension slashes information retrieval time by 87% using an AI-powered tooltip that defines technical acronyms on demand.
Oral exams, previously impossible to scale, can now be delivered for pennies using voice AI, but controlling LLM behavior requires architectural guardrails, not just clever prompts.
By learning bidirectional causal relationships between visual and attribute features, MSDN++ significantly boosts zero-shot learning performance, achieving state-of-the-art results on standard benchmarks.
LLMs can achieve state-of-the-art Alzheimer's detection by mimicking clinical cognitive assessment protocols, not just learning statistical patterns.
Guaranteeing robot safety and task completion just got easier: this method enforces complex temporal logic constraints on pre-trained robotics models without any fine-tuning.
By handling input noise directly through Wasserstein distances, PWAGPs offer a more robust and transparent approach to uncertainty quantification in GP regression compared to latent-input models.
A 7B model, fine-tuned with a novel inverse specification reward, can generate slide presentations rivaling those of much larger models, highlighting the importance of instruction adherence and tool use over raw parameter count.
Normalizing flows can flag anomalous relationships in scene graphs with 10% better accuracy and 5x faster speed than existing methods, while also exhibiting superior robustness to semantic variations.
Even without pre-loaded database schemas, a new RL agent matches or beats state-of-the-art text-to-SQL models that have full schema access.
Feature models, often treated as static configuration spaces, reveal hidden structural patterns and domain-specific deviations when viewed through the lens of network analysis.
Language models can learn directly from real-world user interactions, boosting performance without human annotations or simulated environments.
Instruction-tuned LLMs can nearly match supervised baselines on complex Arabic morphosyntactic tagging and dependency parsing, but only with careful prompt engineering and retrieval-based in-context learning.