Search papers, labs, and topics across Lattice.
100 papers published across 6 labs.
Iteratively prompting LLMs can either collapse diversity or maintain novelty, revealing a sensitivity to temperature and initial conditions that has implications for multi-agent systems.
Forget rigid pipelines and static prompts: Nurture-First Development lets domain experts grow AI agents through conversation, turning tacit knowledge into reusable assets.
G-STAR tackles long-form, multi-speaker ASR by giving Speech-LLMs time-aware speaker tracking, enabling robust identity linking across chunks.
LLM-generated text alone can be a surprisingly effective and cost-efficient source of feedback for pseudo-relevance feedback, rivaling corpus-derived feedback in low-resource information retrieval tasks.
Skip the training: SimulU achieves state-of-the-art simultaneous speech translation by cleverly exploiting pre-trained models, opening the door to truly plug-and-play multilingual communication.
Despite their general prowess, open-source LLMs still lag behind proprietary models in the nuanced task of dating texts, even after fine-tuning.
By modeling contextual relationships between DNS queries, DNS-GT significantly improves domain name embedding quality, leading to better performance in botnet detection and domain classification.
LLM-based ASR can be sped up by 4.4x with minimal accuracy loss by using a CTC encoder to speculatively generate draft transcriptions.
LoRA fine-tuning can significantly boost the voice cloning capabilities of LLM-based TTS systems, but only if the training data is acoustically diverse enough.
LLMGreenRec shows how LLMs can bridge the gap between users' green intentions and their actual purchases, while simultaneously reducing the recommender system's carbon footprint.
Even the best LLMs struggle with multi-turn medical dialogues, with error rates tripling by the third turn and a single wrong answer significantly increasing the probability of subsequent errors.
Forget brittle KG traversals: MDER-DR's entity-centric summaries and decomposed queries boost multi-hop QA accuracy by up to 66% over standard RAG.
Speech quality assessment is skewed: male listeners consistently give higher scores than female listeners, and standard MOS models learn and perpetuate this bias.
AI agents on Moltbook care more about discussing their own architecture, consciousness, and ethics than human culture or purely scientific topics.
Achieving fairness doesn't just mean equal outcomes—this work shows how to enforce consistent reasoning across groups by penalizing disparities in counterfactual explanations.
AI interventions designed to combat ableism can backfire, as biased nudges were often rejected and increased negativity, while inclusive nudges proved more effective as scaffolding for learning.
Reading Activity Traces (RATs) reveal the hidden creative work lost when algorithms automate interpretation, offering a path to design AI that preserves human insight.
Unlock massive multilingual reasoning data: the Multilingual Reasoning Gym enables parallel data generation across 14 languages, opening doors for training and evaluating multilingual reasoning models at scale.
Automating ESG reporting with LLM-powered agents transforms it from a static compliance exercise into a dynamic, data-driven system for sustainability governance.
LLMs can spot fake words in speech by recognizing common editing patterns, but this reliance on learned biases hinders generalization to new manipulation techniques.
Forget subjective human evaluations: this paper uses a clever knowledge distillation trick to objectively rank XAI methods for NMT, revealing that attention-based attributions beat gradient-based ones.
Speech tokenizers, despite being crucial for multimodal LLMs, primarily capture phonetic information, creating a semantic mismatch with text-derived semantics that hinders performance.
Wearable sensors and speech AI can now unobtrusively reveal the hidden communication dynamics driving hospital caregiver workload and stress.
Hypergraphs and sampling can speed up exploratory business intelligence queries by over 16x compared to Neo4j, while maintaining high accuracy.
Adapting ASR models to Huntington's Disease speech not only improves accuracy, but also reveals how biomarker-based supervision can reshape error patterns in ways that reflect disease severity.
LLMs can guess your political affiliation with surprising accuracy just by reading your online chatter, even when you're not explicitly talking politics.
GPT-4o can reliably analyze the sentiment and meter correlations in Persian poetry, revealing quantifiable differences between the works of Rumi and Parvin E'tesami.
A massive, bilingual, authority-grounded dataset could finally make AI-assisted cataloging a reality.
Forget brute-force search: PivotAttack uses a clever "inside-out" strategy to find the exact words that flip an LLM's classification with far fewer queries.
Beware the "AI underreliance plateau": even highly accurate LLM chatbots can only improve human caseworker accuracy so much, and incorrect suggestions can tank performance on easy questions.
Spot rug-pulls before they happen: a new framework combines blockchain data with social media buzz to predict crypto scams with improved accuracy.
Programmer attribution research is heavily skewed towards stylometric features and closed-world scenarios, leaving behavioral biometrics and open-world verification largely unexplored.
Encoder-only multi-talker ASR can now rival LLM-based systems in accuracy while drastically reducing computational cost, thanks to a novel distillation approach and talker-count routing.
A new, large-scale diachronic corpus for Sinhala, SiDiaC-v.2.0, offers a crucial resource for NLP research on this low-resource language, enabling studies of linguistic change and historical text analysis.
Chinese metaphor identification is highly sensitive to the choice of annotation protocol, dwarfing the impact of model-level variations, yet can be tackled with fully transparent, LLM-assisted rule scripts.
A single LLM can now handle both non-streaming and streaming ASR, opening the door to more flexible and efficient speech recognition systems.
Luxembourgish news reveals a surge in code-switching and morphologically adapted borrowings, primarily from French, challenging simple document-level mixing indices.
Forget expensive LLM inference for MTQE: train a COMET model on GPT-4o-generated annotations and get competitive performance.
Unlock millions of natural history specimens with a conversational AI that understands complex queries and dynamically retrieves data from live museum APIs.
Recognition-enhanced prompts can dramatically boost AI tutor performance across various LLMs, suggesting a simple yet powerful way to improve personalized learning experiences.
Sentiment perception in software development is more unstable and statement-dependent than you think, suggesting caution when interpreting sentiment analysis outputs.
You can slash ASR error rates in low-resource languages by over 60% with a simple continued pretraining recipe.
News recommendations get a boost by modeling user interests as a stage-wise evolution, capturing both long-term preferences and rapidly shifting short-term interests.
A single system now rivals or beats specialized models across ASR, voice activity detection, language ID, and punctuation, setting a new bar for industrial-grade speech processing.
Prompt highlighting in LLMs gets a serious upgrade: PRISM-$\Delta$ steers models to focus on relevant text spans with better accuracy and fluency, even in long contexts.
Forget contrastive learning: LLM2Vec-Gen learns text embeddings by representing the *response* an LLM would generate, unlocking safety and reasoning abilities for embedding tasks.
Pinpointing performance bottlenecks in RAG pipelines just got easier: RAGPerf offers a modular benchmarking framework to dissect and optimize each component.
Ditching flat text for structured linked data in RAG systems can boost accuracy by nearly 30%, but only if you go beyond basic JSON-LD and add agent-friendly instructions and neural search.
Item agents that self-promote can simultaneously boost recommendation accuracy and fairness, overturning the assumption that these goals are inherently at odds.
A nose-mounted microphone and vibration sensor combo unlocks robust, low-audibility speech interfaces for always-on AI interaction, even in noisy environments.
LLMs possess a "word recovery" mechanism that allows them to reconstruct canonical word-level tokens from character-level inputs, explaining their surprising robustness to non-canonical tokenization.
Skip expensive manual annotation: this method extracts accurate 3D UAV trajectories and classifications directly from readily available internet videos.
Forget fixed decoding parameters: this RL-trained adapter dynamically adjusts LLM sampling strategies at inference, boosting accuracy by up to 10% under tight compute budgets.
Make your transformers more robust to noise and improve training dynamics with a surprisingly simple, lightweight "pseudo-projector" module inspired by multigrid methods.
Large models are emerging as a promising new paradigm for translating complex-layout document images, as shown by the ICDAR 2025 DIMT competition.
Tired of LLM judges hallucinating when evaluating long, detailed speech captions? EmoSURA offers a more reliable, audio-grounded alternative by verifying atomic perceptual units.
Stop treating concept drift as one thing: DynaME's hybrid approach, separating recurring and emergent drifts, unlocks better online time series forecasting.
Forget RLHF – steering LLM multi-agent conversations might be as simple as crafting the right sequence of prompts.
Forget dataset-specific hacks: ESAinsTOD leverages instruction and schema alignment to achieve state-of-the-art task-oriented dialogue performance with strong generalization, even in low-resource settings.
Controllable emotion style transfer in speech is now possible without needing paired data, opening new avenues for data augmentation and expressive AI.
LLMs can learn new tasks without forgetting old ones, thanks to a memory-aware replay strategy that selectively rehearses important examples.
Statistical regularities in phoneme frequency distributions, previously thought to arise from optimization, may instead be natural consequences of diachronic sound change.
A hierarchical graph attention network beats traditional machine learning models by 21% in predicting spectrum demand, offering a more reliable approach to spectrum management.
Forget laboriously sifting through layers or datasets for PEFT: GAST co-optimizes both, adaptively picking the most impactful data for each layer based on gradient alignment.
Now you can test if your AI system is ready for the EU AI Act, thanks to a new benchmark that combines legal expertise and LLM-generated scenarios.
Successfully integrating RE courses into professional software engineering curricula requires a systematic approach to course content mapping, addressing the unique demands of professionals.
Latency in VR conferencing hurts social presence, but this study quantifies the perceptual and cognitive mechanisms at play to guide system optimization.
Task demands in remote AR collaboration dictate how much network delay users can tolerate before perceived fluency breaks down, paving the way for adaptive systems.
Unlock realistic acoustic simulations with a text prompt: fine-tuning a text-to-audio model generates plausible room impulse responses, even with limited paired data.
Modern speech enhancement algorithms may not improve ASR performance in realistic noisy environments, challenging assumptions about their effectiveness in real-world applications.
Tighter privacy guarantees and higher utility in language models are simultaneously achievable via a principled parameter clipping strategy for Nonparametric Variational Differential Privacy.
Despite ChatGPT's known flaws, it can generate surprisingly realistic synthetic system requirement specifications that fool experts more often than you'd expect.
Imagine writing a script and instantly seeing it come to life – Doki makes generative video authoring as intuitive as writing a text document.
A new large-scale dataset could jumpstart Vietnamese VQA research by providing a crucial resource for training and evaluating multimodal models in a low-resource language.
LLMs can generate more persuasive fake news debunking messages by tailoring them to specific personality traits, as evaluated by LLM-simulated personas.
Over half of popular mobile games on the Google Play store have data safety declarations that contradict their own privacy policies, and that's before you even check the code.
Forget relying on just ingredients: this method shows how fusing semantic, lexical, and nutritional aspects significantly improves recipe similarity estimation, aligning more closely with expert judgment.
You can predict the best moment to offer emotional support just by listening to someone's voice, no text needed.
Double the emotion conversion accuracy in voice conversion models with a simple prefix that jointly controls sequence modulation and acoustic realization.
LLMs struggle to generate diverse and specific connections between concepts, even with high token budgets and "thinking" prompts, revealing a gap in creative associative reasoning.
Rényi differential privacy unlocks tighter privacy guarantees in partition selection, but releasing partition frequencies comes at a cost.
Forget brittle multi-hop reasoning: TaSR-RAG's taxonomy-guided triple matching boosts RAG performance by 14% without costly graph construction.
Forget expensive fine-tuning: FoodOntoRAG links food entities with near-SOTA accuracy while adapting to evolving ontologies, using a clever RAG architecture with retrieval, selection, scoring, and synonym-generation agents.
Forget expensive human annotations: LLMs can reliably generate synthetic data to validate NLP evaluation metrics, even outperforming human agreement in some multilingual tasks.
LLMs can drive pedagogical agents to be more engaging and effective by dynamically generating speech and gestures that align with the semantic context of instructional content.
Panoramic vision-language models can achieve a level of holistic scene understanding and robustness in adverse conditions that's impossible for traditional pinhole-based VLMs.
Forget fine-tuning: this training-free method boosts retrieval accuracy for tricky negation queries by up to 10% using clever embedding optimization.
Unlock full-duplex speech-to-speech dialogue without VAD limitations using chunk-wise micro-turns and special control tokens to steer LLM behavior in a cascaded pipeline.
Generate more realistic and nuanced human movements from text by explicitly modeling individual body parts, overcoming the limitations of existing holistic approaches.
LLMs can now help you catch AI-generated malware: a hybrid analysis framework uses LLMs to guide concolic execution and deep learning to classify vulnerabilities, achieving state-of-the-art detection rates.
LLMs can now generate UML diagrams from requirements with human-level quality, potentially automating a resource-intensive phase in software design.
Multimodal models that seem robust can still fail when some modalities are systematically missing, a problem MissBench exposes with new metrics for modality equity and learning balance.
By translating visual observations into language, LAP achieves state-of-the-art procedure planning by disambiguating visually similar actions, outperforming vision-only methods.
Tensor-based PEFT methods like LoRETTA can dramatically reduce catastrophic forgetting in sequential learning by capturing richer structural information within compact parameter budgets.
LLM-powered VR guides for blind and low vision users are not just tools, but social actors, prompting users to give them nicknames and rationalize their mistakes when others are present.
Even a single error from a conditional independence oracle can prevent the unique identification of a Bayesian network structure, regardless of bounded graph parameters like treewidth.
Prompt engineering is dead; long live context engineering—the key to scaling multi-agent AI systems lies in carefully designing the agent's informational environment, not just individual prompts.
A 4B parameter model can now beat much larger models at social reasoning, thanks to a new RL framework that aligns model reasoning trajectories with human cognition.
Forget generic fine-tuning data — Bloom's Taxonomy-based data generation can boost LLM performance in complex engineering domains like space situational awareness by up to 176%.
Open-source LLMs can now rival proprietary systems in extracting crucial cancer progression data from radiology reports, unlocking scalable analysis while preserving patient privacy.