Natural Language Processing
Applications: Text understanding, generation, summarization, translation, information extraction, and linguistic analysis.
Recent Papers
This paper introduces Hadamard Linear Attention (HLA), a novel linear attention mechanism designed to more accurately approximate softmax attention. HLA applies a nonlinearity after the computation of pairwise similarities, unlike existing linear attention methods that apply nonlinear kernel functions independently to queries and keys. The authors demonstrate that this approach results in a higher-degree rational function approximation of softmax and show its effectiveness in a large diffusion transformer model for video generation.
Introduces Hadamard Linear Attention (HLA), a linear attention variant that applies a nonlinearity after pairwise similarity computation to better approximate softmax.
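For context, here is a minimal NumPy sketch of the factorization that standard linear attention relies on and that HLA departs from: the kernel map is applied independently to queries and keys, which is exactly what lets the similarity computation be reassociated into linear cost. HLA's actual post-similarity nonlinearity is not reproduced here; the feature map `phi` and all shapes are illustrative assumptions.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Exact softmax attention: O(n^2) in sequence length n.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    # Standard linear attention: a kernel map phi is applied independently
    # to queries and keys, so (phi(Q) @ phi(K).T) @ V can be reassociated
    # to phi(Q) @ (phi(K).T @ V), i.e. O(n) in sequence length.
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                    # (d, d_v) summary, independent of n
    z = Qp @ Kp.sum(axis=0)          # per-query normalizer
    return (Qp @ kv) / z[:, None]

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 16, 8))
# How far this particular linear approximation is from exact softmax:
print(np.abs(softmax_attention(Q, K, V) - linear_attention(Q, K, V)).mean())
```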
The paper introduces SAGEO Arena, a realistic evaluation environment for Search-Augmented Generative Engine Optimization (SAGEO) that addresses limitations of existing benchmarks by incorporating a full generative search pipeline over a large-scale corpus of web documents with rich structural information. The authors demonstrate that existing optimization approaches are often impractical and can degrade performance in the retrieval and reranking stages under realistic conditions. The study highlights the importance of structural information and stage-specific optimization for effective SAGEO.
Introduces SAGEO Arena, a novel benchmark environment enabling realistic, stage-level evaluation of search-augmented generative engine optimization strategies.
This paper investigates the impact of different LLM-powered AI assistance modalities (Advisor, Coach, Delegate) on human performance in multi-party negotiation games. Participants played bargaining games with access to one of these modalities, all of which were powered by the same underlying LLM. The key finding is a preference-performance misalignment: participants preferred the Advisor but achieved higher individual gains with the Delegate, which acted as a "market maker" by injecting Pareto-improving proposals.
Demonstrates a preference-performance misalignment in AI-assisted negotiation, revealing that users do not always adopt the AI modality that maximizes their gains or overall group welfare.
This paper introduces a PAC learning framework for learning conditional averages, where the goal is to predict the average label within an instance-specific neighborhood rather than the label itself. The work provides a complete characterization of learnability in this setting, demonstrating that it depends on the joint finiteness of two novel combinatorial parameters related to the independence number of the neighborhood graph. The authors derive sample complexity bounds that are tight up to logarithmic factors, offering insights into the learnability of conditional averages.
Characterizes the PAC learnability of conditional averages by introducing and analyzing two novel combinatorial parameters related to the independence number of the neighborhood graph.
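In notation of our own choosing (the paper's symbols may differ), the learning target can be written as a neighborhood-conditioned average:

```latex
% Given a neighborhood map N(x), the learner must approximate the average
% label over that neighborhood rather than the label itself:
\[
  \bar{h}(x) \;=\; \mathbb{E}_{x' \sim \mathcal{D}}\big[\, y(x') \;\big|\; x' \in N(x) \big],
\]
% with PAC learnability governed by the joint finiteness of two combinatorial
% parameters tied to the independence number of the graph induced by N.
```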
This paper studies bandit learning in two-sided matching markets where agents and firms conduct interviews to learn preferences. The authors introduce strategic deferral, allowing firms to delay hiring decisions and recover from suboptimal matches, and model interviews as low-cost hints that reveal partial preference information. They develop novel algorithms for centralized and decentralized settings that achieve time-independent regret, improving upon logarithmic regret bounds for learning stable matchings without interviews.
Introduces strategic deferral for firms in matching markets, enabling decentralized learning and recovery from suboptimal hires.
The paper introduces RouterXBench, a comprehensive evaluation framework for LLM routers, addressing limitations of existing benchmarks by considering router ability, scenario alignment, and cross-domain robustness. The authors propose ProbeDirichlet, a novel router that leverages internal hidden states and learnable Dirichlet distributions for probabilistic training, capturing model uncertainty more effectively than methods relying on output probabilities or external embeddings. Empirical results demonstrate that ProbeDirichlet outperforms existing routers, achieving significant improvements in router ability and high-accuracy scenarios, while exhibiting robust generalization across diverse model families, scales, tasks, and workflows.
Introduces ProbeDirichlet, a router that aggregates cross-layer hidden states via learnable Dirichlet distributions for improved uncertainty estimation and routing decisions.
The paper introduces RELATE, a reinforcement learning framework for end-to-end advertising text generation that directly optimizes for conversion-oriented metrics and compliance constraints. RELATE integrates performance and compliance objectives into the text generation process via policy learning, moving beyond the traditional two-stage generation and alignment paradigm. Experiments on industrial datasets and online deployment show that RELATE significantly improves click-through conversion rate (CTCVR) while adhering to policy constraints.
Introduces an end-to-end reinforcement learning framework, RELATE, that unifies advertising text generation with conversion-oriented objective alignment and compliance constraints.
The paper introduces DHPLT, a large-scale multilingual diachronic corpus comprising web-crawled data from 41 languages across three time periods (2011-2015, 2020-2021, 2024-present). The authors leverage web crawl timestamps as a proxy for document creation time, providing 1 million documents per time period per language. They also provide pre-computed word embeddings and lexical substitutions to facilitate semantic change modeling research, addressing the scarcity of such resources for many languages.
Introduces DHPLT, a novel multilingual diachronic corpus with pre-computed embeddings and lexical substitutions, designed to facilitate research in semantic change modeling across 41 languages.
This paper introduces the concept of human-LLM archetypes, defined as recurring socio-technical interaction patterns that structure the roles of humans and LLMs in collaborative decision-making. Through a scoping literature review and thematic analysis of 113 papers, the authors identified 17 distinct human-LLM archetypes. They then evaluated these archetypes across clinical diagnostic cases, demonstrating that the choice of archetype influences LLM outputs and decision outcomes.
Defines and categorizes 17 human-LLM interaction archetypes to demonstrate how these archetypes impact LLM outputs and decisions in human-AI collaborative decision-making.
This paper introduces a subword embedding approach to detect lexical and orthographic variation in user-generated text, specifically addressing the challenges of "noisy" and low-resource settings without relying on normalization or predefined variant lists. The method trains subword embeddings on raw Luxembourgish user comments and clusters related forms using a combination of cosine similarity and n-gram similarity. The results demonstrate the effectiveness of distributional modeling in uncovering meaningful patterns of variation, aligning with existing dialectal and sociolinguistic research.
Introduces a novel subword embedding method that automatically discovers and clusters lexical variations in user-generated text, even in low-resource languages, without requiring prior normalization or predefined variant lists.
This paper investigates the sensicality of sentences in existing semantically deviant datasets by comparing human and LLM judgments, both with and without provided contexts. The study reveals that humans generally perceive sentences as anomalous rather than nonsensical, suggesting existing datasets may not be as nonsensical as assumed. Furthermore, the research demonstrates LLMs' ability to generate plausible contexts that render anomalous sentences more sensible.
Empirically demonstrates that existing "nonsensical" datasets are largely composed of anomalous sentences interpretable with context, and that LLMs can generate such contexts.
This paper addresses temporal domain generalization (TDG) for LLMs by reformulating it geometrically under parameter-efficient fine-tuning. It posits that the low-dimensional temporal structure of model evolution can be preserved under parameter-efficient reparameterization. The authors introduce Manifold-aware Temporal LoRA (MaT-LoRA), which constrains temporal updates to a shared low-dimensional manifold within a low-rank adaptation subspace, modeling its evolution through a structured temporal core, and achieving superior temporal generalization performance with practical scalability.
Introduces MaT-LoRA, a parameter-efficient fine-tuning method that constrains temporal updates to a low-dimensional manifold within a LoRA subspace and models its evolution with a structured temporal core for improved temporal domain generalization in LLMs.
The paper introduces SiamXBERT, a Siamese meta-learning framework leveraging a transformer-based language model, to address the challenge of detecting unknown (zero-day) attacks in IoT networks under data scarcity and encrypted traffic conditions. SiamXBERT constructs a dual-modality feature representation from flow and packet-level information and uses meta-learning for rapid adaptation to new attack types with limited labeled data. Experiments on IoT intrusion datasets demonstrate that SiamXBERT outperforms state-of-the-art baselines, achieving up to a 78.8% improvement in F1-score on unknown attacks and showcasing its robustness and data efficiency.
Introduces SiamXBERT, a novel Siamese meta-learning framework empowered by a transformer-based language model, for robust and data-efficient unknown attack detection in IoT networks.
The paper introduces PosterOmni, a framework for generalized artistic poster creation that tackles both local image editing and global design creation aspects of the task. It achieves this by constructing a multi-task dataset, distilling knowledge from local and global expert models, and applying a unified reward feedback mechanism to align visual fidelity and aesthetic preferences. Experiments on the new PosterOmni-Bench demonstrate that PosterOmni outperforms existing open-source and proprietary systems in reference adherence, composition, and aesthetics.
Introduces a novel data-distillation-reward pipeline to unify local image editing and global design creation for generalized artistic poster generation.
The paper introduces ULTRA, a transformer-based recommendation architecture for Urdu, a low-resource language, to improve personalized news retrieval. ULTRA employs a dual-embedding architecture with a query-length-aware routing mechanism to handle varying query lengths, directing queries to either title/headline-level or full-content pipelines. Experiments on a large Urdu news corpus demonstrate that ULTRA achieves over 90% precision, outperforming single-pipeline baselines in recommendation relevance.
Introduces a query-adaptive dual-embedding architecture for semantic content recommendation in low-resource languages, dynamically routing queries based on length to optimize retrieval relevance.
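A toy sketch of the routing idea, assuming a simple token-count threshold and TF-IDF stand-ins for the two embedding pipelines; ULTRA's actual encoders, indexes, and threshold are not specified here and are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

class ToyIndex:
    """Toy stand-in for one retrieval pipeline (hypothetical)."""
    def __init__(self, docs):
        self.docs = docs
        self.vec = TfidfVectorizer().fit(docs)
        self.mat = self.vec.transform(docs)

    def search(self, query, k=3):
        sims = cosine_similarity(self.vec.transform([query]), self.mat)[0]
        return [self.docs[i] for i in sims.argsort()[::-1][:k]]

def route(query, title_index, content_index, max_short_tokens=5):
    # Query-length-aware routing: short queries match headline-level
    # representations; longer queries go to the full-content pipeline.
    idx = title_index if len(query.split()) <= max_short_tokens else content_index
    return idx.search(query)
```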
This paper addresses the limitations of current copyright law in the age of generative AI, where style imitation without content copying complicates infringement detection. The authors propose a new criterion for infringement based on whether an AI output could have been generated without a specific work in its training corpus. Modeling generative systems as closure operators, they demonstrate a dichotomy: AI generation is either asymptotically unconstrained, with light-tailed organic creations, or persistently constrained, with heavy-tailed creations.
Introduces a novel criterion for copyright infringement in the context of generative AI, focusing on whether an output could have been generated without a specific work in the training corpus.
The paper introduces HABIT, a data-driven framework for imputing missing segments in vessel trajectories using historical Automatic Identification System (AIS) data. HABIT leverages H3 geospatial indexing to aggregate and analyze vessel motion patterns, enabling the imputation of missing trajectory segments based on learned historical behaviors. Empirical evaluation demonstrates that HABIT achieves comparable accuracy to existing methods while offering improved latency and better accounting for vessel characteristics.
Introduces HABIT, a novel H3 Aggregation-Based Imputation framework, to impute missing vessel trajectories by learning and leveraging historical vessel motion patterns.
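A minimal sketch of the H3-aggregation idea, assuming the `h3` Python bindings and a fixed resolution; HABIT's actual features (e.g., vessel characteristics) and imputation model are not reproduced here.

```python
import h3  # h3-py >= 4; earlier versions use h3.geo_to_h3 instead
from collections import Counter, defaultdict

# Aggregate historical AIS fixes into H3 cells and count observed
# cell-to-cell transitions; resolution and data layout are assumptions.
RES = 7
transitions = defaultdict(Counter)

def index_track(track):
    """track: list of (lat, lon) fixes in time order."""
    cells = [h3.latlng_to_cell(lat, lon, RES) for lat, lon in track]
    for a, b in zip(cells, cells[1:]):
        if a != b:
            transitions[a][b] += 1

def impute_next(cell):
    """Most frequent historically observed successor cell, if any."""
    nxt = transitions.get(cell)
    return nxt.most_common(1)[0][0] if nxt else None
```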
The paper investigates how speech recognition models fail at transcribing U.S. street names, finding a 44% error rate across 15 models from major vendors and disproportionately large routing-distance errors for speakers whose primary language is not English. It highlights the gap between benchmark performance and real-world reliability, particularly for high-stakes tasks involving named entities. The authors then demonstrate that fine-tuning with a small, synthetically generated dataset of diverse pronunciations improves street-name transcription accuracy by nearly 60% for these speakers.
Demonstrates that speech recognition models exhibit significant transcription errors on street names, particularly impacting non-English speakers, and mitigates this issue through synthetic data augmentation.
The paper introduces Temperature Adaptive Meta Policy Optimization (TAMPO), a novel framework that learns to control the temperature hyperparameter of an LLM during reinforcement learning. TAMPO uses a hierarchical two-loop process where an inner loop updates the LLM policy using trajectories sampled at temperatures selected by a meta-policy, and an outer loop updates the meta-policy to favor temperatures that maximize the likelihood of high-advantage trajectories. Experiments on mathematical reasoning benchmarks demonstrate that TAMPO outperforms baselines with fixed or heuristic temperature schedules, showing the effectiveness of learned temperature control for adaptive exploration.
Introduces a hierarchical reinforcement learning framework, TAMPO, that learns a meta-policy to dynamically adjust the temperature parameter of an LLM, optimizing exploration during policy learning.
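A toy two-loop illustration of the idea, with a bandit stand-in for the inner LLM update; the temperature grid, reward shape, and learning rate are assumptions, not TAMPO's actual objectives.

```python
import numpy as np

rng = np.random.default_rng(0)
temps = np.array([0.3, 0.7, 1.0, 1.3])
logits = np.zeros_like(temps)          # meta-policy over temperature bins

def sample_advantage(t):
    # Stand-in for the advantage of trajectories sampled at temperature t:
    # a noisy function peaking near t = 0.7 (purely illustrative).
    return -(t - 0.7) ** 2 + 0.1 * rng.normal()

for _ in range(2000):
    p = np.exp(logits - logits.max()); p /= p.sum()
    i = rng.choice(len(temps), p=p)
    adv = sample_advantage(temps[i])   # the inner loop would update the LLM here
    grad = -p.copy(); grad[i] += 1.0   # gradient of log p_i w.r.t. logits
    logits += 0.1 * adv * grad         # outer loop: favor high-advantage temps

p = np.exp(logits - logits.max()); p /= p.sum()
print(dict(zip(temps.tolist(), p.round(3).tolist())))
```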
This paper introduces Hierarchical Sparse Autoencoders (HSAEs) to explicitly model the hierarchical relationships between features extracted from LLMs, addressing the limitation of standard SAEs that treat features in isolation. HSAEs incorporate a structural constraint loss and random feature perturbation to encourage alignment between parent and child features in the learned hierarchy. Experiments across various LLMs and layers demonstrate that HSAEs recover semantically meaningful hierarchies while preserving reconstruction fidelity and interpretability.
Introduces Hierarchical Sparse Autoencoders (HSAEs) to learn and represent the hierarchical relationships between features extracted from LLMs.
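One way such a structural constraint could look, sketched in PyTorch: each child feature is penalized for activating more strongly than its assigned parent. The paper's exact loss and random-perturbation scheme are not reproduced here; the grouping and margin are assumptions.

```python
import torch
import torch.nn.functional as F

def hierarchy_loss(parent_acts, child_acts, parent_of, margin=0.0):
    # parent_acts: (batch, P) nonnegative codes; child_acts: (batch, C);
    # parent_of: LongTensor (C,) mapping each child feature to its parent.
    gating = parent_acts[:, parent_of]              # (batch, C)
    # Penalize children that fire without their parent firing at least as much.
    return F.relu(child_acts - gating - margin).mean()

batch, P, C = 4, 8, 32
parent = torch.rand(batch, P)
child = torch.rand(batch, C)
parent_of = torch.randint(0, P, (C,))
print(hierarchy_loss(parent, child, parent_of))
```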
This paper introduces Talk2DM, a plug-and-play module designed to enhance vehicle-road-cloud dynamic map (VRC-DM) systems with natural language querying and commonsense reasoning capabilities. To facilitate this, the authors created VRCsim, a VRC cooperative perception simulation framework, and VRC-QA, a question-answering dataset focused on spatial reasoning in mixed-traffic scenarios. Talk2DM leverages a novel chain-of-prompt (CoP) mechanism to integrate human-defined rules with LLM knowledge, achieving high accuracy and reasonable response times with models like Qwen3:8B, Gemma3:27B, and GPT-oss.
Introduces a chain-of-prompting method (CoP) that enables LLMs to effectively query and reason about dynamic maps by combining human-defined rules with the LLM's inherent commonsense knowledge.
The paper introduces MEME, a novel framework that models financial markets as an evolving ecosystem of investment narratives ("Modes of Thought") to improve portfolio construction. MEME uses a multi-agent extraction module to convert noisy data into Investment Arguments, then employs Gaussian Mixture Modeling to identify consensus within a semantic space and a temporal evaluation mechanism to track the lifecycle of these modes. Experiments on Chinese stock pools from 2023-2025 show MEME outperforms seven state-of-the-art baselines, demonstrating its ability to adapt to evolving market consensus.
Introduces a logic-oriented framework, MEME, that models financial markets as a dynamic ecosystem of evolving investment narratives to guide portfolio construction.
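A toy sketch of the consensus-finding step, using random vectors as stand-ins for Investment Argument embeddings; MEME's multi-agent extraction and temporal lifecycle tracking are not reproduced.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Stand-in embeddings: a dominant narrative cluster and a fringe one.
emb = np.vstack([rng.normal(0, 0.3, (40, 8)),
                 rng.normal(2, 0.3, (10, 8))])
gmm = GaussianMixture(n_components=2, random_state=0).fit(emb)
consensus = int(np.argmax(gmm.weights_))   # heaviest component = consensus mode
print("consensus component:", consensus,
      "weight:", gmm.weights_[consensus].round(2))
```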
This paper introduces Trajectory Self-Distillation (T3D), a novel framework for improving the generation quality of few-step Diffusion Language Models (DLLMs) by distilling the model's own generative trajectories. T3D incorporates Direct Discriminative Optimization (DDO), a reverse-KL objective, to encourage mode-seeking behavior during distillation, focusing the student model on high-probability regions of the teacher model's output space. Experiments across various benchmarks demonstrate that T3D significantly outperforms existing few-step DLLM baselines, substantially reducing the performance gap with full-step decoding.
Introduces a trajectory self-distillation framework, T3D, that leverages direct discriminative optimization to improve the generation quality of few-step diffusion language models.
This paper introduces Distribution Discriminant Theory (DDT) to quantify the alignment between training data and the model-induced distribution in supervised fine-tuning (SFT) of LLMs. Based on DDT, the authors propose In-Distribution Finetuning (IDFT), a loss-level method, and Hinted Decoding, a data-level technique, to improve generalization by aligning the training data distribution with the model's. Experiments show that the proposed framework achieves generalization performance comparable to offline RL methods like DPO and SimPO, while retaining the efficiency of SFT.
Introduces Distribution Discriminant Theory (DDT) to quantify and improve the alignment between training data and model-induced distributions in LLM supervised fine-tuning.
This paper investigates in-context learning in LLMs by framing it as Gaussian Process (GP) regression, using controlled experiments with function samples drawn from known GP priors. They compare LLM prediction error against empirical GP-regression (lower bound) and 1-NN (upper bound) baselines, finding that LLM learning curves approach the GP lower bound with increasing demonstrations. The authors also analyze LLM inductive biases via likelihood analysis, revealing a preference for less smooth GP kernels, and demonstrate that post-training can shift these biases to improve sample efficiency on smoother kernels.
Quantifies the extent to which LLMs behave like GP learners and provides methods for steering their inductive biases for continuous function learning tasks.
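A self-contained sketch of the two reference baselines named above, on a function sampled from a known RBF GP prior; the kernel and hyperparameters are our choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(a, b, ell=0.2):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

xs = np.linspace(0, 1, 64)
f = rng.multivariate_normal(np.zeros(64), rbf(xs, xs) + 1e-8 * np.eye(64))

idx = rng.choice(64, size=10, replace=False)        # "demonstrations"
x_tr, y_tr = xs[idx], f[idx]

K = rbf(x_tr, x_tr) + 1e-6 * np.eye(10)
gp_mean = rbf(xs, x_tr) @ np.linalg.solve(K, y_tr)  # GP posterior mean (lower bound)
nn_pred = y_tr[np.abs(xs[:, None] - x_tr[None, :]).argmin(axis=1)]  # 1-NN (upper bound)

print("GP  MSE:", np.mean((gp_mean - f) ** 2))
print("1NN MSE:", np.mean((nn_pred - f) ** 2))
```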
The paper introduces LRBTC, a modular LLM and VLM-driven architecture for quality control in pharmaceutical content, addressing the need for scalable and verifiable validation in regulated domains. LRBTC employs a Student-Teacher dual model architecture combined with a human-in-the-loop workflow and waterfall rule filtering. The approach achieves significant improvements on AIReg-Bench (83.0% F1, 97.5% recall) and CSpelling (26.7% accuracy improvement), demonstrating its effectiveness in reducing missed violations and improving content quality.
Introduces LRBTC, a novel LLM and VLM-driven quality control architecture that leverages a Student-Teacher dual model and HITL workflow for pharmaceutical content optimization.
This paper introduces a deep learning approach to enhance social robot gaze behavior by incorporating both human and non-human stimuli, using LSTM and Transformer models trained on human gaze data collected via VR in simulated and real-world scenarios. The models predict human gaze direction with accuracies of up to 72% (LSTM) and 71.6% (Transformer) in real-world settings, outperforming existing methods, which consider only human stimuli. The system was deployed on a NAO robot and evaluated with 275 participants, demonstrating high user satisfaction.
Demonstrates a novel approach to predicting human gaze in social settings by integrating non-human stimuli and achieving state-of-the-art accuracy using LSTM and Transformer models.
The paper introduces CitiLink-Minutes, a novel multilayer dataset of 120 European Portuguese municipal meeting minutes from six municipalities, designed to address the lack of annotated datasets for NLP and IR research in this domain. The dataset features over one million tokens with de-identified personal information and includes manual annotations across metadata, subjects of discussion, and voting outcomes. Experiments demonstrate the dataset's utility for downstream tasks like metadata extraction, topic classification, and vote labeling, facilitating transparent access to municipal decisions.
Contributes CitiLink-Minutes, a unique multilayer annotated dataset of municipal meeting minutes, enabling NLP and IR research on local governance.
This paper investigates the ability of Large Language Models (LLMs) to adapt to language variations across different socioeconomic status (SES) communities by comparing LLM-generated text completions with original text from a novel Reddit and YouTube dataset stratified by SES. The study analyzes 94 sociolinguistic features to assess the degree of stylistic adaptation exhibited by four LLMs. Results indicate that LLMs show limited stylistic modulation with respect to SES, often producing approximations or caricatures, and demonstrate a bias towards emulating upper SES styles, highlighting the risk of amplifying linguistic hierarchies.
Reveals that LLMs exhibit limited stylistic adaptation across socioeconomic strata and tend to favor upper SES linguistic styles, raising concerns about perpetuating linguistic biases.
This paper investigates the impact of underspecified questions on QA performance, finding that a significant portion of questions in standard QA benchmarks are underspecified. They introduce an LLM-based classifier to identify these questions and demonstrate that LLMs perform worse on them. Through a controlled rewriting experiment, they show that rewriting underspecified questions into fully specified variants, while keeping the gold answers fixed, consistently improves QA performance.
Demonstrates that question underspecification is a significant confound in QA evaluation by showing that rewriting underspecified questions improves QA performance.
The paper identifies a limitation in watermark ensembles for LLMs where strong single-layer watermarks reduce token distribution entropy, hindering subsequent layers' effectiveness. The authors theoretically and empirically demonstrate that detectability is bounded by entropy and that watermark ensembles monotonically decrease entropy and the expected green-list ratio across layers. To address this, they propose a framework using weaker single-layer watermarks to preserve entropy, achieving improved detectability and robustness compared to strong watermark baselines.
Demonstrates that weaker single-layer watermarks in ensembles can outperform stronger ones by preserving token distribution entropy, leading to improved detectability and robustness.
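A minimal green-list watermark in the Kirchenbauer et al. style illustrates the entropy effect: the larger the logit boost delta (a "stronger" watermark), the lower the entropy of the next-token distribution, leaving less room for later ensemble layers. The vocabulary size and green-list fraction here are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
V = 1000
logits = rng.normal(size=V)
green = rng.random(V) < 0.5      # green list (seeded by context in practice)

def entropy(logits):
    p = np.exp(logits - logits.max()); p /= p.sum()
    return -(p * np.log(p + 1e-12)).sum()

for delta in [0.0, 1.0, 2.0, 4.0]:
    wm = logits + delta * green  # boost green-token logits by delta
    print(f"delta={delta}: entropy={entropy(wm):.3f}")
```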
The paper introduces "analytical search" as a new search paradigm tailored for complex analytical information needs, addressing the limitations of relevance-based ranking and retrieval-augmented generation (RAG) in tasks requiring trend analysis, causal inference, and verifiable conclusions. It proposes a system framework that integrates query understanding, recall-oriented retrieval, reasoning-aware fusion, and adaptive verification to support structured, multi-step inference. The authors argue that analytical search offers improved control over reasoning, evidence usage, and verifiability, leading to more accountable and utility-driven results compared to existing search paradigms.
Introduces and formalizes the concept of "analytical search" as a distinct search paradigm designed to address complex analytical information needs by emphasizing evidence-governed, process-oriented workflows.
The authors introduce ADRD-Bench, a new benchmark dataset for evaluating LLMs on Alzheimer's Disease and Related Dementias (ADRD), comprising a unified QA set from existing medical benchmarks and a novel QA set derived from the Aging Brain Care (ABC) program. They aim to address the lack of ADRD-specific evaluation resources and practical caregiving context in existing benchmarks. Evaluating 33 state-of-the-art LLMs, they found that while some models achieve high accuracy, inconsistencies in reasoning quality and stability remain a significant limitation.
Introduces ADRD-Bench, the first ADRD-specific benchmark dataset designed for rigorous evaluation of LLMs, incorporating both unified clinical knowledge and practical caregiving questions.
This paper introduces a spectrum framework for polycentric digital ecosystems, conceptualizing them as nested socio-technical systems across personal, organizational, inter-organizational, and global layers. It addresses the increasing need for resilient digital collaboration amidst geopolitical and technological fragmentation. The framework highlights how AI and automation, blockchain trust, federated data spaces, and immersive technologies can orchestrate digital integration in these ecosystems.
Introduces a multi-layered framework for polycentric digital ecosystems to facilitate collaboration in fragmented environments.
The paper introduces U-Former ODE (UFO), a novel architecture for probabilistic forecasting of irregular time series data that combines U-Nets, Transformers, and Neural CDEs. UFO enables parallelizable computation and global receptive fields, addressing the scalability limitations of existing Neural CDE approaches. Experiments on five benchmarks demonstrate that UFO outperforms ten state-of-the-art baselines in predictive accuracy and achieves up to 15x faster inference, particularly on long and multivariate sequences.
Introduces a fully causal, parallelizable architecture, U-Former ODE (UFO), that integrates U-Nets, Transformers, and Neural CDEs for efficient and accurate probabilistic forecasting of irregular time series.
This paper introduces a technical curriculum designed to enhance AI literacy within the language and translation (L&T) industry, covering vector embeddings, neural networks, tokenization, and transformer networks. The curriculum aims to cultivate computational thinking, algorithmic awareness, and agency among L&T professionals to improve their digital resilience. Evaluation in an MA course at TH Koeln suggests the curriculum's effectiveness, while also highlighting the need for additional lecturer support to maximize learning outcomes.
Proposes and evaluates a technical curriculum focused on language-oriented AI to improve AI literacy and digital resilience in the language and translation industry.
The paper introduces the Prototype Transformer (ProtoT), an autoregressive language model architecture that uses prototypes (parameter vectors) instead of self-attention to improve interpretability. ProtoT establishes two-way communication between the input sequence and the prototypes, causing the prototypes to capture nameable concepts during training and creating interpretable communication channels. Experiments demonstrate that ProtoT scales linearly with sequence length, performs well on text generation and downstream tasks (GLUE), and exhibits robustness to input perturbations while providing interpretable pathways for understanding robustness and sensitivity.
Introduces the Prototype Transformer, a novel autoregressive language model architecture designed for interpretability by using prototypes to capture nameable concepts and create interpretable communication channels.
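A generic sketch of such two-way communication, using plain cross-attention between tokens and a small prototype bank (causal masking omitted for brevity); ProtoT's actual layer design is not reproduced, but the O(n·m) cost pattern is the point.

```python
import numpy as np

def attend(queries, keys, values):
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ values

rng = np.random.default_rng(0)
n, m, d = 128, 8, 16                   # n tokens, m prototypes
tokens = rng.normal(size=(n, d))
protos = rng.normal(size=(m, d))       # learned parameters in a real model

protos = protos + attend(protos, tokens, tokens)  # read: tokens -> prototypes
tokens = tokens + attend(tokens, protos, protos)  # write: prototypes -> tokens
print(tokens.shape, protos.shape)      # cost is O(n*m), not O(n^2)
```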
This paper introduces Differentiable Modal Logic (DML) implemented via Modal Logical Neural Networks (MLNNs) to enable multi-agent systems to learn relationships like trust networks and causal chains from behavioral data. DML addresses the limitations of traditional modal logic, which requires manual specification of relationship structures. The authors demonstrate a neurosymbolic debugging framework across epistemic, temporal, deontic, and doxastic modalities, showing how logical contradictions can be formulated as learnable optimization objectives in scenarios ranging from diplomacy games to LLM hallucination detection.
Introduces Differentiable Modal Logic (DML) and Modal Logical Neural Networks (MLNNs) to learn interpretable relationship structures in multi-agent systems directly from data, replacing manual specification.
The paper analyzes the availability of AI resources across 6003 languages to assess systemic inequalities in language AI, finding that a small number of languages dominate, exacerbating disparities. It contrasts the diffusion of AI with earlier IT technologies, revealing a hype-driven pattern. Finally, the authors introduce the Language AI Readiness Index (EQUATE) to map technological, socio-economic, and infrastructural prerequisites for AI deployment across languages, aiming to guide prioritization efforts for more equitable diffusion.
Introduces the Language AI Readiness Index (EQUATE) to map the state of technological, socio-economic, and infrastructural prerequisites for AI deployment across languages.
The paper introduces a rule-based computational model for Gaidhlig morphology, addressing the challenge of limited data availability for low-resource languages that hinders the application of neural models. The model leverages data from Wiktionary and uses SQL queries to identify lexical patterns, constructing a declarative rule-base for generating inflected word forms via Python utilities. This approach demonstrates that rule-based systems can effectively utilize limited data while providing interpretability and supporting the development of educational tools.
Presents a functional rule-based system for Gaidhlig morphology using Wiktionary data and SQL queries to generate inflected word forms.
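A hypothetical example of the kind of declarative rule such a system might encode, using the well-known lenition pattern (an "h" inserted after a lenitable initial consonant); this simplification ignores cluster exceptions and is not taken from the paper's rule-base.

```python
LENITABLE = set("bcdfgmpst")

def lenite(word: str) -> str:
    """Apply orthographic lenition, e.g. "beag" -> "bheag" (simplified)."""
    if word and word[0].lower() in LENITABLE and (len(word) < 2 or word[1] != "h"):
        return word[0] + "h" + word[1:]
    return word

print(lenite("beag"), lenite("mor"))   # bheag mhor
```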
This paper investigates the impact of model and data scaling on multilingual machine translation (MT) performance using open large language models (LLMs). The authors adapt Gemma3 models via continual pretraining and instruction finetuning, creating MiLMMT-46, a model covering 46 languages. Results demonstrate that MiLMMT-46 surpasses existing open-source SOTA models and rivals proprietary systems like Google Translate and Gemini 3 Pro in multilingual translation quality.
Demonstrates that scaling model size and training data via continual pretraining and instruction finetuning significantly improves the multilingual translation capabilities of open LLMs, achieving performance competitive with proprietary systems.
The paper introduces PatientHub, a unified framework to standardize the creation, composition, and deployment of simulated patients for training counselors and scaling therapeutic assessment using Large Language Models. PatientHub addresses the fragmentation in existing patient simulation approaches by providing standardized data formats, prompts, and evaluation metrics, thus improving reproducibility and enabling fair comparisons. The authors demonstrate PatientHub's utility through case studies, showcasing standardized cross-method evaluation, seamless integration of custom evaluation metrics, and the prototyping of new simulator variants.
Introduces PatientHub, a modular framework that unifies patient simulation by standardizing data formats, prompts, and evaluation metrics to facilitate reproducibility and fair comparison of different methods.
The paper introduces DEL, a framework for differentially private and communication-efficient split inference of large language models (LLMs). DEL uses an embedding projection module and differentially private stochastic quantization to reduce communication overhead while preserving privacy. It then employs soft prompts on the server side to mitigate utility degradation caused by the privacy mechanisms, eliminating the need for local models.
Introduces a novel framework, DEL, that leverages soft prompts to improve the privacy-utility trade-off in LLM split inference, achieving differential privacy and communication efficiency.
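One common recipe for private, low-bandwidth transmission of embeddings (clip, add Gaussian noise, then stochastically round to a coarse grid), sketched below for illustration; DEL's actual projection module, quantizer, and privacy accounting are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

def privatize(x, clip=1.0, sigma=0.5, levels=16):
    x = np.clip(x, -clip, clip)                     # bound sensitivity
    x = x + rng.normal(scale=sigma, size=x.shape)   # Gaussian-mechanism noise
    grid = np.linspace(-clip - 3 * sigma, clip + 3 * sigma, levels)
    lo = np.clip(np.searchsorted(grid, x) - 1, 0, levels - 2)
    frac = (x - grid[lo]) / (grid[lo + 1] - grid[lo])
    up = rng.random(x.shape) < frac                 # unbiased stochastic rounding
    return grid[lo + up]                            # ~log2(levels) bits per value

emb = rng.normal(size=(4, 8))
print(privatize(emb))
```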
The paper introduces EmoSpace, a framework for emotion-aware content generation that learns dynamic emotion prototypes via vision-language alignment to enable fine-grained emotional control in VR content creation. EmoSpace uses a hierarchical emotion representation with learnable prototypes that evolve during training, allowing for control without explicit emotion labels. Experiments demonstrate EmoSpace's superior performance in emotional image outpainting, stylized generation, and emotional panorama generation, further validated by a user study comparing emotional perception in VR versus desktop environments.
Introduces a novel emotion-aware content generation framework, EmoSpace, that learns dynamic, interpretable emotion prototypes through vision-language alignment.
The paper introduces Meta-Sel, a supervised meta-learning approach for efficient demonstration selection in in-context learning, which addresses the challenge of selecting optimal few-shot examples under a limited prompt budget. Meta-Sel learns a scoring function based on TF-IDF cosine similarity and length-compatibility ratio between candidate demonstrations and queries, trained on a meta-dataset constructed from training data using class agreement as supervision. Empirical evaluation across four intent datasets and five LLMs demonstrates that Meta-Sel achieves competitive accuracy and selection-time overhead compared to 12 other demonstration selection methods, especially benefiting smaller models.
Introduces Meta-Sel, a lightweight supervised meta-learning approach that learns a fast, interpretable scoring function for selecting demonstrations for in-context learning.
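A sketch of the two features named above, TF-IDF cosine similarity and a length-compatibility ratio, combined by a weighted sum; the weights and candidate pool are placeholders, and Meta-Sel's trained scorer is not reproduced.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def score_demos(query, demos, w_sim=0.8, w_len=0.2, k=4):
    vec = TfidfVectorizer().fit(demos + [query])
    sims = cosine_similarity(vec.transform([query]), vec.transform(demos))[0]
    q_len = len(query.split())
    len_ratio = np.array([min(q_len, len(d.split())) / max(q_len, len(d.split()))
                          for d in demos])
    scores = w_sim * sims + w_len * len_ratio       # weighted combination
    return [demos[i] for i in np.argsort(scores)[::-1][:k]]

demos = ["book a flight to paris", "cancel my hotel reservation",
         "what is the weather tomorrow", "reserve a table for two"]
print(score_demos("book me a flight to rome", demos, k=2))
```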
This paper investigates the overlap between code review comments generated by human reviewers and those produced by ChatGPT-4, focusing on the types of quality improvements recommended. The authors manually classified 739 human-generated comments from 240 pull requests and compared them to ChatGPT-4's recommendations on the same PRs. Results indicate that ChatGPT-4 suggests more changes overall but identifies only 10% of the issues flagged by humans; conversely, 40% of ChatGPT-4's additional suggestions are valuable, highlighting the complementary nature of the two approaches.
Quantifies the overlap and differences in quality improvement recommendations between human code reviewers and ChatGPT-4, revealing the strengths and weaknesses of each approach.
This paper addresses the challenge of achieving fairness in classification without relying on demographic information by proposing a novel minimax-fair method called SPECTRE. SPECTRE adjusts the spectrum of a Fourier feature mapping and constrains the deviation of the worst-case distribution from the empirical distribution, mitigating the over-pessimism of existing robust optimization techniques. Empirical results on American Community Survey datasets across 20 states demonstrate that SPECTRE achieves superior fairness guarantees and robustness compared to state-of-the-art methods, even those with access to demographic data.
Introduces SPECTRE, a minimax-fair classification method that enhances fairness without demographic information by adjusting the spectrum of a Fourier feature mapping and constraining the worst-case distribution's deviation from the empirical distribution.
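For background, a minimal random Fourier feature mapping: the distribution the frequency matrix W is drawn from is the "spectrum" SPECTRE adjusts. The adjustment and the minimax procedure themselves are not shown; the Gaussian spectrum below corresponds to a plain RBF kernel.

```python
import numpy as np

rng = np.random.default_rng(0)

def fourier_features(X, n_features=256, lengthscale=1.0):
    d = X.shape[1]
    W = rng.normal(scale=1.0 / lengthscale, size=(d, n_features))  # the spectrum
    b = rng.uniform(0, 2 * np.pi, n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

X = rng.normal(size=(5, 3))
Z = fourier_features(X)
print((Z @ Z.T).round(2))   # approximates an RBF kernel Gram matrix
```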
This paper investigates the application of large language models (LLMs) to automatic text simplification (ATS) of Common Vulnerability and Exposure (CVE) descriptions, a task previously unexplored in the cybersecurity domain. The authors created a baseline for cybersecurity ATS and a test dataset of 40 CVE descriptions, which were evaluated by cybersecurity experts. Results indicate that while LLMs can superficially simplify text, they often fail to preserve the original meaning.
Establishes a baseline and dataset for automatic text simplification of cybersecurity vulnerability descriptions using large language models.
This paper introduces LLM-DRS, a novel Large Language Model (LLM)-based framework for disaster reconnaissance summarization in structural health monitoring. The framework integrates vision data and metadata from on-site investigations, using deep convolutional neural networks to extract key attributes like damage state and material type. The extracted data, along with carefully designed prompts, are then fed into an LLM to generate summary reports for individual structures or affected regions.
Introduces a novel LLM-based framework, LLM-DRS, that automates the generation of structural reconnaissance reports by integrating vision data, metadata, and deep learning-extracted attributes.
The authors introduce ExtractBench, a new benchmark and evaluation framework for end-to-end PDF-to-JSON structured extraction, designed to address the lack of comprehensive benchmarks and principled evaluation methodologies for complex, nested extraction tasks. ExtractBench comprises 35 PDF documents paired with JSON Schemas and human-annotated gold labels across diverse domains, resulting in 12,867 evaluatable fields with varying schema complexities. Evaluations using ExtractBench reveal that state-of-the-art LLMs struggle with realistic schemas, particularly as schema breadth increases, with some models achieving 0% valid output on a 369-field schema.
Introduces ExtractBench, a novel benchmark and evaluation framework, to address the limitations of existing methods in evaluating complex structured extraction from PDFs using LLMs.
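A sketch of field-level scoring for nested JSON extraction of the kind such a benchmark needs; ExtractBench's exact matching and aggregation rules are not reproduced here.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into {"a.b.0.c": value} field paths."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        return {prefix: obj}
    out = {}
    for k, v in items:
        out.update(flatten(v, f"{prefix}.{k}" if prefix else str(k)))
    return out

def field_accuracy(pred, gold):
    g, p = flatten(gold), flatten(pred)
    return sum(p.get(k) == v for k, v in g.items()) / len(g)

gold = {"invoice": {"id": "A-17", "lines": [{"qty": 2, "sku": "X"}]}}
pred = {"invoice": {"id": "A-17", "lines": [{"qty": 3, "sku": "X"}]}}
print(field_accuracy(pred, gold))  # 2 of 3 gold fields match -> 0.666...
```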

