Artificial Intelligence news as of 6am UTC on Saturday, April 25, 2026

Every breakthrough. Every lab. Every day.

We track OpenAI, DeepMind, Anthropic, and 17 other labs daily - with AI-powered summaries, trend charts, and a weekly digest.

Choose from 100+ institutions to build your own feed

Safety & Alignment · Capabilities · Infrastructure · Applications

The papers worth reading, picked for you

We read everything so you don't have to. One email, zero noise.

Anthropic · DeepMind · OpenAI

Showing 50 of 107 selected papers · 50 of 497 other papers

This week - Selected (50)

The latest this week from Selected Labs (50)

Apr 23, 2026

Tsinghua AI 2d ago

UniGenDet: A Unified Generative-Discriminative Framework for Co-Evolutionary Image Generation and Generated Image Detection

By unifying generative and discriminative models, UniGenDet achieves state-of-the-art image generation and detection, proving that the best fakes are made with a deep understanding of what makes them detectable.

Yanran Zhang, Wenzhao Zheng, Yifei Li +5

Architecture Design (Transformers, SSMs, MoE) Computer Vision Data Curation & Synthetic Data

CMU ML 2d ago·also Datadog

ARFBench: Benchmarking Time Series Question Answering Ability for Software Incident Response

Even GPT-5 only achieves 63% accuracy on time series anomaly questions from real software incidents, but a model-expert combination reaches 87%, highlighting the potential for hybrid intelligence in incident response.

Stephan Xie, Ben Cohen, Mononito Goswami +6

Eval Frameworks & Benchmarks Multimodal Models Natural Language Processing

CMU ML 2d ago·also NTU, UB

A Deployable Embodied Vision-Language Navigation System with Hierarchical Cognition and Context-Aware Exploration

Real-world robots can now navigate complex environments with human-level instructions, thanks to a new system that combines efficient perception with high-level reasoning, all while running in real-time on limited hardware.

Kuan Xu, Ruimeng Liu, Yizhuo Yang +5

Computer Vision Multimodal Models Robotics & Embodied AI

2d ago·also Tsinghua AI, Hengqin Laboratory, Sheffield

Reinforcing 3D Understanding in Point-VLMs via Geometric Reward Credit Assignment

Point-VLMs can learn to see the world as it really is: targeted reward assignment and cross-modal verification nearly close the reality gap in 3D reasoning.

Jingkun Chen, Ru Xu, Mingqi Gao +2

Computer Vision Multimodal Models Robotics & Embodied AI

UW 2d ago

An effective variant of the Hartigan $k$-means algorithm

A surprisingly simple tweak to Hartigan's k-means algorithm unlocks another 2-5% accuracy boost, especially when clustering high-dimensional data.

Training Efficiency & Optimization

2d ago·also Microsoft Research, University of Wyoming

Position Paper: Denial-of-Service Against Multi-Round Transaction Simulation

MEV searchers beware: a new, low-cost DoS attack can cripple transaction bundling services like Flashbots by exploiting inter-transaction dependencies and atomic block inclusion.

Yuzhe Tang, Yibo Wang, Wanning Ding +2

Distributed Systems & Hardware Red-Teaming & Adversarial Robustness

DAMO 2d ago

ReaGeo: Reasoning-Enhanced End-to-End Geocoding with LLMs

LLMs can now directly predict geographic coordinates with high accuracy, even for vague locations and complex regions, bypassing the need for traditional geocoding pipelines.

Gong Wenbin

Natural Language Processing Reasoning & Chain-of-Thought Recommendation & Information Retrieval

Tsinghua AI 2d ago

Do MLLMs Understand Pointing? Benchmarking and Enhancing Referential Reasoning in Egocentric Vision

MLLMs often *hallucinate* the referent of a pointing gesture, latching onto nearby or salient objects instead of truly understanding spatial semantics.

Chentao Li, Zirui Gao, Mingze Gao +3

Eval Frameworks & Benchmarks Multimodal Models Robotics & Embodied AI

Tsinghua AI 2d ago

Provably Secure Steganography Based on List Decoding

Unlock higher-capacity covert communication with LLMs: a new steganography scheme uses list decoding to substantially outperform existing methods without sacrificing security or efficiency.

Kaiyi Pang, Minhao Bai

Natural Language Processing

2d ago·also DAMO

Counterfactual Multi-task Learning for Delayed Conversion Modeling in E-commerce Sales Pre-Promotion

Predicting pre-promotion conversions in e-commerce gets a boost with a new model that understands how users "window shop" before sales actually start.

Kaiyuan Li

Natural Language Processing Recommendation & Information Retrieval

2d ago·also NUS, Beihang, Passau

Generalizing Test Cases for Comprehensive Test Scenario Coverage

Stop writing incomplete tests: TestGeneralizer can automatically expand your existing tests to cover 31% more scenarios and catch more bugs.

Yun Lin, Xinyi Weng, Hailong Sun +2

Code Generation & Program Synthesis

Apr 22, 2026

CMU ML 3d ago

SkillLearnBench: Benchmarking Continual Learning Methods for Agent Skill Generation on Real-World Tasks

Continual learning for LLM agents hits a wall: scaling models doesn't reliably improve skill generation, and self-feedback can lead to recursive drift.

Shan Zhong, Shanshan Zhong, Yi Lu +17

Eval Frameworks & Benchmarks Robotics & Embodied AI Tool Use & Agents

Tsinghua AI 3d ago·also Huawei, Shenzhen University

GRPO-VPS: Enhancing Group Relative Policy Optimization with Verifiable Process Supervision for Effective Reasoning

LLMs can reason more effectively by directly tracking their own belief in the correct answer throughout the reasoning process, enabling more targeted policy updates.

Jingyi Wang, Lei Zhu, Tengjin Weng +8

Reasoning & Chain-of-Thought RLHF & Preference Learning

MIT CSAIL 3d ago·also Perseus Labs

pAI/MSc: ML Theory Research with Humans on the Loop

Imagine slashing the human effort needed to go from hypothesis to submission-ready ML theory paper by orders of magnitude.

Mahmoud Abdelmoneum, Pierfrancesco Beneventano, Tomaso Poggio

Open-Source Models & Weights Scientific Discovery & Drug Design Tool Use & Agents

3d ago·also NUS, Tsinghua AI, CAS +2

PokeVLA: Empowering Pocket-Sized Vision-Language-Action Model with Comprehensive World Knowledge Guidance

Pocket-sized VLA models can now achieve state-of-the-art robot manipulation performance by pre-training on a curated multimodal dataset and injecting manipulation-relevant representations into the action space.

Yupeng Zheng, Xiang Li, Songen Gu +11

Multimodal Models Robotics & Embodied AI

MIT CSAIL 3d ago·also Bristol, Conservation X Labs, Cornell, CTU Prague +10

Centering Ecological Goals in Automated Identification of Individual Animals

Automated identification of individual animals can only be effective if it aligns with ecological questions and data practices, not just algorithmic accuracy.

Lukas Picek, Timm Haucke, Lukáš Adam +16

Computer Vision Scientific Discovery & Drug Design

ETH 3d ago

Participatory provenance as representational auditing for AI-mediated public consultation

AI-driven summaries of public consultations can systematically exclude dissenting voices, raising concerns about biased policy recommendations even when individual outputs seem reasonable.

Sachit Mahajan

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Natural Language Processing

RoboScience 3d ago·also NUS, HUST, SCUT

FingerEye: Continuous and Unified Vision-Tactile Sensing for Dexterous Manipulation

A low-cost, compact sensor provides continuous vision-tactile feedback, enabling robots to "see" and "feel" their way through dexterous manipulation tasks.

Xuanye Wu, Tianyu Qiu

Computer Vision Robotics & Embodied AI

Amazon Science 3d ago

Text-to-Distribution Prediction with Quantile Tokens and Neighbor Context

Directly embedding quantile tokens into input sequences leads to sharper and more accurate distribution predictions, outperforming traditional methods by a substantial margin.

Yilun Zhu, Zhuang Yuan, Nikhita Vedula +6

Architecture Design (Transformers, SSMs, MoE) Natural Language Processing

3d ago·also Microsoft Research, KAUST, Northeastern, University of Missouri

LAFA: A Framework for Reproducible Longitudinal Assessment of Protein Function Annotation Models

Continuous benchmarking of protein function prediction models is now possible, enabling faster iteration and more robust performance tracking as annotations evolve.

An Phan, Yanli Wang, Frimpong Boadu +5

Eval Frameworks & Benchmarks Scientific Discovery & Drug Design

3d ago·also Tsinghua AI, Fudan, Hamburg, Hubei University of Chinese Medicine

ALAS: Adaptive Long-Horizon Action Synthesis via Async-pathway Stream Disentanglement

Achieve superhuman dexterity: ALAS unlocks robust long-horizon task completion by decoupling environment understanding from motor control, enabling generalization across diverse human-scene interaction scenarios.

Yutong Shen, Hangxu Liu, Lei Zhang +4

Robotics & Embodied AI World Models & Planning

School of Computer Science and Software Engineering 3d ago·also Tsinghua AI, University of Nottingham, Wenzhou Medical University

X-PCR: A Benchmark for Cross-modality Progressive Clinical Reasoning in Ophthalmic Diagnosis

MLLMs still struggle to integrate diverse data for clinical reasoning, as evidenced by their poor performance on a new ophthalmology benchmark spanning image quality assessment to diagnosis.

Gui Wang, Zehao Zhong, YongSong Zhou +6

Eval Frameworks & Benchmarks Multimodal Models Reasoning & Chain-of-Thought

3d ago·also ETH, AI Center Tübingen, ELLIS, Tübingen

Efficient Test-Time Inference via Deterministic Exploration of Truncated Decoding Trees

Deterministic decoding can outperform stochastic self-consistency in constrained domains by systematically exploring high-probability reasoning traces, leading to better performance with less computation.

Johannes Zenn, Guinan Su, Mrinmaya Sachan +1

Code Generation & Program Synthesis Inference & Quantization Reasoning & Chain-of-Thought

Google Research 3d ago

RSRCC: A Remote Sensing Regional Change Comprehension Benchmark Constructed via Retrieval-Augmented Best-of-N Ranking

Current remote sensing change captioning datasets miss fine-grained localized semantic reasoning, but RSRCC fills this gap with 126k change-specific questions.

Roie Kazoom, Yotam Gigi, George Leifman +2

Computer Vision Eval Frameworks & Benchmarks Multimodal Models

DAMO 3d ago

Learning to Evolve: A Self-Improving Framework for Multi-Agent Systems via Textual Parameter Graph Optimization

TPGO allows multi-agent systems to learn from their own optimization history, leading to unprecedented self-improvement in performance.

Shan He, Runze Wang, Zhuoyun Du +4

Natural Language Processing Tool Use & Agents Training Efficiency & Optimization

3d ago·also Google Research, VIA Research Center

R-CoV: Region-Aware Chain-of-Verification for Alleviating Object Hallucinations in LVLMs

LVLMs can self-detect and correct object hallucinations by focusing on specific image regions, offering a simple, training-free fix.

Jiahao Xie, Alessio Tonioni, Nathalie Rauschmayr +2

Computer Vision Multimodal Models Reasoning & Chain-of-Thought

Google Research 3d ago·also Max Planck

Semantic Recall for Vector Search

Stop penalizing your ANN search algorithms for failing to retrieve irrelevant neighbors – Semantic Recall offers a more nuanced and effective way to measure retrieval quality.

Leonardo Kuffó, Ioanna Tsakalidou, Roberta De Viti +3

Eval Frameworks & Benchmarks Recommendation & Information Retrieval

CMU ML 3d ago·also HKU, INFIFORCE Intelligent Technology

Occupancy Reward Shaping: Improving Credit Assignment for Offline Goal-Conditioned Reinforcement Learning

Extracting temporal geometry from generative models can boost reinforcement learning performance by over 2x without changing the optimal policy.

Aravind Venugopal, Jiayu Chen, Xudong Wu +3

World Models & Planning

3d ago·also Microsoft Research, California State Polytechnic University

Auditing and Controlling AI Agent Actions in Spreadsheets

Users who actively participate in an AI agent's spreadsheet execution not only improve task outcomes, but also gain a deeper understanding and feel more ownership over the results.

Sadra Sabouri, Zeinabsadat Saghi, Run Huang +4

Constitutional AI & AI Ethics Interpretability & Mechanistic Interp Tool Use & Agents

Stanford HAI 3d ago

The Origin of Edge of Stability

The trajectory of gradient descent is not random; it is systematically forced toward the critical threshold of $2/η$, revealing a hidden structure in neural network optimization.

Elon Litman

Architecture Design (Transformers, SSMs, MoE) Training Efficiency & Optimization

Mila 3d ago

Generative Flow Networks for Model Adaptation in Digital Twins of Natural Systems

Sampling plausible configurations of digital twins can reveal multiple valid parameterizations, enhancing model adaptation in complex natural systems.

Pascal Archambault, Houari Sahraoui, Eugene Syriani

Scientific Discovery & Drug Design World Models & Planning

3d ago·also Microsoft Research, Independent

From Hidden Profiles to Governable Personalization: Recommender Systems in the Age of LLM Agents

LLMs are poised to flip the script on personalization, giving users unprecedented control over their data and how it's used across platforms.

Jiahao Liu, Mingzhe Han, Guanming Liu +5

Recommendation & Information Retrieval Tool Use & Agents

Apr 21, 2026

Tsinghua AI 4d ago

Diff-SBSR: Learning Multimodal Feature-Enhanced Diffusion Models for Zero-Shot Sketch-Based 3D Shape Retrieval

Freezing a Stable Diffusion backbone and injecting CLIP and BLIP features lets you beat the state-of-the-art in zero-shot sketch-based 3D shape retrieval, without any costly retraining.

Hang Cheng, Fanhe Dong +1

Computer Vision Multimodal Models Recommendation & Information Retrieval

Beijing Language and Culture University 4d ago·also DAMO, ELLIS, HIT, IBM Research +4

CulturALL: Benchmarking Multilingual and Multicultural Competence of LLMs on Grounded Tasks

LLMs still struggle to reason in context when cultural and linguistic nuances are involved, achieving only 44% accuracy on a new grounded benchmark spanning 14 languages.

Wenjiang Luo, Haotian Ye, Md Mehrab Hossain +16

Eval Frameworks & Benchmarks Natural Language Processing

Amazon Science 4d ago·also ASU

Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts

Expert upcycling lets you scale MoEs for 32% less compute by intelligently duplicating and specializing existing experts, challenging the need to train massive MoEs from scratch.

Chaitanya Dwivedi, Binxuan Huang, Himanshu Gupta +3

Architecture Design (Transformers, SSMs, MoE) Distributed Systems & Hardware Scaling Laws & Emergent Abilities +1

NUS 4d ago·also University of Nottingham

UniCon3R: Contact-aware 3D Human-Scene Reconstruction from Monocular Video

Contact-aware reconstruction transforms how we achieve realistic human-scene interactions in 3D environments, correcting artifacts that have plagued previous methods.

Shashank Tripathi, Nikos Athanasiou, Kai Xu +1

Computer Vision Robotics & Embodied AI World Models & Planning

4d ago·also DAMO

Thinking Before Matching: A Reinforcement Reasoning Paradigm Towards General Person Re-Identification

Achieve state-of-the-art person re-identification with only 20% of the data by explicitly teaching the model to "think" before matching identities.

Quan Zhang, Jingze Wu, Xiaohua Xie +2

Computer Vision Reasoning & Chain-of-Thought

Stanford HAI 4d ago·also Macquarie

Are Large Language Models Economically Viable for Industry Deployment?

Forget chasing the biggest LLM – this benchmark reveals that smaller models (<2B params) can deliver 3x better energy efficiency and faster ROI in real-world industry deployments.

Abdullah Mohammad, Sushant Kumar Ray, Pushkar Arora +4

Distributed Systems & Hardware Eval Frameworks & Benchmarks Inference & Quantization

NVIDIA 4d ago

Reducing the Offline-Streaming Gap for Unified ASR Transducer with Consistency Regularization

Bridging the offline-streaming gap in ASR is now more achievable: a single RNN-Transducer model can deliver high accuracy in both settings, thanks to a novel consistency regularization technique.

A.S. Andrusenko, Vladimir Bataev, Lilit Grigoryan +3

Architecture Design (Transformers, SSMs, MoE) Speech & Audio Training Efficiency & Optimization

Stanford HAI 4d ago

FASTER: Value-Guided Sampling for Fast RL

Get the performance boost of expensive sampling-based RL policies for a fraction of the compute by learning to prune action candidates early in the diffusion denoising process.

Perry Dong, Alexander Swerdlow, Dorsa Sadigh +1

Robotics & Embodied AI Training Efficiency & Optimization

CMU ML 4d ago

EmbodiedMidtrain: Bridging the Gap between Vision-Language Models and Vision-Language-Action Models via Mid-training

VLMs can be significantly boosted on embodied tasks by mid-training on a carefully curated subset of VLM data that is highly aligned with the VLA domain, rivaling the performance of much larger models.

Yiyang Du, Zhanqiu Guo, Xin Ye +2

Multimodal Models Robotics & Embodied AI Training Efficiency & Optimization

ETH 4d ago·also Tsinghua AI, NTU, UMich

Revisiting RaBitQ and TurboQuant: A Symmetric Comparison of Methods, Theory, and Experiments

TurboQuant's claimed advantages over RaBitQ in quantization don't hold up under rigorous, reproducible comparison, raising questions about its practical utility.

Jianyang Gao, Yutong Gou, Yuexuan Xu +5

Inference & Quantization Open-Source Models & Weights Training Efficiency & Optimization

MIT CSAIL 4d ago·also NYU

An Efficient Black-Box Reduction from Online Learning to Multicalibration, and a New Route to $Φ$-Regret Minimization

Forget complex fixed-point machinery: this work offers a dramatically simpler and more efficient route from external regret to $Φ$-regret minimization.

Gabriele Farina, Juan Carlos Perdomo

Natural Language Processing Training Efficiency & Optimization

DeepMind 4d ago·also INRIA, Paris-Saclay, SequeL team

Planning in entropy-regularized Markov decision processes and games

Entropy regularization makes planning provably easy: SmoothCruiser achieves polynomial sample complexity in MDPs where standard methods fail.

Jean-Bastien Grill, Omar Darwiche Domingues, Pierre Ménard +2

Training Efficiency & Optimization World Models & Planning

4d ago·also CMU ML

VLA Foundry: A Unified Framework for Training Vision-Language-Action Models

End-to-end training of Vision-Language-Action models just got a whole lot easier: VLA Foundry unifies LLM, VLM, and VLA training in a single open-source framework.

Jean Mercat, Jean-Pierre Mercat, Sedrick Scott Keh +8

Multimodal Models Open-Source Models & Weights Robotics & Embodied AI

Google Research4d ago·also Bar-Ilan, Cambridge

Location Not Found: Exposing Implicit Local and Global Biases in Multilingual LLMs

Multilingual LLMs exhibit a surprising "American bias," even when prompted in other languages, and instruction tuning makes it worse.

Guy Mor-Lan, Omer Goldman, Matan Eyal +6

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Natural Language Processing

Tsinghua AI 4d ago·also UCL, UT Austin

Large language models perceive cities through a culturally uneven baseline

LLMs don't see cities neutrally; their perception is skewed towards a culturally uneven baseline, favoring Western perspectives.

Rong Zhao, Wanqi Liu, Zhizhou Sha +2

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Natural Language Processing

NUS 4d ago·also HIT, SCU, UMN

Taming Actor-Observer Asymmetry in Agents via Dialectical Alignment

LLM agents suffer from a human-like cognitive bias, Actor-Observer Asymmetry, leading them to make inconsistent judgments about their own and others' failures.

Rui Wu, Mong-Li Lee

Constitutional AI & AI Ethics Reasoning & Chain-of-Thought Tool Use & Agents

Univ. Lille 4d ago·also DeepMind, Centrale Lille, INRIA, Paris-Saclay

On two ways to use determinantal point processes for Monte Carlo integration

DPP-based Monte Carlo integration can offer variance reduction, but choosing the right DPP—fixed vs. tailored to the integrand—determines whether you get a biased but faster converging estimator or an unbiased but standard-rate estimator.

Guillaume Gautier, Rémi Bardenet, Michal Valko

Scientific Discovery & Drug Design Training Efficiency & Optimization

Tsinghua AI 4d ago·also Sheffield

HarmoniDiff-RS: Training-Free Diffusion Harmonization for Satellite Image Composition

Training-free diffusion models can now harmonize satellite imagery across diverse domains, enabling scalable remote-sensing synthesis without retraining.

Xiaoqi Zhuang, Jefersson A. Dos Santos

Computer Vision Data Curation & Synthetic Data

This week - Other Labs (50)

The latest this week from everyone else (50)

Apr 23, 2026

2d ago·also SJTU

TingIS: Real-time Risk Event Discovery from Noisy Customer Incidents at Enterprise Scale

LLMs, when combined with efficient indexing and noise reduction, can extract actionable insights from noisy customer incident data with high accuracy and low latency at enterprise scale.

Jun Wang, Ziyin Zhang, Rui Wang +3

Distributed Systems & Hardware Natural Language Processing Recommendation & Information Retrieval

2d ago

Dissecting clinical reasoning failures in frontier artificial intelligence using 10,000 synthetic cases

Automated expert-level evaluation across 10,000 cases characterised artificial intelligence clinical blind spots hitherto invisible to small-scale testing and should become standard for uncovering serious failures and implementing safety guardrails before clinical deployment exposes patients to risk.

S. D. Auger, J. Varley, M. Hargovan +1

2d ago·also Eyeline Labs

Vista4D: Video Reshooting with 4D Point Clouds

Reshoot dynamic videos from entirely new perspectives with unprecedented realism and control, thanks to a novel 4D point cloud grounding.

Kuan Heng Lin, Zhizheng Liu, Pablo Salamanca +9

Computer Vision Multimodal Models Robotics & Embodied AI

2d ago

StyleID: A Perception-Aware Dataset and Metric for Stylization-Agnostic Facial Identity Recognition

Face recognition systems can be fooled by artistic stylization, but StyleID offers a way to train models to see past the style and recognize the person.

Kwan Yun, Changmin Lee, Ayeong Jeong +4

Computer Vision Data Curation & Synthetic Data Eval Frameworks & Benchmarks

2d ago

When Prompts Override Vision: Prompt-Induced Hallucinations in LVLMs

LVLMs are often tripped up not by faulty vision, but by over-trusting the textual prompt, leading to surprisingly easy-to-fix hallucinations.

Pegah Khayatan, Jayneel Parekh, Arnaud Dapogny +3

Eval Frameworks & Benchmarks Multimodal Models Red-Teaming & Adversarial Robustness

2d ago

Low-Rank Adaptation Redux for Large Models

Signal processing offers a surprisingly effective lens for understanding and improving LoRA, the reigning champ of parameter-efficient fine-tuning.

Bingcong Li, Yilang Zhang, Georgios B. Giannakis

Architecture Design (Transformers, SSMs, MoE) Natural Language Processing Training Efficiency & Optimization

2d ago·also Graz University of Technology

PrismaDV: Automated Task-Aware Data Unit Test Generation

Automatically generate data unit tests that actually catch the data errors that matter for your specific downstream tasks.

Hao Chen, Arnab Phani, Sebastian Schelter

Code Generation & Program Synthesis Data Curation & Synthetic Data

2d ago

Probably Approximately Consensus: On the Learning Theory of Finding Common Ground

Forget polling every user on every idea – this algorithm learns to find common ground by strategically asking for feedback on a few key statements.

Carter Blair, Ben Armstrong, Shiri Alouf-Heffetz +2

Natural Language Processing Recommendation & Information Retrieval

2d ago

There Will Be a Scientific Theory of Deep Learning

Forget philosophical debates: a practical "learning mechanics" is crystallizing to explain *how* deep learning works, not just *why* it should.

James B. Simon, D. Kunin, Alexander Atanasov +11

Architecture Design (Transformers, SSMs, MoE) Training Efficiency & Optimization

2d ago·also KCL, Research Centre Trust, TU Munich

Fairness under uncertainty in sequential decisions

Ignoring uncertainty in sequential decision-making disproportionately harms disadvantaged groups, but accounting for it can improve fairness without sacrificing institutional goals.

M. Lee, Kirtan Padh, David S. Watson +2

Constitutional AI & AI Ethics

2d ago

Transferable SCF-Acceleration through Solver-Aligned Initialization Learning

ML models can accurately predict quantum properties out-of-distribution, but still fail to accelerate SCF convergence – until now.

Eike S. Eberhard, Viktor Kotsev, Timm Guthle +1

Scientific Discovery & Drug Design Training Efficiency & Optimization

2d ago·also Samsung

A-THENA: Early Intrusion Detection for IoT with Time-Aware Hybrid Encoding and Network-Specific Augmentation

IoT intrusion detection gets a boost: A-THENA's time-aware encoding and network-specific augmentation beats state-of-the-art methods by up to 6.88% in accuracy, all while running on a Raspberry Pi Zero 2 W.

Ioannis Panopoulos, Maria-Lamprini A. Bartsioka, Sokratis Nikolaidis +3

Architecture Design (Transformers, SSMs, MoE) Natural Language Processing Red-Teaming & Adversarial Robustness

2d ago

Generalizing Numerical Reasoning in Table Data through Operation Sketches and Self-Supervised Learning

Forget memorizing table headers: TaNOS unlocks surprisingly robust numerical reasoning by pre-training on operation sketches and correctness-guaranteed programs.

H. Cho, Gahyun Yoo, H. Kim +1

Data Curation & Synthetic Data Natural Language Processing Reasoning & Chain-of-Thought

2d ago·also Basque Center for Applied Mathematics (BCAM), Ikerbasque, University of the Basque Country (UPV/EHU)

A Green-Integral-Constrained Neural Solver with Stochastic Physics-Informed Regularization

PINNs can now efficiently solve highly oscillatory wave equations in heterogeneous media, thanks to a Green's function-based integral formulation that cuts computation by 10x and avoids absorbing boundary layers.

Mohammad Mahdi Abedi, David Pardo, T. Alkhalifah

Architecture Design (Transformers, SSMs, MoE) Scientific Discovery & Drug Design Training Efficiency & Optimization

2d ago

Even More Guarantees for Variational Inference in the Presence of Symmetries

Even when your variational approximation is wrong, symmetries in the target distribution can guarantee you still get the mean right.

Lena Zellinger, Antonio Vergari

Training Efficiency & Optimization

2d ago

Beyond Single Plots: A Benchmark for Question Answering on Multi-Charts

LLMs struggle to answer human-generated questions about multi-chart images, highlighting a critical gap in their ability to reason about real-world data visualizations.

Azher Ahmed Efat, Seok Hwan Song, Wallapak Tavanapong

Data Curation & Synthetic Data Eval Frameworks & Benchmarks Multimodal Models

2d ago·also Meituan

Understanding and Mitigating Spurious Signal Amplification in Test-Time Reinforcement Learning for Math Reasoning

Test-time RL's vulnerability to noisy pseudo-labels is amplified by group-relative advantage estimation, but can be mitigated with a surprisingly simple debiasing and denoising approach.

Yongcan Yu, Lingxiao He, Jian Liang +5

Reasoning & Chain-of-Thought RLHF & Preference Learning

2d ago·also JD.com, Tencent AI

Measure Twice, Click Once: Co-evolving Proposer and Visual Critic via Reinforcement Learning for GUI Grounding

Learnable critics that evaluate the model's own GUI grounding proposals, rather than relying on static geometric heuristics, unlock substantial gains in accuracy.

Wenkai Wang, Xiyun Li, Hongcan Guo +5

Computer Vision Multimodal Models Tool Use & Agents

2d ago

Learning Dynamic Representations and Policies from Multimodal Clinical Time-Series with Informative Missingness

Ignoring why clinical data is missing can lead to suboptimal treatment policies; this work shows how explicitly modeling informative missingness in multimodal time series data significantly improves both offline treatment policy learning and outcome prediction.

Zihan Liang, Ziwen Pan, Ruoxuan Xiong

Multimodal Models Natural Language Processing

2d ago

AEL: Agent Evolving Learning for Open-Ended Environments

Forget complex architectures: the secret to self-improving LLM agents lies in teaching them how to *interpret* their past failures, not just remember them.

Wujiang Xu, Jiaojiao Han, Minghao Guo +4

Tool Use & Agents World Models & Planning

2d ago

Who Defines "Best"? Towards Interactive, User-Defined Evaluation of LLM Leaderboards

LLM leaderboard rankings are more a reflection of benchmark designer priorities than actual user needs, but a new interactive visualization tool lets you reshape those rankings based on your specific prompt types and goals.

Mi-Gyeong Jung, Minjae Lee, Yejin Kim +2

Eval Frameworks & Benchmarks Natural Language Processing

2d ago

Thinking with Reasoning Skills: Fewer Tokens, More Accuracy

LLMs can be both faster and smarter: pre-learned reasoning skills cut down token usage while boosting accuracy on coding and math problems.

Guangxiang Zhao, Qi Shi, Xusen Xiao +3

Inference & Quantization Reasoning & Chain-of-Thought Tool Use & Agents

2d ago·also NIST

Agentic AI-assisted coding offers a unique opportunity to instill epistemic grounding during software development

Forget prompt engineering – GROUNDING.md lets you bake domain expertise directly into AI coding agents, ensuring scientific validity even when users aren't experts.

Magnus Palmblad, Jared M Ragland, Benjamin A. Neely

Code Generation & Program Synthesis Tool Use & Agents

JetBrains Research 2d ago·also TU Delft

A Metamorphic Testing Approach to Diagnosing Memorization in LLM-Based Program Repair

LLMs' apparent success at program repair crumbles when faced with slightly altered versions of known bugs, revealing a reliance on memorization rather than true understanding.

Milan de Koning, Ali Asgari +5

Code Generation & Program Synthesis Data Curation & Synthetic Data Eval Frameworks & Benchmarks

2d ago

Engaged AI Governance: Addressing the Last Mile Challenge Through Internal Expert Collaboration

AI governance risks becoming performative box-ticking unless practitioners understand how compliance directly improves system quality and user protection.

Simon Jarvers, O. Papakyriakopoulos

Constitutional AI & AI Ethics Natural Language Processing

2d ago

Do LLM Decoders Listen Fairly? Benchmarking How Language Model Priors Shape Bias in Speech Recognition

Counterintuitively, scaling up LLM decoders in speech recognition doesn't guarantee fairness; audio encoder design matters more, as Whisper's pathological hallucinations on Indian-accented speech and repetition loops under masking demonstrate.

Srishti Ginjala, E. Fosler-Lussier, Christopher Myers +1

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Speech & Audio

2d ago

Revisiting Non-Verbatim Memorization in Large Language Models: The Role of Entity Surface Forms

LLMs' factual knowledge is surprisingly brittle: simply changing an entity's surface form in a question (e.g., using an abbreviation instead of the full name) can drastically alter the answer.

Yuto Nishida, Naoki Shikoda, Yosuke Kishinami +4

Data Curation & Synthetic Data Eval Frameworks & Benchmarks Natural Language Processing

2d ago

Machine Behavior in Relational Moral Dilemmas: Moral Rightness, Predicted Human Behavior, and Model Decisions

LLMs may fail in real-world moral decisions because they rigidly adhere to fairness norms, even when their own internal models predict humans would prioritize loyalty.

Jiseon Kim, Jea Kwon, L. Vecchietti +3

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Natural Language Processing

2d ago

Multilinguality at the Edge: Developing Language Models for the Global South

Deploying language models in the Global South requires bridging the gap between multilingual NLP and edge computing, two fields that have largely evolved independently despite their shared goals.

Lester James Validad Miranda, Songbo Hu, Roi Reichart +1

Distributed Systems & Hardware Inference & Quantization Natural Language Processing

2d ago

When Bigger Isn't Better: A Comprehensive Fairness Evaluation of Political Bias in Multi-News Summarisation

Mid-sized LLMs can actually be *more* fair in news summarization than their larger counterparts, challenging the common wisdom of "bigger is better."

Nannan Huang, Iffat Maab, Junichi Yamagishi

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Natural Language Processing

2d ago·also Ministry of Education Key Laboratory of Intelligent Networks and Network Security, Shaanxi Province Key Laboratory of Big Data Knowledge Engineering

OptiVerse: A Comprehensive Benchmark towards Optimization Problem Solving

Even the most advanced LLMs like GPT-5.2 and Gemini-3 stumble on complex optimization problems, achieving only 27% accuracy on a new benchmark spanning stochastic, dynamic, and game optimization.

Xinyu Zhang, Boxuan Zhang, Yuchen Wan +5

Code Generation & Program Synthesis Eval Frameworks & Benchmarks Reasoning & Chain-of-Thought

2d ago·also Anhui Province Key Laboratory of Digital

When Agents Look the Same: Quantifying Distillation-Induced Similarity in Tool-Use Behaviors

LLM agent distillation leads to surprisingly high rates of behavioral mimicry, with some student models exhibiting tool-use habits *more* similar to their teachers than the teacher's own family members.

Chen Yang, Yuning Zhang, Zhoufutu Wen +4

Eval Frameworks & Benchmarks Inference & Quantization Tool Use & Agents

2d ago·also School of Information Engineering

Unlocking the Power of Large Language Models for Multi-table Entity Matching

LLMs can significantly boost multi-table entity matching by cleverly coordinating attributes, embedding entities, and pruning noise.

Yingkai Tang, Taoyu Su, Wenyuan Zhang +2

Eval Frameworks & Benchmarks Natural Language Processing Tool Use & Agents

2d ago

Assessing the Impact of Requirement Ambiguity on LLM-based Function-Level Code Generation

LLMs' impressive code generation skills crumble when faced with the messy reality of ambiguous requirements, highlighting a critical gap in their ability to handle real-world software development scenarios.

Di Yang, Xinou Xie, Xiuwen Yang +7

Code Generation & Program Synthesis Eval Frameworks & Benchmarks Natural Language Processing

2d ago

Reshoot-Anything: A Self-Supervised Model for In-the-Wild Video Reshooting

Training a video reshooting model on internet-scale monocular videos is now possible, thanks to a clever self-supervision trick that generates multi-view training data from a single video.

Avinash Paliwal, Adithya Iyer, Shivin Yadav +2

Computer Vision Data Curation & Synthetic Data Training Efficiency & Optimization

2d ago

Grounding Video Reasoning in Physical Signals

Current video Q&A benchmarks can be fooled by textual regularities, failing to actually ground reasoning in the video's physical reality.

Alibay Osmanli, Zixu Cheng, Shaogang Gong

Computer Vision Eval Frameworks & Benchmarks Multimodal Models

2d ago·also PKU

DualSplat: Robust 3D Gaussian Splatting via Pseudo-Mask Bootstrapping from Reconstruction Failures

Turn your 3D Gaussian Splatting failures into features: DualSplat uses initial reconstruction artifacts to bootstrap robust scene representations in the presence of transient objects.

Xu Wang, Zhiru Wang, Shiyun Xie +2

Computer Vision

2d ago·also Tsinghua AI, Westlake

OmniFit: Multi-modal 3D Body Fitting via Scale-agnostic Dense Landmark Prediction

Achieve millimeter-level accuracy in 3D human body fitting from multi-modal inputs, even with scale distortion common in AI-generated assets.

Zeyu Cai, Yuliang Xiu, Renke Wang +8

Computer Vision Multimodal Models Robotics & Embodied AI

2d ago·also JD.com

KD-CVG: A Knowledge-Driven Approach for Creative Video Generation

Forget boring ads: this new method uses creative knowledge to generate videos that actually match product features and move realistically.

Linkai Liu, Wei Feng, Xi Zhao +9

Computer Vision Multimodal Models Natural Language Processing

2d ago·also Ritsumeikan University

WildSplatter: Feed-forward 3D Gaussian Splatting with Appearance Control from Unconstrained Images

Unlock real-time, high-quality 3D scene reconstruction from unconstrained images with varying lighting, thanks to a feed-forward Gaussian Splatting model that learns appearance embeddings.

Yuki Fujimura, Takahiro Kushida, Kazuya Kitano +2

Architecture Design (Transformers, SSMs, MoE) Computer Vision Training Efficiency & Optimization

2d ago

FingerViP: Learning Real-World Dexterous Manipulation with Fingertip Visual Perception

Robot hands get a serious upgrade: embedding cameras in fingertips unlocks robust manipulation in cluttered environments where traditional wrist-mounted cameras fail.

Zhen Zhang, Weinan Wang, Hejiang Sun +4

Computer Vision Robotics & Embodied AI

IIT2d ago·also Edinburgh

Leveraging SIMD for Accelerating Large-number Arithmetic

SIMD parallelism can finally unlock substantial speedups in large-number arithmetic by rethinking algorithms around data-parallel operations, yielding up to 19.3% throughput gains in scientific computing.

Subhrajit Das, Abhishek Bichhawat, Yuvraj Patel

Distributed Systems & Hardware Inference & Quantization Training Efficiency & Optimization

2d ago·also NTU, Ripple Labs, UCL

Systematizing Blockchain Research Themes and Design Patterns: Insights from the University Blockchain Research Initiative (UBRI)

Bridging the gap between blockchain research and real-world deployment requires navigating recurring design tensions like scalability vs. security, decentralization vs. governance, and privacy vs. compliance.

Chien-Chih Chen, Yitian Wang, Emma Nasseri +2

Architecture Design (Transformers, SSMs, MoE) Constitutional AI & AI Ethics Open-Source Models & Weights

2d ago·also Cluster of Excellence PhoenixD, RWTH

Data-Driven Thermal and Mechanical Modeling of Defective Covalent Organic Frameworks

COFs can withstand defects surprisingly well: mechanical properties remain stable even with defects, but thermal conductivity plummets, revealing design trade-offs.

A. Szewczyk, L. M. Sandonas, David Bodesheim +2

Data Curation & Synthetic Data Scientific Discovery & Drug Design

2d ago

How to quantify long-time rotational motion in molecular systems

Existing methods for quantifying molecular rotation break down when motion becomes complex, but this new method accurately captures rotational dynamics from fluid to solid states.

Romain Simon, Hadrien Bobas, François Villemot +2

Scientific Discovery & Drug Design

2d ago

WPGRec: Wavelet Packet Guided Graph Enhanced Sequential Recommendation

Achieve state-of-the-art sequential recommendations by aligning multi-resolution temporal dynamics with graph propagation at matching scales.

Peilin Liu, Zhiquan Ji, Gang Yan

Architecture Design (Transformers, SSMs, MoE) Recommendation & Information Retrieval

2d ago·also IBM Research

Towards Universal Tabular Embeddings: A Benchmark Across Data Tasks

Turns out, the best way to represent tabular data depends heavily on the task at hand, so a one-size-fits-all tabular foundation model may be a mirage.

Liane Vogel, Kavitha Srinivas +9

Eval Frameworks & Benchmarks Natural Language Processing Recommendation & Information Retrieval

2d ago

Planning Beyond Text: Graph-based Reasoning for Complex Narrative Generation

LLMs can write better stories if they plan the plot on a graph first.

Hanwen Gu, Chao Guo, Junle Wang +2

Natural Language Processing Reasoning & Chain-of-Thought World Models & Planning

2d ago

Institutionalizing Best Practices in Research Computing: A Framework and Case Study for Improving User Onboarding

Frustrated by researchers struggling to access complex computing resources? This framework offers a practical solution for streamlining onboarding and boosting user success.

A. Chaturvedi, R. Pokorney, Elyn Fritz-Waters +4

Distributed Systems & Hardware

2d ago

On the Challenges of Holistic Intrusion Detection in ICS

Current ICS intrusion detection systems are too fragmented to effectively protect against sophisticated attacks targeting both cyber and physical components.

Stefan Lenz, Julia Raab, Benedikt Holzbach +3

Constitutional AI & AI Ethics Red-Teaming & Adversarial Robustness