Search papers, labs, and topics across Lattice.
We track OpenAI, DeepMind, Anthropic, and 17 other labs daily, with AI-powered summaries, trend charts, and a weekly digest.
We read everything so you don't have to. One email, zero noise.
Generative recommendation models can adapt to evolving user behavior without catastrophic forgetting by selectively updating item tokens based on a novel drift-detection mechanism.
GPT-5 can only solve 37% of PhD-level 3D geometry coding problems, suggesting AI can't reliably automate complex scientific coding tasks yet.
Achieve HPC acceleration by emulating FP64 operations with INT8 precision on GPUs, proving that you can boost performance *and* accuracy.
Stop training your image restoration models to mimic flawed ground truth; instead, explicitly optimize for perceptual quality using a plug-and-play module guided by No-Reference Image Quality Assessment.
Current multimodal dialogue models struggle to capture the nuanced expressiveness of human interaction, but a new dataset and benchmark reveal exactly where they fall short.
Trust in tree ensembles hinges on rigorous explanations, and this paper delivers a method to generate them.
Today's best smartphone GUI agents stumble when faced with the messy reality of personalized user workflows, achieving only limited success on a new benchmark designed to mimic real-world use.
AI agents are far better at automating data engineering tasks than previously thought, but flawed benchmarks are obscuring their true potential.
Training LLMs to optimize for conflicting objectives between the final output and the reasoning process can significantly degrade the monitorability of Chain-of-Thought, making oversight more difficult.
Stop rewarding all LLM-generated candidates equally: ShapE-GRPO uses Shapley values to fairly distribute credit within sets, leading to better training and faster convergence.
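ShapE-GRPO's exact training objective isn't reproduced here, but the classical Shapley value it builds on is standard: each member of a set gets the average of its marginal contributions over all orderings. A minimal exact-enumeration sketch with a toy value function (all names and the value function are hypothetical, not from the paper):

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values by enumerating all coalitions.

    `value` maps a frozenset of players to a scalar reward.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Probability that coalition s precedes p in a random ordering.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi

# Toy value function: reward grows superlinearly with coalition size
# (a hypothetical stand-in for scoring a set of LLM-generated candidates).
v = lambda s: len(s) ** 2
print(shapley_values(["a", "b", "c"], v))
```

Because the toy value function is symmetric, each player receives an equal share (3.0) and the shares sum to the full set's reward; with asymmetric candidates the split becomes uneven, which is the credit-assignment property the teaser refers to.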
MLLMs struggle to plan coherent interleaved text-and-image generation, often missing opportunities for tool use, revealing a critical gap in their ability to unify factuality with creativity.
Robots can now "see" hidden objects and understand articulation by learning from human egocentric video, even if they can't physically explore those areas themselves.
Injecting carefully-selected, reverse-ordered behavioral curricula into generative recommendation models can significantly boost conversion rates, as demonstrated by a 2% lift in online advertising revenue.
Medical AI Scientist leapfrogs generic LLMs in clinical research, generating higher-quality, evidence-backed hypotheses and manuscripts that rival top-tier medical publications.
Despite the effort required, Android developers overwhelmingly support platform-level changes to combat fingerprinting, suggesting a path to enhanced user privacy through collaborative platform-developer initiatives.
Sparse autoencoders' failure to generalize compositionally isn't due to amortized inference, but because they learn lousy dictionaries in the first place.
Ventricular dysfunction can be surprisingly well-predicted in a zero-shot manner from ECG diagnostic probabilities, suggesting a structured encoding of cardiac function within these representations.
Unlock richer, more realistic agent simulations by moving beyond individual personas to unified group representations that capture collective behavior.
Stop hand-coding your LLM harnesses: Meta-Harness can automatically discover harnesses that outperform state-of-the-art systems while using fewer context tokens and generalizing across models.
Demystifying LLMs for the masses might be as simple as turning their mechanics into a game.
MLLMs are riddled with shared vulnerabilities across modalities, meaning a single weakness can be exploited to jailbreak safety filters, hijack instructions, or even poison training data.
VLMs struggle to create logically consistent academic illustrations, with performance gaps between models being far wider than on general image generation tasks.
Achieve 49% and 19% lower Chamfer distance than state-of-the-art dynamic surface reconstruction methods on Hi4D and CMU Panoptic datasets, respectively, by enforcing temporal consistency in Gaussian Splatting.
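Chamfer distance, the metric quoted above, averages nearest-neighbour distances between two point sets in both directions. A minimal NumPy sketch; the exact variant the paper uses (squared vs. unsquared distances, normalisation) is an assumption here:

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N,3) and b (M,3).

    Sums the mean squared nearest-neighbour distance in both directions.
    """
    # Pairwise squared distances, shape (N, M).
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

rng = np.random.default_rng(0)
pts = rng.normal(size=(100, 3))
print(chamfer_distance(pts, pts))  # identical sets -> 0.0
```

The brute-force pairwise matrix is fine for small clouds; real reconstruction pipelines typically swap in a KD-tree or GPU nearest-neighbour search.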
LLMs may ace synthetic benchmarks, but they fumble the efficiency test in real-world cloud service scenarios, revealing a critical gap in their readiness for customer-facing applications.
Achieve kilometer-scale regional weather forecasts that significantly outperform operational NWP and AI baselines by intelligently coupling global and regional models.
Forget hand-tuning for each language: this recipe achieves state-of-the-art phone recognition across 100+ languages, revealing the surprising power of scaling data and SSL representations.
Safety fine-tuning might inadvertently be stripping LLMs of their ability to understand non-human minds and entertain spiritual beliefs, even while preserving Theory of Mind.
LLM agents controlling real-world tools are alarmingly easy to manipulate, with an 85% success rate for privilege escalation attacks, despite exhibiting basic security awareness.
LLM API calls are breaking your program analysis tools, but this new taxonomy of information flow across the NL/PL boundary offers a way to fix them.
Freeing robots from pre-assigned tasks slashes completion times in multi-agent settings, with a new algorithm improving performance on almost 90% of tested scenarios.
StreamingVLA achieves a remarkable 2.4x speedup and 6.5x reduction in execution halting by asynchronously parallelizing observation, action generation, and execution stages in vision-language-action models.
Helium rain in gas giants may be less frequent than we thought, thanks to new simulations that significantly lower the estimated hydrogen-helium demixing temperatures.
VAANI's open-sourced dataset offers unprecedented coverage of India's linguistic landscape, finally giving researchers the data needed to build truly inclusive speech models.
Claims of quantum advantage in electronic structure calculations must now contend with DMRG benchmarks achieving CAS(89,102) on Fe$_5$S$_{12}$H$_4^{5-}$, pushing the boundaries of classical computation.
Hyperpolarizing the nuclear spin bath surrounding a molecular qubit can significantly extend its coherence time, offering a new knob for quantum control.
Generative multi-agent systems spontaneously exhibit collusion and conformity, mirroring societal pathologies, even without explicit programming and bypassing individual agent safeguards.
Forget hand-picking your cross-lingual training data: a budget-constrained optimization can automatically allocate resources across multiple source languages, boosting performance on African languages by a large margin.
LLMs can learn to generate more "organic" pull requests by distilling coding style, API usage, and architectural invariants from a project's commit history, leading to better acceptance rates.
Achieve world-consistent video generation by directly optimizing geometry in the latent space of pre-trained video diffusion models, sidestepping costly RGB-space operations and architectural changes.
Stop burying your agent harness logic in code: NLAHs let you express it in natural language, making it portable, editable, and analyzable.
Forget hand-picked genes – Lingshu-Cell models the entire transcriptome to predict cellular responses to perturbations, opening the door to in silico biological discovery.
Forget brittle, overfit skills – Trace2Skill distills diverse execution experiences into transferable agent skills that boost performance by up to 57.65% on unseen tasks, even when transferring skills learned by smaller models to larger ones.
Training domain-specific coding LLMs with realistic environments and large-scale RL can yield substantial gains in practical software engineering tasks.
Forget redrawing diagrams by hand: VFIG, a new vision-language model, can automatically convert rasterized figures into editable SVGs with near GPT-5.2 quality.
Giving medical imaging AIs the same tools as human doctors actually *hurts* their performance, revealing a surprising lack of spatial reasoning.
Forget hand-crafted rewards: MotionVL uses VLMs and LLMs to automatically generate task-aligned reward functions for humanoid robot RL, leading to more human-like and robust motion.
Quantum biosensors are evolving through four distinct generations, each leveraging progressively more exotic quantum phenomena to transcend classical limitations and enable adaptive inference directly within the quantum domain.
Achieve superhuman robot dexterity with 10x fewer demonstrations by decoupling intent and action through latent world modeling.
LLMs spontaneously organize into brain-like functional units where the whole is greater than the sum of its parts, and destroying these synergistic cores cripples reasoning.
Uncover hidden conceptual gaps in your AI: "concept frustration" reveals when your model's internal reasoning clashes with human understanding, paving the way for safer, more interpretable AI.
Pose-guided GANs and diffusion models can faithfully generate complex cultural dance postures, opening new avenues for digital preservation and education.
Forget tedious poster design – iPoster lets you sketch your vision and then uses a smart diffusion model to instantly generate polished, content-aware layouts that respect your constraints.
Forget ensembles and retraining: estimate LLM uncertainty with just a single forward-backward pass by assuming parameter covariance isotropy.
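The single-pass idea can be illustrated with the delta method: if the parameter posterior is approximated as N(θ̂, σ²I), i.e. the isotropy assumption above, then the variance of a scalar output f(θ) is approximately σ²‖∇θf(θ̂)‖², which needs one forward pass and one backward pass. A toy NumPy sketch with a hypothetical two-parameter linear model (not the paper's estimator):

```python
import numpy as np

def predictive_std(grad_f, theta, sigma=0.1):
    """Delta-method predictive std under an isotropic Gaussian posterior
    N(theta, sigma^2 * I): std(f) ~= sigma * ||grad_theta f||."""
    return sigma * np.linalg.norm(grad_f(theta))

# Hypothetical scalar model f(theta) = w * x + b evaluated at input x = 2.
x = 2.0
f = lambda th: th[0] * x + th[1]              # one "forward pass"
grad_f = lambda th: np.array([x, 1.0])        # one "backward pass"

theta_hat = np.array([0.5, -1.0])
print(predictive_std(grad_f, theta_hat))      # 0.1 * sqrt(5)
```

For an LLM, ∇θf comes from autodiff, so the cost really is a single forward-backward pass; the quality of the estimate then hinges entirely on how good the isotropic covariance assumption is.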
Adversarial training doesn't have to destroy VLMs' zero-shot abilities: aligning adversarial visual features with textual embeddings using the original model's probabilistic predictions can actually *improve* robustness.
LLMs used in matchmaking amplify existing caste hierarchies, rating same-caste matches significantly higher and perpetuating social biases in potentially harmful ways.
Throw out your full images: focusing on pathology-relevant visual patches in radiology reports dramatically outperforms using the entire image for summarization.
Despite using similar cryptographic protocols, popular messaging apps like Messenger, Signal and Telegram exhibit stark differences in attack surface, network activity, and permission requests, raising questions about their overall security and privacy postures.
Uncover hidden bottlenecks in your software development pipeline: Bloomberg's BayesInsights uses Bayesian Networks to reveal causal dependencies in engineering data, helping teams pinpoint root causes and anticipate the impact of changes.
Multimodal repair isn't always better: selectively escalating to multimodal prompting based on runtime signals in Scratch yields a superior success-cost-energy tradeoff compared to uniformly applied multimodal approaches.
Stop optimizing LLM logs for human readability – runtime-guided, task-oriented logs dramatically improve downstream debugging performance.
Polarization cues, often overlooked, can significantly boost camouflaged object detection by explicitly guiding RGB feature learning, leading to state-of-the-art performance.
By injecting LLM-derived contextual cues into skeleton representations, SkeletonContext achieves state-of-the-art zero-shot action recognition, even distinguishing visually similar actions without explicit object interactions.
Masked motion generators struggle with complex movements because they treat all frames the same – until now.
Querying satellite imagery just got easier: EarthEmbeddingExplorer lets you find images using text, visuals, or location, unlocking insights previously trapped in research papers.
Expert ordinal comparisons reveal that fusing vision and language in wound representation learning boosts agreement by 5.6% over unimodal foundation models for a rare genetic skin disorder.
Current text-to-long-video evaluation metrics can't reliably assess video quality, failing to match human judgment in 9 out of 10 tested degradation aspects.
Achieve state-of-the-art robotic manipulation with a model orders of magnitude smaller than VLAs by explicitly aligning kinematic and semantic transitions.
Quantum circuit compilation, a major bottleneck, can be sped up by over 15x with minimal overhead using a new parallelization technique validated on 8000 large-scale, configurable random circuits.
Sometimes, knowing less (limiting computation to polynomial time) can let you decide *more* in distributed systems, especially with universal certificates.
Negative electronic friction, often attributed to simple Joule heating, actually masks significant non-Markovian dynamics that can destabilize standard models.
Extracting band-edge eigenstates becomes surprisingly simple and efficient, needing only a quasi-purified density matrix and a handful of matrix multiplications.
Forget perturbation theory: this dissipaton-based approach efficiently models heat transport in locally probed systems with strong many-body effects.
State-of-the-art Large Audio Language Models are surprisingly vulnerable to hallucination attacks, with success rates as high as 95%, revealing a critical reliability gap masked by standard benchmarks.
Generative recommendation's touted cold-start abilities often vanish under rigorous testing, revealing a sensitivity to design choices that current benchmarks fail to capture.
Over half of video understanding benchmark samples are solvable without watching the video, and current models barely outperform random guessing on the rest.
Finally, a video generation model lets you roam through a scene with long-term spatial and temporal consistency, opening up new possibilities for virtual exploration.
Stakeholder-agnostic requirements engineering in aged-care tech can lead to misalignment and missed priorities, as developers, caregivers, and older adults often disagree on what matters most.
Compromised 5G networks can be weaponized with chained, undetectable command and control channels, enabling attacks that bypass existing security measures.
Unleashing creative potential in text-to-image models just got easier: on-the-fly repulsion in the contextual space lets you steer diffusion transformers towards richer diversity without sacrificing image quality or blowing your compute budget.
Superintelligence will not just be regulated by law, but will actively use and shape it, forcing us to rethink legal theory's human-centric foundations.
Generate or edit 1024x1024 images on your phone in under a second with DreamLite, a unified diffusion model that rivals server-side performance despite its tiny 0.39B parameters.
Forget hand-designed RL algorithms – LLMs can evolve competitive learners from scratch, even when forced to invent completely new update rules.
Stop assuming a single utility function: modeling preferences as a mixture of archetypes unlocks better Bayesian optimization in complex, many-objective spaces.
Classical models of hydrogen storage in geological formations fall apart when applied to diverse samples, but this physics-informed neural network nails it, achieving R² = 0.9544.
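For readers calibrating that R² = 0.9544 figure: the coefficient of determination measures the fraction of variance explained relative to a mean-only predictor. A standard computation (the metric itself, not anything specific to this paper):

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

print(r2_score([1, 2, 3, 4], [1, 2, 3, 4]))  # perfect fit -> 1.0
```

A model that only predicts the mean scores 0, and R² can go negative for models worse than that baseline, so 0.95 on heterogeneous samples is a strong result.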
Multi-resolution decomposition and diffusion models can boost time series forecasting accuracy by up to 10% over existing methods.
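Multi-resolution decomposition itself is a standard preprocessing step: peel off progressively finer trends so each component can be modelled at its own scale. One common moving-average form is sketched below; the paper's actual decomposition and its diffusion component are not reproduced here, and the window sizes are arbitrary:

```python
import numpy as np

def multires_decompose(x, windows=(16, 4)):
    """Split series x into trend components at several resolutions.

    Each pass extracts a moving-average trend from the running residual;
    the final residual holds the highest-frequency detail.
    The components sum back to the original series exactly.
    """
    components, residual = [], np.asarray(x, dtype=float)
    for w in sorted(windows, reverse=True):  # coarsest scale first
        kernel = np.ones(w) / w
        trend = np.convolve(residual, kernel, mode="same")
        components.append(trend)
        residual = residual - trend
    components.append(residual)
    return components

t = np.linspace(0, 4 * np.pi, 128)
series = np.sin(t) + 0.3 * np.sin(8 * t)
parts = multires_decompose(series)
print(np.allclose(sum(parts), series))  # components reconstruct the input
```

Exact reconstruction is the property that makes this safe to bolt onto any forecaster: predict each component separately, then sum.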
FedDES achieves instance-level personalization in federated learning by dynamically selecting and weighting peer models with a GNN, leading to significant performance gains in heterogeneous environments.
Adversarial training unlocks domain-invariant prompts for CLIP, boosting zero-shot generalization beyond standard prompt tuning.
Compressing 3D Gaussian Splatting just got a whole lot better: GeoHCC maintains geometric integrity and rendering fidelity by explicitly modeling inter-anchor geometric correlations, outperforming existing anchor-based approaches.
Reinforcement learning turns a quantum sensor's biggest limitation—nonlinear Zeeman dynamics—into its greatest strength, boosting magnetic sensitivity beyond the standard quantum limit.
Forget hand-crafted environments: COvolve uses LLMs to automatically co-evolve challenging environments and robust policies, paving the way for open-ended learning.
LLMs can now construct high-fidelity, disease-specific knowledge graphs from full-text biomedical literature, unlocking evidence-aware reasoning and hypothesis generation.
MLLMs can now guide visual generative models to imagine what's hidden behind objects, significantly boosting amodal completion performance.
Data literacy isn't monolithic: K-12 learners navigate wildly different learning pathways depending on the context, challenging assumptions about a one-size-fits-all approach.
Scientific figure QA models are often fooled by the answer choices themselves, but a simple decoding strategy that contrasts image-grounded scores with text-only scores can significantly improve accuracy.
Forget pruning or quantization: MPO decomposition lets you compress a transformer by 13x while retaining 97% accuracy.
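MPO (matrix product operator) decomposition factors a weight matrix into a chain of small tensors; its simplest building block is the truncated SVD shown below. This is a sketch of the general low-rank idea only, with a synthetic weight matrix, not the paper's 13x pipeline:

```python
import numpy as np

def truncated_svd_compress(W, rank):
    """Compress weight matrix W into two thin factors via truncated SVD.

    MPO decomposition generalises this to a chain of small tensors;
    here a single rank-r split stands in for the idea.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # shape (m, rank)
    B = Vt[:rank, :]             # shape (rank, n)
    return A, B

rng = np.random.default_rng(0)
# An exactly rank-8 "weight matrix" standing in for a layer weight.
W = rng.normal(size=(256, 8)) @ rng.normal(size=(8, 256))
A, B = truncated_svd_compress(W, rank=8)
ratio = W.size / (A.size + B.size)                   # parameter compression
err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)  # relative error
print(round(ratio, 1), err < 1e-10)
```

Real transformer weights are not exactly low-rank, which is why the accuracy retention (97% at 13x in the teaser) is the interesting empirical claim rather than a mathematical given.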