AI-driven summaries of public consultations can systematically exclude dissenting voices, raising concerns about biased policy recommendations even when individual outputs seem reasonable.
Deterministic decoding can outperform stochastic self-consistency in constrained domains by systematically exploring high-probability reasoning traces, leading to better performance with less computation.
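The contrast this summary draws can be pictured with a toy answer distribution standing in for sampled model outputs; a minimal sketch, where all names and probabilities are illustrative rather than taken from the paper:

```python
import random
from collections import Counter

def greedy_answer(answer_probs):
    """Deterministic decoding: always take the highest-probability answer."""
    return max(answer_probs, key=answer_probs.get)

def self_consistency(answer_probs, n_samples=20, seed=0):
    """Stochastic self-consistency: sample n answers, then majority-vote."""
    rng = random.Random(seed)
    answers, weights = zip(*answer_probs.items())
    samples = rng.choices(answers, weights=weights, k=n_samples)
    return Counter(samples).most_common(1)[0][0]

probs = {"42": 0.6, "41": 0.3, "7": 0.1}
print(greedy_answer(probs))   # "42" -- one model call, no variance
print(self_consistency(probs))  # many calls, vote can still flip on a bad draw
```

The deterministic route reads off the high-probability answer directly; self-consistency spends many samples to estimate the same mode, which is the computation gap the summary points to.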
TurboQuant's claimed advantages over RaBitQ in quantization don't hold up under rigorous, reproducible comparison, raising questions about its practical utility.
Forget heavyweight processes and bandwidth bottlenecks: Proxics offers a lightweight programming model that unlocks the potential of near-data processing with efficient virtual processors and optimized communication channels.
Bridging the gap between trust region methods and PPO, this new framework guarantees performance improvements while outperforming existing algorithms in stability and effectiveness.
Early layers of language models capture human-like processing signatures in reading, rivaling traditional measures like surprisal in predicting initial eye movements.
Neural operators can achieve uniform convergence rates for approximating solution maps across diverse geometric domains, challenging traditional assumptions about shape-dependent PDE solutions.
GSQ closes the accuracy gap in low-precision quantization, achieving results comparable to complex vector methods while remaining easy to implement.
Trustworthy super-resolution in surgery is now achievable, with a model-agnostic method that identifies and mitigates unreliable reconstructions in real time.
RadAgent doesn't just give you the answer; it shows its work, offering clinicians a transparent, step-by-step reasoning trace for AI-generated CT reports.
SCENIC delivers the best of both worlds: the high bandwidth and software integration of commercial SmartNICs, plus the customization and data processing offload capabilities of research prototypes.
Europe's collaborative EPAC chip delivers a heterogeneous RISC-V accelerator, showcasing a path towards domain-specific HPC hardware built with open standards.
LLMs struggle to simulate culturally nuanced emotional responses to bureaucratic processes, especially in Eastern cultures, suggesting current models lack the socio-cultural understanding needed for accurate policy simulation.
Blind predictions of cyclobutanone photochemistry reveal that nonadiabatic molecular dynamics can qualitatively capture experimental results, but the accuracy of underlying electronic structure calculations remains a key bottleneck.
Supercomputers can evolve beyond just pre-training to become comprehensive "AI Factories" by adopting hybrid cloud-native architectures that support the entire lifecycle of foundation models.
Continual learning just got a turbo boost: C-Flat Turbo cuts training time by up to 25% without sacrificing accuracy, thanks to a clever gradient-skipping trick.
Unlock up to 25.7% accuracy gains on frozen LLMs in knowledge-intensive domains, without any retraining, by dynamically rewarding reasoning steps.
Achieve perceptually superior video compression at extremely low bitrates by using implicit neural representations to condition diffusion models, outperforming even VVC and prior neural codecs.
Self-supervised learning on heterogeneous neutrino detector data enables foundation-style models that achieve state-of-the-art performance with an order of magnitude less labeled data.
Training 3D avatar diffusion models on millions of in-the-wild videos is now possible, thanks to a clever 3D tokenization and visibility-aware training strategy that overcomes partial observability.
Unlock interactive digital twins from messy, real-world videos: FunRec automatically turns egocentric RGB-D recordings into simulation-ready 3D scenes.
Polarization isn't always about echo chambers: Europeans can agree on *what* happened in the Ukraine war, but vehemently disagree on *why* it matters.
Scaling up avatar pre-training to 1M in-the-wild videos unlocks emergent generalization capabilities like relightability and garment support, even without direct supervision.
AI agents are far better at automating data engineering tasks than previously thought, but flawed benchmarks are obscuring their true potential.
Forget brittle, overfit skills – Trace2Skill distills diverse execution experiences into transferable agent skills that boost performance by up to 57.65% on unseen tasks, even when transferring skills learned by smaller models to larger ones.
Forget about retraining: MUNKEY offers zero-shot machine unlearning by simply deleting instance-identifying keys, outperforming traditional post-hoc methods.
Current computational aberration correction methods struggle to generalize across different camera lenses, but this new benchmark and analysis pinpoint the key factors holding them back.
Ditch the slow diffusion grind: Marigold-SSD delivers zero-shot depth completion in a single step, rivaling discriminative models in speed while retaining diffusion's accuracy.
Unlock superior trajectories in complex environments with a new ADMM-based solver that jointly optimizes spatial and temporal domains, eliminating the need for complex warm starting.
Get 6x the RLHF alignment for your LLM with a new active learning pipeline that focuses on annotating the most informative response pairs.
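The selection step such a pipeline relies on can be sketched as: send near-tied response pairs to annotators first, since the reward model is least certain about them. The gap heuristic and names below are illustrative, not the paper's exact criterion:

```python
import numpy as np

def select_informative_pairs(reward_a, reward_b, budget):
    """Pick the response pairs whose reward-model scores are closest:
    near-ties are where a human label carries the most information."""
    gaps = np.abs(np.asarray(reward_a) - np.asarray(reward_b))
    return np.argsort(gaps)[:budget]

# Reward-model scores for response A and response B of four prompts.
ra = [0.9, 0.2, 0.55, 0.8]
rb = [0.1, 0.21, 0.5, 0.3]
print(select_informative_pairs(ra, rb, budget=2))  # -> [1 2], the near-ties
```

Pairs 0 and 3 have large score gaps, so annotating them would mostly confirm what the reward model already believes.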
Clever reticle placement on wafer-scale systems can boost throughput by 2.5x and slash latency by over a third, offering a hardware-level speedup for LLM training.
LLMs can follow detailed code refactoring instructions, but still fall short of mimicking human refactoring choices in real-world codebases, highlighting a critical gap in their ability to autonomously improve code quality.
Finally, a virtual try-on system that actually works: Gaussian Wardrobe lets you swap clothes between 3D avatars with high-fidelity garment dynamics by learning shape-agnostic garment layers.
Ditch the memory banks and prototype comparisons: this method learns a compact, parametric model of normal image embeddings with an autoregressive CNN, slashing inference time and memory in unsupervised anomaly detection.
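A drastically simplified sketch of the idea: fit an autoregressive predictor over embedding dimensions on normal data, then score test embeddings by prediction error. Here a linear per-dimension regression stands in for the paper's autoregressive CNN:

```python
import numpy as np

def fit_ar_normal(embeds):
    """Fit one linear autoregressive predictor per embedding dimension:
    dimension d is regressed on dimensions 0..d-1 of the same embedding.
    (A linear stand-in for the autoregressive CNN.)"""
    n, d = embeds.shape
    coefs = []
    for j in range(1, d):
        X = np.hstack([embeds[:, :j], np.ones((n, 1))])  # causal context + bias
        w, *_ = np.linalg.lstsq(X, embeds[:, j], rcond=None)
        coefs.append(w)
    return coefs

def anomaly_score(x, coefs):
    """Mean squared prediction error under the normal-data AR model."""
    errs = []
    for j, w in enumerate(coefs, start=1):
        pred = np.append(x[:j], 1.0) @ w
        errs.append((x[j] - pred) ** 2)
    return float(np.mean(errs))

rng = np.random.default_rng(0)
normal = rng.normal(size=(500, 8))
normal[:, 1] = 0.9 * normal[:, 0]          # a dependency the AR model can learn
coefs = fit_ar_normal(normal)

ok = np.array([1.0, 0.9, *rng.normal(size=6)])
bad = ok.copy()
bad[1] = 5.0                               # violates the learned dependency
print(anomaly_score(ok, coefs) < anomaly_score(bad, coefs))  # True
```

Inference is a single forward pass through the fitted predictors, which is the source of the time and memory savings over nearest-neighbor memory banks.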
Multimodal models often exhibit lower confidence than their unimodal counterparts when they're about to fail, and this work leverages that insight to build a better failure detector.
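The confidence-gap signal can be sketched as a max-softmax comparison between the multimodal model and a unimodal counterpart; the margin threshold below is illustrative, not a value from the paper:

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def flags_failure(multi_logits, uni_logits, margin=0.1):
    """Flag a likely failure when the multimodal model's confidence
    (max softmax probability) drops well below its unimodal counterpart's."""
    c_multi = softmax(multi_logits).max()
    c_uni = softmax(uni_logits).max()
    return bool(c_multi + margin < c_uni)

print(flags_failure([0.2, 0.3, 0.25], [3.0, 0.1, 0.2]))  # True: large gap
print(flags_failure([3.0, 0.1, 0.2], [2.9, 0.1, 0.3]))   # False: both confident
```

The detector needs no extra training, only logits from both models at inference time.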
Reasoning can boost LLM opinion alignment, but it's not a silver bullet for removing bias in political digital twins.
Forget computationally expensive fluid dynamics: this work shows that a simple, stateless model, carefully calibrated to real-world data, can create surprisingly effective digital twins for soft underwater robots.
LLM benchmark translations can be dramatically improved by test-time compute scaling, revealing a surprisingly cheap way to get more reliable multilingual evaluations.
Forget temperature scaling: JUCAL calibrates aleatoric and epistemic uncertainty in classifier ensembles, achieving SOTA results with significantly smaller ensembles and lower inference costs.
Forget solo Git tutorials—GitAcademy's split-screen view, mirroring a partner's actions in real-time, makes learning collaborative workflows feel less like a lonely commit and more like a team sport.
Unlock domain generalization with unlabeled data by exploiting the structure of anti-causal relationships, where outcomes cause covariates.
E-graphs, typically confined to isolated optimization steps, can now persist as a first-class citizen within the compiler's intermediate representation, unlocking broader and more flexible program optimization.
Forget complex architectures: RaCo achieves SOTA keypoint matching and repeatability by cleverly combining ranking and covariance estimation in a lightweight network, trained without covisible image pairs.
Context files like AGENTS.md, intended to guide coding agents, often *hurt* performance and increase costs, challenging the common practice of using them.
Achieve >97.5% of full-data ViT performance with only 16% of the data using ScalSelect, a surprisingly effective and scalable training-free data selection method.
To improve equitable access to cell and gene therapies (CGT) in Europe, proposed policy reforms include concurrent health technology assessment (HTA), early benefit assessment, and the incorporation of additional elements of value into HTA evaluations, alongside current initiatives to expand cross-border collaboration.
An interpretable deep learning model, ECG-XPLAIM, rivals ResNet in arrhythmia detection sensitivity while offering crucial insights into its decision-making process via Grad-CAM.
Multimodal LLMs often perform worse with more modalities because they struggle to jointly recognize and reason across modalities, a problem solvable with simple prompting strategies.
A new deep learning model slashes the error rate for BMI estimation from smartphone photos, opening the door to more accessible and convenient health assessments.
Automating CAD design from text prompts is now feasible, with visual feedback loops boosting performance, especially for multimodal LLMs.
Achieve faster, near-optimal path planning in complex 3D environments by combining any-angle search with multi-resolution grids, outperforming even sampling-based methods.
LLMs that excel at math don't necessarily make good math tutors, revealing a surprising trade-off between subject matter expertise and pedagogical skill.