Timothy Baldwin

Bayesian control outperforms traditional orchestration methods, especially when verification costs are high, by providing a more nuanced understanding of candidate correctness.

Theodore Papamarkou, Vladislav Smirnov, Viktor Mazanov +4

Scalable Oversight & Alignment Theory, Tool Use & Agents

Apr 8, 2026·also Lomonosov Moscow State University, Melbourne, Mohamed, National University of Science and Technology +2

ReDAct: Uncertainty-Aware Deferral for LLM Agents

Deferring to a larger LLM only when a smaller LLM is uncertain can match the performance of the larger model alone, while slashing inference costs.

Dzianis Piatrashyn, Nikita Kotelevskii, Kirill Grishchenkov +5

Eval Frameworks & Benchmarks, Reasoning & Chain-of-Thought, Tool Use & Agents

Mar 2, 2026·also Institute of Science Tokyo, MBZUAI, Melbourne

Beyond the Resumé: A Rubric-Aware Automatic Interview System for Information Elicitation

LLMs can act as subject matter experts to conduct cost-effective, nuanced interviews, potentially revolutionizing early-stage hiring decisions.

Harry Stuart, Harry Stuart, Masahiro Kaneko +3

Eval Frameworks & Benchmarks, Natural Language Processing, Tool Use & Agents

Mar 1, 2026·also Institute of Science Tokyo, Melbourne

JailNewsBench: Multi-Lingual and Regional Benchmark for Fake News Generation under Jailbreak Attacks

LLMs are highly susceptible to generating fake news under jailbreak attacks, especially on English and U.S.-related topics, exposing a dangerous safety imbalance.

Masahiro Kaneko, Ayana Niwa, Timothy Baldwin

Eval Frameworks & Benchmarks, Natural Language Processing, Red-Teaming & Adversarial Robustness

IIIT-Delhi·Feb 24, 2026·also MBZUAI, Melbourne

Don't Ignore the Tail: Decoupling top-K Probabilities for Efficient Language Model Distillation

Distilling language models just got more efficient: a new loss function focuses on the long tail of token probabilities, boosting performance without extra compute.

Sayantan Dasgupta, Sayantani Dasgupta, Trevor Cohn +2

Inference & Quantization, Natural Language Processing, Training Efficiency & Optimization


Papers (6)