Microsoft Research | Feb 24, 2026 | arXiv:2602.20751

SibylSense: Adaptive Rubric Learning via Memory Tuning and Adversarial Probing

Yifei Xu, Guilherme Potje, Shivam Shandilya, Tiancheng Yuan, Leonardo de Oliveira Nunes, Rakshanda Agarwal, Saeid Asgari, Adam Atkinson, Emre Kıcıman, Songwu Lu, Ranveer Chandra, Tusher Chakraborty

AI Summary

The paper introduces SibylSense, an inference-time learning approach that adapts a frozen rubric generator through a tunable memory bank of validated rubric items, with the goal of improving reward design for open-ended generation tasks. The memory bank is updated using verifier-based item rewards, measured as the discriminative gap between reference and candidate answers. SibylSense alternates this memory tuning with a rubric-adversarial policy update. Experiments on two open-ended tasks show that SibylSense generates more discriminative rubrics and improves downstream RL performance over static and non-adaptive baselines.

Key Contribution

Instead of relying on static rubrics, SibylSense adaptively learns rubrics at inference time, yielding more discriminative rewards and better downstream RL performance on open-ended generation tasks.

Abstract

Designing aligned and robust rewards for open-ended generation remains a key barrier to RL post-training. Rubrics provide structured, interpretable supervision, but scaling rubric construction is difficult: expert rubrics are costly, prompted rubrics are often superficial or inconsistent, and fixed-pool discriminative rubrics can saturate and drift, enabling reward hacking. We present SibylSense, an inference-time learning approach that adapts a frozen rubric generator through a tunable memory bank of validated rubric items. Memory is updated via verifier-based item rewards measured by reference-candidate answer discriminative gaps from a handful of examples. SibylSense alternates memory tuning with a rubric-adversarial policy update that produces rubric-satisfying candidate answers, shrinking discriminative gaps and driving the rubric generator to capture new quality dimensions. Experiments on two open-ended tasks show that SibylSense yields more discriminative rubrics and improves downstream RL performance over static and non-adaptive baselines.
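The abstract describes the memory update only in words, so the following is a minimal sketch of one way a reference-candidate discriminative gap could score rubric items and drive a memory bank. It is not the paper's implementation. The names `keyword_verifier`, `item_gap`, and `MemoryBank`, the keyword-matching verifier, the mean over candidates, the zero threshold, and the capacity limit are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical verifier type: (rubric_item, answer) -> score in [0, 1].
Verifier = Callable[[str, str], float]


def keyword_verifier(item: str, answer: str) -> float:
    """Toy stand-in for a verifier: 1.0 if the item text occurs in the answer."""
    return 1.0 if item.lower() in answer.lower() else 0.0


def item_gap(item: str, reference: str, candidates: List[str],
             verifier: Verifier) -> float:
    """Assumed item reward: reference score minus mean candidate score."""
    if not candidates:
        raise ValueError("need at least one candidate answer")
    ref_score = verifier(item, reference)
    cand_score = sum(verifier(item, c) for c in candidates) / len(candidates)
    return ref_score - cand_score


@dataclass
class MemoryBank:
    """Keeps the highest-reward rubric items (illustrative design only)."""
    capacity: int = 4
    scores: Dict[str, float] = field(default_factory=dict)

    def update(self, item: str, reward: float, min_reward: float = 0.0) -> None:
        # Store items that still separate reference from candidates;
        # drop items whose gap has closed.
        if reward > min_reward:
            self.scores[item] = reward
        else:
            self.scores.pop(item, None)
        # Enforce the capacity limit by keeping the top-scoring items.
        ranked = sorted(self.scores.items(), key=lambda kv: kv[1], reverse=True)
        self.scores = dict(ranked[: self.capacity])

    def top_items(self) -> List[str]:
        return sorted(self.scores, key=self.scores.get, reverse=True)


reference = "The dose is 5 mg because of renal clearance."
candidates = ["The dose is 5.", "The dose is 5 mg."]
bank = MemoryBank()
for rubric_item in ["mg", "because", "dose"]:
    bank.update(rubric_item, item_gap(rubric_item, reference, candidates,
                                      keyword_verifier))

# Candidates that now satisfy the stored items shrink the gap to zero,
# so re-scoring removes those items from the bank.
adversarial = ["The dose is 5 mg because it is safe."]
probed = MemoryBank(scores=dict(bank.scores))
for rubric_item in list(probed.scores):
    probed.update(rubric_item, item_gap(rubric_item, reference, adversarial,
                                        keyword_verifier))
```

In this toy run, the item "dose" never enters the bank because every candidate already satisfies it. Once the adversarial candidate satisfies "mg" and "because", their gaps close and the bank empties, which is the pressure the abstract says pushes the rubric generator toward new quality dimensions.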

Eval Frameworks & Benchmarks, Red-Teaming & Adversarial Robustness, RLHF & Preference Learning

Citation Metrics

Citations: 0

Influential citations: 0

References: 21

Year: 2026

Venue: N/A
