Emotional support chatbots can now learn to better understand and respond to your needs by actively probing for information, leading to more helpful and empathetic conversations.
LLM-based judges, widely used for automated evaluation, exhibit a range of biases that can be significantly reduced through bias-aware training with reinforcement learning (RL) and contrastive learning.
Token-level policy gradients fall short on complex reasoning tasks, but treating sequences of tokens as unified actions can significantly boost performance on mathematical and coding benchmarks.