Search papers, labs, and topics across Lattice.
2
0
5
2
Forget supervised fine-tuning: RL alone can unlock high-quality chain-of-thought reasoning in audio-language models, even starting from a model with no prior CoT capability.
Agent-as-a-Judge can outperform LLM-as-a-Judge in complex environments, but still struggles to reliably verify agent behavior, revealing a critical gap in current LLM-based agent evaluation.