The University of Texas at Austin
AI-generated peer reviews aren't just viable at scale; researchers actually prefer them over human reviews for technical accuracy and actionable feedback.
LLMs are twice as likely as humans to repeat the same support tactic in a conversation, but a simple RL reward for tactic novelty can fix it.
Turns out, LLMs aren't actually empathic; they're just really good at regurgitating a well-liked empathy template.
VLMs may ace the color coverage test, but they flunk the "do as I say, not as I do" test: they routinely ignore their own stated reasoning rules in ways that humans don't.
LLMs struggle to generate diverse and specific connections between concepts, even with high token budgets and "thinking" prompts, revealing a gap in creative associative reasoning.