Department of Data Science & AI, Monash University
LLMs struggle to balance task completion with cultural norms in dynamic social simulations, revealing critical gaps in their cross-cultural robustness and highlighting the need for human oversight in automated benchmarking.
Forget random noise – teaching models *how* to explore their reasoning process yields more reliable inference-time scaling.