LLMs can learn to reason *worse* from seemingly better training data: models trained on chain-of-thought (CoT) data that yields lower loss can still generalize poorly, because they inherit inefficient, divergent reasoning patterns.