Search papers, labs, and topics across Lattice.
1
0
2
Token-level policy gradients fall short in complex reasoning tasks, but treating sequences of tokens as unified actions can significantly boost performance in mathematical and coding benchmarks.