LLMs may ace coding tasks, but PACIFIC reveals their surprising struggles with sequential instruction following and dry-running code, even when benchmarks are automatically generated to avoid training data contamination.