State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China
LLM-generated test suites are shockingly bad at catching even simple code mutations: the best models fail to detect over 60% of them.
LLMs implicitly know if their reasoning steps are correct *during* generation, according to a new step-level interpretability method.