Over a quarter of tasks in popular AI benchmarks contain critical flaws that distort model evaluations, and this automated auditing framework can catch them.
Language models can bootstrap their reasoning abilities without human labels by learning from each other's aggregated answers, achieving significant gains in mathematical reasoning.