R-ℓ2 metrics.

Figure 6: Performance of AQA models on the FLEX dataset. R-ℓ2 is reported ×100 (↑: higher is better; ↓: lower is better).

AQA Model | ρ ↑ | R-ℓ2 (×100) ↓
Single view (yu2021group) | 0.8069 |
LLMs can learn better from human feedback by exploring more creatively, thanks to a simple coin-flip counting method that encourages them to try new things.