Chung-Ang University, KAIST
Leading LLMs falter on Korean web-browsing tasks, reaching less than half the accuracy reported on previous benchmarks.
LLMs that ace standard multiple-choice tests can crumble when the option count explodes, revealing hidden weaknesses in semantic understanding and a surprising bias toward the first few answer choices.