Shanghai Jiao Tong University
Static benchmarks fail to predict LLM performance in dynamic clinical settings: top models meet only 60.4% of expert criteria in real-world simulations.