LLMs exhibit significant geographical performance disparities and task-specific gaps when evaluated on the new GaoYao benchmark, highlighting the need for more nuanced multilingual and multicultural training.
Noisy multimodal preference datasets are holding back reward model performance, but DT2IT-MRM offers a scalable data-curation strategy that achieves state-of-the-art results.
Forget human-annotated datasets: MathAgent synthesizes mathematical reasoning data so effectively that models trained on just 1K generated examples outperform those trained on existing human-curated datasets.