IRT, and other methods including IRT to refine our semi-synthetic benchmarks. First, we qualitatively observe that M
Current multimodal benchmarks are full of single-modality shortcuts, but this paper offers a way to prune them, yielding more reliable and efficient evaluations of true cross-modal reasoning.