Shanghai Innovation Institute; Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications, Ministry of Education, China
By mimicking how humans use visual anchors, ChartVSR lets models iteratively correct their own visual perception errors, leading to more accurate chart parsing.