Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia, Beijing University of Posts and Telecommunications
Achieving a score of $2.68 \times 10^{-3}$ in a depth estimation challenge reveals the untapped potential of zero-shot learning in complex visual tasks.
A unified evaluation framework for portrait composition could revolutionize how AI interprets and generates artistic images.
Autonomous code generation combined with rigorous semantic review can drastically enhance scenario mining accuracy in complex driving environments.
By rethinking text-to-3D generation as a planning problem, this approach significantly reduces error propagation and enhances scene realism.
Achieving nearly 93% accuracy in video relational reasoning, this approach reveals how structured evidence can dramatically enhance model performance in complex visual contexts.
VLMs can achieve state-of-the-art video recognition by splitting temporal modeling experts into specialized roles for spatial understanding and motion processing.