Search papers, labs, and topics across Lattice.
Hong Kong University of Science and Technology
2
0
3
Expert-level video aesthetics can be captured and improved by decomposing it into interpretable criteria and training reward models with chain-of-thought reasoning.
Multimodal agents can now reason, plan, and execute actions more effectively by integrating perception as a core component, not just an auxiliary interface.