Zhejiang Lab
Scene graphs are all you need: AeroRAG shows that structured knowledge retrieval from visual data significantly boosts LLM performance on fine-grained visual reasoning tasks.
Current facial expression editing models can't simultaneously preserve identity and accurately manipulate expressions, revealing a critical need for better fine-grained instruction following.