Xi'an Jiaotong University
Scene graphs are all you need: AeroRAG shows that structured knowledge retrieval from visual data significantly boosts LLM performance on fine-grained visual reasoning tasks.
Current facial expression editing models can't simultaneously preserve identity and accurately manipulate expressions, revealing a critical need for better fine-grained instruction following.