Search papers, labs, and topics across Lattice.
3
0
6
Decomposing rasterized graphic designs into editable layers is now significantly more faithful and controllable thanks to a hybrid generative approach that combines vision-language models and multi-branch diffusion.
Achieve more human-like empathy in multimodal response generation by explicitly modeling the hierarchical progression of emotion perception and iteratively refining responses to eliminate emotional biases.
By adaptively calibrating facts and augmenting emotions, FACE-net overcomes the factual-emotional bias that plagues emotional video captioning.