National Taiwan University
CLIP models suffer from a surprisingly strong "center bias," causing them to miss important objects outside the image's central region, even when those objects are crucial for accurate vision-language understanding.
Forget confidence scores: a modality-aware early exit strategy for spoken language models slashes decoding costs without sacrificing accuracy or perceptual quality, revealing that speech tokens require specialized handling compared to text.