Zero-shot scene understanding gets a boost: aligning pre-trained vision and language models yields up to 18% accuracy gains in real-world object recognition and captioning tasks.