Search papers, labs, and topics across Lattice.
University College London
3
0
5
Random splits in urban representation learning benchmarks inflate performance and alter model rankings, making cross-city generalization claims unreliable.
Panoramic vision-language models can achieve a level of holistic scene understanding and robustness in adverse conditions that's impossible for traditional pinhole-based VLMs.
You can now get SOTA street-view image classification from CLIP with a tiny 1.4M parameter adapter that focuses on local image patches.