Sofia University "St. Kliment Ohridski"
Get up to 20% faster ViT inference by hot-swapping certain attention heads for depthwise convolutions – without tanking accuracy.
Bridging the gap between third-person and first-person video generation can be as simple as interpolating the videos, which suggests that spatio-temporal discontinuities are the real bottleneck.
Autonomous vehicles can now better identify the unexpected, thanks to a new method that boosts out-of-distribution detection by up to 20% without retraining.
MLLMs can "hear" a little, but EgoSound reveals they're still largely deaf to the nuances of sound in egocentric video, especially when it comes to spatial and causal reasoning.