Text-to-image flow models get a preference alignment boost by generating multiple related captions per image, creating a richer reward landscape without expensive re-sampling.
Forget textual rules and coarse embeddings: fine-grained visual reward modeling proves sufficient for vision-to-code tasks, even surpassing models 30x larger.
By decoupling visual and motor information during pretraining, FutureVLA unlocks more effective visuomotor prediction for vision-language-action models, boosting performance without modifying downstream architectures.