Surflo reconstructs 3D surfaces by compressing a variable number of unposed RGB images into K latent tokens, a single global latent state, from which oriented 3D surface points can be decoded at any resolution. This addresses two limitations of existing feed-forward models: per-view methods produce redundant, overlapping outputs, and global-latent methods are tied to a fixed output resolution. Surflo matches or surpasses feed-forward baselines on surface metrics, runs an order of magnitude faster than optimization-based methods, and can decode anywhere from a few thousand to a million points in a single forward pass.
Surflo decodes arbitrary-resolution 3D surface points from a single global latent state, matching or surpassing feed-forward baselines on surface metrics while running an order of magnitude faster than optimization-based methods.
Geometry is invariant to viewpoint, which makes any collection of images a redundant encoding of a single 3D state. Existing feed-forward reconstruction models fail to exploit this: per-view methods emit overlapping, unaligned pointmaps that grow linearly with input count, while global-latent methods commit to a fixed, low-resolution output. We introduce Surflo, which compresses a variable number of unposed RGB views into K latent tokens (one global state) and decodes oriented 3D surface points by independently transporting them from noise onto the surface via flow matching. This frees the output from any fixed grid or token budget: the same latent yields from a few thousand to a million points in a single forward pass. To suppress the local inconsistencies inherent to independent per-point decoding, an inference-time guidance term correlates nearby points by injecting a photometric gradient during ODE integration. Surflo matches or surpasses feed-forward baselines on surface metrics, runs an order of magnitude faster than optimization-based methods that require hundreds of views, and is the only feed-forward approach to combine a global latent with arbitrary-resolution decoding.
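To make the decoding idea concrete, here is a minimal toy sketch of per-point transport from noise onto a surface by Euler integration of a velocity field. It is not Surflo's implementation. The names `toy_velocity` and `decode` are hypothetical, the "latent" is assumed to be just a sphere radius standing in for the K latent tokens, and the closed-form velocity replaces what would be a learned network. Surface normals and the photometric guidance term are omitted.

```python
import numpy as np


def toy_velocity(x, t, latent):
    """Hypothetical stand-in for a learned velocity field v(x, t | latent).

    Assumption: the latent is a sphere radius, and each point's target is
    its radial projection onto that sphere. For a straight-line path
    x_t = (1 - t) * x0 + t * x1, the velocity is (x1 - x_t) / (1 - t).
    """
    norm = np.linalg.norm(x, axis=1, keepdims=True)
    target = latent * x / np.maximum(norm, 1e-8)
    return (target - x) / max(1.0 - t, 1e-3)


def decode(latent, n_points, steps=20, seed=0):
    """Transport n_points Gaussian noise samples onto the toy surface.

    Every point is integrated independently, so n_points is a free
    choice at inference time and does not depend on the latent size.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_points, 3))  # start from noise
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        x = x + dt * toy_velocity(x, t, latent)  # explicit Euler step
    return x


# The same "latent" decodes to different point counts.
coarse = decode(latent=1.0, n_points=2_000)
dense = decode(latent=1.0, n_points=200_000)
print(coarse.shape, dense.shape)
```

Because each point follows its own ODE trajectory, the output size is set only by how many noise samples are drawn. That independence is also why the abstract adds a guidance term to correlate nearby points.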