$1\times 10^{-5}$, and a gradient accumulation factor of 4 was used to achieve an effective batch size of 32 (a minimal sketch of this schedule appears after Table 1).

4 Experiments

We evaluate our sparse models against their dense counterparts and two training-free baselines: FastVGGT [33] and Block-Sparse VGGT [37]. Variants of these baselines built on VGGT/$\pi^3$ are referred to as FastVGGT-VGGT/$\pi^3$ and Block-Sparse VGGT/$\pi^3$, respectively. Unless stated otherwise, we use the following parameters. Our method employs a 4×4 compression window and selects the top-32 blocks for selective attention (see the sketch below). For the baselines, we adopt their default configurations: a 0.9 merge ratio for FastVGGT [33] and a 0.75 sparsity ratio for Block-Sparse VGGT/$\pi^3$ [37]. All inference times are benchmarked on a single H100 GPU.

4.1 Two-view Pose Estimation

Table 1: Pair-wise pose results on ScanNet-1500 [7, 29]. We report the Area Under the Curve (AUC) of the pose error at different thresholds. Best results per backbone are marked in bold.

Methods      AUC@5 ↑   AUC@10 ↑   AUC@20 ↑
VGGT [40]    37.45     59.24      75.69
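For reference, the training schedule above (learning rate $1\times 10^{-5}$, accumulation factor 4, effective batch size 32) implies a per-step micro-batch of 8, since 4 × 8 = 32. The following is a minimal PyTorch sketch of that schedule; the micro-batch size is inferred rather than stated, and the model, loss, and data here are stand-ins, not the paper's setup.

```python
import torch
import torch.nn as nn

# Stand-in model and synthetic data; only the accumulation logic mirrors the text.
model = nn.Linear(16, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # learning rate from the text

accum_steps = 4   # gradient accumulation factor (from the text)
micro_batch = 8   # assumed per-step batch: 4 x 8 = 32 effective

optimizer.zero_grad()
for step in range(100):
    x, y = torch.randn(micro_batch, 16), torch.randn(micro_batch, 1)
    loss = loss_fn(model(x), y) / accum_steps   # scale so gradients average over 32 samples
    loss.backward()                             # accumulate into .grad without updating
    if (step + 1) % accum_steps == 0:
        optimizer.step()                        # one optimizer update per 4 micro-batches
        optimizer.zero_grad()
```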
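The text fixes only two hyperparameters of the selective attention: a 4×4 compression window and top-32 block selection. The sketch below illustrates one plausible reading of that configuration in PyTorch: keys are mean-pooled over non-overlapping 4×4 windows, each query is scored against the pooled block summaries, and attention is then computed only over the tokens of its 32 highest-scoring blocks. The pooling operator, per-query selection, and dense gather are all assumptions for illustration, not the authors' implementation, which would rely on a fused block-sparse kernel for the reported speedups.

```python
import torch
import torch.nn.functional as F

def selective_block_attention(q, k, v, grid_hw, window=4, top_k=32):
    # q, k, v: (B, N, D) tokens laid out on an H x W grid; `window` and
    # `top_k` mirror the text (4x4 compression window, top-32 blocks).
    B, N, D = q.shape
    H, W = grid_hw
    bh, bw = H // window, W // window

    # Compress keys: mean-pool each non-overlapping window x window patch
    # into a single summary key, one per block.
    k_grid = k.transpose(1, 2).reshape(B, D, H, W)
    k_blk = F.avg_pool2d(k_grid, window).flatten(2).transpose(1, 2)   # (B, bh*bw, D)

    # Score every query against the block summaries; keep the top-k blocks.
    blk_scores = q @ k_blk.transpose(1, 2) / D ** 0.5                 # (B, N, bh*bw)
    top = blk_scores.topk(min(top_k, bh * bw), dim=-1).indices        # (B, N, K)

    # Expand each selected block index to the token indices inside it.
    dy, dx = torch.meshgrid(torch.arange(window, device=q.device),
                            torch.arange(window, device=q.device), indexing="ij")
    by, bx = top // bw, top % bw
    rows = by.unsqueeze(-1) * window + dy.flatten()                   # (B, N, K, w*w)
    cols = bx.unsqueeze(-1) * window + dx.flatten()
    idx = (rows * W + cols).flatten(2)                                # (B, N, K*w*w)

    # Gather the selected keys/values and attend densely over that subset.
    # (A real implementation would use a fused block-sparse kernel instead.)
    batch = torch.arange(B, device=q.device)[:, None, None]
    k_sel, v_sel = k[batch, idx], v[batch, idx]                       # (B, N, S, D)
    attn = torch.einsum("bnd,bnsd->bns", q, k_sel).div(D ** 0.5).softmax(-1)
    return torch.einsum("bns,bnsd->bnd", attn, v_sel)                 # (B, N, D)
```

For a 32×32 token grid this would be called as `selective_block_attention(q, k, v, grid_hw=(32, 32))`, so each query attends to 32 × 16 = 512 of the 1024 tokens.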