Pseudocode of TAP
 1: denoiser f_θ, predictor set 𝒫, window N, distance d(·,·)
 2: C_h ← ∅, C_r ← ∅                          ▷ compact cache: first-layer modulated input and residual
 3: for t ← T downto 1 do
 4:   x_t ← current model input
 5:   h_t ← Modulate(Norm₁(x_t), s_t, g_t)    ▷ Eq. (5)
 6:   if t mod N = 0 then
 7:     r_t ← f_θ(x_t, t) − x_t               ▷ full residual, Eq. (6)
 8:     C_h ← h_t, C_r ← r_t                  ▷ store compact proxies
 9:     use f_θ(x_t, t) as the model output for this step
10:   else
11:     for all p ∈ 𝒫 do                      ▷ parallel prediction from cached proxies (e.g., Taylor variants)
12:       ĥ_{t,p} ← Predict(p, C_h)           ▷ Eq. (4)
13:     end for
14:     for all tokens (b, n) do
15:       p
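The schedule above alternates full denoiser calls (every N-th step, refreshing the compact cache) with cheap predicted steps. A minimal sketch of that control flow, with toy stand-ins for the denoiser, the first-layer probe, and the predictor set (all names here are illustrative, not the paper's implementation):

```python
import numpy as np

def modulate_first_layer(x):
    # Stand-in for Modulate(Norm1(x_t), s_t, g_t): a cheap first-layer probe.
    return (x - x.mean(axis=-1, keepdims=True)) / (x.std(axis=-1, keepdims=True) + 1e-6)

def denoiser(x, t):
    # Toy stand-in for the full model f_theta(x_t, t).
    return 0.9 * x + 0.01 * t

def tap_loop(x, T=8, N=4, predictors=None):
    if predictors is None:
        # Toy "Taylor variant" predictors that extrapolate the cached residual.
        predictors = [lambda r: r, lambda r: 0.5 * r]
    cache_h, cache_r = None, None
    for t in range(T, 0, -1):
        h_t = modulate_first_layer(x)            # low-cost probe at every step
        if t % N == 0 or cache_r is None:
            out = denoiser(x, t)                 # full step
            cache_h, cache_r = h_t, out - x      # store compact proxies
            x = out
        else:
            # Pick the predictor whose implied first-layer state is closest
            # to the probe h_t (global L2 here; the paper selects per token).
            best = min(predictors,
                       key=lambda p: np.linalg.norm(
                           h_t - modulate_first_layer(x + p(cache_r))))
            x = x + best(cache_r)                # skip the full denoiser
    return x
```

Because the probe touches only the first layer, the selection overhead stays small relative to a full forward pass.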
Achieve significant speedups in diffusion model inference, without training, by adaptively selecting the best predictor for each token at each step based on a low-cost probe of the first layer.
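The per-token selection can be sketched as follows: compare each predictor's predicted first-layer state against the probed state h_t under the distance d, then take the argmin predictor independently for every token. The L2 distance and array shapes below are assumptions for illustration:

```python
import numpy as np

def select_per_token(h_t, h_hats):
    # h_t:    (B, N_tokens, D)    probed first-layer state at step t
    # h_hats: (P, B, N_tokens, D) predictions h_hat_{t,p} from each of P predictors
    # Distance d taken as per-token L2 (assumption); argmin over predictors.
    dists = np.linalg.norm(h_hats - h_t[None], axis=-1)  # (P, B, N_tokens)
    return np.argmin(dists, axis=0)                      # (B, N_tokens) predictor ids
```

Selection is a single vectorized reduction over the predictor axis, so adding more predictors grows cost only linearly in the (cheap) probe dimension.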