Autoregressive inference could run up to 14x faster without retraining by reusing attention weights within semantically coherent chunks.
Open-sourcing SAIL-VL2 gives the multimodal community a new SOTA vision-language model under 4B parameters, driven by innovations in data curation, progressive training, and sparse MoE architectures.