Real-world proactive agents can now infer latent user needs and act on them in real time, rivaling state-of-the-art models in intent detection while maintaining low latency.
Language models are increasingly doing their real work in the "invisible" latent space, not the tokens we see.
Autoregressive inference can get up to a 14x speed boost without retraining, thanks to a clever trick: reusing attention weights across tokens within semantically coherent chunks.
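The chunk-level weight-reuse idea can be sketched in toy NumPy form. This is an illustrative, non-causal simplification under my own assumptions (function names and the fixed chunk size are mine, not the paper's): the attention weights are computed once for the first query of each chunk and reused for the remaining queries in that chunk, skipping their QK score computation entirely.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_weights(q, K):
    # standard scaled dot-product attention weights for a single query
    return softmax(q @ K.T / np.sqrt(K.shape[1]))

def reuse_attention(Q, K, V, chunk=4):
    """Toy sketch: compute attention weights only for the first token of
    each chunk and reuse them for the rest, avoiding per-token QK scores."""
    n = Q.shape[0]
    out = np.zeros((n, V.shape[1]))
    for start in range(0, n, chunk):
        w = attention_weights(Q[start], K)       # one softmax per chunk
        for i in range(start, min(start + chunk, n)):
            out[i] = w @ V                       # cached weights, reused
    return out

rng = np.random.default_rng(0)
Q = rng.standard_normal((8, 16))
K = rng.standard_normal((8, 16))
V = rng.standard_normal((8, 32))
out = reuse_attention(Q, K, V, chunk=4)
```

In this toy version every token in a chunk produces the same output; the actual method restricts reuse to semantically coherent chunks precisely so that this approximation stays cheap without degrading quality.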
Open-sourcing SAIL-VL2 gives the multimodal community a new SOTA vision-language model under 4B parameters, driven by innovations in data curation, progressive training, and sparse MoE architectures.