Search papers, labs, and topics across Lattice.
1
0
4
2
Squeezing high-accuracy LLMs and VLMs onto client devices is now significantly more feasible, thanks to a new pipelined sharding technique that achieves up to 30x speedups and 10x VRAM reduction.