Multimodal perception is no longer just an add-on: GLM-5V-Turbo bakes it directly into the core of reasoning, planning, and action.
Serving LoRA adapters at scale doesn't have to crush your latency SLOs: InfiniLoRA disaggregates LoRA execution to achieve 3x higher throughput and dramatically improved tail latency.
Models can learn to distinguish creative-writing tasks that require rigorous planning from those better served by direct generation, unlocking a new level of meta-cognitive ability.
Forget specialized tools: a standard Unix terminal and clever RL are all you need to beat much larger LLMs at code search.