UC San Diego, Zhejiang University
JetFlow breaks the speculative decoding speed ceiling, achieving speedups of up to 9.64x on complex tasks by aligning candidate tree generation with autoregressive models.
Current LLM memory systems falter when faced with the continuous, machine-generated interaction streams typical of real-world agentic applications, highlighting a critical need for causality-aware and tool-augmented memory architectures.