Harbin Institute of Technology (Shenzhen); Shenzhen Loop Area Institute (SLAI)
Traditional text-embedding benchmarks fail to capture the nuances of long-horizon memory retrieval; this new benchmark shows that bigger models don't always win and that strong performance on standard tasks doesn't guarantee success in complex, context-dependent memory scenarios.
Forget content, remember position: crafting pseudo-queries based on token position alone yields surprisingly effective KV cache compression for LLMs, rivaling methods that analyze input semantics.
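A minimal sketch of the general idea, not the paper's actual method: score cached KV pairs against a pseudo-query built purely from positional information and keep only the top-scoring entries. The function name, the sinusoidal positional construction, and the "always keep a recent window" heuristic are all illustrative assumptions.

```python
import torch

def positional_pseudo_query_compress(keys, values, keep_ratio=0.2, recent_window=32):
    """Keep a fraction of the KV cache using a position-only pseudo-query.

    keys, values: [seq_len, head_dim] tensors for a single head (illustrative shapes).
    A sketch of the position-based idea; details differ from the paper.
    """
    seq_len, head_dim = keys.shape

    # Build a pseudo-query from token position alone (assumed here to be the
    # sinusoidal embedding of the latest position -- no content is inspected).
    pos = torch.tensor(float(seq_len - 1))
    dims = torch.arange(head_dim, dtype=torch.float32)
    angles = pos / (10000.0 ** (2 * (dims // 2) / head_dim))
    pseudo_q = torch.where(dims % 2 == 0, torch.sin(angles), torch.cos(angles))

    # Score every cached key against the pseudo-query.
    scores = keys.float() @ pseudo_q  # [seq_len]

    # Always retain the most recent tokens; select the rest by score.
    scores[-recent_window:] = float("inf")
    n_keep = min(seq_len, max(int(seq_len * keep_ratio), recent_window))
    keep_idx = torch.topk(scores, k=n_keep).indices.sort().values

    return keys[keep_idx], values[keep_idx], keep_idx
```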
Achieve 11.8x faster reasoning with 80% KV cache compression by estimating token importance directly from FlashAttention's intermediate results – no extra compute needed.
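A rough sketch of why FlashAttention's intermediate statistics are enough to score tokens: the kernel already maintains a per-query log-sum-exp (LSE) of the scaled attention logits for its online softmax, and exact attention weights can be recovered as `exp(logit - lse)` without a separate softmax pass. The names below are hypothetical, and this toy version recomputes the logits for clarity, so unlike the paper's claimed zero-overhead estimator it does pay for a QK product.

```python
import torch

def token_importance_from_lse(q, k, softmax_lse, keep_ratio=0.2):
    """Estimate per-token importance from the LSE a FlashAttention-style kernel keeps.

    q, k:         [num_q, head_dim], [num_k, head_dim] for one head (illustrative).
    softmax_lse:  [num_q] row-wise logsumexp of the scaled logits, assumed to be
                  exposed by the attention kernel as a by-product.
    Returns indices of the KV-cache entries to keep.
    """
    scale = q.shape[-1] ** -0.5
    logits = (q @ k.T) * scale                       # [num_q, num_k]
    # Exact softmax weights recovered from the LSE (no extra normalization pass).
    attn = torch.exp(logits - softmax_lse[:, None])  # [num_q, num_k]
    # Accumulated attention mass each cached token receives across queries.
    importance = attn.sum(dim=0)                     # [num_k]
    n_keep = max(1, int(k.shape[0] * keep_ratio))
    return torch.topk(importance, k=n_keep).indices.sort().values


# Toy usage: compute the LSE the way the kernel would, then score the cache.
q = torch.randn(8, 64)
k = torch.randn(256, 64)
softmax_lse = torch.logsumexp((q @ k.T) * (64 ** -0.5), dim=-1)
keep = token_importance_from_lse(q, k, softmax_lse)
```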