University of Science and Technology of China
LLM agents struggle to juggle multiple tasks when tool use involves realistic delays, revealing critical weaknesses in temporal reasoning and coordination.
Text-based prototypes in vision-language models are fundamentally misaligned with visual data for out-of-distribution detection, but this can be overcome with a novel online pseudo-supervised approach.
Unlock long-context reasoning in LLMs by turning agent trajectories into gold-standard QA pairs, outperforming models 8x larger on challenging reasoning tasks.
Even the best LLMs struggle to effectively discover, refine, and reuse skills over a lifetime of experience, suggesting current benchmarks significantly overestimate real-world agentic capabilities.
Current LLM efficiency metrics fail to capture the true cost of tool use, as measured by wall-clock latency, but a new hardware-aware metric closes the gap.