Harbin Institute of Technology, Shenzhen
Today's best GUI agents can barely handle real-world professional workflows: on tasks that require reasoning across just three applications, their success rates fall below 21%.
Reasoning across languages doesn't have to break the bank: a new framework slashes token costs by over 50% while maintaining accuracy, especially boosting performance in low-resource languages.
LLMs still struggle to reason in context when cultural and linguistic nuances are involved, achieving only 44% accuracy on a new grounded benchmark spanning 14 languages.
LLMs can now navigate the ever-expanding universe of external tools with significantly improved accuracy and generalization, thanks to a new agentic framework that proactively retrieves and grounds tool execution.
Traditional text embedding benchmarks fail to capture the nuances of long-horizon memory retrieval. This new benchmark shows that bigger models don't always win, and that performance on standard tasks doesn't guarantee success in complex, context-dependent memory scenarios.