IBM Research
A unified assessment framework reveals hidden insights about agent performance, transforming how we evaluate AI systems.
General-purpose agents can match the performance of specialized agents across diverse environments without any environment-specific tuning, challenging the need for task-specific engineering.