IBM Research
Stop re-running full benchmarks: Calibrate new LLM datasets against existing suites with just 100 "anchor" questions and still get highly accurate performance predictions.
General-purpose agents can match the performance of specialized agents across diverse environments without any environment-specific tuning, challenging the need for task-specific engineering.