Even GPT-5 achieves only 63% accuracy on time-series anomaly questions drawn from real software incidents, but a model-expert combination reaches 87%, highlighting the potential for hybrid intelligence in incident response.
Multimodal agents still struggle with game development, solving only about 50% of tasks in GameDevBench, a new benchmark, which points to the need for better multimodal reasoning in complex software environments.