The paper introduces CABench, a benchmark for composable AI consisting of 70 realistic tasks and a pool of 700 ready-to-use models spanning multiple modalities. To address the lack of systematic evaluation in this setting, the authors also provide a framework for end-to-end assessment of composable AI solutions. Benchmark results comparing human-designed solutions with two LLM-based approaches highlight the promise of composable AI while revealing the need for methods that automatically generate effective execution pipelines.
CABench reveals that even with a large pool of pre-trained models, automatically generating effective execution pipelines for composable AI remains a significant challenge.
Composable AI offers a scalable and effective paradigm for solving complex AI tasks by decomposing them into sub-tasks, each handled by ready-to-use models. However, systematic evaluation of methods in this setting remains largely unexplored. This paper introduces CABench, the first public benchmark for composable AI, comprising 70 realistic tasks along with a curated pool of 700 models across multiple modalities and domains. We also propose an evaluation framework that enables end-to-end assessment of composable AI solutions. To establish initial baselines, we provide human-designed reference solutions and compare their performance with two LLM-based approaches. Our results demonstrate the promise of composable AI for addressing real-world challenges while underscoring the need for methods that can automatically generate effective execution pipelines. This work lays the foundation for future research toward building scalable, efficient AI systems through principled reuse and orchestration of existing models.
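To make the composable-AI paradigm concrete, the following is a minimal illustrative sketch, not taken from the paper: a complex task is decomposed into sub-tasks, each delegated to a ready-to-use model, and an execution pipeline chains their outputs. The sub-task decomposition, the stub models (`transcribe_audio`, `translate_to_english`, `summarize`), and the pipeline structure are all hypothetical assumptions standing in for real pre-trained models drawn from a model pool.

```python
from typing import Any, Callable, List, Tuple

# Hypothetical stand-ins for ready-to-use models; in practice these would be
# pre-trained checkpoints from a model pool (e.g. speech recognition,
# translation, and summarization models).
def transcribe_audio(audio: Any) -> str:
    return f"transcript of {audio}"      # stub: speech -> text

def translate_to_english(text: str) -> str:
    return f"english({text})"            # stub: text -> text

def summarize(text: str) -> str:
    return f"summary({text})"            # stub: text -> text

# An execution pipeline: an ordered list of (sub-task name, model) steps.
Pipeline = List[Tuple[str, Callable[[Any], Any]]]

def run_pipeline(pipeline: Pipeline, task_input: Any) -> Any:
    """Feed each sub-task's output into the next sub-task's input."""
    current = task_input
    for name, model in pipeline:
        current = model(current)
        print(f"{name}: {current}")
    return current

# Hypothetical decomposition of one complex task ("summarize a
# foreign-language podcast") into sub-tasks solved by existing models.
pipeline: Pipeline = [
    ("speech_recognition", transcribe_audio),
    ("translation", translate_to_english),
    ("summarization", summarize),
]

if __name__ == "__main__":
    result = run_pipeline(pipeline, "podcast.wav")
    print("final output:", result)
```

In this reading, the hard part the benchmark probes is not running such a pipeline but choosing the decomposition and the models automatically, which is what the LLM-based baselines attempt.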