The paper introduces CABench, a benchmark for composable AI consisting of 70 realistic tasks and a pool of 700 ready-to-use models spanning multiple modalities. To address the lack of systematic evaluation in this setting, the authors also provide a framework for end-to-end assessment of composable AI solutions. Benchmark results comparing human-designed solutions with two LLM-based approaches highlight the promise of composable AI while revealing the need for methods that automatically generate effective execution pipelines.
CABench reveals that even with a large pool of pre-trained models, automatically generating effective execution pipelines for composable AI remains a significant challenge.
Composable AI offers a scalable and effective paradigm for solving complex AI tasks by decomposing them into sub-tasks, each handled by ready-to-use models. However, systematic evaluation of methods in this setting remains largely unexplored. This paper introduces CABench, the first public benchmark for composable AI, comprising 70 realistic tasks along with a curated pool of 700 models across multiple modalities and domains. We also propose an evaluation framework that enables end-to-end assessment of composable AI solutions. To establish initial baselines, we provide human-designed reference solutions and compare their performance with two LLM-based approaches. Our results demonstrate the promise of composable AI for addressing real-world challenges while underscoring the need for methods that can automatically generate effective execution pipelines. This work lays the foundation for future research toward building scalable, efficient AI systems through principled reuse and orchestration of existing models.
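To make the composable-AI paradigm concrete, the following is a minimal illustrative sketch, not taken from the paper: a complex task is decomposed into sub-tasks, each delegated to a ready-to-use model, and an execution pipeline chains their outputs. The sub-task decomposition, the stub models (`transcribe_audio`, `translate_to_english`, `summarize`), and the pipeline structure are all hypothetical assumptions standing in for real pre-trained models drawn from a model pool.

```python
from typing import Any, Callable, List, Tuple

# Hypothetical stand-ins for ready-to-use models; in practice these would be
# pre-trained checkpoints from a model pool (e.g. speech recognition,
# translation, and summarization models).
def transcribe_audio(audio: Any) -> str:
    return f"transcript of {audio}"      # stub: speech -> text

def translate_to_english(text: str) -> str:
    return f"english({text})"            # stub: text -> text

def summarize(text: str) -> str:
    return f"summary({text})"            # stub: text -> text

# An execution pipeline: an ordered list of (sub-task name, model) steps.
Pipeline = List[Tuple[str, Callable[[Any], Any]]]

def run_pipeline(pipeline: Pipeline, task_input: Any) -> Any:
    """Feed each sub-task's output into the next sub-task's input."""
    current = task_input
    for name, model in pipeline:
        current = model(current)
        print(f"{name}: {current}")
    return current

# Hypothetical decomposition of one complex task ("summarize a
# foreign-language podcast") into sub-tasks solved by existing models.
pipeline: Pipeline = [
    ("speech_recognition", transcribe_audio),
    ("translation", translate_to_english),
    ("summarization", summarize),
]

if __name__ == "__main__":
    result = run_pipeline(pipeline, "podcast.wav")
    print("final output:", result)
```

In this reading, the hard part the benchmark probes is not running such a pipeline but choosing the decomposition and the models automatically, which is what the LLM-based baselines attempt.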