This paper introduces WORKSWORLD, a new domain for automated planning and scheduling of distributed data pipelines, represented as workflow and resource graphs. The framework allows users to specify data sources, available workflow components, and desired destinations, so the planner constructs the workflow graph and schedules its components jointly. Experiments show that a numeric planner on commodity hardware, given one hour of CPU time and 30 GB of memory, can solve linear-chain workflows of up to 14 components across eight sites.
Automating the planning and scheduling of distributed data pipelines is now possible with WORKSWORLD, a new domain where planners can build and schedule workflows from high-level specifications.
This work pursues automated planning and scheduling of distributed data pipelines, or workflows. We develop a general workflow and resource graph representation that includes both data processing and sharing components with corresponding network interfaces for scheduling. Leveraging these graphs, we introduce WORKSWORLD, a new domain for numeric domain-independent planners designed for permanently scheduled workflows, like ingest pipelines. Our framework permits users to define data sources, available workflow components, and desired data destinations and formats without explicitly declaring the entire workflow graph as a goal. The planner solves a joint planning and scheduling problem, producing a plan that both builds the workflow graph and schedules its components on the resource graph. We empirically show that a state-of-the-art numeric planner running on commodity hardware with one hour of CPU time and 30 GB of memory can solve linear-chain workflows of up to 14 components across eight sites.
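To make the problem shape concrete, the following is a minimal illustrative sketch in Python, not the WORKSWORLD domain itself and not the paper's method. It assumes a toy model in which each component converts one data format into another and consumes a fixed amount of CPU, each site has a CPU capacity, and data may move only along declared directed links. All names (`Component`, `build_chain`, `schedule`, the formats, sites, and capacities) are hypothetical. For simplicity the sketch first builds a linear chain and then places it by brute force, whereas the abstract describes a numeric planner solving both parts jointly.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Component:
    name: str
    in_fmt: str   # data format consumed (hypothetical)
    out_fmt: str  # data format produced (hypothetical)
    cpu: int      # CPU units required (hypothetical resource model)


def build_chain(components, src_fmt, dst_fmt):
    """Depth-first search for a linear chain converting src_fmt to dst_fmt."""
    def dfs(fmt, used):
        if fmt == dst_fmt:
            return []
        for c in components:
            if c.in_fmt == fmt and c.name not in used:
                rest = dfs(c.out_fmt, used | {c.name})
                if rest is not None:
                    return [c] + rest
        return None
    return dfs(src_fmt, frozenset())


def schedule(chain, site_cpu, links):
    """Brute-force placement: try every site assignment and return the first
    one that respects CPU capacities and uses only declared links."""
    sites = list(site_cpu)
    for assignment in product(sites, repeat=len(chain)):
        load = {s: 0 for s in sites}
        for comp, site in zip(chain, assignment):
            load[site] += comp.cpu
        if any(load[s] > site_cpu[s] for s in sites):
            continue
        hops = zip(assignment, assignment[1:])
        if all(a == b or (a, b) in links for a, b in hops):
            return dict(zip((c.name for c in chain), assignment))
    return None


# Hypothetical ingest pipeline: raw data at the source, Parquet at the destination.
components = [
    Component("parse", "raw", "json", 2),
    Component("enrich", "json", "json_enriched", 3),
    Component("store", "json_enriched", "parquet", 2),
]
site_cpu = {"edge": 2, "cloud": 5}   # CPU capacity per site
links = {("edge", "cloud")}          # directed network links between sites

chain = build_chain(components, "raw", "parquet")
plan = schedule(chain, site_cpu, links)
print([c.name for c in chain])
print(plan)
```

As in the framework described above, the user supplies only the source format, the available components, and the destination format; the chain itself is derived rather than declared as a goal. The exhaustive placement grows exponentially with chain length, which gives some intuition for why instances of 14 components across eight sites are a meaningful benchmark for a planner.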