Orchestrators

Bauplan integrates seamlessly with common workflow orchestrators such as Airflow, Prefect, and Dagster. The general pattern we recommend is straightforward:

  • Use your orchestrator to manage scheduling, retries, triggers, and durable workflow execution.
  • Use Bauplan as the underlying data platform that actually runs your compute and manages your lakehouse.
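In practice, the orchestrator task is a thin wrapper that delegates all compute to Bauplan and only inspects the outcome. A minimal sketch of such a task body is below; the `run(project_dir=..., ref=...)` method and the `job_id`/`job_status` fields are assumptions modeled on Bauplan's Python SDK, so check the SDK reference for exact signatures.

```python
# Orchestrator-side task body: trigger a Bauplan run and return its id.
# NOTE: the client interface here (run, job_id, job_status) is an assumption
# modeled on Bauplan's Python SDK, not a verbatim copy of its API.

def run_pipeline(client, project_dir: str, ref: str) -> str:
    """Delegate a pipeline run to Bauplan and surface failures to the orchestrator.

    `client` is any object exposing run(project_dir=..., ref=...); in
    production this would typically be a bauplan.Client instance.
    """
    state = client.run(project_dir=project_dir, ref=ref)
    # The orchestrator only checks the outcome; it never touches the data.
    if state.job_status != "SUCCESS":
        raise RuntimeError(f"Bauplan run {state.job_id} failed: {state.job_status}")
    return state.job_id
```

Because the task body is just a function of a client, it drops unchanged into an Airflow `@task`, a Prefect `@task`, or a Dagster op, and the orchestrator's native retry policy wraps the raised exception.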

This division of responsibilities has several advantages:

  1. Separation of orchestration and runtime. Orchestrators remain lightweight. They don’t execute heavy transformations directly. Instead they call into Bauplan, where compute is isolated, scalable, and data-aware. This avoids overloading your orchestration cluster and ensures your jobs run where they can take advantage of Bauplan’s optimizations (optimized S3 scans, smart caching, zero-copy branching, atomic merges, versioned inputs).
  2. Simpler DAGs, less boilerplate. Orchestration code focuses on when and under what conditions a job should run, not how data is processed. This keeps DAGs small and maintainable, reducing the risk of monolithic deployments that are hard to test and debug.
  3. Portable, reproducible transformation code. All transformation logic lives in Bauplan projects. Pipelines are versioned, testable, and reproducible without depending on orchestrator-specific constructs such as Airflow operators. This makes your data workflows portable across orchestrators, or even runnable ad hoc from the CLI, without rewriting pipeline code.
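The zero-copy branching and atomic merges mentioned above enable a common write-audit-publish flow that an orchestrator task can drive: run on an isolated branch, and merge into `main` only on success. The sketch below assumes client methods named `create_branch`, `run`, and `merge_branch`, modeled on Bauplan's Python SDK; the exact names and signatures may differ, so consult the SDK reference.

```python
# Sketch of a branch-run-merge (write-audit-publish) flow driven from an
# orchestrator task. Client method names (create_branch, run, merge_branch)
# are assumptions modeled on Bauplan's Python SDK.

def run_on_branch(client, project_dir: str, branch: str, base: str = "main") -> None:
    """Run a Bauplan project on an isolated branch, then publish atomically."""
    client.create_branch(branch, from_ref=base)   # zero-copy: no data duplicated
    state = client.run(project_dir=project_dir, ref=branch)
    if state.job_status != "SUCCESS":
        # Leave the branch behind for debugging; the base branch is untouched.
        raise RuntimeError(f"run {state.job_id} failed on {branch}")
    client.merge_branch(source_ref=branch, into_branch=base)  # atomic publish
```

Failures never reach `main`: a bad run simply raises, the orchestrator's retry policy takes over, and the half-finished state stays isolated on the branch.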
  4. Future-proof architecture. By keeping orchestration and compute loosely coupled, you can evolve each layer independently. Swap Airflow for Prefect (or vice versa), adopt event-driven triggers, or move workloads between environments without rewriting your data business logic. Bauplan ensures consistent execution and data management regardless of how tasks are scheduled.