Pipelines

In Bauplan, pipelines are implicitly defined by chaining models together through their declared inputs and outputs. Pipelines take the form of Directed Acyclic Graphs (DAGs).

You don't need to define the DAG manually: Bauplan analyzes the dependencies declared by your models and infers the correct execution order.

To chain models into a DAG, simply pass a previous Bauplan model as an input.

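import bauplan

# Reads from an existing table named 'input_table'.
@bauplan.model()
@bauplan.python('3.11')
def step_1(data=bauplan.Model('input_table')):
    ...

# Depends on step_1 because it declares step_1 as its input.
@bauplan.model()
@bauplan.python('3.11')
def step_2(data=bauplan.Model('step_1')):
    ...

# Depends on step_2, so it runs last.
@bauplan.model()
@bauplan.python('3.11')
def step_3(data=bauplan.Model('step_2')):
    ...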
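
To make the idea of inferring an execution order from declared inputs concrete, here is a minimal sketch that uses Python's standard-library graphlib. It is an illustration only, not Bauplan's implementation: the `declared_inputs` dictionary and the `execution_order` helper are hypothetical names that mirror the three models above.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each model name maps to the inputs it declares.
# 'input_table' is a source table, so it has no entry of its own.
declared_inputs = {
    "step_1": {"input_table"},
    "step_2": {"step_1"},
    "step_3": {"step_2"},
}


def execution_order(deps):
    """Return one valid run order; graphlib raises CycleError on a cycle."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(declared_inputs)
print(order)  # ['input_table', 'step_1', 'step_2', 'step_3']
```

Because every node lists only its direct inputs, adding a new model to the graph means adding one entry; the order is recomputed rather than maintained by hand.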

Pipeline constraints

Models can take multiple tabular inputs but must return a single tabular output (for example, a Pandas DataFrame or Arrow table).

✅ -- This is a valid Bauplan DAG

┌────────────┐
│  Model 1   │──────────────┐
└────────────┘              │
                            ▼
┌────────────┐      ┌────────────┐      ┌────────────┐
│  Model 2   │─────►│  Model 3   │─────►│  Model 4   │
└────────────┘      └────────────┘      └────────────┘
                            ▲
┌────────────┐              │
│  Iceberg   │──────────────┘
│   Table    │
└────────────┘

❌ -- This is NOT a valid Bauplan DAG

                                        ┌────────────┐
                            ┌──────────►│  Model 3   │
                            │           └────────────┘
                            │
┌────────────┐      ┌────────────┐      ┌────────────┐
│  Model 1   │─────►│  Model 2   │─────►│  Model 4   │
└────────────┘      └────────────┘      └────────────┘

Parameterize DAGs with parameters and secrets

In production, it is common for the same DAG code to be run on different days, or to call an external service (for example, an LLM for data enrichment). Bauplan supports these scenarios with parameters and secrets.

You can define a parameter using the CLI:

bauplan parameter set prompt_summary "Write a concise, incisive summary of the news article." --description "Prompt passed to GPT for article summarization"

Each parameter is defined in bauplan_project.yml with a type and a default value.

parameters:
  interest_rate:
    default: 3.5
    type: float
  loan_amount:
    default: 200000
    type: int

Parameters can be:

  • Strings (str)
  • Numbers (int, float)
  • Booleans (bool)
  • Secrets (encrypted API keys or credentials)

Parameters

Parameters let you pass runtime values to models without changing code. You can reference them in a model's input filter, in a SQL query, or as a variable inside a model.

In Python, reference parameters in the filter argument of bauplan.Model using the $param_name syntax:

@bauplan.model()
@bauplan.python('3.11')
def qualifying_loans(
    data=bauplan.Model(
        'loans',
        filter='interest_rate <= $interest_rate AND amount >= $loan_amount',
    ),
):
    ...

Note: To see how to use a parameter as a variable inside a model, see the reference.

In SQL, use $param_name directly in your query:

SELECT * FROM loans
WHERE interest_rate <= $interest_rate
AND amount >= $loan_amount
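
To show which placeholder maps to which parameter, the sketch below renders the same query with Python's standard-library string.Template, which happens to use the same $name syntax. This is only an illustration: how Bauplan actually binds parameter values is not described here, and a real engine would normally bind values safely rather than interpolate strings. The `params` dictionary is a hypothetical stand-in for the project defaults.

```python
from string import Template

# Hypothetical parameter values, mirroring the defaults declared in the project file.
params = {"interest_rate": 3.5, "loan_amount": 200000}

query = Template(
    "SELECT * FROM loans "
    "WHERE interest_rate <= $interest_rate "
    "AND amount >= $loan_amount"
)

# substitute() raises KeyError if a placeholder has no matching parameter,
# which surfaces typos in parameter names early.
rendered = query.substitute(params)
print(rendered)
```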

Defaults can be overridden at runtime:

bauplan run --param interest_rate=4.5 --param loan_amount=500000
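
Conceptually, an override replaces the declared default for that run only, and any parameter you do not override keeps its default. The following sketch models that behavior in plain Python under stated assumptions: the `declared` dictionary, the `CASTS` table, and the `resolve` helper are hypothetical and are not part of the Bauplan API.

```python
# Hypothetical in-memory copy of the `parameters` block from bauplan_project.yml.
declared = {
    "interest_rate": {"default": 3.5, "type": "float"},
    "loan_amount": {"default": 200000, "type": "int"},
}

# Converters from the raw command-line string to the declared type.
CASTS = {
    "str": str,
    "int": int,
    "float": float,
    "bool": lambda raw: raw.lower() == "true",
}


def resolve(declared, overrides):
    """Merge name=value overrides onto declared defaults, casting by type."""
    values = {name: spec["default"] for name, spec in declared.items()}
    for item in overrides:
        name, _, raw = item.partition("=")
        if name not in declared:
            raise KeyError(f"unknown parameter: {name}")
        values[name] = CASTS[declared[name]["type"]](raw)
    return values


resolved = resolve(declared, ["interest_rate=4.5"])
print(resolved)  # {'interest_rate': 4.5, 'loan_amount': 200000}
```

Rejecting undeclared names is a design choice in this sketch: it turns a mistyped parameter into an immediate error instead of a silently ignored override.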

Secrets

Secrets are encrypted parameters for sensitive values such as API keys, database credentials, or tokens. They are versioned and injected into your pipeline code at runtime, so you never need to hardcode them.

Use the CLI with --type secret to create one:

bauplan parameter set openai_api_key sk-abc123... --type secret

Bauplan encrypts the value via AWS KMS and stores only the ciphertext in your bauplan_project.yml. Secrets are never stored or transmitted in plaintext.

In your code, reference secrets with bauplan.Parameter just like regular parameters.

For a complete walkthrough using secrets with an LLM pipeline, see the Using LLMs with Secrets example.

Best practices

To keep your project organized and make it easy to run, test, and inspect individual pipelines, we recommend the following structure. See Projects for more details.

my_project/
└── pipelines/
    ├── sales_reporting/
    │   ├── models.py
    │   └── bauplan_project.yml
    ├── customer_segmentation/
    │   ├── models.py
    │   └── bauplan_project.yml
    └── churn_prediction/
        ├── models.py
        └── bauplan_project.yml

  • Group models that form a logical pipeline into a single file: models.py.
  • Place that file inside a folder named after the pipeline (sales_reporting, churn_prediction, etc.).
  • Each pipeline folder contains its own bauplan_project.yml file, which defines environment settings and dependencies for that pipeline only.
  • Move function bodies into external modules and call them from the Bauplan models. This keeps business logic separate from the DAG and environment declarations, which makes the code easier to refactor.

This layout:

  • Makes each pipeline self-contained and easy to run with bauplan run from inside the folder.
  • Keeps models modular and promotes reuse.
  • Helps enforce consistent dependency tracking and ref resolution.
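
As an illustration of keeping business logic outside the model declarations, here is a minimal sketch. The module name transforms.py, the function `qualifying_loans`, and the wrapper shown in the comment are hypothetical; the logic works on plain Python dictionaries so it can be tested without running a pipeline.

```python
# transforms.py (hypothetical module name): plain Python with no Bauplan
# imports, so it can be unit-tested on its own.
def qualifying_loans(rows, max_rate, min_amount):
    """Keep loans at or below max_rate and at or above min_amount."""
    return [
        row
        for row in rows
        if row["interest_rate"] <= max_rate and row["amount"] >= min_amount
    ]


# models.py would then stay a thin wrapper, roughly (assumed shape):
#
#   @bauplan.model()
#   @bauplan.python('3.11')
#   def qualifying_loans_model(data=bauplan.Model('loans')):
#       ...  # convert `data`, call transforms.qualifying_loans, return a table

sample = [
    {"interest_rate": 3.0, "amount": 250000},
    {"interest_rate": 5.0, "amount": 250000},
    {"interest_rate": 3.0, "amount": 100000},
]
kept = qualifying_loans(sample, max_rate=3.5, min_amount=200000)
print(kept)  # [{'interest_rate': 3.0, 'amount': 250000}]
```

With this split, changes to the filtering rule touch only the external module, while the model file continues to describe inputs, outputs, and the runtime environment.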