Creating Your First Pipeline

Example Prompt

Try this conversational prompt with your AI agent:

I want to create a data pipeline that processes NYC taxi data.
The pipeline should:
1. Start with the raw taxi_fhvhv table (already in the lakehouse)
2. Create a cleaned version that:
- Selects two months of data
- Filters to trips longer than 1 minute
- Removes rows with null pickup/dropoff times
3. Create an aggregated daily summary that shows:
- Total trips per day
- Average trip duration per day
- Total trips by borough per day
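To make the request concrete, the cleaning and aggregation rules in the prompt can be sketched in plain Python. This is only an illustration of the logic, not the code the agent generates and not the Bauplan API. The field names (pickup_datetime, dropoff_datetime, borough), the sample rows, and the two-month window are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical sample rows. Field names are assumptions for illustration,
# not the confirmed schema of taxi_fhvhv.
trips = [
    {"pickup_datetime": datetime(2023, 1, 1, 8, 0),
     "dropoff_datetime": datetime(2023, 1, 1, 8, 10), "borough": "Queens"},
    {"pickup_datetime": datetime(2023, 1, 1, 9, 0),
     "dropoff_datetime": datetime(2023, 1, 1, 9, 0, 30), "borough": "Bronx"},  # 30 s, too short
    {"pickup_datetime": None,
     "dropoff_datetime": datetime(2023, 1, 1, 9, 5), "borough": "Bronx"},  # null pickup
    {"pickup_datetime": datetime(2023, 1, 2, 7, 0),
     "dropoff_datetime": datetime(2023, 1, 2, 7, 20), "borough": "Queens"},
    {"pickup_datetime": datetime(2023, 3, 1, 7, 0),
     "dropoff_datetime": datetime(2023, 3, 1, 7, 20), "borough": "Queens"},  # outside the window
]


def clean_trips(rows, start, end, min_duration=timedelta(minutes=1)):
    """Keep rows with both timestamps set, picked up in [start, end),
    and lasting longer than min_duration."""
    cleaned = []
    for row in rows:
        pickup, dropoff = row["pickup_datetime"], row["dropoff_datetime"]
        if pickup is None or dropoff is None:
            continue  # removes rows with null pickup/dropoff times
        if not (start <= pickup < end):
            continue  # keeps only the selected two months
        if dropoff - pickup <= min_duration:
            continue  # keeps only trips longer than 1 minute
        cleaned.append(row)
    return cleaned


def daily_summary(rows):
    """Per day: total trips, average duration in minutes, trips by borough."""
    by_day = defaultdict(list)
    for row in rows:
        by_day[row["pickup_datetime"].date()].append(row)

    summary = {}
    for day, day_rows in sorted(by_day.items()):
        durations = [
            (r["dropoff_datetime"] - r["pickup_datetime"]).total_seconds() / 60
            for r in day_rows
        ]
        trips_by_borough = defaultdict(int)
        for r in day_rows:
            trips_by_borough[r["borough"]] += 1
        summary[day] = {
            "total_trips": len(day_rows),
            "avg_duration_min": sum(durations) / len(durations),
            "trips_by_borough": dict(trips_by_borough),
        }
    return summary


cleaned = clean_trips(trips, start=datetime(2023, 1, 1), end=datetime(2023, 3, 1))
summary = daily_summary(cleaned)
```

The raw table may not carry a borough column at all, which is why the agent may ask which borough/location identifier to use before writing the aggregation.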

What the Agent Does

When you give this prompt, the agent will:

  1. Load the creating-bauplan-pipelines skill from .claude/skills/creating-bauplan-pipelines/SKILL.md
  2. Ask clarifying questions if needed. For instance:
    • “Which namespace contains the taxi_fhvhv table?” (see Namespaces)
    • “Should the pipeline materialize (persist) all output tables or only the final outputs?”
    • “For the daily summary, which borough/location identifier should we use?”
  3. Create a folder for a Bauplan project, generate a bauplan_project.yml file, and write the pipeline code in a file named models.py
  4. Set up a branch workflow:
    • Create a development branch for testing
    • Configure the pipeline to run on that branch
    • Never execute directly on main
  5. Run validation:
    • Use bauplan run --dry-run to validate the pipeline
    • Check for syntax errors and missing dependencies
    • Optionally run the pipeline and show sample results
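The branch and validation steps above can be sketched as a small guard that builds the CLI invocation and refuses to target main. This is a minimal sketch, not part of Bauplan: the helper name bauplan_run_command is hypothetical, and passing --ref and --dry-run together in one invocation is an assumption based on the two flags shown on this page.

```python
def bauplan_run_command(branch, dry_run=False):
    """Build a `bauplan run` argument list for a development branch.

    Hypothetical helper: raises instead of ever targeting main.
    """
    if branch.strip().lower() == "main":
        raise ValueError("refusing to run directly on main; use a development branch")
    command = ["bauplan", "run", "--ref", branch]
    if dry_run:
        # Assumption: --dry-run can be combined with --ref.
        command.append("--dry-run")
    return command


validate_cmd = bauplan_run_command("alice.pipeline-dev", dry_run=True)
run_cmd = bauplan_run_command("alice.pipeline-dev")
```

Either list could then be handed to subprocess.run(cmd, check=True) from inside the project folder, so that a failed validation stops the script before the real run.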

Expected Output

After the agent generates your pipeline, you'll see a structure like:

your-repository/
├── your-bauplan-project/
│ ├── models.py # transformation code
│ └── bauplan_project.yml # YAML file that defines the project parameters
└── .claude/ # Skills and references
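If you want to confirm the layout before running anything, a quick check along these lines may help. It is a sketch that only looks for the two file names shown in the tree above; the function name missing_project_files is hypothetical, and the folder name is whatever the agent created.

```python
from pathlib import Path

# The two files shown in the project tree above.
EXPECTED_FILES = ("models.py", "bauplan_project.yml")


def missing_project_files(project_dir):
    """Return the expected file names that are absent from project_dir."""
    root = Path(project_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

For example, missing_project_files("your-bauplan-project") returns an empty list when both files are present.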

The agent will report something like:

✅ Created pipeline with 2 models
✅ Validated pipeline structure with --dry-run
✅ Pipeline ready to run on branch: alice.pipeline-dev

Run: bauplan run --ref alice.pipeline-dev