Creating Your First Pipeline
Example Prompt
Try this conversational prompt with your AI agent:
I want to create a data pipeline that processes NYC taxi data.
The pipeline should:
1. Start with the raw taxi_fhvhv table (already in the lakehouse)
2. Create a cleaned version that:
- Selects two months of data
- Filters to trips longer than 1 minute
- Removes rows with null pickup/dropoff times
3. Create an aggregated daily summary that shows:
- Total trips per day
- Average trip duration per day
- Total trips by borough per day
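The cleaning and aggregation rules in this prompt are simple enough to state precisely. The sketch below expresses them in plain Python over a list of row dictionaries. Its only purpose is to pin down the intended semantics before the agent writes the real models.py; it does not use the Bauplan SDK. Several details are assumptions: the column names (pickup_datetime, dropoff_datetime, PULocationID) are modeled on the public NYC FHVHV schema, "per day" means per pickup date, the two-month window is a placeholder, and the zone-to-borough mapping stands in for whatever location lookup you choose when the agent asks.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Assumed column names, modeled on the public NYC FHVHV schema.
# Check them against the real taxi_fhvhv table before relying on this.
PICKUP, DROPOFF, LOCATION = "pickup_datetime", "dropoff_datetime", "PULocationID"


def clean_trips(rows, start, end, min_duration=timedelta(minutes=1)):
    """Keep rows picked up in [start, end) with both timestamps set and duration > min_duration."""
    cleaned = []
    for row in rows:
        pickup, dropoff = row.get(PICKUP), row.get(DROPOFF)
        if pickup is None or dropoff is None:
            continue  # remove rows with null pickup/dropoff times
        if not (start <= pickup < end):
            continue  # keep only the selected date window (e.g. two months)
        if dropoff - pickup <= min_duration:
            continue  # keep only trips strictly longer than one minute
        cleaned.append(row)
    return cleaned


def daily_summary(rows, zone_to_borough):
    """Group by pickup date: total trips, average duration in minutes, trips per borough."""
    trips = Counter()
    seconds = defaultdict(float)
    by_borough = defaultdict(Counter)
    for row in rows:
        day = row[PICKUP].date()
        trips[day] += 1
        seconds[day] += (row[DROPOFF] - row[PICKUP]).total_seconds()
        # Unmapped location IDs are counted under "Unknown".
        by_borough[day][zone_to_borough.get(row[LOCATION], "Unknown")] += 1
    return {
        day: {
            "total_trips": trips[day],
            "avg_duration_min": seconds[day] / trips[day] / 60,
            "trips_by_borough": dict(by_borough[day]),
        }
        for day in sorted(trips)
    }


if __name__ == "__main__":
    t0 = datetime(2023, 1, 5, 8, 0)
    sample = [
        {PICKUP: t0, DROPOFF: t0 + timedelta(minutes=10), LOCATION: 1},
        {PICKUP: t0, DROPOFF: t0 + timedelta(seconds=30), LOCATION: 1},  # too short
        {PICKUP: None, DROPOFF: t0, LOCATION: 2},  # null pickup
        {PICKUP: t0, DROPOFF: t0 + timedelta(minutes=20), LOCATION: 2},
    ]
    toy_zones = {1: "Manhattan", 2: "Brooklyn"}  # made-up mapping, not the real zone lookup
    window = (datetime(2023, 1, 1), datetime(2023, 3, 1))  # placeholder two-month window
    print(daily_summary(clean_trips(sample, *window), toy_zones))
```

Writing the rules out this way makes the edge cases explicit (for example, a trip of exactly one minute is dropped), which is useful when checking the agent's generated models against the prompt.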
What the Agent Does
When you give this prompt, the agent will:
- Load the creating-bauplan-pipelines skill from .claude/skills/creating-bauplan-pipelines/SKILL.md
- Ask clarifying questions if needed. For instance:
  - “Which namespace contains the taxi_fhvhv table?” (see Namespaces)
  - “Should the pipeline materialize (persist) all output tables or only the final outputs?”
  - “For the daily summary, which borough/location identifier should we use?”
- Create a folder for a Bauplan project, generate a bauplan_project.yml file, and write the pipeline code in a file named models.py
- Set up a branch workflow:
  - Creates a development branch for testing
  - Configures the pipeline to run on that branch
  - Never executes directly on main
- Run validation:
  - Uses bauplan run --dry-run to validate the pipeline
  - Checks for syntax errors and missing dependencies
  - May run the pipeline and show sample results
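The "never on main" rule above can be made explicit with a small guard. This is a hypothetical helper, not part of Bauplan: it only builds the command line (it does not execute anything), refuses to target main, and combines the --ref and --dry-run flags that appear on this page, assuming the CLI accepts them together.

```python
import shlex

# Assumption: main is the only branch that must never be targeted directly.
PROTECTED_BRANCHES = {"main"}


def bauplan_run_command(branch: str, dry_run: bool = True) -> list[str]:
    """Build (but do not run) a bauplan run command for a development branch.

    Raises ValueError for protected branches, mirroring the rule that the
    agent never executes directly on main.
    """
    if branch in PROTECTED_BRANCHES:
        raise ValueError(f"refusing to run on protected branch {branch!r}; use a dev branch")
    cmd = ["bauplan", "run", "--ref", branch]
    if dry_run:
        cmd.append("--dry-run")  # validate only; drop this flag for a real run
    return cmd


if __name__ == "__main__":
    print(shlex.join(bauplan_run_command("alice.pipeline-dev")))
```

Returning an argument list rather than a shell string keeps the branch name safely quoted if you later pass it to subprocess.run.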
Expected Output
After the agent generates your pipeline, you'll see a structure like:
your-repository/
├── your-bauplan-project/
│ ├── models.py # transformation code
│ └── bauplan_project.yml # YAML file defining the project parameters
└── .claude/ # Skills and references
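To confirm that the agent produced both project files, a quick check like the following can help. missing_project_files is a hypothetical helper: it only tests that the two files listed in the tree exist and does not validate their contents.

```python
from pathlib import Path

# The two files the tree above shows inside the Bauplan project folder.
EXPECTED_FILES = ("models.py", "bauplan_project.yml")


def missing_project_files(project_dir) -> list[str]:
    """Return the expected project files that are absent from project_dir."""
    root = Path(project_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # An empty list means both files are present.
    print(missing_project_files("your-bauplan-project"))
```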
The agent will report something like:
✅ Created pipeline with 2 models
✅ Validated pipeline structure with --dry-run
✅ Pipeline ready to run on branch: alice.pipeline-dev
Run: bauplan run --ref alice.pipeline-dev
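The "2 models" in the report form a chain: the cleaned table reads from taxi_fhvhv, and the daily summary reads from the cleaned table. The sketch below shows that dependency order with Python's standard-library graphlib (Python 3.9+). The model names are hypothetical, and this illustrates the ordering only, not how Bauplan schedules runs internally.

```python
from graphlib import TopologicalSorter

# Hypothetical model names; each model maps to the set of tables it reads.
# taxi_fhvhv is the raw source table already in the lakehouse.
PIPELINE = {
    "clean_taxi_trips": {"taxi_fhvhv"},
    "daily_trip_summary": {"clean_taxi_trips"},
}

# static_order() yields every node after all of its dependencies.
order = list(TopologicalSorter(PIPELINE).static_order())
print(" -> ".join(order))  # prints: taxi_fhvhv -> clean_taxi_trips -> daily_trip_summary
```

Because the summary depends only on the cleaned table, a change to the cleaning rules affects both models, while a change to the summary logic leaves the cleaned table untouched.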