Creating Your First Pipeline

Example Prompt

Try this conversational prompt with your AI agent:

I want to create a data pipeline that processes NYC taxi data.
The pipeline should:
1. Start with the raw taxi_fhvhv table (already in the lakehouse)
2. Create a cleaned version that:
   - Selects two months of data
   - Filters to trips longer than 1 minute
   - Removes rows with null pickup/dropoff times
3. Create an aggregated daily summary that shows:
   - Total trips per day
   - Average trip duration per day
   - Total trips by borough per day
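
To make the request concrete, here is a minimal plain-Python sketch of the cleaning and aggregation logic the prompt describes. It is not Bauplan code and not what the agent will generate. The field names (pickup, dropoff, borough), the sample rows, and the two-month window are all assumptions chosen for illustration, and the real taxi_fhvhv schema may differ.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical sample rows; field names are assumptions, not the real schema.
RAW_TRIPS = [
    {"pickup": datetime(2023, 1, 1, 8, 0), "dropoff": datetime(2023, 1, 1, 8, 20), "borough": "Queens"},
    {"pickup": datetime(2023, 1, 1, 12, 0), "dropoff": datetime(2023, 1, 1, 12, 10), "borough": "Bronx"},
    # 30-second trip: dropped by the duration filter
    {"pickup": datetime(2023, 1, 1, 9, 0), "dropoff": datetime(2023, 1, 1, 9, 0, 30), "borough": "Bronx"},
    # null pickup time: dropped
    {"pickup": None, "dropoff": datetime(2023, 1, 2, 10, 0), "borough": "Queens"},
    {"pickup": datetime(2023, 1, 2, 7, 0), "dropoff": datetime(2023, 1, 2, 7, 10), "borough": "Queens"},
    # outside the two-month window: dropped
    {"pickup": datetime(2023, 3, 5, 7, 0), "dropoff": datetime(2023, 3, 5, 7, 30), "borough": "Bronx"},
]


def clean_trips(rows, start, end):
    """Keep rows in [start, end) with both timestamps set and a duration over 1 minute."""
    cleaned = []
    for row in rows:
        pickup, dropoff = row["pickup"], row["dropoff"]
        if pickup is None or dropoff is None:
            continue
        if not (start <= pickup < end):
            continue
        if dropoff - pickup <= timedelta(minutes=1):
            continue
        cleaned.append(row)
    return cleaned


def daily_summary(rows):
    """Aggregate per pickup day: total trips, average duration, trips by borough."""
    durations = defaultdict(list)
    by_borough = defaultdict(lambda: defaultdict(int))
    for row in rows:
        day = row["pickup"].date()
        minutes = (row["dropoff"] - row["pickup"]).total_seconds() / 60
        durations[day].append(minutes)
        by_borough[day][row["borough"]] += 1
    return {
        day: {
            "total_trips": len(mins),
            "avg_duration_min": sum(mins) / len(mins),
            "trips_by_borough": dict(by_borough[day]),
        }
        for day, mins in durations.items()
    }


# Two-month window (January and February 2023), chosen only for this example.
cleaned = clean_trips(RAW_TRIPS, datetime(2023, 1, 1), datetime(2023, 3, 1))
summary = daily_summary(cleaned)
for day in sorted(summary):
    print(day, summary[day])
```

The two functions correspond to the two models requested in the prompt: a cleaned table and a daily summary built on top of it.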

What the Agent Does

When you give this prompt, the agent will:

1. Load the data-pipeline skill from .claude/skills/data-pipeline/SKILL.md
2. Ask clarifying questions if needed. For instance:
   - “Which namespace contains the taxi_fhvhv table?” (see Namespaces)
   - “Should the pipeline materialize (persist) all output tables or only the final outputs?”
   - “For the daily summary, which borough/location identifier should we use?”
3. Create a folder for a Bauplan project, generate a bauplan_project.yml, and write the pipeline code in a file named models.py
4. Set up a branch workflow:
   - Create a development branch for testing
   - Configure the pipeline to run on that branch
   - Never execute directly on main
5. Run validation:
   - Use bauplan run --dry-run to validate the pipeline
   - Check for syntax errors and missing dependencies
   - Optionally run the pipeline and show sample results
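
The branch and validation steps above can be sketched as a small Python helper that assembles the CLI invocations. This is an illustration only: build_run_command is a hypothetical helper, not part of Bauplan. It uses just the two flags that appear on this page (--ref and --dry-run), and passing them together in one command is an assumption.

```python
def build_run_command(ref, dry_run=False):
    """Build the argument list for `bauplan run`, refusing to target main.

    Hypothetical helper: it only assembles arguments and does not call Bauplan.
    """
    if ref == "main":
        raise ValueError("refusing to run directly on main; use a development branch")
    command = ["bauplan", "run", "--ref", ref]
    if dry_run:
        # Assumption: --dry-run can be combined with --ref.
        command.append("--dry-run")
    return command


branch = "alice.pipeline-dev"  # example branch name used on this page
validate_cmd = build_run_command(branch, dry_run=True)
run_cmd = build_run_command(branch)

print(" ".join(validate_cmd))
print(" ".join(run_cmd))
```

The resulting lists could be passed to subprocess.run to execute the commands. The guard against main mirrors the rule that the agent never executes directly on main.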

Expected Output

After the agent generates your pipeline, you'll see a structure like:

your-repository/
├── your-bauplan-project/
│   ├── models.py                  # transformation code
│   └── bauplan_project.yml        # YAML file that defines the project parameters
└── .claude/                       # Skills and references

The agent will report something like:

✅ Created pipeline with 2 models
✅ Validated pipeline structure with --dry-run
✅ Pipeline ready to run on branch: alice.pipeline-dev

Run: bauplan run --ref alice.pipeline-dev
