
Data Branches

In the Quick Start you created a branch and ran a pipeline on it. This guide goes deeper into branches: importing external data into a branch and merging the results back into main.

If you don't already have a branch, create one now:

bauplan checkout -b <YOUR_USERNAME>.hello_bauplan

You can see your current branch marked with a star by running:

bauplan branch ls   # list your branches (active branch is marked with a star)
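If you create branches often, you can wrap these two steps in a small shell helper. This is only a sketch and is not part of the bauplan CLI. The function names `branch_name` and `start_branch` are hypothetical, and the sketch assumes a POSIX-style shell (bash, zsh, dash) with `bauplan` on your `PATH`. Because it builds the `<YOUR_USERNAME>.<suffix>` name for you, the username prefix cannot be forgotten.

```shell
# Hypothetical helpers, not part of the bauplan CLI.
# Build a branch name of the form <username>.<suffix>.
branch_name() {
  local user="${1:?username required}" suffix="${2:?branch suffix required}"
  printf '%s.%s\n' "$user" "$suffix"
}

# Create the branch and make it active, then list branches (active one is starred).
start_branch() {
  local branch
  branch="$(branch_name "$1" "$2")" || return
  bauplan checkout -b "$branch" || return
  bauplan branch ls
}
```

For example, `start_branch <YOUR_USERNAME> hello_bauplan` creates and checks out the branch, then lists your branches.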

Import data in a branch

To import data into a branch, you need a public S3 bucket with ListObject permission enabled (here is an example of JSON S3 permissions).

To get started, you can use our public bucket, s3://alpha-hello-bauplan, which contains an open green taxi dataset.
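Before importing from your own bucket, you may want to confirm that it can be listed without credentials. The following is a minimal check that makes an anonymous listing with `--no-sign-request`. It assumes the AWS CLI is installed (bauplan itself does not require it), and the helper name `check_public_listing` is hypothetical.

```shell
# Hypothetical helper; assumes the AWS CLI (`aws`) is installed.
# --no-sign-request sends the request anonymously, without your credentials,
# so a successful listing means the prefix can be listed publicly.
check_public_listing() {
  local prefix="${1:?S3 prefix required, e.g. s3://bucket/path/}"
  aws s3 ls "$prefix" --no-sign-request
}
```

Running `check_public_listing s3://alpha-hello-bauplan/green-taxi/` should print the Parquet files. An access-denied error suggests that the bucket does not allow public listing.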

Make sure you're in your target branch:

bauplan branch checkout <YOUR_USERNAME>.<YOUR_BRANCH_NAME>

Then create the table, which infers its schema from the Parquet files matching the search URI, and import the data into it:

bauplan table create <YOUR_USERNAME>_green_taxi_table --search-uri 's3://alpha-hello-bauplan/green-taxi/*.parquet'
bauplan table import <YOUR_USERNAME>_green_taxi_table --search-uri 's3://alpha-hello-bauplan/green-taxi/*.parquet'

To verify that the table was created:

bauplan table get <YOUR_USERNAME>_green_taxi_table
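The checkout, create, import and verify steps can be bundled into one function, so that each step runs only if the previous one succeeded. This sketch uses only the commands shown on this page, and the name `import_table` is hypothetical. The search URI is always quoted because an unquoted `*` could be expanded by your shell before bauplan sees it.

```shell
# Hypothetical helper built only from the commands on this page.
# Usage: import_table <branch> <table> <search-uri>
import_table() {
  local branch="${1:?branch required}" table="${2:?table required}" uri="${3:?search URI required}"
  bauplan branch checkout "$branch" || return
  # Create the table from the files matching the search URI...
  bauplan table create "$table" --search-uri "$uri" || return
  # ...then load their data into it.
  bauplan table import "$table" --search-uri "$uri" || return
  # Show the new table so you can check that it was created.
  bauplan table get "$table"
}
```

For example: `import_table <YOUR_USERNAME>.<YOUR_BRANCH_NAME> <YOUR_USERNAME>_green_taxi_table 's3://alpha-hello-bauplan/green-taxi/*.parquet'`.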

For detailed information about importing data, schema conflict resolution, and using the Python SDK for imports, see the importing data page.

Merge a branch

To merge your branch (for example, <YOUR_USERNAME>.hello_bauplan) into the main branch:

  1. Review the differences between branches:

    bauplan branch diff main

    This compares your active branch with main and shows which tables exist in one branch but not the other, so run it while your branch is still active.

  2. Switch to main and merge:

    bauplan branch checkout main
    bauplan branch merge <YOUR_USERNAME>.<YOUR_BRANCH_NAME>

  3. Confirm that the table is now on main and check its schema:

    bauplan table ls
    bauplan table get <YOUR_USERNAME>_green_taxi_table
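The merge steps can also be scripted. In this sketch (the function name `merge_into_main` is hypothetical), the feature branch is checked out first so the diff compares it with main. The script then waits for an explicit `y` before merging, so the review step cannot be skipped silently.

```shell
# Hypothetical helper; pauses for confirmation before touching main.
# Usage: merge_into_main <branch> <table>
merge_into_main() {
  local branch="${1:?branch required}" table="${2:?table required}" answer=""
  # Make the feature branch active so the diff compares it with main.
  bauplan branch checkout "$branch" || return
  bauplan branch diff main || return
  # Merge only after an explicit y/Y; anything else cancels.
  read -r -p "Merge $branch into main? [y/N] " answer
  case "$answer" in
    y|Y) ;;
    *)
      echo "merge cancelled" >&2
      return 1
      ;;
  esac
  bauplan branch checkout main || return
  bauplan branch merge "$branch" || return
  # Confirm the table is now visible on main and check its schema.
  bauplan table ls || return
  bauplan table get "$table"
}
```

The confirmation prompt is deliberate. Merging changes main, so it is worth reviewing the diff before the merge runs.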

You can now query the table on main. For example, to count the trips (rows with a non-null lpep_pickup_datetime):

bauplan query "SELECT COUNT(lpep_pickup_datetime) as number_of_trips FROM <YOUR_USERNAME>_green_taxi_table"
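If you run this query against several tables, a small wrapper helps you avoid typos in the table name. This sketch (the name `count_trips` is hypothetical) rejects anything that is not a plain identifier before splicing the name into the SQL. It deliberately rejects dots, so loosen the pattern if your table names contain them. It also adds `COUNT(*)` next to the original count. `COUNT(lpep_pickup_datetime)` skips rows where that column is null, so the two numbers differ only if some pickup times are missing.

```shell
# Hypothetical helper. Usage: count_trips <table>
count_trips() {
  local table="${1:?table required}"
  # Accept only letters, digits and underscores before splicing the
  # name into SQL; this catches typos and stray characters.
  case "$table" in
    *[!A-Za-z0-9_]*)
      echo "invalid table name: $table" >&2
      return 2
      ;;
  esac
  # COUNT(*) counts every row; COUNT(column) skips rows where column is null.
  bauplan query "SELECT COUNT(*) AS number_of_rows, COUNT(lpep_pickup_datetime) AS number_of_trips FROM ${table}"
}
```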

Congratulations, you just merged a data branch into the main data catalog!

Tip
  • Data branches are user-specific; always prefix branch names with your username.
  • For a complete command reference, please consult our reference documentation.
  • The bauplan data catalog supports additional operations like namespace management, removing tables, and deleting branches. See CLI Reference for more details.