Data branches

Data branches are isolated development environments that let you safely import data, delete tables, create namespaces, and run and test pipelines without affecting production. Inspired by Git, branches in Bauplan bring modern software development workflows to your data lake.

Traditional data systems make safe development hard:

Risk of corrupting production data
Lack of isolation between developers
Difficult rollbacks
Inefficient iteration and testing

Data branches solve these problems by enabling version-controlled, zero-copy, namespace-aware data sandboxes.

Why Use Data Branches?

Branches introduce Git-like workflows to data management, enabling robust development practices throughout the data application lifecycle:

Safe parallel development without conflicts
Zero-copy sandboxes for experiments and CI
Simplified rollbacks and debugging using immutable Refs
Auditability and traceability by design

Key Features

Zero-Copy Branching: Creating a branch is instantaneous and cost-efficient because no data is duplicated; only metadata changes.
Username-Based Namespaces: Each user gets a private branch space (username.branch_name), with write access to their own branches and read access to other users' branches.
Main Branch Protection: main is protected by default. Changes flow in via merges, keeping production safe.
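The namespace and protection rules above can be pictured with a small bash function. This is a toy model under stated assumptions, not Bauplan's enforcement logic: the `can_write` function and the second username `anna` are hypothetical, and the check only inspects the `username.` prefix of the branch name.

```bash
#!/usr/bin/env bash
# Toy model of the branch write rules (hypothetical; NOT Bauplan's code):
#   - main is protected: nobody writes to it directly
#   - a user may write only to branches named "<username>.<branch_name>"
#   - every other branch is read-only for that user
set -euo pipefail

# can_write USER BRANCH -> prints "yes" or "no"
can_write() {
  local user="$1" branch="$2"
  if [[ "$branch" == "main" ]]; then
    echo "no"    # main only changes through merges
  elif [[ "$branch" == "$user".* ]]; then
    echo "yes"   # branch lives in the user's own namespace
  else
    echo "no"    # another user's namespace: read access only
  fi
}

can_write ciro ciro.feature_xyz   # yes
can_write ciro anna.experiment    # no
can_write ciro main               # no
```

Note that `main` is checked first, so the protection rule applies regardless of who is asking.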

Using Data Branches and Refs

Branch	Ref
Mutable	Immutable
Points to the latest commit	Points to a specific commit
Used for development workflow	Used for reproducibility / audit trail
You move it (via new runs, imports, etc.)	It never changes
Example: `ciro.feature_xyz`	Example: `ciro.feature@xyz-run-202505...`

Each time you change the state of the lake, for instance by running a pipeline, Bauplan creates a new commit. If the run succeeds, the branch is updated to point to that commit, which stays addressable through its own immutable Ref. If it fails, the branch remains unchanged. This is what makes pipelines transactional by default.
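The branch-versus-Ref behavior can be sketched as a toy model in bash, where a branch is a movable pointer to a commit id and each published commit id never changes. Everything here is an illustrative assumption: the commit ids `c0`, `c1`, the `create_branch` and `run_pipeline` helpers, and the `OK`/`FAIL` outcome flag are hypothetical, and Bauplan's real metadata store is not modeled.

```bash
#!/usr/bin/env bash
# Toy model (hypothetical; NOT Bauplan's internals): a branch is a movable
# pointer to a commit id, and each commit id is an immutable ref.
set -euo pipefail

declare -A BRANCH=( [main]="c0" )  # branch name -> commit id it points to
COMMITS=("c0")                     # append-only log of published commits

# create_branch NEW FROM: zero-copy, only a pointer is copied
create_branch() {
  BRANCH["$1"]="${BRANCH[$2]}"
}

# run_pipeline BRANCH OK|FAIL: the outcome flag stands in for a real run
run_pipeline() {
  local branch="$1" outcome="$2"
  local candidate="c${#COMMITS[@]}"   # id the new commit would receive
  if [[ "$outcome" == "OK" ]]; then
    COMMITS+=("$candidate")           # commit is published as a new ref
    BRANCH["$branch"]="$candidate"    # branch moves to the new commit
  fi                                  # on FAIL the branch is left untouched
}

create_branch ciro.feature_xyz main
run_pipeline ciro.feature_xyz OK     # feature branch -> c1
run_pipeline ciro.feature_xyz FAIL   # feature branch stays on c1
echo "main=${BRANCH[main]} feature=${BRANCH[ciro.feature_xyz]}"
```

Running it prints `main=c0 feature=c1`: creating the branch copied a single pointer, the successful run advanced only the feature branch, and the failed run left it where it was.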

Hint

Think of a branch as your active sandbox and a Ref as a pointer to a timestamped, frozen snapshot created by any write operation on the lake.

Basic Data Branch Operations

# Create and switch to a feature branch
bauplan branch create ciro.feature_xyz
bauplan checkout ciro.feature_xyz

# List branches and show the active one
bauplan branch

# Compare active branch with main
bauplan branch diff main

# Check out main, then merge the feature branch into it
bauplan branch checkout main
bauplan branch merge ciro.feature_xyz

# Delete a branch
bauplan branch rm ciro.feature_xyz

Tags

A tag is an immutable, user-defined label that points to a specific commit hash.

Used to “freeze” a specific state of the lake (e.g., v1.0-passed-qa).
Always points to a single commit and never moves on its own.
Useful for bookmarking releases, debugging points, or restoring known-good states.
Can be reassigned manually to a different commit hash; this explicit action is the only way a tag changes.
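The difference between a tag and a branch can be illustrated with another toy bash model: both are name-to-commit pointers, but only the branch advances when a new commit lands. The helpers `tag_create`, `commit_on` and `tag_reassign` and the commit ids are hypothetical stand-ins, not Bauplan commands.

```bash
#!/usr/bin/env bash
# Toy model (hypothetical; NOT Bauplan's implementation): tags and branches
# are both name -> commit pointers, but only branches advance on new commits.
set -euo pipefail

declare -A BRANCH=( [main]="c1" )  # branch name -> current head commit
declare -A TAG=()                  # tag name -> frozen commit

tag_create()   { TAG["$1"]="${BRANCH[$2]}"; }  # tag_create NAME BRANCH: freeze current head
commit_on()    { BRANCH["$1"]="$2"; }          # commit_on BRANCH NEW_ID: branch advances
tag_reassign() { TAG["$1"]="$2"; }             # explicit, manual move to another commit

tag_create v1.0-passed-qa main   # tag -> c1
commit_on main c2                # main -> c2, tag still on c1
echo "main=${BRANCH[main]} tag=${TAG[v1.0-passed-qa]}"

tag_reassign v1.0-passed-qa c2   # only an explicit action moves the tag
echo "tag=${TAG[v1.0-passed-qa]}"
```

The first `echo` prints `main=c2 tag=c1`, showing that new commits on `main` do not drag the tag along; the second prints `tag=c2` after the manual reassignment.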

Tags also support relative reference syntax:

v1.0^1 → commit before the tagged one
v1.0*2025-05-19T06:11:33Z → last commit before the timestamp
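To make the two relative forms concrete, here is a toy resolver in bash over a made-up, strictly linear history. The commit ids, timestamps and the `index_of`/`resolve` helpers are hypothetical, and treating "before the timestamp" as strictly earlier is an assumption; this is not how Bauplan resolves refs internally.

```bash
#!/usr/bin/env bash
# Toy resolver for the relative syntax (hypothetical data; NOT Bauplan's
# resolver). Assumes a linear history, oldest commit first.
set -euo pipefail

HISTORY=(
  "c1 2025-05-18T09:00:00Z"
  "c2 2025-05-19T06:00:00Z"
  "c3 2025-05-19T07:30:00Z"
  "c4 2025-05-20T10:00:00Z"
)
declare -A TAG=( [v1.0]="c4" )

# index_of COMMIT_ID -> position of that commit in HISTORY
index_of() {
  local i
  for i in "${!HISTORY[@]}"; do
    [[ "${HISTORY[$i]%% *}" == "$1" ]] && { echo "$i"; return 0; }
  done
  return 1
}

# resolve REF -> commit id, where REF is TAG, TAG^N or TAG*TIMESTAMP
resolve() {
  local ref="$1" base steps ts start want t i
  if [[ "$ref" == *"^"* ]]; then
    base="${ref%%^*}"; steps="${ref#*^}"
    i=$(( $(index_of "${TAG[$base]}") - steps ))  # walk N commits back
    (( i >= 0 )) || return 1
    echo "${HISTORY[$i]%% *}"
  elif [[ "$ref" == *"*"* ]]; then
    base="${ref%%[*]*}"; ts="${ref#*[*]}"
    want="${ts//[^0-9]/}"                   # 2025-05-19T06:11:33Z -> 20250519061133
    start=$(index_of "${TAG[$base]}")
    for (( i = start; i >= 0; i-- )); do    # newest to oldest, from the tag back
      t="${HISTORY[$i]#* }"
      if (( ${t//[^0-9]/} < want )); then   # first commit strictly earlier
        echo "${HISTORY[$i]%% *}"
        return 0
      fi
    done
    return 1
  else
    echo "${TAG[$ref]}"                     # plain tag name
  fi
}

resolve 'v1.0'                        # c4
resolve 'v1.0^1'                      # c3
resolve 'v1.0*2025-05-19T06:11:33Z'   # c2
```

Timestamps are compared as plain digit strings converted to integers, which works here only because every timestamp uses the same UTC format; the references are quoted so the shell does not glob-expand the `*`.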