Branches

Branches are isolated environments in the data catalog where users can safely develop and test data transformations without affecting production data.

Traditional data systems face several challenges in development and testing:

  • Risk of corrupting production data

  • Difficulty in testing changes safely

  • Lack of isolation between developers

  • Limited parallel development

  • Complex rollback procedures

  • Long development-to-production cycles

bauplan branches solve these challenges by providing isolated, version-controlled environments for data development.

Branches introduce Git-like workflows to data management, enabling robust development practices throughout the data application lifecycle. This lets you manage data with the same rigor as code, improving control, safety, and collaboration, most visibly through the features listed below.

Key Features

  • Zero-Copy Branching: Creating a branch doesn’t duplicate the entire dataset; instead, it’s a zero-copy operation, making branching instantaneous and cost-efficient.

  • Username-Based Namespaces: Every user has their own branch namespace, with write access to their own branches and read access to branches owned by others, which promotes both collaboration and control.

  • Main Branch Protection: To ensure stability, main is write-protected by default. Updates happen via merges, keeping production safe from accidental changes.
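
The namespace and main-protection rules above can be mirrored in a small client-side pre-check, for example before a script attempts to write to a branch. This is a minimal sketch, assuming branch names follow the <username>.<branch_name> form used on this page. The function name can_write_branch is hypothetical; the actual enforcement is done by bauplan, not by this helper.

```shell
# Hypothetical helper: succeeds (exit status 0) if the user may write to the
# branch under the rules described above, and fails (exit status 1) otherwise.
#   $1 = username, $2 = branch name
can_write_branch() {
  [ -n "$1" ] || return 1      # a username is required
  case "$2" in
    main)    return 1 ;;       # main is write-protected by default
    "$1".?*) return 0 ;;       # branch is in the user's own namespace
    *)       return 1 ;;       # any other branch is read-only for this user
  esac
}
```

For example, can_write_branch alice alice.new_feature succeeds, while the same call with main or bob.new_feature fails.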

What’s New

  1. Data Version Control:
    • Git-like workflows for data

    • Zero-copy branching of large datasets

    • Branch-level access control

    • Independent schema evolution

  2. Developer Experience:
    • Safe environment for experimentation and debugging

    • Separate environment for reproducibility

    • Interactive development with real-time feedback

  3. Collaboration Model:
    • User namespacing

    • Parallel workflows and branch comparison

    • Clear ownership and audit trail

  4. Development Workflow:
    • Feature branch development

    • Safe testing environment

    • Easy rollback

    • Simple CI/CD integration and promotion to production

Examples

The following commands cover the most common branch operations:

Basic Branch Operations

# Create a feature branch
bauplan branch create <username>.<branch_name>

# Check out branch
bauplan checkout <username>.<branch_name>

# Remove a branch
bauplan branch rm <username>.<branch_name>

# Compare active branch with main
bauplan branch diff main

# List all branches and show the active branch
bauplan branch

# List all tables in a branch
bauplan branch get <username>.<branch_name>

Development Workflow

Here is an example of a simple workflow for developing a new feature using branches:

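# Create a development branch
bauplan branch create <username>.new_feature

# Switch to it so that later commands act on this branch
bauplan checkout <username>.new_feature

# Run transformations in isolation
bauplan run --branch <username>.new_feature

# Review changes against main
bauplan branch diff main

# Switch back to main, then merge the feature branch into it
bauplan checkout main
bauplan branch merge <username>.new_feature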
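
The same steps can be chained in a script so that a failure at any step stops the sequence before the merge. This is a minimal sketch that uses only the commands shown on this page and assumes that bauplan branch merge merges into the currently checked-out branch. The names run_step, feature_workflow, and DRY_RUN are hypothetical and introduced only for illustration.

```shell
# Hypothetical wrapper: prints the command when DRY_RUN=1, otherwise runs it.
run_step() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical function: develop on a feature branch, then merge it into main.
#   $1 = full branch name, for example alice.new_feature
feature_workflow() {
  [ -n "$1" ] || return 1
  run_step bauplan branch create "$1" || return 1
  run_step bauplan checkout "$1" || return 1
  run_step bauplan run --branch "$1" || return 1
  run_step bauplan branch diff main || return 1
  # Assumption: the merge targets the active branch, so switch to main first.
  run_step bauplan checkout main || return 1
  run_step bauplan branch merge "$1"
}
```

Calling DRY_RUN=1 feature_workflow alice.new_feature prints the six commands, each prefixed with "+ ", without contacting bauplan. In practice a person would normally review the diff before merging; the sketch only shows the ordering of the steps.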

Best Practices

  1. Branch Naming:
    • Use descriptive names

    • Include username prefix

    • Follow team conventions

  2. Development Flow:
    • Create feature branches

    • Test thoroughly before merge

    • Maintain regular synchronization with main

    • Clean up unused branches

  3. Collaboration:
    • Communicate branch purposes

    • Coordinate merge timing

    • Review changes together

    • Document significant changes

  4. Maintenance:
    • Regular branch cleanup

    • Merge completed work
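
The branch-naming advice above (descriptive names with a username prefix) can be automated with a small helper that builds a branch name from a username and a free-text description. This is a minimal sketch using only standard shell tools. The name make_branch_name is hypothetical, and the lowercase-with-underscores style is an assumption standing in for whatever convention your team agrees on.

```shell
# Hypothetical helper: prints "<username>.<slug>", where the slug is the
# description lowercased, with each run of non-alphanumeric characters
# replaced by a single underscore and leading/trailing underscores removed.
#   $1 = username, $2 = free-text description
make_branch_name() {
  slug=$(printf '%s' "$2" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9][^a-z0-9]*/_/g' -e 's/^_//' -e 's/_$//')
  # Fail if the username is missing or nothing usable is left of the description.
  [ -n "$1" ] && [ -n "$slug" ] || return 1
  printf '%s.%s\n' "$1" "$slug"
}
```

For example, make_branch_name alice "Fix NULL handling in orders" prints alice.fix_null_handling_in_orders, which can then be passed to bauplan branch create.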