Branches¶
Branches are isolated environments in the data catalog where users can safely develop and test data transformations without affecting production data.
Traditional data systems face several challenges in development and testing:
- Risk of corrupting production data
- Difficulty in testing changes safely
- Lack of isolation between developers
- Limited parallel development
- Complex rollback procedures
- Long development-to-production cycles
bauplan branches solve these challenges by providing isolated, version-controlled environments for data development.
Branches bring Git-like workflows to data management, so you can manage data with the same rigor as code throughout the data application lifecycle. The features below improve control, safety, and collaboration.
Key Features¶
Zero-Copy Branching: Creating a branch doesn’t duplicate the entire dataset; instead, it’s a zero-copy operation, making branching instantaneous and cost-efficient.
Username-Based Namespaces: Every user has their own branch namespace, with write access to their own branches and read access to everyone else's. This supports collaboration while keeping ownership clear.
Main Branch Protection: To ensure stability, main is write-protected by default. Updates happen via merges, keeping production safe from accidental changes.
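The three features above can be pictured with a small conceptual model. The sketch below is an illustration only, not bauplan's implementation or API: `ToyCatalog`, `BranchError`, and every method name are hypothetical. It assumes a branch is a name pointing at an immutable snapshot, that a snapshot only lists references to data files, and that a merge simply overlays the branch's tables onto main.

```python
class BranchError(Exception):
    """Raised when an operation breaks one of the toy model's rules."""


class ToyCatalog:
    """Conceptual model: a branch is a name pointing at an immutable snapshot."""

    def __init__(self):
        # snapshot id -> {table name: data file reference}; existing entries are never mutated
        self.snapshots = {0: {}}
        # branch name -> snapshot id
        self.branches = {"main": 0}

    def _commit(self, branch, tables):
        # Record a new snapshot and move the branch pointer to it.
        snapshot_id = len(self.snapshots)
        self.snapshots[snapshot_id] = tables
        self.branches[branch] = snapshot_id

    def create_branch(self, user, name, source="main"):
        full_name = f"{user}.{name}"
        if full_name in self.branches:
            raise BranchError(f"{full_name} already exists")
        # Zero-copy: only a pointer is added; no snapshot or data file is duplicated.
        self.branches[full_name] = self.branches[source]
        return full_name

    def read_tables(self, branch):
        # Any user may read any branch; a copy is returned so callers cannot mutate a snapshot.
        return dict(self.snapshots[self.branches[branch]])

    def write_table(self, user, branch, table, file_ref):
        if branch == "main":
            raise BranchError("main is write-protected; merge into it instead")
        if not branch.startswith(f"{user}."):
            raise BranchError(f"{user} cannot write to {branch}")
        # Only the table-to-file mapping (metadata) is copied, never the data itself.
        tables = self.read_tables(branch)
        tables[table] = file_ref
        self._commit(branch, tables)

    def merge_into_main(self, branch):
        # Simplified merge: overlay the branch's tables onto main's.
        merged = self.read_tables("main")
        merged.update(self.read_tables(branch))
        self._commit("main", merged)
```

In this model, creating a branch costs one dictionary entry regardless of dataset size, which is the intuition behind branching being instantaneous. Writes on a branch leave main untouched until `merge_into_main` is called.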
What’s New¶
- Data Version Control:
  - Git-like workflows for data
  - Zero-copy branching of large datasets
  - Branch-level access control
  - Independent schema evolution
- Developer Experience:
  - Safe experimentation and debug environment
  - Separate environment for reproducibility
  - Interactive development with real-time feedback
- Collaboration Model:
  - User namespacing
  - Parallel workflows and branch comparison
  - Clear ownership and audit trail
- Development Workflow:
  - Feature branch development
  - Safe testing environment
  - Easy rollback
  - Simple CI/CD integration and promotion to production
Examples¶
The following examples cover common branch operations:
Basic Branch Operations¶
# Create a feature branch
bauplan branch create <username>.<branch_name>
# Check out branch
bauplan checkout <username>.<branch_name>
# Remove a branch
bauplan branch rm <username>.<branch_name>
# Compare active branch with main
bauplan branch diff main
# List all branches and show the active branch
bauplan branch
# List all tables in a branch
bauplan branch get <username>.<branch_name>
Development Workflow¶
Here is an example of a simple workflow for developing a new feature using branches:
# Create development branch
bauplan branch create <username>.new_feature
# Run transformations in isolation
bauplan run --branch <username>.new_feature
# Review changes
bauplan branch diff main
# Merge to main
bauplan branch merge <username>.new_feature
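When this workflow is repeated often, it can help to script it. The sketch below assembles exactly the four commands listed above and nothing more, and assumes the `bauplan` executable is on your `PATH`. The helper names `workflow_commands` and `run_workflow` are hypothetical and not part of bauplan.

```python
import shlex
import subprocess


def workflow_commands(username, feature):
    """Return the four workflow commands above as argument lists."""
    branch = f"{username}.{feature}"
    return [
        ["bauplan", "branch", "create", branch],
        ["bauplan", "run", "--branch", branch],
        ["bauplan", "branch", "diff", "main"],
        ["bauplan", "branch", "merge", branch],
    ]


def run_workflow(username, feature, dry_run=True):
    """Render each command as a string and, unless dry_run is set, execute it."""
    rendered = []
    for cmd in workflow_commands(username, feature):
        rendered.append(shlex.join(cmd))
        if not dry_run:
            # check=True raises on the first failing step, so a failed
            # run or diff never proceeds to the merge.
            subprocess.run(cmd, check=True)
    return rendered
```

Calling `run_workflow("alice", "new_feature")` with the default `dry_run=True` only returns the command strings, which is a cheap way to review what would be executed before touching the catalog.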
Best Practices¶
- Branch Naming:
  - Use descriptive names
  - Include username prefix
  - Follow team conventions
- Development Flow:
  - Create feature branches
  - Test thoroughly before merge
  - Maintain regular synchronization with main
  - Clean up unused branches
- Collaboration:
  - Communicate branch purposes
  - Coordinate merge timing
  - Review changes together
  - Document significant changes
- Maintenance:
  - Regular branch cleanup
  - Merge completed work
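The naming practices above can be checked automatically, for example in a pre-merge script. The sketch below is one possible team convention, not a rule enforced by bauplan: the allowed characters, the minimum name length, and the function name `check_branch_name` are all assumptions made for illustration. Only the `<username>.<branch_name>` shape comes from the examples on this page.

```python
import re

# Hypothetical convention: lowercase letters, digits, "_" and "-" in each part.
_PART = r"[a-z0-9][a-z0-9_-]*"
_BRANCH_RE = re.compile(rf"(?P<user>{_PART})\.(?P<name>{_PART})")

# Hypothetical threshold for what counts as a descriptive name.
MIN_NAME_LENGTH = 4


def check_branch_name(branch, username):
    """Return a list of problems with a branch name; an empty list means it passes."""
    match = _BRANCH_RE.fullmatch(branch)
    if match is None:
        return ["expected the form <username>.<branch_name>"]
    problems = []
    if match.group("user") != username:
        problems.append(f"prefix should be your own username ({username})")
    if len(match.group("name")) < MIN_NAME_LENGTH:
        problems.append("branch name is too short to be descriptive")
    return problems
```

Returning a list of problems rather than raising on the first one lets a reviewer see every issue at once.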