# Bauplan Documentation

Bauplan is a serverless data lakehouse platform where data changes follow a Git-like workflow. You develop and test on isolated data branches, then publish by merging into main. The platform handles compute, storage, and orchestration automatically.

## Getting Started

- [Welcome to Bauplan](https://docs.bauplanlabs.com/tutorial/index.md): Bauplan brings Git-style workflows to data so AI agents can safely make changes to your data.
- [Installation](https://docs.bauplanlabs.com/tutorial/installation.md): Ensure you are running a supported Python version (>=3.10).

### Quick Start

Create an isolated data branch to work in - like a Git branch, but for your lakehouse.

- [Quick start](https://docs.bauplanlabs.com/tutorial/quick-start.md)
- [Data Branches](https://docs.bauplanlabs.com/tutorial/data-branches.md): In the Quick Start you created a branch and ran a pipeline on it. This guide goes deeper into branches: importing external data and merging results back to main.
- [Import](https://docs.bauplanlabs.com/tutorial/import.md): This guide explains how to import data into Bauplan's data catalog as Iceberg tables.

## Platform Overview

Bauplan is a Python-first lakehouse runtime for building and operating data pipelines on object storage (for example, AWS S3), with Git-style branching and versioning for data.

- [Overview](https://docs.bauplanlabs.com/overview/index.md)
- [Execution model](https://docs.bauplanlabs.com/overview/execution-model.md): When you run a function or a pipeline in Bauplan, here's what happens behind the scenes.
- [Architecture](https://docs.bauplanlabs.com/overview/architecture.md): Bauplan's architecture is organized around three main layers.
- [Deployment](https://docs.bauplanlabs.com/overview/deployment.md): Bauplan offers two secure deployment options tailored to your needs.
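The branch-then-merge workflow described above can be sketched with the Python SDK. This is a minimal illustration, not a verified recipe: the method names and parameters (`create_branch`, `run`, `merge_branch`) and the example branch name are taken from our reading of the SDK and should be checked against the [CLI and SDK reference](https://docs.bauplanlabs.com/reference/cli.md).

```python
# Minimal sketch of the branch -> run -> merge workflow with the Bauplan
# Python SDK. Assumes the `bauplan` package is installed and an API key
# is configured; the branch name is illustrative.
import bauplan

client = bauplan.Client()

# 1. Create an isolated data branch off main (like `git checkout -b`).
client.create_branch("alice.feature_branch", from_ref="main")

# 2. Run the pipeline in the current project directory against the branch;
#    results stay isolated on the branch until merged.
client.run(project_dir=".", ref="alice.feature_branch")

# 3. Publish the results by merging the branch back into main.
client.merge_branch(source_ref="alice.feature_branch", into_branch="main")
```

The same three steps are available as CLI commands; see the Quick Start for the interactive version.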
## Core Concepts

- [Projects](https://docs.bauplanlabs.com/concepts/projects.md): A Bauplan project encapsulates your data workflows, including models, pipelines, configurations, and dependencies.
- [Models](https://docs.bauplanlabs.com/concepts/models.md): In Bauplan, models are the core unit of data manipulation. They are declarative functions written in Python or SQL that transform one or more input tables into a single output table.
- [Pipelines](https://docs.bauplanlabs.com/concepts/pipelines.md): In Bauplan, pipelines are implicitly defined by chaining models together through their declared inputs and outputs.
- [Tables](https://docs.bauplanlabs.com/concepts/tables.md): Bauplan uses Apache Iceberg tables to bring transactional, SQL-ready structure to your object storage.
- [Namespaces](https://docs.bauplanlabs.com/concepts/namespaces.md): A namespace in Bauplan is a logical container that groups related tables, similar to schemas in a relational database or folders in a filesystem.
- [Expectations](https://docs.bauplanlabs.com/concepts/expectations.md): Expectations are statistical and quality checks that validate the structure and values of your data.

### Git for Data

Bauplan's Git for Data brings version control capabilities to your data lake with branches, transactional pipelines, and time travel features.

- [Git for Data](https://docs.bauplanlabs.com/concepts/git-for-data/index.md)
- [Transactional pipelines](https://docs.bauplanlabs.com/concepts/git-for-data/transactional-pipelines.md): In Bauplan, every pipeline run is treated like a database transaction.
- [Commits and refs](https://docs.bauplanlabs.com/concepts/git-for-data/commits-refs.md): In Bauplan, everything is versioned: code, data, and execution environments.
- [Tags](https://docs.bauplanlabs.com/concepts/git-for-data/tags.md): Tags in Bauplan are mutable, human-readable names that point to a specific commit.
- [Data branches](https://docs.bauplanlabs.com/concepts/git-for-data/data-branches.md): Data branches are isolated development environments that let you safely import data, delete tables, create namespaces, and run and test pipelines without affecting production.

## Agents

- [Overview](https://docs.bauplanlabs.com/agents/overview.md): There are three complementary ways to use AI agents with Bauplan. They are not mutually exclusive and can be combined depending on the workflow.
- [Skills setup](https://docs.bauplanlabs.com/agents/setup.md): Bauplan skills are distributed as a Claude Code plugin that installs with a single command and stays up to date automatically.

## Common Scenarios

- [Multi-stage pipelines](https://docs.bauplanlabs.com/common-scenarios/multi-stage-pipelines.md): When a pipeline grows past a single model, you need to decide how data flows between stages, what gets materialized, and how to keep each step efficient.
- [Create table schema conflicts](https://docs.bauplanlabs.com/common-scenarios/schema-conflicts.md): When you pass a dataset via the `--search-uri` option of the `table create` command, Bauplan scans your files to infer the table schema.
- [Branching Workflows](https://docs.bauplanlabs.com/common-scenarios/branching-workflows.md): Bauplan replaces traditional dev/staging/prod environment separation with branch-based isolation.
- [Detached Job Runs](https://docs.bauplanlabs.com/common-scenarios/detached-runs.md): In Bauplan you can run jobs in detached mode: the job is submitted to the system, the client returns immediately, and the job can be followed up on later.
- [Parameterized Runs](https://docs.bauplanlabs.com/common-scenarios/parameterized-runs.md): In production, it is common for the same DAG code to be run on different days, or to call an external service (for example, an LLM for data enrichment).
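The detached and parameterized scenarios above might look roughly like the following from the CLI. The flag names here are assumptions inferred from the scenario descriptions, not confirmed syntax; consult the [CLI Reference](https://docs.bauplanlabs.com/reference/cli.md) for the exact invocations.

```shell
# Sketch of detached and parameterized runs from the CLI.
# Flag names below are assumptions; verify against the CLI reference.

# Submit a run in detached mode: the client returns immediately with a
# job identifier you can follow up on later.
bauplan run --detach

# Re-run the same DAG with a different parameter value (for example, a
# run date), without changing the pipeline code.
bauplan run --param run_date=2024-06-01
```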
- [SDK or CLI?](https://docs.bauplanlabs.com/common-scenarios/sdk-or-cli.md): All Bauplan functionality is available both as CLI commands and through the Python SDK.

## Integrations

Bauplan integrates with the tools you already use - orchestrators, notebooks, warehouses, BI tools, and more.

- [Integrations](https://docs.bauplanlabs.com/integrations/index.md)

### Orchestrators

Bauplan integrates seamlessly with common workflow orchestrators such as Airflow, Prefect, and Dagster.

- [Orchestrators](https://docs.bauplanlabs.com/integrations/orchestrators/index.md)
- [Airflow 3](https://docs.bauplanlabs.com/integrations/orchestrators/airflow.md): Airflow is one of the most widely adopted orchestrators in data engineering. It provides a central way to define, schedule, and monitor workflows, making it the backbone for many production data platforms.
- [Temporal](https://docs.bauplanlabs.com/integrations/orchestrators/temporal.md): Bauplan exposes a Python SDK that lets you do everything with your lakehouse in code - from running pipelines to managing branches and tables.
- [Dagster](https://docs.bauplanlabs.com/integrations/orchestrators/dagster.md): Dagster is a modern orchestrator designed specifically for data applications. Instead of just scheduling tasks, it treats pipelines as first-class objects, with strong typing, assets, and metadata built in.
- [DBOS](https://docs.bauplanlabs.com/integrations/orchestrators/dbos.md): DBOS is a distributed operating system for workflows: it gives you durable, exactly-once execution of Python functions, with built-in scheduling, retries, and observability.
- [Prefect](https://docs.bauplanlabs.com/integrations/orchestrators/prefect.md): Prefect is a modern workflow orchestrator built entirely in Python. Instead of managing YAML or DSLs, you define flows and tasks as plain Python functions, and Prefect takes care of scheduling, retries, logging, and observability.
- [Orchestra](https://docs.bauplanlabs.com/integrations/orchestrators/orchestra.md): Orchestra is a managed orchestration platform that lets data teams build, schedule, and monitor pipelines through a simple web interface or declarative YAML.

### Interactive notebooks and data apps

Notebooks and data apps are widely used by data teams to explore datasets, test ideas, and share results with other stakeholders.

- [Notebooks and data apps](https://docs.bauplanlabs.com/integrations/notebooks-data-apps/index.md)
- [Jupyter Notebooks](https://docs.bauplanlabs.com/integrations/notebooks-data-apps/jupyter-notebooks.md): Jupyter notebooks are an interactive computing environment for Python. They are commonly used for exploratory analysis, iterative development, and sharing reproducible workflows that combine code, text, and visuals.
- [marimo](https://docs.bauplanlabs.com/integrations/notebooks-data-apps/marimo.md): marimo is a reactive Python notebook and app framework for building interactive tools from pure Python.
- [Streamlit](https://docs.bauplanlabs.com/integrations/notebooks-data-apps/streamlit.md): Streamlit is a Python framework for turning scripts into interactive web apps. Teams commonly use it for lightweight dashboards, prototypes, and internal tools that they can build and share quickly.

### Warehouses and Lakehouses

This section explains how to connect Bauplan-managed Iceberg tables in your object storage to other platforms your teams already use, such as Databricks Unity Catalog, AWS Glue, and Athena.

- [Warehouses and Lakehouses](https://docs.bauplanlabs.com/integrations/warehouses-lakehouses/index.md)
- [Snowflake (Inbound)](https://docs.bauplanlabs.com/integrations/warehouses-lakehouses/snowflake-inbound.md): Connect Snowflake to Bauplan by treating Bauplan as an Iceberg REST catalog and creating externally managed Iceberg tables in Snowflake that point to your Bauplan tables and S3 storage.
- [Snowflake (Outbound)](https://docs.bauplanlabs.com/integrations/warehouses-lakehouses/snowflake-outbound.md): Connect Bauplan to Snowflake to read from Snowflake tables in your Bauplan pipelines.
- [BigQuery (Inbound)](https://docs.bauplanlabs.com/integrations/warehouses-lakehouses/big-query-inbound.md): Connect BigQuery to Bauplan by creating external Iceberg tables in BigQuery that point to your Bauplan tables and S3 storage.
- [BigQuery (Outbound)](https://docs.bauplanlabs.com/integrations/warehouses-lakehouses/big-query-outbound.md): Connect Bauplan to Google BigQuery to read from BigQuery tables in your Bauplan pipelines.
- [GCS (Google Cloud Storage)](https://docs.bauplanlabs.com/integrations/warehouses-lakehouses/gcs.md): Connect your GCS bucket to Bauplan by creating an automated sync with the S3 bucket linked to your Bauplan lakehouse.

### BI tools and Postgres client

Bauplan includes a PostgreSQL-compatible proxy that enables read-only data access from major BI tools.

- [BI tools and Postgres client](https://docs.bauplanlabs.com/integrations/bi-tools-postgres/index.md)
- [Metabase](https://docs.bauplanlabs.com/integrations/bi-tools-postgres/metabase.md): Metabase is a BI tool available in both open source and enterprise versions; you can run it locally (via Docker), self-host it, or use one of the available Metabase Cloud versions.

### Data Integration and ELT Tools

Use your ELT tool to land data in your bucket, then use Bauplan to turn that landing zone into safe, queryable Iceberg tables.

- [Data Integration and ELT](https://docs.bauplanlabs.com/integrations/data-int-and-etl/index.md)
- [Fivetran](https://docs.bauplanlabs.com/integrations/data-int-and-etl/fivetran.md): Fivetran is a managed ELT platform that moves data from hundreds of sources into your data lake with automated schemas, scheduling, and monitoring.
- [Estuary via EMR](https://docs.bauplanlabs.com/integrations/data-int-and-etl/estuary.md): Stream data from any Estuary source into Bauplan Iceberg tables using the Apache Iceberg materialization connector.

- [Frequently Asked Questions](https://docs.bauplanlabs.com/faq.md)

## API Reference

- [CLI Reference](https://docs.bauplanlabs.com/reference/cli.md): Complete reference documentation for the Bauplan Command Line Interface (CLI).
- [bauplan](https://docs.bauplanlabs.com/reference/bauplan.md): `my_table = client.query('SELECT avg(Age) AS average_age FROM bauplan.titanic limit 1', ref='main')`
- [bauplan.exceptions](https://docs.bauplanlabs.com/reference/bauplan-exceptions.md)
- [bauplan.schema](https://docs.bauplanlabs.com/reference/bauplan-schema.md)
- [bauplan.standard_expectations](https://docs.bauplanlabs.com/reference/bauplan-standard-expectations.md): This module contains standard expectations that can be used to test data artifacts in a Bauplan pipeline.
- [bauplan.state](https://docs.bauplanlabs.com/reference/bauplan-state.md)

## Other

- [Datasets](https://docs.bauplanlabs.com/datasets.md)
- [dbt Core](https://docs.bauplanlabs.com/integrations/dbt/index.md): We're currently developing our dbt Core integration to bring the best of both worlds together.
- [S3 Permissions Example](https://docs.bauplanlabs.com/tutorial/s3-permissions.md)
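The one-line query shown in the `bauplan` reference entry above can be expanded into a small, self-contained sketch. This assumes the `bauplan` package is installed and an API key is configured; that the result comes back as an in-memory table (Arrow) is our reading of the SDK and should be confirmed in the [bauplan reference](https://docs.bauplanlabs.com/reference/bauplan.md).

```python
# Expanded sketch of the query snippet from the `bauplan` reference entry.
# Assumes the `bauplan` package is installed and an API key is configured;
# the `bauplan.titanic` table is the example dataset used in the docs.
import bauplan

client = bauplan.Client()

# Query a table on the main branch; `ref` can point at any branch, tag,
# or commit thanks to Git for Data.
my_table = client.query(
    "SELECT avg(Age) AS average_age FROM bauplan.titanic LIMIT 1",
    ref="main",
)
print(my_table)
```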