
Using Bauplan with AI Coding Assistants

Bauplan is designed to work naturally with AI coding assistants such as Claude Code, Cursor, and similar tools. If you already use an LLM to write, refactor, and reason about code inside your IDE, you can use the same workflow for data pipelines. Bauplan’s surface area is a small set of explicit CLI and SDK operations, which map cleanly to tool calls. There is no UI state, notebook state, or hidden side effect that an assistant has to “click through” or guess at.

In practice, an assistant can drive the same end-to-end workflow a data engineer would: edit pipeline code in your repo, create a data branch, run against real data, inspect results with SQL, validate outputs, then publish or roll back.

How this works

Bauplan separates authoring from execution. You author pipelines locally in a Git repository, as declarative Python and SQL. Execution happens remotely on Bauplan-managed compute against data in object storage. Runs produce versioned table updates on an isolated data branch.

When you run a pipeline, Bauplan snapshots the code, the environment, and the input data version, then executes the run on a branch. If validation passes, you publish by merging the branch into main atomically. If it fails, nothing reaches main, and the branch state remains available for inspection and iteration.
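The branch semantics described above can be sketched with a tiny in-memory model. This is not the Bauplan SDK; the `Catalog` class and its methods are hypothetical stand-ins that illustrate why writes on a branch never touch `main` until an atomic merge:

```python
# Hypothetical in-memory sketch of Git-style data branching: branching
# snapshots the input data version, writes are isolated to the branch,
# and publishing is a single atomic merge into main.
class Catalog:
    def __init__(self):
        self.refs = {"main": {}}  # branch name -> {table: data version}

    def create_branch(self, name, from_ref="main"):
        # Branching snapshots the source ref's table versions.
        self.refs[name] = dict(self.refs[from_ref])

    def write(self, branch, table, version):
        # A run's output is visible only on its branch.
        self.refs[branch][table] = version

    def merge(self, branch, into="main"):
        # Publish: replace the target ref atomically, all or nothing.
        self.refs[into] = dict(self.refs[branch])


catalog = Catalog()
catalog.create_branch("dev.my_fix")
catalog.write("dev.my_fix", "orders_clean", "v2")

assert "orders_clean" not in catalog.refs["main"]  # isolation: main untouched
catalog.merge("dev.my_fix")
assert catalog.refs["main"]["orders_clean"] == "v2"  # published atomically
```

Because a failed run simply never reaches the `merge` step, `main` cannot observe a partial or invalid state.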

From an assistant’s perspective, this behaves like a standard software project: files in a repo, deterministic commands, explicit state, and a reviewable history.

Why this model fits AI-assisted development

AI coding assistants work best with repositories and command-line workflows. Bauplan matches that shape and adds structural safety so faster iteration does not increase risk.

Bauplan operationalizes safety with Git-style data semantics. An assistant can propose a change, run it in isolation on a branch, validate the results, then publish with an atomic merge, rolling back to a prior commit if needed. You do not have to rely on the assistant “being careful”: the workflow prevents partial writes to production by construction.
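The propose, run, validate, publish loop can be sketched as plain control flow. `run_on_branch` and `audit` below are illustrative stand-ins for pipeline execution and validation checks, not real Bauplan APIs; the point is that publishing is gated on the audit, so a failed check leaves production untouched:

```python
# Hypothetical sketch of the propose -> run -> validate -> publish loop.
def run_on_branch(rows):
    # Candidate output, computed in isolation on a branch.
    return [r for r in rows if r["amount"] is not None]

def audit(rows):
    # Validation: non-empty output with no negative amounts.
    return len(rows) > 0 and all(r["amount"] >= 0 for r in rows)

main = []  # published state
candidate = run_on_branch([{"amount": 10}, {"amount": None}])

if audit(candidate):
    main = candidate  # atomic publish: merge the branch into main
# else: main stays as it was; the branch remains available for inspection
```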

Working with AI agents

There are three complementary ways to use AI agents with Bauplan. They are not mutually exclusive and can be combined depending on the workflow.

1. MCP Server

The Bauplan MCP Server exposes lakehouse operations through the Model Context Protocol. This allows AI assistants such as Claude Code, Claude Desktop, or Cursor to interact with a Bauplan lakehouse via tool calls.

Through the MCP server, an assistant can:

  • Inspect schemas and tables
  • Run queries
  • Manage data branches and commits
  • Run Bauplan projects and pipelines
  • Track and inspect jobs
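Under the Model Context Protocol, each of these operations is exposed as a named tool that the assistant invokes with a JSON-RPC 2.0 `tools/call` request. The sketch below shows the general shape of such a request; the tool name `run_query` and its arguments are assumptions for illustration, not the server's actual tool catalog (see the repository for that):

```python
# Hypothetical MCP tool call (JSON-RPC 2.0 envelope, as defined by the
# Model Context Protocol). Tool name and arguments are illustrative.
import json

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "run_query",  # assumed tool name
        "arguments": {
            "query": "SELECT COUNT(*) FROM orders",
            "ref": "main",  # query a specific branch of the lakehouse
        },
    },
}

payload = json.dumps(request)  # what actually goes over the wire
```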

The MCP server is primarily intended for local development and interactive usage, where an assistant needs live access to lakehouse state. Setup and usage details, including videos and configuration examples, are available in the repository.

See: https://github.com/BauplanLabs/bauplan-mcp-server

2. Repository-based usage (CLAUDE.md + CLI / SDK)

Bauplan can also be used with LLMs without running an MCP server.

In this mode, the assistant operates purely through repository context:

  • A CLAUDE.md file at the project root that explains how to work with Bauplan
  • Reference documentation for the Bauplan CLI and Python SDK
  • Standard command-line execution and code generation

The assistant reads documentation, writes Python or SQL, and invokes Bauplan through the CLI or SDK directly. Functionally, this covers the same core operations as the MCP server, but without requiring a long-running service.

This approach is well suited to IDE-based assistants like Claude Code or Cursor and is often the simplest starting point.
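To make this concrete, a CLAUDE.md might tell the assistant which workflow to follow and which commands are safe to run. The excerpt below is a hypothetical example; the commands shown reflect the branch-based workflow described above but are not an exhaustive or authoritative CLI reference:

```markdown
# Working with Bauplan in this repository

- Always work on a data branch, never directly on `main`:
  `bauplan branch create <username>.<feature>`
- Run the pipeline against the branch: `bauplan run`
- Inspect results with SQL before publishing: `bauplan query "SELECT ..."`
- Publish only after validation passes: `bauplan branch merge <branch>`
```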

See: https://github.com/BauplanLabs/bauplan-mcp-server

3. Agent Skills for structured workflows

Agent Skills are reusable, declarative workflow templates designed to guide LLMs through multi-step data engineering tasks. Skills can be used together with the MCP server or with repository-based usage.

Skills encode best practices and sequencing for tasks that are otherwise easy to get wrong, including:

  • Creating new data pipelines
  • Ingesting data safely using Write-Audit-Publish
  • Exploring large or complex datasets
  • Investigating failed runs and performing root-cause analysis

Each skill defines the intent, constraints, and expected steps of a workflow, while still operating on the same underlying Bauplan primitives (branches, runs, validation, publish).
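As a rough illustration of what a skill encodes, consider the ingestion case. The structure below is a hypothetical sketch, not the actual skill format maintained in the repository; it shows how intent, constraints, and an ordered sequence of steps over Bauplan primitives fit together:

```python
# Hypothetical sketch of a skill definition for safe ingestion
# (Write-Audit-Publish). Field names are illustrative.
ingest_skill = {
    "intent": "Ingest new data into the lakehouse safely",
    "constraints": [
        "never write directly to main",
        "run data quality checks before publishing",
    ],
    "steps": [
        "create an import branch",
        "write the data to a table on the branch",
        "audit the table (row counts, schema, nulls)",
        "merge the branch into main only if the audit passes",
    ],
}

# The Write-Audit-Publish ordering is explicit: publishing comes after,
# and is gated on, the audit step.
steps = ingest_skill["steps"]
assert steps.index("merge the branch into main only if the audit passes") > \
    steps.index("audit the table (row counts, schema, nulls)")
```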

Available skills and usage instructions are maintained in the repository.

See: https://github.com/BauplanLabs/bauplan-mcp-server