Create table schema conflicts
When you pass a dataset via the --search-uri option of the table create command, Bauplan scans your files to infer the table schema. If your files have inconsistent columns or types, the create will fail with a schema conflict.
Under the hood, the table create command runs two steps: it plans the table (inferring the schema from your source files) and then applies the plan. Normally these are bundled into a single command, but when planning detects conflicts, the combined command fails. In that case you can split the flow into its two parts: generate the plan explicitly, resolve the conflicts in the YAML (or programmatically), then apply it. That is what the rest of this page walks through.
The dataset you pass via --search-uri is not inserted into the table. It's only used to infer the schema and create an empty table. You can import the data later via table import, using the same --search-uri or a different one with the same schema.
When a conflict occurs, you might see an error like:
Error: plan has schema conflicts and cannot be auto-applied; use `table create-plan` and `table create-plan-apply` instead
To fix this, generate a custom plan and resolve the conflicts manually.
Create a table plan
Instead of letting Bauplan infer the schema automatically, you'll save the plan to a YAML file, where potential schema conflicts can be resolved either manually or programmatically.
To create a plan, use
$ bauplan table create-plan <your_table_name> --search-uri 's3://your/s3/bucket/*.parquet' --save-plan table_plan.yml
This generates a table_plan.yml file that includes:
- The inferred schema
- Detected conflicts
- Metadata about the data files
Understand the YAML structure
The plan has a schema_info section that looks like this:
schema_info:
  conflicts:
    - column_with_conflict: VendorID
      reconcile_step: In the destination_datatype, please choose between (long or int)
  detected_schemas:
    - column_name: VendorID
      src_datatypes:
        - datatype: long
        - datatype: int
      dst_datatype:
        - datatype: long
        - datatype: int
For each conflicting column:
- src_datatypes lists the types Bauplan found across your files.
- dst_datatype lists the possible target types - you must pick one. The value you leave in this field tells the system which type to cast the column to.
- The conflicts: field explains how to resolve each one via its reconcile_step.
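If you want to inspect a plan before editing it, a minimal sketch below parses the schema_info section and lists the conflicting columns. The helper name list_conflicts is ours for illustration, not part of the Bauplan SDK; the YAML string mirrors the example above, and PyYAML is assumed to be installed.

```python
import yaml

def list_conflicts(plan: dict) -> list:
    """Return the names of columns with unresolved schema conflicts."""
    conflicts = plan.get('schema_info', {}).get('conflicts') or []
    return [c['column_with_conflict'] for c in conflicts]

# The schema_info section from the example above, as it appears in table_plan.yml
plan = yaml.safe_load("""
schema_info:
  conflicts:
    - column_with_conflict: VendorID
      reconcile_step: In the destination_datatype, please choose between (long or int)
  detected_schemas:
    - column_name: VendorID
      src_datatypes:
        - datatype: long
        - datatype: int
      dst_datatype:
        - datatype: long
        - datatype: int
""")

print(list_conflicts(plan))  # ['VendorID']
```

In practice you would load the dictionary with yaml.safe_load from the table_plan.yml file you generated, rather than from an inline string.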
Cast types and resolve conflicts
To resolve the conflict, edit the dst_datatype list to
include only the type you want. For example:
From:
dst_datatype:
- datatype: long
- datatype: int
To:
dst_datatype:
- datatype: int
Once you've resolved all conflicts, your conflicts:
section should be empty:
conflicts: []
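Before applying the edited plan, you can sanity-check it. A minimal sketch (the helper name check_plan_resolved is illustrative, not an SDK function): it verifies that conflicts is empty and that every column's dst_datatype contains exactly one entry.

```python
def check_plan_resolved(plan: dict) -> None:
    """Raise ValueError if the plan still has conflicts or ambiguous target types."""
    info = plan['schema_info']
    if info.get('conflicts'):
        names = [c['column_with_conflict'] for c in info['conflicts']]
        raise ValueError(f'unresolved conflicts: {names}')
    for col in info.get('detected_schemas', []):
        if len(col.get('dst_datatype', [])) != 1:
            raise ValueError(f"column {col['column_name']} must have exactly one dst_datatype")

resolved = {
    'schema_info': {
        'conflicts': [],
        'detected_schemas': [
            {'column_name': 'VendorID',
             'src_datatypes': [{'datatype': 'long'}, {'datatype': 'int'}],
             'dst_datatype': [{'datatype': 'int'}]},
        ],
    }
}
check_plan_resolved(resolved)  # passes silently on a fully resolved plan
```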
Apply the plan and import data
Apply your edited schema plan:
$ bauplan table create-plan-apply --plan table_plan.yml
Then import the data as usual:
$ bauplan table import <your_table_name> --search-uri 's3://your/s3/bucket/*.parquet'
This manual step ensures you're making intentional decisions about your
schema - especially important when types like int,
long, or double could affect downstream
logic or validation.
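To see why the choice matters: a 32-bit int tops out at 2**31 - 1, so identifiers or counters beyond that range need a 64-bit long. A quick illustrative check in plain Python (not part of the SDK):

```python
INT32_MAX = 2**31 - 1  # largest value a 32-bit signed int can hold

def fits_int32(value: int) -> bool:
    """True if the value survives a cast to a 32-bit signed int without overflow."""
    return -2**31 <= value <= INT32_MAX

print(fits_int32(2_000_000_000))  # True  - safe as int
print(fits_int32(3_000_000_000))  # False - needs long
```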
Handle casting programmatically
If you'd rather manage schema conflicts without editing YAML by hand, you can handle the entire flow programmatically using the Bauplan Python SDK.
import bauplan
from typing import Dict, Any
import yaml

def generate_plan(
    client: bauplan.Client,
    table: str,
    search_uri: str,
    branch: str
) -> Dict[str, Any]:
    """
    Generate a schema plan for importing data into Bauplan.
    This plan will include inferred column types and any detected schema conflicts.

    Returns:
        A dictionary containing the table creation plan.
    """
    response = client.plan_table_creation(
        table=table,
        search_uri=search_uri,
        branch=branch
    )
    # Extract the actual plan dictionary from the response
    plan = yaml.safe_load(response.plan)
    return plan

client = bauplan.Client()
plan = generate_plan(client=client, table='your_table', search_uri='s3://your-bucket/*.parquet', branch='import_branch')
Override column types in a schema plan
Sometimes you know more about your data than Bauplan's automatic
inference - for example, when a column should be interpreted as a
timestamp rather than a string. You can override inferred types by
modifying the dst_datatype field in the schema plan.
This pattern is especially useful when:
- You want schema logic to live in code and version control
- You have known transformations (for example, parsing timestamp strings)
- You want repeatable, auditable data type enforcement
Here's a reusable helper to apply type overrides based on a type_map:
a Python dictionary where each key is a column name and the value is a
list of type definitions using the same structure Bauplan uses in
dst_datatype.
type_map = {
    "EventTime": [
        {
            "datatype": "timestamp",
            "unit": "us",
            "parse_format": "%Y%m%d%H%M%S"
        }
    ],
    "UserID": [
        {
            "datatype": "int"
        }
    ]
}
This allows you to override inferred types programmatically, in a format that's fully compatible with Bauplan schema plans.
def override_column_types(plan: dict, type_map: dict) -> dict:
    """
    Update destination types for specific columns in a Bauplan schema plan.
    Clears the conflict list after applying overrides.
    """
    for col in plan['schema_info']['detected_schemas']:
        if col['column_name'] in type_map:
            col['dst_datatype'] = type_map[col['column_name']]
    # Mark the plan as resolved
    plan['schema_info']['conflicts'] = []
    return plan
This function clears schema_info.conflicts after applying overrides.
This makes the procedure compatible with both clean plans (where you're
programmatically casting columns) and plans with conflicts (where
inferred types are ambiguous).
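If you'd rather not clear the conflict list wholesale, a stricter variant (a sketch; override_column_types_strict is our name, not an SDK function) drops only the conflicts your type_map actually resolves, and raises if any remain:

```python
def override_column_types_strict(plan: dict, type_map: dict) -> dict:
    """Apply overrides, then clear only the conflicts they resolve.

    Raises ValueError if any conflict is left unaddressed.
    """
    info = plan['schema_info']
    for col in info['detected_schemas']:
        if col['column_name'] in type_map:
            col['dst_datatype'] = type_map[col['column_name']]
    # Keep only conflicts on columns the type_map did not cover
    remaining = [
        c for c in info.get('conflicts', [])
        if c['column_with_conflict'] not in type_map
    ]
    if remaining:
        names = [c['column_with_conflict'] for c in remaining]
        raise ValueError(f'unresolved conflicts: {names}')
    info['conflicts'] = []
    return plan
```

This way a typo in a column name fails loudly at plan time instead of silently applying a plan with unresolved conflicts.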
Because Bauplan plans include a list of detected columns and their
inferred types, you can modify dst_datatype to control how each column
will be interpreted during import - regardless of what was inferred
from the files. Only update dst_datatype; leave src_datatypes
unchanged, since it reflects what Bauplan detected in the files:
for schema in plan['schema_info']['detected_schemas']:
    if schema['column_name'] == 'sometimes_int_sometimes_string':
        schema['dst_datatype'] = [{'datatype': 'string'}]
    if schema['column_name'] == 'created_at_ms':
        schema['dst_datatype'] = [
            {
                'datatype': 'timestamp',
                'timezone': 'UTC',
                'unit': 'us',
            }
        ]
Once you've updated the plan, apply it and import the data:
import bauplan
import yaml

client = bauplan.Client()

# apply_table_creation_plan expects a YAML string, so dump the dict first.
client.apply_table_creation_plan(plan=yaml.safe_dump(plan))

client.import_data(
    table='your_table',
    search_uri='s3://your-bucket/*.parquet',
    branch='import_branch'
)
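Putting it all together, the whole flow can be wrapped in one helper. This is a sketch, assuming a client object exposing plan_table_creation, apply_table_creation_plan, and import_data as used above; run_import_flow is our name, not an SDK method, and PyYAML is assumed to be installed.

```python
import yaml

def run_import_flow(client, table: str, search_uri: str, branch: str, type_map: dict) -> None:
    """Plan the table, apply type overrides, apply the plan, then import the data."""
    # Step 1: generate the plan (response.plan is a YAML string, as above)
    response = client.plan_table_creation(table=table, search_uri=search_uri, branch=branch)
    plan = yaml.safe_load(response.plan)
    # Step 2: override destination types for the columns we know about
    for col in plan['schema_info']['detected_schemas']:
        if col['column_name'] in type_map:
            col['dst_datatype'] = type_map[col['column_name']]
    plan['schema_info']['conflicts'] = []
    # Step 3: apply the resolved plan, then import the data
    client.apply_table_creation_plan(plan=yaml.safe_dump(plan))
    client.import_data(table=table, search_uri=search_uri, branch=branch)
```

With a bauplan.Client() this replaces the step-by-step calls above in a single, repeatable call.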