Handle schema conflicts during import
When importing a dataset, Bauplan scans your files to infer the table schema. If your files have inconsistent columns or types, the import will fail with a schema conflict.
This page shows how to resolve such conflicts using either a YAML plan or the Python SDK.
When a schema conflict occurs, you might see an error like:
2025-05-26 14:12:14 WRN The produced plan contains conflicts
2025-05-26 14:12:14 ERR cannot automatically create table from search string. here are conflicts
To fix this, you will have to generate a custom import plan and manually resolve the conflicts.
Create a table plan
Instead of letting Bauplan infer the schema automatically, you
create an import plan and save it to a YAML file, where
schema conflicts can be resolved either manually or
programmatically.
To create an import plan, run:
bauplan table create-plan <your_table_name> --search-uri 's3://your/s3/bucket/*.parquet' --save-plan table_plan.yml
This generates a table_plan.yml file that includes:
- The inferred schema
- Detected conflicts
- Metadata about the data files
Understand the YAML structure
The plan has a schema_info section that looks like this:
schema_info:
  conflicts:
    - column_with_conflict: VendorID
      reconcile_step: In the destination_datatype, please choose between (long or int)
  detected_schemas:
    - column_name: VendorID
      src_datatypes:
        - datatype: long
        - datatype: int
      dst_datatype:
        - datatype: long
        - datatype: int
For each conflicting column:
- src_datatypes lists the types Bauplan found across your files.
- dst_datatype lists the possible target types, and you must pick one. By setting this field, you tell the system which data type to cast the column to.
- The conflicts: field explains what to do by explicitly stating the reconcile_step.
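The structure described above can also be inspected in code. The following is a minimal sketch that works on a plain Python dictionary mirroring the sample YAML; `sample_plan` and `summarize_conflicts` are hypothetical names used for illustration and are not part of the Bauplan SDK.

```python
from typing import Any, Dict, List

# Illustrative plan that mirrors the YAML structure shown above (not real output).
sample_plan: Dict[str, Any] = {
    "schema_info": {
        "conflicts": [
            {
                "column_with_conflict": "VendorID",
                "reconcile_step": "In the destination_datatype, "
                                  "please choose between (long or int)",
            }
        ],
        "detected_schemas": [
            {
                "column_name": "VendorID",
                "src_datatypes": [{"datatype": "long"}, {"datatype": "int"}],
                "dst_datatype": [{"datatype": "long"}, {"datatype": "int"}],
            }
        ],
    }
}


def summarize_conflicts(plan: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each conflicting column to the candidate types in its dst_datatype."""
    schema_info = plan["schema_info"]
    conflicting = {
        c["column_with_conflict"] for c in schema_info.get("conflicts") or []
    }
    return {
        col["column_name"]: [t["datatype"] for t in col["dst_datatype"]]
        for col in schema_info["detected_schemas"]
        if col["column_name"] in conflicting
    }


print(summarize_conflicts(sample_plan))
```

A summary like this makes it easier to see which columns need a decision before you edit the plan.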
Cast types and resolve conflicts
To resolve the conflict, edit the dst_datatype list to
include only the type you want. For example:
From:
dst_datatype:
  - datatype: long
  - datatype: int
To:
dst_datatype:
  - datatype: int
Once you've resolved all conflicts, your conflicts:
section should be empty:
conflicts: []
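Before applying an edited plan, it can help to check that nothing was missed. The sketch below assumes a resolved plan has an empty conflicts list and exactly one entry in every dst_datatype, as in the example above; `unresolved_columns` is a hypothetical helper, not a Bauplan function.

```python
from typing import Any, Dict, List


def unresolved_columns(plan: Dict[str, Any]) -> List[str]:
    """Return the names of columns that still need a decision.

    Assumption: a column is resolved when it is absent from `conflicts`
    and its `dst_datatype` list holds exactly one type.
    """
    schema_info = plan["schema_info"]
    pending = {
        c["column_with_conflict"] for c in schema_info.get("conflicts") or []
    }
    for col in schema_info.get("detected_schemas", []):
        if len(col.get("dst_datatype", [])) != 1:
            pending.add(col["column_name"])
    return sorted(pending)


# Illustrative plans: one fully resolved, one with a leftover choice.
resolved_plan = {
    "schema_info": {
        "conflicts": [],
        "detected_schemas": [
            {"column_name": "VendorID", "dst_datatype": [{"datatype": "int"}]}
        ],
    }
}
pending_plan = {
    "schema_info": {
        "conflicts": [],
        "detected_schemas": [
            {
                "column_name": "VendorID",
                "dst_datatype": [{"datatype": "long"}, {"datatype": "int"}],
            }
        ],
    }
}

print(unresolved_columns(resolved_plan))
print(unresolved_columns(pending_plan))
```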
Apply the plan and import data
Apply your edited schema plan:
bauplan table create-plan-apply --plan table_plan.yml
Then import the data as usual:
bauplan table import <your_table_name> --search-uri 's3://your/s3/bucket/*.parquet'
This manual step ensures you're making intentional decisions about your
schema - especially important when types like int,
long, or double could affect downstream
logic or validation.
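To illustrate why the choice matters, here is a small sketch in plain Python. It assumes int is a 32-bit signed integer, long is a 64-bit signed integer, and double is an IEEE 754 64-bit float; these widths are an assumption for illustration and are not stated on this page.

```python
# Assumed ranges: int = 32-bit signed, double = IEEE 754 binary64.
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def fits_int32(value: int) -> bool:
    """Check whether a value can be stored in a 32-bit signed integer."""
    return INT32_MIN <= value <= INT32_MAX


# An identifier above ~2.1 billion does not fit a 32-bit int.
big_id = 3_000_000_000
print(fits_int32(big_id))

# A 64-bit integer above 2**53 cannot always be represented exactly as a double.
exact_value = 2**53 + 1
round_tripped = int(float(exact_value))
print(round_tripped == exact_value)
```

Both checks print False, which is the kind of silent mismatch an explicit type decision is meant to prevent.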
Handle casting programmatically
If you'd rather manage schema conflicts without editing YAML by hand, you can handle the entire flow programmatically using the Bauplan Python SDK.
import bauplan
from typing import Dict, Any
import yaml


def generate_plan(
    client: bauplan.Client,
    table: str,
    search_uri: str,
    branch: str
) -> Dict[str, Any]:
    """
    Generate a schema plan for importing data into Bauplan.

    This plan will include inferred column types and any detected schema conflicts.

    Returns:
        A dictionary containing the table creation plan.
    """
    response = client.plan_table_creation(
        table=table,
        search_uri=search_uri,
        branch=branch
    )
    # Extract the actual plan dictionary from the response
    plan = yaml.safe_load(response.plan)
    return plan


client = bauplan.Client()
plan = generate_plan(client=client, table='your_table', search_uri='s3://your-bucket/*.parquet', branch='import_branch')
Overriding column types in a schema plan
Sometimes you know more about your data than Bauplan's automatic
inference - for example, when a column should be interpreted as a
timestamp rather than a string. You can override inferred types by
modifying the dst_datatype field in the schema plan.
This pattern is especially useful when:
- You want schema logic to live in code and version control
- You have known transformations (for example, parsing timestamp strings)
- You want repeatable, auditable data type enforcement
Here's a reusable helper to apply type overrides based on a type_map:
a Python dictionary where each key is a column name and the value is a
list of type definitions using the same structure Bauplan uses in
dst_datatype.
type_map = {
    "EventTime": [
        {
            "datatype": "timestamp",
            "unit": "us",
            "parse_format": "%Y%m%d%H%M%S"
        }
    ],
    "UserID": [
        {
            "datatype": "int"
        }
    ]
}
This allows you to override inferred types programmatically, in a format that's fully compatible with Bauplan schema plans.
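Before committing a parse_format to a plan, you may want to test it against a sample value. The sketch below assumes the format string follows strptime-style directives, which the example above suggests but this page does not confirm; the sample value is made up for illustration.

```python
from datetime import datetime

# Assumption: parse_format uses strptime-style directives.
parse_format = "%Y%m%d%H%M%S"
sample_value = "20250526141214"  # hypothetical raw string from a source file

parsed = datetime.strptime(sample_value, parse_format)
print(parsed.isoformat())

# A value in a different layout fails loudly instead of parsing incorrectly.
try:
    datetime.strptime("2025-05-26 14:12:14", parse_format)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
print(mismatch_detected)
```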
def override_column_types(plan: dict, type_map: dict) -> dict:
    """
    Update destination types for specific columns in a Bauplan schema plan.
    Clears the conflict list after applying overrides.
    """
    for col in plan['schema_info']['detected_schemas']:
        if col['column_name'] in type_map:
            col['dst_datatype'] = type_map[col['column_name']]
    # Mark the plan as resolved
    plan['schema_info']['conflicts'] = []
    return plan
This function clears schema_info.conflicts after applying overrides.
This makes the procedure compatible with both clean plans (where you're
programmatically casting columns) and plans with conflicts (where
inferred types are ambiguous).
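Because the conflict list is cleared unconditionally, a conflicting column that is missing from type_map would go unnoticed. One possible, more defensive variant is sketched below; `override_column_types_strict` is a hypothetical helper that raises if any listed conflict is not covered by an override, and it assumes the plan structure shown earlier on this page.

```python
from typing import Any, Dict, List


def override_column_types_strict(
    plan: Dict[str, Any],
    type_map: Dict[str, List[Dict[str, Any]]],
) -> Dict[str, Any]:
    """Apply overrides, but refuse to clear conflicts that type_map does not cover."""
    schema_info = plan["schema_info"]

    # Find conflicts that no override addresses before changing anything.
    uncovered = [
        c["column_with_conflict"]
        for c in schema_info.get("conflicts") or []
        if c["column_with_conflict"] not in type_map
    ]
    if uncovered:
        raise ValueError(f"Unresolved conflicts for columns: {sorted(uncovered)}")

    for col in schema_info["detected_schemas"]:
        if col["column_name"] in type_map:
            col["dst_datatype"] = type_map[col["column_name"]]

    # Every listed conflict is covered by an override, so the list can be cleared.
    schema_info["conflicts"] = []
    return plan


# Illustrative plan with one conflict on VendorID.
demo_plan = {
    "schema_info": {
        "conflicts": [{"column_with_conflict": "VendorID"}],
        "detected_schemas": [
            {
                "column_name": "VendorID",
                "src_datatypes": [{"datatype": "long"}, {"datatype": "int"}],
                "dst_datatype": [{"datatype": "long"}, {"datatype": "int"}],
            }
        ],
    }
}

fixed = override_column_types_strict(demo_plan, {"VendorID": [{"datatype": "long"}]})
print(fixed["schema_info"]["conflicts"])
print(fixed["schema_info"]["detected_schemas"][0]["dst_datatype"])
```

Checking coverage before mutating the plan means a failed call leaves the original plan unchanged.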
Because Bauplan plans include a list of detected columns and their
inferred types, you can modify dst_datatype to control how each column
will be interpreted during import - regardless of what was inferred
from the files:
plan['schema_info']['detected_schemas'] = [
    {
        "column_name": "EventTime",
        "src_datatypes": [{"datatype": "string"}],
        "dst_datatype": [{"datatype": "string"}]
    },
    ...
]
Once you've updated the plan, apply it and import the data:
import bauplan

client = bauplan.Client()
client.apply_table_creation_plan(plan=plan)
client.import_data(
    table='your_table',
    search_uri='s3://your-bucket/*.parquet',
    branch='import_branch'
)