Import

This guide explains how to import data into bauplan's data catalog as Iceberg tables.

General Requirements

- Data must be in Parquet or CSV format.
- Data must be stored in S3; local files cannot be imported directly.
- The S3 bucket must have the proper permissions configured:
  - For bauplan's clients, this is a one-off operation done during onboarding, when your data is paired with the system.
  - For sandbox users, see the sandbox requirements below.
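
As a quick pre-flight check, the first two requirements can be tested against a search URI before running any import. This is a minimal sketch, not part of bauplan: the helper name `check_search_uri` is hypothetical, and it assumes that file names end in `.parquet` or `.csv`, which the requirements above do not strictly demand.

```python
from urllib.parse import urlparse

# Formats named in the requirements above; the suffix check is an assumption.
SUPPORTED_SUFFIXES = (".parquet", ".csv")


def check_search_uri(search_uri: str) -> list[str]:
    """Return a list of problems; an empty list means the URI looks plausible."""
    problems = []
    parsed = urlparse(search_uri)
    if parsed.scheme != "s3":
        problems.append("data must be in S3: the URI should start with s3://")
    elif not parsed.netloc:
        problems.append("the URI does not name a bucket")
    if not parsed.path.lower().endswith(SUPPORTED_SUFFIXES):
        problems.append("the URI should match Parquet or CSV files")
    return problems


print(check_search_uri("s3://your-bucket/*.parquet"))  # []
print(check_search_uri("/tmp/data/*.parquet"))
```

The check only looks at the shape of the URI. It does not contact S3, so it cannot tell whether the bucket exists or whether its permissions are configured.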

Sandbox Environment Requirements

When using the bauplan sandbox (beta environment), additional requirements apply:

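- The S3 bucket must be publicly readable.
- The S3 bucket must allow its objects to be listed (the ListObject permission, which corresponds to the s3:ListBucket action in an S3 bucket policy).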

note

These additional requirements exist because the sandbox runs in an isolated EC2 instance that can only access public data. In a production environment, the EC2 instance would be privately linked to your bucket through IAM.
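
One way to meet both sandbox requirements is a bucket policy that grants anonymous read and list access. The sketch below builds such a policy as JSON. It assumes that "publicly readable" maps to the `s3:GetObject` action and "ListObject" to the `s3:ListBucket` action; the function name `public_read_policy` and the bucket name are placeholders, not something the sandbox prescribes.

```python
import json


def public_read_policy(bucket: str) -> dict:
    """Build an S3 bucket policy that lets anyone list and read the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Sid": "PublicListBucket",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Reading applies to the objects inside the bucket.
                "Sid": "PublicReadObjects",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


print(json.dumps(public_read_policy("your-bucket"), indent=2))
```

The printed JSON can be pasted into the bucket's policy editor. S3 Block Public Access must also permit public bucket policies for it to take effect. A policy like this exposes every object in the bucket, so it suits only data you are willing to publish.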

Import Process Overview

The import process in bauplan consists of two main steps:

1. Create Table: Define the table schema based on your data files
2. Import Data: Load the data into your newly created table

Step 1: Create Table

Create an empty table using the create command:

bauplan table create <YOUR_TABLE_NAME> --search-uri 's3://your-bucket/*.parquet'

This command:

- Analyzes your Parquet or CSV files to determine the schema
- Creates an empty table with the appropriate structure
- Does not import any data yet

Step 2: Import Data

After creating the table, import the data:

bauplan table import <YOUR_TABLE_NAME> --search-uri 's3://your-bucket/*.parquet'

You can also perform imports programmatically using the bauplan Python SDK:

import bauplan

client = bauplan.Client()

# Create the table
client.create_table(
    table='my_table_name',
    search_uri='s3://path/to/my/files/*.parquet',
    branch='my_branch_name'
)

# Import the data
state = client.import_data(
    table='my_table_name',
    search_uri='s3://path/to/my/files/*.parquet',
    branch='my_branch_name'
)

# Check for errors during import
if state.error:
    print(f"Import failed: {state.error}")
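
The two SDK calls can also be wrapped in a single helper, so that a failed import raises instead of being printed. This is a sketch under stated assumptions: `create_and_import` is a hypothetical name, and it relies only on the `create_table` and `import_data` calls and the `state.error` attribute shown above. The client is passed in as an argument, so the helper itself does not import bauplan.

```python
def create_and_import(client, table: str, search_uri: str, branch: str) -> None:
    """Run both import steps in order; raise if the import reports an error."""
    # Step 1: infer the schema from the files and create an empty table.
    client.create_table(table=table, search_uri=search_uri, branch=branch)

    # Step 2: load the data into the table that now exists.
    state = client.import_data(table=table, search_uri=search_uri, branch=branch)

    # Surface import errors as an exception so that callers cannot miss them.
    if state.error:
        raise RuntimeError(f"Import of {table!r} failed: {state.error}")


# Example call (requires the bauplan package and valid credentials):
# import bauplan
# create_and_import(
#     bauplan.Client(),
#     table='my_table_name',
#     search_uri='s3://path/to/my/files/*.parquet',
#     branch='my_branch_name',
# )
```

Raising an exception is a design choice here: it stops a larger script from continuing with a table that was created but never filled.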

Handling Schema Conflicts

If schema conflicts occur between files during import:

1. Generate an import plan:

bauplan table create-plan <YOUR_TABLE_NAME> \
    --search-uri 's3://your-bucket/*.parquet' \
    --save-plan table_creation_plan.yml

2. Review the table_creation_plan.yml file for conflicts
3. Modify the schema as needed
4. Ensure the conflicts field is empty (conflicts: [])
5. Apply the modified plan:

bauplan table create-plan-apply --plan table_creation_plan.yml
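
The check that the conflicts field is empty can be automated before the plan is applied. The sketch below is a conservative heuristic, not a YAML parser: it assumes the plan file writes the empty field literally as `conflicts: []`, as shown above, and treats anything else as unresolved. The helper names `plan_is_conflict_free` and `apply_plan` are hypothetical.

```python
import re
import subprocess
from pathlib import Path

# Matches lines such as "conflicts: []" or "- conflicts:" and captures the value.
CONFLICTS_LINE = re.compile(r"^\s*(?:-\s*)?conflicts:\s*(.*?)\s*$")


def plan_is_conflict_free(plan_text: str) -> bool:
    """Return True only if every conflicts field is present and written as []."""
    values = [
        match.group(1)
        for line in plan_text.splitlines()
        if (match := CONFLICTS_LINE.match(line))
    ]
    # No conflicts field at all is treated as "not verified", hence False.
    return bool(values) and all(value == "[]" for value in values)


def apply_plan(plan_path: str) -> None:
    """Apply the plan with the CLI command shown above, if it has no conflicts."""
    if not plan_is_conflict_free(Path(plan_path).read_text()):
        raise ValueError(f"{plan_path} still lists conflicts; edit it first")
    subprocess.run(
        ["bauplan", "table", "create-plan-apply", "--plan", plan_path],
        check=True,  # raise if the CLI exits with a non-zero status
    )
```

Calling `apply_plan("table_creation_plan.yml")` would then run the apply step only once the file has been edited. Because the plan file's full layout is not described here, the check errs on the side of refusing.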

note

For a complete import reference, including error handling and advanced import options, consult the reference documentation.
