Import

This guide explains how to import data into bauplan’s data catalog as Iceberg tables.

General Requirements

  • Data must be in Parquet or CSV format

  • Data must be in S3; local files cannot be imported directly

  • S3 bucket must have proper permissions configured:

      - For bauplan’s clients, this is a one-off operation done during onboarding when pairing your data with the system

      - For sandbox users, see sandbox requirements below
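The first two requirements can be checked locally before running any import. The following is a minimal sketch, not part of bauplan: `check_search_uri` is a hypothetical helper, and inferring the file format from the file extension is an assumption made for illustration. It cannot verify bucket permissions.

```python
from urllib.parse import urlparse

# Assumption: the file format is inferred from the extension in the URI.
SUPPORTED_EXTENSIONS = (".parquet", ".csv")


def check_search_uri(search_uri: str) -> list[str]:
    """Return the problems found in a search URI; an empty list means none were found."""
    problems = []
    parsed = urlparse(search_uri)
    if parsed.scheme != "s3":
        problems.append("URI must use the s3:// scheme; local files cannot be imported directly")
    if not parsed.netloc:
        problems.append("URI must name a bucket")
    if not parsed.path.lower().endswith(SUPPORTED_EXTENSIONS):
        problems.append("files must be Parquet or CSV")
    return problems


if __name__ == "__main__":
    for uri in ("s3://your-bucket/*.parquet", "/tmp/data.parquet"):
        print(uri, "->", check_search_uri(uri) or "looks importable")
```

A check like this only catches malformed URIs early; whether the bucket is actually readable is determined at import time.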

Sandbox Environment Requirements

When using the bauplan sandbox (beta environment), additional requirements apply: the S3 data must be publicly accessible.

Note

These additional requirements exist because the sandbox runs in an isolated EC2 instance that can only access public data. In a production environment, the EC2 instance would be privately linked to your bucket through IAM.

Import Process Overview

The import process in bauplan consists of two main steps:

  1. Create Table: Define the table schema based on your data files

  2. Import Data: Load the data into your newly created table

Step 1: Create Table

Create an empty table using the create command:

bauplan table create --name <YOUR_TABLE_NAME> --search-uri 's3://your-bucket/*.parquet'

This command will:

  • Analyze your Parquet/CSV files to determine the schema

  • Create an empty table with the appropriate structure

  • Not yet import any data

Step 2: Import Data

After creating the table, import the data:

bauplan table import --name <YOUR_TABLE_NAME> --search-uri 's3://your-bucket/*.parquet'

You can also perform imports programmatically using the bauplan Python SDK:

import bauplan

client = bauplan.Client()

# Create the table
client.create_table(
    table='my_table_name',
    search_uri='s3://path/to/my/files/*.parquet',
    branch='my_branch_name'
)

# Import the data
state = client.import_data(
    table='my_table_name',
    search_uri='s3://path/to/my/files/*.parquet',
    branch='my_branch_name'
)

# Check for errors during import
if state.error:
    print(f"Import failed: {state.error}")
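If you run both steps together often, they can be wrapped in a single function that fails loudly. This is a sketch under stated assumptions: `create_and_import` is a hypothetical helper, not part of the SDK. It uses only the `create_table` and `import_data` calls and the `state.error` attribute shown above, and takes the client as an argument.

```python
def create_and_import(client, table: str, search_uri: str, branch: str):
    """Run both import steps and raise if the import reports an error."""
    # Step 1: infer the schema from the files and create an empty table.
    client.create_table(table=table, search_uri=search_uri, branch=branch)

    # Step 2: load the data into the newly created table.
    state = client.import_data(table=table, search_uri=search_uri, branch=branch)

    # Turn a reported error into an exception so callers cannot ignore it.
    if state.error:
        raise RuntimeError(f"Import of {table!r} failed: {state.error}")
    return state
```

It would be called as `create_and_import(bauplan.Client(), 'my_table_name', 's3://path/to/my/files/*.parquet', 'my_branch_name')`. Raising an exception instead of printing is a design choice that suits scripts and scheduled jobs, where a printed message is easy to miss.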

Handling Schema Conflicts

If schema conflicts occur between files during import:

  1. Generate an import plan:

bauplan table create-plan --name <YOUR_TABLE_NAME> \
    --search-uri 's3://your-bucket/*.parquet' \
    --save-plan table_creation_plan.yml

  2. Review the table_creation_plan.yml file for conflicts

  3. Modify the schema as needed

  4. Ensure the conflicts field is empty (conflicts: [])

  5. Apply the modified plan:

bauplan table create-plan-apply --plan table_creation_plan.yml
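Before applying an edited plan, it can help to confirm programmatically that no conflicts remain. The sketch below is an illustration only: this guide does not describe the plan file's layout, so `find_conflicts` (a hypothetical helper) searches an already-parsed plan for any field named `conflicts`, wherever it sits, and reports the ones that are not empty. The keys in the sample plan are invented for the example.

```python
def find_conflicts(node, path=""):
    """Collect (path, value) for every non-empty 'conflicts' field in a parsed plan."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else str(key)
            if key == "conflicts" and value:
                found.append((child, value))
            else:
                # Empty 'conflicts' fields and all other keys are searched recursively.
                found.extend(find_conflicts(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.extend(find_conflicts(item, f"{path}[{index}]"))
    return found


if __name__ == "__main__":
    # Hypothetical plan content; the real file layout may differ.
    sample_plan = {"schema_info": {"conflicts": ["column 'id': int vs string"]}}
    for location, value in find_conflicts(sample_plan):
        print(f"Unresolved conflict at {location}: {value}")
```

To run it against the saved file, the plan can be parsed first, for example with PyYAML's `yaml.safe_load`, and the resulting dictionary passed to `find_conflicts`. An empty result means every `conflicts` field is empty.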

Note

  • For a complete import reference, including error handling and advanced import options, please consult our reference documentation.