Import
This guide explains how to import data into bauplan’s data catalog as Iceberg tables.
General Requirements
Data must be in Parquet or CSV format
Data must be in S3; local files cannot be imported directly
S3 bucket must have proper permissions configured:
  - For bauplan’s clients, this is a one-off operation performed during onboarding, when your data is paired with the system
  - For sandbox users, see the sandbox requirements below
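The format and location requirements can be checked before calling bauplan at all. The sketch below is a hypothetical pre-flight helper (check_search_uri is not part of bauplan); it only inspects the URI string and cannot verify that the bucket exists or that its permissions are configured.

```python
from urllib.parse import urlparse

# File formats accepted for import, per the requirements above.
SUPPORTED_EXTENSIONS = (".parquet", ".csv")


def check_search_uri(search_uri: str) -> list[str]:
    """Return a list of problems with a search URI; an empty list means it looks importable.

    Hypothetical helper: it inspects the string only and never contacts S3 or bauplan.
    """
    problems = []
    parsed = urlparse(search_uri)
    if parsed.scheme != "s3":
        problems.append("data must be in S3 (URI should start with s3://)")
    if not parsed.netloc:
        problems.append("missing bucket name")
    if not parsed.path.lower().endswith(SUPPORTED_EXTENSIONS):
        problems.append("files must be Parquet or CSV")
    return problems
```

For example, a local path such as /home/me/data.csv is rejected because it has no s3:// scheme and no bucket.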
Sandbox Environment Requirements
When using the bauplan sandbox (beta environment), additional requirements apply:
S3 bucket must be publicly readable
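S3 bucket must have the ListObject permission enabled, i.e. anyone must be allowed to list its objects (in IAM policy terms this is the s3:ListBucket action); here is an example of JSON S3 permissions.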
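A minimal sketch of a bucket policy covering both sandbox requirements, built in Python and printed as JSON. It uses standard AWS policy syntax and may differ in detail from the example linked above: listing the bucket's objects is authorized by s3:ListBucket on the bucket ARN, and reading them by s3:GetObject on the object ARNs. The bucket name is a placeholder, and S3 Block Public Access settings on the account or bucket may need to be relaxed before a public policy takes effect.

```python
import json


def public_read_list_policy(bucket: str) -> dict:
    """Build an S3 bucket policy that lets anyone list the bucket and read its objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing keys (ListObjects / ListObjectsV2) requires s3:ListBucket on the bucket itself.
                "Sid": "PublicList",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Reading object contents requires s3:GetObject on every object in the bucket.
                "Sid": "PublicRead",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    # "your-bucket" is a placeholder bucket name.
    print(json.dumps(public_read_list_policy("your-bucket"), indent=2))
```

The printed JSON can be pasted into the bucket policy editor in the S3 console, or passed as a string to boto3's put_bucket_policy.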
Note
These additional requirements exist because the sandbox runs in an isolated EC2 instance that can only access public data. In a production environment, the EC2 instance would be privately linked to your bucket through IAM.
Import Process Overview
The import process in bauplan consists of two main steps:
Create Table: Define the table schema based on your data files
Import Data: Load the data into your newly created table
Step 1: Create Table
Create an empty table using the create command:
bauplan table create --name <YOUR_TABLE_NAME> --search-uri 's3://your-bucket/*.parquet'
This command will:
Analyze your Parquet/CSV files to determine the schema
Create an empty table with the appropriate structure
Not import any data yet
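For intuition on what determining the schema involves, here is a toy type inference for CSV data using only the Python standard library. It is an illustration, not bauplan's algorithm: it samples the first rows and picks the narrowest of int, float, or string for each column. Parquet files store their schema in the file metadata, so they need no such guessing.

```python
import csv
import io
from itertools import islice


def _infer_type(values: list[str]) -> str:
    """Pick the narrowest type that every non-empty value parses as."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"  # nothing to go on: fall back to the most general type
    for name, parse in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                parse(v)
            return name
        except ValueError:
            continue
    return "string"


def infer_csv_schema(text: str, sample_rows: int = 100) -> dict[str, str]:
    """Map each header column to an inferred type, using at most sample_rows data rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = list(islice(reader, sample_rows))
    return {
        name: _infer_type([row[i] for row in rows if i < len(row)])
        for i, name in enumerate(header)
    }
```

For example, infer_csv_schema("id,price,city\n1,9.5,Rome\n2,10,Paris\n") returns {'id': 'int', 'price': 'float', 'city': 'string'}. Because inference of this kind looks only at the data it is given, two files holding the same column can yield different types, which is one way schema conflicts between files can arise.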
Step 2: Import Data
After creating the table, import the data:
bauplan table import --name <YOUR_TABLE_NAME> --search-uri 's3://your-bucket/*.parquet'
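Both commands take the same --search-uri wildcard, so it can help to preview which objects a pattern covers before running them. The sketch below approximates this with Python's fnmatch, where * also matches /; that matching rule is an assumption, and bauplan's own semantics may differ. The object list is hard-coded here; in practice it would come from listing the bucket.

```python
from fnmatch import fnmatchcase


def preview_matches(search_uri: str, object_uris: list[str]) -> list[str]:
    """Return the object URIs that a wildcard search URI would select.

    Assumption: glob semantics follow fnmatch, where "*" also matches "/".
    """
    return [uri for uri in object_uris if fnmatchcase(uri, search_uri)]


# Hypothetical bucket contents used for illustration.
objects = [
    "s3://your-bucket/2024/01.parquet",
    "s3://your-bucket/2024/02.parquet",
    "s3://your-bucket/readme.txt",
    "s3://your-bucket/extra.csv",
]
print(preview_matches("s3://your-bucket/*.parquet", objects))
```

Under these assumptions the pattern picks up both Parquet files, including those under 2024/, and skips the text and CSV files.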
You can also perform imports programmatically using the bauplan Python SDK:
import bauplan
client = bauplan.Client()
# Create the table
client.create_table(
    table='my_table_name',
    search_uri='s3://path/to/my/files/*.parquet',
    branch='my_branch_name',
)
# Import the data
state = client.import_data(
    table='my_table_name',
    search_uri='s3://path/to/my/files/*.parquet',
    branch='my_branch_name',
)
# Check for errors during import
if state.error:
    print(f"Import failed: {state.error}")
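In scripts it is often preferable to fail loudly rather than print. A minimal sketch, assuming only the create_table and import_data calls and the state.error attribute used in the SDK snippet; create_and_import is a hypothetical helper, and the client is passed in so that any object with those two methods (for example a test double) can stand in for bauplan.Client().

```python
from typing import Any


def create_and_import(client: Any, table: str, search_uri: str, branch: str) -> None:
    """Run both import steps and raise if the data import reports an error.

    `client` is expected to behave like bauplan.Client (create_table and import_data
    called with table, search_uri and branch keyword arguments).
    """
    # Step 1: create the empty table, with the schema derived from the files.
    client.create_table(table=table, search_uri=search_uri, branch=branch)
    # Step 2: load the data; the returned state exposes an `error` attribute.
    state = client.import_data(table=table, search_uri=search_uri, branch=branch)
    if state.error:
        raise RuntimeError(f"Import of {table!r} on branch {branch!r} failed: {state.error}")
```

Usage: create_and_import(bauplan.Client(), 'my_table_name', 's3://path/to/my/files/*.parquet', 'my_branch_name'). Raising instead of printing makes a failed import stop the script (and any scheduler running it) instead of continuing silently.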
Handling Schema Conflicts
If schema conflicts occur between files during import:
Generate an import plan:
bauplan table create-plan --name <YOUR_TABLE_NAME> \
--search-uri 's3://your-bucket/*.parquet' \
--save-plan table_creation_plan.yml
Review the table_creation_plan.yml file for conflicts (example)
Modify the schema as needed
Ensure the conflicts field is empty (conflicts: [])
Apply the modified plan:
bauplan table create-plan-apply --plan table_creation_plan.yml
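The check that the conflicts field is empty can be automated before applying the plan. A minimal sketch: plan_is_conflict_free is a hypothetical helper that scans the plan text for conflicts: keys rather than parsing the full YAML, and it assumes resolved conflicts appear exactly as conflicts: []. A YAML parser would make the check more robust.

```python
import re


def plan_is_conflict_free(plan_text: str) -> bool:
    """Return True only if the plan has at least one `conflicts:` key and every one is `[]`.

    Text-based sketch: it does not parse the rest of the YAML document.
    """
    values = re.findall(r"^[ \t]*conflicts:(.*)$", plan_text, flags=re.MULTILINE)
    # No conflicts key at all means the plan cannot be confirmed clean.
    return bool(values) and all(v.strip() == "[]" for v in values)
```

For example, the apply step could be guarded with plan_is_conflict_free(Path("table_creation_plan.yml").read_text()), using pathlib.Path.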
Note
For a complete import reference, including error handling and advanced import options, please consult our reference documentation.