Quick start
These are the only things you need to do:
Clone this repo.
Install Bauplan.
pip install bauplan --upgrade
Set up your username and authentication key (make sure you have both before starting):
bauplan --profile default config set api_key "your_bauplan_key"
Alternatively, you can manually create a ~/.bauplan/config.yml file with the following structure:
profiles:
  default:
    active_branch: main
    env: prod
    api_key: <YOUR_API_KEY_HERE>
    project_dir: .
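If you prefer to script the manual setup, the following is a minimal sketch that writes a file with the same keys using only the Python standard library. The helper names `render_config` and `write_config` are hypothetical, the nesting of the keys under `profiles` and `default` is an assumption, and the demo writes to a temporary directory instead of your real ~/.bauplan folder so that nothing is overwritten.

```python
import tempfile
from pathlib import Path

# Assumed layout: every setting is nested under profiles -> default.
CONFIG_TEMPLATE = """\
profiles:
  default:
    active_branch: main
    env: prod
    api_key: {api_key}
    project_dir: .
"""


def render_config(api_key: str) -> str:
    """Return the config file contents for the given API key (hypothetical helper)."""
    if not api_key or api_key != api_key.strip():
        raise ValueError("api_key must be non-empty and free of surrounding whitespace")
    return CONFIG_TEMPLATE.format(api_key=api_key)


def write_config(config_dir: Path, api_key: str) -> Path:
    """Create config_dir if needed and write config.yml inside it."""
    config_dir.mkdir(parents=True, exist_ok=True)
    config_path = config_dir / "config.yml"
    config_path.write_text(render_config(api_key), encoding="utf-8")
    return config_path


if __name__ == "__main__":
    # Demo only: write into a throwaway directory, not into the home folder.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_config(Path(tmp) / ".bauplan", "your_bauplan_key")
        print(path.read_text(encoding="utf-8"))
```

To target the real location, you would pass `Path.home() / ".bauplan"` as `config_dir`, after checking that no existing file would be replaced.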
Explore the data catalog
We pre-loaded some data for you, so let’s start by looking at it in the data catalog using the CLI.
This command shows the Iceberg tables in the main branch of the data lake:
bauplan branch get main
We can then explore the schema of the tables in the data catalog. The important tables for this tutorial are taxi_fhvhv and taxi_zones (here you can have a look at the datasets). Here, bauplan is the namespace.
bauplan table get bauplan.taxi_fhvhv
bauplan table get bauplan.taxi_zones
Run a query
You can query the data directly in the data lake using the CLI:
bauplan query "SELECT max(tips) FROM bauplan.taxi_fhvhv WHERE pickup_datetime = '2023-01-01T00:00:00-05:00'"
The results are displayed in your terminal (later in the tutorial, we will show how to use interfaces other than the CLI).
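If you want to launch the same query from a script, one option is to wrap the CLI call shown above. The sketch below is a thin wrapper and does not use a Bauplan Python API: it only assembles the `bauplan query` command and runs it with `subprocess`. The helper names `build_query_command` and `run_query` are hypothetical, and `run_query` assumes the CLI is installed and configured with your key.

```python
import shutil
import subprocess


def build_query_command(sql: str) -> list[str]:
    """Return the argument list equivalent to: bauplan query "<sql>"."""
    if not sql.strip():
        raise ValueError("sql must not be empty")
    # Passing a list (not a shell string) avoids quoting problems in the SQL.
    return ["bauplan", "query", sql]


def run_query(sql: str) -> str:
    """Run the query through the CLI and return whatever the CLI prints."""
    if shutil.which("bauplan") is None:
        raise RuntimeError("the bauplan CLI was not found on PATH")
    completed = subprocess.run(
        build_query_command(sql),
        check=True,            # raise CalledProcessError on a non-zero exit code
        capture_output=True,
        text=True,
    )
    return completed.stdout


if __name__ == "__main__":
    sql = (
        "SELECT max(tips) FROM bauplan.taxi_fhvhv "
        "WHERE pickup_datetime = '2023-01-01T00:00:00-05:00'"
    )
    # Only print the command here; call run_query(sql) once the CLI is set up.
    print(build_query_command(sql))
```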
Run a pipeline
Go into the quick-start folder and run our demo pipeline. You should see the terminal updating in real time as the code is executed.
cd quick-start
bauplan run
👏👏 Congratulations, you just ran your first Bauplan pipeline! In this example, you ran a very simple pipeline composed of two Python functions:
- What happened when we ran bauplan run? Bauplan parsed the code in the file models.py, built a logical plan based on the implicit dependencies between the nodes, and ran the nodes of the pipeline as isolated functions in the cloud, while streaming the output back to your terminal in real time.
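To make the idea of a logical plan built from implicit dependencies more concrete, here is a small, purely illustrative sketch. It is not Bauplan's implementation and does not reproduce the contents of models.py: the two functions `trips` and `tips_by_zone` and their data are hypothetical, and the convention that a parameter name refers to the node of the same name is an assumption made for this example. The sketch derives a dependency graph from the function signatures, sorts it topologically, and runs the nodes in that order, locally rather than in the cloud.

```python
import inspect
from graphlib import TopologicalSorter


def trips():
    """Hypothetical source node: a few made-up rows."""
    return [
        {"zone": "A", "tips": 2.0},
        {"zone": "B", "tips": 5.0},
        {"zone": "A", "tips": 1.0},
    ]


def tips_by_zone(trips):
    """Hypothetical downstream node: depends on `trips` via its parameter name."""
    totals = {}
    for row in trips:
        totals[row["zone"]] = totals.get(row["zone"], 0.0) + row["tips"]
    return totals


def build_plan(functions):
    """Infer node -> dependencies from parameter names and return a run order."""
    nodes = {fn.__name__: fn for fn in functions}
    graph = {
        name: set(inspect.signature(fn).parameters) for name, fn in nodes.items()
    }
    missing = {dep for deps in graph.values() for dep in deps} - set(nodes)
    if missing:
        raise ValueError(f"unknown dependencies: {sorted(missing)}")
    # static_order() yields every node after all of its dependencies.
    return list(TopologicalSorter(graph).static_order()), nodes


def run_plan(functions):
    """Execute the nodes in dependency order, feeding outputs to dependents."""
    order, nodes = build_plan(functions)
    results = {}
    for name in order:
        fn = nodes[name]
        kwargs = {p: results[p] for p in inspect.signature(fn).parameters}
        results[name] = fn(**kwargs)
    return results


if __name__ == "__main__":
    # The functions are passed in the "wrong" order on purpose.
    order, _ = build_plan([tips_by_zone, trips])
    print(order)
    print(run_plan([tips_by_zone, trips])["tips_by_zone"])
```

Because the order is computed from the graph, the order in which the functions are written or passed in does not matter, which is the property the word "implicit" refers to here.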