Streamlit Integration
Streamlit is a Python framework for turning scripts into interactive web apps. It is commonly used for lightweight dashboards, prototypes, and internal tools that can be built and shared quickly.
Bauplan exposes a Python SDK that lets you interact with your lakehouse in code: running pipelines, managing branches, and querying tables. Integrating the two is straightforward: use Bauplan for data management and queries, and Streamlit for the user interface and the visualization.
Prerequisites
- Python 3.10 or above.
- A Bauplan API key (via environment or passed to the client).
- Streamlit, Bauplan, and Pandas installed.
- Access to a Bauplan table you can query (i.e.
taxi_fhvhv
).
To isolate your data environment, run it on a branch (not on main
). Branch names must be prefixed with your Bauplan username, for example alice.exploration_app
. Learn more about branches.
Quickstart: query Bauplan from a Streamlit app
This example reads from the taxi_fhvhv
table in the Bauplan Sandbox (data for 2023). The app:
- Authenticates with
bauplan.Client()
using your profile orBAUPLAN_API_KEY
. - Uses Bauplan SDK to executes SQL against a table in a data branch (
<username>.exploration_app
) for a user-selected date range. - Visualizes the resulting table as a Pandas dataframe and a matplotlib bar chart embedded in Streamlit.
The Sandbox dataset is shared and meant for examples. In a private environment, you can replace the table name (and catalog/namespace if needed) while keeping the same pattern: Client → branch → query → visualize.
# streamlit_app.py
import streamlit as st
import bauplan
import pandas as pd
from datetime import date
import matplotlib.pyplot as plt
# 1) Initialize Bauplan client
# The client will use your local profile or BAUPLAN_API_KEY for authentication.
# We'll also capture the username so we can scope a branch per user.
client = bauplan.Client()
username = client.info().user.username
# 2) Define a user-scoped branch
# Apps should never query `main` directly. By running against a personal branch,
# you keep exploration isolated from production.
exploration_branch = f"{username}.exploration_app"
# Sidebar for UI inputs
st.sidebar.header("Filters")
# 3) Date inputs
# The sandbox dataset contains fhvhv taxi rides for 2023.
# Default to the month of January 2023 for reproducibility.
start_date = st.sidebar.date_input("Start date", value=date(2023, 1, 1))
end_date = st.sidebar.date_input("End date", value=date(2023, 1, 31))
# Guard against invalid ranges
if start_date > end_date:
st.warning("Start date must be <= end date.")
st.stop()
# 4) Query function (cached)
# We wrap the SQL query in @st.cache_data so Streamlit reuses results
# instead of re-running the query every time the page refreshes.
@st.cache_data(show_spinner="Running query…")
def run_query(start: str, end: str, ref: str) -> pd.DataFrame:
query = f"""
SELECT
DATE_TRUNC('day', pickup_datetime) AS dt,
COUNT(*) AS rides
FROM taxi_fhvhv
WHERE pickup_datetime BETWEEN '{start}' AND '{end}'
GROUP BY dt
ORDER BY dt ASC
"""
return client.query(query, ref=ref).to_pandas()
# Execute the query against the sandbox table on our branch
df = run_query(start_date.isoformat(), end_date.isoformat(), exploration_branch)
st.subheader("Rides per day")
# Stop early if the query returned no rows
if df.empty:
st.info("No rows for the selected date range.")
st.stop()
# 5) Normalize data types
# Ensure datetime and numeric types are consistent for plotting.
df["dt"] = pd.to_datetime(df["dt"], errors="coerce")
df["rides"] = pd.to_numeric(df["rides"], errors="coerce").fillna(0).astype("int64")
# 6) Display the result table
st.dataframe(df, use_container_width=True)
# 7) Create a bar chart with matplotlib
# We render server-side with matplotlib, then embed the figure in Streamlit.
labels = df["dt"].dt.strftime("%Y-%m-%d")
values = df["rides"].values
fig, ax = plt.subplots(figsize=(max(6, len(df) * 0.25), 4))
ax.bar(labels, values)
ax.set_xlabel("Date")
ax.set_ylabel("Rides")
ax.set_title(f"Rides per day ({start_date} → {end_date})")
ax.tick_params(axis="x", labelrotation=45)
fig.tight_layout()
# 8) Render the chart in the app
st.pyplot(fig, clear_figure=True)
Run it
Create a virtual environment and install the dependencies:
python -m venv .venv
source .venv/bin/activate
pip install streamlit bauplan pandas
Then start the Streamlit app, run in your terminal:
streamlit run streamlit_app.py
Open the URL shown in your terminal (usually http://localhost:8501) to view the app in your browser.