marimo Integration
marimo is a reactive Python notebook and app framework for building interactive tools from pure Python. You can edit as a notebook and serve the same file as a read-only web app with marimo run.
Bauplan exposes a Python SDK that lets you interact with your lakehouse in code: running pipelines, managing branches, and querying tables. Integrating the two is straightforward: use Bauplan for data management and queries, and marimo for the user interface and the visualization. marimo apps are standard Python files composed of reactive cells defined with @app.cell.
Prerequisites
- Python 3.10 or above.
- A Bauplan API key (via environment or passed to the client).
- marimo, Bauplan, Pandas, and Matplotlib installed.
- Access to a Bauplan table you can query (i.e.
taxi_fhvhv
).
To isolate your data environment, run it on a branch (not on main). Branch names must be prefixed with your Bauplan username, for example alice.exploration_app
.
Quickstart: query Bauplan from a marimo app
This example reads from the taxi_fhvhv
table in the Bauplan Sandbox (data for 2023). The app:
- Authenticates with
bauplan.Client()
using your profile orBAUPLAN_API_KEY
. - Uses Bauplan SDK to execute SQL against a table in a data branch (
<username>.exploration_app
) for a user-selected date range. - Visualizes the resulting table with marimo’s UI elements and a matplotlib bar chart, embedded in the app. Use
mo.ui.date_range
for inputs and mo.ui.table for interactive tabular display; render Matplotlib withmo.mpl.interactive
.
The Sandbox dataset is shared and meant for examples. In a private environment, you can replace the table name (and catalog/namespace if needed) while keeping the same pattern: Client → branch → query → visualize.
# marimo_app.py
import marimo
app = marimo.App()
# 1) Imports
# - marimo as mo for UI/layout
# - bauplan for querying your lakehouse
# - pandas for dataframe ops
# - matplotlib for plotting
@app.cell
def _():
import marimo as mo
import bauplan
import pandas as pd
from datetime import date
import matplotlib.pyplot as plt
return mo, bauplan, pd, date, plt
# 2) Initialize Bauplan client and define a user-scoped branch
# Apps should never query `main` directly. Run against a personal branch
# to keep exploration isolated from production.
@app.cell
def _(bauplan):
client = bauplan.Client()
username = client.info().user.username
exploration_branch = f"{username}.exploration_app"
return client, username, exploration_branch
# Sidebar for UI inputs
# 3) Date inputs
# The sandbox dataset contains fhvhv taxi rides for 2023.
# Default to the month of January 2023 for reproducibility.
@app.cell
def _(mo, date):
date_range = mo.ui.date_range(
start=date(2023, 1, 1),
stop=date(2023, 12, 31),
value=(date(2023, 1, 1), date(2023, 1, 31)),
)
# Render a sidebar with controls
mo.sidebar(
mo.vstack([
mo.md("### Filters"),
date_range
])
)
return date_range
# 4) Query function (cached)
# Wrap the SQL query in @mo.cache so expensive results are reused
# when inputs do not change.
@app.cell
def _(mo, client):
@mo.cache # in-memory caching
def run_query(start: str, end: str, ref: str):
query = f"""
SELECT
DATE_TRUNC('day', pickup_datetime) AS dt,
COUNT(*) AS rides
FROM taxi_fhvhv
WHERE pickup_datetime BETWEEN '{start}' AND '{end}'
GROUP BY dt
ORDER BY dt ASC
"""
return client.query(query, ref=ref).to_pandas()
return run_query
# Execute the query against the sandbox table on our branch
@app.cell
def _(date_range, exploration_branch, run_query):
start_date, end_date = date_range.value
df = run_query(start_date.isoformat(), end_date.isoformat(), exploration_branch)
return df, start_date, end_date
# 5) Normalize data types
# Ensure datetime and numeric types are consistent for plotting.
@app.cell
def _(pd, df):
df_norm = df.copy()
if not df_norm.empty:
df_norm["dt"] = pd.to_datetime(df_norm["dt"], errors="coerce")
df_norm["rides"] = (
pd.to_numeric(df_norm["rides"], errors="coerce")
.fillna(0)
.astype("int64")
)
return df_norm
# 6) Display the result table
@app.cell
def _(mo, df_norm):
header = mo.md("## Rides per day")
body = mo.md("No rows for the selected date range.") if df_norm.empty else mo.ui.table(df_norm)
mo.vstack([header, body])
return
# 7) Create a bar chart with matplotlib and embed it in Marimo
@app.cell
def _(mo, df_norm, start_date, end_date, plt):
out = mo.md("No chart: no rows.") if df_norm.empty else None
if not df_norm.empty:
labels = df_norm["dt"].dt.strftime("%Y-%m-%d")
values = df_norm["rides"].values
fig, ax = plt.subplots(figsize=(max(6, len(df_norm) * 0.25), 4))
ax.bar(labels, values)
ax.set_xlabel("Date"); ax.set_ylabel("Rides")
ax.set_title(f"Rides per day ({start_date} \u2192 {end_date})")
ax.tick_params(axis="x", labelrotation=45)
fig.tight_layout()
out = mo.mpl.interactive(fig)
out # last expression renders
if __name__ == "__main__":
app.run()
Run it
Create a virtual environment and install the dependencies:
python -m venv .venv
source .venv/bin/activate
pip install marimo bauplan pandas matplotlib
Then start the marimo app from your terminal:
marimo run marimo_app.py
Open the URL shown in your terminal (usually http://localhost:2718) to view the app in your browser.