Examples

Bauplan handles many tasks that usually fall to dedicated data platform teams, such as managing infrastructure, versioning data, and orchestrating complex workflows. With these capabilities built in, it makes scenarios like deploying ML models, building feature stores, running data quality checks, serving real-time dashboards, and managing data transformation pipelines simpler, faster, and more reproducible.


Serverless Data Product

Serverless data product with built-in quality checks using Lambda and Bauplan.

DataProd Lambda

RAG system with Pinecone

Build a RAG system with Pinecone and OpenAI over StackOverflow data.

Pinecone OpenAI

Medallion Architecture + WAP Pattern

End-to-end data engineering repo using Mage & the medallion architecture.

Medallion Mage Polars

From unstructured to structured data with LLMs

Convert PDFs into structured, analyzable tables using LLMs.

OpenAI PDF Processing Unstructured to Structured

Playlist recommendations with MongoDB

Embedding-based recommender system for music playlists.

MongoDB Vector Search Recs
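
As a rough illustration of what "embedding-based" means here, the sketch below ranks items by cosine similarity between embedding vectors. It is plain Python with made-up two-dimensional vectors; the names `cosine_similarity`, `recommend`, and `track_embeddings` are hypothetical and are not taken from the example repository, which relies on MongoDB vector search rather than an in-memory loop.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(query: List[float], catalog: Dict[str, List[float]], k: int = 2) -> List[str]:
    """Return the k item names whose embeddings are most similar to the query."""
    ranked = sorted(
        catalog,
        key=lambda name: cosine_similarity(query, catalog[name]),
        reverse=True,
    )
    return ranked[:k]


# Toy embeddings (assumed values, for illustration only).
track_embeddings = {
    "track_a": [1.0, 0.0],
    "track_b": [0.9, 0.1],
    "track_c": [0.0, 1.0],
}

top = recommend([1.0, 0.0], track_embeddings, k=2)
print(top)
```

A vector database performs the same nearest-neighbor ranking with an index, so it does not need to scan every item.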

Iceberg Lakehouse Pipeline

Orchestrated WAP pattern for ingesting parquet files to Iceberg tables.

Prefect Pandas Iceberg
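
For readers new to the WAP (write-audit-publish) pattern, here is a minimal sketch of its control flow. It assumes an in-memory dictionary as a stand-in for branched tables; `write_audit_publish`, `no_negative_fares`, and the branch names are hypothetical and do not represent Bauplan, Prefect, or Iceberg APIs.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]
Catalog = Dict[str, List[Row]]  # branch name -> rows; stand-in for a real table store


def write_audit_publish(
    catalog: Catalog,
    new_rows: List[Row],
    audit: Callable[[List[Row]], bool],
    staging: str = "staging",
    target: str = "main",
) -> bool:
    """Return True if new_rows reached the target branch, False if the audit rejected them."""
    # Write: land the new data on an isolated staging branch.
    catalog[staging] = list(catalog.get(target, [])) + list(new_rows)
    # Audit: validate the staged data before any consumer can see it.
    if not audit(catalog[staging]):
        del catalog[staging]  # discard the bad batch; target is untouched
        return False
    # Publish: promote the audited branch to the target in one step.
    catalog[target] = catalog.pop(staging)
    return True


def no_negative_fares(rows: List[Row]) -> bool:
    """Example audit: every fare must be non-negative."""
    return all(row["fare"] >= 0 for row in rows)


catalog: Catalog = {"main": [{"trip_id": 1, "fare": 12.5}]}
ok = write_audit_publish(catalog, [{"trip_id": 2, "fare": 8.0}], no_negative_fares)
rejected = write_audit_publish(catalog, [{"trip_id": 3, "fare": -4.0}], no_negative_fares)
print(ok, rejected, len(catalog["main"]))
```

The point of the pattern is that a failed audit leaves the published data exactly as it was.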

PDF analysis with bauplan and OpenAI

Analyze PDFs using Bauplan for data preparation and OpenAI’s GPT for text analysis.

PDF Processing OpenAI

ML Model Training and Deployment Pipeline

End-to-end ML pipeline for predicting taxi trip tips.

Scikit-Learn Pandas Notebooks Streamlit

Entity Matching with OpenAI

Product matching across e-commerce catalogs using LLMs.

OpenAI Streamlit Pandas DuckDB

Data Quality and Expectations

Implement data quality checks using expectations.

PyArrow Pandas DuckDB
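
To give a sense of what an expectation looks like, the sketch below expresses two checks as functions that return a boolean. It uses plain Python over a list of rows to stay dependency-free; the function names and the sample `trips` data are assumptions for illustration, and the example itself lists PyArrow, Pandas, and DuckDB as its tools.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def expect_column_no_nulls(rows: List[Row], column: str) -> bool:
    """Pass only if every row has a non-null value in the column."""
    return all(row.get(column) is not None for row in rows)


def expect_column_values_between(rows: List[Row], column: str, low: float, high: float) -> bool:
    """Pass only if every non-null value in the column lies in [low, high]."""
    return all(
        low <= row[column] <= high
        for row in rows
        if row.get(column) is not None
    )


# Sample data (made up): the second trip is missing its tip amount.
trips = [
    {"trip_id": 1, "tip_amount": 2.0},
    {"trip_id": 2, "tip_amount": None},
]

results = {
    "trip_id_not_null": expect_column_no_nulls(trips, "trip_id"),
    "tip_amount_not_null": expect_column_no_nulls(trips, "tip_amount"),
    "tip_amount_in_range": expect_column_values_between(trips, "tip_amount", 0.0, 500.0),
}
failed = [name for name, passed in results.items() if not passed]
print(failed)
```

Returning a boolean per check lets a pipeline decide whether to stop, warn, or continue when an expectation fails.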

Near Real-time Analytics

Build a near real-time analytics pipeline with the WAP pattern and metrics visualization.

Prefect Streamlit DuckDB

Interactive Data Dashboard

Build an interactive dashboard to visualize taxi pickup locations in NYC.

Streamlit Pandas