

Genie, Agent Bricks, or Build Your Own on Databricks Lakebase
Data Engineering leaders deciding whether to adopt Genie, build a custom text-to-SQL stack, or wire something in between.
90 minutes. Live build, not slides. Real workspace, real data, a real LLM call across an HTTP boundary you control.
What you will see
I'll go from an empty Databricks workspace to a working text-to-SQL agent that:
Joins live OLTP rows in Lakebase (managed Postgres) with pre-aggregated gold tables in Unity Catalog Delta — through Lakehouse Federation, in a single query.
Generates SQL via a pluggable LLM endpoint — Databricks Model Serving, OpenAI-compatible APIs, or a self-hosted vLLM on a neo-cloud GPU — switched with one environment variable.
Validates every SQL string before execution with a SELECT-only safety guardrail that catches the Databricks-specific destructive ops generic validators miss (OPTIMIZE, VACUUM, ZORDER, COPY).
Is auditable end-to-end: one question = one LLM call, one SQL statement, one execution. No autonomous loops, no surprise bills.
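To make the guardrail idea concrete, here is a minimal sketch of a SELECT-only validator that also blocks the Databricks-specific maintenance ops listed above. This is an illustration, not db-agent's actual ~30-line validator; the function name and keyword list are assumptions.

```python
import re

# Keywords that mutate data or metadata, including Databricks-specific
# ops (OPTIMIZE, VACUUM, ZORDER, COPY) that generic validators miss.
BLOCKED = {
    "INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "CREATE", "TRUNCATE",
    "MERGE", "GRANT", "REVOKE",
    "OPTIMIZE", "VACUUM", "ZORDER", "COPY",  # Databricks-specific
}

def is_safe_select(sql: str) -> bool:
    """Return True only for a single read-only statement."""
    # Strip comments so blocked keywords can't hide inside them.
    stripped = re.sub(r"--.*?$|/\*.*?\*/", " ", sql, flags=re.S | re.M)
    # Reject multi-statement payloads (e.g. "SELECT 1; DROP TABLE t").
    statements = [s for s in stripped.split(";") if s.strip()]
    if len(statements) != 1:
        return False
    stmt = statements[0].strip()
    # Must start with SELECT, or WITH for a CTE.
    if not re.match(r"^(SELECT|WITH)\b", stmt, flags=re.I):
        return False
    # No blocked keyword anywhere in the statement.
    tokens = set(re.findall(r"[A-Za-z_]+", stmt.upper()))
    return tokens.isdisjoint(BLOCKED)
```

A keyword blocklist like this is deliberately conservative: it will reject a query that merely selects a column named `copy`, which is the right trade-off when the SQL comes from an LLM.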
What you will leave with
A decision framework for db-agent vs Genie vs Agent Bricks for your specific use case — including when not to build.
The companion open-source db-agent repo (presented at AAAI-25, ships a Databricks Apps deployment variant) and a quick-lab repo with a step-by-step build.
A reference architecture diagram and the actual code — pipeline orchestrator is ~60 lines of Python, safety validator is ~30.
Specific gotchas that cost me a half-day each: federation database options, Lakebase token rotation, Streamlit/Apps reverse-proxy traps, context-window blowouts on real catalogs.
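The pluggable LLM layer mentioned above can be sketched roughly like this. All environment-variable names and defaults here are illustrative assumptions, not db-agent's actual configuration; the point is that Databricks Model Serving, OpenAI, and self-hosted vLLM all speak an OpenAI-compatible API, so switching providers is a matter of resolving a different base URL and key.

```python
import os

def resolve_llm_endpoint() -> dict:
    """Pick an OpenAI-compatible base URL + auth from one env variable.

    LLM_PROVIDER selects the backend; the calling request code stays
    identical because all three expose /chat/completions routes.
    """
    provider = os.environ.get("LLM_PROVIDER", "databricks")
    if provider == "databricks":
        return {
            "base_url": f"{os.environ['DATABRICKS_HOST']}/serving-endpoints",
            "api_key": os.environ["DATABRICKS_TOKEN"],
            "model": os.environ.get("LLM_MODEL", "databricks-model"),
        }
    if provider == "openai":
        return {
            "base_url": "https://api.openai.com/v1",
            "api_key": os.environ["OPENAI_API_KEY"],
            "model": os.environ.get("LLM_MODEL", "gpt-4o-mini"),
        }
    if provider == "vllm":  # self-hosted OpenAI-compatible server
        return {
            "base_url": os.environ["VLLM_BASE_URL"],  # e.g. http://gpu-box:8000/v1
            "api_key": os.environ.get("VLLM_API_KEY", "not-needed"),
            "model": os.environ["LLM_MODEL"],
        }
    raise ValueError(f"Unknown LLM_PROVIDER: {provider}")
```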
Who this is for
Heads of Data, Data Engineering Managers, Staff and Principal Data Engineers.
Teams already on Databricks (or evaluating) who are being asked: "Can we put an AI agent on top of this?"
Anyone making a build-vs-buy call between Genie, Agent Bricks, and a custom text-to-SQL stack who wants to make it with their eyes open.
This is a technical session. We'll read code. Bring your senior engineers.
Agenda
The architecture in one slide (5 min)
Lakebase + Unity Catalog + Lakehouse Federation — why both data planes, and what breaks (15 min)
The agent pipeline — schema → prompt → LLM → validate → execute (20 min)
The SQL safety guardrail — what generic SELECT-only validators miss on Databricks (10 min)
The pluggable LLM layer — live swap from a hosted API to a self-hosted vLLM on a neo-cloud GPU (15 min)
db-agent vs Genie vs Agent Bricks — when to use which, and why (10 min)
Q&A (15 min)
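The pipeline step in the agenda — schema → prompt → LLM → validate → execute — can be sketched as a single pass with no agent loop. Every name below is a hypothetical stand-in, not db-agent's actual API; it only shows why one question maps to exactly one LLM call and one SQL execution.

```python
from typing import Callable

def answer_question(
    question: str,
    fetch_schema: Callable[[], str],        # reads table/column DDL
    call_llm: Callable[[str], str],         # one prompt in, SQL out
    is_safe_select: Callable[[str], bool],  # SELECT-only guardrail
    execute_sql: Callable[[str], list],     # runs against the warehouse
) -> list:
    # 1. Schema: ground the model in the real catalog.
    schema = fetch_schema()
    # 2. Prompt: one templated prompt, no autonomous loop.
    prompt = (
        "Given this schema:\n" + schema +
        "\nWrite ONE read-only SQL query answering: " + question
    )
    # 3. LLM: exactly one call per question (auditable, bounded cost).
    sql = call_llm(prompt).strip().rstrip(";")
    # 4. Validate: refuse anything that isn't a safe SELECT.
    if not is_safe_select(sql):
        raise ValueError(f"Rejected unsafe SQL: {sql!r}")
    # 5. Execute: one statement, one result set.
    return execute_sql(sql)
```

Because each stage is injected as a callable, the same orchestrator works against Lakebase, Unity Catalog, or a test stub.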
About the Speaker
Chandan Kumar — founder of BeCloudReady, organizer of the TorontoAI community (10K+ members), and a Databricks Partner. Maintainer of the open-source db-agent text-to-SQL agent, presented at AAAI-25. Runs the Databricks Lakehouse Bootcamp and works with engineering teams on getting AI agents into production against real data.