Database School

30 episodes

Infinite, shareable volume storage with Hunter Leath, Archil CEO
15/01/2026 | 55 min
Hunter Leath, CEO of Archil, explains how they’re building a “universal storage engine” that sits between your apps and S3, making an S3 bucket behave like a fast, POSIX-compatible disk for containers, servers, and even Lambda. Along the way, we dig into how their SSD-backed clusters and custom protocol avoid the usual small-file pain, and where this approach shines (and where it doesn’t).
Follow Hunter:
Twitter/X: https://twitter.com/jhleath
Archil Twitter/X: https://twitter.com/archildata
Archil: https://archil.com/
Follow Aaron:
Twitter/X: https://twitter.com/aarondfrancis
Database School: https://databaseschool.com
Database School YouTube Channel: https://www.youtube.com/@UCT3XN4RtcFhmrWl8tf_o49g (Subscribe today)
LinkedIn: https://www.linkedin.com/in/aarondfrancis
Website: https://aaronfrancis.com - find articles, podcasts, courses, and more.
Chapters:
00:00 - Intro: Archil Data and “S3 as a disk”
01:05 - Hunter’s background and the core pitch
02:32 - The real problem: state management (S3 vs block storage)
05:02 - SQLite on S3: what the stack looks like
07:13 - The missing layer: durable SSD-backed clusters
10:14 - Who uses this: unstructured data, CI/CD, Git, agents
12:15 - Small files + Git performance and avoiding S3 request explosion
16:22 - Why they built a new protocol (NFS vs Lustre)
20:00 - What gets written to S3: real files in your bucket
22:29 - S3 limits, throttling, and the “keep it on SSD” escape hatch
25:32 - Multi-cloud + R2, and why regions/latency matter
32:10 - Pricing model: “pay only when data is active”
34:41 - Tradeoffs: random reads and ultra-low-latency metal
37:19 - Storage/compute separation and AI/agent-native workflows
43:21 - YC timeline + the marketing challenge of a “universal layer”
47:34 - Single-tenant clusters for enterprises and why it’s hard
50:27 - Where the company is now, hiring, and how to try it (disk.new)
Building search for AI systems with Chroma CTO Hammad Bashir
18/12/2025 | 1 h 6 min
Hammad Bashir, CTO of Chroma, joins the show to break down how modern vector search systems are actually built, from local, embedded databases to massively distributed, object-storage-backed architectures. We dig into Chroma’s shared local-to-cloud API, log-structured storage on object stores, hybrid search, and why retrieval-augmented generation (RAG) isn’t going anywhere.
Follow Hammad:
Twitter/X: https://twitter.com/HammadTime
LinkedIn: https://www.linkedin.com/in/hbashir
Chroma: https://trychroma.com
Follow Aaron:
Twitter/X: https://twitter.com/aarondfrancis
Database School: https://databaseschool.com
Database School YouTube Channel: https://www.youtube.com/@UCT3XN4RtcFhmrWl8tf_o49g (Subscribe today)
LinkedIn: https://www.linkedin.com/in/aarondfrancis
Website: https://aaronfrancis.com - find articles, podcasts, courses, and more.
Chapters:
00:00 – Introduction: From high-school ASICs to CTO of Chroma
01:04 – Hammad’s background and why vector search stuck
03:01 – Why Chroma has one API for local and distributed systems
05:37 – Local experimentation vs production AI workflows
08:03 – What “unprincipled data” means in machine learning
10:31 – From computer vision to retrieval for LLMs
13:00 – Exploratory data analysis and why looking at data still matters
16:38 – Promoting data from local to Chroma Cloud
19:26 – Why Chroma is built on object storage
20:27 – Write-ahead logs, batching, and durability
26:56 – Compaction, inverted indexes, and storage layout
29:26 – Strong consistency and reading from the log
34:12 – How queries are routed and executed
37:00 – Hybrid search: vectors, full-text, and metadata
41:03 – Chunking, embeddings, and retrieval boundaries
43:22 – Agentic search and letting models drive retrieval
45:01 – Is RAG dead? A grounded explanation
48:24 – Why context windows don’t replace search
56:20 – Context rot and why retrieval reduces confusion
01:00:19 – Faster models and the future of search stacks
01:02:25 – Who Chroma is for and when it’s a great fit
01:04:25 – Hiring, team culture, and where to follow Chroma
Scaling DuckDB in the cloud with MotherDuck CEO Jordan Tigani
11/12/2025 | 1 h 5 min
In this episode of Database School, Aaron Francis sits down with Jordan Tigani, co-founder and CEO of MotherDuck, to break down what DuckDB is, how MotherDuck hosts it in the cloud, and why analytics workloads are shifting toward embedded databases. They dig into DuckLake, pricing models, scaling strategies, and what it really takes to build a modern cloud data warehouse.
Follow Jordan:
Twitter/X: https://twitter.com/jrdntgn
LinkedIn: https://www.linkedin.com/in/jordantigani
MotherDuck: https://motherduck.com
Follow Aaron:
Twitter/X: https://twitter.com/aarondfrancis
Database School: https://databaseschool.com
Database School YouTube Channel: https://www.youtube.com/@UCT3XN4RtcFhmrWl8tf_o49g (Subscribe today)
LinkedIn: https://www.linkedin.com/in/aarondfrancis
Website: https://aaronfrancis.com - find articles, podcasts, courses, and more.
Chapters:
00:00 - Introduction
01:44 - What DuckDB is and why embedded analytics matter
04:03 - How MotherDuck hosts DuckDB in the cloud
05:18 - Is MotherDuck like the “Turso for DuckDB”?
07:38 - Isolated analytics per user and scaling to zero
08:51 - The academic origins of DuckDB
10:00 - From SingleStore to founding MotherDuck
12:28 - Getting fired… and funded 12 days later
16:39 - Jordan’s background: Kernel dev, BigQuery, and Product
18:36 - Partnering with DuckDB Labs and avoiding a fork
20:52 - Why MotherDuck targets startups and the long tail
24:22 - Pricing lessons: why $25 was too cheap
28:11 - Ducklings, instance sizing, and compute scaling
34:16 - How MotherDuck separates compute and storage
37:09 - Inside the AWS architecture and differential storage
43:12 - Hybrid execution: joining local and cloud data
45:14 - Analytics vs warehouses vs operational databases
47:41 - Data lakes, Iceberg, and what DuckLake actually is
53:22 - When DuckLake makes more sense than DuckDB alone
56:09 - Who switches to MotherDuck and why
58:02 - pg_duckdb and offloading analytics from Postgres
1:00:49 - Who should use MotherDuck and why
1:03:39 - Hiring plans and where to follow Jordan
1:05:01 - Wrap-up
Just use Postgres with Denis Magda
04/12/2025 | 1 h 7 min
In this episode, Aaron talks with Denis Magda, author of Just Use Postgres!, about the wide world of modern Postgres, from JSON and full-text search to generative AI, time-series storage, and even message queues. They explore when Postgres should be your go-to tool, when it shouldn’t, and why understanding its breadth helps developers build better systems.
Use the code DBSmagda to get 45% off Denis' new book Just Use Postgres!
Order Just Use Postgres!
Follow Denis:
Twitter/X: https://twitter.com/denismagda
LinkedIn: https://www.linkedin.com/in/dmagda
Follow Aaron:
Twitter/X: https://twitter.com/aarondfrancis
Database School: https://databaseschool.com
Database School YouTube Channel: https://www.youtube.com/@UCT3XN4RtcFhmrWl8tf_o49g (Subscribe today)
LinkedIn: https://www.linkedin.com/in/aarondfrancis
Website: https://aaronfrancis.com - find articles, podcasts, courses, and more.
Chapters:
00:00 – Welcome
01:28 – Denis’ Background: Java, JVM, and Databases
03:20 – Bridging Application Development & Databases
04:05 – Moving Down the Stack: How Denis Entered Databases
07:28 – Apache Ignite, Distributed Systems & the Path to Postgres
08:02 – Writing Just Use Postgres!: The Origin Story
10:26 – Why a Modern Postgres Book Was Needed
11:01 – The Spark That Led to the Book Proposal
13:06 – Developers Still Don’t Know What Postgres Can Do
15:40 – Connecting With Manning & Refining the Book Vision
16:38 – What Just Use Postgres! Covers
17:40 – The Book’s Core Thesis: The Breadth of Postgres
19:50 – Favorite Use Cases & Learning While Writing
20:30 – When to Use Postgres for Non-Relational Workloads
23:08 – Full Text Search in Postgres Explained
29:31 – When Not to Use Postgres (Pragmatism Over Fanaticism)
34:01 – Using Postgres as a Message Queue
42:09 – When Message Queues Outgrow Postgres
48:10 – Postgres for Generative AI (pgvector)
55:34 – Denis’ 14-Month Writing Process
01:00:50 – Who the Book Is For
01:04:10 – Where to Follow Denis & Closing Thoughts
Strictly typed SQL with Contra CTO, Gajus Kuizinas
20/11/2025 | 59 min
In this episode, Gajus Kuizinas, co-founder and CTO of Contra, joins Aaron to talk about building the engineering world you want to live in, from strict runtime-validated SQL with Slonik to creating high-ownership engineering cultures. They dive into developer experience, runtime assertions, SafeQL, and even “Loom-driven development,” a powerful review process that lets teams move fast without breaking things.
Follow Gajus:
Twitter/X: https://twitter.com/kuizinas
Slonik: https://github.com/gajus/slonik
Scaling article: https://gajus.medium.com/lessons-learned-scaling-postgresql-database-to-1-2bn-records-month-edc5449b3067
Follow Aaron:
Twitter/X: https://twitter.com/aarondfrancis
Database School: https://databaseschool.com
Database School YouTube Channel: https://www.youtube.com/@UCT3XN4RtcFhmrWl8tf_o49g (Subscribe today)
LinkedIn: https://www.linkedin.com/in/aarondfrancis
Website: https://aaronfrancis.com - find articles, podcasts, courses, and more.
Chapters:
00:00 – Introduction
01:03 – Meet Gajus and Contra
01:48 – What Contra does and how it’s different
05:34 – Why Slonik exists & early career origins
07:47 – The early Node.js era and frustrations with ORMs
09:50 – SQL vs abstractions and the case for raw SQL
10:35 – Template tags and the breakthrough idea
12:03 – Strictness, catching errors early & data shape guarantees
13:37 – Runtime type checking, Zod, and performance debates
16:02 – SafeQL and real-time schema linting
17:01 – Synthesizing Slonik’s philosophy
21:29 – Handling drift, static types vs reality
22:52 – Defining schemas per-query & why it matters
27:59 – Integrating runtime types with large test suites
31:00 – Scaling the team and performance tradeoffs
33:41 – Runtime validation cost vs developer productivity
35:21 – Real drift examples from payments & external APIs
38:21 – User roles, data shape differences & edge cases
39:51 – Integration test safety & catching issues pre-deploy
40:52 – Contra’s engineering culture
41:47 – Why traditional PR reviews don’t scale
43:22 – Introducing Loom-Driven Development
45:12 – How looms transformed the review process
52:38 – Using GetDX to measure engineering friction
53:07 – How the team uses AI (Claude, etc.)
56:26 – Closing thoughts on DX and engineering philosophy
58:05 – Contra needs Postgres experts
59:00 – Where to find Gajus