The Information Bottleneck

28 episodes

EP28: How to Control a Stochastic Agent with Stefano Soatto (VP AWS / Prof. UCLA)
06/03/2026 | 1 h 2 min
Stefano Soatto, VP for AI at AWS and Professor at UCLA, the person responsible for agentic AI at AWS, joins us to explain why building reliable AI agents is fundamentally a control theory problem.
Stefano sees LLMs as stochastic dynamical systems that need to be controlled, not just prompted. He introduces "strands coding," a new framework AWS is building that sits between vibe coding and spec coding: you write a skeleton with AI functions constrained by pre- and post-conditions, so intent is verified before a single line of code is generated. The surprising part: even as AI coding adoption goes up, developer trust in the output is going down.
We go deep into the philosophy of models and the world. Stefano argues that the dichotomy between "language models" and "world models" doesn't really exist: a reasoning engine trained on rich enough data is a world model. He walks us through why naive realism is indefensible, how reverse diffusion was originally intended to show that models can't be identical to reality, and why that matters now.
We also discuss three types of information (Shannon, algorithmic, and conceptual) and why algorithmic information is the one that actually matters to agents. Synthetic data doesn't add Shannon information, but it adds algorithmic information, which is why it works. Intelligence isn't about scaling to Solomonoff's universal induction; it's about learning to solve new problems fast.

Takeaways:
Vibe coding is local feedback control with high cognitive load; spec coding is open-loop global control with silent failures. Neither scales well alone.
Trust in AI-generated code is declining even as adoption rises.
The distinction between next-token prediction and world model is mostly nomenclature - reasoning engines operating on multimodal data are world models.
Algorithmic information, not Shannon information, is what matters in the agentic setting.
Intelligence isn't minimizing inference uncertainty - it's minimizing time to solve unforeseen tasks.
The intent gap between user and model cannot be fully automated or delegated.

Timeline
(00:13) Introduction and guest welcome
(01:12) How the agentic era changed machine learning
(06:11) Vibe coding one year later
(07:23) Vibe vs. spec vs. strands coding
(14:30) Why English is not a programming language
(16:36) Constrained generation and agent choreography
(20:44) Diffusion models vs. autoregressive models
(25:59) The platonic representation hypothesis and naive realism
(31:14) Synthetic data and the information bottleneck
(36:22) Three types of information: Shannon, algorithmic, conceptual
(38:47) Scaling laws and Solomonoff induction
(42:14) World models and the Goethian vs. Marrian approach
(49:00) Encoding vs. generation and JEPA-style training
(55:50) Are language models already world models?
(59:13) Closing thoughts on trust, education, and responsibility

Music:
"Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
"Palms Down" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
Changes: trimmed
About
The Information Bottleneck is hosted by Ravid Shwartz-Ziv and Allen Roush, featuring in-depth conversations with leading AI researchers about the ideas shaping the future of machine learning.
EP27: Medical Foundation Models - with Tanishq Abraham (Sophont.AI)
02/03/2026 | 1 h 25 min
Tanishq Abraham, CEO and co-founder of Sophont.ai, joins us to talk about building foundation models specifically for medicine.
Sophont is trying to be something like an OpenAI or Anthropic but for healthcare - training models across pathology, neuroimaging, and clinical text, to eventually fuse them into one multimodal system. The surprising part: their pathology model trained on 12,000 public slides performs on par with models trained on millions of private ones. Data quality beats data quantity.
We talk about what actually excites Tanishq, which is not replacing doctors but finding things doctors can't see, such as AI predicting gene mutations from a tissue slide or cardiovascular risk from an eye scan.
We also talk about regulation and how the picture is less scary than people assume. Text-based clinical decision support can ship without FDA approval. Pharma partnerships offer near-term impact. The five-to-ten-year timeline people fear is really about drug discovery, not all of medical AI.

Takeaways:
The real promise of medical AI is finding hidden signals in existing data, not just automating doctors
Small, curated public datasets can rival massive private ones
Multimodal fusion is the goal, but you need strong individual encoders first
AI research itself might get automated sooner than biology or chemistry
FDA regulation has more flexibility than most people think

Timeline
(00:12) Introduction and guest welcome
(02:32) Anthropic's ad about ChatGPT ads
(07:26) xAI merging into SpaceX
(13:32) Vibe coding one year later
(17:00) Claude Code and agentic workflows
(21:52) Can AI automate AI research?
(26:57) What is medical AI
(31:06) Sophont as a frontier medical AI lab
(33:52) Public vs. private data - 12K slides vs. millions
(36:43) Domain expertise vs. scaling
(41:54) Cancer, diabetes, and personal stakes
(47:52) Classification vs. prediction in medicine
(50:36) When doctors disagree
(54:43) Quackery and AI
(57:15) Uncertainty in medical AI
(1:03:11) Will AI replace doctors?
(1:07:24) Self-supervised learning on sleep data
(1:10:10) Aligning modalities
(1:13:17) FDA regulation
(1:22:28) Closing

EP26: Measuring Intelligence in the Wild - Arena and the Future of AI Evaluation
24/02/2026 | 44 min
Anastasios Angelopoulos, Co-Founder and CEO of Arena AI (formerly LMArena), joins us to talk about why static benchmarks are failing, how human preference data actually works under the hood, and what it takes to be the "gold standard" of AI evaluation.
Anastasios sits at a fascinating intersection - a theoretical statistician running the platform that every major lab watches when they release a model. We talk about the messiness of AI-generated code slop (yes, he hides Claude's commits too), then dig into the statistical machinery that powers Arena's leaderboards and why getting evaluation right is harder than most people think.
We explore why style control is both necessary and philosophically tricky: you can regress away markdown headers and response length, but separating style from substance is a genuinely unsolved causal inference problem. We also get into why users are surprisingly good judges of model quality, how Arena serves as a pre-release testing ground for labs shipping stealth models under codenames, and whether the fragmentation of the AI market (Anthropic going enterprise, OpenAI going consumer, everyone going multimodal) is actually a feature, not a bug. Plus, we discuss the role of rigorous statistics in the age of "just run it again," why structured decoding can hurt model performance, and what Arena's 2026 roadmap looks like.

Timeline:
(00:12) Introduction and Anastasios's Background
(00:55) What Arena Does and Why Static Benchmarks Aren't Enough
(02:26) Coverage of Use Cases - Is There Enough?
(04:22) Style Control and the Bradley-Terry Methodology
(08:35) Can You Actually Separate Style from Substance?
(10:24) Measuring Slop - And the Anti-Slop Paper Plug
(11:52) Can Users Judge Factual Correctness?
(13:31) Tool Use and Agentic Evaluation on Arena
(14:14) Intermediate Feedback Signals Beyond Final Preference
(15:30) Tool Calling Accuracy and Code Arena
(17:42) AI-Generated Code Slop and Hiding Claude's Commits
(19:49) Do We Need Separate Code Streams for Humans and LLMs?
(20:01) RL Flywheels and Arena's Preference Data
(21:16) Focus as a Startup - Being the Evaluation Company
(22:16) Structured vs. Unconstrained Generation
(25:00) The Role of Rigorous Statistics in the Age of AI
(29:23) LLM Sampling Parameters and Evaluation Complexity
(30:56) Model Versioning and the Frequentist Approach to Fairness
(32:12) Quantization and Its Effects on Model Quality
(33:10) Pre-Release Testing and Stealth Models
(34:23) Transparency - What to Share with the Public vs. Labs
(36:27) When Winning Models Don't Get Released
(36:59) Why Users Keep Coming Back to Arena
(38:19) Market Fragmentation and Arena's Future Value
(39:37) Custom Evaluation Frameworks for Specific Users
(40:03) Arena's 2026 Roadmap - Science, Methodology, and New Paradigms
(42:15) The Economics of Free Inference
(43:13) Hiring and Closing Thoughts

EP25: Personalization, Data, and the Chaos of Fine-Tuning with Fred Sala (UW-Madison / Snorkel AI)
17/02/2026 | 1 h 15 min
Fred Sala, Assistant Professor at UW-Madison and Chief Scientist at Snorkel AI, joins us to talk about why personalization might be the next frontier for LLMs, why data still matters more than architecture, and how weak supervision refuses to die.
Fred sits at a rare intersection, building the theory of data-centric AI in academia while shipping it to enterprise clients at Snorkel. We talk about the chaos of OpenClaw (the personal AI assistant that's getting people hacked the old-fashioned way, via open ports), then focus on one of the most important questions: how do you make a model truly yours?
We dig into why prompting your preferences doesn't scale, why even LoRA might be too expensive for per-user personalization, and why activation steering methods like ReFT could be the sweet spot. We also explore self-distillation for continual learning, the unsolved problem of building realistic personas for evaluation, and Fred's take on the data vs. architecture debate (spoiler: data is still undervalued). Plus, we discuss why the internet's "Ouroboros effect" might not doom pre-training as much as people fear, and what happens when models become smarter than the humans who generate their training data.

Takeaways:
Personalization requires ultra-efficient methods - even one LoRA per user is probably too expensive. Activation steering is the promising middle ground.
The "pink elephant problem" makes prompt-based personalization fundamentally limited - telling a model what not to do often makes it do it more.
Self-distillation can enable on-policy continual learning without expensive RL reward functions, dramatically reducing catastrophic forgetting.
Data is still undervalued relative to architecture and compute, especially high-quality post-training data, which is actually improving, not getting worse.
Weak supervision principles are alive and well inside modern LLM data pipelines, even if people don't call it that anymore.

Timeline:
(00:13) Introduction and Fred's Background
(00:39) OpenClaw — The Personal AI Assistant Taking Over Macs
(03:43) Agent Security Risks and the Privacy Problem
(05:13) Claude Code, Permissions, and Living Dangerously
(07:47) AI Social Media and Agents Talking to Each Other
(08:56) AI Persuasion and Competitive Debate
(09:51) Self-Distillation for Continual Learning
(12:43) What Does Continual Learning Actually Mean?
(14:12) Updating Weights on the Fly — A Grand Challenge
(15:09) The Personalization Problem — Motivation and Use Cases
(17:41) The Pink Elephant Problem with Prompt-Based Personalization
(19:58) Taxonomy of Personalization — Preferences vs. Tone vs. Style
(21:31) Activation Steering, ReFT, and Parameter-Efficient Fine-Tuning
(27:00) Evaluating Personalization — Benchmarks and Personas
(31:14) Unlearning and Un-Personalization
(31:51) Cultural Alignment as Group-Level Personalization
(41:00) Can LLM Personas Replace Surveys and Polling?
(44:32) Is Continued Pre-Training Still Relevant?
(46:28) Data vs. Architecture — What Matters More?
(52:25) Multi-Epoch Training — Is It Over?
(54:53) What Makes Good Data? Matching Real-World Usage
(59:23) Decomposing Uncertainty for Better Data Selection
(1:01:52) Mapping Human Difficulty to Model Difficulty
(1:04:49) Scaling Small Ideas — From Academic Proof to Frontier Models
(1:12:01) What Happens When Models Surpass Human Training Data?
(1:15:24) Closing Thoughts
EP24: Can AI Learn to Think About Money? - with Bayan Bruss (Capital One)
08/02/2026 | 1 h 31 min
Bayan Bruss, VP of Applied AI at Capital One, joins us to talk about building AI systems that can make autonomous financial decisions, and why money might be the hardest problem in machine learning.
Bayan leads Capital One's AI Foundations team, where they're working toward a destination most people don't associate with banking: getting AI systems to perceive financial ecosystems, form beliefs about the future, and take actions based on those beliefs. It's a framework that sounds simple until you realize you're asking a model to predict whether someone will pay back a loan over 30 years while the world changes around them.
We get into why LLMs are a bad fit for ingesting 5,000 credit card transactions, why synthetic data works surprisingly well for time series, and the tension between end-to-end learning and regulatory requirements that demand you know exactly what your model learned. We also discuss reasoning in language vs. in latent space - if you wouldn't trust a self-driving car that translated images to words before deciding to turn, should you trust a financial system that does all its reasoning in token space?
Takeaways:
Money is a behavioral science problem - AI in finance requires understanding people, not just numbers.
Foundation models pre-trained on web text don't outperform purpose-built models for financial tasks. You're better off building a standalone encoder for financial data.
Synthetic data works surprisingly well for time series - possibly because real-world time series lives on a simpler manifold than we assume.
Explainability in ML is fundamentally unsatisfying because people want causality from non-causal models.
Financial AI needs world models that can imagine alternative futures, not just fit historical data.

Timeline:
(00:24) Introduction and Bayan's Background
(00:42) Claude Code, Vibe Coding - Hype or AGI?
(05:59) The Future of Software Engineering and Abstraction
(11:20) Abstraction Layers and Karpathy's Take
(13:54) Hamming, Kuhn, and Scientific Revolutions in AI
(19:24) Stack Overflow's Decline and Proof of Humanity
(23:07) Why We Still Trust Humans Over LLMs
(30:45) Deep Dive: AI in Banking and Consumer Finance
(34:17) Are Markets Efficient? Behavioral Economics vs. Classical Views
(37:14) The Components of a Financial Decision: Perception, Belief, Action
(42:15) Protected Variables, Proxy Features, and Fairness in Lending
(45:05) Explainability: Roller Skating on Marbles
(47:55) Sparse Autoencoders, Interpretability, and Turtles All the Way Down
(51:57) Foundation Models for Finance — Web Text vs. Purpose-Built
(53:09) Time Series, Synthetic Data, and TabPFN
(59:44) Feeding Tabular Data to VLMs - Graphs Beat Raw Numbers
(1:03:35) Reasoning in Language vs. Latent Space
(1:08:24) Is Language the Optimal Representation? Chinese Compression and Information Density
(1:13:37) Personalization and Predicting Human Behavior
(1:21:36) World Models, Uncertainty, and Professional Worrying
(1:24:07) Prediction Markets and Insider Betting
(1:26:33) Can LLMs Predict Stocks?
(1:29:11) Multi-Agent Systems for Financial Decisions


About The Information Bottleneck

Two AI researchers, Ravid Shwartz-Ziv and Allen Roush, discuss the latest trends, news, and research within Generative AI, LLMs, GPUs, and Cloud Systems.
