What if 88% of your AI queries didn't need a massive data center, but could run directly on your laptop? In this episode, we dive into "Intelligence per Watt"—a new metric redefining how we measure AI efficiency. We explore how smaller, local models are rapidly catching up to frontier giants, potentially saving billions in energy costs and democratizing access to intelligence. Inspired by the work of Jon Saad-Falcon, Avanika Narayan, and their team at Stanford and Together AI, this episode was created using Google’s NotebookLM. Read the original paper here: https://arxiv.org/abs/2511.07885v1
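The metric itself is simple to state: capability delivered per unit of power. Here is a minimal sketch, assuming (hypothetically) that "intelligence" is scored as task accuracy and energy as average power draw in watts; the figures below are made-up illustrative numbers, not measurements from the paper.

```python
# Sketch of "intelligence per watt": capability per unit of average power.
# Accuracy and wattage values are illustrative assumptions only.

def intelligence_per_watt(accuracy: float, avg_power_watts: float) -> float:
    """Capability delivered per watt of average power draw."""
    return accuracy / avg_power_watts

# A hypothetical local model on a laptop vs. a frontier model in a data center.
local = intelligence_per_watt(accuracy=0.78, avg_power_watts=45.0)
frontier = intelligence_per_watt(accuracy=0.92, avg_power_watts=1200.0)

print(f"local IPW:    {local:.4f}")
print(f"frontier IPW: {frontier:.4f}")
print("local wins on efficiency:", local > frontier)
```

The point of the example: a model can lose on raw accuracy yet win decisively on efficiency, which is exactly the case the episode makes for local models.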
--------
11:28
--------
When AI Learns From Its Own Context — Self-Improving Language Models
We're all trying to find the perfect "prompt," but what happens when our instructions to an AI get too complex? New research shows they can suddenly fail or "collapse," losing all their knowledge. In this episode, we explore "Agentic Context Engineering," a new framework that avoids this. Instead of a static prompt, it builds an "evolving playbook" that allows the AI to learn from every single task, failure, and success. Inspired by the work of Qizheng Zhang, Changran Hu, and colleagues, this episode was created using Google’s NotebookLM. Read the original paper here: https://arxiv.org/abs/2510.04618
--------
17:16
--------
Will Your Next Prompt Engineer Be an AI?
What if you could get the performance of a massive, 100-example prompt, but with 13 times fewer tokens? That’s the breakthrough promise of "instruction induction"—teaching an AI to be the prompt engineer. This week, we dive into PROMPT-MII, a new framework that essentially meta-learns how to write compact, high-performance instructions for LLMs. It’s a reinforcement learning approach that could make AI adaptation both cheaper and more effective. This episode explores the original research by Emily Xiao, Yixiao Zeng, Ada Chen, Chin-Jou Li, Amanda Bertsch, and Graham Neubig from Carnegie Mellon University. Read the full paper here for a deeper dive: https://arxiv.org/abs/2510.16932
--------
17:58
--------
The Vision Hack: How a Picture Solved AI's Biggest Memory Problem
The biggest bottleneck for AIs handling massive documents—the context window—just got a radical fix. DeepSeek AI's DeepSeek-OCR uses a counterintuitive trick: it turns text into an image to compress it by up to 10 times without losing accuracy. That means your AI can suddenly read the equivalent of 20 million tokens (entire codebases or legal troves) efficiently! This episode dives into the elegant vision-based solution, the power of its Mixture of Experts architecture, and why some experts believe all AI input should become an image. Original Research: DeepSeek-OCR is a breakthrough by the DeepSeek AI team. Content generated with the help of Google's NotebookLM. Link to the Original Research Paper: https://deepseek.ai/blog/deepseek-ocr-context-compression
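A back-of-the-envelope sketch of why rendering text as an image can compress context: each image patch token can cover far more characters than a text token does. The figures below (characters per text token, characters per vision patch) are illustrative assumptions for this sketch, not DeepSeek-OCR's actual numbers.

```python
# Illustrative estimate of text-token vs. vision-token counts for a document.
# Both ratios below are assumptions chosen for the sketch, not measured values.

def text_tokens(n_chars: int, chars_per_token: float = 4.0) -> int:
    # Typical BPE tokenizers average roughly 4 characters per token (assumption).
    return round(n_chars / chars_per_token)

def vision_tokens(n_chars: int, chars_per_patch: float = 40.0) -> int:
    # Assume a rendered page packs ~40 characters into each image patch token.
    return round(n_chars / chars_per_patch)

doc_chars = 200_000  # e.g., a long legal document
t, v = text_tokens(doc_chars), vision_tokens(doc_chars)
print(f"text tokens:   {t}")
print(f"vision tokens: {v}")
print(f"compression:   {t / v:.0f}x")
```

With these assumed ratios the sketch lands on a 10x reduction, matching the order of magnitude the episode describes.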
--------
14:22
--------
Smarter Agents, Less Budget: Reinforcement Learning with Tree Search
Training AI agents using Reinforcement Learning (RL) to handle complex, multi-turn tasks is notoriously difficult. Traditional methods face two major hurdles: high computational costs (generating numerous interaction scenarios, or "rollouts," is expensive) and sparse supervision (rewards are only given at the very end of a task, making it hard for the agent to learn which specific steps were useful).

In this episode, we explore "Tree Search for LLM Agent Reinforcement Learning," by researchers from Xiamen University, AMAP (Alibaba Group), and the Southern University of Science and Technology. They introduce a novel approach called Tree-GRPO (Tree-based Group Relative Policy Optimization) that fundamentally changes how agents explore possibilities.

Tree-GRPO replaces inefficient "chain-based" sampling with a tree-search strategy. By allowing different trajectories to share common prefixes (the initial steps of an interaction), the method significantly increases the number of scenarios explored within the same budget. Crucially, the tree structure allows the system to derive step-by-step "process supervision signals," even when only the final outcome reward is available. The results demonstrate superior performance over traditional methods, with some models achieving better results using only a quarter of the training budget.

📄 Paper: Tree Search for LLM Agent Reinforcement Learning https://arxiv.org/abs/2509.21240
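The budget argument can be made concrete with a small counting sketch: when trajectories share common prefixes in a tree, each shared step is generated once, so a fixed generation budget yields more complete trajectories than independent chain rollouts. The branching factor and depth below are illustrative choices for the sketch, not parameters from the Tree-GRPO paper.

```python
# Counting sketch: tree-based rollouts vs. independent chain rollouts.
# Numbers are illustrative; the paper's actual sampling setup differs.

def chain_budget(n_traj: int, depth: int) -> int:
    # Independent rollouts: every trajectory regenerates all `depth` steps.
    return n_traj * depth

def tree_budget(branching: int, depth: int) -> tuple[int, int]:
    # Full tree: generated steps = b + b^2 + ... + b^depth (shared prefixes
    # are generated once); complete trajectories = leaves = b^depth.
    nodes = sum(branching ** d for d in range(1, depth + 1))
    leaves = branching ** depth
    return nodes, leaves

depth = 4
nodes, leaves = tree_budget(branching=2, depth=depth)
chains = chain_budget(n_traj=leaves, depth=depth)

print(f"tree:  {leaves} trajectories for {nodes} generated steps")
print(f"chain: {leaves} trajectories for {chains} generated steps")
```

With branching 2 and depth 4, the tree delivers 16 trajectories for 30 generated steps, while independent chains need 64 steps for the same 16, roughly half the budget, and sibling branches additionally give the per-step comparison signals the episode describes.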
AI Odyssey is your journey through the vast and evolving world of artificial intelligence. Powered by AI, this podcast breaks down both the foundational concepts and the cutting-edge developments in the field. Whether you're just starting to explore the role of AI in our world or you're a seasoned expert looking for deeper insights, AI Odyssey offers something for everyone. From AI ethics to machine learning intricacies, each episode is crafted to inspire curiosity and spark discussion on how artificial intelligence is shaping our future.