Never Trust a Monkey: The Chasm, the Craft, and the Chain of AI-Assisted Code
Abstract
We’re in the middle of another leap in abstraction. Like compilers, cloud, and containers before it, AI coding agents arrived with hype, fear, and broken assumptions. We gave the monkeys GPUs. Sometimes they output Shakespeare. Other times, they confidently ship code that compiles, passes tests, and still does the wrong thing. The problem is the gap between what we mean and what actually runs. This talk delivers a practical framework for working with AI agents, built on three ideas: the Chasm between human intent and the code that actually runs, the Context that replaces guessing with grounding (APIs, conventions, constraints, domain rules), and the Chain that keeps intent alive through a structured flow from prompt to spec to test to code, where every step produces a verifiable artifact validated externally. Through interactive demonstrations and honest war stories, we’ll trace how intent gets lost and build the guardrails that prevent it. You’ll leave with a working model for AI-assisted development where humans own the meaning and machines do the typing. Trust your context. Trust your guardrails. Never trust a monkey.
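The prompt → spec → test → code chain can be sketched in miniature. The feature, the Gherkin-style spec, and the `order_total` function below are all hypothetical stand-ins, not part of the talk's framework; the point is only that each step leaves behind an artifact the next step is checked against:

```python
# Step 1: human intent, captured as a Gherkin-style spec
# (hypothetical feature, for illustration only).
SPEC = """
Feature: Bulk discount
  Scenario: Orders of 10 or more items get 10% off
    Given an order of 10 items at 5.00 each
    When the total is computed
    Then the total is 45.00
"""

# Step 2: a test derived from the spec BEFORE any code exists.
# This is the verifiable artifact that keeps the intent alive,
# validated externally (here, simply by running it).
def test_bulk_discount():
    total = order_total(quantity=10, unit_price=5.00)
    assert abs(total - 45.00) < 1e-9  # Then: 10% off 50.00

# Step 3: the implementation (agent-generated or not) is accepted
# only if the artifact from step 2 passes against it.
def order_total(quantity: int, unit_price: float) -> float:
    subtotal = quantity * unit_price
    discount = 0.10 if quantity >= 10 else 0.0
    return subtotal * (1 - discount)

test_bulk_discount()  # the chain holds only if this passes
```

The ordering is the whole trick: the test is authored from the spec, not from the code, so an implementation that "compiles, passes tests, and still does the wrong thing" has one fewer place to hide.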
Resources
Research — AI code quality
- CodeRabbit: State of AI vs Human Code Generation (December 2025) — 1.7× more issues in AI PRs; +75% logic errors; 8× performance problems (470 real PRs)
- Sonar: State of Code Developer Survey (January 2026) — 96% don’t fully trust AI output; only 48% always verify; AI = 42% of committed code
- Stack Overflow 2025 Developer Survey (AI section) — 84% use AI; 46% actively distrust accuracy; 66% frustrated by “almost-right” AI code
- Stack Overflow 2026 follow-up: Mind the Gap — the AI trust gap widens as adoption rises
- Qodo: 2025 State of AI Code Quality — 60% say AI misses critical context; 1-in-5 suggestions contain factual errors
- Sonar: Assessing the Quality and Security of AI-Generated Code (arXiv 2508.14727) — no correlation between Pass@1 test performance and overall code quality
- Apiiro: Faster code, greater risks — 322% more privilege-escalation paths in AI code
- METR: Task-Completion Time Horizons of Frontier AI Models — the “Moore’s Law for AI agents” chart
The Intent Integrity Chain
- Intent Integrity Kit (IIKit) — GitHub — the framework, open-source
- Tessl.io — make agents work in real codebases
Spec-driven development
- Martin Fowler: Understanding Spec-Driven Development
- ThoughtWorks: Spec-Driven Development
- GitHub Spec-Kit
- OpenSpec
- Amazon Kiro — Spec-Driven AI IDE
- Andrej Karpathy on Spec-Driven Development
Foundations
- Curse of Knowledge (Wikipedia)
- Test-Driven Development (Wikipedia)
- Martin Fowler: Given-When-Then
- Behavior-Driven Development (Wikipedia)
- Gherkin Specs (Cucumber)
Baruch’s books
One more thing
- AINative DevCon London — June 1–2, 2026. Use code BARUCH50 for 50% off