Prompt-Test-Code: A New Productivity Boost for Developers

We have a Trust problem

AI-generated code is not great

On top of that, it is dangerous

Asking it to fix it is as reliable as the rest of it

Baruch Sadogursky - @jbaruch × Head of DevRel at TuxCare (want safer legacy dependencies? Talk to me)

Shownotes: speaking.jbaru.ch × Slides × Video × All the links!

Software design documents

Software design documents × Write-once × Read-maybe-once

So why don’t they work? × Human (mis)understanding (a.k.a. curse of knowledge) × Vague responsibility boundaries

We have a Trust problem (not only with AI)

Next thing you know: It’s a Venn diagram × Software I like × Software I know really well

But hey, we do have working software sometimes × Good intentions × Professionalism × Tests and QA × End result observation × Common-ish context

Until gen AI changed the game × Good intentions × Professionalism × Tests and QA × End result observation × Common-ish context

Intent-to-prompt chasm × User: How do I use Spring Boot? × LLM: You can’t really boot a season of a year

But it gets worse × Prompt × Temperature × Slightly different results
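
A minimal sketch of why temperature makes the same prompt produce slightly different results. The three candidate tokens and their scores are made up for illustration and do not come from any real model; the only point is that a higher temperature flattens the probabilities, so unlikely continuations get picked more often.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; higher temperature flattens them."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]

def sample_token(tokens, logits, temperature, rng):
    """Draw one token from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

# Hypothetical candidates for the word after "Spring" (assumed scores).
TOKENS = ["Boot", "season", "cleaning"]
LOGITS = [3.0, 1.5, 0.5]

if __name__ == "__main__":
    rng = random.Random()
    for temperature in (0.2, 1.0, 2.0):
        probs = softmax_with_temperature(LOGITS, temperature)
        picks = [sample_token(TOKENS, LOGITS, temperature, rng) for _ in range(5)]
        print(temperature, [round(p, 3) for p in probs], picks)
```

Running it a few times shows the picks staying almost constant at low temperature and drifting at high temperature, which is the "slightly different results" problem in miniature.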

The LGTM syndrome × We’re busy with other stuff × We don’t like to read other people’s code × Obviously, we don’t like to read generated code × Result: LGTM

Idea! What if we code in the intent and always verify against it?

Intent integrity chain

Tests are guardrails for monkeys! TDD? × Write the tests first × Let LLM implement the code × If the tests pass, the intent is captured correctly!

We still have the chasm… × Software Definition Documents × ? ? × Devs write tests × LLM generates code to spec

And developers hate TDD… × Developers are biased for action × We already know how to solve the problem

What if it (almost) won’t look like code? BDD? × How about writing specs instead? × Look almost like human language × Product/Business people can read and write × Replaces SDDs

We still have the chasm… × Software Definition Documents × ? × Devs write specs × Tests generated from specs × LLM generates code to spec

And developers hate BDD… × If developers hate to write tests (code), only imagine how much they hate to write specs (not code)

How about we let LLMs do the work? × Software Definition Documents are the prompt × LLM creates specs × Everybody reads the specs and approves × LLM generates tests from specs × LLM implements the code until the tests pass

ARTIFACT, CREATED, Can we trust it? × PROMPT 🧑 ❌ × SPEC 🐒 ❌ × TEST 🐒 ❌ × CODE 🐒 ❌

LET’S SOLVE IT! × Replace monkeys with algorithms where possible

ARTIFACT, CREATED, Can we trust it? × PROMPT 🧑 ❌ × SPEC 🐒 ❌ × TEST 🐒 ❌

Gherkin + Cucumber
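
To show what "an algorithm instead of a monkey" means here, below is a toy runner that executes a Gherkin-style spec deterministically. It is a hand-rolled stand-in, not Cucumber's API; the shopping-cart feature, the `step` decorator, and `run_feature` are all hypothetical names made up for this sketch. It also assumes someone still supplies the step definitions (the glue code).

```python
import re

# Hypothetical spec, written in given-when-then form.
FEATURE = """\
Feature: Shopping cart
  Scenario: Adding items updates the total
    Given an empty cart
    When I add 2 items priced 5
    Then the total is 10
"""

STEPS = []  # (compiled pattern, function) pairs

def step(pattern):
    """Register a step definition for lines matching the pattern."""
    def register(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return register

@step(r"an empty cart")
def empty_cart(ctx):
    ctx["total"] = 0

@step(r"I add (\d+) items priced (\d+)")
def add_items(ctx, count, price):
    ctx["total"] += int(count) * int(price)

@step(r"the total is (\d+)")
def check_total(ctx, expected):
    assert ctx["total"] == int(expected), f"expected {expected}, got {ctx['total']}"

def run_feature(text):
    """Run every Given/When/Then line in order; return the number of steps run."""
    ctx, executed = {}, 0
    for line in text.splitlines():
        if line.strip().startswith("Scenario"):
            ctx = {}  # each scenario starts from a clean state
            continue
        match = re.match(r"\s*(Given|When|Then|And|But)\s+(.*)", line)
        if not match:
            continue  # Feature: header or blank line
        body = match.group(2).strip()
        for pattern, func in STEPS:
            found = pattern.fullmatch(body)
            if found:
                func(ctx, *found.groups())
                executed += 1
                break
        else:
            raise LookupError(f"no step definition for: {body}")
    return executed

if __name__ == "__main__":
    print(run_feature(FEATURE), "steps passed")
```

The same spec text always maps to the same checks, so there is no temperature and no creativity in this link of the chain.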

ARTIFACT, CREATED, Can we trust it? × PROMPT 🧑 ❌ × SPEC 🐒 ❌ × TEST 🤖 ✅

LET’S SOLVE IT! × Prevent monkeys from messing with validation

ARTIFACT, CREATED, Can we trust it? × PROMPT 🧑 ❌ × SPEC 🐒 ❌ × TEST 🤖 ✅ × CODE 🐒 ❌

ARTIFACT, CREATED, Can we trust it? × PROMPT 🧑 ❌ × SPEC 🐒 ❌ × TEST 🤖 ✅ × CODE 🐒 ✅
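
One possible way to keep monkeys away from validation is to fingerprint the approved specs and tests, then refuse any result where they changed. This is only a sketch under assumptions: the talk does not say how read-only is enforced, and `fingerprint`, `tampered`, and the `cart.feature` file are hypothetical names.

```python
import hashlib
import tempfile
from pathlib import Path

def fingerprint(test_dir):
    """Map each file under test_dir to the SHA-256 of its contents."""
    root = Path(test_dir)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }

def tampered(before, after):
    """Return sorted names of files that were added, removed, or changed."""
    return sorted(
        name for name in before.keys() | after.keys()
        if before.get(name) != after.get(name)
    )

def demo():
    """Show an untouched run and a run where the check was weakened."""
    with tempfile.TemporaryDirectory() as tmp:
        spec = Path(tmp) / "cart.feature"
        spec.write_text("Then the total is 10\n")
        baseline = fingerprint(tmp)  # taken once the specs are approved
        unchanged = tampered(baseline, fingerprint(tmp))
        spec.write_text("Then the total is 0\n")  # someone weakens the check
        changed = tampered(baseline, fingerprint(tmp))
        return unchanged, changed

if __name__ == "__main__":
    print(demo())
```

Comparing fingerprints before and after the code-generation step means that passing tests are only accepted when they are the same tests that were approved.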

ARTIFACT, CREATED, VERIFIED × PROMPT 🧑 🧑 × SPEC 🐒 🧑 × TEST 🤖 🤖 × CODE 🐒 🤖

Intent integrity chain, verified × Software Definition Documents are the prompt × LLM creates specs × Everybody reads the specs and approves × Algorithm generates tests × LLM implements code against the read-only tests until they pass
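
The last link of the chain can be sketched as a loop: ask for code, run the fixed tests, feed failures back, repeat. Everything here is hypothetical: `implement_until_green` is a made-up name, `fake_llm` is a scripted stand-in for a real model call, and `run_tests` hard-codes two checks instead of running generated tests.

```python
def implement_until_green(generate_code, run_tests, max_attempts=5):
    """Request candidates until the fixed tests pass; return (code, attempts)."""
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        candidate = generate_code(feedback)
        failures = run_tests(candidate)
        if not failures:
            return candidate, attempt
        feedback = "\n".join(failures)  # only failures go back, never the tests
    raise RuntimeError(f"tests still failing after {max_attempts} attempts")

def run_tests(source):
    """Fixed checks the generator cannot edit; returns failure messages."""
    namespace = {}
    exec(source, namespace)  # acceptable for a toy; sandbox this in real life
    total = namespace["total"]
    failures = []
    if total(2, 5) != 10:
        failures.append("total(2, 5) should be 10")
    if total(0, 5) != 0:
        failures.append("total(0, 5) should be 0")
    return failures

def fake_llm():
    """Scripted stand-in for an LLM: the first answer is wrong, the second is right."""
    answers = iter([
        "def total(count, price):\n    return count + price\n",
        "def total(count, price):\n    return count * price\n",
    ])
    return lambda feedback: next(answers)

if __name__ == "__main__":
    code, attempts = implement_until_green(fake_llm(), run_tests)
    print(f"green after {attempts} attempts")
    print(code)
```

The generator only ever sees failure messages, so the one way for it to get to green is to change the code.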

Why does it work? × Everybody writes text × Specs are reviewed by all × Specs (replacing SDDs) are living docs × Everything else is derived from specs × All code is verified by machines

INTENT INTEGRITY CHAIN is WHAT BDD WAS meant to be × Define and agree on intent × We verify everything AI does × We don’t trust AI for what we don’t verify

Are Gherkin specs and Cucumber good enough? × Gherkin is good for defining behavior × It is proven and works × Non-behavioral constraints are hard to describe in given-when-then × Better spec-ing tools will emerge

INTENT INTEGRITY CHAIN × Generates consensus × Features are verifiable back to the requirements × We can start trusting AI code × But there is still work to be done

THANKS! Q&A and ads: × @jbaruch × #DevoxxPL × #IntentIntegrityChain × speaking.jbaru.ch