Prompt-Test-Code: A New Productivity Boost for Developers

A presentation at Devoxx Poland 2025 in June 2025 in Kraków, Poland by Baruch Sadogursky

Slide 1

Slide 1

PromptTest-Code: A New Productivity Boost for Developers

Slide 2

Slide 2

We have a Trust problem

Slide 3

Slide 3

Slide 4

Slide 4

AI generated code is not great

Slide 5

Slide 5

On top of that, it is dangerou s

Slide 6

Slide 6

Asking it to fix it is as reliable as the rest of it

Slide 7

Slide 7

Baruch Sadogursky - @jbaru ch × Head of DevRel at TuxCare (want safer legacy dependencies? Talk to me)

Slide 8

Slide 8

shownotes × × × × speaking.jbaru.ch Slides Video All the links!

Slide 9

Slide 9

Software design documents

Slide 10

Slide 10

Software design documents × Write-once × Read-maybe-once

Slide 11

Slide 11

Slide 12

Slide 12

Slide 13

Slide 13

Slide 14

Slide 14

Slide 15

Slide 15

So why don’t they work? × Human (mis)understanding (a.k.a. curse of knowledge) × Vague responsibility boundaries

Slide 16

Slide 16

Slide 17

Slide 17

Slide 18

Slide 18

Slide 19

Slide 19

We have a Trust problem (not only with AI)

Slide 20

Slide 20

Next thing you know: It’s a vei n diagram Software I like Software I know really well

Slide 21

Slide 21

But hey, we do have working s oftware sometimes × Good intentions × Professionalism × Tests and QA × End result observation × Common-ish context

Slide 22

Slide 22

Until gen AI changed the game × × × × × Good intentions Professionalism Tests and QA End result observation Common-ish context

Slide 23

Slide 23

Intent-to-prompt chasm User How do I use Spring Boot? You can’t really boot a season of a year LLM

Slide 24

Slide 24

But it gets worse Prompt Temperature Slightly different results

Slide 25

Slide 25

Slide 26

Slide 26

Slide 27

Slide 27

The LGTM syndrome × We’re busy with other stuff × We don’t like to read other people’s code × Obviously, we don’t like to read generated code × Result: LGTM

Slide 28

Slide 28

Slide 29

Slide 29

Slide 30

Slide 30

Idea! What if we code in the intent and always verify against it?

Slide 31

Slide 31

Intent integrity chain

Slide 32

Slide 32

Slide 33

Slide 33

Tests are guardrails for mon keys! × Write the tests first × Let LLM implement TDD? the code × If the tests pass, the intent is captured correctly!

Slide 34

Slide 34

Slide 35

Slide 35

Slide 36

Slide 36

We still have the chasm… Software Definition Documents ? ? Devs write tests LLM ne ge rates code to spec

Slide 37

Slide 37

And developers hate TDD… × Developers are biased for action × We already know how to solve the problem

Slide 38

Slide 38

What if it (almost) won’t look × × × × like code? How about writing specs instead? BDD? Look almost like English human language Product/Business people can read and write Replaces SDDs

Slide 39

Slide 39

Slide 40

Slide 40

We still have the chasm… Software Definition Documents ? Devs write specs Tests generated from specs LLM ne ge rates code to spec

Slide 41

Slide 41

And developers hate BDD… × If developers hate to write tests (code), only imagine how much they hate to write specs (not code)

Slide 42

Slide 42

Slide 43

Slide 43

How about we let LLMs do the work? Software Definition Documents are prompt LLM creates specs Everybody read the specs and approve LLM generates tests from specs LLM impls the tests until they pass

Slide 44

Slide 44

Slide 45

Slide 45

Slide 46

Slide 46

Slide 47

Slide 47

CREATED Can we trust it? PROMPT 🧑 ❌ SPEC 🐒 ❌ TEST 🐒 ❌ CODE 🐒 ❌ ARTIFACT

Slide 48

Slide 48

Slide 49

Slide 49

LET’S SOLVE IT! × Replace monkeys with algorithms where possible

Slide 50

Slide 50

CREATED Can we trust it? PROMPT 🧑 ❌ SPEC 🐒 ❌ TEST 🐒 ❌ ARTIFACT

Slide 51

Slide 51

Slide 52

Slide 52

Gherkin + cucumber

Slide 53

Slide 53

CREATED Can we trust it? PROMPT 🧑 ❌ SPEC 🐒 ❌ TEST 🤖 ✅ ARTIFACT

Slide 54

Slide 54

LET’S SOLVE IT! × Prevent monkeys from messing with validation

Slide 55

Slide 55

PROMPT 🧑 Can we trust it? ❌ SPEC 🐒 ❌ TEST 🤖 ✅ CODE 🐒 ❌ ARTIFACT CREATED

Slide 56

Slide 56

Slide 57

Slide 57

PROMPT 🧑 Can we trust it? ❌ SPEC 🐒 ❌ TEST 🤖 ✅ CODE 🐒 ✅ ARTIFACT CREATED

Slide 58

Slide 58

Slide 59

Slide 59

ARTIFACT CREATED VERIFIED PROMPT 🧑 🧑 SPEC 🐒 🧑 TEST 🤖 🤖 CODE 🐒 🤖

Slide 60

Slide 60

Intent integrity chain, verified Software Definition Documents are prompt LLM creates specs Everybody read the specs and approve Algorithm generates tests LLM impls the readonly tests until they pass

Slide 61

Slide 61

Why does it work? × × × × × Everybody writes text Specs are reviewed by all SDDs are Specs are living docs Everything else is derived from specs All code is verified by machines

Slide 62

Slide 62

INTENT INTEGRITY CHAIN is WHAT BDD WAS meant to be × Define and agree on intent × We verify everything AI does × We don’t trust AI for what we don’t verify

Slide 63

Slide 63

Are gherkin specs and cucumb × × × × er good enough? Gherkin is good for defining behavior It is proven and works Non-behavioral constrains are hard to describe in given-when-then Better spec-ing tools will emerge

Slide 64

Slide 64

Slide 65

Slide 65

Slide 66

Slide 66

INTENT INTEGRITY CHAIN × Generates consensus × Features are verifiable back to the requirements × We can start trusting AI code × But there is still work to be done

Slide 67

Slide 67

THANKS! Q&A and ads: x x x x @jbaruch #DevoxxPL #IntentIntegrityChain speaking.jbaru.ch