So why don’t they work?
× Human (mis)understanding (a.k.a. the curse of knowledge)
× Vague responsibility boundaries
We have a Trust problem (not only with AI)
Next thing you know: it’s a Venn diagram
× Software I like
× Software I know really well
But hey, we do have working software sometimes
× Good intentions
× Professionalism
× Tests and QA
× End result observation
× Common-ish context
Until gen AI changed the game
× Good intentions
× Professionalism
× Tests and QA
× End result observation
× Common-ish context
The intent-to-prompt chasm
User: How do I use Spring Boot?
LLM: You can’t really boot a season of a year
But it gets worse
Prompt + temperature → slightly different results
The LGTM syndrome
× We’re busy with other stuff
× We don’t like to read other people’s code
× Obviously, we don’t like to read generated code
× Result: LGTM
Idea! What if we code in the intent and always verify against it?
Intent integrity chain
Tests are guardrails for monkeys! TDD?
× Write the tests first
× Let the LLM implement the code
× If the tests pass, the intent is captured correctly!
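The test-first loop can be sketched in a few lines: the test encodes the intent before any implementation exists, and whatever code is written (by a human or an LLM) must satisfy it. The `slugify` function and its behavior here are hypothetical examples, not from the talk.

```python
def test_slugify():
    # Intent, captured as executable assertions *before* any code exists
    assert slugify("Hello World") == "hello-world"
    assert slugify("  Spring  Boot  ") == "spring-boot"

# Implementation written (or generated) only after the test
def slugify(text: str) -> str:
    # Lowercase, split on whitespace, rejoin with hyphens
    return "-".join(text.lower().split())

test_slugify()  # passes: the intent is captured and verified
```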
We still have the chasm…
Software Definition Documents → (?) → Devs write tests → (?) → LLM generates code to spec
And developers hate TDD…
× Developers are biased for action
× We already know how to solve the problem
What if it (almost) won’t look like code?
× How about writing specs instead? BDD?
× Looks almost like human English
× Product/business people can read and write it
× Replaces SDDs
We still have the chasm…
Software Definition Documents → (?) → Devs write specs → Tests generated from specs → LLM generates code to spec
And developers hate BDD…
× If developers hate writing tests (code), imagine how much more they hate writing specs (not code)
How about we let LLMs do the work?
Software Definition Documents are the prompt
LLM creates specs
Everybody reads the specs and approves
LLM generates tests from specs
LLM implements the code until the tests pass
Can we trust it?
ARTIFACT | CREATED | TRUSTED
PROMPT   | 🧑      | ❌
SPEC     | 🐒      | ❌
TEST     | 🐒      | ❌
CODE     | 🐒      | ❌
LET’S SOLVE IT!
× Replace monkeys with algorithms where possible
Can we trust it?
ARTIFACT | CREATED | TRUSTED
PROMPT   | 🧑      | ❌
SPEC     | 🐒      | ❌
TEST     | 🐒      | ❌
Gherkin + Cucumber
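For illustration, a Gherkin spec reads almost like English while staying machine-checkable; this is a hypothetical feature, not one from the talk:

```gherkin
Feature: URL slug generation
  Scenario: Title is converted to a URL slug
    Given a page titled "Hello World"
    When the slug is generated
    Then the slug is "hello-world"
```

Cucumber (or any Gherkin runner) binds each Given/When/Then line to a step definition, which is what makes the spec executable as a test.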
Can we trust it?
ARTIFACT | CREATED | TRUSTED
PROMPT   | 🧑      | ❌
SPEC     | 🐒      | ❌
TEST     | 🤖      | ✅
LET’S SOLVE IT!
× Prevent monkeys from messing with validation
Can we trust it?
ARTIFACT | CREATED | TRUSTED
PROMPT   | 🧑      | ❌
SPEC     | 🐒      | ❌
TEST     | 🤖      | ✅
CODE     | 🐒      | ❌
Can we trust it?
ARTIFACT | CREATED | TRUSTED
PROMPT   | 🧑      | ❌
SPEC     | 🐒      | ❌
TEST     | 🤖      | ✅
CODE     | 🐒      | ✅
ARTIFACT | CREATED | VERIFIED
PROMPT   | 🧑      | 🧑
SPEC     | 🐒      | 🧑
TEST     | 🤖      | 🤖
CODE     | 🐒      | 🤖
Intent integrity chain, verified
Software Definition Documents are the prompt
LLM creates specs
Everybody reads the specs and approves
Algorithm generates tests
LLM implements the code until the read-only tests pass
Why does it work?
× Everybody writes text
× Specs are reviewed by all
× Specs (replacing SDDs) are living docs
× Everything else is derived from specs
× All code is verified by machines
INTENT INTEGRITY CHAIN is what BDD was meant to be
× Define and agree on intent
× We verify everything AI does
× We don’t trust AI for what we don’t verify
Are Gherkin specs and Cucumber good enough?
× Gherkin is good for defining behavior
× It is proven and works
× Non-behavioral constraints are hard to describe in given-when-then
× Better spec-ing tools will emerge
INTENT INTEGRITY CHAIN
× Generates consensus
× Features are verifiable back to the requirements
× We can start trusting AI code
× But there is still work to be done
THANKS! Q&A and ads:
× @jbaruch
× #DevoxxPL
× #IntentIntegrityChain
× speaking.jbaru.ch