RoboCoders: Judgment Day — AI Coding Agents Face Off
Abstract
Two AI coding agents, same prompts, same hardware, no rehearsal — Baruch on Claude Code (Opus 4.7), Viktor on JetBrains Junie (Gemini Flash 3.5) — race to build a Kotlin/JVM app that watches your face, recognises who you are, shows how confident it is, and reads your emotions, all driving real IoT hardware: a smart bulb, a camera, and two segmented LED light bars. Last year the question was “which agent is smarter?” This year it matters far less than the context you give them. Capability is now trivial and commoditised — what used to need a research team is a one-liner, and both agents happily write Kotlin that RUNS. Neither writes Kotlin that is RIGHT until you engineer the context they inherit: versioned plugins of skills and rules that encode the device truth, the empirical calibrations, and the actuator patterns no code reviewer would ever catch (four silent semantic failures, every HTTP call returning 200 OK). And the moment the coding agent delegates the build to sub-agents, context doesn’t inherit — every delegation is a regression until a meta-plugin engineers the hand-off explicitly. The headline for a JVM crowd: “AI engineering” isn’t a tool choice, it’s the discipline around the agent — the language defaults, the device facts, the calibration constants, the actuator patterns, the sub-agent hand-off, and the eval that measures whether any of it actually works (govee-h6056: 26% → 100%, a 3.84× lift). The variable was never the agent. It’s the context.
Resources
Demo Code
- Baruch’s demo — ready version — Claude Code build, Kotlin/JVM.
- Viktor’s demo — ready version — Junie build, Kotlin/JVM.
Context Engineering
- Tessl — Agent Enablement Platform — Package manager and registry for agent skills, rules, and context.
- Tessl Registry — Browse and install plugins.
- Closing eval run (govee-h6056) — The 26% → 100% (3.84× lift) measurement behind the Monday-morning “write an eval” ask: baseline (no context) vs with-context, scored by an LLM judge.
Tessl Plugins Demonstrated
- jbaruch/kotlin-tutor — Stage 0. alwaysApply rules: Kotlin idioms + stack defaults (Kotlin 2.x + JDK 21 + Gradle KTS, Ktor, coroutines, DJL, JavaCV). Flips the agent from Python to Kotlin on the same prompt.
- jbaruch/govee-h6056 — Stage 3. Segment topology (12 physical, not 15), bar mapping, phantom-segment caveat, rate-limit + retry-backoff, color-space.
- jbaruch/face-recognition-calibration-djl — Stage 3. Piecewise FaceNet cosine-distance bands measured empirically on the DJL pipeline (so a strong match doesn’t read as 40%).
- jbaruch/iot-actuator-patterns-kotlin — Stage 3. Flow.debounce + quantization + bottom-up progress-bar ordering for Kotlin coroutines (kills the Stage-1 flicker).
- jbaruch/vision-pipeline-foundations-kotlin — Stage 1-2. JavaCV camera setup + frame-skip via Flow.sample().
- jbaruch/shelly-duo-gu10 — mDNS discovery name, REST endpoints, color model for the smart bulb.
- jbaruch/sub-agent-delegation — Stage 4 meta-plugin. Teaches the orchestrator that sub-agents start fresh and inherit nothing: explicit hand-off + echo-skills validation + single-writer-per-actuator.
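To make the calibration and quantization ideas from the plugin list concrete, here is a minimal sketch in plain Kotlin. It assumes a piecewise-linear map from cosine distance to a displayed confidence, followed by rounding to a whole number of segments lit from index 0 upward. The band values in exampleBands are hypothetical placeholders, not the measured constants from the plugin. The names Band, confidenceFor, and litSegments are invented for illustration, as is the assumption that index 0 is the bottom of the bar.

```kotlin
import kotlin.math.roundToInt

// One calibration point: a cosine distance and the confidence (%) to display for it.
data class Band(val distance: Double, val confidence: Double)

// HYPOTHETICAL band values, sorted by distance. The empirically measured
// constants live in the calibration plugin and are not reproduced here.
val exampleBands = listOf(
    Band(0.0, 100.0),
    Band(0.3, 90.0),
    Band(0.6, 50.0),
    Band(1.0, 0.0),
)

// Piecewise-linear interpolation between neighbouring bands,
// clamped to the first and last band outside the covered range.
fun confidenceFor(distance: Double, bands: List<Band> = exampleBands): Double {
    if (distance <= bands.first().distance) return bands.first().confidence
    if (distance >= bands.last().distance) return bands.last().confidence
    val upper = bands.indexOfFirst { it.distance >= distance }
    val a = bands[upper - 1]
    val b = bands[upper]
    val t = (distance - a.distance) / (b.distance - a.distance)
    return a.confidence + t * (b.confidence - a.confidence)
}

// Quantize a confidence to a whole number of segments and return the indices
// to light, lowest index first (assumed here to be the bottom of the bar).
// segmentCount defaults to 12, the physical count quoted in the plugin description.
fun litSegments(confidence: Double, segmentCount: Int = 12): List<Int> {
    val lit = (confidence / 100.0 * segmentCount).roundToInt().coerceIn(0, segmentCount)
    return (0 until lit).toList()
}

fun main() {
    val confidence = confidenceFor(0.45)
    println("confidence=$confidence segments=${litSegments(confidence)}")
}
```

Quantizing before sending means small frame-to-frame changes in distance usually map to the same segment count, so the actuator only needs an update when the count actually changes.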
Sub-Agents — How Delegation Drops Context (Stage 4)
The “agentic” stage is about the coding agents (Claude Code, Junie) spawning sub-agents to write the code — not about embedding AI in the generated app (the RoboCoders app is pure Kotlin/DJL/JavaCV with zero runtime AI). Sub-agents start with fresh context and inherit none of the parent’s plugins, so every delegation regresses until the hand-off is engineered.
- Claude Agent SDK — Subagents — The source for “the only channel from parent to subagent is the Agent tool’s prompt string” — the bug-by-design the meta-plugin works around.
- Junie — Custom subagents — JetBrains Junie’s subagents: Markdown + YAML in .junie/agents/, with per-subagent tool restrictions, models, and agent skills.
- Junie — Agent skills — How Junie loads skills (the Junie-side equivalent of the plugins demonstrated here).
- Junie — Guidelines & memory — AGENTS.md / .junie/AGENTS.md always-on guidelines.
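The three hand-off ideas named above (explicit hand-off, echo-skills validation, single-writer-per-actuator) can be sketched in plain Kotlin. This is an illustration under stated assumptions, not the meta-plugin's actual implementation. SubTask, buildHandOff, missingSkills, and actuatorConflicts are hypothetical names, and the "SKILLS:" reply convention is invented for this example.

```kotlin
// A unit of delegated work. `actuator` is the single device this task may write to, if any.
data class SubTask(val name: String, val actuator: String?, val skills: List<String>)

// Explicit hand-off: everything the sub-agent needs goes into the prompt string,
// because it starts fresh and inherits none of the parent's context.
fun buildHandOff(task: SubTask, goal: String): String = buildString {
    appendLine("Task: ${task.name}")
    appendLine("Goal: $goal")
    appendLine("You start with no inherited context. Load these skills before writing code:")
    task.skills.forEach { appendLine("- $it") }
    appendLine("Begin your reply with: SKILLS: <comma-separated list of the skills you loaded>")
}

// Echo-skills validation: returns the required skills the sub-agent did NOT echo back.
// (The "SKILLS:" first-line convention is an assumption made for this sketch.)
fun missingSkills(task: SubTask, reply: String): Set<String> {
    val first = reply.lineSequence().firstOrNull()?.trim().orEmpty()
    if (!first.startsWith("SKILLS:")) return task.skills.toSet()
    val echoed = first.removePrefix("SKILLS:").split(",").map { it.trim() }.toSet()
    return task.skills.toSet() - echoed
}

// Single-writer-per-actuator: returns every actuator claimed by more than one task.
fun actuatorConflicts(tasks: List<SubTask>): Map<String, List<String>> =
    tasks.mapNotNull { t -> t.actuator?.let { it to t.name } }
        .groupBy({ it.first }, { it.second })
        .filterValues { it.size > 1 }

fun main() {
    val task = SubTask(
        name = "light-bar",
        actuator = "govee-h6056",
        skills = listOf("jbaruch/govee-h6056", "jbaruch/iot-actuator-patterns-kotlin"),
    )
    println(buildHandOff(task, "Render recognition confidence on the light bars"))
    println(missingSkills(task, "SKILLS: jbaruch/govee-h6056"))
    println(actuatorConflicts(listOf(task, task.copy(name = "emotion-colors"))))
}
```

An orchestrator could refuse to accept a sub-agent's output while missingSkills is non-empty, and refuse to spawn anything while actuatorConflicts is non-empty.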
Kotlin / JVM Stack
- DJL — Deep Java Library — RetinaFace detect, FaceNet/ArcFace embeddings, ViT/FER+ emotion — all on the JVM (PyTorch engine).
- JavaCV — OpenCV bindings for Kotlin/JVM: camera capture + Haar cascade.
- Ktor — HTTP client (CIO engine) for the Shelly/Govee REST + embedded server for the MJPEG preview.
- kotlinx-coroutines — Flow, structured concurrency, Dispatchers.IO for the IoT controllers.
- Gradle (Kotlin DSL) — Single multi-module build (live/, ready/).
- Koog — JetBrains’ Kotlin AI agent framework — JVM-native, multiplatform agent orchestration with MCP integration (for taking sub-agent orchestration into the app itself).
Hardware — Specs & Device APIs
- Shelly Duo GU10 RGBW — Smart bulb, local REST API (~30 ms LAN latency). · Shelly Gen1 HTTP API docs — the /light/0 and /color/0 color + gain endpoints.
- DJI Osmo Pocket 3 — 1” sensor, 1080p/60 USB-C webcam mode, used as the face-detection camera.
- Govee Flow Plus Light Bars H6056 — Segmented LED bars via cloud API (the ~10 req/s sustained cap and HSV-with-gamma color quirk are the Stage-3 traps). · Govee Developer API — Govee-API-Key header; turn / brightness / color device commands.
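The rate cap mentioned for the light bars suggests two small building blocks: a pacer that spaces requests out, and a capped exponential back-off for retries. The sketch below is plain Kotlin with no HTTP client, so it only computes wait times. Pacer, reserve, and backoffDelaysMs are hypothetical names, and the 200 ms base and 5 s cap are illustrative defaults rather than values taken from the Govee documentation or the plugin.

```kotlin
// Spaces requests so that at most `maxPerSecond` are sent per second.
// Time is passed in explicitly, which keeps the logic testable without sleeping.
class Pacer(maxPerSecond: Int) {
    private val minGapMs = 1000L / maxPerSecond
    private var nextSlotMs = 0L

    // Reserves the next free send slot and returns how long (ms) the caller
    // should wait from `nowMs` before sending.
    fun reserve(nowMs: Long): Long {
        val sendAt = maxOf(nowMs, nextSlotMs)
        nextSlotMs = sendAt + minGapMs
        return sendAt - nowMs
    }
}

// Capped exponential back-off: baseMs, 2*baseMs, 4*baseMs, ... never above capMs.
// The defaults are ASSUMED example values.
fun backoffDelaysMs(retries: Int, baseMs: Long = 200, capMs: Long = 5_000): List<Long> =
    (0 until retries).map { attempt -> minOf(capMs, baseMs shl attempt) }

fun main() {
    val pacer = Pacer(maxPerSecond = 10) // the ~10 req/s cap quoted above
    println(listOf(pacer.reserve(0), pacer.reserve(0), pacer.reserve(250)))
    println(backoffDelaysMs(retries = 6))
}
```

In a coroutine-based controller, the value returned by reserve would typically be passed to delay before the HTTP call, and the back-off list would drive the waits between retries after a rate-limit response.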
Models
- DJL face_feature — ArcFace-derived 512-d face embeddings, PyTorch engine.
- ONNX Model Zoo — emotion-ferplus — 8-class FER+ emotion classifier (64×64 grayscale, ~30 MB).
Coding Agents
- Claude Code — Baruch’s agent, running on Opus 4.7.
- Junie — Viktor’s agent, JetBrains-native, running Gemini Flash 3.5.
Conference
- J-Spring 2026 — The International Java Conference by NLJUG. Utrecht, Netherlands, 4 June 2026.
Speakers
- Baruch Sadogursky — speaking.jbaru.ch — All talks, slides, and shownotes.
- Viktor Gamov — speaking.gamov.io — Viktor’s speaker profile.
- @jbaruch on X
- @gamussa on X
Related Talks
- RoboCoders: Judgment Day @ KotlinConf 2026 — The original 45-minute Kotlin cut. Same hardware, same dual-agent setup, same five-stage escalation.
- RoboCoders: Judgment Day @ JNation 2026 — The 180-minute extended cut: three new Stage-3 failure modes, three new Stage-4 delegation failure modes, and a live tessl eval run Stage 5.
- RoboCoders: Judgment Day @ Arc of AI 2026 — The Python original. Same thesis, different stack: dlib + transformers + Claude Agent SDK orchestrator instead of DJL + JavaCV.