Binary Soup: 2026

Thursday, April 23, 2026

How Claude Spent Three Days* Debugging a ZX Spectrum Next Game (And Why It Was Worth It)

I'm building a game for ZX Spectrum Next. It's called Deep. The protagonist is a patient waking up in a psychiatric hospital with no memory of how he got there.

The Next port has been the hard part. I'd left it relatively late in the day to get the code functional on there. I'm using z88dk with sccz80, banked C across eight 16KB segments, a custom IM2 interrupt handler, and a hand-rolled display pipeline writing directly to ULA memory. It's not a simple project. When things go wrong, they go wrong in interesting ways.

And things went very wrong.

The symptoms

After getting the game to boot for the first time on ZEsarUX, I had a title screen. Progress. Then I'd press SPACE to explore and the game would fall into BASIC ROM. Every time. No crash message, no clue. Just BASIC.

The display was also misbehaving. Green vertical stripes. Keyboard input not registering. The MMU apparently drifting mid-run, remapping slots 6 and 7 to who-knows-where. Stack pointer landing in the middle of BSS. It looked like ten different bugs.

It was one bug. But I didn't know that yet.

Enter the agent

I'd been using Claude (Opus 4.6 and then 4.7) as a coding agent via Claude Code, with a local mempalace MCP server to persist session state between conversations. The idea being that significant findings get filed to mempalace immediately, so the next session can pick up the thread without starting from scratch.

Good thing too, because the Copilot client crashed on me several times over the following days, wiping live context. Each time, the agent had to reconstruct its mental model from whatever mempalace fragments had survived. A few entries were missing. It reconstructed anyway.

What the agent did, without being explicitly told to, was invent a debugging methodology suited to the platform's constraints. No printf on ZX Next. No gdb. No breakpoints in any conventional sense.

Instead: sentinel bytes. Volatile writes to fixed addresses in the $5B00-$5B08 range, read back via ZRCP (ZEsarUX's remote control protocol) after smartloading the NEX file. Nine staggered time snapshots per run: 20ms, 50ms, 100ms, 200ms, 300ms, 500ms, 1.2s, 2.2s, 4.2s. Combined with PC, SP, and MMU register dumps at each interval.
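For anyone who hasn't seen the trick, here is a minimal sketch of the sentinel idea in C. It is not the agent's actual code: the function names and the "highest non-zero slot" convention are my own illustration. On the Next the base pointer would be the fixed address $5B00; here it is passed in so the same code can be tried on a desktop compiler.

```c
#include <assert.h>
#include <stdint.h>

#define SENTINEL_COUNT 9   /* $5B00..$5B08 inclusive */

/* On the Next this would point at (volatile uint8_t *)0x5B00.
   It is injectable here (an assumption of this sketch) so it runs anywhere. */
static volatile uint8_t *sentinel_base;

/* Clear every slot so stale values from a previous run cannot mislead. */
void sentinel_init(volatile uint8_t *base)
{
    int i;
    sentinel_base = base;
    for (i = 0; i < SENTINEL_COUNT; ++i)
        base[i] = 0;
}

/* Record that checkpoint `slot` was reached. The volatile write stops the
   compiler from optimising the store away or reordering it. */
void sentinel_mark(uint8_t slot, uint8_t value)
{
    assert(slot < SENTINEL_COUNT);
    assert(value != 0);            /* zero means "never reached" */
    sentinel_base[slot] = value;
}

/* Given a memory dump, return the highest checkpoint reached, or -1. */
int sentinel_last_reached(const volatile uint8_t *base)
{
    int i;
    for (i = SENTINEL_COUNT - 1; i >= 0; --i)
        if (base[i] != 0)
            return i;
    return -1;
}
```

Sprinkle sentinel_mark() calls between suspect statements, dump the nine bytes from the emulator, and the last slot that got written tells you roughly where execution died.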

It was methodical in a way that was almost uncomfortable to watch. Hypothesis. Build. Flash. Query. Eliminate. File to mempalace. Repeat.

On and off, over three days - with and without my intervention.

What it found (the short version)

Nine bugs in all. Among them, in order of discovery: ROM calls in the render path re-enabling interrupts. Banked string functions returning stale pointers. BSS overflowing into the banked window. Missing IM2 infrastructure. Keyboard scan not working. Stack colliding with BSS. A sizeof() on a pointer field returning the wrong size.
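The sizeof() one is a classic, so here is a generic illustration. The Room struct and its fields are hypothetical, not lifted from Deep: the point is only that sizeof on an array field gives the array's size, while sizeof on a pointer field gives the size of the pointer (2 bytes on a Z80), not the size of what it points at.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical struct, for illustration only. */
typedef struct {
    char        name[12];  /* array field: sizeof gives 12 */
    const char *desc;      /* pointer field: sizeof gives the pointer size */
} Room;

/* Size of the array field: always 12. */
size_t name_field_size(const Room *r) { return sizeof(r->name); }

/* Size of the pointer itself, NOT the length of the text it points to. */
size_t desc_field_size(const Room *r) { return sizeof(r->desc); }

/* What a copy of the description actually needs: length plus terminator. */
size_t desc_bytes_to_copy(const Room *r)
{
    assert(r->desc != NULL);
    return strlen(r->desc) + 1;
}
```

Pass desc_field_size() to a copy routine and you move two bytes of a sixteen-character string, which is exactly the sort of thing that looks fine until it doesn't.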

And then bug nine. The root cause of everything.

z88dk's sdcc_ix variants of _memcpy_callee and _memset_callee pop their arguments in the opposite order from what sccz80 actually pushes at the call site. So memcpy(dst, src, 11) was running LDIR with bc=dst (roughly 32KB as a count) and de=11 as the destination. It was wiping huge swaths of RAM on every call. The MMU drifted because the trampoline's saved bank record got scribbled. The system fell into BASIC ROM because slots 6 and 7 got remapped to garbage.
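To make the mismatch concrete, here is a small C model of it. This is a simulation under assumptions, not z88dk source: it assumes the caller pushes left to right (so the count ends up on top of the stack), ignores the return address, and models a callee written for the opposite push order. The register roles are the ones LDIR uses: hl is the source, de the destination, bc the count.

```c
#include <assert.h>
#include <stdint.h>

/* LDIR operands: hl = source, de = destination, bc = byte count. */
typedef struct { uint16_t bc, de, hl; } Regs;

/* Tiny model of the Z80 stack (return address left out for clarity). */
typedef struct { uint16_t slot[8]; int top; } Stack;

static void push(Stack *s, uint16_t v)
{
    assert(s->top < 8);
    s->slot[s->top++] = v;
}

static uint16_t pop(Stack *s)
{
    assert(s->top > 0);
    return s->slot[--s->top];
}

/* Caller side, assumed left-to-right: dst, then src, then n (n on top). */
void caller_push_left_to_right(Stack *s, uint16_t dst, uint16_t src, uint16_t n)
{
    push(s, dst);
    push(s, src);
    push(s, n);
}

/* Callee written for the opposite order: it expects dst on top. */
Regs callee_expects_dst_on_top(Stack *s)
{
    Regs r;
    r.de = pop(s);   /* believes this is dst, actually gets n   */
    r.hl = pop(s);   /* src, correct by coincidence             */
    r.bc = pop(s);   /* believes this is n, actually gets dst   */
    return r;
}

/* Callee that matches the caller: it expects n on top. */
Regs callee_expects_n_on_top(Stack *s)
{
    Regs r;
    r.bc = pop(s);   /* n   */
    r.hl = pop(s);   /* src */
    r.de = pop(s);   /* dst */
    return r;
}
```

Feed memcpy(0x8000, 0x9000, 11) through the mismatched callee and you get bc = 0x8000 and de = 11: a 32KB copy aimed at address 11. The middle argument lands in the right register either way, which is part of why nothing looks obviously wrong at the call site.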

cls() survived only by accident. Most of its garbage writes landed in ROM and were silently ignored.

The fix was two functions in a single .asm file with the correct pop order, linked first so the linker prefers them over the lib. Thirty lines of assembly to fix three days of chaos.

Why it took three days

The ABI mismatch doesn't announce itself. Every symptom pointed somewhere else. Stack issues, MMU drift, mysterious crashes, each one a plausible standalone bug. The agent chased all of them, fixed several real ones along the way, and kept narrowing. The sentinel methodology was what made it possible to bisect the crash to a single source statement even when the failure mode was "the machine falls into BASIC ROM 300ms after boot."

I'm not going to pretend it was cheap. Opus 4.7 at 7x the cost of Sonnet, for three days, hurts. But it found a bug that would have taken me weeks. Possibly longer. The kind of bug that makes you question your entire toolchain.

Where we are now

The game now renders with a viewport and sidebar. The main character is now walking about. IM2 is solid. The display pipeline is clean.

There are two known bugs left. The sidebar is rendering in the wrong position. One of the enemies doesn't move. Both are small, bounded, and can probably be handed to Sonnet 4.6 with a tight prompt.

The hard part is done. The game runs.

*on and off. In the background mostly.

Saturday, April 4, 2026

Well, I'm amazed

These agents are ridiculous. I've been more productive in my personal projects in the last week than I've been in the last 8 years.

In the last three weeks I've:

Written the +3 disk checker for the Plus3 I've been putting off for 3 years
Started the ZX Spectrum Next game I've been planning since I got my Next
Converted a 25k line Pascal/Delphi mod tracker to Rust and got it building on Windows, Mac, Linux, and the web (for the music to my game, of course)
Tidied up the parsing on my BDD framework
Fixed a couple of bugs in dokker and GoCrest

It's insane.

There are two issues with this though:

I've learned almost nothing about how to actually solve the problems these projects present
I don't fully trust some of it.

And that second one is the bigger problem.

Because everything works. Impressively well. Suspiciously well. The kind of “this shouldn’t be this easy” well.

I can build things faster than ever, but I’m also one layer removed from understanding them. When something breaks, I’m not debugging my thinking - I’m debugging something I half-generated and half-understand.

It’s like going from writing code to reviewing code… except the author is an overconfident ghost.

I’m not convinced this is bad. In fact, it might be the whole point. Maybe the skill shifts from “how do I build this?” to “how do I steer this?” and “how do I know when it’s wrong?” My tests and specifications aren't always air tight, and agents fill the gaps if they think they can get away with it.

But it does feel like cheating.

And also like the future.

I've been used to being slightly removed from coding as a Tech lead, and it definitely feels good being productive again. I can go into a meeting having set an agent a task, and be reasonably confident the task will be done, documented and tested.

The solution to the overconfidence? The same as it always was. Good engineering habits, small, vertical slices, well thought out testing strategies, and good communication. Coding was never the hard part of development, we still have that (for now).

We are the monks of old, sitting in our isolation, copying text by eye and by hand. In the same way the printing presses destroyed the monks' art, the era of hand-crafting software is over too, for better or worse.

But.. just but.. maybe there's room for the storyteller, the one who can weave these tools the best will surge ahead and create new, bold creations with them.

Now.. why does the Sorcerer's Apprentice come to mind?

Tuesday, March 24, 2026

Disk Check for the ZX Spectrum +3

Well, not to jump on the agentic coding bandwagon or anything, but I've totally jumped on the bandwagon and implemented a disk checker for the ZX Spectrum +3. It's taken me and Copilot a couple of weeks, but I now have a functioning Disk Checker.

Copilot (and sometimes Claude, thanks Claude) helped me write almost all of this. I knew very little machine code, but I did know C. Copilot fixed the main bug I had when I was just running the disk routines, namely a crash I couldn't get my head around when trying to communicate with the controller. It was the biggest stumbling block to getting this done, and Copilot fixed it in about an afternoon of fiddling with the ASM.

Hopefully now I can fix those two real +3 drives sitting on my shelf.. erm.

Features

Motor + drive status – Combined motor control and ST3 status check
Drive probe (Read ID) – Probe media and report controller status bytes; decodes ST1/ST2 on failure
Recal + seek track 2 – Track-0 recalibrate then seek verification
Interactive step seek – Manually step the head track by track
Read ID – Read sector ID from track 0 (requires readable disk)
Read track data loop – Continuously reads sector data on selected track (J/K to change track)
Disk RPM checker – Rotational-speed estimate from repeated ID reads; requires readable sector IDs
Run all – Execute all core tests in sequence and display a report card
Show report card – Display last run results (PASS / FAIL / NOT RUN per test)
Clear stored results – Reset all stored test results
Direct-key menu UI – Navigation and hotkeys respond directly; confirmation prompts use ENTER
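The RPM check boils down to simple arithmetic, sketched below. This is not the program's actual routine: the function name, the millisecond time base and the way revolutions are counted (for example, by watching for the same sector ID to come round again) are assumptions for illustration. The result is in tenths of an RPM to stay in integer maths, which suits a Z80.

```c
#include <assert.h>
#include <stdint.h>

/* Estimate rotational speed in tenths of an RPM.
   revolutions: complete turns observed (hypothetically, counted by seeing
                the same sector ID return from repeated Read ID commands)
   elapsed_ms:  time those turns took, in milliseconds
   Rounded to nearest; keep revolutions small (under about 7000) so the
   32-bit multiplication cannot overflow. */
uint32_t rpm_x10_from_revolutions(uint32_t revolutions, uint32_t elapsed_ms)
{
    assert(elapsed_ms > 0);
    /* RPM = revolutions * 60000 / elapsed_ms, scaled by 10 */
    return (revolutions * (uint32_t)600000 + elapsed_ms / 2) / elapsed_ms;
}
```

Ten revolutions in 2000 ms comes out as 3000, i.e. 300.0 RPM, which is the speed these drives are usually quoted at. The same ten turns in 2020 ms gives 2970, or 297.0 RPM, a drive running about 1% slow.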

Main menu

This was an exercise in how these agents work, their limits, my limits, the highlights and the pitfalls. I let them pretty much have free rein, and had a lot of refactoring to do myself. It taught me a lot about how to work with them, and what I want from them and what I don't.

But above all, it was really fun bringing the old and new worlds together like this!