I'm building a game for the ZX Spectrum Next. It's called Deep. The protagonist is Jonathan Deep, a philosophy professor waking up in a psychiatric hospital with no memory of how he got there. It's the kind of game that probably shouldn't exist on a 1980s-derived 8-bit platform, which is exactly why I'm building it.
The Next port has been the hard part. I'm using z88dk with sccz80, banked C across eight 16KB segments, a custom IM2 interrupt handler, and a hand-rolled display pipeline writing directly to ULA memory. It's not a simple project. When things go wrong, they go wrong in interesting ways.
And things went very wrong.
The symptoms
After getting the game to boot for the first time on ZEsarUX, I had a title screen. Progress. Then I'd press SPACE to explore and the game would fall into BASIC ROM. Every time. No crash message, no clue. Just BASIC.
The display was also misbehaving. Green vertical stripes. Keyboard input not registering. The MMU apparently drifting mid-run, remapping slots 6 and 7 to who-knows-where. Stack pointer landing in the middle of BSS. It looked like ten different bugs.
It was one bug. But I didn't know that yet.
Enter the agent
I'd been using Claude (Opus 4.6 and then 4.7) as a coding agent via Claude Code, with a local mempalace MCP server to persist session state between conversations. The idea is that significant findings get filed to mempalace immediately, so the next session can pick up the thread without starting from scratch.
Good thing too. Because the copilot client crashed on me several times over the following days, wiping live context. Each time, the agent had to reconstruct its mental model from whatever mempalace fragments had survived. A few entries were missing. It reconstructed anyway.
What the agent did, without being explicitly told to, was invent a debugging methodology suited to the platform's constraints. No printf on ZX Next. No gdb. No breakpoints in any conventional sense.
Instead: sentinel bytes. Volatile writes to fixed addresses in the $5B00-$5B08 range, read back via ZRCP (ZEsarUX's remote control protocol) after smartloading the NEX file. Nine staggered time snapshots per run: 20ms, 50ms, 100ms, 200ms, 300ms, 500ms, 1.2s, 2.2s, 4.2s. Combined with PC, SP, and MMU register dumps at each interval.
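To make the idea concrete, here is a minimal sketch of what a sentinel helper could look like. It is not the code the agent wrote. The names sentinel_mark and sentinel_last_reached, the SENTINEL_ON_TARGET build flag, and the convention of "slots start at zero and get non-zero values in ascending order" are all assumptions for illustration. Only the $5B00-$5B08 range comes from the actual sessions. On a host PC an array stands in for the fixed addresses so the logic can be compiled and checked.

```c
#include <assert.h>
#include <stdint.h>

#define SENTINEL_COUNT 9u /* $5B00..$5B08 inclusive */

#ifdef SENTINEL_ON_TARGET /* hypothetical flag, e.g. passed with -D */
/* Target build: fixed addresses in the range mentioned above. */
#define SENTINEL_BASE ((volatile uint8_t *)0x5B00)
#else
/* Host build: a zero-initialised array stands in for that RAM. */
static volatile uint8_t host_sentinels[SENTINEL_COUNT];
#define SENTINEL_BASE (host_sentinels)
#endif

/* Record that execution reached checkpoint `slot`.
 * `value` should be non-zero so a reached slot is distinguishable
 * from an untouched one (assumed convention). */
void sentinel_mark(uint8_t slot, uint8_t value)
{
    assert(slot < SENTINEL_COUNT);
    SENTINEL_BASE[slot] = value; /* volatile: the store is not optimised away */
}

/* Interpret a snapshot: index of the highest slot that was marked,
 * or -1 if none. On the real machine this reading happens outside
 * the program, from a memory dump. */
int sentinel_last_reached(void)
{
    int last = -1;
    uint8_t i;
    for (i = 0; i < SENTINEL_COUNT; ++i) {
        if (SENTINEL_BASE[i] != 0) {
            last = (int)i;
        }
    }
    return last;
}
```

Sprinkle sentinel_mark(0, 0xA1), sentinel_mark(1, 0xA2), and so on between suspect statements. If a snapshot shows slot 3 set and slot 4 still zero, the crash sits between those two calls. Move the markers closer together and run again.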
It was methodical in a way that was almost uncomfortable to watch. Hypothesis. Build. Flash. Query. Eliminate. File to mempalace. Repeat.
On and off, over three days - with and without my intervention.
What it found (the short version)
Nine bugs in all. The earlier ones, in order of discovery, included: ROM calls in the render path re-enabling interrupts. Banked string functions returning stale pointers. BSS overflowing into the banked window. Missing IM2 infrastructure. Keyboard scan not working. Stack colliding with BSS. A sizeof() on a pointer field returning the size of the pointer rather than the buffer it pointed to.
And then bug nine. The root cause of everything.
z88dk's sdcc_ix variants of _memcpy_callee and _memset_callee pop their arguments in the opposite order from what sccz80 actually pushes at the call site. So memcpy(dst, src, 11) was running LDIR with bc=dst (roughly 32KB as a count) and de=11 as the destination. It was wiping huge swaths of RAM on every call. The MMU drifted because the trampoline's saved bank record got scribbled. The system fell into BASIC ROM because slots 6 and 7 got remapped to garbage.
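The register mix-up is easier to see in a toy model. The sketch below is plain C for a host PC, not Z80 code and not z88dk's actual source. It assumes the caller pushes arguments left to right (dst first, so n ends up on top of the stack) while the callee pops as if they had been pushed right to left. The function names and the addresses 0x8000 and 0x9000 are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Registers LDIR consumes: copy BC bytes from (HL) to (DE). */
typedef struct {
    uint16_t hl; /* source address */
    uint16_t de; /* destination address */
    uint16_t bc; /* byte count */
} ldir_regs;

/* Toy callee: pops what it believes is dst, then src, then n.
 * pushed[0] was pushed first; pushed[2] was pushed last (stack top). */
ldir_regs callee_pops_dst_src_n(const uint16_t pushed[3])
{
    ldir_regs r;
    r.de = pushed[2]; /* first pop: treated as dst   */
    r.hl = pushed[1]; /* second pop: treated as src  */
    r.bc = pushed[0]; /* third pop: treated as count */
    return r;
}

/* Caller pushing left to right: dst, src, n. This is the mismatch. */
ldir_regs call_left_to_right(uint16_t dst, uint16_t src, uint16_t n)
{
    uint16_t pushed[3];
    pushed[0] = dst;
    pushed[1] = src;
    pushed[2] = n;
    return callee_pops_dst_src_n(pushed);
}

/* Caller pushing right to left: n, src, dst. This is what the callee expects. */
ldir_regs call_right_to_left(uint16_t dst, uint16_t src, uint16_t n)
{
    uint16_t pushed[3];
    pushed[0] = n;
    pushed[1] = src;
    pushed[2] = dst;
    return callee_pops_dst_src_n(pushed);
}
```

With call_left_to_right(0x8000, 0x9000, 11), the model ends up with bc = 0x8000 and de = 11: a 32768-byte copy aimed at address 11. With call_right_to_left and the same arguments, bc = 11 and de = 0x8000, which is what memcpy was supposed to do.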
cls() survived only by accident. Most of its garbage writes landed in ROM and were silently ignored.
The fix was two functions in a single .asm file with the correct pop order, linked first so the linker prefers them over the lib. Thirty lines of assembly to fix three days of chaos.
Why it took three days
The ABI mismatch doesn't announce itself. Every symptom pointed somewhere else. Stack issues, MMU drift, mysterious crashes, each one a plausible standalone bug. The agent chased all of them, fixed several real ones along the way, and kept narrowing. The sentinel methodology was what made it possible to bisect the crash to a single source statement even when the failure mode was "the machine falls into BASIC ROM 300ms after boot."
I'm not going to pretend it was cheap. Opus 4.7 at 7x the cost of Sonnet, for three days, hurts. But it found a bug that would have taken me weeks. Possibly longer. The kind of bug that makes you question your entire toolchain.
Where we are now
The game now renders with a viewport and sidebar, and the main character walks about. IM2 is solid. The display pipeline is clean.
There are two known bugs left. The sidebar is rendering in the wrong position. One of the enemies doesn't move. Both are small, bounded, and can probably be handed to Sonnet 4.6 with a tight prompt.
The hard part is done. The game runs.
*on and off. In the background mostly.