Debugging a SNES Game Engine
Some background
Over the last two and a half years, I've been intermittently working towards creating a fully complete Super Nintendo Game. This is no easy task; all the code is in 65816 assembly language and its really easy to introduce subtle bugs with simple typos. That's why I have also been writing unit tests to ensure that any bugs are found and squashed ASAP.
My unit testing framework
Programming the Super Nintendo is much like programming a microcontroller. There is no operating system, no multi-tasking, no virtual memory and no terminal. Everything has to be coded from scratch, even the bare basics.
I'll skip the process of coding a terminal. Suffice to say I've written
a bare-bones text console
that provides me with a text cursor, some print integer routines, some
print hex routines and a PrintString
routine.
My unit testing framework
is as simple as it needs to be. There is a list of unit test routines,
Test.AddrTable
, and a list of unit test names, Test.TestNameList
.
All test routines return carry set if the test passed and carry clear if
the test failed. The test framework goes through every test in the
list; calls the test routine and then tests the carry flag. When a
framework encounters a failed test it simply sets the background colour
to red, prints the address and name of the failed test and stops the
processor.
There are two different types of unit tests.
-
The traditional unit test. These tests initialises the engine, calls some routine and then tests that the routine preformed the desired action.
-
An interactive test. These tests initialise the engine and creates a game-loop that the user interacts with. The user is expected to manipulate variables with the controller and visually confirm they are working as expected.
Writing a new unit test
Over the last week I've been refactoring the UnTech Engine's entity subsystem. Upon looking through the unit tests I discovered that I never tested to see what happens if an entity's list changes on the same frame a new entity is spawned.
So I wrote a new entity, SpawnAndChangeListTestEntity
, that that
spawns 3 DeleteAfterDelayEntity
entities and changes its entity list
every time its Process
routine is called.
Then I wrote a new test that:
- Initialises the entity subsystem
- Spawn a number of
SpawnAndChangeListTestEntity
entities. - Processes the entity game-loop a few times.
- Checks that the entity lists are still valid.
// Tests that an entity can spawn a new entity and change its entity
// list at the same time.
a16()
i16()
code()
Test.add("GameLoop (Spawn And Change List)")
function GameLoop_SpawnAndChangeList {
constant _loopsLeft = Test.dpTmp + 0
jsr Entity.Init
lda.w #0x1000
sta.w Entity.Spawn.xPos
sta.w Entity.Spawn.yPos
ldy.w #Entity.lists.N_LISTS - 1
SpawnLoop:
phy
// Y = listId to spawn entity in
lda.w #SpawnAndChangeListTestEntity.EntityId
jsr Entity.Spawn
bcc Fail
ply
dey
bpl SpawnLoop
jsr _FreeListLength
cmp.w #Entity.N_ENTITIES - Entity.lists.N_LISTS
bne Fail
lda.w #20
sta.b _loopsLeft
Loop:
jsr Entity.ProcessGameLoop
jsr _TestEntityListsValid
bcc Fail
dec.b _loopsLeft
bne Loop
jmp _TestEntityListsValid
Fail:
clc
rts
namespace SpawnAndChangeListTestEntity {
buildFunctionTable(BaseEntityFunctionTable, SpawnAndChangeListTestEntity)
Entity.useDefaultRenderer()
constant DELETE_DELAY = 2
// IN: A = entity parameter = entity list id
a16()
i16()
code()
function Init {
sep #$20
a8()
sta.b BaseEntity.listId
rep #$30
a16()
rts
}
a16()
i16()
code()
function Process {
// This entity spawns 3 entities:
// * DeleteAfterDelayEntity with listId = listId
// * DeleteAfterDelayEntity with lidtId = listId - 1
// * DeleteAfterDelayEntity with listId = listId + 1
//
// then changes its entity list to the previous list.
lda.b BaseEntity.listId
and.w #0xff
jsr SpawnDeleteEntity
lda.b BaseEntity.listId
and.w #0xff
dec
jsr SpawnDeleteEntity
lda.b BaseEntity.listId
and.w #0xff
inc
jsr SpawnDeleteEntity
lda.b BaseEntity.listId
and.w #0xff
dec
bpl +
lda.w #Entity.lists.N_LISTS - 1
+
jmp Entity.ChangeEntityListIdAndGotoNextEntity
}
// IN: A = listId to spawn DeleteAfterDelayEntity in.
a16()
i16()
code()
function SpawnDeleteEntity {
// ensure entity is spawned
ldx.w Camera.xPos
stx.w Entity.Spawn.xPos
ldx.w Camera.yPos
stx.w Entity.Spawn.yPos
// wrap list id if necessary
tay
bpl +
// A < 0
lda.w #Entity.lists.N_LISTS - 1
+
cmp.w #Entity.lists.N_LISTS
bcc +
lda.w #0
+
clc
adc.w #Entities.DeleteAfterDelayEntity.EntityId_0
ldy.w #DELETE_DELAY
jmp Entity.Spawn
}
a16()
i16()
code()
Destructor:
ProcessCollision:
function NULL {
rts
}
rodata(EN_RomData)
RomData:
dw FunctionTable
db 0 // initialListId
db 0 // defaultPalette
dw 0 // frameSetId
rodata(EN_EntityList)
constant EntityId = (pc() - Entity.Data.EntityList) / 2
dw RomData
}
}
Well, that's not good.
Debugging with bsnes-plus
bsnes-plus is a debug oriented fork of bsnes v073 that includes a simple step-by-step debugger, a memory editor, a register state viewer, and a tileset/tilemap viewer.
A recent addition to bsnes-plus was the breakpoint file, which allows the debugger's breakpoint settings to be saved in-between sessions.
Using a
simple bash script
I can pre-fill the breakpoint file with any label containing the word
BREAKPOINT
. No more tedious work of manually looking up label
addresses and typing them into the Breakpoint Editor window for me.
Searching for the problem
First we need to find out where the test fails. I doubt it's in the
SpawnLoop
as the spawn tests pass, therefore I'm going go looking in
the main test loop.
Lets add some breakpoints to the test loop and step over the code in the debugger to see where the test returns false.
Ah, I see a problem, the D register is 0x0380
, when it should be
0x0000
. Let me
update _TestEntityListsValid so it doesn't change the D Register,
and try again.
The D register is now correct but the test still fails at the
_FreeListLength
routine. Let's test that next.
Testing _FreeListLength
_FreeListLength
goes through each of the entity lists and tests that:
- All the entities in the deactivated lists have no MetaSprite data
- All the entities in the normal entity lists have MetaSprite data.
- The total number of entities in all lists ==
Entity.N_ENTITIES
.
I seriously doubt the first or second test is failing, so lets add a
breakpoint before the Entity.N_ENTITIES
test.
Here we can see my code counted 0x3b
(59) entities, but it was
expecting to find 0x40
(64). It appears some entities have gone
missing.
Now I know why the test is failing. I just have to find out what is causing it to fail.
Finding the root cause
The entity subsystem is implemented using a series of linked lists. The
failing N_ENTITIES
test tells me that entities in the linked list are
being dropped somewhere. I start by commenting out sections of the
SpawnAndChangeListTestEntity
code, checking to see if it passes or
not.
From this observation I now know that I have a linked list corruption
bug that drops an entity when an entity's Process
routine:
- Spawns a new entity whose
listId
is the same as the current entity'slistId
, and - The entity changes its entity list on the same frame it spawned the entity.
Fixing the bug
The existance of this bug is obvious in hindsight. Entity.ProcessLoop
contains a variable that is holding the address of the previous
entity1 and Entity.Spawn
adds the new entity to the
beginning of the entity list. When an entity is spawned inside an
entity's Process
routine, the value of _previousEntity
could be
invalidated, leading to the dropped entity.
Like many problems in life, the are multiple solutions:
-
Remove the ability for an entity to change the list it is held in. This is something I want to avoid, as my use case for this feature is for an NPC entity turning in a boss entity (or vice-versa).
-
Change the spawn routine so that new entities are inserted into the end of the entity lists. I do not like this idea as it wastes CPU time scanning for the end of the list.
-
Store all newly spawned entities in a special entity list, preventing
_previousEntity
from ever becoming invalid. When the entity loop has finished processing, move all the entities from the special list to their entity list.
Interestingly enough the entity loop already has a special list,
tryToActivateAgain
, which contains a list of entities that could not
be activated in the previous frame2. At the start
of the next frame, the entities in the list are activated, and those who
are activated successfully are then moved into the appropriate entity
list.
In the interest of code reuse, I modified the entity spawning
routines
to no longer activate entities and instead insert them into the
tryToActivateAgain
list. As entities can no-longer be inserted into
the entity lists during the Process
-loop, the _previousEntity
variable is never invalidated and the bug is fixed.
I later
renamed
the tryToActivateAgain
list to activateNextFrame
as it provides a
description of what happens to entities inside the list.
Closing thoughts
Debugging a Super Nintendo ROM is a difficult affair. Luckily my task is a lot easier as I have access to the source code and a good mental-model the entire engine in my head. I have no idea how ROM hackers debug SNES ROMs without access to source code, but considering how hard it is to debug my own work they have earned my honest respect.
-
The
_previousEntity
variable is used to optimize the change entity list code. Removing an item from a linked list is an O(1) operation if we know the address of the previous entity in the list. ↩ -
The entity activation routine fails when there is not enough VBlank time to load the entity's graphics into VRAM. ↩