Debugging a SNES Game Engine

Last Modified:

Some background

Over the last two and a half years, I've been intermittently working towards creating a fully complete Super Nintendo Game. This is no easy task; all the code is in 65816 assembly language and its really easy to introduce subtle bugs with simple typos. That's why I have also been writing unit tests to ensure that any bugs are found and squashed ASAP.

My unit testing framework

Programming the Super Nintendo is much like programming a microcontroller. There is no operating system, no multi-tasking, no virtual memory and no terminal. Everything has to be coded from scratch, even the bare basics.

I'll skip the process of coding a terminal. Suffice to say I've written a bare-bones text console that provides me with a text cursor, some print integer routines, some print hex routines and a PrintString routine.

My unit testing framework is as simple as it needs to be. There is a list of unit test routines, Test.AddrTable, and a list of unit test names, Test.TestNameList. All test routines return carry set if the test passed and carry clear if the test failed. The test framework goes through every test in the list; calls the test routine and then tests the carry flag. When a framework encounters a failed test it simply sets the background colour to red, prints the address and name of the failed test and stops the processor.

There are two different types of unit tests.

  1. The traditional unit test. These tests initialises the engine, calls some routine and then tests that the routine preformed the desired action.

  2. An interactive test. These tests initialise the engine and creates a game-loop that the user interacts with. The user is expected to manipulate variables with the controller and visually confirm they are working as expected.

Writing a new unit test

Over the last week I've been refactoring the UnTech Engine's entity subsystem. Upon looking through the unit tests I discovered that I never tested to see what happens if an entity's list changes on the same frame a new entity is spawned.

So I wrote a new entity, SpawnAndChangeListTestEntity, that that spawns 3 DeleteAfterDelayEntity entities and changes its entity list every time its Process routine is called.

Then I wrote a new test that:

  • Initialises the entity subsystem
  • Spawn a number of SpawnAndChangeListTestEntity entities.
  • Processes the entity game-loop a few times.
  • Checks that the entity lists are still valid.

 

// Tests that an entity can spawn a new entity and change its entity
// list at the same time.
a16()
i16()
code()
Test.add("GameLoop (Spawn And Change List)")
function GameLoop_SpawnAndChangeList {
constant _loopsLeft = Test.dpTmp + 0

    jsr     Entity.Init

    lda.w   #0x1000
    sta.w   Entity.Spawn.xPos
    sta.w   Entity.Spawn.yPos

    ldy.w   #Entity.lists.N_LISTS - 1
    SpawnLoop:
        phy
            // Y = listId to spawn entity in
            lda.w   #SpawnAndChangeListTestEntity.EntityId

            jsr     Entity.Spawn
            bcc     Fail
        ply
        dey
        bpl     SpawnLoop

    jsr     _FreeListLength
    cmp.w   #Entity.N_ENTITIES - Entity.lists.N_LISTS
    bne     Fail


    lda.w   #20
    sta.b   _loopsLeft

    Loop:
        jsr     Entity.ProcessGameLoop

        jsr     _TestEntityListsValid
        bcc     Fail

        dec.b   _loopsLeft
        bne     Loop


    jmp     _TestEntityListsValid


Fail:
    clc
    rts


namespace SpawnAndChangeListTestEntity {
    buildFunctionTable(BaseEntityFunctionTable, SpawnAndChangeListTestEntity)
    Entity.useDefaultRenderer()

    constant DELETE_DELAY = 2


    // IN: A = entity parameter = entity list id
    a16()
    i16()
    code()
    function Init {
        sep     #$20
    a8()
        sta.b   BaseEntity.listId

        rep     #$30
    a16()
        rts
    }


    a16()
    i16()
    code()
    function Process {
        // This entity spawns 3 entities:
        //  * DeleteAfterDelayEntity with listId = listId
        //  * DeleteAfterDelayEntity with lidtId = listId - 1
        //  * DeleteAfterDelayEntity with listId = listId + 1
        //
        // then changes its entity list to the previous list.

        lda.b   BaseEntity.listId
        and.w   #0xff
        jsr     SpawnDeleteEntity


        lda.b   BaseEntity.listId
        and.w   #0xff
        dec
        jsr     SpawnDeleteEntity


        lda.b   BaseEntity.listId
        and.w   #0xff
        inc
        jsr     SpawnDeleteEntity


        lda.b   BaseEntity.listId
        and.w   #0xff
        dec
        bpl     +
            lda.w   #Entity.lists.N_LISTS - 1
        +
        jmp     Entity.ChangeEntityListIdAndGotoNextEntity
    }


    // IN: A = listId to spawn DeleteAfterDelayEntity in.
    a16()
    i16()
    code()
    function SpawnDeleteEntity {
        // ensure entity is spawned
        ldx.w   Camera.xPos
        stx.w   Entity.Spawn.xPos
        ldx.w   Camera.yPos
        stx.w   Entity.Spawn.yPos

        // wrap list id if necessary
        tay
        bpl     +
            // A < 0
            lda.w   #Entity.lists.N_LISTS - 1
        +
        cmp.w   #Entity.lists.N_LISTS
        bcc     +
            lda.w   #0
        +

        clc
        adc.w   #Entities.DeleteAfterDelayEntity.EntityId_0
        ldy.w   #DELETE_DELAY
        jmp     Entity.Spawn
    }


    a16()
    i16()
    code()
    Destructor:
    ProcessCollision:
    function NULL {
        rts
    }


    rodata(EN_RomData)
    RomData:
        dw  FunctionTable
        db  0               // initialListId
        db  0               // defaultPalette
        dw  0               // frameSetId

    rodata(EN_EntityList)
    constant EntityId = (pc() - Entity.Data.EntityList) / 2
        dw  RomData
}
}
UnTech Unit Tests: FAILED

Well, that's not good.

Debugging with bsnes-plus

bsnes-plus is a debug oriented fork of bsnes v073 that includes a simple step-by-step debugger, a memory editor, a register state viewer, and a tileset/tilemap viewer.

A recent addition to bsnes-plus was the breakpoint file, which allows the debugger's breakpoint settings to be saved in-between sessions.

Using a simple bash script I can pre-fill the breakpoint file with any label containing the word BREAKPOINT. No more tedious work of manually looking up label addresses and typing them into the Breakpoint Editor window for me.

Searching for the problem

First we need to find out where the test fails. I doubt it's in the SpawnLoop as the spawn tests pass, therefore I'm going go looking in the main test loop.

Lets add some breakpoints to the test loop and step over the code in the debugger to see where the test returns false.

    jsr     _FreeListLength
    cmp.w   #Entity.N_ENTITIES - Entity.lists.N_LISTS
    bne     Fail

BREAKPOINT0:
    lda.w   #20
    sta.b   _loopsLeft

    Loop:
        jsr     Entity.ProcessGameLoop

BREAKPOINT1:
        jsr     _TestEntityListsValid
        bcc     Fail

        dec.b   _loopsLeft
        bne     Loop

BREAKPOINT2:
    jmp     _TestEntityListsValid
Step-over debugging of SpawnAndChangeListTestEntity
Step-over debugging of SpawnAndChangeListTestEntity.

Ah, I see a problem, the D register is 0x0380, when it should be 0x0000. Let me update _TestEntityListsValid so it doesn't change the D Register, and try again.

SpawnAndChangeListTestEntity still failed
Step-over debugging of SpawnAndChangeListTestEntity.

The D register is now correct but the test still fails at the _FreeListLength routine. Let's test that next.

Testing _FreeListLength

_FreeListLength goes through each of the entity lists and tests that:

  1. All the entities in the deactivated lists have no MetaSprite data
  2. All the entities in the normal entity lists have MetaSprite data.
  3. The total number of entities in all lists == Entity.N_ENTITIES.

I seriously doubt the first or second test is failing, so lets add a breakpoint before the Entity.N_ENTITIES test.

function _TestEntityListsValid {
...[snip]...

BREAKPOINT:
    jsr     _FreeListLength
    clc
    adc.w   tmp_count
    cmp.w   #Entity.N_ENTITIES
    bne     Fail

    sec
    rts

Fail:
    clc
    rts
}
Debugging _FreeListLength N_ENTITIES test
Step-over debugging of _FreeListLength.

Here we can see my code counted 0x3b (59) entities, but it was expecting to find 0x40 (64). It appears some entities have gone missing.

Now I know why the test is failing. I just have to find out what is causing it to fail.

Finding the root cause

The entity subsystem is implemented using a series of linked lists. The failing N_ENTITIES test tells me that entities in the linked list are being dropped somewhere. I start by commenting out sections of the SpawnAndChangeListTestEntity code, checking to see if it passes or not.

PASSED
Comment out spawn entity @ listId
FAILED
Comment out spawn entity @ listId - 1
FAILED
Comment out spawn entity @ listId + 1
PASSED
Comment out change entity list

From this observation I now know that I have a linked list corruption bug that drops an entity when an entity's Process routine:

  1. Spawns a new entity whose listId is the same as the current entity's listId, and
  2. The entity changes its entity list on the same frame it spawned the entity.

Fixing the bug

The existance of this bug is obvious in hindsight. Entity.ProcessLoop contains a variable that is holding the address of the previous entity1 and Entity.Spawn adds the new entity to the beginning of the entity list. When an entity is spawned inside an entity's Process routine, the value of _previousEntity could be invalidated, leading to the dropped entity.

Like many problems in life, the are multiple solutions:

  • Remove the ability for an entity to change the list it is held in. This is something I want to avoid, as my use case for this feature is for an NPC entity turning in a boss entity (or vice-versa).

  • Change the spawn routine so that new entities are inserted into the end of the entity lists. I do not like this idea as it wastes CPU time scanning for the end of the list.

  • Store all newly spawned entities in a special entity list, preventing _previousEntity from ever becoming invalid. When the entity loop has finished processing, move all the entities from the special list to their entity list.

Interestingly enough the entity loop already has a special list, tryToActivateAgain, which contains a list of entities that could not be activated in the previous frame2. At the start of the next frame, the entities in the list are activated, and those who are activated successfully are then moved into the appropriate entity list.

In the interest of code reuse, I modified the entity spawning routines to no longer activate entities and instead insert them into the tryToActivateAgain list. As entities can no-longer be inserted into the entity lists during the Process-loop, the _previousEntity variable is never invalidated and the bug is fixed.

PASSED
All unit tests pass.

I later renamed the tryToActivateAgain list to activateNextFrame as it provides a description of what happens to entities inside the list.

Closing thoughts

Debugging a Super Nintendo ROM is a difficult affair. Luckily my task is a lot easier as I have access to the source code and a good mental-model the entire engine in my head. I have no idea how ROM hackers debug SNES ROMs without access to source code, but considering how hard it is to debug my own work they have earned my honest respect.


  1. The _previousEntity variable is used to optimize the change entity list code. Removing an item from a linked list is an O(1) operation if we know the address of the previous entity in the list. 

  2. The entity activation routine fails when there is not enough VBlank time to load the entity's graphics into VRAM.