Adding a second layer to unnamed-snes-game

Earlier in the year I finally added a second layer to my unnamed-snes-game. The original tech-demo did not use BG2; it had room tiles on BG1 and a status bar on BG3. Now that the engine uses both BG1 and BG2, I can (hopefully) improve the game's graphics and add layering effects.

This blog post is not a deep dive into the second-layer subsystem. It is a ramble through the design decisions and optimisations I used when coding the second-layer. I tried to document how the scrolling code works but I hit a major case of writer's block.

For those interested, the source code to the second-layer subsystem can be found on GitHub.

Animated screen capture
A scrolling programmer-art second-layer above the room.
The second-layer stops scrolling in the healing room.

Goals

A second background layer above or below the room tiles. [1]

  • Built from a single png image file and json parameters
  • Can be placed above or below the room tiles
    • Above: clouds, fog, large bosses made from BG tiles
    • Below: grass, decorative paths, wooden floors, floor tiles, etc
  • Can be scrolled during the game-loop
    • Moving fog
    • Dark rooms with moving circle of visibility
    • A far away mountain that slowly moves on each room transition
  • Can be scrolled in sync with a room transition
    • The ability to see and move the second-layer in Tiled (the map editor I'm using)
  • Infinitely scrollable (will seamlessly wrap around)
  • Can be larger than the VRAM tilemap
  • Uses callbacks with room/dungeon parameters (reusing room-event code/data-structures where possible)


I also had an idea where the second-layer subsystem could be reused to draw the floor tiles below the playfield. For this to work, it would need to scroll in sync with the room transitions and a second-layer offset (in tiles) would need to be added to the room data. I called this feature the PART_OF_ROOM flag.

Animated screen capture
A PART_OF_ROOM second-layer scrolling in sync with a room-transition.
The two rooms display different slices of the second-layer.
In the source image 34 is not above 11.

I'm unsure how well PART_OF_ROOM will work when I get back to building levels again. If it becomes too messy to design levels with, I can always modify the mt_tilesets subsystem to optionally use two background layers.

Limitations and Design Decisions

The second layer is built from a single static png image. This is a deliberate design decision. I wanted to be able to create the second-layer with an image editor (with its fancy layering, line drawing and cut/copy/paste features) and have my resources compiler automatically handle the tile deduplication for me. As a bonus, I will have more freedom to change the second-layer data formats in the future.


My first question was "where do I allocate tiles for the second layer in VRAM?". In the tech-demo the VRAM map is full, with only a 4KiB chunk of VRAM unaccounted for. I could have split the BG1 tile allocation in two, decreasing the number of tiles available to the playfield in multiples of 8 KiB (256 4bpp tile) chunks, but I had a better idea: allocate second-layer tiles from the end of the chunk downwards (from tile index 1023 down to 0). This way I can allocate a single 32 KiB (1024 4bpp tile) chunk of VRAM to both BG1 and BG2, and the two layers share that chunk without any wasted tiles.
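The shared-chunk idea can be sketched as a two-ended allocator. This is only an illustration in Python: the class and method names are hypothetical and are not taken from the game's resources compiler.

```python
TILES_IN_CHUNK = 1024  # 32 KiB of 4bpp tiles (32 bytes per tile)


class SharedTileChunk:
    """Hypothetical two-ended allocator for a tile chunk shared by BG1 and BG2."""

    def __init__(self, size: int = TILES_IN_CHUNK) -> None:
        self.size = size
        self.room_end = 0        # room tiles occupy [0, room_end)
        self.layer_start = size  # second-layer tiles occupy [layer_start, size)

    def alloc_room_tiles(self, count: int) -> range:
        """Allocate room tiles upwards from tile index 0."""
        if self.room_end + count > self.layer_start:
            raise MemoryError("room tiles would overlap second-layer tiles")
        first = self.room_end
        self.room_end += count
        return range(first, self.room_end)

    def alloc_second_layer_tiles(self, count: int) -> range:
        """Allocate second-layer tiles downwards from the last tile index."""
        if self.layer_start - count < self.room_end:
            raise MemoryError("second-layer tiles would overlap room tiles")
        self.layer_start -= count
        return range(self.layer_start, self.layer_start + count)

    def free_tiles(self) -> int:
        """Number of tiles still unallocated between the two regions."""
        return self.layer_start - self.room_end
```

Because the two regions grow towards each other, neither layer needs a fixed budget. The only constraint is that the combined tile count fits in the chunk.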

I later added a feature that optionally links a second-layer with a mt_tileset (room tiles), allowing the resources compiler to deduplicate any mt_tileset tiles that are used in the linked second-layer. Be aware that a linked second-layer can only be used with the mt_tileset it is linked to; using a different mt_tileset will output glitched tiles.

annotated VRAM map
unnamed-snes-game VRAM map

Next, I needed to decide the number of columns and rows to draw on every frame. There's not enough CPU time to redraw an entire background layer every frame, so I can only scroll the second-layer, drawing new columns and rows as the viewport moves across the VRAM tilemap.

This places a limit on the maximum scrolling speed of the second-layer. I decided to only update 1 column and/or 1 row per frame, limiting scrolling to 8 pixels/frame. I could have increased this to 2 columns/rows per frame but that would require more CPU time than I would have liked.
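The relationship between the one-column-per-frame rule and the 8 pixels/frame limit can be sketched in Python. The function names here are hypothetical and the real scrolling code is written in Wiz, so treat this as an illustration of the constraint rather than the engine's logic.

```python
TILE_PX = 8  # size of a VRAM tile in pixels


def clamp_scroll(prev_pos: int, wanted_pos: int) -> int:
    """Limit movement along one axis to at most one 8px tile per frame."""
    delta = max(-TILE_PX, min(TILE_PX, wanted_pos - prev_pos))
    return prev_pos + delta


def tiles_crossed(prev_pos: int, new_pos: int) -> int:
    """Return +1, -1 or 0: which direction (if any) needs a new column/row.

    With the clamp above applied, the result never exceeds one tile.
    """
    return (new_pos // TILE_PX) - (prev_pos // TILE_PX)
```

For example, a request to jump from x=100 to x=150 is clamped to x=108, which crosses exactly one tile boundary and so needs exactly one new column.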


Finally, I needed to decide where to put the tilemap seam (the row/column that is updated when the second-layer is scrolled).

If I want to add HDMA BGHOFS (horizontal offset) or BGVOFS (vertical offset) effects to the second-layer I need to ensure the tiles surrounding the 256x224 viewport are also populated or else the HDMA effect will show stale tiles onscreen.

So I decided to place the seam as far away from the viewport as possible.

VRAM tilemap
The tilemap seam for a second-layer.
The white box is the viewport.
The striped tiles are the row or column that will change when the second-layer is scrolled.

Annoyingly, I was unable to place the tilemap seam in the center of the off-screen region. The seam's position is off by one in both axes. Here is the number of valid tiles surrounding the viewport:

  • 2 8px tiles above [2]
  • 0 8px tiles below
  • 16 8px tiles left
  • 14 8px tiles right

This is more than enough for horizontal offset effects but will require me to be careful when implementing vertical offset effects.
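The counts above can be checked with a little arithmetic. The vertical case follows footnote 2 (a 32-tile-tall tilemap, 1 seam row, 29 visible rows). The horizontal case assumes a 64-tile-wide tilemap, which is an inference from the 16 + 14 figure and is not stated in the post.

```python
TILE_PX = 8


def spare_tiles(tilemap_tiles: int, viewport_px: int, seam_tiles: int = 1) -> int:
    """Tiles that are populated but lie outside the viewport on one axis."""
    # +1 because the viewport can start part-way through a tile.
    visible = viewport_px // TILE_PX + 1
    return tilemap_tiles - seam_tiles - visible


vertical = spare_tiles(32, 224)    # 2 tiles: 2 above + 0 below
horizontal = spare_tiles(64, 256)  # 30 tiles: 16 left + 14 right (assumed width)
```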

I have tried multiple times, including thrice while writing this blog post, to move the tilemap seam up a single 8px tile but every time I do so I encounter a scrolling glitch somewhere (mostly when the scroll direction changes). Honestly, this tilemap seam has been the source of countless off-by-one errors and glitches. The scrolling code currently works. I'm going to save myself the headache and leave it alone.

  • The center is vertically offset by 8 pixels and there is a glitched vertical line
  • The right side is vertically offset by 8 pixels
  • The bottom half is missing
  • Lots of missing tiles
  • The second-layer is vertically squished
  • There is a horizontal line on every second row
  • Lots of tiles are missing, some columns have been duplicated
  • Two columns have been duplicated
  • A single tile is corrupt
  • The right side is vertically offset by 16 pixels
A collection of second-layer scrolling glitches


There is one complication with the tilemap seam - it cannot be used with PART_OF_ROOM second-layers. This is because the tiles surrounding the viewport are unknown until a room transition starts.

To solve this, I have written two different scrolling code paths. PART_OF_ROOM second-layers have two tilemap seams per axis, so the seams surround the viewport. This rules out glitch-free HDMA scrolling effects on PART_OF_ROOM second-layers, which is a limitation I can accept.

VRAM tilemap
The tilemap seams for PART_OF_ROOM second layers.
The white box is the viewport.
The striped tiles are the rows or columns that will change when the second-layer is scrolled.

Optimisations

Low-level stuff

All drawing code is done backwards, from bottom-right to top-left. This is because it is faster to test if decrementing an index outputs a negative value (dex ; bpl label) than it is to increment and compare an index (inx ; cpx #MAX ; bcc label).
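To put rough numbers on that, here is the per-iteration loop overhead using the standard 65816 instruction timings in CPU cycles (not master cycles). The helper function is purely illustrative and ignores memory-speed effects.

```python
# 65816 CPU cycles for the loop-control instructions mentioned above.
CYCLES = {
    "dex": 2,
    "inx": 2,
    "cpx_imm8": 2,      # cpx #imm with an 8-bit index register
    "cpx_imm16": 3,     # cpx #imm with a 16-bit index register
    "branch_taken": 3,  # bpl/bcc: 2 cycles, +1 when the branch is taken
}


def loop_overhead(instructions: list) -> int:
    """Sum the cycles spent on loop control for one iteration."""
    return sum(CYCLES[name] for name in instructions)


count_down = loop_overhead(["dex", "branch_taken"])                # dex ; bpl
count_up_8 = loop_overhead(["inx", "cpx_imm8", "branch_taken"])    # inx ; cpx ; bcc
count_up_16 = loop_overhead(["inx", "cpx_imm16", "branch_taken"])  # 16-bit index
```

Counting down costs 5 cycles of loop control per iteration against 7 or 8 when counting up, which adds up across every tile in a row or column.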


Most of the private state variables serve multiple purposes. This reduces the number of variables I have to test or modify every frame but makes the code a lot more brittle.

For instance, the vertical_buffer.maskedPrevXPos variable is used to:

  • determine scrolling direction.
  • determine if xPos/yPos has advanced past an 8px VRAM tile boundary.
  • limit xPos/yPos to a single VRAM tile update per frame.
  • determine when to advance the map position.

and the vertical_buffer.cursor variable:

  • Holds the current position within the [u16 ; 32] vertical buffer.
  • Used to calculate horizontal_buffer.vramWordAddr1 and horizontal_buffer.vramWordAddr2.
  • Used to determine if the top or bottom half of the MetaTile is written to the horizontal buffer.

To prevent glitches the state variables must not be modified outside of the second_layer namespace. To do this I took advantage of wiz's namespace and variable aliasing features. The game code does not directly access the second_layer namespace. Instead I created a new namespace, sl_callbacks, which contains aliases to the second_layer variables and functions the game code is allowed to access (with some of the variables marked const read-only).

Skipping column/row drawing

If the second-layer is 256 pixels tall I can fit the entire Y-axis of the second-layer tilemap in VRAM. I do not need to draw new rows when a tile boundary is crossed, saving CPU time and removing the Y-axis scrolling limitation. As a bonus, BGVOFS HDMA scrolling effects are simpler to implement as I don't have to worry about the tilemap seam.

Animated Mesen Tilemap Viewer GIF
VRAM tilemap of a 768x256 pixel second layer.
The second layer is scrolling 4 pixels/frame in the X-axis and 50 pixels/frame in the Y-axis.
The white box is the viewport, the red line is the tilemap seam.

If the second-layer is 256 pixels or 512 pixels wide I can do the same in the X axis.

Animated Mesen Tilemap Viewer GIF
VRAM tilemap of a 512x768 pixel second layer.
The second layer is scrolling 50 pixels/frame in the X-axis and 4 pixels/frame in the Y-axis.
The white box is the viewport, the red line is the tilemap seam.

Together, they completely eliminate the maximum scrolling speed limitation and allow the callback to reposition the second-layer wherever it likes for fast moving layer effects (like rain, wind or snow).

Animated Mesen Tilemap Viewer GIF
VRAM tilemap of a 256x256 pixel second layer.
The second layer is scrolling 64 pixels in both axes every half second.
The white box is the viewport. There is no tilemap seam.

These optimisations cannot be applied when the second-layer is PART_OF_ROOM. However, this is not an issue as new rows or columns will only be drawn during a scrolling room transition.
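The rules in this section boil down to two small predicates. The sketch below restates them in Python with hypothetical function names. It only encodes the sizes mentioned in this post and does not reproduce the engine's actual checks.

```python
def needs_row_drawing(height_px: int, part_of_room: bool) -> bool:
    """Rows must be redrawn unless the whole Y-axis fits in the VRAM tilemap."""
    return part_of_room or height_px != 256


def needs_column_drawing(width_px: int, part_of_room: bool) -> bool:
    """Columns must be redrawn unless the whole X-axis fits in the VRAM tilemap."""
    return part_of_room or width_px not in (256, 512)
```

A 768x256 layer therefore only draws columns, a 512x768 layer only draws rows, and a 256x256 layer draws neither unless it is PART_OF_ROOM.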

MetaTiles

Storing the raw tilemap data in Work-RAM is inefficient. If I allocate 12KiB of RAM to hold raw VRAM tilemap data, the maximum second-layer size would be 6144 8px tile cells. (For reference, that's 624x624 pixels square or 6 256x256px nametables.)

One common method of improving data density is MetaTiles. MetaTiles are tiles that are made up of smaller tiles. Now there are a ton of different designs for MetaTiles (including MetaTiles of MetaTiles) and I've gone for the simplest - each map tile is a single byte and indexes a 16 pixel (2x2) MetaTile in RAM.

For speed purposes, the MetaTile data is split into 4 fixed-size arrays (in RAM): top-left, top-right, bottom-left and bottom-right. This requires 2KiB of RAM, leaving 10KiB for the map data and increasing the largest map to 10240 16px tile cells (1616x1616 pixels square or 40 256x256px nametables).

Diagram showing the transformation of map data to VRAM tilemap
How the second-layer map and MetaTiles SoA are combined into VRAM tilemap data.
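As a rough illustration of that transformation, the sketch below expands one row of one-byte map cells into two rows of VRAM tilemap words using four parallel arrays. The class and function names are hypothetical, and the real drawing code is Wiz that writes into fixed-size buffers.

```python
from typing import List, Sequence, Tuple


class MetaTiles:
    """Structure-of-arrays MetaTile table: one array per 8px quadrant."""

    def __init__(self, top_left: Sequence[int], top_right: Sequence[int],
                 bottom_left: Sequence[int], bottom_right: Sequence[int]) -> None:
        self.top_left = top_left
        self.top_right = top_right
        self.bottom_left = bottom_left
        self.bottom_right = bottom_right


def expand_map_row(map_row: bytes, mt: MetaTiles) -> Tuple[List[int], List[int]]:
    """Expand a row of MetaTile indexes into two rows of tilemap words."""
    top: List[int] = []
    bottom: List[int] = []
    for index in map_row:
        # Each one-byte map cell becomes a 2x2 block of tilemap words.
        top += [mt.top_left[index], mt.top_right[index]]
        bottom += [mt.bottom_left[index], mt.bottom_right[index]]
    return top, bottom
```

Splitting the table by quadrant means every lookup uses the same index into a different array, so no multiplication or shifting of the MetaTile index is needed.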

The main downside of MetaTiles is the reduced tile count. Unlike regular tiles, tile-flipping or palette-swapping a MetaTile creates a new MetaTile. Still, the savings are worth it. A single MetaTile map cell is a single byte in size and unfolds into 8 bytes of VRAM tilemap data. A 768x512 pixel second-layer goes from 12302 bytes of Work-RAM to 3598 bytes of Work-RAM (not including the VRAM tile data).
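The size figures in this section can be reproduced with a few lines of arithmetic. The helper names are hypothetical. The calculation leaves out any fixed overhead, so it lands 14 bytes below each of the 768x512 figures quoted above; what those 14 bytes hold is not stated in the post.

```python
from math import isqrt

TILEMAP_ENTRY_BYTES = 2  # one VRAM tilemap word per 8px tile
N_METATILES = 256        # a one-byte map cell can index 256 MetaTiles
METATILE_TABLE_BYTES = N_METATILES * 4 * TILEMAP_ENTRY_BYTES  # 2 KiB


def raw_tilemap_bytes(width_px: int, height_px: int) -> int:
    """Work-RAM needed to store the layer as raw 8px tilemap words."""
    return (width_px // 8) * (height_px // 8) * TILEMAP_ENTRY_BYTES


def metatile_map_bytes(width_px: int, height_px: int) -> int:
    """Work-RAM needed for a one-byte-per-cell 16px map plus the MetaTile table."""
    return (width_px // 16) * (height_px // 16) + METATILE_TABLE_BYTES


def largest_square_px(cells: int, cell_px: int) -> int:
    """Side length in pixels of the largest square map with this many cells."""
    return isqrt(cells) * cell_px


raw_size = raw_tilemap_bytes(768, 512)              # 12288 bytes
metatile_size = metatile_map_bytes(768, 512)        # 3584 bytes
raw_square = largest_square_px(12 * 1024 // 2, 8)   # 624 pixels
metatile_square = largest_square_px(10 * 1024, 16)  # 1616 pixels
```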

Be aware, this is the size of the data in Work-RAM. I can still add compression to my game and potentially decrease the size of the second-layer data in ROM.

Interestingly, the MetaTile drawing code is slightly faster than storing the raw tilemap data in Work-RAM [3]. This is because I made the buffers and VRAM map MetaTile-aligned, halving the number of buffer and map wrapping tests in the drawing code.

Adding MetaTiles did introduce a few more bugs to the code and made my tilemap seam problems worse. It took a few days to bugfix and verify the scrolling code works, but again the savings are worth it (even if I cannot find an easy way to vertically center the viewport).

Callbacks

How does the game code interact with the second-layer subsystem? Through a callback. On every frame an sl_callback function pointer is called, which is responsible for scrolling the second-layer (if PART_OF_ROOM is false) and HDMA effects on the second layer.

To simplify the upcoming HDMA code, the sl_callback is responsible for calling the scroll_tilemap() function. This allows the callback to access the xScrollShadow and yScrollShadow variables after they have been modified by scroll_tilemap().

To implement sl_callbacks I reused the room-events subsystem. sl_callback parameters are declared in the mappings.json file. Currently these parameters are split across two data structures: 8 bytes of second_layer data and 2 bytes of room data.

mappings.json
 "sl_callbacks": [
   {
     "name": "fixed_velocity",
     "source": "fixed-velocity",
     "sl_parameters": [
       {
         "name": "xPos",
         "comment": "The starting x-position",
         "type": "u16"
       },
       {
         "name": "yPos",
         "comment": "The starting y-position",
         "type": "u16"
       },
       {
         "name": "xVelocity",
         "comment": "The x-velocity",
         "type": "sQ4_12"
       },
       {
         "name": "yVelocity",
         "comment": "The y-velocity",
         "type": "sQ4_12"
       }
     ],
     "room_parameters": [
       {
         "name": "stationary_sl",
         "comment": "If true, the second layer will not scroll for this room",
         "type": "bool",
         "default": "false"
       }
     ]
   }
  ]


There are two python scripts that generate two wiz source files. sl-callbacks.wiz contains named aliases to second_layer and room data, while function-tables.wiz contains two function tables, init() and process(), that will be called by the second-layer code.

gen/sl-callbacks.wiz
import "src/memmap";
import "engine/game/second-layer";
import "engine/game/room";

namespace sl_callbacks {

struct U8Position {
    xPos : u8,
    yPos : u8,
};

in lowram {

namespace fixed_velocity {
  // The starting x-position
  // (u16)
  const parameter__xPos @ &second_layer.sl_parameters[0] : u16;

  // The starting y-position
  // (u16)
  const parameter__yPos @ &second_layer.sl_parameters[2] : u16;

  // The x-velocity
  // (sQ4_12)
  const parameter__xVelocity @ &second_layer.sl_parameters[4] : i16;

  // The y-velocity
  // (sQ4_12)
  const parameter__yVelocity @ &second_layer.sl_parameters[6] : i16;


  // If true, the second layer will not scroll for this room
  // (bool)
  const parameter__stationary_sl @ &room.sl_parameters[0] : u8;
}

}
}

gen/function-tables.wiz
namespace sl_callbacks {

let N_SECOND_LAYER_FUNCTIONS = 2;

// Called when the second layer is loaded,
// before the tilemap is transferred to VRAM.
//
// This callback is allowed to setup HDMA effects.
// DB = 0x7e
#[mem8, idx8]
const init_function_table : [ func() ; 2 ] = [
  sl_callbacks.null_function,
  sl_callbacks.fixed_velocity.init,
];

// Called once per frame
// DB = 0x7e
#[mem8, idx8]
const process_function_table : [ func() ; 2 ] = [
  sl_callbacks.null_function,
  sl_callbacks.fixed_velocity.process,
];

}


Now that the resources compiler and the game code agree on the sl_callback parameters, the game code can be compiled. Next, the resources are compiled, with the resource compilers populating the sl_callback parameter bytes with the values from the second-layer json input and <properties> tag of the Tiled room tmx files.

other-resources.json
  "second_layers": {
    "scrolling_test": {
      "source": "second-layers/scrolling-test.png",
      "palette": "dungeon",
      "mt_tileset": "dungeon",
      "above_metatiles": true,
      "tile_priority": 1,
      "part_of_room": false,
      "callback": "fixed_velocity",
      "parameters": {
        "xPos": 20,
        "yPos": 40,
        "xVelocity": 1.125,
        "yVelocity": -1.25
      }
    }
  }

rooms/05-07-heal-before-boss.tmx
<?xml version="1.0" encoding="UTF-8"?>
<map [...]>
 <properties>
  <property name="stationary_sl" type="bool" value="true"/>
 </properties>
 [...]
</map>

Together this allows me to add parametrisable effects and scrolling behaviour to the second-layer without modifying the engine, data formats or resources compiler.
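As an illustration of how the fixed_velocity parameters above might end up as 8 bytes of second_layer data, here is a small Python sketch. It assumes sQ4_12 means a signed 16-bit fixed-point value with 12 fractional bits and that values are stored little-endian in the field order given by the generated offsets (0, 2, 4, 6). The function names are hypothetical and this is not the resources compiler's code.

```python
import struct


def to_sq4_12(value: float) -> int:
    """Convert a number to signed fixed-point with 12 fractional bits (assumed)."""
    fixed = round(value * 4096)
    if not -0x8000 <= fixed <= 0x7FFF:
        raise ValueError(f"{value} does not fit in a signed 16-bit sQ4_12")
    return fixed


def pack_fixed_velocity(params: dict) -> bytes:
    """Pack the four fixed_velocity parameters into 8 little-endian bytes."""
    return struct.pack(
        "<HHhh",
        params["xPos"],
        params["yPos"],
        to_sq4_12(params["xVelocity"]),
        to_sq4_12(params["yVelocity"]),
    )


packed = pack_fixed_velocity(
    {"xPos": 20, "yPos": 40, "xVelocity": 1.125, "yVelocity": -1.25}
)
```

Under those assumptions, an xVelocity of 1.125 is stored as 0x1200 and a yVelocity of -1.25 as 0xEC00.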

Final Thoughts

I'm happy with how the second-layer subsystem turned out. It's the first time I've coded a multi-directional infinitely scrolling map on the SNES and I've got a few interesting ideas for this new layer.

The code is a little slower than I had hoped, but not by much. According to Mesen's Profiler, second_layer.scroll_tilemap() uses a maximum of 30034 master cycles (22.0 scanlines, or 8.4% of a frame) in SlowROM.
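For readers unfamiliar with these units, the conversion from master cycles to scanlines and frame percentage can be sketched as follows. It assumes NTSC timing of 1364 master cycles per scanline and 262 scanlines per frame; the function name is hypothetical.

```python
MASTER_CYCLES_PER_SCANLINE = 1364  # NTSC SNES (assumed)
SCANLINES_PER_FRAME = 262          # NTSC SNES (assumed)


def frame_cost(master_cycles: int) -> tuple:
    """Return (scanlines, percent of a frame) for a master-cycle count."""
    scanlines = master_cycles / MASTER_CYCLES_PER_SCANLINE
    percent = 100 * scanlines / SCANLINES_PER_FRAME
    return scanlines, percent


scanlines, percent = frame_cost(30034)
```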

What's next

I'm currently working on adding 8-channel music and sound-effect dropout behaviour to my Terrific Audio Driver. Once that's done I'm going to start work on a second unnamed-snes-game tech-demo. I'm hoping to release the next demo by the end of the year.

On the engine side of things, there are a few more features I'm planning on adding:

  • Dungeons - A 2-dimensional array of rooms.
    • These will be responsible for loading and unloading mt_tilesets, second-layers and sprite tiles.
    • Dungeons will also be used for villages and the overworld.
    • Areas with multiple floors will require a new dungeon for each floor.
  • Warp tiles - I need a way for the player to enter houses and switch between dungeons.
  • A saving system.
  • NPCs with textboxes.
  • Improved collisions.

There are quite a few more things I want to add to the engine, like tile animations, palette cycling, HDMA effects and colour-math, but I will probably add them after the demo. I'm going to focus on gameplay, enemies and level-design over engine features.


  1. Original requirements (JPEG image) 

  2. The tilemap is 32 tiles tall. 1 tile is used by the seam and 29 are visible on screen (since the viewport can start in the middle of a tile), leaving 2 unused tiles outside the viewport. 

  3. I used Mesen's Profiler to measure scroll_tilemap() CPU time before and after implementing MetaTiles:
    Before: scroll_tilemap() uses a maximum 32120 m-cycles (23.5 scanlines, 8.98% of a frame)
    After: scroll_tilemap() uses a maximum 31642 m-cycles (23.2 scanlines, 8.85% of a frame).