S-SMP clock speed measurements and analysis

smpspeed

In 2023, lidnariq wrote an S-SMP clock speed measurement tool that uses the SNES's master clock (a quartz crystal, with a typical accuracy of 30ppm) to measure the frequency of the SNES's audio clock (a ceramic resonator, with a typical accuracy of 0.5%, or 5000ppm).

smpspeed screen capture
smpspeed test on my 1-chip Super Famicom console after 25 minutes.

With this test I could easily measure the audio clock and see it slowly speed up as the console's internals warmed up.

I quickly ran the tests on all of my consoles, taking screen captures every 5 minutes and letting the test run for 25 minutes per console.

  • 2/1/3 3-chip Super Famicom (26.1C ambient): 32147Hz cold, 32152Hz after 25 minutes
  • 1-chip Super Famicom (26.1C ambient): 32036Hz cold, 32043Hz after 25 minutes
  • 1/1/1 Super Famicom (26.2C ambient): 32080Hz cold, 32090Hz after 25 minutes
  • PAL 1-chip SNES (25.4C ambient): 32083Hz cold, 32091Hz after 25 minutes
  • 1/1/1 Super Famicom (with a broken PPU, 26.2C ambient): 32067Hz cold, no warm measurements

In 2025, dwangoAC of the TASBot community asked for participants to run lidnariq's smpspeed test and submit the results online. I reran the tests, this time taking screen captures every minute and running the test for 90 minutes (I had read a few reports that it can take an hour for the S-SMP to hit its maximum frequency).

Unfortunately, I wasn't able to OCR the screen captures. The various OCR programs in the Debian repositories were unable to reliably read the 0 (misread as 6 or 8), 4 (sometimes misread as 9) or 2 (misread as Z) tiles.

smpspeed-usb2snes.py

Since I couldn't automatically get the data from the screenshots, I had to try another way. Luckily, I had experience using a usb2snes websocket to read console memory, from creating my unnamed-snes-game resources-over-usb2snes subsystem.

I wrote a quick and simple script to read the values out of the VRAM tilemap [1] using usb2snes while the console is running. To ensure the tilemap is read correctly (as there's no way to prevent the usb2snes from reading VRAM at the same time the S-CPU writes to VRAM), the tilemap is repeatedly read until there are 3 identical reads in a row. The test data is then extracted from the tilemap and output as a CSV line.
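
The "3 identical reads in a row" check can be sketched as follows. This is a minimal illustration, not the code from smpspeed-usb2snes.py: `read_once` is a hypothetical callable standing in for whatever performs a single usb2snes tilemap read, and `max_attempts` is an assumed safety limit that the text above does not mention.

```python
def read_stable(read_once, required_matches=3, max_attempts=100):
    """Call read_once() until it returns the same value several times in a row.

    read_once is a hypothetical callable that performs one tilemap read and
    returns its contents (for example as bytes).
    """
    previous = None
    matches = 0
    for _ in range(max_attempts):
        data = read_once()
        if data == previous:
            matches += 1
        else:
            # The read changed (or this is the first read): restart the count.
            previous = data
            matches = 1
        if matches >= required_matches:
            return data
    raise RuntimeError("tilemap did not settle within max_attempts reads")


if __name__ == "__main__":
    # Fake reader that returns a torn read before settling on a stable value.
    fake_reads = iter([b"torn", b"good", b"good", b"good"])
    print(read_stable(lambda: next(fake_reads)))
```

Requiring consecutive identical reads trades a little extra USB traffic for protection against a read that lands in the middle of a tilemap update.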

By default, the script reads every 5 seconds and I let the script run for 2 hours or more on all of my working consoles.

"Time", "SNES PPU", "Meaning", "Slowest", "Fastest", "S-SMP clock", "relative", "Slowest", "Fastest", "DSP sample rate"
2025-03-03T12:30:26.355847, "Connected to SD2SNES ttyACM0"
[...]
2025-03-03T12:40:26.466550, 60, 384587, 384639, 384584, 1025418, +01384, 1025280, 1025426, 32044
2025-03-03T12:40:31.462837, 60, 384587, 384639, 384584, 1025418, +01384, 1025280, 1025426, 32044
2025-03-03T12:40:36.462517, 60, 384587, 384639, 384584, 1025418, +01384, 1025280, 1025426, 32044
2025-03-03T12:40:41.462553, 60, 384587, 384639, 384584, 1025418, +01384, 1025280, 1025426, 32044
2025-03-03T12:40:46.462542, 60, 384584, 384639, 384584, 1025426, +01392, 1025280, 1025426, 32044
2025-03-03T12:40:51.554549, 60, 384587, 384639, 384584, 1025418, +01384, 1025280, 1025426, 32044
2025-03-03T12:40:56.466553, 60, 384581, 384639, 384581, 1025434, +01400, 1025280, 1025434, 32044
2025-03-03T12:41:01.550551, 60, 384584, 384639, 384581, 1025426, +01392, 1025280, 1025434, 32044
 
smpspeed screen capture @ 12:40:26
smpspeed-usb2snes.py output and matching 1-CHIP Super Famicom screen capture after running for 10 minutes

This Python script is MIT Licensed and available on GitHub: https://github.com/TASBotL3C/smpspeed-usb2snes

Analysing the data

matplotlib line graph with 4 entries
smpspeed-usb2snes.py output for 2 hours on all 4 of my working consoles
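Console              | Ambient temperature | Minimum DSP sample rate | Maximum DSP sample rate | Sample rate increase | Minimum SR after max SR | Sample rate drift
---------------------|---------------------|-------------------------|-------------------------|----------------------|-------------------------|------------------
2/1/3 Super Famicom  | 29.2C - 31.1C       | 32147.06Hz              | 32155.78Hz at 1:14:59   | 8.72Hz               | 32155.03Hz              | 0.75Hz
1/1/1 Super Famicom  | 30.8C - 31.9C       | 32084.91Hz              | 32098.16Hz at 1:27:10   | 13.25Hz              | 32096.94Hz              | 1.22Hz
PAL 1-CHIP           | 29.8C - 30.0C       | 32086.88Hz              | 32095.38Hz at 0:49:04   | 8.50Hz               | 32094.62Hz              | 0.75Hz
1-CHIP Super Famicom | 31.2C - 31.9C       | 32040.00Hz              | 32047.69Hz at 1:10:40   | 7.69Hz               | 32046.97Hz              | 0.72Hz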

All four of my consoles have different DSP sample rates with similar-ish values as the 2023 tests. The differences in the 2023 and 2025 results could be explained by the different ambient temperatures. I will need to rerun the smpspeed tests when the cold seasons start to do a proper comparison.

The DSP sample rates are not stable and do drift a little as the ambient temperature and internal temperature changes.

More importantly, the DSP sample rate increase from cold to warm averages 9.54Hz across my 4 consoles and is significantly smaller than the variation between consoles. This is also reflected in the TASBot data, where the cold to warm increase averages 8Hz while the spread between consoles is 217Hz (31965Hz to 32182Hz).
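
The figures quoted in this paragraph can be checked with a few lines of Python. The per-console increases are copied from the table above and the TASBot minimum and maximum from this paragraph; the variable names are only for illustration.

```python
# Cold-to-warm sample rate increases (Hz) from the 2025 table above.
increases_hz = {
    "2/1/3 Super Famicom": 8.72,
    "1/1/1 Super Famicom": 13.25,
    "PAL 1-CHIP": 8.50,
    "1-CHIP Super Famicom": 7.69,
}

mean_increase_hz = sum(increases_hz.values()) / len(increases_hz)

# Slowest and fastest DSP sample rates quoted for the TASBot data.
tasbot_min_hz = 31965
tasbot_max_hz = 32182
tasbot_spread_hz = tasbot_max_hz - tasbot_min_hz

if __name__ == "__main__":
    print(f"mean cold-to-warm increase: {mean_increase_hz:.2f}Hz")
    print(f"spread between consoles:    {tasbot_spread_hz}Hz")
```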


I noticed the temperature dropping as the sun was slowly setting and left my 1/1/1 Super Famicom console running for 4 hours. The temperature did not drop much, but the S-SMP clock and DSP sample rate slowly decreased as my room got colder.

matplotlib line graph
smpspeed-usb2snes.py output of my 1/1/1 Super Famicom from 14:45 - 18:45; with manual temperature readings every 15 minutes


Finally, I manually transcribed my 2023 screenshots and compared them to the 2025 data. The 2023 sample rates are slightly lower than the 2025 measurements. This could be explained by the colder temperatures.

matplotlib line graph
Manually transcribed 2023 smpspeed data plotted against 2025 data for 4 of my consoles

Why is this important?

I have just created a homebrew SNES audio driver, and there are 3 considerations when designing an APUIO communications protocol:

  1. APU IO communication occurs across 8 unbuffered ports (4 S-CPU -> S-SMP and 4 S-SMP -> S-CPU)
  2. If a port is read at the same time it is written, the data read is corrupted
  3. There are no interrupts in the S-SMP

To synchronise a data transfer or command from the 65816 to the S-SMP, some kind of semaphore signalling [2] is typically used to:

  • Ensure that the S-SMP has finished processing the previous data/command before the S-CPU can safely write to the APUIO ports to send a new command
  • Signal to the S-SMP that there's a new data/command and the APUIO ports can be safely read

How this is done varies across games.
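
As one illustration of the idea, here is a minimal Python model of a toggle-bit handshake over a single port pair. It is a hypothetical sketch, not the protocol used by TAD or any particular game: the class names, the 7-bit command ID and the choice of bit 7 as the semaphore are all assumptions, and the model ignores the read-while-write corruption hazard listed above.

```python
class ApuPorts:
    """One APUIO port pair modelled as two unbuffered one-byte mailboxes."""

    def __init__(self):
        self.cpu_to_smp = 0
        self.smp_to_cpu = 0


class Cpu:
    """S-CPU side: only writes when the previous command was acknowledged."""

    def __init__(self, ports):
        self.ports = ports
        self.toggle = 0

    def try_send(self, command_id):
        # The S-SMP acknowledges by echoing the last value it processed.
        if self.ports.smp_to_cpu != self.ports.cpu_to_smp:
            return False  # previous command still pending, try again later
        # Flip bit 7 so that repeating the same command ID still changes the port.
        self.toggle ^= 0x80
        self.ports.cpu_to_smp = self.toggle | (command_id & 0x7F)
        return True


class Smp:
    """S-SMP side: no interrupts, so the port has to be polled."""

    def __init__(self, ports):
        self.ports = ports
        self.last_seen = 0
        self.processed = []

    def poll(self):
        value = self.ports.cpu_to_smp
        if value != self.last_seen:
            self.last_seen = value
            self.processed.append(value & 0x7F)
            self.ports.smp_to_cpu = value  # acknowledge the command


if __name__ == "__main__":
    ports = ApuPorts()
    cpu, smp = Cpu(ports), Smp(ports)
    print(cpu.try_send(5))  # accepted
    print(cpu.try_send(6))  # rejected: command 5 has not been acknowledged yet
    smp.poll()
    print(cpu.try_send(6))  # accepted after the acknowledgement
```

How long `try_send` keeps returning `False` depends on when the S-SMP next polls the port, which is where the S-SMP clock speed enters the picture.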

The Terrific Audio Driver (TAD) API is asynchronous - the main-loop writes a pending command to a queue and on the next Tad_Process call (which should be called once per frame) checks if it's OK to send a new command. In TAD there are 2 main queues, one for commands (pause, play, set-song-timer, etc) and another for sound effects. The TAD S-SMP code is designed to check and process IO commands a minimum of 125 times a second. It is possible for the audio-driver to lag and delay an IO command for a single frame. If the game-code waits until a command can be queued before advancing to the next state, that action could be delayed, depending on audio-lag and DSP sample rate.

The second most common command (after play-sound-effects) is the switch-to-loader command, required to load a new song into the Audio-RAM. The game immediately sends a switch-to-loader command and on the next Tad_Process it checks if the audio-driver has sent a "loader-active" signal before transmitting song data. Since there are no interrupts, the audio-driver can only check for a switch-to-loader command after it has finished processing music and sound effects (there are a few more checks in TAD but let's ignore them here). On one console this might take 0.126 frames to process the music-tick, on a different console it might take 0.127 frames. It might not seem like much but it is enough to potentially delay the start of song loading by 1 frame (depending on timing, lag and game-code).

Then there's audio-data loading. To ensure there is no data loss, games typically transmit 1 - 3 bytes at a time (depending on the loader) and wait for an acknowledgement from the S-SMP before sending more data. The TAD loader transmits 2 bytes at a time with a transfer rate of 847 - 855 bytes per frame (faster than IPL's 678 - 685), depending on the DSP sample rate. By default the TAD API transfers 256 bytes per frame, freeing CPU time for a fade-in or loading animation. However, snesdevs can choose to ignore this and upload the whole song in one go. If they choose to bypass background loading, the audio-data loading time is unknown.

Finally, I need to talk about auto-joypad read. This feature of the SNES automatically reads the controller at (roughly) the start of every Vertical-Blank. It is done in the background and has no understanding of lag-frames. If auto-joypad read is on while loading a song, the console might request 5 controller reads on one console or 4 controller reads on a different console with a faster DSP clock.

The unknown audio loading time is worse in games that upload new audio samples when loading a new song. In TAD (which does not sample swap) a song might be 3-7KiB. A game with sample swapping could easily transfer 20KiB or more of audio data when loading a song.

TAD does not allow game code to query the song or sound effect state. Some games do. A game which waits until a sound effect has finished before doing the next action can also cause a TAS desync.

Simulating IPL loader speeds

I wrote a simple test ROM that uses the S-SMP's IPL (Initial Program Load) ROM to transfer 32KiB of data to the Audio-RAM. I then used Mesen to simulate a wide variety of DSP sample rates and the Mesen Profiler to measure the loader's speed (excluding initialisation and waiting for IPL).

The IPL loading speed is bottlenecked by the speed of the S-SMP audio processor, not the S-CPU (there is very little difference between FastROM vs SlowROM speeds).

Transferring 32768 bytes via IPL on a simulated NTSC console

DSP sample rate | SlowROM m-cycles | SlowROM time | SlowROM bytes/frame | FastROM m-cycles | FastROM time | FastROM bytes/frame
----------------|------------------|--------------|---------------------|------------------|--------------|--------------------
31900Hz         | 17252854         | 803.31ms     | 678.74              | 17251878         | 803.26ms     | 678.78
31920Hz         | 17242394         | 802.82ms     | 679.15              | 17241100         | 802.76ms     | 679.20
31940Hz         | 17230890         | 802.28ms     | 679.61              | 17230280         | 802.26ms     | 679.63
31960Hz         | 17219862         | 801.77ms     | 680.04              | 17219542         | 801.76ms     | 680.05
31980Hz         | 17209136         | 801.27ms     | 680.47              | 17208722         | 801.25ms     | 680.48
32000Hz         | 17198574         | 800.78ms     | 680.88              | 17197986         | 800.75ms     | 680.91
32020Hz         | 17187642         | 800.27ms     | 681.32              | 17187250         | 800.25ms     | 681.33
32040Hz         | 17177330         | 799.79ms     | 681.73              | 17176514         | 799.75ms     | 681.76
32060Hz         | 17167442         | 799.33ms     | 682.12              | 17165820         | 799.26ms     | 682.18
32080Hz         | 17156204         | 798.81ms     | 682.57              | 17155126         | 798.76ms     | 682.61
32100Hz         | 17145252         | 798.30ms     | 683.00              | 17144420         | 798.26ms     | 683.03
32120Hz         | 17134988         | 797.82ms     | 683.41              | 17133766         | 797.76ms     | 683.46
32140Hz         | 17123910         | 797.30ms     | 683.85              | 17123114         | 797.27ms     | 683.88
32160Hz         | 17113906         | 796.84ms     | 684.25              | 17112420         | 796.77ms     | 684.31
32180Hz         | 17102618         | 796.31ms     | 684.70              | 17101768         | 796.27ms     | 684.74
32200Hz         | 17091528         | 795.80ms     | 685.15              | 17091158         | 795.78ms     | 685.16
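
For readers who want to relate the three columns, the sketch below converts an m-cycle count into a transfer time and a bytes/frame figure. The two constants are assumptions that are not stated in this article: an NTSC master clock of 21477272Hz and an average of 357366 master cycles per NTSC frame. They do reproduce the first row of the table.

```python
# Assumed NTSC timing constants (not taken from the article).
MASTER_CLOCK_HZ = 21_477_272
M_CYCLES_PER_FRAME = 357_366


def transfer_time_ms(m_cycles):
    """Convert a master-cycle count into milliseconds."""
    return m_cycles / MASTER_CLOCK_HZ * 1000


def bytes_per_frame(byte_count, m_cycles):
    """Average number of bytes transferred per NTSC frame."""
    return byte_count * M_CYCLES_PER_FRAME / m_cycles


if __name__ == "__main__":
    # First SlowROM row of the IPL table: 31900Hz, 17252854 m-cycles.
    m_cycles = 17_252_854
    print(f"{transfer_time_ms(m_cycles):.2f}ms")
    print(f"{bytes_per_frame(32768, m_cycles):.2f} bytes/frame")
```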

To verify these numbers I ran the test on two of my consoles. The output of the ipl-speed-test is:

  1. Test number
  2. Number of Vertical Blanks encountered
  3. Amount of free CPU time until the next Vertical Blank (via spinloop counting); on an NTSC system, the maximum spinloop count is 0x1B17
Transferring 32768 bytes using the IPL.  0x30 VBlanks, 0x1927 spinloop count
Transferring 32768 bytes using the IPL.  0x2F VBlanks, 0x0282 spinloop count
ipl-speed-test screen captures on my 1-CHIP Super Famicom console (~32038Hz DSP sample rate) and my 3-CHIP Super Famicom console (~32147Hz DSP sample rate)

A quick bit of math [3] gives the following IPL loading speeds (these numbers include the loader setup):

  • 1-CHIP SFC (~32038Hz DSP sample rate): 48.1 frames, 799.9ms
  • 3-CHIP SFC (~32147Hz DSP sample rate): 47.9 frames, 797.1ms

Which is consistent with the simulated loader times above.
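
The conversion from ipl-speed-test output to frames and milliseconds can be sketched in Python using the formula from footnote 3. The NTSC frame length used here (357366 master cycles at 21477272Hz, roughly 16.64ms) is an assumption that is not stated in the article.

```python
# Assumed NTSC frame length in milliseconds (not taken from the article).
NTSC_FRAME_MS = 357_366 / 21_477_272 * 1000

# Spinloop count for a completely idle NTSC frame, from the list above.
MAX_SPINLOOP_COUNT = 0x1B17


def ipl_test_frames(vblanks, spinloop_count):
    """Footnote 3: vblanks + (1 - spinloop_count / 0x1B17) frames."""
    return vblanks + (1 - spinloop_count / MAX_SPINLOOP_COUNT)


def ipl_test_ms(vblanks, spinloop_count):
    """Same measurement expressed in milliseconds."""
    return ipl_test_frames(vblanks, spinloop_count) * NTSC_FRAME_MS


if __name__ == "__main__":
    for name, vblanks, spins in (
        ("1-CHIP SFC", 0x30, 0x1927),
        ("3-CHIP SFC", 0x2F, 0x0282),
    ):
        frames = ipl_test_frames(vblanks, spins)
        ms = ipl_test_ms(vblanks, spins)
        print(f"{name}: {frames:.1f} frames, {ms:.1f}ms")
```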

Simulating the TAD loader speeds

I also simulated the Terrific Audio Driver loader speeds for various DSP sample rates. It is 24.8% faster than the IPL.

The IPL transfers 1 byte at a time as it is designed to fit in a tiny 64 byte ROM. The TAD loader is 116 bytes, transfers 2 bytes at a time, and also manages the common-audio-data and song-data memory addresses.

This table was created by bypassing the TAD API bytes/transfer frame limits, transferring 32KiB in one go and measuring the TadPrivate_Loader_TransferData execution time using the Mesen Profiler.

Transferring 32768 bytes via the Terrific Audio Driver loader on a simulated NTSC console

DSP sample rate | SlowROM m-cycles | SlowROM time | SlowROM bytes/frame | FastROM m-cycles | FastROM time | FastROM bytes/frame
----------------|------------------|--------------|---------------------|------------------|--------------|--------------------
31900Hz         | 13821380         | 643.54ms     | 847.26              | 13820958         | 643.52ms     | 847.28
31920Hz         | 13812564         | 643.12ms     | 847.80              | 13812478         | 643.12ms     | 847.80
31940Hz         | 13804008         | 642.73ms     | 848.32              | 13803810         | 642.72ms     | 848.33
31960Hz         | 13795408         | 642.33ms     | 848.85              | 13794998         | 642.31ms     | 848.88
31980Hz         | 13786768         | 641.92ms     | 849.38              | 13786590         | 641.92ms     | 849.39
32000Hz         | 13778168         | 641.52ms     | 849.91              | 13777814         | 641.51ms     | 849.93
32020Hz         | 13769524         | 641.12ms     | 850.45              | 13769330         | 641.11ms     | 850.46
32040Hz         | 13760840         | 640.72ms     | 850.98              | 13760738         | 640.71ms     | 850.99
32060Hz         | 13752412         | 640.32ms     | 851.50              | 13751998         | 640.30ms     | 851.53
32080Hz         | 13743816         | 639.92ms     | 852.04              | 13743550         | 639.91ms     | 852.05
32100Hz         | 13735216         | 639.52ms     | 852.57              | 13734994         | 639.51ms     | 852.58
32120Hz         | 13726660         | 639.12ms     | 853.10              | 13726398         | 639.11ms     | 853.12
32140Hz         | 13718152         | 638.73ms     | 853.63              | 13717914         | 638.72ms     | 853.65
32160Hz         | 13709552         | 638.33ms     | 854.17              | 13709394         | 638.32ms     | 854.18
32180Hz         | 13701000         | 637.93ms     | 854.70              | 13700838         | 637.92ms     | 854.71
32200Hz         | 13692616         | 637.54ms     | 855.22              | 13692390         | 637.53ms     | 855.24

What does this mean for TAS hardware verification?

It truly depends on the game. There are potentially 10000-60000 CPU/SMP synchronisations involved in loading the audio driver, music data and BRR samples into Audio-RAM when the game starts up. Each one adds a tiny amount of variance to the game's load time. If automatic-joypad-read is on and the number of loading frames in the TAS-emulator does not match the number of loading frames on hardware, the TAS playback desyncs.

In my opinion, for a game to not TAS-desync (after the initial audio-data load) the game would need to load everything to Audio-RAM at the start, send APU commands at most once per frame, not delay actions waiting for APU acknowledgement, and process the song/sound-effects without audio-lag. (And even then I'm unsure if a 30 minute TAS would desync or not. I'm a snesdever, not a TASer or hardware-guy.)

However, most SNES games do not do this. Not even imaginary Terrific Audio Driver games. Games can load song data whenever the song changes to free up Audio-RAM for more samples. They can also swap out samples when loading songs to improve Audio-RAM utilisation. Each load adds 1 large and 2000-40000 tiny CPU/SMP synchronisations, depending on data size and loader protocol. Each one increases the risk of crossing a frame boundary and causing a TAS desync between hardware and emulator.

What does this mean for speedrunners?

Again, it depends on the game. It depends on:

  • The audio driver's APUIO protocol
  • The speed of the loader
  • How often the audio driver checks for a new command
  • What the game does when a command has not been acknowledged (Is it dropped? Does the game try again on the next frame? Does the game wait until acknowledgement?)
  • The tempo of the current song

Song loading delay

It's not all doom and gloom. The differences in loading times are small.

Let's take another look at the IPL loading speed, this time measuring the difference in loading speeds.

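DSP sample rate | FastROM time | Delta  | DSP sample rate | FastROM time | Delta
----------------|--------------|--------|-----------------|--------------|-------
31900Hz         | 803.26ms     |        | 32060Hz         | 799.26ms     | 0.49ms
31920Hz         | 802.76ms     | 0.50ms | 32080Hz         | 798.76ms     | 0.50ms
31940Hz         | 802.26ms     | 0.50ms | 32100Hz         | 798.26ms     | 0.50ms
31960Hz         | 801.76ms     | 0.50ms | 32120Hz         | 797.76ms     | 0.50ms
31980Hz         | 801.25ms     | 0.51ms | 32140Hz         | 797.27ms     | 0.49ms
32000Hz         | 800.75ms     | 0.50ms | 32160Hz         | 796.77ms     | 0.50ms
32020Hz         | 800.25ms     | 0.50ms | 32180Hz         | 796.27ms     | 0.50ms
32040Hz         | 799.75ms     | 0.50ms | 32200Hz         | 795.78ms     | 0.49ms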

Assuming a warm console has a DSP that runs 20Hz faster (which seems unlikely given the average cold/warm increase in the data collected is 8Hz), it will be half a millisecond faster loading 32KiB of audio data. Even then, the difference between a theoretical 31900Hz and 32200Hz DSP sample rate is 7.48ms, less than half a frame. I suspect that when the audio data loader starts (within the frame), the size of the data and what the game does after the loader have a bigger impact on lag-frames than IPL loading-times.

Switch-to-loader delay

Do not assume a faster S-SMP clock is better. The time it takes for the audio driver to process a song-tick can be greater and more unpredictable than the difference between cold and warm loading times.

Here's a hypothetical switch-to-loader command on a song running at ~100 ticks/second. In this example, the S-CPU sends the switch-to-loader command after the song has been playing for 5 minutes and the audio-driver checks the APUIO ports once per tick.

switch to loader timing table and diagram
Hypothetical switch-to-loader timings for DSP sample rates from 32000Hz to 32100Hz

The difference in tick lengths for the various S-SMP clocks is tiny, measured in microseconds, but it quickly adds up when multiplied by the tens of thousands (or hundreds of thousands) of song ticks.

After a few minutes the unavoidable S-SMP clock drift (as the ceramic resonator's frequency fluctuates with temperature) will cause the song's position to be effectively unknowable. This unpredictability results in a random delay based on the song's tick timer: less than 10ms for a song with a 100 ticks/second timer and less than 25ms for a song with a slow 40 ticks/second timer.

Here's the same timing diagram, except it covers a tiny 1.75Hz sample-rate range. The switch-to-loader delay is all over the place, 0.5ms - 9.9ms.

switch to loader timing table and diagram
Hypothetical switch-to-loader timings for a smaller DSP sample rate range
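
The effect can be reproduced with a small model. This is a hypothetical sketch, not the code used to draw the diagrams: it assumes the song ticks exactly 100 times per second at a 32000Hz DSP sample rate, that the tick rate scales linearly with the sample rate, that the first tick happens at time 0, and that the driver only notices the command on the next tick boundary. Exact fractions are used so that rounding errors do not push a command across a tick boundary.

```python
import math
from fractions import Fraction


def switch_to_loader_delay_ms(sample_rate_hz, command_time_s=300,
                              nominal_rate_hz=32000, nominal_ticks_per_s=100):
    """Delay between sending the command and the next song tick, in ms."""
    # Assumption: the tick rate scales linearly with the DSP sample rate.
    ticks_per_s = (Fraction(nominal_ticks_per_s) * Fraction(sample_rate_hz)
                   / Fraction(nominal_rate_hz))
    ticks_elapsed = Fraction(command_time_s) * ticks_per_s
    # The driver only looks at the APUIO port on the next tick boundary.
    ticks_to_wait = math.ceil(ticks_elapsed) - ticks_elapsed
    return float(ticks_to_wait / ticks_per_s * 1000)


if __name__ == "__main__":
    # Eight sample rates spanning 1.75Hz in 0.25Hz steps, command sent
    # 5 minutes (300 seconds) into the song.
    for quarter_hz in range(8):
        rate = Fraction(32040) + Fraction(quarter_hz, 4)
        delay = switch_to_loader_delay_ms(rate)
        print(f"{float(rate):9.2f}Hz  {delay:5.2f}ms")
```

In this model a change of a fraction of a hertz is enough to move the delay by several milliseconds, because five minutes of accumulated ticks amplify the tiny difference in tick length.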


A different audio-driver (like TAD) might repeatedly check the APUIO ports for new commands whenever the driver is not processing audio. Assuming each tick uses the same amount of processing time [4], we get the following timing diagram:

switch to loader timing table and diagram
Hypothetical switch-to-loader timings for an audio driver that constantly checks the APUIO ports when it is not processing audio.

Most of the time the audio-driver switches to the audio-data loader near instantly. Sometimes the switch-to-loader command is sent when the audio-driver is processing the song which causes an unpredictable delay, but it is a lot smaller than audio-drivers that process commands once per tick.

Remember this is only an analogy. The actual switch-to-loader delay depends on a lot of complex factors including the audio-driver, the song timer, song position, song complexity, song effects and sound effects.


How this affects the game also depends on how the game communicates with the audio-driver. A game that loads the level data in the following order:

  • Send a switch-to-loader APUIO command
  • Decompress level data to Work-RAM
  • Decompress graphics data to Work-RAM
  • Transfer graphics data to VRAM
  • Wait for the S-SMP to acknowledge the switch-to-loader command
  • Transfer audio data to the loader

Would be practically immune to the variable switch-to-loader delay if it took more than 2 frames to decompress the level data.

Conclusions

A SNES console's audio module runs on a separate clock to the console's S-CPU and S-PPU chips. This was probably done to simplify the audio module's development and ensure NTSC and PAL consoles (which have different master-clock frequencies) output audio at the same pitch and ~32000Hz sample rate.

All of my consoles have wildly different S-SMP clocks and process sound at different sample rates. While my 3-chip SFC console is running faster than the rest, they are all within the ±0.5% (5000ppm) tolerance of a typical ceramic resonator.

The ceramic resonator's frequency varies across consoles and fluctuates with temperature, causing unpredictable synchronisation delays when a console's S-CPU communicates with the external audio module's S-SMP processor, leading to desyncs when replaying a TAS on original hardware.

I'm unable to make a definitive statement on what this means for speedrunners. The S-SMP clock speed impact is minimal but it depends on the game's audio-driver and how the game communicates with the audio driver.

Special Thanks

I would like to thank:

  • lidnariq for creating the S-SMP clock speed measurement tool
  • dwangoAC and the TASBot community for spreading the word about smpspeed and collecting the interesting data
  • Sour for creating Mesen
  • RedGuy for creating usb2snes
  • Sylvain "Skarsnik" Colinet for creating QUsb2Snes

  1. The FXPAK's save state feature monitors the data bus for Work-RAM and PPU writes and copies them to the FXPAK's cartridge memory. The usb2snes websocket allows me to easily write code that reads the FXPAK's cartridge memory. 

  2. Terrific Audio Driver combines the command ID and the semaphore into a single port. TAD also dedicates a last port exclusively to the switch-to-loader command. 

  3. ipl-speed-test output to frame time on an NTSC console is vblanks + (1 - spinloop_count/0x1b17) frames. 

  4. The amount of processing time each audio-tick takes depends on the number of effects and song-commands (play note, set volume, change instrument, etc) in the song-tick. A real song would have lots of small process ticks interspersed with longer ticks whenever the song plays a note.

    The process_music_channels() subroutine in TAD has the following statistics when playing unnamed-snes-game's village theme: average 1305 cycles/tick, minimum 1189 cycles, maximum 4147 cycles.

    To keep things simple, this diagram is ignoring the sound-effect processing. In my audio driver the sound effects are processed on a separate 125 ticks/second timer.