I’m excited to announce the project I’ve been working on for the last year and a half: Game Bub, an open-source FPGA-based retro emulation handheld, with support for Game Boy, Game Boy Color, and Game Boy Advance games.

Game Bub can play physical cartridges, as well as emulated cartridges using ROM files loaded from a microSD card. Game Bub also supports the Game Link Cable in both GB and GBA modes for multiplayer games. I designed the hardware with a number of bonus features, such as video output (HDMI) via a custom dock, a rumble motor, and a real-time clock (for certain games). Additionally, the hardware is designed with extensibility in mind, allowing future software improvements to expand its capabilities.

Game Bub has a custom-designed 6-layer PCB featuring a Xilinx XC7A100T FPGA with integrated memory, display, speakers, rechargeable battery, and GB/GBA cartridge slot, all packaged up in a custom 3D-printed enclosure.

This writeup is a detailed description of how I developed and built Game Bub. I also wrote up a document explaining the architecture of Game Bub, which is a much shorter read if you’re only interested in the technical details.

Check out the instructions, code, and design files on GitHub. Note that building a Game Bub unit is fairly complex. If you might be interested in buying a complete Game Bub kit, please fill out this form to help me gauge interest.

Introduction

I first started thinking about this project while I was finishing up work on my previous Game Boy FPGA emulator in mid-2023.

I had a lot of fun implementing a Game Boy at the hardware level, and I started thinking about how far I could take the project. I was using a Pynq-Z2 development board, which was definitely the right way to get started, but it came with a lot of limitations.

I had to use an external monitor for audio/video, and an external gamepad for input, but a real Game Boy, of course, is a portable handheld. I also wanted to add Game Boy Advance support, but the memory architecture of the Pynq-Z2 had access latency that was just barely acceptable for the Game Boy, and would have been completely unacceptable for the Game Boy Advance. I also wanted to make something less “hacky”: a real device that I could play and give to people, not just a bare PCB.

Furthermore, while there are open-source FPGA retrogaming projects (e.g. MiSTer), there doesn’t appear to be anything open-source that supports physical Game Boy and Game Boy Advance cartridges, let alone an open-source handheld device.

Thus, I somewhat naively set out to design what would become by far my most complex electrical engineering and hardware design project to date.

Goals

I set out some goals for the project:

  • Build a standalone, rechargeable battery-powered FPGA handheld
  • Minimize cost and complexity by using off-the-shelf components wherever possible
  • Capable of playing Game Boy, Game Boy Color, and Game Boy Advance games
  • Capable of using physical cartridges, or emulating cartridges (reading ROM files off of a microSD card)
  • Easy to use: graphical menu and in-game overlay
  • Integrated display and speakers, with headphone support
  • Integrated peripherals (rumble, real-time clock, accelerometer) for emulated cartridges
  • HDMI video output support for playing on a big screen
  • Decent looking design with good ergonomics
  • Expansion opportunities in the future: support for more systems, Wi-Fi, etc.

And finally, since I was building this project for fun and learning, I wanted to be able to fully understand every single component of the system. I wanted to use my own emulator cores (i.e. not just port them from MiSTer), do my own board design, and write my own drivers to interface with peripherals.

A brief rant about FPGA retrogaming

There’s a lot of misleading marketing and hype out there around FPGA retrogaming. Some claim that FPGA retrogaming devices are not emulators (because they supposedly “act like [the system] at the gate level”), that they achieve “perfect accuracy”, or that they’re superior to software emulators.

In my opinion, this is blatantly wrong and actively harmful. FPGA retrogaming devices are emulators: they pretend to be something they’re not. And they’re only as accurate as they’re programmed to be, since they’re recreations. An FPGA can make certain aspects of accuracy easier to achieve, but it doesn’t guarantee it.

Software emulators can be extremely accurate. Furthermore, perfect accuracy (if it’s even possible) is by no means a requirement to play an entire system’s library of games. Some people claim that FPGA emulators are the only way to “preserve” a system, but I’d argue that software emulators are a significantly more accessible (no special hardware needed!) way to further this goal.

I believe that FPGA emulators have only one real advantage over software emulators: they can more easily interface with original hardware, such as physical cartridges or other consoles via link cables.

I did this project not because I think that FPGA emulators are inherently better than software emulators, but because I think they’re interesting and fun to build.

High-level design

I began work on the project by doing some initial research and sketching out a high level design.

My previous FPGA emulator project used a Xilinx Zynq chip, which integrates FPGA fabric (“PL”) with a dual-core ARM processor running Linux (“PS”). I implemented the entire emulator on the FPGA, and used the Linux system to configure the FPGA, render the UI, and load ROM files from the filesystem.

I decided to keep this same division of responsibilities: using the FPGA to do the core emulation, with a separate processor to do support tasks. However, to make the overall design easier to reason about, I decided to use an FPGA-only chip (without any hard processor cores), and an external microcontroller (MCU) to do the tasks that the ARM cores did before.

The FPGA would consume input, directly interface to the game cartridges (through level shifters to support both the 3.3 volt GBA and 5 volt Game Boy), and output audio and video to the speakers and display. The MCU would handle the UI, read ROM files from the microSD card, initialize peripherals (display, DAC, IMU), handle power sequencing, and load the FPGA configuration.

I wanted to have Wi-Fi and Bluetooth support: Wi-Fi for software updates, and the possibility of emulating the Game Boy Advance Wireless Adapter, and Bluetooth to support wireless game controllers (when connected to an external display). To reduce complexity (and avoid the need for careful RF design), I looked only for complete Wi-Fi/Bluetooth modules with integrated antennas.

An early block diagram I sketched out

I also drew out rough sketches of what the final device might look like: placement of buttons, screen, speakers, ports, cartridge slot, and battery. I settled on a vertical Game Boy Color-esque design (as opposed to a horizontal Game Boy Advance-style design), because I felt that this would maximize the space in the back of the device for full-size Game Boy Color cartridges and a battery.

Component selection and compromises

After sketching out the goals and high level design, I started component selection: picking out each non-trivial component of the system, evaluating features and requirements (e.g. how they communicate, power consumption and voltages needed).

Since I intended to have this manufactured and assembled at JLCPCB, I strongly preferred parts that were available in their part library. One technique I even used for narrowing down part choices was finding the relevant category in their part search, and sorting by their stock count.

Microcontroller

I initially planned to use an RP2040 microcontroller, with a separate ESP32-WROOM module to support Wi-Fi and Bluetooth.

The ESP32 supports both Bluetooth Classic and LE, which is essential for supporting a wide range of controllers, and the RP2040 has USB host support, to support wired controllers.

During the schematic design process, I ended up simplifying the RP2040 + ESP32 combination to just a single ESP32-S3 module for a few reasons:

  • I started running out of GPIOs on the RP2040, and I was dedicating 4 of them (2 for UART, 1 for reset, 1 for booting in firmware download mode) to communication with the ESP32. Plus, the ESP32-S3 has more GPIOs overall.
  • I wanted to write the MCU firmware in Rust, and the ESP32-S3 had support for the Rust standard library (via ESP-IDF and esp-idf-hal). This seemed like it would be easier to get the software up and running.
  • Fewer components means easier routing and assembly
  • The ESP32-S3 has an SDIO module (for interfacing with the microSD card), and FAT filesystem support (via ESP-IDF). It would be possible to do this with the RP2040 PIO, but having a proper peripheral and driver for this makes it a lot easier.
  • The ESP32-S3 is more powerful than the RP2040, and would probably be able to render a smoother UI.

However, the ESP32-S3 has one main disadvantage compared to the original ESP32: it doesn’t have Bluetooth Classic support, only LE. This would greatly limit the range of supported wireless controllers, but I believed the compromise was worth it. I also decided to scrap USB host support, because supporting USB-C dual role (switchable device or host) would have added a lot of additional complexity.

If the RP2350 microcontroller (the successor to the RP2040) had been available when I started this project, I may very well have chosen it, since it has even more power, PIO blocks, memory, and GPIO pins. I might have paired it with an RM2 radio module for Wi-Fi and Bluetooth.

Display

I wanted a display that would support integer scaling for the Game Boy Advance, which has a 240x160 pixel screen. I was also looking for a screen roughly on the order of 3.0-3.5 inches wide (diagonal), to be comfortable to hold in the hand.

I found the ER-TFT035IPS-6 LCD module from EastRising, with a 3.5 inch display, and a 320x480 pixel resolution. This allows for a 2x integer scale for the Game Boy Advance (and a 2x scale plus centering for the 160x144 Game Boy display). This checked off almost all of the boxes: integer scaling, a good size, available at a reasonable price, pretty good documentation (for the ILI9488 LCD controller).
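To make the scaling arithmetic concrete, here is a small Python sketch (a hypothetical helper, not code from Game Bub) that picks the largest integer scale fitting the panel once it’s rotated to 480x320, along with the offsets needed to center the image:

```python
# Hypothetical helper: largest integer scale that fits a game's native
# resolution onto the panel, plus the offsets that center it.
PANEL_W, PANEL_H = 480, 320  # 320x480 panel used in landscape orientation


def integer_fit(src_w: int, src_h: int, dst_w: int = PANEL_W, dst_h: int = PANEL_H):
    """Return (scale, x_offset, y_offset) for the largest integer scale that fits."""
    scale = min(dst_w // src_w, dst_h // src_h)
    if scale < 1:
        raise ValueError("source resolution is larger than the panel")
    x_off = (dst_w - src_w * scale) // 2  # unused columns split left/right
    y_off = (dst_h - src_h * scale) // 2  # unused rows split top/bottom
    return scale, x_off, y_off


print(integer_fit(240, 160))  # Game Boy Advance -> (2, 0, 0)
print(integer_fit(160, 144))  # Game Boy / Game Boy Color -> (2, 80, 16)
```

The Game Boy image can’t go to 3x because 144 x 3 = 432 rows would overflow the 320-row panel, so it stays at 2x with a border.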

ER-TFT035IPS-6 LCD module

The main issue, which actually ended up being fairly annoying, is that it’s a 320x480 display, not 480x320. That is, it’s oriented in portrait mode, not landscape. I rotated the display 90 degrees to fit in a landscape orientation, but this created two issues:

  • In landscape orientation, the bottom of the display (containing the LCD driver chip and the flex cable) faces to the left or the right, which means that larger bezels are required on the left and right of the display to center the “active area” of the LCD within the handheld.
  • In landscape orientation, the display refreshes from left to right, not top to bottom.

The problem with refreshing from left to right is that the Game Boy and Game Boy Advance (and almost every other system) refresh from top to bottom. This means that the display can’t be refreshed perfectly in sync with the game (zero buffering), and single buffering leads to unsightly diagonal tearing. Instead, I had to use triple buffering, where the game is writing to one framebuffer, the LCD driver is reading from another buffer, and there’s one spare swap buffer. This increases the amount of memory used – and because it needed to be accessed by both the game and LCD driver simultaneously (dual port), it needed to be stored in internal block RAM in the FPGA, a scarce resource.
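The buffer rotation itself is simple to model. This Python sketch only tracks which of the three buffers each side owns; the real logic lives in the FPGA, and the class and method names are hypothetical:

```python
class TripleBuffer:
    """Ownership bookkeeping for three framebuffers (a model, not the FPGA logic)."""

    def __init__(self):
        self.write = 0      # buffer the emulated game is drawing into
        self.read = 1       # buffer the LCD driver is scanning out
        self.spare = 2      # most recently completed frame, or idle
        self.fresh = False  # True if the spare holds a frame the LCD hasn't shown

    def game_frame_done(self):
        # The game never waits: it hands its finished frame to the spare slot
        # and immediately starts drawing into the old spare.
        self.write, self.spare = self.spare, self.write
        self.fresh = True

    def lcd_frame_done(self):
        # Switch only at the end of an LCD refresh, and only if a newer
        # complete frame exists; otherwise keep showing the current one.
        if self.fresh:
            self.read, self.spare = self.spare, self.read
            self.fresh = False
```

Neither side ever touches the buffer the other is using, so the LCD always scans out a complete frame and there is no tearing; the price is an occasional repeated or dropped frame, plus a third full framebuffer of dual-port memory.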

So, even though the Game Boy emulator uses <10% of the total logic resources of the FPGA, and the Game Boy Advance uses around 30%, I had to use a large (more expensive, and power hungry) FPGA so that I had enough block RAM.
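A rough estimate shows why. The Python below assumes one 15-bit GBA pixel per block RAM word, packed into 36 Kb block RAMs configured as 2048 x 18, and uses 135 such blocks for the XC7A100T. The packing scheme is my assumption rather than Game Bub’s actual layout, and the block count should be checked against Xilinx’s Artix-7 documentation:

```python
import math

# Assumptions: one 15-bit pixel per word, 36 Kb block RAMs in 2K x 18 mode.
GBA_W, GBA_H = 240, 160
WORDS_PER_RAMB36 = 2048   # 36 Kb block configured as 2048 x 18
RAMB36_IN_XC7A100T = 135  # verify against Xilinx DS180 for your part


def ramb36_per_framebuffer(width: int, height: int) -> int:
    """Block RAMs needed to hold one framebuffer at one pixel per word."""
    return math.ceil(width * height / WORDS_PER_RAMB36)


per_fb = ramb36_per_framebuffer(GBA_W, GBA_H)  # 38,400 pixels -> 19 blocks
triple = 3 * per_fb                            # 57 blocks
print(per_fb, triple, f"{triple / RAMB36_IN_XC7A100T:.0%}")  # 19 57 42%
```

Under these assumptions, the three framebuffers alone take over 40% of the largest part’s block RAM, before counting anything else the emulator keeps in block RAM.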

I also stuck a standard size HDMI port into the design, connected directly to the FPGA. HDMI has a few additional, non-video signals that need level shifting from 5V to 3.3V (I opted for discrete transistors), and it requires the source (me!) to supply a small amount of power.

Power

Battery

I had never previously designed anything that used a lithium ion battery, so I had a fair amount of learning to do. Adafruit was a helpful resource. I needed a way to charge the battery from USB power, and a way to measure how charged it is.

Lithium ion batteries can be dangerous if misused. Safely charging a battery is non-trivial, and requires a feedback loop and adjustable voltage sources. A dedicated IC seemed like the best way to do this. A lot of hobbyists use the ultra-cheap TP4056 1A battery charger, but I’d read about a lot of issues it has around safely charging the battery while using it. I decided instead to opt for the TI BQ2407x series of battery charger ICs. They seem to be widely used in commercial products, came with a comprehensive datasheet, and had a few critical features: programmable input and charge current limits, safety timers, and “power path management” for safely charging the battery while the device is on.

Typical discharge curve for a 3.7V lipo battery (source: Adafruit)

There are a few ways to measure the charge level of the battery, all of which generally rely on the fact that a lithium ion battery’s voltage depends on its charge level. A fully charged battery is about 4.2 volts, a battery with between 80% and 20% charge is about 3.7 volts, and below that a drained battery falls off pretty quickly to under 3.0 volts. If all you want is a coarse estimate of the battery level, you can use an ADC to read the voltage and estimate whether the battery is fully charged or nearly discharged. However, since the voltage curve is nearly flat between 20% and 80% charge (and is also dependent on the load), this can’t give the fine-grained battery percentage that we’re used to on phones and laptops. Instead, I opted for a discrete fuel gauge IC, the MAX17048. It’s simple to integrate and inexpensive.
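As an illustration of why a voltage reading alone is too coarse, here is a Python sketch that buckets a voltage using thresholds loosely read off the discharge curve above. The thresholds are illustrative assumptions, and any load on the battery would shift them:

```python
def coarse_battery_level(volts: float) -> str:
    """Rough charge bucket from battery voltage (illustrative thresholds only)."""
    if volts >= 4.1:
        return "full"
    if volts >= 3.6:
        # The flat ~3.7 V plateau: anywhere from roughly 20% to 80% charge,
        # and the voltage can't tell you where in that range you are.
        return "ok"
    if volts >= 3.3:
        return "low"
    return "empty"
```

Readings of 3.72 V and 3.68 V land in the same bucket even though they may correspond to very different remaining capacity, which is the gap a fuel gauge IC fills.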

Power switch

I decided to use a push button for the main power switch, because I needed to be able to do a graceful shutdown, where the microcontroller could save state (e.g. the current save file for an emulated cartridge) before it actually powered off.

I used this push-on hold-off circuit from Mosaic Industries. The “hold to power off” feature is useful if I need to force the system off, for example if the microcontroller crashes or is misbehaving. The circuit is built out of discrete MOSFETs.

I briefly considered using an ultra-low-power, always-on microcontroller to act as a custom PMIC, providing power switch functionality (and perhaps avoiding the need for a separate real-time clock IC, and even a battery gauge). While this would have been flexible and really cool, I figured it wasn’t worth the additional complexity.

Power regulation

The main system power ranges from about 3.4 V when the battery is discharged, to 4.2 V when the battery is fully charged, up to 5.0 V when the device is plugged in with USB.

The ESP32-S3 module requires 3.3 V, and so do most of the other ICs in the system. The main exception is the FPGA, which requires a 1.0 V core power rail, a 1.8 V “auxiliary” power rail, and a 3.3 V power rail for I/O. Moreover, according to the Xilinx Artix-7 datasheet (DS181), these power rails need to be powered on in a particular sequence: for my use, this means 1.0 V, then 1.8 V, then 3.3 V. Additionally, I needed a 5.0 V supply to interface with Game Boy / Game Boy Color cartridges.

There are multi-rail power regulators available, and a lot of FPGA development boards use them. However, they all seemed to be expensive and difficult to purchase in low quantities. Instead, I opted for separate power regulators for each rail. I used buck converters instead of linear regulators to maximize power efficiency.

I used the TLV62585 converter for the 3.3 V, 1.8 V, and 1.0 V rails. This is a simple, performant buck converter with a “power good” output, which is useful for power sequencing: you can connect the power good output of one regulator to the enable pin of the next regulator, to power on the rails in the desired order.
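The daisy-chained enable scheme is easy to reason about as a toy model. This Python sketch (hypothetical names; it ignores ramp times and timing entirely) shows the order in which the FPGA rails come up, and that a rail which never reaches regulation holds off everything downstream:

```python
from dataclasses import dataclass


@dataclass
class Regulator:
    name: str
    healthy: bool = True  # hypothetical: False models a rail that never reaches regulation


def power_up(chain, master_enable: bool):
    """Return the rails that come up when each EN pin is driven by the previous PG output."""
    powered = []
    enable = master_enable  # e.g. the MCU's power-enable signal drives the first EN pin
    for reg in chain:
        if not enable:
            break
        if reg.healthy:
            powered.append(reg.name)
        # PG is asserted only once this rail is in regulation; it enables the next stage
        enable = reg.healthy
    return powered


fpga_rails = [Regulator("+1V0"), Regulator("+1V8"), Regulator("+3V3_FPGA")]
print(power_up(fpga_rails, master_enable=True))  # ['+1V0', '+1V8', '+3V3_FPGA']
```

The appeal of this scheme is that the ordering is enforced by wiring alone, with no firmware involved beyond the single enable signal.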

For the 5.0 V rail, I used the TPS61022 boost converter. This converter is way overkill for the 5.0 V rail (which might use 75mA max), but it was readily available, and conveniently compatible with the same 1µH inductor as the buck converters.

According to the FPGA datasheet, the XC7A100T consumes more than 100mW of static power. That is, it consumes that as long as it’s connected to power, even if it’s doing absolutely nothing. I figured I might want to support a low power sleep mode, so I decided to split the FPGA into a separate power domain with an explicit power enable signal from the MCU. I also used an AP2191W load switch for the FPGA’s 3.3 V rail to be able to keep the 1.0 V → 1.8 V → 3.3 V sequencing.

Audio

I wanted the device to have both speakers and a 3.5mm headphone jack. Ultimately, the FPGA generates an I2S digital audio signal, and I needed a DAC to convert it to an analog audio signal, and then an amplifier to drive the speakers (or headphones). I wanted digital volume control (to support volume buttons, rather than a volume knob or slider), and I needed some way to switch the audio output between speakers and the headphones, depending on whether or not headphones are plugged in. With no real audio experience, this seemed like a daunting task.

While searching for multiple separate components, I stumbled upon the TLV320DAC3101. It combines a stereo DAC with a speaker amplifier and a headphone driver. Additionally, it supports digital volume control and headphone detection. I think this chip is a good example of how thoughtful component selection can simplify the overall design. Looking through the datasheet, it required a 1.8 V core voltage (unlike essentially every component except the FPGA) and a fair number of configuration registers to be set over I2C, but it had all of the features I needed.

I was originally planning to have just a single (mono) speaker, but I figured if I had a stereo DAC, I might as well put two in there. I chose the CES-20134-088PMB, an enclosed microspeaker with a JST-SH connector. Having an enclosed speaker simplified audio design, because as it turns out, you can’t just stick a speaker to a board and expect it to sound okay (Same Sky, the manufacturer of that speaker, has a blog post explaining some of the nuances).

Buttons

I prefer the feeling of clicky, tactile buttons (such as those found in the GBA SP, Nintendo DS (original), Nintendo 3DS, Switch) compared to “mushy” membrane buttons (such as those found in the Game Boy Color, original GBA, and Nintendo DS Lite). I learned that the tactile switches used in the GBA SP are a widely available off-the-shelf part from Alps Alpine. I used similar, but smaller buttons for the Start/Select/Home buttons, and a right-angle button from the same manufacturer for side volume and power buttons.

Although I only had plans to support Game Boy and Game Boy Advance (requiring a D-pad, A and B buttons, L and R shoulder buttons, and Start/Select), I opted to add two more “X” and “Y” face buttons to leave the possibility open of supporting more systems in the future.

The L and R buttons posed an additional challenge – I found numerous right-angle tactile buttons (to be soldered onto the back, facing towards the top). However, none of them seemed to have the actuator (the part of the button you make contact with) far enough away from the PCB to be easily pressed. At first, I thought about making a separate shoulder button board to move them at the correct distance, but then I started looking at what existing devices do for inspiration. The Game Boy Advance SP actually uses a more complex mechanism for the shoulder buttons: rather than a simple actuator like the face buttons, there’s a hinge with a torsion spring that hits the actuator at an angle. This is actually part of what makes the shoulder buttons pleasant to press: you don’t need to hit them from exactly the right direction, because they pivot. I ended up just going with a standard right-angle tactile button, opting to solve the problem with the mechanism in the enclosure.

GBA SP shoulder button mechanism

Memory

One of my main goals was to allow ROM files to be loaded from a microSD card, rather than only being able to play games from a physical cartridge. To do this, I’d need dedicated RAM for the FPGA to hold the game. Game Boy Advance games can be up to 32 MB. They don’t make SRAMs that large (and if they did, they’d be very expensive). Instead, I needed to use DRAM.

Asynchronous SRAM is very simple: supply a read address to the address pins, and some amount of nanoseconds later, the data you’re reading appears on the data pins. DRAM is more complex: the simplest kind is “single data rate synchronous DRAM” (SDR SDRAM, or just SDRAM, distinguishing it from the significantly more complex DDR SDRAM). However, even SDRAM is non-trivial to use. DRAM is organized into banks, rows, and columns, and accessing DRAM requires sending commands to “activate” (open) a row before reading out “columns”, and then “precharging” (closing) the row. Handling all of this requires a DRAM controller (see this simple description of the state machine required). This isn’t terribly complex, but I was signing myself up for more work.
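To give a flavor of that state machine, here is a Python sketch of the command sequencing for reads under an “open-page” policy, where a row stays open until a different row in the same bank is needed. It ignores timing constraints (tRCD, tRP, CAS latency) and refresh, and it is a generic illustration rather than Game Bub’s actual controller:

```python
from enum import Enum


class Cmd(Enum):
    ACTIVATE = "ACT"
    READ = "RD"
    PRECHARGE = "PRE"


class OpenRowController:
    """Tracks the open row in each bank and emits the commands a read needs.
    Timing parameters and refresh are deliberately omitted."""

    def __init__(self, banks: int = 4):
        self.open_row = [None] * banks  # None means the bank is precharged (idle)

    def read(self, bank: int, row: int, col: int):
        cmds = []
        current = self.open_row[bank]
        if current is not None and current != row:
            cmds.append((Cmd.PRECHARGE, bank))  # close the row that's in the way
            current = None
        if current is None:
            cmds.append((Cmd.ACTIVATE, bank, row))  # open the row we need
            self.open_row[bank] = row
        cmds.append((Cmd.READ, bank, col))  # read from the open row
        return cmds
```

A hit on an already-open row costs a single READ, while a row conflict costs PRECHARGE + ACTIVATE + READ; a real controller also has to wait the datasheet-specified cycles between commands and issue periodic refreshes, which adds more states.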

Alternatively, I could have chosen a PSRAM chip (essentially DRAM with an integrated controller to make it have a more SRAM-like interface). However, I couldn’t find a PSRAM part that I was happy with (cost, availability, interface), and so I ended up going with the inexpensive W9825G6KH 32MB 16-bit SDRAM.
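The W9825G6KH is organized as 4 banks of 8192 rows by 512 columns of 16-bit words, which works out to exactly 32 MiB. A controller has to split a flat address into those fields; this Python sketch uses a row | bank | column layout, which is one common choice (an assumption, not necessarily what Game Bub does):

```python
# W9825G6KH geometry: 4 banks x 8192 rows x 512 columns x 16 bits.
# Verify against the datasheet before relying on it.
BANK_BITS, ROW_BITS, COL_BITS = 2, 13, 9
CAPACITY = (1 << (BANK_BITS + ROW_BITS + COL_BITS)) * 2  # bytes (2 per 16-bit word)


def split_address(byte_addr: int):
    """Map a byte address to (bank, row, column).
    Assumed layout: | row | bank | column | byte-within-word |."""
    if not 0 <= byte_addr < CAPACITY:
        raise ValueError("address outside the 32 MiB SDRAM")
    word = byte_addr >> 1  # 16-bit data bus
    col = word & ((1 << COL_BITS) - 1)
    bank = (word >> COL_BITS) & ((1 << BANK_BITS) - 1)
    row = word >> (COL_BITS + BANK_BITS)
    return bank, row, col
```

Putting the bank bits just above the column bits means a long sequential read (like fetching consecutive ROM words) crosses into the next bank rather than the next row of the same bank, so rows can stay open in several banks at once.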

I also decided to stick a 512 KiB SRAM chip in the design in case I ended up needing some more simple memory later, like for emulating the SRAM used for Game Boy cartridge save files. Despite being 1/64 the capacity, this chip was about 3x the cost of the SDRAM. This ended up being a wise decision, since a lot of my internal FPGA block RAM was eaten up by the triple buffer for the display (see above).

Of course, there’s no point to an FPGA emulator that can’t play actual cartridges or interact with other devices. The cartridge slot and link ports are no-name parts from Aliexpress, easily available for cheap. These seem to mostly be GBA SP compatible, and are often used as repair parts.

I’d already used these for my cartridge adapter board in my first Game Boy FPGA project, and so I used a similar design: 2x 16-bit level shifters for the majority of the cartridge slot signals (since the Game Boy runs at 5V, which is incompatible with the FPGA), and a few 1-bit level shifters with individual direction control for some extra signals on the cartridge slot, as well as the four signals on the link port.

The Game Boy Advance can play both Game Boy [Color] and Game Boy Advance games. These run at different voltages and use different protocols, so the device needs some way of determining which type of cartridge is inserted.

GBA cartridge (top) vs GB cartridge (bottom)

The cartridges are physically different at the bottom: GBA cartridges (the top cartridge in the image) have a notch on either side. The GBA has a detector switch that senses the absence of a notch on an inserted cartridge and switches the device into Game Boy Color mode.
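The resulting decision is simple enough to write down. This Python sketch assumes the switch reads as “pressed” when a notchless Game Boy cartridge is inserted; the polarity and names are illustrative:

```python
from enum import Enum


class CartMode(Enum):
    GB = "Game Boy / Game Boy Color"
    GBA = "Game Boy Advance"


def select_mode(detect_switch_pressed: bool):
    """Pick the emulation mode and cartridge I/O voltage from the notch-detect switch.

    Assumption: a GB/GBC cartridge has no notch, so it presses the switch;
    a GBA cartridge's notch leaves the switch released.
    """
    if detect_switch_pressed:
        return CartMode.GB, 5.0  # Game Boy cartridges run at 5 V
    return CartMode.GBA, 3.3     # GBA cartridges run at 3.3 V
```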

I measured the size and position of this notch, and searched Digi-Key and Mouser for switches that met these constraints. In the end, I was only able to find a single switch that would work.

Miscellaneous peripherals

I used the surprisingly cheap LSM6DS3TR-C IMU from ST. This tiny IMU has a 3-axis accelerometer and gyroscope, more than sufficient for emulating the few GB/GBA cartridges that have motion controls.

For keeping track of time even when the device was off, I used the PCF8563T real-time clock chip. I chose this because it was 1) I2C (no additional pins required) 2) cheap and 3) readily available from JLCPCB. Interestingly, all of the real-time clock chips I found count in seconds/minutes/hours/days/months/years. This makes sense for a really simple device with minimal computational power. However, it’s annoying for my purposes, since all I really want is a timestamp I can pass to some other datetime library, and converting between the calendar time and a unix timestamp is non-trivial due to how the chips incompletely handle leap years.
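For reference, the conversion itself is short once the fields are decoded. This Python sketch converts BCD register values and a two-digit year (assumed to be 2000 to 2099) into a Unix timestamp, using Howard Hinnant’s days-from-civil algorithm. It assumes any status flag bits sharing those registers have already been masked off:

```python
def bcd_to_int(value: int) -> int:
    """Decode one packed-BCD byte, e.g. 0x59 -> 59."""
    return (value >> 4) * 10 + (value & 0x0F)


def days_from_civil(year: int, month: int, day: int) -> int:
    """Days since 1970-01-01 in the proleptic Gregorian calendar (after Howard Hinnant)."""
    if month <= 2:
        year -= 1  # treat Jan/Feb as months 11/12 of the previous year
    era = year // 400  # Python's // floors, so negative years work too
    yoe = year - era * 400  # year of era, 0..399
    mp = month - 3 if month > 2 else month + 9  # March = 0 ... February = 11
    doy = (153 * mp + 2) // 5 + day - 1  # day of year, counted from March 1
    doe = yoe * 365 + yoe // 4 - yoe // 100 + doy  # day of era, including leap days
    return era * 146097 + doe - 719468  # shift the epoch to 1970-01-01


def rtc_to_unix(sec: int, minute: int, hour: int, day: int, month: int,
                year2: int, century: int = 2000) -> int:
    """Convert decoded RTC fields (two-digit year) to a Unix timestamp in UTC."""
    days = days_from_civil(century + year2, month, day)
    return days * 86400 + hour * 3600 + minute * 60 + sec


# Register bytes as they might come off the chip: 2000-03-01 00:00:00
print(rtc_to_unix(bcd_to_int(0x00), bcd_to_int(0x00), bcd_to_int(0x00),
                  bcd_to_int(0x01), bcd_to_int(0x03), bcd_to_int(0x00)))  # 951868800
```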

I picked up a few cheap coin vibration motors to use for vibration support (for the rare cartridge that had a built-in vibration motor).

I also used a TCA9535 I2C I/O expander to connect the face buttons to the MCU. I ran out of pins, and while I could have used the FPGA as a sort of I/O expander, I figured I’d make it simpler for myself (and allow the buttons to be used even if the FPGA was powered off) by letting the MCU read them itself.
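Reading the buttons then comes down to one two-byte register read. This Python sketch decodes the TCA9535’s two input-port registers (0x00 and 0x01) into a set of pressed buttons; the bit-to-button mapping, the active-low wiring, and the I2C address in the comment are hypothetical:

```python
# Hypothetical bit assignment; the real mapping depends on the schematic.
BUTTONS = ["up", "down", "left", "right", "a", "b", "x", "y",
           "l", "r", "start", "select", "home"]  # bits 13-15 unused here


def decode_buttons(port0: int, port1: int) -> set:
    """Decode the two input-port bytes (registers 0x00 and 0x01).
    Assumes active-low buttons with pull-ups: a 0 bit means pressed."""
    raw = port0 | (port1 << 8)
    return {name for bit, name in enumerate(BUTTONS) if not (raw >> bit) & 1}

# On the MCU under MicroPython, the bytes could be read with something like:
#   data = i2c.readfrom_mem(0x20, 0x00, 2)  # 0x20 assumes address pins tied low
#   pressed = decode_buttons(data[0], data[1])
```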

PCB Design

Schematic

For this project, as with my previous ones, I used KiCad to create my schematic and do PCB layout. I really can’t recommend KiCad enough: it’s a great program, intuitive to use, and it’s free and open source.

This was a very ambitious project for my level of electrical engineering experience, and creating the schematic took a couple of weeks. I spent a lot of time designing the circuit for each component, because I was afraid I’d do something wrong and end up with a stack of useless boards without the skills needed to debug them. A lot of the component selection actually happened in parallel with schematic design, as I found new requirements or problems and had to change components.

I gained a lot of experience reading component datasheets. It’s a really valuable skill, both for component selection and for creating designs that use the components. Nearly every datasheet has a “typical application” section, where the manufacturer shows how the component would fit into a circuit. At minimum, this has power supply information (e.g. these voltages to these pins with these decoupling capacitors). For more complex components like the DAC, it also has information about power sequencing, different ways the device could be connected to the rest of the system, a register list, that sort of thing. Some components also included PCB layout recommendations. This information was all really helpful, and gave me a good deal of confidence that my board would work as long as I read through the datasheet and followed the manufacturer’s recommendations.

Then I got to the FPGA. Nearly every component has a single datasheet. Some of them have an additional application note or two. Particularly complex chips (like the ESP32-S3 microcontroller) have a separate datasheet, reference manual, and hardware design guide. The Xilinx Series 7 FPGAs have dozens of datasheets. Overviews, packaging and pinout, configuration guides, BGA design rules, power specifications, clocking resources, I/O specifications, PCB layout guides, design checklists… even a 4MB Excel spreadsheet for estimating power consumption! And believe me, Xilinx didn’t just write documentation for fun: there’s so much documentation because the chip needs this much documentation.

Designing with the FPGA was overwhelming, and way beyond my experience level. At several points I genuinely considered dropping the project altogether. Fortunately, I persevered, and gradually internalized a lot of the information. I also read through the schematics of any open-source Artix-7 development board I could get my hands on. Seeing what other people were doing gave me more confidence that I was doing the right thing.

Eventually, once I had placed all of the components, connected them, ensured all of the nets were labeled, and run KiCad’s electrical rules checker (ERC) to find obvious mistakes, I moved on to layout.

Layout

I did PCB layout at the same time as some of the initial enclosure CAD. The mechanics of how everything fit together influenced the placement of the display connector, cartridge slot, buttons, speakers, and connectors. After I came up with a plausible enclosure design, I placed some of the first key components onto the PCB and locked them into place while I did the rest of the routing.

Rough enclosure design to help with board layout

I first focused on components that would be hardest to route. Primarily, the FPGA: the package I was using (CSG324) is a BGA, 18x18 with 0.8mm pitch between pins. “Fanning out” all of the I/O signals requires careful routing, and at 0.8mm pitch, it’s difficult to do this routing with cheap PCB manufacturing techniques. I ended up being able to do this routing with a 6-layer PCB (three signal, two ground, one power), with 0.1mm track width and spacing, and 0.4/0.25 mm vias. Fortunately, this is all within the realm of JLCPCB’s capabilities.

BGA fanout with thin traces and small vias

As I routed signals out from the FPGA to other parts, I assigned those signals to the FPGA pins. Similarly, with the MCU, I assigned signals to pins in a way that made routing easier. Certain signals had restrictions (e.g. on the FPGA, the main 50 MHz clock signal can only go into certain pins, or the configuration bitstream can only go to certain pins, or certain pins are differential pairs for HDMI output), but overall, I had a lot of flexibility with pin assignment.

KiCad has a feature where it automatically backs up your project as you work on it. I changed the settings to save every 5 minutes and not delete old backups, which allowed me to generate this timelapse of my layout process:

Revision 1 board layout timelapse

Once I finished placing and routing all of the components, I ran the design rules checker (DRC) and fixed issues. I hesitated for a while before sending the PCB for manufacturing. I re-read the schematics, reviewed the layout, and eventually felt confident enough that I was done. I submitted the order to JLCPCB, and after a few questions by their engineers about component placement, they started manufacturing it.

Board testing and bring-up

After two weeks or so, I received the assembled boards in the mail:

An assembled board and an unassembled board

First, I probed the power rail test points with a multimeter to check for shorts. Then, I plugged the boards in for the first time, and pressed the power button. To my delight, the green LED turned on, indicating that the power button circuit, power path, and 3.3V regulator worked. The microcontroller USB enumerated, and I could see that it logged some errors (since I hadn’t flashed anything to it yet).

I intended to write the MCU firmware in Rust, but I did initial board testing and bring-up with MicroPython. This would let me interactively type in Python and write basic scripts to communicate with the peripherals on the board and make sure I had connected everything correctly. I didn’t have to worry about writing efficient or well-organized code, and could just focus on functionality.

I flashed the MicroPython firmware image, and wrote a couple lines of Python to blink the LED. I powered on the FPGA power domain, and checked that the +1V0, +1V8, and +3V3_FPGA rails had the correct voltage.

Next, I wrote a simple bitstream for the FPGA that read the state of the buttons and produced a pattern on the shared signals between the FPGA and the MCU. I wrote simple Python code to configure the FPGA, loaded up the bitstream, and polled the signals from the FPGA. Pressing buttons changed the state, and confirmed that the FPGA was properly powered, and configurable from the MCU.

After I confirmed the FPGA worked, I started writing a simple display driver to initialize the LCD and push some pixels from the MCU over SPI. The initialization sequence uses a number of LCD-specific parameters (voltages, gamma correction, etc.), that I learned from the LCD manufacturer’s example code.

(Slowly) pushing pixels to the LCD

The LCD module’s controller, an ILI9488, has a few quirks: despite claiming that it supports 16-bit colors over SPI, it actually only supports 18-bit colors. This unfortunately meant that the MCU’s LCD driver would be less efficient than I had expected, since it has to expand 16-bit colors to 18-bit before sending them over the bus. This didn’t end up being a huge issue, however, because the FPGA is the one driving the display most of the time.
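
Concretely, every 2-byte pixel becomes 3 bytes on the wire. The sketch below shows that expansion. It assumes RGB565 input and the usual ILI9488 18-bit SPI format (one byte per channel, with the 6-bit component in the upper bits). The function names are hypothetical, not taken from the Game Bub firmware:

```rust
/// Expand one RGB565 pixel into the three bytes of an 18-bit (RGB666) pixel.
/// Assumes each component sits in the upper six bits of its byte and the
/// two low bits are ignored by the controller.
pub fn rgb565_to_rgb666_bytes(p: u16) -> [u8; 3] {
    let r5 = ((p >> 11) & 0x1F) as u8;
    let g6 = ((p >> 5) & 0x3F) as u8;
    let b5 = (p & 0x1F) as u8;
    // Widen the 5-bit channels to 6 bits by replicating the top bit,
    // so full-scale 31 maps to full-scale 63.
    let r6 = (r5 << 1) | (r5 >> 4);
    let b6 = (b5 << 1) | (b5 >> 4);
    [r6 << 2, g6 << 2, b6 << 2]
}

/// Convert a run of RGB565 pixels into the byte stream sent over SPI.
pub fn expand_pixels(src: &[u16], out: &mut Vec<u8>) {
    out.reserve(src.len() * 3);
    for &p in src {
        out.extend_from_slice(&rgb565_to_rgb666_bytes(p));
    }
}
```

The extra shifting is cheap. The real cost is the 50% larger byte stream that the SPI bus has to carry.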

Another quirk (hardware bug?) is that the ILI9488 doesn’t stop driving its SPI output line, even when its chip-select signal is inactive. This means that the chip will interfere with any other communication on the bus… including the FPGA, which sits on the same bus. I never actually needed to read any data back from the LCD (and even if I did, it supports three-wire SPI), so I just cut the trace between the LCD’s SDO line and the SPI bus.

Debugging the LCD test code

Debugging the LCD test code

Trouble with power domains

I started trying to communicate with the I2C peripherals (I/O expander, RTC, etc.), and found that nothing was responding. A bit of probing with a logic analyzer revealed that the SCL/SDA lines were being held low, and that powering on the FPGA power domain let the lines be pulled high and communication to happen.

I deduced that this was due to the DAC. Its IOVDD was powered by +3V3_FPGA, so with that rail off, its protection diodes likely clamped the I/O lines (SCL and SDA) low:

The problematic portion of the schematic

The problematic portion of the schematic

I tested out this theory by cutting the PCB traces connecting the DAC’s IOVDD to +3V3_FPGA with a knife. After this, I2C worked even with the FPGA power disabled. Then, I tested a possible fix by adding a wire to power the DAC’s IOVDD from the +3V3 rail. I confirmed that I could still talk to the other I2C devices, and that once I enabled FPGA power, I could talk to the DAC too.

DAC IOVDD rework

DAC IOVDD rework

While bringing up the LCD, I saw that the FPGA was also pulling down the shared SPI bus lines while it was unpowered. Not enough to prevent communication with the LCD, but it still wasn’t great. Between this and the DAC issue, I learned an important EE lesson: be careful when connecting components in different power domains together. A tristate buffer, such as the 74LVC1G125, could have helped here to isolate the buses.

Once I2C was working, I wrote some basic driver code for the fuel gauge, real-time clock, IMU, and I/O expander, just to check that they all worked correctly. I also checked that the MCU could read from and write to the attached microSD card.

Audio and video output from the FPGA

Next, I updated my testing FPGA bitstream to output a test pattern over the LCD parallel interface (“DPI”), and a test tone to the DAC over the I2S interface. Then, I began poking at the MCU side to configure the LCD controller and DAC appropriately.

With some amount of trial and error, I convinced the LCD to accept input from the FPGA. Most of the trial and error revolved around the rotation of the LCD module. Soon after, I configured the DAC properly, and it played the test tone from the FPGA over the speakers and the headphones.

WIP video output from the FPGA

At this point, much of the board was working, so I soldered on the rest of the components (cartridge slot, cartridge switch, link port, shoulder buttons).

With the cartridge slot in place, I had everything I needed to port over the Game Boy emulator from my last project. I did a quick-and-dirty port of the emulator, with some hacking around to connect the core to the audio, video, and the physical cartridge. I was able to play the first Game Boy game on the device far sooner than I was expecting:

Pokemon Silver running from cartridge

FPGA communication and memory

I spent the next month or so implementing things on the FPGA. I started on the SPI receiver implementation, so that the MCU and FPGA could communicate.

It was relatively straightforward to write the initial version, which 4x oversampled the SPI signals from the main system clock. For the Game Boy, that was ~8 MHz, for a maximum SPI speed of 2 MHz. The MicroPython ESP32-S3 SPI implementation supported only single SPI, so that allowed for a maximum transfer speed of 256 KB/s. This was sufficient to do most of my initial testing, but I later wrote an improved SPI receiver to run with an internal 200 MHz clock (from a PLL that turned on and off with the chip-select signal to save power), communicating with the rest of the system via a pair of FIFOs. This added a lot of complexity and edge cases, but it greatly improved performance, allowing the bus to run at 40 MHz.
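
The first, oversampled receiver can be modeled in software. Once per system clock, the receiver looks at the (already synchronized) SCLK, MOSI, and chip-select levels, and it shifts in a bit whenever SCLK goes from low to high. With four samples per SCLK period, each high and low phase lasts two samples, which leaves some margin for detecting the edge. The sketch below assumes SPI mode 0 and MSB-first bytes (neither is stated above). `OversampledSpiRx` is a hypothetical name, and a real design would first pass the pins through two-flip-flop synchronizers. If the system clock is the Game Boy Color’s 8,388,608 Hz (an assumption consistent with “~8 MHz”), SCLK tops out at 2,097,152 Hz. On a single data line that is 262,144 bytes/s, or 256 KiB/s.

```rust
/// Behavioral model of an SPI mode-0 receiver that oversamples the bus
/// with the system clock (hypothetical; not the Game Bub RTL).
pub struct OversampledSpiRx {
    prev_sclk: bool,
    shift: u8,
    bits: u8,
    pub received: Vec<u8>,
}

impl OversampledSpiRx {
    pub fn new() -> Self {
        Self { prev_sclk: false, shift: 0, bits: 0, received: Vec::new() }
    }

    /// Call once per system clock with the synchronized pin levels.
    pub fn tick(&mut self, cs_n: bool, sclk: bool, mosi: bool) {
        if cs_n {
            // Chip select inactive: drop any partially received byte.
            self.shift = 0;
            self.bits = 0;
        } else if sclk && !self.prev_sclk {
            // Rising edge of SCLK seen between two samples: take MOSI, MSB first.
            self.shift = (self.shift << 1) | mosi as u8;
            self.bits += 1;
            if self.bits == 8 {
                self.received.push(self.shift);
                self.shift = 0;
                self.bits = 0;
            }
        }
        self.prev_sclk = sclk;
    }
}
```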

I wrote the SPI interface to the FPGA with memory-like semantics: each SPI transfer starts with a command byte, encoding whether it’s a read or write transfer, the size of each word in the transfer (8, 16, or 32 bits), and whether the “target address” should autoincrement as the transfer progresses. Next comes a 32-bit address, followed by the data being read or written. Everything the MCU might want to access (control registers, blocks of memory) is mapped into the 32-bit address space.
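
The sketch below shows one way to pack such a header. The text does not give exact bit positions, so the layout here is an assumption for illustration: bit 7 for direction, bits 6-5 for word size, bit 4 for autoincrement, then a big-endian address. The type and function names are also assumptions:

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Dir { Read, Write }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum WordSize { W8, W16, W32 }

/// Build a 5-byte transfer header: command byte, then a 32-bit address.
/// Bit layout (hypothetical): [7] write, [6:5] word size, [4] autoincrement.
pub fn encode_header(dir: Dir, size: WordSize, autoinc: bool, addr: u32) -> [u8; 5] {
    let size_bits: u8 = match size {
        WordSize::W8 => 0b00,
        WordSize::W16 => 0b01,
        WordSize::W32 => 0b10,
    };
    let mut cmd = size_bits << 5;
    if dir == Dir::Write {
        cmd |= 0x80;
    }
    if autoinc {
        cmd |= 0x10;
    }
    let a = addr.to_be_bytes();
    [cmd, a[0], a[1], a[2], a[3]]
}

/// Parse a header back into its fields, as the FPGA side would.
pub fn decode_header(h: [u8; 5]) -> Option<(Dir, WordSize, bool, u32)> {
    let cmd = h[0];
    let dir = if cmd & 0x80 != 0 { Dir::Write } else { Dir::Read };
    let size = match (cmd >> 5) & 0b11 {
        0b00 => WordSize::W8,
        0b01 => WordSize::W16,
        0b10 => WordSize::W32,
        _ => return None, // 0b11 is unused in this sketch
    };
    let autoinc = cmd & 0x10 != 0;
    let addr = u32::from_be_bytes([h[1], h[2], h[3], h[4]]);
    Some((dir, size, autoinc, addr))
}
```

Putting the command and address up front means the FPGA knows how to interpret every following byte, so a single transfer can stream an arbitrarily long block of memory.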

As with my previous FPGA project, I wrote almost all of the FPGA code in Chisel, a Scala-based HDL; only the top level was written in Verilog. Chisel made it really simple to parametrize, compose, and test the various modules that I wrote.

Once I had the SPI receiver working, I wrote controllers for the on-board SRAM and SDRAM. The SRAM was relatively simple (although I still got it slightly wrong at first). The SDRAM was a bit tricky, and even as I write this I’m not quite satisfied with its performance, and intend to rewrite it in the future.

I exposed the SRAM and SDRAM interfaces to the MCU via SPI, which allowed me to read and write to these pieces of memory from the MCU. I used this a lot for testing: writing patterns into memory and reading them back to ensure that read and write both worked.
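
This kind of write-then-verify test is easy to express against a small access trait. The sketch below is a minimal version. It uses each word’s own address as the data (plus an inverted pass), so aliasing from a broken address line shows up as a mismatch. The `Memory` trait and `FakeRam` stand-in are hypothetical; on the device, each call would become an SPI transfer to the FPGA:

```rust
/// Minimal word-access interface (hypothetical, for illustration only).
pub trait Memory {
    fn write32(&mut self, addr: u32, value: u32);
    fn read32(&mut self, addr: u32) -> u32;
}

/// Write an address-derived pattern to `words` consecutive 32-bit words,
/// read them all back, and return the addresses that did not match.
pub fn pattern_test<M: Memory>(mem: &mut M, base: u32, words: u32, invert: bool) -> Vec<u32> {
    let pattern = |addr: u32| if invert { !addr } else { addr };
    let addrs: Vec<u32> = (0..words).map(|i| base + i * 4).collect();
    for &addr in &addrs {
        mem.write32(addr, pattern(addr));
    }
    addrs
        .into_iter()
        .filter(|&addr| mem.read32(addr) != pattern(addr))
        .collect()
}

/// Array-backed stand-in for SRAM/SDRAM. Address bits set in `stuck_low`
/// are forced to zero to mimic a faulty address line.
pub struct FakeRam {
    pub words: Vec<u32>,
    pub stuck_low: u32,
}

impl Memory for FakeRam {
    fn write32(&mut self, addr: u32, value: u32) {
        let i = ((addr & !self.stuck_low) / 4) as usize;
        self.words[i] = value;
    }
    fn read32(&mut self, addr: u32) -> u32 {
        self.words[((addr & !self.stuck_low) / 4) as usize]
    }
}
```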

Side note: SDRAM has to be continuously refreshed, otherwise the stored data decays over time. It depends on the chip, but typically each row has to be read and written back (or auto-refreshed, which does the same thing) at least once every 64 milliseconds to avoid losing state. What I found interesting, however, is that the data can actually persist for quite a bit longer. I discovered that when I was reconfiguring the FPGA between tests, most of the test data that I had previously written would still stick around even without being refreshed. In the first few seconds some bits would start flipping, and over the course of a few minutes, most of what was written was completely unintelligible.
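
The refresh rate the controller has to sustain can be worked out from the retention window. A controller usually spreads refreshes evenly, issuing one auto-refresh command at a fixed interval. As an example, assume 8192 refresh commands per 64 ms (a common figure for SDR SDRAM, but the chip’s actual value is not given here) and a hypothetical 100 MHz controller clock. That gives one refresh every 64 ms / 8192 ≈ 7.8 µs, or 781 cycles:

```rust
/// Maximum number of controller clock cycles between auto-refresh commands,
/// assuming `rows` refreshes spread evenly over the retention window.
/// Rounds down, because refreshing slightly early is always safe.
pub fn refresh_interval_cycles(retention_ms: u64, rows: u64, clk_hz: u64) -> u64 {
    retention_ms * clk_hz / 1000 / rows
}
```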

With the SDRAM controller and SPI receiver written, I was then able to implement the “emulated cartridge” part of the Game Boy emulator, where the MCU reads a ROM file off of the microSD card and sends it to the FPGA to be stored in SDRAM. Then, the FPGA “emulates” a cartridge (rather than interfacing with a real physical cartridge). After a few stupid mistakes, I was able to run test ROMs and homebrew. As an added bonus, since I was using my own SDRAM controller directly, I didn’t have any of the performance issues I’d faced before when accessing the ROM stored in memory.
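
On the emulated-cartridge path, the FPGA has to translate each Game Boy bus address into an offset within the ROM image in SDRAM. That means emulating the cartridge’s memory bank controller (MBC) as well. The sketch below covers MBC5 only, chosen because its ROM banking is the simplest. It ignores cartridge RAM, assumes the bank register starts at 1, and uses hypothetical names:

```rust
/// MBC5 ROM banking only (cartridge RAM and its registers are omitted).
pub struct Mbc5 {
    rom_bank: u16, // 9-bit bank number selecting the 16 KiB window at 0x4000
}

impl Mbc5 {
    pub fn new() -> Self {
        Self { rom_bank: 1 } // assumed power-on value for this sketch
    }

    /// Handle a CPU write into the cartridge's ROM address range.
    pub fn write(&mut self, addr: u16, value: u8) {
        match addr {
            // Low 8 bits of the ROM bank number.
            0x2000..=0x2FFF => self.rom_bank = (self.rom_bank & 0x100) | value as u16,
            // Bit 8 of the ROM bank number.
            0x3000..=0x3FFF => self.rom_bank = (self.rom_bank & 0xFF) | (((value & 1) as u16) << 8),
            _ => {} // RAM enable / RAM bank registers not modeled
        }
    }

    /// Map a CPU read address to a byte offset in the ROM image.
    pub fn rom_offset(&self, addr: u16, rom_len: usize) -> Option<usize> {
        let off = match addr {
            0x0000..=0x3FFF => addr as usize, // fixed bank 0
            0x4000..=0x7FFF => self.rom_bank as usize * 0x4000 + (addr as usize - 0x4000),
            _ => return None, // not ROM
        };
        Some(off % rom_len) // wrap for smaller ROMs (assumes power-of-two size)
    }
}
```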

Writing the microcontroller firmware in Rust

By this point I had tested, in some form or another, all of the different components of the system. I’m really surprised that everything worked in my first board revision – even the rework I did early on wasn’t actually required for functionality.

I decided now was a good time to start building an interactive GUI. Up until this point, I had just been running commands in the MicroPython REPL. However, I didn’t want to build a whole UI in Python just to throw it away later, so I also started working on the “production” Rust firmware.

In the last few years, a lot of progress has been made towards making Rust on the ESP32 chips work well, even on the chips that use the Xtensa ISA. I followed the Rust on ESP Book and quickly had an environment set up. I opted for the “Rust with the Standard Library” approach, so that I could benefit from ESP-IDF, especially the built-in support for USB and SD cards with the FAT filesystem.

I started porting over the drivers I had written in Python. I found embedded Rust to be a bit verbose in some cases, but overall pleasant to use and worth the (little) trouble.

GUI

I started writing my own minimal GUI framework for basic menus. I poked around with the embedded_graphics library, but soon found that the typical patterns I was expecting to use weren’t a great fit for Rust. I also started planning out different screens and realized that I probably actually wanted to use a more comprehensive UI framework.

Early main menu screen

Early main menu screen

Early rom select screen

Early rom select screen


Ultimately, I settled on Slint, a Rust-native declarative GUI framework with excellent support for embedded devices. Slint has a custom DSL to describe the UI and composable components. After a bit of practice I found myself to be really productive with it. I enjoyed using Slint, and I’d use it again in the future. The authors are responsive on GitHub, and the project has steadily improved over the year or so that I’ve been using it.

There were a few rough edges for my use case, however:

  • The built-in GUI elements and examples were all heavily oriented around mouse or touchscreen navigation. Game Bub only has buttons for navigation, however, so I had to make my own widgets (buttons, lists) that worked with key navigation. This involved a few hacks, because Slint’s focus handling was a little bit simplistic.
  • The built-in GUI styles looked (in my opinion) bad on a low DPI screen. Text was excessively anti-aliased and hard to read at small sizes. This was also fixed by building my own widgets.
  • Slint doesn’t have a great story around supporting different “screens” – I had to build some of my own infrastructure to be able to support navigation between the main menu, games, rom select, settings, etc.
Main menu

Main menu

The GUI is rendered on the MCU, and then the rendered framebuffer is sent over to the FPGA. Slint supports partial rendering, where only the parts of the screen that have changed are updated, which improved performance. The FPGA maintains a copy of the framebuffer and ultimately is responsible for driving the display. This has a few advantages over driving the display directly from the MCU:

  • Sending a framebuffer at 40 MHz QSPI to the FPGA is 16x faster than sending it to the LCD controller at 10 MHz (the fastest speed supported by the ILI9488)
  • The UI is rendered at 240x160 to improve performance and maintain the GBA aesthetic, but the LCD controller doesn’t have a scaler, so the MCU would have to send 4x the pixels. The FPGA can easily scale the UI framebuffer itself.
  • The FPGA can composite the emulator output with a semi-transparent “overlay” to support an in-game menu, volume / brightness bars, battery notifications, etc.
  • An external display (e.g. monitor or TV) can be driven by the FPGA via HDMI
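
The first two points can be checked with some quick arithmetic. The sketch below makes three assumptions: a 16-bit (RGB565) UI framebuffer sent to the FPGA, 3 bytes per pixel to the ILI9488 (its 18-bit SPI format), and a 2x-scaled 480x320 panel resolution. The helper name is hypothetical, and protocol overhead is ignored:

```rust
/// Time in microseconds to move `bytes` over a bus with `lanes` data lines
/// clocked at `clk_mhz` (one bit per lane per clock, no overhead).
pub fn transfer_us(bytes: u64, lanes: u64, clk_mhz: u64) -> u64 {
    bytes * 8 / (lanes * clk_mhz)
}
```

For example, `transfer_us(240 * 160 * 2, 4, 40)` gives 3,840 µs for the UI framebuffer over QSPI to the FPGA. By comparison, `transfer_us(480 * 320 * 3, 1, 10)` gives 368,640 µs for a full-resolution frame sent directly to the LCD. That is a factor of 96: the 16x faster bus, times 4x fewer pixels, times 2 bytes per pixel instead of 3.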

Firmware improvements

I spent some time making a variety of firmware improvements, mostly polish and quality-of-life. I added a settings screen to set the date and time, whether to use Game Boy (DMG) or Game Boy Color (CGB) mode when playing Game Boy games, etc.

Settings screen

Settings screen

Then I improved the ROM select file browser, and added a battery level indicator.

Rom select screen

Rom select screen

I also got sick of having to take the microSD card out of the device and connect it to my computer through a series of adapters (microSD to SD to USB-A to USB-C), so I implemented a basic utility to expose the microSD card as a USB Mass Storage Device, using TinyUSB and the ESP32-S3’s USB-OTG capabilities.

USB Mass Storage screen

USB Mass Storage screen

It was a little bit more difficult than I expected, because USB Mass Storage requires the device to provide raw block access. This means that the filesystem has to be unmounted by the device, otherwise the device and host could conflict and corrupt the filesystem. The ESP32-S3 also only supports USB Full Speed, for a practical maximum transfer speed of ~600KB/sec. It’s really useful for transferring save files or updating the FPGA bitstreams, but less useful for transferring a large number of ROM files.

Later, I implemented MBC7 support in the Game Boy emulator for Kirby Tilt ’n Tumble, using the on-board accelerometer.

Creating the enclosure

After I implemented a decent amount of software functionality, I decided to finish the enclosure design. The bare board just wasn’t cutting it anymore, and the taped LCD module, loose speakers, and rubber-banded battery were fragile.

Game Bub looking rough without an enclosure

Game Bub looking rough without an enclosure

I came into this project without any CAD or 3D printing experience. I looked at a few different CAD software packages, and I ultimately settled on FreeCAD, primarily because it was free and open source. I learned how to use the software with some video tutorials. FreeCAD, unfortunately, was a little bit rough around the edges and I ended up running into some annoying issues. Nevertheless, I powered through and finished the design.

FreeCAD view of the enclosure and some buttons

FreeCAD view of the enclosure and some buttons

I found parametric modeling, where the geometry of the model is defined by constraints and dimensions, to be intuitive. Overall, however, I found 3D CAD to be very time consuming. I think a large part of this is my inexperience, but thinking in three dimensions is a lot more difficult than, say, a 2D PCB layout. Creating a full assembly was even more difficult: I had to visualize how the front and rear pieces would fit together, where the screws would go, and how the buttons, screen, speaker, cartridge slot, battery, and ports would all fit in. This project definitely pushed the boundaries of my (previously non-existent) product design skills.

After finishing the design, I printed out the technical drawing at a 1:1 scale and physically placed the board and other components down as a final check. Then, I sent it to JLCPCB for manufacturing. I opted for SLA resin printing, for high precision and a smooth finish.

Enclosure technical drawing

Enclosure technical drawing

After a couple weeks, I got the finished enclosure and custom buttons back.

Front and rear half, outside

Front and rear half, outside

I put the buttons, speakers, and screen into the enclosure, screwed on the PCB, and put the whole thing together.

Assembling the front side

Assembling the front side

Game Bub, fully assembled and functional

I wasn’t sure how dimensionally accurate the 3D printing would be, so I added a lot of extra clearance around the buttons and ports. As it turned out, the printing was very precise, so the buttons rattled around a little in the oversized button holes.

It’s a little bit chunky (smaller than an original Game Boy, though!) and the ergonomics aren’t ideal, but I was really happy to finally have an enclosure. It actually started (sort of) looking like a real product, and I wasn’t constantly worried about breaking it anymore.

Game Boy Advance support

In mid-April 2024, I started working on Game Boy Advance support. I had some prior familiarity with the system, having previously written a Game Boy Advance emulator in Rust.

I won’t go into all of the details of how I wrote the emulator here (this article is already long enough!). If you’re interested, my previous article about my Game Boy FPGA emulator goes into detail about the general process of writing an emulator, and for a high-level introduction to the Game Boy Advance (from a technical perspective), I recommend Rodrigo Copetti’s article. In general, I tried to implement the emulator the way it might actually have been implemented in the original hardware: each cycle of the FPGA corresponds to one actual hardware cycle (no cheating!).

As with the Game Boy, I did nearly all of my development with a simulator backed by Verilator and SDL. By the end of the development process, the simulator was running at about 8% of the real-time speed (on an M3 MacBook Air with excellent single-core performance), which was a bit painful.

CPU

The Game Boy Advance CPU, the ARM7TDMI, is significantly more complicated than the Game Boy’s SM83 (a Z80 / 8080-ish hybrid). However, in some ways, it was easier to understand and implement: the ARM7TDMI is much closer to a simple modern processor architecture, and it’s extensively documented by ARM. For example, the ARM7TDMI Technical Reference Manual has block diagrams and detailed cycle-by-cycle instruction timing descriptions.

ARM7TDMI block diagram (source: ARM7TDMI Technical Reference Manual)

I had a lot of fun implementing the CPU. The architecture has a three-stage pipeline (fetch, decode, execute) – a division that feels natural when you implement it in hardware. The ARM7TDMI has two instruction sets: the standard 32-bit ARM instruction set, and the compressed 16-bit THUMB instruction set. I implemented the CPU the way it works in hardware, where the only difference between ARM and THUMB is the decode stage.
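
One way to picture “the only difference is the decode stage” is to have both decoders emit the same internal description of an operation, so that everything after decode is shared. The sketch below covers only the THUMB “move/compare/add/subtract immediate” format and the matching ARM data-processing-immediate encodings. The `DataProc` type and function names are hypothetical, and the ARM condition field is ignored:

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AluOp { Sub, Add, Cmp, Mov }

/// Decoded form shared by both instruction sets; execute only sees this.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct DataProc {
    pub op: AluOp,
    pub set_flags: bool,
    pub rn: u8,   // first operand register
    pub rd: u8,   // destination register (0 for CMP, which writes none)
    pub imm: u32, // immediate second operand, already rotated
}

/// ARM data-processing with an immediate operand (condition field ignored).
pub fn decode_arm(insn: u32) -> Option<DataProc> {
    if (insn >> 26) & 0b11 != 0 || (insn >> 25) & 1 != 1 {
        return None;
    }
    let op = match (insn >> 21) & 0xF {
        0b0010 => AluOp::Sub,
        0b0100 => AluOp::Add,
        0b1010 => AluOp::Cmp,
        0b1101 => AluOp::Mov,
        _ => return None, // other ALU opcodes left out of this sketch
    };
    let rotate = (insn >> 8) & 0xF;
    Some(DataProc {
        op,
        set_flags: (insn >> 20) & 1 == 1,
        rn: ((insn >> 16) & 0xF) as u8,
        rd: ((insn >> 12) & 0xF) as u8,
        imm: (insn & 0xFF).rotate_right(2 * rotate),
    })
}

/// THUMB "MOV/CMP/ADD/SUB Rd, #imm8": always sets flags, uses r0-r7.
pub fn decode_thumb(insn: u16) -> Option<DataProc> {
    if insn >> 13 != 0b001 {
        return None;
    }
    let reg = ((insn >> 8) & 0x7) as u8;
    let (op, rn, rd) = match (insn >> 11) & 0b11 {
        0b00 => (AluOp::Mov, 0, reg),
        0b01 => (AluOp::Cmp, reg, 0),
        0b10 => (AluOp::Add, reg, reg),
        _ => (AluOp::Sub, reg, reg),
    };
    Some(DataProc { op, set_flags: true, rn, rd, imm: (insn & 0xFF) as u32 })
}
```

For example, THUMB `0x2042` (MOVS r0, #0x42) and ARM `0xE3B00042` (MOVS r0, #0x42) decode to the same `DataProc`, so the execute stage cannot tell which instruction set it came from.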

As I was implementing the CPU, I wrote test cases for each instruction. Each test checks the functionality of the instruction: processor state, register values after, as well as the cycle-by-cycle behavior and interaction with the memory bus. This was helpful for catching regressions as I implemented more and more control logic. It was also really satisfying to be able to implement individual instructions, then write the tests, and check that everything worked.

Chisel made it easy to write out the CPU control logic. The CPU control logic is a state machine that generates microarchitectural control signals (e.g. bus A should hold the value from the first read register, bus B should hold an immediate value, the memory unit should start fetching the computed address, etc.). Chisel allowed me to collect common functionality into functions (e.g. nextInstruction() to set up the signals to dispatch the next decoded instruction, or flushPipeline() to signal that the pipeline should be flushed and a new instruction should be fetched from the current program counter).

I found it helpful to draw out timing diagrams with WaveDrom when working through instructions, especially to deal with the pipelined memory bus.

My timing diagram of the ARM7TDMI branch instructions

My timing diagram of the ARM7TDMI branch instructions

By mid-May (about a month later), I finished the CPU implementation (with occasional bug fixes after) and moved onto the rest of the system.

PPU, MMIO, and everything else

Over the next month and a half, I implemented the majority of the rest of the Game Boy Advance. The CPU interacts with the rest of the system via memory-mapped IO (MMIO) registers. Unlike the Game Boy CPU, which can only access memory a single byte at a time, the ARM7TDMI can make 8-bit, 16-bit, and 32-bit accesses. This complicates MMIO, and the different hardware registers and memory regions in the GBA respond to different access widths in different ways.
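
Video memory is one example of width-dependent behavior, as documented in community references such as GBATEK. An 8-bit store to palette RAM writes the byte into both halves of the addressed halfword, while an 8-bit store to OAM is ignored. The sketch below models only that rule, with hypothetical types. Real hardware has more cases, such as BG versus OBJ VRAM and the various I/O registers:

```rust
/// Access width used by the CPU (STRB / STRH / STR).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Width { Byte, Half, Word }

/// Two 16-bit-wide memories with different 8-bit write behavior.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Region { Palette, Oam }

/// Memory stored as halfwords; `addr` is a byte offset within the region.
pub struct HalfwordMem {
    pub data: Vec<u16>,
}

impl HalfwordMem {
    pub fn write(&mut self, region: Region, addr: u32, width: Width, value: u32) {
        let idx = (addr >> 1) as usize; // halfword containing this byte
        match width {
            // 8-bit stores: palette RAM gets the byte in both halves,
            // OAM ignores the write entirely.
            Width::Byte => {
                if region == Region::Palette {
                    let b = (value & 0xFF) as u16;
                    self.data[idx] = (b << 8) | b;
                }
            }
            Width::Half => self.data[idx] = value as u16,
            // 32-bit stores write two consecutive halfwords (word-aligned).
            Width::Word => {
                let base = idx & !1;
                self.data[base] = value as u16;
                self.data[base + 1] = (value >> 16) as u16;
            }
        }
    }
}
```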

I started with the Picture Processing Unit (PPU), which produces the video output. The author of NanoBoyAdvance, fleroviux, had helpfully documented the PPU VRAM access patterns, which gave a lot of insight into how the PPU might work internally. Tonc was also immensely helpful for implementing the PPU and testing individual pieces of functionality.

(Sort of) running a Tonc PPU demo

(Sort of) running a Tonc PPU demo

The PPU took a few weeks, and then I moved onto DMA, followed by hardware timers, and audio. Of course, as I’d try new tests, demos, and games, I’d uncover bugs and fix them.

Kirby Nightmare in Dream Land

Cartridge support

Game Boy and Game Boy Advance cartridges use the same 32-pin connector. However, they work very differently. The Game Boy cartridge bus is asynchronous: the console outputs the 16-bit address (64 KiB address space) on one set of pins and lowers the nRD pin; some time later, the 8-bit read data from the ROM stabilizes on a separate set of pins.

For the GBA, Nintendo extended the bus data width to 16-bit and the address space to 25-bit (32 MiB). However, they kept roughly the same set of pins, accomplishing this by multiplexing the 24 data/address pins: the console outputs the address (in increments of the data word size of 16-bits, for a 24-bit physical address), then lowers the nCS signal to “latch” the address in the cartridge. Then, each time the console pulses the nRD pin, the cartridge increments its latched address and outputs the next data over the same pins. This allows for a continuous read of sequential data without having to send a new address for each access. The GBA also allows games to configure cartridge access timings to support different ROM chips.
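
The cartridge side of this scheme can be sketched as a latch plus a counter. On the falling edge of nCS, the cartridge captures the halfword address. On each nRD pulse, it returns the halfword at that address and then advances the counter. The model below is behavioral only: it has no timing or wait states, the names are hypothetical, and the 24-bit wraparound and the value returned past the end of the ROM image are simplifying assumptions:

```rust
/// Behavioral model of a GBA ROM cartridge's address latch and counter.
pub struct GbaCartRom {
    rom: Vec<u16>, // ROM contents as 16-bit halfwords
    latched: u32,  // current halfword address inside the cartridge
}

impl GbaCartRom {
    pub fn new(rom: Vec<u16>) -> Self {
        Self { rom, latched: 0 }
    }

    /// nCS falling edge: capture the 24-bit halfword address that the console
    /// is driving on AD0-AD15 (low bits) and A16-A23 (high bits).
    pub fn latch_address(&mut self, ad: u16, a_hi: u8) {
        self.latched = ((a_hi as u32) << 16) | ad as u32;
    }

    /// One nRD pulse: return the halfword at the latched address, then
    /// advance the internal counter for the next sequential read.
    pub fn read(&mut self) -> u16 {
        // 0xFFFF is a placeholder for reads past the end of the ROM image.
        let value = self.rom.get(self.latched as usize).copied().unwrap_or(0xFFFF);
        self.latched = (self.latched + 1) & 0xFF_FFFF;
        value
    }
}

/// Split a console-side ROM byte address (e.g. 0x0800_0000 + offset) into the
/// values driven on AD0-AD15 and A16-A23 during the address phase.
pub fn split_address(byte_addr: u32) -> (u16, u8) {
    let halfword = (byte_addr & 0x01FF_FFFF) >> 1; // 32 MiB window, 16-bit units
    (halfword as u16, (halfword >> 16) as u8)
}
```

A burst therefore costs one address phase followed by any number of nRD pulses, which is why sequential accesses are so much cheaper than non-sequential ones.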

Cartridge timing

I had to do a lot of my own research here. Software emulators don’t need to care about the precise timing of the cartridge bus, so there wasn’t much documentation. To figure out the exact cycle-accurate timing, I used a Saleae logic analyzer and connected it to the cartridge bus. I wrote a test program for the GBA to do different types of accesses (reads, writes, sequential, non-sequential, DMA) with different timing configurations.

Cartridge bus analysis setup

Cartridge bus analysis setup

Portion of a trace

Portion of a trace

After coming up with numerous scenarios (especially around the interaction between DMA and the CPU, and starting and stopping burst accesses), I came up with a consistent model for how cartridge accesses worked. I created some timing diagrams to help:

Timing diagram of a non-sequential access followed by a sequential access

Timing diagram of a non-sequential access followed by a sequential access

Finally, I started implementing the cartridge controller state machine based on my observations, paired with an emulated cartridge implementation. With the emulated cartridge, I was able to properly run real games in the simulator.

Running it on the FPGA

I quickly implemented physical cartridge support, to be able to finally run it on the actual FPGA. I connected the signals, built a new bitstream, and… it didn’t work at all. The Game Boy Advance boot screen ran, but it didn’t get any further than that. I implemented the emulated cartridge on the FPGA (reading ROM files from the SD card), and it worked! Which was great, but physical cartridges still didn’t.

I used the logic analyzer to compare how my emulator interacted with the cartridge against how an actual GBA did, and found numerous issues.

One of the first things I noticed was short glitches on the nCS line. I knew these had to be glitches (rather than incorrect logic), because they were 8 nanoseconds long, much shorter than the ~59.6ns clock period. Since the cartridge latches the address on a falling edge of nCS, glitches cause it to latch an address when it shouldn’t, screwing up reads.

Glitches on the cartridge bus

Glitches on the cartridge bus

Here, I learned an important lesson in digital design: output signals should come directly from flip-flops, with no logic in between.

After each flip-flop outputs a new value (on the rising edge of the clock), the signals propagate through the chip. As they propagate, taking different paths of different lengths throughout the chip, the output from each lookup table (LUT) is unstable. These values only stabilize near the end of the clock cycle (assuming the design met timing closure), and then each flip-flop stores the stable value at the next rising edge. If you output a signal from logic, this instability is visible from outside of the chip, manifesting as glitches in the output signal. If you instead output the signal from a flip-flop, it’ll change only on each clock edge, remaining stable in the middle.
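
A toy timing model shows the effect. Suppose two inputs change at the same clock edge but arrive over paths with different, made-up delays. Their AND is 0 both before and after the edge, yet it is briefly 1 while the inputs are out of step. That pulse is exactly what an external device sees on a pin driven from logic. A flip-flop, by contrast, only captures the settled value at the next edge. The delays and names below are hypothetical:

```rust
/// Clock period used by this toy model, roughly the GBA's ~59.6 ns.
const PERIOD_NS: u32 = 60;

/// Level of a signal that switches from `before` to `after` at `delay_ns`
/// after the clock edge at t = 0.
fn delayed(before: bool, after: bool, delay_ns: u32, t_ns: u32) -> bool {
    if t_ns < delay_ns { before } else { after }
}

/// Output driven straight from logic: `a` falls over a slow path while `b`
/// rises over a fast one, so `a && b` is briefly high in between.
pub fn combinational_out(t_ns: u32) -> bool {
    let a = delayed(true, false, 6, t_ns); // slow path: a falls at 6 ns
    let b = delayed(false, true, 2, t_ns); // fast path: b rises at 2 ns
    a && b
}

/// Value a flip-flop captures at the next rising edge: the settled logic
/// output just before that edge. Its output only changes at edges, so the
/// mid-cycle pulse never reaches the pin.
pub fn captured_at_next_edge() -> bool {
    combinational_out(PERIOD_NS - 1)
}

/// Nanoseconds within one clock period during which `f` is high.
pub fn high_time_ns(f: fn(u32) -> bool) -> u32 {
    (0..PERIOD_NS).filter(|&t| f(t)).count() as u32
}
```

Here `high_time_ns(combinational_out)` is 4: a 4 ns glitch on an output that should never go high. Meanwhile `captured_at_next_edge()` is `false`, so a registered version of the same signal stays low for the whole cycle.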

And of course, I had written the cartridge controller without thinking about this, and all of the output signals were generated from logic. I rewrote the controller to output everything from flip-flops, which caused a series of cascading changes, since all of the signals now had to be computed one clock cycle earlier than I had expected.

There were other issues too – part of the problem was that my emulated cartridge model was too permissive, and didn’t catch some fairly obvious incorrect behavior. After a few days of intensive debugging with the logic analyzer, I got to the point where I could play games from physical cartridges.

Metroid: Zero Mission running from the cartridge

Cartridge prefetch buffer

The ARM7TDMI has a single shared instruction and data memory bus. As a result, a long series of sequential memory accesses is rare. Even a linear piece of code without branches that includes “load” or “store” instructions would produce a series of non-sequential memory accesses, as the CPU fetches an instruction from one location, loads a register from a different location, and then goes back to fetching the next instruction.

This poses a real performance issue on the GBA, because every non-sequential access from the cartridge incurs a multi-cycle penalty. Nintendo attempted to mitigate this somewhat with the “prefetch buffer” (read this post by endrift, the author of mGBA, for more details) which attempts to keep a cartridge read burst active between CPU accesses. Without emulating the prefetch buffer, some games lag (I noticed this the most in Mario Kart Super Circuit, and some rooms of Metroid: Zero Mission).

The prefetch buffer, while simple in theory, is not well documented and has a lot of corner cases and weird interactions. Emulator developers often start by taking a shortcut: making all cartridge accesses take a single cycle when the prefetch buffer is enabled. This wouldn’t work for me, since I actually had to interface with the physical cartridge.

So, I set out to do some more research to figure out exactly how the prefetch buffer worked. After making some educated guesses and tests, I came up with a reasonable model of how it might work.

Notes about the prefetch state machine

Notes about the prefetch state machine

Actually implementing it took a lot of work, and I kept stumbling upon more and more corner cases. Eventually I got to the point where all games appeared to run at full speed, and most importantly, didn’t randomly crash. My implementation isn’t perfect: there are still a few mGBA test suite timing tests I don’t pass, but it’s certainly sufficient to play games.

I also added support for the GBA link cable, for multiplayer games. The GBA supports a few different physical protocols with the link cable:

  • Normal: standard duplex SPI, used for communicating with accessories
  • Multiplayer: custom multi-drop UART-like protocol, used to link up to four GBAs together for multiplayer games
  • Joybus: the Nintendo N64 and GameCube controller protocol, used to connect to a GameCube
  • UART: duplex UART with flow control, not used by games
  • General Purpose: controlling the four pins individually as GPIO, not used by games

The timing of these isn’t well documented, so I did my own research.

A multiplayer mode transfer with no attached consoles

A multiplayer mode transfer with no attached consoles

I did a lot of testing with examples from the gba-link-connection library, intended for homebrew GBA games, but helpful for testing the different transfer modes in a controlled environment.

Multiplayer Mario Kart with Game Bub and a GBA

Then I got Joybus mode working, so I was able to link with GameCube games that could connect to the GBA. The adapter didn’t quite fit against the Game Bub enclosure, so I had to take it apart for testing.

Game Bub linked to a GameCube playing Animal Crossing

Game Bub linked to a GameCube playing Animal Crossing

Test ROMs and accuracy

During the emulator development, I had used various test ROMs (mentioned before) to test basic functionality in isolation. As my emulator became mature enough to run commercial games, however, I started to shift some of my focus to accuracy-focused test ROMs.

These test ROMs (such as the mGBA test suite) generally test really specific hardware quirks and timing. For example, they might test what happens when you run an instruction that ARM calls “unpredictable”, or the exact number of cycles it takes to service an interrupt in specific scenarios, or the value of the “carry” flag after performing a multiplication. These are the kinds of things that don’t actually matter for playing games, but present a fun challenge and a way to “score” your emulator against others. This also highlights the collaborative nature of the emulation development community: people sharing their research and helping each other out.

I won’t talk about all of the tests here (for my emulator’s test results, see this page). But I do want to mention the AGB Aging Cartridge. This is an official test cartridge from Nintendo, likely used as part of a factory test or RMA procedure. Apparently, Nintendo has also used it to test their emulators (e.g. their GBA emulator on the Nintendo Switch). This test has generally been considered to be difficult to pass (it tests some specific hardware quirks), but it’s easier now that the tests have been thoroughly reverse engineered and documented. Still, passing it is a nice milestone:

Passing the AGB Aging Cartridge

Passing the AGB Aging Cartridge

Second hardware revision

Towards the end of 2024, approximately one year after I originally designed Game Bub, I decided to make a second hardware revision. Over the past year, I had been keeping track of all of the things I would want to change in a future revision. Since the first version of Game Bub miraculously worked without any major issues, this list was primarily minor issues and ergonomics changes.

PCB

I fixed the minor I2C power issues, removed the reference designators from the PCB silkscreen (they looked messy with the dense board, and I didn’t use them for anything anyway), and changed around some test points. I improved the rumble circuit to be more responsive, and switched to a PCB-mounted vibration motor.

The first version of Game Bub was fairly thick, measuring 12.9mm at the top and 21.9mm on the bottom. The thickness of the rear enclosure was dictated by the thickness of Game Boy cartridges, but I made several changes to the front. I moved the incredibly tall (8.5mm!) link port to the back, and removed the HDMI port (more on that later). I changed the headphone jack (5.0mm tall – no wonder they started getting removed from phones) to a mid-mount one that sunk into the PCB and reduced the overall height.

I also switched from an ESP32-S3-WROOM-1 module (3.1mm depth) to an ESP32-S3-MINI-1 (2.4mm depth). I should have done this from the beginning; I just didn’t even know the ESP32-S3-MINI existed. This had the side effect of giving me 3 more GPIOs, which allowed me to put the FPGA and LCD on separate SPI buses, avoiding the minor issue of an unpowered FPGA interfering with LCD communication, and allowed for faster boot because the LCD could be configured at the same time as the FPGA.

I switched the speakers from the fully-enclosed CES-20134-088PMB to the CMS-160903-18S-X8. I made this change primarily for ease of assembly. The first speaker had a wire connector that plugged into the board, and I found it difficult to connect during assembly without having the wire interfere with buttons. The new speaker is smaller and has a spring contact connector, so it just presses against the PCB as the device is assembled. This required some speaker enclosure design, because an unenclosed speaker in free air sounds quiet and tinny.

I reworked the layout of the face buttons and D-pad to match the spacing of the Nintendo DSi. This allowed me to use the silicone membranes from the DSi for an improved button feel and reduced rattling. I was also hoping to use the plastic buttons from the DSi (which were higher quality compared to my 3D printed buttons), but even with the new thinner design, the buttons weren’t quite tall enough to be easily pressed.

I created another timelapse of my modifications to produce the second version of the PCB:

Revision 2 board layout timelapse

Enclosure

For the second revision of the enclosure, I switched to Fusion 360 for the CAD work. While I would have preferred to keep using FreeCAD, I found that it was hurting my productivity. Fusion 360 has a free version for hobbyists (with some limitations that have gradually increased over time), and overall I’ve found it very pleasant to use.

Fusion 360 view of the second enclosure, fully assembled

Unlike with the first revision, I waited until I had a final design for both the enclosure and the PCB before getting anything manufactured. This let me go back and forth, making small modifications to each of them as needed.

I wanted to make the end result look more polished and professional, so I contracted a factory to produce custom LCD cover glass, made of 0.7mm thick tempered glass with a black silkscreen. It was relatively expensive for a low-quantity order, but I’m really happy with how it turned out.

Custom LCD cover glass with adhesive backing

Manufacturing and assembly

I got the PCBs manufactured and assembled, this time with black solder mask to look cool.

Assembled PCB, revision 2

I had two enclosures made. The first was black PA-12 nylon, printed with MJF (Multi Jet Fusion). Nylon is strong and durable, and the MJF 3D printing process produces a slightly grainy surface that’s really pleasant to hold in your hand.

Closeup of the nylon grainy texture

The second one was made of transparent resin (SLA, like before). This lets me show off the PCB that I worked so hard on, and evokes the transparent electronics trend from the 90s.

Transparent Game Bub

Assembly was a lot easier this time around: the silicone membranes held the face buttons in place, the speakers had a spring contact instead of wires, and the shoulder button assembly was improved. In the first revision, I had used excessively large tolerances because I wasn’t sure how precise the 3D printing would be; in the second revision, I was able to tighten them.

The final product looked and felt a lot better, too. The edges were more rounded, and the device was thinner and easier to hold. The buttons felt much better to press and didn’t rattle around, and the cover glass over the LCD added polish.

First revision (left), second revision (center and right)

Dock

I previously mentioned that I removed the full-size HDMI port from the first revision. I had first planned to change it to a mini-HDMI or micro-HDMI port to reduce the size, but I was worried about durability.

What I really wanted to do was output video through the USB-C port, avoiding the need for any HDMI port at all. Unfortunately, I had already concluded earlier that I wouldn’t be able to output DisplayPort video signals from the FPGA, which meant that I couldn’t use the standard USB-C DisplayPort alternate mode.

However, an idea struck me towards the end of 2024: I didn’t actually need to use the DisplayPort alt-mode. In addition to the USB 2.0 D+/D- pins, the USB-C connector has four differential pairs (for USB SuperSpeed). Conveniently, HDMI also uses four differential pairs (three TMDS data pairs plus a clock pair). The USB specification allows for vendor-specific alt-modes, so I could just implement my own, outputting the HDMI signal directly from the FPGA over the additional pins. Then I could build a custom dock that takes those pins and connects them to the data lines of an HDMI port.
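
To make the pair-for-pair idea concrete, here is a minimal Python sketch of one possible assignment. The four SuperSpeed pair locations come from the standard USB-C receptacle pinout; which TMDS pair goes on which USB pair (`HDMI_TO_USB_PAIR`) is a hypothetical choice for illustration, not necessarily the routing Game Bub or its dock actually uses.

```python
# SuperSpeed pairs on a standard USB-C receptacle: name -> (positive pin, negative pin).
USB_C_SUPERSPEED_PAIRS = {
    "TX1": ("A2", "A3"),
    "TX2": ("B2", "B3"),
    "RX1": ("B11", "B10"),
    "RX2": ("A11", "A10"),
}

# Hypothetical assignment of the four HDMI TMDS pairs to the four SuperSpeed pairs.
HDMI_TO_USB_PAIR = {
    "TMDS_D0": "TX1",
    "TMDS_D1": "TX2",
    "TMDS_D2": "RX1",
    "TMDS_CLK": "RX2",
}


def hdmi_pin_map():
    """Return {'<tmds signal>+/-': '<usb-c pin>'} for the assignment above."""
    pins = {}
    for hdmi_pair, usb_pair in HDMI_TO_USB_PAIR.items():
        pos, neg = USB_C_SUPERSPEED_PAIRS[usb_pair]
        # Keep polarity aligned: TMDS+ on the pair's positive pin, TMDS- on its negative pin.
        pins[hdmi_pair + "+"] = pos
        pins[hdmi_pair + "-"] = neg
    return pins


if __name__ == "__main__":
    for signal, pin in hdmi_pin_map().items():
        print(f"{signal:10s} -> {pin}")
```

Because the alt-mode is vendor-defined, the "TX" and "RX" labels are only names here: for HDMI output, all four pairs are driven by the source. A real design would also need to account for plug orientation, which this sketch ignores.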

USB-C receptacle pinout with super-speed pairs highlighted (source: Chindi.ap on Wikipedia)

According to the USB specification, alternate modes must first be negotiated by both sides using the USB-C Power Delivery (USB-PD) protocol, so that they don’t interfere with devices that aren’t expecting them. I don’t actually have a USB-PD controller in Game Bub (too much added complexity), so I took a shortcut: a microcontroller in the dock communicates with Game Bub over regular USB and performs a handshake before HDMI output from the FPGA is enabled. Once Game Bub detects that it has been disconnected from the dock, it simply switches back to the internal display.
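
The handshake can be modeled as a small state machine on the Game Bub side. The following is a hypothetical Python model, not the actual firmware: the state names, the `DockLink` class, and the `DOCK_HELLO` token are invented for illustration. It captures the two rules described above: HDMI output is enabled only after a successful handshake over ordinary USB, and any disconnect falls back to the internal display.

```python
from enum import Enum, auto


class VideoOutput(Enum):
    INTERNAL_LCD = auto()
    AWAITING_HANDSHAKE = auto()
    HDMI = auto()


# Hypothetical handshake token; the real message format is not described here.
DOCK_HELLO = b"DOCK-HELLO-v1"


class DockLink:
    """Sketch of Game Bub's view of the dock (the dock is the USB host)."""

    def __init__(self):
        self.state = VideoOutput.INTERNAL_LCD

    def on_usb_configured(self):
        # Plain USB comes up first; HDMI stays off until the dock identifies itself.
        if self.state is VideoOutput.INTERNAL_LCD:
            self.state = VideoOutput.AWAITING_HANDSHAKE

    def on_usb_message(self, payload: bytes):
        # Only the expected hello, received while waiting, enables HDMI output.
        if self.state is VideoOutput.AWAITING_HANDSHAKE and payload == DOCK_HELLO:
            self.state = VideoOutput.HDMI

    def on_usb_disconnected(self):
        # Whatever state we were in, fall back to the built-in LCD.
        self.state = VideoOutput.INTERNAL_LCD

    @property
    def hdmi_enabled(self) -> bool:
        return self.state is VideoOutput.HDMI
```

Since the handshake happens before the FPGA drives anything onto the SuperSpeed pins, an ordinary USB host or charger that never sends the hello never sees HDMI signals.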

I realized that the dock also presents another opportunity for controller support. I originally wanted to build wireless controller support into the handheld, but the ESP32-S3 only supports Bluetooth Low Energy, and the majority of controllers use Bluetooth Classic. Fortunately, the Raspberry Pi Pico W (with an RP2040 MCU) supports both types of Bluetooth, so I decided to use it as the dock’s microcontroller. Game controllers connect to the dock over Bluetooth, and the Pico sends the controller inputs to Game Bub. I wired up the SBU1 and SBU2 USB-C pins as a direct connection between the FPGA and the dock for low-latency input.
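
Here is a hedged sketch of what a compact input report over a direct link like SBU1/SBU2 could look like. The bit order follows the GBA’s KEYINPUT register (A, B, Select, Start, Right, Left, Up, Down, R, L in bits 0 through 9, active-low), a natural target for an FPGA core. The 4-byte frame with a sync byte and XOR checksum, and the helper names, are hypothetical and not Game Bub’s actual wire protocol.

```python
# GBA KEYINPUT bit order: bit 0 = A ... bit 9 = L. A cleared bit means "pressed".
GBA_KEYS = ("A", "B", "SELECT", "START", "RIGHT", "LEFT", "UP", "DOWN", "R", "L")
SYNC = 0xA5  # hypothetical start-of-frame marker


def pack_keyinput(pressed):
    """Pack a collection of pressed button names into a 10-bit active-low word."""
    pressed = set(pressed)
    word = 0x3FF  # all ten buttons released
    for bit, name in enumerate(GBA_KEYS):
        if name in pressed:
            word &= ~(1 << bit)
    return word


def encode_frame(word):
    """Hypothetical 4-byte frame: sync, low byte, high 2 bits, XOR checksum."""
    lo, hi = word & 0xFF, (word >> 8) & 0x03
    return bytes([SYNC, lo, hi, SYNC ^ lo ^ hi])


def decode_frame(frame):
    """Return the 10-bit word, or None if the frame is malformed or corrupted."""
    if len(frame) != 4 or frame[0] != SYNC:
        return None
    sync, lo, hi, check = frame
    if hi > 0x03 or (sync ^ lo ^ hi) != check:
        return None
    return lo | (hi << 8)
```

Keeping the report in KEYINPUT form means the FPGA could latch a decoded word straight into the emulated register with no further translation.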

The RP2040 acts as the USB host, and Game Bub only needs to be a device. I also added a USB hub chip and some additional USB ports on the back of the dock to allow for wired controller support too. Just like with wireless controllers, the dock handles the direct controller communication, and just passes inputs back to the main Game Bub unit.

Since the dock is comparatively simple, it only took about a day to design and lay out.

Assembled dock PCB

I had also hoped to use the dock to solve another problem with HDMI output: HDMI sinks (monitors, TVs) pull the HDMI data lines up to 3.3 volts, and can actually backfeed power to the HDMI source. For Game Bub, this meant that a powered-off unit would turn itself on when connected over HDMI. I used an HDMI buffer chip in the dock to try to alleviate this problem, but the chip I chose wasn’t actually suited to this use case and interfered with video output, so I had to carefully rework the board to bypass the chip. I’ll have to fix it in a later revision.

Bypassing the HDMI buffer chip

After the rework, HDMI output worked! The rest of the features are still a work in progress.

Game Bub PCB on the dock, connected to an external monitor

Conclusion

Congratulations on reading this far! This writeup ended up being incredibly long, even with a lot of details left out.

I’m proud of what I accomplished over the last year and a half: I met all of my goals to produce a polished handheld FPGA retrogaming device. I pushed my electrical engineering and product design skills to the limit, and learned a lot in the process. Professional product and hardware designers deserve so much respect.

What’s next?

I deliberately designed this project with lots of possible extension opportunities to keep me occupied for a long time. I worked hard to get to the point where I’m comfortable sharing Game Bub with the world, but I still have a long list of TODOs for the future.

In the near term, I’m going to work on finishing the dock and implementing wireless controller support (and maybe wired). I plan to use the Bluepad32 library to do so.

I also want to improve the accuracy of my Game Boy Advance emulator: my goal here is to someday pass the entire mGBA test suite. I hope that I can contribute back to the wonderful emudev community with my emulator, and I plan to write up some of my research on the GBA cartridge interface and link port.

I have a long list of mostly minor changes to make to the MCU firmware: improving UI render performance, bits of polish like low battery notifications, eliminating display glitching when reloading the FPGA, and that sort of thing. I also plan to add more utilities, like a cartridge dumper and save backup/restore feature.

Some day, I want to emulate the Game Boy Advance Wireless Adapter over Wi-Fi, e.g. with ESP-NOW. This won’t be compatible with the original wireless adapter, unfortunately, since that uses raw 2.4 GHz modulation rather than Wi-Fi.

I added a Pmod-compatible header at the top of the device for future hardware expansion. I might make a few add-on boards to support Game Boy IR communication, a solar sensor for Boktai, or maybe even a basic webcam for Game Boy Camera emulation.

Wishlist

I designed Game Bub with extremely low production volumes in mind, using off-the-shelf commodity parts to keep the overall cost down. However, there are a few things I would have liked to do that are only possible at much higher volumes:

  • A better LCD module (likely custom): native landscape mode to avoid the need for triple-buffering. Ideally a 720x480 resolution display, to allow for 3x GBA scaling and filter effects.
  • High-quality injection-molded case and buttons: 3D printing is great for low-volume production, but an injection-molded case would be better. It would be more precise (allowing for tighter tolerances), stronger, and allow for significantly more color options.
  • Custom battery pack: or at least customizing the length of the connector wire. The current solution is hacky and doesn’t make the best use of internal space, due to limited off-the-shelf battery options.
  • Smaller BGA parts for SRAM and SDRAM to free up board space (and move internal signals to 1.8 volts): this is actually something that would be possible in smaller volumes too, if I were willing to send parts from Mouser or DigiKey to JLCPCB for assembly.
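
The 720x480 figure falls out of simple integer-scaling arithmetic: the GBA’s 240x160 screen scales by exactly 3 in both directions, and the Game Boy’s 160x144 screen also fits at 3x (480x432) with some border left over. A short Python helper makes the calculation explicit; `integer_scale` and `scaled_size` are illustrative names, not part of Game Bub’s code.

```python
GB_SCREEN = (160, 144)   # Game Boy / Game Boy Color, pixels (width, height)
GBA_SCREEN = (240, 160)  # Game Boy Advance


def integer_scale(panel, source):
    """Largest whole-number scale factor at which `source` fits inside `panel`."""
    return min(panel[0] // source[0], panel[1] // source[1])


def scaled_size(panel, source):
    """Return (scaled image size, leftover border) for the best integer scale."""
    s = integer_scale(panel, source)
    w, h = source[0] * s, source[1] * s
    return (w, h), (panel[0] - w, panel[1] - h)


if __name__ == "__main__":
    panel = (720, 480)
    for name, screen in (("GB", GB_SCREEN), ("GBA", GBA_SCREEN)):
        size, border = scaled_size(panel, screen)
        print(f"{name}: {integer_scale(panel, screen)}x -> {size}, border {border}")
```

Integer scaling keeps every emulated pixel the same size on screen, and a 3x3 block per source pixel leaves room for filter effects such as LCD grid lines.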

Acknowledgements