GameCube BBA FPGA Replacement — Design Document
Target hardware: iCEbreaker (Lattice iCE40UP5K)
Target language: Amaranth HDL (Python)
Toolchain: Yosys + nextpnr-ice40 + IceStorm
Purpose: Replace the Nintendo GameCube Broadband Adapter (DOL-015) with an
FPGA-based implementation, exposing a W5500 100BASE-TX ethernet chip to the GC
over the EXI (Expansion Interface) serial bus, enabling game ISO streaming via
Swiss homebrew.
Table of Contents
- System Overview
- Protocol References
- Physical Interface — SP1 Edge Connector
- Clock Domains
- Clock Domain Crossing Strategy
- Module Hierarchy
- Module Specifications
- 7.1 SPIMode3Slave
- 7.2 BBARegisterFile
- 7.3 SPRAMArbiter
- 7.4 RXFrameAssembler
- 7.5 TXFrameDrain
- 7.6 W5500SPIMaster
- 7.7 EEPROMModel
- 7.8 BBATop
- Memory Map
- EXI Transaction Protocol
- BBA Register Reference
- Initialisation Sequence
- RX Data Path — Detailed Flow
- TX Data Path — Detailed Flow
- SPRAM Layout
- Critical Timing Constraints
- SPRAM Read Prefetch Pipeline
- Interrupt Handling
- EEPROM / MAC Address
- iCE40UP5K Resource Budget
- PCB / Connector Notes
- Known Hardware Quirks
- File Structure
- Simulation Strategy
- Open Issues and Extension Points
1. System Overview
The GameCube Broadband Adapter (BBA) is a hardware peripheral that plugs into Serial Port 1 (SP1) on the underside of the GameCube. It presents a network interface to the GC CPU using a Macronix MX98730EC custom IC. GC software (primarily Swiss homebrew) communicates with the BBA through a memory-mapped register interface accessed over the EXI serial bus.
This project replaces the MX98730EC with an iCEbreaker FPGA that emulates the register interface, and connects to a W5500 ethernet chip (on a Pmod-compatible module) for actual network communication.
High-level data flow
GameCube CPU
│ EXI (SPI Mode 3, 32 MHz, Serial Port 1)
▼
iCEbreaker FPGA
├── exi domain (64 MHz): SPI slave, register file, prefetch pipeline
└── sync domain (48 MHz): SPRAM arbiter, RX assembler, TX drain, W5500 driver
│ SPI (Mode 0, driven at 24 MHz)
▼
W5500 Pmod module (100BASE-TX ethernet)
│ RJ-45
▼
Network
What this design does NOT implement
- A network stack. The GC CPU runs TCP/IP. The FPGA is a dumb MAC bridge.
- IP address awareness. The FPGA never parses ethernet frame payloads.
- The GC's DMA engine quirk (only relevant to GC-side software).
- Video/audio streaming logic (handled by Swiss on the GC CPU side).
2. Protocol References
| Source | Content |
|---|---|
| YAGCD §2.4.1.4 | SP1 (P6) connector pinout |
| YAGCD §5.9 | EXI bus register descriptions |
| YAGCD §10.8 | MX98730EC (BBA chip) register map |
| Dolphin source EXI_DeviceEthernet.h | Register offsets, init sequence, RX/TX flow |
| Dolphin source EXI_DeviceEthernet.cpp | Transaction encoding, interrupt logic |
| Swiss source bba.c | GC-side driver, exact register access patterns |
| MX98730EC datasheet | Unavailable publicly; YAGCD is the primary reference |
| W5500 datasheet | SPI interface, register map, socket model |
| iCE40UP5K datasheet | SPRAM timing, PLL parameters, I/O standards |
Critical implementation note: The MX98730EC uses SPI Mode 3 (CPOL=1, CPHA=1). CLK idles HIGH. In standard Mode 3 timing, data is shifted out on the FALLING edge of CLK and sampled on the RISING edge. Memory cards and the RTC chip use SPI Mode 0, which samples on the same edge but idles CLK LOW. Getting the mode wrong means the GC will never enumerate the device.
3. Physical Interface — SP1 Edge Connector
Slot characteristics
- Dual-sided PCB edge connector
- Contacts on both top and bottom faces of the PCB edge
- Top and bottom contact rows are staggered (offset by half a pitch), not mirrored — similar to ISA/PCI card edge geometry
- PCB must be ordered at 1.2 mm thickness with ENIG (gold) finish
- Keying notch at top-right corner of housing (when looking into console socket with front of console facing right)
Connector footprint
Exact pad positions and pitch must be taken from the SP1ETH KiCad project (github.com/silverstee1/SP1ETH). Do not attempt to derive dimensions from YAGCD alone — the document lists signals but not physical geometry. Cross-reference against the ETH2SP1 (LaserBear) open model files as a second source.
Key parameters to verify from those files before PCB layout:
- Contact pitch (expected: 2.0 mm or 2.54 mm — measure from KiCad file)
- Stagger offset between top and bottom rows
- Total contact count per side (expected: 6 per side = 12 total, or 12 per side = 24 total with duplicated power/ground)
- Insertion depth from board edge to first contact
- Board width at connector edge
Signal pinout (YAGCD §2.4.1.4)
Pin numbering: looking into the console socket, front of console to the right, pin 1 is on the left. On the adapter PCB (component side up, inserting down), pin 1 is also on the left — numbering does not mirror.
| Pin | Signal | Direction | Notes |
|---|---|---|---|
| 1 | EXTIN | Adapter → GC | Device detect/sense. Tie to 3.3V via 10 kΩ resistor. Without this the GC does not enumerate the device. |
| 2 | GND | — | Shield ground |
| 3 | INT | Adapter → GC | Active-low interrupt to GC CPU. Assert when IR & IMR != 0. |
| 4 | CLK | GC → Adapter | SPI clock, up to 32 MHz, idles HIGH (Mode 3) |
| 5 | 12V | — | 12 V supply from GC. Do not connect to FPGA I/O. Leave unconnected or route to a test point only. |
| 6 | DO (MISO) | Adapter → GC | Serial data out: adapter drives, GC samples |
| 7 | 3.3V | — | 3.3 V supply (~200 mA available combined with pin 8) |
| 8 | 3.3V | — | 3.3 V supply (parallel with pin 7) |
| 9 | DI (MOSI) | GC → Adapter | Serial data in: GC drives, adapter samples |
| 10 | CS | GC → Adapter | Chip select, active low. Delineates each transaction. |
| 11 | GND | — | Signal ground |
| 12 | GND | — | Signal ground |
Power budget: Pins 7+8 together supply 3.3 V. The iCEbreaker draws ~80 mA active, the W5500 ~150 mA peak. Total ~230 mA. The GC's 3.3 V rail on SP1 is rated for the original BBA which also drew ~200 mA, so headroom is tight. Add a 100 µF bulk capacitor on the interposer PCB close to the FPGA power pins.
Voltage levels: All EXI signals are 3.3 V logic. The iCEbreaker I/O is 3.3 V. The W5500 is 3.3 V. No level shifting required anywhere in this design.
4. Clock Domains
The design uses two clock domains. The iCE40UP5K has one PLL and one internal 48 MHz oscillator (SB_HFOSC).
Domain table
| Domain | Frequency | Source | Purpose |
|---|---|---|---|
| exi | 64 MHz target (96 MHz recommended, see PLL configuration) | PLL from 12 MHz crystal | SPI Mode 3 slave, BBA register file, prefetch pipeline |
| sync | 48 MHz | SB_HFOSC internal oscillator | SPRAM arbiter, RX/TX ethernet engines, W5500 SPI master |
Rationale
Why 64 MHz for exi?
The EXI bus runs at 32 MHz. The SPI Mode 3 slave needs to detect CLK edges and
respond on the correct edge. Running the exi domain at 2× the bus rate (64 MHz)
gives two FPGA ticks per EXI CLK period, i.e. one tick per half-period: one
tick for the setup phase (MOSI→shift register, prepare MISO) and one tick for
the sample/drive phase. This is the minimum oversampling ratio that can
implement Mode 3 without combinatorial timing risk on the MISO output path.
Because the two clocks are asynchronous, 2× leaves no margin for missed edges,
which is one reason the PLL section recommends 96 MHz.
Why 48 MHz for sync?
The iCE40UP5K's internal 48 MHz oscillator (SB_HFOSC) is available without
consuming the PLL. This leaves the one PLL free for the 64 MHz exi domain. The
W5500 SPI can run up to 80 MHz but we drive it at 24 MHz (48 MHz ÷ 2 via clock
enable), which is well within spec and requires no additional PLL output.
PLL configuration (iCE40UP5K)
Input: 12 MHz crystal (iCEbreaker on-board)
F_pfd = F_in / (DIVR+1)
F_vco = F_pfd × (DIVF+1) (must be 533–1066 MHz on the UP5K)
F_out = F_vco / 2^DIVQ
Candidates considered for an exact 64 MHz output, all rejected:
DIVR=0, DIVF=15: F_vco = 12 × 16 = 192 MHz (below VCO range, invalid)
DIVR=0, DIVF=42, DIVQ=3: F_vco = 12 × 43 = 516 MHz (just below VCO range, invalid)
DIVR=0, DIVF=53, DIVQ=3: F_vco = 12 × 54 = 648 MHz, F_out = 648 / 8 = 81 MHz (valid, but not 64 MHz)
DIVR=2, DIVF=63, DIVQ=2: F_pfd = 4 MHz, F_vco = 4 × 64 = 256 MHz (below VCO range, invalid)
Recommended: use 96 MHz (DIVR=0, DIVF=63, DIVQ=3) for the exi domain.
F_vco = 12 × 64 = 768 MHz (within range), F_out = 768 / 8 = 96 MHz.
At 96 MHz there are 3 ticks per 32 MHz EXI clock period (1.5 per half-period),
i.e. 3× the bus rate instead of 2×: more margin, same logic.
Adjust SPIMode3Slave edge detection accordingly (3 ticks per bit instead of 2).
Alternative considered: generate 96 MHz and divide by 1.5 with a clock enable.
This gives unevenly spaced ticks and is not recommended.
Implementation note: Verify exact PLL parameters with the icepll tool:
icepll -i 12 -o 64 # finds closest achievable output
icepll -i 12 -o 96 # alternative
The agent implementing this should run icepll and use whatever parameters it
reports, then adjust the SPIMode3Slave tick counts accordingly.
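The divider arithmetic above can be checked with a small brute-force search. This is a minimal sketch in plain Python, not a replacement for icepll: the function names are hypothetical, the VCO limits are the ones quoted in this document, and the 10–133 MHz phase-detector limits are an assumption that should be confirmed against the iCE40 PLL documentation.

```python
VCO_MIN_MHZ, VCO_MAX_MHZ = 533.0, 1066.0   # VCO range quoted in this document
PFD_MIN_MHZ, PFD_MAX_MHZ = 10.0, 133.0     # assumption: phase-detector limits


def pll_freqs(f_in, divr, divf, divq):
    """Return (f_pfd, f_vco, f_out) in MHz for one divider setting."""
    f_pfd = f_in / (divr + 1)
    f_vco = f_pfd * (divf + 1)
    f_out = f_vco / (2 ** divq)
    return f_pfd, f_vco, f_out


def best_pll(f_in, f_target):
    """Return (error, f_out, (divr, divf, divq)) for the closest valid setting."""
    best = None
    for divr in range(16):            # DIVR is a 4-bit field
        for divf in range(128):       # DIVF is a 7-bit field
            for divq in range(1, 7):  # DIVQ values 1..6
                f_pfd, f_vco, f_out = pll_freqs(f_in, divr, divf, divq)
                if not PFD_MIN_MHZ <= f_pfd <= PFD_MAX_MHZ:
                    continue
                if not VCO_MIN_MHZ <= f_vco <= VCO_MAX_MHZ:
                    continue
                err = abs(f_out - f_target)
                if best is None or err < best[0]:
                    best = (err, f_out, (divr, divf, divq))
    return best
```

Calling `best_pll(12, 96)` and `best_pll(12, 64)` shows whether each target is reachable exactly under these constraints; the result should still be cross-checked with icepll before it is committed to the design.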
Reset strategy
Each domain has its own reset, deasserted synchronously using
ResetSynchronizer from amaranth.lib.cdc:
from amaranth import ResetSignal
from amaranth.lib.cdc import ResetSynchronizer

# In platform create_missing_domain("exi"):
m.submodules.exi_rst = ResetSynchronizer(
    arst=ResetSignal("sync"),
    domain="exi",
)
The sync domain reset comes from the power-on reset that the Amaranth iCE40
platform generates for its default clock domain (a startup delay counter;
SB_HFOSC itself has no built-in POR).
5. Clock Domain Crossing Strategy
All signals crossing between exi and sync domains must use one of the
following CDC primitives from amaranth.lib.cdc. Never pass a raw multi-bit
signal directly between domains — only one bit may change per clock crossing.
CDC primitive selection guide
| Signal type | Primitive | Latency |
|---|---|---|
| Single bit, slow-changing (flags, status) | FFSynchronizer | 2 dest clocks |
| Single-cycle pulse / event | PulseSynchronizer | ~3–4 dest clocks |
| Multi-bit data stream (packet bytes) | AsyncFIFO | ~3–4 dest clocks |
| Reset deassertion | ResetSynchronizer | 2 dest clocks |
| Async external pin (CLK, MOSI, CS) | FFSynchronizer | 2 dest clocks |
CDC inventory for this design
| Signal | From | To | Primitive | Notes |
|---|---|---|---|---|
| EXI CLK pin | async | exi | FFSynchronizer | stages=2, reset=1 (CLK idles high) |
| EXI MOSI pin | async | exi | FFSynchronizer | stages=2 |
| EXI CS pin | async | exi | FFSynchronizer | stages=2, reset=1 (CS idles high) |
| SPRAM read request (addr) | exi | sync | AsyncFIFO 16-bit wide, depth=4 | Prefetch pipeline |
| SPRAM read result (data) | sync | exi | AsyncFIFO 8-bit wide, depth=4 | Prefetch pipeline |
| TX packet bytes | exi | sync | AsyncFIFO 8-bit wide, depth=64 | GC→ethernet |
| TX packet start/len | exi | sync | AsyncFIFO 16-bit wide, depth=4 | Frame delimiter |
| RX packet bytes | sync | exi | AsyncFIFO 8-bit wide, depth=64 | ethernet→GC |
| RWP update (new value) | sync | exi | AsyncFIFO 8-bit wide, depth=4 | After frame committed |
| RRP update (new value) | exi | sync | AsyncFIFO 8-bit wide, depth=4 | After GC advances pointer |
| IR[RI] set (RX ready) | sync | exi | PulseSynchronizer | Triggers RI interrupt |
| IR[TI] set (TX done) | sync | exi | PulseSynchronizer | Triggers TI interrupt |
| NCRA reset pulse | exi | sync | PulseSynchronizer | Resets ethernet engine |
| exi_int_n output | exi | physical pin | Direct (output register) | Active-low to GC |
Critical rule: The register file lives entirely in the exi domain. The
sync domain never directly reads or writes EXI registers. All interaction
between the two domains goes through the AsyncFIFOs and PulseSynchronizers
listed above. This ensures the GC's register reads always respond within the
exi domain without waiting on CDC latency.
6. Module Hierarchy
BBATop (top-level, sets up clock domains)
├── SPIMode3Slave (exi domain — bit engine)
├── BBARegisterFile (exi domain — register decode + response)
│ ├── [AsyncFIFO: spram_req] (exi→sync: read address requests)
│ ├── [AsyncFIFO: spram_rsp] (sync→exi: read data responses)
│ ├── [AsyncFIFO: tx_bytes] (exi→sync: TX packet data)
│ ├── [AsyncFIFO: tx_ctrl] (exi→sync: TX frame length)
│ ├── [AsyncFIFO: rx_wptr] (sync→exi: RWP updates)
│ ├── [AsyncFIFO: rx_rptr] (exi→sync: RRP updates from GC)
│ ├── [PulseSynchronizer: rx_irq] (sync→exi)
│ ├── [PulseSynchronizer: tx_irq] (sync→exi)
│ └── [PulseSynchronizer: ncra_rst] (exi→sync)
├── SPRAMArbiter (sync domain — owns all SPRAM)
├── RXFrameAssembler (sync domain — ethernet→SPRAM)
├── TXFrameDrain (sync domain — SPRAM→ethernet)
├── W5500SPIMaster (sync domain — SPI master to W5500)
└── EEPROMModel (exi domain — 93C46 bit-bang model)
7. Module Specifications
7.1 SPIMode3Slave
Domain: exi
File: exi_bba/spi_mode3_slave.py
Implements a byte-oriented SPI Mode 3 slave. Handles CLK/MOSI/MISO/CS at the
bit level and presents a clean byte interface to BBARegisterFile.
SPI Mode 3 timing recap:
- CLK idles HIGH
- Master changes MOSI on the FALLING edge so that it is stable before the next RISING edge
- Slave samples MOSI on the RISING edge of CLK
- Slave drives MISO on the FALLING edge of CLK (ready for the master to sample on the next rising edge)
Port list:
| Port | Width | Dir | Domain | Description |
|---|---|---|---|---|
| spi_clk | 1 | in | async→exi | Raw SPI clock from GC, synchronized internally |
| spi_mosi | 1 | in | async→exi | Raw MOSI from GC, synchronized internally |
| spi_miso | 1 | out | exi | MISO output to GC |
| spi_cs_n | 1 | in | async→exi | Raw CS from GC (active low), synchronized internally |
| rx_byte | 8 | out | exi | Last complete received byte |
| rx_valid | 1 | out | exi | Pulses 1 cycle when rx_byte contains a new byte |
| tx_byte | 8 | in | exi | Byte to transmit; sampled when tx_load pulses |
| tx_load | 1 | out | exi | Requests next TX byte from upstream |
Internal behaviour:
- Instantiate FFSynchronizer with stages=2 on each of spi_clk, spi_mosi, spi_cs_n. Reset values: spi_clk=1, spi_cs_n=1.
- Register the synchronized signals one further cycle to form edge detectors: rising_clk = clk_s & ~clk_prev, falling_clk = ~clk_s & clk_prev.
- On CS falling edge: load tx_byte into the internal shift register, pulse tx_load, reset bit_ctr to 0.
- On RISING CLK edge (sample): shift mosi_s into rx_shift MSB-first, increment bit_ctr. When the eighth bit has been shifted in: register rx_shift into rx_byte, pulse rx_valid, reset bit_ctr to 0, pulse tx_load to request the next byte.
- On FALLING CLK edge (drive): drive the MSB of tx_shift onto spi_miso, then shift tx_shift left by 1.
- On CS rising edge: drive spi_miso high (idle), reset state.
Note on tx_load timing: tx_load pulses at two points — CS assertion
(loads first byte before any bits are clocked) and after each complete received
byte (loads the next byte). The upstream (BBARegisterFile) must register the
next TX byte within one exi clock of tx_load pulsing.
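The edge-detector equations can be exercised in software before any HDL is written. The following is a plain-Python behavioural sketch, not Amaranth code: `detect_edges` and `oversample` are hypothetical helper names, and the input is an ideal 50% duty-cycle clock that starts at its idle level, with synchronizer latency left out.

```python
def detect_edges(samples, idle_level=1):
    """Yield (rising, falling) flags for each synchronized CLK sample.

    samples: iterable of 0/1 values of clk_s, one per exi-domain tick.
    idle_level: value held in clk_prev before the first sample (CLK idles high).
    """
    clk_prev = idle_level
    for clk_s in samples:
        rising = clk_s & (clk_prev ^ 1)    # rising_clk  = clk_s & ~clk_prev
        falling = (clk_s ^ 1) & clk_prev   # falling_clk = ~clk_s & clk_prev
        yield rising, falling
        clk_prev = clk_s


def oversample(bus_mhz, fabric_mhz, n_ticks, idle_level=1):
    """Sample an ideal bus clock (integer MHz) at the fabric clock rate.

    Integer arithmetic is used so that samples landing exactly on a clock
    transition are not disturbed by floating-point rounding.
    """
    out = []
    for n in range(n_ticks):
        half_periods_elapsed = (2 * n * bus_mhz) // fabric_mhz
        out.append(idle_level ^ (half_periods_elapsed & 1))
    return out
```

Feeding `oversample(32, 64, ...)` and `oversample(32, 96, ...)` through `detect_edges` shows how many fabric ticks separate consecutive edges at each candidate exi frequency, which is the number the shift and drive logic has to work with.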
7.2 BBARegisterFile
Domain: exi (with AsyncFIFO interfaces to sync)
File: exi_bba/bba_register_file.py
Decodes EXI transactions (2-byte header + N data bytes), reads/writes the BBA
register space, and manages all CDC crossings to the sync domain.
EXI transaction decoder FSM
States: HEADER0 → HEADER1 → DATA → (back to HEADER0)
Header format:
Byte 0: [7] = write flag (1 = write, 0 = read)
[6:0] = addr[12:6] (upper 7 bits of 13-bit address)
Byte 1: [7:2] = addr[5:0] (lower 6 bits of 13-bit address)
[1:0] = xfer_len (0=1 byte, 1=2 bytes, 2=3 bytes, 3=4 bytes)
Full address = { byte0[6:0], byte1[7:2] } = 13 bits → range 0x0000–0x1FFF.
HEADER0 state: Wait for rx_valid. Latch rx_byte as hdr0.
HEADER1 state: Wait for rx_valid. Decode address and flags. For read
transactions, immediately issue SPRAM prefetch request if address ≥ 0x100
(ring buffer region). Load tx_byte with the register value for addresses
< 0x100 (register file region). Transition to DATA.
DATA state (write path): For each rx_valid, write rx_byte to
regs[addr + byte_ctr] and handle side effects (see register side effects
table). Increment byte_ctr. When byte_ctr == xfer_len, go to HEADER0.
DATA state (read path): Drive tx_byte from prefetch result (addresses
≥ 0x100) or directly from regs[] (addresses < 0x100). On each tx_load,
advance the read pointer and issue next prefetch. When byte_ctr == xfer_len,
go to HEADER0.
CS deassertion abort: In any state, if cs_n rises, return to HEADER0.
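The header bit layout can be captured in a pair of helper functions, which is handy for building simulation stimulus. This is a minimal sketch in plain Python that follows the header format given above; the function names and the "regfile"/"spram" labels are illustrative assumptions, not part of the design.

```python
def encode_header(write, addr, nbytes):
    """Build the 2-byte EXI header for a 13-bit address and 1..4 data bytes."""
    if not 0 <= addr <= 0x1FFF:
        raise ValueError("address must fit in 13 bits")
    if not 1 <= nbytes <= 4:
        raise ValueError("transfer length must be 1..4 bytes")
    byte0 = (0x80 if write else 0x00) | ((addr >> 6) & 0x7F)  # flag + addr[12:6]
    byte1 = ((addr & 0x3F) << 2) | (nbytes - 1)               # addr[5:0] + length
    return bytes([byte0, byte1])


def decode_header(hdr0, hdr1):
    """Return (write, addr, nbytes, region) as the HEADER1 state would see them."""
    write = bool(hdr0 & 0x80)
    addr = ((hdr0 & 0x7F) << 6) | (hdr1 >> 2)
    nbytes = (hdr1 & 0x03) + 1
    # Addresses below 0x100 are served by the register file, the rest by SPRAM.
    region = "spram" if addr >= 0x100 else "regfile"
    return write, addr, nbytes, region
```

For example, a 4-byte write to TXDATA (0x48) encodes as `encode_header(True, 0x48, 4)`, and passing those two bytes to `decode_header` recovers the same fields.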
Register file storage
Registers 0x00–0xFF are implemented as an Array of 8-bit Signals (256
registers). In synthesis this maps to flip-flops and multiplexers, because the
iCE40 has no distributed LUT RAM. SPRAM is not used here; it is reserved for
the packet ring buffer.
The register file is entirely in the exi domain. No CDC is needed to read
or write registers 0x00–0xFF.
Register side effects
| Register | Write side effect |
|---|---|
| NCRA (0x00) | If bit 0 (RESET) written: pulse ncra_rst PulseSynchronizer to sync domain. Self-clear bit 0 on next cycle. Reset TX/RX pointers in register file. |
| IR (0x09) | Write-1-to-clear: IR <= IR & ~written_value |
| RRP (0x18–0x19) | After GC writes new RRP value, push value into rx_rptr AsyncFIFO (exi→sync) so RX engine knows GC has consumed those pages |
| TWD (0x34–0x37) | Bytes written here are the TX frame length field (2 bytes little-endian). Latch for TX engine. |
| TXDATA (0x48) | Each byte written goes into tx_bytes AsyncFIFO (exi→sync). The frame length is pushed into the tx_ctrl AsyncFIFO when the GC starts the transmit by setting NCRA ST1:ST0 (see 7.5 TXFrameDrain). |
Interrupt register update (from sync domain)
- rx_irq PulseSynchronizer arriving from sync: set IR[1] (RI bit)
- tx_irq PulseSynchronizer arriving from sync: set IR[2] (TI bit), clear NCRA[2:1] (ST1:ST0, the transmit start bits)
Interrupt output
exi_int_n <= ~|(IR & IMR) # active-low: assert when any unmasked bit set
Register this one flip-flop in the exi domain. The physical pin is a direct
output — no CDC needed because the GC only reads the interrupt state via polling
IR over EXI (which is already in the exi domain) or via the interrupt line
which the GC CPU samples asynchronously.
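The write-1-to-clear behaviour of IR and the interrupt output equation can be modelled in a few lines. This is a plain-Python sketch for use in test benches; the function names are hypothetical, and the bit positions are the RI and TI positions given in this document.

```python
IR_RI = 0x02  # IR[1]: receive interrupt
IR_TI = 0x04  # IR[2]: transmit interrupt


def ir_write_one_to_clear(ir, written):
    """New IR value after the GC writes `written` to IR (0x09)."""
    return ir & ~written & 0xFF


def exi_int_n(ir, imr):
    """Active-low interrupt pin: 0 when any unmasked IR bit is set, else 1."""
    return 0 if (ir & imr & 0xFF) else 1
```

With IMR = IR_RI, an arriving rx_irq pulse sets IR[1] and pulls `exi_int_n` to 0; the GC then writes IR_RI back to IR, which clears the bit and releases the line.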
NWAYS register
Always return 0x17 (link up, 100 Mbps, full duplex, autoneg complete).
The GC's BBA driver polls NWAYS after reset to confirm link status before
enabling RX. Hardcode this value — do not attempt to forward real link status
from the W5500.
# NWAYS = 0x17:
# bit 4 (LS100) = 1: 100BASE-TX link up
# bit 2 (ANCLPT) = 1: autoneg complete
# bit 1 (100TXH) = 1: 100BASE-TX half (also set in practice)
# bit 0 (LS10) = 1: 10BASE-T (also reported)
7.3 SPRAMArbiter
Domain: sync
File: exi_bba/spram_arbiter.py
Arbitrates access to the iCE40UP5K's 128 KB SPRAM between two clients:
- Client A (EXI read): Issues read requests from the prefetch pipeline (spram_req AsyncFIFO). Must service requests fast enough to keep the prefetch pipeline full.
- Client B (ETH write): The RXFrameAssembler writes incoming ethernet frames into the ring buffer area.
Priority: ETH write wins over EXI read when both request simultaneously. This is safe because:
- The GC only reads a ring buffer page after RWP has advanced past it (i.e., the ETH engine has finished writing that page).
- Even if an EXI read is delayed by one SPRAM cycle, the prefetch pipeline has enough depth (4 entries) to absorb the stall without the SPI slave running out of data.
SPRAM interface (iCE40UP5K SB_SPRAM256KA):
WREN : write enable
CHIPSELECT : always 1
CLOCK : sync domain clock (48 MHz)
STANDBY : 0
SLEEP : 0
POWEROFF_N : 1
ADDRESS[13:0] : byte address divided by 2 (SPRAM is 16-bit wide)
DATAIN[15:0] : write data (for byte writes, place the byte in [7:0] or [15:8] according to address bit 0)
MASKWREN[3:0] : byte enable (0b0011 for lower byte, 0b1100 for upper byte)
DATAOUT[15:0] : read data
The SPRAM is 16-bit wide. Byte addressing is done via MASKWREN. For an 8-bit
write to address A: set ADDRESS = A >> 1, MASKWREN = (A & 1) ? 0b1100 : 0b0011, write data in the appropriate byte of DATAIN.
Read latency: SPRAM has 1-cycle synchronous read latency. The result of a read issued at cycle N is valid at cycle N+1. The arbiter must account for this when responding to the prefetch pipeline.
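The byte-lane selection described above reduces to a small amount of arithmetic. The sketch below is plain Python for checking the mapping in simulation; the function names are hypothetical, and only the ADDRESS, MASKWREN and DATAIN values are modelled, not the SPRAM timing.

```python
def spram_byte_write(addr, data):
    """Return (ADDRESS, MASKWREN, DATAIN) for an 8-bit write to byte address addr."""
    word_addr = addr >> 1                 # SPRAM is 16 bits wide
    if addr & 1:
        return word_addr, 0b1100, (data & 0xFF) << 8   # odd address: upper byte
    return word_addr, 0b0011, data & 0xFF              # even address: lower byte


def spram_byte_read(addr, dataout):
    """Select the addressed byte from a 16-bit DATAOUT word."""
    if addr & 1:
        return (dataout >> 8) & 0xFF
    return dataout & 0xFF
```

A write to byte address 0x0101, for instance, targets word 0x80 with only the upper two mask bits set, leaving the byte at 0x0100 untouched.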
Port list:
| Port | Width | Dir | Notes |
|---|---|---|---|
| exi_req_addr | 16 | in | From spram_req AsyncFIFO (exi→sync) |
| exi_req_valid | 1 | in | FIFO r_rdy |
| exi_req_ready | 1 | out | FIFO r_en (pop when serviced) |
| exi_rsp_data | 8 | out | To spram_rsp AsyncFIFO (sync→exi) |
| exi_rsp_valid | 1 | out | FIFO w_en |
| eth_wr_addr | 16 | in | From RXFrameAssembler |
| eth_wr_data | 8 | in | Byte to write |
| eth_wr_valid | 1 | in | Write request |
| eth_wr_ready | 1 | out | Write accepted this cycle |
7.4 RXFrameAssembler
Domain: sync
File: exi_bba/rx_frame_assembler.py
Receives complete ethernet frames from W5500SPIMaster and writes them into
the SPRAM ring buffer in the correct MX98730EC format.
Ring buffer layout (in SPRAM):
SPRAM address 0x0100–0x0FFF (3840 bytes = 15 × 256-byte pages)
Page 0x01: first usable RX page
Page 0x0F: last usable RX page (RHBP default)
Pages wrap: after 0x0F, next is 0x01 (not 0x00, which is reserved)
Each page is 256 bytes. A received frame may span multiple pages.
Frame descriptor (first 4 bytes of first page):
Byte 0: LRPS value (Last Received Packet Status — set to 0x00 or actual status)
Byte 1: 0x00
Byte 2: frame_length[15:8] (big-endian, includes descriptor bytes)
Byte 3: frame_length[7:0]
Bytes 4+: raw ethernet frame data (DA, SA, EtherType, payload, FCS)
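The descriptor layout can be expressed as a short helper for generating expected ring-buffer contents in tests. This is a sketch in plain Python with hypothetical function names; it assumes, as stated above, that the length field counts the 4 descriptor bytes as well as the frame.

```python
DESCRIPTOR_SIZE = 4


def build_descriptor(total_length, lrps=0x00):
    """Return the 4-byte descriptor: LRPS, 0x00, length high byte, length low byte."""
    if not 0 <= total_length <= 0xFFFF:
        raise ValueError("length must fit in 16 bits")
    return bytes([lrps & 0xFF, 0x00, (total_length >> 8) & 0xFF, total_length & 0xFF])


def build_rx_record(frame, lrps=0x00):
    """Return descriptor followed by the raw frame, as laid out in the ring buffer."""
    return build_descriptor(len(frame) + DESCRIPTOR_SIZE, lrps) + bytes(frame)
```

A 60-byte frame therefore produces a 64-byte record whose length field reads 0x0040.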
Flow:
- Wait for W5500SPIMaster to signal frame available (rx_sof pulse).
- Read frame bytes from W5500 frame FIFO.
- Compute how many 256-byte pages are needed: pages_needed = ceil((frame_length + 4) / 256)
- Check that the ring has room: pages_needed must not exceed (RRP - RWP - 1) mod 15, the number of free pages when one page is kept unused to distinguish full from empty. The modulus is 15 because the pointers cycle through pages 1–15. If there is not enough room, drop the frame and increment a drop counter.
- Write the 4-byte descriptor at SPRAM address RWP * 0x100 (page 1 starts at 0x0100).
- Write frame bytes sequentially, wrapping pages at 256-byte boundaries. Page wrap: next_page = (current_page % 15) + 1 (pages 1–15, skip 0).
- After the last byte is written, push the new RWP into the rx_wptr AsyncFIFO (sync→exi). The exi domain will update the RWP register from this FIFO.
- Pulse rx_irq PulseSynchronizer to exi domain.
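The page arithmetic in this flow is easy to get wrong at the wrap point, so a software model is useful as a reference for simulation. The sketch below is plain Python with hypothetical function names. `pages_needed` and `next_page` follow the formulas in the flow; `free_pages` is one possible way to express the ring-full check, keeping one page unused to tell full from empty, and is an assumption of this sketch.

```python
PAGE_SIZE = 256
NUM_PAGES = 15          # usable pages are 1..15; page 0 is reserved
DESCRIPTOR_SIZE = 4


def pages_needed(frame_length):
    """ceil((frame_length + 4) / 256) using integer arithmetic."""
    return (frame_length + DESCRIPTOR_SIZE + PAGE_SIZE - 1) // PAGE_SIZE


def next_page(page):
    """Advance one page, wrapping from 15 back to 1."""
    return (page % NUM_PAGES) + 1


def pages_for_frame(start_page, frame_length):
    """List the pages a frame occupies, starting at start_page."""
    pages, page = [], start_page
    for _ in range(pages_needed(frame_length)):
        pages.append(page)
        page = next_page(page)
    return pages


def free_pages(rwp, rrp):
    """Pages available for writing when one page is always left unused."""
    return (rrp - rwp - 1) % NUM_PAGES
```

A frame is accepted when `pages_needed(frame_length) <= free_pages(RWP, RRP)`; after the write, the new RWP is `next_page` of the last page returned by `pages_for_frame`.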
MAC address filter:
Before writing a frame, check destination MAC against PAR0–PAR5 (broadcast
FF:FF:FF:FF:FF:FF always accepted). The GC will typically configure PAR0–PAR5
via EXI after boot, so the BBARegisterFile must expose these to the
RXFrameAssembler. Pass them via a dedicated small AsyncFIFO or by reading
them from a shared register shadow (6 bytes, sync domain copy updated when
GC writes PAR0–PAR5). Multicast hash table (MAR0–MAR7) filtering is optional
for initial implementation — accept all frames (promiscuous mode) until the GC
configures the filter.
7.5 TXFrameDrain
Domain: sync
File: exi_bba/tx_frame_drain.py
Drains the TX byte FIFO (fed from the exi domain as the GC writes to TXDATA
register 0x48) and forwards complete frames to W5500SPIMaster.
Flow:
- Wait for tx_ctrl AsyncFIFO to contain a frame length value. This is pushed by BBARegisterFile when the GC has written the complete TX frame (i.e., NCRA ST1:ST0 transitions to 01 or 10).
- Pop frame_length from tx_ctrl.
- Pop exactly frame_length bytes from tx_bytes AsyncFIFO.
- Forward bytes to W5500SPIMaster TX interface with SOF/EOF framing.
- Wait for W5500SPIMaster to signal TX complete.
- Pulse tx_irq PulseSynchronizer to exi domain.
NCRA ST bits: The GC writes NCRA with ST1:ST0 = 01 (start transmit from
buffer 1) or 10 (start transmit from buffer 2). The BBA hardware has two TX
buffers; this implementation uses a single TX FIFO and ignores the buffer
selection. When ST1:ST0 goes non-zero, treat it as a TX trigger regardless of
which bits are set. The BBARegisterFile should push the frame length into
tx_ctrl on this transition.
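The TX trigger condition can be written as a simple predicate on the old and new NCRA values. This is a plain-Python sketch with hypothetical names; it assumes ST1:ST0 occupy NCRA bits [2:1], as listed in the register reference table.

```python
NCRA_ST_MASK = 0x06  # assumption: ST1:ST0 at NCRA bits [2:1]


def tx_triggered(old_ncra, new_ncra):
    """True when ST1:ST0 goes from zero to any non-zero value."""
    return (old_ncra & NCRA_ST_MASK) == 0 and (new_ncra & NCRA_ST_MASK) != 0


def clear_st_bits(ncra):
    """NCRA value after TX completion clears the transmit start bits."""
    return ncra & ~NCRA_ST_MASK & 0xFF
```

Because buffer selection is ignored, both `tx_triggered(0x08, 0x0A)` and `tx_triggered(0x08, 0x0C)` start a transmit, while rewriting an already non-zero ST field does not trigger a second one.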
7.6 W5500SPIMaster
Domain: sync
File: exi_bba/w5500_spi_master.py
Implements the W5500 SPI master interface. The W5500 uses SPI Mode 0 (CPOL=0, CPHA=0), so its clock idles LOW, the opposite idle polarity to the BBA EXI interface.
W5500 SPI frame format:
Byte 0–1: Address (16-bit, big-endian)
Byte 2: Control byte:
[7:3] = Block Select (BSB):
00000 = Common Register
00001 = Socket 0 Register
00010 = Socket 0 TX buffer
00011 = Socket 0 RX buffer
[2] = Read/Write (0=read, 1=write)
[1:0] = Operation Mode (00=variable, 01=fixed 1B, 10=fixed 2B, 11=fixed 4B)
Byte 3+: Data bytes
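The frame format above can be turned into a byte-level builder, which gives the W5500SPIMaster test bench a reference for what should appear on MOSI. This is a minimal sketch in plain Python; the function and constant names are hypothetical, and the field positions are those listed in the frame format.

```python
BSB_COMMON = 0b00000
BSB_S0_REG = 0b00001
BSB_S0_TX = 0b00010
BSB_S0_RX = 0b00011

OM_VARIABLE = 0b00  # variable-length data mode


def w5500_frame(addr, bsb, write, data=b"", om=OM_VARIABLE):
    """Return the bytes of one W5500 SPI frame: address, control, then data."""
    control = ((bsb & 0x1F) << 3) | ((1 if write else 0) << 2) | (om & 0x03)
    header = bytes([(addr >> 8) & 0xFF, addr & 0xFF, control])
    return header + bytes(data)
```

The software reset in the configuration sequence, for example, is `w5500_frame(0x0000, BSB_COMMON, True, b"\x80")`, and a read of the Socket 0 RX buffer starts with a header built from `w5500_frame(offset, BSB_S0_RX, False)`.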
W5500 configuration (to be performed once on NCRA reset):
1. Write MR (Mode Register, 0x0000): 0x80 — software reset
2. Wait ~1 ms
3. Write SHAR (Source MAC, 0x0009–0x000E): copy from PAR0–PAR5 register shadow
4. Write S0_MR (Socket 0 Mode, offset 0x0000 in the Socket 0 register block, BSB=00001): 0x04 (MACRAW mode, raw ethernet)
5. Write S0_CR (Socket 0 Command, offset 0x0001): 0x01 (OPEN)
6. Write S0_IMR (Socket 0 Interrupt Mask, offset 0x002C): 0x04 | 0x10 (RECV | SEND_OK)
MACRAW mode: In MACRAW mode the W5500 Socket 0 sends and receives raw ethernet frames including the full MAC header and FCS. This is exactly what the MX98730EC presents to the GC. No IP stack runs in the FPGA.
RX polling: The W5500 asserts its INT_N pin (active low) when a frame
arrives. Connect W5500 INT_N to an FPGA input pin and use it to trigger the
RXFrameAssembler. Alternatively poll S0_IR (Socket 0 Interrupt Register,
offset 0x0002 in the Socket 0 register block) periodically. The INT_N approach
has lower latency and is preferred.
SPI clock rate: Drive W5500 SPI at 24 MHz (sync clock 48 MHz ÷ 2 using a clock enable toggle). The W5500 supports up to 80 MHz so there is ample margin.
Port list:
| Port | Width | Dir | Notes |
|---|---|---|---|
| spi_clk | 1 | out | To W5500 CLK pin (SPI Mode 0, idles LOW) |
| spi_mosi | 1 | out | To W5500 MOSI |
| spi_miso | 1 | in | From W5500 MISO |
| spi_cs_n | 1 | out | To W5500 CS (active low) |
| w5500_int_n | 1 | in | W5500 interrupt (active low) |
| tx_data | 8 | in | Byte to transmit (from TXFrameDrain) |
| tx_valid | 1 | in | TX byte available |
| tx_ready | 1 | out | TX byte consumed |
| tx_sof | 1 | in | Start of frame marker |
| tx_eof | 1 | in | End of frame marker |
| rx_data | 8 | out | Received byte (to RXFrameAssembler) |
| rx_valid | 1 | out | RX byte available |
| rx_ready | 1 | in | RX byte consumed |
| rx_sof | 1 | out | Start of frame |
| rx_eof | 1 | out | End of frame |
7.7 EEPROMModel
Domain: exi
File: exi_bba/eeprom_model.py
Models the 93C46-compatible serial EEPROM that stores the BBA's MAC address. The GC software bit-bangs the EEPROM interface through register 0x1C (EEPROM Interface Register) of the BBA chip.
Register 0x1C bit fields:
[3] EECK — EEPROM clock
[2] EECS — EEPROM chip select
[1] EEDI — EEPROM data in (GC → EEPROM)
[0] EEDO — EEPROM data out (EEPROM → GC) [read-only]
The GC reads EEDO by reading register 0x1C bit 0.
93C46 protocol summary:
The 93C46 uses a 3-wire serial protocol (SK=clock, CS=select, DI=data in, DO=data out). Commands:
- READ: start bit (1) + opcode (10) + 6-bit address → 16-bit data out
- WRITE: start bit (1) + opcode (01) + 6-bit address + 16-bit data
- EWEN (write enable): start bit (1) + opcode (00) + address (11xxxx)
Each 93C46 word is 16 bits. The MAC address occupies words 0–2 (6 bytes).
Implementation approach:
Maintain a small ROM of 64 × 16-bit words in the exi domain (as a Const
array, synthesises to LUTs). Pre-populate words 0–2 with the chosen MAC
address. Implement a small FSM that watches writes to register 0x1C for the
93C46 protocol, drives EEDO accordingly.
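Before writing the FSM, the READ command can be prototyped as a behavioural model. The sketch below is plain Python, not Amaranth; the class and helper names are hypothetical, only the READ opcode is modelled (other opcodes are ignored, so WRITE data bits are not handled), and the packing of MAC bytes into 16-bit words is an assumption that should be checked against the GC driver.

```python
class Eeprom93C46Model:
    """READ-only behavioural model of a 64 x 16-bit 93C46."""

    def __init__(self, words):
        # Cells that are not supplied read back as 0xFFFF (erased state).
        self.words = (list(words) + [0xFFFF] * 64)[:64]
        self._idle()

    def _idle(self):
        self.cmd_bits = []   # start bit, opcode and address received so far
        self.out_bits = []   # data bits still to be shifted out on DO
        self.do = 1

    def clock(self, cs, di):
        """Model one rising edge of SK; return the DO level after the edge."""
        if not cs:
            self._idle()
            return self.do
        if self.out_bits:
            self.do = self.out_bits.pop(0)
            return self.do
        if not self.cmd_bits and di == 0:
            return self.do               # still waiting for the start bit
        self.cmd_bits.append(di)
        if len(self.cmd_bits) == 9:      # start + 2 opcode bits + 6 address bits
            opcode = (self.cmd_bits[1] << 1) | self.cmd_bits[2]
            addr = 0
            for bit in self.cmd_bits[3:]:
                addr = (addr << 1) | bit
            if opcode == 0b10:           # READ
                word = self.words[addr]
                self.out_bits = [(word >> i) & 1 for i in range(15, -1, -1)]
                self.do = 0              # dummy zero bit precedes the data
            self.cmd_bits = []
        return self.do


def read_word(eeprom, addr):
    """Drive a complete READ command and return the 16-bit result."""
    eeprom.clock(cs=0, di=0)             # CS low resets the command state
    for bit in [1, 1, 0] + [(addr >> i) & 1 for i in range(5, -1, -1)]:
        eeprom.clock(cs=1, di=bit)
    word = 0
    for _ in range(16):
        word = (word << 1) | eeprom.clock(cs=1, di=0)
    return word
```

With `Eeprom93C46Model([0x0009, 0xBF12, 0x3456])`, reading words 0 to 2 returns the three halves of the example MAC 00:09:BF:12:34:56 under the assumed packing. The same bit sequence can be replayed against the hardware FSM through writes to register 0x1C.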
Simpler alternative: Many GC BBA drivers read the EEPROM once at boot and then write the MAC to PAR0–PAR5 themselves. Pre-populate PAR0–PAR5 in the register file reset state with a valid Nintendo OUI MAC (00:09:BF:xx:xx:xx). Skip a full 93C46 implementation for the first version — if Swiss ignores the EEPROM read result and uses a hardcoded or user-configurable MAC, this is sufficient.
7.8 BBATop
Domain: both
File: exi_bba/bba_top.py
Top-level module. Instantiates all submodules, creates clock domains, connects physical pins.
Clock domain creation:
from amaranth import ClockDomain, Module, ResetSignal
from amaranth.lib.cdc import ResetSynchronizer

def elaborate(self, platform):
    m = Module()
    # exi domain: 96 MHz from PLL (3× 32 MHz EXI bus rate)
    exi_domain = ClockDomain("exi")
    m.domains += exi_domain
    # get_pll() is a placeholder, not an Amaranth API: substitute a wrapper
    # around the iCE40 PLL primitive that exposes its output clock as clkout.
    pll = platform.get_pll()
    m.d.comb += exi_domain.clk.eq(pll.clkout)
    m.submodules.exi_rst = ResetSynchronizer(
        arst=ResetSignal("sync"), domain="exi"
    )
    # sync domain: 48 MHz from SB_HFOSC. The platform creates this domain
    # automatically, provided its default clock is set to SB_HFOSC.
    # Instantiate submodules...
    m.submodules.spi = spi = SPIMode3Slave()
    m.submodules.regfile = regfile = BBARegisterFile()
    m.submodules.arbiter = arbiter = SPRAMArbiter()
    m.submodules.rx_asm = rx_asm = RXFrameAssembler()
    m.submodules.tx_drn = tx_drn = TXFrameDrain()
    m.submodules.w5500 = w5500 = W5500SPIMaster()
    m.submodules.eeprom = eeprom = EEPROMModel()
    # ... wiring ...
    return m
Physical pin connections (iCEbreaker):
The SP1 EXI signals connect via the interposer PCB to iCEbreaker PMOD pins. The W5500 Pmod connects to the second PMOD connector. Exact pin mapping depends on the interposer PCB layout — define these in a platform resource file.
# Example resource definitions (add to iCEbreaker platform file):
from amaranth.build import Attrs, Pins, Resource, Subsignal

Resource("exi", 0,
    Subsignal("clk",   Pins("1", conn=("pmod", 0), dir="i")),
    Subsignal("mosi",  Pins("2", conn=("pmod", 0), dir="i")),
    Subsignal("miso",  Pins("3", conn=("pmod", 0), dir="o")),
    Subsignal("cs_n",  Pins("4", conn=("pmod", 0), dir="i")),
    Subsignal("int_n", Pins("7", conn=("pmod", 0), dir="o")),
    Attrs(IO_STANDARD="SB_LVCMOS"),
),
Resource("w5500", 0,
    Subsignal("clk",   Pins("1", conn=("pmod", 1), dir="o")),
    Subsignal("mosi",  Pins("2", conn=("pmod", 1), dir="o")),
    Subsignal("miso",  Pins("3", conn=("pmod", 1), dir="i")),
    Subsignal("cs_n",  Pins("4", conn=("pmod", 1), dir="o")),
    Subsignal("int_n", Pins("7", conn=("pmod", 1), dir="i")),
    Subsignal("rst_n", Pins("8", conn=("pmod", 1), dir="o")),
    Attrs(IO_STANDARD="SB_LVCMOS"),
),
8. Memory Map
The BBA register address space is 13 bits wide (0x0000–0x1FFF).
| Address range | Region | Implemented in | Notes |
|---|---|---|---|
| 0x0000–0x0033 | MAC control registers | Register file (exi) | NCRA, NCRB, IMR, IR, pointers |
| 0x0034–0x0037 | TWD — TX write data | Register file (exi) | TX frame length (2 bytes) |
| 0x0038–0x0039 | Reserved | — | Ignore |
| 0x003A | HIPR — Host Interface Protocol | Register file (exi) | Read: 0x01 (BBA present) |
| 0x003B | NAFR — Network Address Filter | Register file (exi) | |
| 0x003C | NWBA — Network Write Buffer Addr | Register file (exi) | |
| 0x003D–0x0047 | Reserved | — | Ignore |
| 0x0048 | TXDATA — Bulk TX data port | Register file → tx_bytes FIFO | Write path to ethernet |
| 0x0049–0x00FF | Reserved | — | Ignore |
| 0x0100–0x0FFF | RX ring buffer | SPRAM (sync) | Read path from ethernet |
9. EXI Transaction Protocol
All BBA register accesses follow a strict two-phase (header + data) format.
Header encoding
Byte 0: [7] write flag 1=write, 0=read
[6:0] addr[12:6] upper 7 bits of address
Byte 1: [7:2] addr[5:0] lower 6 bits of address
[1:0] xfer_len-1 0=1 byte, 1=2 bytes, 2=3 bytes, 3=4 bytes
CS is asserted (low) before byte 0 and remains low through the entire transaction including all data bytes. CS deasserts (high) after the last data byte.
Read transaction timing
CS ─┐ ┌─
└────────────────────────────────────┘
CLK ┌┐┌┐┌┐┌┐┌┐┌┐┌┐┌┐ ┌┐┌┐┌┐┌┐┌┐┌┐┌┐┌┐ ┌┐┌┐...
header byte 0 header byte 1 data byte 0...
MOSI [addr+flags] [addr+len] [don't care]
MISO [don't care] [don't care] [register data]
The register file must have data ready on MISO from the very first clock edge of the data phase. For register-file-backed reads (address < 0x100), the data is available immediately after header decode. For SPRAM-backed reads (address ≥ 0x100), the prefetch pipeline issues the SPRAM read request during the header phase so data is ready in time.
Write transaction timing
Identical header, then MOSI carries the write data. The FPGA samples MOSI on each rising CLK edge during the data phase and writes to the register.
ID query
On power-on the GC queries the device ID. The query is two 0x00 bytes written,
then four bytes read. The BBA returns 0x04020200. Implement this as a special
case: when address decodes to 0x0000 on a read with no prior NCRA reset, return
the hardcoded ID.
Alternatively, read the Dolphin source for the exact byte sequence GC software uses to detect the BBA and replicate it faithfully.
10. BBA Register Reference
Key registers the GC driver accesses. Full register map in YAGCD §10.8.
| Addr | Name | R/W | Reset | Description |
|---|---|---|---|---|
| 0x00 | NCRA | R/W | 0x00 | Network Control A. [0]=RESET (self-clear), [2:1]=ST (TX start), [3]=SR (start receive), [6]=INTMODE (0=int active low) |
| 0x01 | NCRB | R/W | 0x00 | Network Control B |
| 0x04 | LTPS | R | 0x00 | Last TX packet status |
| 0x05 | LRPS | R | 0x00 | Last RX packet status |
| 0x08 | IMR | R/W | 0x00 | Interrupt mask. Bits match IR. Interrupt fires when IR & IMR != 0 |
| 0x09 | IR | R/W | 0x00 | Interrupt register. Write 1 to clear. [7]=RBFI, [4]=TEI, [2]=TI, [1]=RI |
| 0x0A–0x0B | BP | R/W | — | Boundary page pointer |
| 0x0C–0x0D | TLBP | R/W | — | TX low boundary page |
| 0x0E–0x0F | TWP | R/W | 0x00 | TX write page pointer |
| 0x12–0x13 | TRP | R/W | 0x00 | TX read page pointer |
| 0x16–0x17 | RWP | R | updates | RX write page pointer. Advances after each frame written |
| 0x18–0x19 | RRP | R/W | 0x01 | RX read page pointer. GC writes to advance after consuming frames |
| 0x1A–0x1B | RHBP | R/W | 0x0F | RX high boundary page (last valid page). Default 0x0F |
| 0x1C | EEPROM | R/W | — | EEPROM bit-bang interface [3:0] = EECK, EECS, EEDI, EEDO |
| 0x20–0x25 | PAR0–5 | R/W | MAC | MAC address bytes 0–5. GC writes after reading EEPROM |
| 0x26–0x2D | MAR0–7 | R/W | 0xFF | Multicast hash table. 0xFF = accept all |
| 0x2E | ANALOG | R/W | — | PHY analog control. GC writes 0xD6 to enable PHY |
| 0x30 | NWAYC | R/W | — | Autoneg config. GC sets ANE + LTE bits |
| 0x31 | NWAYS | R | 0x17 | Autoneg status. Hardcode 0x17 = 100M full duplex link up |
| 0x32 | GCA | R/W | — | GMAC config A. GC sets AUTOPUB bit |
| 0x33 | GCB | R/W | — | GMAC config B |
| 0x34–0x37 | TWD | W | — | TX write data (frame length, 2 bytes LE, then ignored) |
| 0x3A | HIPR | R | 0x01 | Host interface protocol version. Return 0x01 |
| 0x3B | NAFR | R/W | — | Network address filter |
| 0x3C | NWBA | R/W | — | Network write buffer address |
| 0x48 | TXDATA | W | — | Bulk TX data port. GC streams frame bytes here |
| 0x100+ | RX buf | R | — | RX ring buffer. GC reads frames from here |
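As a sketch of how the register constants might be expressed in bba_register_file.py, the table above maps naturally onto Python enums. `BBARegs` is the name used elsewhere in this document; `NCRABits` and `IRBits` are hypothetical names introduced here, and only a subset of registers is listed.

```python
from enum import IntEnum, IntFlag


class BBARegs(IntEnum):
    """Register addresses from the table above (multi-byte registers use the base address)."""
    NCRA = 0x00
    NCRB = 0x01
    LTPS = 0x04
    LRPS = 0x05
    IMR = 0x08
    IR = 0x09
    RWP = 0x16
    RRP = 0x18
    RHBP = 0x1A
    EEPROM = 0x1C
    PAR0 = 0x20
    MAR0 = 0x26
    ANALOG = 0x2E
    NWAYC = 0x30
    NWAYS = 0x31
    GCA = 0x32
    TWD = 0x34
    HIPR = 0x3A
    TXDATA = 0x48


class NCRABits(IntFlag):
    """NCRA bit fields; ST occupies bits [2:1]."""
    RESET = 1 << 0
    ST0 = 1 << 1
    ST1 = 1 << 2
    SR = 1 << 3
    INTMODE = 1 << 6


class IRBits(IntFlag):
    """Bit positions shared by IR and IMR."""
    RI = 1 << 1
    TI = 1 << 2
    TEI = 1 << 4
    RBFI = 1 << 7


if __name__ == "__main__":
    # The IMR value used during initialisation: RBFI | TI | RI.
    print(hex(IRBits.RBFI | IRBits.TI | IRBits.RI))
```

Because both classes derive from `int`, the members can be used directly as array indices or as constants in Amaranth expressions.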
11. Initialisation Sequence
This is the exact sequence Swiss/GC software executes. The register file must respond correctly to each step.
1. Assert CS, write 0x0000 (2 bytes), read 4 bytes
→ Must return: 0x04 0x02 0x02 0x00 (device ID)
2. Write 0x01 to NCRA (0x00) — software reset
→ RESET bit self-clears next cycle
→ Pulse ncra_rst to sync domain (resets W5500, clears SPRAM pointers)
3. Poll NCRA bit 0 until clear — wait for reset complete
→ Return 0x00 from NCRA reads after self-clear
4. Write 6 bytes to PAR0–PAR5 (0x20–0x25)
→ Latch MAC address; forward to sync domain MAC filter shadow
5. Write 8 bytes to MAR0–MAR7 (0x26–0x2D)
→ Typically all 0xFF (accept all multicast)
6. Write 0xD6 to ANALOG (0x2E) — enable PHY
→ Store in register file; no hardware effect in FPGA
7. Write NWAYC (0x30): set bits for ANE + LTE
→ Store; no hardware effect
8. Write IMR (0x08): typically 0x86 (RBFI | TI | RI)
→ Enables interrupts; INT line will now assert when frames arrive
9. Write GCA (0x32): set AUTOPUB bit
→ Store; AUTOPUB means RWP auto-updates — we always do this anyway
10. Write NCRA (0x00): set SR bit (0x08) — start receive
→ Enable RX path; the RXFrameAssembler should begin accepting frames
11. Poll NWAYS (0x31) until link up
→ Return hardcoded 0x17 immediately
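The eleven steps above can be captured as a data table that a simulation master replays in order. This is only a sketch: `Step` and `build_init_sequence` are hypothetical names, the NWAYC and GCA values are left as parameters because their bit positions are not given in this section, and the polling masks are assumptions. A real master would also have to split writes longer than 4 bytes into several EXI transactions.

```python
from typing import NamedTuple, Optional


class Step(NamedTuple):
    op: str                 # "id", "write" or "poll"
    addr: Optional[int]     # register address, None for the ID query
    data: bytes             # bytes to write, or bytes expected back for "id"
    mask: int = 0x00        # for "poll": bits to examine (assumed)
    expect: int = 0x00      # for "poll": value of the masked bits that ends polling


def build_init_sequence(mac: bytes, nwayc: int, gca: int) -> list:
    """Return the init sequence as a list of Step records, one per numbered step."""
    if len(mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return [
        Step("id", None, bytes([0x04, 0x02, 0x02, 0x00])),   # 1: device ID
        Step("write", 0x00, b"\x01"),                         # 2: NCRA RESET
        Step("poll", 0x00, b"", mask=0x01, expect=0x00),      # 3: wait for self-clear
        Step("write", 0x20, mac),                             # 4: PAR0-PAR5
        Step("write", 0x26, b"\xff" * 8),                     # 5: MAR0-MAR7
        Step("write", 0x2E, b"\xd6"),                         # 6: ANALOG
        Step("write", 0x30, bytes([nwayc])),                  # 7: NWAYC (ANE + LTE)
        Step("write", 0x08, b"\x86"),                         # 8: IMR = RBFI | TI | RI
        Step("write", 0x32, bytes([gca])),                    # 9: GCA (AUTOPUB)
        Step("write", 0x00, b"\x08"),                         # 10: NCRA SR
        Step("poll", 0x31, b"", mask=0xFF, expect=0x17),      # 11: NWAYS link up
    ]
```

Keeping the sequence as data makes it easy to reuse in both sim_bba_init.py and the pytest suite.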
12. RX Data Path — Detailed Flow
W5500 receives frame on wire
│
▼
W5500SPIMaster detects S0_IR[RECV] (via INT_N pin)
Reads frame length from S0_RX_RSR (Socket 0 RX Received Size, offset 0x0026 in the Socket 0 register block, BSB=0b00001)
Reads frame bytes from Socket 0 RX buffer (BSB=0b00011)
Pulses rx_sof, streams rx_data bytes, pulses rx_eof
│
▼ (sync domain)
RXFrameAssembler
- Checks destination MAC vs PAR shadow
- Checks NCRA SR bit is set (RX enabled)
- Computes pages_needed
- Checks ring buffer not full (RWP+pages != RRP)
- Writes descriptor + frame data into SPRAM via SPRAMArbiter
- Advances RWP (local register in sync domain)
- Pushes new RWP value into rx_wptr AsyncFIFO (sync→exi)
- Pulses rx_irq PulseSynchronizer (sync→exi)
│
▼ AsyncFIFO / PulseSynchronizer crossing
│ (exi domain)
BBARegisterFile
- Pops new RWP from rx_wptr FIFO, updates RWP register
- rx_irq pulse arrives: sets IR[1] (RI bit)
- IR & IMR now non-zero: asserts exi_int_n (INT low to GC)
│
▼ (GC CPU, driven by interrupt or polling)
GC reads IR register: sees RI=1
GC reads RWP (0x16): gets updated pointer
GC reads frame starting at byte address RRP × 0x100 (0x0100 for RRP = 0x01; bulk read, up to 1500+ bytes)
→ BBARegisterFile issues SPRAM read requests via spram_req FIFO (exi→sync)
→ SPRAMArbiter services reads from SPRAM
→ Results flow back via spram_rsp FIFO (sync→exi)
→ Prefetch pipeline keeps data ready for SPI bit engine
GC writes new RRP (0x18) to advance past consumed pages
→ BBARegisterFile pushes RRP update into rx_rptr FIFO (exi→sync)
→ RXFrameAssembler updates its local RRP shadow
GC writes IR register with RI=1 (write-1-to-clear)
→ IR[1] clears, INT line deasserts
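The page arithmetic behind the "computes pages_needed" and "ring buffer not full" steps can be modelled in plain Python before writing the HDL. This sketch makes several assumptions that are not stated in the flow above: the descriptor length is a parameter (4 bytes by default, purely illustrative), the ring wraps from page 0x0F back to page 0x01, and one page is always kept free so that RWP == RRP unambiguously means "empty". The resulting check is stricter than the RWP+pages != RRP test named above, since it also rejects frames that would overtake RRP.

```python
PAGE_SIZE = 256
FIRST_PAGE = 0x01                            # RRP reset value
LAST_PAGE = 0x0F                             # RHBP default
NUM_PAGES = LAST_PAGE - FIRST_PAGE + 1       # 15 pages


def pages_needed(frame_len: int, desc_len: int = 4) -> int:
    """Pages occupied by descriptor plus frame, rounded up to whole pages."""
    return -(-(frame_len + desc_len) // PAGE_SIZE)


def advance(page: int, count: int) -> int:
    """Move a page pointer forward by count pages, wrapping inside the ring."""
    return FIRST_PAGE + (page - FIRST_PAGE + count) % NUM_PAGES


def free_pages(rwp: int, rrp: int) -> int:
    """Pages that can be written without RWP reaching RRP."""
    used = (rwp - rrp) % NUM_PAGES
    return NUM_PAGES - 1 - used


def can_accept(rwp: int, rrp: int, frame_len: int, desc_len: int = 4) -> bool:
    """True if a frame of frame_len bytes fits in the ring right now."""
    return pages_needed(frame_len, desc_len) <= free_pages(rwp, rrp)
```

The same functions can serve as the reference model when checking RWP values in sim_rx_path.py.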
13. TX Data Path — Detailed Flow
GC CPU constructs ethernet frame in GC RAM
│
▼ (GC CPU → EXI)
GC writes 2-byte length to TWD register (0x34)
GC writes frame bytes to TXDATA register (0x48) in chunks
→ BBARegisterFile: each written byte goes into tx_bytes AsyncFIFO (exi→sync)
GC writes NCRA with ST1:ST0 = 01 (transmit trigger)
→ BBARegisterFile pushes frame_length into tx_ctrl AsyncFIFO (exi→sync)
│
▼ AsyncFIFO crossing
│ (sync domain)
TXFrameDrain
- Pops frame_length from tx_ctrl
- Pops frame_length bytes from tx_bytes
- Forwards to W5500SPIMaster with SOF/EOF
│
▼ (sync domain)
W5500SPIMaster
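- Reads S0_TX_FSR (TX Free Size, offset 0x0020, read-only) to confirm the frame fits
- Writes frame bytes into Socket 0 TX buffer (BSB=0b00010) starting at S0_TX_WR
- Advances S0_TX_WR (TX Write Pointer, offset 0x0024) by the frame length
- Writes SEND command (0x20) to S0_CR (offset 0x0001, BSB=0b00001)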
- Polls S0_IR until SEND_OK bit set
- Clears S0_IR[SEND_OK]
- Pulses tx_irq PulseSynchronizer (sync→exi)
│
▼ PulseSynchronizer crossing
│ (exi domain)
BBARegisterFile
- tx_irq arrives: sets IR[2] (TI bit), clears NCRA ST1:ST0
- If IMR[2] set: INT asserts to GC
│
▼ (GC CPU)
GC reads IR, sees TI=1
GC writes IR with TI=1 to clear
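Every W5500 access in the flow above is an SPI frame made of a 16-bit offset address, one control byte and the data. The sketch below builds such frames in plain Python so that a W5500SlaveModel can check what the master sends. It follows the W5500 SPI frame layout (control byte = BSB in bits [7:3], read/write in bit [2], operation mode in bits [1:0], with 00 meaning variable-length data) and addresses socket registers by their offset within the Socket 0 register block. The helper name `w5500_frame` is hypothetical.

```python
BSB_S0_REG = 0b00001   # Socket 0 register block
BSB_S0_TX = 0b00010    # Socket 0 TX buffer
BSB_S0_RX = 0b00011    # Socket 0 RX buffer

SN_CR = 0x0001         # Socket command register offset
SN_RX_RSR = 0x0026     # Socket RX received size offset
CMD_SEND = 0x20


def w5500_frame(offset: int, bsb: int, write: bool, payload: bytes = b"") -> bytes:
    """Build address phase + control phase (+ data phase for writes)."""
    if not 0 <= offset <= 0xFFFF:
        raise ValueError("offset must fit in 16 bits")
    control = (bsb << 3) | (int(write) << 2)   # OM[1:0] = 00: variable data length
    return bytes([(offset >> 8) & 0xFF, offset & 0xFF, control]) + payload


if __name__ == "__main__":
    # SEND command to Socket 0, then the header of an RX size read.
    print(w5500_frame(SN_CR, BSB_S0_REG, write=True, payload=bytes([CMD_SEND])).hex())
    print(w5500_frame(SN_RX_RSR, BSB_S0_REG, write=False).hex())
```

For reads, the master clocks out the three header bytes and then keeps clocking while the W5500 returns data.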
14. SPRAM Layout
The iCE40UP5K has 4 × 32 KB SPRAM banks (128 KB total). Map them as:
| SPRAM region | Size | Usage |
|---|---|---|
| 0x0000–0x00FF | 256 B | Reserved (address 0x00 page not used by ring buffer) |
| 0x0100–0x0FFF | 3840 B | RX ring buffer (15 × 256-byte pages, pages 0x01–0x0F) |
| 0x1000–0x17FF | 2048 B | TX frame staging buffer |
| 0x1800–0x1FFF | 2048 B | Reserved / future use |
The ring buffer uses pages 0x01–0x0F (15 pages × 256 bytes = 3840 bytes). This
matches the MX98730EC default RHBP (RX High Boundary Page) value of 0x0F and
RRP reset value of 0x01.
SPRAM addressing: each iCE40UP5K SB_SPRAM256KA instance is 16K × 16-bit (32 KB; 128 KB total across 4 instances), with a 14-bit word address. To address the ring buffer region as bytes:
- Byte address 0x0100 maps to SPRAM word address 0x0080 (byte 0x0100 >> 1)
- The arbiter converts byte addresses to word addresses and uses MASKWREN for byte selection
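A minimal sketch of the byte-to-word conversion the arbiter performs. SB_SPRAM256KA's MASKWREN is a 4-bit mask with one bit per nibble of the 16-bit word, so a byte write enables two adjacent mask bits. Placing even byte addresses in the low half of the word is an assumption here (little-endian packing), and `spram_access` is a hypothetical helper name.

```python
SPRAM_WORDS = 16 * 1024   # one SB_SPRAM256KA bank: 16K x 16-bit


def spram_access(byte_addr: int):
    """Return (word_addr, maskwren, shift) for a single-byte access.

    shift is the bit position of the byte inside the 16-bit word.
    """
    word_addr = byte_addr >> 1
    if word_addr >= SPRAM_WORDS:
        raise ValueError("address is outside a single SPRAM bank")
    if byte_addr & 1:
        return word_addr, 0b1100, 8    # odd address: upper byte, nibbles 3 and 2
    return word_addr, 0b0011, 0        # even address: lower byte, nibbles 1 and 0


if __name__ == "__main__":
    word, mask, shift = spram_access(0x0100)
    print(hex(word), bin(mask), shift)
```

On reads the mask is ignored; the arbiter selects the byte with `(dataout >> shift) & 0xFF`.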
15. Critical Timing Constraints
Must-meet timing in exi domain (96 MHz → 10.4 ns period)
| Path | Budget | Notes |
|---|---|---|
| FFSynchronizer output → edge detect flip-flop | 1 cycle = 10.4 ns | Trivially met — just a register |
| Edge detect → shift register update | 1 cycle | Register-to-register, no logic |
| rx_valid → header decode → spram_req FIFO write | 2 cycles | Address decode is combinatorial MUX; must close at 96 MHz |
| tx_load → tx_byte driven from register file | 1 cycle | regs[addr] array lookup; critical path, keep address decode combinatorial depth ≤ 4 LUTs |
| tx_load → tx_byte driven from prefetch buffer | 1 cycle | Just a register read; trivial |
Must-meet timing in sync domain (48 MHz → 20.8 ns period)
| Path | Budget | Notes |
|---|---|---|
| SPRAM read request → SPRAM address valid | 1 cycle | AsyncFIFO read + mux — easy |
| SPRAM DATAOUT → result FIFO write | 1 cycle | Register-to-FIFO — easy |
| W5500 SPI bit engine | N/A | Clock-enable based at 24 MHz effective; no hard timing |
Cross-domain latency budget for SPRAM prefetch
EXI header phase duration: 16 exi clocks at 96 MHz = 167 ns
SPRAM prefetch round trip:
exi → spram_req FIFO write: 1 exi tick = 10 ns
FIFO cross-domain: 2 sync ticks = 42 ns
SPRAM read (1 cycle latency): 1 sync tick = 21 ns
Result → spram_rsp FIFO write: 1 sync tick = 21 ns
FIFO cross-domain: 2 exi ticks = 21 ns
Result available in prefetch buffer: = 21 ns
Total: ~136 ns
136 ns < 167 ns header window → prefetch completes before first data bit needed ✓
This is the tightest timing consideration in the design. The prefetch must be issued during HEADER1 (not after) to make the deadline.
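The budget above can be recomputed with a few lines of Python, which makes it easy to re-check if either clock frequency changes. The stage list mirrors the breakdown above and takes the 16-tick window as given; the tick count for the final "prefetch buffer" stage is not stated in the text and is assumed to be 2 exi ticks to match the 21 ns quoted for it.

```python
# Clock periods in nanoseconds for the two domains.
TICK_NS = {"exi": 1e9 / 96e6, "sync": 1e9 / 48e6}

# (stage, domain, ticks). The last entry's tick count is an assumption.
STAGES = [
    ("spram_req FIFO write", "exi", 1),
    ("FIFO crossing exi to sync", "sync", 2),
    ("SPRAM read", "sync", 1),
    ("spram_rsp FIFO write", "sync", 1),
    ("FIFO crossing sync to exi", "exi", 2),
    ("prefetch buffer load", "exi", 2),
]

# Window figure as stated in the text: 16 exi ticks.
HEADER_WINDOW_NS = 16 * TICK_NS["exi"]


def round_trip_ns(stages=STAGES) -> float:
    """Sum the latency of all stages in nanoseconds."""
    return sum(TICK_NS[domain] * ticks for _, domain, ticks in stages)


if __name__ == "__main__":
    total = round_trip_ns()
    print(f"round trip {total:.1f} ns, window {HEADER_WINDOW_NS:.1f} ns, "
          f"slack {HEADER_WINDOW_NS - total:.1f} ns")
```

Without per-stage rounding the total comes out at roughly 135 ns; the 136 ns figure above results from rounding each stage to whole nanoseconds first.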
16. SPRAM Read Prefetch Pipeline
The prefetch pipeline ensures MISO data is always ready before the SPI slave needs it for the data phase.
State machine (in BBARegisterFile, exi domain)
State HEADER1 (decoding second header byte):
    If is_read AND address >= 0x100:
        push address into spram_req AsyncFIFO    ← issued NOW, during header decode
        set prefetch_pending = True
State DATA (read phase):
    On each tx_load pulse:
        If prefetch_pending AND spram_rsp FIFO has data:
            pop byte from spram_rsp FIFO
            load into tx_byte
            push (address + byte_ctr + 1) into spram_req for NEXT byte    ← pipelining
        Elif address < 0x100:
            tx_byte = regs[address + byte_ctr]    ← direct register file read
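The state machine above can be exercised as a behavioural model before it is written in Amaranth. The sketch below is plain Python, not HDL: the two AsyncFIFOs are modelled as deques, `sync_tick()` stands in for one SPRAMArbiter service slot, and `tx_load()` returns `None` to signal a stall. The class and method names are hypothetical.

```python
from collections import deque


class PrefetchModel:
    """Behavioural model of the read prefetch path (no clock-domain timing)."""

    def __init__(self, regs, spram):
        self.regs = regs          # indexable register file, addresses < 0x100
        self.spram = spram        # indexable by byte address
        self.req = deque()        # spram_req FIFO (exi to sync)
        self.rsp = deque()        # spram_rsp FIFO (sync to exi)
        self.addr = 0
        self.byte_ctr = 0
        self.pending = False

    def header_done(self, addr: int, is_read: bool) -> None:
        """HEADER1: latch the address and issue the first prefetch if needed."""
        self.req.clear()
        self.rsp.clear()
        self.addr, self.byte_ctr = addr, 0
        self.pending = is_read and addr >= 0x100
        if self.pending:
            self.req.append(addr)

    def sync_tick(self) -> None:
        """Arbiter side: answer the oldest outstanding read request."""
        if self.req:
            self.rsp.append(self.spram[self.req.popleft()])

    def tx_load(self):
        """DATA: return the next MISO byte, or None if the pipeline stalled."""
        if self.pending:
            if not self.rsp:
                return None
            byte = self.rsp.popleft()
            self.req.append(self.addr + self.byte_ctr + 1)   # prefetch next byte
        else:
            byte = self.regs[self.addr + self.byte_ctr]
        self.byte_ctr += 1
        return byte
```

Calling `tx_load()` before any `sync_tick()` reproduces the stall case, which is useful for counting how often it would occur under a given arbiter schedule.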
Pipeline depth
The spram_req and spram_rsp FIFOs each have depth 4. This allows up to 4
read requests to be in-flight simultaneously, which absorbs any SPRAM arbiter
stalls (ETH write winning the arbitration) without stalling the SPI data phase.
SPRAM arbiter stall handling
If the SPRAM arbiter defers an EXI read by 1 cycle (due to ETH write priority),
the spram_rsp FIFO will be momentarily empty when tx_load arrives. The
BBARegisterFile must stall the SPI slave in this case.
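However, the SPI slave cannot be stalled mid-bit, and because the GC is the SPI master and controls CLK, the transfer itself cannot be paused. The stall mechanism must therefore work at byte boundaries only: after a complete byte has been transmitted, hold MISO at a constant level (0 or 1) until the next byte is ready. The GC keeps clocking regardless and reads that filler as data, so any stall shows up as a corrupted byte on the GC side. Treat a stall as an error condition to be designed out, not as flow control.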
Practical note: At 48 MHz sync with 24 MHz effective W5500 access rate, the ETH write path can only consume the SPRAM arbiter for ~1 sync cycle per byte written. The EXI read path gets the remaining cycles. With 4-deep FIFOs the pipeline should almost never stall in practice. Monitor the stall condition in simulation.
17. Interrupt Handling
The exi_int_n output (pin 3 of SP1) is active-low. Assert it (drive low)
when IR & IMR != 0.
# In BBARegisterFile, exi domain:
ir_masked = Signal(8)
m.d.comb += ir_masked.eq(regs[BBARegs.IR] & regs[BBARegs.IMR])
m.d.exi += exi_int_n.eq(~ir_masked.any())
Register the output — do not drive exi_int_n combinatorially. A registered
output prevents glitches from propagating onto the GC board.
Interrupt sources and IR bit assignments:
| IR bit | Name | Set by | Cleared by |
|---|---|---|---|
| 7 | RBFI | RXFrameAssembler when ring full | GC write-1-to-clear |
| 4 | TEI | TXFrameDrain on TX error | GC write-1-to-clear |
| 2 | TI | tx_irq pulse from sync | GC write-1-to-clear |
| 1 | RI | rx_irq pulse from sync | GC write-1-to-clear |
The GC typically masks in IMR: 0x86 = 0b10000110 (RBFI | TI | RI).
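The IR/IMR behaviour in the table can be checked with a small behavioural model: sources set IR bits, the GC clears them by writing 1s, and the interrupt line is low whenever a set bit is also enabled in IMR. This is a plain-Python sketch for testbenches, not the HDL; `InterruptModel` and its method names are hypothetical.

```python
class InterruptModel:
    """Write-1-to-clear interrupt register with mask and active-low output."""

    def __init__(self):
        self.ir = 0x00
        self.imr = 0x00

    def raise_source(self, bit: int) -> None:
        """Set an IR bit, e.g. bit 1 when an rx_irq pulse arrives."""
        self.ir |= (1 << bit) & 0xFF

    def write_ir(self, value: int) -> None:
        """GC write to IR: every 1 bit in value clears the matching IR bit."""
        self.ir &= ~value & 0xFF

    def write_imr(self, value: int) -> None:
        """GC write to IMR: plain register write."""
        self.imr = value & 0xFF

    @property
    def int_n(self) -> int:
        """Active-low interrupt line: 0 while IR & IMR is non-zero."""
        return 0 if (self.ir & self.imr) else 1
```

With IMR set to 0x86, raising bit 4 (TEI) leaves `int_n` high because that source is masked, while raising bit 1 (RI) pulls it low until the GC writes 0x02 to IR.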
18. EEPROM / MAC Address
The GC software reads the MAC address from the 93C46 EEPROM during initialisation (bit-banging through register 0x1C). It then writes the MAC to PAR0–PAR5.
Recommended approach for initial implementation:
Skip full 93C46 emulation. Pre-populate regs[0x1C] with a pattern that makes
the EEPROM read return a valid MAC. Use Nintendo's OUI 00:09:BF for the first
3 bytes and an arbitrary device-specific value for the last 3:
MAC: 00:09:BF:00:00:01
Verify against Swiss source whether it validates the MAC read from EEPROM or accepts whatever PAR0–PAR5 contains. If it re-reads EEPROM after writing PAR, a full 93C46 model is required. If it only uses PAR0–PAR5, pre-populating the register file is sufficient.
MAC address propagation:
When the GC writes PAR0–PAR5, forward the new MAC to the W5500 SHAR register
via the sync domain. Use a 6-byte AsyncFIFO or a dedicated MAC update pulse.
The W5500 uses SHAR as its source MAC for all transmitted frames.
19. iCE40UP5K Resource Budget
| Resource | Available | Estimated use | Margin |
|---|---|---|---|
| Logic cells (4-LUT + FF) | 5280 | ~1800 | 66% free |
| EBR (4 Kbit blocks) | 30 (120 Kbit) | 4 (FIFOs) | 26 free |
| SPRAM (32 KB banks) | 4 (128 KB) | 1 bank for ring buffer | 3 free |
| PLL | 1 | 1 (for exi domain) | 0 free |
| SB_HFOSC | 1 | 1 (sync domain) | 0 free |
| I/O pins | 39 usable | ~14 (EXI:5 + W5500:6 + misc:3) | 25 free |
Logic cell breakdown:
| Module | Estimated cells |
|---|---|
| SPIMode3Slave | 90 |
| BBARegisterFile FSM + decode | 250 |
| Register file (512 × 8b) | ~200 (flip-flops; the iCE40 has no distributed LUT RAM, so a full 512 × 8b array would need one EBR instead) |
| AsyncFIFO × 8 | 400 |
| PulseSynchronizer × 4 | 40 |
| FFSynchronizer × 5 | 30 |
| SPRAMArbiter | 80 |
| RXFrameAssembler | 200 |
| TXFrameDrain | 150 |
| W5500SPIMaster | 200 |
| EEPROMModel | 100 |
| Misc glue | 60 |
| Total | ~1800 |
iCE40UP5K fmax with nextpnr: typically 60–80 MHz for logic of this complexity.
The exi domain at 96 MHz is the tightest. If nextpnr fails to close timing:
- First option: reduce to a 64 MHz exi domain (icepll alternative).
- Second option: reduce EXI bus speed in Swiss settings to 16 MHz (clock index 4 instead of 5), halving the FPGA timing requirement.
- Third option: add pipeline registers on the critical address decode path.
20. PCB / Connector Notes
Interposer PCB
A simple pass-through interposer PCB connects the GC SP1 slot to the iCEbreaker via a ribbon cable or header.
Required PCB spec:
- Thickness: 1.2 mm (not standard 1.6 mm — critical for fit)
- Copper finish: ENIG (gold) — prevents oxidation on edge contacts
- Board material: FR4 standard
Footprint source: Copy the edge connector footprint from
github.com/silverstee1/SP1ETH KiCad files. Do not design from scratch.
The staggered dual-row geometry requires exact pad positions that have been
physically verified. Cross-reference with the ETH2SP1 LaserBear open files.
Additional interposer components:
- 10 kΩ resistor: EXTIN (pin 1) to 3.3V (pin 7) — device detect
- 100 µF capacitor: 3.3V to GND — bulk decoupling near connector
- 100 nF capacitor × 2: additional HF decoupling
- ESD protection diode array: on CLK, MOSI, MISO, CS lines (optional but recommended — the GC motherboard is difficult to repair if damaged)
Do not connect pin 5 (12V) to anything on the FPGA side.
iCEbreaker connection
The interposer PCB exposes EXI signals on a 2.54 mm pitch 8-pin header. Connect to iCEbreaker PMOD1 connector using a short ribbon cable. Keep the cable as short as possible (< 10 cm) to minimize signal integrity issues at 32 MHz.
21. Known Hardware Quirks
EXI DMA bug
The GC's EXI DMA engine has a bug where data on the MISO line during a DMA write is clocked back out with a 1-bit shift. This only affects GC software doing DMA writes (rare). Swiss uses IMM (immediate) mode transfers. No FPGA workaround needed.
SPI Mode 3 vs Mode 0
Every other EXI device (memory cards, RTC, IPL) uses SPI Mode 0. The BBA is the only device using Mode 3. Do not share the SPI slave implementation with other EXI device implementations without parameterising CPOL/CPHA.
MISO tristate
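On real hardware, MISO (DO) is tristated when CS is deasserted, so that other EXI devices sharing the data line do not conflict. The iCE40UP5K supports this directly: every SB_IO cell has an output-enable input, which Amaranth exposes as the oe signal of a bidirectional or tristate pin. Prefer tristating MISO while CS is deasserted, matching the real BBA. Driving MISO high instead is only safe if nothing else shares the line; a dedicated CS line (SP1 is device 2) separates chip selects from the memory cards and the RTC but does not by itself guarantee a separate data line, so check the channel wiring before taking that shortcut.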
GC hardware revisions
- DOL-001 (original): SP1 present, BBA compatible
- DOL-001 Rev B: SP1 physically absent on motherboard but case hole present
- DOL-101 (later): SP1 present again (but Serial Port 2 absent)
- Panasonic Q: SP1 present
Swiss supports all revisions with SP1 via the EXI hypervisor driver (required from Swiss build 1788 onwards for BBA emulation features).
EXI clock index
The real BBA uses clock index 5 (32 MHz). Swiss allows configuring a lower
clock index for compatibility. If 96 MHz fmax is not achievable, instruct users
to configure Swiss to use clock index 4 (16 MHz EXI), which requires only
32 MHz exi domain and is trivially achievable.
22. File Structure
gc_bba_fpga/
├── exi_bba/
│ ├── __init__.py
│ ├── spi_mode3_slave.py # SPIMode3Slave
│ ├── bba_register_file.py # BBARegisterFile + register constants
│ ├── spram_arbiter.py # SPRAMArbiter
│ ├── rx_frame_assembler.py # RXFrameAssembler
│ ├── tx_frame_drain.py # TXFrameDrain
│ ├── w5500_spi_master.py # W5500SPIMaster
│ ├── eeprom_model.py # EEPROMModel (93C46)
│ └── bba_top.py # BBATop + clock domain setup
├── sim/
│ ├── sim_spi_slave.py # SPIMode3Slave unit test
│ ├── sim_register_file.py # BBARegisterFile unit test
│ ├── sim_bba_init.py # Full init sequence simulation
│ ├── sim_rx_path.py # RX data path end-to-end test
│ ├── sim_tx_path.py # TX data path end-to-end test
│ ├── gc_master_model.py # GC CPU SPI master simulation model
│ ├── w5500_slave_model.py # W5500 SPI slave simulation model
│ └── ethernet_frame_gen.py # Test frame generator
├── platform/
│ ├── icebreaker_bba.py # iCEbreaker platform with BBA resources
│ └── interposer_pinmap.py # SP1 ↔ PMOD pin mapping
├── pcb/
│ ├── interposer/ # KiCad project for interposer PCB
│ └── README.md # PCB ordering instructions (1.2mm, ENIG)
├── constraints/
│ └── timing.py # nextpnr timing constraints (if needed)
├── tests/
│ └── test_bba.py # pytest suite
├── build.py # Amaranth build script
└── README.md
23. Simulation Strategy
Each module should have a standalone simulation before integration. All
simulations use Amaranth's Simulator with two clock domains:
sim.add_clock(1/96e6, domain="exi") and sim.add_clock(1/48e6, domain="sync").
Unit tests
SPIMode3Slave: Drive CLK/MOSI/CS manually from a process in the exi
domain. Verify rx_byte/rx_valid match sent data. Verify spi_miso
matches pre-loaded tx_byte. Test CS abort mid-byte.
BBARegisterFile: Use a GCMasterModel (SPI Mode 3 master process) to
perform read/write transactions. Verify register writes are stored. Verify
register reads return correct values. Verify IR bit setting and clearing.
Verify NWAYS returns 0x17. Verify ID query returns 0x04020200.
SPRAMArbiter: Issue concurrent EXI reads and ETH writes. Verify ETH writes win arbitration. Verify EXI reads complete within 3 sync cycles. Verify no data corruption.
RXFrameAssembler: Feed a known ethernet frame byte-by-byte. Verify SPRAM contents match expected descriptor + frame layout. Verify RWP advances by correct page count. Verify rx_irq fires.
TXFrameDrain + W5500SPIMaster: Issue TX frame from tx_bytes FIFO. Use
W5500SlaveModel process to simulate W5500 responses. Verify frame bytes
arrive at W5500 correctly. Verify tx_irq fires after SEND_OK.
Integration test
sim_bba_init.py: Full GC init sequence (all 11 steps from Section 11).
GCMasterModel performs every transaction. Verify no stalls, correct responses.
sim_rx_path.py: W5500SlaveModel delivers a 64-byte test frame.
GCMasterModel polls IR, reads RWP, bulk-reads the frame, advances RRP.
Verify GC receives identical bytes to what W5500 sent.
sim_tx_path.py: GCMasterModel writes a 64-byte frame through TXDATA.
W5500SlaveModel captures it. Verify W5500 receives identical bytes.
24. Open Issues and Extension Points
Must resolve before first synthesis
- Exact PLL parameters for iCE40UP5K: run icepll -i 12 -o 96 and confirm the output is achievable (VCO in 533–1066 MHz range).
- SP1 connector footprint: clone SP1ETH repo, extract pad positions, verify stagger geometry and pitch before PCB layout.
- W5500 Pmod module pin mapping: confirm which Pmod pins INT_N and RST_N appear on (varies by module vendor).
- Swiss version requirement: confirm Swiss build ≥ 1788 for BBA hypervisor support. Earlier builds use a different driver that may have different register access patterns.
Known limitations
- Single TX buffer (MX98730EC has two). ST1:ST0 = 01 and 10 are treated identically. No known GC title relies on dual TX buffering.
- No DMA mode support. IMM mode only. Matches real-world Swiss usage.
- No Serial Port 2 support (different connector, different project scope).
- 93C46 EEPROM emulation is simplified (hardcoded MAC). A full bit-bang model can be added later if Swiss requires it.
- RX ring buffer is 15 pages (3840 bytes). The real BBA has 4KB. Frames larger than ~3800 bytes (jumbo frames) will be dropped. Standard 1500-byte MTU frames fit in at most 7 pages — no practical issue.
Extension points
- Larger ring buffer: Use additional SPRAM banks for more RX buffering.
- Multiple sockets: W5500 supports 8 sockets; only socket 0 in MACRAW mode is used here.
- Link status passthrough: Read W5500 PHYCFGR register and forward real link status to NWAYS instead of hardcoding 0x17.
- Statistics counters: LTPS/LRPS (last packet status) are currently 0x00. A more complete implementation would fill these from W5500 socket status.
- Serial Port 2 support: Different physical connector and EXI channel but same FPGA logic; would require a second interposer PCB.