GameCube BBA FPGA Replacement — Design Document

Target hardware: iCEbreaker (Lattice iCE40UP5K)
Target language: Amaranth HDL (Python)
Toolchain: Yosys + nextpnr-ice40 + IceStorm
Purpose: Replace the Nintendo GameCube Broadband Adapter (DOL-015) with an FPGA-based implementation, exposing a W5500 100BASE-TX ethernet chip to the GC over the EXI (Expansion Interface) serial bus, enabling game ISO streaming via Swiss homebrew.


Table of Contents

  1. System Overview
  2. Protocol References
  3. Physical Interface — SP1 Edge Connector
  4. Clock Domains
  5. Clock Domain Crossing Strategy
  6. Module Hierarchy
  7. Module Specifications
  8. Memory Map
  9. EXI Transaction Protocol
  10. BBA Register Reference
  11. Initialisation Sequence
  12. RX Data Path — Detailed Flow
  13. TX Data Path — Detailed Flow
  14. SPRAM Layout
  15. Critical Timing Constraints
  16. SPRAM Read Prefetch Pipeline
  17. Interrupt Handling
  18. EEPROM / MAC Address
  19. iCE40UP5K Resource Budget
  20. PCB / Connector Notes
  21. Known Hardware Quirks
  22. File Structure
  23. Simulation Strategy
  24. Open Issues and Extension Points

1. System Overview

The GameCube Broadband Adapter (BBA) is a hardware peripheral that plugs into Serial Port 1 (SP1) on the underside of the GameCube. It presents a network interface to the GC CPU using a Macronix MX98730EC custom IC. GC software (primarily Swiss homebrew) communicates with the BBA through a memory-mapped register interface accessed over the EXI serial bus.

This project replaces the MX98730EC with an iCEbreaker FPGA that emulates the register interface, and connects to a W5500 ethernet chip (on a Pmod-compatible module) for actual network communication.

High-level data flow

GameCube CPU
    │  EXI (SPI Mode 3, 32 MHz, Serial Port 1)
    ▼
iCEbreaker FPGA
    ├── exi domain (64 MHz): SPI slave, register file, prefetch pipeline
    └── sync domain (48 MHz): SPRAM arbiter, RX assembler, TX drain, W5500 driver
            │  SPI (Mode 0, 24 MHz)
            ▼
        W5500 Pmod module (100BASE-TX ethernet)
            │  RJ-45
            ▼
        Network

What this design does NOT implement

  • A network stack. The GC CPU runs TCP/IP. The FPGA is a dumb MAC bridge.
  • IP address awareness. The FPGA never parses ethernet frame payloads.
  • The GC's DMA engine quirk (only relevant to GC-side software).
  • Video/audio streaming logic (handled by Swiss on the GC CPU side).

2. Protocol References

| Source | Content |
| --- | --- |
| YAGCD §2.4.1.4 | SP1 (P6) connector pinout |
| YAGCD §5.9 | EXI bus register descriptions |
| YAGCD §10.8 | MX98730EC (BBA chip) register map |
| Dolphin source EXI_DeviceEthernet.h | Register offsets, init sequence, RX/TX flow |
| Dolphin source EXI_DeviceEthernet.cpp | Transaction encoding, interrupt logic |
| Swiss source bba.c | GC-side driver, exact register access patterns |
| MX98730EC datasheet | Unavailable publicly; YAGCD is the primary reference |
| W5500 datasheet | SPI interface, register map, socket model |
| iCE40UP5K datasheet | SPRAM timing, PLL parameters, I/O standards |

Critical implementation note: The MX98730EC uses SPI Mode 3 (CPOL=1, CPHA=1). CLK idles HIGH. Data changes on the FALLING (leading) edge of CLK and is sampled on the RISING (trailing) edge. Memory cards and the RTC chip use SPI Mode 0, which samples on the same rising edge but idles CLK LOW. Getting this wrong means the GC will never enumerate the device.


3. Physical Interface — SP1 Edge Connector

Slot characteristics

  • Dual-sided PCB edge connector
  • Contacts on both top and bottom faces of the PCB edge
  • Top and bottom contact rows are staggered (offset by half a pitch), not mirrored — similar to ISA/PCI card edge geometry
  • PCB must be ordered at 1.2 mm thickness with ENIG (gold) finish
  • Keying notch at top-right corner of housing (when looking into console socket with front of console facing right)

Connector footprint

Exact pad positions and pitch must be taken from the SP1ETH KiCad project (github.com/silverstee1/SP1ETH). Do not attempt to derive dimensions from YAGCD alone — the document lists signals but not physical geometry. Cross-reference against the ETH2SP1 (LaserBear) open model files as a second source.

Key parameters to verify from those files before PCB layout:

  • Contact pitch (expected: 2.0 mm or 2.54 mm — measure from KiCad file)
  • Stagger offset between top and bottom rows
  • Total contact count per side (expected: 6 per side = 12 total, or 12 per side = 24 total with duplicated power/ground)
  • Insertion depth from board edge to first contact
  • Board width at connector edge

Signal pinout (YAGCD §2.4.1.4)

Pin numbering: looking into the console socket, front of console to the right, pin 1 is on the left. On the adapter PCB (component side up, inserting down), pin 1 is also on the left — numbering does not mirror.

| Pin | Signal | Direction | Notes |
| --- | --- | --- | --- |
| 1 | EXTIN | Adapter → GC | Device detect/sense. Tie to 3.3V via 10 kΩ resistor. Without this the GC does not enumerate the device. |
| 2 | GND | | Shield ground |
| 3 | INT | Adapter → GC | Active-low interrupt to GC CPU. Assert when IR & IMR != 0. |
| 4 | CLK | GC → Adapter | SPI clock, up to 32 MHz, idles HIGH (Mode 3) |
| 5 | 12V | | 12 V supply from GC. Do not connect to FPGA I/O. Leave unconnected or route to a test point only. |
| 6 | DO (MISO) | Adapter → GC | Serial data out: adapter drives, GC samples |
| 7 | 3.3V | | 3.3 V supply (~200 mA available combined with pin 8) |
| 8 | 3.3V | | 3.3 V supply (parallel with pin 7) |
| 9 | DI (MOSI) | GC → Adapter | Serial data in: GC drives, adapter samples |
| 10 | CS | GC → Adapter | Chip select, active low. Delineates each transaction. |
| 11 | GND | | Signal ground |
| 12 | GND | | Signal ground |

Power budget: Pins 7+8 together supply 3.3 V. The iCEbreaker draws ~80 mA active and the W5500 ~150 mA peak, for a total of ~230 mA. That is above the ~200 mA available from pins 7+8, which is roughly what the original BBA drew, so there is no headroom at peak load. Add a 100 µF bulk capacitor on the interposer PCB close to the FPGA power pins.

Voltage levels: All EXI signals are 3.3 V logic. The iCEbreaker I/O is 3.3 V. The W5500 is 3.3 V. No level shifting required anywhere in this design.


4. Clock Domains

The design uses two clock domains. The iCE40UP5K has one PLL and one internal 48 MHz oscillator (SB_HFOSC).

Domain table

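| Domain | Frequency | Source | Purpose |
| --- | --- | --- | --- |
| exi | 64 MHz nominal; 96 MHz recommended (see PLL configuration) | PLL from the 12 MHz crystal | SPI Mode 3 slave, BBA register file, prefetch pipeline |
| sync | 48 MHz | SB_HFOSC internal oscillator | SPRAM arbiter, RX/TX ethernet engines, W5500 SPI master |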

Rationale

Why 64 MHz for exi?
The EXI bus runs at up to 32 MHz. The SPI Mode 3 slave needs to detect CLK edges and respond on the correct edge. Running the exi domain at 2× the bus rate (64 MHz) gives two FPGA ticks per EXI CLK period, that is, one tick per half-period: one tick for the setup phase (MOSI→shift register, prepare MISO) and one tick for the sample/drive phase. This is the minimum oversampling ratio that implements Mode 3 without combinatorial timing risk on the MISO output path.

Why 48 MHz for sync?
The iCE40UP5K's internal 48 MHz oscillator (SB_HFOSC) is available without consuming the PLL. This leaves the one PLL free for the 64 MHz exi domain. The W5500 SPI can run up to 80 MHz but we drive it at 24 MHz (48 MHz ÷ 2 via clock enable), which is well within spec and requires no additional PLL output.

PLL configuration (iCE40UP5K)

Input:  12 MHz crystal (iCEbreaker on-board)

F_pfd = F_in / (DIVR+1)
F_vco = F_pfd × (DIVF+1)      (must lie within 533–1066 MHz on the UP5K)
F_out = F_vco / 2^DIVQ

Attempts at 64 MHz:
DIVR = 0, DIVF = 15            → F_vco = 12 × 16 = 192 MHz                (below VCO minimum, invalid)
DIVR = 0, DIVF = 53, DIVQ = 3  → F_vco = 12 × 54 = 648 MHz, F_out = 81 MHz  (valid, but not 64 MHz)
DIVR = 0, DIVF = 42, DIVQ = 3  → F_vco = 12 × 43 = 516 MHz                (just below VCO minimum, invalid)
DIVR = 2, DIVF = 63, DIVQ = 2  → F_pfd = 4 MHz, F_vco = 4 × 64 = 256 MHz  (below VCO minimum, invalid)

With DIVR = 0, an exact 64 MHz would need F_vco = 1024 MHz, which is not an integer multiple of 12 MHz.

Reachable setting:
DIVR = 0, DIVF = 63, DIVQ = 3  → F_vco = 12 × 64 = 768 MHz, F_out = 768 / 8 = 96 MHz

Options:
  • Generate 96 MHz and use a clock enable for ÷1.5
  • Accept a 96 MHz exi domain (3× bus rate instead of 2×): more margin, same logic

Recommended: use 96 MHz (DIVR=0, DIVF=63, DIVQ=3) for the exi domain.
At 96 MHz there are 3 ticks per 32 MHz EXI clock period (1.5 per half-period).
Adjust SPIMode3Slave edge detection accordingly (3 ticks per bit instead of 2).

Implementation note: Verify exact PLL parameters with icepll tool:

icepll -i 12 -o 64    # finds closest achievable output
icepll -i 12 -o 96    # alternative

The agent implementing this should run icepll and use whatever output it recommends, then adjust the SPIMode3Slave tick counts accordingly.
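
The kind of search icepll performs can be approximated in a few lines of Python. This is only a sketch: it assumes the formula F_out = F_in × (DIVF+1) / ((DIVR+1) × 2^DIVQ), the 533–1066 MHz VCO range quoted above, and a 10–133 MHz phase-detector input range. The phase-detector range and the field widths are assumptions to check against the Lattice PLL documentation, and find_pll is a hypothetical helper, not part of any tool.

```python
def find_pll(f_in_mhz, f_target_mhz):
    """Return (error, divr, divf, divq, f_out) for the closest legal setting."""
    best = None
    for divr in range(16):                     # assumed 4-bit input divider
        f_pfd = f_in_mhz / (divr + 1)
        if not 10 <= f_pfd <= 133:             # assumed phase-detector input range
            continue
        for divf in range(128):                # assumed 7-bit feedback divider
            f_vco = f_pfd * (divf + 1)
            if not 533 <= f_vco <= 1066:       # VCO range quoted in this section
                continue
            for divq in range(1, 7):           # output divider 2^1 .. 2^6
                f_out = f_vco / (1 << divq)
                cand = (abs(f_out - f_target_mhz), divr, divf, divq, f_out)
                if best is None or cand < best:
                    best = cand
    return best


for target in (64, 96):
    err, divr, divf, divq, f_out = find_pll(12, target)
    print(f"target {target} MHz: DIVR={divr} DIVF={divf} DIVQ={divq} -> {f_out} MHz")
```

Under these assumptions 96 MHz is reachable exactly (DIVR=0, DIVF=63, DIVQ=3), while the closest match to 64 MHz is 63.75 MHz (DIVR=0, DIVF=84, DIVQ=4). The real tool's output should still take precedence.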

Reset strategy

Each domain has its own reset, deasserted synchronously using ResetSynchronizer from amaranth.lib.cdc:

# In platform create_missing_domain("exi"):
m.submodules.exi_rst = ResetSynchronizer(
    arst   = ResetSignal("sync"),
    domain = "exi",
)

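The sync domain reset comes from the power-on reset logic that Amaranth's iCE40 platform generates for its default clock domain (a startup delay counter). SB_GB is only a global buffer, and SB_HFOSC has no reset output of its own.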


5. Clock Domain Crossing Strategy

All signals crossing between exi and sync domains must use one of the following CDC primitives from amaranth.lib.cdc. Never pass a raw multi-bit signal directly between domains — only one bit may change per clock crossing.

CDC primitive selection guide

| Signal type | Primitive | Latency |
| --- | --- | --- |
| Single bit, slow-changing (flags, status) | FFSynchronizer | 2 dest clocks |
| Single-cycle pulse / event | PulseSynchronizer | ~3–4 dest clocks |
| Multi-bit data stream (packet bytes) | AsyncFIFO | ~3–4 dest clocks |
| Reset deassertion | ResetSynchronizer | 2 dest clocks |
| Async external pin (CLK, MOSI, CS) | FFSynchronizer | 2 dest clocks |

CDC inventory for this design

| Signal | From | To | Primitive | Notes |
| --- | --- | --- | --- | --- |
| EXI CLK pin | async | exi | FFSynchronizer | stages=2, reset=1 (CLK idles high) |
| EXI MOSI pin | async | exi | FFSynchronizer | stages=2 |
| EXI CS pin | async | exi | FFSynchronizer | stages=2, reset=1 (CS idles high) |
| SPRAM read request (addr) | exi | sync | AsyncFIFO 16-bit wide, depth=4 | Prefetch pipeline |
| SPRAM read result (data) | sync | exi | AsyncFIFO 8-bit wide, depth=4 | Prefetch pipeline |
| TX packet bytes | exi | sync | AsyncFIFO 8-bit wide, depth=64 | GC→ethernet |
| TX packet start/len | exi | sync | AsyncFIFO 16-bit wide, depth=4 | Frame delimiter |
| RX packet bytes | sync | exi | AsyncFIFO 8-bit wide, depth=64 | ethernet→GC |
| RWP update (new value) | sync | exi | AsyncFIFO 8-bit wide, depth=4 | After frame committed |
| RRP update (new value) | exi | sync | AsyncFIFO 8-bit wide, depth=4 | After GC advances pointer |
| IR[RI] set (RX ready) | sync | exi | PulseSynchronizer | Triggers RI interrupt |
| IR[TI] set (TX done) | sync | exi | PulseSynchronizer | Triggers TI interrupt |
| NCRA reset pulse | exi | sync | PulseSynchronizer | Resets ethernet engine |
| exi_int_n output | exi | physical pin | Direct (output register) | Active-low to GC |

Critical rule: The register file lives entirely in the exi domain. The sync domain never directly reads or writes EXI registers. All interaction between the two domains goes through the AsyncFIFOs and PulseSynchronizers listed above. This ensures the GC's register reads always respond within the exi domain without waiting on CDC latency.


6. Module Hierarchy

BBATop                          (top-level, sets up clock domains)
├── SPIMode3Slave               (exi domain — bit engine)
├── BBARegisterFile             (exi domain — register decode + response)
│   ├── [AsyncFIFO: spram_req]  (exi→sync: read address requests)
│   ├── [AsyncFIFO: spram_rsp]  (sync→exi: read data responses)
│   ├── [AsyncFIFO: tx_bytes]   (exi→sync: TX packet data)
│   ├── [AsyncFIFO: tx_ctrl]    (exi→sync: TX frame length)
│   ├── [AsyncFIFO: rx_wptr]    (sync→exi: RWP updates)
│   ├── [AsyncFIFO: rx_rptr]    (exi→sync: RRP updates from GC)
│   ├── [PulseSynchronizer: rx_irq]   (sync→exi)
│   ├── [PulseSynchronizer: tx_irq]   (sync→exi)
│   └── [PulseSynchronizer: ncra_rst] (exi→sync)
├── SPRAMArbiter                (sync domain — owns all SPRAM)
├── RXFrameAssembler            (sync domain — ethernet→SPRAM)
├── TXFrameDrain                (sync domain — SPRAM→ethernet)
├── W5500SPIMaster              (sync domain — SPI master to W5500)
└── EEPROMModel                 (exi domain — 93C46 bit-bang model)

7. Module Specifications

7.1 SPIMode3Slave

Domain: exi
File: exi_bba/spi_mode3_slave.py

Implements a byte-oriented SPI Mode 3 slave. Handles CLK/MOSI/MISO/CS at the bit level and presents a clean byte interface to BBARegisterFile.

SPI Mode 3 timing recap:

  • CLK idles HIGH
  • MOSI is set up by the master on the FALLING (leading) edge
  • Slave samples MOSI on the RISING (trailing) edge of CLK
  • Slave drives MISO on the FALLING edge of CLK (ready for the master to sample on the next rising edge)

Port list:

| Port | Width | Dir | Domain | Description |
| --- | --- | --- | --- | --- |
| spi_clk | 1 | in | async→exi | Raw SPI clock from GC, synchronized internally |
| spi_mosi | 1 | in | async→exi | Raw MOSI from GC, synchronized internally |
| spi_miso | 1 | out | exi | MISO output to GC |
| spi_cs_n | 1 | in | async→exi | Raw CS from GC (active low), synchronized internally |
| rx_byte | 8 | out | exi | Last complete received byte |
| rx_valid | 1 | out | exi | Pulses 1 cycle when rx_byte contains a new byte |
| tx_byte | 8 | in | exi | Byte to transmit; sampled when tx_load pulses |
| tx_load | 1 | out | exi | Requests next TX byte from upstream |

Internal behaviour:

  1. Instantiate FFSynchronizer stages=2 on each of spi_clk, spi_mosi, spi_cs_n. Reset values: spi_clk=1, spi_cs_n=1.
  2. Register the synchronized signals one further cycle to form edge detectors: rising_clk = clk_s & ~clk_prev, falling_clk = ~clk_s & clk_prev.
  3. On CS falling edge: load tx_byte into internal shift register, pulse tx_load, reset bit_ctr to 0.
  4. On RISING CLK edge (sample): shift mosi_s into rx_shift MSB-first, increment bit_ctr. When the eighth bit has been shifted in: register rx_shift into rx_byte, pulse rx_valid, reset bit_ctr to 0, pulse tx_load to request the next byte.
  5. On FALLING CLK edge (drive): drive the MSB of tx_shift onto spi_miso, then shift tx_shift left by 1.
  6. On CS rising edge: drive spi_miso high (idle), reset state.

Note on tx_load timing: tx_load pulses at two points — CS assertion (loads first byte before any bits are clocked) and after each complete received byte (loads the next byte). The upstream (BBARegisterFile) must register the next TX byte within one exi clock of tx_load pulsing.
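
The edge detector from step 2 can be checked in plain Python before writing any HDL. The sketch below is a behavioural model only: detect_edges is a hypothetical helper, and the input list stands for the already-synchronised clock as seen once per exi tick.

```python
def detect_edges(clk_samples, idle=1):
    """Return [(tick, 'rising' | 'falling'), ...] for a clock sampled once per exi tick."""
    clk_prev = idle                      # CLK idles high, matching the synchroniser reset value
    events = []
    for tick, clk_s in enumerate(clk_samples):
        if clk_s and not clk_prev:       # rising_clk = clk_s & ~clk_prev
            events.append((tick, "rising"))
        elif clk_prev and not clk_s:     # falling_clk = ~clk_s & clk_prev
            events.append((tick, "falling"))
        clk_prev = clk_s
    return events


# One byte at 3 ticks per bit: idle high, then eight (low, low, high) periods.
samples = [1, 1] + [0, 0, 1] * 8 + [1, 1]
edges = detect_edges(samples)
falling = [t for t, kind in edges if kind == "falling"]
rising = [t for t, kind in edges if kind == "rising"]
```

Each byte should produce exactly eight falling and eight rising events. A count other than eight in a model like this usually points at an oversampling ratio that is too low for the chosen synchroniser depth.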


7.2 BBARegisterFile

Domain: exi (with AsyncFIFO interfaces to sync)
File: exi_bba/bba_register_file.py

Decodes EXI transactions (2-byte header + N data bytes), reads/writes the BBA register space, and manages all CDC crossings to the sync domain.

EXI transaction decoder FSM

States: HEADER0 → HEADER1 → DATA → (back to HEADER0)

Header format:

Byte 0:  [7]   = write flag  (1 = write, 0 = read)
         [6:0] = addr[12:6]  (upper 7 bits of 13-bit address)

Byte 1:  [7:2] = addr[5:0]   (lower 6 bits of 13-bit address)
         [1:0] = xfer_len    (0=1 byte, 1=2 bytes, 2=3 bytes, 3=4 bytes)

Full address = { byte0[6:0], byte1[7:2] } = 13 bits → range 0x0000–0x1FFF.
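
As a cross-check of the bit layout above, here is a small Python model of the header. It is a sketch of this document's header format only; decode_header and encode_header are hypothetical names, and the length field is treated as length minus one, as the 0=1 byte … 3=4 bytes mapping implies.

```python
def decode_header(byte0, byte1):
    """Split the 2-byte header into (is_write, address, length_in_bytes)."""
    is_write = bool(byte0 & 0x80)
    address = ((byte0 & 0x7F) << 6) | (byte1 >> 2)   # {byte0[6:0], byte1[7:2]}
    length = (byte1 & 0x03) + 1                      # field encodes length - 1
    return is_write, address, length


def encode_header(is_write, address, length):
    """Inverse of decode_header; address is 13 bits, length is 1..4."""
    if not 0 <= address <= 0x1FFF or not 1 <= length <= 4:
        raise ValueError("address or length out of range")
    byte0 = (0x80 if is_write else 0x00) | (address >> 6)
    byte1 = ((address & 0x3F) << 2) | (length - 1)
    return byte0, byte1


# A 4-byte write to TXDATA (0x48) encodes as 0x81 0x23.
header = encode_header(True, 0x48, 4)
```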

HEADER0 state: Wait for rx_valid. Latch rx_byte as hdr0.

HEADER1 state: Wait for rx_valid. Decode address and flags. For read transactions, immediately issue SPRAM prefetch request if address ≥ 0x100 (ring buffer region). Load tx_byte with the register value for addresses < 0x100 (register file region). Transition to DATA.

DATA state (write path): For each rx_valid, write rx_byte to regs[addr + byte_ctr] and handle side effects (see register side effects table). Increment byte_ctr. When byte_ctr == xfer_len, go to HEADER0.

DATA state (read path): Drive tx_byte from prefetch result (addresses ≥ 0x100) or directly from regs[] (addresses < 0x100). On each tx_load, advance the read pointer and issue next prefetch. When byte_ctr == xfer_len, go to HEADER0.

CS deassertion abort: In any state, if cs_n rises, return to HEADER0.

Register file storage

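Registers 0x00–0xFF are implemented as an Array of 8-bit Signals (256 registers), matching the register-file region below 0x100. In synthesis an Array of Signals becomes flip-flops and multiplexers; the iCE40 has no LUT-based distributed RAM. SPRAM is not used here because it is reserved for the packet ring buffer.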

The register file is entirely in the exi domain. No CDC is needed to read or write registers 0x00–0xFF.

Register side effects

| Register | Write side effect |
| --- | --- |
| NCRA (0x00) | If bit 0 (RESET) written: pulse ncra_rst PulseSynchronizer to sync domain. Self-clear bit 0 on next cycle. Reset TX/RX pointers in register file. |
| IR (0x09) | Write-1-to-clear: IR <= IR & ~written_value |
| RRP (0x18–0x19) | After GC writes new RRP value, push value into rx_rptr AsyncFIFO (exi→sync) so RX engine knows GC has consumed those pages |
| TWD (0x34–0x37) | Bytes written here are the TX frame length field (2 bytes little-endian). Latch for TX engine. |
| TXDATA (0x48) | Each byte written goes into tx_bytes AsyncFIFO (exi→sync). The latched frame length is pushed into the tx_ctrl AsyncFIFO when the GC sets NCRA ST1:ST0 to start transmission, since the register file cannot tell which write chunk is the last. |

Interrupt register update (from sync domain)

  • rx_irq PulseSynchronizer arriving from sync: set IR[1] (RI bit)
  • tx_irq PulseSynchronizer arriving from sync: set IR[2] (TI bit), clear NCRA[2:1] (ST1:ST0, the transmit start bits)

Interrupt output

m.d.exi += exi_int_n.eq(~(IR & IMR).any())   # active-low: driven to 0 when any unmasked bit is set

Register this one flip-flop in the exi domain. The physical pin is a direct output — no CDC needed because the GC only reads the interrupt state via polling IR over EXI (which is already in the exi domain) or via the interrupt line which the GC CPU samples asynchronously.
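
The interaction between IR, IMR and the interrupt pin is easy to get subtly wrong, so a software model is useful as a reference for simulation. This is a minimal sketch, not the register file itself: InterruptRegs is a hypothetical class, and only the RI (bit 1) and TI (bit 2) positions given above are used.

```python
class InterruptRegs:
    """Behavioural model of IR (write-1-to-clear), IMR and the active-low pin."""

    RI = 0x02   # IR bit 1: frame received
    TI = 0x04   # IR bit 2: transmit done

    def __init__(self):
        self.ir = 0x00
        self.imr = 0x00

    def raise_irq(self, bit):
        """Called when a pulse arrives from the sync domain."""
        self.ir |= bit

    def write_ir(self, value):
        """GC write to IR: every 1 bit clears the matching pending bit."""
        self.ir &= ~value & 0xFF

    @property
    def int_n(self):
        """0 (asserted) while any unmasked interrupt is pending, else 1."""
        return 0 if (self.ir & self.imr) else 1


regs = InterruptRegs()
regs.imr = InterruptRegs.RI          # GC enables only the RX interrupt
regs.raise_irq(InterruptRegs.TI)     # masked source: pin stays high
pin_after_ti = regs.int_n
regs.raise_irq(InterruptRegs.RI)     # unmasked source: pin goes low
pin_after_ri = regs.int_n
regs.write_ir(InterruptRegs.RI)      # GC acknowledges RI only
```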

NWAYS register

Always return 0x17 (link up, 100 Mbps, full duplex, autoneg complete). The GC's BBA driver polls NWAYS after reset to confirm link status before enabling RX. Hardcode this value — do not attempt to forward real link status from the W5500.

# NWAYS = 0x17:
# bit 4 (LS100)  = 1: 100BASE-TX link up
# bit 2 (ANCLPT) = 1: autoneg complete
# bit 1 (100TXH) = 1: 100BASE-TX half (also set in practice)
# bit 0 (LS10)   = 1: 10BASE-T (also reported)

7.3 SPRAMArbiter

Domain: sync
File: exi_bba/spram_arbiter.py

Arbitrates access to the iCE40UP5K's 128 KB SPRAM between two clients:

  • Client A (EXI read): Issues read requests from the prefetch pipeline (spram_req AsyncFIFO). Must service requests fast enough to keep the prefetch pipeline full.
  • Client B (ETH write): The RXFrameAssembler writes incoming ethernet frames into the ring buffer area.

Priority: ETH write wins over EXI read when both request simultaneously. This is safe because:

  1. The GC only reads a ring buffer page after RWP has advanced past it (i.e., the ETH engine has finished writing that page).
  2. Even if an EXI read is delayed by one SPRAM cycle, the prefetch pipeline has enough depth (4 entries) to absorb the stall without the SPI slave running out of data.

SPRAM interface (iCE40UP5K SB_SPRAM256KA):

WREN   : write enable
CHIPSELECT : always 1
CLOCK  : sync domain clock (48 MHz)
STANDBY : 0
SLEEP  : 0
POWEROFF_N : 1
ADDRESS[13:0] : byte address divided by 2 (SPRAM is 16-bit wide)
DATAIN[15:0] : write data (use only [7:0] for byte writes, mask upper byte)
MASKWREN[3:0] : per-nibble write mask (0b0011 for lower byte, 0b1100 for upper byte)
DATAOUT[15:0] : read data

The SPRAM is 16-bit wide. Byte addressing is done via MASKWREN. For an 8-bit write to address A: set ADDRESS = A >> 1, MASKWREN = (A & 1) ? 0b1100 : 0b0011, write data in the appropriate byte of DATAIN.
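
The byte-to-word mapping just described can be written down as two small functions. This is a sketch for use in a testbench or as a reference model; spram_byte_write and spram_byte_read are hypothetical names.

```python
def spram_byte_write(byte_addr, data):
    """Map an 8-bit write onto (ADDRESS, MASKWREN, DATAIN) for the 16-bit-wide SPRAM."""
    address = (byte_addr >> 1) & 0x3FFF          # 14-bit word address
    if byte_addr & 1:
        return address, 0b1100, (data & 0xFF) << 8   # odd address: upper byte
    return address, 0b0011, data & 0xFF              # even address: lower byte


def spram_byte_read(byte_addr, dataout):
    """Pick the addressed byte out of a 16-bit DATAOUT word."""
    return (dataout >> 8) & 0xFF if byte_addr & 1 else dataout & 0xFF


even = spram_byte_write(0x0100, 0xAB)
odd = spram_byte_write(0x0101, 0xCD)
```

Both example writes land in word 0x80; only the mask and the byte lane differ.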

Read latency: SPRAM has 1-cycle synchronous read latency. The result of a read issued at cycle N is valid at cycle N+1. The arbiter must account for this when responding to the prefetch pipeline.

Port list:

| Port | Width | Dir | Notes |
| --- | --- | --- | --- |
| exi_req_addr | 16 | in | From spram_req AsyncFIFO (exi→sync) |
| exi_req_valid | 1 | in | FIFO r_rdy |
| exi_req_ready | 1 | out | FIFO r_en (pop when serviced) |
| exi_rsp_data | 8 | out | To spram_rsp AsyncFIFO (sync→exi) |
| exi_rsp_valid | 1 | out | FIFO w_en |
| eth_wr_addr | 16 | in | From RXFrameAssembler |
| eth_wr_data | 8 | in | Byte to write |
| eth_wr_valid | 1 | in | Write request |
| eth_wr_ready | 1 | out | Write accepted this cycle |

7.4 RXFrameAssembler

Domain: sync
File: exi_bba/rx_frame_assembler.py

Receives complete ethernet frames from W5500SPIMaster and writes them into the SPRAM ring buffer in the correct MX98730EC format.

Ring buffer layout (in SPRAM):

SPRAM address 0x0100–0x0FFF  (3840 bytes = 15 × 256-byte pages)
  Page 0x01: first usable RX page
  Page 0x0F: last usable RX page (RHBP default)
  Pages wrap: after 0x0F, next is 0x01 (not 0x00, which is reserved)

Each page is 256 bytes. A received frame may span multiple pages.

Frame descriptor (first 4 bytes of first page):

Byte 0: LRPS value (Last Received Packet Status — set to 0x00 or actual status)
Byte 1: 0x00
Byte 2: frame_length[15:8]  (big-endian, includes descriptor bytes)
Byte 3: frame_length[7:0]
Bytes 4+: raw ethernet frame data (DA, SA, EtherType, payload, FCS)

Flow:

  1. Wait for W5500SPIMaster to signal frame available (rx_sof pulse).
  2. Read frame bytes from W5500 frame FIFO.
  3. Compute how many 256-byte pages are needed: pages_needed = ceil((frame_length + 4) / 256)
  4. Check that the ring has room. With used = (RWP − RRP) mod 15 pages in use, require pages_needed ≤ 14 − used, so that RWP never catches up with RRP. If there is not enough room, drop the frame and increment a drop counter.
  5. Write 4-byte descriptor at SPRAM address RWP * 0x100 (page 0x01 starts at 0x0100).
  6. Write frame bytes sequentially, wrapping pages at 256-byte boundaries. Page wrap: next_page = (current_page % 15) + 1 (pages 1–15, skip 0).
  7. After last byte written, update RWP in the rx_wptr AsyncFIFO (sync→exi). The exi domain will update the RWP register from this FIFO.
  8. Pulse rx_irq PulseSynchronizer to exi domain.
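
The page arithmetic in this flow can be modelled in Python and reused as a golden reference in simulation. The sketch makes two assumptions beyond the text: page N starts at byte address N × 0x100 (consistent with page 0x01 at 0x0100 in the layout above), and one page is always left free so that RWP == RRP unambiguously means "empty". All function names are hypothetical.

```python
FIRST_PAGE, LAST_PAGE, PAGE_SIZE = 0x01, 0x0F, 256
NUM_PAGES = LAST_PAGE - FIRST_PAGE + 1            # 15 usable pages


def pages_needed(frame_length):
    """ceil((frame_length + 4) / 256): frame bytes plus the 4-byte descriptor."""
    return -(-(frame_length + 4) // PAGE_SIZE)


def next_page(page):
    """Advance within pages 1..15, skipping the reserved page 0."""
    return (page % NUM_PAGES) + 1


def free_pages(rwp, rrp):
    """Pages available for writing (assumption: one page is kept free)."""
    used = (rwp - rrp) % NUM_PAGES
    return NUM_PAGES - 1 - used


def descriptor(frame_length, status=0x00):
    """4-byte descriptor; the length field includes the descriptor itself."""
    total = frame_length + 4
    return bytes([status, 0x00, total >> 8, total & 0xFF])


def commit(rwp, rrp, frame_length):
    """Return the new RWP after storing a frame, or None if it must be dropped."""
    count = pages_needed(frame_length)
    if count > free_pages(rwp, rrp):
        return None
    for _ in range(count):
        rwp = next_page(rwp)
    return rwp


page_address = 0x0F * PAGE_SIZE   # last usable page starts at 0x0F00
```

For example, a 1514-byte frame needs six pages, so commit(1, 1, 1514) moves RWP from page 1 to page 7, while commit(14, 1, 1514) returns None because only one page is free.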

MAC address filter:

Before writing a frame, check destination MAC against PAR0–PAR5 (broadcast FF:FF:FF:FF:FF:FF always accepted). The GC will typically configure PAR0–PAR5 via EXI after boot, so the BBARegisterFile must expose these to the RXFrameAssembler. Pass them via a dedicated small AsyncFIFO or by reading them from a shared register shadow (6 bytes, sync domain copy updated when GC writes PAR0–PAR5). Multicast hash table (MAR0–MAR7) filtering is optional for initial implementation: accept all frames (promiscuous mode) until the GC configures the filter.
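
The unicast/broadcast part of this filter reduces to a single comparison. The sketch below models it in Python for use as a simulation reference; accept_frame is a hypothetical name, and multicast hashing is deliberately left out.

```python
BROADCAST = bytes([0xFF] * 6)


def accept_frame(frame, par, promiscuous=False):
    """Decide whether a received frame is written to the ring buffer.

    frame: raw ethernet frame, destination MAC in the first 6 bytes
    par:   the 6-byte shadow copy of PAR0..PAR5
    """
    dest = bytes(frame[:6])
    return promiscuous or dest == BROADCAST or dest == bytes(par)


par = bytes.fromhex("0009bf000001")                   # example station address
to_us = par + bytes.fromhex("aabbccddeeff") + b"\x08\x00"
to_other = bytes.fromhex("0009bf000002") + to_us[6:]
to_all = BROADCAST + to_us[6:]
```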


7.5 TXFrameDrain

Domain: sync
File: exi_bba/tx_frame_drain.py

Drains the TX byte FIFO (fed from the exi domain as the GC writes to TXDATA register 0x48) and forwards complete frames to W5500SPIMaster.

Flow:

  1. Wait for tx_ctrl AsyncFIFO to contain a frame length value. This is pushed by BBARegisterFile when the GC has written the complete TX frame (i.e., NCRA ST1:ST0 transitions to 01 or 10).
  2. Pop frame_length from tx_ctrl.
  3. Pop exactly frame_length bytes from tx_bytes AsyncFIFO.
  4. Forward bytes to W5500SPIMaster TX interface with SOF/EOF framing.
  5. Wait for W5500SPIMaster to signal TX complete.
  6. Pulse tx_irq PulseSynchronizer to exi domain.
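
Steps 1 to 3 of this flow amount to pairing each queued length with exactly that many queued bytes. A minimal Python model, assuming unbounded queues in place of the real fixed-depth AsyncFIFOs (drain_frames is a hypothetical name):

```python
from collections import deque


def drain_frames(tx_ctrl, tx_bytes):
    """Pop each queued frame length, then exactly that many bytes.

    Frames whose bytes have not all arrived yet are left in the queues.
    """
    frames = []
    while tx_ctrl and len(tx_bytes) >= tx_ctrl[0]:
        length = tx_ctrl.popleft()
        frames.append(bytes(tx_bytes.popleft() for _ in range(length)))
    return frames


tx_ctrl = deque([3, 2])
tx_bytes = deque(b"\x01\x02\x03\x04\x05\x06")
frames = drain_frames(tx_ctrl, tx_bytes)
```

The hardware version streams bytes onward as they arrive rather than collecting a whole frame, but the byte accounting is the same: a stray extra byte stays in tx_bytes and would corrupt the start of the next frame, which is worth asserting against in simulation.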

NCRA ST bits: The GC writes NCRA with ST1:ST0 = 01 (start transmit from buffer 1) or 10 (start transmit from buffer 2). The BBA hardware has two TX buffers; this implementation uses a single TX FIFO and ignores the buffer selection. When ST1:ST0 goes non-zero, treat it as a TX trigger regardless of which bits are set. The BBARegisterFile should push the frame length into tx_ctrl on this transition.


7.6 W5500SPIMaster

Domain: sync
File: exi_bba/w5500_spi_master.py

Implements the W5500 SPI master interface. The W5500 uses SPI Mode 0 (CPOL=0, CPHA=0), opposite to the BBA EXI interface.

W5500 SPI frame format:

Byte 0–1: Address (16-bit, big-endian)
Byte 2:   Control byte:
            [7:3] = Block Select (BSB):
                    00000 = Common Register
                    00001 = Socket 0 Register
                    00010 = Socket 0 TX buffer
                    00011 = Socket 0 RX buffer
            [2]   = Read/Write (0=read, 1=write)
            [1:0] = Operation Mode (00=variable, 01=fixed 1B, 10=fixed 2B, 11=fixed 4B)
Byte 3+:  Data bytes
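
The frame format above translates directly into a byte builder. The following Python sketch covers variable-length mode only (Operation Mode 00); w5500_frame and the BSB_* constants are hypothetical names for this document's purposes.

```python
BSB_COMMON = 0b00000   # Common Register block
BSB_S0_REG = 0b00001   # Socket 0 Register block
BSB_S0_TX = 0b00010    # Socket 0 TX buffer
BSB_S0_RX = 0b00011    # Socket 0 RX buffer


def w5500_frame(addr, bsb, write, data=b""):
    """Build address phase + control byte + data for a variable-length access."""
    control = (bsb << 3) | (0x04 if write else 0x00)   # OM bits [1:0] = 00
    return bytes([(addr >> 8) & 0xFF, addr & 0xFF, control]) + bytes(data)


reset_cmd = w5500_frame(0x0000, BSB_COMMON, True, b"\x80")      # MR software reset
set_mac = w5500_frame(0x0009, BSB_COMMON, True, bytes.fromhex("0009bf000001"))
rx_read = w5500_frame(0x0000, BSB_S0_RX, False)                 # header of an RX buffer read
```

For a read, the master keeps clocking after these three bytes and collects the data from MISO until it releases CS.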

W5500 configuration (to be performed once on NCRA reset):

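1. Write MR (Mode Register, common block, 0x0000): 0x80  (software reset)
2. Wait ~1 ms
3. Write SHAR (Source MAC, common block, 0x0009–0x000E): copy from PAR0–PAR5 register shadow
4. Write S0_MR (Socket 0 Mode, socket 0 block, offset 0x0000): 0x04  (MACRAW mode, raw ethernet)
5. Write S0_CR (Socket 0 Command, socket 0 block, offset 0x0001): 0x01  (OPEN)
6. Write S0_IMR (Socket 0 Interrupt Mask, socket 0 block, offset 0x002C): 0x04 | 0x10  (RECV | SEND_OK)
7. Write SIMR (Socket Interrupt Mask, common block, 0x0018): 0x01  (route Socket 0 interrupts to INT_N)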

MACRAW mode: In MACRAW mode the W5500 Socket 0 sends and receives raw ethernet frames including the full MAC header and FCS. This is exactly what the MX98730EC presents to the GC. No IP stack runs in the FPGA.

RX polling: The W5500 asserts its INT_N pin (active low) when a frame arrives. Connect W5500 INT_N to an FPGA input pin and use it to trigger the RXFrameAssembler. Alternatively poll S0_IR (Socket 0 Interrupt Register, socket 0 block, offset 0x0002) periodically. The INT_N approach has lower latency and is preferred.

SPI clock rate: Drive W5500 SPI at 24 MHz (sync clock 48 MHz ÷ 2 using a clock enable toggle). The W5500 supports up to 80 MHz so there is ample margin.

Port list:

| Port | Width | Dir | Notes |
| --- | --- | --- | --- |
| spi_clk | 1 | out | To W5500 CLK pin (SPI Mode 0, idles LOW) |
| spi_mosi | 1 | out | To W5500 MOSI |
| spi_miso | 1 | in | From W5500 MISO |
| spi_cs_n | 1 | out | To W5500 CS (active low) |
| w5500_int_n | 1 | in | W5500 interrupt (active low) |
| tx_data | 8 | in | Byte to transmit (from TXFrameDrain) |
| tx_valid | 1 | in | TX byte available |
| tx_ready | 1 | out | TX byte consumed |
| tx_sof | 1 | in | Start of frame marker |
| tx_eof | 1 | in | End of frame marker |
| rx_data | 8 | out | Received byte (to RXFrameAssembler) |
| rx_valid | 1 | out | RX byte available |
| rx_ready | 1 | in | RX byte consumed |
| rx_sof | 1 | out | Start of frame |
| rx_eof | 1 | out | End of frame |

7.7 EEPROMModel

Domain: exi
File: exi_bba/eeprom_model.py

Models the 93C46-compatible serial EEPROM that stores the BBA's MAC address. The GC software bit-bangs the EEPROM interface through register 0x1C (EEPROM Interface Register) of the BBA chip.

Register 0x1C bit fields:

[3] EECK  — EEPROM clock
[2] EECS  — EEPROM chip select
[1] EEDI  — EEPROM data in (GC → EEPROM)
[0] EEDO  — EEPROM data out (EEPROM → GC) [read-only]

The GC reads EEDO by reading register 0x1C bit 0.

93C46 protocol summary:

The 93C46 uses a 3-wire serial protocol (SK=clock, CS=select, DI=data in, DO=data out). Commands:

  • READ: start bit (1) + opcode (10) + 6-bit address → 16-bit data out
  • WRITE: start bit (1) + opcode (01) + 6-bit address + 16-bit data
  • EWEN (write enable): start bit (1) + opcode (00) + address (11xxxx)

Each 93C46 word is 16 bits. The MAC address occupies words 0–2 (6 bytes).
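
To make the READ command concrete, here is a small Python model of the command decode and the data bits returned. It is a sketch under stated assumptions: the word contents and the byte order of the MAC within each word are hypothetical, eeprom_read_bits is a hypothetical name, and the leading dummy 0 bit that real parts typically emit before the data is omitted.

```python
# Hypothetical contents: MAC 00:09:BF:00:00:01 in words 0..2, rest erased.
EEPROM_WORDS = [0x0009, 0xBF00, 0x0001] + [0xFFFF] * 61


def eeprom_read_bits(command_bits, words=EEPROM_WORDS):
    """Decode a 9-bit READ command and return the 16 data bits, MSB first.

    command_bits: start bit (1), opcode (1, 0), then a 6-bit address, MSB first.
    Returns None for anything that is not a well-formed READ.
    """
    if len(command_bits) != 9 or command_bits[0] != 1:
        return None
    if command_bits[1:3] != [1, 0]:          # opcode 10 = READ
        return None
    addr = 0
    for bit in command_bits[3:]:
        addr = (addr << 1) | bit
    word = words[addr]
    return [(word >> i) & 1 for i in range(15, -1, -1)]


read_word_1 = eeprom_read_bits([1, 1, 0, 0, 0, 0, 0, 0, 1])
write_cmd = eeprom_read_bits([1, 0, 1, 0, 0, 0, 0, 0, 1])
```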

Implementation approach:

Maintain a small ROM of 64 × 16-bit words in the exi domain (as a Const array, synthesises to LUTs). Pre-populate words 0–2 with the chosen MAC address. Implement a small FSM that watches writes to register 0x1C for the 93C46 protocol and drives EEDO accordingly.

Simpler alternative: Many GC BBA drivers read the EEPROM once at boot and then write the MAC to PAR0–PAR5 themselves. Pre-populate PAR0–PAR5 in the register file reset state with a valid Nintendo OUI MAC (00:09:BF:xx:xx:xx). Skip a full 93C46 implementation for the first version; if Swiss ignores the EEPROM read result and uses a hardcoded or user-configurable MAC, this is sufficient.


7.8 BBATop

Domain: both
File: exi_bba/bba_top.py

Top-level module. Instantiates all submodules, creates clock domains, connects physical pins.

Clock domain creation:

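from amaranth import ClockDomain, Module, ResetSignal
from amaranth.lib.cdc import ResetSynchronizer

def elaborate(self, platform):
    m = Module()

    # exi domain: 96 MHz from PLL (3× 32 MHz EXI bus rate)
    exi_domain = ClockDomain("exi")
    m.domains += exi_domain
    # get_pll() is a placeholder, not an Amaranth API: it stands for a
    # project helper that wraps the iCE40 PLL primitive and exposes clkout.
    pll = platform.get_pll()
    m.d.comb += exi_domain.clk.eq(pll.clkout)
    m.submodules.exi_rst = ResetSynchronizer(
        arst=ResetSignal("sync"), domain="exi"
    )

    # sync domain: 48 MHz from SB_HFOSC.
    # Created automatically by the platform only when SB_HFOSC is its
    # default clock; check default_clk in the iCEbreaker board file, which
    # may select the 12 MHz crystal instead.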

    # Instantiate submodules...
    m.submodules.spi    = spi    = SPIMode3Slave()
    m.submodules.regfile = regfile = BBARegisterFile()
    m.submodules.arbiter = arbiter = SPRAMArbiter()
    m.submodules.rx_asm  = rx_asm  = RXFrameAssembler()
    m.submodules.tx_drn  = tx_drn  = TXFrameDrain()
    m.submodules.w5500   = w5500   = W5500SPIMaster()
    m.submodules.eeprom  = eeprom  = EEPROMModel()
    # ... wiring ...

Physical pin connections (iCEbreaker):

The SP1 EXI signals connect via the interposer PCB to iCEbreaker PMOD pins. The W5500 Pmod connects to the second PMOD connector. Exact pin mapping depends on the interposer PCB layout — define these in a platform resource file.

# Example resource definitions (add to iCEbreaker platform file):
Resource("exi", 0,
    Subsignal("clk",  Pins("1",  conn=("pmod", 0), dir="i")),
    Subsignal("mosi", Pins("2",  conn=("pmod", 0), dir="i")),
    Subsignal("miso", Pins("3",  conn=("pmod", 0), dir="o")),
    Subsignal("cs_n", Pins("4",  conn=("pmod", 0), dir="i")),
    Subsignal("int_n",Pins("7",  conn=("pmod", 0), dir="o")),
    Attrs(IO_STANDARD="SB_LVCMOS"),
),
Resource("w5500", 0,
    Subsignal("clk",  Pins("1",  conn=("pmod", 1), dir="o")),
    Subsignal("mosi", Pins("2",  conn=("pmod", 1), dir="o")),
    Subsignal("miso", Pins("3",  conn=("pmod", 1), dir="i")),
    Subsignal("cs_n", Pins("4",  conn=("pmod", 1), dir="o")),
    Subsignal("int_n",Pins("7",  conn=("pmod", 1), dir="i")),
    Subsignal("rst_n",Pins("8",  conn=("pmod", 1), dir="o")),
    Attrs(IO_STANDARD="SB_LVCMOS"),
),

8. Memory Map

The BBA register address space is 13 bits wide (0x0000–0x1FFF).

| Address range | Region | Implemented in | Notes |
| --- | --- | --- | --- |
| 0x0000–0x0033 | MAC control registers | Register file (exi) | NCRA, NCRB, IMR, IR, pointers |
| 0x0034–0x0037 | TWD: TX write data | Register file (exi) | TX frame length (2 bytes) |
| 0x0038–0x0039 | Reserved | | Ignore |
| 0x003A | HIPR: Host Interface Protocol | Register file (exi) | Read: 0x01 (BBA present) |
| 0x003B | NAFR: Network Address Filter | Register file (exi) | |
| 0x003C | NWBA: Network Write Buffer Addr | Register file (exi) | |
| 0x003D–0x0047 | Reserved | | Ignore |
| 0x0048 | TXDATA: Bulk TX data port | Register file → tx_bytes FIFO | Write path to ethernet |
| 0x0049–0x00FF | Reserved | | Ignore |
| 0x0100–0x0FFF | RX ring buffer | SPRAM (sync) | Read path from ethernet |

9. EXI Transaction Protocol

All BBA register accesses follow a strict two-phase (header + data) format.

Header encoding

Byte 0: [7]   write flag     1=write, 0=read
        [6:0] addr[12:6]     upper 7 bits of address

Byte 1: [7:2] addr[5:0]      lower 6 bits of address
        [1:0] xfer_len-1     0=1 byte, 1=2 bytes, 2=3 bytes, 3=4 bytes

CS is asserted (low) before byte 0 and remains low through the entire transaction including all data bytes. CS deasserts (high) after the last data byte.

Read transaction timing

CS  ─┐                                    ┌─
      └────────────────────────────────────┘
CLK   ┌┐┌┐┌┐┌┐┌┐┌┐┌┐┌┐  ┌┐┌┐┌┐┌┐┌┐┌┐┌┐┌┐  ┌┐┌┐...
      header byte 0      header byte 1      data byte 0...
MOSI  [addr+flags]        [addr+len]         [don't care]
MISO  [don't care]        [don't care]       [register data]

The register file must have data ready on MISO from the very first clock edge of the data phase. For register-file-backed reads (address < 0x100), the data is available immediately after header decode. For SPRAM-backed reads (address ≥ 0x100), the prefetch pipeline issues the SPRAM read request during the header phase so data is ready in time.

Write transaction timing

Identical header, then MOSI carries the write data. The FPGA samples MOSI on each rising CLK edge during the data phase and writes to the register.

ID query

On power-on the GC queries the device ID. The query is two 0x00 bytes written, then four bytes read. The BBA returns 0x04020200. Implement this as a special case: when address decodes to 0x0000 on a read with no prior NCRA reset, return the hardcoded ID.

Alternatively, read the Dolphin source for the exact byte sequence GC software uses to detect the BBA and replicate it faithfully.


10. BBA Register Reference

Key registers the GC driver accesses. Full register map in YAGCD §10.8.

| Addr | Name | R/W | Reset | Description |
|---|---|---|---|---|
| 0x00 | NCRA | R/W | 0x00 | Network Control A. [0]=RESET (self-clear), [2:1]=ST (TX start), [3]=SR (start receive), [6]=INTMODE (0=int active low) |
| 0x01 | NCRB | R/W | 0x00 | Network Control B |
| 0x04 | LTPS | R | 0x00 | Last TX packet status |
| 0x05 | LRPS | R | 0x00 | Last RX packet status |
| 0x08 | IMR | R/W | 0x00 | Interrupt mask. Bits match IR. Interrupt fires when IR & IMR != 0 |
| 0x09 | IR | R/W | 0x00 | Interrupt register. Write 1 to clear. [7]=RBFI, [4]=TEI, [2]=TI, [1]=RI |
| 0x0A–0x0B | BP | R/W | | Boundary page pointer |
| 0x0C–0x0D | TLBP | R/W | | TX low boundary page |
| 0x0E–0x0F | TWP | R/W | 0x00 | TX write page pointer |
| 0x12–0x13 | TRP | R/W | 0x00 | TX read page pointer |
| 0x16–0x17 | RWP | R | updates | RX write page pointer. Advances after each frame written |
| 0x18–0x19 | RRP | R/W | 0x01 | RX read page pointer. GC writes to advance after consuming frames |
| 0x1A–0x1B | RHBP | R/W | 0x0F | RX high boundary page (last valid page). Default 0x0F |
| 0x1C | EEPROM | R/W | | EEPROM bit-bang interface [3:0] = EECK, EECS, EEDI, EEDO |
| 0x20–0x25 | PAR0–5 | R/W | MAC | MAC address bytes 0–5. GC writes after reading EEPROM |
| 0x26–0x2D | MAR0–7 | R/W | 0xFF | Multicast hash table. 0xFF = accept all |
| 0x2E | ANALOG | R/W | | PHY analog control. GC writes 0xD6 to enable PHY |
| 0x30 | NWAYC | R/W | | Autoneg config. GC sets ANE + LTE bits |
| 0x31 | NWAYS | R | 0x17 | Autoneg status. Hardcode 0x17 = 100M full duplex link up |
| 0x32 | GCA | R/W | | GMAC config A. GC sets AUTOPUB bit |
| 0x33 | GCB | R/W | | GMAC config B |
| 0x34–0x37 | TWD | W | | TX write data (frame length, 2 bytes LE, then ignored) |
| 0x3A | HIPR | R | 0x01 | Host interface protocol version. Return 0x01 |
| 0x3B | NAFR | R/W | | Network address filter |
| 0x3C | NWBA | R/W | | Network write buffer address |
| 0x48 | TXDATA | W | | Bulk TX data port. GC streams frame bytes here |
| 0x100+ | RX buf | R | | RX ring buffer. GC reads frames from here |
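
The register map translates naturally into a constants module. The sketch below is one possible shape for it, assuming the `BBARegs` name used by the interrupt snippet in this document; `IRBits`, `RESET_VALUES`, `DEFAULT_IMR`, and `RX_BUFFER_BASE` are hypothetical names, and only addresses and reset values listed in the table are included.

```python
from enum import IntEnum, IntFlag


class BBARegs(IntEnum):
    """Register addresses taken from the table above (first byte of each range)."""
    NCRA = 0x00
    NCRB = 0x01
    LTPS = 0x04
    LRPS = 0x05
    IMR = 0x08
    IR = 0x09
    RWP = 0x16
    RRP = 0x18
    RHBP = 0x1A
    EEPROM = 0x1C
    PAR0 = 0x20
    MAR0 = 0x26
    ANALOG = 0x2E
    NWAYC = 0x30
    NWAYS = 0x31
    GCA = 0x32
    TWD = 0x34
    HIPR = 0x3A
    TXDATA = 0x48


class IRBits(IntFlag):
    """Interrupt register bit positions (the same layout applies to IMR)."""
    RI = 1 << 1
    TI = 1 << 2
    TEI = 1 << 4
    RBFI = 1 << 7


# Non-zero reset values listed in the table; everything else resets to 0x00 or is unspecified.
RESET_VALUES = {
    BBARegs.RRP: 0x01,
    BBARegs.RHBP: 0x0F,
    BBARegs.NWAYS: 0x17,
    BBARegs.HIPR: 0x01,
}

RX_BUFFER_BASE = 0x100  # addresses at or above this are SPRAM-backed

# The mask the GC typically writes to IMR.
DEFAULT_IMR = IRBits.RBFI | IRBits.TI | IRBits.RI
```

Keeping the bit names in an `IntFlag` makes it easy to confirm that the commonly written IMR value 0x86 is exactly RBFI, TI, and RI.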

11. Initialisation Sequence

This is the exact sequence Swiss/GC software executes. The register file must respond correctly to each step.

1.  Assert CS, write 0x0000 (2 bytes), read 4 bytes
    → Must return: 0x04 0x02 0x02 0x00  (device ID)

2.  Write 0x01 to NCRA (0x00)       — software reset
    → RESET bit self-clears next cycle
    → Pulse ncra_rst to sync domain (resets W5500, clears SPRAM pointers)

3.  Poll NCRA bit 0 until clear      — wait for reset complete
    → Return 0x00 from NCRA reads after self-clear

4.  Write 6 bytes to PAR0PAR5 (0x200x25)
    → Latch MAC address; forward to sync domain MAC filter shadow

5.  Write 8 bytes to MAR0MAR7 (0x260x2D)
    → Typically all 0xFF (accept all multicast hashes)

6.  Write 0xD6 to ANALOG (0x2E)     — enable PHY
    → Store in register file; no hardware effect in FPGA

7.  Write NWAYC (0x30): set bits for ANE + LTE
    → Store; no hardware effect

8.  Write IMR (0x08): typically 0x86 (RBFI | TI | RI)
    → Enables interrupts; INT line will now assert when frames arrive

9.  Write GCA (0x32): set AUTOPUB bit
    → Store; AUTOPUB means RWP auto-updates — we always do this anyway

10. Write NCRA (0x00): set SR bit (0x08) — start receive
    → Enable RX path; the RXFrameAssembler should begin accepting frames

11. Poll NWAYS (0x31) until link up
    → Return hardcoded 0x17 immediately
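
The register-file responses this sequence depends on can be prototyped before any HDL exists. The following is a behavioural sketch only, with hypothetical names (`RegisterFileModel`, `run_init`); it skips the ID query in step 1 and omits steps 7 and 9 because the NWAYC and GCA bit positions are not given in this section.

```python
class RegisterFileModel:
    """Behavioural stand-in for the register-file responses used during init."""
    NCRA, IMR, PAR0, MAR0, ANALOG, NWAYS = 0x00, 0x08, 0x20, 0x26, 0x2E, 0x31

    def __init__(self):
        self.regs = bytearray(0x100)
        self.regs[self.NWAYS] = 0x17      # hardcoded link-up status
        self.reset_pulses = 0             # counts ncra_rst pulses sent to the sync domain

    def write(self, addr, value):
        if addr == self.NWAYS:
            return                        # read-only register: ignore writes
        if addr == self.NCRA and value & 0x01:
            self.reset_pulses += 1        # stands in for the ncra_rst pulse
            value &= 0xFE                 # RESET bit self-clears
        self.regs[addr] = value & 0xFF

    def read(self, addr):
        return self.regs[addr]


def run_init(rf, mac):
    """Replay steps 2-6, 8, 10 and 11; returns the NWAYS value the GC would see."""
    rf.write(rf.NCRA, 0x01)                   # step 2: software reset
    for _ in range(100):                      # step 3: bounded poll for RESET clear
        if not rf.read(rf.NCRA) & 0x01:
            break
    for i, byte in enumerate(mac):            # step 4: PAR0..PAR5
        rf.write(rf.PAR0 + i, byte)
    for i in range(8):                        # step 5: MAR0..MAR7
        rf.write(rf.MAR0 + i, 0xFF)
    rf.write(rf.ANALOG, 0xD6)                 # step 6: stored, no hardware effect
    rf.write(rf.IMR, 0x86)                    # step 8: RBFI | TI | RI
    rf.write(rf.NCRA, 0x08)                   # step 10: SR bit, start receive
    return rf.read(rf.NWAYS)                  # step 11: link status


model = RegisterFileModel()
link_status = run_init(model, bytes.fromhex("0009bf000001"))
```

The same transaction list can later drive the GCMasterModel in simulation, so the expected responses only need to be written down once.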

12. RX Data Path — Detailed Flow

W5500 receives frame on wire
        │
        ▼
W5500SPIMaster detects S0_IR[RECV] (via INT_N pin)
Reads frame length from S0_RX_RSR (Socket 0 RX Received Size, offset 0x0026 in the Socket 0 register block)
Reads frame bytes from Socket 0 RX buffer (BSB=0b00011)
Pulses rx_sof, streams rx_data bytes, pulses rx_eof
        │
        ▼ (sync domain)
RXFrameAssembler
  - Checks destination MAC vs PAR shadow
  - Checks NCRA SR bit is set (RX enabled)
  - Computes pages_needed
  - Checks ring buffer not full (RWP+pages != RRP)
  - Writes descriptor + frame data into SPRAM via SPRAMArbiter
  - Advances RWP (local register in sync domain)
  - Pushes new RWP value into rx_wptr AsyncFIFO (sync→exi)
  - Pulses rx_irq PulseSynchronizer (sync→exi)
        │
        ▼ AsyncFIFO / PulseSynchronizer crossing
        │ (exi domain)
BBARegisterFile
  - Pops new RWP from rx_wptr FIFO, updates RWP register
  - rx_irq pulse arrives: sets IR[1] (RI bit)
  - IR & IMR now non-zero: asserts exi_int_n (INT low to GC)
        │
        ▼ (GC CPU, driven by interrupt or polling)
GC reads IR register: sees RI=1
GC reads RWP (0x16): gets updated pointer
GC reads frame from 0x100+RRP (bulk read, up to 1500+ bytes)
  → BBARegisterFile issues SPRAM read requests via spram_req FIFO (exi→sync)
  → SPRAMArbiter services reads from SPRAM
  → Results flow back via spram_rsp FIFO (sync→exi)
  → Prefetch pipeline keeps data ready for SPI bit engine
GC writes new RRP (0x18) to advance past consumed pages
  → BBARegisterFile pushes RRP update into rx_rptr FIFO (exi→sync)
  → RXFrameAssembler updates its local RRP shadow
GC writes IR register with RI=1 (write-1-to-clear)
  → IR[1] clears, INT line deasserts
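
The page arithmetic in the RXFrameAssembler steps can be checked with a short model. This sketch makes several assumptions that are not stated in this section: a 4-byte descriptor in front of each frame, a ring that wraps from RHBP (0x0F) back to page 0x01, and a convention where RWP == RRP means "empty", so one page is always left unused. All function names are hypothetical.

```python
PAGE_SIZE = 256
FIRST_PAGE = 0x01                            # RRP reset value
LAST_PAGE = 0x0F                             # RHBP default
NUM_PAGES = LAST_PAGE - FIRST_PAGE + 1       # 15 pages
DESCRIPTOR_BYTES = 4                         # assumption: per-frame descriptor size


def pages_needed(frame_len, descriptor_bytes=DESCRIPTOR_BYTES):
    """Ceiling division of descriptor + frame bytes by the page size."""
    return -(-(frame_len + descriptor_bytes) // PAGE_SIZE)


def advance(page, count):
    """Move a page pointer forward, wrapping from LAST_PAGE to FIRST_PAGE."""
    return (page - FIRST_PAGE + count) % NUM_PAGES + FIRST_PAGE


def free_pages(rwp, rrp):
    """Pages that can still be written without RWP catching up with RRP."""
    used = (rwp - rrp) % NUM_PAGES
    return NUM_PAGES - 1 - used


def try_accept(rwp, rrp, frame_len):
    """Return (new_rwp, accepted); a frame that does not fit leaves RWP unchanged."""
    need = pages_needed(frame_len)
    if need > free_pages(rwp, rrp):
        return rwp, False                    # ring full: drop frame, raise RBFI
    return advance(rwp, need), True
```

Under these assumptions a 1514-byte frame occupies 6 pages, so an empty ring starting at page 0x01 ends with RWP at 0x07.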

13. TX Data Path — Detailed Flow

GC CPU constructs ethernet frame in GC RAM
        │
        ▼ (GC CPU → EXI)
GC writes 2-byte length to TWD register (0x34)
GC writes frame bytes to TXDATA register (0x48) in chunks
  → BBARegisterFile: each written byte goes into tx_bytes AsyncFIFO (exi→sync)
GC writes NCRA with ST1:ST0 = 01 (transmit trigger)
  → BBARegisterFile pushes frame_length into tx_ctrl AsyncFIFO (exi→sync)
        │
        ▼ AsyncFIFO crossing
        │ (sync domain)
TXFrameDrain
  - Pops frame_length from tx_ctrl
  - Pops frame_length bytes from tx_bytes
  - Forwards to W5500SPIMaster with SOF/EOF
        │
        ▼ (sync domain)
W5500SPIMaster
  - Reads S0_TX_FSR (TX Free Size Register, offset 0x0020) to confirm the frame fits
  - Writes frame bytes into Socket 0 TX buffer (BSB=0b00010), then advances S0_TX_WR (offset 0x0024) by the frame length
  - Writes SEND command (0x20) to S0_CR (offset 0x0001)
  - Polls S0_IR until SEND_OK bit set
  - Clears S0_IR[SEND_OK]
  - Pulses tx_irq PulseSynchronizer (sync→exi)
        │
        ▼ PulseSynchronizer crossing
        │ (exi domain)
BBARegisterFile
  - tx_irq arrives: sets IR[2] (TI bit), clears NCRA ST1:ST0
  - If IMR[2] set: INT asserts to GC
        │
        ▼ (GC CPU)
GC reads IR, sees TI=1
GC writes IR with TI=1 to clear
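
The handshake between the tx_ctrl and tx_bytes FIFOs is simple enough to model with two queues. The sketch below is an illustration, not the TXFrameDrain implementation: `drain_frames` is a hypothetical name, and it assumes the drain waits until every byte of a frame is available before forwarding it.

```python
from collections import deque


def drain_frames(tx_ctrl, tx_bytes):
    """Pop complete frames: one length from tx_ctrl, then that many bytes from tx_bytes."""
    frames = []
    while tx_ctrl and len(tx_bytes) >= tx_ctrl[0]:
        length = tx_ctrl.popleft()
        frames.append(bytes(tx_bytes.popleft() for _ in range(length)))
    return frames


# The GC streams six bytes to TXDATA, then triggers a 4-byte frame via NCRA ST.
tx_bytes = deque(b"abcdef")
tx_ctrl = deque([4])
sent = drain_frames(tx_ctrl, tx_bytes)
```

Because the length is pushed only when the GC writes the ST bits, the byte FIFO always holds the whole frame by the time the drain sees its length, which is what lets the sync side emit SOF/EOF without extra framing signals.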

14. SPRAM Layout

The iCE40UP5K has 4 × 32 KB SPRAM banks (128 KB total). Map them as:

| SPRAM region | Size | Usage |
|---|---|---|
| 0x0000–0x00FF | 256 B | Reserved (address 0x00 page not used by ring buffer) |
| 0x0100–0x0FFF | 3840 B | RX ring buffer (15 × 256-byte pages, pages 0x01–0x0F) |
| 0x1000–0x17FF | 2048 B | TX frame staging buffer |
| 0x1800–0x1FFF | 2048 B | Reserved / future use |

The ring buffer uses pages 0x010x0F (15 pages × 256 bytes = 3840 bytes). This matches the MX98730EC default RHBP (RX High Boundary Page) value of 0x0F and RRP reset value of 0x01.

SPRAM addressing: each iCE40UP5K SB_SPRAM256KA instance is 16K × 16-bit (32 KB), giving 128 KB across the 4 instances. To address the ring buffer region as bytes:

  • Byte address 0x0100 maps to SPRAM word address 0x0080 (byte 0x0100 >> 1)
  • The arbiter converts byte addresses to word addresses and uses MASKWREN for byte selection
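
The byte-to-word conversion can be written down as a pair of small functions. This is a sketch under two assumptions: each MASKWREN bit enables one 4-bit nibble of the 16-bit word, and even byte addresses land in the low byte (little-endian placement). The function names are hypothetical.

```python
def spram_byte_write(byte_addr, value):
    """Return (word_addr, datain, maskwren) for a single-byte SPRAM write."""
    word_addr = byte_addr >> 1
    if byte_addr & 1:
        # Odd byte address: upper byte, enable nibbles 2 and 3.
        return word_addr, (value & 0xFF) << 8, 0b1100
    # Even byte address: lower byte, enable nibbles 0 and 1.
    return word_addr, value & 0xFF, 0b0011


def spram_byte_read(byte_addr, word):
    """Select the addressed byte out of a 16-bit DATAOUT word."""
    return (word >> 8) & 0xFF if byte_addr & 1 else word & 0xFF
```

With this mapping, byte address 0x0100 becomes word 0x0080 with mask 0b0011, and byte address 0x0101 becomes the same word with mask 0b1100.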

15. Critical Timing Constraints

Must-meet timing in exi domain (96 MHz → 10.4 ns period)

| Path | Budget | Notes |
|---|---|---|
| FFSynchronizer output → edge detect flip-flop | 1 cycle = 10.4 ns | Trivially met; just a register |
| Edge detect → shift register update | 1 cycle | Register-to-register, no logic |
| rx_valid → header decode → spram_req FIFO write | 2 cycles | Address decode is combinatorial MUX; must close at 96 MHz |
| tx_load → tx_byte driven from register file | 1 cycle | regs[addr] array lookup; critical path, keep address decode combinatorial depth ≤ 4 LUTs |
| tx_load → tx_byte driven from prefetch buffer | 1 cycle | Just a register read; trivial |

Must-meet timing in sync domain (48 MHz → 20.8 ns period)

| Path | Budget | Notes |
|---|---|---|
| SPRAM read request → SPRAM address valid | 1 cycle | AsyncFIFO read + mux; easy |
| SPRAM DATAOUT → result FIFO write | 1 cycle | Register-to-FIFO; easy |
| W5500 SPI bit engine | N/A | Clock-enable based at 24 MHz effective; no hard timing |

Cross-domain latency budget for SPRAM prefetch

EXI header phase duration: 16 exi clocks at 96 MHz = 167 ns

SPRAM prefetch round trip:
  exi → spram_req FIFO write:         1 exi  tick  = 10 ns
  FIFO cross-domain:                  2 sync ticks  = 42 ns
  SPRAM read (1 cycle latency):       1 sync tick   = 21 ns
  Result → spram_rsp FIFO write:      1 sync tick   = 21 ns
  FIFO cross-domain:                  2 exi  ticks  = 21 ns
  Result available in prefetch buffer:               = 21 ns
  Total:                                            ~136 ns

136 ns < 167 ns header window → prefetch completes before first data bit needed ✓

This is the tightest timing consideration in the design. The prefetch must be issued during HEADER1 (not after) to make the deadline.
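
The budget above is easy to recompute whenever a clock frequency or FIFO depth changes. The sketch below reproduces it with unrounded periods; it assumes the final "result available in prefetch buffer" step costs 2 exi ticks (the budget lists only its 21 ns duration), and the stage labels are paraphrased.

```python
EXI_NS = 1e9 / 96e6     # exi domain period, about 10.4 ns
SYNC_NS = 1e9 / 48e6    # sync domain period, about 20.8 ns

# (stage, ticks, period of the domain the ticks are counted in)
STAGES = [
    ("exi -> spram_req FIFO write", 1, EXI_NS),
    ("spram_req FIFO crossing", 2, SYNC_NS),
    ("SPRAM read", 1, SYNC_NS),
    ("result -> spram_rsp FIFO write", 1, SYNC_NS),
    ("spram_rsp FIFO crossing", 2, EXI_NS),
    ("prefetch buffer load (assumed 2 exi ticks)", 2, EXI_NS),
]

total_ns = sum(ticks * period for _, ticks, period in STAGES)
header_ns = 16 * EXI_NS             # header window as defined above
slack_ns = header_ns - total_ns

for name, ticks, period in STAGES:
    print(f"{name:45s} {ticks * period:6.1f} ns")
print(f"total {total_ns:.1f} ns, window {header_ns:.1f} ns, slack {slack_ns:.1f} ns")
```

With unrounded periods the round trip comes to about 135.4 ns against a 166.7 ns window, in line with the ~136 ns figure above and leaving roughly 31 ns of slack.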


16. SPRAM Read Prefetch Pipeline

The prefetch pipeline ensures MISO data is always ready before the SPI slave needs it for the data phase.

State machine (in BBARegisterFile, exi domain)

State HEADER1 (decoding second header byte):
  If is_read AND address >= 0x100:
    push address into spram_req AsyncFIFO  ← issued NOW, during header decode
    set prefetch_pending = True

State DATA (read phase):
  On each tx_load pulse:
    If prefetch_pending AND spram_rsp FIFO has data:
      pop byte from spram_rsp FIFO
      load into tx_byte
      push (address + byte_ctr + 1) into spram_req for NEXT byte  ← pipelining
    Elif address < 0x100:
      tx_byte = regs[address + byte_ctr]  ← direct register file read

Pipeline depth

The spram_req and spram_rsp FIFOs each have depth 4. This allows up to 4 read requests to be in-flight simultaneously, which absorbs any SPRAM arbiter stalls (ETH write winning the arbitration) without stalling the SPI data phase.

SPRAM arbiter stall handling

If the SPRAM arbiter defers an EXI read by 1 cycle (due to ETH write priority), the spram_rsp FIFO will be momentarily empty when tx_load arrives. The BBARegisterFile must stall the SPI slave in this case.

However: the SPI slave cannot be stalled mid-bit. The stall mechanism must work at byte boundaries only — i.e., after a complete byte has been transmitted, hold MISO at 0 (or 1) and do not toggle until the next byte is ready. Since the GC is the SPI master and controls CLK, it will simply clock in garbage on the retry byte.

Practical note: At 48 MHz sync with 24 MHz effective W5500 access rate, the ETH write path can only consume the SPRAM arbiter for ~1 sync cycle per byte written. The EXI read path gets the remaining cycles. With 4-deep FIFOs the pipeline should almost never stall in practice. Monitor the stall condition in simulation.
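
The state machine and stall behaviour described in this section can be prototyped in plain Python before writing the Amaranth version. This is a behavioural sketch with hypothetical names (`PrefetchModel`, `arbiter_tick`); it models the FIFOs as queues, treats one `arbiter_tick` call as one granted SPRAM read, and returns `None` from `tx_load` to represent a stall.

```python
from collections import deque


class PrefetchModel:
    """Queue-based model of the HEADER1/DATA prefetch logic."""

    def __init__(self, spram, regs, depth=4):
        self.spram = spram              # dict: byte address -> byte
        self.regs = regs                # dict: register address -> byte
        self.depth = depth              # spram_rsp FIFO depth
        self.req = deque()              # spram_req FIFO (exi -> sync)
        self.rsp = deque()              # spram_rsp FIFO (sync -> exi)
        self.addr = 0
        self.byte_ctr = 0
        self.prefetch_pending = False

    def header1(self, is_read, addr):
        """Second header byte decoded: issue the first SPRAM request immediately."""
        self.addr, self.byte_ctr = addr, 0
        self.prefetch_pending = is_read and addr >= 0x100
        if self.prefetch_pending:
            self.req.append(addr)

    def arbiter_tick(self):
        """One granted SPRAM read: move a request into a response."""
        if self.req and len(self.rsp) < self.depth:
            self.rsp.append(self.spram[self.req.popleft()])

    def tx_load(self):
        """Byte boundary on the SPI side: return the next MISO byte or None on stall."""
        if self.prefetch_pending:
            if not self.rsp:
                return None             # response not back yet: stall condition
            byte = self.rsp.popleft()
            self.req.append(self.addr + self.byte_ctr + 1)   # prefetch next byte
        else:
            byte = self.regs[self.addr + self.byte_ctr]      # direct register read
        self.byte_ctr += 1
        return byte
```

Counting how often `tx_load` returns `None` under different arbiter grant patterns gives a quick estimate of the stall rate to compare against the simulation results.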


17. Interrupt Handling

The exi_int_n output (pin 3 of SP1) is active-low. Assert it (drive low) when IR & IMR != 0.

# In BBARegisterFile, exi domain:
ir_masked = Signal(8)
m.d.comb += ir_masked.eq(regs[BBARegs.IR] & regs[BBARegs.IMR])
m.d.exi += exi_int_n.eq(~ir_masked.any())

Register the output — do not drive exi_int_n combinatorially. A registered output prevents glitches from propagating onto the GC board.

Interrupt sources and IR bit assignments:

| IR bit | Name | Set by | Cleared by |
|---|---|---|---|
| 7 | RBFI | RXFrameAssembler when ring full | GC write-1-to-clear |
| 4 | TEI | TXFrameDrain on TX error | GC write-1-to-clear |
| 2 | TI | tx_irq pulse from sync | GC write-1-to-clear |
| 1 | RI | rx_irq pulse from sync | GC write-1-to-clear |

The GC typically masks in IMR: 0x86 = 0b10000110 (RBFI | TI | RI).
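
The write-1-to-clear behaviour and the active-low output can be captured in a few lines for use as a reference model in tests. This is a sketch with hypothetical names (`InterruptModel`, `raise_source`, `write_ir`); it mirrors the rule that the line is low whenever IR & IMR is non-zero.

```python
RI, TI, TEI, RBFI = 0x02, 0x04, 0x10, 0x80   # IR bit masks from the table above


class InterruptModel:
    """Reference model for IR, IMR and the active-low exi_int_n line."""

    def __init__(self):
        self.ir = 0
        self.imr = 0

    def raise_source(self, mask):
        """An interrupt source (rx_irq, tx_irq, ...) sets its IR bit."""
        self.ir |= mask

    def write_ir(self, value):
        """GC write to IR: every 1 bit clears the corresponding flag."""
        self.ir &= ~value & 0xFF

    @property
    def exi_int_n(self):
        """0 (asserted) while any unmasked interrupt is pending, else 1."""
        return 0 if self.ir & self.imr else 1
```

A masked source such as TEI with IMR = 0x86 sets its IR bit but leaves the line high, which is worth checking explicitly in the register-file unit test.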


18. EEPROM / MAC Address

The GC software reads the MAC address from the 93C46 EEPROM during initialisation (bit-banging through register 0x1C). It then writes the MAC to PAR0PAR5.

Recommended approach for initial implementation:

Skip full 93C46 emulation. Pre-populate regs[0x1C] with a pattern that makes the EEPROM read return a valid MAC. Use Nintendo's OUI 00:09:BF for the first 3 bytes and an arbitrary device-specific value for the last 3:

MAC: 00:09:BF:00:00:01

Verify against Swiss source whether it validates the MAC read from EEPROM or accepts whatever PAR0PAR5 contains. If it re-reads EEPROM after writing PAR, a full 93C46 model is required. If it only uses PAR0PAR5, pre-populating the register file is sufficient.

MAC address propagation:

When the GC writes PAR0PAR5, forward the new MAC to the W5500 SHAR register via the sync domain. Use a 6-byte AsyncFIFO or a dedicated MAC update pulse. The W5500 uses SHAR as its source MAC for all transmitted frames.


19. iCE40UP5K Resource Budget

| Resource | Available | Estimated use | Margin |
|---|---|---|---|
| Logic cells (4-LUT + FF) | 5280 | ~1800 | 66% free |
| EBR (4 Kbit blocks) | 30 (120 Kbit) | 4 (FIFOs) | 26 free |
| SPRAM (32 KB banks) | 4 (128 KB) | 1 bank for ring buffer | 3 free |
| PLL | 1 | 1 (for exi domain) | 0 free |
| SB_HFOSC | 1 | 1 (sync domain) | 0 free |
| I/O pins | 39 usable | ~14 (EXI:5 + W5500:6 + misc:3) | 25 free |

Logic cell breakdown:

| Module | Estimated cells |
|---|---|
| SPIMode3Slave | 90 |
| BBARegisterFile FSM + decode | 250 |
| Register file (512 × 8b) | ~200 (distributed RAM) |
| AsyncFIFO × 8 | 400 |
| PulseSynchronizer × 4 | 40 |
| FFSynchronizer × 5 | 30 |
| SPRAMArbiter | 80 |
| RXFrameAssembler | 200 |
| TXFrameDrain | 150 |
| W5500SPIMaster | 200 |
| EEPROMModel | 100 |
| Misc glue | 60 |
| Total | ~1800 |

iCE40UP5K fmax with nextpnr: typically 6080 MHz for logic of this complexity. The exi domain at 96 MHz is the tightest. If nextpnr fails to close timing:

  1. First option: reduce to 64 MHz exi domain (icepll alternative).
  2. Second option: reduce EXI bus speed in Swiss settings to 16 MHz (clock index 4 instead of 5), halving the FPGA timing requirement.
  3. Third option: add pipeline registers on the critical address decode path.

20. PCB / Connector Notes

Interposer PCB

A simple pass-through interposer PCB connects the GC SP1 slot to the iCEbreaker via a ribbon cable or header.

Required PCB spec:

  • Thickness: 1.2 mm (not standard 1.6 mm — critical for fit)
  • Copper finish: ENIG (gold) — prevents oxidation on edge contacts
  • Board material: FR4 standard

Footprint source: Copy the edge connector footprint from github.com/silverstee1/SP1ETH KiCad files. Do not design from scratch. The staggered dual-row geometry requires exact pad positions that have been physically verified. Cross-reference with the ETH2SP1 LaserBear open files.

Additional interposer components:

  • 10 kΩ resistor: EXTIN (pin 1) to 3.3V (pin 7) — device detect
  • 100 µF capacitor: 3.3V to GND — bulk decoupling near connector
  • 100 nF capacitor × 2: additional HF decoupling
  • ESD protection diode array: on CLK, MOSI, MISO, CS lines (optional but recommended — the GC motherboard is difficult to repair if damaged)

Do not connect pin 5 (12V) to anything on the FPGA side.

iCEbreaker connection

The interposer PCB exposes EXI signals on a 2.54 mm pitch 8-pin header. Connect to iCEbreaker PMOD1 connector using a short ribbon cable. Keep the cable as short as possible (< 10 cm) to minimize signal integrity issues at 32 MHz.


21. Known Hardware Quirks

EXI DMA bug

The GC's EXI DMA engine has a bug where data on the MISO line during a DMA write is clocked back out with a 1-bit shift. This only affects GC software doing DMA writes (rare). Swiss uses IMM (immediate) mode transfers. No FPGA workaround needed.

SPI Mode 3 vs Mode 0

Every other EXI device (memory cards, RTC, IPL) uses SPI Mode 0. The BBA is the only device using Mode 3. Do not share the SPI slave implementation with other EXI device implementations without parameterising CPOL/CPHA.

MISO tristate

On real hardware, MISO (DO) is tristated when CS is deasserted. Other EXI devices on the same bus would otherwise conflict. On this FPGA implementation, drive MISO high (not tristated) when CS is deasserted. The iCE40UP5K does not easily support pin tristate from user logic — drive high is safe because the BBA occupies a dedicated CS line (SP1 device 2) separate from memory cards and the RTC.

GC hardware revisions

  • DOL-001 (original): SP1 present, BBA compatible
  • DOL-001 Rev B: SP1 physically absent on motherboard but case hole present
  • DOL-101 (later): SP1 present again (but Serial Port 2 absent)
  • Panasonic Q: SP1 present

Swiss supports all revisions with SP1 via the EXI hypervisor driver (required from Swiss build 1788 onwards for BBA emulation features).

EXI clock index

The real BBA uses clock index 5 (32 MHz). Swiss allows configuring a lower clock index for compatibility. If 96 MHz fmax is not achievable, instruct users to configure Swiss to use clock index 4 (16 MHz EXI), which halves the requirement to a 48 MHz exi domain (the same 3× ratio as 96 MHz for a 32 MHz bus) and is trivially achievable.


22. File Structure

gc_bba_fpga/
├── exi_bba/
│   ├── __init__.py
│   ├── spi_mode3_slave.py       # SPIMode3Slave
│   ├── bba_register_file.py     # BBARegisterFile + register constants
│   ├── spram_arbiter.py         # SPRAMArbiter
│   ├── rx_frame_assembler.py    # RXFrameAssembler
│   ├── tx_frame_drain.py        # TXFrameDrain
│   ├── w5500_spi_master.py      # W5500SPIMaster
│   ├── eeprom_model.py          # EEPROMModel (93C46)
│   └── bba_top.py               # BBATop + clock domain setup
├── sim/
│   ├── sim_spi_slave.py         # SPIMode3Slave unit test
│   ├── sim_register_file.py     # BBARegisterFile unit test
│   ├── sim_bba_init.py          # Full init sequence simulation
│   ├── sim_rx_path.py           # RX data path end-to-end test
│   ├── sim_tx_path.py           # TX data path end-to-end test
│   ├── gc_master_model.py       # GC CPU SPI master simulation model
│   ├── w5500_slave_model.py     # W5500 SPI slave simulation model
│   └── ethernet_frame_gen.py    # Test frame generator
├── platform/
│   ├── icebreaker_bba.py        # iCEbreaker platform with BBA resources
│   └── interposer_pinmap.py     # SP1 ↔ PMOD pin mapping
├── pcb/
│   ├── interposer/              # KiCad project for interposer PCB
│   └── README.md                # PCB ordering instructions (1.2mm, ENIG)
├── constraints/
│   └── timing.py                # nextpnr timing constraints (if needed)
├── tests/
│   └── test_bba.py              # pytest suite
├── build.py                     # Amaranth build script
└── README.md

23. Simulation Strategy

Each module should have a standalone simulation before integration. All simulations use Amaranth's Simulator with two clock domains: sim.add_clock(1/96e6, domain="exi") and sim.add_clock(1/48e6, domain="sync").

Unit tests

SPIMode3Slave: Drive CLK/MOSI/CS manually from a process in the exi domain. Verify rx_byte/rx_valid match sent data. Verify spi_miso matches pre-loaded tx_byte. Test CS abort mid-byte.
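
Before wiring a stimulus process into the Amaranth Simulator, the Mode 3 edge relationships can be checked with a pure-Python waveform model. This sketch is not the GCMasterModel from the file structure: `mode3_master` and `mode3_slave_capture` are hypothetical names, CS handling is left out, and MSB-first bit order is assumed.

```python
def mode3_master(data):
    """Return (clk, mosi) samples: clock idles high, two samples per bit, MSB first."""
    samples = [(1, 0)]                       # idle state, CPOL=1
    for byte in data:
        for bit in range(7, -1, -1):
            mosi = (byte >> bit) & 1
            samples.append((0, mosi))        # falling edge: master shifts data out
            samples.append((1, mosi))        # rising edge: slave samples data
    return samples


def mode3_slave_capture(samples):
    """Rebuild bytes by sampling MOSI on every rising CLK edge."""
    out = bytearray()
    shift, nbits, prev_clk = 0, 0, 1
    for clk, mosi in samples:
        if prev_clk == 0 and clk == 1:       # rising edge detected
            shift = ((shift << 1) | mosi) & 0xFF
            nbits += 1
            if nbits == 8:
                out.append(shift)
                shift, nbits = 0, 0
        prev_clk = clk
    return bytes(out)
```

The sample list can be replayed one tuple per simulated half-period into the SPIMode3Slave inputs, and the captured bytes compared against rx_byte/rx_valid.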

BBARegisterFile: Use a GCMasterModel (SPI Mode 3 master process) to perform read/write transactions. Verify register writes are stored. Verify register reads return correct values. Verify IR bit setting and clearing. Verify NWAYS returns 0x17. Verify ID query returns 0x04020200.

SPRAMArbiter: Issue concurrent EXI reads and ETH writes. Verify ETH writes win arbitration. Verify EXI reads complete within 3 sync cycles. Verify no data corruption.

RXFrameAssembler: Feed a known ethernet frame byte-by-byte. Verify SPRAM contents match expected descriptor + frame layout. Verify RWP advances by correct page count. Verify rx_irq fires.

TXFrameDrain + W5500SPIMaster: Issue TX frame from tx_bytes FIFO. Use W5500SlaveModel process to simulate W5500 responses. Verify frame bytes arrive at W5500 correctly. Verify tx_irq fires after SEND_OK.

Integration test

sim_bba_init.py: Full GC init sequence (all 11 steps from Section 11). GCMasterModel performs every transaction. Verify no stalls, correct responses.

sim_rx_path.py: W5500SlaveModel delivers a 64-byte test frame. GCMasterModel polls IR, reads RWP, bulk-reads the frame, advances RRP. Verify GC receives identical bytes to what W5500 sent.

sim_tx_path.py: GCMasterModel writes a 64-byte frame through TXDATA. W5500SlaveModel captures it. Verify W5500 receives identical bytes.


24. Open Issues and Extension Points

Must resolve before first synthesis

  • Exact PLL parameters for iCE40UP5K: run icepll -i 12 -o 96 and confirm the output is achievable (VCO in 5331066 MHz range).
  • SP1 connector footprint: clone SP1ETH repo, extract pad positions, verify stagger geometry and pitch before PCB layout.
  • W5500 Pmod module pin mapping: confirm which Pmod pins INT_N and RST_N appear on (varies by module vendor).
  • Swiss version requirement: confirm Swiss build ≥ 1788 for BBA hypervisor support. Earlier builds use a different driver that may have different register access patterns.
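
As a cross-check on the PLL item, the parameter search can be approximated in a few lines. This sketch assumes the PLL's simple feedback mode, where f_out = f_in × (DIVF + 1) / ((DIVR + 1) × 2^DIVQ), a phase-detector input range of 10 to 133 MHz, and the 533–1066 MHz VCO range quoted above. `find_pll` is a hypothetical helper and does not replace running icepll itself.

```python
def find_pll(f_in_mhz, f_out_mhz):
    """Return the closest legal divider setting as a dict, or None if none exists."""
    best = None
    for divr in range(16):                        # 4-bit reference divider
        f_pfd = f_in_mhz / (divr + 1)
        if not 10 <= f_pfd <= 133:                # assumed phase-detector range
            continue
        for divf in range(128):                   # 7-bit feedback divider
            f_vco = f_pfd * (divf + 1)
            if not 533 <= f_vco <= 1066:          # VCO range quoted above
                continue
            for divq in range(1, 7):              # output divider 2**1 .. 2**6
                f_out = f_vco / (1 << divq)
                candidate = (abs(f_out - f_out_mhz), divr, divf, divq, f_vco, f_out)
                if best is None or candidate < best:
                    best = candidate
    if best is None:
        return None
    _, divr, divf, divq, f_vco, f_out = best
    return {"divr": divr, "divf": divf, "divq": divq, "f_vco": f_vco, "f_out": f_out}


pll = find_pll(12, 96)
```

Under these assumptions a 12 MHz reference reaches exactly 96 MHz with DIVR=0, DIVF=63, DIVQ=3, which puts the VCO at 768 MHz, inside the allowed range. The values reported by icepll should still be treated as authoritative.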

Known limitations

  • Single TX buffer (MX98730EC has two). ST1:ST0 = 01 and 10 are treated identically. No known GC title relies on dual TX buffering.
  • No DMA mode support. IMM mode only. Matches real-world Swiss usage.
  • No Serial Port 2 support (different connector, different project scope).
  • 93C46 EEPROM emulation is simplified (hardcoded MAC). A full bit-bang model can be added later if Swiss requires it.
  • RX ring buffer is 15 pages (3840 bytes). The real BBA has 4KB. Frames larger than ~3800 bytes (jumbo frames) will be dropped. Standard 1500-byte MTU frames fit in at most 7 pages — no practical issue.

Extension points

  • Larger ring buffer: Use additional SPRAM banks for more RX buffering.
  • Multiple sockets: W5500 supports 8 sockets; only socket 0 in MACRAW mode is used here.
  • Link status passthrough: Read W5500 PHYCFGR register and forward real link status to NWAYS instead of hardcoding 0x17.
  • Statistics counters: LTPS/LRPS (last packet status) are currently 0x00. A more complete implementation would fill these from W5500 socket status.
  • Serial Port 2 support: Different physical connector and EXI channel but same FPGA logic; would require a second interposer PCB.