# GameCube BBA FPGA Replacement — Design Document

**Target hardware:** iCEbreaker (Lattice iCE40UP5K)

**Target language:** Amaranth HDL (Python)

**Toolchain:** Yosys + nextpnr-ice40 + IceStorm

**Purpose:** Replace the Nintendo GameCube Broadband Adapter (DOL-015) with an FPGA-based implementation. The FPGA exposes a W5500 100BASE-TX ethernet chip to the GC over the EXI (Expansion Interface) serial bus, which enables game ISO streaming via Swiss homebrew.

---

## Table of Contents

1. [System Overview](#1-system-overview)
2. [Protocol References](#2-protocol-references)
3. [Physical Interface — SP1 Edge Connector](#3-physical-interface--sp1-edge-connector)
4. [Clock Domains](#4-clock-domains)
5. [Clock Domain Crossing Strategy](#5-clock-domain-crossing-strategy)
6. [Module Hierarchy](#6-module-hierarchy)
7. [Module Specifications](#7-module-specifications)
   - 7.1 [SPIMode3Slave](#71-spimode3slave)
   - 7.2 [BBARegisterFile](#72-bbaregisterfile)
   - 7.3 [SPRAMArbiter](#73-spramarbiter)
   - 7.4 [RXFrameAssembler](#74-rxframeassembler)
   - 7.5 [TXFrameDrain](#75-txframedrain)
   - 7.6 [W5500SPIMaster](#76-w5500spimaster)
   - 7.7 [EEPROMModel](#77-eeprommodel)
   - 7.8 [BBATop](#78-bbatop)
8. [Memory Map](#8-memory-map)
9. [EXI Transaction Protocol](#9-exi-transaction-protocol)
10. [BBA Register Reference](#10-bba-register-reference)
11. [Initialisation Sequence](#11-initialisation-sequence)
12. [RX Data Path — Detailed Flow](#12-rx-data-path--detailed-flow)
13. [TX Data Path — Detailed Flow](#13-tx-data-path--detailed-flow)
14. [SPRAM Layout](#14-spram-layout)
15. [Critical Timing Constraints](#15-critical-timing-constraints)
16. [SPRAM Read Prefetch Pipeline](#16-spram-read-prefetch-pipeline)
17. [Interrupt Handling](#17-interrupt-handling)
18. [EEPROM / MAC Address](#18-eeprom--mac-address)
19. [iCE40UP5K Resource Budget](#19-ice40up5k-resource-budget)
20. [PCB / Connector Notes](#20-pcb--connector-notes)
21. [Known Hardware Quirks](#21-known-hardware-quirks)
22. [File Structure](#22-file-structure)
23. [Simulation Strategy](#23-simulation-strategy)
24. [Open Issues and Extension Points](#24-open-issues-and-extension-points)

---

## 1. System Overview

The GameCube Broadband Adapter (BBA) is a hardware peripheral that plugs into Serial Port 1 (SP1) on the underside of the GameCube. It presents a network interface to the GC CPU using a Macronix MX98730EC custom IC. GC software (primarily Swiss homebrew) communicates with the BBA through a memory-mapped register interface accessed over the EXI serial bus.

This project replaces the MX98730EC with an iCEbreaker FPGA that emulates its register interface and connects to a W5500 ethernet chip (on a Pmod-compatible module) for the actual network communication.

### High-level data flow

```
GameCube CPU
    │  EXI (SPI Mode 3, 32 MHz, Serial Port 1)
    ▼
iCEbreaker FPGA
    ├── exi domain  (96 MHz): SPI slave, register file, prefetch pipeline
    └── sync domain (48 MHz): SPRAM arbiter, RX assembler, TX drain, W5500 driver
    │  SPI (Mode 0, 24 MHz)
    ▼
W5500 Pmod module (100BASE-TX ethernet)
    │  RJ-45
    ▼
Network
```

### What this design does NOT implement

- A network stack. The GC CPU runs TCP/IP. The FPGA is a dumb MAC bridge.
- IP address awareness. The FPGA never parses ethernet frame payloads.
- The GC's DMA engine quirk (only relevant to GC-side software).
- Video/audio streaming logic (handled by Swiss on the GC CPU side).

---

## 2. Protocol References

| Source | Content |
|---|---|
| YAGCD §2.4.1.4 | SP1 (P6) connector pinout |
| YAGCD §5.9 | EXI bus register descriptions |
| YAGCD §10.8 | MX98730EC (BBA chip) register map |
| Dolphin source `EXI_DeviceEthernet.h` | Register offsets, init sequence, RX/TX flow |
| Dolphin source `EXI_DeviceEthernet.cpp` | Transaction encoding, interrupt logic |
| Swiss source `bba.c` | GC-side driver, exact register access patterns |
| MX98730EC datasheet | Unavailable publicly; YAGCD is the primary reference |
| W5500 datasheet | SPI interface, register map, socket model |
| iCE40UP5K datasheet | SPRAM timing, PLL parameters, I/O standards |

**Critical implementation note:** The MX98730EC uses **SPI Mode 3** (CPOL=1, CPHA=1). CLK idles HIGH. Data is sampled on the FALLING edge of CLK and set up on the RISING edge. This is the opposite of memory cards and the RTC chip, which use SPI Mode 0. Getting this wrong means the GC will never enumerate the device.

Under the usual CPOL/CPHA convention, CPOL=1/CPHA=1 captures data on the rising (trailing) edge and shifts it on the falling (leading) edge, which is the reverse of the edge assignment stated above. Confirm the actual sampling edge on real hardware (for example, with a logic-analyser capture of a genuine BBA transaction) before committing `SPIMode3Slave`.

---

## 3. Physical Interface — SP1 Edge Connector

### Slot characteristics

- Dual-sided PCB edge connector
- Contacts on both top and bottom faces of the PCB edge
- Top and bottom contact rows are **staggered** (offset by half a pitch), not mirrored, similar to ISA/PCI card edge geometry
- PCB must be ordered at **1.2 mm thickness** with **ENIG (gold) finish**
- Keying notch at the top-right corner of the housing (when looking into the console socket with the front of the console facing right)

### Connector footprint

Exact pad positions and pitch must be taken from the SP1ETH KiCad project (github.com/silverstee1/SP1ETH). Do not attempt to derive dimensions from YAGCD alone: it lists the signals but not the physical geometry. Cross-reference against the ETH2SP1 (LaserBear) open model files as a second source.
Key parameters to verify from those files before PCB layout:

- Contact pitch (expected: 2.0 mm or 2.54 mm; measure from the KiCad file)
- Stagger offset between top and bottom rows
- Total contact count per side (expected: 6 per side = 12 total, or 12 per side = 24 total with duplicated power/ground)
- Insertion depth from board edge to first contact
- Board width at connector edge

### Signal pinout (YAGCD §2.4.1.4)

Pin numbering: looking into the console socket, front of console to the right, pin 1 is on the left. On the adapter PCB (component side up, inserting down), pin 1 is also on the left; the numbering does not mirror.

| Pin | Signal | Direction | Notes |
|---|---|---|---|
| 1 | EXTIN | Adapter → GC | Device detect/sense. Tie to 3.3 V via a 10 kΩ resistor. Without this the GC does not enumerate the device. |
| 2 | GND | — | Shield ground |
| 3 | INT | Adapter → GC | Active-low interrupt to GC CPU. Assert when IR & IMR != 0. |
| 4 | CLK | GC → Adapter | SPI clock, up to 32 MHz, idles HIGH (Mode 3) |
| 5 | 12V | — | 12 V supply from GC. **Do not connect to FPGA I/O.** Leave unconnected or route to a test point only. |
| 6 | DO (MISO) | Adapter → GC | Serial data out: adapter drives, GC samples |
| 7 | 3.3V | — | 3.3 V supply (~200 mA available combined with pin 8) |
| 8 | 3.3V | — | 3.3 V supply (parallel with pin 7) |
| 9 | DI (MOSI) | GC → Adapter | Serial data in: GC drives, adapter samples |
| 10 | CS | GC → Adapter | Chip select, active low. Delineates each transaction. |
| 11 | GND | — | Signal ground |
| 12 | GND | — | Signal ground |

**Power budget:** Pins 7 and 8 together supply 3.3 V. The iCEbreaker draws ~80 mA active and the W5500 ~150 mA peak, for a total of ~230 mA. The GC's 3.3 V rail on SP1 is rated for the original BBA, which also drew ~200 mA, so headroom is tight. Add a 100 µF bulk capacitor on the interposer PCB close to the FPGA power pins.

**Voltage levels:** All EXI signals are 3.3 V logic. The iCEbreaker I/O is 3.3 V. The W5500 is 3.3 V.
No level shifting is required anywhere in this design.

---

## 4. Clock Domains

The design uses two clock domains. The iCE40UP5K has one PLL and one internal 48 MHz oscillator (SB_HFOSC).

### Domain table

| Domain | Frequency | Source | Purpose |
|---|---|---|---|
| `exi` | 96 MHz (64 MHz originally targeted) | PLL from the 12 MHz crystal (DIVR=0, DIVF=63, DIVQ=3) | SPI Mode 3 slave, BBA register file, prefetch pipeline |
| `sync` | 48 MHz | SB_HFOSC internal oscillator | SPRAM arbiter, RX/TX ethernet engines, W5500 SPI master |

### Rationale

**Why 2× to 3× the bus rate for `exi`?** The EXI bus runs at 32 MHz (31.25 ns period). The SPI Mode 3 slave needs to detect CLK edges and respond on the correct edge. Running the `exi` domain at 2× the bus rate (64 MHz) gives two FPGA ticks per EXI CLK period, i.e. one tick per half-period: one tick for the setup phase (MOSI→shift register, prepare MISO) and one tick for the sample/drive phase. This is the minimum oversampling ratio that cleanly implements Mode 3 without combinatorial timing risk on the MISO output path. The PLL cannot produce exactly 64 MHz from the 12 MHz crystal (see PLL configuration below), so the design uses 96 MHz, which gives 3 ticks per EXI CLK period and more margin with the same logic structure.

**Why 48 MHz for `sync`?** The iCE40UP5K's internal 48 MHz oscillator (SB_HFOSC) is available without consuming the PLL, which leaves the single PLL free for the `exi` domain. The W5500 SPI can run up to 80 MHz, but we drive it at 24 MHz (48 MHz ÷ 2 via a clock enable), which is well within spec and requires no additional PLL output.

### PLL configuration (iCE40UP5K)

With the SIMPLE feedback path, the PLL output is set by three dividers:

```
Input:  12 MHz crystal (iCEbreaker on-board)
F_pfd = 12 MHz / (DIVR + 1)      phase-detector input, must be 10–133 MHz
F_vco = F_pfd × (DIVF + 1)       must be 533–1066 MHz on UP5K
F_out = F_vco / 2^DIVQ           DIVQ = 1–6
```

The output divider is a power of two, so a "12 MHz × 16 / 3" setting cannot be expressed directly, and DIVF = 15 alone would give a 192 MHz VCO, far below the 533 MHz minimum. With a 12 MHz input, only DIVR = 0 keeps F_pfd in range (12 / 2 = 6 MHz is already too low). The iCE40UP5K VCO range is 533–1066 MHz.
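The DIVR/DIVF/DIVQ search can be brute-forced in a few lines of Python, a rough stand-in for `icepll` that is handy for sanity-checking tick-count assumptions. The limits in the code (F_pfd 10–133 MHz, F_vco 533–1066 MHz, DIVR 0–15, DIVF 0–127, DIVQ 1–6, SIMPLE feedback) are assumptions to check against Lattice's iCE40 sysCLOCK PLL documentation, and `icepll` remains the authority. The function name `ice40_pll_search` is hypothetical.

```python
def ice40_pll_search(f_in_mhz, f_target_mhz):
    """Brute-force SB_PLL40 (SIMPLE feedback) settings closest to f_target_mhz.

    Assumed limits: DIVR 0-15, DIVF 0-127, DIVQ 1-6,
    F_pfd 10-133 MHz, F_vco 533-1066 MHz.
    Returns (divr, divf, divq, f_out_mhz), or None if no setting is valid.
    """
    best = None
    for divr in range(16):
        f_pfd = f_in_mhz / (divr + 1)
        if not 10 <= f_pfd <= 133:
            continue                      # phase-detector frequency out of range
        for divf in range(128):
            f_vco = f_pfd * (divf + 1)
            if not 533 <= f_vco <= 1066:
                continue                  # VCO out of range
            for divq in range(1, 7):
                f_out = f_vco / 2 ** divq
                err = abs(f_out - f_target_mhz)
                if best is None or err < best[0]:
                    best = (err, divr, divf, divq, f_out)
    return None if best is None else best[1:]


print(ice40_pll_search(12, 96))   # (0, 63, 3, 96.0)   exact
print(ice40_pll_search(12, 64))   # (0, 84, 4, 63.75)  closest to 64 MHz
```

Because the `exi` domain only oversamples an asynchronous EXI clock, an exact multiple of 32 MHz is not strictly required; the search simply confirms that 96 MHz is reachable exactly while 64 MHz is not.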
To reach 64 MHz cleanly:

```
Target 64 MHz (DIVR = 0, F_pfd = 12 MHz):
  DIVQ = 3 → needs F_vco = 512 MHz    below the 533 MHz minimum, invalid
  DIVQ = 4 → needs F_vco = 1024 MHz   DIVF + 1 = 85.33, not an integer
    DIVF = 84 → F_vco = 1020 MHz → F_out = 63.75 MHz (closest)
    DIVF = 85 → F_vco = 1032 MHz → F_out = 64.5 MHz
  DIVR = 2, DIVF = 63, DIVQ = 2 → (12/3) × 64 / 4 = 64 MHz exactly,
    but F_pfd = 4 MHz and F_vco = 256 MHz are both out of range, invalid

Target 96 MHz:
  DIVR = 0, DIVF = 63, DIVQ = 3 → F_vco = 12 × 64 = 768 MHz (in range)
                                → F_out = 768 / 8 = 96 MHz exactly

Recommended: use 96 MHz (DIVR=0, DIVF=63, DIVQ=3) for the exi domain.
At 96 MHz there are 3 ticks per 32 MHz EXI clock period (1.5 per half-period)
instead of 2. Adjust SPIMode3Slave edge detection accordingly.
```

**Implementation note:** Verify exact PLL parameters with the `icepll` tool:

```bash
icepll -i 12 -o 64   # finds closest achievable output
icepll -i 12 -o 96   # alternative
```

The agent implementing this should run `icepll`, use whatever output it recommends, and then adjust the `SPIMode3Slave` tick counts accordingly.

### Reset strategy

Each domain has its own reset, deasserted synchronously using `ResetSynchronizer` from `amaranth.lib.cdc`:

```python
from amaranth import ResetSignal
from amaranth.lib.cdc import ResetSynchronizer

# Inside BBATop.elaborate() (or a platform create_missing_domain("exi") override):
# exi is held in reset while sync is, and released synchronously to the exi clock.
m.submodules.exi_rst = ResetSynchronizer(
    arst   = ResetSignal("sync"),
    domain = "exi",
)
```

The `sync` domain and its reset are created by Amaranth's iCE40 platform (`create_missing_domain("sync")`), which holds `sync` in reset for a fixed power-on delay after configuration. SB_HFOSC itself provides no power-on reset.

---

## 5. Clock Domain Crossing Strategy

All signals crossing between the `exi` and `sync` domains must use one of the following CDC primitives from `amaranth.lib.cdc` (or, for `AsyncFIFO`, from `amaranth.lib.fifo`). Never pass a raw multi-bit signal directly between domains: the receiving domain may sample it while several bits are changing, so only single-bit signals, or encodings in which exactly one bit changes per update, may cross.
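The one-bit rule is why `AsyncFIFO` can move multi-bit pointers between domains: it Gray-codes its read and write pointers before synchronizing them, so consecutive pointer values differ in exactly one bit. A small Python check of that property (the helper names are hypothetical):

```python
def gray_encode(n: int) -> int:
    """Binary -> reflected Gray code."""
    return n ^ (n >> 1)


def gray_decode(g: int) -> int:
    """Reflected Gray code -> binary (XOR of all right-shifts of g)."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def one_bit_changes(width: int) -> bool:
    """True if every increment of a width-bit pointer, including the wrap
    back to 0, flips exactly one bit of its Gray-coded value."""
    size = 1 << width
    for i in range(size):
        a = gray_encode(i)
        b = gray_encode((i + 1) % size)
        if bin(a ^ b).count("1") != 1:
            return False
    return True


print(one_bit_changes(7))     # True: e.g. pointers of a depth-64 FIFO plus a wrap bit
print(bin(7 ^ 8).count("1"))  # 4: a plain binary counter flips four bits going 7 -> 8
```

A synchronizer that samples a binary pointer mid-transition (7 → 8) could see any mix of old and new bits; with Gray coding it sees either the old or the new value.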
### CDC primitive selection guide

| Signal type | Primitive | Latency |
|---|---|---|
| Single bit, slow-changing (flags, status) | `FFSynchronizer` | 2 dest clocks |
| Single-cycle pulse / event | `PulseSynchronizer` | ~3–4 dest clocks |
| Multi-bit data stream (packet bytes) | `AsyncFIFO` | ~3–4 dest clocks |
| Reset deassertion | `ResetSynchronizer` | 2 dest clocks |
| Async external pin (CLK, MOSI, CS) | `FFSynchronizer` | 2 dest clocks |

### CDC inventory for this design

| Signal | From | To | Primitive | Notes |
|---|---|---|---|---|
| EXI CLK pin | async | exi | FFSynchronizer | stages=2, reset=1 (CLK idles high) |
| EXI MOSI pin | async | exi | FFSynchronizer | stages=2 |
| EXI CS pin | async | exi | FFSynchronizer | stages=2, reset=1 (CS idles high) |
| SPRAM read request (addr) | exi | sync | AsyncFIFO 16-bit wide, depth=4 | Prefetch pipeline |
| SPRAM read result (data) | sync | exi | AsyncFIFO 8-bit wide, depth=4 | Prefetch pipeline |
| TX packet bytes | exi | sync | AsyncFIFO 8-bit wide, depth=64 | GC→ethernet |
| TX packet start/len | exi | sync | AsyncFIFO 16-bit wide, depth=4 | Frame delimiter |
| RX packet bytes | sync | exi | AsyncFIFO 8-bit wide, depth=64 | ethernet→GC |
| RWP update (new value) | sync | exi | AsyncFIFO 8-bit wide, depth=4 | After frame committed |
| RRP update (new value) | exi | sync | AsyncFIFO 8-bit wide, depth=4 | After GC advances pointer |
| IR[RI] set (RX ready) | sync | exi | PulseSynchronizer | Triggers RI interrupt |
| IR[TI] set (TX done) | sync | exi | PulseSynchronizer | Triggers TI interrupt |
| NCRA reset pulse | exi | sync | PulseSynchronizer | Resets ethernet engine |
| exi_int_n output | exi | physical pin | Direct (output register) | Active-low to GC |

**Critical rule:** The register file lives entirely in the `exi` domain. The `sync` domain never directly reads or writes EXI registers. All interaction between the two domains goes through the AsyncFIFOs and PulseSynchronizers listed above.
This ensures the GC's register reads always respond within the `exi` domain without waiting on CDC latency.

---

## 6. Module Hierarchy

```
BBATop                               (top-level, sets up clock domains)
├── SPIMode3Slave                    (exi domain — bit engine)
├── BBARegisterFile                  (exi domain — register decode + response)
│   ├── [AsyncFIFO: spram_req]       (exi→sync: read address requests)
│   ├── [AsyncFIFO: spram_rsp]       (sync→exi: read data responses)
│   ├── [AsyncFIFO: tx_bytes]        (exi→sync: TX packet data)
│   ├── [AsyncFIFO: tx_ctrl]         (exi→sync: TX frame length)
│   ├── [AsyncFIFO: rx_wptr]         (sync→exi: RWP updates)
│   ├── [AsyncFIFO: rx_rptr]         (exi→sync: RRP updates from GC)
│   ├── [PulseSynchronizer: rx_irq]  (sync→exi)
│   ├── [PulseSynchronizer: tx_irq]  (sync→exi)
│   └── [PulseSynchronizer: ncra_rst] (exi→sync)
├── SPRAMArbiter                     (sync domain — owns all SPRAM)
├── RXFrameAssembler                 (sync domain — ethernet→SPRAM)
├── TXFrameDrain                     (sync domain — SPRAM→ethernet)
├── W5500SPIMaster                   (sync domain — SPI master to W5500)
└── EEPROMModel                      (exi domain — 93C46 bit-bang model)
```

---

## 7. Module Specifications

### 7.1 SPIMode3Slave

**Domain:** `exi`

**File:** `exi_bba/spi_mode3_slave.py`

Implements a byte-oriented SPI Mode 3 slave. Handles CLK/MOSI/MISO/CS at the bit level and presents a clean byte interface to `BBARegisterFile`.
**SPI Mode 3 timing recap:**

- CLK idles HIGH
- MOSI is set up by the master before the FALLING edge
- Slave samples MOSI on the FALLING edge of CLK
- Slave drives MISO on the RISING edge of CLK (ready for the master to sample on the next falling edge)

**Port list:**

| Port | Width | Dir | Domain | Description |
|---|---|---|---|---|
| `spi_clk` | 1 | in | async→exi | Raw SPI clock from GC, synchronized internally |
| `spi_mosi` | 1 | in | async→exi | Raw MOSI from GC, synchronized internally |
| `spi_miso` | 1 | out | exi | MISO output to GC |
| `spi_cs_n` | 1 | in | async→exi | Raw CS from GC (active low), synchronized internally |
| `rx_byte` | 8 | out | exi | Last complete received byte |
| `rx_valid` | 1 | out | exi | Pulses 1 cycle when `rx_byte` contains a new byte |
| `tx_byte` | 8 | in | exi | Byte to transmit; sampled when `tx_load` pulses |
| `tx_load` | 1 | out | exi | Requests next TX byte from upstream |

**Internal behaviour:**

1. Instantiate an FFSynchronizer (stages=2) on each of `spi_clk`, `spi_mosi`, `spi_cs_n`. Reset values: `spi_clk`=1, `spi_cs_n`=1.
2. Register the synchronized signals for one further cycle to form edge detectors: `rising_clk = clk_s & ~clk_prev`, `falling_clk = ~clk_s & clk_prev`.
3. On CS falling edge: load `tx_byte` into the internal shift register, drive its MSB onto `spi_miso` (it must be valid before the first sampling edge), pulse `tx_load`, and reset `bit_ctr` to 0.
4. On FALLING CLK edge (sample): shift `mosi_s` into `rx_shift` MSB-first and increment `bit_ctr`. When `bit_ctr == 8`: register `rx_shift` into `rx_byte`, pulse `rx_valid`, reset `bit_ctr` to 0, and pulse `tx_load` to request the next byte.
5. On RISING CLK edge (drive): shift `tx_shift` left by 1 and drive the new MSB onto `spi_miso`. On the first rising edge after a byte completes, present the MSB of the freshly loaded byte instead of shifting.
6. On CS rising edge: drive `spi_miso` high (idle) and reset state.

**Note on `tx_load` timing:** `tx_load` pulses at two points: at CS assertion (loads the first byte before any bits are clocked) and after each complete received byte (loads the next byte). The upstream (`BBARegisterFile`) must register the next TX byte within one `exi` clock of `tx_load` pulsing.
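An event-level Python model of steps 3–6 can serve as a golden reference when writing Amaranth simulation tests. It follows the edge assignment given in this section (sample on CLK falling, drive on CLK rising) and ignores synchronizer latency, so it is not cycle-accurate; the class and function names are hypothetical, and `next_tx` stands in for the `tx_byte`/`tx_load` handshake.

```python
class SPISlaveModel:
    """Event-level model of the SPIMode3Slave byte engine (section 7.1)."""

    def __init__(self, next_tx):
        self.next_tx = next_tx          # called wherever tx_load would pulse
        self.cs_n, self.clk, self.mosi, self.miso = 1, 1, 1, 1
        self.rx_bytes = []              # every byte that would pulse rx_valid

    def set_cs_n(self, level):
        if self.cs_n and not level:     # CS falling: load first byte, present MSB
            self.tx_shift = self.next_tx()
            self.rx_shift, self.bit_ctr, self.reload = 0, 0, False
            self.miso = (self.tx_shift >> 7) & 1
        elif not self.cs_n and level:   # CS rising: idle MISO high
            self.miso = 1
        self.cs_n = level

    def set_clk(self, level):
        if not self.cs_n and self.clk and not level:       # falling edge: sample
            self.rx_shift = ((self.rx_shift << 1) | self.mosi) & 0xFF
            self.bit_ctr += 1
            if self.bit_ctr == 8:
                self.rx_bytes.append(self.rx_shift)
                self.bit_ctr = 0
                self.tx_shift = self.next_tx()             # tx_load: fetch next byte
                self.reload = True
        elif not self.cs_n and not self.clk and level:     # rising edge: drive
            if self.reload:
                self.reload = False                        # present new byte's MSB
            else:
                self.tx_shift = (self.tx_shift << 1) & 0xFF
            self.miso = (self.tx_shift >> 7) & 1
        self.clk = level


def master_transfer(slave, mosi_bytes):
    """Idealised EXI master: CLK idles high, MISO sampled at each falling edge."""
    slave.set_cs_n(0)
    miso_bytes = []
    for byte in mosi_bytes:
        got = 0
        for i in range(7, -1, -1):
            slave.mosi = (byte >> i) & 1
            got = (got << 1) | slave.miso
            slave.set_clk(0)
            slave.set_clk(1)
        miso_bytes.append(got)
    slave.set_cs_n(1)
    return miso_bytes


tx = iter([0xA5, 0x3C, 0x00])
s = SPISlaveModel(lambda: next(tx))
print(master_transfer(s, [0x12, 0x34]), s.rx_bytes)  # [165, 60] [18, 52]
```

The model makes the `tx_load` ordering concrete: three bytes are requested for a two-byte transfer, because a fetch also happens after the final byte, even though CS then rises.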
---

### 7.2 BBARegisterFile

**Domain:** `exi` (with AsyncFIFO interfaces to `sync`)

**File:** `exi_bba/bba_register_file.py`

Decodes EXI transactions (2-byte header + N data bytes), reads/writes the BBA register space, and manages all CDC crossings to the `sync` domain.

#### EXI transaction decoder FSM

States: `HEADER0` → `HEADER1` → `DATA` → (back to `HEADER0`)

**Header format:**

```
Byte 0: [7]   = write flag (1 = write, 0 = read)
        [6:0] = addr[12:6] (upper 7 bits of 13-bit address)
Byte 1: [7:2] = addr[5:0]  (lower 6 bits of 13-bit address)
        [1:0] = xfer_len   (0=1 byte, 1=2 bytes, 2=3 bytes, 3=4 bytes)
```

Full address = `{ byte0[6:0], byte1[7:2] }` = 13 bits → range 0x0000–0x1FFF.

**`HEADER0` state:** Wait for `rx_valid`. Latch `rx_byte` as `hdr0`.

**`HEADER1` state:** Wait for `rx_valid`. Decode address and flags. For read transactions, immediately issue an SPRAM prefetch request if the address is ≥ 0x100 (ring buffer region), or load `tx_byte` with the register value if the address is < 0x100 (register file region). Transition to `DATA`.

**`DATA` state (write path):** For each `rx_valid`, write `rx_byte` to `regs[addr + byte_ctr]` and handle side effects (see the register side effects table). Because `xfer_len` encodes length − 1, the transfer is `xfer_len + 1` bytes long: if the byte just handled has `byte_ctr == xfer_len`, go to `HEADER0`; otherwise increment `byte_ctr`.

**`DATA` state (read path):** Drive `tx_byte` from the prefetch result (addresses ≥ 0x100) or directly from `regs[]` (addresses < 0x100). On each `tx_load`, advance the read pointer and issue the next prefetch. After the byte with `byte_ctr == xfer_len` has been sent, go to `HEADER0`.

**CS deassertion abort:** In any state, if `cs_n` rises, return to `HEADER0`.

#### Register file storage

Registers 0x00–0xFF are implemented as an `Array` of 8-bit `Signal`s (256 registers). The iCE40 has no distributed (LUT) RAM, so in synthesis this becomes flip-flops plus read multiplexers (up to 2048 flip-flops if every address is backed). Not SPRAM: SPRAM is reserved for the packet ring buffer.

The register file is entirely in the `exi` domain. No CDC is needed to read or write registers 0x00–0xFF.
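As a reference for the `HEADER0`/`HEADER1` decode above and the address routing of §8, here is a pure-Python sketch usable as a golden model in simulation tests. The function names and the `region` labels are hypothetical; the bit layout is the one in the header format block. Note that `nbytes = xfer_len + 1`, so the last data byte is the one at index `xfer_len`.

```python
REG_LIMIT = 0x100      # below this: exi-domain register file
RX_RING_END = 0x1000   # 0x0100-0x0FFF: SPRAM-backed RX ring


def encode_header(write: bool, addr: int, nbytes: int) -> bytes:
    """Build the 2-byte EXI header: {write, addr[12:6]}, {addr[5:0], nbytes-1}."""
    if not 0 <= addr <= 0x1FFF or not 1 <= nbytes <= 4:
        raise ValueError("addr must fit in 13 bits and nbytes must be 1-4")
    byte0 = (int(write) << 7) | (addr >> 6)
    byte1 = ((addr & 0x3F) << 2) | (nbytes - 1)
    return bytes([byte0, byte1])


def decode_header(byte0: int, byte1: int):
    """Inverse of encode_header: returns (write, addr, nbytes)."""
    write = bool(byte0 >> 7)
    addr = ((byte0 & 0x7F) << 6) | (byte1 >> 2)
    nbytes = (byte1 & 0x3) + 1
    return write, addr, nbytes


def region(addr: int) -> str:
    """Backing store the decoder routes an address to."""
    if addr < REG_LIMIT:
        return "regfile"
    if addr < RX_RING_END:
        return "spram"
    return "unmapped"


hdr = encode_header(True, 0x48, 4)        # 4-byte write to TXDATA
print(hdr.hex(), decode_header(*hdr))     # 8123 (True, 72, 4)
```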
#### Register side effects

| Register | Write side effect |
|---|---|
| NCRA (0x00) | If bit 0 (RESET) is written: pulse the `ncra_rst` PulseSynchronizer to the `sync` domain. Self-clear bit 0 on the next cycle. Reset the TX/RX pointers in the register file. |
| IR (0x09) | Write-1-to-clear: `IR <= IR & ~written_value` |
| RRP (0x18–0x19) | After the GC writes a new RRP value, push it into the `rx_rptr` AsyncFIFO (exi→sync) so the RX engine knows the GC has consumed those pages |
| TWD (0x34–0x37) | Bytes written here are the TX frame length field (2 bytes, little-endian). Latch for the TX engine. |
| TXDATA (0x48) | Each byte written goes into the `tx_bytes` AsyncFIFO (exi→sync). The frame length is pushed into `tx_ctrl` once per frame, when NCRA ST1:ST0 goes non-zero (see §7.5), not at the end of each write chunk. |

#### Interrupt register update (from sync domain)

- `rx_irq` PulseSynchronizer arriving from sync: set `IR[1]` (RI bit)
- `tx_irq` PulseSynchronizer arriving from sync: set `IR[2]` (TI bit) and clear `NCRA[2:1]` (ST1:ST0, the transmit start bits; see §10)

#### Interrupt output

```
exi_int_n <= ~|(IR & IMR)    # active-low: assert when any unmasked bit is set
```

Register this through one flip-flop in the `exi` domain. The physical pin is a direct output. No CDC is needed, because the GC only observes the interrupt state by polling IR over EXI (already in the `exi` domain) or via the interrupt line, which the GC CPU samples asynchronously.

#### NWAYS register

Always return `0x17` (link up, 100 Mbps, full duplex, autoneg complete). The GC's BBA driver polls NWAYS after reset to confirm link status before enabling RX. Hardcode this value; do not attempt to forward real link status from the W5500.
```python
# NWAYS = 0x17:
#   bit 4 (LS100)  = 1: 100BASE-TX link up
#   bit 2 (ANCLPT) = 1: autoneg complete
#   bit 1 (100TXH) = 1: 100BASE-TX half (also set in practice)
#   bit 0 (LS10)   = 1: 10BASE-T (also reported)
```

---

### 7.3 SPRAMArbiter

**Domain:** `sync`

**File:** `exi_bba/spram_arbiter.py`

Arbitrates access to the iCE40UP5K's SPRAM (128 KB in total, as four 32 KB SB_SPRAM256KA blocks; the ring buffer fits in one block) between two clients:

- **Client A (EXI read):** Issues read requests from the prefetch pipeline (`spram_req` AsyncFIFO). Must service requests fast enough to keep the prefetch pipeline full.
- **Client B (ETH write):** The `RXFrameAssembler` writes incoming ethernet frames into the ring buffer area.

**Priority:** ETH write wins over EXI read when both request simultaneously. This is safe because:

1. The GC only reads a ring buffer page after RWP has advanced past it (i.e., the ETH engine has finished writing that page).
2. Even if an EXI read is delayed by one SPRAM cycle, the prefetch pipeline has enough depth (4 entries) to absorb the stall without the SPI slave running out of data.

**SPRAM interface (iCE40UP5K SB_SPRAM256KA):**

```
WREN          : write enable
CHIPSELECT    : always 1
CLOCK         : sync domain clock (48 MHz)
STANDBY       : 0
SLEEP         : 0
POWEROFF      : 1   (active low: 1 = powered on)
ADDRESS[13:0] : byte address divided by 2 (SPRAM is 16-bit wide)
DATAIN[15:0]  : write data (byte placed in [7:0] or [15:8], selected by MASKWREN)
MASKWREN[3:0] : per-nibble write enables (0b0011 = lower byte, 0b1100 = upper byte)
DATAOUT[15:0] : read data
```

The SPRAM is 16-bit wide. Byte addressing is done via `MASKWREN`. For an 8-bit write to address `A`: set `ADDRESS = A >> 1`, `MASKWREN = (A & 1) ? 0b1100 : 0b0011`, and place the data in the corresponding byte of `DATAIN`.

**Read latency:** SPRAM has a 1-cycle synchronous read latency. The result of a read issued at cycle N is valid at cycle N+1. The arbiter must account for this when responding to the prefetch pipeline.
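A behavioural Python sketch of the byte-lane mapping above can serve as the reference when testing `SPRAMArbiter` in simulation. Like this section, it assumes that even byte addresses use the low lane. The function names and the `SPRAMModel` class are hypothetical, and the model ignores the one-cycle read latency.

```python
def spram_write_lanes(byte_addr: int, data: int):
    """Map an 8-bit write to SB_SPRAM256KA port values (ADDRESS, DATAIN, MASKWREN).

    MASKWREN has one bit per 4-bit nibble, so 0b0011 enables DATAIN[7:0]
    and 0b1100 enables DATAIN[15:8]."""
    word_addr = (byte_addr >> 1) & 0x3FFF          # 14-bit word address
    if byte_addr & 1:
        return word_addr, (data & 0xFF) << 8, 0b1100
    return word_addr, data & 0xFF, 0b0011


def spram_read_byte(byte_addr: int, dataout: int) -> int:
    """Select the addressed byte from the 16-bit DATAOUT word."""
    return (dataout >> 8) & 0xFF if byte_addr & 1 else dataout & 0xFF


class SPRAMModel:
    """16K x 16 behavioural SPRAM with nibble-masked writes."""

    def __init__(self):
        self.words = [0] * (1 << 14)

    def write(self, address, datain, maskwren):
        mask = 0
        for nibble in range(4):
            if (maskwren >> nibble) & 1:
                mask |= 0xF << (4 * nibble)
        self.words[address] = (self.words[address] & ~mask) | (datain & mask)

    def read(self, address):
        return self.words[address]


ram = SPRAMModel()
for addr, byte in [(0x100, 0xCD), (0x101, 0xAB)]:
    ram.write(*spram_write_lanes(addr, byte))
print(hex(ram.read(0x80)))                       # 0xabcd
```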
**Port list:**

| Port | Width | Dir | Notes |
|---|---|---|---|
| `exi_req_addr` | 16 | in | From spram_req AsyncFIFO (exi→sync) |
| `exi_req_valid` | 1 | in | FIFO r_rdy |
| `exi_req_ready` | 1 | out | FIFO r_en (pop when serviced) |
| `exi_rsp_data` | 8 | out | To spram_rsp AsyncFIFO (sync→exi) |
| `exi_rsp_valid` | 1 | out | FIFO w_en |
| `eth_wr_addr` | 16 | in | From RXFrameAssembler |
| `eth_wr_data` | 8 | in | Byte to write |
| `eth_wr_valid` | 1 | in | Write request |
| `eth_wr_ready` | 1 | out | Write accepted this cycle |

---

### 7.4 RXFrameAssembler

**Domain:** `sync`

**File:** `exi_bba/rx_frame_assembler.py`

Receives complete ethernet frames from `W5500SPIMaster` and writes them into the SPRAM ring buffer in the correct MX98730EC format.

**Ring buffer layout (in SPRAM):**

```
SPRAM address 0x0100–0x0FFF (3840 bytes = 15 × 256-byte pages)
Page 0x01: first usable RX page (SPRAM 0x0100–0x01FF)
Page 0x0F: last usable RX page (RHBP default; SPRAM 0x0F00–0x0FFF)
Pages wrap: after 0x0F, next is 0x01 (not 0x00, which is reserved)
```

Each page is 256 bytes. A received frame may span multiple pages.

**Frame descriptor (first 4 bytes of first page):**

```
Byte 0:   LRPS value (Last Received Packet Status — set to 0x00 or actual status)
Byte 1:   0x00
Byte 2:   frame_length[15:8] (big-endian, includes the 4 descriptor bytes)
Byte 3:   frame_length[7:0]
Bytes 4+: raw ethernet frame data (DA, SA, EtherType, payload, FCS)
```

**Flow:**

1. Wait for `W5500SPIMaster` to signal that a frame is available (`rx_sof` pulse).
2. Read frame bytes from the W5500 frame FIFO.
3. Compute how many 256-byte pages are needed: `pages_needed = ceil((eth_len + 4) / 256)`, where `eth_len` is the raw ethernet frame length. The descriptor's `frame_length` field holds `eth_len + 4`, so the descriptor bytes are counted once.
4. Check that the ring has room for `pages_needed` pages. The ring has 15 usable pages (1–15), so the arithmetic is modulo 15, not 16, and the check must cover every page the frame will occupy, not only its final page. With one page kept empty so that RWP == RRP means "empty": `free_pages = (RRP - RWP - 1) mod 15`, and the frame fits if `pages_needed <= free_pages`. If it does not fit, drop the frame and increment a drop counter.
5. Write the 4-byte descriptor at SPRAM address `RWP * 0x100` (page 0x01 starts at 0x0100, page 0x0F at 0x0F00).
6. Write frame bytes sequentially, wrapping pages at 256-byte boundaries. Page wrap: `next_page = (current_page % 15) + 1` (pages 1–15, skip 0).
7. After the last byte is written, push the new RWP value into the `rx_wptr` AsyncFIFO (sync→exi). The `exi` domain updates the RWP register from this FIFO.
8. Pulse the `rx_irq` PulseSynchronizer to the `exi` domain.

**MAC address filter:** Before writing a frame, check the destination MAC against PAR0–PAR5 (broadcast FF:FF:FF:FF:FF:FF is always accepted). The GC will typically configure PAR0–PAR5 via EXI after boot, so the `BBARegisterFile` must expose these to the `RXFrameAssembler`. Pass them via a dedicated small AsyncFIFO, or keep a 6-byte `sync`-domain shadow copy that is updated through such a FIFO whenever the GC writes PAR0–PAR5.

Multicast hash table (MAR0–MAR7) filtering is optional for the initial implementation: accept all frames (promiscuous mode) until the GC configures the filter.

---

### 7.5 TXFrameDrain

**Domain:** `sync`

**File:** `exi_bba/tx_frame_drain.py`

Drains the TX byte FIFO (fed from the `exi` domain as the GC writes to TXDATA register 0x48) and forwards complete frames to `W5500SPIMaster`.

**Flow:**

1. Wait for the `tx_ctrl` AsyncFIFO to contain a frame length value. This is pushed by `BBARegisterFile` when the GC has written the complete TX frame (i.e., NCRA ST1:ST0 transitions to 01 or 10).
2. Pop `frame_length` from `tx_ctrl`.
3. Pop exactly `frame_length` bytes from the `tx_bytes` AsyncFIFO.
4. Forward the bytes to the `W5500SPIMaster` TX interface with SOF/EOF framing.
5. Wait for `W5500SPIMaster` to signal TX complete.
6. Pulse the `tx_irq` PulseSynchronizer to the `exi` domain.

**NCRA ST bits:** The GC writes NCRA with ST1:ST0 = 01 (start transmit from buffer 1) or 10 (start transmit from buffer 2). The BBA hardware has two TX buffers; this implementation uses a single TX FIFO and ignores the buffer selection. When ST1:ST0 goes non-zero, treat it as a TX trigger regardless of which bits are set. The `BBARegisterFile` should push the frame length into `tx_ctrl` on this transition.
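The ring arithmetic of §7.4 (page count, wrap, page address, free-space check) and its MAC filter can be captured in a short pure-Python reference for `RXFrameAssembler` tests. Three things here are assumptions of this sketch rather than confirmed MX98730EC behaviour: the free-space rule keeps one page empty so that RWP == RRP means an empty ring; every group-addressed frame is accepted while MAR0–MAR7 are all 0xFF; and the function names are hypothetical.

```python
PAGE = 256
FIRST_PAGE, LAST_PAGE = 0x01, 0x0F           # RHBP default; page 0x00 never used
RING_PAGES = LAST_PAGE - FIRST_PAGE + 1      # 15 usable pages
DESC_LEN = 4
BROADCAST = bytes([0xFF] * 6)


def pages_needed(eth_len: int) -> int:
    """Pages for one frame: 4-byte descriptor + raw ethernet bytes, rounded up."""
    return -(-(eth_len + DESC_LEN) // PAGE)


def next_page(page: int) -> int:
    """Advance within pages 1-15, wrapping 15 -> 1."""
    return (page % RING_PAGES) + 1


def page_address(page: int) -> int:
    """SPRAM/EXI byte address of a ring page (page 0x01 -> 0x0100)."""
    return page * PAGE


def free_pages(rwp: int, rrp: int) -> int:
    """Free pages in the 15-page ring, with one page always left empty."""
    return (rrp - rwp - 1) % RING_PAGES


def fits(rwp: int, rrp: int, eth_len: int) -> bool:
    return pages_needed(eth_len) <= free_pages(rwp, rrp)


def accept_frame(dest: bytes, par: bytes) -> bool:
    """Broadcast, our unicast address, or any group address (MAR all 0xFF)."""
    return dest == BROADCAST or dest == par or bool(dest[0] & 0x01)


print(pages_needed(1514), free_pages(1, 1), fits(15, 1, 60))   # 6 14 False
```

Working modulo 15 over page numbers gives the same differences as working over zero-based indices, so the page numbers can be used directly in `free_pages`.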
---

### 7.6 W5500SPIMaster

**Domain:** `sync`

**File:** `exi_bba/w5500_spi_master.py`

Implements the W5500 SPI master interface. The W5500 accepts SPI Mode 0 or Mode 3; this design drives it in Mode 0 (CPOL=0, CPHA=0), so CLK idles LOW here, unlike the BBA EXI interface.

**W5500 SPI frame format:**

```
Byte 0–1: Address (16-bit, big-endian)
Byte 2:   Control byte:
            [7:3] = Block Select (BSB):
                      00000 = Common Register
                      00001 = Socket 0 Register
                      00010 = Socket 0 TX buffer
                      00011 = Socket 0 RX buffer
            [2]   = Read/Write (0=read, 1=write)
            [1:0] = Operation Mode (00=variable, 01=fixed 1B, 10=fixed 2B, 11=fixed 4B)
Byte 3+:  Data bytes
```

**W5500 configuration (to be performed once on NCRA reset):**

Common registers use BSB 00000. Socket 0 registers use BSB 00001, and their addresses are offsets within that block.

```
1. Write MR     (common 0x0000):        0x80  (software reset)
2. Wait ~1 ms
3. Write SHAR   (common 0x0009–0x000E): copy from PAR0–PAR5 register shadow
4. Write S0_MR  (socket 0, 0x0000):     0x04  (MACRAW mode, raw ethernet)
5. Write S0_CR  (socket 0, 0x0001):     0x01  (OPEN)
6. Write S0_IMR (socket 0, 0x002C):     0x14  (SEND_OK 0x10 | RECV 0x04)
7. Write SIMR   (common 0x0018):        0x01  (route socket 0 interrupts to INT_N)
```

**MACRAW mode:** In MACRAW mode the W5500 Socket 0 sends and receives raw ethernet frames including the full MAC header and FCS. This is exactly what the MX98730EC presents to the GC. No IP stack runs in the FPGA. Each received frame in the Socket 0 RX buffer is preceded by a 2-byte length header, which must be stripped before the bytes reach `RXFrameAssembler`.

**RX polling:** The W5500 asserts its INT_N pin (active low) when a frame arrives. Connect W5500 INT_N to an FPGA input pin and use it to trigger the `RXFrameAssembler`. Alternatively, periodically poll `S0_IR` (Socket 0 Interrupt Register, offset 0x0002 in the Socket 0 register block). The INT_N approach has lower latency and is preferred.

**SPI clock rate:** Drive the W5500 SPI at 24 MHz (sync clock 48 MHz ÷ 2 using a clock enable toggle). The W5500 supports up to 80 MHz, so there is ample margin.
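A Python sketch that builds the W5500 SPI frames for the one-time configuration is useful for checking the byte stream `W5500SPIMaster` should emit. Socket registers are addressed as offsets within BSB 00001 (S0_MR 0x0000, S0_CR 0x0001, S0_IR 0x0002, S0_IMR 0x002C), following the W5500 datasheet register map. SIMR (common 0x0018) is included so socket 0 interrupts reach INT_N. The helper names are hypothetical, and the ~1 ms wait after the MR reset is left to the caller.

```python
# Block Select (BSB) values, per the control-byte layout above.
BSB_COMMON, BSB_S0_REG, BSB_S0_TX, BSB_S0_RX = 0b00000, 0b00001, 0b00010, 0b00011


def w5500_frame(addr: int, bsb: int, write: bool, data: bytes = b"", om: int = 0b00) -> bytes:
    """Address (big-endian) + control byte + data; variable-length mode by default."""
    control = (bsb << 3) | (int(write) << 2) | om
    return bytes([(addr >> 8) & 0xFF, addr & 0xFF, control]) + bytes(data)


def w5500_init_frames(mac: bytes):
    """Write frames for the configuration sequence (wait ~1 ms after the first)."""
    return [
        w5500_frame(0x0000, BSB_COMMON, True, b"\x80"),   # MR: software reset
        w5500_frame(0x0009, BSB_COMMON, True, mac),       # SHAR: source MAC
        w5500_frame(0x0000, BSB_S0_REG, True, b"\x04"),   # S0_MR: MACRAW
        w5500_frame(0x0001, BSB_S0_REG, True, b"\x01"),   # S0_CR: OPEN
        w5500_frame(0x002C, BSB_S0_REG, True, b"\x14"),   # S0_IMR: SEND_OK | RECV
        w5500_frame(0x0018, BSB_COMMON, True, b"\x01"),   # SIMR: socket 0 -> INT_N
    ]


for frame in w5500_init_frames(bytes([0x00, 0x09, 0xBF, 0x12, 0x34, 0x56])):
    print(frame.hex(" "))
```

Variable-length mode lets the master hold CS low for the whole frame, which suits streaming a full ethernet frame into the Socket 0 TX buffer in one transaction.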
**Port list:**

| Port | Width | Dir | Notes |
|---|---|---|---|
| `spi_clk` | 1 | out | To W5500 CLK pin (SPI Mode 0, idles LOW) |
| `spi_mosi` | 1 | out | To W5500 MOSI |
| `spi_miso` | 1 | in | From W5500 MISO |
| `spi_cs_n` | 1 | out | To W5500 CS (active low) |
| `w5500_int_n` | 1 | in | W5500 interrupt (active low) |
| `tx_data` | 8 | in | Byte to transmit (from TXFrameDrain) |
| `tx_valid` | 1 | in | TX byte available |
| `tx_ready` | 1 | out | TX byte consumed |
| `tx_sof` | 1 | in | Start of frame marker |
| `tx_eof` | 1 | in | End of frame marker |
| `rx_data` | 8 | out | Received byte (to RXFrameAssembler) |
| `rx_valid` | 1 | out | RX byte available |
| `rx_ready` | 1 | in | RX byte consumed |
| `rx_sof` | 1 | out | Start of frame |
| `rx_eof` | 1 | out | End of frame |

---

### 7.7 EEPROMModel

**Domain:** `exi`

**File:** `exi_bba/eeprom_model.py`

Models the 93C46-compatible serial EEPROM that stores the BBA's MAC address. The GC software bit-bangs the EEPROM interface through register 0x1C (EEPROM Interface Register) of the BBA chip.

**Register 0x1C bit fields:**

```
[3] EECK — EEPROM clock
[2] EECS — EEPROM chip select
[1] EEDI — EEPROM data in  (GC → EEPROM)
[0] EEDO — EEPROM data out (EEPROM → GC) [read-only]
```

The GC reads EEDO by reading register 0x1C bit 0.

**93C46 protocol summary:** The 93C46 uses the Microwire serial protocol over four signals (SK=clock, CS=select, DI=data in, DO=data out). Commands (16-bit word organisation, 6-bit addresses):

- READ: start bit (1) + opcode (10) + 6-bit address → dummy 0 bit, then 16-bit data out (MSB first)
- WRITE: start bit (1) + opcode (01) + 6-bit address + 16-bit data
- EWEN (write enable): start bit (1) + opcode (00) + address (11xxxx)

Each 93C46 word is 16 bits. The MAC address occupies words 0–2 (6 bytes).

**Implementation approach:** Maintain a small ROM of 64 × 16-bit words in the `exi` domain (as a `Const` array, which synthesises to LUTs). Pre-populate words 0–2 with the chosen MAC address. Implement a small FSM that watches writes to register 0x1C for the 93C46 protocol and drives EEDO accordingly.
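A pure-Python sketch of the READ path of a 93C46 in 16-bit organisation, the part `EEPROMModel` must emulate for the GC to read the MAC, can serve as the reference for the FSM. It is stepped once per SK rising edge while CS is high, which is what the GC produces by toggling EECK in register 0x1C. WRITE and EWEN are not modelled. The class and helper names, the erased-word value 0xFFFF, and the example word contents are assumptions.

```python
class Eeprom93C46:
    """Read-only 93C46 (x16) model, stepped on SK rising edges while CS is high."""

    def __init__(self, words):
        self.words = (list(words) + [0xFFFF] * 64)[:64]   # unset words read as erased
        self.deselect()

    def deselect(self):
        """CS low: abort any command in progress."""
        self.state, self.shift, self.nbits, self.out, self.do = "IDLE", 0, 0, 0, 0

    def rising_sk(self, di: int) -> int:
        """Clock one DI bit in; return the DO level after this edge."""
        if self.state == "IDLE":
            if di:                                  # leading zeros before start bit ignored
                self.state, self.shift, self.nbits = "CMD", 0, 0
        elif self.state == "CMD":
            self.shift = (self.shift << 1) | (di & 1)
            self.nbits += 1
            if self.nbits == 8:                     # 2 opcode bits + 6 address bits
                opcode, addr = self.shift >> 6, self.shift & 0x3F
                if opcode == 0b10:                  # READ
                    self.out, self.nbits = self.words[addr], 0
                    self.do = 0                     # dummy zero precedes the data
                    self.state = "READ"
                else:                               # other commands not modelled
                    self.state = "DONE"
        elif self.state == "READ":
            self.do = (self.out >> 15) & 1          # data bits, MSB first
            self.out = (self.out << 1) & 0xFFFF
            self.nbits += 1
            if self.nbits == 16:
                self.state = "DONE"
        return self.do


def read_word(eeprom: Eeprom93C46, addr: int) -> int:
    """Drive one READ command the way GC software would bit-bang it."""
    eeprom.deselect()
    for bit in [1, 1, 0] + [(addr >> (5 - i)) & 1 for i in range(6)]:
        eeprom.rising_sk(bit)
    value = 0
    for _ in range(16):
        value = (value << 1) | eeprom.rising_sk(0)
    return value


rom = Eeprom93C46([0x0009, 0xBF12, 0x3456])
print(hex(read_word(rom, 1)))   # 0xbf12
```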
**Simpler alternative:** Many GC BBA drivers read the EEPROM once at boot and then write the MAC to PAR0–PAR5 themselves. Pre-populate PAR0–PAR5 in the register file reset state with a valid Nintendo OUI MAC (00:09:BF:xx:xx:xx). Skip a full 93C46 implementation for the first version: if Swiss ignores the EEPROM read result and uses a hardcoded or user-configurable MAC, this is sufficient.

---

### 7.8 BBATop

**Domain:** both

**File:** `exi_bba/bba_top.py`

Top-level module. Instantiates all submodules, creates clock domains, and connects physical pins.

**Clock domain creation:**

```python
from amaranth import ClockDomain, Elaboratable, Module, ResetSignal
from amaranth.lib.cdc import ResetSynchronizer


class BBATop(Elaboratable):
    def elaborate(self, platform):
        m = Module()

        # exi domain: 96 MHz from PLL (3× 32 MHz EXI bus rate)
        exi_domain = ClockDomain("exi")
        m.domains += exi_domain
        # Placeholder: get_pll() is not an Amaranth platform method. Replace it with
        # your own wrapper around an SB_PLL40 primitive (DIVR=0, DIVF=63, DIVQ=3).
        pll = platform.get_pll()
        m.d.comb += exi_domain.clk.eq(pll.clkout)
        m.submodules.exi_rst = ResetSynchronizer(
            arst=ResetSignal("sync"), domain="exi"
        )

        # sync domain: 48 MHz from SB_HFOSC. Amaranth's iCE40 platform creates it
        # when the platform sets default_clk = "SB_HFOSC" and hfosc_div = 0; the
        # stock iCEBreakerPlatform uses the 12 MHz crystal by default.

        # Instantiate submodules (project modules from exi_bba/)...
        m.submodules.spi     = spi     = SPIMode3Slave()
        m.submodules.regfile = regfile = BBARegisterFile()
        m.submodules.arbiter = arbiter = SPRAMArbiter()
        m.submodules.rx_asm  = rx_asm  = RXFrameAssembler()
        m.submodules.tx_drn  = tx_drn  = TXFrameDrain()
        m.submodules.w5500   = w5500   = W5500SPIMaster()
        m.submodules.eeprom  = eeprom  = EEPROMModel()

        # ... wiring ...

        return m
```

**Physical pin connections (iCEbreaker):** The SP1 EXI signals connect via the interposer PCB to iCEbreaker PMOD pins. The W5500 Pmod connects to the second PMOD connector. The exact pin mapping depends on the interposer PCB layout; define it in a platform resource file.
```python
# Example resource definitions for the iCEbreaker platform.
from amaranth.build import Attrs, Pins, Resource, Subsignal

bba_resources = [
    Resource("exi", 0,
        Subsignal("clk",   Pins("1", conn=("pmod", 0), dir="i")),
        Subsignal("mosi",  Pins("2", conn=("pmod", 0), dir="i")),
        Subsignal("miso",  Pins("3", conn=("pmod", 0), dir="o")),
        Subsignal("cs_n",  Pins("4", conn=("pmod", 0), dir="i")),
        Subsignal("int_n", Pins("7", conn=("pmod", 0), dir="o")),
        Attrs(IO_STANDARD="SB_LVCMOS"),
    ),
    Resource("w5500", 0,
        Subsignal("clk",   Pins("1", conn=("pmod", 1), dir="o")),
        Subsignal("mosi",  Pins("2", conn=("pmod", 1), dir="o")),
        Subsignal("miso",  Pins("3", conn=("pmod", 1), dir="i")),
        Subsignal("cs_n",  Pins("4", conn=("pmod", 1), dir="o")),
        Subsignal("int_n", Pins("7", conn=("pmod", 1), dir="i")),
        Subsignal("rst_n", Pins("8", conn=("pmod", 1), dir="o")),
        Attrs(IO_STANDARD="SB_LVCMOS"),
    ),
]

# platform.add_resources(bba_resources)
```

---

## 8. Memory Map

The BBA register address space is 13 bits wide (0x0000–0x1FFF).

| Address range | Region | Implemented in | Notes |
|---|---|---|---|
| 0x0000–0x0033 | MAC control registers | Register file (exi) | NCRA, NCRB, IMR, IR, pointers |
| 0x0034–0x0037 | TWD — TX write data | Register file (exi) | TX frame length (2 bytes) |
| 0x0038–0x0039 | Reserved | — | Ignore |
| 0x003A | HIPR — Host Interface Protocol | Register file (exi) | Read: 0x01 (BBA present) |
| 0x003B | NAFR — Network Address Filter | Register file (exi) | |
| 0x003C | NWBA — Network Write Buffer Addr | Register file (exi) | |
| 0x003D–0x0047 | Reserved | — | Ignore |
| 0x0048 | TXDATA — Bulk TX data port | Register file → tx_bytes FIFO | Write path to ethernet |
| 0x0049–0x00FF | Reserved | — | Ignore |
| 0x0100–0x0FFF | RX ring buffer | SPRAM (sync) | Read path from ethernet |

---

## 9. EXI Transaction Protocol

All BBA register accesses follow a strict two-phase (header + data) format.
### Header encoding

```
Byte 0: [7]   write flag      1 = write, 0 = read
        [6:0] addr[12:6]      upper 7 bits of address
Byte 1: [7:2] addr[5:0]       lower 6 bits of address
        [1:0] xfer_len - 1    0 = 1 byte, 1 = 2 bytes, 2 = 3 bytes, 3 = 4 bytes
```

CS is asserted (low) before byte 0 and remains low through the entire transaction, including all data bytes. CS deasserts (high) after the last data byte.

### Read transaction timing

```
CS   ─┐                                                      ┌─
      └──────────────────────────────────────────────────────┘
CLK     ┌┐┌┐┌┐┌┐┌┐┌┐┌┐┌┐   ┌┐┌┐┌┐┌┐┌┐┌┐┌┐┌┐   ┌┐┌┐...
        header byte 0      header byte 1      data byte 0...
MOSI    [addr+flags]       [addr+len]         [don't care]
MISO    [don't care]       [don't care]       [register data]
```

The register file must have data ready on MISO from the **very first clock edge of the data phase**. For register-file-backed reads (address < 0x100), the data is available immediately after header decode. For SPRAM-backed reads (address ≥ 0x100), the prefetch pipeline issues the SPRAM read request during the header phase so that the data is ready in time (see the latency budget in Section 15).

### Write transaction timing

The header is identical; MOSI then carries the write data. In SPI Mode 3 (CPOL = 1, CPHA = 1) the GC changes MOSI on the falling CLK edge, so the FPGA samples MOSI on each **rising** CLK edge during the data phase and writes the completed byte to the register.

### ID query

On power-on the GC queries the device ID: it writes two 0x00 bytes, then reads four bytes. The BBA returns `0x04020200`. Under the header encoding above, two 0x00 bytes decode as a 1-byte read of address 0x0000 (NCRA), so the special case must override both the address decode and the length field. Implement it as: when the address decodes to 0x0000 on a read with no prior NCRA reset, return the hardcoded ID. Before committing to this heuristic, check the Dolphin source for the exact byte sequence GC software uses to detect the BBA and replicate it faithfully.

---

## 10. BBA Register Reference

Key registers the GC driver accesses; the full register map is in YAGCD §10.8.

| Addr | Name | R/W | Reset | Description |
|---|---|---|---|---|
| 0x00 | NCRA | R/W | 0x00 | Network Control A. [0]=RESET (self-clear), [2:1]=ST (TX start), [3]=SR (start receive), [6]=INTMODE (0=int active low) |
| 0x01 | NCRB | R/W | 0x00 | Network Control B |
| 0x04 | LTPS | R | 0x00 | Last TX packet status |
| 0x05 | LRPS | R | 0x00 | Last RX packet status |
| 0x08 | IMR | R/W | 0x00 | Interrupt mask. Bits match IR. Interrupt fires when IR & IMR != 0 |
| 0x09 | IR | R/W | 0x00 | Interrupt register. Write 1 to clear. [7]=RBFI, [4]=TEI, [2]=TI, [1]=RI |
| 0x0A–0x0B | BP | R/W | — | Boundary page pointer |
| 0x0C–0x0D | TLBP | R/W | — | TX low boundary page |
| 0x0E–0x0F | TWP | R/W | 0x00 | TX write page pointer |
| 0x12–0x13 | TRP | R/W | 0x00 | TX read page pointer |
| 0x16–0x17 | RWP | R | 0x01 | RX write page pointer. Advanced by hardware after each frame is written; equal to RRP when the ring is empty |
| 0x18–0x19 | RRP | R/W | 0x01 | RX read page pointer. GC writes it to advance after consuming frames |
| 0x1A–0x1B | RHBP | R/W | 0x0F | RX high boundary page (last valid page) |
| 0x1C | EEPROM | R/W | — | EEPROM bit-bang interface [3:0] = EECK, EECS, EEDI, EEDO |
| 0x20–0x25 | PAR0–5 | R/W | MAC | MAC address bytes 0–5. GC writes them after reading the EEPROM |
| 0x26–0x2D | MAR0–7 | R/W | 0xFF | Multicast hash table. 0xFF = accept all multicast |
| 0x2E | ANALOG | R/W | — | PHY analog control. GC writes 0xD6 to enable the PHY |
| 0x30 | NWAYC | R/W | — | Autoneg config. GC sets ANE + LTE bits |
| 0x31 | NWAYS | R | 0x17 | Autoneg status. Hardcode 0x17 = 100M full duplex, link up |
| 0x32 | GCA | R/W | — | GMAC config A. GC sets AUTOPUB bit |
| 0x33 | GCB | R/W | — | GMAC config B |
| 0x34–0x37 | TWD | W | — | TX write data (frame length, 2 bytes LE; remaining bytes ignored) |
| 0x3A | HIPR | R | 0x01 | Host interface protocol version. Return 0x01 |
| 0x3B | NAFR | R/W | — | Network address filter |
| 0x3C | NWBA | R/W | — | Network write buffer address |
| 0x48 | TXDATA | W | — | Bulk TX data port. GC streams frame bytes here |
| 0x100+ | RX buf | R | — | RX ring buffer. GC reads frames from here |

---

## 11. Initialisation Sequence

This is the exact sequence Swiss/GC software executes. The register file must respond correctly to each step.

```
 1. Assert CS, write 0x0000 (2 bytes), read 4 bytes
    → Must return: 0x04 0x02 0x02 0x00 (device ID)
 2. Write 0x01 to NCRA (0x00): software reset
    → RESET bit self-clears next cycle
    → Pulse ncra_rst to sync domain (resets W5500, clears SPRAM pointers)
 3. Poll NCRA bit 0 until clear (wait for reset complete)
    → Return 0x00 from NCRA reads after self-clear
 4. Write 6 bytes to PAR0–PAR5 (0x20–0x25)
    → Latch MAC address; forward to sync domain MAC filter shadow
 5. Write 8 bytes to MAR0–MAR7 (0x26–0x2D)
    → Typically all 0xFF (accept all multicast frames)
 6. Write 0xD6 to ANALOG (0x2E): enable PHY
    → Store in register file; no hardware effect in FPGA
 7. Write NWAYC (0x30): set bits for ANE + LTE
    → Store; no hardware effect
 8. Write IMR (0x08): typically 0x86 (RBFI | TI | RI)
    → Enables interrupts; INT line will now assert when frames arrive
 9. Write GCA (0x32): set AUTOPUB bit
    → Store; AUTOPUB means RWP auto-updates, which we always do anyway
10. Write NCRA (0x00): set SR bit (0x08): start receive
    → Enable RX path; the RXFrameAssembler begins accepting frames
11. Poll NWAYS (0x31) until link up
    → Return hardcoded 0x17 immediately
```

---

## 12. RX Data Path — Detailed Flow

```
W5500 receives frame on wire
        │
        ▼
W5500SPIMaster (socket 0 registers: BSB=0b00001)
  - Detects Sn_IR[RECV] (via INT_N pin)
  - Reads Sn_RX_RSR (RX Received Size, offset 0x0026)
  - From the socket 0 RX buffer (BSB=0b00011), starting at Sn_RX_RD (0x0028):
    reads the 2-byte MACRAW length header (value includes the header itself),
    then the frame bytes
  - Advances Sn_RX_RD and issues RECV (Sn_CR = 0x40) to release the space
  - Pulses rx_sof, streams rx_data bytes, pulses rx_eof
        │
        ▼ (sync domain)
RXFrameAssembler
  - Checks destination MAC vs PAR shadow
  - Checks NCRA SR bit is set (RX enabled)
  - Computes pages_needed
  - Checks ring buffer not full (RWP + pages_needed must not reach RRP)
  - Writes descriptor + frame data into SPRAM via SPRAMArbiter
  - Advances RWP (local register in sync domain)
  - Pushes new RWP value into rx_wptr AsyncFIFO (sync→exi)
  - Pulses rx_irq PulseSynchronizer (sync→exi)
        │
        ▼ AsyncFIFO / PulseSynchronizer crossing
        │
        ▼ (exi domain)
BBARegisterFile
  - Pops new RWP from rx_wptr FIFO, updates RWP register
  - rx_irq pulse arrives: sets IR[1] (RI bit)
  - IR & IMR now non-zero: asserts exi_int_n (INT low to GC)
        │
        ▼ (GC CPU, driven by interrupt or polling)
GC reads IR register: sees RI=1
GC reads RWP (0x16): gets updated pointer
GC reads frame starting at byte address RRP × 256 (page 0x01 = 0x0100)
  (bulk read, up to 1500+ bytes)
  → BBARegisterFile issues SPRAM read requests via spram_req FIFO (exi→sync)
  → SPRAMArbiter services reads from SPRAM
  → Results flow back via spram_rsp FIFO (sync→exi)
  → Prefetch pipeline keeps data ready for SPI bit engine
GC writes new RRP (0x18) to advance past consumed pages
  → BBARegisterFile pushes RRP update into rx_rptr FIFO (exi→sync)
  → RXFrameAssembler updates its local RRP shadow
GC writes IR register with RI=1 (write-1-to-clear)
  → IR[1] clears; INT deasserts unless another unmasked IR bit is still set
```

---

## 13. TX Data Path — Detailed Flow

```
GC CPU constructs ethernet frame in GC RAM
        │
        ▼ (GC CPU → EXI)
GC writes 2-byte length to TWD register (0x34)
GC writes frame bytes to TXDATA register (0x48) in chunks
  → BBARegisterFile: each written byte goes into tx_bytes AsyncFIFO (exi→sync)
GC writes NCRA with ST1:ST0 = 01 (NCRA bits [2:1], transmit trigger)
  → BBARegisterFile pushes frame_length into tx_ctrl AsyncFIFO (exi→sync)
        │
        ▼ AsyncFIFO crossing
        │
        ▼ (sync domain)
TXFrameDrain
  - Pops frame_length from tx_ctrl
  - Pops frame_length bytes from tx_bytes
  - Forwards to W5500SPIMaster with SOF/EOF
        │
        ▼ (sync domain)
W5500SPIMaster (socket 0 registers: BSB=0b00001)
  - Reads Sn_TX_FSR (TX Free Size, 0x0020) until it is ≥ frame_length
  - Reads Sn_TX_WR (0x0024) and writes the frame bytes into the
    socket 0 TX buffer (BSB=0b00010) starting at that offset
  - Advances Sn_TX_WR by frame_length
  - Writes SEND command to Sn_CR (0x0001 = 0x20)
  - Polls Sn_IR (0x0002) until SEND_OK (bit 4) is set
  - Clears Sn_IR[SEND_OK] by writing 1 to it
  - Pulses tx_irq PulseSynchronizer (sync→exi)
        │
        ▼ PulseSynchronizer crossing
        │
        ▼ (exi domain)
BBARegisterFile
  - tx_irq arrives: sets IR[2] (TI bit), clears NCRA ST1:ST0
  - If IMR[2] set: INT asserts to GC
        │
        ▼ (GC CPU)
GC reads IR, sees TI=1
GC writes IR with TI=1 to clear
```

---

## 14. SPRAM Layout

The iCE40UP5K has 4 × 32 KB SPRAM banks (128 KB total). Map them as:

| SPRAM region | Size | Usage |
|---|---|---|
| 0x0000–0x00FF | 256 B | Reserved (page 0x00 is not used by the ring buffer) |
| 0x0100–0x0FFF | 3840 B | RX ring buffer (15 × 256-byte pages, pages 0x01–0x0F) |
| 0x1000–0x17FF | 2048 B | TX frame staging buffer |
| 0x1800–0x1FFF | 2048 B | Reserved / future use |

The ring buffer uses pages 0x01–0x0F (15 pages × 256 bytes = 3840 bytes). This matches the MX98730EC default `RHBP` (RX High Boundary Page) value of 0x0F and `RRP` reset value of 0x01.

**SPRAM addressing:** Each iCE40UP5K SB_SPRAM256KA instance is 16K × 16-bit (256 Kbit = 32 KB); the four instances total 128 KB.
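Because an SB_SPRAM256KA word is 16 bits wide, every byte access becomes a word address plus a nibble write mask. The sketch below shows that conversion for one bank. It assumes even byte addresses live in bits [7:0] of the word (a convention chosen here, not imposed by the hardware) and follows the Lattice SPRAM convention that MASKWREN bit *n* enables DATAIN bits [4n+3:4n]. The function names are hypothetical.

```python
SPRAM_WORDS = 16 * 1024   # one SB_SPRAM256KA: 16K words x 16 bits = 32 KB


def spram_byte_write(byte_addr: int, value: int) -> tuple[int, int, int]:
    """Return (ADDRESS, DATAIN, MASKWREN) for writing one byte."""
    if not 0 <= byte_addr < 2 * SPRAM_WORDS:
        raise ValueError("byte address outside one 32 KB bank")
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in one byte")
    word_addr = byte_addr >> 1
    if byte_addr & 1:
        # Odd byte -> DATAIN[15:8]; enable nibbles 3 and 2
        return word_addr, value << 8, 0b1100
    # Even byte -> DATAIN[7:0]; enable nibbles 1 and 0
    return word_addr, value, 0b0011


def spram_byte_read(byte_addr: int, dataout: int) -> int:
    """Select the addressed byte from a 16-bit DATAOUT word."""
    return (dataout >> 8) & 0xFF if byte_addr & 1 else dataout & 0xFF
```

For example, `spram_byte_write(0x0100, 0xAB)` yields word address 0x0080 with MASKWREN 0b0011, which matches the byte-to-word mapping given for the start of the ring buffer.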
To address the ring buffer region as bytes:

- Byte address 0x0100 maps to SPRAM word address 0x0080 (byte address >> 1).
- The arbiter converts byte addresses to word addresses and uses MASKWREN for byte selection. MASKWREN has one enable bit per 4-bit nibble, so a byte in bits [7:0] is written with MASKWREN = 0b0011 and a byte in bits [15:8] with 0b1100. Reads always return the full 16-bit word, and the arbiter selects the addressed half.

---

## 15. Critical Timing Constraints

### Must-meet timing in `exi` domain (96 MHz → 10.4 ns period)

| Path | Budget | Notes |
|---|---|---|
| FFSynchronizer output → edge detect flip-flop | 1 cycle = 10.4 ns | Trivially met: just a register |
| Edge detect → shift register update | 1 cycle | Register-to-register, no logic |
| `rx_valid` → header decode → `spram_req` FIFO write | 2 cycles | Address decode is combinatorial MUX; must close at 96 MHz |
| `tx_load` → `tx_byte` driven from register file | 1 cycle | `regs[addr]` array lookup is the critical path; keep address decode combinatorial depth ≤ 4 LUTs |
| `tx_load` → `tx_byte` driven from prefetch buffer | 1 cycle | Just a register read: trivial |

### Must-meet timing in `sync` domain (48 MHz → 20.8 ns period)

| Path | Budget | Notes |
|---|---|---|
| SPRAM read request → SPRAM address valid | 1 cycle | AsyncFIFO read + mux: easy |
| SPRAM DATAOUT → result FIFO write | 1 cycle | Register-to-FIFO: easy |
| W5500 SPI bit engine | N/A | Clock-enable based at 24 MHz effective; no hard timing |

### Cross-domain latency budget for SPRAM prefetch

```
EXI bit period at 32 MHz:                       31.25 ns (3 exi ticks)
Header phase, 16 EXI bits:                      500 ns   (48 exi ticks)

The full address is known only after header byte 1 bit 2 (addr[0]).
The two xfer_len bits follow, then data bit 0 is launched on the next
falling CLK edge (SPI Mode 3).

Window, last address bit → first MISO launch:   ≈ 78 ns  (2.5 EXI bits)
  (shorter still once the CLK synchroniser and edge detect are included)

SPRAM prefetch round trip:
  exi → spram_req FIFO write:          1 exi tick   = 10 ns
  FIFO cross-domain:                   2 sync ticks = 42 ns
  SPRAM read (1 cycle latency):        1 sync tick  = 21 ns
  Result → spram_rsp FIFO write:       1 sync tick  = 21 ns
  FIFO cross-domain:                   2 exi ticks  = 21 ns
  Result available in prefetch buffer:              = 21 ns
  Total:                                            ≈ 136 ns

136 ns > ≈78 ns window → does NOT fit if the header and data phases
are clocked back to back ✗
```

This is the tightest timing consideration in the design. The prefetch must be issued during HEADER1, as soon as addr[5:0] is complete. Even then, it only meets the deadline if the GC leaves a gap between the header and data phases (for example, if the driver issues them as separate EXI immediate transfers while keeping CS asserted). Confirm the real inter-phase gap against the GC driver source or a logic-analyser capture before relying on it.

---

## 16. SPRAM Read Prefetch Pipeline

The prefetch pipeline ensures MISO data is always ready before the SPI slave needs it for the data phase.

### State machine (in BBARegisterFile, exi domain)

```
State HEADER1 (decoding second header byte):
    If is_read AND address >= 0x100:
        push address into spram_req AsyncFIFO      ← issued NOW, during header decode
        set prefetch_pending = True

State DATA (read phase):
    On each tx_load pulse:
        If prefetch_pending AND spram_rsp FIFO has data:
            pop byte from spram_rsp FIFO
            load into tx_byte
            push (address + byte_ctr + 1) into spram_req for NEXT byte   ← pipelining
        Elif address < 0x100:
            tx_byte = regs[address + byte_ctr]     ← direct register file read
```

### Pipeline depth

The `spram_req` and `spram_rsp` FIFOs each have depth 4. This allows up to 4 read requests to be in flight simultaneously, absorbing SPRAM arbiter stalls (an ETH write winning arbitration) without starving the SPI data phase. The one-ahead push shown in the state machine keeps only one request outstanding; to use the full depth, the request side must run up to 4 addresses ahead of `byte_ctr`.

### SPRAM arbiter stall handling

If the SPRAM arbiter defers an EXI read by 1 cycle (because of ETH write priority), the `spram_rsp` FIFO can be momentarily empty when `tx_load` arrives. The SPI slave cannot actually stall the transfer: the GC is the SPI master and controls CLK, so there is no wait state. The best the slave can do is act at byte boundaries. If the next byte is not ready after a complete byte has been transmitted, hold MISO at a fixed level (0 or 1) for the whole byte. The GC will clock that byte in as garbage, so an underrun is a data error, and the FIFO depth must make it practically impossible.

**Practical note:** At 48 MHz sync with a 24 MHz effective W5500 SPI clock, the ETH write path delivers one byte every 8 SPI clocks (16 sync cycles), so it occupies the SPRAM arbiter for about 1 sync cycle in 16. A 32 MHz EXI read stream needs one byte every 12 sync cycles and gets the remaining cycles. With 4-deep FIFOs the pipeline should almost never stall in practice. Monitor the stall condition in simulation.

---

## 17. Interrupt Handling

The `exi_int_n` output (pin 3 of SP1) is active-low. Assert it (drive low) when `IR & IMR != 0`.
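The set, clear and mask rules in this section can be pinned down as a small Python reference model before writing gateware. The sketch below is hypothetical (`InterruptModel` is not part of the design). It applies the GC's write-1-to-clear before the hardware set pulses in the same cycle, an assumed priority chosen so that an event arriving while the GC clears an older one is not lost. It computes `int_n` combinationally, whereas the gateware registers it one cycle later.

```python
IR_RI = 1 << 1    # RX frame received
IR_TI = 1 << 2    # TX complete
IR_TEI = 1 << 4   # TX error
IR_RBFI = 1 << 7  # RX ring buffer full


class InterruptModel:
    """Cycle-level reference for IR/IMR and the active-low INT line."""

    def __init__(self) -> None:
        self.ir = 0x00
        self.imr = 0x00

    def step(self, set_bits: int = 0, w1c_bits: int = 0) -> None:
        """One exi clock: GC write-1-to-clear first, then hardware set pulses."""
        self.ir = ((self.ir & ~w1c_bits) | set_bits) & 0xFF

    @property
    def int_n(self) -> int:
        """0 (asserted) when any unmasked IR bit is set, else 1."""
        return 0 if self.ir & self.imr else 1
```

With the typical IMR of 0x86, a TEI event sets IR[4] but leaves `int_n` high, because TEI is masked.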
```python
# In BBARegisterFile, exi domain.
# exi_int_n must come out of reset deasserted (high) so the GC does not see a
# spurious interrupt (use Signal(reset=1) on Amaranth 0.4 and earlier).
exi_int_n = Signal(init=1)
ir_masked = Signal(8)
m.d.comb += ir_masked.eq(regs[BBARegs.IR] & regs[BBARegs.IMR])
m.d.exi += exi_int_n.eq(~ir_masked.any())   # registered output
```

Register the output; do not drive `exi_int_n` combinatorially. A registered output prevents glitches from propagating onto the GC board.

**Interrupt sources and IR bit assignments:**

| IR bit | Name | Set by | Cleared by |
|---|---|---|---|
| 7 | RBFI | RXFrameAssembler when ring full | GC write-1-to-clear |
| 4 | TEI | TXFrameDrain on TX error | GC write-1-to-clear |
| 2 | TI | tx_irq pulse from sync | GC write-1-to-clear |
| 1 | RI | rx_irq pulse from sync | GC write-1-to-clear |

The GC typically enables 0x86 = 0b10000110 (RBFI | TI | RI) in IMR.

---

## 18. EEPROM / MAC Address

The GC software reads the MAC address from the 93C46 EEPROM during initialisation (bit-banging through register 0x1C). It then writes the MAC to PAR0–PAR5.

**Recommended approach for initial implementation:** Skip full 93C46 emulation. A static value in `regs[0x1C]` cannot reproduce the serial EEPROM read-out, so instead pre-populate PAR0–PAR5 (0x20–0x25) in the register file reset state with a valid MAC. Use Nintendo's OUI `00:09:BF` for the first 3 bytes and a device-specific value for the last 3 (unique on the LAN if more than one adapter is in use):

```
MAC: 00:09:BF:00:00:01
```

Verify against the Swiss source whether it validates the MAC read from the EEPROM or accepts whatever PAR0–PAR5 contains. If it re-reads the EEPROM after writing PAR, a full 93C46 model is required. If it only uses PAR0–PAR5, pre-populating the register file is sufficient.

**MAC address propagation:** When the GC writes PAR0–PAR5, forward the new MAC to the W5500 SHAR register via the `sync` domain, using a 6-byte AsyncFIFO or a dedicated MAC update pulse. In MACRAW mode the W5500 transmits the GC-built frame unchanged, so SHAR does not set the source MAC of transmitted frames. It matters when the socket's MAC filter (Sn_MR MFEN bit) is enabled, where it determines which unicast frames the W5500 accepts.

---

## 19. iCE40UP5K Resource Budget

| Resource | Available | Estimated use | Margin |
|---|---|---|---|
| Logic cells (4-LUT + FF) | 5280 | ~1800 | 66% free |
| EBR (4 Kbit blocks) | 30 (120 Kbit) | 5 (4 FIFOs + register file) | 25 free |
| SPRAM (32 KB banks) | 4 (128 KB) | 1 bank for ring buffer | 3 free |
| PLL | 1 | 1 (for exi domain) | 0 free |
| SB_HFOSC | 1 | 1 (sync domain) | 0 free |
| I/O pins | 39 usable | ~14 (EXI:5 + W5500:6 + misc:3) | 25 free |

**Logic cell breakdown:**

| Module | Estimated cells |
|---|---|
| SPIMode3Slave | 90 |
| BBARegisterFile FSM + decode | 250 |
| Register file (512 × 8b) | ~200 (read mux and write decode; storage in 1 EBR) |
| AsyncFIFO × 8 | 400 |
| PulseSynchronizer × 4 | 40 |
| FFSynchronizer × 5 | 30 |
| SPRAMArbiter | 80 |
| RXFrameAssembler | 200 |
| TXFrameDrain | 150 |
| W5500SPIMaster | 200 |
| EEPROMModel | 100 |
| Misc glue | 60 |
| **Total** | **~1800** |

The iCE40 has no distributed (LUT) RAM. A 512 × 8 register array therefore maps either to one 4 Kbit EBR, whose read port is registered and adds a cycle to the `tx_load` → `tx_byte` path in Section 15, or it must be trimmed to the implemented registers (0x00–0x48) and built from flip-flops at roughly one logic cell per bit.

iCE40UP5K fmax with nextpnr is typically 60–80 MHz for logic of this complexity, so the `exi` domain at 96 MHz is the tightest. If nextpnr fails to close timing:

1. First option: reduce the `exi` domain to 64 MHz (icepll alternative).
2. Second option: reduce the EXI bus speed in Swiss settings to 16 MHz (clock index 4 instead of 5), halving the FPGA timing requirement.
3. Third option: add pipeline registers on the critical address decode path.

---

## 20. PCB / Connector Notes

### Interposer PCB

A simple pass-through interposer PCB connects the GC SP1 slot to the iCEbreaker via a ribbon cable or header.

**Required PCB spec:**

- Thickness: **1.2 mm** (not the standard 1.6 mm; critical for fit)
- Copper finish: **ENIG (gold)**, which prevents oxidation on the edge contacts
- Board material: standard FR4

**Footprint source:** Copy the edge connector footprint from the `github.com/silverstee1/SP1ETH` KiCad files rather than designing it from scratch. The staggered dual-row geometry requires exact pad positions that have been physically verified. Cross-reference with the ETH2SP1 LaserBear open files.
**Additional interposer components:**

- 10 kΩ resistor: EXTIN (pin 1) to 3.3V (pin 7), for device detect
- 100 µF capacitor: 3.3V to GND, bulk decoupling near the connector
- 100 nF capacitor × 2: additional HF decoupling
- ESD protection diode array on the CLK, MOSI, MISO and CS lines (optional but recommended: the GC motherboard is difficult to repair if damaged)

**Do not connect pin 5 (12V) to anything on the FPGA side.**

### iCEbreaker connection

The interposer PCB exposes the EXI signals on a 2.54 mm pitch 8-pin header. Connect it to the iCEbreaker PMOD1 connector with a short ribbon cable, kept as short as possible (< 10 cm) to minimise signal integrity issues at 32 MHz.

---

## 21. Known Hardware Quirks

### EXI DMA bug

The GC's EXI DMA engine has a bug where data on the MISO line during a DMA write is clocked back out with a 1-bit shift. This only affects GC software doing DMA writes (rare). Swiss uses IMM (immediate) mode transfers, so no FPGA workaround is needed.

### SPI Mode 3 vs Mode 0

Every other EXI device (memory cards, RTC, IPL) uses SPI Mode 0; the BBA is the only device using Mode 3. Both modes sample data on the rising CLK edge, but Mode 3 idles CLK high and launches each bit on the falling edge, while Mode 0 idles CLK low and needs the first bit valid before the first edge. Do not share the SPI slave implementation with other EXI device implementations without parameterising CPOL/CPHA.

### MISO tristate

On real hardware, MISO (DO) is tristated when CS is deasserted, because the other devices on the same EXI channel share the data line and would otherwise conflict. A dedicated CS line does not make a continuously driven MISO safe: chip select only decides which device should respond, not which device drives the shared line. Mirror the real behaviour on the FPGA. The iCE40 SB_IO cell has an output enable, so declare `miso` with `dir="oe"` in the platform resource and enable the driver only while CS is asserted.
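The CPOL/CPHA distinction in the Mode 3 vs Mode 0 quirk above is easy to get wrong, so a bit-level reference helps when writing the `SPIMode3Slave` unit test. The pure-Python sketch below (the function name `spi_transfer` is hypothetical) models one MSB-first byte exchange in any of the four SPI modes and reports which clock edge sampling happened on. It ignores setup/hold timing and the FPGA's input synchroniser.

```python
def spi_transfer(mode: int, master_out: int, slave_out: int):
    """Model one 8-bit, MSB-first SPI exchange.

    Returns (byte seen by slave, byte seen by master, set of sampling edges).
    """
    cpol, cpha = (mode >> 1) & 1, mode & 1

    def bit(value: int, index: int) -> int:
        return (value >> (7 - index)) & 1   # index 0 = MSB

    clk = cpol                  # CLK idle level
    mosi = miso = 0
    next_bit = 0
    if cpha == 0:               # CPHA=0: first bit valid before the first edge
        mosi, miso = bit(master_out, 0), bit(slave_out, 0)
        next_bit = 1

    slave_rx = master_rx = 0
    sample_edges = set()
    for edge in range(16):      # 8 bits x 2 edges
        clk ^= 1
        leading = edge % 2 == 0
        samples_here = leading if cpha == 0 else not leading
        if samples_here:
            slave_rx = (slave_rx << 1) | mosi
            master_rx = (master_rx << 1) | miso
            sample_edges.add("rising" if clk else "falling")
        elif next_bit < 8:      # launch edge: both sides present the next bit
            mosi, miso = bit(master_out, next_bit), bit(slave_out, next_bit)
            next_bit += 1
    return slave_rx, master_rx, sample_edges
```

Modes 0 and 3 both report rising-edge sampling. This is why a Mode 3 slave differs from a Mode 0 slave mainly in its CLK idle level and in when it must drive the first MISO bit.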
### GC hardware revisions

- DOL-001 (original): SP1 present, BBA compatible
- DOL-001 Rev B: SP1 physically absent on motherboard but case hole present
- DOL-101 (later): SP1 present again (but Serial Port 2 absent)
- Panasonic Q: SP1 present

Swiss supports all revisions with SP1 via the EXI hypervisor driver (required from Swiss build 1788 onwards for BBA emulation features).

### EXI clock index

The real BBA uses clock index 5 (32 MHz). Swiss allows configuring a lower clock index for compatibility. If 96 MHz fmax is not achievable, instruct users to configure Swiss to use clock index 4 (16 MHz EXI). At the same 3× oversampling ratio this needs only a 48 MHz `exi` domain, which is easily achievable.

---

## 22. File Structure

```
gc_bba_fpga/
├── exi_bba/
│   ├── __init__.py
│   ├── spi_mode3_slave.py      # SPIMode3Slave
│   ├── bba_register_file.py    # BBARegisterFile + register constants
│   ├── spram_arbiter.py        # SPRAMArbiter
│   ├── rx_frame_assembler.py   # RXFrameAssembler
│   ├── tx_frame_drain.py       # TXFrameDrain
│   ├── w5500_spi_master.py     # W5500SPIMaster
│   ├── eeprom_model.py         # EEPROMModel (93C46)
│   └── bba_top.py              # BBATop + clock domain setup
├── sim/
│   ├── sim_spi_slave.py        # SPIMode3Slave unit test
│   ├── sim_register_file.py    # BBARegisterFile unit test
│   ├── sim_bba_init.py         # Full init sequence simulation
│   ├── sim_rx_path.py          # RX data path end-to-end test
│   ├── sim_tx_path.py          # TX data path end-to-end test
│   ├── gc_master_model.py      # GC CPU SPI master simulation model
│   ├── w5500_slave_model.py    # W5500 SPI slave simulation model
│   └── ethernet_frame_gen.py   # Test frame generator
├── platform/
│   ├── icebreaker_bba.py       # iCEbreaker platform with BBA resources
│   └── interposer_pinmap.py    # SP1 ↔ PMOD pin mapping
├── pcb/
│   ├── interposer/             # KiCad project for interposer PCB
│   └── README.md               # PCB ordering instructions (1.2mm, ENIG)
├── constraints/
│   └── timing.py               # nextpnr timing constraints (if needed)
├── tests/
│   └── test_bba.py             # pytest suite
├── build.py                    # Amaranth build script
└── README.md
```

---

## 23. Simulation Strategy

Each module should have a standalone simulation before integration. All simulations use Amaranth's `Simulator` with two clock domains: `sim.add_clock(1/96e6, domain="exi")` and `sim.add_clock(1/48e6, domain="sync")`.

### Unit tests

**SPIMode3Slave:** Drive CLK/MOSI/CS manually from a process in the `exi` domain. Verify `rx_byte`/`rx_valid` match the sent data. Verify `spi_miso` matches the pre-loaded `tx_byte`. Test CS abort mid-byte.

**BBARegisterFile:** Use a `GCMasterModel` (SPI Mode 3 master process) to perform read/write transactions. Verify register writes are stored. Verify register reads return correct values. Verify IR bit setting and clearing. Verify NWAYS returns 0x17. Verify the ID query returns 0x04020200.

**SPRAMArbiter:** Issue concurrent EXI reads and ETH writes. Verify ETH writes win arbitration. Verify EXI reads complete within 3 sync cycles. Verify there is no data corruption.

**RXFrameAssembler:** Feed a known ethernet frame byte by byte. Verify SPRAM contents match the expected descriptor + frame layout. Verify RWP advances by the correct page count. Verify rx_irq fires.

**TXFrameDrain + W5500SPIMaster:** Issue a TX frame from the `tx_bytes` FIFO. Use a `W5500SlaveModel` process to simulate W5500 responses. Verify the frame bytes arrive at the W5500 correctly. Verify tx_irq fires after SEND_OK.

### Integration test

**sim_bba_init.py:** Full GC init sequence (all 11 steps from Section 11). `GCMasterModel` performs every transaction. Verify no stalls and correct responses.

**sim_rx_path.py:** `W5500SlaveModel` delivers a 64-byte test frame. `GCMasterModel` polls IR, reads RWP, bulk-reads the frame, and advances RRP. Verify the GC receives bytes identical to those the W5500 sent.

**sim_tx_path.py:** `GCMasterModel` writes a 64-byte frame through TXDATA. `W5500SlaveModel` captures it. Verify the W5500 receives identical bytes.

---

## 24. Open Issues and Extension Points

### Must resolve before first synthesis

- [ ] Exact PLL parameters for iCE40UP5K: run `icepll -i 12 -o 96` and confirm the output is achievable (VCO in the 533–1066 MHz range).
- [ ] SP1 connector footprint: clone the SP1ETH repo, extract pad positions, and verify stagger geometry and pitch before PCB layout.
- [ ] W5500 Pmod module pin mapping: confirm which Pmod pins INT_N and RST_N appear on (varies by module vendor).
- [ ] Swiss version requirement: confirm Swiss build ≥ 1788 for BBA hypervisor support. Earlier builds use a different driver that may have different register access patterns.

### Known limitations

- Single TX buffer (the MX98730EC has two). ST1:ST0 = 01 and 10 are treated identically. No known GC title relies on dual TX buffering.
- No DMA mode support, IMM mode only. This matches real-world Swiss usage.
- No Serial Port 2 support (different connector, different project scope).
- 93C46 EEPROM emulation is simplified (hardcoded MAC). A full bit-bang model can be added later if Swiss requires it.
- The RX ring buffer is 15 pages (3840 bytes); the real BBA has 4 KB. Frames larger than ~3800 bytes (jumbo frames) will be dropped. Standard 1500-byte MTU frames fit in at most 7 pages, so this is no practical issue.

### Extension points

- **Larger ring buffer:** Use additional SPRAM banks for more RX buffering.
- **Multiple sockets:** The W5500 supports 8 sockets; only socket 0 in MACRAW mode is used here.
- **Link status passthrough:** Read the W5500 PHYCFGR register and forward the real link status to NWAYS instead of hardcoding 0x17.
- **Statistics counters:** LTPS/LRPS (last packet status) are currently 0x00. A more complete implementation would fill these from W5500 socket status.
- **Serial Port 2 support:** Different physical connector and EXI channel but the same FPGA logic; would require a second interposer PCB.
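The PLL item under "Must resolve before first synthesis" can be sanity-checked before the toolchain is installed. The sketch below is a brute-force search in the spirit of `icepll`, not a port of it. It assumes SIMPLE feedback, `f_out = f_in × (DIVF + 1) / ((DIVR + 1) × 2^DIVQ)`, and limits taken from the iCE40 datasheet as understood here (PFD 10–133 MHz, VCO 533–1066 MHz, DIVQ 1–6, output 16–275 MHz). Treat `icepll` as authoritative if the two disagree. The function name `ice40_pll_simple` is hypothetical.

```python
def ice40_pll_simple(f_in_mhz: float, f_out_mhz: float):
    """Search SB_PLL40 dividers (SIMPLE feedback) for the closest output."""
    best = None
    for divr in range(16):                      # DIVR: 4-bit field
        f_pfd = f_in_mhz / (divr + 1)
        if not 10 <= f_pfd <= 133:
            continue
        for divf in range(128):                 # DIVF: 7-bit field
            f_vco = f_pfd * (divf + 1)
            if not 533 <= f_vco <= 1066:
                continue
            for divq in range(1, 7):            # DIVQ 1..6 assumed valid
                f_out = f_vco / 2 ** divq
                if not 16 <= f_out <= 275:
                    continue
                error = abs(f_out - f_out_mhz)
                if best is None or error < best["error"]:
                    best = {"DIVR": divr, "DIVF": divf, "DIVQ": divq,
                            "f_pfd": f_pfd, "f_vco": f_vco,
                            "f_out": f_out, "error": error}
    return best
```

Under these constraints, 12 MHz → 96 MHz has exactly one exact solution: DIVR = 0, DIVF = 63, DIVQ = 3, with a 768 MHz VCO. A 64 MHz target (the first fallback in Section 19) has no exact solution from a 12 MHz reference; the closest result is within 0.25 MHz.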