rebbarb/docs/gc_bba_fpga_design.md
2026-06-13 18:35:38 +02:00

# GameCube BBA FPGA Replacement — Design Document
**Target hardware:** iCEbreaker (Lattice iCE40UP5K)
**Target language:** Amaranth HDL (Python)
**Toolchain:** Yosys + nextpnr-ice40 + IceStorm
**Purpose:** Replace the Nintendo GameCube Broadband Adapter (DOL-015) with an
FPGA-based implementation, exposing a W5500 100BASE-TX ethernet chip to the GC
over the EXI (Expansion Interface) serial bus, enabling game ISO streaming via
Swiss homebrew.
---
## Table of Contents
1. [System Overview](#1-system-overview)
2. [Protocol References](#2-protocol-references)
3. [Physical Interface — SP1 Edge Connector](#3-physical-interface--sp1-edge-connector)
4. [Clock Domains](#4-clock-domains)
5. [Clock Domain Crossing Strategy](#5-clock-domain-crossing-strategy)
6. [Module Hierarchy](#6-module-hierarchy)
7. [Module Specifications](#7-module-specifications)
- 7.1 [SPIMode3Slave](#71-spimode3slave)
- 7.2 [BBARegisterFile](#72-bbaregisterfile)
- 7.3 [SPRAMArbiter](#73-spramarbiter)
- 7.4 [RXFrameAssembler](#74-rxframeassembler)
- 7.5 [TXFrameDrain](#75-txframedrain)
- 7.6 [W5500SPIMaster](#76-w5500spimaster)
- 7.7 [EEPROMModel](#77-eeprommodel)
- 7.8 [BBATop](#78-bbatop)
8. [Memory Map](#8-memory-map)
9. [EXI Transaction Protocol](#9-exi-transaction-protocol)
10. [BBA Register Reference](#10-bba-register-reference)
11. [Initialisation Sequence](#11-initialisation-sequence)
12. [RX Data Path — Detailed Flow](#12-rx-data-path--detailed-flow)
13. [TX Data Path — Detailed Flow](#13-tx-data-path--detailed-flow)
14. [SPRAM Layout](#14-spram-layout)
15. [Critical Timing Constraints](#15-critical-timing-constraints)
16. [SPRAM Read Prefetch Pipeline](#16-spram-read-prefetch-pipeline)
17. [Interrupt Handling](#17-interrupt-handling)
18. [EEPROM / MAC Address](#18-eeprom--mac-address)
19. [iCE40UP5K Resource Budget](#19-ice40up5k-resource-budget)
20. [PCB / Connector Notes](#20-pcb--connector-notes)
21. [Known Hardware Quirks](#21-known-hardware-quirks)
22. [File Structure](#22-file-structure)
23. [Simulation Strategy](#23-simulation-strategy)
24. [Open Issues and Extension Points](#24-open-issues-and-extension-points)
---
## 1. System Overview
The GameCube Broadband Adapter (BBA) is a hardware peripheral that plugs into
Serial Port 1 (SP1) on the underside of the GameCube. It presents a network
interface to the GC CPU using a Macronix MX98730EC custom IC. GC software
(primarily Swiss homebrew) communicates with the BBA through a memory-mapped
register interface accessed over the EXI serial bus.
This project replaces the MX98730EC with an iCEbreaker FPGA that emulates the
register interface, and connects to a W5500 ethernet chip (on a Pmod-compatible
module) for actual network communication.
### High-level data flow
```
GameCube CPU
│ EXI (SPI Mode 3, 32 MHz, Serial Port 1)
iCEbreaker FPGA
├── exi domain (64 MHz): SPI slave, register file, prefetch pipeline
└── sync domain (48 MHz): SPRAM arbiter, RX assembler, TX drain, W5500 driver
│ SPI (up to 40 MHz)
W5500 Pmod module (100BASE-TX ethernet)
│ RJ-45
Network
```
### What this design does NOT implement
- A network stack. The GC CPU runs TCP/IP. The FPGA is a dumb MAC bridge.
- IP address awareness. The FPGA never parses ethernet frame payloads.
- The GC's DMA engine quirk (only relevant to GC-side software).
- Video/audio streaming logic (handled by Swiss on the GC CPU side).
---
## 2. Protocol References
| Source | Content |
|---|---|
| YAGCD §2.4.1.4 | SP1 (P6) connector pinout |
| YAGCD §5.9 | EXI bus register descriptions |
| YAGCD §10.8 | MX98730EC (BBA chip) register map |
| Dolphin source `EXI_DeviceEthernet.h` | Register offsets, init sequence, RX/TX flow |
| Dolphin source `EXI_DeviceEthernet.cpp` | Transaction encoding, interrupt logic |
| Swiss source `bba.c` | GC-side driver, exact register access patterns |
| MX98730EC datasheet | Unavailable publicly; YAGCD is the primary reference |
| W5500 datasheet | SPI interface, register map, socket model |
| iCE40UP5K datasheet | SPRAM timing, PLL parameters, I/O standards |
**Critical implementation note:** The MX98730EC uses **SPI Mode 3** (CPOL=1,
CPHA=1). CLK idles HIGH. In Mode 3, data is shifted out on the FALLING (leading)
edge of CLK and sampled on the RISING (trailing) edge. Memory cards and the RTC
chip, by contrast, use SPI Mode 0, where CLK idles LOW. Getting this wrong means
the GC will never enumerate the device.
---
## 3. Physical Interface — SP1 Edge Connector
### Slot characteristics
- Dual-sided PCB edge connector
- Contacts on both top and bottom faces of the PCB edge
- Top and bottom contact rows are **staggered** (offset by half a pitch), not
mirrored — similar to ISA/PCI card edge geometry
- PCB must be ordered at **1.2 mm thickness** with **ENIG (gold) finish**
- Keying notch at top-right corner of housing (when looking into console socket
with front of console facing right)
### Connector footprint
Exact pad positions and pitch must be taken from the SP1ETH KiCad project
(github.com/silverstee1/SP1ETH). Do not attempt to derive dimensions from YAGCD
alone — the document lists signals but not physical geometry. Cross-reference
against the ETH2SP1 (LaserBear) open model files as a second source.
Key parameters to verify from those files before PCB layout:
- Contact pitch (expected: 2.0 mm or 2.54 mm — measure from KiCad file)
- Stagger offset between top and bottom rows
- Total contact count per side (expected: 6 per side = 12 total, or 12 per side
= 24 total with duplicated power/ground)
- Insertion depth from board edge to first contact
- Board width at connector edge
### Signal pinout (YAGCD §2.4.1.4)
Pin numbering: looking into the console socket, front of console to the right,
pin 1 is on the left. On the adapter PCB (component side up, inserting down),
pin 1 is also on the left — numbering does not mirror.
| Pin | Signal | Direction | Notes |
|---|---|---|---|
| 1 | EXTIN | Adapter → GC | Device detect/sense. Tie to 3.3V via 10 kΩ resistor. Without this the GC does not enumerate the device. |
| 2 | GND | — | Shield ground |
| 3 | INT | Adapter → GC | Active-low interrupt to GC CPU. Assert when IR & IMR != 0. |
| 4 | CLK | GC → Adapter | SPI clock, up to 32 MHz, idles HIGH (Mode 3) |
| 5 | 12V | — | 12 V supply from GC. **Do not connect to FPGA I/O.** Leave unconnected or route to a test point only. |
| 6 | DO (MISO) | Adapter → GC | Serial data out: adapter drives, GC samples |
| 7 | 3.3V | — | 3.3 V supply (~200 mA available combined with pin 8) |
| 8 | 3.3V | — | 3.3 V supply (parallel with pin 7) |
| 9 | DI (MOSI) | GC → Adapter | Serial data in: GC drives, adapter samples |
| 10 | CS | GC → Adapter | Chip select, active low. Delineates each transaction. |
| 11 | GND | — | Signal ground |
| 12 | GND | — | Signal ground |
**Power budget:** Pins 7+8 together supply 3.3 V. The iCEbreaker draws ~80 mA
active and the W5500 ~150 mA peak, ~230 mA in total. The GC's 3.3 V rail on SP1
is sized for the original BBA, which drew ~200 mA, so peak draw can exceed that
figure by ~30 mA and there is no headroom. Add a 100 µF bulk capacitor on the
interposer PCB close to the FPGA power pins.
**Voltage levels:** All EXI signals are 3.3 V logic. The iCEbreaker I/O is 3.3 V.
The W5500 is 3.3 V. No level shifting required anywhere in this design.
---
## 4. Clock Domains
The design uses two clock domains. The iCE40UP5K has one PLL and one internal
48 MHz oscillator (SB_HFOSC).
### Domain table
| Domain | Frequency | Source | Purpose |
|---|---|---|---|
| `exi` | 64 MHz nominal (96 MHz recommended, see PLL configuration) | PLL from the 12 MHz crystal | SPI Mode 3 slave, BBA register file, prefetch pipeline |
| `sync` | 48 MHz | SB_HFOSC internal oscillator | SPRAM arbiter, RX/TX ethernet engines, W5500 SPI master |
### Rationale
**Why 64 MHz for `exi`?**
The EXI bus runs at 32 MHz. The SPI Mode 3 slave needs to detect CLK edges and
respond on the correct edge. Running the `exi` domain at 2× the bus rate (64 MHz)
gives two FPGA ticks per EXI CLK period, one per half-period: one tick for the
setup phase (MOSI→shift register, prepare MISO) and one tick for the
sample/drive phase. This is the minimum oversampling ratio that implements
Mode 3 without combinatorial timing risk on the MISO output path.
**Why 48 MHz for `sync`?**
The iCE40UP5K's internal 48 MHz oscillator (SB_HFOSC) is available without
consuming the PLL. This leaves the one PLL free for the 64 MHz `exi` domain. The
W5500 SPI can run up to 80 MHz but we drive it at 24 MHz (48 MHz ÷ 2 via clock
enable), which is well within spec and requires no additional PLL output.
### PLL configuration (iCE40UP5K)
The iCE40UP5K VCO range is 533–1066 MHz. With the 12 MHz on-board crystal, the
candidate divider settings work out as follows:
```
Input: 12 MHz crystal (iCEbreaker on-board)
F_pfd = 12 MHz / (DIVR+1)
F_vco = F_pfd × (DIVF+1)       (must be 533–1066 MHz)
F_out = F_vco / 2^DIVQ

DIVR=0, DIVF=15          → F_vco = 192 MHz                     invalid (VCO too low)
DIVR=0, DIVF=42, DIVQ=3  → F_vco = 516 MHz, F_out = 64.5 MHz   invalid (VCO just below minimum)
DIVR=2, DIVF=63, DIVQ=2  → F_vco = 256 MHz, F_out = 64 MHz     invalid (F_pfd = 4 MHz, VCO too low)
DIVR=0, DIVF=53, DIVQ=3  → F_vco = 648 MHz, F_out = 81 MHz     valid, but not 64 MHz
DIVR=0, DIVF=63, DIVQ=3  → F_vco = 768 MHz, F_out = 96 MHz     valid

Recommended: use 96 MHz (DIVR=0, DIVF=63, DIVQ=3) for exi domain.
At 96 MHz there are 3 ticks per 32 MHz EXI clock period (3× bus rate
instead of 2×): more margin, same logic.
Adjust SPIMode3Slave edge detection accordingly (3 ticks per bit instead of 2).
```
**Implementation note:** Verify exact PLL parameters with `icepll` tool:
```bash
icepll -i 12 -o 64 # finds closest achievable output
icepll -i 12 -o 96 # alternative
```
The agent implementing this should run `icepll` and use whatever output it
recommends, then adjust the `SPIMode3Slave` tick counts accordingly.
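
The divider arithmetic in this section can be sanity-checked in plain Python before running `icepll`. This is a minimal sketch, assuming the relations used in the calculations above (`F_vco = F_in / (DIVR+1) × (DIVF+1)`, `F_out = F_vco / 2^DIVQ`) and the 533–1066 MHz VCO range. `pll_check` is a hypothetical helper, not part of any toolchain, and it checks only the VCO limit.

```python
VCO_MIN_MHZ = 533.0
VCO_MAX_MHZ = 1066.0


def pll_check(f_in_mhz, divr, divf, divq):
    """Return (f_vco, f_out, vco_ok) in MHz for one divider combination."""
    f_pfd = f_in_mhz / (divr + 1)      # phase detector input
    f_vco = f_pfd * (divf + 1)         # VCO frequency
    f_out = f_vco / (1 << divq)        # PLL output
    vco_ok = VCO_MIN_MHZ <= f_vco <= VCO_MAX_MHZ
    return f_vco, f_out, vco_ok


if __name__ == "__main__":
    for divr, divf, divq in [(0, 63, 3), (0, 53, 3), (0, 42, 3), (2, 63, 2)]:
        f_vco, f_out, ok = pll_check(12.0, divr, divf, divq)
        verdict = "valid" if ok else "invalid"
        print(f"DIVR={divr} DIVF={divf} DIVQ={divq}: "
              f"VCO={f_vco:.0f} MHz, out={f_out:.2f} MHz, {verdict}")
```

Whatever `icepll` reports remains the authoritative result; the sketch only reproduces the hand calculations.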
### Reset strategy
Each domain has its own reset, deasserted synchronously using
`ResetSynchronizer` from `amaranth.lib.cdc`:
```python
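# In platform create_missing_domain("exi"):
from amaranth import ResetSignal
from amaranth.lib.cdc import ResetSynchronizer

# Release the exi reset synchronously to the exi clock
# once the sync-domain reset deasserts.
m.submodules.exi_rst = ResetSynchronizer(
    arst=ResetSignal("sync"),
    domain="exi",
)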
```
The `sync` domain reset comes from the iCEbreaker's on-chip power-on reset
(SB_GB driven by SB_HFOSC, which has built-in POR).
---
## 5. Clock Domain Crossing Strategy
All signals crossing between `exi` and `sync` domains must use one of the
following CDC primitives from `amaranth.lib.cdc`. Never pass a raw multi-bit
signal directly between domains — only one bit may change per clock crossing.
### CDC primitive selection guide
| Signal type | Primitive | Latency |
|---|---|---|
| Single bit, slow-changing (flags, status) | `FFSynchronizer` | 2 dest clocks |
| Single-cycle pulse / event | `PulseSynchronizer` | ~3–4 dest clocks |
| Multi-bit data stream (packet bytes) | `AsyncFIFO` | ~3–4 dest clocks |
| Reset deassertion | `ResetSynchronizer` | 2 dest clocks |
| Async external pin (CLK, MOSI, CS) | `FFSynchronizer` | 2 dest clocks |
### CDC inventory for this design
| Signal | From | To | Primitive | Notes |
|---|---|---|---|---|
| EXI CLK pin | async | exi | FFSynchronizer | stages=2, reset=1 (CLK idles high) |
| EXI MOSI pin | async | exi | FFSynchronizer | stages=2 |
| EXI CS pin | async | exi | FFSynchronizer | stages=2, reset=1 (CS idles high) |
| SPRAM read request (addr) | exi | sync | AsyncFIFO 16-bit wide, depth=4 | Prefetch pipeline |
| SPRAM read result (data) | sync | exi | AsyncFIFO 8-bit wide, depth=4 | Prefetch pipeline |
| TX packet bytes | exi | sync | AsyncFIFO 8-bit wide, depth=64 | GC→ethernet |
| TX packet start/len | exi | sync | AsyncFIFO 16-bit wide, depth=4 | Frame delimiter |
| RX packet bytes | sync | exi | AsyncFIFO 8-bit wide, depth=64 | ethernet→GC |
| RWP update (new value) | sync | exi | AsyncFIFO 8-bit wide, depth=4 | After frame committed |
| RRP update (new value) | exi | sync | AsyncFIFO 8-bit wide, depth=4 | After GC advances pointer |
| IR[RI] set (RX ready) | sync | exi | PulseSynchronizer | Triggers RI interrupt |
| IR[TI] set (TX done) | sync | exi | PulseSynchronizer | Triggers TI interrupt |
| NCRA reset pulse | exi | sync | PulseSynchronizer | Resets ethernet engine |
| exi_int_n output | exi | physical pin | Direct (output register) | Active-low to GC |
**Critical rule:** The register file lives entirely in the `exi` domain. The
`sync` domain never directly reads or writes EXI registers. All interaction
between the two domains goes through the AsyncFIFOs and PulseSynchronizers
listed above. This ensures the GC's register reads always respond within the
`exi` domain without waiting on CDC latency.
---
## 6. Module Hierarchy
```
BBATop (top-level, sets up clock domains)
├── SPIMode3Slave (exi domain — bit engine)
├── BBARegisterFile (exi domain — register decode + response)
│ ├── [AsyncFIFO: spram_req] (exi→sync: read address requests)
│ ├── [AsyncFIFO: spram_rsp] (sync→exi: read data responses)
│ ├── [AsyncFIFO: tx_bytes] (exi→sync: TX packet data)
│ ├── [AsyncFIFO: tx_ctrl] (exi→sync: TX frame length)
│ ├── [AsyncFIFO: rx_wptr] (sync→exi: RWP updates)
│ ├── [AsyncFIFO: rx_rptr] (exi→sync: RRP updates from GC)
│ ├── [PulseSynchronizer: rx_irq] (sync→exi)
│ ├── [PulseSynchronizer: tx_irq] (sync→exi)
│ └── [PulseSynchronizer: ncra_rst] (exi→sync)
├── SPRAMArbiter (sync domain — owns all SPRAM)
├── RXFrameAssembler (sync domain — ethernet→SPRAM)
├── TXFrameDrain (sync domain — SPRAM→ethernet)
├── W5500SPIMaster (sync domain — SPI master to W5500)
└── EEPROMModel (exi domain — 93C46 bit-bang model)
```
---
## 7. Module Specifications
### 7.1 SPIMode3Slave
**Domain:** `exi`
**File:** `exi_bba/spi_mode3_slave.py`
Implements a byte-oriented SPI Mode 3 slave. Handles CLK/MOSI/MISO/CS at the
bit level and presents a clean byte interface to `BBARegisterFile`.
**SPI Mode 3 timing recap:**
- CLK idles HIGH
- Master and slave change their outputs on the FALLING (leading) edge
- Slave samples MOSI on the RISING (trailing) edge of CLK
- Slave drives MISO on the FALLING edge of CLK (ready for master to sample on
next rising edge)
**Port list:**
| Port | Width | Dir | Domain | Description |
|---|---|---|---|---|
| `spi_clk` | 1 | in | async→exi | Raw SPI clock from GC, synchronized internally |
| `spi_mosi` | 1 | in | async→exi | Raw MOSI from GC, synchronized internally |
| `spi_miso` | 1 | out | exi | MISO output to GC |
| `spi_cs_n` | 1 | in | async→exi | Raw CS from GC (active low), synchronized internally |
| `rx_byte` | 8 | out | exi | Last complete received byte |
| `rx_valid` | 1 | out | exi | Pulses 1 cycle when `rx_byte` contains a new byte |
| `tx_byte` | 8 | in | exi | Byte to transmit; sampled when `tx_load` pulses |
| `tx_load` | 1 | out | exi | Requests next TX byte from upstream |
**Internal behaviour:**
1. Instantiate FFSynchronizer stages=2 on each of `spi_clk`, `spi_mosi`,
`spi_cs_n`. Reset values: `spi_clk`=1, `spi_cs_n`=1.
2. Register the synchronized signals one further cycle to form edge detectors:
`rising_clk = clk_s & ~clk_prev`, `falling_clk = ~clk_s & clk_prev`.
3. On CS falling edge: load `tx_byte` into internal shift register, pulse
`tx_load`, reset `bit_ctr` to 0.
4. On RISING CLK edge (sample): shift `mosi_s` into `rx_shift` MSB-first,
increment `bit_ctr`. When `bit_ctr == 8`: register `rx_shift` into `rx_byte`,
pulse `rx_valid`, reset `bit_ctr` to 0, pulse `tx_load` to request next byte.
5. On FALLING CLK edge (drive): drive the MSB of `tx_shift` onto `spi_miso`,
then shift `tx_shift` left by 1.
6. On CS rising edge: drive `spi_miso` high (idle), reset state.
**Note on `tx_load` timing:** `tx_load` pulses at two points — CS assertion
(loads first byte before any bits are clocked) and after each complete received
byte (loads the next byte). The upstream (`BBARegisterFile`) must register the
next TX byte within one `exi` clock of `tx_load` pulsing.
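
The edge detectors from step 2 can be modelled in plain Python to check the two equations against a sampled waveform. A minimal sketch: `detect_edges` is a hypothetical helper, and the input list stands for the already-synchronized `clk_s` value on successive `exi` ticks.

```python
def detect_edges(clk_samples, clk_prev=1):
    """Return a list of (rising, falling) flags, one pair per exi tick.

    clk_prev starts at 1 because CLK idles high, matching the
    synchronizer reset value.
    """
    events = []
    for clk_s in clk_samples:
        rising = bool(clk_s and not clk_prev)       # clk_s & ~clk_prev
        falling = bool((not clk_s) and clk_prev)    # ~clk_s & clk_prev
        events.append((rising, falling))
        clk_prev = clk_s
    return events


# Idle high, then two full clock periods sampled at 2 ticks per period.
samples = [1, 0, 1, 0, 1, 1]
events = detect_edges(samples)
```

Each flag is high for exactly one tick per edge, which is what the shift logic in the later steps relies on.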
---
### 7.2 BBARegisterFile
**Domain:** `exi` (with AsyncFIFO interfaces to `sync`)
**File:** `exi_bba/bba_register_file.py`
Decodes EXI transactions (2-byte header + N data bytes), reads/writes the BBA
register space, and manages all CDC crossings to the `sync` domain.
#### EXI transaction decoder FSM
States: `HEADER0` → `HEADER1` → `DATA` → (back to `HEADER0`)
**Header format:**
```
Byte 0: [7] = write flag (1 = write, 0 = read)
[6:0] = addr[12:6] (upper 7 bits of 13-bit address)
Byte 1: [7:2] = addr[5:0] (lower 6 bits of 13-bit address)
[1:0] = xfer_len - 1 (0=1 byte, 1=2 bytes, 2=3 bytes, 3=4 bytes)
```
Full address = `{ byte0[6:0], byte1[7:2] }` = 13 bits → range 0x0000–0x1FFF.
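
The header packing can be expressed as a pair of pure functions, which is convenient for building simulation stimuli. This is a sketch under the assumption that the 2-bit length field holds the byte count minus one, as the `0=1 byte ... 3=4 bytes` mapping above implies. `encode_header` and `decode_header` are hypothetical names.

```python
def encode_header(addr, length, write):
    """Pack a 13-bit address, a 1-4 byte length and the write flag."""
    if not 0 <= addr <= 0x1FFF:
        raise ValueError("address must fit in 13 bits")
    if not 1 <= length <= 4:
        raise ValueError("length must be 1..4 bytes")
    byte0 = (0x80 if write else 0x00) | (addr >> 6)      # flag + addr[12:6]
    byte1 = ((addr & 0x3F) << 2) | (length - 1)          # addr[5:0] + len-1
    return bytes([byte0, byte1])


def decode_header(byte0, byte1):
    """Return (write, addr, length) for the two header bytes."""
    write = bool(byte0 & 0x80)
    addr = ((byte0 & 0x7F) << 6) | (byte1 >> 2)
    length = (byte1 & 0x03) + 1
    return write, addr, length
```

For example, a 4-byte write to TXDATA (0x48) encodes to the bytes `0x81 0x23`.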
**`HEADER0` state:** Wait for `rx_valid`. Latch `rx_byte` as `hdr0`.
**`HEADER1` state:** Wait for `rx_valid`. Decode address and flags. For read
transactions, immediately issue SPRAM prefetch request if address ≥ 0x100
(ring buffer region). Load `tx_byte` with the register value for addresses
< 0x100 (register file region). Transition to `DATA`.
**`DATA` state (write path):** For each `rx_valid`, write `rx_byte` to
`regs[addr + byte_ctr]` and handle side effects (see register side effects
table). Increment `byte_ctr`. When `byte_ctr == xfer_len`, go to `HEADER0`.
**`DATA` state (read path):** Drive `tx_byte` from prefetch result (addresses
≥ 0x100) or directly from `regs[]` (addresses < 0x100). On each `tx_load`,
advance the read pointer and issue next prefetch. When `byte_ctr == xfer_len`,
go to `HEADER0`.
**CS deassertion abort:** In any state, if `cs_n` rises, return to `HEADER0`.
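
The decoder FSM can be prototyped as a small software model before it is written in Amaranth. The sketch below is an assumption-laden simplification: it covers only the register-file region (addresses below 0x100, index wrapped to 8 bits), ignores register side effects and the SPRAM prefetch path, and returns read data when the matching dummy byte arrives rather than modelling MISO timing. `ExiDecoder` is a hypothetical name.

```python
class ExiDecoder:
    """Software model of the HEADER0 -> HEADER1 -> DATA decoder."""

    def __init__(self):
        self.regs = bytearray(0x100)   # register-file region only
        self.state = "HEADER0"
        self.hdr0 = 0
        self.write = False
        self.addr = 0
        self.length = 0
        self.byte_ctr = 0

    def cs_deassert(self):
        """CS rising edge aborts the transaction from any state."""
        self.state = "HEADER0"

    def on_rx_byte(self, value):
        """Consume one byte from the GC; return read data, or None."""
        if self.state == "HEADER0":
            self.hdr0 = value
            self.state = "HEADER1"
            return None
        if self.state == "HEADER1":
            self.write = bool(self.hdr0 & 0x80)
            self.addr = ((self.hdr0 & 0x7F) << 6) | (value >> 2)
            self.length = (value & 0x03) + 1   # field holds length - 1
            self.byte_ctr = 0
            self.state = "DATA"
            return None
        # DATA state
        index = (self.addr + self.byte_ctr) & 0xFF
        result = None
        if self.write:
            self.regs[index] = value
        else:
            result = self.regs[index]
        self.byte_ctr += 1
        if self.byte_ctr == self.length:
            self.state = "HEADER0"
        return result
```

Feeding the model `0x80 0x05 0xAA 0xBB` performs a 2-byte write starting at address 0x01; a following `0x00 0x05` header plus two dummy bytes reads the same values back.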
#### Register file storage
Registers 0x00–0xFF are implemented as an `Array` of 8-bit `Signal`s (256
registers, matching the < 0x100 register-file region used by the decoder). In
synthesis this maps to flip-flops and multiplexers, since the iCE40 has no
distributed (LUT) RAM. SPRAM is not used here; it is reserved for the packet
ring buffer.
The register file is entirely in the `exi` domain. No CDC is needed to read
or write registers 0x00–0xFF.
#### Register side effects
| Register | Write side effect |
|---|---|
| NCRA (0x00) | If bit 0 (RESET) written: pulse `ncra_rst` PulseSynchronizer to `sync` domain. Self-clear bit 0 on next cycle. Reset TX/RX pointers in register file. |
| IR (0x09) | Write-1-to-clear: `IR <= IR & ~written_value` |
| RRP (0x18–0x19) | After GC writes new RRP value, push value into `rx_rptr` AsyncFIFO (exi→sync) so RX engine knows GC has consumed those pages |
| TWD (0x34–0x37) | Bytes written here are the TX frame length field (2 bytes little-endian). Latch for TX engine. |
| TXDATA (0x48) | Each byte written goes into `tx_bytes` AsyncFIFO (exi→sync). The frame length is pushed into `tx_ctrl` AsyncFIFO once the frame is complete, i.e. when NCRA ST1:ST0 goes non-zero (see TXFrameDrain). |
#### Interrupt register update (from sync domain)
- `rx_irq` PulseSynchronizer arriving from sync: set `IR[1]` (RI bit)
- `tx_irq` PulseSynchronizer arriving from sync: set `IR[2]` (TI bit), clear
`NCRA[3:2]` (ST1:ST0 — transmit start bits)
#### Interrupt output
```
exi_int_n <= ~|(IR & IMR) # active-low: assert when any unmasked bit set
```
Register this one flip-flop in the `exi` domain. The physical pin is a direct
output — no CDC needed because the GC only reads the interrupt state via polling
IR over EXI (which is already in the `exi` domain) or via the interrupt line
which the GC CPU samples asynchronously.
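
The write-1-to-clear rule for IR and the interrupt equation can be checked with two small functions. A minimal sketch with hypothetical names, treating IR and IMR as 8-bit values:

```python
def ir_write(ir, written):
    """Write-1-to-clear: every 1 bit in the written value clears that IR bit."""
    return ir & ~written & 0xFF


def exi_int_n(ir, imr):
    """Active-low interrupt pin: 0 when any unmasked IR bit is set."""
    return 0 if (ir & imr) & 0xFF else 1


# RI (bit 1) and TI (bit 2) pending, only RI unmasked.
ir, imr = 0b0000_0110, 0b0000_0010
pin_before = exi_int_n(ir, imr)
ir = ir_write(ir, 0b0000_0010)      # GC acknowledges RI
pin_after = exi_int_n(ir, imr)
```

After the acknowledge, TI is still pending in IR but masked, so the pin returns to its inactive (high) level.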
#### NWAYS register
Always return `0x17` (link up, 100 Mbps, full duplex, autoneg complete).
The GC's BBA driver polls NWAYS after reset to confirm link status before
enabling RX. Hardcode this value — do not attempt to forward real link status
from the W5500.
```python
# NWAYS = 0x17:
# bit 4 (LS100) = 1: 100BASE-TX link up
# bit 2 (ANCLPT) = 1: autoneg complete
# bit 1 (100TXH) = 1: 100BASE-TX half (also set in practice)
# bit 0 (LS10) = 1: 10BASE-T (also reported)
```
---
### 7.3 SPRAMArbiter
**Domain:** `sync`
**File:** `exi_bba/spram_arbiter.py`
Arbitrates access to the iCE40UP5K's 128 KB SPRAM between two clients:
- **Client A (EXI read):** Issues read requests from the prefetch pipeline
(`spram_req` AsyncFIFO). Must service requests fast enough to keep the
prefetch pipeline full.
- **Client B (ETH write):** The `RXFrameAssembler` writes incoming ethernet
frames into the ring buffer area.
**Priority:** ETH write wins over EXI read when both request simultaneously.
This is safe because:
1. The GC only reads a ring buffer page after RWP has advanced past it (i.e.,
the ETH engine has finished writing that page).
2. Even if an EXI read is delayed by one SPRAM cycle, the prefetch pipeline
has enough depth (4 entries) to absorb the stall without the SPI slave
running out of data.
**SPRAM interface (iCE40UP5K SB_SPRAM256KA):**
```
WREN : write enable
CHIPSELECT : always 1
CLOCK : sync domain clock (48 MHz)
STANDBY : 0
SLEEP : 0
POWEROFF_N : 1
ADDRESS[13:0] : byte address divided by 2 (SPRAM is 16-bit wide)
DATAIN[15:0] : write data (place the byte on [7:0] for even addresses, on [15:8] for odd addresses)
MASKWREN[3:0] : byte enable (0b0011 for lower byte, 0b1100 for upper byte)
DATAOUT[15:0] : read data
```
The SPRAM is 16-bit wide. Byte addressing is done via `MASKWREN`. For an 8-bit
write to address `A`: set `ADDRESS = A >> 1`, `MASKWREN = (A & 1) ? 0b1100 :
0b0011`, write data in the appropriate byte of `DATAIN`.
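
The byte-lane selection can be written down as two helper functions, one for each direction. This is a sketch with hypothetical names; it only computes the port values described above and does not model the SPRAM itself.

```python
def spram_byte_write(addr, data):
    """Return (ADDRESS, MASKWREN, DATAIN) for an 8-bit write to byte address addr."""
    word_addr = (addr >> 1) & 0x3FFF            # 14-bit word address
    if addr & 1:
        return word_addr, 0b1100, (data & 0xFF) << 8   # upper byte lane
    return word_addr, 0b0011, data & 0xFF              # lower byte lane


def spram_byte_read(addr, dataout):
    """Pick the addressed byte out of a 16-bit DATAOUT word."""
    if addr & 1:
        return (dataout >> 8) & 0xFF
    return dataout & 0xFF
```

Byte addresses 0x0100 and 0x0101 therefore share word address 0x80 and differ only in `MASKWREN` and in which half of `DATAIN` carries the byte.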
**Read latency:** SPRAM has 1-cycle synchronous read latency. The result of a
read issued at cycle N is valid at cycle N+1. The arbiter must account for this
when responding to the prefetch pipeline.
**Port list:**
| Port | Width | Dir | Notes |
|---|---|---|---|
| `exi_req_addr` | 16 | in | From spram_req AsyncFIFO (exi→sync) |
| `exi_req_valid` | 1 | in | FIFO r_rdy |
| `exi_req_ready` | 1 | out | FIFO r_en (pop when serviced) |
| `exi_rsp_data` | 8 | out | To spram_rsp AsyncFIFO (sync→exi) |
| `exi_rsp_valid` | 1 | out | FIFO w_en |
| `eth_wr_addr` | 16 | in | From RXFrameAssembler |
| `eth_wr_data` | 8 | in | Byte to write |
| `eth_wr_valid` | 1 | in | Write request |
| `eth_wr_ready` | 1 | out | Write accepted this cycle |
---
### 7.4 RXFrameAssembler
**Domain:** `sync`
**File:** `exi_bba/rx_frame_assembler.py`
Receives complete ethernet frames from `W5500SPIMaster` and writes them into
the SPRAM ring buffer in the correct MX98730EC format.
**Ring buffer layout (in SPRAM):**
```
SPRAM address 0x0100–0x0FFF (3840 bytes = 15 × 256-byte pages)
Page 0x01: first usable RX page
Page 0x0F: last usable RX page (RHBP default)
Pages wrap: after 0x0F, next is 0x01 (not 0x00, which is reserved)
```
Each page is 256 bytes. A received frame may span multiple pages.
**Frame descriptor (first 4 bytes of first page):**
```
Byte 0: LRPS value (Last Received Packet Status — set to 0x00 or actual status)
Byte 1: 0x00
Byte 2: frame_length[15:8] (big-endian, includes descriptor bytes)
Byte 3: frame_length[7:0]
Bytes 4+: raw ethernet frame data (DA, SA, EtherType, payload, FCS)
```
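
Packing the 4-byte descriptor is a one-liner, shown here as a sketch for use in testbenches. `build_descriptor` is a hypothetical helper; whether `frame_length` already includes the 4 descriptor bytes is left to the caller, as in the layout above.

```python
def build_descriptor(frame_length, status=0x00):
    """Return the 4 descriptor bytes: status, 0x00, length high, length low."""
    if not 0 <= frame_length <= 0xFFFF:
        raise ValueError("frame_length must fit in 16 bits")
    return bytes([
        status & 0xFF,
        0x00,
        (frame_length >> 8) & 0xFF,   # big-endian high byte
        frame_length & 0xFF,          # big-endian low byte
    ])
```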
**Flow:**
1. Wait for `W5500SPIMaster` to signal frame available (`rx_sof` pulse).
2. Read frame bytes from W5500 frame FIFO.
3. Compute how many 256-byte pages are needed:
`pages_needed = ceil((frame_length + 4) / 256)`
4. Check that the ring has at least `pages_needed` free pages between RWP and
RRP, applying the same 1–15 page wrap as step 6 (a plain `mod 16` would land on
the reserved page 0). If full, drop the frame and increment a drop counter.
5. Write 4-byte descriptor at SPRAM address `RWP * 0x100` (page 0x01 starts at
0x0100).
6. Write frame bytes sequentially, wrapping pages at 256-byte boundaries.
Page wrap: `next_page = (current_page % 15) + 1` (pages 1–15, skip 0).
7. After last byte written, update `RWP` in the `rx_wptr` AsyncFIFO (sync→exi).
The `exi` domain will update the RWP register from this FIFO.
8. Pulse `rx_irq` PulseSynchronizer to `exi` domain.
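
The page arithmetic in the flow above can be sketched in plain Python. Two assumptions are made here and are not taken from the source: all pointer arithmetic uses the 1–15 page wrap from step 6, and `RWP == RRP` is taken to mean "ring empty", so one page always stays unused. All function names are hypothetical.

```python
FIRST_PAGE = 0x01
LAST_PAGE = 0x0F          # 15 usable pages, page 0 is reserved
PAGE_SIZE = 256
DESCRIPTOR_BYTES = 4


def pages_needed(frame_length):
    """ceil((frame_length + 4) / 256) using integer arithmetic."""
    return -(-(frame_length + DESCRIPTOR_BYTES) // PAGE_SIZE)


def next_page(page):
    """Advance one page, wrapping 15 -> 1 and skipping page 0."""
    return (page % LAST_PAGE) + 1


def advance(page, count):
    """Advance a page pointer by count pages."""
    for _ in range(count):
        page = next_page(page)
    return page


def free_pages(rwp, rrp):
    """Pages writable before RWP would catch up with RRP (assumed convention)."""
    used = (rwp - rrp) % LAST_PAGE
    return (LAST_PAGE - 1) - used


def can_accept(frame_length, rwp, rrp):
    """True when the frame fits without overrunning unread pages."""
    return pages_needed(frame_length) <= free_pages(rwp, rrp)
```

A full-size 1514-byte frame needs 6 pages, so with an empty ring (14 free pages under this convention) two such frames fit and a third is dropped until the GC advances RRP.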
**MAC address filter:**
Before writing a frame, check destination MAC against PAR0–PAR5 (broadcast
FF:FF:FF:FF:FF:FF always accepted). The GC will typically configure PAR0–PAR5
via EXI after boot, so the `BBARegisterFile` must expose these to the
`RXFrameAssembler`. Pass them via a dedicated small AsyncFIFO or by reading
them from a shared register shadow (6 bytes, sync domain copy updated when
GC writes PAR0–PAR5). Multicast hash table (MAR0–MAR7) filtering is optional
for initial implementation: accept all frames (promiscuous mode) until the GC
configures the filter.
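
The destination filter reduces to a comparison on the first six bytes of the frame. A minimal sketch, assuming `par` holds PAR0–PAR5 in transmission order and that multicast hashing is not yet implemented; `accept_frame` is a hypothetical name.

```python
BROADCAST = bytes([0xFF] * 6)


def accept_frame(frame, par, promiscuous=False):
    """Return True when the frame's destination MAC should be accepted."""
    if promiscuous:
        return True                      # filter not configured yet
    dst = bytes(frame[:6])
    return dst == BROADCAST or dst == bytes(par)
```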
---
### 7.5 TXFrameDrain
**Domain:** `sync`
**File:** `exi_bba/tx_frame_drain.py`
Drains the TX byte FIFO (fed from the `exi` domain as the GC writes to TXDATA
register 0x48) and forwards complete frames to `W5500SPIMaster`.
**Flow:**
1. Wait for `tx_ctrl` AsyncFIFO to contain a frame length value. This is pushed
by `BBARegisterFile` when the GC has written the complete TX frame (i.e.,
NCRA ST1:ST0 transitions to 01 or 10).
2. Pop `frame_length` from `tx_ctrl`.
3. Pop exactly `frame_length` bytes from `tx_bytes` AsyncFIFO.
4. Forward bytes to `W5500SPIMaster` TX interface with SOF/EOF framing.
5. Wait for `W5500SPIMaster` to signal TX complete.
6. Pulse `tx_irq` PulseSynchronizer to `exi` domain.
**NCRA ST bits:** The GC writes NCRA with ST1:ST0 = 01 (start transmit from
buffer 1) or 10 (start transmit from buffer 2). The BBA hardware has two TX
buffers; this implementation uses a single TX FIFO and ignores the buffer
selection. When ST1:ST0 goes non-zero, treat it as a TX trigger regardless of
which bits are set. The `BBARegisterFile` should push the frame length into
`tx_ctrl` on this transition.
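
The pairing of the two FIFOs can be modelled with `collections.deque`. This sketch is a simplification: it waits until a whole frame is buffered before popping it, whereas the hardware streams bytes to `W5500SPIMaster` as they arrive. `drain_frames` is a hypothetical name.

```python
from collections import deque


def drain_frames(tx_ctrl, tx_bytes):
    """Pop complete frames: one length from tx_ctrl, then that many bytes."""
    frames = []
    while tx_ctrl and len(tx_bytes) >= tx_ctrl[0]:
        length = tx_ctrl.popleft()
        frames.append(bytes(tx_bytes.popleft() for _ in range(length)))
    return frames


# Two frames queued back to back: 3 bytes, then 2 bytes.
ctrl = deque([3, 2])
data = deque([0x01, 0x02, 0x03, 0x04, 0x05])
frames = drain_frames(ctrl, data)
```

Because the length always travels through `tx_ctrl` ahead of the drain, frame boundaries survive the clock-domain crossing even though `tx_bytes` itself carries no framing.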
---
### 7.6 W5500SPIMaster
**Domain:** `sync`
**File:** `exi_bba/w5500_spi_master.py`
Implements the W5500 SPI master interface. The W5500 uses SPI Mode 0 (CPOL=0,
CPHA=0), opposite to the BBA EXI interface.
**W5500 SPI frame format:**
```
Byte 0–1: Address (16-bit, big-endian)
Byte 2: Control byte:
[7:3] = Block Select (BSB):
00000 = Common Register
00001 = Socket 0 Register
00010 = Socket 0 TX buffer
00011 = Socket 0 RX buffer
[2] = Read/Write (0=read, 1=write)
[1:0] = Operation Mode (00=variable, 01=fixed 1B, 10=fixed 2B, 11=fixed 4B)
Byte 3+: Data bytes
```
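
The 3-byte address/control phase can be generated with a small helper. A sketch using the field layout above; the constant and function names are hypothetical.

```python
BSB_COMMON = 0b00000
BSB_S0_REG = 0b00001
BSB_S0_TX = 0b00010
BSB_S0_RX = 0b00011


def w5500_header(addr, bsb, write, om=0b00):
    """Return address high, address low and the control byte."""
    control = ((bsb & 0x1F) << 3) | (0x04 if write else 0x00) | (om & 0x03)
    return bytes([(addr >> 8) & 0xFF, addr & 0xFF, control])
```

For example, a variable-length write to MR in the common block starts with `00 00 04`, and a variable-length read from the Socket 0 RX buffer at offset 0 starts with `00 00 18`.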
**W5500 configuration (to be performed once on NCRA reset):**
```
1. Write MR (Mode Register, common block, 0x0000): 0x80 = software reset
2. Wait ~1 ms
3. Write SHAR (Source MAC, common block, 0x0009–0x000E): copy from PAR0–PAR5 register shadow
4. Write S0_MR (Socket 0 Mode, Socket 0 register block, offset 0x0000): 0x04 = MACRAW mode (raw ethernet)
5. Write S0_CR (Socket 0 Command, offset 0x0001): 0x01 = OPEN
6. Write S0_IMR (Socket 0 Interrupt Mask, offset 0x002C): 0x04 | 0x10 = RECV | SEND_OK
```
**MACRAW mode:** In MACRAW mode the W5500 Socket 0 sends and receives raw
ethernet frames including the full MAC header and FCS. This is exactly what
the MX98730EC presents to the GC. No IP stack runs in the FPGA.
**RX polling:** The W5500 asserts its INT_N pin (active low) when a frame
arrives. Connect W5500 INT_N to an FPGA input pin and use it to trigger the
`RXFrameAssembler`. Alternatively poll `S0_IR` (Socket 0 Interrupt Register,
offset 0x0002 in the Socket 0 register block) periodically. The INT_N approach
has lower latency and is preferred.
**SPI clock rate:** Drive W5500 SPI at 24 MHz (sync clock 48 MHz ÷ 2 using a
clock enable toggle). The W5500 supports up to 80 MHz so there is ample margin.
**Port list:**
| Port | Width | Dir | Notes |
|---|---|---|---|
| `spi_clk` | 1 | out | To W5500 CLK pin (SPI Mode 0, idles LOW) |
| `spi_mosi` | 1 | out | To W5500 MOSI |
| `spi_miso` | 1 | in | From W5500 MISO |
| `spi_cs_n` | 1 | out | To W5500 CS (active low) |
| `w5500_int_n` | 1 | in | W5500 interrupt (active low) |
| `tx_data` | 8 | in | Byte to transmit (from TXFrameDrain) |
| `tx_valid` | 1 | in | TX byte available |
| `tx_ready` | 1 | out | TX byte consumed |
| `tx_sof` | 1 | in | Start of frame marker |
| `tx_eof` | 1 | in | End of frame marker |
| `rx_data` | 8 | out | Received byte (to RXFrameAssembler) |
| `rx_valid` | 1 | out | RX byte available |
| `rx_ready` | 1 | in | RX byte consumed |
| `rx_sof` | 1 | out | Start of frame |
| `rx_eof` | 1 | out | End of frame |
---
### 7.7 EEPROMModel
**Domain:** `exi`
**File:** `exi_bba/eeprom_model.py`
Models the 93C46-compatible serial EEPROM that stores the BBA's MAC address.
The GC software bit-bangs the EEPROM interface through register 0x1C
(EEPROM Interface Register) of the BBA chip.
**Register 0x1C bit fields:**
```
[3] EECK — EEPROM clock
[2] EECS — EEPROM chip select
[1] EEDI — EEPROM data in (GC → EEPROM)
[0] EEDO — EEPROM data out (EEPROM → GC) [read-only]
```
The GC reads EEDO by reading register 0x1C bit 0.
**93C46 protocol summary:**
The 93C46 uses a 3-wire serial protocol (SK=clock, CS=select, DI=data in,
DO=data out). Commands:
- READ: start bit (1) + opcode (10) + 6-bit address → 16-bit data out
- WRITE: start bit (1) + opcode (01) + 6-bit address + 16-bit data
- EWEN (write enable): start bit (1) + opcode (00) + address (11xxxx)
Each 93C46 word is 16 bits. The MAC address occupies words 0–2 (6 bytes).
**Implementation approach:**
Maintain a small ROM of 64 × 16-bit words in the `exi` domain (as a Const
array, synthesises to LUTs). Pre-populate words 0–2 with the chosen MAC
address. Implement a small FSM that watches writes to register 0x1C for the
93C46 protocol, drives EEDO accordingly.
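
A bit-level software model of the READ command is a useful reference for the FSM. The sketch below handles only READ, ignores WRITE and EWEN, and assumes two MAC bytes per word with the first byte in the high half; that byte order is an assumption and should be checked against what the GC driver expects. `Eeprom93C46` and `read_word` are hypothetical names.

```python
class Eeprom93C46:
    """Read-only 93C46 model: 64 x 16-bit words, MAC in words 0-2."""

    def __init__(self, mac):
        self.words = [0xFFFF] * 64
        for i in range(3):
            # Assumed packing: first byte of each pair in bits [15:8].
            self.words[i] = (mac[2 * i] << 8) | mac[2 * i + 1]
        self.deselect()

    def deselect(self):
        """EECS low: abandon any command in progress."""
        self.started = False
        self.cmd = 0
        self.cmd_bits = 0
        self.out = 0
        self.out_bits = 0

    def clock(self, di):
        """One rising EECK edge with EECS high; returns the new EEDO level."""
        if self.out_bits:                       # READ data phase, MSB first
            self.out_bits -= 1
            return (self.out >> self.out_bits) & 1
        if not self.started:
            self.started = bool(di)             # wait for the start bit
            return 0
        self.cmd = (self.cmd << 1) | (di & 1)
        self.cmd_bits += 1
        if self.cmd_bits == 8:                  # 2 opcode bits + 6 address bits
            opcode, address = self.cmd >> 6, self.cmd & 0x3F
            if opcode == 0b10:                  # READ
                self.out = self.words[address]
                self.out_bits = 16
            self.started = False
            self.cmd = 0
            self.cmd_bits = 0
        return 0


def read_word(eeprom, address):
    """Bit-bang one READ command the way a driver would."""
    eeprom.deselect()
    for bit in [1, 1, 0] + [(address >> b) & 1 for b in range(5, -1, -1)]:
        eeprom.clock(bit)
    word = 0
    for _ in range(16):
        word = (word << 1) | eeprom.clock(0)
    return word
```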
**Simpler alternative:** Many GC BBA drivers read the EEPROM once at boot and
then write the MAC to PAR0–PAR5 themselves. Pre-populate PAR0–PAR5 in the
register file reset state with a valid Nintendo OUI MAC (00:09:BF:xx:xx:xx).
Skip a full 93C46 implementation for the first version — if Swiss ignores the
EEPROM read result and uses a hardcoded or user-configurable MAC, this is
sufficient.
---
### 7.8 BBATop
**Domain:** both
**File:** `exi_bba/bba_top.py`
Top-level module. Instantiates all submodules, creates clock domains, connects
physical pins.
**Clock domain creation:**
```python
from amaranth import ClockDomain, Module, ResetSignal
from amaranth.lib.cdc import ResetSynchronizer

def elaborate(self, platform):
    m = Module()
    # exi domain: 96 MHz from PLL (3× 32 MHz EXI bus rate)
    exi_domain = ClockDomain("exi")
    m.domains += exi_domain
    # get_pll() is a placeholder for a platform-specific PLL wrapper;
    # it is not an Amaranth API and must be provided by the project.
    pll = platform.get_pll()
    m.d.comb += exi_domain.clk.eq(pll.clkout)
    m.submodules.exi_rst = ResetSynchronizer(
        arst=ResetSignal("sync"), domain="exi"
    )
    # sync domain: 48 MHz from SB_HFOSC (platform default)
    # Created automatically by the platform, assuming its default clock
    # is configured as SB_HFOSC.
    # Instantiate submodules...
    m.submodules.spi = spi = SPIMode3Slave()
    m.submodules.regfile = regfile = BBARegisterFile()
    m.submodules.arbiter = arbiter = SPRAMArbiter()
    m.submodules.rx_asm = rx_asm = RXFrameAssembler()
    m.submodules.tx_drn = tx_drn = TXFrameDrain()
    m.submodules.w5500 = w5500 = W5500SPIMaster()
    m.submodules.eeprom = eeprom = EEPROMModel()
    # ... wiring ...
    return m
```
**Physical pin connections (iCEbreaker):**
The SP1 EXI signals connect via the interposer PCB to iCEbreaker PMOD pins.
The W5500 Pmod connects to the second PMOD connector. Exact pin mapping depends
on the interposer PCB layout — define these in a platform resource file.
```python
# Example resource definitions (add to iCEbreaker platform file):
Resource("exi", 0,
    Subsignal("clk",   Pins("1", conn=("pmod", 0), dir="i")),
    Subsignal("mosi",  Pins("2", conn=("pmod", 0), dir="i")),
    Subsignal("miso",  Pins("3", conn=("pmod", 0), dir="o")),
    Subsignal("cs_n",  Pins("4", conn=("pmod", 0), dir="i")),
    Subsignal("int_n", Pins("7", conn=("pmod", 0), dir="o")),
    Attrs(IO_STANDARD="SB_LVCMOS"),
),
Resource("w5500", 0,
    Subsignal("clk",   Pins("1", conn=("pmod", 1), dir="o")),
    Subsignal("mosi",  Pins("2", conn=("pmod", 1), dir="o")),
    Subsignal("miso",  Pins("3", conn=("pmod", 1), dir="i")),
    Subsignal("cs_n",  Pins("4", conn=("pmod", 1), dir="o")),
    Subsignal("int_n", Pins("7", conn=("pmod", 1), dir="i")),
    Subsignal("rst_n", Pins("8", conn=("pmod", 1), dir="o")),
    Attrs(IO_STANDARD="SB_LVCMOS"),
),
```
---
## 8. Memory Map
The BBA register address space is 13 bits wide (0x0000–0x1FFF).
| Address range | Region | Implemented in | Notes |
|---|---|---|---|
| 0x0000–0x0033 | MAC control registers | Register file (exi) | NCRA, NCRB, IMR, IR, pointers |
| 0x0034–0x0037 | TWD — TX write data | Register file (exi) | TX frame length (2 bytes) |
| 0x0038–0x0039 | Reserved | — | Ignore |
| 0x003A | HIPR — Host Interface Protocol | Register file (exi) | Read: 0x01 (BBA present) |
| 0x003B | NAFR — Network Address Filter | Register file (exi) | |
| 0x003C | NWBA — Network Write Buffer Addr | Register file (exi) | |
| 0x003D–0x0047 | Reserved | — | Ignore |
| 0x0048 | TXDATA — Bulk TX data port | Register file → tx_bytes FIFO | Write path to ethernet |
| 0x0049–0x00FF | Reserved | — | Ignore |
| 0x0100–0x0FFF | RX ring buffer | SPRAM (sync) | Read path from ethernet |
---
## 9. EXI Transaction Protocol
All BBA register accesses follow a strict two-phase (header + data) format.
### Header encoding
```
Byte 0: [7] write flag 1=write, 0=read
[6:0] addr[12:6] upper 7 bits of address
Byte 1: [7:2] addr[5:0] lower 6 bits of address
[1:0] xfer_len-1 0=1 byte, 1=2 bytes, 2=3 bytes, 3=4 bytes
```
CS is asserted (low) before byte 0 and remains low through the entire
transaction including all data bytes. CS deasserts (high) after the last
data byte.
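The header layout above can be sketched as a pair of pure-Python helpers, for example for use in a simulation master model. This is only an illustration of the bit packing described in this section; the function names `encode_header` and `decode_header` are hypothetical and do not appear elsewhere in the design.

```python
def encode_header(addr: int, length: int, write: bool) -> bytes:
    """Pack the two EXI header bytes for a BBA register access."""
    if not 0 <= addr <= 0x1FFF:
        raise ValueError("address must fit in 13 bits")
    if not 1 <= length <= 4:
        raise ValueError("length must be 1..4 bytes")
    byte0 = (int(write) << 7) | (addr >> 6)       # write flag + addr[12:6]
    byte1 = ((addr & 0x3F) << 2) | (length - 1)   # addr[5:0] + (xfer_len - 1)
    return bytes([byte0, byte1])


def decode_header(header: bytes) -> tuple[int, int, bool]:
    """Inverse of encode_header: returns (addr, length, write)."""
    byte0, byte1 = header
    write = bool(byte0 & 0x80)
    addr = ((byte0 & 0x7F) << 6) | (byte1 >> 2)
    length = (byte1 & 0x03) + 1
    return addr, length, write
```

For instance, a one-byte write to IR (0x09) encodes as `0x80 0x24`, and a four-byte read from the start of the RX ring (0x0100) encodes as `0x04 0x03`.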
### Read transaction timing
```
CS ─┐ ┌─
└────────────────────────────────────┘
CLK ┌┐┌┐┌┐┌┐┌┐┌┐┌┐┌┐ ┌┐┌┐┌┐┌┐┌┐┌┐┌┐┌┐ ┌┐┌┐...
header byte 0 header byte 1 data byte 0...
MOSI [addr+flags] [addr+len] [don't care]
MISO [don't care] [don't care] [register data]
```
The register file must have data ready on MISO from the **very first clock
edge of the data phase**. For register-file-backed reads (address < 0x100),
the data is available immediately after header decode. For SPRAM-backed reads
(address ≥ 0x100), the prefetch pipeline issues the SPRAM read request during
the header phase so data is ready in time.
### Write transaction timing
Identical header, then MOSI carries the write data. In SPI Mode 3 the master
changes MOSI on the falling CLK edge, so the FPGA samples MOSI on each rising
CLK edge during the data phase and writes the completed byte to the register.
### ID query
On power-on the GC queries the device ID. The query is two 0x00 bytes written,
then four bytes read. The BBA returns `0x04020200`. Implement this as a special
case: when address decodes to 0x0000 on a read with no prior NCRA reset, return
the hardcoded ID.
Alternatively, read the Dolphin source for the exact byte sequence GC software
uses to detect the BBA and replicate it faithfully.
---
## 10. BBA Register Reference
Key registers the GC driver accesses. Full register map in YAGCD §10.8.
| Addr | Name | R/W | Reset | Description |
|---|---|---|---|---|
| 0x00 | NCRA | R/W | 0x00 | Network Control A. [0]=RESET (self-clear), [2:1]=ST (TX start), [3]=SR (start receive), [6]=INTMODE (0=int active low) |
| 0x01 | NCRB | R/W | 0x00 | Network Control B |
| 0x04 | LTPS | R | 0x00 | Last TX packet status |
| 0x05 | LRPS | R | 0x00 | Last RX packet status |
| 0x08 | IMR | R/W | 0x00 | Interrupt mask. Bits match IR. Interrupt fires when IR & IMR != 0 |
| 0x09 | IR | R/W | 0x00 | Interrupt register. Write 1 to clear. [7]=RBFI, [4]=TEI, [2]=TI, [1]=RI |
| 0x0A–0x0B | BP | R/W | — | Boundary page pointer |
| 0x0C–0x0D | TLBP | R/W | — | TX low boundary page |
| 0x0E–0x0F | TWP | R/W | 0x00 | TX write page pointer |
| 0x12–0x13 | TRP | R/W | 0x00 | TX read page pointer |
| 0x16–0x17 | RWP | R | updates | RX write page pointer. Advances after each frame written |
| 0x18–0x19 | RRP | R/W | 0x01 | RX read page pointer. GC writes to advance after consuming frames |
| 0x1A–0x1B | RHBP | R/W | 0x0F | RX high boundary page (last valid page). Default 0x0F |
| 0x1C | EEPROM | R/W | — | EEPROM bit-bang interface [3:0] = EECK, EECS, EEDI, EEDO |
| 0x20–0x25 | PAR0–5 | R/W | MAC | MAC address bytes 0–5. GC writes after reading EEPROM |
| 0x26–0x2D | MAR0–7 | R/W | 0xFF | Multicast hash table. 0xFF = accept all |
| 0x2E | ANALOG | R/W | — | PHY analog control. GC writes 0xD6 to enable PHY |
| 0x30 | NWAYC | R/W | — | Autoneg config. GC sets ANE + LTE bits |
| 0x31 | NWAYS | R | 0x17 | Autoneg status. Hardcode 0x17 = 100M full duplex link up |
| 0x32 | GCA | R/W | — | GMAC config A. GC sets AUTOPUB bit |
| 0x33 | GCB | R/W | — | GMAC config B |
| 0x34–0x37 | TWD | W | — | TX write data (frame length, 2 bytes LE, then ignored) |
| 0x3A | HIPR | R | 0x01 | Host interface protocol version. Return 0x01 |
| 0x3B | NAFR | R/W | — | Network address filter |
| 0x3C | NWBA | R/W | — | Network write buffer address |
| 0x48 | TXDATA | W | — | Bulk TX data port. GC streams frame bytes here |
| 0x100+ | RX buf | R | — | RX ring buffer. GC reads frames from here |
---
## 11. Initialisation Sequence
This is the exact sequence Swiss/GC software executes. The register file must
respond correctly to each step.
```
1. Assert CS, write 0x0000 (2 bytes), read 4 bytes
→ Must return: 0x04 0x02 0x02 0x00 (device ID)
2. Write 0x01 to NCRA (0x00) — software reset
→ RESET bit self-clears next cycle
→ Pulse ncra_rst to sync domain (resets W5500, clears SPRAM pointers)
3. Poll NCRA bit 0 until clear — wait for reset complete
→ Return 0x00 from NCRA reads after self-clear
4. Write 6 bytes to PAR0–PAR5 (0x20–0x25)
→ Latch MAC address; forward to sync domain MAC filter shadow
5. Write 8 bytes to MAR0–MAR7 (0x26–0x2D)
→ Typically all 0xFF (accept all multicast)
6. Write 0xD6 to ANALOG (0x2E) — enable PHY
→ Store in register file; no hardware effect in FPGA
7. Write NWAYC (0x30): set bits for ANE + LTE
→ Store; no hardware effect
8. Write IMR (0x08): typically 0x86 (RBFI | TI | RI)
→ Enables interrupts; INT line will now assert when frames arrive
9. Write GCA (0x32): set AUTOPUB bit
→ Store; AUTOPUB means RWP auto-updates — we always do this anyway
10. Write NCRA (0x00): set SR bit (0x08) — start receive
→ Enable RX path; the RXFrameAssembler should begin accepting frames
11. Poll NWAYS (0x31) until link up
→ Return hardcoded 0x17 immediately
```
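Steps 2 to 11 can be replayed against a small behavioural model before any HDL exists. The sketch below is a hypothetical software stand-in (`RegisterModel` and `run_init` are illustrative names), it ignores the 4-byte transaction limit of the EXI header, and it takes the NWAYC and GCA values as parameters because this document does not give their bit positions.

```python
# Register addresses from the BBA register reference.
NCRA, IMR, PAR0, MAR0 = 0x00, 0x08, 0x20, 0x26
ANALOG, NWAYC, NWAYS, GCA = 0x2E, 0x30, 0x31, 0x32


class RegisterModel:
    """Hypothetical software stand-in for BBARegisterFile (not the HDL)."""

    def __init__(self):
        self.regs = bytearray(0x40)
        self.reset_pulses = 0  # how often ncra_rst would have been pulsed

    def write(self, addr, data):
        for offset, value in enumerate(data):
            target = addr + offset
            if target == NWAYS:
                continue  # read-only status register
            if target == NCRA and value & 0x01:
                self.reset_pulses += 1
                value &= 0xFE  # RESET bit self-clears
            self.regs[target] = value & 0xFF

    def read(self, addr, length=1):
        # NWAYS is hardcoded to 0x17 (link up), everything else is storage.
        return bytes(0x17 if a == NWAYS else self.regs[a]
                     for a in range(addr, addr + length))


def run_init(model, mac, nwayc_value, gca_value):
    """Replay steps 2-11; returns the NWAYS value seen in step 11."""
    model.write(NCRA, [0x01])            # step 2: software reset
    while model.read(NCRA)[0] & 0x01:    # step 3: poll until RESET clears
        pass
    model.write(PAR0, mac)               # step 4: MAC address
    model.write(MAR0, [0xFF] * 8)        # step 5: multicast hash table
    model.write(ANALOG, [0xD6])          # step 6: enable PHY
    model.write(NWAYC, [nwayc_value])    # step 7: caller supplies ANE + LTE bits
    model.write(IMR, [0x86])             # step 8: RBFI | TI | RI
    model.write(GCA, [gca_value])        # step 9: caller supplies AUTOPUB bit
    model.write(NCRA, [0x08])            # step 10: set SR
    return model.read(NWAYS)[0]          # step 11: link status
```

The same step list can later drive the real `GCMasterModel` in `sim_bba_init.py`, with this model serving as the expected-value reference.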
---
## 12. RX Data Path — Detailed Flow
```
W5500 receives frame on wire
W5500SPIMaster detects S0_IR[RECV] (via INT_N pin)
Reads frame length from S0_RX_RSR (Socket 0 RX Received Size, offset 0x0026, BSB=0b00001)
Reads frame bytes from Socket 0 RX buffer (BSB=0b00011)
Pulses rx_sof, streams rx_data bytes, pulses rx_eof
▼ (sync domain)
RXFrameAssembler
- Checks destination MAC vs PAR shadow
- Checks NCRA SR bit is set (RX enabled)
- Computes pages_needed
- Checks ring buffer not full (RWP+pages != RRP)
- Writes descriptor + frame data into SPRAM via SPRAMArbiter
- Advances RWP (local register in sync domain)
- Pushes new RWP value into rx_wptr AsyncFIFO (sync→exi)
- Pulses rx_irq PulseSynchronizer (sync→exi)
▼ AsyncFIFO / PulseSynchronizer crossing
│ (exi domain)
BBARegisterFile
- Pops new RWP from rx_wptr FIFO, updates RWP register
- rx_irq pulse arrives: sets IR[1] (RI bit)
- IR & IMR now non-zero: asserts exi_int_n (INT low to GC)
▼ (GC CPU, driven by interrupt or polling)
GC reads IR register: sees RI=1
GC reads RWP (0x16): gets updated pointer
GC reads frame starting at byte address RRP << 8 (page RRP; bulk read, up to 1500+ bytes)
→ BBARegisterFile issues SPRAM read requests via spram_req FIFO (exi→sync)
→ SPRAMArbiter services reads from SPRAM
→ Results flow back via spram_rsp FIFO (sync→exi)
→ Prefetch pipeline keeps data ready for SPI bit engine
GC writes new RRP (0x18) to advance past consumed pages
→ BBARegisterFile pushes RRP update into rx_rptr FIFO (exi→sync)
→ RXFrameAssembler updates its local RRP shadow
GC writes IR register with RI=1 (write-1-to-clear)
→ IR[1] clears, INT line deasserts
```
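The `pages_needed` computation and the ring-full check in the RXFrameAssembler step can be sketched as follows. This is a minimal model under stated assumptions: a 4-byte per-frame descriptor (`DESCRIPTOR_BYTES` is an assumed value), wrap-around from RHBP back to page 0x01, and one page always kept free so that a full ring is distinguishable from an empty one. The last point is a stricter variant of the `RWP+pages != RRP` check in the flow above, not necessarily what the HDL will implement.

```python
PAGE_SIZE = 256
FIRST_PAGE = 0x01        # matches the RRP reset value
DESCRIPTOR_BYTES = 4     # assumption: per-frame descriptor size


def pages_needed(frame_len):
    """Pages occupied by descriptor + frame, rounded up to whole pages."""
    return -(-(frame_len + DESCRIPTOR_BYTES) // PAGE_SIZE)


def advance(page, count, rhbp=0x0F):
    """Move a page pointer forward by `count`, wrapping after RHBP."""
    ring_pages = rhbp - FIRST_PAGE + 1
    return FIRST_PAGE + (page - FIRST_PAGE + count) % ring_pages


def free_pages(rwp, rrp, rhbp=0x0F):
    """Pages available for writing; one page is kept unused."""
    ring_pages = rhbp - FIRST_PAGE + 1
    used = (rwp - rrp) % ring_pages
    return ring_pages - 1 - used


def can_store(frame_len, rwp, rrp, rhbp=0x0F):
    """True if the frame fits without RWP catching up with RRP."""
    return pages_needed(frame_len) <= free_pages(rwp, rrp, rhbp)
```

With these definitions a full-size 1514-byte frame needs 6 pages, and a write pointer at page 0x0F advanced by one page wraps to 0x01.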
---
## 13. TX Data Path — Detailed Flow
```
GC CPU constructs ethernet frame in GC RAM
▼ (GC CPU → EXI)
GC writes 2-byte length to TWD register (0x34)
GC writes frame bytes to TXDATA register (0x48) in chunks
→ BBARegisterFile: each written byte goes into tx_bytes AsyncFIFO (exi→sync)
GC writes NCRA with ST1:ST0 = 01 (transmit trigger)
→ BBARegisterFile pushes frame_length into tx_ctrl AsyncFIFO (exi→sync)
▼ AsyncFIFO crossing
│ (sync domain)
TXFrameDrain
- Pops frame_length from tx_ctrl
- Pops frame_length bytes from tx_bytes
- Forwards to W5500SPIMaster with SOF/EOF
▼ (sync domain)
W5500SPIMaster
- Reads S0_TX_FSR (TX Free Size Register, offset 0x0020) to confirm the frame fits
- Writes frame bytes into Socket 0 TX buffer (BSB=0b00010) at S0_TX_WR (offset 0x0024)
- Advances S0_TX_WR by the frame length
- Writes SEND command (0x20) to S0_CR (offset 0x0001)
- Polls S0_IR until SEND_OK bit set
- Clears S0_IR[SEND_OK]
- Pulses tx_irq PulseSynchronizer (sync→exi)
▼ PulseSynchronizer crossing
│ (exi domain)
BBARegisterFile
- tx_irq arrives: sets IR[2] (TI bit), clears NCRA ST1:ST0
- If IMR[2] set: INT asserts to GC
▼ (GC CPU)
GC reads IR, sees TI=1
GC writes IR with TI=1 to clear
```
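Each W5500 access in the flow above is an SPI frame consisting of a 16-bit offset, a control byte, and the data. The sketch below builds such frames in variable-length data mode. It reflects my reading of the W5500 datasheet (control byte = BSB in bits 7:3, read/write in bit 2, operation mode in bits 1:0; socket register offsets are relative to the block selected by BSB) and should be checked against the datasheet before use. `w5500_frame` is a hypothetical helper name.

```python
# Block select values for socket 0 (register, TX buffer, RX buffer).
BSB_SOCKET0_REG = 0b00001
BSB_SOCKET0_TX = 0b00010
BSB_SOCKET0_RX = 0b00011

SN_CR = 0x0001      # socket command register offset
CMD_SEND = 0x20     # SEND command value


def w5500_frame(offset, bsb, write, payload=b""):
    """Build address phase + control phase + data phase for one access."""
    # Operation mode bits are left at 00 (variable-length data mode).
    control = (bsb << 3) | (int(write) << 2)
    return bytes([offset >> 8, offset & 0xFF, control]) + bytes(payload)


# Frame that issues the SEND command on socket 0.
send_cmd = w5500_frame(SN_CR, BSB_SOCKET0_REG, write=True, payload=[CMD_SEND])
```

For a read, the same three header bytes are sent with `write=False` and the master then clocks out as many bytes as it wants to read while CS stays low.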
---
## 14. SPRAM Layout
The iCE40UP5K has 4 × 32 KB SPRAM banks (128 KB total). Map them as:
| SPRAM region | Size | Usage |
|---|---|---|
| 0x0000–0x00FF | 256 B | Reserved (address 0x00 page not used by ring buffer) |
| 0x0100–0x0FFF | 3840 B | RX ring buffer (15 × 256-byte pages, pages 0x01–0x0F) |
| 0x1000–0x17FF | 2048 B | TX frame staging buffer |
| 0x1800–0x1FFF | 2048 B | Reserved / future use |
The ring buffer uses pages 0x01–0x0F (15 pages × 256 bytes = 3840 bytes). This
matches the MX98730EC default `RHBP` (RX High Boundary Page) value of 0x0F and
`RRP` reset value of 0x01.
**SPRAM addressing:** Each iCE40UP5K SB_SPRAM256KA instance is 16K × 16-bit
(32 KB; 128 KB total across 4 instances). To address the ring buffer region as bytes:
- Byte address 0x0100 maps to SPRAM word address 0x0080 (byte 0x0100 >> 1)
- The arbiter converts byte addresses to word addresses and uses MASKWREN for
byte selection
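The byte-to-word conversion can be sketched as below. It assumes the even byte address lives in the low half of the 16-bit word (an endianness choice this document does not fix) and that MASKWREN is a 4-bit mask with one bit per nibble, so a byte write enables two adjacent mask bits. `spram_byte_access` is an illustrative name.

```python
def spram_byte_access(byte_addr):
    """Return (word_addr, maskwren, shift) for a single-byte SPRAM access."""
    word_addr = byte_addr >> 1
    upper = byte_addr & 1
    # Assumption: even byte -> DATA[7:0], odd byte -> DATA[15:8].
    maskwren = 0b1100 if upper else 0b0011
    shift = 8 if upper else 0
    return word_addr, maskwren, shift


def write_byte(memory, byte_addr, value):
    """Apply a byte write to a list of 16-bit words (software model only)."""
    word_addr, _, shift = spram_byte_access(byte_addr)
    keep = ~(0xFF << shift) & 0xFFFF
    memory[word_addr] = (memory[word_addr] & keep) | ((value & 0xFF) << shift)
```

On the read side the arbiter would use the same `shift` to pick the requested byte out of DATAOUT.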
---
## 15. Critical Timing Constraints
### Must-meet timing in `exi` domain (96 MHz → 10.4 ns period)
| Path | Budget | Notes |
|---|---|---|
| FFSynchronizer output → edge detect flip-flop | 1 cycle = 10.4 ns | Trivially met — just a register |
| Edge detect → shift register update | 1 cycle | Register-to-register, no logic |
| `rx_valid` → header decode → `spram_req` FIFO write | 2 cycles | Address decode is combinatorial MUX; must close at 96 MHz |
| `tx_load` → `tx_byte` driven from register file | 1 cycle | `regs[addr]` array lookup; critical path, keep address decode combinatorial depth ≤ 4 LUTs |
| `tx_load` → `tx_byte` driven from prefetch buffer | 1 cycle | Just a register read; trivial |
### Must-meet timing in `sync` domain (48 MHz → 20.8 ns period)
| Path | Budget | Notes |
|---|---|---|
| SPRAM read request → SPRAM address valid | 1 cycle | AsyncFIFO read + mux — easy |
| SPRAM DATAOUT → result FIFO write | 1 cycle | Register-to-FIFO — easy |
| W5500 SPI bit engine | N/A | Clock-enable based at 24 MHz effective; no hard timing |
### Cross-domain latency budget for SPRAM prefetch
```
EXI header phase duration: 16 exi clocks at 96 MHz = 167 ns
SPRAM prefetch round trip:
exi → spram_req FIFO write: 1 exi tick = 10 ns
FIFO cross-domain: 2 sync ticks = 42 ns
SPRAM read (1 cycle latency): 1 sync tick = 21 ns
Result → spram_rsp FIFO write: 1 sync tick = 21 ns
FIFO cross-domain: 2 exi ticks = 21 ns
Result available in prefetch buffer: = 21 ns
Total: ~136 ns
136 ns < 167 ns header window → prefetch completes before first data bit needed ✓
```
This is the tightest timing consideration in the design. The prefetch must be
issued during HEADER1 (not after) to make the deadline.
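The budget arithmetic can be reproduced with unrounded clock periods, which makes it easy to re-check if either clock frequency changes. The step list mirrors the table above; the final step ("result available in prefetch buffer") is modelled as 2 exi ticks, which is an assumption chosen to match the 21 ns figure.

```python
EXI_NS = 1e9 / 96e6    # exi domain period, about 10.42 ns
SYNC_NS = 1e9 / 48e6   # sync domain period, about 20.83 ns

# (description, tick count, period of the domain the ticks belong to)
STEPS = [
    ("exi -> spram_req FIFO write", 1, EXI_NS),
    ("spram_req cross-domain", 2, SYNC_NS),
    ("SPRAM read", 1, SYNC_NS),
    ("result -> spram_rsp FIFO write", 1, SYNC_NS),
    ("spram_rsp cross-domain", 2, EXI_NS),
    ("load prefetch buffer", 2, EXI_NS),   # assumption: 2 exi ticks
]


def prefetch_round_trip_ns():
    """Sum of all pipeline stages in nanoseconds."""
    return sum(ticks * period for _, ticks, period in STEPS)


def header_window_ns(clocks=16):
    """Window assumed by the budget above: 16 exi clocks."""
    return clocks * EXI_NS


slack_ns = header_window_ns() - prefetch_round_trip_ns()
```

With exact periods the round trip comes to roughly 135 ns against a window of roughly 167 ns, which agrees with the rounded figures above.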
---
## 16. SPRAM Read Prefetch Pipeline
The prefetch pipeline ensures MISO data is always ready before the SPI slave
needs it for the data phase.
### State machine (in BBARegisterFile, exi domain)
```
State HEADER1 (decoding second header byte):
If is_read AND address >= 0x100:
push address into spram_req AsyncFIFO ← issued NOW, during header decode
set prefetch_pending = True
State DATA (read phase):
On each tx_load pulse:
If prefetch_pending AND spram_rsp FIFO has data:
pop byte from spram_rsp FIFO
load into tx_byte
push (address + byte_ctr + 1) into spram_req for NEXT byte ← pipelining
Elif address < 0x100:
tx_byte = regs[address + byte_ctr] ← direct register file read
```
### Pipeline depth
The `spram_req` and `spram_rsp` FIFOs each have depth 4. This allows up to 4
read requests to be in-flight simultaneously, which absorbs any SPRAM arbiter
stalls (ETH write winning the arbitration) without stalling the SPI data phase.
### SPRAM arbiter stall handling
If the SPRAM arbiter defers an EXI read by 1 cycle (due to ETH write priority),
the `spram_rsp` FIFO may be momentarily empty when `tx_load` arrives. The
BBARegisterFile then has no valid byte to hand to the SPI slave.
The SPI slave cannot truly be stalled: the GC is the SPI master and keeps
driving CLK regardless. The only available fallback works at byte boundaries:
after a complete byte has been transmitted, hold MISO at a constant level
(0 or 1) until the next byte is ready. The GC will clock in that held level
as invalid data for the affected byte.
**Practical note:** At 48 MHz sync with 24 MHz effective W5500 access rate, the
ETH write path can only consume the SPRAM arbiter for ~1 sync cycle per byte
written. The EXI read path gets the remaining cycles. With 4-deep FIFOs the
pipeline should almost never stall in practice. Monitor the stall condition in
simulation.
---
## 17. Interrupt Handling
The `exi_int_n` output (pin 3 of SP1) is active-low. Assert it (drive low)
when `IR & IMR != 0`.
```python
# In BBARegisterFile, exi domain:
ir_masked = Signal(8)
m.d.comb += ir_masked.eq(regs[BBARegs.IR] & regs[BBARegs.IMR])
m.d.exi += exi_int_n.eq(~ir_masked.any())
```
Register the output — do not drive `exi_int_n` combinatorially. A registered
output prevents glitches from propagating onto the GC board.
**Interrupt sources and IR bit assignments:**
| IR bit | Name | Set by | Cleared by |
|---|---|---|---|
| 7 | RBFI | RXFrameAssembler when ring full | GC write-1-to-clear |
| 4 | TEI | TXFrameDrain on TX error | GC write-1-to-clear |
| 2 | TI | tx_irq pulse from sync | GC write-1-to-clear |
| 1 | RI | rx_irq pulse from sync | GC write-1-to-clear |
The GC typically masks in IMR: 0x86 = 0b10000110 (RBFI | TI | RI).
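The write-1-to-clear behaviour and the masked interrupt line can be captured in a few lines of behavioural Python, useful as a reference when checking the HDL in simulation. `InterruptModel` is a hypothetical name; the bit positions come from the table above.

```python
RI, TI, TEI, RBFI = 1 << 1, 1 << 2, 1 << 4, 1 << 7


class InterruptModel:
    """Software model of IR, IMR and the active-low interrupt output."""

    def __init__(self):
        self.ir = 0
        self.imr = 0

    def raise_source(self, bit):
        """An interrupt source sets its IR bit."""
        self.ir |= bit

    def write_ir(self, value):
        """GC write to IR: every 1 bit clears the matching IR bit."""
        self.ir &= ~value & 0xFF

    @property
    def int_n(self):
        """0 (asserted) while any unmasked IR bit is set, else 1."""
        return 0 if (self.ir & self.imr) else 1
```

With `imr = 0x86`, a pending TEI alone leaves `int_n` high because bit 4 is masked out, while RI, TI or RBFI pull it low until the GC clears them.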
---
## 18. EEPROM / MAC Address
The GC software reads the MAC address from the 93C46 EEPROM during
initialisation (bit-banging through register 0x1C). It then writes the MAC
to PAR0–PAR5.
**Recommended approach for initial implementation:**
Skip full 93C46 emulation. Pre-populate `regs[0x1C]` with a pattern that makes
the EEPROM read return a valid MAC. Use Nintendo's OUI `00:09:BF` for the first
3 bytes, with arbitrary device-specific values for the last 3:
```
MAC: 00:09:BF:00:00:01
```
Verify against Swiss source whether it validates the MAC read from EEPROM or
accepts whatever PAR0–PAR5 contains. If it re-reads EEPROM after writing PAR,
a full 93C46 model is required. If it only uses PAR0–PAR5, pre-populating the
register file is sufficient.
**MAC address propagation:**
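When the GC writes PAR0–PAR5, forward the new MAC to the W5500 SHAR register
via the `sync` domain. Use a 6-byte AsyncFIFO or a dedicated MAC update pulse.
In MACRAW mode the W5500 transmits frames exactly as supplied, so SHAR does not
set the source MAC of transmitted frames; it matters for the W5500's own
receive MAC filter when that filter is enabled.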
---
## 19. iCE40UP5K Resource Budget
| Resource | Available | Estimated use | Margin |
|---|---|---|---|
| Logic cells (4-LUT + FF) | 5280 | ~1800 | 66% free |
| EBR (4 Kbit blocks) | 30 (120 Kbit) | 4 (FIFOs) | 26 free |
| SPRAM (32 KB banks) | 4 (128 KB) | 1 bank for ring buffer | 3 free |
| PLL | 1 | 1 (for exi domain) | 0 free |
| SB_HFOSC | 1 | 1 (sync domain) | 0 free |
| I/O pins | 39 usable | ~14 (EXI:5 + W5500:6 + misc:3) | 25 free |
**Logic cell breakdown:**
| Module | Estimated cells |
|---|---|
| SPIMode3Slave | 90 |
| BBARegisterFile FSM + decode | 250 |
| Register file (512 × 8b) | ~200 (distributed RAM) |
| AsyncFIFO × 8 | 400 |
| PulseSynchronizer × 4 | 40 |
| FFSynchronizer × 5 | 30 |
| SPRAMArbiter | 80 |
| RXFrameAssembler | 200 |
| TXFrameDrain | 150 |
| W5500SPIMaster | 200 |
| EEPROMModel | 100 |
| Misc glue | 60 |
| **Total** | **~1800** |
iCE40UP5K fmax with nextpnr: typically 60–80 MHz for logic of this complexity.
The `exi` domain at 96 MHz is the tightest. If nextpnr fails to close timing:
1. First option: reduce to 64 MHz `exi` domain (icepll alternative).
2. Second option: reduce EXI bus speed in Swiss settings to 16 MHz (clock index
4 instead of 5), halving the FPGA timing requirement.
3. Third option: add pipeline registers on the critical address decode path.
---
## 20. PCB / Connector Notes
### Interposer PCB
A simple pass-through interposer PCB connects the GC SP1 slot to the iCEbreaker
via a ribbon cable or header.
**Required PCB spec:**
- Thickness: **1.2 mm** (not standard 1.6 mm — critical for fit)
- Copper finish: **ENIG (gold)** — prevents oxidation on edge contacts
- Board material: FR4 standard
**Footprint source:** Copy the edge connector footprint from
`github.com/silverstee1/SP1ETH` KiCad files. Do not design from scratch.
The staggered dual-row geometry requires exact pad positions that have been
physically verified. Cross-reference with the ETH2SP1 LaserBear open files.
**Additional interposer components:**
- 10 kΩ resistor: EXTIN (pin 1) to 3.3V (pin 7) — device detect
- 100 µF capacitor: 3.3V to GND — bulk decoupling near connector
- 100 nF capacitor × 2: additional HF decoupling
- ESD protection diode array: on CLK, MOSI, MISO, CS lines (optional but
recommended — the GC motherboard is difficult to repair if damaged)
**Do not connect pin 5 (12V) to anything on the FPGA side.**
### iCEbreaker connection
The interposer PCB exposes EXI signals on a 2.54 mm pitch 8-pin header.
Connect to iCEbreaker PMOD1 connector using a short ribbon cable. Keep the
cable as short as possible (< 10 cm) to minimize signal integrity issues at
32 MHz.
---
## 21. Known Hardware Quirks
### EXI DMA bug
The GC's EXI DMA engine has a bug where data on the MISO line during a DMA
write is clocked back out with a 1-bit shift. This only affects GC software
doing DMA writes (rare). Swiss uses IMM (immediate) mode transfers. No FPGA
workaround needed.
### SPI Mode 3 vs Mode 0
Every other EXI device (memory cards, RTC, IPL) uses SPI Mode 0. The BBA
is the only device using Mode 3. Do not share the SPI slave implementation
with other EXI device implementations without parameterising CPOL/CPHA.
### MISO tristate
On real hardware, MISO (DO) is tristated when CS is deasserted. Other EXI
devices on the same bus would otherwise conflict. On this FPGA implementation,
drive MISO high (not tristated) when CS is deasserted. The iCE40UP5K I/O cells
do provide an output enable, so tristating remains possible if it turns out to
be needed, but driving high is simpler and is considered safe here because
the BBA occupies a dedicated CS line (SP1 device 2) separate from memory cards
and the RTC.
### GC hardware revisions
- DOL-001 (original): SP1 present, BBA compatible
- DOL-001 Rev B: SP1 physically absent on motherboard but case hole present
- DOL-101 (later): SP1 present again (but Serial Port 2 absent)
- Panasonic Q: SP1 present
Swiss supports all revisions with SP1 via the EXI hypervisor driver (required
from Swiss build 1788 onwards for BBA emulation features).
### EXI clock index
The real BBA uses clock index 5 (32 MHz). Swiss allows configuring a lower
clock index for compatibility. If 96 MHz fmax is not achievable, instruct users
to configure Swiss to use clock index 4 (16 MHz EXI), which at the same 3×
oversampling ratio requires only a 48 MHz `exi` domain and is readily achievable.
---
## 22. File Structure
```
gc_bba_fpga/
├── exi_bba/
│ ├── __init__.py
│ ├── spi_mode3_slave.py # SPIMode3Slave
│ ├── bba_register_file.py # BBARegisterFile + register constants
│ ├── spram_arbiter.py # SPRAMArbiter
│ ├── rx_frame_assembler.py # RXFrameAssembler
│ ├── tx_frame_drain.py # TXFrameDrain
│ ├── w5500_spi_master.py # W5500SPIMaster
│ ├── eeprom_model.py # EEPROMModel (93C46)
│ └── bba_top.py # BBATop + clock domain setup
├── sim/
│ ├── sim_spi_slave.py # SPIMode3Slave unit test
│ ├── sim_register_file.py # BBARegisterFile unit test
│ ├── sim_bba_init.py # Full init sequence simulation
│ ├── sim_rx_path.py # RX data path end-to-end test
│ ├── sim_tx_path.py # TX data path end-to-end test
│ ├── gc_master_model.py # GC CPU SPI master simulation model
│ ├── w5500_slave_model.py # W5500 SPI slave simulation model
│ └── ethernet_frame_gen.py # Test frame generator
├── platform/
│ ├── icebreaker_bba.py # iCEbreaker platform with BBA resources
│ └── interposer_pinmap.py # SP1 ↔ PMOD pin mapping
├── pcb/
│ ├── interposer/ # KiCad project for interposer PCB
│ └── README.md # PCB ordering instructions (1.2mm, ENIG)
├── constraints/
│ └── timing.py # nextpnr timing constraints (if needed)
├── tests/
│ └── test_bba.py # pytest suite
├── build.py # Amaranth build script
└── README.md
```
---
## 23. Simulation Strategy
Each module should have a standalone simulation before integration. All
simulations use Amaranth's `Simulator` with two clock domains:
`sim.add_clock(1/96e6, domain="exi")` and `sim.add_clock(1/48e6, domain="sync")`.
### Unit tests
**SPIMode3Slave:** Drive CLK/MOSI/CS manually from a process in the `exi`
domain. Verify `rx_byte`/`rx_valid` match sent data. Verify `spi_miso`
matches pre-loaded `tx_byte`. Test CS abort mid-byte.
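A pure-Python waveform generator is a convenient starting point for the stimulus process and doubles as a reference for the Mode 3 edge convention. The sketch below assumes MSB-first bit order and uses hypothetical names (`mode3_waveform`, `sample_mode3`); it models the clock idling high, MOSI changing on the falling edge, and sampling on the rising edge.

```python
def mode3_waveform(data):
    """Yield (cs_n, clk, mosi) samples for one transaction, MSB first."""
    yield (1, 1, 0)                 # idle: CS high, CLK idles high (CPOL=1)
    yield (0, 1, 0)                 # CS asserted, clock still high
    for byte in data:
        for bit in range(7, -1, -1):
            mosi = (byte >> bit) & 1
            yield (0, 0, mosi)      # falling edge: master changes MOSI
            yield (0, 1, mosi)      # rising edge: slave samples MOSI
    yield (1, 1, 0)                 # CS deasserted


def sample_mode3(samples):
    """Recover bytes by sampling MOSI on rising CLK edges while CS is low."""
    received = bytearray()
    shift = count = 0
    prev_clk = 1
    for cs_n, clk, mosi in samples:
        if cs_n:
            shift = count = 0       # CS abort discards a partial byte
        elif clk and not prev_clk:
            shift = ((shift << 1) | mosi) & 0xFF
            count += 1
            if count == 8:
                received.append(shift)
                shift = count = 0
        prev_clk = clk
    return bytes(received)
```

Truncating the waveform partway through a byte and then raising CS gives a direct stimulus for the "CS abort mid-byte" case.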
**BBARegisterFile:** Use a `GCMasterModel` (SPI Mode 3 master process) to
perform read/write transactions. Verify register writes are stored. Verify
register reads return correct values. Verify IR bit setting and clearing.
Verify NWAYS returns 0x17. Verify ID query returns 0x04020200.
**SPRAMArbiter:** Issue concurrent EXI reads and ETH writes. Verify ETH writes
win arbitration. Verify EXI reads complete within 3 sync cycles. Verify no
data corruption.
**RXFrameAssembler:** Feed a known ethernet frame byte-by-byte. Verify SPRAM
contents match expected descriptor + frame layout. Verify RWP advances by
correct page count. Verify rx_irq fires.
**TXFrameDrain + W5500SPIMaster:** Issue TX frame from `tx_bytes` FIFO. Use
`W5500SlaveModel` process to simulate W5500 responses. Verify frame bytes
arrive at W5500 correctly. Verify tx_irq fires after SEND_OK.
### Integration test
**sim_bba_init.py:** Full GC init sequence (all 11 steps from Section 11).
`GCMasterModel` performs every transaction. Verify no stalls, correct responses.
**sim_rx_path.py:** `W5500SlaveModel` delivers a 64-byte test frame.
`GCMasterModel` polls IR, reads RWP, bulk-reads the frame, advances RRP.
Verify GC receives identical bytes to what W5500 sent.
**sim_tx_path.py:** `GCMasterModel` writes a 64-byte frame through TXDATA.
`W5500SlaveModel` captures it. Verify W5500 receives identical bytes.
---
## 24. Open Issues and Extension Points
### Must resolve before first synthesis
- [ ] Exact PLL parameters for iCE40UP5K: run `icepll -i 12 -o 96` and
confirm the output is achievable (VCO in 533–1066 MHz range).
- [ ] SP1 connector footprint: clone SP1ETH repo, extract pad positions, verify
stagger geometry and pitch before PCB layout.
- [ ] W5500 Pmod module pin mapping: confirm which Pmod pins INT_N and RST_N
appear on (varies by module vendor).
- [ ] Swiss version requirement: confirm Swiss build ≥ 1788 for BBA hypervisor
support. Earlier builds use a different driver that may have different
register access patterns.
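For the PLL item above, the divider search can be approximated in a few lines as a sanity check before running `icepll`. This is a sketch under assumptions: it uses the simple-feedback relation F_out = F_in × (DIVF + 1) / ((DIVR + 1) × 2^DIVQ), a VCO range of 533–1066 MHz, and a phase-detector input range of 10–133 MHz, as I understand them from the Lattice PLL documentation. `icepll` remains the authoritative tool, and `find_pll` is a hypothetical name.

```python
def find_pll(f_in_mhz, f_out_mhz):
    """Return (error, divr, divf, divq, f_vco) with the smallest error."""
    best = None
    for divr in range(16):
        f_pfd = f_in_mhz / (divr + 1)
        if not 10 <= f_pfd <= 133:          # phase detector input range
            continue
        for divf in range(128):
            f_vco = f_pfd * (divf + 1)
            if not 533 <= f_vco <= 1066:    # VCO range
                continue
            for divq in range(1, 7):
                f_out = f_vco / (1 << divq)
                err = abs(f_out - f_out_mhz)
                if best is None or err < best[0]:
                    best = (err, divr, divf, divq, f_vco)
    return best


err, divr, divf, divq, f_vco = find_pll(12, 96)
```

For a 12 MHz reference and a 96 MHz target this search lands on DIVR=0, DIVF=63, DIVQ=3 with a 768 MHz VCO, which sits inside the allowed range.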
### Known limitations
- Single TX buffer (MX98730EC has two). ST1:ST0 = 01 and 10 are treated
identically. No known GC title relies on dual TX buffering.
- No DMA mode support. IMM mode only. Matches real-world Swiss usage.
- No Serial Port 2 support (different connector, different project scope).
- 93C46 EEPROM emulation is simplified (hardcoded MAC). A full bit-bang
model can be added later if Swiss requires it.
- RX ring buffer is 15 pages (3840 bytes). The real BBA has 4KB. Frames
larger than ~3800 bytes (jumbo frames) will be dropped. Standard 1500-byte
MTU frames fit in at most 7 pages — no practical issue.
### Extension points
- **Larger ring buffer:** Use additional SPRAM banks for more RX buffering.
- **Multiple sockets:** W5500 supports 8 sockets; only socket 0 in MACRAW
mode is used here.
- **Link status passthrough:** Read W5500 PHYCFGR register and forward real
link status to NWAYS instead of hardcoding 0x17.
- **Statistics counters:** LTPS/LRPS (last packet status) are currently 0x00.
A more complete implementation would fill these from W5500 socket status.
- **Serial Port 2 support:** Different physical connector and EXI channel but
same FPGA logic; would require a second interposer PCB.