# GameCube BBA FPGA Replacement — Design Document

**Target hardware:** iCEbreaker (Lattice iCE40UP5K)

**Target language:** Amaranth HDL (Python)

**Toolchain:** Yosys + nextpnr-ice40 + IceStorm

**Purpose:** Replace the Nintendo GameCube Broadband Adapter (DOL-015) with an
FPGA-based implementation that exposes a W5500 100BASE-TX ethernet chip to the
GC over the EXI (Expansion Interface) serial bus, enabling game ISO streaming via
Swiss homebrew.

---

## Table of Contents

1. [System Overview](#1-system-overview)
2. [Protocol References](#2-protocol-references)
3. [Physical Interface — SP1 Edge Connector](#3-physical-interface--sp1-edge-connector)
4. [Clock Domains](#4-clock-domains)
5. [Clock Domain Crossing Strategy](#5-clock-domain-crossing-strategy)
6. [Module Hierarchy](#6-module-hierarchy)
7. [Module Specifications](#7-module-specifications)
   - 7.1 [SPIMode3Slave](#71-spimode3slave)
   - 7.2 [BBARegisterFile](#72-bbaregisterfile)
   - 7.3 [SPRAMArbiter](#73-spramarbiter)
   - 7.4 [RXFrameAssembler](#74-rxframeassembler)
   - 7.5 [TXFrameDrain](#75-txframedrain)
   - 7.6 [W5500SPIMaster](#76-w5500spimaster)
   - 7.7 [EEPROMModel](#77-eeprommodel)
   - 7.8 [BBATop](#78-bbatop)
8. [Memory Map](#8-memory-map)
9. [EXI Transaction Protocol](#9-exi-transaction-protocol)
10. [BBA Register Reference](#10-bba-register-reference)
11. [Initialisation Sequence](#11-initialisation-sequence)
12. [RX Data Path — Detailed Flow](#12-rx-data-path--detailed-flow)
13. [TX Data Path — Detailed Flow](#13-tx-data-path--detailed-flow)
14. [SPRAM Layout](#14-spram-layout)
15. [Critical Timing Constraints](#15-critical-timing-constraints)
16. [SPRAM Read Prefetch Pipeline](#16-spram-read-prefetch-pipeline)
17. [Interrupt Handling](#17-interrupt-handling)
18. [EEPROM / MAC Address](#18-eeprom--mac-address)
19. [iCE40UP5K Resource Budget](#19-ice40up5k-resource-budget)
20. [PCB / Connector Notes](#20-pcb--connector-notes)
21. [Known Hardware Quirks](#21-known-hardware-quirks)
22. [File Structure](#22-file-structure)
23. [Simulation Strategy](#23-simulation-strategy)
24. [Open Issues and Extension Points](#24-open-issues-and-extension-points)

---

## 1. System Overview

The GameCube Broadband Adapter (BBA) is a hardware peripheral that plugs into
Serial Port 1 (SP1) on the underside of the GameCube. It presents a network
interface to the GC CPU using a Macronix MX98730EC custom IC. GC software
(primarily Swiss homebrew) communicates with the BBA through a memory-mapped
register interface accessed over the EXI serial bus.

This project replaces the MX98730EC with an iCEbreaker FPGA that emulates its
register interface and connects it to a W5500 ethernet chip (on a
Pmod-compatible module) for the actual network communication.

### High-level data flow

```
GameCube CPU
    │  EXI (SPI Mode 3, 32 MHz, Serial Port 1)
    ▼
iCEbreaker FPGA
    ├── exi domain (96 MHz):  SPI slave, register file, prefetch pipeline
    └── sync domain (48 MHz): SPRAM arbiter, RX assembler, TX drain, W5500 driver
    │  SPI (Mode 0, 24 MHz)
    ▼
W5500 Pmod module (100BASE-TX ethernet)
    │  RJ-45
    ▼
Network
```

### What this design does NOT implement

- A network stack. The GC CPU runs TCP/IP; the FPGA is a dumb MAC bridge.
- IP address awareness. The FPGA never parses ethernet frame payloads.
- The GC's DMA engine quirk (only relevant to GC-side software).
- Video/audio streaming logic (handled by Swiss on the GC CPU side).

---

## 2. Protocol References

| Source | Content |
|---|---|
| YAGCD §2.4.1.4 | SP1 (P6) connector pinout |
| YAGCD §5.9 | EXI bus register descriptions |
| YAGCD §10.8 | MX98730EC (BBA chip) register map |
| Dolphin source `EXI_DeviceEthernet.h` | Register offsets, init sequence, RX/TX flow |
| Dolphin source `EXI_DeviceEthernet.cpp` | Transaction encoding, interrupt logic |
| Swiss source `bba.c` | GC-side driver, exact register access patterns |
| MX98730EC datasheet | Unavailable publicly; YAGCD is the primary reference |
| W5500 datasheet | SPI interface, register map, socket model |
| iCE40UP5K datasheet | SPRAM timing, PLL parameters, I/O standards |

**Critical implementation note:** The MX98730EC uses **SPI Mode 3** (CPOL=1,
CPHA=1). CLK idles HIGH. Data is sampled on the FALLING edge of CLK and set up
on the RISING edge. This is the opposite of memory cards and the RTC chip, which
use SPI Mode 0. Getting this wrong means the GC will never enumerate the device.

Note that the label and the edge description disagree in standard SPI
terminology: a clock that idles high with data sampled on the falling edge and
launched on the rising edge is Mode 2 (CPOL=1, CPHA=0), whereas Mode 3
(CPOL=1, CPHA=1) samples on the rising edge. The edge description is what
`SPIMode3Slave` implements; confirm the real sampling edge with a logic-analyzer
capture of a genuine BBA transaction before finalising the slave.

---

## 3. Physical Interface — SP1 Edge Connector

### Slot characteristics

- Dual-sided PCB edge connector
- Contacts on both top and bottom faces of the PCB edge
- Top and bottom contact rows are **staggered** (offset by half a pitch), not
  mirrored, similar to ISA/PCI card edge geometry
- PCB must be ordered at **1.2 mm thickness** with **ENIG (gold) finish**
- Keying notch at the top-right corner of the housing (looking into the console
  socket with the front of the console facing right)

### Connector footprint

Exact pad positions and pitch must be taken from the SP1ETH KiCad project
(github.com/silverstee1/SP1ETH). Do not attempt to derive dimensions from YAGCD
alone: it lists the signals but not the physical geometry. Cross-reference
against the ETH2SP1 (LaserBear) open model files as a second source.

Key parameters to verify from those files before PCB layout:

- Contact pitch (expected: 2.0 mm or 2.54 mm; measure from the KiCad file)
- Stagger offset between top and bottom rows
- Total contact count per side (expected: 6 per side = 12 total, or 12 per side
  = 24 total with duplicated power/ground)
- Insertion depth from board edge to first contact
- Board width at connector edge

### Signal pinout (YAGCD §2.4.1.4)

Pin numbering: looking into the console socket with the front of the console to
the right, pin 1 is on the left. On the adapter PCB (component side up,
inserting down), pin 1 is also on the left; the numbering does not mirror.

| Pin | Signal | Direction | Notes |
|---|---|---|---|
| 1 | EXTIN | Adapter → GC | Device detect/sense. Tie to 3.3 V via a 10 kΩ resistor. Without this the GC does not enumerate the device. |
| 2 | GND | — | Shield ground |
| 3 | INT | Adapter → GC | Active-low interrupt to GC CPU. Assert when `IR & IMR != 0`. |
| 4 | CLK | GC → Adapter | SPI clock, up to 32 MHz, idles HIGH (Mode 3) |
| 5 | 12V | — | 12 V supply from GC. **Do not connect to FPGA I/O.** Leave unconnected or route to a test point only. |
| 6 | DO (MISO) | Adapter → GC | Serial data out: adapter drives, GC samples |
| 7 | 3.3V | — | 3.3 V supply (~200 mA available combined with pin 8) |
| 8 | 3.3V | — | 3.3 V supply (parallel with pin 7) |
| 9 | DI (MOSI) | GC → Adapter | Serial data in: GC drives, adapter samples |
| 10 | CS | GC → Adapter | Chip select, active low. Delineates each transaction. |
| 11 | GND | — | Signal ground |
| 12 | GND | — | Signal ground |

**Power budget:** Pins 7 and 8 together supply 3.3 V. The iCEbreaker draws
~80 mA when active and the W5500 ~150 mA at peak, ~230 mA in total. The SP1
3.3 V rail was sized for the original BBA, which drew ~200 mA, so headroom is
tight and may be negative at peak. Add a 100 µF bulk capacitor on the interposer
PCB close to the FPGA power pins.

**Voltage levels:** All EXI signals are 3.3 V logic. The iCEbreaker I/O is 3.3 V.
The W5500 is 3.3 V. No level shifting is required anywhere in this design.

---

## 4. Clock Domains

The design uses two clock domains. The iCE40UP5K has one PLL and one internal
48 MHz oscillator (SB_HFOSC).

### Domain table

| Domain | Frequency | Source | Purpose |
|---|---|---|---|
| `exi` | 96 MHz | PLL (12 MHz × 64 / 8, see PLL configuration below) | SPI Mode 3 slave, BBA register file, prefetch pipeline |
| `sync` | 48 MHz | SB_HFOSC internal oscillator | SPRAM arbiter, RX/TX ethernet engines, W5500 SPI master |

### Rationale

**Why 96 MHz for `exi`?**

The EXI bus runs at 32 MHz: a 31.25 ns period, or 15.625 ns per CLK
half-period. The SPI Mode 3 slave treats CLK as an ordinary input, synchronizes
it and detects its edges in the `exi` domain, so `exi` must run well above the
bus rate. The original plan of 2× (64 MHz) gives only one `exi` tick per EXI
half-period, not two, which leaves no spare cycle between the sample and drive
phases; in addition, exactly 64 MHz cannot be produced by a valid PLL
configuration (see PLL configuration below). At 96 MHz (3×) there are three
ticks per EXI clock period, or 1.5 per half-period. Even so, the MISO output
path must be checked explicitly: the synchronizer stages and the MISO output
register each add an `exi` tick between a CLK edge at the pin and the resulting
MISO change, and that delay has to fit before the GC's next sampling edge.

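The latency budget described above can be put into numbers with a few lines of Python. This is a rough sketch, not a timing analysis: it assumes up to one tick of sampling uncertainty, a 2-stage `FFSynchronizer`, a combinational edge detector on the synchronized CLK (as in `SPIMode3Slave` step 2) and one register on the MISO output, and it ignores pad and board delays, so real margin is smaller. The function name `miso_budget` is hypothetical.

```python
EXI_CLK_HZ = 32e6  # EXI bus clock used by the BBA


def miso_budget(exi_domain_hz, sync_stages=2, output_regs=1):
    """Rough worst-case delay from an EXI CLK edge at the pin to MISO changing.

    Assumes up to 1 tick of sampling uncertainty, `sync_stages` synchronizer
    flip-flops, a combinational edge detector on the synchronized CLK, and
    `output_regs` registers on the MISO path. Pad/board delays are ignored.
    """
    tick_ns = 1e9 / exi_domain_hz
    half_period_ns = 1e9 / EXI_CLK_HZ / 2
    worst_ticks = 1 + sync_stages + output_regs
    return {
        "tick_ns": tick_ns,
        "ticks_per_half_period": half_period_ns / tick_ns,
        "worst_latency_ns": worst_ticks * tick_ns,
        "half_period_ns": half_period_ns,
    }


for f in (64e6, 96e6):
    b = miso_budget(f)
    print(f"{f / 1e6:.0f} MHz: {b['ticks_per_half_period']:.2f} ticks per half-period, "
          f"worst MISO latency {b['worst_latency_ns']:.1f} ns "
          f"vs half-period {b['half_period_ns']:.3f} ns")
```

Under these assumptions the worst-case latency is 62.5 ns at 64 MHz and about 41.7 ns at 96 MHz, both longer than a 15.625 ns half-period, which is why this path deserves early simulation and a logic-analyzer check on real hardware.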
**Why 48 MHz for `sync`?**

The iCE40UP5K's internal 48 MHz oscillator (SB_HFOSC) is available without
consuming the PLL, which leaves the single PLL free for the `exi` domain. The
W5500 SPI interface is specified up to 80 MHz, but this design drives it at
24 MHz (48 MHz ÷ 2 via a clock enable), which is well within spec and requires
no additional PLL output.

### PLL configuration (iCE40UP5K)

In SIMPLE feedback mode the iCE40UP5K PLL obeys:

```
F_pfd = F_in / (DIVR + 1)        must be 10–133 MHz
F_vco = F_pfd × (DIVF + 1)       must be 533–1066 MHz
F_out = F_vco / 2^DIVQ
```

With the iCEbreaker's 12 MHz input, DIVR must be 0 (any larger divider pushes
F_pfd below 10 MHz), so F_pfd = 12 MHz:

```
Target 64 MHz:
  DIVQ = 3 → needs F_vco = 512 MHz   (below the 533 MHz minimum: invalid)
  DIVQ = 4 → needs F_vco = 1024 MHz  → closest is DIVF = 84: 12 × 85 = 1020 MHz
             F_out = 1020 / 16 = 63.75 MHz
  64 MHz exactly is not reachable; 63.75 MHz is the closest valid output.

Target 96 MHz:
  DIVR = 0, DIVF = 63, DIVQ = 3
  F_vco = 12 × 64 = 768 MHz (within range) → F_out = 768 / 8 = 96 MHz exactly

Recommended: use 96 MHz (DIVR=0, DIVF=63, DIVQ=3) for the exi domain.
At 96 MHz there are 3 ticks per 32 MHz EXI clock period (1.5 per half-period).
Adjust SPIMode3Slave edge detection accordingly.
```

**Implementation note:** Verify the exact PLL parameters with the `icepll` tool:

```bash
icepll -i 12 -o 64   # reports the closest achievable output and its dividers
icepll -i 12 -o 96   # alternative
```

The agent implementing this should run `icepll`, use the parameters it
reports, and then adjust the `SPIMode3Slave` tick counts accordingly.

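The divider search that `icepll` performs can be approximated in a few lines of Python, which makes the constraints above easy to experiment with. This is a sketch, not `icepll`'s actual code: it assumes SIMPLE feedback, the PFD and VCO ranges listed above, a 16–275 MHz output range, DIVQ limited to 1–6, and it keeps the first combination with the smallest error. The function name `find_pll` is hypothetical.

```python
def find_pll(f_in_mhz, f_target_mhz):
    """Exhaustive search over iCE40 PLL dividers in SIMPLE feedback mode.

    Assumed constraints: F_pfd 10-133 MHz, F_vco 533-1066 MHz,
    F_out 16-275 MHz, DIVR 0-15, DIVF 0-127, DIVQ 1-6.
    """
    best = None
    for divr in range(16):
        f_pfd = f_in_mhz / (divr + 1)
        if not 10 <= f_pfd <= 133:
            continue
        for divf in range(128):
            f_vco = f_pfd * (divf + 1)
            if not 533 <= f_vco <= 1066:
                continue
            for divq in range(1, 7):
                f_out = f_vco / 2 ** divq
                if not 16 <= f_out <= 275:
                    continue
                err = abs(f_out - f_target_mhz)
                if best is None or err < best["error_mhz"]:
                    best = {"divr": divr, "divf": divf, "divq": divq,
                            "f_vco": f_vco, "f_out": f_out, "error_mhz": err}
    return best


for target in (64, 96):
    print(target, find_pll(12, target))
```

For a 12 MHz input this returns DIVR=0, DIVF=84, DIVQ=4 (63.75 MHz) for a 64 MHz target and DIVR=0, DIVF=63, DIVQ=3 (exactly 96 MHz) for a 96 MHz target, consistent with the 96 MHz recommendation above; `icepll` remains the authority.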
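### Reset strategy

Each domain has its own reset, deasserted synchronously using
`ResetSynchronizer` from `amaranth.lib.cdc`:

```python
from amaranth import ResetSignal
from amaranth.lib.cdc import ResetSynchronizer

# Inside elaborate() or the platform's create_missing_domain("exi"):
# the exi reset asserts asynchronously with the sync reset and is
# released synchronously to the exi clock.
m.submodules.exi_rst = ResetSynchronizer(
    arst=ResetSignal("sync"),
    domain="exi",
)
```

The `sync` domain reset comes from the platform's power-on reset logic. Note
that the stock Amaranth `iCEBreakerPlatform` uses the 12 MHz `clk12` input as
its default clock; to clock `sync` from SB_HFOSC at 48 MHz, the platform must be
configured with `default_clk = "SB_HFOSC"` (and `hfosc_div = 0`). Verify these
attribute names against the Amaranth version in use.

---
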
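## 5. Clock Domain Crossing Strategy

All signals crossing between the `exi` and `sync` domains must use one of the
following CDC primitives (`FFSynchronizer`, `PulseSynchronizer` and
`ResetSynchronizer` from `amaranth.lib.cdc`; `AsyncFIFO` from
`amaranth.lib.fifo`). Never pass a raw multi-bit signal directly between
domains: if several bits change in the same source cycle, the destination can
capture a mix of old and new bits. Multi-bit values must either go through an
`AsyncFIFO` or be encoded so that only one bit changes per update.
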
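Amaranth's `AsyncFIFO` satisfies the one-bit rule internally by Gray-coding its read and write pointers before synchronizing them across domains. The standalone sketch below is plain Python (not Amaranth's implementation) and shows the property that makes this safe: consecutive Gray codes, including the wrap-around from the last value back to zero, differ in exactly one bit. The helper names are hypothetical.

```python
def bin_to_gray(n: int) -> int:
    """Binary to reflected Gray code."""
    return n ^ (n >> 1)


def gray_to_bin(g: int) -> int:
    """Gray code back to binary (XOR of all right-shifts of g)."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def check_single_bit_steps(width: int) -> bool:
    """True if every increment (with wrap-around) flips exactly one Gray bit."""
    size = 1 << width
    for i in range(size):
        a = bin_to_gray(i)
        b = bin_to_gray((i + 1) % size)
        if bin(a ^ b).count("1") != 1:
            return False
    return True


print(check_single_bit_steps(5))
```

By contrast, the plain binary step 3 → 4 (0b011 → 0b100) flips three bits at once, so a destination sampling mid-transition could see any of eight values.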
### CDC primitive selection guide

| Signal type | Primitive | Latency |
|---|---|---|
| Single bit, slow-changing (flags, status) | `FFSynchronizer` | 2 dest clocks |
| Single-cycle pulse / event | `PulseSynchronizer` | ~3–4 dest clocks |
| Multi-bit data stream (packet bytes) | `AsyncFIFO` | ~3–4 dest clocks |
| Reset deassertion | `ResetSynchronizer` | 2 dest clocks |
| Async external pin (CLK, MOSI, CS) | `FFSynchronizer` | 2 dest clocks |

### CDC inventory for this design

| Signal | From | To | Primitive | Notes |
|---|---|---|---|---|
| EXI CLK pin | async | exi | FFSynchronizer | stages=2, reset value 1 (CLK idles high) |
| EXI MOSI pin | async | exi | FFSynchronizer | stages=2 |
| EXI CS pin | async | exi | FFSynchronizer | stages=2, reset value 1 (CS idles high) |
| SPRAM read request (addr) | exi | sync | AsyncFIFO, 16 bits wide, depth 4 | Prefetch pipeline |
| SPRAM read result (data) | sync | exi | AsyncFIFO, 8 bits wide, depth 4 | Prefetch pipeline |
| TX packet bytes | exi | sync | AsyncFIFO, 8 bits wide, depth 64 | GC→ethernet |
| TX packet start/len | exi | sync | AsyncFIFO, 16 bits wide, depth 4 | Frame delimiter |
| RX packet bytes | sync | exi | AsyncFIFO, 8 bits wide, depth 64 | ethernet→GC |
| RWP update (new value) | sync | exi | AsyncFIFO, 8 bits wide, depth 4 | After frame committed |
| RRP update (new value) | exi | sync | AsyncFIFO, 8 bits wide, depth 4 | After GC advances pointer |
| IR[RI] set (RX ready) | sync | exi | PulseSynchronizer | Triggers RI interrupt |
| IR[TI] set (TX done) | sync | exi | PulseSynchronizer | Triggers TI interrupt |
| NCRA reset pulse | exi | sync | PulseSynchronizer | Resets ethernet engine |
| exi_int_n output | exi | physical pin | Direct (output register) | Active-low to GC |

**Critical rule:** The register file lives entirely in the `exi` domain. The
`sync` domain never directly reads or writes EXI registers. All interaction
between the two domains goes through the AsyncFIFOs and PulseSynchronizers
listed above. This ensures the GC's register reads are always answered within
the `exi` domain without waiting on CDC latency.

---

## 6. Module Hierarchy

```
BBATop                                 (top-level, sets up clock domains)
├── SPIMode3Slave                      (exi domain — bit engine)
├── BBARegisterFile                    (exi domain — register decode + response)
│   ├── [AsyncFIFO: spram_req]         (exi→sync: read address requests)
│   ├── [AsyncFIFO: spram_rsp]         (sync→exi: read data responses)
│   ├── [AsyncFIFO: tx_bytes]          (exi→sync: TX packet data)
│   ├── [AsyncFIFO: tx_ctrl]           (exi→sync: TX frame length)
│   ├── [AsyncFIFO: rx_wptr]           (sync→exi: RWP updates)
│   ├── [AsyncFIFO: rx_rptr]           (exi→sync: RRP updates from GC)
│   ├── [PulseSynchronizer: rx_irq]    (sync→exi)
│   ├── [PulseSynchronizer: tx_irq]    (sync→exi)
│   └── [PulseSynchronizer: ncra_rst]  (exi→sync)
├── SPRAMArbiter                       (sync domain — owns all SPRAM)
├── RXFrameAssembler                   (sync domain — ethernet→SPRAM)
├── TXFrameDrain                       (sync domain — SPRAM→ethernet)
├── W5500SPIMaster                     (sync domain — SPI master to W5500)
└── EEPROMModel                        (exi domain — 93C46 bit-bang model)
```

---

## 7. Module Specifications

### 7.1 SPIMode3Slave

**Domain:** `exi`

**File:** `exi_bba/spi_mode3_slave.py`

Implements a byte-oriented SPI Mode 3 slave. It handles CLK/MOSI/MISO/CS at the
bit level and presents a clean byte interface to `BBARegisterFile`.

**SPI Mode 3 timing recap (as assumed by this design):**

- CLK idles HIGH
- MOSI is set up by the master before the FALLING edge
- The slave samples MOSI on the FALLING edge of CLK
- The slave drives MISO on the RISING edge of CLK (ready for the master to
  sample on the next falling edge)

**Port list:**

| Port | Width | Dir | Domain | Description |
|---|---|---|---|---|
| `spi_clk` | 1 | in | async→exi | Raw SPI clock from GC, synchronized internally |
| `spi_mosi` | 1 | in | async→exi | Raw MOSI from GC, synchronized internally |
| `spi_miso` | 1 | out | exi | MISO output to GC |
| `spi_cs_n` | 1 | in | async→exi | Raw CS from GC (active low), synchronized internally |
| `rx_byte` | 8 | out | exi | Last complete received byte |
| `rx_valid` | 1 | out | exi | Pulses for 1 cycle when `rx_byte` contains a new byte |
| `tx_byte` | 8 | in | exi | Byte to transmit; sampled when `tx_load` pulses |
| `tx_load` | 1 | out | exi | Requests the next TX byte from upstream |

**Internal behaviour:**

1. Instantiate an `FFSynchronizer` (stages=2) on each of `spi_clk`, `spi_mosi`
   and `spi_cs_n`. Reset values: `spi_clk` = 1, `spi_cs_n` = 1.
2. Register the synchronized signals for one further cycle to form edge
   detectors: `rising_clk = clk_s & ~clk_prev`, `falling_clk = ~clk_s & clk_prev`.
3. On the CS falling edge: load `tx_byte` into the internal shift register, pulse
   `tx_load`, reset `bit_ctr` to 0, and drive the MSB onto `spi_miso` before the
   first falling CLK edge (the master samples the first bit there).
4. On a FALLING CLK edge (sample): shift `mosi_s` into `rx_shift` MSB-first and
   increment `bit_ctr`. When `bit_ctr == 8`: register `rx_shift` into `rx_byte`,
   pulse `rx_valid`, reset `bit_ctr` to 0, and pulse `tx_load` to request the
   next byte.
5. On a RISING CLK edge (drive): shift `tx_shift` left by 1 and drive the new
   MSB onto `spi_miso`.
6. On the CS rising edge: drive `spi_miso` high (idle) and reset state.

**Note on `tx_load` timing:** `tx_load` pulses at two points: at CS assertion
(to load the first byte before any bits are clocked) and after each complete
received byte (to load the next byte). The upstream (`BBARegisterFile`) must
register the next TX byte within one `exi` clock of `tx_load` pulsing.

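A behavioural Python model of steps 2 and 4 (falling-edge detection on the synchronized CLK and MSB-first shift-in) is handy as a golden reference when simulating the RTL. It is a sketch: it consumes already-synchronized `(cs_n, clk, mosi)` samples, one per `exi` tick, follows this document's falling-edge sampling assumption, and leaves out the MISO side. `master_waveform` is a hypothetical stimulus generator, not part of the design.

```python
def master_waveform(data, ticks_per_half=2):
    """Hypothetical GC-side stimulus: (cs_n, clk, mosi) samples, one per exi tick."""
    samples = [(1, 1, 1)] * 2                          # idle: CS high, CLK high
    for byte in data:
        for i in range(7, -1, -1):
            bit = (byte >> i) & 1
            samples += [(0, 1, bit)] * ticks_per_half  # CLK high: master sets up MOSI
            samples += [(0, 0, bit)] * ticks_per_half  # CLK low: slave samples at this edge
    samples += [(0, 1, 1)] * ticks_per_half            # CLK returns high
    samples += [(1, 1, 1)] * 2                         # CS released
    return samples


def slave_receive(samples):
    """Model of steps 2 and 4: detect falling CLK edges, shift MOSI in MSB-first."""
    received = []
    clk_prev = 1      # matches the synchronizer reset value (CLK idles high)
    rx_shift = 0
    bit_ctr = 0
    for cs_n, clk, mosi in samples:
        falling_clk = clk == 0 and clk_prev == 1
        if cs_n:                      # CS high: hold the byte engine in reset
            rx_shift, bit_ctr = 0, 0
        elif falling_clk:
            rx_shift = ((rx_shift << 1) | mosi) & 0xFF
            bit_ctr += 1
            if bit_ctr == 8:          # rx_valid would pulse here
                received.append(rx_shift)
                bit_ctr = 0
        clk_prev = clk
    return received


print([hex(b) for b in slave_receive(master_waveform([0x81, 0x20]))])
```

Because CS high resets `bit_ctr`, a transaction aborted mid-byte leaves no partial byte behind, matching the CS-deassertion behaviour in step 6.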
---

### 7.2 BBARegisterFile

**Domain:** `exi` (with AsyncFIFO interfaces to `sync`)

**File:** `exi_bba/bba_register_file.py`

Decodes EXI transactions (2-byte header + N data bytes), reads and writes the
BBA register space, and manages all CDC crossings to the `sync` domain.

#### EXI transaction decoder FSM

States: `HEADER0` → `HEADER1` → `DATA` → (back to `HEADER0`)

**Header format:**

```
Byte 0: [7]   = write flag (1 = write, 0 = read)
        [6:0] = addr[12:6] (upper 7 bits of 13-bit address)

Byte 1: [7:2] = addr[5:0]  (lower 6 bits of 13-bit address)
        [1:0] = xfer_len   (0 = 1 byte, 1 = 2 bytes, 2 = 3 bytes, 3 = 4 bytes)
```

Full address = `{ byte0[6:0], byte1[7:2] }` = 13 bits → range 0x0000–0x1FFF.

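A small Python encoder/decoder for this header is useful for generating test vectors for the FSM. It is a sketch that follows the bit layout above exactly; the function names are hypothetical.

```python
def encode_header(write: bool, addr: int, nbytes: int) -> bytes:
    """Build the 2-byte EXI header for a 13-bit address and a 1-4 byte transfer."""
    if not 0 <= addr <= 0x1FFF:
        raise ValueError("address must fit in 13 bits")
    if not 1 <= nbytes <= 4:
        raise ValueError("transfer length must be 1-4 bytes")
    byte0 = (0x80 if write else 0x00) | (addr >> 6)   # [7] write, [6:0] addr[12:6]
    byte1 = ((addr & 0x3F) << 2) | (nbytes - 1)       # [7:2] addr[5:0], [1:0] xfer_len
    return bytes([byte0, byte1])


def decode_header(byte0: int, byte1: int):
    """Return (write, addr, nbytes) as the HEADER0/HEADER1 states would latch them."""
    write = bool(byte0 & 0x80)
    addr = ((byte0 & 0x7F) << 6) | (byte1 >> 2)
    nbytes = (byte1 & 0x03) + 1                       # xfer_len 0..3 -> 1..4 bytes
    return write, addr, nbytes


print(encode_header(True, 0x48, 1).hex())
```

For example, a single-byte write to TXDATA (0x48) encodes as `0x81 0x20`.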
**`HEADER0` state:** Wait for `rx_valid`. Latch `rx_byte` as `hdr0`.

**`HEADER1` state:** Wait for `rx_valid`. Decode the address and flags. For read
transactions to the ring-buffer region (address ≥ 0x100), immediately issue an
SPRAM prefetch request; for the register-file region (address < 0x100), load
`tx_byte` with the register value. Transition to `DATA`.

**`DATA` state (write path):** For each `rx_valid`, write `rx_byte` to
`regs[addr + byte_ctr]` and apply any side effects (see the register side
effects table). Increment `byte_ctr`. When `byte_ctr` reaches `xfer_len + 1`
(the byte count encoded in the header), go to `HEADER0`.

**`DATA` state (read path):** Drive `tx_byte` from the prefetch result
(addresses ≥ 0x100) or directly from `regs[]` (addresses < 0x100). On each
`tx_load`, advance the read pointer and issue the next prefetch. When `byte_ctr`
reaches `xfer_len + 1`, go to `HEADER0`.

**CS deassertion abort:** In any state, if `cs_n` rises, return to `HEADER0`.

#### Register file storage

Registers 0x00–0xFF (256 registers, the region below the 0x100 ring-buffer
boundary) form the BBA register file. They are not stored in SPRAM, which is
reserved for the packet ring buffer. Note that the iCE40 has no distributed
(LUT) RAM: an `Array` of 256 8-bit `Signal`s synthesises to 2,048 flip-flops
plus a wide read multiplexer, a large share of the UP5K's 5,280 logic cells.
Consider implementing only the registers the GC driver actually touches as
discrete `Signal`s, or backing the rest with block RAM (EBR).

The register file is entirely in the `exi` domain. No CDC is needed to read
or write registers 0x00–0xFF.

#### Register side effects

| Register | Write side effect |
|---|---|
| NCRA (0x00) | If bit 0 (RESET) is written: pulse the `ncra_rst` PulseSynchronizer to the `sync` domain, self-clear bit 0 on the next cycle, and reset the TX/RX pointers in the register file. |
| IR (0x09) | Write-1-to-clear: `IR <= IR & ~written_value` |
| RRP (0x18–0x19) | After the GC writes a new RRP value, push it into the `rx_rptr` AsyncFIFO (exi→sync) so the RX engine knows the GC has consumed those pages. |
| TWD (0x34–0x37) | Bytes written here are the TX frame length field (2 bytes, little-endian). Latch for the TX engine. |
| TXDATA (0x48) | Each byte written goes into the `tx_bytes` AsyncFIFO (exi→sync). The frame length is pushed into `tx_ctrl` when NCRA ST1:ST0 goes non-zero (see §7.5). |

#### Interrupt register update (from sync domain)

- `rx_irq` PulseSynchronizer arriving from `sync`: set `IR[1]` (RI bit)
- `tx_irq` PulseSynchronizer arriving from `sync`: set `IR[2]` (TI bit) and
  clear `NCRA[3:2]` (ST1:ST0, the transmit start bits)

#### Interrupt output

```python
# Registered in the exi domain. Active low: asserted when any unmasked IR bit is set.
m.d.exi += exi_int_n.eq(~(ir & imr).any())
```

Register this in a single flip-flop in the `exi` domain and drive the physical
pin directly from it. No CDC is needed on the output: the GC either polls IR
over EXI (already in the `exi` domain) or samples the interrupt line
asynchronously, and a registered output cannot glitch.

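The interrupt bookkeeping above can be captured in a small Python reference model for register-level tests. It is a sketch under this section's assumptions (RI = IR bit 1, TI = IR bit 2, ST1:ST0 = NCRA[3:2], write-1-to-clear IR); the class and method names are hypothetical.

```python
IR_RI = 1 << 1          # RX interrupt (set by rx_irq)
IR_TI = 1 << 2          # TX interrupt (set by tx_irq)
NCRA_ST_MASK = 0b1100   # ST1:ST0, the transmit start bits


class InterruptModel:
    """Reference model of IR/IMR/NCRA interrupt behaviour in the exi domain."""

    def __init__(self):
        self.ir = 0
        self.imr = 0
        self.ncra = 0

    def rx_irq(self):
        """rx_irq pulse arriving from the sync domain."""
        self.ir |= IR_RI

    def tx_irq(self):
        """tx_irq pulse: set TI and clear the transmit start bits."""
        self.ir |= IR_TI
        self.ncra &= ~NCRA_ST_MASK & 0xFF

    def write_ir(self, value):
        """GC write to IR: write-1-to-clear."""
        self.ir &= ~value & 0xFF

    @property
    def int_n(self):
        """Level of the active-low interrupt pin."""
        return 0 if (self.ir & self.imr) else 1
```

A masked source (here TI with only RI enabled in IMR) sets its IR bit but leaves `int_n` high, which is the behaviour the `~(ir & imr).any()` expression produces.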
#### NWAYS register

Always return `0x17` (link up at 100 Mbps, autonegotiation complete; see the
bit breakdown below). The GC's BBA driver polls NWAYS after reset to confirm
link status before enabling RX. Hardcode this value; do not attempt to forward
real link status from the W5500.

```python
# NWAYS = 0x17, built from the bits this design reports:
NWAYS_LS10   = 1 << 0  # 10BASE-T (also reported)
NWAYS_100TXH = 1 << 1  # 100BASE-TX half duplex (also set in practice)
NWAYS_ANCLPT = 1 << 2  # autonegotiation complete
NWAYS_LS100  = 1 << 4  # 100BASE-TX link up

NWAYS_VALUE = NWAYS_LS100 | NWAYS_ANCLPT | NWAYS_100TXH | NWAYS_LS10
assert NWAYS_VALUE == 0x17
```

---

### 7.3 SPRAMArbiter

**Domain:** `sync`

**File:** `exi_bba/spram_arbiter.py`

Arbitrates access to the iCE40UP5K's 128 KB of SPRAM (four SB_SPRAM256KA
blocks of 16K × 16 bits each; the ring buffer fits in a single block) between
two clients:

- **Client A (EXI read):** Issues read requests from the prefetch pipeline
  (`spram_req` AsyncFIFO). Must service requests fast enough to keep the
  prefetch pipeline full.
- **Client B (ETH write):** `RXFrameAssembler` writes incoming ethernet frames
  into the ring-buffer area.

**Priority:** ETH write wins over EXI read when both request simultaneously.
This is safe because:

1. The GC only reads a ring-buffer page after RWP has advanced past it (i.e.,
   after the ETH engine has finished writing that page).
2. Even if an EXI read is delayed by one SPRAM cycle, the prefetch pipeline
   has enough depth (4 entries) to absorb the stall without the SPI slave
   running out of data.

**SPRAM interface (iCE40UP5K SB_SPRAM256KA):**

```
WREN          : write enable
CHIPSELECT    : always 1
CLOCK         : sync domain clock (48 MHz)
STANDBY       : 0
SLEEP         : 0
POWEROFF      : 1 (active low; 1 keeps the block powered)
ADDRESS[13:0] : word address = byte address >> 1 (SPRAM is 16 bits wide)
DATAIN[15:0]  : write data; place the byte in [7:0] or [15:8]
MASKWREN[3:0] : nibble write enables (0b0011 = lower byte, 0b1100 = upper byte)
DATAOUT[15:0] : read data
```

The SPRAM is 16 bits wide, so byte writes use `MASKWREN`. For an 8-bit write to
byte address `A`: set `ADDRESS = A >> 1`, `MASKWREN = 0b1100 if (A & 1) else
0b0011`, and place the data in the matching byte of `DATAIN` (`[15:8]` for odd
`A`, `[7:0]` for even `A`). Reads return the whole word; select `DATAOUT[15:8]`
or `DATAOUT[7:0]` with the same `A & 1` test.

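The byte-lane mapping can be checked against a small software model of one SPRAM block before writing the arbiter RTL. This sketch models only the addressing and nibble-mask behaviour described above (no timing or power modes), and the class name is hypothetical.

```python
class SPRAMModel:
    """One 16K x 16-bit SPRAM block with per-nibble write masks."""

    WORDS = 16 * 1024

    def __init__(self):
        self.mem = [0] * self.WORDS

    def write_word(self, address, datain, maskwren):
        """Apply a write; MASKWREN bit i enables DATAIN[4*i+3 : 4*i]."""
        word = self.mem[address]
        for i in range(4):
            if maskwren & (1 << i):
                nibble = 0xF << (4 * i)
                word = (word & ~nibble) | (datain & nibble)
        self.mem[address] = word & 0xFFFF

    def write_byte(self, byte_addr, value):
        address = byte_addr >> 1
        if byte_addr & 1:
            self.write_word(address, (value & 0xFF) << 8, 0b1100)  # upper byte lane
        else:
            self.write_word(address, value & 0xFF, 0b0011)         # lower byte lane

    def read_byte(self, byte_addr):
        word = self.mem[byte_addr >> 1]
        return (word >> 8) & 0xFF if byte_addr & 1 else word & 0xFF
```

Writing 0xAA to byte 0x100 and 0xBB to byte 0x101 leaves word 0x80 holding 0xBBAA, and a later write to one lane leaves the other untouched.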
**Read latency:** SPRAM has a 1-cycle synchronous read latency: the data for a
read issued in cycle N is valid on `DATAOUT` in cycle N+1. The arbiter must
account for this by asserting `exi_rsp_valid` one cycle after it issues the
read.

**Port list:**

| Port | Width | Dir | Notes |
|---|---|---|---|
| `exi_req_addr` | 16 | in | From `spram_req` AsyncFIFO (exi→sync) |
| `exi_req_valid` | 1 | in | FIFO `r_rdy` |
| `exi_req_ready` | 1 | out | FIFO `r_en` (pop when serviced) |
| `exi_rsp_data` | 8 | out | To `spram_rsp` AsyncFIFO (sync→exi) |
| `exi_rsp_valid` | 1 | out | FIFO `w_en` |
| `eth_wr_addr` | 16 | in | From `RXFrameAssembler` |
| `eth_wr_data` | 8 | in | Byte to write |
| `eth_wr_valid` | 1 | in | Write request |
| `eth_wr_ready` | 1 | out | Write accepted this cycle |

---

### 7.4 RXFrameAssembler

**Domain:** `sync`

**File:** `exi_bba/rx_frame_assembler.py`

Receives complete ethernet frames from `W5500SPIMaster` and writes them into
the SPRAM ring buffer in the MX98730EC format.

**Ring buffer layout (in SPRAM):**

```
SPRAM address 0x0100–0x0FFF (3840 bytes = 15 × 256-byte pages)
Page 0x01: first usable RX page  (SPRAM 0x0100–0x01FF)
Page 0x0F: last usable RX page   (RHBP default; SPRAM 0x0F00–0x0FFF)
Pages wrap: after 0x0F, next is 0x01 (not 0x00, which is reserved)
```

Each page is 256 bytes, and page `p` starts at SPRAM address `p × 0x100`. A
received frame may span multiple pages.

**Frame descriptor (first 4 bytes of first page):**

```
Byte 0:   LRPS value (Last Received Packet Status; 0x00 or the actual status)
Byte 1:   0x00
Byte 2:   (frame_length + 4)[15:8]  (big-endian; the stored length includes
Byte 3:   (frame_length + 4)[7:0]    the 4 descriptor bytes)
Bytes 4+: raw ethernet frame data (DA, SA, EtherType, payload, FCS)
```

**Flow:**

1. Wait for `W5500SPIMaster` to signal that a frame is available (`rx_sof`
   pulse).
2. Read the frame bytes from the W5500 frame FIFO.
3. Compute how many 256-byte pages are needed, where `frame_length` is the raw
   ethernet frame length (excluding the descriptor):
   `pages_needed = ceil((frame_length + 4) / 256)`
4. Check that the ring has at least `pages_needed` free pages. Because pages run
   1–15 (see step 6), free space is counted modulo 15, not 16. Keeping one page
   unused so that `RWP == RRP` unambiguously means empty gives
   `free_pages = (RRP - RWP - 1) mod 15`. If the frame does not fit, drop it
   and increment a drop counter.
5. Write the 4-byte descriptor at SPRAM address `RWP * 0x100` (page `p` starts
   at `p × 0x100`, so page 0x01 is at 0x0100).
6. Write the frame bytes sequentially, moving to the next page at each 256-byte
   boundary. Page wrap: `next_page = (current_page % 15) + 1` (pages 1–15,
   skipping 0).
7. After the last byte is written, push the new RWP (the page following the
   frame's last page) into the `rx_wptr` AsyncFIFO (sync→exi). The `exi`
   domain updates the RWP register from this FIFO.
8. Pulse the `rx_irq` PulseSynchronizer to the `exi` domain.

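The page arithmetic in steps 3–7 is easy to get wrong in RTL, so a Python reference model is worth keeping alongside it. This sketch assumes the conventions above: pages 1–15, page `p` at SPRAM byte `p * 0x100`, and one page left unused so that `RWP == RRP` means empty. The function names are hypothetical.

```python
import math

FIRST_PAGE, LAST_PAGE = 0x01, 0x0F
NUM_PAGES = LAST_PAGE - FIRST_PAGE + 1   # 15 usable pages
DESCRIPTOR_BYTES = 4


def pages_needed(frame_length):
    """Pages for descriptor + raw frame (step 3)."""
    return math.ceil((frame_length + DESCRIPTOR_BYTES) / 256)


def next_page(page):
    """Step 6 wrap rule: 15 -> 1, never 0."""
    return (page % NUM_PAGES) + 1


def free_pages(rwp, rrp):
    """Step 4: free pages with one page kept unused (RWP == RRP means empty)."""
    return (rrp - rwp - 1) % NUM_PAGES


def plan_frame(frame_length, rwp, rrp):
    """Return (pages used, descriptor address, new RWP), or None to drop the frame."""
    n = pages_needed(frame_length)
    if n > free_pages(rwp, rrp):
        return None
    pages = [rwp]
    for _ in range(n - 1):
        pages.append(next_page(pages[-1]))
    return pages, rwp * 0x100, next_page(pages[-1])


print(plan_frame(300, 14, 5))
```

For instance, a 300-byte frame written at RWP = 14 with RRP = 5 occupies pages 14 and 15, puts its descriptor at 0x0E00, and advances RWP to page 1.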
**MAC address filter:**

Before writing a frame, check its destination MAC against PAR0–PAR5; the
broadcast address FF:FF:FF:FF:FF:FF is always accepted. The GC typically
configures PAR0–PAR5 over EXI after boot, so `BBARegisterFile` must make these
six bytes available to `RXFrameAssembler`. Pass them through a small dedicated
AsyncFIFO, or keep a 6-byte `sync`-domain shadow copy that is updated whenever
the GC writes PAR0–PAR5; the shadow updates must themselves cross domains
through a CDC primitive, per the rules in §5. Multicast hash-table (MAR0–MAR7)
filtering is optional for the initial implementation: accept all frames
(promiscuous mode) until the GC configures the filter.

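A Python sketch of the destination-address check gives a reference for the RTL filter. It assumes the destination MAC occupies the first six bytes of the raw frame and models "filter not yet configured" as a simple `promiscuous` flag; multicast hash filtering is left out, as in the text. The names are hypothetical.

```python
BROADCAST = bytes([0xFF] * 6)


def accept_frame(frame: bytes, par: bytes, promiscuous: bool) -> bool:
    """Destination-MAC filter: own address (PAR0-PAR5), broadcast, or promiscuous."""
    if len(frame) < 6:
        return False              # runt: no complete destination address
    dest = frame[:6]
    return promiscuous or dest == BROADCAST or dest == par


par = bytes.fromhex("0009bf123456")   # example Nintendo-OUI address
print(accept_frame(par + bytes(60), par, promiscuous=False))
```

In hardware the same comparison can run byte by byte as the first six bytes stream in from the W5500, deciding whether to commit the frame before the descriptor is written.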
---

### 7.5 TXFrameDrain

**Domain:** `sync`

**File:** `exi_bba/tx_frame_drain.py`

Drains the TX byte FIFO (fed from the `exi` domain as the GC writes to the
TXDATA register, 0x48) and forwards complete frames to `W5500SPIMaster`.

**Flow:**

1. Wait for the `tx_ctrl` AsyncFIFO to contain a frame length. `BBARegisterFile`
   pushes it once the GC has written the complete TX frame (i.e., when NCRA
   ST1:ST0 transitions to 01 or 10).
2. Pop `frame_length` from `tx_ctrl`.
3. Pop exactly `frame_length` bytes from the `tx_bytes` AsyncFIFO.
4. Forward the bytes to the `W5500SPIMaster` TX interface with SOF/EOF framing.
5. Wait for `W5500SPIMaster` to signal TX complete.
6. Pulse the `tx_irq` PulseSynchronizer to the `exi` domain.

**FIFO sizing:** because the frame length only arrives after the GC has written
the whole frame, and the drain does not pop bytes until then, `tx_bytes` must be
deep enough to hold a complete frame (about 1.5 KB). The depth-64 FIFO listed in
§5 is not sufficient for full-size frames; size it accordingly (for example
EBR-backed).

**NCRA ST bits:** The GC writes NCRA with ST1:ST0 = 01 (start transmit from
buffer 1) or 10 (start transmit from buffer 2). The BBA hardware has two TX
buffers; this implementation uses a single TX FIFO and ignores the buffer
selection. When ST1:ST0 goes non-zero, treat it as a TX trigger regardless of
which bits are set. `BBARegisterFile` should push the frame length into
`tx_ctrl` on this transition.

---

### 7.6 W5500SPIMaster

**Domain:** `sync`

**File:** `exi_bba/w5500_spi_master.py`

Implements the W5500 SPI master interface. The W5500 supports SPI Modes 0 and
3; this design uses Mode 0 (CPOL=0, CPHA=0), which has the opposite clock
polarity to the BBA EXI interface.

**W5500 SPI frame format:**

```
Byte 0–1: Address (16-bit, big-endian; offset within the selected block)
Byte 2:   Control byte:
            [7:3] = Block Select (BSB):
                      00000 = Common Register
                      00001 = Socket 0 Register
                      00010 = Socket 0 TX buffer
                      00011 = Socket 0 RX buffer
            [2]   = Read/Write (0 = read, 1 = write)
            [1:0] = Operation Mode (00 = variable length, 01 = fixed 1 byte,
                    10 = fixed 2 bytes, 11 = fixed 4 bytes)
Byte 3+:  Data bytes
```

**W5500 configuration (performed once on NCRA reset):**

```
1. Write MR (Mode Register, common 0x0000): 0x80 (software reset)
2. Wait ~1 ms
3. Write SHAR (Source MAC, common 0x0009–0x000E): copy from the PAR0–PAR5 shadow
4. Write S0_MR (Socket 0 Mode, offset 0x0000, BSB 00001): 0x04 (MACRAW, raw ethernet)
5. Write S0_CR (Socket 0 Command, offset 0x0001, BSB 00001): 0x01 (OPEN)
6. Write S0_IMR (Socket 0 Interrupt Mask, offset 0x002C, BSB 00001):
   0x14 = RECV (0x04) | SEND_OK (0x10)
7. Write SIMR (Socket Interrupt Mask, common 0x0018): 0x01 (route Socket 0
   interrupts to the INTn pin)
```

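A Python helper that builds W5500 SPI frames from the address/control-byte layout above makes the configuration sequence concrete and doubles as a source of test vectors for the SPI master. It is a sketch: the helper names are hypothetical, it uses variable-length data mode (OM = 00) throughout, and the register offsets are the ones listed in the configuration steps.

```python
BSB_COMMON = 0b00000   # common registers
BSB_S0_REG = 0b00001   # Socket 0 registers
BSB_S0_TX = 0b00010    # Socket 0 TX buffer
BSB_S0_RX = 0b00011    # Socket 0 RX buffer


def w5500_frame(addr, bsb, write, data=b"", om=0b00):
    """One SPI frame: 16-bit big-endian address, control byte, then data."""
    control = (bsb << 3) | ((1 if write else 0) << 2) | om
    return bytes([(addr >> 8) & 0xFF, addr & 0xFF, control]) + bytes(data)


def init_sequence(mac):
    """Configuration steps 1 and 3-7 (the ~1 ms wait in step 2 is not a frame)."""
    return [
        w5500_frame(0x0000, BSB_COMMON, True, [0x80]),  # MR: software reset
        w5500_frame(0x0009, BSB_COMMON, True, mac),     # SHAR: source MAC
        w5500_frame(0x0000, BSB_S0_REG, True, [0x04]),  # S0_MR: MACRAW
        w5500_frame(0x0001, BSB_S0_REG, True, [0x01]),  # S0_CR: OPEN
        w5500_frame(0x002C, BSB_S0_REG, True, [0x14]),  # S0_IMR: RECV | SEND_OK
        w5500_frame(0x0018, BSB_COMMON, True, [0x01]),  # SIMR: socket 0 -> INTn
    ]


for frame in init_sequence(bytes.fromhex("0009bf123456")):
    print(frame.hex(" "))
```

Reading S0_IR for polling is then `w5500_frame(0x0002, BSB_S0_REG, False)`, i.e. the three header bytes `00 02 08` followed by one clocked-out data byte.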
**MACRAW mode:** In MACRAW mode the W5500 Socket 0 sends and receives raw
|
||
ethernet frames including the full MAC header and FCS. This is exactly what
|
||
the MX98730EC presents to the GC. No IP stack runs in the FPGA.
|
||
|
||
**RX polling:** The W5500 asserts its INT_N pin (active low) when a frame
|
||
arrives. Connect W5500 INT_N to an FPGA input pin and use it to trigger the
|
||
`RXFrameAssembler`. Alternatively poll `S0_IR` (Socket 0 Interrupt Register,
|
||
0x4002) periodically. The INT_N approach has lower latency and is preferred.
|
||
|
||
**SPI clock rate:** Drive W5500 SPI at 24 MHz (sync clock 48 MHz ÷ 2 using a
|
||
clock enable toggle). The W5500 supports up to 80 MHz so there is ample margin.
|
||
|
||
**Port list:**
|
||
|
||
| Port | Width | Dir | Notes |
|
||
|---|---|---|---|
|
||
| `spi_clk` | 1 | out | To W5500 CLK pin (SPI Mode 0, idles LOW) |
|
||
| `spi_mosi` | 1 | out | To W5500 MOSI |
|
||
| `spi_miso` | 1 | in | From W5500 MISO |
|
||
| `spi_cs_n` | 1 | out | To W5500 CS (active low) |
|
||
| `w5500_int_n` | 1 | in | W5500 interrupt (active low) |
|
||
| `tx_data` | 8 | in | Byte to transmit (from TXFrameDrain) |
|
||
| `tx_valid` | 1 | in | TX byte available |
|
||
| `tx_ready` | 1 | out | TX byte consumed |
|
||
| `tx_sof` | 1 | in | Start of frame marker |
|
||
| `tx_eof` | 1 | in | End of frame marker |
|
||
| `rx_data` | 8 | out | Received byte (to RXFrameAssembler) |
|
||
| `rx_valid` | 1 | out | RX byte available |
|
||
| `rx_ready` | 1 | in | RX byte consumed |
|
||
| `rx_sof` | 1 | out | Start of frame |
|
||
| `rx_eof` | 1 | out | End of frame |
|
||
|
||

---

### 7.7 EEPROMModel

**Domain:** `exi`
**File:** `exi_bba/eeprom_model.py`

Models the 93C46-compatible serial EEPROM that stores the BBA's MAC address.
The GC software bit-bangs the EEPROM interface through register 0x1C
(EEPROM Interface Register) of the BBA chip.

**Register 0x1C bit fields:**

```
[3]  EECK — EEPROM clock
[2]  EECS — EEPROM chip select
[1]  EEDI — EEPROM data in  (GC → EEPROM)
[0]  EEDO — EEPROM data out (EEPROM → GC)  [read-only]
```

The GC reads EEDO by reading register 0x1C bit 0.

**93C46 protocol summary:**

The 93C46 uses a 4-wire Microwire-style serial protocol (SK=clock, CS=select,
DI=data in, DO=data out); DI is sampled on the rising edge of SK. Commands:
- READ: start bit (1) + opcode (10) + 6-bit address → a dummy 0 bit, then
  16-bit data out, MSB first
- WRITE: start bit (1) + opcode (01) + 6-bit address + 16-bit data
- EWEN (write enable): start bit (1) + opcode (00) + address (11xxxx)

Each 93C46 word is 16 bits. The MAC address occupies words 0–2 (6 bytes).

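The READ exchange can be modelled in a few lines of Python. Such a model is useful as a golden reference when testing the `EEPROMModel` FSM or a GC-style bit-bang loop. The sketch rests on several assumptions:

- Only READ is modelled. WRITE, EWEN and ERASE are accepted but ignored.
- Reading past the addressed word returns 0.
- DO reads 0 whenever the part is not driving it.
- The MAC is packed with byte 2k in the low half of word k.

Check that last byte order against the GC driver before relying on it. The class and function names are hypothetical.

```python
class EEPROM93C46:
    """Behavioural sketch of a 93C46 (x16 organisation): READ only."""

    def __init__(self, words):
        self.words = (list(words) + [0] * 64)[:64]
        self.deselect()

    def deselect(self):
        """CS low: abort any command in progress."""
        self.state, self.shift, self.count, self.dout = "IDLE", 0, 0, 0

    def rising_edge(self, di):
        """One SK rising edge with CS high; returns DO after the edge."""
        if self.state == "IDLE":
            if di:                                # start bit
                self.state, self.shift, self.count = "CMD", 0, 0
        elif self.state == "CMD":
            self.shift = (self.shift << 1) | (di & 1)
            self.count += 1
            if self.count == 8:                   # 2 opcode + 6 address bits
                opcode, addr = self.shift >> 6, self.shift & 0x3F
                if opcode == 0b10:                # READ
                    self.data, self.bits_left = self.words[addr], 16
                    self.state, self.dout = "READ", 0    # dummy 0 bit
                else:                             # WRITE/EWEN/ERASE: ignored
                    self.state = "IGNORED"
        elif self.state == "READ":
            if self.bits_left:
                self.bits_left -= 1
                self.dout = (self.data >> self.bits_left) & 1
            else:
                self.dout = 0                     # sequential read not modelled
        return self.dout


def read_word(eeprom, addr):
    """Bit-bang one READ command, as the GC does through register 0x1C."""
    eeprom.deselect()
    command = [1, 1, 0] + [(addr >> i) & 1 for i in range(5, -1, -1)]
    for bit in command:
        dout = eeprom.rising_edge(bit)
    if dout != 0:
        raise RuntimeError("expected the dummy 0 bit after the address")
    word = 0
    for _ in range(16):
        word = (word << 1) | eeprom.rising_edge(0)
    eeprom.deselect()
    return word


# Assumed packing: MAC byte 2k in the low half of word k.
mac = bytes([0x00, 0x09, 0xBF, 0x00, 0x00, 0x01])
rom = EEPROM93C46([mac[2 * k] | (mac[2 * k + 1] << 8) for k in range(3)])
print([hex(read_word(rom, k)) for k in range(3)])  # ['0x900', '0xbf', '0x100']
```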
**Implementation approach:**

Maintain a small ROM of 64 × 16-bit words in the `exi` domain (an Amaranth
`Array` of `Const` values; with only words 0–2 non-zero it synthesises to a
few LUTs). Pre-populate words 0–2 with the chosen MAC address. Implement a
small FSM that watches writes to register 0x1C, decodes the 93C46 protocol from
the EECS/EECK/EEDI bits, and drives EEDO accordingly.

**Simpler alternative:** Many GC BBA drivers read the EEPROM once at boot and
then write the MAC to PAR0–PAR5 themselves. Pre-populate PAR0–PAR5 in the
register file reset state with a valid Nintendo OUI MAC (00:09:BF:xx:xx:xx).
Skip a full 93C46 implementation for the first version. If Swiss ignores the
EEPROM read result and uses a hardcoded or user-configurable MAC, this is
sufficient.


---

### 7.8 BBATop

**Domain:** both
**File:** `exi_bba/bba_top.py`

Top-level module. Instantiates all submodules, creates clock domains, connects
physical pins.

**Clock domain creation:**

```python
from amaranth import *
from amaranth.lib.cdc import ResetSynchronizer


def elaborate(self, platform):
    m = Module()

    # exi domain: 96 MHz from the PLL (3× the 32 MHz EXI bus rate)
    exi_domain = ClockDomain("exi")
    m.domains += exi_domain
    # NOTE: get_pll() is a placeholder, not an Amaranth platform method.
    # Replace it with an SB_PLL40_PAD / SB_PLL40_CORE Instance configured
    # with the dividers reported by `icepll -i 12 -o 96`.
    pll = platform.get_pll()
    m.d.comb += exi_domain.clk.eq(pll.clkout)
    # Hold the exi domain in reset whenever the sync domain is in reset
    m.submodules.exi_rst = ResetSynchronizer(
        arst=ResetSignal("sync"), domain="exi"
    )

    # sync domain: 48 MHz from SB_HFOSC. Amaranth's iCE40 platform only
    # creates this if default_clk = "SB_HFOSC"; the stock iCEbreaker
    # platform uses the 12 MHz crystal ("clk12") as its default clock.

    # Instantiate submodules...
    m.submodules.spi = spi = SPIMode3Slave()
    m.submodules.regfile = regfile = BBARegisterFile()
    m.submodules.arbiter = arbiter = SPRAMArbiter()
    m.submodules.rx_asm = rx_asm = RXFrameAssembler()
    m.submodules.tx_drn = tx_drn = TXFrameDrain()
    m.submodules.w5500 = w5500 = W5500SPIMaster()
    m.submodules.eeprom = eeprom = EEPROMModel()
    # ... wiring ...
```

**Physical pin connections (iCEbreaker):**

The SP1 EXI signals connect via the interposer PCB to iCEbreaker PMOD pins.
The W5500 Pmod connects to the second PMOD connector. Exact pin mapping depends
on the interposer PCB layout, so define these in a platform resource file.

```python
# Example resource definitions (add to the iCEbreaker platform file):
from amaranth.build import Attrs, Pins, Resource, Subsignal

bba_resources = [
    Resource("exi", 0,
        Subsignal("clk",   Pins("1", conn=("pmod", 0), dir="i")),
        Subsignal("mosi",  Pins("2", conn=("pmod", 0), dir="i")),
        # "oe": MISO must be released (tristated) while EXI CS is deasserted
        Subsignal("miso",  Pins("3", conn=("pmod", 0), dir="oe")),
        Subsignal("cs_n",  Pins("4", conn=("pmod", 0), dir="i")),
        Subsignal("int_n", Pins("7", conn=("pmod", 0), dir="o")),
        Attrs(IO_STANDARD="SB_LVCMOS"),
    ),
    Resource("w5500", 0,
        Subsignal("clk",   Pins("1", conn=("pmod", 1), dir="o")),
        Subsignal("mosi",  Pins("2", conn=("pmod", 1), dir="o")),
        Subsignal("miso",  Pins("3", conn=("pmod", 1), dir="i")),
        Subsignal("cs_n",  Pins("4", conn=("pmod", 1), dir="o")),
        Subsignal("int_n", Pins("7", conn=("pmod", 1), dir="i")),
        Subsignal("rst_n", Pins("8", conn=("pmod", 1), dir="o")),
        Attrs(IO_STANDARD="SB_LVCMOS"),
    ),
]

# e.g. platform.add_resources(bba_resources) before platform.build(...)
```


---

## 8. Memory Map

The BBA register address space is 13 bits wide (0x0000–0x1FFF).

| Address range | Region | Implemented in | Notes |
|---|---|---|---|
| 0x0000–0x0033 | MAC control registers | Register file (exi) | NCRA, NCRB, IMR, IR, pointers |
| 0x0034–0x0037 | TWD — TX write data | Register file (exi) | TX frame length (2 bytes) |
| 0x0038–0x0039 | Reserved | — | Ignore |
| 0x003A | HIPR — Host Interface Protocol | Register file (exi) | Read: 0x01 (BBA present) |
| 0x003B | NAFR — Network Address Filter | Register file (exi) | |
| 0x003C | NWBA — Network Write Buffer Addr | Register file (exi) | |
| 0x003D–0x0047 | Reserved | — | Ignore |
| 0x0048 | TXDATA — Bulk TX data port | Register file → tx_bytes FIFO | Write path to ethernet |
| 0x0049–0x00FF | Reserved | — | Ignore |
| 0x0100–0x0FFF | RX ring buffer | SPRAM (sync) | Read path from ethernet |

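The table translates directly into an address decoder. The Python function below is a reference model, for example for generating expected values in `sim_register_file.py`. The region labels are hypothetical. Addresses the table does not list, including 0x1000–0x1FFF, are treated as reserved here, which is an assumption.

```python
def decode_region(addr):
    """Return the memory-map region name for a 13-bit BBA address."""
    if not 0 <= addr <= 0x1FFF:
        raise ValueError("BBA addresses are 13 bits (0x0000-0x1FFF)")
    if addr <= 0x0033:
        return "mac_control"     # NCRA, NCRB, IMR, IR, pointers, PAR, MAR, ...
    if addr <= 0x0037:
        return "twd"             # TX frame length
    if addr == 0x003A:
        return "hipr"
    if addr == 0x003B:
        return "nafr"
    if addr == 0x003C:
        return "nwba"
    if addr == 0x0048:
        return "txdata"          # bulk TX data port -> tx_bytes FIFO
    if 0x0100 <= addr <= 0x0FFF:
        return "rx_ring"         # served from SPRAM via the prefetch pipeline
    return "reserved"            # reads/writes ignored


def is_spram_backed(addr):
    """True for addresses whose reads must go through the SPRAM prefetch path."""
    return decode_region(addr) == "rx_ring"
```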

---

## 9. EXI Transaction Protocol

All BBA register accesses follow a strict two-phase (header + data) format.

### Header encoding

```
Byte 0:  [7]    write flag     1=write, 0=read
         [6:0]  addr[12:6]     upper 7 bits of address

Byte 1:  [7:2]  addr[5:0]      lower 6 bits of address
         [1:0]  xfer_len-1     0=1 byte, 1=2 bytes, 2=3 bytes, 3=4 bytes
```

CS is asserted (low) before byte 0 and remains low through the entire
transaction including all data bytes. CS deasserts (high) after the last
data byte.

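A byte-level encoder and decoder for this header is useful in both the `GCMasterModel` and the register-file tests. The sketch below follows the bit layout above. The function names are hypothetical.

```python
def encode_header(addr, length, write):
    """Pack one BBA access into its two header bytes."""
    if not 0 <= addr <= 0x1FFF:
        raise ValueError("address is 13 bits")
    if not 1 <= length <= 4:
        raise ValueError("xfer_len encodes 1 to 4 bytes")
    byte0 = (0x80 if write else 0x00) | (addr >> 6)      # flag + addr[12:6]
    byte1 = ((addr & 0x3F) << 2) | (length - 1)          # addr[5:0] + len-1
    return bytes([byte0, byte1])


def decode_header(byte0, byte1):
    """Inverse of encode_header: returns (addr, length, write)."""
    write = bool(byte0 & 0x80)
    addr = ((byte0 & 0x7F) << 6) | (byte1 >> 2)
    length = (byte1 & 0x03) + 1
    return addr, length, write


hdr = encode_header(0x0031, 1, write=False)   # 1-byte read of NWAYS
print(hdr.hex(" "))                           # 00 c4
print(decode_header(*hdr))                    # (49, 1, False)
```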
### Read transaction timing

```
CS   ─┐                                               ┌─
      └───────────────────────────────────────────────┘
CLK    ┌┐┌┐┌┐┌┐┌┐┌┐┌┐┌┐  ┌┐┌┐┌┐┌┐┌┐┌┐┌┐┌┐  ┌┐┌┐...
        header byte 0     header byte 1     data byte 0...
MOSI   [addr+flags]      [addr+len]        [don't care]
MISO   [don't care]      [don't care]      [register data]
```

The register file must have data ready on MISO from the **very first clock
edge of the data phase**. For register-file-backed reads (address < 0x100),
the data is available immediately after header decode. For SPRAM-backed reads
(address ≥ 0x100), the prefetch pipeline issues the SPRAM read request during
the header phase to give the data as much time as possible to arrive (see the
budget in Section 15).

### Write transaction timing

Identical header, then MOSI carries the write data. In SPI Mode 3 the master
changes MOSI on the falling CLK edge, so the FPGA samples MOSI on each rising
CLK edge during the data phase and writes each completed byte to the register.

### ID query

On power-on the GC queries the device ID. The query is two 0x00 bytes written,
then four bytes read. The BBA returns `0x04020200`. Implement this as a special
case: when address decodes to 0x0000 on a read with no prior NCRA reset, return
the hardcoded ID. (A 1-byte read of NCRA produces the same two header bytes,
0x00 0x00, so the two can only be told apart by this state.)

Alternatively, read the Dolphin source for the exact byte sequence GC software
uses to detect the BBA and replicate it faithfully.


---

## 10. BBA Register Reference

Key registers the GC driver accesses. Full register map in YAGCD §10.8.

| Addr | Name | R/W | Reset | Description |
|---|---|---|---|---|
| 0x00 | NCRA | R/W | 0x00 | Network Control A. [0]=RESET (self-clear), [2:1]=ST (TX start), [3]=SR (start receive), [6]=INTMODE (0=int active low) |
| 0x01 | NCRB | R/W | 0x00 | Network Control B |
| 0x04 | LTPS | R | 0x00 | Last TX packet status |
| 0x05 | LRPS | R | 0x00 | Last RX packet status |
| 0x08 | IMR | R/W | 0x00 | Interrupt mask. Bits match IR. Interrupt fires when IR & IMR != 0 |
| 0x09 | IR | R/W | 0x00 | Interrupt register. Write 1 to clear. [7]=RBFI, [4]=TEI, [2]=TI, [1]=RI |
| 0x0A–0x0B | BP | R/W | — | Boundary page pointer |
| 0x0C–0x0D | TLBP | R/W | — | TX low boundary page |
| 0x0E–0x0F | TWP | R/W | 0x00 | TX write page pointer |
| 0x12–0x13 | TRP | R/W | 0x00 | TX read page pointer |
| 0x16–0x17 | RWP | R | updates | RX write page pointer. Advances after each frame written |
| 0x18–0x19 | RRP | R/W | 0x01 | RX read page pointer. GC writes to advance after consuming frames |
| 0x1A–0x1B | RHBP | R/W | 0x0F | RX high boundary page (last valid page). Default 0x0F |
| 0x1C | EEPROM | R/W | — | EEPROM bit-bang interface [3:0] = EECK, EECS, EEDI, EEDO |
| 0x20–0x25 | PAR0–5 | R/W | MAC | MAC address bytes 0–5. GC writes after reading EEPROM |
| 0x26–0x2D | MAR0–7 | R/W | 0xFF | Multicast hash table. 0xFF = accept all |
| 0x2E | ANALOG | R/W | — | PHY analog control. GC writes 0xD6 to enable PHY |
| 0x30 | NWAYC | R/W | — | Autoneg config. GC sets ANE + LTE bits |
| 0x31 | NWAYS | R | 0x17 | Autoneg status. Hardcode 0x17 = 100M full duplex link up |
| 0x32 | GCA | R/W | — | GMAC config A. GC sets AUTOPUB bit |
| 0x33 | GCB | R/W | — | GMAC config B |
| 0x34–0x37 | TWD | W | — | TX write data (frame length, 2 bytes LE, then ignored) |
| 0x3A | HIPR | R | 0x01 | Host interface protocol version. Return 0x01 |
| 0x3B | NAFR | R/W | — | Network address filter |
| 0x3C | NWBA | R/W | — | Network write buffer address |
| 0x48 | TXDATA | W | — | Bulk TX data port. GC streams frame bytes here |
| 0x100+ | RX buf | R | — | RX ring buffer. GC reads frames from here |

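Section 17 refers to `BBARegs.IR` and `BBARegs.IMR`. One way to collect the addresses and reset values from this table in one place is an `IntEnum` plus a helper that builds the register-file reset image. This is a sketch:

- The class name `BBARegs` comes from Section 17.
- The helper `reset_image` is hypothetical.
- The 0x100-byte image size is an assumption.
- Placing the low byte of each 16-bit pointer at the lower address is also an assumption.

```python
from enum import IntEnum


class BBARegs(IntEnum):
    """Register addresses from the Section 10 table."""
    NCRA = 0x00
    NCRB = 0x01
    LTPS = 0x04
    LRPS = 0x05
    IMR = 0x08
    IR = 0x09
    BP = 0x0A
    TLBP = 0x0C
    TWP = 0x0E
    TRP = 0x12
    RWP = 0x16
    RRP = 0x18
    RHBP = 0x1A
    EEPROM = 0x1C
    PAR0 = 0x20
    MAR0 = 0x26
    ANALOG = 0x2E
    NWAYC = 0x30
    NWAYS = 0x31
    GCA = 0x32
    GCB = 0x33
    TWD = 0x34
    HIPR = 0x3A
    NAFR = 0x3B
    NWBA = 0x3C
    TXDATA = 0x48


def reset_image(mac, size=0x100):
    """Reset contents of the register-file region (addresses below 0x100)."""
    if len(mac) != 6:
        raise ValueError("MAC must be 6 bytes")
    regs = [0x00] * size
    regs[BBARegs.RRP] = 0x01                  # assumes pointer LSB at lower address
    regs[BBARegs.RHBP] = 0x0F
    regs[BBARegs.PAR0:BBARegs.PAR0 + 6] = bytes(mac)
    regs[BBARegs.MAR0:BBARegs.MAR0 + 8] = [0xFF] * 8   # accept all multicast
    regs[BBARegs.NWAYS] = 0x17                # 100M full duplex, link up
    regs[BBARegs.HIPR] = 0x01                 # BBA present
    return regs


regs = reset_image(bytes([0x00, 0x09, 0xBF, 0x00, 0x00, 0x01]))
print(regs[BBARegs.PAR0:BBARegs.PAR0 + 6])    # [0, 9, 191, 0, 0, 1]
```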

---

## 11. Initialisation Sequence

This is the exact sequence Swiss/GC software executes. The register file must
respond correctly to each step.

```
1.  Assert CS, write 0x0000 (2 bytes), read 4 bytes
    → Must return: 0x04 0x02 0x02 0x00 (device ID)

2.  Write 0x01 to NCRA (0x00) — software reset
    → RESET bit self-clears next cycle
    → Pulse ncra_rst to sync domain (resets W5500, clears SPRAM pointers)

3.  Poll NCRA bit 0 until clear — wait for reset complete
    → Return 0x00 from NCRA reads after self-clear

4.  Write 6 bytes to PAR0–PAR5 (0x20–0x25)
    → Latch MAC address; forward to sync domain MAC filter shadow

5.  Write 8 bytes to MAR0–MAR7 (0x26–0x2D)
    → Typically all 0xFF (accept all multicast)

6.  Write 0xD6 to ANALOG (0x2E) — enable PHY
    → Store in register file; no hardware effect in FPGA

7.  Write NWAYC (0x30): set bits for ANE + LTE
    → Store; no hardware effect

8.  Write IMR (0x08): typically 0x86 (RBFI | TI | RI)
    → Enables interrupts; INT line will now assert when frames arrive

9.  Write GCA (0x32): set AUTOPUB bit
    → Store; AUTOPUB means RWP auto-updates (this design always does so)

10. Write NCRA (0x00): set SR bit (0x08) — start receive
    → Enable RX path; the RXFrameAssembler should begin accepting frames

11. Poll NWAYS (0x31) until link up
    → Return hardcoded 0x17 immediately
```

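The eleven steps can be replayed against a transaction-level model before any HDL exists. Doing so pins down exactly which responses the register file owes the GC. The names `RegFileModel` and `run_init` are hypothetical, and the model is deliberately small. It covers only three behaviours:

- the ID-query special case,
- the self-clearing RESET bit,
- the hardcoded NWAYS and HIPR values.

This document does not give the NWAYC (ANE + LTE) and GCA (AUTOPUB) bit patterns, so the sketch takes them as placeholder arguments.

```python
class RegFileModel:
    """Transaction-level sketch of the register file's init-time behaviour."""

    NCRA, NWAYS, HIPR = 0x00, 0x31, 0x3A
    CONSTANTS = {NWAYS: 0x17, HIPR: 0x01}          # hardcoded read values
    DEVICE_ID = bytes([0x04, 0x02, 0x02, 0x00])

    def __init__(self):
        self.regs = bytearray(0x100)
        self.id_mode = True     # header 00 00 means "ID query" until the first reset

    def write(self, addr, data):
        for offset, value in enumerate(data):
            a = addr + offset
            if a == self.NCRA and value & 0x01:
                self.id_mode = False
                value &= 0xFE                      # RESET bit self-clears
            self.regs[a] = value

    def read(self, addr, length):
        if addr == 0 and self.id_mode:
            return self.DEVICE_ID[:length]
        return bytes(self.CONSTANTS.get(addr + i, self.regs[addr + i])
                     for i in range(length))


def run_init(dev, mac, nwayc_value, gca_value):
    """Replay the Section 11 steps; nwayc_value/gca_value are placeholders."""
    assert dev.read(0x00, 4) == RegFileModel.DEVICE_ID   # 1. ID query
    dev.write(0x00, [0x01])                              # 2. NCRA.RESET
    assert dev.read(0x00, 1) == b"\x00"                  # 3. RESET reads back clear
    dev.write(0x20, mac)                                 # 4. PAR0-PAR5
    dev.write(0x26, [0xFF] * 8)                          # 5. MAR0-MAR7
    dev.write(0x2E, [0xD6])                              # 6. ANALOG
    dev.write(0x30, [nwayc_value])                       # 7. NWAYC
    dev.write(0x08, [0x86])                              # 8. IMR = RBFI|TI|RI
    dev.write(0x32, [gca_value])                         # 9. GCA
    dev.write(0x00, [0x08])                              # 10. NCRA.SR
    assert dev.read(0x31, 1) == b"\x17"                  # 11. link up


dev = RegFileModel()
run_init(dev, bytes([0x00, 0x09, 0xBF, 0x00, 0x00, 0x01]), 0x00, 0x00)
```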

---

## 12. RX Data Path — Detailed Flow

```
W5500 receives frame on wire
        │
        ▼
W5500SPIMaster detects S0_IR[RECV] (via INT_N pin)
  Reads S0_RX_RSR (Socket 0 RX Received Size; offset 0x0026, BSB=0b00001)
  Reads the 2-byte MACRAW length header, then the frame bytes, from the
    Socket 0 RX buffer (BSB=0b00011) starting at S0_RX_RD
  Advances S0_RX_RD (offset 0x0028) and issues RECV (S0_CR = 0x40)
  Pulses rx_sof, streams rx_data bytes, pulses rx_eof
        │
        ▼  (sync domain)
RXFrameAssembler
  - Checks destination MAC vs PAR shadow
  - Checks NCRA SR bit is set (RX enabled)
  - Computes pages_needed
  - Checks the ring has pages_needed free pages between RWP and RRP
    (modulo the 15-page ring; RWP+pages != RRP alone misses overlaps)
  - Writes descriptor + frame data into SPRAM via SPRAMArbiter
  - Advances RWP (local register in sync domain)
  - Pushes new RWP value into rx_wptr AsyncFIFO (sync→exi)
  - Pulses rx_irq PulseSynchronizer (sync→exi)
        │
        ▼  AsyncFIFO / PulseSynchronizer crossing
        │  (exi domain)
BBARegisterFile
  - Pops new RWP from rx_wptr FIFO, updates RWP register
  - rx_irq pulse arrives: sets IR[1] (RI bit)
  - IR & IMR now non-zero: asserts exi_int_n (INT low to GC)
        │
        ▼  (GC CPU, driven by interrupt or polling)
GC reads IR register: sees RI=1
GC reads RWP (0x16): gets updated pointer
GC reads the frame starting at page RRP, i.e. byte address RRP × 0x100
  (bulk read, up to 1500+ bytes)
  → BBARegisterFile issues SPRAM read requests via spram_req FIFO (exi→sync)
  → SPRAMArbiter services reads from SPRAM
  → Results flow back via spram_rsp FIFO (sync→exi)
  → Prefetch pipeline keeps data ready for SPI bit engine
GC writes new RRP (0x18) to advance past consumed pages
  → BBARegisterFile pushes RRP update into rx_rptr FIFO (exi→sync)
  → RXFrameAssembler updates its local RRP shadow
GC writes IR register with RI=1 (write-1-to-clear)
  → IR[1] clears, INT line deasserts
```

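The `pages_needed` and free-space checks are easy to get wrong at the wrap from page 0x0F back to 0x01. Below is a Python reference model for them. Two assumptions are not fixed by this section, so adjust both to match the RXFrameAssembler spec:

- Each frame carries a 4-byte descriptor.
- One page is always left empty, so that RWP == RRP unambiguously means "ring empty".

The function names are hypothetical.

```python
PAGE_SIZE = 256
FIRST_PAGE = 0x01                          # RRP reset value
LAST_PAGE = 0x0F                           # RHBP reset value
RING_PAGES = LAST_PAGE - FIRST_PAGE + 1    # 15
DESCRIPTOR_BYTES = 4                       # assumption


def pages_needed(frame_len, descriptor_bytes=DESCRIPTOR_BYTES):
    """Pages occupied by one frame plus its descriptor (ceiling division)."""
    return -(-(frame_len + descriptor_bytes) // PAGE_SIZE)


def free_pages(rwp, rrp):
    """Free pages, keeping one page unused so RWP == RRP means empty."""
    used = (rwp - rrp) % RING_PAGES
    return RING_PAGES - 1 - used


def advance(page, count):
    """Advance a page pointer, wrapping from LAST_PAGE back to FIRST_PAGE."""
    return FIRST_PAGE + (page - FIRST_PAGE + count) % RING_PAGES


def try_store(rwp, rrp, frame_len):
    """New RWP if the frame fits, else None (frame dropped, RBFI raised)."""
    need = pages_needed(frame_len)
    if need > free_pages(rwp, rrp):
        return None
    return advance(rwp, need)


print(pages_needed(1518))        # 6
print(try_store(1, 1, 1518))     # 7
print(try_store(13, 1, 1518))    # None (only 2 free pages)
```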

---

## 13. TX Data Path — Detailed Flow

```
GC CPU constructs ethernet frame in GC RAM
        │
        ▼  (GC CPU → EXI)
GC writes 2-byte length to TWD register (0x34)
GC writes frame bytes to TXDATA register (0x48) in chunks
  → BBARegisterFile: each written byte goes into tx_bytes AsyncFIFO (exi→sync)
GC writes NCRA with ST1:ST0 = 01 (transmit trigger)
  → BBARegisterFile pushes frame_length into tx_ctrl AsyncFIFO (exi→sync)
        │
        ▼  AsyncFIFO crossing
        │  (sync domain)
TXFrameDrain
  - Pops frame_length from tx_ctrl
  - Pops frame_length bytes from tx_bytes
  - Forwards to W5500SPIMaster with SOF/EOF
        │
        ▼  (sync domain)
W5500SPIMaster
  - Checks S0_TX_FSR (TX Free Size, offset 0x0020) has room for the frame;
    the register is read-only, so the frame length is never written to it
  - Reads S0_TX_WR (offset 0x0024) and writes the frame bytes into the
    Socket 0 TX buffer (BSB=0b00010) starting at that pointer
  - Writes S0_TX_WR = old S0_TX_WR + frame length
  - Writes SEND command to S0_CR (offset 0x0001, value 0x20)
  - Polls S0_IR until SEND_OK (bit 4) is set
  - Clears S0_IR[SEND_OK] by writing 1 to that bit
  - Pulses tx_irq PulseSynchronizer (sync→exi)
        │
        ▼  PulseSynchronizer crossing
        │  (exi domain)
BBARegisterFile
  - tx_irq arrives: sets IR[2] (TI bit), clears NCRA ST1:ST0
  - If IMR[2] set: INT asserts to GC
        │
        ▼  (GC CPU)
GC reads IR, sees TI=1
GC writes IR with TI=1 to clear
```

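The W5500 half of the TX path can be checked against a small software model of socket 0's TX pointers before the SPI FSM is written. In the sketch below, `FakeW5500` is a hypothetical stand-in for `W5500SlaveModel`. It models only the TX buffer, S0_TX_FSR, S0_TX_WR and the SEND/SEND_OK handshake. It assumes the 2 KB default socket 0 TX buffer. It also mirrors the chip's behaviour of wrapping buffer addresses internally while the 16-bit pointers run freely.

```python
S0_IR_SEND_OK = 0x10    # S0_IR bit 4
CMD_SEND = 0x20         # S0_CR command
TX_BUF_SIZE = 2048      # socket 0 TX buffer size after reset


class FakeW5500:
    """Hypothetical model of socket 0's TX side only."""

    def __init__(self):
        self.tx_buf = bytearray(TX_BUF_SIZE)
        self.tx_rd = 0          # S0_TX_RD (16-bit, free-running)
        self.tx_wr = 0          # S0_TX_WR (16-bit, free-running)
        self.ir = 0             # S0_IR
        self.sent = []          # frames "put on the wire"

    def read_tx_fsr(self):      # S0_TX_FSR is read-only
        return TX_BUF_SIZE - ((self.tx_wr - self.tx_rd) & 0xFFFF)

    def write_tx_buf(self, ptr, data):
        for i, b in enumerate(data):    # the chip maps ptr into its buffer
            self.tx_buf[(ptr + i) % TX_BUF_SIZE] = b

    def write_cr(self, cmd):
        if cmd == CMD_SEND:
            n = (self.tx_wr - self.tx_rd) & 0xFFFF
            self.sent.append(bytes(self.tx_buf[(self.tx_rd + i) % TX_BUF_SIZE]
                                   for i in range(n)))
            self.tx_rd = self.tx_wr
            self.ir |= S0_IR_SEND_OK

    def write_ir(self, value):  # write-1-to-clear
        self.ir &= ~value


def poll(condition, limit=1000):
    for _ in range(limit):
        if condition():
            return
    raise TimeoutError("W5500 poll timed out")


def send_frame(chip, frame):
    """W5500SPIMaster TX steps, one register access per line."""
    poll(lambda: chip.read_tx_fsr() >= len(frame))   # room in the TX buffer?
    ptr = chip.tx_wr                                  # read S0_TX_WR
    chip.write_tx_buf(ptr, frame)                     # frame -> TX buffer block
    chip.tx_wr = (ptr + len(frame)) & 0xFFFF          # write S0_TX_WR
    chip.write_cr(CMD_SEND)                           # S0_CR = SEND
    poll(lambda: chip.ir & S0_IR_SEND_OK)             # wait for SEND_OK
    chip.write_ir(S0_IR_SEND_OK)                      # clear SEND_OK


chip = FakeW5500()
for k in range(50):                      # 75000 bytes: pointers wrap 16 bits
    send_frame(chip, bytes([k]) * 1500)
print(len(chip.sent), hex(chip.tx_wr))   # 50 0x24f8
```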

---

## 14. SPRAM Layout

The iCE40UP5K has 4 × 32 KB SPRAM banks (128 KB total). Map them as:

| SPRAM region | Size | Usage |
|---|---|---|
| 0x0000–0x00FF | 256 B | Reserved (address 0x00 page not used by ring buffer) |
| 0x0100–0x0FFF | 3840 B | RX ring buffer (15 × 256-byte pages, pages 0x01–0x0F) |
| 0x1000–0x17FF | 2048 B | TX frame staging buffer |
| 0x1800–0x1FFF | 2048 B | Reserved / future use |

The ring buffer uses pages 0x01–0x0F (15 pages × 256 bytes = 3840 bytes). This
matches the MX98730EC default `RHBP` (RX High Boundary Page) value of 0x0F and
`RRP` reset value of 0x01.

**SPRAM addressing:** Each iCE40UP5K SB_SPRAM256KA instance is 16K × 16-bit
(32 KB, 14-bit word address; 128 KB across the 4 instances). To address the
ring buffer region as bytes:
- Byte address 0x0100 maps to SPRAM word address 0x0080 (byte 0x0100 >> 1)
- The arbiter converts byte addresses to word addresses and uses MASKWREN for
  byte selection (MASKWREN has one bit per 4-bit nibble, so a byte write
  enables two bits: 0b0011 for DATAIN[7:0], 0b1100 for DATAIN[15:8])

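The byte-to-word mapping and nibble mask can be captured in a reference model that checks the arbiter in simulation. The sketch assumes even byte addresses occupy the low byte of each 16-bit word (little-endian packing). The arbiter may choose either convention, provided reads use the same one. The function and class names are hypothetical.

```python
def spram_byte_write(byte_addr, value):
    """Map a byte write onto one SB_SPRAM256KA port (16K x 16-bit words).

    Returns (ADDRESS, DATAIN, MASKWREN). MASKWREN has one bit per nibble,
    so 0b0011 writes DATAIN[7:0] and 0b1100 writes DATAIN[15:8].
    Assumes even byte addresses live in the low byte of each word.
    """
    if not 0 <= byte_addr < 32 * 1024:
        raise ValueError("one SPRAM instance holds 32 KB")
    word_addr = byte_addr >> 1
    if byte_addr & 1:
        return word_addr, (value & 0xFF) << 8, 0b1100
    return word_addr, value & 0xFF, 0b0011


def spram_byte_select(byte_addr, dataout):
    """Pick the addressed byte out of the 16-bit DATAOUT for that word."""
    return (dataout >> 8) & 0xFF if byte_addr & 1 else dataout & 0xFF


class SpramModel:
    """Word-wide memory honouring the nibble write mask."""

    def __init__(self):
        self.words = [0] * 16384            # 14-bit word address

    def write(self, address, datain, maskwren):
        word = self.words[address]
        for nibble in range(4):
            if (maskwren >> nibble) & 1:
                field = 0xF << (4 * nibble)
                word = (word & ~field) | (datain & field)
        self.words[address] = word

    def read(self, address):
        return self.words[address]


ram = SpramModel()
for offset, value in enumerate(b"\xDE\xAD\xBE\xEF"):
    ram.write(*spram_byte_write(0x0100 + offset, value))
print(hex(ram.read(0x0080)), hex(ram.read(0x0081)))   # 0xadde 0xefbe
```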

---

## 15. Critical Timing Constraints

### Must-meet timing in `exi` domain (96 MHz → 10.4 ns period)

| Path | Budget | Notes |
|---|---|---|
| FFSynchronizer output → edge detect flip-flop | 1 cycle = 10.4 ns | Trivially met — just a register |
| Edge detect → shift register update | 1 cycle | Register-to-register, no logic |
| `rx_valid` → header decode → `spram_req` FIFO write | 2 cycles | Address decode is combinatorial MUX; must close at 96 MHz |
| `tx_load` → `tx_byte` driven from register file | 1 cycle | `regs[addr]` array lookup — critical path; keep address decode combinatorial depth ≤ 4 LUTs |
| `tx_load` → `tx_byte` driven from prefetch buffer | 1 cycle | Just a register read — trivial |

### Must-meet timing in `sync` domain (48 MHz → 20.8 ns period)

| Path | Budget | Notes |
|---|---|---|
| SPRAM read request → SPRAM address valid | 1 cycle | AsyncFIFO read + mux — easy |
| SPRAM DATAOUT → result FIFO write | 1 cycle | Register-to-FIFO — easy |
| W5500 SPI bit engine | N/A | Clock-enable based at 24 MHz effective; no hard timing |

### Cross-domain latency budget for SPRAM prefetch

```
EXI header phase:  16 EXI bit times at 32 MHz = 500 ns (48 exi clocks)
Address complete:  after header bit 14 of 16 (addr[0] is byte 1, bit 2)
Prefetch window:   2 remaining header bits + ½ bit to the first MISO
                   falling edge = 78 ns, minus ~3 exi clocks (31 ns) of
                   input synchroniser and edge detect ≈ 47 ns

SPRAM prefetch round trip:
  exi → spram_req FIFO write:           1 exi tick   = 10 ns
  FIFO cross-domain:                    2 sync ticks = 42 ns
  SPRAM read (1 cycle latency):         1 sync tick  = 21 ns
  Result → spram_rsp FIFO write:        1 sync tick  = 21 ns
  FIFO cross-domain:                    2 exi ticks  = 21 ns
  Result available in prefetch buffer:  2 exi ticks  = 21 ns
  Total:                                              ~136 ns

136 ns > ~47 ns → a data phase that immediately follows the header
cannot be served from SPRAM in time ✗
```

This is the tightest timing consideration in the design. The prefetch must be
issued during HEADER1, as soon as addr[0] has been sampled rather than after
the byte completes, but even then it cannot beat a data phase that follows the
header immediately. The budget only closes if the GC leaves idle time between
the header bytes and the data bytes while holding CS low (for example, if its
driver issues them as separate EXI immediate transfers). Measure that gap in
Dolphin or on hardware; if it is absent, the first byte of an SPRAM-backed
read needs a different source.

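The budget is easy to recompute whenever a clock changes, for example for EXI clock index 4 or a different `exi` domain frequency. The Python sketch below reproduces both numbers. It assumes the stage counts listed above and a 3-exi-clock input delay in the SPI slave (a 2-stage synchroniser plus one edge-detect register); check that delay against Section 7.1. Running the model at 16 MHz EXI gives a 125 ns window, which is still below the round trip.

```python
EXI_DOMAIN_HZ = 96e6     # exi clock domain
SYNC_HZ = 48e6           # sync clock domain


def ns(cycles, hz):
    """Duration of `cycles` clock periods at `hz`, in nanoseconds."""
    return cycles * 1e9 / hz


def prefetch_round_trip_ns():
    """Sum of the stages listed in the budget above."""
    return (ns(1, EXI_DOMAIN_HZ)      # spram_req FIFO write
            + ns(2, SYNC_HZ)          # exi -> sync crossing
            + ns(1, SYNC_HZ)          # SPRAM read
            + ns(1, SYNC_HZ)          # spram_rsp FIFO write
            + ns(2, EXI_DOMAIN_HZ)    # sync -> exi crossing
            + ns(2, EXI_DOMAIN_HZ))   # load into the prefetch buffer


def prefetch_window_ns(exi_bus_hz=32e6, input_delay_clocks=3):
    """Time between the FPGA seeing addr[0] and the first data-phase falling edge.

    addr[0] is header bit 14 of 16; two more bits plus half a bit elapse
    before MISO must carry the first data bit. input_delay_clocks is the
    assumed synchroniser + edge-detect delay in the exi domain.
    """
    bit_ns = 1e9 / exi_bus_hz
    return 2.5 * bit_ns - ns(input_delay_clocks, EXI_DOMAIN_HZ)


print(f"header phase {16 * 1e9 / 32e6:.0f} ns, "
      f"round trip {prefetch_round_trip_ns():.1f} ns, "
      f"window {prefetch_window_ns():.1f} ns")
# header phase 500 ns, round trip 135.4 ns, window 46.9 ns
```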

---

## 16. SPRAM Read Prefetch Pipeline

The prefetch pipeline ensures MISO data is always ready before the SPI slave
needs it for the data phase.

### State machine (in BBARegisterFile, exi domain)

```
State HEADER1 (decoding second header byte):
    If is_read AND address >= 0x100:
        push address, address+1, ... address+PREFETCH_DEPTH-1
            into spram_req AsyncFIFO, one per cycle  ← issued NOW, during header decode
        set prefetch_pending = True

State DATA (read phase):
    On each tx_load pulse:
        If prefetch_pending AND spram_rsp FIFO has data:
            pop byte from spram_rsp FIFO
            load into tx_byte
            push (address + byte_ctr + PREFETCH_DEPTH) into spram_req  ← keep the pipeline full
        Elif address < 0x100:
            tx_byte = regs[address + byte_ctr]    ← direct register file read

On CS deassert:
    discard responses still in flight for bytes past the end of the transfer
```

### Pipeline depth

The `spram_req` and `spram_rsp` FIFOs each have depth 4 (PREFETCH_DEPTH = 4).
This allows up to 4 read requests to be in flight simultaneously, which absorbs
SPRAM arbiter stalls (ETH write winning the arbitration) without starving the
SPI data phase.

### SPRAM arbiter stall handling

If the SPRAM arbiter defers an EXI read by 1 cycle (due to ETH write priority),
the `spram_rsp` FIFO may be momentarily empty when `tx_load` arrives.

The SPI slave cannot actually stall the transfer: the GC is the SPI master and
keeps clocking regardless. All the slave can do is act at byte boundaries:
after a complete byte has been transmitted, hold MISO at a fixed level (0 or 1)
for the byte whose data is missing. The GC reads that placeholder as data (a
corrupted byte, not a retry), so any such underrun is a functional failure that
must be designed out rather than handled.

**Practical note:** With the W5500 SPI at 24 MHz, the ETH write path delivers
one byte every 8 SPI clocks, i.e. every 16 `sync` cycles, so it needs the SPRAM
arbiter for only about 1 sync cycle in 16. The EXI read path gets the remaining
cycles. With 4-deep FIFOs the pipeline should almost never underrun in
practice. Monitor the underrun condition in simulation.

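A simple timing model shows what the prefetch depth buys. At 32 MHz EXI one byte lasts 250 ns, which exceeds the ~136 ns round trip. In steady state even a depth of 1 keeps up, and the extra depth mainly absorbs arbiter stalls. The sketch below is hypothetical and makes two simplifying assumptions:

- The first `depth` requests are all issued at the same instant.
- The header-to-data gap is set to 500 ns. That figure is an assumption, not a measured value.

Times are in nanoseconds.

```python
def first_late_byte(n_bytes, depth, round_trip_ns, byte_ns,
                    issue_start_ns, first_need_ns, stall_ns=()):
    """Index of the first read byte whose SPRAM data is not back when the
    SPI slave needs it, or None if every byte is on time.

    Requests 0..depth-1 are issued together at issue_start_ns (HEADER1);
    request k >= depth is issued when byte k - depth is consumed.
    stall_ns[k] adds extra arbiter delay to request k.
    """
    need = [first_need_ns + k * byte_ns for k in range(n_bytes)]
    for k in range(n_bytes):
        issued = issue_start_ns if k < depth else need[k - depth]
        extra = stall_ns[k] if k < len(stall_ns) else 0.0
        if issued + round_trip_ns + extra > need[k]:
            return k
    return None


BYTE_NS = 8 * 31.25      # one byte at 32 MHz EXI
# Assumed 500 ns gap between header and data phase, no arbiter stalls:
print(first_late_byte(16, 1, 136, BYTE_NS, 0, 500))                    # None
# A 150 ns arbiter stall on byte 3 breaks depth 1 but not depth 4:
print(first_late_byte(16, 1, 136, BYTE_NS, 0, 500, [0, 0, 0, 150]))    # 3
print(first_late_byte(16, 4, 136, BYTE_NS, 0, 500, [0, 0, 0, 150]))    # None
```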

---

## 17. Interrupt Handling

The `exi_int_n` output (pin 3 of SP1) is active-low. Assert it (drive low)
when `IR & IMR != 0`.

```python
# In BBARegisterFile, exi domain.
# exi_int_n must reset to 1 (deasserted): Signal(reset=1) in Amaranth 0.4,
# Signal(init=1) in Amaranth 0.5.
ir_masked = Signal(8)
m.d.comb += ir_masked.eq(regs[BBARegs.IR] & regs[BBARegs.IMR])
m.d.exi += exi_int_n.eq(~ir_masked.any())   # registered: one exi cycle of latency
```

Register the output; do not drive `exi_int_n` combinatorially. A registered
output prevents glitches from propagating onto the GC board.

**Interrupt sources and IR bit assignments:**

| IR bit | Name | Set by | Cleared by |
|---|---|---|---|
| 7 | RBFI | RXFrameAssembler when ring full | GC write-1-to-clear |
| 4 | TEI | TXFrameDrain on TX error | GC write-1-to-clear |
| 2 | TI | tx_irq pulse from sync | GC write-1-to-clear |
| 1 | RI | rx_irq pulse from sync | GC write-1-to-clear |

The GC typically sets IMR to 0x86 = 0b10000110, enabling RBFI, TI and RI.

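The write-1-to-clear semantics and the masking are worth pinning down in a reference model, including a corner case: what happens when a source sets a bit in the same cycle the GC clears it. The sketch below lets the set win. That is a common choice, since otherwise an event arriving during the clear is lost, but it is an assumption rather than something this document specifies. Class and method names are hypothetical.

```python
RBFI, TEI, TI, RI = 0x80, 0x10, 0x04, 0x02    # IR bit masks


class InterruptModel:
    """IR/IMR semantics: sources set bits, the GC clears them by writing 1s."""

    def __init__(self):
        self.ir = 0
        self.imr = 0

    def step(self, set_bits=0, gc_clear=0):
        """One exi cycle. A source set wins over a same-cycle clear (assumed)."""
        self.ir = ((self.ir & ~gc_clear) | set_bits) & 0xFF

    @property
    def int_n(self):
        """Active-low INT to the GC (combinational view; registered in hardware)."""
        return 0 if self.ir & self.imr else 1


irq = InterruptModel()
irq.imr = 0x86                       # RBFI | TI | RI
irq.step(set_bits=RI)                # rx_irq pulse arrives
print(irq.int_n)                     # 0 (asserted)
irq.step(set_bits=TEI, gc_clear=RI)  # GC clears RI while a TX error is flagged
print(hex(irq.ir), irq.int_n)        # 0x10 1  (TEI is not enabled in IMR)
```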

---

## 18. EEPROM / MAC Address

The GC software reads the MAC address from the 93C46 EEPROM during
initialisation (bit-banging through register 0x1C). It then writes the MAC
to PAR0–PAR5.

**Recommended approach for initial implementation:**

Skip full 93C46 emulation and pre-populate PAR0–PAR5 in the register file
reset state with a valid MAC. A fixed value in `regs[0x1C]` cannot stand in for
the EEPROM: the driver clocks the command in and the 16-bit words out one bit
at a time, so EEDO has to change from clock to clock. Use Nintendo's OUI
`00:09:BF` for the first 3 bytes and a device-specific value for the last 3.
This is a universally administered address; the locally administered bit lives
in the first byte, not the last three.

```
MAC: 00:09:BF:00:00:01
```

Verify against Swiss source whether it validates the MAC read from EEPROM or
accepts whatever PAR0–PAR5 contains. If it re-reads EEPROM after writing PAR,
a full 93C46 model is required. If it only uses PAR0–PAR5, pre-populating the
register file is sufficient.

**MAC address propagation:**

When the GC writes PAR0–PAR5, forward the new MAC to the W5500 SHAR register
(common register block, offsets 0x0009–0x000E) via the `sync` domain. Use a
6-byte AsyncFIFO or a dedicated MAC update pulse. In MACRAW mode the GC builds
complete Ethernet frames, so the source MAC on the wire is the one inside each
frame. SHAR matters for the W5500's MAC filter (Sn_MR.MFEN): if the filter is
enabled, SHAR must match PAR0–PAR5 or unicast frames for the GC are dropped.

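Forwarding the MAC comes down to one W5500 SPI write into the common register block, and a MAC's type is fixed by two bits in its first octet. The sketch below builds the SHAR write frame and checks those bits for the suggested address. The SHAR offset and frame layout follow the W5500 datasheet. The helper names are hypothetical.

```python
SHAR = 0x0009            # W5500 common register: source hardware address (6 bytes)
BSB_COMMON = 0b00000     # common register block


def shar_write_frame(mac):
    """SPI frame writing a 6-byte MAC into SHAR (VDM, write, common block)."""
    mac = bytes(mac)
    if len(mac) != 6:
        raise ValueError("a MAC address is 6 bytes")
    control = (BSB_COMMON << 3) | 0b100          # RWB = 1 (write), OM = 00
    return bytes([SHAR >> 8, SHAR & 0xFF, control]) + mac


def is_group_address(mac):
    """I/G bit (bit 0 of the first octet): 1 = multicast/broadcast."""
    return bool(mac[0] & 0x01)


def is_locally_administered(mac):
    """U/L bit (bit 1 of the first octet): 1 = locally administered."""
    return bool(mac[0] & 0x02)


mac = bytes([0x00, 0x09, 0xBF, 0x00, 0x00, 0x01])
print(shar_write_frame(mac).hex(" "))                        # 00 09 04 00 09 bf 00 00 01
print(is_group_address(mac), is_locally_administered(mac))   # False False
```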

---

## 19. iCE40UP5K Resource Budget

| Resource | Available | Estimated use | Margin |
|---|---|---|---|
| Logic cells (4-LUT + FF) | 5280 | ~1800 | 66% free |
| EBR (4 Kbit blocks) | 30 (120 Kbit) | 4 (FIFOs) | 26 free |
| SPRAM (32 KB banks) | 4 (128 KB) | 1 bank for ring buffer | 3 free |
| PLL | 1 | 1 (for exi domain) | 0 free |
| SB_HFOSC | 1 | 1 (sync domain) | 0 free |
| I/O pins | 39 usable | ~14 (EXI:5 + W5500:6 + misc:3) | 25 free |

**Logic cell breakdown:**

| Module | Estimated cells |
|---|---|
| SPIMode3Slave | 90 |
| BBARegisterFile FSM + decode | 250 |
| Register file (512 × 8b) | ~200 (see note below) |
| AsyncFIFO × 8 | 400 |
| PulseSynchronizer × 4 | 40 |
| FFSynchronizer × 5 | 30 |
| SPRAMArbiter | 80 |
| RXFrameAssembler | 200 |
| TXFrameDrain | 150 |
| W5500SPIMaster | 200 |
| EEPROMModel | 100 |
| Misc glue | 60 |
| **Total** | **~1800** |

**Note on the register file:** the iCE40UP5K has no distributed (LUT-based)
RAM. A 512 × 8 array maps either to one EBR (synchronous read, so the address
must be presented a cycle before `tx_load`, which affects the 1-cycle register
read path in Section 15) or to flip-flops at 8 cells per stored byte. Storing
only the implemented registers in flip-flops and decoding every other address
as a constant keeps the cost manageable; revise this row once the register set
is final.

iCE40UP5K fmax with nextpnr: typically 60–80 MHz for logic of this complexity.
The `exi` domain at 96 MHz is the tightest. If nextpnr fails to close timing:

1. First option: reduce to 64 MHz `exi` domain (icepll alternative). This is
   only 2× the 32 MHz EXI clock rather than 3×, so re-check the margin of the
   SPI slave's CLK edge detection.
2. Second option: reduce EXI bus speed in Swiss settings to 16 MHz (clock index
   4 instead of 5), halving the FPGA timing requirement (a 48 MHz `exi` domain).
3. Third option: add pipeline registers on the critical address decode path.


---

## 20. PCB / Connector Notes

### Interposer PCB

A simple pass-through interposer PCB connects the GC SP1 slot to the iCEbreaker
via a ribbon cable or header.

**Required PCB spec:**
- Thickness: **1.2 mm** (not standard 1.6 mm — critical for fit)
- Copper finish: **ENIG (gold)** — prevents oxidation on edge contacts
- Board material: FR4 standard

**Footprint source:** Copy the edge connector footprint from
`github.com/silverstee1/SP1ETH` KiCad files. Do not design from scratch.
The staggered dual-row geometry requires exact pad positions that have been
physically verified. Cross-reference with the ETH2SP1 LaserBear open files.

**Additional interposer components:**
- 10 kΩ resistor: EXTIN (pin 1) to 3.3V (pin 7) — device detect
- 100 µF capacitor: 3.3V to GND — bulk decoupling near connector
- 100 nF capacitor × 2: additional HF decoupling
- ESD protection diode array: on CLK, MOSI, MISO, CS lines (optional but
  recommended — the GC motherboard is difficult to repair if damaged)

**Do not connect pin 5 (12V) to anything on the FPGA side.**

### iCEbreaker connection

The interposer PCB exposes EXI signals on a 2.54 mm pitch 8-pin header.
Connect to iCEbreaker PMOD1 connector using a short ribbon cable. Keep the
cable as short as possible (< 10 cm) to minimize signal integrity issues at
32 MHz.


---

## 21. Known Hardware Quirks

### EXI DMA bug

The GC's EXI DMA engine has a bug where data on the MISO line during a DMA
write is clocked back out with a 1-bit shift. This only affects GC software
doing DMA writes (rare). Swiss uses IMM (immediate) mode transfers. No FPGA
workaround needed.

### SPI Mode 3 vs Mode 0

Every other EXI device (memory cards, RTC, IPL) uses SPI Mode 0. The BBA
is the only device using Mode 3. Do not share the SPI slave implementation
with other EXI device implementations without parameterising CPOL/CPHA.

### MISO tristate

On real hardware, MISO (DO) is tristated when CS is deasserted, because other
EXI devices on the same bus share the data line and would otherwise conflict.
This FPGA implementation must do the same. The dedicated CS line (SP1 device 2)
only selects the device; it does not make the shared MISO line private, so
driving MISO high while deselected would fight reads from the other devices on
that channel. Tristating is straightforward on the iCE40UP5K: the SB_IO
primitive has an output enable, and in Amaranth the pin can be requested with
`dir="oe"` and its `oe` driven low whenever EXI CS is deasserted.

### GC hardware revisions

- DOL-001 (original): SP1 present, BBA compatible
- DOL-001 Rev B: SP1 physically absent on motherboard but case hole present
- DOL-101 (later): SP1 present again (but Serial Port 2 absent)
- Panasonic Q: SP1 present

Swiss supports all revisions with SP1 via the EXI hypervisor driver (required
from Swiss build 1788 onwards for BBA emulation features).

### EXI clock index

The real BBA uses clock index 5 (32 MHz). Swiss allows configuring a lower
clock index for compatibility. If 96 MHz fmax is not achievable, instruct users
to configure Swiss to use clock index 4 (16 MHz EXI), which at the same 3×
ratio requires only a 48 MHz `exi` domain and is readily achievable.


---

## 22. File Structure

```
gc_bba_fpga/
├── exi_bba/
│   ├── __init__.py
│   ├── spi_mode3_slave.py       # SPIMode3Slave
│   ├── bba_register_file.py     # BBARegisterFile + register constants
│   ├── spram_arbiter.py         # SPRAMArbiter
│   ├── rx_frame_assembler.py    # RXFrameAssembler
│   ├── tx_frame_drain.py        # TXFrameDrain
│   ├── w5500_spi_master.py      # W5500SPIMaster
│   ├── eeprom_model.py          # EEPROMModel (93C46)
│   └── bba_top.py               # BBATop + clock domain setup
├── sim/
│   ├── sim_spi_slave.py         # SPIMode3Slave unit test
│   ├── sim_register_file.py     # BBARegisterFile unit test
│   ├── sim_bba_init.py          # Full init sequence simulation
│   ├── sim_rx_path.py           # RX data path end-to-end test
│   ├── sim_tx_path.py           # TX data path end-to-end test
│   ├── gc_master_model.py       # GC CPU SPI master simulation model
│   ├── w5500_slave_model.py     # W5500 SPI slave simulation model
│   └── ethernet_frame_gen.py    # Test frame generator
├── platform/
│   ├── icebreaker_bba.py        # iCEbreaker platform with BBA resources
│   └── interposer_pinmap.py     # SP1 ↔ PMOD pin mapping
├── pcb/
│   ├── interposer/              # KiCad project for interposer PCB
│   └── README.md                # PCB ordering instructions (1.2mm, ENIG)
├── constraints/
│   └── timing.py                # nextpnr timing constraints (if needed)
├── tests/
│   └── test_bba.py              # pytest suite
├── build.py                     # Amaranth build script
└── README.md
```


---

## 23. Simulation Strategy

Each module should have a standalone simulation before integration. All
simulations use Amaranth's `Simulator` with two clock domains:
`sim.add_clock(1/96e6, domain="exi")` and `sim.add_clock(1/48e6, domain="sync")`.

### Unit tests

**SPIMode3Slave:** Drive CLK/MOSI/CS manually from a process in the `exi`
domain. Verify `rx_byte`/`rx_valid` match sent data. Verify `spi_miso`
matches pre-loaded `tx_byte`. Test CS abort mid-byte.

**BBARegisterFile:** Use a `GCMasterModel` (SPI Mode 3 master process) to
perform read/write transactions. Verify register writes are stored. Verify
register reads return correct values. Verify IR bit setting and clearing.
Verify NWAYS returns 0x17. Verify ID query returns 0x04020200.

**SPRAMArbiter:** Issue concurrent EXI reads and ETH writes. Verify ETH writes
win arbitration. Verify EXI reads complete within 3 sync cycles. Verify no
data corruption.

**RXFrameAssembler:** Feed a known ethernet frame byte-by-byte. Verify SPRAM
contents match expected descriptor + frame layout. Verify RWP advances by
correct page count. Verify rx_irq fires.

**TXFrameDrain + W5500SPIMaster:** Issue TX frame from `tx_bytes` FIFO. Use
`W5500SlaveModel` process to simulate W5500 responses. Verify frame bytes
arrive at W5500 correctly. Verify tx_irq fires after SEND_OK.

### Integration test

**sim_bba_init.py:** Full GC init sequence (all 11 steps from Section 11).
`GCMasterModel` performs every transaction. Verify no stalls, correct responses.

**sim_rx_path.py:** `W5500SlaveModel` delivers a 64-byte test frame.
`GCMasterModel` polls IR, reads RWP, bulk-reads the frame, advances RRP.
Verify GC receives identical bytes to what W5500 sent.

**sim_tx_path.py:** `GCMasterModel` writes a 64-byte frame through TXDATA.
`W5500SlaveModel` captures it. Verify W5500 receives identical bytes.

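The core of `GCMasterModel` is generating a correct SPI Mode 3 waveform at `exi` tick resolution. The pure-Python sketch below produces one `(cs_n, clk, mosi)` sample per exi tick. A simulator process could replay those samples onto the DUT's inputs. A reference slave recovers the bytes by sampling on rising edges, which makes the generator self-checking. The function names and the 2-low/1-high split of the three ticks per bit are assumptions.

```python
def mode3_waveform(data, ticks_low=2, ticks_high=1, idle_ticks=3):
    """Per-exi-tick (cs_n, clk, mosi) samples for one CS-framed Mode 3 transfer.

    CLK idles high. For each bit (MSB first) the master drives MOSI as CLK
    falls, then raises CLK; the slave samples on that rising edge. With a
    96 MHz exi domain and 32 MHz EXI there are 3 ticks per bit (2 low, 1 high).
    """
    idle = (1, 1, 1)
    samples = [idle] * idle_ticks
    samples.append((0, 1, 1))                 # CS falls before the first clock
    for byte in data:
        for bit in range(7, -1, -1):
            mosi = (byte >> bit) & 1
            samples += [(0, 0, mosi)] * ticks_low
            samples += [(0, 1, mosi)] * ticks_high
    samples += [idle] * idle_ticks
    return samples


def sample_rising_edges(samples):
    """Reference slave: capture MOSI on every 0 -> 1 CLK edge while CS is low."""
    bits, prev_clk = [], 1
    for cs_n, clk, mosi in samples:
        if cs_n == 0 and prev_clk == 0 and clk == 1:
            bits.append(mosi)
        prev_clk = clk
    return bytes(int("".join(map(str, bits[i:i + 8])), 2)
                 for i in range(0, len(bits), 8))


wave = mode3_waveform(bytes([0x00, 0xC4]))    # header of a 1-byte NWAYS read
print(len(wave), sample_rising_edges(wave).hex(" "))   # 55 00 c4
```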

---

## 24. Open Issues and Extension Points

### Must resolve before first synthesis

- [ ] Exact PLL parameters for iCE40UP5K: run `icepll -i 12 -o 96` and
  confirm the output is achievable (VCO in 533–1066 MHz range).
- [ ] SP1 connector footprint: clone SP1ETH repo, extract pad positions, verify
  stagger geometry and pitch before PCB layout.
- [ ] W5500 Pmod module pin mapping: confirm which Pmod pins INT_N and RST_N
  appear on (varies by module vendor).
- [ ] Swiss version requirement: confirm Swiss build ≥ 1788 for BBA hypervisor
  support. Earlier builds use a different driver that may have different
  register access patterns.

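The divider search `icepll` performs can be approximated in a few lines, which makes it easy to see which `exi` frequencies a 12 MHz reference can produce. The sketch assumes the iCE40 PLL limits below (PFD 10–133 MHz, VCO 533–1066 MHz, DIVQ 1–6, SIMPLE feedback) and picks the closest output. It does not compute FILTER_RANGE, which `icepll` also reports. The search shows that 96 MHz is exact, while the 64 MHz fallback in Section 19 comes out as 63.75 MHz from a 12 MHz reference. The function name is hypothetical.

```python
def ice40_pll_dividers(f_in_mhz, f_out_mhz):
    """Closest SB_PLL40 dividers (SIMPLE feedback) for f_out_mhz, icepll-style.

    f_pfd = f_in / (DIVR + 1)    must lie in 10..133 MHz
    f_vco = f_pfd * (DIVF + 1)   must lie in 533..1066 MHz
    f_out = f_vco / 2**DIVQ      DIVQ in 1..6
    """
    best = None
    for divr in range(16):                        # DIVR is 4 bits
        f_pfd = f_in_mhz / (divr + 1)
        if not 10 <= f_pfd <= 133:
            continue
        for divf in range(128):                   # DIVF is 7 bits
            f_vco = f_pfd * (divf + 1)
            if not 533 <= f_vco <= 1066:
                continue
            for divq in range(1, 7):
                f_out = f_vco / 2 ** divq
                error = abs(f_out - f_out_mhz)
                if best is None or error < best[0]:
                    best = (error, divr, divf, divq, f_out, f_vco)
    if best is None:
        raise ValueError("no legal divider combination")
    _, divr, divf, divq, f_out, f_vco = best
    return {"DIVR": divr, "DIVF": divf, "DIVQ": divq,
            "f_out": f_out, "f_vco": f_vco}


print(ice40_pll_dividers(12, 96))
# {'DIVR': 0, 'DIVF': 63, 'DIVQ': 3, 'f_out': 96.0, 'f_vco': 768.0}
print(ice40_pll_dividers(12, 64)["f_out"])   # 63.75
```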
### Known limitations

- Single TX buffer (MX98730EC has two). ST1:ST0 = 01 and 10 are treated
  identically. No known GC title relies on dual TX buffering.
- No DMA mode support. IMM mode only. Matches real-world Swiss usage.
- No Serial Port 2 support (different connector, different project scope).
- 93C46 EEPROM emulation is simplified (hardcoded MAC). A full bit-bang
  model can be added later if Swiss requires it.
- RX ring buffer is 15 pages (3840 bytes). The real BBA has 4KB. Frames
  larger than ~3800 bytes (jumbo frames) will be dropped. A maximum-size
  1518-byte frame plus a small per-frame descriptor fits in 6 pages
  (1536 bytes), so standard 1500-byte MTU traffic is no practical issue.

### Extension points

- **Larger ring buffer:** Use additional SPRAM banks for more RX buffering.
- **Multiple sockets:** W5500 supports 8 sockets; only socket 0 in MACRAW
  mode is used here.
- **Link status passthrough:** Read W5500 PHYCFGR register and forward real
  link status to NWAYS instead of hardcoding 0x17.
- **Statistics counters:** LTPS/LRPS (last packet status) are currently 0x00.
  A more complete implementation would fill these from W5500 socket status.
- **Serial Port 2 support:** Different physical connector and EXI channel but
  same FPGA logic; would require a second interposer PCB.