# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project: GC BBA FPGA Replacement
Replace the GameCube Broadband Adapter (DOL-015 / MX98730EC) with an iCEbreaker
FPGA (Lattice iCE40UP5K) written in Amaranth HDL. The FPGA emulates the BBA
register interface over the GameCube EXI bus and bridges to a WIZnet ethernet
chip for real 100BASE-TX ethernet — default **W5100** (indirect parallel bus,
reaches the EXI throughput ceiling) or **W5500** (SPI Pmod, simpler wiring but
~12 Mbit/s). GC software (Swiss homebrew) sees an identical BBA. See "W5100 vs
W5500 ethernet back-end".
---
## Development Environment
**Preferred:** Use the devcontainer (`.devcontainer/`) which includes Python 3.12,
`nextpnr-ice40`, and `fpga-icestorm` pre-installed.
**Windows host + WSL2 devcontainer — USB flashing setup:**
1. Install `usbipd-win` (https://github.com/dorssel/usbipd-win/releases)
2. Run `.devcontainer/attach-icebreaker.ps1` as Administrator before opening the devcontainer
3. The devcontainer runs `--privileged` to pass through the USB device
**Local venv (outside devcontainer):**
```bash
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
```
Yosys is bundled in `amaranth-yosys`; `nextpnr-ice40` and `iceprog` must be
installed separately (via apt on Linux, or via the devcontainer).
---
## Commands
**Build and flash the iCEbreaker (must run from workspace root):**
```bash
python rebbarb/rebbarb.py
```
Runs synthesis (yosys), place-and-route (nextpnr-ice40), and flashes via `iceprog`.
Set `ICEPROG=/path/to/iceprog` env var to override the binary location.
Note: `rebbarb/rebbarb.py` builds a 36 MHz LED blink demo. The BBA
implementation (`exi_bba/`) uses a split-domain clock: `capture` @ 54 MHz (PLL)
for the SPI bit engine, `exi`/`sync` @ 24 MHz (HFOSC) for everything else.
Synthesize/flash the real design with `python -m exi_bba.synth [--flash]`.
**Run a simulation:**
```bash
# New-API testbench style (preferred for new code):
python rebbarb/toggle_button.py # writes ToggleButton.vcd
python rebbarb/pulse_button.py # writes PulseButton.vcd
# Old-API process style (reference only, do not replicate in new code):
python examples/amaranth_cdc.py # CDC primitives demo
python examples/async_fifo.py # AsyncFIFO behaviour
python examples/icebreaker_fifo.py # iCEbreaker-specific FIFO (Verilog dump)
```
Open VCD output with `gtkwave`. Simulations are the primary testing mechanism —
there is no separate test runner.
**Verify PLL parameters:**
```bash
icepll -i 12 -o 54 # confirms DIVR=0 DIVF=71 DIVQ=4 → 54 MHz (capture domain)
```
(`exi`/`sync` come from the internal SB_HFOSC ÷2 = 24 MHz — no PLL.)
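As a quick cross-check of the `icepll` output, the arithmetic can be sketched in plain Python. This assumes the standard iCE40 PLL relation for SIMPLE feedback mode and the nominal 48 MHz SB_HFOSC frequency; the function name is illustrative and not part of this repo.

```python
def ice40_pll_mhz(f_in_mhz, divr, divf, divq):
    """Output frequency of the iCE40 PLL (SIMPLE feedback mode assumed)."""
    return f_in_mhz * (divf + 1) / ((divr + 1) * (2 ** divq))

# capture domain: 12 MHz board clock through the PLL
capture_mhz = ice40_pll_mhz(12, divr=0, divf=71, divq=4)

# exi/sync domains: nominal 48 MHz HFOSC divided by 2, no PLL involved
slow_mhz = 48 / 2

print(capture_mhz, slow_mhz)  # 54.0 24.0
```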
---
## Current Implementation State
The `exi_bba/` module tree is **fully implemented** with simulation testbenches.
All modules elaborate without errors and pass their unit tests. The full design
**synthesizes, places, routes, and meets timing** on the iCE40UP5K
(`python -m exi_bba.synth`): `capture` closes ~70 MHz (target 54) and `exi`/
`sync` close ~36 MHz (target 24) — both PASS.
### `exi_bba/` module status
| Module | File | Tests pass |
|---|---|---|
| `BBATop` | `exi_bba/bba_top.py` | ✅ EXI integration + full W5100→SPRAM→GC RX loop; synth PASS |
| `ExiCapture` | `exi_bba/exi_capture.py` | ✅ rx/tx byte-stream + over-push/flush |
| `SPIMode3Slave` | `exi_bba/spi_mode3_slave.py` | ✅ 4 tests (live-drive TX) |
| `BBARegisterFile` | `exi_bba/bba_register_file.py` | ✅ 7 tests (proactive push + DMA stream) |
| `SPRAMArbiter` | `exi_bba/spram_arbiter.py` | ✅ 3 tests |
| `RXFrameAssembler` | `exi_bba/rx_frame_assembler.py` | ✅ 3 tests |
| `TXFrameDrain` | `exi_bba/tx_frame_drain.py` | ✅ 2 tests |
| `W5100ParallelMaster` | `exi_bba/w5100_parallel_master.py` | ✅ 5 tests (init/TX/RX vs bus model, incl. ring wrap) — **default eth back-end** |
| `W5500SPIMaster` | `exi_bba/w5500_spi_master.py` | ✅ init/TX/RX vs SPI-slave model (alt back-end) |
| `StatusPanel` | `exi_bba/status_panel.py` | ✅ 6 tests (heartbeat, stretched activity LEDs, debounced buttons, freeze) |
| `EEPROMModel` | `exi_bba/eeprom_model.py` | ✅ 4 tests |
**Bring-up status panel (optional):** `BBATop(status_panel=True)` adds a
`StatusPanel` driving onboard iCEbreaker LEDs + button (dedicated pins, so it
coexists with EXI + W5100). `synth.py` enables it: **LEDG=heartbeat**,
**LEDR=EXI activity** (the GC is talking), **BTN_N=manual re-init**. The full
EXI + W5100 + panel build synthesizes and meets timing (slow ~35≥24, capture
~64≥54, 44% LC). Panel LEDs 3–5 (rx/tx/ready) exist in the module but aren't
mapped on the iCEbreaker (only 2 discrete LEDs); the onboard RGB or a custom
PCB can expose them.
**Ethernet back-end is selectable:** `BBATop(eth="w5100")` (default — indirect
parallel bus, reaches the ~27 Mbit/s EXI ceiling) or `BBATop(eth="w5500")` (SPI,
~12 Mbit/s). Both masters expose the identical tx/rx/init/par streaming
interface; only the physical pins differ. See "W5100 vs W5500" below.
### Run all module testbenches (from workspace root)
```bash
python -m exi_bba.spi_mode3_slave
python -m exi_bba.exi_capture
python -m exi_bba.bba_register_file
python -m exi_bba.spram_arbiter
python -m exi_bba.rx_frame_assembler
python -m exi_bba.tx_frame_drain
python -m exi_bba.w5100_parallel_master # 5 tests: init, TX(+wrap), RX(+wrap)
python -m exi_bba.w5500_spi_master
python -m exi_bba.status_panel # 6 tests: heartbeat/activity/buttons
python -m exi_bba.eeprom_model
python -m exi_bba.bba_top # end-to-end EXI integration test (W5100 RX loop)
```
### Pending work
- **Synthesis/timing**: ✅ done — `python -m exi_bba.synth` synthesizes, P&Rs,
and meets timing on both clock domains (capture ~68≥54, slow ~40≥24).
- **W5500 init/TX/RX**: ✅ done — `W5500SPIMaster` has a real Mode-0 byte engine,
a generic register-transaction engine (header + wbuf/stream payload), the full
init sequence (MR reset, SHAR, S0_MR MACRAW, S0_CR OPEN, S0_IMR), MACRAW TX
(read TX_WR → stream frame to TX buffer → advance TX_WR → SEND) and MACRAW RX
(RSR → RD → 2-byte length → stream frame out → advance RD → RECV). All verified
on the wire by a responding W5500 SPI-slave model in the testbench.
- **PAR0–5 → W5500 SHAR**: ✅ done — `reg.par` wired to `w5500.par` in `BBATop`
(PAR0 packed in the low byte so it is the first SHAR octet).
- **NCRA SR bit**: ✅ done — `BBARegisterFile.ncra_sr` (= NCRA[3]) gates
`asm.rx_enabled` in `BBATop` (was hard-wired to 1).
- **W5500 SPI throughput**: SCK = sync÷2 = 12 MHz (~12 Mbit/s) — exceeds
real-world GC BBA TCP throughput (~6–10 Mbit/s) but is below the 27 Mbit/s raw
EXI ceiling. Pushing past 12 Mbit/s was investigated and found NOT achievable
on this UP5K (the W5500-operating logic is distributed ~40 MHz, not just the
bit-bang) — see the "Full-rate W5500 SPI" item below.
`W5500SPIMaster(clk_div=N)` divides SCK further if signal integrity needs it.
- **EXI DMA bulk reads**: ✅ done — SPRAM-region reads (addr ≥ 0x100) now STREAM
until CS deasserts instead of stopping at the header's 2-bit length, so they
serve both ≤4-byte immediate reads (Swiss) AND arbitrary-length DMA reads
(other GC software, and a future Swiss path for loading ROMs from a network
file store). Implementation:
- `SPIMode3Slave.cs_active` (synchronised CS level) → `ExiCapture` crosses it
to the exi domain (FFSynchronizer) → `BBARegisterFile.cs_active`.
- `BBARegisterFile` SPRAM_STREAM state: auto-increments the SPRAM address,
prefetches up to SP_LIMIT=4 reads in flight, pushes responses to tx_fifo;
SPRAM_END drains the in-flight pipeline + rx dummies on CS-rise.
- `ExiCapture` flushes tx_fifo on CS-fall to clear prefetch over-push so a
truncated DMA read can't leak stale bytes into the next transaction.
Tested: register-file streaming read (SPRAM model, 12 bytes), ExiCapture
over-push/flush, AND the full BBATop loop — a W5500 model delivers a frame →
W5500 master RX → RXFrameAssembler writes the SPRAM ring → GC reads RWP then
DMA-reads the descriptor+frame back (verified byte-for-byte).
Note: a DMA read header must keep length-1 within the 2-bit field; the GC
driver sets it ≤3 and clocks the real length via CS (the design streams
until CS regardless). (EXI DMA *writes* are not implemented; the GC's
DMA-write engine has a 1-bit-shift bug and Swiss avoids them — see
design-doc §"EXI DMA bug".)
- **S0_IR interrupt clear after RX**: ✅ done — `W5500SPIMaster` RX_CLR_IR state
writes Sn_IR[2]=1 after RECV so `INT_N` deasserts (else the FSM would re-enter
RX_CHECK forever on real hardware).
- **Full-rate W5500 SPI (27 Mbit/s) — INVESTIGATED, NOT achievable on UP5K**:
the W5500 SCK is sync÷2 = 12 MHz. Raising it needs the SPI engine on a ≥54 MHz
clock, but a standalone synth of `W5500SPIMaster` in the capture domain closes
only **40 MHz** — and the slack histogram shows the failure is *distributed*
(~140 endpoints fail 54, incl. the `wbuf`/header mux feeding the shift
register), NOT a single cuttable path. So the bottleneck is the **logic that
operates the SPI device** (transaction FSM, byte sourcing), not the bit-bang.
Consequences:
- The "split the bit engine to capture + per-byte CDC handshake" idea nets
only ~14 Mbit/s — the CDC round-trip ≈ the SPI byte time — not worth it.
- A capture-domain "streaming executor" would still contain that distributed
~40 MHz logic, so it wouldn't close 54 either.
- Hardware `SB_SPI` wouldn't help (it only offloads the bit-bang, which was
never the bottleneck) and is unsimulatable.
- There is no usable clock between 24 (HFOSC) and 54 (the one PLL, needed at
54 for the EXI front-end); PLL÷2 = 27 → SCK 13.5 MHz, a ~12% gain, not
worth the fabric divider.
Net: 12 Mbit/s is the practical W5500 ceiling on this part. It exceeds
real-world GC BBA TCP throughput and is fine for chunked ROM streaming.
Reaching 27 Mbit/s would need a faster FPGA or a much shallower W5500-operating
redesign (uncertain) — **OR a parallel-bus ethernet chip (see W5100 below)**,
which is the implemented solution for the ROM-streaming throughput target.
## W5100 vs W5500 ethernet back-end
The throughput insight: SPI serialises 8 bits/byte, so the W5500 byte rate is
(operating-logic clock)/16 — and that logic caps ~40 MHz on this UP5K → ~12
Mbit/s. A **parallel** bus moves a whole byte per access, so the *same* ~24 MHz
`sync` logic clears the 27 Mbit/s EXI ceiling (the real hard limit — the GC EXI
bus tops out there). So `W5100ParallelMaster` is the throughput path and is now
the `BBATop` default.
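The arithmetic behind that comparison can be sketched as follows. The clock and ceiling figures come from this document; the helper names are illustrative, and the per-byte cycle budget is a derived bound, not a measured property of `W5100ParallelMaster`.

```python
SYNC_HZ = 24_000_000          # `sync` domain clock (HFOSC / 2)
EXI_CEILING_BPS = 27_000_000  # raw EXI ceiling quoted above

def spi_bits_per_s(logic_hz, clocks_per_sck=2):
    """SPI moves one bit per SCK period; SCK = logic clock / clocks_per_sck."""
    return logic_hz // clocks_per_sck

def parallel_bits_per_s(logic_hz, cycles_per_byte):
    """A parallel bus moves 8 bits per access of `cycles_per_byte` logic cycles."""
    return logic_hz * 8 // cycles_per_byte

# Largest per-byte cycle budget that still reaches the EXI ceiling.
max_cycles_per_byte = SYNC_HZ * 8 // EXI_CEILING_BPS

print(spi_bits_per_s(SYNC_HZ))                            # 12000000
print(max_cycles_per_byte)                                # 7
print(parallel_bits_per_s(SYNC_HZ, max_cycles_per_byte))  # 27428571
```

In other words, the parallel path clears the EXI ceiling as long as a streamed byte costs at most 7 `sync` cycles on average, while the SPI path is fixed at one bit per two cycles.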
- **Interface:** W5100 **indirect parallel bus** (IDM). Only A[1:0] are wired
(board ties A[14:2]=0 so a power-up direct access at A=00 still hits MR):
`00`=MR, `01`=IDM_AR0(hi), `10`=IDM_AR1(lo), `11`=IDM_DR. A register/buffer
access = write IDM_AR (the 16-bit address) then read/write IDM_DR. With MR.AI
set, IDM_DR auto-increments → a multi-byte block is one address-set + a burst.
- **Bus engine:** drives A + D with `/CS` and `/RD`|`/WR` asserted for
`strobe_cycles` (default 3 ≈ 125 ns at 24 MHz, ≥ the W5100's ~80 ns access).
DATA[7:0] is bidirectional → an SB_IO tristate (`bus_data_o`/`oe`/`i`).
- **Pins (15):** A[1:0]=2, D[7:0]=8, /CS,/RD,/WR=3, /INT=1, /RST=1. With EXI (5)
+ clk (1) = **21 of ~34 usable SG48 I/O** — comfortable. See `synth.py`.
- **MR.AI requires init first:** unlike the W5500 (each SPI transaction is
self-framed), the W5100's multi-byte accesses depend on MR.AI, so the init
sequence (triggered by the GC's NCRA reset) MUST run before any TX/RX. The
BBATop test issues NCRA-reset before its RX loop for this reason; on hardware
the GC driver already does. (`BBATop(reset_cycles=N)` shrinks the MR settle
wait for sim.)
- **Ring wraparound is in fabric:** the W5100 does NOT auto-wrap the IDM address
at the socket-buffer boundary (the W5500 did), so the streamer re-sets IDM_AR
to the buffer base when the running address reaches the 2 KB boundary. Handled
in the SW/SR/RB paths (`xfer_wrap`/`xfer_wbase`/`xfer_wend`/`cur_addr`); both
TX and RX wrap cases are tested.
- **Register map differs from the W5500:** common regs at 0x0000 (MR, SHAR 0x09,
IMR 0x16, RMSR/TMSR 0x1A/0x1B), socket 0 at 0x0400 (S0_MR/CR/IR, TX_WR 0x424,
RX_RSR 0x426, RX_RD 0x428), TX buffer 0x4000, RX buffer 0x6000. MACRAW mode.
- **Status:** init/TX/RX (with wrap) verified vs a bus model; BBATop full
W5100→SPRAM→GC RX loop passes byte-for-byte; synth PASS (slow ~32≥24, capture
~56≥54, 42% LC). Register addresses/MR bits are from the datasheet (from
memory) — **confirm at hardware bring-up**.
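The indirect access pattern and the fabric-side ring wrap described above can be illustrated with a small behavioural model. This is a sketch only: `IdmBusModel`, `set_addr`, and `stream_write` are hypothetical names, MR.AI is assumed to be already set, and nothing here mirrors the actual `W5100ParallelMaster` FSM.

```python
class IdmBusModel:
    """Toy W5100 indirect bus: four bus addresses, auto-incrementing IDM_DR."""
    MR, AR_HI, AR_LO, DR = 0b00, 0b01, 0b10, 0b11

    def __init__(self):
        self.mem = bytearray(0x10000)  # flat W5100 address space
        self.addr = 0                  # current IDM_AR value
        self.accesses = 0              # bus cycles used

    def write(self, a, value):
        self.accesses += 1
        if a == self.AR_HI:
            self.addr = (self.addr & 0x00FF) | (value << 8)
        elif a == self.AR_LO:
            self.addr = (self.addr & 0xFF00) | value
        elif a == self.DR:
            self.mem[self.addr] = value
            # Auto-increment (MR.AI assumed set); no wrap at buffer boundaries.
            self.addr = (self.addr + 1) & 0xFFFF

def set_addr(bus, addr):
    """One address-set = two bus writes (IDM_AR0 high byte, IDM_AR1 low byte)."""
    bus.write(bus.AR_HI, addr >> 8)
    bus.write(bus.AR_LO, addr & 0xFF)

def stream_write(bus, start, data, base=0x4000, size=0x800):
    """Burst-write `data`, re-setting IDM_AR to `base` at the 2 KB boundary."""
    end = base + size
    cur = start
    set_addr(bus, cur)
    for byte in data:
        bus.write(bus.DR, byte)
        cur += 1
        if cur == end:          # the chip will not wrap, so the streamer does
            cur = base
            set_addr(bus, cur)

bus = IdmBusModel()
stream_write(bus, 0x47FE, [1, 2, 3, 4])
print(bus.mem[0x47FE], bus.mem[0x47FF], bus.mem[0x4000], bus.mem[0x4001])  # 1 2 3 4
print(bus.accesses)  # 8
```

The wrap costs one extra address-set (two bus writes), which is why a burst that crosses the boundary takes 8 accesses here instead of 6.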
### `rebbarb/` — LED blink demo (unchanged)
- `rebbarb.py` — blinks LEDs via a PLL (36 MHz), demonstrates `IceBreakerPlatform`
- `debouncer.py`: `Debouncer(cycles)` — synchronous debounce, configurable hold
- `toggle_button.py`: `ToggleButton` — edge-to-toggle state machine (wraps Debouncer)
- `pulse_button.py`: `PulseButton` — single-cycle pulse on rising edge (wraps Debouncer)
These components are reusable building blocks. The `Debouncer` and button wrappers
will be needed for any physical input in `exi_bba/`.
**Import note:** `rebbarb/` files use bare imports (`from debouncer import Debouncer`).
Run them as `python rebbarb/<file>.py` from the workspace root so Python adds
`rebbarb/` to `sys.path` automatically.
**Simulation at module level:** `toggle_button.py` and `pulse_button.py` run
their simulations unconditionally (no `__main__` guard) — importing either file
triggers a VCD write. New modules should guard simulation code with
`if __name__ == "__main__":`.
`examples/amaranth_cdc.py` contains handwritten `SyncFF` and `TogglePulseSync`
reference implementations — use `amaranth.lib.cdc` primitives (`FFSynchronizer`,
`PulseSynchronizer`) in production code instead.
`hardware/sp1_test_plug/` — KiCad project for a physical SP1 edge-connector test
plug (schematic, PCB, custom GameCube symbol library). Used to verify pad geometry
before ordering the interposer PCB; not part of the FPGA build.
---
## Amaranth Simulator API
Two API generations are present in this repo:
| API | Where used | Status |
|---|---|---|
| `sim.add_testbench(async_fn)` + `await ctx.tick()` + `Period(MHz=n)` | `rebbarb/*.py` | **Use this for new code** |
| `sim.add_sync_process(gen_fn)` + `sim.run_until(t)` | `examples/` | Old — reference only |
New modules should use the testbench API (`add_testbench`, `sim.write_vcd(ctx)`
context manager). The old process API still works but is not idiomatic in current
Amaranth.
**Critical testbench timing rule:** `ctx.get(signal)` reads signal values AFTER
the clock edge (post-update registered values). Combinatorial signals that depend
on registered signals that were updated by the SAME tick will already reflect the
new registered values. For example: if `tx_sof = tx_bytes_r_rdy & is_first` and
`is_first` is cleared synchronously on the first byte, then reading `tx_sof` after
the first byte's tick always returns 0 — read BEFORE the tick instead.
**`ctx.set()` takes effect immediately** (combinatorial, not registered). Use it
AFTER `await ctx.tick()` to prepare inputs for the NEXT tick.
The full design specification lives in `docs/gc_bba_fpga_design.md`.
---
## Key Architecture Decisions
- **No network stack in the FPGA.** The GC CPU runs TCP/IP. The FPGA is a dumb
MAC bridge.
- **Split-domain clocking — 3 domains, 2 sources (1 PLL + 1 HFOSC):**
- `capture` — 54 MHz (PLL, DIVR=0 DIVF=71 DIVQ=4). Hosts ONLY the SPI Mode 3
bit engine inside `ExiCapture`. 54 MHz = 2× the **real 27 MHz** EXI clock —
the minimum oversampling for clean Mode 3. The isolated bit engine closes
~91 MHz; integrated with the byte-FIFO read path the capture domain closes
~62 MHz, so 54 passes with margin.
- `exi` — 24 MHz (HFOSC ÷2). BBA register file / transaction FSM.
- `sync` — 24 MHz (same HFOSC net as `exi`). SPRAM arbiter, RX/TX engines,
W5500 SPI master.
- **Why split:** only the tiny SPI bit engine needs a fast clock to sample
27 MHz EXI. The bulky register-file/SPRAM/W5500 logic is routing-bound at
~33–44 MHz on the UP5K and only needs the byte rate (27 MHz ÷ 8 ≈ 3.4 MHz).
`ExiCapture` bridges capture↔exi with rx/tx byte AsyncFIFOs.
- **EXI clock reality:** the GC EXI clock tops out at ~27 MHz. libogc's
`EXI_SPEED32MHZ` is a nominal name — the real rate is 27 MHz. The old
"96 MHz = 3× 32 MHz EXI" target was doubly wrong and unreachable on UP5K
(which caps ~44 MHz for non-trivial logic).
- **TX/MISO across the split:** the register file PROACTIVELY pushes read
responses into the tx byte FIFO during the EXI clock-idle gap (the GC pauses
the clock between an EXI_Imm header-write and the data-read). The bit engine
drives MISO live from the FIFO head; see `ExiCapture` / `SPIMode3Slave`.
- **All CDC via `amaranth.lib.cdc`.** Never pass raw multi-bit signals across
domains. Use `FFSynchronizer` for slow single bits, `PulseSynchronizer` for
events, `AsyncFIFO` for data streams, `ResetSynchronizer` for resets.
- **Register file lives entirely in `exi` domain.** The `sync` domain only
communicates through AsyncFIFOs and PulseSynchronizers — never direct register
reads/writes.
---
## Critical Protocol Notes
### EXI / SPI Mode 3
- CLK idles **HIGH** (CPOL=1, CPHA=1).
- MOSI sampled on **falling** CLK edge. MISO driven on **rising** CLK edge.
- Getting this wrong means the GC never enumerates the device.
- CS is active **low**, delineates each transaction.
### EXI Transaction Header (2 bytes before data)
```
Byte 0: [7]=write_flag [6:0]=addr[12:6]
Byte 1: [7:2]=addr[5:0] [1:0]=xfer_len-1 (0=1B … 3=4B)
```
Full address = 13 bits → 0x0000–0x1FFF.
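A minimal sketch of packing and unpacking that 2-byte header, following the bit layout above. The function names are illustrative and do not correspond to anything in `exi_bba/`.

```python
def encode_header(addr, length, write):
    """Pack a 13-bit address, a 1..4 byte length and the write flag."""
    if not 0 <= addr <= 0x1FFF:
        raise ValueError("address must fit in 13 bits")
    if not 1 <= length <= 4:
        raise ValueError("header length field covers 1..4 bytes")
    byte0 = (0x80 if write else 0x00) | (addr >> 6)   # [7]=write, [6:0]=addr[12:6]
    byte1 = ((addr & 0x3F) << 2) | (length - 1)       # [7:2]=addr[5:0], [1:0]=len-1
    return bytes([byte0, byte1])

def decode_header(header):
    """Inverse of encode_header: returns (addr, length, write)."""
    byte0, byte1 = header
    write = bool(byte0 & 0x80)
    addr = ((byte0 & 0x7F) << 6) | (byte1 >> 2)
    length = (byte1 & 0x03) + 1
    return addr, length, write

print(encode_header(0x0100, 4, write=False).hex())  # 0403
print(decode_header(bytes([0x80, 0x00])))           # (0, 1, True)
```

The second line decodes a 1-byte write to address 0x0000, which is the shape of an NCRA write.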
### Device ID Query
On power-on the GC writes `0x0000` (2 bytes) then reads 4 bytes.
Must return: `0x04 0x02 0x02 0x00`.
---
## Memory Map (abridged)
| Range | Region |
|---|---|
| 0x0000–0x0033 | MAC control registers (register file, exi domain) |
| 0x0048 | TXDATA — bulk TX data port (→ `tx_bytes` AsyncFIFO) |
| 0x0100–0x0FFF | RX ring buffer in SPRAM (15 × 256-byte pages, pages 1–15) |
| 0x0100–0x1FFF | any read ≥ 0x0100 streams from SPRAM (DMA path); the ring proper is pages 1–15 above |
---
## Key Registers
| Addr | Name | Notes |
|---|---|---|
| 0x00 | NCRA | [0]=RESET self-clears; pulses `ncra_rst` to sync domain |
| 0x08 | IMR | Interrupt mask |
| 0x09 | IR | Write-1-to-clear. [1]=RI, [2]=TI. INT_N asserts when IR & IMR ≠ 0 |
| 0x16–17 | RWP | RX write pointer — updated by sync domain via `rx_wptr` FIFO |
| 0x18–19 | RRP | RX read pointer — GC writes after consuming frames |
| 0x20–25 | PAR0–5 | MAC address; also forwarded to W5500 as SHAR |
| 0x31 | NWAYS | Hardcode **0x17** (100M full-duplex link up, autoneg complete) |
| 0x3A | HIPR | Hardcode **0x01** (BBA present) |
| 0x48 | TXDATA | GC streams TX frame bytes here |
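The IR/IMR behaviour in the table (write-1-to-clear, `INT_N` asserted while `IR & IMR` is non-zero) can be modelled in a few lines. This is a behavioural sketch with a hypothetical class name, not the `BBARegisterFile` implementation.

```python
class InterruptRegs:
    """Behavioural model of IMR (0x08) and IR (0x09)."""
    RI = 1 << 1   # receive interrupt
    TI = 1 << 2   # transmit interrupt

    def __init__(self):
        self.imr = 0x00
        self.ir = 0x00

    def raise_event(self, bits):
        """Hardware side: latch one or more interrupt causes."""
        self.ir |= bits & 0xFF

    def write_ir(self, value):
        """GC side: every 1 bit written clears the matching IR bit."""
        self.ir &= ~value & 0xFF

    @property
    def int_n(self):
        """Active-low interrupt line: 0 while any unmasked cause is pending."""
        return 0 if (self.ir & self.imr) else 1

regs = InterruptRegs()
regs.imr = 0x86                      # value the GC driver writes during init
regs.raise_event(InterruptRegs.RI)
print(regs.int_n)                    # 0
regs.write_ir(InterruptRegs.RI)
print(regs.int_n)                    # 1
```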
---
## Module Breakdown
| Module | Domain | File |
|---|---|---|
| `BBATop` | all | `exi_bba/bba_top.py` |
| `ExiCapture` | capture (+exi FIFOs) | `exi_bba/exi_capture.py` |
| `SPIMode3Slave` | capture (param `domain`) | `exi_bba/spi_mode3_slave.py` |
| `BBARegisterFile` | exi (+FIFO to sync) | `exi_bba/bba_register_file.py` |
| `SPRAMArbiter` | sync | `exi_bba/spram_arbiter.py` |
| `RXFrameAssembler` | sync | `exi_bba/rx_frame_assembler.py` |
| `TXFrameDrain` | sync | `exi_bba/tx_frame_drain.py` |
| `W5100ParallelMaster` | sync | `exi_bba/w5100_parallel_master.py` (default eth) |
| `W5500SPIMaster` | sync | `exi_bba/w5500_spi_master.py` (alt eth) |
| `EEPROMModel` | exi | `exi_bba/eeprom_model.py` |
`ExiCapture` wraps `SPIMode3Slave` (in the fast `capture` domain) plus the
capture↔exi rx/tx byte AsyncFIFOs. `BBARegisterFile` consumes the rx byte
stream and proactively pushes read responses into the tx byte FIFO — it no
longer sees the per-bit SPI cadence (that lives entirely in `capture`).
---
## CDC Signal Inventory
| Signal | Direction | Primitive |
|---|---|---|
| EXI CLK / MOSI / CS pins | async → capture | `FFSynchronizer` (stages=2) |
| RX byte stream (capture→core) | capture → exi | `AsyncFIFO` 8-bit, depth=4 |
| TX byte stream (core→capture) | exi → capture | `AsyncFIFO` 8-bit, depth=2 |
| cs_active (transaction in progress) | capture → exi | `FFSynchronizer` (DMA read length) |
| SPRAM read request (addr) | exi → sync | `AsyncFIFO` 16-bit, depth=4 |
| SPRAM read result (data) | sync → exi | `AsyncFIFO` 8-bit, depth=4 |
| TX packet bytes | exi → sync | `AsyncFIFO` 8-bit, depth=16 |
| TX frame length | exi → sync | `AsyncFIFO` 16-bit, depth=4 |
| RX frame bytes | sync → SPRAM | `RXFrameAssembler` → `SPRAMArbiter` (not a byte FIFO; the GC reads frames back out of SPRAM via the SPRAM read req/rsp FIFOs) |
| RWP update | sync → exi | `AsyncFIFO` 8-bit, depth=4 |
| RRP update | exi → sync | `AsyncFIFO` 8-bit, depth=4 |
| RX ready (IR[RI]) | sync → exi | `PulseSynchronizer` |
| TX done (IR[TI]) | sync → exi | `PulseSynchronizer` |
| NCRA reset pulse | exi → sync | `PulseSynchronizer` |
---
## W5500 Configuration (on NCRA reset)
The W5500 selects the register **block** via the BSB field of the control byte,
NOT via the address — so register addresses below are **block offsets**, not flat
0x4000-style addresses (see `_W5500_*` and `_CTRL_*` in `w5500_spi_master.py`).
```
1. Write MR = 0x80 (common block, offset 0x0000) software reset
2. Wait ~1 ms
3. Write SHAR = MAC (common block, offset 0x0009, 6 bytes from PAR0–5)
4. Write S0_MR = 0x04 (socket-0 reg block, offset 0x0000) MACRAW
5. Write S0_CR = 0x01 (socket-0 reg block, offset 0x0001) OPEN
6. Write S0_IMR = 0x05 (socket-0 reg block, offset 0x002C) RECV | SEND_OK
```
W5500 SPI is **Mode 0** (CPOL=0 CPHA=0); SCK = **12 MHz** (the 24 MHz `sync`
domain ÷ 2 via a toggle clock-enable). Connect W5500 `INT_N` to an FPGA input
for low-latency RX detection. (The W5500 is the alternate back-end; the W5100
parallel master is the default — see "W5100 vs W5500".)
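To make the block-select point concrete, the write frames for the sequence above can be sketched as byte strings. The control-byte layout (BSB in bits 7:3, read/write in bit 2, variable-length mode in bits 1:0) follows the W5500 datasheet; the function and constant names are hypothetical and are not the `_CTRL_*` constants in `w5500_spi_master.py`. Register values are copied from the list as-is, so the S0_IMR value in particular should be checked against the datasheet's Sn_IR bit positions.

```python
BSB_COMMON = 0x00   # common register block
BSB_S0_REG = 0x01   # socket-0 register block

def w5500_write_frame(offset, bsb, payload):
    """Build one SPI write frame: 16-bit offset, control byte, then data."""
    control = (bsb << 3) | (1 << 2) | 0b00   # RWB=1 (write), OM=00 (variable length)
    return bytes([offset >> 8, offset & 0xFF, control]) + bytes(payload)

def init_frames(mac):
    """Frames for the NCRA-reset init sequence (the ~1 ms wait is not modelled)."""
    return [
        w5500_write_frame(0x0000, BSB_COMMON, [0x80]),  # MR: software reset
        w5500_write_frame(0x0009, BSB_COMMON, mac),     # SHAR: 6-byte MAC
        w5500_write_frame(0x0000, BSB_S0_REG, [0x04]),  # S0_MR: MACRAW
        w5500_write_frame(0x0001, BSB_S0_REG, [0x01]),  # S0_CR: OPEN
        w5500_write_frame(0x002C, BSB_S0_REG, [0x05]),  # S0_IMR: value from the list above
    ]

# Placeholder MAC using the Nintendo OUI mentioned in this document.
for frame in init_frames(bytes([0x00, 0x09, 0xBF, 0x01, 0x02, 0x03])):
    print(frame.hex())
```

Note that S0_MR and MR share offset 0x0000; only the control byte distinguishes them.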
---
## Physical Interface (SP1 Edge Connector)
- PCB must be **1.2 mm thick, ENIG finish**.
- Staggered (not mirrored) top/bottom contact rows — same geometry as PCI/ISA.
- Derive exact pad geometry from **SP1ETH KiCad project** (silverstee1/SP1ETH),
cross-referenced with ETH2SP1 (LaserBear). Do not rely on YAGCD alone.
- Add **100 µF bulk cap** on the interposer near FPGA power pins (3.3 V budget
is tight: iCEbreaker ~80 mA + W5500 ~150 mA ≈ 230 mA).
- **Pin 5 is 12 V — do not connect to FPGA I/O.** Test point or leave open.
- `EXTIN` (pin 1): tie to 3.3 V via 10 kΩ — required for GC device enumeration.
- All signal levels are 3.3 V. No level shifting needed.
---
## SPRAM Notes
- iCE40UP5K has 128 KB SPRAM (SB_SPRAM256KA, 16-bit wide).
- **1-cycle synchronous read latency** — result of read at cycle N is valid at N+1.
- Byte writes via `MASKWREN`: lower byte = `0b0011`, upper byte = `0b1100`.
- Address to SPRAM = byte_address >> 1.
- ETH writes take priority over EXI reads in the arbiter (safe by ring-buffer
invariant: GC only reads pages the ETH engine has already finished).
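The byte-to-word mapping above can be sketched as a small helper. It assumes even byte addresses land in the lower byte of the 16-bit word (little-endian packing), which is an assumption for illustration and should be checked against `spram_arbiter.py`; the function names are hypothetical.

```python
def spram_byte_write(byte_addr, value):
    """Map a byte write to (word_addr, wdata, maskwren) for a 16-bit SPRAM port."""
    word_addr = byte_addr >> 1
    if byte_addr & 1:
        # Odd address: upper byte, nibble-enables 3 and 2 (assumed packing).
        return word_addr, (value & 0xFF) << 8, 0b1100
    # Even address: lower byte, nibble-enables 1 and 0.
    return word_addr, value & 0xFF, 0b0011

def apply_write(word, wdata, maskwren):
    """Model the masked write: each MASKWREN bit enables one 4-bit nibble."""
    for nibble in range(4):
        if maskwren & (1 << nibble):
            mask = 0xF << (4 * nibble)
            word = (word & ~mask) | (wdata & mask)
    return word & 0xFFFF

addr, wdata, mask = spram_byte_write(0x0101, 0xAB)
print(hex(addr), hex(wdata), bin(mask))         # 0x80 0xab00 0b1100
print(hex(apply_write(0x1234, wdata, mask)))    # 0xab34
```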
---
## GC Initialisation Sequence (Swiss/BBA driver)
```
1. Write 0x0000 × 2, read 4 B → must get 0x04020200 (device ID)
2. Write NCRA = 0x01 (reset, self-clears; resets W5500 + SPRAM ptrs)
3. Poll NCRA bit 0 until 0 (wait reset complete)
4. Write PAR0–5 (MAC address)
5. Write MAR0–7 = 0xFF (promiscuous multicast)
6. Write ANALOG = 0xD6 (enable PHY — no FPGA effect, just store)
7. Write NWAYC (autoneg config — store only)
8. Write IMR = 0x86 (enable RBFI | TI | RI interrupts)
9. Write GCA (AUTOPUB bit)
10. Write NCRA SR bit = 0x08 (start receive)
11. Poll NWAYS until link up → return hardcoded 0x17 immediately
```
---
## Implementation Notes & Gotchas
- **`NWAYS` must return `0x17` always.** GC polls it to confirm 100 Mbps link
before enabling RX. Do not attempt to reflect real W5500 link status.
- **`EEPROMModel` can be stubbed initially.** Many GC BBA drivers write their own
MAC to PAR0–5 rather than using the EEPROM. Pre-populate PAR0–5 reset state
with a valid Nintendo OUI MAC (`00:09:BF:xx:xx:xx`).
- **`tx_load` timing in `SPIMode3Slave`:** pulses at CS assertion (first byte)
and after each complete received byte. Upstream must register next TX byte
within one `exi` clock.
- **PLL target 54 MHz**: verify with `icepll -i 12 -o 54` (DIVR=0 DIVF=71 DIVQ=4)
before coding PLL parameters; the capture-domain bit engine oversamples the
27 MHz EXI clock 2×.
- **TX buffer selection (NCRA ST bits):** Ignore buffer select (ST1 vs ST0).
Treat any non-zero ST as a TX trigger.
- **If nextpnr fails capture-domain timing at 54 MHz:** the isolated bit engine
closes ~91 MHz, so 54 has margin; if a seed fails, sweep seeds
(`synth.py --seeds N`) or instruct users to configure Swiss to a lower EXI
clock index.