# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project: GC BBA FPGA Replacement

Replace the GameCube Broadband Adapter (DOL-015 / MX98730EC) with an iCEbreaker
FPGA (Lattice iCE40UP5K) written in Amaranth HDL. The FPGA emulates the BBA
register interface over the GameCube EXI bus and bridges to a WIZnet ethernet
chip for real 100BASE-TX ethernet: **W5100** by default (indirect parallel bus,
reaches the EXI throughput ceiling) or **W5500** (SPI Pmod, simpler wiring but
~12 Mbit/s). GC software (e.g. Swiss homebrew) sees what looks like a stock BBA.
See "W5100 vs W5500 ethernet back-end".

---

## Development Environment

**Preferred:** Use the devcontainer (`.devcontainer/`), which includes Python 3.12,
`nextpnr-ice40`, and `fpga-icestorm` pre-installed.

**Windows host + WSL2 devcontainer — USB flashing setup:**
1. Install `usbipd-win` (https://github.com/dorssel/usbipd-win/releases)
2. Run `.devcontainer/attach-icebreaker.ps1` as Administrator before opening the devcontainer
3. The devcontainer runs with `--privileged` so the USB device is passed through

**Local venv (outside devcontainer):**
```bash
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

Yosys is bundled in `amaranth-yosys`; `nextpnr-ice40` and `iceprog` must be
installed separately (via apt on Linux, or via the devcontainer).

---

## Commands

**Build and flash the iCEbreaker (must run from workspace root):**
```bash
python rebbarb/rebbarb.py
```
Runs synthesis (yosys), place-and-route (nextpnr-ice40), and flashes via `iceprog`.
Set `ICEPROG=/path/to/iceprog` env var to override the binary location.
Note: `rebbarb/rebbarb.py` builds a 36 MHz LED blink demo. The BBA
implementation (`exi_bba/`) uses a split-domain clock: `capture` @ 54 MHz (PLL)
for the SPI bit engine, `exi`/`sync` @ 24 MHz (HFOSC) for everything else.
Synthesize/flash the real design with `python -m exi_bba.synth [--flash]`.

**Run a simulation:**
```bash
# New-API testbench style (preferred for new code):
python rebbarb/toggle_button.py     # writes ToggleButton.vcd
python rebbarb/pulse_button.py      # writes PulseButton.vcd

# Old-API process style (reference only, do not replicate in new code):
python examples/amaranth_cdc.py     # CDC primitives demo
python examples/async_fifo.py       # AsyncFIFO behaviour
python examples/icebreaker_fifo.py  # iCEbreaker-specific FIFO (Verilog dump)
```
Open VCD output with `gtkwave`. Simulations are the primary testing mechanism —
there is no separate test runner.

**Verify PLL parameters:**
```bash
icepll -i 12 -o 54    # confirms DIVR=0 DIVF=71 DIVQ=4 → 54 MHz (capture domain)
```
(`exi`/`sync` come from the internal SB_HFOSC ÷2 = 24 MHz — no PLL.)
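
The divider values can also be cross-checked by hand. This is a minimal sketch, assuming the PLL runs in SIMPLE feedback mode, where f_out = f_ref × (DIVF + 1) / ((DIVR + 1) × 2^DIVQ). The function name `ice40_pll_mhz` is illustrative and not part of this repo.

```python
def ice40_pll_mhz(f_ref_mhz, divr, divf, divq):
    """Output frequency of the iCE40 PLL, assuming SIMPLE feedback mode."""
    return f_ref_mhz * (divf + 1) / ((divr + 1) * (1 << divq))


# capture domain: 12 MHz reference with the divider values quoted above
capture_mhz = ice40_pll_mhz(12, divr=0, divf=71, divq=4)

# exi/sync domains: SB_HFOSC is nominally 48 MHz and the design divides it by 2
slow_mhz = 48 / 2

print(capture_mhz, slow_mhz)   # 54.0 24.0
```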

---

## Current Implementation State

The `exi_bba/` module tree is **fully implemented** with simulation testbenches.
All modules elaborate without errors and pass their unit tests. The full design
**synthesizes, places, routes, and meets timing** on the iCE40UP5K
(`python -m exi_bba.synth`): `capture` closes ~70 MHz (target 54) and `exi`/
`sync` close ~36 MHz (target 24) — both PASS.

### `exi_bba/` module status

| Module | File | Tests pass |
|---|---|---|
| `BBATop` | `exi_bba/bba_top.py` | ✅ EXI integration + full W5100→SPRAM→GC RX loop; synth PASS |
| `ExiCapture` | `exi_bba/exi_capture.py` | ✅ rx/tx byte-stream + over-push/flush |
| `SPIMode3Slave` | `exi_bba/spi_mode3_slave.py` | ✅ 4 tests (live-drive TX) |
| `BBARegisterFile` | `exi_bba/bba_register_file.py` | ✅ 7 tests (proactive push + DMA stream) |
| `SPRAMArbiter` | `exi_bba/spram_arbiter.py` | ✅ 3 tests |
| `RXFrameAssembler` | `exi_bba/rx_frame_assembler.py` | ✅ 3 tests |
| `TXFrameDrain` | `exi_bba/tx_frame_drain.py` | ✅ 2 tests |
| `W5100ParallelMaster` | `exi_bba/w5100_parallel_master.py` | ✅ 5 tests (init/TX/RX vs bus model, incl. ring wrap) — **default eth back-end** |
| `W5500SPIMaster` | `exi_bba/w5500_spi_master.py` | ✅ init/TX/RX vs SPI-slave model (alt back-end) |
| `StatusPanel` | `exi_bba/status_panel.py` | ✅ 6 tests (heartbeat, stretched activity LEDs, debounced buttons, freeze) |
| `EEPROMModel` | `exi_bba/eeprom_model.py` | ✅ 4 tests |

**Bring-up status panel (optional):** `BBATop(status_panel=True)` adds a
`StatusPanel` driving onboard iCEbreaker LEDs + button (dedicated pins, so it
coexists with EXI + W5100). `synth.py` enables it: **LEDG=heartbeat**,
**LEDR=EXI activity** (the GC is talking), **RGB red=rx / green=tx / blue=ready**
(via `SB_RGBA_DRV` on pins 39/40/41), **BTN_N=manual re-init**. All 5 panel
LEDs are now mapped on the iCEbreaker. The full EXI + W5100 + panel build
synthesizes and meets timing (slow ~35≥24, capture ~64≥54, 44% LC).

**Ethernet back-end is selectable:** `BBATop(eth="w5100")` (default — indirect
parallel bus, reaches the ~27 Mbit/s EXI ceiling) or `BBATop(eth="w5500")` (SPI,
~12 Mbit/s). Both masters expose the identical tx/rx/init/par streaming
interface; only the physical pins differ. See "W5100 vs W5500" below.

### Run all module testbenches (from workspace root)
```bash
python -m exi_bba.spi_mode3_slave
python -m exi_bba.exi_capture
python -m exi_bba.bba_register_file
python -m exi_bba.spram_arbiter
python -m exi_bba.rx_frame_assembler
python -m exi_bba.tx_frame_drain
python -m exi_bba.w5100_parallel_master   # 5 tests: init, TX(+wrap), RX(+wrap)
python -m exi_bba.w5500_spi_master
python -m exi_bba.status_panel            # 6 tests: heartbeat/activity/buttons
python -m exi_bba.eeprom_model
python -m exi_bba.bba_top        # end-to-end EXI integration test (W5100 RX loop)
```

### Pending work
- **Synthesis/timing**: ✅ done — `python -m exi_bba.synth` synthesizes, P&Rs,
  and meets timing on both clock domains (capture ~68≥54, slow ~40≥24).
- **W5500 init/TX/RX**: ✅ done — `W5500SPIMaster` has a real Mode-0 byte engine,
  a generic register-transaction engine (header + wbuf/stream payload), the full
  init sequence (MR reset, SHAR, S0_MR MACRAW, S0_CR OPEN, S0_IMR), MACRAW TX
  (read TX_WR → stream frame to TX buffer → advance TX_WR → SEND) and MACRAW RX
  (RSR → RD → 2-byte length → stream frame out → advance RD → RECV). All verified
  on the wire by a responding W5500 SPI-slave model in the testbench.
- **PAR0–5 → W5500 SHAR**: ✅ done — `reg.par` wired to `w5500.par` in `BBATop`
  (PAR0 packed in the low byte so it is the first SHAR octet).
- **NCRA SR bit**: ✅ done — `BBARegisterFile.ncra_sr` (= NCRA[3]) gates
  `asm.rx_enabled` in `BBATop` (was hard-wired to 1).
- **W5500 SPI throughput**: SCK = sync÷2 = 12 MHz (~12 Mbit/s). This exceeds
  real-world GC BBA TCP throughput (~6–10 Mbit/s) but is below the 27 Mbit/s raw
  EXI ceiling. Pushing past 12 Mbit/s was investigated and found NOT achievable
  on this UP5K: the logic that operates the W5500 closes only ~40 MHz, and the
  shortfall is spread across that logic rather than confined to the bit-bang
  (see the "Full-rate W5500 SPI" item below).
  `W5500SPIMaster(clk_div=N)` divides SCK further if signal integrity needs it.
- **EXI DMA bulk reads**: ✅ done — SPRAM-region reads (addr ≥ 0x100) now STREAM
  until CS deasserts instead of stopping at the header's 2-bit length, so they
  serve both ≤4-byte immediate reads (Swiss) AND arbitrary-length DMA reads
  (other GC software, and a future Swiss path for loading ROMs from a network
  file store). Implementation:
    - `SPIMode3Slave.cs_active` (synchronised CS level) → `ExiCapture` crosses it
      to the exi domain (FFSynchronizer) → `BBARegisterFile.cs_active`.
    - `BBARegisterFile` SPRAM_STREAM state: auto-increments the SPRAM address,
      prefetches up to SP_LIMIT=4 reads in flight, pushes responses to tx_fifo;
      SPRAM_END drains the in-flight pipeline + rx dummies on CS-rise.
    - `ExiCapture` flushes tx_fifo on CS-fall to clear prefetch over-push so a
      truncated DMA read can't leak stale bytes into the next transaction.
    Tested: register-file streaming read (SPRAM model, 12 bytes), ExiCapture
    over-push/flush, AND the full BBATop loop — a W5500 model delivers a frame →
    W5500 master RX → RXFrameAssembler writes the SPRAM ring → GC reads RWP then
    DMA-reads the descriptor+frame back (verified byte-for-byte).
    Note: a DMA read header must keep length-1 within the 2-bit field; the GC
    driver sets it ≤3 and clocks the real length via CS (the design streams
    until CS regardless). (EXI DMA *writes* are not implemented; the GC's
    DMA-write engine has a 1-bit-shift bug and Swiss avoids them — see
    design-doc §"EXI DMA bug".)
- **S0_IR interrupt clear after RX**: ✅ done — `W5500SPIMaster` RX_CLR_IR state
  writes Sn_IR[2]=1 after RECV so `INT_N` deasserts (else the FSM would re-enter
  RX_CHECK forever on real hardware).
- **Full-rate W5500 SPI (27 Mbit/s) — INVESTIGATED, NOT achievable on UP5K**:
  the W5500 SCK is sync÷2 = 12 MHz. Raising it needs the SPI engine on a ≥54 MHz
  clock, but a standalone synth of `W5500SPIMaster` in the capture domain closes
  only **40 MHz** — and the slack histogram shows the failure is *distributed*
  (~140 endpoints fail 54, incl. the `wbuf`/header mux feeding the shift
  register), NOT a single cuttable path. So the bottleneck is the **logic that
  operates the SPI device** (transaction FSM, byte sourcing), not the bit-bang.
  Consequences:
    - The "split the bit engine to capture + per-byte CDC handshake" idea nets
      only ~14 Mbit/s — the CDC round-trip ≈ the SPI byte time — not worth it.
    - A capture-domain "streaming executor" would still contain that distributed
      ~40 MHz logic, so it wouldn't close 54 either.
    - Hardware `SB_SPI` wouldn't help (it only offloads the bit-bang, which was
      never the bottleneck) and is unsimulatable.
    - There is no usable clock between 24 (HFOSC) and 54 (the one PLL, needed at
      54 for the EXI front-end); PLL÷2 = 27 → SCK 13.5 MHz, a ~12% gain, not
      worth the fabric divider.
  Net: 12 Mbit/s is the practical W5500 ceiling on this part. It exceeds
  real-world GC BBA TCP throughput and is fine for chunked ROM streaming.
  Reaching 27 Mbit/s would need a faster FPGA or a much shallower W5500-operating
  redesign (uncertain) — **OR a parallel-bus ethernet chip (see W5100 below)**,
  which is the implemented solution for the ROM-streaming throughput target.

## W5100 vs W5500 ethernet back-end

The throughput insight: SPI serialises 8 bits/byte at SCK = clock÷2, so the W5500
byte rate is (operating-logic clock)/16. That logic closes only ~40 MHz on this
UP5K, short of the 54 MHz PLL clock, so it has to run from the 24 MHz `sync`
clock, which gives ~12 Mbit/s. A **parallel** bus moves a whole byte per access,
so the *same* ~24 MHz `sync` logic clears the 27 Mbit/s EXI ceiling (the real
hard limit: the GC EXI bus tops out there). `W5100ParallelMaster` is therefore
the throughput path and is now the `BBATop` default.
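
The arithmetic behind that comparison, as a rough sketch. The SPI figures follow from SCK = clock ÷ 2. The parallel figure is only an upper bound: it assumes one byte per strobe window (using the bus engine's default `strobe_cycles` of 3) and ignores address set-up and FSM overhead, which this document does not quantify. Both function names are illustrative.

```python
def spi_mbit(clk_mhz, clk_div=2):
    """SPI moves one bit per SCK cycle, and SCK = clk / clk_div."""
    return clk_mhz / clk_div


def parallel_mbit_upper_bound(clk_mhz, strobe_cycles=3):
    """One byte (8 bits) per strobe window; all overhead cycles are ignored."""
    return 8 * clk_mhz / strobe_cycles


EXI_CEILING_MBIT = 27

print(spi_mbit(24))                    # 12.0  W5500 on the 24 MHz sync clock
print(spi_mbit(27))                    # 13.5  hypothetical PLL/2 clock
print(parallel_mbit_upper_bound(24))   # 64.0  W5100 burst, best case
```

Even with generous allowance for overhead, the parallel bound sits well above the 27 Mbit/s EXI ceiling, while both SPI options stay below it.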

- **Interface:** W5100 **indirect parallel bus** (IDM). Only A[1:0] are wired
  (board ties A[14:2]=0 so a power-up direct access at A=00 still hits MR):
  `00`=MR, `01`=IDM_AR0(hi), `10`=IDM_AR1(lo), `11`=IDM_DR. A register/buffer
  access = write IDM_AR (the 16-bit address) then read/write IDM_DR. With MR.AI
  set, IDM_DR auto-increments → a multi-byte block is one address-set + a burst.
- **Bus engine:** drives A + D with `/CS` and `/RD`|`/WR` asserted for
  `strobe_cycles` (default 3 ≈ 125 ns at 24 MHz, ≥ the W5100's ~80 ns access).
  DATA[7:0] is bidirectional → an SB_IO tristate (`bus_data_o`/`oe`/`i`).
- **Pins (15):** A[1:0]=2, D[7:0]=8, /CS,/RD,/WR=3, /INT=1, /RST=1. With EXI (5)
  + clk (1) = **21 of ~34 usable SG48 I/O** — comfortable. See `synth.py`.
- **MR.AI requires init first:** unlike the W5500 (each SPI transaction is
  self-framed), the W5100's multi-byte accesses depend on MR.AI, so the init
  sequence (triggered by the GC's NCRA reset) MUST run before any TX/RX. The
  BBATop test issues NCRA-reset before its RX loop for this reason; on hardware
  the GC driver already does. (`BBATop(reset_cycles=N)` shrinks the MR settle
  wait for sim.)
- **Ring wraparound is in fabric:** the W5100 does NOT auto-wrap the IDM address
  at the socket-buffer boundary (the W5500 did), so the streamer re-sets IDM_AR
  to the buffer base when the running address reaches the 2 KB boundary. Handled
  in the SW/SR/RB paths (`xfer_wrap`/`xfer_wbase`/`xfer_wend`/`cur_addr`); both
  TX and RX wrap cases are tested.
- **Register map differs from the W5500:** common regs at 0x0000 (MR, SHAR 0x09,
  IMR 0x16, RMSR/TMSR 0x1A/0x1B), socket 0 at 0x0400 (S0_MR/CR/IR, TX_WR 0x424,
  RX_RSR 0x426, RX_RD 0x428), TX buffer 0x4000, RX buffer 0x6000. MACRAW mode.
- **Status:** init/TX/RX (with wrap) verified vs a bus model; BBATop full
  W5100→SPRAM→GC RX loop passes byte-for-byte; synth PASS (slow ~32≥24, capture
  ~56≥54, 42% LC). Register addresses/MR bits are from the datasheet (from
  memory) — **confirm at hardware bring-up**.
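
The indirect-bus access pattern and the fabric-side ring wrap can be sketched with a toy software model. This is not the `W5100ParallelMaster` implementation: the class, the helper, and the `MR_AI = 0x02` bit position are assumptions for illustration and should be checked against the datasheet like the other register details above.

```python
class W5100IdmModel:
    """Toy model of the W5100 indirect bus: four registers selected by A[1:0]."""
    MR, IDM_AR0, IDM_AR1, IDM_DR = 0, 1, 2, 3
    MR_AI = 0x02   # assumed auto-increment bit; confirm against the datasheet

    def __init__(self):
        self.mr = 0
        self.addr = 0   # 16-bit indirect address (AR0 = high byte, AR1 = low byte)
        self.mem = {}   # sparse model of the chip's address space

    def write(self, a, value):
        if a == self.MR:
            self.mr = value
        elif a == self.IDM_AR0:
            self.addr = (value << 8) | (self.addr & 0x00FF)
        elif a == self.IDM_AR1:
            self.addr = (self.addr & 0xFF00) | value
        else:  # IDM_DR
            self.mem[self.addr] = value
            if self.mr & self.MR_AI:
                self.addr = (self.addr + 1) & 0xFFFF   # no wrap at the buffer end


def stream_write(chip, start, data, base=0x4000, size=0x800):
    """Burst-write data into the ring [base, base + size), wrapping on the FPGA side."""
    def set_addr(addr):
        chip.write(chip.IDM_AR0, addr >> 8)
        chip.write(chip.IDM_AR1, addr & 0xFF)

    cur = start
    set_addr(cur)
    for byte in data:
        chip.write(chip.IDM_DR, byte)
        cur += 1
        if cur == base + size:   # hit the 2 KB boundary: re-set IDM_AR to the base
            cur = base
            set_addr(cur)


chip = W5100IdmModel()
chip.write(chip.MR, chip.MR_AI)   # init must enable auto-increment first
stream_write(chip, 0x47FE, [0xAA, 0xBB, 0xCC, 0xDD])
print(hex(chip.mem[0x47FF]), hex(chip.mem[0x4000]))   # 0xbb 0xcc
```

Without the re-set at the boundary, the third and fourth bytes would land at 0x4800 and 0x4801, outside the 2 KB TX buffer this sketch assumes.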

### `rebbarb/` — LED blink demo (unchanged)
- `rebbarb.py` — blinks LEDs via a PLL (36 MHz), demonstrates `IceBreakerPlatform`
- `debouncer.py` — `Debouncer(cycles)` — synchronous debounce, configurable hold
- `toggle_button.py` — `ToggleButton` — edge-to-toggle state machine (wraps Debouncer)
- `pulse_button.py` — `PulseButton` — single-cycle pulse on rising edge (wraps Debouncer)

These components are reusable building blocks. The `Debouncer` and button wrappers
will be needed for any physical input in `exi_bba/`.

**Import note:** `rebbarb/` files use bare imports (`from debouncer import Debouncer`).
Run them as `python rebbarb/<file>.py` from the workspace root so Python adds
`rebbarb/` to `sys.path` automatically.

**Simulation at module level:** `toggle_button.py` and `pulse_button.py` run
their simulations unconditionally (no `__main__` guard) — importing either file
triggers a VCD write. New modules should guard simulation code with
`if __name__ == "__main__":`.
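
A guarded layout might look like the sketch below. `run_simulation` is a hypothetical name, and its body stands in for the real simulator setup and VCD write.

```python
def run_simulation():
    """Placeholder for building the simulator and writing the VCD."""
    return "simulation ran"


if __name__ == "__main__":
    # Executes for `python <file>.py`, but not when the module is imported.
    print(run_simulation())
```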

`examples/amaranth_cdc.py` contains handwritten `SyncFF` and `TogglePulseSync`
reference implementations — use `amaranth.lib.cdc` primitives (`FFSynchronizer`,
`PulseSynchronizer`) in production code instead.

`hardware/sp1_test_plug/` — KiCad project for a physical SP1 edge-connector test
plug (schematic, PCB, custom GameCube symbol library). Used to verify pad geometry
before ordering the interposer PCB; not part of the FPGA build.

---

## Amaranth Simulator API

Two API generations are present in this repo:

| API | Where used | Status |
|---|---|---|
| `sim.add_testbench(async_fn)` + `await ctx.tick()` + `Period(MHz=n)` | `rebbarb/*.py` | **Use this for new code** |
| `sim.add_sync_process(gen_fn)` + `sim.run_until(t)` | `examples/` | Old — reference only |

New modules should use the testbench API (`add_testbench`, with `sim.run()` inside
a `with sim.write_vcd("<name>.vcd"):` context manager). The old process API still
works but is not idiomatic in current Amaranth.

**Critical testbench timing rule:** `ctx.get(signal)` reads signal values AFTER
the clock edge (post-update registered values). Combinatorial signals that depend
on registered signals that were updated by the SAME tick will already reflect the
new registered values. For example: if `tx_sof = tx_bytes_r_rdy & is_first` and
`is_first` is cleared synchronously on the first byte, then reading `tx_sof` after
the first byte's tick always returns 0 — read BEFORE the tick instead.

**`ctx.set()` takes effect immediately** (combinatorial, not registered). Use it
AFTER `await ctx.tick()` to prepare inputs for the NEXT tick.
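
The read-before-tick rule can be illustrated without the simulator. The class below is a plain-Python toy, not Amaranth code: `tick()` stands in for `await ctx.tick()`, reading the `tx_sof` property stands in for `ctx.get(tx_sof)`, and the signal names are taken from the example above.

```python
class FirstByteModel:
    """Toy model: `is_first` is a register, `tx_sof` is combinational."""

    def __init__(self):
        self.is_first = 1          # registered; cleared when the first byte is taken
        self.tx_bytes_r_rdy = 0    # input driven by the testbench

    @property
    def tx_sof(self):
        return self.tx_bytes_r_rdy & self.is_first

    def tick(self):
        # Clock edge: registers update, and combinational values follow at once.
        if self.tx_bytes_r_rdy:
            self.is_first = 0


dut = FirstByteModel()
dut.tx_bytes_r_rdy = 1   # like ctx.set(): visible immediately
before = dut.tx_sof      # read BEFORE the tick
dut.tick()
after = dut.tx_sof       # read AFTER the tick
print(before, after)     # 1 0
```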

The full design specification lives in `docs/gc_bba_fpga_design.md`.

---

## Key Architecture Decisions

- **No network stack in the FPGA.** The GC CPU runs TCP/IP. The FPGA is a dumb
  MAC bridge.
- **Split-domain clocking — 3 domains, 2 sources (1 PLL + 1 HFOSC):**
  - `capture` — 54 MHz (PLL, DIVR=0 DIVF=71 DIVQ=4). Hosts ONLY the SPI Mode 3
    bit engine inside `ExiCapture`. 54 MHz = 2× the **real 27 MHz** EXI clock —
    the minimum oversampling for clean Mode 3. The isolated bit engine closes
    ~91 MHz; integrated with the byte-FIFO read path the capture domain closes
    ~62 MHz, so 54 passes with margin.
  - `exi` — 24 MHz (HFOSC ÷2). BBA register file / transaction FSM.
  - `sync` — 24 MHz (same HFOSC net as `exi`). SPRAM arbiter, RX/TX engines,
    ethernet master (`W5100ParallelMaster` by default, or `W5500SPIMaster`).
  - **Why split:** only the tiny SPI bit engine needs a fast clock to sample
    27 MHz EXI. The bulky register-file/SPRAM/W5500 logic is routing-bound at
    ~33–44 MHz on the UP5K and only needs the byte rate (27 MHz ÷ 8 ≈ 3.4 MHz).
    `ExiCapture` bridges capture↔exi with rx/tx byte AsyncFIFOs.
  - **EXI clock reality:** the GC EXI clock tops out at ~27 MHz. libogc's
    `EXI_SPEED32MHZ` is a nominal name — the real rate is 27 MHz. The old
    "96 MHz = 3× 32 MHz EXI" target was doubly wrong and unreachable on UP5K
    (which caps ~44 MHz for non-trivial logic).
  - **TX/MISO across the split:** the register file PROACTIVELY pushes read
    responses into the tx byte FIFO during the EXI clock-idle gap (the GC pauses
    the clock between an EXI_Imm header-write and the data-read). The bit engine
    drives MISO live from the FIFO head; see `ExiCapture` / `SPIMode3Slave`.
- **All CDC via `amaranth.lib.cdc`.** Never pass raw multi-bit signals across
  domains. Use `FFSynchronizer` for slow single bits, `PulseSynchronizer` for
  events, `AsyncFIFO` for data streams, `ResetSynchronizer` for resets.
- **Register file lives entirely in `exi` domain.** The `sync` domain only
  communicates through AsyncFIFOs and PulseSynchronizers — never direct register
  reads/writes.

---

## Critical Protocol Notes

### EXI / SPI Mode 3
- CLK idles **HIGH** (CPOL=1, CPHA=1).
- In Mode 3, data changes on the **falling** CLK edge and is sampled on the
  **rising** edge: the FPGA samples MOSI on rising edges and drives MISO on
  falling edges.
- Getting this wrong means the GC never enumerates the device.
- CS is active **low**, delineates each transaction.

### EXI Transaction Header (2 bytes before data)
```
Byte 0: [7]=write_flag  [6:0]=addr[12:6]
Byte 1: [7:2]=addr[5:0] [1:0]=xfer_len-1  (0=1B … 3=4B)
```
Full address = 13 bits → 0x0000–0x1FFF.
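
The bit layout above can be exercised with a small encoder/decoder pair. This is a sketch for checking the field packing only; the function names are illustrative and do not exist in `exi_bba/`.

```python
def encode_exi_header(addr, length, write=False):
    """Pack a 13-bit address and a 1..4 byte length into the 2 header bytes."""
    if not 0 <= addr <= 0x1FFF:
        raise ValueError("address must fit in 13 bits")
    if not 1 <= length <= 4:
        raise ValueError("length must be 1..4")
    byte0 = (int(write) << 7) | (addr >> 6)        # write flag + addr[12:6]
    byte1 = ((addr & 0x3F) << 2) | (length - 1)    # addr[5:0] + xfer_len-1
    return bytes([byte0, byte1])


def decode_exi_header(header):
    """Return (write_flag, addr, length) from the 2 header bytes."""
    byte0, byte1 = header
    addr = ((byte0 & 0x7F) << 6) | (byte1 >> 2)
    return bool(byte0 & 0x80), addr, (byte1 & 0x03) + 1


hdr = encode_exi_header(0x0048, 4, write=True)   # 4-byte write to address 0x0048
print(hdr.hex())                 # 8123
print(decode_exi_header(hdr))    # (True, 72, 4)
```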

### Device ID Query
On power-on the GC writes `0x0000` (2 bytes) then reads 4 bytes.
Must return: `0x04 0x02 0x02 0x00`.

---

## Memory Map (abridged)

| Range | Region |
|---|---|
| 0x0000–0x0033 | MAC control registers (register file, exi domain) |
| 0x0048 | TXDATA — bulk TX data port (→ `tx_bytes` AsyncFIFO) |
| 0x0100–0x0FFF | RX ring buffer in SPRAM (15 × 256-byte pages, pages 1–15) |
| 0x0100–0x1FFF | any read ≥ 0x0100 streams from SPRAM (DMA path); the ring proper is pages 1–15 above |

---

## Key Registers

| Addr | Name | Notes |
|---|---|---|
| 0x00 | NCRA | [0]=RESET self-clears; pulses `ncra_rst` to sync domain |
| 0x08 | IMR | Interrupt mask |
| 0x09 | IR | Write-1-to-clear. [1]=RI, [2]=TI. INT_N asserts when IR & IMR ≠ 0 |
| 0x16–17 | RWP | RX write pointer — updated by sync domain via `rx_wptr` FIFO |
| 0x18–19 | RRP | RX read pointer — GC writes after consuming frames |
| 0x20–25 | PAR0–5 | MAC address; also forwarded to the ethernet chip (W5100 or W5500) as SHAR |
| 0x31 | NWAYS | Hardcode **0x17** (100M full-duplex link up, autoneg complete) |
| 0x3A | HIPR | Hardcode **0x01** (BBA present) |
| 0x48 | TXDATA | GC streams TX frame bytes here |

---

## Module Breakdown

| Module | Domain | File |
|---|---|---|
| `BBATop` | all | `exi_bba/bba_top.py` |
| `ExiCapture` | capture (+exi FIFOs) | `exi_bba/exi_capture.py` |
| `SPIMode3Slave` | capture (param `domain`) | `exi_bba/spi_mode3_slave.py` |
| `BBARegisterFile` | exi (+FIFO to sync) | `exi_bba/bba_register_file.py` |
| `SPRAMArbiter` | sync | `exi_bba/spram_arbiter.py` |
| `RXFrameAssembler` | sync | `exi_bba/rx_frame_assembler.py` |
| `TXFrameDrain` | sync | `exi_bba/tx_frame_drain.py` |
| `W5100ParallelMaster` | sync | `exi_bba/w5100_parallel_master.py` (default eth) |
| `W5500SPIMaster` | sync | `exi_bba/w5500_spi_master.py` (alt eth) |
| `EEPROMModel` | exi | `exi_bba/eeprom_model.py` |

`ExiCapture` wraps `SPIMode3Slave` (in the fast `capture` domain) plus the
capture↔exi rx/tx byte AsyncFIFOs. `BBARegisterFile` consumes the rx byte
stream and proactively pushes read responses into the tx byte FIFO — it no
longer sees the per-bit SPI cadence (that lives entirely in `capture`).

---

## CDC Signal Inventory

| Signal | Direction | Primitive |
|---|---|---|
| EXI CLK / MOSI / CS pins | async → capture | `FFSynchronizer` (stages=2) |
| RX byte stream (capture→core) | capture → exi | `AsyncFIFO` 8-bit, depth=4 |
| TX byte stream (core→capture) | exi → capture | `AsyncFIFO` 8-bit, depth=2 |
| cs_active (transaction in progress) | capture → exi | `FFSynchronizer` (DMA read length) |
| SPRAM read request (addr) | exi → sync | `AsyncFIFO` 16-bit, depth=4 |
| SPRAM read result (data) | sync → exi | `AsyncFIFO` 8-bit, depth=4 |
| TX packet bytes | exi → sync | `AsyncFIFO` 8-bit, depth=16 |
| TX frame length | exi → sync | `AsyncFIFO` 16-bit, depth=4 |
| RX frame bytes | sync → SPRAM | `RXFrameAssembler` → `SPRAMArbiter` (not a byte FIFO; the GC reads frames back out of SPRAM via the SPRAM read req/rsp FIFOs) |
| RWP update | sync → exi | `AsyncFIFO` 8-bit, depth=4 |
| RRP update | exi → sync | `AsyncFIFO` 8-bit, depth=4 |
| RX ready (IR[RI]) | sync → exi | `PulseSynchronizer` |
| TX done (IR[TI]) | sync → exi | `PulseSynchronizer` |
| NCRA reset pulse | exi → sync | `PulseSynchronizer` |

---

## W5500 Configuration (on NCRA reset)

The W5500 selects the register **block** via the BSB field of the control byte,
NOT via the address — so register addresses below are **block offsets**, not flat
0x4000-style addresses (see `_W5500_*` and `_CTRL_*` in `w5500_spi_master.py`).
```
1. Write MR     = 0x80   (common block, offset 0x0000)  software reset
2. Wait ~1 ms
3. Write SHAR   = MAC     (common block, offset 0x0009, 6 bytes from PAR0–5)
4. Write S0_MR  = 0x04    (socket-0 reg block, offset 0x0000)  MACRAW
5. Write S0_CR  = 0x01    (socket-0 reg block, offset 0x0001)  OPEN
6. Write S0_IMR = 0x14    (socket-0 reg block, offset 0x002C)  RECV | SEND_OK
```

W5500 SPI is **Mode 0** (CPOL=0 CPHA=0); SCK = **12 MHz** (the 24 MHz `sync`
domain ÷ 2 via a toggle clock-enable). Connect W5500 `INT_N` to an FPGA input
for low-latency RX detection. (The W5500 is the alternate back-end; the W5100
parallel master is the default — see "W5100 vs W5500".)
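
To make the block-select addressing concrete, the sketch below builds the bytes of a W5500 write in variable-length data mode: a 16-bit offset, a control byte (BSB in bits 7:3, read/write in bit 2, operation mode in bits 1:0), then the payload. The constant and function names are illustrative and are not the `_W5500_*` / `_CTRL_*` names used in `w5500_spi_master.py`.

```python
BSB_COMMON = 0b00000        # common register block
BSB_SOCKET0_REG = 0b00001   # socket-0 register block
RWB_WRITE = 1 << 2          # control byte bit 2: 1 = write, 0 = read
OM_VDM = 0b00               # variable-length data mode, framed by chip select


def w5500_write_frame(bsb, offset, payload):
    """Bytes shifted out for one write: 16-bit offset, control byte, then data."""
    control = (bsb << 3) | RWB_WRITE | OM_VDM
    return bytes([offset >> 8, offset & 0xFF, control]) + bytes(payload)


print(w5500_write_frame(BSB_COMMON, 0x0000, [0x80]).hex())        # 00000480  MR reset
print(w5500_write_frame(BSB_SOCKET0_REG, 0x0000, [0x04]).hex())   # 00000c04  S0_MR MACRAW
print(w5500_write_frame(BSB_SOCKET0_REG, 0x0001, [0x01]).hex())   # 00010c01  S0_CR OPEN
```

MR and S0_MR share offset 0x0000; only the control byte (0x04 versus 0x0C) tells them apart.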

---

## Physical Interface (SP1 Edge Connector)

- PCB must be **1.2 mm thick, ENIG finish**.
- Staggered (not mirrored) top/bottom contact rows — same geometry as PCI/ISA.
- Derive exact pad geometry from **SP1ETH KiCad project** (silverstee1/SP1ETH),
  cross-referenced with ETH2SP1 (LaserBear). Do not rely on YAGCD alone.
- Add **100 µF bulk cap** on the interposer near FPGA power pins (3.3 V budget
  is tight: iCEbreaker ~80 mA + W5500 ~150 mA ≈ 230 mA).
- **Pin 5 is 12 V — do not connect to FPGA I/O.** Test point or leave open.
- `EXTIN` (pin 1): tie to 3.3 V via 10 kΩ — required for GC device enumeration.
- All signal levels are 3.3 V. No level shifting needed.

---

## SPRAM Notes

- iCE40UP5K has 128 KB of SPRAM: four SB_SPRAM256KA blocks, each 16K × 16 bit.
- **1-cycle synchronous read latency** — result of read at cycle N is valid at N+1.
- Byte writes via `MASKWREN`: lower byte = `0b0011`, upper byte = `0b1100`.
- Address to SPRAM = byte_address >> 1.
- ETH writes take priority over EXI reads in the arbiter (safe by ring-buffer
  invariant: GC only reads pages the ETH engine has already finished).
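
The byte-write mapping can be sketched as a pure function. It assumes that even byte addresses map to the lower byte of the 16-bit word, which the notes above imply but do not state; `spram_byte_write` is an illustrative name.

```python
def spram_byte_write(byte_addr, value):
    """Map a byte write onto a 16-bit SPRAM word: (word_addr, maskwren, wdata)."""
    word_addr = byte_addr >> 1
    if byte_addr & 1:
        return word_addr, 0b1100, (value & 0xFF) << 8   # odd address: upper byte
    return word_addr, 0b0011, value & 0xFF              # even address: lower byte


print(spram_byte_write(0x0100, 0xAB))   # (128, 3, 171)
print(spram_byte_write(0x0101, 0xCD))   # (128, 12, 52480)
```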

---

## GC Initialisation Sequence (Swiss/BBA driver)

```
1.  Write 0x0000 × 2, read 4 B → must get 0x04020200 (device ID)
2.  Write NCRA = 0x01            (reset, self-clears; resets W5500 + SPRAM ptrs)
3.  Poll NCRA bit 0 until 0      (wait reset complete)
4.  Write PAR0–5                 (MAC address)
5.  Write MAR0–7 = 0xFF          (promiscuous multicast)
6.  Write ANALOG = 0xD6          (enable PHY — no FPGA effect, just store)
7.  Write NWAYC                  (autoneg config — store only)
8.  Write IMR = 0x86             (enable RBFI | TI | RI interrupts)
9.  Write GCA (AUTOPUB bit)
10. Write NCRA SR bit = 0x08     (start receive)
11. Poll NWAYS until link up     → return hardcoded 0x17 immediately
```

---

## Implementation Notes & Gotchas

- **`NWAYS` must return `0x17` always.** GC polls it to confirm 100 Mbps link
  before enabling RX. Do not attempt to reflect real W5500 link status.
- **`EEPROMModel` can be stubbed initially.** Many GC BBA drivers write their own
  MAC to PAR0–5 rather than using the EEPROM. Pre-populate PAR0–5 reset state
  with a valid Nintendo OUI MAC (`00:09:BF:xx:xx:xx`).
- **`tx_load` timing in `SPIMode3Slave`:** pulses at CS assertion (first byte)
  and after each complete received byte. Upstream must register next TX byte
  within one `exi` clock.
- **PLL target 54 MHz**: verify with `icepll -i 12 -o 54` (DIVR=0 DIVF=71 DIVQ=4)
  before coding PLL parameters; the capture-domain bit engine oversamples the
  27 MHz EXI clock 2×.
- **TX buffer selection (NCRA ST bits):** Ignore buffer select (ST1 vs ST0).
  Treat any non-zero ST as a TX trigger.
- **If nextpnr fails capture-domain timing at 54 MHz:** the isolated bit engine
  closes ~91 MHz, so 54 has margin; if a seed fails, sweep seeds
  (`synth.py --seeds N`) or instruct users to configure Swiss to a lower EXI
  clock index.
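
For the PAR0–5 reset value, the packing described under "PAR0–5 → W5500 SHAR" (PAR0 in the low byte, so it becomes the first SHAR octet) can be sketched as follows. The helper names and the last three octets of the example MAC are placeholders.

```python
def pack_par(mac_octets):
    """Pack six MAC octets into one 48-bit value with PAR0 in the low byte."""
    if len(mac_octets) != 6:
        raise ValueError("a MAC address has six octets")
    return int.from_bytes(bytes(mac_octets), "little")


def shar_octets(par):
    """Octets in the order they are written to SHAR: low byte (PAR0) first."""
    return list(par.to_bytes(6, "little"))


mac = [0x00, 0x09, 0xBF, 0x12, 0x34, 0x56]   # Nintendo OUI + placeholder suffix
par = pack_par(mac)
print(hex(par))                  # 0x563412bf0900
print(shar_octets(par) == mac)   # True
```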