# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project: GC BBA FPGA Replacement

Replace the GameCube Broadband Adapter (DOL-015 / MX98730EC) with an iCEbreaker FPGA (Lattice iCE40UP5K) written in Amaranth HDL. The FPGA emulates the BBA register interface over the GameCube EXI bus and bridges to a WIZnet ethernet chip for real 100BASE-TX ethernet — default **W5100** (indirect parallel bus, reaches the EXI throughput ceiling) or **W5500** (SPI Pmod, simpler wiring but ~12 Mbit/s). GC software (Swiss homebrew) sees an identical BBA. See "W5100 vs W5500 ethernet back-end".

---

## Development Environment

**Preferred:** Use the devcontainer (`.devcontainer/`) which includes Python 3.12, `nextpnr-ice40`, and `fpga-icestorm` pre-installed.

**Windows host + WSL2 devcontainer — USB flashing setup:**
1. Install `usbipd-win` (https://github.com/dorssel/usbipd-win/releases)
2. Run `.devcontainer/attach-icebreaker.ps1` as Administrator before opening the devcontainer
3. The devcontainer runs `--privileged` to pass through the USB device

**Local venv (outside devcontainer):**
```bash
python -m venv .venv
source .venv/bin/activate      # Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

Yosys is bundled in `amaranth-yosys`; `nextpnr-ice40` and `iceprog` must be installed separately (via apt on Linux, or via the devcontainer).

---

## Commands

**Build and flash the iCEbreaker (must run from workspace root):**
```bash
python rebbarb/rebbarb.py
```
Runs synthesis (yosys), place-and-route (nextpnr-ice40), and flashes via `iceprog`. Set `ICEPROG=/path/to/iceprog` env var to override the binary location.

Note: `rebbarb/rebbarb.py` builds a 36 MHz LED blink demo. The BBA implementation (`exi_bba/`) uses a split-domain clock: `capture` @ 54 MHz (PLL) for the SPI bit engine, `exi`/`sync` @ 24 MHz (HFOSC) for everything else. Synthesize/flash the real design with `python -m exi_bba.synth [--flash]`.
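
The two clock rates quoted above can be sanity-checked with a few lines of arithmetic. This is an illustrative sketch only, not code from the repository: it assumes the usual iCE40 PLL relation for SIMPLE feedback mode, F_out = F_ref × (DIVF + 1) / ((DIVR + 1) × 2^DIVQ), a 12 MHz reference (the `icepll -i 12` input used elsewhere in this file), and a nominal 48 MHz SB_HFOSC. The function names `ice40_pll_mhz` and `hfosc_mhz` are hypothetical.

```python
from fractions import Fraction


def ice40_pll_mhz(f_ref_mhz, divr, divf, divq):
    """PLL output in MHz for SIMPLE feedback mode (assumed formula, see above)."""
    return Fraction(f_ref_mhz) * (divf + 1) / ((divr + 1) * 2 ** divq)


def hfosc_mhz(div_log2):
    """Nominal 48 MHz SB_HFOSC divided by 2**div_log2 (1 means divide by 2)."""
    return Fraction(48, 2 ** div_log2)


# Divider values this file reports for `icepll -i 12 -o 54`.
capture_mhz = ice40_pll_mhz(12, divr=0, divf=71, divq=4)
slow_mhz = hfosc_mhz(1)

print(f"capture = {float(capture_mhz)} MHz, exi/sync = {float(slow_mhz)} MHz")
```

Exact fractions are used so that a mistyped divider shows up as a non-integer result instead of being hidden by float rounding.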

**Run a simulation:**
```bash
# New-API testbench style (preferred for new code):
python rebbarb/toggle_button.py     # writes ToggleButton.vcd
python rebbarb/pulse_button.py      # writes PulseButton.vcd

# Old-API process style (reference only, do not replicate in new code):
python examples/amaranth_cdc.py     # CDC primitives demo
python examples/async_fifo.py       # AsyncFIFO behaviour
python examples/icebreaker_fifo.py  # iCEbreaker-specific FIFO (Verilog dump)
```
Open VCD output with `gtkwave`. Simulations are the primary testing mechanism — there is no separate test runner.

**Verify PLL parameters:**
```bash
icepll -i 12 -o 54   # confirms DIVR=0 DIVF=71 DIVQ=4 → 54 MHz (capture domain)
```
(`exi`/`sync` come from the internal SB_HFOSC ÷2 = 24 MHz — no PLL.)

---

## Current Implementation State

The `exi_bba/` module tree is **fully implemented** with simulation testbenches. All modules elaborate without errors and pass their unit tests. The full design **synthesizes, places, routes, and meets timing** on the iCE40UP5K (`python -m exi_bba.synth`): `capture` closes ~70 MHz (target 54) and `exi`/`sync` close ~36 MHz (target 24) — both PASS.

### `exi_bba/` module status

| Module | File | Tests pass |
|---|---|---|
| `BBATop` | `exi_bba/bba_top.py` | ✅ EXI integration + full W5100→SPRAM→GC RX loop; synth PASS |
| `ExiCapture` | `exi_bba/exi_capture.py` | ✅ rx/tx byte-stream + over-push/flush |
| `SPIMode3Slave` | `exi_bba/spi_mode3_slave.py` | ✅ 4 tests (live-drive TX) |
| `BBARegisterFile` | `exi_bba/bba_register_file.py` | ✅ 7 tests (proactive push + DMA stream) |
| `SPRAMArbiter` | `exi_bba/spram_arbiter.py` | ✅ 3 tests |
| `RXFrameAssembler` | `exi_bba/rx_frame_assembler.py` | ✅ 3 tests |
| `TXFrameDrain` | `exi_bba/tx_frame_drain.py` | ✅ 2 tests |
| `W5100ParallelMaster` | `exi_bba/w5100_parallel_master.py` | ✅ 5 tests (init/TX/RX vs bus model, incl. ring wrap) — **default eth back-end** |
| `W5500SPIMaster` | `exi_bba/w5500_spi_master.py` | ✅ init/TX/RX vs SPI-slave model (alt back-end) |
| `StatusPanel` | `exi_bba/status_panel.py` | ✅ 6 tests (heartbeat, stretched activity LEDs, debounced buttons, freeze) |
| `EEPROMModel` | `exi_bba/eeprom_model.py` | ✅ 4 tests |

**Bring-up status panel (optional):** `BBATop(status_panel=True)` adds a `StatusPanel` driving onboard iCEbreaker LEDs + button (dedicated pins, so it coexists with EXI + W5100). `synth.py` enables it: **LEDG=heartbeat**, **LEDR=EXI activity** (the GC is talking), **RGB red=rx / green=tx / blue=ready** (via `SB_RGBA_DRV` on pins 39/40/41), **BTN_N=manual re-init**. All 5 panel LEDs are now mapped on the iCEbreaker. The full EXI + W5100 + panel build synthesizes and meets timing (slow ~35≥24, capture ~64≥54, 44% LC).

**Ethernet back-end is selectable:** `BBATop(eth="w5100")` (default — indirect parallel bus, reaches the ~27 Mbit/s EXI ceiling) or `BBATop(eth="w5500")` (SPI, ~12 Mbit/s). Both masters expose the identical tx/rx/init/par streaming interface; only the physical pins differ. See "W5100 vs W5500" below.

### Run all module testbenches (from workspace root)

```bash
python -m exi_bba.spi_mode3_slave
python -m exi_bba.exi_capture
python -m exi_bba.bba_register_file
python -m exi_bba.spram_arbiter
python -m exi_bba.rx_frame_assembler
python -m exi_bba.tx_frame_drain
python -m exi_bba.w5100_parallel_master   # 5 tests: init, TX(+wrap), RX(+wrap)
python -m exi_bba.w5500_spi_master
python -m exi_bba.status_panel            # 6 tests: heartbeat/activity/buttons
python -m exi_bba.eeprom_model
python -m exi_bba.bba_top                 # end-to-end EXI integration test (W5100 RX loop)
```

### Pending work

- **Synthesis/timing**: ✅ done — `python -m exi_bba.synth` synthesizes, P&Rs, and meets timing on both clock domains (capture ~68≥54, slow ~40≥24).
- **W5500 init/TX/RX**: ✅ done — `W5500SPIMaster` has a real Mode-0 byte engine, a generic register-transaction engine (header + wbuf/stream payload), the full init sequence (MR reset, SHAR, S0_MR MACRAW, S0_CR OPEN, S0_IMR), MACRAW TX (read TX_WR → stream frame to TX buffer → advance TX_WR → SEND) and MACRAW RX (RSR → RD → 2-byte length → stream frame out → advance RD → RECV). All verified on the wire by a responding W5500 SPI-slave model in the testbench.
- **PAR0–5 → W5500 SHAR**: ✅ done — `reg.par` wired to `w5500.par` in `BBATop` (PAR0 packed in the low byte so it is the first SHAR octet).
- **NCRA SR bit**: ✅ done — `BBARegisterFile.ncra_sr` (= NCRA[3]) gates `asm.rx_enabled` in `BBATop` (was hard-wired to 1).
- **W5500 SPI throughput**: SCK = sync÷2 = 12 MHz (~12 Mbit/s) — exceeds real-world GC BBA TCP throughput (~6–10 Mbit/s) but is below the 27 Mbit/s raw EXI ceiling. Pushing past 12 Mbit/s was investigated and found NOT achievable on this UP5K (the W5500-operating logic is distributed ~40 MHz, not just the bit-bang) — see the "Full-rate W5500 SPI" item below. `W5500SPIMaster(clk_div=N)` divides SCK further if signal integrity needs it.
- **EXI DMA bulk reads**: ✅ done — SPRAM-region reads (addr ≥ 0x100) now STREAM until CS deasserts instead of stopping at the header's 2-bit length, so they serve both ≤4-byte immediate reads (Swiss) AND arbitrary-length DMA reads (other GC software, and a future Swiss path for loading ROMs from a network file store). Implementation:
  - `SPIMode3Slave.cs_active` (synchronised CS level) → `ExiCapture` crosses it to the exi domain (FFSynchronizer) → `BBARegisterFile.cs_active`.
  - `BBARegisterFile` SPRAM_STREAM state: auto-increments the SPRAM address, prefetches up to SP_LIMIT=4 reads in flight, pushes responses to tx_fifo; SPRAM_END drains the in-flight pipeline + rx dummies on CS-rise.
  - `ExiCapture` flushes tx_fifo on CS-fall to clear prefetch over-push so a truncated DMA read can't leak stale bytes into the next transaction.

  Tested: register-file streaming read (SPRAM model, 12 bytes), ExiCapture over-push/flush, AND the full BBATop loop — a W5500 model delivers a frame → W5500 master RX → RXFrameAssembler writes the SPRAM ring → GC reads RWP then DMA-reads the descriptor+frame back (verified byte-for-byte). Note: a DMA read header must keep length-1 within the 2-bit field; the GC driver sets it ≤3 and clocks the real length via CS (the design streams until CS regardless). (EXI DMA *writes* are not implemented; the GC's DMA-write engine has a 1-bit-shift bug and Swiss avoids them — see design-doc §"EXI DMA bug".)
- **S0_IR interrupt clear after RX**: ✅ done — `W5500SPIMaster` RX_CLR_IR state writes Sn_IR[2]=1 after RECV so `INT_N` deasserts (else the FSM would re-enter RX_CHECK forever on real hardware).
- **Full-rate W5500 SPI (27 Mbit/s) — INVESTIGATED, NOT achievable on UP5K**: the W5500 SCK is sync÷2 = 12 MHz. Raising it needs the SPI engine on a ≥54 MHz clock, but a standalone synth of `W5500SPIMaster` in the capture domain closes only **40 MHz** — and the slack histogram shows the failure is *distributed* (~140 endpoints fail 54, incl. the `wbuf`/header mux feeding the shift register), NOT a single cuttable path. So the bottleneck is the **logic that operates the SPI device** (transaction FSM, byte sourcing), not the bit-bang. Consequences:
  - The "split the bit engine to capture + per-byte CDC handshake" idea nets only ~14 Mbit/s — the CDC round-trip ≈ the SPI byte time — not worth it.
  - A capture-domain "streaming executor" would still contain that distributed ~40 MHz logic, so it wouldn't close 54 either.
  - Hardware `SB_SPI` wouldn't help (it only offloads the bit-bang, which was never the bottleneck) and is unsimulatable.
  - There is no usable clock between 24 (HFOSC) and 54 (the one PLL, needed at 54 for the EXI front-end); PLL÷2 = 27 → SCK 13.5 MHz, a ~12% gain, not worth the fabric divider.

  Net: 12 Mbit/s is the practical W5500 ceiling on this part. It exceeds real-world GC BBA TCP throughput and is fine for chunked ROM streaming. Reaching 27 Mbit/s would need a faster FPGA or a much shallower W5500-operating redesign (uncertain) — **OR a parallel-bus ethernet chip (see W5100 below)**, which is the implemented solution for the ROM-streaming throughput target.

## W5100 vs W5500 ethernet back-end

The throughput insight: SPI serialises 8 bits/byte, so the W5500 byte rate is (operating-logic clock)/16 — and that logic caps ~40 MHz on this UP5K → ~12 Mbit/s. A **parallel** bus moves a whole byte per access, so the *same* ~24 MHz `sync` logic clears the 27 Mbit/s EXI ceiling (the real hard limit — the GC EXI bus tops out there). So `W5100ParallelMaster` is the throughput path and is now the `BBATop` default.

- **Interface:** W5100 **indirect parallel bus** (IDM). Only A[1:0] are wired (board ties A[14:2]=0 so a power-up direct access at A=00 still hits MR): `00`=MR, `01`=IDM_AR0(hi), `10`=IDM_AR1(lo), `11`=IDM_DR. A register/buffer access = write IDM_AR (the 16-bit address) then read/write IDM_DR. With MR.AI set, IDM_DR auto-increments → a multi-byte block is one address-set + a burst.
- **Bus engine:** drives A + D with `/CS` and `/RD`|`/WR` asserted for `strobe_cycles` (default 3 ≈ 125 ns at 24 MHz, ≥ the W5100's ~80 ns access). DATA[7:0] is bidirectional → an SB_IO tristate (`bus_data_o`/`oe`/`i`).
- **Pins (15):** A[1:0]=2, D[7:0]=8, /CS,/RD,/WR=3, /INT=1, /RST=1. With EXI (5) + clk (1) = **21 of ~34 usable SG48 I/O** — comfortable. See `synth.py`.
- **MR.AI requires init first:** unlike the W5500 (each SPI transaction is self-framed), the W5100's multi-byte accesses depend on MR.AI, so the init sequence (triggered by the GC's NCRA reset) MUST run before any TX/RX. The BBATop test issues NCRA-reset before its RX loop for this reason; on hardware the GC driver already does. (`BBATop(reset_cycles=N)` shrinks the MR settle wait for sim.)
- **Ring wraparound is in fabric:** the W5100 does NOT auto-wrap the IDM address at the socket-buffer boundary (the W5500 did), so the streamer re-sets IDM_AR to the buffer base when the running address reaches the 2 KB boundary. Handled in the SW/SR/RB paths (`xfer_wrap`/`xfer_wbase`/`xfer_wend`/`cur_addr`); both TX and RX wrap cases are tested.
- **Register map differs from the W5500:** common regs at 0x0000 (MR, SHAR 0x09, IMR 0x16, RMSR/TMSR 0x1A/0x1B), socket 0 at 0x0400 (S0_MR/CR/IR, TX_WR 0x424, RX_RSR 0x426, RX_RD 0x428), TX buffer 0x4000, RX buffer 0x6000. MACRAW mode.
- **Status:** init/TX/RX (with wrap) verified vs a bus model; BBATop full W5100→SPRAM→GC RX loop passes byte-for-byte; synth PASS (slow ~32≥24, capture ~56≥54, 42% LC). Register addresses/MR bits are from the datasheet (from memory) — **confirm at hardware bring-up**.

### `rebbarb/` — LED blink demo (unchanged)

- `rebbarb.py` — blinks LEDs via a PLL (36 MHz), demonstrates `IceBreakerPlatform`
- `debouncer.py` — `Debouncer(cycles)` — synchronous debounce, configurable hold
- `toggle_button.py` — `ToggleButton` — edge-to-toggle state machine (wraps Debouncer)
- `pulse_button.py` — `PulseButton` — single-cycle pulse on rising edge (wraps Debouncer)

These components are reusable building blocks. The `Debouncer` and button wrappers will be needed for any physical input in `exi_bba/`.

**Import note:** `rebbarb/` files use bare imports (`from debouncer import Debouncer`). Run them as `python rebbarb/.py` from the workspace root so Python adds `rebbarb/` to `sys.path` automatically.

**Simulation at module level:** `toggle_button.py` and `pulse_button.py` run their simulations unconditionally (no `__main__` guard) — importing either file triggers a VCD write.
New modules should guard simulation code with `if __name__ == "__main__":`.

`examples/amaranth_cdc.py` contains handwritten `SyncFF` and `TogglePulseSync` reference implementations — use `amaranth.lib.cdc` primitives (`FFSynchronizer`, `PulseSynchronizer`) in production code instead.

`hardware/sp1_test_plug/` — KiCad project for a physical SP1 edge-connector test plug (schematic, PCB, custom GameCube symbol library). Used to verify pad geometry before ordering the interposer PCB; not part of the FPGA build.

---

## Amaranth Simulator API

Two API generations are present in this repo:

| API | Where used | Status |
|---|---|---|
| `sim.add_testbench(async_fn)` + `await ctx.tick()` + `Period(MHz=n)` | `rebbarb/*.py` | **Use this for new code** |
| `sim.add_sync_process(gen_fn)` + `sim.run_until(t)` | `examples/` | Old — reference only |

New modules should use the testbench API (`add_testbench`, `sim.write_vcd(ctx)` context manager). The old process API still works but is not idiomatic in current Amaranth.

**Critical testbench timing rule:** `ctx.get(signal)` reads signal values AFTER the clock edge (post-update registered values). Combinatorial signals that depend on registered signals that were updated by the SAME tick will already reflect the new registered values. For example: if `tx_sof = tx_bytes_r_rdy & is_first` and `is_first` is cleared synchronously on the first byte, then reading `tx_sof` after the first byte's tick always returns 0 — read BEFORE the tick instead.

**`ctx.set()` takes effect immediately** (combinatorial, not registered). Use it AFTER `await ctx.tick()` to prepare inputs for the NEXT tick.

The full design specification lives in `docs/gc_bba_fpga_design.md`.

---

## Key Architecture Decisions

- **No network stack in the FPGA.** The GC CPU runs TCP/IP. The FPGA is a dumb MAC bridge.
- **Split-domain clocking — 3 domains, 2 sources (1 PLL + 1 HFOSC):**
  - `capture` — 54 MHz (PLL, DIVR=0 DIVF=71 DIVQ=4). Hosts ONLY the SPI Mode 3 bit engine inside `ExiCapture`. 54 MHz = 2× the **real 27 MHz** EXI clock — the minimum oversampling for clean Mode 3. The isolated bit engine closes ~91 MHz; integrated with the byte-FIFO read path the capture domain closes ~62 MHz, so 54 passes with margin.
  - `exi` — 24 MHz (HFOSC ÷2). BBA register file / transaction FSM.
  - `sync` — 24 MHz (same HFOSC net as `exi`). SPRAM arbiter, RX/TX engines, W5500 SPI master.
- **Why split:** only the tiny SPI bit engine needs a fast clock to sample 27 MHz EXI. The bulky register-file/SPRAM/W5500 logic is routing-bound at ~33–44 MHz on the UP5K and only needs the byte rate (27 MHz ÷ 8 ≈ 3.4 MHz). `ExiCapture` bridges capture↔exi with rx/tx byte AsyncFIFOs.
- **EXI clock reality:** the GC EXI clock tops out at ~27 MHz. libogc's `EXI_SPEED32MHZ` is a nominal name — the real rate is 27 MHz. The old "96 MHz = 3× 32 MHz EXI" target was doubly wrong and unreachable on UP5K (which caps ~44 MHz for non-trivial logic).
- **TX/MISO across the split:** the register file PROACTIVELY pushes read responses into the tx byte FIFO during the EXI clock-idle gap (the GC pauses the clock between an EXI_Imm header-write and the data-read). The bit engine drives MISO live from the FIFO head; see `ExiCapture` / `SPIMode3Slave`.
- **All CDC via `amaranth.lib.cdc`.** Never pass raw multi-bit signals across domains. Use `FFSynchronizer` for slow single bits, `PulseSynchronizer` for events, `AsyncFIFO` for data streams, `ResetSynchronizer` for resets.
- **Register file lives entirely in `exi` domain.** The `sync` domain only communicates through AsyncFIFOs and PulseSynchronizers — never direct register reads/writes.

---

## Critical Protocol Notes

### EXI / SPI Mode 3

- CLK idles **HIGH** (CPOL=1, CPHA=1).
- MOSI sampled on **rising** CLK edge. MISO driven on **falling** CLK edge.
- Getting this wrong means the GC never enumerates the device.
- CS is active **low**, delineates each transaction.
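
To make the edge convention concrete, here is a minimal pure-Python model of a Mode 3 receiver. It is a sketch, not the Amaranth `SPIMode3Slave`: it assumes the standard Mode 3 convention (CPOL=1, CPHA=1: data changes on the falling SCK edge and is sampled on the rising edge, MSB first), and `mode3_clock_out` / `mode3_receive` are hypothetical helper names used only for illustration.

```python
def mode3_clock_out(data):
    """Yield (cs_n, sck, mosi) samples for one Mode 3 transaction, MSB first."""
    yield (1, 1, 0)                      # idle: CS high, SCK high (CPOL=1)
    yield (0, 1, 0)                      # CS asserted, clock still idle high
    for byte in data:
        for bit in range(7, -1, -1):
            mosi = (byte >> bit) & 1
            yield (0, 0, mosi)           # falling edge: master changes MOSI
            yield (0, 1, mosi)           # rising edge: slave samples MOSI
    yield (1, 1, 0)                      # CS released


def mode3_receive(samples):
    """Rebuild bytes by sampling MOSI on each rising SCK edge while CS is low."""
    received, shift, nbits, prev_sck = [], 0, 0, 1
    for cs_n, sck, mosi in samples:
        if cs_n:                         # CS high: abandon any partial byte
            shift, nbits = 0, 0
        elif sck and not prev_sck:       # rising edge inside a transaction
            shift = ((shift << 1) | mosi) & 0xFF
            nbits += 1
            if nbits == 8:
                received.append(shift)
                shift, nbits = 0, 0
        prev_sck = sck
    return bytes(received)


frame = bytes([0xA5, 0x3C])
assert mode3_receive(mode3_clock_out(frame)) == frame
```

Because the receiver only reacts to a low-to-high SCK transition while CS is low, the idle-high clock before and after the transaction never produces a spurious bit.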

### EXI Transaction Header (2 bytes before data)

```
Byte 0: [7]=write_flag  [6:0]=addr[12:6]
Byte 1: [7:2]=addr[5:0] [1:0]=xfer_len-1  (0=1B … 3=4B)
```
Full address = 13 bits → 0x0000–0x1FFF.

### Device ID Query

On power-on the GC writes `0x0000` (2 bytes) then reads 4 bytes. Must return: `0x04 0x02 0x02 0x00`.

---

## Memory Map (abridged)

| Range | Region |
|---|---|
| 0x0000–0x0033 | MAC control registers (register file, exi domain) |
| 0x0048 | TXDATA — bulk TX data port (→ `tx_bytes` AsyncFIFO) |
| 0x0100–0x0FFF | RX ring buffer in SPRAM (15 × 256-byte pages, pages 1–15) |
| 0x0100–0x1FFF | any read ≥ 0x0100 streams from SPRAM (DMA path); the ring proper is pages 1–15 above |

---

## Key Registers

| Addr | Name | Notes |
|---|---|---|
| 0x00 | NCRA | [0]=RESET self-clears; pulses `ncra_rst` to sync domain |
| 0x08 | IMR | Interrupt mask |
| 0x09 | IR | Write-1-to-clear. [1]=RI, [2]=TI. INT_N asserts when IR & IMR ≠ 0 |
| 0x16–17 | RWP | RX write pointer — updated by sync domain via `rx_wptr` FIFO |
| 0x18–19 | RRP | RX read pointer — GC writes after consuming frames |
| 0x20–25 | PAR0–5 | MAC address; also forwarded to W5500 as SHAR |
| 0x31 | NWAYS | Hardcode **0x17** (100M full-duplex link up, autoneg complete) |
| 0x3A | HIPR | Hardcode **0x01** (BBA present) |
| 0x48 | TXDATA | GC streams TX frame bytes here |

---

## Module Breakdown

| Module | Domain | File |
|---|---|---|
| `BBATop` | all | `exi_bba/bba_top.py` |
| `ExiCapture` | capture (+exi FIFOs) | `exi_bba/exi_capture.py` |
| `SPIMode3Slave` | capture (param `domain`) | `exi_bba/spi_mode3_slave.py` |
| `BBARegisterFile` | exi (+FIFO to sync) | `exi_bba/bba_register_file.py` |
| `SPRAMArbiter` | sync | `exi_bba/spram_arbiter.py` |
| `RXFrameAssembler` | sync | `exi_bba/rx_frame_assembler.py` |
| `TXFrameDrain` | sync | `exi_bba/tx_frame_drain.py` |
| `W5100ParallelMaster` | sync | `exi_bba/w5100_parallel_master.py` (default eth) |
| `W5500SPIMaster` | sync | `exi_bba/w5500_spi_master.py` (alt eth) |
| `EEPROMModel` | exi | `exi_bba/eeprom_model.py` |

`ExiCapture` wraps `SPIMode3Slave` (in the fast `capture` domain) plus the capture↔exi rx/tx byte AsyncFIFOs. `BBARegisterFile` consumes the rx byte stream and proactively pushes read responses into the tx byte FIFO — it no longer sees the per-bit SPI cadence (that lives entirely in `capture`).

---

## CDC Signal Inventory

| Signal | Direction | Primitive |
|---|---|---|
| EXI CLK / MOSI / CS pins | async → capture | `FFSynchronizer` (stages=2) |
| RX byte stream (capture→core) | capture → exi | `AsyncFIFO` 8-bit, depth=4 |
| TX byte stream (core→capture) | exi → capture | `AsyncFIFO` 8-bit, depth=2 |
| cs_active (transaction in progress) | capture → exi | `FFSynchronizer` (DMA read length) |
| SPRAM read request (addr) | exi → sync | `AsyncFIFO` 16-bit, depth=4 |
| SPRAM read result (data) | sync → exi | `AsyncFIFO` 8-bit, depth=4 |
| TX packet bytes | exi → sync | `AsyncFIFO` 8-bit, depth=16 |
| TX frame length | exi → sync | `AsyncFIFO` 16-bit, depth=4 |
| RX frame bytes | sync → SPRAM | `RXFrameAssembler` → `SPRAMArbiter` (not a byte FIFO; the GC reads frames back out of SPRAM via the SPRAM read req/rsp FIFOs) |
| RWP update | sync → exi | `AsyncFIFO` 8-bit, depth=4 |
| RRP update | exi → sync | `AsyncFIFO` 8-bit, depth=4 |
| RX ready (IR[RI]) | sync → exi | `PulseSynchronizer` |
| TX done (IR[TI]) | sync → exi | `PulseSynchronizer` |
| NCRA reset pulse | exi → sync | `PulseSynchronizer` |

---

## W5500 Configuration (on NCRA reset)

The W5500 selects the register **block** via the BSB field of the control byte, NOT via the address — so register addresses below are **block offsets**, not flat 0x4000-style addresses (see `_W5500_*` and `_CTRL_*` in `w5500_spi_master.py`).

```
1. Write MR     = 0x80   (common block,       offset 0x0000)  software reset
2. Wait ~1 ms
3. Write SHAR   = MAC    (common block,       offset 0x0009, 6 bytes from PAR0–5)
4. Write S0_MR  = 0x04   (socket-0 reg block, offset 0x0000)  MACRAW
5. Write S0_CR  = 0x01   (socket-0 reg block, offset 0x0001)  OPEN
6. Write S0_IMR = 0x05   (socket-0 reg block, offset 0x002C)  RECV | SEND_OK
```

W5500 SPI is **Mode 0** (CPOL=0 CPHA=0); SCK = **12 MHz** (the 24 MHz `sync` domain ÷ 2 via a toggle clock-enable). Connect W5500 `INT_N` to an FPGA input for low-latency RX detection. (The W5500 is the alternate back-end; the W5100 parallel master is the default — see "W5100 vs W5500".)

---

## Physical Interface (SP1 Edge Connector)

- PCB must be **1.2 mm thick, ENIG finish**.
- Staggered (not mirrored) top/bottom contact rows — same geometry as PCI/ISA.
- Derive exact pad geometry from **SP1ETH KiCad project** (silverstee1/SP1ETH), cross-referenced with ETH2SP1 (LaserBear). Do not rely on YAGCD alone.
- Add **100 µF bulk cap** on the interposer near FPGA power pins (3.3 V budget is tight: iCEbreaker ~80 mA + W5500 ~150 mA ≈ 230 mA).
- **Pin 5 is 12 V — do not connect to FPGA I/O.** Test point or leave open.
- `EXTIN` (pin 1): tie to 3.3 V via 10 kΩ — required for GC device enumeration.
- All signal levels are 3.3 V. No level shifting needed.

---

## SPRAM Notes

- iCE40UP5K has 128 KB SPRAM (SB_SPRAM256KA, 16-bit wide).
- **1-cycle synchronous read latency** — result of read at cycle N is valid at N+1.
- Byte writes via `MASKWREN`: lower byte = `0b0011`, upper byte = `0b1100`.
- Address to SPRAM = byte_address >> 1.
- ETH writes take priority over EXI reads in the arbiter (safe by ring-buffer invariant: GC only reads pages the ETH engine has already finished).

---

## GC Initialisation Sequence (Swiss/BBA driver)

```
 1. Write 0x0000 × 2, read 4 B   → must get 0x04020200 (device ID)
 2. Write NCRA = 0x01            (reset, self-clears; resets W5500 + SPRAM ptrs)
 3. Poll NCRA bit 0 until 0      (wait reset complete)
 4. Write PAR0–5                 (MAC address)
 5. Write MAR0–7 = 0xFF          (promiscuous multicast)
 6. Write ANALOG = 0xD6          (enable PHY — no FPGA effect, just store)
 7. Write NWAYC                  (autoneg config — store only)
 8. Write IMR = 0x86             (enable RBFI | TI | RI interrupts)
 9. Write GCA                    (AUTOPUB bit)
10. Write NCRA SR bit = 0x08     (start receive)
11. Poll NWAYS until link up     → return hardcoded 0x17 immediately
```

---

## Implementation Notes & Gotchas

- **`NWAYS` must return `0x17` always.** GC polls it to confirm 100 Mbps link before enabling RX. Do not attempt to reflect real W5500 link status.
- **`EEPROMModel` can be stubbed initially.** Many GC BBA drivers write their own MAC to PAR0–5 rather than using the EEPROM. Pre-populate PAR0–5 reset state with a valid Nintendo OUI MAC (`00:09:BF:xx:xx:xx`).
- **`tx_load` timing in `SPIMode3Slave`:** pulses at CS assertion (first byte) and after each complete received byte. Upstream must register next TX byte within one `exi` clock.
- **PLL target 54 MHz**: verify with `icepll -i 12 -o 54` (DIVR=0 DIVF=71 DIVQ=4) before coding PLL parameters; the capture-domain bit engine oversamples the 27 MHz EXI clock 2×.
- **TX buffer selection (NCRA ST bits):** Ignore buffer select (ST1 vs ST0). Treat any non-zero ST as a TX trigger.
- **If nextpnr fails capture-domain timing at 54 MHz:** the isolated bit engine closes ~91 MHz, so 54 has margin; if a seed fails, sweep seeds (`synth.py --seeds N`) or instruct users to configure Swiss to a lower EXI clock index.
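
As a worked illustration of the 2-byte EXI transaction header described under "Critical Protocol Notes", the sketch below packs and unpacks the bit fields exactly as that section lays them out. It is a minimal model for testbench-style checks, not the `BBARegisterFile` implementation; `exi_header_encode` and `exi_header_decode` are hypothetical names, and the example addresses (NWAYS at 0x31, TXDATA at 0x48) are taken from the register table in this file.

```python
def exi_header_encode(addr, length, write=False):
    """Pack a 13-bit address, 1..4 byte length and write flag into 2 bytes."""
    if not 0 <= addr <= 0x1FFF:
        raise ValueError("address must fit in 13 bits")
    if not 1 <= length <= 4:
        raise ValueError("header length field covers 1..4 bytes")
    byte0 = (0x80 if write else 0x00) | (addr >> 6)   # [7]=write, [6:0]=addr[12:6]
    byte1 = ((addr & 0x3F) << 2) | (length - 1)       # [7:2]=addr[5:0], [1:0]=len-1
    return bytes([byte0, byte1])


def exi_header_decode(header):
    """Inverse of exi_header_encode: returns (write, addr, length)."""
    byte0, byte1 = header[0], header[1]
    write = bool(byte0 & 0x80)
    addr = ((byte0 & 0x7F) << 6) | (byte1 >> 2)
    length = (byte1 & 0x03) + 1
    return write, addr, length


nways_read = exi_header_encode(0x31, 1)                 # 1-byte read of NWAYS
txdata_write = exi_header_encode(0x48, 4, write=True)   # 4-byte write to TXDATA
print(nways_read.hex(), txdata_write.hex())
print(exi_header_decode(txdata_write))
```

The decoder returns the length from the 2-bit field only; for SPRAM-region reads the design streams until CS deasserts, so a model of those reads would ignore this value.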