CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project: GC BBA FPGA Replacement

Replace the GameCube Broadband Adapter (DOL-015 / MX98730EC) with an iCEbreaker FPGA (Lattice iCE40UP5K) written in Amaranth HDL. The FPGA emulates the BBA register interface over the GameCube EXI bus and bridges to a WIZnet ethernet chip for real 100BASE-TX ethernet — default W5100 (indirect parallel bus, reaches the EXI throughput ceiling) or W5500 (SPI Pmod, simpler wiring but ~12 Mbit/s). GC software (Swiss homebrew) sees an identical BBA. See "W5100 vs W5500 ethernet back-end".


Development Environment

Preferred: Use the devcontainer (.devcontainer/) which includes Python 3.12, nextpnr-ice40, and fpga-icestorm pre-installed.

Windows host + WSL2 devcontainer — USB flashing setup:

  1. Install usbipd-win (https://github.com/dorssel/usbipd-win/releases)
  2. Run .devcontainer/attach-icebreaker.ps1 as Administrator before opening the devcontainer
  3. The devcontainer runs --privileged to pass through the USB device

Local venv (outside devcontainer):

python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt

Yosys is bundled in amaranth-yosys; nextpnr-ice40 and iceprog must be installed separately (via apt on Linux, or via the devcontainer).


Commands

Build and flash the iCEbreaker (must run from workspace root):

python rebbarb/rebbarb.py

Runs synthesis (yosys), place-and-route (nextpnr-ice40), and flashes via iceprog. Set ICEPROG=/path/to/iceprog env var to override the binary location. Note: rebbarb/rebbarb.py builds a 36 MHz LED blink demo. The BBA implementation (exi_bba/) uses a split-domain clock: capture @ 54 MHz (PLL) for the SPI bit engine, exi/sync @ 24 MHz (HFOSC) for everything else. Synthesize/flash the real design with python -m exi_bba.synth [--flash].

Run a simulation:

# New-API testbench style (preferred for new code):
python rebbarb/toggle_button.py     # writes ToggleButton.vcd
python rebbarb/pulse_button.py      # writes PulseButton.vcd

# Old-API process style (reference only, do not replicate in new code):
python examples/amaranth_cdc.py     # CDC primitives demo
python examples/async_fifo.py       # AsyncFIFO behaviour
python examples/icebreaker_fifo.py  # iCEbreaker-specific FIFO (Verilog dump)

Open VCD output with gtkwave. Simulations are the primary testing mechanism — there is no separate test runner.

Verify PLL parameters:

icepll -i 12 -o 54    # confirms DIVR=0 DIVF=71 DIVQ=4 → 54 MHz (capture domain)

(exi/sync come from the internal SB_HFOSC ÷2 = 24 MHz — no PLL.)
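
As a cross-check of the numbers above, the following sketch recomputes both clocks by hand. It assumes the PLL runs in SIMPLE feedback mode (the mode icepll reports parameters for) and that SB_HFOSC runs at its nominal 48 MHz; the function name is illustrative and is not taken from the repository.

```python
def ice40_pll_fout_mhz(f_in_mhz: float, divr: int, divf: int, divq: int) -> float:
    """iCE40 PLL output frequency, assuming SIMPLE feedback mode."""
    return f_in_mhz * (divf + 1) / ((divr + 1) * (1 << divq))

# capture domain: 12 MHz reference with the parameters icepll reports
capture_mhz = ice40_pll_fout_mhz(12, divr=0, divf=71, divq=4)

# exi/sync domains: nominal 48 MHz HFOSC divided by 2, no PLL involved
slow_mhz = 48 / 2

print(capture_mhz, slow_mhz)  # 54.0 24.0
```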


Current Implementation State

The exi_bba/ module tree is fully implemented with simulation testbenches. All modules elaborate without errors and pass their unit tests. The full design synthesizes, places, routes, and meets timing on the iCE40UP5K (python -m exi_bba.synth): capture closes at ~70 MHz (target 54) and exi/sync close at ~36 MHz (target 24); both PASS.

exi_bba/ module status

| Module | File | Tests pass |
|---|---|---|
| BBATop | exi_bba/bba_top.py | EXI integration + full W5100→SPRAM→GC RX loop; synth PASS |
| ExiCapture | exi_bba/exi_capture.py | rx/tx byte-stream + over-push/flush |
| SPIMode3Slave | exi_bba/spi_mode3_slave.py | 4 tests (live-drive TX) |
| BBARegisterFile | exi_bba/bba_register_file.py | 7 tests (proactive push + DMA stream) |
| SPRAMArbiter | exi_bba/spram_arbiter.py | 3 tests |
| RXFrameAssembler | exi_bba/rx_frame_assembler.py | 3 tests |
| TXFrameDrain | exi_bba/tx_frame_drain.py | 2 tests |
| W5100ParallelMaster | exi_bba/w5100_parallel_master.py | 5 tests (init/TX/RX vs bus model, incl. ring wrap); default eth back-end |
| W5500SPIMaster | exi_bba/w5500_spi_master.py | init/TX/RX vs SPI-slave model (alt back-end) |
| StatusPanel | exi_bba/status_panel.py | 6 tests (heartbeat, stretched activity LEDs, debounced buttons, freeze) |
| EEPROMModel | exi_bba/eeprom_model.py | 4 tests |

Bring-up status panel (optional): BBATop(status_panel=True) adds a StatusPanel driving onboard iCEbreaker LEDs + button (dedicated pins, so it coexists with EXI + W5100). synth.py enables it: LEDG=heartbeat, LEDR=EXI activity (the GC is talking), RGB red=rx / green=tx / blue=ready (via SB_RGBA_DRV on pins 39/40/41), BTN_N=manual re-init. All 5 panel LEDs are now mapped on the iCEbreaker. The full EXI + W5100 + panel build synthesizes and meets timing (slow ~35≥24, capture ~64≥54, 44% LC).

Ethernet back-end is selectable: BBATop(eth="w5100") (default — indirect parallel bus, reaches the ~27 Mbit/s EXI ceiling) or BBATop(eth="w5500") (SPI, ~12 Mbit/s). Both masters expose the identical tx/rx/init/par streaming interface; only the physical pins differ. See "W5100 vs W5500" below.

Run all module testbenches (from workspace root)

python -m exi_bba.spi_mode3_slave
python -m exi_bba.exi_capture
python -m exi_bba.bba_register_file
python -m exi_bba.spram_arbiter
python -m exi_bba.rx_frame_assembler
python -m exi_bba.tx_frame_drain
python -m exi_bba.w5100_parallel_master   # 5 tests: init, TX(+wrap), RX(+wrap)
python -m exi_bba.w5500_spi_master
python -m exi_bba.status_panel            # 6 tests: heartbeat/activity/buttons
python -m exi_bba.eeprom_model
python -m exi_bba.bba_top        # end-to-end EXI integration test (W5100 RX loop)

Pending work

  • Synthesis/timing: done — python -m exi_bba.synth synthesizes, P&Rs, and meets timing on both clock domains (capture ~68≥54, slow ~40≥24).
  • W5500 init/TX/RX: done — W5500SPIMaster has a real Mode-0 byte engine, a generic register-transaction engine (header + wbuf/stream payload), the full init sequence (MR reset, SHAR, S0_MR MACRAW, S0_CR OPEN, S0_IMR), MACRAW TX (read TX_WR → stream frame to TX buffer → advance TX_WR → SEND) and MACRAW RX (RSR → RD → 2-byte length → stream frame out → advance RD → RECV). All verified on the wire by a responding W5500 SPI-slave model in the testbench.
  • PAR0–5 → W5500 SHAR: done. reg.par is wired to w5500.par in BBATop (PAR0 packed in the low byte so it is the first SHAR octet).
  • NCRA SR bit: done — BBARegisterFile.ncra_sr (= NCRA[3]) gates asm.rx_enabled in BBATop (was hard-wired to 1).
  • W5500 SPI throughput: SCK = sync÷2 = 12 MHz (~12 Mbit/s). This exceeds real-world GC BBA TCP throughput (~6–10 Mbit/s) but is below the 27 Mbit/s raw EXI ceiling. Pushing past 12 Mbit/s was investigated and found NOT achievable on this UP5K (the ~40 MHz limit is distributed across the W5500-operating logic, not confined to the bit-bang); see the "Full-rate W5500 SPI" item below. W5500SPIMaster(clk_div=N) divides SCK further if signal integrity needs it.
  • EXI DMA bulk reads: done — SPRAM-region reads (addr ≥ 0x100) now STREAM until CS deasserts instead of stopping at the header's 2-bit length, so they serve both ≤4-byte immediate reads (Swiss) AND arbitrary-length DMA reads (other GC software, and a future Swiss path for loading ROMs from a network file store). Implementation:
    • SPIMode3Slave.cs_active (synchronised CS level) → ExiCapture crosses it to the exi domain (FFSynchronizer) → BBARegisterFile.cs_active.
    • BBARegisterFile SPRAM_STREAM state: auto-increments the SPRAM address, prefetches up to SP_LIMIT=4 reads in flight, pushes responses to tx_fifo; SPRAM_END drains the in-flight pipeline + rx dummies on CS-rise.
    • ExiCapture flushes tx_fifo on CS-fall to clear prefetch over-push so a truncated DMA read can't leak stale bytes into the next transaction. Tested: register-file streaming read (SPRAM model, 12 bytes), ExiCapture over-push/flush, AND the full BBATop loop — a W5500 model delivers a frame → W5500 master RX → RXFrameAssembler writes the SPRAM ring → GC reads RWP then DMA-reads the descriptor+frame back (verified byte-for-byte). Note: a DMA read header must keep length-1 within the 2-bit field; the GC driver sets it ≤3 and clocks the real length via CS (the design streams until CS regardless). (EXI DMA writes are not implemented; the GC's DMA-write engine has a 1-bit-shift bug and Swiss avoids them — see design-doc §"EXI DMA bug".)
  • S0_IR interrupt clear after RX: done — W5500SPIMaster RX_CLR_IR state writes Sn_IR[2]=1 after RECV so INT_N deasserts (else the FSM would re-enter RX_CHECK forever on real hardware).
  • Full-rate W5500 SPI (27 Mbit/s) — INVESTIGATED, NOT achievable on UP5K: the W5500 SCK is sync÷2 = 12 MHz. Raising it needs the SPI engine on a ≥54 MHz clock, but a standalone synth of W5500SPIMaster in the capture domain closes only 40 MHz — and the slack histogram shows the failure is distributed (~140 endpoints fail 54, incl. the wbuf/header mux feeding the shift register), NOT a single cuttable path. So the bottleneck is the logic that operates the SPI device (transaction FSM, byte sourcing), not the bit-bang. Consequences:
    • The "split the bit engine to capture + per-byte CDC handshake" idea nets only ~14 Mbit/s — the CDC round-trip ≈ the SPI byte time — not worth it.
    • A capture-domain "streaming executor" would still contain that distributed ~40 MHz logic, so it wouldn't close 54 either.
    • Hardware SB_SPI wouldn't help (it only offloads the bit-bang, which was never the bottleneck) and is unsimulatable.
    • There is no usable clock between 24 (HFOSC) and 54 (the one PLL, needed at 54 for the EXI front-end); PLL÷2 = 27 → SCK 13.5 MHz, a ~12% gain, not worth the fabric divider.
    Net: 12 Mbit/s is the practical W5500 ceiling on this part. It exceeds real-world GC BBA TCP throughput and is fine for chunked ROM streaming. Reaching 27 Mbit/s would need a faster FPGA, a much shallower W5500-operating redesign (uncertain), or a parallel-bus ethernet chip (see W5100 below), which is the implemented solution for the ROM-streaming throughput target.

W5100 vs W5500 ethernet back-end

The throughput insight: SPI serialises 8 bits/byte, so the W5500 byte rate is (operating-logic clock)/16 — and that logic caps ~40 MHz on this UP5K → ~12 Mbit/s. A parallel bus moves a whole byte per access, so the same ~24 MHz sync logic clears the 27 Mbit/s EXI ceiling (the real hard limit — the GC EXI bus tops out there). So W5100ParallelMaster is the throughput path and is now the BBATop default.
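
The arithmetic behind that comparison can be sketched as follows. The 24 MHz clock, the clock/16 SPI byte rate, and the 27 Mbit/s ceiling come from the text; strobe_cycles=3 is the bus-engine default described in this section. The gap_cycles parameter is a hypothetical allowance for per-access overhead, so the parallel figure is a rough upper-bound estimate and not a measured rate.

```python
SYNC_MHZ = 24.0          # sync-domain clock (HFOSC / 2)
EXI_CEILING_MBIT = 27.0  # EXI bus ceiling quoted in the text

def spi_mbit(logic_mhz: float) -> float:
    """SPI path: byte rate is (logic clock) / 16, i.e. SCK = clock / 2."""
    byte_rate_mhz = logic_mhz / 16
    return byte_rate_mhz * 8

def parallel_mbit(logic_mhz: float, strobe_cycles: int = 3, gap_cycles: int = 1) -> float:
    """Parallel path: one whole byte per bus access.

    gap_cycles is an assumed idle gap between accesses (hypothetical).
    """
    accesses_per_us = logic_mhz / (strobe_cycles + gap_cycles)
    return accesses_per_us * 8

spi_rate = spi_mbit(SYNC_MHZ)
parallel_rate = parallel_mbit(SYNC_MHZ)
print(spi_rate, spi_rate >= EXI_CEILING_MBIT)            # 12.0 False
print(parallel_rate, parallel_rate >= EXI_CEILING_MBIT)  # 48.0 True
```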

  • Interface: W5100 indirect parallel bus (IDM). Only A[1:0] are wired (board ties A[14:2]=0 so a power-up direct access at A=00 still hits MR): 00=MR, 01=IDM_AR0(hi), 10=IDM_AR1(lo), 11=IDM_DR. A register/buffer access = write IDM_AR (the 16-bit address) then read/write IDM_DR. With MR.AI set, IDM_DR auto-increments → a multi-byte block is one address-set + a burst.
  • Bus engine: drives A + D with /CS and /RD|/WR asserted for strobe_cycles (default 3 ≈ 125 ns at 24 MHz, ≥ the W5100's ~80 ns access). DATA[7:0] is bidirectional → an SB_IO tristate (bus_data_o/oe/i).
  • Pins (15): A[1:0]=2, D[7:0]=8, /CS,/RD,/WR=3, /INT=1, /RST=1. With EXI (5) + clk (1) = 21 of ~34 usable SG48 I/O, which is comfortable. See synth.py.
  • MR.AI requires init first: unlike the W5500 (each SPI transaction is self-framed), the W5100's multi-byte accesses depend on MR.AI, so the init sequence (triggered by the GC's NCRA reset) MUST run before any TX/RX. The BBATop test issues NCRA-reset before its RX loop for this reason; on hardware the GC driver already does. (BBATop(reset_cycles=N) shrinks the MR settle wait for sim.)
  • Ring wraparound is in fabric: the W5100 does NOT auto-wrap the IDM address at the socket-buffer boundary (the W5500 did), so the streamer re-sets IDM_AR to the buffer base when the running address reaches the 2 KB boundary. Handled in the SW/SR/RB paths (xfer_wrap/xfer_wbase/xfer_wend/cur_addr); both TX and RX wrap cases are tested.
  • Register map differs from the W5500: common regs at 0x0000 (MR, SHAR 0x09, IMR 0x16, RMSR/TMSR 0x1A/0x1B), socket 0 at 0x0400 (S0_MR/CR/IR, TX_WR 0x424, RX_RSR 0x426, RX_RD 0x428), TX buffer 0x4000, RX buffer 0x6000. MACRAW mode.
  • Status: init/TX/RX (with wrap) verified vs a bus model; BBATop full W5100→SPRAM→GC RX loop passes byte-for-byte; synth PASS (slow ~32≥24, capture ~56≥54, 42% LC). Register addresses/MR bits are from the datasheet (from memory) — confirm at hardware bring-up.

rebbarb/ components

  • rebbarb.py: blinks LEDs via a PLL (36 MHz), demonstrates IceBreakerPlatform
  • debouncer.py: Debouncer(cycles), synchronous debounce with configurable hold
  • toggle_button.py: ToggleButton, edge-to-toggle state machine (wraps Debouncer)
  • pulse_button.py: PulseButton, single-cycle pulse on rising edge (wraps Debouncer)

These components are reusable building blocks. The Debouncer and button wrappers will be needed for any physical input in exi_bba/.

Import note: rebbarb/ files use bare imports (from debouncer import Debouncer). Run them as python rebbarb/<file>.py from the workspace root so Python adds rebbarb/ to sys.path automatically.

Simulation at module level: toggle_button.py and pulse_button.py run their simulations unconditionally (no __main__ guard) — importing either file triggers a VCD write. New modules should guard simulation code with if __name__ == "__main__":.

examples/amaranth_cdc.py contains handwritten SyncFF and TogglePulseSync reference implementations — use amaranth.lib.cdc primitives (FFSynchronizer, PulseSynchronizer) in production code instead.

hardware/sp1_test_plug/ — KiCad project for a physical SP1 edge-connector test plug (schematic, PCB, custom GameCube symbol library). Used to verify pad geometry before ordering the interposer PCB; not part of the FPGA build.


Amaranth Simulator API

Two API generations are present in this repo:

| API | Where used | Status |
|---|---|---|
| sim.add_testbench(async_fn) + await ctx.tick() + Period(MHz=n) | rebbarb/*.py | Use this for new code |
| sim.add_sync_process(gen_fn) + sim.run_until(t) | examples/ | Old; reference only |

New modules should use the testbench API (add_testbench, sim.write_vcd(ctx) context manager). The old process API still works but is not idiomatic in current Amaranth.

Critical testbench timing rule: ctx.get(signal) reads signal values AFTER the clock edge (post-update registered values). Combinatorial signals that depend on registered signals that were updated by the SAME tick will already reflect the new registered values. For example: if tx_sof = tx_bytes_r_rdy & is_first and is_first is cleared synchronously on the first byte, then reading tx_sof after the first byte's tick always returns 0 — read BEFORE the tick instead.

ctx.set() takes effect immediately (combinatorial, not registered). Use it AFTER await ctx.tick() to prepare inputs for the NEXT tick.

The full design specification lives in docs/gc_bba_fpga_design.md.


Key Architecture Decisions

  • No network stack in the FPGA. The GC CPU runs TCP/IP. The FPGA is a dumb MAC bridge.
  • Split-domain clocking — 3 domains, 2 sources (1 PLL + 1 HFOSC):
    • capture — 54 MHz (PLL, DIVR=0 DIVF=71 DIVQ=4). Hosts ONLY the SPI Mode 3 bit engine inside ExiCapture. 54 MHz = 2× the real 27 MHz EXI clock — the minimum oversampling for clean Mode 3. The isolated bit engine closes ~91 MHz; integrated with the byte-FIFO read path the capture domain closes ~62 MHz, so 54 passes with margin.
    • exi — 24 MHz (HFOSC ÷2). BBA register file / transaction FSM.
    • sync — 24 MHz (same HFOSC net as exi). SPRAM arbiter, RX/TX engines, W5500 SPI master.
    • Why split: only the tiny SPI bit engine needs a fast clock to sample 27 MHz EXI. The bulky register-file/SPRAM/W5500 logic is routing-bound at ~33–44 MHz on the UP5K and only needs the byte rate (27 MHz ÷ 8 ≈ 3.4 MHz). ExiCapture bridges capture↔exi with rx/tx byte AsyncFIFOs.
    • EXI clock reality: the GC EXI clock tops out at ~27 MHz. libogc's EXI_SPEED32MHZ is a nominal name — the real rate is 27 MHz. The old "96 MHz = 3× 32 MHz EXI" target was doubly wrong and unreachable on UP5K (which caps ~44 MHz for non-trivial logic).
    • TX/MISO across the split: the register file PROACTIVELY pushes read responses into the tx byte FIFO during the EXI clock-idle gap (the GC pauses the clock between an EXI_Imm header-write and the data-read). The bit engine drives MISO live from the FIFO head; see ExiCapture / SPIMode3Slave.
  • All CDC via amaranth.lib.cdc. Never pass raw multi-bit signals across domains. Use FFSynchronizer for slow single bits, PulseSynchronizer for events, AsyncFIFO for data streams, ResetSynchronizer for resets.
  • Register file lives entirely in exi domain. The sync domain only communicates through AsyncFIFOs and PulseSynchronizers — never direct register reads/writes.

Critical Protocol Notes

EXI / SPI Mode 3

  • CLK idles HIGH (CPOL=1, CPHA=1).
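  • MOSI is sampled on the rising CLK edge; MISO changes on the falling CLK edge (standard Mode 3: shift on the leading/falling edge, sample on the trailing/rising edge).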
  • Getting this wrong means the GC never enumerates the device.
  • CS is active low, delineates each transaction.

EXI Transaction Header (2 bytes before data)

Byte 0: [7]=write_flag  [6:0]=addr[12:6]
Byte 1: [7:2]=addr[5:0] [1:0]=xfer_len-1  (0=1B … 3=4B)

Full address = 13 bits → 0x0000–0x1FFF.
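
The header layout above can be exercised with a small pure-Python model, which is handy when writing testbench stimulus. This is a minimal sketch under the bit layout given here; the function names are illustrative and are not part of exi_bba/.

```python
def encode_exi_header(addr: int, length: int, write: bool) -> bytes:
    """Pack the 2-byte EXI transaction header described above."""
    if not 0 <= addr <= 0x1FFF:
        raise ValueError("address must fit in 13 bits")
    if not 1 <= length <= 4:
        raise ValueError("the 2-bit length field encodes 1..4 bytes")
    byte0 = (0x80 if write else 0x00) | ((addr >> 6) & 0x7F)  # write flag + addr[12:6]
    byte1 = ((addr & 0x3F) << 2) | (length - 1)               # addr[5:0] + (len - 1)
    return bytes([byte0, byte1])

def decode_exi_header(header: bytes) -> tuple[int, int, bool]:
    """Unpack a 2-byte header into (addr, length, write)."""
    byte0, byte1 = header[0], header[1]
    write = bool(byte0 & 0x80)
    addr = ((byte0 & 0x7F) << 6) | (byte1 >> 2)
    length = (byte1 & 0x03) + 1
    return addr, length, write

# 4-byte write to the TXDATA port at 0x0048
hdr = encode_exi_header(0x0048, 4, write=True)
print(hdr.hex())               # 8123
print(decode_exi_header(hdr))  # (72, 4, True)
```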

Device ID Query

On power-on the GC writes 0x0000 (2 bytes) then reads 4 bytes. Must return: 0x04 0x02 0x02 0x00.


Memory Map (abridged)

| Range | Region |
|---|---|
| 0x0000–0x0033 | MAC control registers (register file, exi domain) |
| 0x0048 | TXDATA: bulk TX data port (→ tx_bytes AsyncFIFO) |
| 0x0100–0x0FFF | RX ring buffer in SPRAM (15 × 256-byte pages, pages 1–15) |
| 0x0100–0x1FFF | any read ≥ 0x0100 streams from SPRAM (DMA path); the ring proper is pages 1–15 above |

Key Registers

| Addr | Name | Notes |
|---|---|---|
| 0x00 | NCRA | [0]=RESET self-clears; pulses ncra_rst to sync domain |
| 0x08 | IMR | Interrupt mask |
| 0x09 | IR | Write-1-to-clear. [1]=RI, [2]=TI. INT_N asserts when IR & IMR ≠ 0 |
| 0x16–17 | RWP | RX write pointer; updated by sync domain via rx_wptr FIFO |
| 0x18–19 | RRP | RX read pointer; GC writes after consuming frames |
| 0x20–25 | PAR0–5 | MAC address; also forwarded to W5500 as SHAR |
| 0x31 | NWAYS | Hardcode 0x17 (100M full-duplex link up, autoneg complete) |
| 0x3A | HIPR | Hardcode 0x01 (BBA present) |
| 0x48 | TXDATA | GC streams TX frame bytes here |
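
The IR/IMR behaviour listed above (write-1-to-clear, INT_N asserted while IR & IMR is non-zero) can be modelled in a few lines. This is a behavioural sketch only; the class and method names are hypothetical and do not mirror BBARegisterFile.

```python
class InterruptRegs:
    """Behavioural model of IR (0x09) and IMR (0x08)."""
    RI = 1 << 1  # receive interrupt
    TI = 1 << 2  # transmit interrupt

    def __init__(self) -> None:
        self.ir = 0
        self.imr = 0

    def raise_event(self, bits: int) -> None:
        """Hardware side: latch one or more interrupt causes."""
        self.ir |= bits & 0xFF

    def write_imr(self, value: int) -> None:
        self.imr = value & 0xFF

    def write_ir(self, value: int) -> None:
        """GC side: each 1 bit written clears the matching IR bit."""
        self.ir &= ~value & 0xFF

    @property
    def int_n(self) -> int:
        """Active-low interrupt line: 0 while any unmasked cause is pending."""
        return 0 if (self.ir & self.imr) else 1

regs = InterruptRegs()
regs.write_imr(0x86)                 # value used in the GC init sequence
regs.raise_event(InterruptRegs.RI)
print(regs.int_n)                    # 0 (asserted)
regs.write_ir(InterruptRegs.RI)
print(regs.int_n)                    # 1 (deasserted)
```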

Module Breakdown

| Module | Domain | File |
|---|---|---|
| BBATop | all | exi_bba/bba_top.py |
| ExiCapture | capture (+exi FIFOs) | exi_bba/exi_capture.py |
| SPIMode3Slave | capture (param domain) | exi_bba/spi_mode3_slave.py |
| BBARegisterFile | exi (+FIFO to sync) | exi_bba/bba_register_file.py |
| SPRAMArbiter | sync | exi_bba/spram_arbiter.py |
| RXFrameAssembler | sync | exi_bba/rx_frame_assembler.py |
| TXFrameDrain | sync | exi_bba/tx_frame_drain.py |
| W5100ParallelMaster | sync | exi_bba/w5100_parallel_master.py (default eth) |
| W5500SPIMaster | sync | exi_bba/w5500_spi_master.py (alt eth) |
| EEPROMModel | exi | exi_bba/eeprom_model.py |

ExiCapture wraps SPIMode3Slave (in the fast capture domain) plus the capture↔exi rx/tx byte AsyncFIFOs. BBARegisterFile consumes the rx byte stream and proactively pushes read responses into the tx byte FIFO — it no longer sees the per-bit SPI cadence (that lives entirely in capture).


CDC Signal Inventory

| Signal | Direction | Primitive |
|---|---|---|
| EXI CLK / MOSI / CS pins | async → capture | FFSynchronizer (stages=2) |
| RX byte stream (capture→core) | capture → exi | AsyncFIFO 8-bit, depth=4 |
| TX byte stream (core→capture) | exi → capture | AsyncFIFO 8-bit, depth=2 |
| cs_active (transaction in progress) | capture → exi | FFSynchronizer (DMA read length) |
| SPRAM read request (addr) | exi → sync | AsyncFIFO 16-bit, depth=4 |
| SPRAM read result (data) | sync → exi | AsyncFIFO 8-bit, depth=4 |
| TX packet bytes | exi → sync | AsyncFIFO 8-bit, depth=16 |
| TX frame length | exi → sync | AsyncFIFO 16-bit, depth=4 |
| RX frame bytes | sync → SPRAM | RXFrameAssembler → SPRAMArbiter (not a byte FIFO; the GC reads frames back out of SPRAM via the SPRAM read req/rsp FIFOs) |
| RWP update | sync → exi | AsyncFIFO 8-bit, depth=4 |
| RRP update | exi → sync | AsyncFIFO 8-bit, depth=4 |
| RX ready (IR[RI]) | sync → exi | PulseSynchronizer |
| TX done (IR[TI]) | sync → exi | PulseSynchronizer |
| NCRA reset pulse | exi → sync | PulseSynchronizer |

W5500 Configuration (on NCRA reset)

The W5500 selects the register block via the BSB field of the control byte, NOT via the address — so register addresses below are block offsets, not flat 0x4000-style addresses (see _W5500_* and _CTRL_* in w5500_spi_master.py).

1. Write MR     = 0x80   (common block, offset 0x0000)  software reset
2. Wait ~1 ms
3. Write SHAR   = MAC     (common block, offset 0x0009, 6 bytes from PAR0–5)
4. Write S0_MR  = 0x04    (socket-0 reg block, offset 0x0000)  MACRAW
5. Write S0_CR  = 0x01    (socket-0 reg block, offset 0x0001)  OPEN
6. Write S0_IMR = 0x05    (socket-0 reg block, offset 0x002C)  RECV | SEND_OK
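
The register writes in the steps above can be expressed as raw SPI frames. The sketch below assumes the W5500 datasheet's frame format (16-bit offset, then a control byte of BSB[4:0] << 3 | RWB << 2 | OM[1:0], variable-length mode OM=00) and block-select values of 0x00 for the common block and 0x01 for the socket-0 register block. The helper names are illustrative and are not the _W5500_* / _CTRL_* constants in w5500_spi_master.py; register values are copied from the list as given.

```python
BSB_COMMON = 0x00  # common register block
BSB_S0_REG = 0x01  # socket-0 register block

def w5500_write_frame(bsb: int, offset: int, payload) -> bytes:
    """Build one variable-length-mode write frame: offset, control, data."""
    control = (bsb << 3) | (1 << 2) | 0b00  # RWB=1 (write), OM=00 (VDM)
    return bytes([offset >> 8, offset & 0xFF, control]) + bytes(payload)

def w5500_init_frames(mac: bytes) -> list[bytes]:
    """Frames for the init sequence; the ~1 ms wait follows the first frame."""
    if len(mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return [
        w5500_write_frame(BSB_COMMON, 0x0000, [0x80]),  # MR: software reset
        w5500_write_frame(BSB_COMMON, 0x0009, mac),     # SHAR: MAC address
        w5500_write_frame(BSB_S0_REG, 0x0000, [0x04]),  # S0_MR: MACRAW
        w5500_write_frame(BSB_S0_REG, 0x0001, [0x01]),  # S0_CR: OPEN
        w5500_write_frame(BSB_S0_REG, 0x002C, [0x05]),  # S0_IMR: value from the list
    ]

# Example MAC with the Nintendo OUI; the low three octets are arbitrary.
for frame in w5500_init_frames(bytes.fromhex("0009bf010203")):
    print(frame.hex())
```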

W5500 SPI is Mode 0 (CPOL=0 CPHA=0); SCK = 12 MHz (the 24 MHz sync domain ÷ 2 via a toggle clock-enable). Connect W5500 INT_N to an FPGA input for low-latency RX detection. (The W5500 is the alternate back-end; the W5100 parallel master is the default — see "W5100 vs W5500".)


Physical Interface (SP1 Edge Connector)

  • PCB must be 1.2 mm thick, ENIG finish.
  • Staggered (not mirrored) top/bottom contact rows — same geometry as PCI/ISA.
  • Derive exact pad geometry from SP1ETH KiCad project (silverstee1/SP1ETH), cross-referenced with ETH2SP1 (LaserBear). Do not rely on YAGCD alone.
  • Add 100 µF bulk cap on the interposer near FPGA power pins (3.3 V budget is tight: iCEbreaker ~80 mA + W5500 ~150 mA ≈ 230 mA).
  • Pin 5 is 12 V — do not connect to FPGA I/O. Test point or leave open.
  • EXTIN (pin 1): tie to 3.3 V via 10 kΩ — required for GC device enumeration.
  • All signal levels are 3.3 V. No level shifting needed.

SPRAM Notes

  • iCE40UP5K has 128 KB SPRAM (SB_SPRAM256KA, 16-bit wide).
  • 1-cycle synchronous read latency — result of read at cycle N is valid at N+1.
  • Byte writes via MASKWREN: lower byte = 0b0011, upper byte = 0b1100.
  • Address to SPRAM = byte_address >> 1.
  • ETH writes take priority over EXI reads in the arbiter (safe by ring-buffer invariant: GC only reads pages the ETH engine has already finished).
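
The byte-write rule above can be sketched as a small helper that turns a byte address into the SPRAM word address, nibble mask, and 16-bit write data. It assumes even byte addresses land in the lower byte lane and odd addresses in the upper lane, which the notes do not state explicitly; the function name is illustrative.

```python
def spram_byte_write(byte_addr: int, value: int) -> tuple[int, int, int]:
    """Return (word_addr, maskwren, wdata) for a single-byte SPRAM write.

    Assumption: bit 0 of the byte address selects the lane
    (0 = lower byte, 1 = upper byte).
    """
    word_addr = byte_addr >> 1
    if byte_addr & 1:
        return word_addr, 0b1100, (value & 0xFF) << 8  # upper byte lane
    return word_addr, 0b0011, value & 0xFF             # lower byte lane

print(spram_byte_write(0x0100, 0xAB))  # (128, 3, 171)
print(spram_byte_write(0x0101, 0xCD))  # (128, 12, 52480)
```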

GC Initialisation Sequence (Swiss/BBA driver)

1.  Write 0x0000 (2 bytes), read 4 B → must get 0x04020200 (device ID)
2.  Write NCRA = 0x01            (reset, self-clears; resets W5500 + SPRAM ptrs)
3.  Poll NCRA bit 0 until 0      (wait reset complete)
4.  Write PAR0–5                 (MAC address)
5.  Write MAR0–7 = 0xFF          (promiscuous multicast)
6.  Write ANALOG = 0xD6          (enable PHY — no FPGA effect, just store)
7.  Write NWAYC                  (autoneg config — store only)
8.  Write IMR = 0x86             (enable RBFI | TI | RI interrupts)
9.  Write GCA (AUTOPUB bit)
10. Write NCRA SR bit = 0x08     (start receive)
11. Poll NWAYS until link up     → return hardcoded 0x17 immediately

Implementation Notes & Gotchas

  • NWAYS must return 0x17 always. GC polls it to confirm 100 Mbps link before enabling RX. Do not attempt to reflect real W5500 link status.
  • EEPROMModel can be stubbed initially. Many GC BBA drivers write their own MAC to PAR0–5 rather than using the EEPROM. Pre-populate the PAR0–5 reset state with a valid Nintendo OUI MAC (00:09:BF:xx:xx:xx).
  • tx_load timing in SPIMode3Slave: pulses at CS assertion (first byte) and after each complete received byte. Upstream must register next TX byte within one exi clock.
  • PLL target 54 MHz: verify with icepll -i 12 -o 54 (DIVR=0 DIVF=71 DIVQ=4) before coding PLL parameters; the capture-domain bit engine oversamples the 27 MHz EXI clock 2×.
  • TX buffer selection (NCRA ST bits): Ignore buffer select (ST1 vs ST0). Treat any non-zero ST as a TX trigger.
  • If nextpnr fails capture-domain timing at 54 MHz: the isolated bit engine closes ~91 MHz, so 54 has margin; if a seed fails, sweep seeds (synth.py --seeds N) or instruct users to configure Swiss to a lower EXI clock index.