CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project: GC BBA FPGA Replacement
Replace the GameCube Broadband Adapter (DOL-015 / MX98730EC) with an iCEbreaker FPGA (Lattice iCE40UP5K) written in Amaranth HDL. The FPGA emulates the BBA register interface over the GameCube EXI bus and bridges to a WIZnet ethernet chip for real 100BASE-TX ethernet — default W5100 (indirect parallel bus, reaches the EXI throughput ceiling) or W5500 (SPI Pmod, simpler wiring but ~12 Mbit/s). GC software (Swiss homebrew) sees an identical BBA. See "W5100 vs W5500 ethernet back-end".
Development Environment
Preferred: Use the devcontainer (.devcontainer/) which includes Python 3.12,
nextpnr-ice40, and fpga-icestorm pre-installed.
Windows host + WSL2 devcontainer — USB flashing setup:
- Install usbipd-win (https://github.com/dorssel/usbipd-win/releases)
- Run .devcontainer/attach-icebreaker.ps1 as Administrator before opening the devcontainer
- The devcontainer runs --privileged to pass through the USB device
Local venv (outside devcontainer):
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
Yosys is bundled in amaranth-yosys; nextpnr-ice40 and iceprog must be
installed separately (via apt on Linux, or via the devcontainer).
Commands
Build and flash the iCEbreaker (must run from workspace root):
python rebbarb/rebbarb.py
Runs synthesis (yosys), place-and-route (nextpnr-ice40), and flashes via iceprog.
Set ICEPROG=/path/to/iceprog env var to override the binary location.
Note: rebbarb/rebbarb.py builds a 36 MHz LED blink demo. The BBA
implementation (exi_bba/) uses a split-domain clock: capture @ 54 MHz (PLL)
for the SPI bit engine, exi/sync @ 24 MHz (HFOSC) for everything else.
Synthesize/flash the real design with python -m exi_bba.synth [--flash].
Run a simulation:
# New-API testbench style (preferred for new code):
python rebbarb/toggle_button.py # writes ToggleButton.vcd
python rebbarb/pulse_button.py # writes PulseButton.vcd
# Old-API process style (reference only, do not replicate in new code):
python examples/amaranth_cdc.py # CDC primitives demo
python examples/async_fifo.py # AsyncFIFO behaviour
python examples/icebreaker_fifo.py # iCEbreaker-specific FIFO (Verilog dump)
Open VCD output with gtkwave. Simulations are the primary testing mechanism —
there is no separate test runner.
Verify PLL parameters:
icepll -i 12 -o 54 # confirms DIVR=0 DIVF=71 DIVQ=4 → 54 MHz (capture domain)
(exi/sync come from the internal SB_HFOSC ÷2 = 24 MHz — no PLL.)
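As a cross-check on the icepll output, the PLL arithmetic can be evaluated directly. This is a minimal sketch that assumes the usual iCE40 SB_PLL40 SIMPLE-feedback relations: F_PFD = F_in/(DIVR+1) in 10–133 MHz, F_VCO = F_PFD × (DIVF+1) in 533–1066 MHz, and F_out = F_VCO / 2^DIVQ. The function name is hypothetical and not part of the repo.

```python
def ice40_pll_out(f_in_mhz: float, divr: int, divf: int, divq: int) -> float:
    """Output frequency (MHz) of an iCE40 SB_PLL40 in SIMPLE feedback mode."""
    f_pfd = f_in_mhz / (divr + 1)        # phase-detector input frequency
    f_vco = f_pfd * (divf + 1)           # VCO frequency
    if not 10.0 <= f_pfd <= 133.0:
        raise ValueError(f"PFD {f_pfd} MHz out of range")
    if not 533.0 <= f_vco <= 1066.0:
        raise ValueError(f"VCO {f_vco} MHz out of range")
    return f_vco / (2 ** divq)           # post-VCO divider

# capture domain: 12 MHz iCEbreaker oscillator -> 54 MHz
print(ice40_pll_out(12, divr=0, divf=71, divq=4))   # 54.0
# exi/sync: SB_HFOSC (48 MHz nominal) divided by 2, no PLL involved
print(48 / 2)                                       # 24.0
```

With DIVF=71 the VCO runs at 864 MHz, which is inside the legal range, and DIVQ=4 divides it by 16.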
Current Implementation State
The exi_bba/ module tree is fully implemented with simulation testbenches.
All modules elaborate without errors and pass their unit tests. The full design
synthesizes, places, routes, and meets timing on the iCE40UP5K
(python -m exi_bba.synth): capture closes ~70 MHz (target 54) and exi/sync
close ~36 MHz (target 24); both PASS.
exi_bba/ module status
| Module | File | Tests pass |
|---|---|---|
| BBATop | exi_bba/bba_top.py | ✅ EXI integration + full W5100→SPRAM→GC RX loop; synth PASS |
| ExiCapture | exi_bba/exi_capture.py | ✅ rx/tx byte-stream + over-push/flush |
| SPIMode3Slave | exi_bba/spi_mode3_slave.py | ✅ 4 tests (live-drive TX) |
| BBARegisterFile | exi_bba/bba_register_file.py | ✅ 7 tests (proactive push + DMA stream) |
| SPRAMArbiter | exi_bba/spram_arbiter.py | ✅ 3 tests |
| RXFrameAssembler | exi_bba/rx_frame_assembler.py | ✅ 3 tests |
| TXFrameDrain | exi_bba/tx_frame_drain.py | ✅ 2 tests |
| W5100ParallelMaster | exi_bba/w5100_parallel_master.py | ✅ 5 tests (init/TX/RX vs bus model, incl. ring wrap) — default eth back-end |
| W5500SPIMaster | exi_bba/w5500_spi_master.py | ✅ init/TX/RX vs SPI-slave model (alt back-end) |
| StatusPanel | exi_bba/status_panel.py | ✅ 6 tests (heartbeat, stretched activity LEDs, debounced buttons, freeze) |
| EEPROMModel | exi_bba/eeprom_model.py | ✅ 4 tests |
Bring-up status panel (optional): BBATop(status_panel=True) adds a
StatusPanel driving onboard iCEbreaker LEDs + button (dedicated pins, so it
coexists with EXI + W5100). synth.py enables it: LEDG=heartbeat,
LEDR=EXI activity (the GC is talking), RGB red=rx / green=tx / blue=ready
(via SB_RGBA_DRV on pins 39/40/41), BTN_N=manual re-init. All 5 panel
LEDs are now mapped on the iCEbreaker. The full EXI + W5100 + panel build
synthesizes and meets timing (slow ~35≥24, capture ~64≥54, 44% LC).
Ethernet back-end is selectable: BBATop(eth="w5100") (default — indirect
parallel bus, reaches the ~27 Mbit/s EXI ceiling) or BBATop(eth="w5500") (SPI,
~12 Mbit/s). Both masters expose the identical tx/rx/init/par streaming
interface; only the physical pins differ. See "W5100 vs W5500" below.
Run all module testbenches (from workspace root)
python -m exi_bba.spi_mode3_slave
python -m exi_bba.exi_capture
python -m exi_bba.bba_register_file
python -m exi_bba.spram_arbiter
python -m exi_bba.rx_frame_assembler
python -m exi_bba.tx_frame_drain
python -m exi_bba.w5100_parallel_master # 5 tests: init, TX(+wrap), RX(+wrap)
python -m exi_bba.w5500_spi_master
python -m exi_bba.status_panel # 6 tests: heartbeat/activity/buttons
python -m exi_bba.eeprom_model
python -m exi_bba.bba_top # end-to-end EXI integration test (W5100 RX loop)
Pending work
- Synthesis/timing: ✅ done. python -m exi_bba.synth synthesizes, places, routes, and meets timing on both clock domains (capture ~68≥54, slow ~40≥24).
- W5500 init/TX/RX: ✅ done. W5500SPIMaster has:
  - a real Mode-0 byte engine;
  - a generic register-transaction engine (header + wbuf/stream payload);
  - the full init sequence (MR reset, SHAR, S0_MR MACRAW, S0_CR OPEN, S0_IMR);
  - MACRAW TX (read TX_WR → stream frame to TX buffer → advance TX_WR → SEND);
  - MACRAW RX (RSR → RD → 2-byte length → stream frame out → advance RD → RECV).

  All of it is verified on the wire against a responding W5500 SPI-slave model in the testbench.
- PAR0–5 → W5500 SHAR: ✅ done. reg.par is wired to w5500.par in BBATop. PAR0 is packed in the low byte so it becomes the first SHAR octet.
- NCRA SR bit: ✅ done. BBARegisterFile.ncra_sr (= NCRA[3]) gates asm.rx_enabled in BBATop (previously hard-wired to 1).
- W5500 SPI throughput: SCK = sync÷2 = 12 MHz (~12 Mbit/s). This exceeds real-world GC BBA TCP throughput (~6–10 Mbit/s) but is below the 27 Mbit/s raw EXI ceiling.
  - Pushing past 12 Mbit/s was investigated and found NOT achievable on this UP5K. The logic that operates the W5500 is distributed and closes only ~40 MHz, so the limit is not just the bit-bang (see the "Full-rate W5500 SPI" item below).
  - W5500SPIMaster(clk_div=N) divides SCK further if signal integrity requires it.
- EXI DMA bulk reads: ✅ done. SPRAM-region reads (addr ≥ 0x100) now STREAM until CS deasserts instead of stopping at the header's 2-bit length. They therefore serve both:
  - ≤4-byte immediate reads (Swiss);
  - arbitrary-length DMA reads (other GC software, and a future Swiss path for loading ROMs from a network file store).

  Implementation:
  - CS path: SPIMode3Slave.cs_active (synchronised CS level) → ExiCapture crosses it to the exi domain (FFSynchronizer) → BBARegisterFile.cs_active.
  - SPRAM_STREAM state (BBARegisterFile): auto-increments the SPRAM address, prefetches up to SP_LIMIT=4 reads in flight, and pushes the responses to tx_fifo.
  - SPRAM_END: drains the in-flight pipeline and the rx dummies on CS-rise.
  - ExiCapture flushes tx_fifo on CS-fall to clear prefetch over-push, so a truncated DMA read can't leak stale bytes into the next transaction.

  Tested:
  - the register-file streaming read (SPRAM model, 12 bytes);
  - the ExiCapture over-push/flush;
  - the full BBATop loop: a W5500 model delivers a frame → W5500 master RX → RXFrameAssembler writes the SPRAM ring → the GC reads RWP, then DMA-reads the descriptor+frame back (verified byte-for-byte).

  Note: a DMA read header must keep length-1 within the 2-bit field. The GC driver sets it ≤3 and clocks the real length via CS; the design streams until CS deasserts regardless. EXI DMA writes are not implemented: the GC's DMA-write engine has a 1-bit-shift bug and Swiss avoids them (see design-doc §"EXI DMA bug").
- S0_IR interrupt clear after RX: ✅ done. The W5500SPIMaster RX_CLR_IR state writes Sn_IR[2]=1 after RECV so INT_N deasserts. Otherwise the FSM would re-enter RX_CHECK forever on real hardware.
- Full-rate W5500 SPI (27 Mbit/s): INVESTIGATED, NOT achievable on UP5K.
  - The W5500 SCK is sync÷2 = 12 MHz. Raising it needs the SPI engine on a ≥54 MHz clock.
  - A standalone synth of W5500SPIMaster in the capture domain closes only 40 MHz.
  - The slack histogram shows the failure is distributed, NOT a single cuttable path: ~140 endpoints fail 54, including the wbuf/header mux feeding the shift register.
  - So the bottleneck is the logic that operates the SPI device (transaction FSM, byte sourcing), not the bit-bang.

  Consequences:
  - The "split the bit engine to capture + per-byte CDC handshake" idea nets only ~14 Mbit/s, because the CDC round-trip ≈ the SPI byte time. Not worth it.
  - A capture-domain "streaming executor" would still contain that distributed ~40 MHz logic, so it wouldn't close 54 either.
  - Hardware SB_SPI wouldn't help. It only offloads the bit-bang, which was never the bottleneck, and it is unsimulatable.
  - There is no usable clock between 24 (HFOSC) and 54 (the one PLL, needed at 54 for the EXI front-end). PLL÷2 = 27 gives SCK 13.5 MHz, a ~12% gain, not worth the fabric divider.

  Net: 12 Mbit/s is the practical W5500 ceiling on this part. It exceeds real-world GC BBA TCP throughput and is fine for chunked ROM streaming. Reaching 27 Mbit/s would need one of:
  - a faster FPGA;
  - a much shallower W5500-operating redesign (uncertain);
  - a parallel-bus ethernet chip (see W5100 below). This is the implemented solution for the ROM-streaming throughput target.
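The PAR0–5 → SHAR byte ordering can be illustrated with a small pure-Python model. This is a sketch only: the real wiring is a 48-bit signal in BBATop, and pack_par/shar_octets are hypothetical helpers. The MAC uses the Nintendo OUI 00:09:BF with arbitrary lower octets.

```python
def pack_par(par: list[int]) -> int:
    """Pack PAR0..PAR5 into one 48-bit value with PAR0 in bits [7:0]."""
    if len(par) != 6:
        raise ValueError("need exactly six PAR octets")
    word = 0
    for i, octet in enumerate(par):
        word |= (octet & 0xFF) << (8 * i)   # PAR<i> -> bits [8i+7:8i]
    return word

def shar_octets(par_word: int) -> bytes:
    """SHAR0..SHAR5 in transmit order: the low byte (PAR0) goes out first."""
    return bytes((par_word >> (8 * i)) & 0xFF for i in range(6))

mac = [0x00, 0x09, 0xBF, 0x12, 0x34, 0x56]   # arbitrary example MAC
word = pack_par(mac)
print(hex(word))                             # 0x563412bf0900
print(shar_octets(word) == bytes(mac))       # True
```

Because PAR0 sits in the low byte, a master that emits the payload low-byte-first writes the MAC to SHAR in the same order the GC wrote it to PAR0–5.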
W5100 vs W5500 ethernet back-end
The throughput insight: SPI serialises 8 bits per byte with SCK = clock/2, so the
W5500 byte rate is (operating-logic clock)/16. That logic closes only ~40 MHz on
this UP5K, too slow for the 54 MHz PLL domain, so it runs on the 24 MHz sync
clock: 24 MHz/16 = 1.5 MB/s ≈ 12 Mbit/s.

A parallel bus moves a whole byte per access. The same ~24 MHz sync logic
therefore clears the 27 Mbit/s EXI ceiling, which is the real hard limit because
the GC EXI bus tops out there. W5100ParallelMaster is the throughput path and is
now the BBATop default.
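The arithmetic above can be checked in a few lines. spi_bps and parallel_bps are illustrative helpers, not repo code. The real number of sync cycles per W5100 byte depends on the bus FSM (strobe_cycles=3 plus whatever setup or turnaround it adds). The sketch therefore computes the break-even cycle count rather than asserting a measured figure.

```python
SYNC_HZ = 24e6           # exi/sync domain clock (SB_HFOSC / 2)
EXI_CEILING_BPS = 27e6   # GC EXI tops out at ~27 MHz, one bit per clock

def spi_bps(logic_hz: float) -> float:
    """W5500 path: SCK = logic clock / 2 and 8 SCK periods per byte."""
    bytes_per_s = logic_hz / 16
    return bytes_per_s * 8

def parallel_bps(logic_hz: float, cycles_per_byte: int) -> float:
    """W5100 path: one whole byte per bus access of cycles_per_byte clocks."""
    return logic_hz / cycles_per_byte * 8

# Largest per-byte bus cost (in sync cycles) that still beats the EXI ceiling.
max_cycles = int(SYNC_HZ * 8 // EXI_CEILING_BPS)

print(spi_bps(SYNC_HZ) / 1e6)   # 12.0  W5500 with SCK = sync/2
print(spi_bps(27e6) / 1e6)      # 13.5  hypothetical PLL/2 = 27 MHz clock
print(max_cycles)               # 7
```

As long as the W5100 engine spends at most 7 sync cycles per byte, the parallel path is no longer the bottleneck and EXI sets the limit.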
- Interface: W5100 indirect parallel bus (IDM). Only A[1:0] are wired; the board ties A[14:2]=0, so a power-up direct access at A=00 still hits MR. Register selects:
  - 00 = MR
  - 01 = IDM_AR0 (hi)
  - 10 = IDM_AR1 (lo)
  - 11 = IDM_DR

  A register/buffer access = write IDM_AR (the 16-bit address), then read/write IDM_DR. With MR.AI set, the IDM address auto-increments on each IDM_DR access, so a multi-byte block is one address-set + a burst.
- Bus engine: drives A + D with /CS and /RD|/WR asserted for strobe_cycles. The default of 3 is ≈ 125 ns at 24 MHz, ≥ the W5100's ~80 ns access. DATA[7:0] is bidirectional, so it uses an SB_IO tristate (bus_data_o/oe/i).
- Pins (15): A[1:0]=2, D[7:0]=8, /CS,/RD,/WR=3, /INT=1, /RST=1. With EXI (5) + clk (1) that is 21 of ~34 usable SG48 I/O, which is comfortable. See synth.py.
- MR.AI requires init first. Unlike the W5500, where each SPI transaction is self-framed, the W5100's multi-byte accesses depend on MR.AI. The init sequence (triggered by the GC's NCRA reset) MUST therefore run before any TX/RX. The BBATop test issues an NCRA reset before its RX loop for this reason; on hardware the GC driver already does. BBATop(reset_cycles=N) shrinks the MR settle wait for sim.
- Ring wraparound is in fabric. The W5100 does NOT auto-wrap the IDM address at the socket-buffer boundary (the W5500 did). The streamer re-sets IDM_AR to the buffer base when the running address reaches the 2 KB boundary. This is handled in the SW/SR/RB paths (xfer_wrap/xfer_wbase/xfer_wend/cur_addr); both TX and RX wrap cases are tested.
- Register map differs from the W5500:
  - common regs at 0x0000: MR, SHAR 0x09, IMR 0x16, RMSR/TMSR 0x1A/0x1B;
  - socket 0 at 0x0400: S0_MR/CR/IR, TX_WR 0x424, RX_RSR 0x426, RX_RD 0x428;
  - TX buffer at 0x4000, RX buffer at 0x6000.

  MACRAW mode.
- Status: init/TX/RX (with wrap) verified vs a bus model; BBATop full W5100→SPRAM→GC RX loop passes byte-for-byte; synth PASS (slow ~32≥24, capture ~56≥54, 42% LC). Register addresses/MR bits are from the datasheet (from memory) — confirm at hardware bring-up.
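The IDM access pattern and the fabric-side wraparound can be sketched as a list of bus writes. This is a behavioural model, not the xfer_wrap/xfer_wbase/xfer_wend/cur_addr logic itself; idm_block_write and the constants are hypothetical. It assumes a 2 KB socket buffer (the TMSR default) and that init has already written MR with AI set. Per the W5100 datasheet, MR.IND must also be set to select indirect mode.

```python
from typing import List, Tuple

# A[1:0] register selects (from the list above)
IDM_AR0, IDM_AR1, IDM_DR = 0b01, 0b10, 0b11

TX_BASE = 0x4000    # socket-0 TX buffer base
BUF_SIZE = 0x0800   # assumed 2 KB socket buffer

def idm_block_write(base: int, size: int, ptr: int, data: bytes) -> List[Tuple[int, int]]:
    """Return (A[1:0], D[7:0]) bus writes that store `data` at ring pointer `ptr`.

    One IDM_AR set per contiguous chunk, then an auto-incrementing IDM_DR
    burst. The address is re-set to `base` at the buffer end because the
    W5100 does not wrap it.
    """
    ops: List[Tuple[int, int]] = []
    offset = ptr & (size - 1)                   # ptr is a free-running counter
    i = 0
    while i < len(data):
        addr = base + offset
        ops += [(IDM_AR0, addr >> 8), (IDM_AR1, addr & 0xFF)]
        chunk = min(len(data) - i, size - offset)   # bytes until the boundary
        ops += [(IDM_DR, b) for b in data[i:i + chunk]]
        i += chunk
        offset = 0                              # any next chunk starts at base
    return ops

ops = idm_block_write(TX_BASE, BUF_SIZE, ptr=0x07F0, data=bytes(range(32)))
print([op for op in ops if op[0] != IDM_DR])
# [(1, 71), (2, 240), (1, 64), (2, 0)]  -> 0x47F0, then wrap to 0x4000
```

A 32-byte write starting 16 bytes before the boundary costs two address-sets instead of one. The rest of the transfer stays a plain burst.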
rebbarb/ — LED blink demo (unchanged)
- rebbarb.py: blinks LEDs via a PLL (36 MHz), demonstrates IceBreakerPlatform
- debouncer.py: Debouncer(cycles), synchronous debounce, configurable hold
- toggle_button.py: ToggleButton, edge-to-toggle state machine (wraps Debouncer)
- pulse_button.py: PulseButton, single-cycle pulse on rising edge (wraps Debouncer)
These components are reusable building blocks. The Debouncer and button wrappers
will be needed for any physical input in exi_bba/.
Import note: rebbarb/ files use bare imports (from debouncer import Debouncer).
Run them as python rebbarb/<file>.py from the workspace root so Python adds
rebbarb/ to sys.path automatically.
Simulation at module level: toggle_button.py and pulse_button.py run
their simulations unconditionally (no __main__ guard) — importing either file
triggers a VCD write. New modules should guard simulation code with
if __name__ == "__main__":.
examples/amaranth_cdc.py contains handwritten SyncFF and TogglePulseSync
reference implementations — use amaranth.lib.cdc primitives (FFSynchronizer,
PulseSynchronizer) in production code instead.
hardware/sp1_test_plug/ — KiCad project for a physical SP1 edge-connector test
plug (schematic, PCB, custom GameCube symbol library). Used to verify pad geometry
before ordering the interposer PCB; not part of the FPGA build.
Amaranth Simulator API
Two API generations are present in this repo:
| API | Where used | Status |
|---|---|---|
| sim.add_testbench(async_fn) + await ctx.tick() + Period(MHz=n) | rebbarb/*.py | Use this for new code |
| sim.add_sync_process(gen_fn) + sim.run_until(t) | examples/ | Old — reference only |
New modules should use the testbench API: add_testbench, plus the
with sim.write_vcd("Name.vcd"): context manager. The old process API still works
but is not idiomatic in current Amaranth.
Critical testbench timing rule: ctx.get(signal) reads signal values AFTER
the clock edge (post-update registered values). Combinatorial signals that depend
on registered signals that were updated by the SAME tick will already reflect the
new registered values. For example: if tx_sof = tx_bytes_r_rdy & is_first and
is_first is cleared synchronously on the first byte, then reading tx_sof after
the first byte's tick always returns 0 — read BEFORE the tick instead.
ctx.set() takes effect immediately (combinatorial, not registered). Use it
AFTER await ctx.tick() to prepare inputs for the NEXT tick.
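The ordering rule can be seen in a plain-Python model of the tx_sof example. This is not Amaranth code: SofModel is a hypothetical stand-in. In a real testbench the reads would be ctx.get(dut.tx_sof), the input write would be ctx.set(...), and the edge would be await ctx.tick().

```python
class SofModel:
    """Models tx_sof = tx_bytes_r_rdy & is_first, where is_first is a
    register cleared synchronously when the first byte is consumed."""

    def __init__(self) -> None:
        self.is_first = 1          # registered: changes only on tick()
        self.tx_bytes_r_rdy = 0    # input: like ctx.set, visible immediately

    @property
    def tx_sof(self) -> int:       # combinational: recomputed on every read
        return self.tx_bytes_r_rdy & self.is_first

    def tick(self) -> None:        # one rising clock edge
        if self.tx_bytes_r_rdy:
            self.is_first = 0

m = SofModel()
m.tx_bytes_r_rdy = 1    # prepare the input for the next edge
before = m.tx_sof       # read BEFORE the tick: sees the first-byte pulse
m.tick()
after = m.tx_sof        # read AFTER the tick: is_first already cleared
print(before, after)    # 1 0
```

The pulse exists only between the input being set and the edge that consumes it. A check for tx_sof therefore has to sit before the await ctx.tick().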
The full design specification lives in docs/gc_bba_fpga_design.md.
Key Architecture Decisions
- No network stack in the FPGA. The GC CPU runs TCP/IP. The FPGA is a dumb MAC bridge.
- Split-domain clocking — 3 domains, 2 sources (1 PLL + 1 HFOSC):
  - capture: 54 MHz (PLL, DIVR=0 DIVF=71 DIVQ=4).
    - Hosts ONLY the SPI Mode 3 bit engine inside ExiCapture.
    - 54 MHz = 2× the real 27 MHz EXI clock, the minimum oversampling for clean Mode 3.
    - The isolated bit engine closes ~91 MHz. Integrated with the byte-FIFO read path, the capture domain closes ~62 MHz, so 54 passes with margin.
  - exi: 24 MHz (HFOSC ÷2). BBA register file / transaction FSM.
  - sync: 24 MHz (same HFOSC net as exi). SPRAM arbiter, RX/TX engines, ethernet master (W5100 or W5500).
- Why split: only the tiny SPI bit engine needs a fast clock to sample the 27 MHz EXI clock. The bulky register-file/SPRAM/ethernet logic is routing-bound at ~33–44 MHz on the UP5K and only needs the byte rate (27 MHz ÷ 8 ≈ 3.4 MHz). ExiCapture bridges capture↔exi with rx/tx byte AsyncFIFOs.
- EXI clock reality: the GC EXI clock tops out at ~27 MHz. libogc's EXI_SPEED32MHZ is a nominal name; the real rate is 27 MHz. The old "96 MHz = 3× 32 MHz EXI" target was doubly wrong and unreachable on the UP5K, which caps ~44 MHz for non-trivial logic.
- TX/MISO across the split: the register file PROACTIVELY pushes read responses into the tx byte FIFO during the EXI clock-idle gap. The GC pauses the clock between an EXI_Imm header-write and the data-read. The bit engine drives MISO live from the FIFO head; see ExiCapture/SPIMode3Slave.
- All CDC goes through Amaranth's standard primitives: amaranth.lib.cdc, plus AsyncFIFO from amaranth.lib.fifo. Never pass raw multi-bit signals across domains. Use:
  - FFSynchronizer for slow single bits;
  - PulseSynchronizer for events;
  - AsyncFIFO for data streams;
  - ResetSynchronizer for resets.
- Register file lives entirely in the exi domain. The sync domain only communicates through AsyncFIFOs and PulseSynchronizers, never through direct register reads/writes.
Critical Protocol Notes
EXI / SPI Mode 3
- CLK idles HIGH (CPOL=1, CPHA=1).
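- MOSI sampled on falling CLK edge. MISO driven on rising CLK edge.
- Caution: the textbook CPOL=1/CPHA=1 definition is the reverse (shift out on the falling edge, sample on the rising edge). Confirm which edges the GC actually uses against SPIMode3Slave and a logic-analyser capture.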
- Getting this wrong means the GC never enumerates the device.
- CS is active low, delineates each transaction.
EXI Transaction Header (2 bytes before data)
Byte 0: [7]=write_flag [6:0]=addr[12:6]
Byte 1: [7:2]=addr[5:0] [1:0]=xfer_len-1 (0=1B … 3=4B)
Full address = 13 bits → 0x0000–0x1FFF.
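The header bit packing can be expressed as a pair of helpers. This is a sketch of the layout above; the function names are hypothetical and not the register file's implementation.

```python
def encode_exi_header(write: bool, addr: int, length: int) -> bytes:
    """Build the 2-byte EXI transaction header (layout above)."""
    if not 0 <= addr <= 0x1FFF:
        raise ValueError("address must fit in 13 bits")
    if not 1 <= length <= 4:
        raise ValueError("immediate length must be 1..4 bytes")
    b0 = (int(write) << 7) | ((addr >> 6) & 0x7F)   # write flag + addr[12:6]
    b1 = ((addr & 0x3F) << 2) | (length - 1)        # addr[5:0] + xfer_len-1
    return bytes([b0, b1])

def decode_exi_header(hdr: bytes) -> tuple[bool, int, int]:
    """Inverse of encode_exi_header: (write, addr, length)."""
    b0, b1 = hdr[0], hdr[1]
    write = bool(b0 >> 7)
    addr = ((b0 & 0x7F) << 6) | (b1 >> 2)
    length = (b1 & 0x03) + 1
    return write, addr, length

print(encode_exi_header(True, 0x0048, 1).hex())   # 8120  (1-byte write to TXDATA)
print(decode_exi_header(bytes([0x81, 0x20])))     # (True, 72, 1)
```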
Device ID Query
On power-on the GC writes 0x0000 (2 bytes) then reads 4 bytes.
Must return: 0x04 0x02 0x02 0x00.
Memory Map (abridged)
| Range | Region |
|---|---|
| 0x0000–0x0033 | MAC control registers (register file, exi domain) |
| 0x0048 | TXDATA — bulk TX data port (→ tx_bytes AsyncFIFO) |
| 0x0100–0x0FFF | RX ring buffer in SPRAM (15 × 256-byte pages, pages 1–15) |
| 0x0100–0x1FFF | any read ≥ 0x0100 streams from SPRAM (DMA path); the ring proper is pages 1–15 above |
Key Registers
| Addr | Name | Notes |
|---|---|---|
| 0x00 | NCRA | [0]=RESET self-clears; pulses ncra_rst to sync domain |
| 0x08 | IMR | Interrupt mask |
| 0x09 | IR | Write-1-to-clear. [1]=RI, [2]=TI. INT_N asserts when IR & IMR ≠ 0 |
| 0x16–17 | RWP | RX write pointer — updated by sync domain via rx_wptr FIFO |
| 0x18–19 | RRP | RX read pointer — GC writes after consuming frames |
| 0x20–25 | PAR0–5 | MAC address; also forwarded to W5500 as SHAR |
| 0x31 | NWAYS | Hardcode 0x17 (100M full-duplex link up, autoneg complete) |
| 0x3A | HIPR | Hardcode 0x01 (BBA present) |
| 0x48 | TXDATA | GC streams TX frame bytes here |
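The IR/IMR semantics in the table can be captured in a small behavioural model. InterruptModel is hypothetical; the real logic lives in BBARegisterFile, with RX-ready and TX-done arriving via PulseSynchronizers.

```python
IR_RI = 1 << 1   # receive interrupt, IR[1]
IR_TI = 1 << 2   # transmit interrupt, IR[2]

class InterruptModel:
    """Behavioural model of IR (write-1-to-clear), IMR and active-low INT_N."""

    def __init__(self) -> None:
        self.ir = 0
        self.imr = 0

    def event(self, bits: int) -> None:
        """Latch an RX-ready / TX-done pulse arriving from the sync domain."""
        self.ir |= bits

    def gc_write_ir(self, value: int) -> None:
        """GC write to IR: every 1 bit clears the matching IR bit."""
        self.ir &= ~value & 0xFF

    @property
    def int_n(self) -> int:
        """INT_N is asserted (0) whenever IR & IMR != 0."""
        return 0 if self.ir & self.imr else 1

m = InterruptModel()
m.imr = IR_RI              # only RX interrupts enabled
m.event(IR_TI)             # TX done: latched in IR but masked
print(m.int_n)             # 1
m.event(IR_RI)             # frame received
print(m.int_n)             # 0
m.gc_write_ir(IR_RI)       # GC acknowledges RI only
print(m.int_n, hex(m.ir))  # 1 0x4
```

Masked bits stay latched in IR, so raising IMR later would assert INT_N for the pending TI immediately.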
Module Breakdown
| Module | Domain | File |
|---|---|---|
| BBATop | all | exi_bba/bba_top.py |
| ExiCapture | capture (+exi FIFOs) | exi_bba/exi_capture.py |
| SPIMode3Slave | capture (param domain) | exi_bba/spi_mode3_slave.py |
| BBARegisterFile | exi (+FIFO to sync) | exi_bba/bba_register_file.py |
| SPRAMArbiter | sync | exi_bba/spram_arbiter.py |
| RXFrameAssembler | sync | exi_bba/rx_frame_assembler.py |
| TXFrameDrain | sync | exi_bba/tx_frame_drain.py |
| W5100ParallelMaster | sync | exi_bba/w5100_parallel_master.py (default eth) |
| W5500SPIMaster | sync | exi_bba/w5500_spi_master.py (alt eth) |
| EEPROMModel | exi | exi_bba/eeprom_model.py |
ExiCapture wraps SPIMode3Slave (in the fast capture domain) plus the
capture↔exi rx/tx byte AsyncFIFOs. BBARegisterFile consumes the rx byte
stream and proactively pushes read responses into the tx byte FIFO — it no
longer sees the per-bit SPI cadence (that lives entirely in capture).
CDC Signal Inventory
| Signal | Direction | Primitive |
|---|---|---|
| EXI CLK / MOSI / CS pins | async → capture | FFSynchronizer (stages=2) |
| RX byte stream (capture→core) | capture → exi | AsyncFIFO 8-bit, depth=4 |
| TX byte stream (core→capture) | exi → capture | AsyncFIFO 8-bit, depth=2 |
| cs_active (transaction in progress) | capture → exi | FFSynchronizer (DMA read length) |
| SPRAM read request (addr) | exi → sync | AsyncFIFO 16-bit, depth=4 |
| SPRAM read result (data) | sync → exi | AsyncFIFO 8-bit, depth=4 |
| TX packet bytes | exi → sync | AsyncFIFO 8-bit, depth=16 |
| TX frame length | exi → sync | AsyncFIFO 16-bit, depth=4 |
| RX frame bytes | sync → SPRAM | RXFrameAssembler → SPRAMArbiter (not a byte FIFO; the GC reads frames back out of SPRAM via the SPRAM read req/rsp FIFOs) |
| RWP update | sync → exi | AsyncFIFO 8-bit, depth=4 |
| RRP update | exi → sync | AsyncFIFO 8-bit, depth=4 |
| RX ready (IR[RI]) | sync → exi | PulseSynchronizer |
| TX done (IR[TI]) | sync → exi | PulseSynchronizer |
| NCRA reset pulse | exi → sync | PulseSynchronizer |
W5500 Configuration (on NCRA reset)
The W5500 selects the register block via the BSB field of the control byte,
NOT via the address — so register addresses below are block offsets, not flat
0x4000-style addresses (see _W5500_* and _CTRL_* in w5500_spi_master.py).
1. Write MR = 0x80 (common block, offset 0x0000) software reset
2. Wait ~1 ms
3. Write SHAR = MAC (common block, offset 0x0009, 6 bytes from PAR0–5)
4. Write S0_MR = 0x04 (socket-0 reg block, offset 0x0000) MACRAW
5. Write S0_CR = 0x01 (socket-0 reg block, offset 0x0001) OPEN
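6. Write S0_IMR = 0x05 (socket-0 reg block, offset 0x002C) RECV | SEND_OK. Check this value: in the W5500 datasheet Sn_IMR/Sn_IR bit 2 = RECV and bit 4 = SEND_OK, so RECV | SEND_OK is 0x14, while 0x05 sets CON | RECV.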
W5500 SPI is Mode 0 (CPOL=0 CPHA=0); SCK = 12 MHz (the 24 MHz sync
domain ÷ 2 via a toggle clock-enable). Connect W5500 INT_N to an FPGA input
for low-latency RX detection. (The W5500 is the alternate back-end; the W5100
parallel master is the default — see "W5100 vs W5500".)
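The init steps can be written out as raw W5500 SPI write frames. This is a sketch assuming the W5500 datasheet frame format: a 16-bit offset, then a control byte with BSB[4:0] in bits 7:3, RWB in bit 2 (1 = write) and OM[1:0] = 00 (variable-length mode), then the data. The constants and helpers are hypothetical stand-ins for the _W5500_*/_CTRL_* definitions. The S0_IMR value is passed in rather than hard-coded.

```python
BSB_COMMON, BSB_S0_REG = 0b00000, 0b00001   # common block, socket-0 register block
RWB_WRITE = 1 << 2                          # control byte bit 2: 1 = write
OM_VDM = 0b00                               # variable-length data mode

def w5500_write_frame(bsb: int, offset: int, data: bytes) -> bytes:
    """One SPI write transaction: offset (MSB first), control byte, payload."""
    control = (bsb << 3) | RWB_WRITE | OM_VDM
    return bytes([offset >> 8, offset & 0xFF, control]) + data

def init_frames(mac: bytes, s0_imr: int) -> list[bytes]:
    """Frames for steps 1 and 3-6 above (the ~1 ms wait of step 2 is omitted)."""
    return [
        w5500_write_frame(BSB_COMMON, 0x0000, bytes([0x80])),    # MR: reset
        w5500_write_frame(BSB_COMMON, 0x0009, mac),              # SHAR <- PAR0-5
        w5500_write_frame(BSB_S0_REG, 0x0000, bytes([0x04])),    # S0_MR: MACRAW
        w5500_write_frame(BSB_S0_REG, 0x0001, bytes([0x01])),    # S0_CR: OPEN
        w5500_write_frame(BSB_S0_REG, 0x002C, bytes([s0_imr])),  # S0_IMR
    ]

for frame in init_frames(bytes([0x00, 0x09, 0xBF, 0x12, 0x34, 0x56]), s0_imr=0x05):
    print(frame.hex(" "))
# first line: 00 00 04 80   (common block write, MR = 0x80)
```

Because the block is selected in the control byte, the same 0x0000 offset reaches MR in the common block or S0_MR in the socket-0 block.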
Physical Interface (SP1 Edge Connector)
- PCB must be 1.2 mm thick, ENIG finish.
- Staggered (not mirrored) top/bottom contact rows — same geometry as PCI/ISA.
- Derive exact pad geometry from SP1ETH KiCad project (silverstee1/SP1ETH), cross-referenced with ETH2SP1 (LaserBear). Do not rely on YAGCD alone.
- Add 100 µF bulk cap on the interposer near FPGA power pins (3.3 V budget is tight: iCEbreaker ~80 mA + W5500 ~150 mA ≈ 230 mA).
- Pin 5 is 12 V — do not connect to FPGA I/O. Test point or leave open.
- EXTIN (pin 1): tie to 3.3 V via 10 kΩ; required for GC device enumeration.
- All signal levels are 3.3 V. No level shifting needed.
SPRAM Notes
- iCE40UP5K has 128 KB SPRAM (SB_SPRAM256KA, 16-bit wide).
- 1-cycle synchronous read latency — result of read at cycle N is valid at N+1.
- Byte writes via MASKWREN: lower byte = 0b0011, upper byte = 0b1100.
- Address to SPRAM = byte_address >> 1.
- ETH writes take priority over EXI reads in the arbiter (safe by ring-buffer invariant: GC only reads pages the ETH engine has already finished).
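The byte-write rules above can be modelled in pure Python. The sketch ignores the 1-cycle read latency. It assumes that an even byte address maps to the lower byte lane and an odd one to the upper lane; the notes above do not state the lane order. nibble_mask, spram_byte_write and spram_byte_read are hypothetical helpers.

```python
MASK_LO, MASK_HI = 0b0011, 0b1100   # MASKWREN: one enable bit per DATAIN nibble

def nibble_mask(maskwren: int) -> int:
    """Expand a 4-bit MASKWREN into the 16 DATAIN bits it allows to be written."""
    bits = 0
    for nibble in range(4):
        if (maskwren >> nibble) & 1:
            bits |= 0xF << (4 * nibble)
    return bits

def spram_byte_write(mem: list[int], byte_addr: int, value: int) -> None:
    """Model one SB_SPRAM256KA write that updates a single byte lane."""
    word_addr = byte_addr >> 1              # SPRAM address = byte_address >> 1
    upper = byte_addr & 1                   # assumed: odd address -> upper lane
    datain = (value & 0xFF) << (8 * upper)  # place the byte on the chosen lane
    written = nibble_mask(MASK_HI if upper else MASK_LO)
    mem[word_addr] = (mem[word_addr] & ~written & 0xFFFF) | (datain & written)

def spram_byte_read(mem: list[int], byte_addr: int) -> int:
    """Read the 16-bit word, then select the byte lane."""
    return (mem[byte_addr >> 1] >> (8 * (byte_addr & 1))) & 0xFF

mem = [0] * 16                 # tiny stand-in for one 16K x 16 SPRAM block
spram_byte_write(mem, 2, 0xAB)
spram_byte_write(mem, 3, 0xCD)
print(hex(mem[1]))             # 0xcdab: both bytes share word 1
```

The masked write leaves the other lane untouched, so the two bytes can be written independently without a read-modify-write cycle.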
GC Initialisation Sequence (Swiss/BBA driver)
1. Write 0x0000 × 2, read 4 B → must get 0x04020200 (device ID)
2. Write NCRA = 0x01 (reset, self-clears; resets the W5100/W5500 and the SPRAM pointers)
3. Poll NCRA bit 0 until 0 (wait reset complete)
4. Write PAR0–5 (MAC address)
5. Write MAR0–7 = 0xFF (promiscuous multicast)
6. Write ANALOG = 0xD6 (enable PHY — no FPGA effect, just store)
7. Write NWAYC (autoneg config — store only)
8. Write IMR = 0x86 (enable RBFI | TI | RI interrupts)
9. Write GCA (AUTOPUB bit)
10. Write NCRA SR bit = 0x08 (start receive)
11. Poll NWAYS until link up → return hardcoded 0x17 immediately
Implementation Notes & Gotchas
- NWAYS must return 0x17 always. The GC polls it to confirm a 100 Mbps link before enabling RX. Do not attempt to reflect the real W5100/W5500 link status.
- EEPROMModel can be stubbed initially. Many GC BBA drivers write their own MAC to PAR0–5 rather than using the EEPROM. Pre-populate the PAR0–5 reset state with a valid Nintendo OUI MAC (00:09:BF:xx:xx:xx).
- tx_load timing in SPIMode3Slave: pulses at CS assertion (first byte) and after each complete received byte. Upstream must register the next TX byte within one exi clock.
- PLL target 54 MHz: verify with icepll -i 12 -o 54 (DIVR=0 DIVF=71 DIVQ=4) before coding PLL parameters. The capture-domain bit engine oversamples the 27 MHz EXI clock 2×.
- TX buffer selection (NCRA ST bits): ignore buffer select (ST1 vs ST0). Treat any non-zero ST as a TX trigger.
- If nextpnr fails capture-domain timing at 54 MHz: the isolated bit engine closes ~91 MHz, so 54 has margin. If a seed fails, sweep seeds (synth.py --seeds N) or instruct users to configure Swiss to a lower EXI clock index.