Added full design created with Claude
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project: GC BBA FPGA Replacement

Replace the GameCube Broadband Adapter (DOL-015 / MX98730EC) with an iCEbreaker
FPGA (Lattice iCE40UP5K) written in Amaranth HDL. The FPGA emulates the BBA
register interface over the GameCube EXI bus and bridges to a WIZnet ethernet
chip for real 100BASE-TX ethernet: the default **W5100** (indirect parallel bus,
reaches the EXI throughput ceiling) or the **W5500** (SPI Pmod, simpler wiring but
~12 Mbit/s). GC software (Swiss homebrew) sees an identical BBA. See "W5100 vs
W5500 ethernet back-end".

---

## Development Environment

**Preferred:** Use the devcontainer (`.devcontainer/`), which includes Python 3.12,
`nextpnr-ice40`, and `fpga-icestorm` pre-installed.

**Windows host + WSL2 devcontainer, USB flashing setup:**
1. Install `usbipd-win` (https://github.com/dorssel/usbipd-win/releases)
2. Run `.devcontainer/attach-icebreaker.ps1` as Administrator before opening the devcontainer
3. The devcontainer runs `--privileged` to pass through the USB device

**Local venv (outside devcontainer):**
```bash
python -m venv .venv
source .venv/bin/activate      # Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

Yosys is bundled in `amaranth-yosys`; `nextpnr-ice40` and `iceprog` must be
installed separately (via apt on Linux, or via the devcontainer).

---

## Commands

**Build and flash the iCEbreaker (must run from workspace root):**
```bash
python rebbarb/rebbarb.py
```
Runs synthesis (yosys), place-and-route (nextpnr-ice40), and flashes via `iceprog`.
Set `ICEPROG=/path/to/iceprog` env var to override the binary location.
Note: `rebbarb/rebbarb.py` builds a 36 MHz LED blink demo. The BBA
implementation (`exi_bba/`) uses a split-domain clock: `capture` @ 54 MHz (PLL)
for the SPI bit engine, `exi`/`sync` @ 24 MHz (HFOSC) for everything else.
Synthesize/flash the real design with `python -m exi_bba.synth [--flash]`.

**Run a simulation:**
```bash
# New-API testbench style (preferred for new code):
python rebbarb/toggle_button.py    # writes ToggleButton.vcd
python rebbarb/pulse_button.py     # writes PulseButton.vcd

# Old-API process style (reference only, do not replicate in new code):
python examples/amaranth_cdc.py    # CDC primitives demo
python examples/async_fifo.py      # AsyncFIFO behaviour
python examples/icebreaker_fifo.py # iCEbreaker-specific FIFO (Verilog dump)
```
Open VCD output with `gtkwave`. Simulations are the primary testing mechanism;
there is no separate test runner.

**Verify PLL parameters:**
```bash
icepll -i 12 -o 54   # confirms DIVR=0 DIVF=71 DIVQ=4 → 54 MHz (capture domain)
```
(`exi`/`sync` come from the internal SB_HFOSC ÷2 = 24 MHz, with no PLL.)
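
The divider values can also be cross-checked by hand. The sketch below assumes the PLL runs in its simple feedback mode, where the output is `F_ref × (DIVF + 1) / ((DIVR + 1) × 2^DIVQ)`, and that SB_HFOSC is the nominal 48 MHz oscillator. The function name `ice40_pll_mhz` is illustrative and is not part of this repository.

```python
def ice40_pll_mhz(f_ref_mhz: float, divr: int, divf: int, divq: int) -> float:
    """iCE40 PLL output frequency in MHz, assuming SIMPLE feedback mode."""
    return f_ref_mhz * (divf + 1) / ((divr + 1) * (1 << divq))


# capture domain: 12 MHz board oscillator through the PLL
capture_mhz = ice40_pll_mhz(12, divr=0, divf=71, divq=4)

# exi/sync domains: nominal 48 MHz SB_HFOSC divided by 2 (no PLL involved)
slow_mhz = 48 / 2

print(f"capture = {capture_mhz} MHz, exi/sync = {slow_mhz} MHz")
```
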

---

## Current Implementation State

The `exi_bba/` module tree is **fully implemented** with simulation testbenches.
All modules elaborate without errors and pass their unit tests. The full design
**synthesizes, places, routes, and meets timing** on the iCE40UP5K
(`python -m exi_bba.synth`): `capture` closes ~70 MHz (target 54) and `exi`/
`sync` close ~36 MHz (target 24); both PASS.

### `exi_bba/` module status

| Module | File | Tests pass |
|---|---|---|
| `BBATop` | `exi_bba/bba_top.py` | ✅ EXI integration + full W5100→SPRAM→GC RX loop; synth PASS |
| `ExiCapture` | `exi_bba/exi_capture.py` | ✅ rx/tx byte-stream + over-push/flush |
| `SPIMode3Slave` | `exi_bba/spi_mode3_slave.py` | ✅ 4 tests (live-drive TX) |
| `BBARegisterFile` | `exi_bba/bba_register_file.py` | ✅ 7 tests (proactive push + DMA stream) |
| `SPRAMArbiter` | `exi_bba/spram_arbiter.py` | ✅ 3 tests |
| `RXFrameAssembler` | `exi_bba/rx_frame_assembler.py` | ✅ 3 tests |
| `TXFrameDrain` | `exi_bba/tx_frame_drain.py` | ✅ 2 tests |
| `W5100ParallelMaster` | `exi_bba/w5100_parallel_master.py` | ✅ 5 tests (init/TX/RX vs bus model, incl. ring wrap); **default eth back-end** |
| `W5500SPIMaster` | `exi_bba/w5500_spi_master.py` | ✅ init/TX/RX vs SPI-slave model (alt back-end) |
| `StatusPanel` | `exi_bba/status_panel.py` | ✅ 6 tests (heartbeat, stretched activity LEDs, debounced buttons, freeze) |
| `EEPROMModel` | `exi_bba/eeprom_model.py` | ✅ 4 tests |

**Bring-up status panel (optional):** `BBATop(status_panel=True)` adds a
`StatusPanel` driving onboard iCEbreaker LEDs + button (dedicated pins, so it
coexists with EXI + W5100). `synth.py` enables it: **LEDG=heartbeat**,
**LEDR=EXI activity** (the GC is talking), **BTN_N=manual re-init**. The full
EXI + W5100 + panel build synthesizes and meets timing (slow ~35≥24, capture
~64≥54, 44% LC). Panel LEDs 3–5 (rx/tx/ready) exist in the module but aren't
mapped on the iCEbreaker (only 2 discrete LEDs); the onboard RGB or a custom
PCB can expose them.

**Ethernet back-end is selectable:** `BBATop(eth="w5100")` (default; indirect
parallel bus, reaches the ~27 Mbit/s EXI ceiling) or `BBATop(eth="w5500")` (SPI,
~12 Mbit/s). Both masters expose the identical tx/rx/init/par streaming
interface; only the physical pins differ. See "W5100 vs W5500" below.

### Run all module testbenches (from workspace root)
```bash
python -m exi_bba.spi_mode3_slave
python -m exi_bba.exi_capture
python -m exi_bba.bba_register_file
python -m exi_bba.spram_arbiter
python -m exi_bba.rx_frame_assembler
python -m exi_bba.tx_frame_drain
python -m exi_bba.w5100_parallel_master   # 5 tests: init, TX(+wrap), RX(+wrap)
python -m exi_bba.w5500_spi_master
python -m exi_bba.status_panel            # 6 tests: heartbeat/activity/buttons
python -m exi_bba.eeprom_model
python -m exi_bba.bba_top                 # end-to-end EXI integration test (W5100 RX loop)
```

### Pending work
- **Synthesis/timing**: ✅ done. `python -m exi_bba.synth` synthesizes, P&Rs,
  and meets timing on both clock domains (capture ~68≥54, slow ~40≥24).
- **W5500 init/TX/RX**: ✅ done. `W5500SPIMaster` has a real Mode-0 byte engine,
  a generic register-transaction engine (header + wbuf/stream payload), the full
  init sequence (MR reset, SHAR, S0_MR MACRAW, S0_CR OPEN, S0_IMR), MACRAW TX
  (read TX_WR → stream frame to TX buffer → advance TX_WR → SEND) and MACRAW RX
  (RSR → RD → 2-byte length → stream frame out → advance RD → RECV). All verified
  on the wire by a responding W5500 SPI-slave model in the testbench.
- **PAR0–5 → W5500 SHAR**: ✅ done. `reg.par` is wired to `w5500.par` in `BBATop`
  (PAR0 packed in the low byte so it is the first SHAR octet).
- **NCRA SR bit**: ✅ done. `BBARegisterFile.ncra_sr` (= NCRA[3]) gates
  `asm.rx_enabled` in `BBATop` (was hard-wired to 1).
- **W5500 SPI throughput**: SCK = sync÷2 = 12 MHz (~12 Mbit/s). This exceeds
  real-world GC BBA TCP throughput (~6–10 Mbit/s) but is below the 27 Mbit/s raw
  EXI ceiling. Pushing past 12 Mbit/s was investigated and found NOT achievable
  on this UP5K: the W5500-operating logic closes only ~40 MHz and the failing
  paths are distributed across it, not confined to the bit-bang. See the
  "Full-rate W5500 SPI" item below.
  `W5500SPIMaster(clk_div=N)` divides SCK further if signal integrity needs it.
- **EXI DMA bulk reads**: ✅ done. SPRAM-region reads (addr ≥ 0x100) now STREAM
  until CS deasserts instead of stopping at the header's 2-bit length, so they
  serve both ≤4-byte immediate reads (Swiss) AND arbitrary-length DMA reads
  (other GC software, and a future Swiss path for loading ROMs from a network
  file store). Implementation:
  - `SPIMode3Slave.cs_active` (synchronised CS level) → `ExiCapture` crosses it
    to the exi domain (FFSynchronizer) → `BBARegisterFile.cs_active`.
  - `BBARegisterFile` SPRAM_STREAM state: auto-increments the SPRAM address,
    prefetches up to SP_LIMIT=4 reads in flight, pushes responses to tx_fifo;
    SPRAM_END drains the in-flight pipeline + rx dummies on CS-rise.
  - `ExiCapture` flushes tx_fifo on CS-fall to clear prefetch over-push so a
    truncated DMA read can't leak stale bytes into the next transaction.

  Tested: register-file streaming read (SPRAM model, 12 bytes), ExiCapture
  over-push/flush, AND the full BBATop loop: a W5500 model delivers a frame →
  W5500 master RX → RXFrameAssembler writes the SPRAM ring → GC reads RWP then
  DMA-reads the descriptor+frame back (verified byte-for-byte).
  Note: a DMA read header must keep length-1 within the 2-bit field; the GC
  driver sets it ≤3 and clocks the real length via CS (the design streams
  until CS deasserts regardless). (EXI DMA *writes* are not implemented; the GC's
  DMA-write engine has a 1-bit-shift bug and Swiss avoids them. See
  design-doc §"EXI DMA bug".)
- **S0_IR interrupt clear after RX**: ✅ done. The `W5500SPIMaster` RX_CLR_IR state
  writes Sn_IR[2]=1 after RECV so `INT_N` deasserts (else the FSM would re-enter
  RX_CHECK forever on real hardware).
- **Full-rate W5500 SPI (27 Mbit/s): INVESTIGATED, NOT achievable on UP5K.**
  The W5500 SCK is sync÷2 = 12 MHz. Raising it needs the SPI engine on a ≥54 MHz
  clock, but a standalone synth of `W5500SPIMaster` in the capture domain closes
  only **40 MHz**, and the slack histogram shows the failure is *distributed*
  (~140 endpoints fail 54, incl. the `wbuf`/header mux feeding the shift
  register), NOT a single cuttable path. So the bottleneck is the **logic that
  operates the SPI device** (transaction FSM, byte sourcing), not the bit-bang.
  Consequences:
  - The "split the bit engine to capture + per-byte CDC handshake" idea nets
    only ~14 Mbit/s (the CDC round-trip ≈ the SPI byte time), which is not
    worth it.
  - A capture-domain "streaming executor" would still contain that distributed
    ~40 MHz logic, so it wouldn't close 54 either.
  - Hardware `SB_SPI` wouldn't help (it only offloads the bit-bang, which was
    never the bottleneck) and is unsimulatable.
  - There is no usable clock between 24 (HFOSC) and 54 (the one PLL, needed at
    54 for the EXI front-end); PLL÷2 = 27 → SCK 13.5 MHz, a ~12% gain, not
    worth the fabric divider.

  Net: 12 Mbit/s is the practical W5500 ceiling on this part. It exceeds
  real-world GC BBA TCP throughput and is fine for chunked ROM streaming.
  Reaching 27 Mbit/s would need a faster FPGA or a much shallower W5500-operating
  redesign (uncertain), **OR a parallel-bus ethernet chip (see W5100 below)**,
  which is the implemented solution for the ROM-streaming throughput target.

## W5100 vs W5500 ethernet back-end

The throughput insight: SPI serialises 8 bits/byte and SCK is half the logic
clock, so the W5500 byte rate is (operating-logic clock)/16. That logic closes
only ~40 MHz on this UP5K, so it has to run on the 24 MHz `sync` clock:
24 MHz / 16 = 1.5 MB/s ≈ 12 Mbit/s. A **parallel** bus moves a whole byte per
access, so the *same* ~24 MHz `sync` logic clears the 27 Mbit/s EXI ceiling
(the real hard limit, since the GC EXI bus tops out there). So
`W5100ParallelMaster` is the throughput path and is now the `BBATop` default.
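
The arithmetic behind that comparison can be sketched as follows. The 24 MHz and 27 Mbit/s figures come from this document; the function names and the `cycles_per_byte` parameter are illustrative assumptions, since the real per-byte cost of the parallel engine depends on its strobe timing and FSM overhead.

```python
EXI_MBIT = 27.0   # EXI ceiling quoted in this document
SYNC_MHZ = 24.0   # `sync` domain clock (HFOSC / 2)


def spi_mbit(logic_mhz: float, sck_div: int = 2) -> float:
    """SPI payload rate: one bit per SCK cycle, SCK = logic clock / sck_div."""
    return logic_mhz / sck_div


def parallel_mbit(logic_mhz: float, cycles_per_byte: int) -> float:
    """Parallel payload rate: 8 bits every `cycles_per_byte` logic cycles."""
    return 8 * logic_mhz / cycles_per_byte


def max_cycles_per_byte(logic_mhz: float, target_mbit: float) -> float:
    """Cycle budget per byte needed to sustain `target_mbit`."""
    return 8 * logic_mhz / target_mbit


print(f"SPI at {SYNC_MHZ} MHz logic: {spi_mbit(SYNC_MHZ)} Mbit/s")
print(f"budget to reach EXI ceiling: {max_cycles_per_byte(SYNC_MHZ, EXI_MBIT):.2f} cycles/byte")
```

Under these assumptions a parallel engine stays at or above the EXI ceiling as long as it spends no more than 7 `sync` cycles per byte on average, whereas SPI is fixed at 16 cycles per byte.
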

- **Interface:** W5100 **indirect parallel bus** (IDM). Only A[1:0] are wired
  (board ties A[14:2]=0 so a power-up direct access at A=00 still hits MR):
  `00`=MR, `01`=IDM_AR0(hi), `10`=IDM_AR1(lo), `11`=IDM_DR. A register/buffer
  access = write IDM_AR (the 16-bit address) then read/write IDM_DR. With MR.AI
  set, IDM_DR auto-increments → a multi-byte block is one address-set + a burst.
- **Bus engine:** drives A + D with `/CS` and `/RD`|`/WR` asserted for
  `strobe_cycles` (default 3 ≈ 125 ns at 24 MHz, ≥ the W5100's ~80 ns access).
  DATA[7:0] is bidirectional → an SB_IO tristate (`bus_data_o`/`oe`/`i`).
- **Pins (15):** A[1:0]=2, D[7:0]=8, /CS,/RD,/WR=3, /INT=1, /RST=1. With EXI (5)
  + clk (1) = **21 of ~34 usable SG48 I/O**, which is comfortable. See `synth.py`.
- **MR.AI requires init first:** unlike the W5500 (each SPI transaction is
  self-framed), the W5100's multi-byte accesses depend on MR.AI, so the init
  sequence (triggered by the GC's NCRA reset) MUST run before any TX/RX. The
  BBATop test issues NCRA-reset before its RX loop for this reason; on hardware
  the GC driver already does. (`BBATop(reset_cycles=N)` shrinks the MR settle
  wait for sim.)
- **Ring wraparound is in fabric:** the W5100 does NOT auto-wrap the IDM address
  at the socket-buffer boundary (the W5500 did), so the streamer re-sets IDM_AR
  to the buffer base when the running address reaches the 2 KB boundary. Handled
  in the SW/SR/RB paths (`xfer_wrap`/`xfer_wbase`/`xfer_wend`/`cur_addr`); both
  TX and RX wrap cases are tested.
- **Register map differs from the W5500:** common regs at 0x0000 (MR, SHAR 0x09,
  IMR 0x16, RMSR/TMSR 0x1A/0x1B), socket 0 at 0x0400 (S0_MR/CR/IR, TX_WR 0x424,
  RX_RSR 0x426, RX_RD 0x428), TX buffer 0x4000, RX buffer 0x6000. MACRAW mode.
- **Status:** init/TX/RX (with wrap) verified vs a bus model; BBATop full
  W5100→SPRAM→GC RX loop passes byte-for-byte; synth PASS (slow ~32≥24, capture
  ~56≥54, 42% LC). Register addresses/MR bits are from the datasheet (from
  memory); **confirm at hardware bring-up**.
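
The indirect-bus access pattern and the fabric-side ring wrap described above can be modelled in a few lines. This is a behavioural sketch only, assuming MR.AI has already been set by the init sequence; `idm_read_ops` and its tuple format are hypothetical and do not mirror the signals in `w5100_parallel_master.py`.

```python
# A[1:0] selects, as listed above
IDM_AR0, IDM_AR1, IDM_DR = 0b01, 0b10, 0b11


def idm_read_ops(start: int, length: int, base: int, size: int = 0x800):
    """Return the bus accesses for an auto-increment block read.

    Each entry is (kind, a, value): ("W", reg, byte) for an address-register
    write, or ("R", IDM_DR, addr) for a data read, where `addr` is the chip
    address that read is expected to serve. `base`/`size` describe the socket
    buffer (2 KB assumed by default).
    """
    ops = []
    end = base + size
    addr = start
    need_addr = True                      # IDM_AR must be loaded before the first read
    for _ in range(length):
        if need_addr:
            ops.append(("W", IDM_AR0, addr >> 8))      # high address byte
            ops.append(("W", IDM_AR1, addr & 0xFF))    # low address byte
            need_addr = False
        ops.append(("R", IDM_DR, addr))   # MR.AI bumps the chip's address after this
        addr += 1
        if addr == end:                   # no auto-wrap on the chip: reload the base
            addr = base
            need_addr = True
    return ops


# 4-byte read that straddles the end of the RX buffer at 0x6000..0x67FF
for op in idm_read_ops(0x67FE, 4, base=0x6000):
    print(op)
```

A read that does not cross the boundary costs two address writes plus one data access per byte; a wrap adds exactly two more address writes.
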

### `rebbarb/`: LED blink demo (unchanged)
- `rebbarb.py`: blinks LEDs via a PLL (36 MHz), demonstrates `IceBreakerPlatform`
- `debouncer.py`: `Debouncer(cycles)`, synchronous debounce, configurable hold
- `toggle_button.py`: `ToggleButton`, edge-to-toggle state machine (wraps Debouncer)
- `pulse_button.py`: `PulseButton`, single-cycle pulse on rising edge (wraps Debouncer)

These components are reusable building blocks. The `Debouncer` and button wrappers
will be needed for any physical input in `exi_bba/`.

**Import note:** `rebbarb/` files use bare imports (`from debouncer import Debouncer`).
Run them as `python rebbarb/<file>.py` from the workspace root so Python adds
`rebbarb/` to `sys.path` automatically.

**Simulation at module level:** `toggle_button.py` and `pulse_button.py` run
their simulations unconditionally (no `__main__` guard), so importing either file
triggers a VCD write. New modules should guard simulation code with
`if __name__ == "__main__":`.

`examples/amaranth_cdc.py` contains handwritten `SyncFF` and `TogglePulseSync`
reference implementations; use `amaranth.lib.cdc` primitives (`FFSynchronizer`,
`PulseSynchronizer`) in production code instead.

`hardware/sp1_test_plug/`: KiCad project for a physical SP1 edge-connector test
plug (schematic, PCB, custom GameCube symbol library). Used to verify pad geometry
before ordering the interposer PCB; not part of the FPGA build.

---

## Amaranth Simulator API

Two API generations are present in this repo:

| API | Where used | Status |
|---|---|---|
| `sim.add_testbench(async_fn)` + `await ctx.tick()` + `Period(MHz=n)` | `rebbarb/*.py` | **Use this for new code** |
| `sim.add_sync_process(gen_fn)` + `sim.run_until(t)` | `examples/` | Old, reference only |

New modules should use the testbench API (`add_testbench`, plus the
`with sim.write_vcd(...)` context manager for waveform dumps). The old process
API still works but is not idiomatic in current Amaranth.

**Critical testbench timing rule:** `ctx.get(signal)` reads signal values AFTER
the clock edge (post-update registered values). Combinatorial signals that depend
on registered signals that were updated by the SAME tick will already reflect the
new registered values. For example: if `tx_sof = tx_bytes_r_rdy & is_first` and
`is_first` is cleared synchronously on the first byte, then reading `tx_sof` after
the first byte's tick always returns 0, so read it BEFORE the tick instead.

**`ctx.set()` takes effect immediately** (combinatorial, not registered). Use it
AFTER `await ctx.tick()` to prepare inputs for the NEXT tick.

The full design specification lives in `docs/gc_bba_fpga_design.md`.

---

## Key Architecture Decisions

- **No network stack in the FPGA.** The GC CPU runs TCP/IP. The FPGA is a dumb
  MAC bridge.
- **Split-domain clocking (3 domains, 2 sources: 1 PLL + 1 HFOSC):**
  - `capture`: 54 MHz (PLL, DIVR=0 DIVF=71 DIVQ=4). Hosts ONLY the SPI Mode 3
    bit engine inside `ExiCapture`. 54 MHz = 2× the **real 27 MHz** EXI clock,
    the minimum oversampling for clean Mode 3. The isolated bit engine closes
    ~91 MHz; integrated with the byte-FIFO read path the capture domain closes
    ~62 MHz, so 54 passes with margin.
  - `exi`: 24 MHz (HFOSC ÷2). BBA register file / transaction FSM.
  - `sync`: 24 MHz (same HFOSC net as `exi`). SPRAM arbiter, RX/TX engines,
    and the ethernet master (`W5100ParallelMaster` or `W5500SPIMaster`).
- **Why split:** only the tiny SPI bit engine needs a fast clock to sample
  27 MHz EXI. The bulky register-file/SPRAM/ethernet-master logic is
  routing-bound at ~33–44 MHz on the UP5K and only needs the byte rate
  (27 MHz ÷ 8 ≈ 3.4 MHz). `ExiCapture` bridges capture↔exi with rx/tx byte
  AsyncFIFOs.
- **EXI clock reality:** the GC EXI clock tops out at ~27 MHz. libogc's
  `EXI_SPEED32MHZ` is a nominal name; the real rate is 27 MHz. The old
  "96 MHz = 3× 32 MHz EXI" target was doubly wrong and unreachable on UP5K
  (which caps ~44 MHz for non-trivial logic).
- **TX/MISO across the split:** the register file PROACTIVELY pushes read
  responses into the tx byte FIFO during the EXI clock-idle gap (the GC pauses
  the clock between an EXI_Imm header-write and the data-read). The bit engine
  drives MISO live from the FIFO head; see `ExiCapture` / `SPIMode3Slave`.
- **All CDC via `amaranth.lib.cdc`.** Never pass raw multi-bit signals across
  domains. Use `FFSynchronizer` for slow single bits, `PulseSynchronizer` for
  events, `AsyncFIFO` for data streams, `ResetSynchronizer` for resets.
- **Register file lives entirely in `exi` domain.** The `sync` domain only
  communicates through AsyncFIFOs and PulseSynchronizers, never through direct
  register reads/writes.

---

## Critical Protocol Notes

### EXI / SPI Mode 3
- CLK idles **HIGH** (CPOL=1, CPHA=1).
- MOSI sampled on **falling** CLK edge. MISO driven on **rising** CLK edge.
- Getting this wrong means the GC never enumerates the device.
- CS is active **low**, delineates each transaction.

### EXI Transaction Header (2 bytes before data)
```
Byte 0:  [7]=write_flag  [6:0]=addr[12:6]
Byte 1:  [7:2]=addr[5:0] [1:0]=xfer_len-1  (0=1B … 3=4B)
```
Full address = 13 bits → 0x0000–0x1FFF.
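
As a cross-check of the bit layout above, here is a minimal encode/decode sketch. `ExiHeader`, `encode_header`, and `decode_header` are illustrative names, not functions from `exi_bba/`.

```python
from typing import NamedTuple


class ExiHeader(NamedTuple):
    write: bool   # byte 0, bit 7
    addr: int     # 13-bit address, 0x0000..0x1FFF
    length: int   # 1..4 bytes (stored on the wire as length - 1)


def encode_header(write: bool, addr: int, length: int) -> bytes:
    """Pack the 2-byte EXI transaction header."""
    if not 0 <= addr <= 0x1FFF:
        raise ValueError("address must fit in 13 bits")
    if not 1 <= length <= 4:
        raise ValueError("length must be 1..4 bytes")
    byte0 = (int(write) << 7) | (addr >> 6)           # write flag + addr[12:6]
    byte1 = ((addr & 0x3F) << 2) | (length - 1)       # addr[5:0] + xfer_len-1
    return bytes([byte0, byte1])


def decode_header(raw: bytes) -> ExiHeader:
    """Unpack the 2-byte EXI transaction header."""
    byte0, byte1 = raw[0], raw[1]
    addr = ((byte0 & 0x7F) << 6) | (byte1 >> 2)
    return ExiHeader(bool(byte0 >> 7), addr, (byte1 & 0x3) + 1)


# 1-byte write to TXDATA (0x48), then a 1-byte read of NWAYS (0x31)
print(encode_header(True, 0x48, 1).hex())
print(decode_header(encode_header(False, 0x31, 1)))
```
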

### Device ID Query
On power-on the GC writes `0x0000` (2 bytes) then reads 4 bytes.
Must return: `0x04 0x02 0x02 0x00`.

---

## Memory Map (abridged)

| Range | Region |
|---|---|
| 0x0000–0x0033 | MAC control registers (register file, exi domain) |
| 0x0048 | TXDATA: bulk TX data port (→ `tx_bytes` AsyncFIFO) |
| 0x0100–0x0FFF | RX ring buffer in SPRAM (15 × 256-byte pages, pages 1–15) |
| 0x0100–0x1FFF | any read ≥ 0x0100 streams from SPRAM (DMA path); the ring proper is pages 1–15 above |
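
The read-side decoding implied by this table can be sketched as a small classifier. The function name and the region labels are illustrative, and addresses the abridged table does not list are reported as `"unlisted"` rather than guessed.

```python
def classify_read(addr: int):
    """Map a 13-bit EXI read address to (region, ring_page or None)."""
    if not 0 <= addr <= 0x1FFF:
        raise ValueError("address must fit in 13 bits")
    if addr <= 0x0033:
        return ("register", None)        # MAC control registers
    if addr == 0x0048:
        return ("txdata", None)          # bulk TX data port
    if 0x0100 <= addr <= 0x0FFF:
        return ("rx_ring", addr >> 8)    # 256-byte pages 1..15
    if addr >= 0x1000:
        return ("spram_stream", None)    # still served from SPRAM, outside the ring
    return ("unlisted", None)            # not covered by the abridged map


for a in (0x0031, 0x0048, 0x0100, 0x0FFF, 0x1000):
    print(hex(a), classify_read(a))
```
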

---

## Key Registers

| Addr | Name | Notes |
|---|---|---|
| 0x00 | NCRA | [0]=RESET self-clears; pulses `ncra_rst` to sync domain |
| 0x08 | IMR | Interrupt mask |
| 0x09 | IR | Write-1-to-clear. [1]=RI, [2]=TI. INT_N asserts when IR & IMR ≠ 0 |
| 0x16–17 | RWP | RX write pointer; updated by sync domain via `rx_wptr` FIFO |
| 0x18–19 | RRP | RX read pointer; GC writes after consuming frames |
| 0x20–25 | PAR0–5 | MAC address; also forwarded to W5500 as SHAR |
| 0x31 | NWAYS | Hardcode **0x17** (100M full-duplex link up, autoneg complete) |
| 0x3A | HIPR | Hardcode **0x01** (BBA present) |
| 0x48 | TXDATA | GC streams TX frame bytes here |
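
The IR/IMR behaviour in the table (write-1-to-clear, active-low `INT_N`) can be captured in a small behavioural model. This is a plain-Python sketch for reasoning about the register semantics, not the Amaranth implementation in `bba_register_file.py`; the class and method names are hypothetical.

```python
RI = 1 << 1   # receive interrupt, IR[1]
TI = 1 << 2   # transmit interrupt, IR[2]


class InterruptRegs:
    """Behavioural model of IR (0x09) and IMR (0x08)."""

    def __init__(self):
        self.ir = 0
        self.imr = 0

    def raise_event(self, mask: int) -> None:
        """Hardware side: latch one or more interrupt sources."""
        self.ir |= mask & 0xFF

    def write_ir(self, value: int) -> None:
        """GC side: write-1-to-clear, zero bits leave IR untouched."""
        self.ir &= ~value & 0xFF

    def write_imr(self, value: int) -> None:
        self.imr = value & 0xFF

    @property
    def int_n(self) -> int:
        """Active-low interrupt pin: 0 while any unmasked source is pending."""
        return 0 if (self.ir & self.imr) else 1


regs = InterruptRegs()
regs.write_imr(0x86)        # value the GC driver writes during init
regs.raise_event(RI)        # a frame landed in the ring
print(regs.int_n)           # pin is asserted (low)
regs.write_ir(RI)           # driver acknowledges
print(regs.int_n)           # pin is released (high)
```
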

---

## Module Breakdown

| Module | Domain | File |
|---|---|---|
| `BBATop` | all | `exi_bba/bba_top.py` |
| `ExiCapture` | capture (+exi FIFOs) | `exi_bba/exi_capture.py` |
| `SPIMode3Slave` | capture (param `domain`) | `exi_bba/spi_mode3_slave.py` |
| `BBARegisterFile` | exi (+FIFO to sync) | `exi_bba/bba_register_file.py` |
| `SPRAMArbiter` | sync | `exi_bba/spram_arbiter.py` |
| `RXFrameAssembler` | sync | `exi_bba/rx_frame_assembler.py` |
| `TXFrameDrain` | sync | `exi_bba/tx_frame_drain.py` |
| `W5100ParallelMaster` | sync | `exi_bba/w5100_parallel_master.py` (default eth) |
| `W5500SPIMaster` | sync | `exi_bba/w5500_spi_master.py` (alt eth) |
| `EEPROMModel` | exi | `exi_bba/eeprom_model.py` |

`ExiCapture` wraps `SPIMode3Slave` (in the fast `capture` domain) plus the
capture↔exi rx/tx byte AsyncFIFOs. `BBARegisterFile` consumes the rx byte
stream and proactively pushes read responses into the tx byte FIFO; it no
longer sees the per-bit SPI cadence (that lives entirely in `capture`).

---

## CDC Signal Inventory

| Signal | Direction | Primitive |
|---|---|---|
| EXI CLK / MOSI / CS pins | async → capture | `FFSynchronizer` (stages=2) |
| RX byte stream (capture→core) | capture → exi | `AsyncFIFO` 8-bit, depth=4 |
| TX byte stream (core→capture) | exi → capture | `AsyncFIFO` 8-bit, depth=2 |
| cs_active (transaction in progress) | capture → exi | `FFSynchronizer` (DMA read length) |
| SPRAM read request (addr) | exi → sync | `AsyncFIFO` 16-bit, depth=4 |
| SPRAM read result (data) | sync → exi | `AsyncFIFO` 8-bit, depth=4 |
| TX packet bytes | exi → sync | `AsyncFIFO` 8-bit, depth=16 |
| TX frame length | exi → sync | `AsyncFIFO` 16-bit, depth=4 |
| RX frame bytes | sync → SPRAM | `RXFrameAssembler` → `SPRAMArbiter` (not a byte FIFO; the GC reads frames back out of SPRAM via the SPRAM read req/rsp FIFOs) |
| RWP update | sync → exi | `AsyncFIFO` 8-bit, depth=4 |
| RRP update | exi → sync | `AsyncFIFO` 8-bit, depth=4 |
| RX ready (IR[RI]) | sync → exi | `PulseSynchronizer` |
| TX done (IR[TI]) | sync → exi | `PulseSynchronizer` |
| NCRA reset pulse | exi → sync | `PulseSynchronizer` |

---

## W5500 Configuration (on NCRA reset)

The W5500 selects the register **block** via the BSB field of the control byte,
NOT via the address, so register addresses below are **block offsets**, not flat
0x4000-style addresses (see `_W5500_*` and `_CTRL_*` in `w5500_spi_master.py`).
```
1. Write MR     = 0x80  (common block, offset 0x0000)  software reset
2. Wait ~1 ms
3. Write SHAR   = MAC   (common block, offset 0x0009, 6 bytes from PAR0–5)
4. Write S0_MR  = 0x04  (socket-0 reg block, offset 0x0000)  MACRAW
5. Write S0_CR  = 0x01  (socket-0 reg block, offset 0x0001)  OPEN
6. Write S0_IMR = 0x05  (socket-0 reg block, offset 0x002C)  RECV | SEND_OK
```

W5500 SPI is **Mode 0** (CPOL=0 CPHA=0); SCK = **12 MHz** (the 24 MHz `sync`
domain ÷ 2 via a toggle clock-enable). Connect W5500 `INT_N` to an FPGA input
for low-latency RX detection. (The W5500 is the alternate back-end; the W5100
parallel master is the default. See "W5100 vs W5500".)
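
To make the block-offset addressing concrete, the sketch below builds the SPI byte frames for the init writes listed above. It assumes the W5500's variable-length data mode with the control byte laid out as BSB[4:0] in bits 7:3, the write flag in bit 2, and OM=00 in bits 1:0. The helper names and the example MAC are illustrative; they are not the `_CTRL_*` constants from `w5500_spi_master.py`.

```python
COMMON_BLOCK = 0b00000        # BSB for the common register block
SOCKET0_REG_BLOCK = 0b00001   # BSB for the socket-0 register block


def w5500_write_frame(bsb: int, offset: int, payload) -> bytes:
    """Address phase (2 bytes) + control byte + data, for a register write."""
    control = (bsb << 3) | (1 << 2) | 0b00   # write access, variable-length mode
    return bytes([offset >> 8, offset & 0xFF, control]) + bytes(payload)


def init_frames(mac: bytes):
    """Frames for steps 1 and 3-6; the ~1 ms wait of step 2 is not modelled."""
    return [
        w5500_write_frame(COMMON_BLOCK, 0x0000, [0x80]),       # MR: software reset
        w5500_write_frame(COMMON_BLOCK, 0x0009, mac),          # SHAR: 6-byte MAC
        w5500_write_frame(SOCKET0_REG_BLOCK, 0x0000, [0x04]),  # S0_MR: MACRAW
        w5500_write_frame(SOCKET0_REG_BLOCK, 0x0001, [0x01]),  # S0_CR: OPEN
        w5500_write_frame(SOCKET0_REG_BLOCK, 0x002C, [0x05]),  # S0_IMR, value as listed above
    ]


example_mac = bytes([0x00, 0x09, 0xBF, 0x01, 0x02, 0x03])  # placeholder MAC
for frame in init_frames(example_mac):
    print(frame.hex(" "))
```
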

---

## Physical Interface (SP1 Edge Connector)

- PCB must be **1.2 mm thick, ENIG finish**.
- Staggered (not mirrored) top/bottom contact rows, the same geometry as PCI/ISA.
- Derive exact pad geometry from **SP1ETH KiCad project** (silverstee1/SP1ETH),
  cross-referenced with ETH2SP1 (LaserBear). Do not rely on YAGCD alone.
- Add **100 µF bulk cap** on the interposer near FPGA power pins (3.3 V budget
  is tight: iCEbreaker ~80 mA + W5500 ~150 mA ≈ 230 mA).
- **Pin 5 is 12 V. Do not connect it to FPGA I/O.** Test point or leave open.
- `EXTIN` (pin 1): tie to 3.3 V via 10 kΩ; required for GC device enumeration.
- All signal levels are 3.3 V. No level shifting needed.

---

## SPRAM Notes

- iCE40UP5K has 128 KB SPRAM (SB_SPRAM256KA, 16-bit wide).
- **1-cycle synchronous read latency**: the result of a read at cycle N is valid at N+1.
- Byte writes via `MASKWREN`: lower byte = `0b0011`, upper byte = `0b1100`.
- Address to SPRAM = byte_address >> 1.
- ETH writes take priority over EXI reads in the arbiter (safe by ring-buffer
  invariant: GC only reads pages the ETH engine has already finished).
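
The byte-to-word mapping in the notes above can be sketched as follows. It assumes even byte addresses land in the lower byte and odd ones in the upper byte (the notes give the masks but not the parity), and models `MASKWREN` as one enable bit per 4-bit nibble. The helper names are illustrative.

```python
def spram_byte_write(byte_addr: int, value: int):
    """Translate a byte write into (word_addr, data16, maskwren)."""
    word_addr = byte_addr >> 1
    upper = byte_addr & 1                      # assumption: odd address = upper byte
    maskwren = 0b1100 if upper else 0b0011
    data16 = (value & 0xFF) << 8 if upper else value & 0xFF
    return word_addr, data16, maskwren


def apply_write(mem, word_addr: int, data16: int, maskwren: int) -> None:
    """Software model of a masked 16-bit write: each mask bit enables one nibble."""
    keep = 0
    for nibble in range(4):
        if not (maskwren >> nibble) & 1:
            keep |= 0xF << (4 * nibble)
    mem[word_addr] = (mem[word_addr] & keep) | (data16 & ~keep & 0xFFFF)


mem = [0xAAAA] * 4
apply_write(mem, *spram_byte_write(5, 0x12))   # upper byte of word 2
apply_write(mem, *spram_byte_write(4, 0x34))   # lower byte of word 2
print(hex(mem[2]))
```
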

---

## GC Initialisation Sequence (Swiss/BBA driver)

```
1. Write 0x0000 × 2, read 4 B  → must get 0x04020200 (device ID)
2. Write NCRA = 0x01            (reset, self-clears; resets W5500 + SPRAM ptrs)
3. Poll NCRA bit 0 until 0      (wait reset complete)
4. Write PAR0–5                 (MAC address)
5. Write MAR0–7 = 0xFF          (promiscuous multicast)
6. Write ANALOG = 0xD6          (enable PHY — no FPGA effect, just store)
7. Write NWAYC                  (autoneg config — store only)
8. Write IMR = 0x86             (enable RBFI | TI | RI interrupts)
9. Write GCA                    (AUTOPUB bit)
10. Write NCRA SR bit = 0x08    (start receive)
11. Poll NWAYS until link up    → return hardcoded 0x17 immediately
```

---

## Implementation Notes & Gotchas

- **`NWAYS` must return `0x17` always.** GC polls it to confirm 100 Mbps link
  before enabling RX. Do not attempt to reflect real W5500 link status.
- **`EEPROMModel` can be stubbed initially.** Many GC BBA drivers write their own
  MAC to PAR0–5 rather than using the EEPROM. Pre-populate PAR0–5 reset state
  with a valid Nintendo OUI MAC (`00:09:BF:xx:xx:xx`).
- **`tx_load` timing in `SPIMode3Slave`:** pulses at CS assertion (first byte)
  and after each complete received byte. Upstream must register next TX byte
  within one `exi` clock.
- **PLL target 54 MHz**: verify with `icepll -i 12 -o 54` (DIVR=0 DIVF=71 DIVQ=4)
  before coding PLL parameters; the capture-domain bit engine oversamples the
  27 MHz EXI clock 2×.
- **TX buffer selection (NCRA ST bits):** Ignore buffer select (ST1 vs ST0).
  Treat any non-zero ST as a TX trigger.
- **If nextpnr fails capture-domain timing at 54 MHz:** the isolated bit engine
  closes ~91 MHz, so 54 has margin; if a seed fails, sweep seeds
  (`synth.py --seeds N`) or instruct users to configure Swiss to a lower EXI
  clock index.