jameshanlon · Jun 14, 2026
diff --git a/.gitignore b/.gitignore
@@ -14,3 +14,6 @@ a.out
 simout*
 simin*
 logs/
+
+docs/_build/
+docs/_venv/
diff --git a/cmake/FindSphinx.cmake b/cmake/FindSphinx.cmake
@@ -1,7 +1,10 @@
-# Look for an executable called sphinx-build
+# Look for an executable called sphinx-build. Prefer the project-local
+# virtualenv (docs/_venv, created from docs/requirements.txt) if present, so the
+# pinned Sphinx is used; otherwise fall back to one on PATH.
 find_program(
   SPHINX_EXECUTABLE
   NAMES sphinx-build
+  HINTS ${CMAKE_SOURCE_DIR}/docs/_venv/bin
   DOC "Path to sphinx-build executable")
 
 include(FindPackageHandleStandardArgs)

diff --git a/docs/CMakeLists.txt b/docs/CMakeLists.txt
@@ -5,9 +5,11 @@ set(SPHINX_BUILD ${CMAKE_CURRENT_BINARY_DIR}/sphinx)
 
 configure_file(conf.py ${CMAKE_CURRENT_BINARY_DIR})
 
+# Build strictly: -W turns warnings (orphan pages, broken cross-references) into
+# errors and --keep-going reports them all, matching how the docs are verified.
 add_custom_target(
   Sphinx ALL
-  COMMAND ${SPHINX_EXECUTABLE} -b html -c ${CMAKE_CURRENT_BINARY_DIR}
-          ${SPHINX_SOURCE} ${SPHINX_BUILD}
+  COMMAND ${SPHINX_EXECUTABLE} -b html -W --keep-going
+          -c ${CMAKE_CURRENT_BINARY_DIR} ${SPHINX_SOURCE} ${SPHINX_BUILD}
   WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
   COMMENT "Generating documentation with Sphinx")
diff --git a/docs/architecture/channels.rst b/docs/architecture/channels.rst
@@ -0,0 +1,75 @@
+Channels
+========
+
+Hex processors communicate with one another through *channels*: point-to-point
+links carrying single-word messages. This is the architecture's only mechanism
+for inter-processor communication; there is no shared memory between
+processors. The model follows the discipline of the Transputer and occam, but
+is deliberately kept as simple as possible: one process runs on one physical
+processor, with no scheduler and no buffering.
+
+The ``IN`` and ``OUT`` operations
+---------------------------------
+
+Channel communication uses two ``OPR`` sub-operations, ``IN`` (operand ``0x4``)
+and ``OUT`` (operand ``0x5``). Both take their channel and data word from the
+registers, just as ``ADD`` and ``SUB`` take their operands from ``areg`` and
+``breg``:
+
+* ``breg`` selects one of the processor's **link slots** — the channel to use.
+* ``areg`` carries the data word: it is the value *received* by ``IN`` and the
+  value *sent* by ``OUT``.
+
+So ``OPR OUT`` writes the word in ``areg`` to the channel on slot ``breg``, and
+``OPR IN`` reads a word from the channel on slot ``breg`` into ``areg``.
+
+Blocking rendezvous
+-------------------
+
+A channel transfer is a **synchronous, unbuffered rendezvous**. The first party
+to arrive at the channel blocks until its partner is also ready. When both are
+present, exactly one word is copied from the writer to the reader and both
+processors continue. There is no queue and no buffering: the communication
+itself is the point of synchronisation between the two processors.
+
+While a processor is blocked on a channel operation, its program counter is not
+advanced, because the operation has not yet completed. When the partner arrives
+and the rendezvous occurs, the blocked processor steps past the instruction, and
+a blocked reader now holds the transferred word in ``areg``. No "blocked" state
+is left in
+the architectural registers.
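+
+The rendezvous rule can be sketched as a small C model. This is an illustrative
+sketch, not the simulator's code: the ``proc_t`` type, the ``at_channel`` flag
+and the ``rendezvous`` function are hypothetical names, and it assumes that
+``pc`` counts instruction bytes and that ``IN`` and ``OUT`` each occupy a single
+byte.
+
+.. code-block:: c
+
+   #include <stdbool.h>
+   #include <stdint.h>
+
+   /* Hypothetical simulator-side view of one processor. */
+   typedef struct {
+       uint32_t areg;       /* data word: sent by OUT, received by IN     */
+       uint32_t pc;         /* address of the IN/OUT byte while blocked   */
+       bool     at_channel; /* true once the processor has reached IN/OUT */
+   } proc_t;
+
+   /* Try to complete a transfer on the channel joining writer and reader.
+    * Returns false, changing nothing, unless both parties have arrived. */
+   static bool rendezvous(proc_t *writer, proc_t *reader)
+   {
+       if (!writer->at_channel || !reader->at_channel)
+           return false;              /* the first arrival stays blocked */
+
+       reader->areg = writer->areg;   /* exactly one word is copied */
+       writer->at_channel = false;
+       reader->at_channel = false;
+       writer->pc += 1;               /* both step past the one-byte */
+       reader->pc += 1;               /* IN/OUT instruction (assumed) */
+       return true;
+   }
+
+Calling ``rendezvous`` while only one party has arrived returns ``false`` and
+leaves both ``pc`` values untouched, which corresponds to the blocked state
+described above.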
+
+Link slots and wiring
+---------------------
+
+Each processor has a fixed set of four channel link slots. A processor can
+therefore be wired to at most four channels at once, which matches
+the hardware link budget. The wiring (which slot on one processor connects to
+which slot on another) is fixed when a network is built; see
+:doc:`../compiler/networks`.
+
+Operating on a slot that is not wired to a channel — a slot index outside the
+valid range, or a valid slot with no channel attached — is a runtime error. The
+simulator reports it, naming the offending processor and program counter, as a
+backstop for cases that cannot be ruled out statically.
+
+If every processor in a network becomes blocked on a channel with no partner
+able to proceed, the system is deadlocked. The simulator detects this and
+reports it, listing the channel slot each processor is blocked on.
+
+Mapping to the X language and to networks
+-----------------------------------------
+
+In the :doc:`X language <../language/overview>`, channels appear as the ``chan``
+type and the statement-level operators ``!`` (send) and ``?`` (receive). The
+compiler lowers ``c ! e`` to evaluating ``e`` into ``areg``, loading the slot
+for ``c`` into ``breg``, and emitting ``OPR OUT``; ``c ? v`` loads the slot into
+``breg``, emits ``OPR IN``, and stores ``areg`` into ``v``. The language-level
+view of concurrency and message passing is described in
+:doc:`../language/concurrency`.
+
+A ``chan`` value is represented at runtime simply as the integer index of a link
+slot on the processor running that code, which is why the same procedure can be
+reused on several processors with different wirings. When ``main`` is a top-level
+``par``, the compiler emits a *network container* of one image per processor plus
+the slot wiring — see :doc:`../compiler/networks` for how the network is built
+and :doc:`../hardware/network` for how it is realised in hardware.
diff --git a/docs/architecture/execution.rst b/docs/architecture/execution.rst
@@ -0,0 +1,53 @@
+Instruction execution
+=====================
+
+The execution cycle
+-------------------
+
+Hex executes instructions in a simple three-stage cycle:
+
+#. **Fetch.** Read the instruction byte addressed by ``pc``. Because memory is
+   word-addressed but instructions are bytes, the word is selected by the high
+   bits of ``pc`` and the byte within it by the low bits.
+#. **Increment.** Advance ``pc`` to the next instruction byte.
+#. **Execute.** Fold the instruction's low nibble into ``oreg``
+   (``oreg = oreg | (inst & 0xf)``), then perform the operation selected by the
+   high nibble. Most operations clear ``oreg`` to ``0`` afterwards; the ``PFIX``
+   and ``NFIX`` prefixes instead leave a shifted operand for the next cycle.
+
+The branch instructions complete by overwriting ``pc`` rather than letting the
+increment stand, and the channel operations ``IN``/``OUT`` may suspend the cycle
+mid-execution until their partner is ready (see :doc:`channels`).
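+
+The byte selection in the fetch step might look like this in C. It is a sketch
+under assumptions the text does not state: 32-bit words holding four
+instruction bytes each, with byte 0 in the least significant bits. The name
+``hex_fetch`` is hypothetical.
+
+.. code-block:: c
+
+   #include <stdint.h>
+
+   /* Assumed layout: 32-bit words, four instruction bytes per word,
+    * byte 0 in the least significant lane. */
+   static uint8_t hex_fetch(const uint32_t *mem, uint32_t pc)
+   {
+       uint32_t word = mem[pc >> 2];     /* high bits of pc select the word */
+       unsigned lane = (pc & 3u) * 8u;   /* low bits select the byte lane   */
+       return (uint8_t)((word >> lane) & 0xFFu);
+   }
+
+With this layout, consecutive values of ``pc`` walk through the four bytes of
+one word before moving on to the next word.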
+
+The datapath
+------------
+
+The same cycle is realised in hardware by a small datapath. Its components are:
+
+* **A multiplexor** — selects the left arithmetic input from ``areg``, ``pc``,
+  ``oreg``, or zero, depending on the instruction.
+* **B multiplexor** — selects the right arithmetic input from ``breg``,
+  ``oreg``, or zero.
+* **Arithmetic unit** — adds or subtracts its two inputs to produce an address
+  or a result.
+* **Memory** — addressed by the arithmetic unit's output; its write data comes
+  from ``areg``.
+* **Result multiplexor** — selects what is written back to the registers, either
+  the memory read data or the arithmetic unit's output.
+* **Instruction register, decoder and control matrix** — hold the fetched
+  instruction byte and derive the multiplexor selects, the arithmetic operation,
+  and the memory and register write enables for the current instruction.
+* **Clock and timing generator** — sequences the fetch, increment, and execute
+  steps.
+
+.. todo:: Add a datapath block diagram.
+
+Relationship to the implementations
+-----------------------------------
+
+This cycle and datapath are mirrored in two places. The C reference
+interpreter in :doc:`simulator-model` implements the same fetch-increment-execute
+loop in software, and the SystemVerilog core in :doc:`../hardware/core`
+realises the datapath above as real hardware. The two are kept behaviourally
+identical, so a program produces the same results whether it runs on the
+simulator or on the RTL.
diff --git a/docs/architecture/instruction-encoding.rst b/docs/architecture/instruction-encoding.rst
@@ -0,0 +1,90 @@
+Instruction encoding
+====================
+
+Every Hex instruction is a single byte. The high nibble selects the operation
+and the low nibble carries a 4-bit immediate operand:
+
+.. code-block:: text
+
+   bit:   7 6 5 4 3 2 1 0
+         +-------+-------+
+         | oper  | imm   |
+         +-------+-------+
+
+The immediate field can directly express the values 0–15. Anything outside that
+range — larger offsets, larger constants, and all negative values — is built up
+using the prefix instructions described below.
+
+Operand accumulation
+--------------------
+
+Operands are assembled in the operand register ``oreg``. Before executing, every
+instruction folds its own immediate into ``oreg``::
+
+   oreg = oreg | (inst & 0xf)
+
+It then uses ``oreg`` as its operand. Most instructions clear ``oreg`` back to
+``0`` afterwards, so that an instruction with no preceding prefixes simply sees
+its own 4-bit immediate and the next instruction starts clean.
+
+The prefix instructions ``PFIX`` and ``NFIX`` are the exception: rather than
+clearing ``oreg``, they shift the accumulated nibbles up to make room for the
+next instruction's nibble.
+
+``PFIX`` — positive prefix
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``PFIX`` shifts ``oreg`` left by four bits and continues to the next
+instruction::
+
+   oreg = oreg << 4
+
+This concatenates the prefix's nibble with whatever nibble the following
+instruction contributes, building up a positive value four bits at a time.
+
+``NFIX`` — negative prefix
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``NFIX`` shifts left by four bits as well, but also fills the high bits with
+ones::
+
+   oreg = 0xFFFFFF00 | (oreg << 4)
+
+Setting the upper bits makes the accumulated operand negative in two's
+complement, so ``NFIX`` starts the prefix chain of every negative immediate,
+whatever its magnitude.
+
+Worked example: loading a constant larger than 15
+-------------------------------------------------
+
+Suppose we want to load the constant ``0x2A`` (decimal 42) into ``areg``. It does
+not fit in a single nibble, so the assembler emits one ``PFIX`` followed by the
+``LDAC``:
+
+.. list-table::
+   :header-rows: 1
+   :widths: 16 20 64
+
+   * - Byte
+     - Mnemonic
+     - Effect on ``oreg``
+   * - ``0xE2``
+     - ``PFIX 2``
+     - ``oreg = oreg | 0x2`` → ``0x2``; then ``oreg = 0x2 << 4`` → ``0x20``
+   * - ``0x3A``
+     - ``LDAC 10``
+     - ``oreg = 0x20 | 0xA`` → ``0x2A``; then ``areg = oreg`` → ``0x2A``
+
+The first nibble (``2``) is shifted up by ``PFIX`` to occupy bits 7–4, and the
+second nibble (``A``) of ``LDAC`` fills bits 3–0, giving ``0x2A``. Larger
+constants chain more ``PFIX`` bytes, four bits per prefix; a negative constant
+begins the chain with an ``NFIX`` so the high bits are filled with ones.
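+
+The accumulation rules and the worked example above can be checked with a small
+C model. This is a sketch, not the reference interpreter: ``hex_accumulate`` is
+a hypothetical name, the ``PFIX`` opcode ``0xE`` is taken from the table above,
+and the ``NFIX`` opcode ``0xF`` is an assumption.
+
+.. code-block:: c
+
+   #include <stdint.h>
+
+   enum {
+       OP_PFIX = 0xE,   /* from the worked example (0xE2 is PFIX 2) */
+       OP_NFIX = 0xF    /* assumed opcode value                     */
+   };
+
+   /* Apply one instruction byte to oreg and return the new oreg value.
+    * For a non-prefix instruction the result is the complete operand;
+    * clearing oreg afterwards is left to the caller. */
+   static uint32_t hex_accumulate(uint32_t oreg, uint8_t inst)
+   {
+       oreg |= (uint32_t)(inst & 0xF);        /* fold in the low nibble */
+       switch (inst >> 4) {
+       case OP_PFIX: return oreg << 4;
+       case OP_NFIX: return 0xFFFFFF00u | (oreg << 4);
+       default:      return oreg;
+       }
+   }
+
+Feeding it ``0xE2`` and then ``0x3A`` reproduces the table above (``0x20``, then
+``0x2A``). Under the ``NFIX`` opcode assumption, ``0xFF`` followed by ``0x3F``
+gives ``0xFFFFFFFF``, which is -1 as a 32-bit two's-complement value.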
+
+Prefixes are inserted automatically
+-----------------------------------
+
+Programmers and code generators do not emit prefixes by hand. The assembler and
+compiler compute the minimal prefix chain needed for each operand. Because a
+branch distance depends on the size of the instructions between the branch and
+its target — which in turn depends on how many prefixes those instructions need —
+the encoding is resolved iteratively. This minimal-prefix encoding is described
+in :doc:`../compiler/codebuffer`.