Draft
3 changes: 3 additions & 0 deletions .gitignore
@@ -14,3 +14,6 @@ a.out
simout*
simin*
logs/

docs/_build/
docs/_venv/
5 changes: 4 additions & 1 deletion cmake/FindSphinx.cmake
@@ -1,7 +1,10 @@
-# Look for an executable called sphinx-build
+# Look for an executable called sphinx-build. Prefer the project-local
+# virtualenv (docs/_venv, created from docs/requirements.txt) if present, so the
+# pinned Sphinx is used; otherwise fall back to one on PATH.
 find_program(
   SPHINX_EXECUTABLE
   NAMES sphinx-build
+  HINTS ${CMAKE_SOURCE_DIR}/docs/_venv/bin
   DOC "Path to sphinx-build executable")

include(FindPackageHandleStandardArgs)
6 changes: 4 additions & 2 deletions docs/CMakeLists.txt
@@ -5,9 +5,11 @@ set(SPHINX_BUILD ${CMAKE_CURRENT_BINARY_DIR}/sphinx)

configure_file(conf.py ${CMAKE_CURRENT_BINARY_DIR})

+# Build strictly: -W turns warnings (orphan pages, broken cross-references) into
+# errors and --keep-going reports them all, matching how the docs are verified.
 add_custom_target(
   Sphinx ALL
-  COMMAND ${SPHINX_EXECUTABLE} -b html -c ${CMAKE_CURRENT_BINARY_DIR}
-          ${SPHINX_SOURCE} ${SPHINX_BUILD}
+  COMMAND ${SPHINX_EXECUTABLE} -b html -W --keep-going
+          -c ${CMAKE_CURRENT_BINARY_DIR} ${SPHINX_SOURCE} ${SPHINX_BUILD}
   WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
   COMMENT "Generating documentation with Sphinx")
75 changes: 75 additions & 0 deletions docs/architecture/channels.rst
@@ -0,0 +1,75 @@
Channels
========

Hex processors communicate with one another through *channels*: point-to-point
links carrying single-word messages. This is the architecture's only mechanism
for inter-processor communication — there is no shared memory between
processors. The model follows the discipline of the Transputer and occam,
deliberately kept as simple as possible: one process runs on one physical
processor, with no scheduler and no buffering.

The ``IN`` and ``OUT`` operations
---------------------------------

Channel communication uses two ``OPR`` sub-operations,
``IN`` (operand ``0x4``) and ``OUT`` (operand ``0x5``). Both name a channel and a
data word through the registers, symmetrically with how ``ADD`` and ``SUB`` use
``areg`` and ``breg``:

* ``breg`` selects one of the processor's **link slots** — the channel to use.
* ``areg`` carries the data word: it is the value *received* by ``IN`` and the
  value *sent* by ``OUT``.

So ``OPR OUT`` writes the word in ``areg`` to the channel on slot ``breg``, and
``OPR IN`` reads a word from the channel on slot ``breg`` into ``areg``.

Blocking rendezvous
-------------------

A channel transfer is a **synchronous, unbuffered rendezvous**. The first party
to arrive at the channel blocks until its partner is also ready. When both are
present, exactly one word is copied from the writer to the reader and both
processors continue. There is no queue and no buffering: the communication
itself is the point of synchronisation between the two processors.

While a processor is blocked on a channel operation, its program counter is not
advanced — the operation simply has not completed yet. When the partner arrives
and the rendezvous occurs, the blocked processor resumes with the transferred
word in ``areg`` and steps past the instruction. No "blocked" state is left in
the architectural registers.
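
The rendezvous can be modelled in a few lines of Python. This is an
illustrative sketch only, not the simulator's code: ``Proc``, ``Channel`` and
``channel_op`` are hypothetical names, ``pc`` advances by one as a stand-in for
stepping past the instruction, and raising an error when both ends use the same
direction is an assumption that the text above does not specify.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

IN, OUT = 0x4, 0x5  # OPR operands as given above


@dataclass
class Channel:
    # The first party to arrive parks here as (processor, operation).
    waiting: Optional[Tuple["Proc", int]] = None


@dataclass
class Proc:
    name: str
    areg: int = 0
    breg: int = 0
    pc: int = 0
    links: dict = field(default_factory=dict)  # slot index -> Channel


def channel_op(proc: Proc, op: int) -> bool:
    """Attempt IN or OUT on slot proc.breg; return True once the word has moved."""
    chan = proc.links[proc.breg]
    if chan.waiting is None:
        chan.waiting = (proc, op)   # first to arrive: block, pc is not advanced
        return False
    partner, partner_op = chan.waiting
    if partner is proc:
        return False                # still waiting for the other end
    if partner_op == op:
        # Assumption: two readers or two writers on one channel is an error.
        raise RuntimeError("both ends attempted the same direction")
    writer, reader = (proc, partner) if op == OUT else (partner, proc)
    reader.areg = writer.areg       # exactly one word is copied
    chan.waiting = None             # nothing stays queued on the channel
    proc.pc += 1                    # both processors step past the instruction
    partner.pc += 1
    return True
```

Calling ``channel_op`` for a reader first returns ``False`` and leaves its
``pc`` untouched; the later call for the writer on the same ``Channel`` returns
``True``, after which the reader's ``areg`` holds the writer's word.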

Link slots and wiring
---------------------

Each processor has a fixed, small number of channel link slots — four. A
processor can therefore be wired to at most four channels at once, which matches
the hardware link budget. The wiring (which slot on one processor connects to
which slot on another) is fixed when a network is built; see
:doc:`../compiler/networks`.

Operating on a slot that is not wired to a channel — a slot index outside the
valid range, or a valid slot with no channel attached — is a runtime error. The
simulator reports it, naming the offending processor and program counter, as a
backstop for cases that cannot be ruled out statically.

If every processor in a network becomes blocked on a channel with no partner
able to proceed, the system is deadlocked. The simulator detects this and
reports it, listing the channel slot each processor is blocked on.
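
The two simulator checks described above, the unwired-slot error and deadlock
detection, might look roughly like this. The sketch is an assumption about
shape, not the simulator's implementation: ``LinkError``, ``resolve_slot`` and
``deadlock_report`` are hypothetical names, ``links`` is taken to be a list of
four entries with ``None`` for an unwired slot, and the deadlock test assumes
that any rendezvous that can complete has already completed.

```python
NUM_SLOTS = 4  # link slots per processor, as stated above


class LinkError(RuntimeError):
    """Raised when IN or OUT names a slot that is not wired to a channel."""


def resolve_slot(links, slot, proc_name, pc):
    """Return the channel wired to `slot`, or raise naming processor and pc."""
    if not 0 <= slot < NUM_SLOTS:
        raise LinkError(f"{proc_name} at pc={pc:#x}: slot {slot} is out of range")
    chan = links[slot]
    if chan is None:
        raise LinkError(f"{proc_name} at pc={pc:#x}: slot {slot} is not wired")
    return chan


def deadlock_report(blocked_slot):
    """Return one line per processor if every processor is blocked, else None.

    `blocked_slot` maps a processor name to the slot it is blocked on, or to
    None while that processor can still make progress.
    """
    if not blocked_slot or any(s is None for s in blocked_slot.values()):
        return None
    return [f"{name}: blocked on slot {slot}"
            for name, slot in sorted(blocked_slot.items())]
```

A simulator loop would call ``resolve_slot`` before every ``IN`` or ``OUT`` and
``deadlock_report`` after each round in which no processor advanced.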

Mapping to the X language and to networks
-----------------------------------------

In the :doc:`X language <../language/overview>`, channels appear as the ``chan``
type and the statement-level operators ``!`` (send) and ``?`` (receive). The
compiler lowers ``c ! e`` to evaluating ``e`` into ``areg``, loading the slot
for ``c`` into ``breg``, and emitting ``OPR OUT``; ``c ? v`` loads the slot into
``breg``, emits ``OPR IN``, and stores ``areg`` into ``v``. The language-level
view of concurrency and message passing is described in
:doc:`../language/concurrency`.
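
As a rough illustration of that lowering, the sketch below builds the two
instruction sequences as lists of mnemonics. It is not the compiler's code:
``lower_send`` and ``lower_receive`` are hypothetical helpers, the slot is
assumed to be a constant loaded with an ``LDBC`` instruction (assumed here from
the conventional Hex instruction set), and the code that evaluates ``e`` or
stores into ``v`` is passed in already lowered.

```python
def lower_send(slot, eval_e):
    """c ! e: evaluate e into areg, load c's slot into breg, then OPR OUT."""
    return [*eval_e, f"LDBC {slot}", "OPR OUT"]


def lower_receive(slot, store_v):
    """c ? v: load c's slot into breg, OPR IN, then store areg into v."""
    return [f"LDBC {slot}", "OPR IN", *store_v]


# Hypothetical operand code: the constant 7 as `e`, and a store to address 12
# (written with the conventional Hex STAM mnemonic) as the target `v`.
send = lower_send(1, ["LDAC 7"])
receive = lower_receive(1, ["STAM 12"])
```

The order matters in the send case: ``e`` is evaluated first, so the slot load
into ``breg`` must not disturb ``areg``.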

A ``chan`` value is represented at runtime simply as the integer index of a link
slot on the processor running that code, which is why the same procedure can be
reused on several processors with different wirings. When ``main`` is a top-level
``par``, the compiler emits a *network container* of one image per processor plus
the slot wiring — see :doc:`../compiler/networks` for how the network is built
and :doc:`../hardware/network` for how it is realised in hardware.
53 changes: 53 additions & 0 deletions docs/architecture/execution.rst
@@ -0,0 +1,53 @@
Instruction execution
=====================

The execution cycle
-------------------

Hex executes instructions in a simple three-stage cycle:

#. **Fetch.** Read the instruction byte addressed by ``pc``. Because memory is
   word-addressed but instructions are bytes, the word is selected by the high
   bits of ``pc`` and the byte within it by the low bits.
#. **Increment.** Advance ``pc`` to the next instruction byte.
#. **Execute.** Fold the instruction's low nibble into ``oreg``
   (``oreg = oreg | (inst & 0xf)``), then perform the operation selected by the
   high nibble. Most operations clear ``oreg`` to ``0`` afterwards; the ``PFIX``
   and ``NFIX`` prefixes instead leave a shifted operand for the next cycle.
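
The three steps can be sketched as a small Python stepper. This is a minimal
model under stated assumptions, not the reference interpreter: words are taken
to be 32 bits with the lowest-addressed byte in the least significant position,
only ``PFIX`` and ``LDAC`` are implemented, and their opcodes (``0xE`` and
``0x3``) are read off the bytes ``0xE2`` and ``0x3A`` used in the
instruction-encoding example. ``fetch`` and ``step`` are hypothetical names.

```python
MASK32 = 0xFFFFFFFF
LDAC, PFIX = 0x3, 0xE  # high-nibble opcodes (assumed, see the text above)


def fetch(mem, pc):
    """Return the instruction byte at byte address `pc` in word-addressed `mem`."""
    word = mem[pc >> 2]                      # high bits of pc select the word
    return (word >> ((pc & 3) * 8)) & 0xFF   # low bits select the byte within it


def step(cpu, mem):
    """Run one fetch / increment / execute cycle on the register dict `cpu`."""
    inst = fetch(mem, cpu["pc"])             # 1. fetch
    cpu["pc"] += 1                           # 2. increment
    cpu["oreg"] |= inst & 0xF                # 3. execute: fold in the low nibble
    op = inst >> 4
    if op == PFIX:
        cpu["oreg"] = (cpu["oreg"] << 4) & MASK32  # leave a shifted operand
    elif op == LDAC:
        cpu["areg"] = cpu["oreg"]
        cpu["oreg"] = 0                      # most operations clear oreg
    else:
        raise NotImplementedError(f"opcode {op:#x} is outside this sketch")


cpu = {"pc": 0, "areg": 0, "oreg": 0}
mem = [0x00003AE2]  # bytes 0xE2 (PFIX 2) then 0x3A (LDAC 10), low byte first
step(cpu, mem)
step(cpu, mem)
```

After the two calls, ``cpu["areg"]`` is ``0x2A``, ``cpu["pc"]`` is ``2`` and
``cpu["oreg"]`` is back to ``0``.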

The branch instructions complete by overwriting ``pc`` rather than letting the
increment stand, and the channel operations ``IN``/``OUT`` may suspend the cycle
mid-execution until their partner is ready (see :doc:`channels`).

The datapath
------------

The same cycle is realised in hardware by a small datapath. Its components are:

* **A multiplexor** — selects the left arithmetic input from ``areg``, ``pc``,
  ``oreg``, or zero, depending on the instruction.
* **B multiplexor** — selects the right arithmetic input from ``breg``,
  ``oreg``, or zero.
* **Arithmetic unit** — adds or subtracts its two inputs to produce an address
  or a result.
* **Memory** — addressed by the arithmetic unit's output; its write data comes
  from ``areg``.
* **Result multiplexor** — selects what is written back to the registers, either
  the memory read data or the arithmetic unit's output.
* **Instruction register, decoder and control matrix** — hold the fetched
  instruction byte and derive the multiplexor selects, the arithmetic operation,
  and the memory and register write enables for the current instruction.
* **Clock and timing generator** — sequences the fetch, increment, and execute
  steps.

.. todo:: Add a datapath block diagram.

Relationship to the implementations
-----------------------------------

This cycle and datapath are mirrored in two places. The C reference
interpreter in :doc:`simulator-model` implements the same fetch, increment,
execute cycle in software, and the SystemVerilog core in
:doc:`../hardware/core` realises the datapath above in hardware.
identical, so a program produces the same results whether it runs on the
simulator or on the RTL.
90 changes: 90 additions & 0 deletions docs/architecture/instruction-encoding.rst
@@ -0,0 +1,90 @@
Instruction encoding
====================

Every Hex instruction is a single byte. The high nibble selects the operation
and the low nibble carries a 4-bit immediate operand:

.. code-block:: text

   bit:  7 6 5 4 3 2 1 0
        +-------+-------+
        | oper  |  imm  |
        +-------+-------+

The immediate field can directly express the values 0–15. Anything outside that
range — larger offsets, larger constants, and all negative values — is built up
using the prefix instructions described below.

Operand accumulation
--------------------

Operands are assembled in the operand register ``oreg``. Before executing, every
instruction folds its own immediate into ``oreg``::

   oreg = oreg | (inst & 0xf)

It then uses ``oreg`` as its operand. Most instructions clear ``oreg`` back to
``0`` afterwards, so that an instruction with no preceding prefixes simply sees
its own 4-bit immediate and the next instruction starts clean.

The prefix instructions ``PFIX`` and ``NFIX`` are the exception: rather than
clearing ``oreg``, they shift the accumulated nibbles up to make room for the
next instruction's nibble.

``PFIX`` — positive prefix
~~~~~~~~~~~~~~~~~~~~~~~~~~~

``PFIX`` shifts ``oreg`` left by four bits and continues to the next
instruction::

   oreg = oreg << 4

This concatenates the prefix's nibble with whatever nibble the following
instruction contributes, building up a positive value four bits at a time.

``NFIX`` — negative prefix
~~~~~~~~~~~~~~~~~~~~~~~~~~~

``NFIX`` shifts left by four bits as well, but also fills the high bits with
ones::

   oreg = 0xFFFFFF00 | (oreg << 4)

This fills bits 31 to 8 with ones, so the accumulated operand is negative when
read as a 32-bit two's-complement value. ``NFIX`` therefore begins the prefix
chain for a negative immediate.

Worked example: loading a constant larger than 15
-------------------------------------------------

Suppose we want to load the constant ``0x2A`` (decimal 42) into ``areg``. It does
not fit in a single nibble, so the assembler emits one ``PFIX`` followed by the
``LDAC``:

.. list-table::
   :header-rows: 1
   :widths: 16 20 64

   * - Byte
     - Mnemonic
     - Effect on ``oreg``
   * - ``0xE2``
     - ``PFIX 2``
     - ``oreg = oreg | 0x2`` → ``0x2``; then ``oreg = 0x2 << 4`` → ``0x20``
   * - ``0x3A``
     - ``LDAC 10``
     - ``oreg = 0x20 | 0xA`` → ``0x2A``; then ``areg = oreg`` → ``0x2A``

The first nibble (``2``) is shifted up by ``PFIX`` to occupy bits 7–4, and the
second nibble (``A``) of ``LDAC`` fills bits 3–0, giving ``0x2A``. Larger
constants chain more ``PFIX`` bytes, four bits per prefix; a negative constant
begins the chain with an ``NFIX`` so the high bits are filled with ones.
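
The accumulation rules can be checked mechanically. The sketch below replays a
byte sequence and yields the operand each non-prefix instruction ends up with.
It is an illustration rather than project code: ``operands`` is a hypothetical
helper, ``oreg`` is assumed to be 32 bits wide, ``PFIX`` is taken as opcode
``0xE`` from the byte ``0xE2`` above, and ``NFIX`` as opcode ``0xF`` is an
assumption.

```python
MASK32 = 0xFFFFFFFF
PFIX, NFIX = 0xE, 0xF  # NFIX = 0xF is assumed, see the text above


def operands(code):
    """Yield (opcode, operand) for each non-prefix instruction in `code`."""
    oreg = 0
    for inst in code:
        oreg |= inst & 0xF                    # fold in this byte's nibble
        op = inst >> 4
        if op == PFIX:
            oreg = (oreg << 4) & MASK32       # make room for the next nibble
        elif op == NFIX:
            oreg = (0xFFFFFF00 | (oreg << 4)) & MASK32  # ...and fill with ones
        else:
            yield op, oreg                    # the instruction consumes oreg
            oreg = 0                          # next instruction starts clean


positive = list(operands([0xE2, 0x3A]))  # PFIX 2, LDAC 10
negative = list(operands([0xFF, 0x3F]))  # NFIX 15, LDAC 15
```

``positive`` is ``[(0x3, 0x2A)]``, matching the table, and ``negative`` is
``[(0x3, 0xFFFFFFFF)]``, which is -1 as a 32-bit two's-complement value.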

Prefixes are inserted automatically
-----------------------------------

Programmers and code generators do not emit prefixes by hand. The assembler and
compiler compute the minimal prefix chain needed for each operand. Because a
branch distance depends on the size of the instructions between the branch and
its target — which in turn depends on how many prefixes those instructions need —
the encoding is resolved iteratively. This minimal-prefix encoding is described
in :doc:`../compiler/codebuffer`.
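
For a single operand, one way to compute a prefix chain is sketched below. This
is not the code buffer's implementation, which must also iterate over branch
distances: ``encode`` is a hypothetical helper that handles one instruction, it
assumes signed 32-bit operands, it takes ``NFIX`` to be opcode ``0xF``, and it
is intended to produce the shortest chain under the ``PFIX`` and ``NFIX``
definitions given earlier.

```python
PFIX, NFIX = 0xE, 0xF  # PFIX matches byte 0xE2 above; NFIX = 0xF is assumed


def encode(op, operand):
    """Return the byte list for `op` applied to a signed 32-bit `operand`."""
    if not -(1 << 31) <= operand < (1 << 31):
        raise ValueError("operand does not fit in 32 bits")
    last = (op << 4) | (operand & 0xF)       # this instruction's own nibble
    if 0 <= operand < 16:
        return [last]                        # fits the immediate: no prefix
    if -256 <= operand < 0:
        # One NFIX supplies bits 7..4 and fills bits 31..8 with ones.
        return [(NFIX << 4) | ((operand >> 4) & 0xF), last]
    # Otherwise the remaining high nibbles go into a PFIX. Python's >> is an
    # arithmetic shift, so a negative operand eventually reaches the NFIX case.
    return encode(PFIX, operand >> 4) + [last]


def encoded_length(operand):
    """Bytes needed for any instruction carrying `operand`."""
    return len(encode(0, operand))
```

``encode(0x3, 42)`` returns ``[0xE2, 0x3A]``, the two bytes of the worked
example. A branch resolver would call something like ``encoded_length``
repeatedly, because each change in one instruction's length can move the
targets of the branches around it.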