
# DETAILS: cowc internals, semantics, and performance

This document explains:

1. **COW language semantics** as defined by the behavior of the original C++ code
2. How `cowc.rs` matches those semantics
3. Code structure and maintainability choices
4. Performance decisions (and how they compare to `cow.cpp` and `cowcomp.cpp`)

## 1) What exactly is “COW” here?

The “spec” is the behavior of the original implementation:

- `cow.cpp`: interpreter (parses tokens and executes a program vector)
- `cowcomp.cpp`: compiler variant (parses tokens and emits a native program via generated C++)

`cowc.rs` targets the semantics of **the compiler variant** (especially its runtime error behavior), while still
matching the parsing and opcode behavior that both C++ programs share.

## 2) Tokenization: the sliding 3-byte window

Both `cow.cpp` and `cowcomp.cpp` tokenize using a rolling 3-byte buffer:

- read one byte into `buf[2]`
- compare `buf` against the 12 known tokens
- if matched: emit instruction and reset buffer to `{0,0,0}`
- else: shift (`buf[0]=buf[1]; buf[1]=buf[2]; buf[2]=0`)

`cowc.rs` implements the same logic in `parse_cow_source()`.
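
The steps above can be sketched in a few lines of Rust. This is a minimal illustration rather than the actual body of `parse_cow_source()`: the function name `tokenize`, the `TOKENS` table layout, and the `Vec<u8>` return type are assumptions made for the example.

```rust
/// The 12 COW tokens, indexed by opcode (0 = `moo` ... 11 = `oom`).
const TOKENS: [&[u8; 3]; 12] = [
    b"moo", b"mOo", b"moO", b"mOO", b"Moo", b"MOo",
    b"MoO", b"MOO", b"OOO", b"MMM", b"OOM", b"oom",
];

/// Hypothetical stand-in for `parse_cow_source()`: returns opcodes in source order.
fn tokenize(src: &[u8]) -> Vec<u8> {
    let mut buf = [0u8; 3];
    let mut program = Vec::new();
    for &byte in src {
        // Read one byte into the last slot of the window.
        buf[2] = byte;
        if let Some(op) = TOKENS.iter().position(|t| **t == buf) {
            // Matched: emit the opcode and reset the window.
            program.push(op as u8);
            buf = [0, 0, 0];
        } else {
            // No match: shift the window left by one byte.
            buf = [buf[1], buf[2], 0];
        }
    }
    program
}
```

Because the window is reset after every match, tokens never overlap: in `mooOO` the trailing `OO` cannot combine with the `o` of `moo` to form `OOO`. Any byte that does not complete a token is simply shifted out, so comments and whitespace need no special handling.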

## 3) Instruction set

The 12 tokens map to numeric opcodes:

| Token | ID | Meaning |
|------:|---:|---------|
| `moo` | 0  | loop end (jump back) |
| `mOo` | 1  | move pointer left |
| `moO` | 2  | move pointer right |
| `mOO` | 3  | eval (execute the instruction whose ID is in the current cell; exit if the cell is 3 or not a valid ID) |
| `Moo` | 4  | if cell != 0, output it as a char; otherwise read a char and discard the rest of the line |
| `MOo` | 5  | decrement cell |
| `MoO` | 6  | increment cell |
| `MOO` | 7  | loop start (if cell==0 skip forward) |
| `OOO` | 8  | set cell to 0 |
| `MMM` | 9  | toggle register load/store |
| `OOM` | 10 | print int + newline |
| `oom` | 11 | read int line (`atoi`-style) |

## 4) Tape / pointer / register

The C++ compiler variant uses:
- `std::vector<int> m;`
- `iterator p;`
- `int r; bool h;` for the register toggle.

Rust uses:
- `Vec<i32> m`
- `usize p`
- `i32 r; bool h`

Rust uses explicit `wrapping_*` arithmetic for deterministic overflow.
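
As a rough sketch of how those fields fit together, the state could be grouped as below. The `State` struct, its method names, and the single-zero-cell starting tape are assumptions for illustration; only the field names and types come from the list above.

```rust
/// Hypothetical runtime state using the field names listed above.
struct State {
    m: Vec<i32>, // tape
    p: usize,    // index of the current cell
    r: i32,      // register value
    h: bool,     // true while the register holds a value
}

impl State {
    /// Assumption: the tape starts as a single zero cell.
    fn new() -> Self {
        State { m: vec![0], p: 0, r: 0, h: false }
    }

    /// `MoO`: increment the current cell, wrapping on overflow.
    fn inc(&mut self) {
        self.m[self.p] = self.m[self.p].wrapping_add(1);
    }

    /// `MOo`: decrement the current cell, wrapping on overflow.
    fn dec(&mut self) {
        self.m[self.p] = self.m[self.p].wrapping_sub(1);
    }

    /// `MMM`: store the register into the cell if it holds a value,
    /// otherwise load the cell into the register; then flip the flag.
    fn toggle_register(&mut self) {
        if self.h {
            self.m[self.p] = self.r;
        } else {
            self.r = self.m[self.p];
        }
        self.h = !self.h;
    }
}
```

With `wrapping_add`, incrementing a cell that holds `i32::MAX` yields `i32::MIN` in both debug and release builds, instead of panicking in one and wrapping in the other.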

## 5) Loop matching quirks

Loop matching in `cowcomp.cpp` is done by scanning through the instruction vector with a nesting counter, but it
has a few peculiarities:

- For `moo` (case 0), it “skips previous command” before scanning backward, and it breaks when reaching the beginning
  without inspecting index 0.
- For `MOO` (case 7), it “skips next command” when scanning forward, and it decrements nesting twice when a `moo`
  immediately follows a `MOO` (`prev == 7` special case).

`cowc.rs` mirrors these behaviors in:
- `match_for_moo_back()`
- `match_for_moo_forward()`

### Why “virtual” matches?
`mOO` (eval) can dynamically execute `moo`/`MOO` relative to the *current program counter*. The C++ compiler variant
implements this by calling `compile(op, false)` at the current position, reusing the same scanning behavior.

To match that cleanly, `cowc.rs` precomputes match results **for every instruction index** for both directions.
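
The idea of a per-index table can be sketched as follows. Note the heavy simplification: this version uses plain balanced nesting and deliberately does **not** reproduce the `cowcomp.cpp` quirks described above, so it is not a substitute for `match_for_moo_back()` or `match_for_moo_forward()`. The function names and the `Option<usize>` encoding are assumptions for the example.

```rust
const MOO_END: u8 = 0;   // `moo`
const MOO_START: u8 = 7; // `MOO`

/// For every index `i`, the `moo` that would close a loop opened at `i`
/// (scanning forward from `i + 1` with a nesting counter), if any.
fn forward_matches(prog: &[u8]) -> Vec<Option<usize>> {
    (0..prog.len())
        .map(|i| {
            let mut depth = 0usize;
            for j in i + 1..prog.len() {
                match prog[j] {
                    MOO_START => depth += 1,
                    MOO_END if depth == 0 => return Some(j),
                    MOO_END => depth -= 1,
                    _ => {}
                }
            }
            None
        })
        .collect()
}

/// For every index `i`, the `MOO` that would open a loop closed at `i`
/// (scanning backward from `i - 1` with a nesting counter), if any.
fn backward_matches(prog: &[u8]) -> Vec<Option<usize>> {
    (0..prog.len())
        .map(|i| {
            let mut depth = 0usize;
            for j in (0..i).rev() {
                match prog[j] {
                    MOO_END => depth += 1,
                    MOO_START if depth == 0 => return Some(j),
                    MOO_START => depth -= 1,
                    _ => {}
                }
            }
            None
        })
        .collect()
}
```

The point of filling the table for every index, not only for indices that hold `MOO` or `moo`, is that an eval at position `i` can look up its jump target directly, whatever instruction actually sits at `i`.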

## 6) `mOO` (eval) semantics

In `cowcomp.cpp`:
- `mOO` emits a `switch(*p)` with cases 0..2 and 4..11
- it deliberately omits case 3; value 3 falls into `default: goto x;` (exit)
- unknown values also `goto x;` (exit)

`cowc.rs` matches that:
- cell value 3 exits
- unknown values exit
- otherwise it performs the instruction’s effect *without* advancing the program counter during the eval itself
  (except when the evaluated instruction causes a jump), and then execution continues to the next instruction.
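
The exit rule can be condensed into a small classification step. The `EvalOutcome` enum and `eval_cell` function are hypothetical names for this sketch; the generated code may well inline the decision instead.

```rust
/// Hypothetical result of inspecting the current cell for `mOO`.
#[derive(Debug, PartialEq)]
enum EvalOutcome {
    /// Perform the effect of this opcode at the current program counter.
    Run(u8),
    /// Terminate the program.
    Exit,
}

/// Classify a cell value the way the emitted `switch(*p)` does:
/// cases 0..=2 and 4..=11 run, everything else (including 3) exits.
fn eval_cell(cell: i32) -> EvalOutcome {
    match cell {
        0..=2 | 4..=11 => EvalOutcome::Run(cell as u8),
        _ => EvalOutcome::Exit,
    }
}
```

For example, `eval_cell(6)` returns `EvalOutcome::Run(6)` (increment), while `eval_cell(3)`, `eval_cell(12)`, and `eval_cell(-1)` all return `EvalOutcome::Exit`.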

## 7) I/O semantics

### `Moo`
Matches C++ compiler variant behavior:
- if cell != 0: output as a byte (`putchar(*p)`)
- else: read one byte and then flush until newline
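
A minimal sketch of those two branches, written against generic `Read`/`Write` so it can be exercised without a terminal. The names `moo_io` and `read_byte` are hypothetical, and storing 0 at end of input is an assumption of this example, since the text above does not say how end of input is handled.

```rust
use std::io::{self, Read, Write};

/// Read a single byte; `None` means end of input.
fn read_byte<R: Read>(input: &mut R) -> io::Result<Option<u8>> {
    let mut byte = [0u8; 1];
    Ok(if input.read(&mut byte)? == 0 { None } else { Some(byte[0]) })
}

/// Hypothetical `Moo` handler.
fn moo_io<R: Read, W: Write>(cell: &mut i32, input: &mut R, output: &mut W) -> io::Result<()> {
    if *cell != 0 {
        // Like `putchar(*p)`: only the low 8 bits are written.
        output.write_all(&[*cell as u8])?;
    } else {
        // Assumption: end of input stores 0.
        *cell = match read_byte(input)? {
            Some(b) => i32::from(b),
            None => 0,
        };
        // Discard everything up to and including the next newline.
        while let Some(b) = read_byte(input)? {
            if b == b'\n' {
                break;
            }
        }
    }
    Ok(())
}
```

Given the input `abc\nz\n` and a zero cell, one call stores 97 (`a`) and leaves `z\n` unread; a second call with the now non-zero cell writes `a` to the output.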

### `oom`
The reference reads up to 99 chars into a fixed buffer, then calls `atoi`.
It also tries to flush on overflow, but the condition never triggers (a small bug).
`cowc.rs` intentionally preserves this: it reads at most 99 bytes or until newline and does not flush extra input.
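
The read-then-parse behavior might look roughly like this. Both function names are hypothetical, and two details are assumptions of the sketch: the whitespace set (`is_ascii_whitespace`, which only approximates C's `isspace`) and wrapping on overflow (C's `atoi` leaves overflow undefined).

```rust
use std::io::{self, Read};

/// Read at most 99 bytes, stopping after a newline; any excess stays unread.
fn read_int_line<R: Read>(input: &mut R) -> io::Result<i32> {
    let mut line = Vec::with_capacity(99);
    let mut byte = [0u8; 1];
    while line.len() < 99 && input.read(&mut byte)? == 1 {
        line.push(byte[0]);
        if byte[0] == b'\n' {
            break;
        }
    }
    Ok(atoi(&line))
}

/// `atoi`-style parse: skip leading whitespace, accept an optional sign,
/// consume digits, and stop at the first non-digit. No digits yields 0.
fn atoi(bytes: &[u8]) -> i32 {
    let mut it = bytes
        .iter()
        .copied()
        .skip_while(|b| b.is_ascii_whitespace())
        .peekable();
    let negative = match it.peek().copied() {
        Some(b'-') => { it.next(); true }
        Some(b'+') => { it.next(); false }
        _ => false,
    };
    let mut value: i32 = 0;
    for b in it {
        if !b.is_ascii_digit() {
            break;
        }
        // Assumption: wrap on overflow instead of C's undefined behavior.
        value = value.wrapping_mul(10).wrapping_add(i32::from(b - b'0'));
    }
    if negative { value.wrapping_neg() } else { value }
}
```

With a 150-digit line, the first 99 bytes are consumed and the remaining 51 stay in the input for whichever instruction reads next, which is the "does not flush extra input" behavior described above.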

## 8) “Fully Rust” output: why a `match pc` dispatch loop?

The original `cowcomp.cpp` emits gotos and relies on `g++ -O3` to build a fast binary.

Rust does not have `goto`, but the closest equivalent that optimizes well is:

```text
loop {
  match pc {
    0 => { ... pc = 1; continue; }
    1 => { ... pc = 2; continue; }
    ...
    _ => break
  }
}
```

With `-C opt-level=3`, this typically becomes a compact jump table plus tight blocks. It avoids interpreter overhead
and stays close to the C++ compiler’s control-flow shape.
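
To make the shape concrete, here is a tiny hand-written instance of the pattern: a loop that decrements a cell until it reaches zero. It is an illustration only, not actual `cowc.rs` output; the `run` function, its `steps` counter, and the block numbering are invented for the example.

```rust
/// Hand-written illustration of the `match pc` dispatch shape.
/// Returns the final cell value and the number of decrements performed.
fn run(n: i32) -> (i32, u32) {
    let mut cell: i32 = n;
    let mut steps: u32 = 0;
    let mut pc: usize = 0;
    loop {
        match pc {
            // Loop start: skip past the loop when the cell is zero.
            0 => { pc = if cell == 0 { 3 } else { 1 }; continue; }
            // Loop body: decrement the cell.
            1 => { cell = cell.wrapping_sub(1); steps += 1; pc = 2; continue; }
            // Loop end: jump back to the loop start.
            2 => { pc = 0; continue; }
            // Any other pc value means the program has finished.
            _ => break,
        }
    }
    (cell, steps)
}
```

Each arm plays the role of a labeled block in the generated C++, and each assignment to `pc` followed by `continue` plays the role of a `goto`.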

## 9) Performance choices

`cowc.rs` emits Rust that is tuned for speed:

- **Chunked stdin buffering**: reads from stdin on demand (does not block at startup on interactive consoles).
- **Buffered stdout**: output is appended to a `Vec<u8>` and written once. This buffering is now optional, controlled by a flag, because it caused issues.
- **Wrapping arithmetic**: `wrapping_add/sub` keeps semantics stable and avoids debug-vs-release surprises.
- **Unsafe cell access**: `get_unchecked` removes bounds checks in the hot path (safe because pointer growth is guarded).
- **Rustc flags**:
  - `-C opt-level=3`
  - `-C codegen-units=1` (better optimization at the cost of compile time)
  - `-C panic=abort` (smaller + faster)
  - optional `--lto` (`-C lto=fat`)
  - optional native CPU (`-C target-cpu=native`) for host builds
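
The "guarded growth" argument behind the unchecked access can be sketched like this. Both function names are hypothetical, and the sketch assumes the tape grows by exactly one zero cell when the pointer steps past the end; the source only states that growth is guarded.

```rust
/// Hypothetical `moO` handler: move right, growing the tape first so that
/// `p` always stays within bounds.
fn move_right(m: &mut Vec<i32>, p: &mut usize) {
    *p += 1;
    if *p == m.len() {
        m.push(0);
    }
}

/// Hypothetical `MoO` handler using an unchecked access in the hot path.
fn increment(m: &mut [i32], p: usize) {
    debug_assert!(p < m.len());
    // SAFETY: `p` is only ever advanced through `move_right`, which grows
    // the tape before `p` can reach `m.len()`.
    unsafe {
        let cell = m.get_unchecked_mut(p);
        *cell = cell.wrapping_add(1);
    }
}
```

The `debug_assert!` keeps the invariant checked in debug builds while costing nothing at `opt-level=3`, where debug assertions are disabled by default.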

## 10) Comparing cowc.rs to the C++ files

### vs `cow.cpp` (interpreter)
- `cow.cpp` dispatches at runtime via a function / switch per step
- `cowc.rs` produces ahead-of-time code with a PC jump table
- Result: compiled output is generally much faster on loop-heavy programs

### vs `cowcomp.cpp` (compiler variant)
- `cowcomp.cpp` emits gotos in generated C++
- `cowc.rs` emits a `match pc` loop in generated Rust
- Both produce straight-line blocks with explicit jumps
- `cowc.rs` removes the dependency on an external C/C++ toolchain (clang/g++)

---

Details were documented using the AIGEN toolset.