DETAILS: cowc internals, semantics, and performance

This document explains:

  1. COW language semantics as defined by the behavior of the original C++ implementation
  2. How cowc.rs matches those semantics
  3. Code structure and maintainability choices
  4. Performance decisions (and how they compare to cow.cpp and cowcomp.cpp)

1) What exactly is “COW” here?

The “spec” is the behavior of the original implementation:

  • cow.cpp: interpreter (parses tokens and executes a program vector)
  • cowcomp.cpp: compiler variant (parses tokens and emits a native program via generated C++)

cowc.rs targets the compiler variant semantics (especially runtime error behavior), while still matching the shared parsing and opcode behavior.

2) Tokenization: the sliding 3-byte window

Both cow.cpp and cowcomp.cpp tokenize using a rolling 3-byte buffer:

  • read one byte into buf[2]
  • compare buf against the 12 known tokens
  • if matched: emit instruction and reset buffer to {0,0,0}
  • else: shift (buf[0]=buf[1]; buf[1]=buf[2]; buf[2]=0)

cowc.rs implements the same logic in parse_cow_source().
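The four steps above can be sketched in a few lines of Rust. This is a minimal illustration, not the actual body of parse_cow_source(): the function name `tokenize`, the `TOKENS` constant, and the choice of `u8` opcodes are assumptions made for the example.

```rust
/// The 12 COW tokens; a token's index in this array is its opcode.
const TOKENS: [&[u8; 3]; 12] = [
    b"moo", b"mOo", b"moO", b"mOO", b"Moo", b"MOo",
    b"MoO", b"MOO", b"OOO", b"MMM", b"OOM", b"oom",
];

/// Tokenizes COW source with a rolling 3-byte window.
/// `tokenize` is a hypothetical name used only for this sketch.
fn tokenize(src: &[u8]) -> Vec<u8> {
    let mut buf = [0u8; 3];
    let mut program = Vec::new();
    for &byte in src {
        // Read one byte into the last slot of the window.
        buf[2] = byte;
        if let Some(op) = TOKENS.iter().position(|t| **t == buf) {
            // Matched a token: emit its opcode and reset the window.
            program.push(op as u8);
            buf = [0; 3];
        } else {
            // No match: shift the window left by one byte.
            buf[0] = buf[1];
            buf[1] = buf[2];
            buf[2] = 0;
        }
    }
    program
}
```

Because the window only resets on a match, any bytes that do not complete a token (whitespace, comments, stray letters) simply slide out of the buffer.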

3) Instruction set

The 12 tokens map to numeric opcodes:

| Token | ID | Meaning |
|-------|----|---------|
| moo | 0 | loop end (jump back) |
| mOo | 1 | move pointer left |
| moO | 2 | move pointer right |
| mOO | 3 | eval (execute instruction in current cell; if cell==3 exit) |
| Moo | 4 | if cell!=0 output char, else read char and flush rest of line |
| MOo | 5 | decrement cell |
| MoO | 6 | increment cell |
| MOO | 7 | loop start (if cell==0 skip forward) |
| OOO | 8 | set cell to 0 |
| MMM | 9 | toggle register load/store |
| OOM | 10 | print int + newline |
| oom | 11 | read int line (atoi-style) |

4) Tape / pointer / register

The C++ compiler variant uses:

  • std::vector<int> m;
  • iterator p;
  • int r; bool h; for the register toggle.

Rust uses:

  • Vec<i32> m
  • usize p
  • i32 r; bool h

Rust uses explicit wrapping_* arithmetic so that overflow wraps deterministically in both debug and release builds.
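As a rough sketch of that state, the snippet below bundles the four fields into a struct and shows wrapping increment/decrement plus the register toggle. The struct name `Machine`, the method names, and the exact load/store order of the toggle are assumptions for illustration; only the field names and types come from the list above.

```rust
/// Runtime state mirroring the fields listed above.
/// `Machine` and its method names are hypothetical.
struct Machine {
    m: Vec<i32>, // tape
    p: usize,    // index of the current cell
    r: i32,      // register value
    h: bool,     // true while the register holds a value
}

impl Machine {
    fn new() -> Self {
        Machine { m: vec![0], p: 0, r: 0, h: false }
    }

    /// MoO: increment the current cell, wrapping on overflow.
    fn inc(&mut self) {
        self.m[self.p] = self.m[self.p].wrapping_add(1);
    }

    /// MOo: decrement the current cell, wrapping on overflow.
    fn dec(&mut self) {
        self.m[self.p] = self.m[self.p].wrapping_sub(1);
    }

    /// MMM: assumed here to copy the cell into the register when the
    /// register is empty, and to copy it back into the cell otherwise.
    fn toggle_register(&mut self) {
        if self.h {
            self.m[self.p] = self.r;
        } else {
            self.r = self.m[self.p];
        }
        self.h = !self.h;
    }
}
```

With plain `+` and `-`, the same code would panic on overflow in a debug build and wrap in a release build, which is the discrepancy the wrapping_* calls avoid.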

5) Loop matching quirks

Loop matching in cowcomp.cpp is done by scanning through the instruction vector with a nesting counter, but it has a few peculiarities:

  • For moo (case 0), it “skips previous command” before scanning backward, and it breaks when reaching the beginning without inspecting index 0.
  • For MOO (case 7), it “skips next command” when scanning forward, and it decrements nesting twice when a moo immediately follows a MOO (prev == 7 special case).

cowc.rs mirrors these behaviors in:

  • match_for_moo_back()
  • match_for_moo_forward()

Why “virtual” matches?

mOO (eval) can dynamically execute moo/MOO relative to the current program counter. The C++ compiler variant implements this by calling compile(op, false) at the current position, reusing the same scanning behavior.

To match that cleanly, cowc.rs precomputes match results for every instruction index for both directions.
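The idea of precomputing a match for every index can be sketched as follows. This is a deliberately simplified illustration: it uses a plain nesting counter and does not reproduce the quirks listed above (the skipped command, the unexamined index 0, or the prev == 7 double decrement). The function names are hypothetical and are not the real match_for_moo_back() / match_for_moo_forward().

```rust
const MOO_END: u8 = 0;   // moo
const MOO_START: u8 = 7; // MOO

/// Where a (real or virtual) MOO at index `at` would skip to.
fn forward_match(prog: &[u8], at: usize) -> Option<usize> {
    let mut level = 1usize;
    for j in at + 1..prog.len() {
        match prog[j] {
            MOO_START => level += 1,
            MOO_END => {
                level -= 1;
                if level == 0 {
                    return Some(j);
                }
            }
            _ => {}
        }
    }
    None // unmatched
}

/// Where a (real or virtual) moo at index `at` would jump back to.
fn backward_match(prog: &[u8], at: usize) -> Option<usize> {
    let mut level = 1usize;
    for j in (0..at).rev() {
        match prog[j] {
            MOO_END => level += 1,
            MOO_START => {
                level -= 1;
                if level == 0 {
                    return Some(j);
                }
            }
            _ => {}
        }
    }
    None // unmatched
}

/// Precomputes both tables for every instruction index, so that an
/// eval at any position can look up its jump target directly.
fn precompute(prog: &[u8]) -> (Vec<Option<usize>>, Vec<Option<usize>>) {
    let fwd = (0..prog.len()).map(|i| forward_match(prog, i)).collect();
    let back = (0..prog.len()).map(|i| backward_match(prog, i)).collect();
    (fwd, back)
}
```

Computing a result for every index, not only for indices that hold moo or MOO, is what lets an eval at an arbitrary position behave as if the loop instruction were located there. The scan is quadratic in program length in the worst case, which is paid once at compile time.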

6) mOO (eval) semantics

In cowcomp.cpp:

  • mOO emits a switch(*p) with cases 0..2 and 4..11
  • it deliberately omits case 3; value 3 falls into default: goto x; (exit)
  • unknown values also goto x; (exit)

cowc.rs matches that:

  • cell value 3 exits
  • unknown values exit
  • otherwise it performs the instruction's effect without advancing the program counter during the eval itself (except when the evaluated instruction causes a jump), and then execution continues to the next instruction.
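The exit-versus-execute decision described above reduces to a single match on the cell value. The sketch below shows only that decision; the `EvalAction` enum and `eval_action` function are hypothetical names, and performing the selected instruction is left out.

```rust
/// What an eval should do for a given cell value.
/// Both names here are illustrative, not taken from cowc.rs.
#[derive(Debug, PartialEq)]
enum EvalAction {
    /// Cell value 3, or any value outside 0..=11: terminate.
    Exit,
    /// Perform the instruction with this opcode.
    Run(u8),
}

fn eval_action(cell: i32) -> EvalAction {
    match cell {
        // Cases 0..2 and 4..11, mirroring the emitted switch.
        0..=2 | 4..=11 => EvalAction::Run(cell as u8),
        // Case 3 is omitted on purpose, so it lands here with
        // every unknown value.
        _ => EvalAction::Exit,
    }
}
```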

7) I/O semantics

Moo

Matches C++ compiler variant behavior:

  • if cell != 0: output as a byte (putchar(*p))
  • else: read one byte and then flush until newline

oom

The reference reads up to 99 chars into a fixed buffer, then calls atoi. It also tries to flush on overflow, but the condition never triggers (a small bug). cowc.rs intentionally preserves this: it reads at most 99 bytes or until newline and does not flush extra input.
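A minimal sketch of that read-then-parse behavior is shown below. Several details are assumptions made for the example: the function names, that the newline counts toward the 99-byte limit, and that overflow wraps (C's atoi leaves overflow undefined). Note also that `is_ascii_whitespace` does not treat vertical tab as whitespace, unlike C's `isspace`.

```rust
use std::io::Read;

/// Reads at most 99 bytes, stopping after a newline or at end of
/// input. Anything beyond the limit is left unread, not flushed.
fn read_capped_line<R: Read>(input: &mut R) -> Vec<u8> {
    let mut line = Vec::with_capacity(99);
    let mut byte = [0u8; 1];
    while line.len() < 99 {
        match input.read(&mut byte) {
            Ok(1) => {
                line.push(byte[0]);
                if byte[0] == b'\n' {
                    break;
                }
            }
            _ => break, // end of input or read error
        }
    }
    line
}

/// atoi-style parse: skip leading whitespace, accept an optional
/// sign, consume digits, ignore everything after them.
fn atoi(bytes: &[u8]) -> i32 {
    let mut i = 0;
    while i < bytes.len() && bytes[i].is_ascii_whitespace() {
        i += 1;
    }
    let mut negative = false;
    if i < bytes.len() && (bytes[i] == b'+' || bytes[i] == b'-') {
        negative = bytes[i] == b'-';
        i += 1;
    }
    let mut value: i32 = 0;
    while i < bytes.len() && bytes[i].is_ascii_digit() {
        let digit = (bytes[i] - b'0') as i32;
        value = value.wrapping_mul(10).wrapping_add(digit);
        i += 1;
    }
    if negative { value.wrapping_neg() } else { value }
}
```

One consequence of not flushing: after an over-long line, the unread remainder is still in the input and will be seen by the next read instruction.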

8) “Fully Rust” output: why a match pc dispatch loop?

The original cowcomp.cpp emits gotos and relies on g++ -O3 to build a fast binary.

Rust has no goto; the closest equivalent that optimizes well is a loop around a match on the program counter:

loop {
  match pc {
    0 => { ... pc = 1; continue; }
    1 => { ... pc = 2; continue; }
    ...
    _ => break
  }
}

With -C opt-level=3, this typically becomes a compact jump table plus tight blocks. It avoids interpreter overhead and stays close to the C++ compiler's control-flow shape.
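To make the shape concrete, here is a small hand-written instance of such a dispatch loop. It is an illustration only, not actual cowc.rs output: the function name, the single `cell` variable standing in for the tape, and the `steps` counter are all assumptions. It adds 3 to a cell and then counts it down in a loop.

```rust
/// Hand-written example of the `match pc` dispatch shape.
/// Returns the final cell value and the number of loop iterations.
fn run_dispatch_demo() -> (i32, u32) {
    let mut cell: i32 = 0;
    let mut steps: u32 = 0;
    let mut pc: usize = 0;
    loop {
        match pc {
            0 => {
                // Straight-line block: set up the counter.
                cell = cell.wrapping_add(3);
                pc = 1;
                continue;
            }
            1 => {
                // Loop start: skip past the loop end when cell is 0.
                pc = if cell == 0 { 4 } else { 2 };
                continue;
            }
            2 => {
                // Loop body: decrement and count the iteration.
                cell = cell.wrapping_sub(1);
                steps += 1;
                pc = 3;
                continue;
            }
            3 => {
                // Loop end: jump back to the loop start.
                pc = 1;
                continue;
            }
            // Any pc past the last block ends the program.
            _ => break,
        }
    }
    (cell, steps)
}
```

Each arm assigns the next `pc` explicitly, which plays the role of the goto in the generated C++.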

9) Performance choices

cowc.rs emits Rust that is tuned for speed:

  • Chunked stdin buffering: reads from stdin on demand (does not block at startup on interactive consoles).
  • Buffered stdout: appends output to a Vec<u8> and writes it once. This is now optional behind a flag, after issues arose.
  • Wrapping arithmetic: wrapping_add/sub keeps semantics stable and avoids debug-vs-release surprises.
  • Unsafe cell access: get_unchecked removes bounds checks in the hot path (safe because pointer growth is guarded).
  • Rustc flags:
    • -C opt-level=3
    • -C codegen-units=1 (better optimization at the cost of compile time)
    • -C panic=abort (smaller + faster)
    • optional --lto (-C lto=fat)
    • optional native CPU (-C target-cpu=native) for host builds
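The unchecked cell access relies on the pointer never leaving the tape. A minimal sketch of that invariant is shown below; the `Tape` type, its method names, and the choice to report a left move from cell 0 by returning `false` are assumptions for the example, not the generated code.

```rust
/// Tape whose pointer is kept in bounds by construction.
/// `Tape` and its methods are hypothetical names.
struct Tape {
    m: Vec<i32>,
    p: usize,
}

impl Tape {
    fn new() -> Self {
        Tape { m: vec![0], p: 0 }
    }

    /// Moves right, growing the tape so that p < m.len() still holds.
    fn move_right(&mut self) {
        self.p += 1;
        if self.p == self.m.len() {
            self.m.push(0);
        }
    }

    /// Moves left; returns false (and does not move) at cell 0,
    /// leaving the caller to decide how to handle it.
    fn move_left(&mut self) -> bool {
        if self.p == 0 {
            return false;
        }
        self.p -= 1;
        true
    }

    /// Hot-path access to the current cell without a bounds check.
    #[inline(always)]
    fn cell(&mut self) -> &mut i32 {
        // SAFETY: p starts at 0 with one cell allocated, move_right
        // grows the tape before p can reach m.len(), and move_left
        // never takes p below 0, so p < m.len() always holds.
        unsafe { self.m.get_unchecked_mut(self.p) }
    }
}
```

Keeping the growth check in the pointer-move operations means increments, decrements, and loop tests touch the cell without any per-access check.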

10) Comparing cowc.rs to the C++ files

vs cow.cpp (interpreter)

  • cow.cpp dispatches at runtime via a function / switch per step
  • cowc.rs produces ahead-of-time code with a PC jump table
  • Result: compiled output is generally much faster on loop-heavy programs

vs cowcomp.cpp (compiler variant)

  • cowcomp.cpp emits gotos in generated C++
  • cowc.rs emits a match pc loop in generated Rust
  • Both produce straight-line blocks with explicit jumps
  • cowc.rs removes the dependency on an external C/C++ toolchain (clang/g++)

Details were documented using the AIGEN toolset.