DETAILS: cowc internals, semantics, and performance
This document explains:
- COW language semantics as implemented by the original C++ implementation
- How `cowc.rs` matches those semantics
- Code structure and maintainability choices
- Performance decisions (and how they compare to `cow.cpp` and `cowcomp.cpp`)
1) What exactly is “COW” here?
The “spec” is the behavior of the original implementation:
- `cow.cpp`: interpreter (parses tokens and executes a program vector)
- `cowcomp.cpp`: compiler variant (parses tokens and emits a native program via generated C++)
`cowc.rs` targets the compiler variant semantics (especially runtime error behavior), while still matching
the shared parsing and opcode behavior.
2) Tokenization: the sliding 3-byte window
Both `cow.cpp` and `cowcomp.cpp` tokenize using a rolling 3-byte buffer:
- read one byte into `buf[2]`
- compare `buf` against the 12 known tokens
- if matched: emit instruction and reset buffer to `{0,0,0}`
- else: shift (`buf[0]=buf[1]; buf[1]=buf[2]; buf[2]=0`)
`cowc.rs` implements the same logic in `parse_cow_source()`.
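The window logic above can be sketched as follows. This is a minimal illustration, not the actual body of `parse_cow_source()`: the name `tokenize`, the `TOKENS` table, and the `Vec<u8>` return type are assumptions made for the example.

```rust
/// The 12 tokens, ordered so that a token's index is its opcode ID.
const TOKENS: [&[u8; 3]; 12] = [
    b"moo", b"mOo", b"moO", b"mOO", b"Moo", b"MOo",
    b"MoO", b"MOO", b"OOO", b"MMM", b"OOM", b"oom",
];

/// Hypothetical tokenizer: scans `src` with a rolling 3-byte window.
fn tokenize(src: &[u8]) -> Vec<u8> {
    let mut buf = [0u8; 3];
    let mut program = Vec::new();
    for &byte in src {
        // Read one byte into the last slot of the window.
        buf[2] = byte;
        if let Some(op) = TOKENS.iter().position(|t| **t == buf) {
            // Match: emit the opcode and reset the window.
            program.push(op as u8);
            buf = [0, 0, 0];
        } else {
            // No match: shift the window left by one byte.
            buf = [buf[1], buf[2], 0];
        }
    }
    program
}
```

Because the window is reset after a match, tokens never overlap: in `OOOOM` the first three bytes form `OOO`, and the remaining `OM` is too short to complete `OOM`.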
3) Instruction set
The 12 tokens map to numeric opcodes:
| Token | ID | Meaning |
|---|---|---|
| `moo` | 0 | loop end (jump back) |
| `mOo` | 1 | move pointer left |
| `moO` | 2 | move pointer right |
| `mOO` | 3 | eval (execute instruction in current cell; if cell==3 exit) |
| `Moo` | 4 | if cell!=0 output char else read char and flush rest of line |
| `MOo` | 5 | decrement cell |
| `MoO` | 6 | increment cell |
| `MOO` | 7 | loop start (if cell==0 skip forward) |
| `OOO` | 8 | set cell to 0 |
| `MMM` | 9 | toggle register load/store |
| `OOM` | 10 | print int + newline |
| `oom` | 11 | read int line (atoi-style) |
4) Tape / pointer / register
The C++ compiler variant uses:
- `std::vector<int> m;`
- `iterator p;`
- `int r; bool h;` for the register toggle.
Rust uses:
- `Vec<i32> m`
- `usize p`
- `i32 r; bool h`

Rust uses explicit `wrapping_*` arithmetic for deterministic overflow.
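A minimal sketch of that state, using the field names listed above. The `Machine` struct, its method names, and the one-cell initial tape are assumptions for illustration. The direction of the `MMM` toggle (store the cell when the register is empty, load it back otherwise) is likewise an assumed reading of "toggle register load/store", not taken from `cowc.rs`.

```rust
/// Hypothetical runtime state; field names follow the list above.
struct Machine {
    m: Vec<i32>, // tape
    p: usize,    // index of the current cell
    r: i32,      // register value
    h: bool,     // true while the register holds a value
}

impl Machine {
    fn new() -> Self {
        // Assumption: the tape starts with a single zeroed cell.
        Machine { m: vec![0], p: 0, r: 0, h: false }
    }

    /// MoO: increment with explicit wrapping, so debug and release builds agree.
    fn inc(&mut self) {
        self.m[self.p] = self.m[self.p].wrapping_add(1);
    }

    /// MOo: decrement with explicit wrapping.
    fn dec(&mut self) {
        self.m[self.p] = self.m[self.p].wrapping_sub(1);
    }

    /// MMM: if the register is empty, copy the cell into it;
    /// otherwise write the register back to the cell and clear it.
    fn toggle_register(&mut self) {
        if self.h {
            self.m[self.p] = self.r;
        } else {
            self.r = self.m[self.p];
        }
        self.h = !self.h;
    }
}
```

With `wrapping_add`, incrementing a cell that holds `i32::MAX` yields `i32::MIN` in every build profile, whereas a plain `+ 1` would panic in a debug build.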
5) Loop matching quirks
Loop matching in `cowcomp.cpp` is done by scanning through the instruction vector with a nesting counter, but it
has a few peculiarities:
- For `moo` (case 0), it “skips previous command” before scanning backward, and it breaks when reaching the beginning without inspecting index 0.
- For `MOO` (case 7), it “skips next command” when scanning forward, and it decrements nesting twice when a `moo` immediately follows a `MOO` (`prev == 7` special case).
`cowc.rs` mirrors these behaviors in:
- `match_for_moo_back()`
- `match_for_moo_forward()`
Why “virtual” matches?
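`mOO` (eval) can dynamically execute `moo`/`MOO` relative to the current program counter. The C++ compiler variant
implements this by calling `compile(op, false)` at the current position, reusing the same scanning behavior.
To match that cleanly, `cowc.rs` precomputes match results for every instruction index for both directions.
The matches are “virtual” because most indices do not hold a loop instruction; each entry records where a `moo` or `MOO` evaluated at that index would jump.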
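The idea of a per-index match table can be sketched as below. This is a simplified illustration that uses a plain nesting counter and deliberately does not reproduce the `cowcomp.cpp` quirks described above (the skipped neighbor command, the uninspected index 0, the double decrement). The function names `forward_matches` and `backward_matches` are hypothetical and are not the `cowc.rs` helpers.

```rust
const MOO_END: u8 = 0;   // `moo`
const MOO_START: u8 = 7; // `MOO`

/// For every index i: the index of the `moo` that a `MOO` sitting at i
/// would pair with, scanning forward with a nesting counter.
fn forward_matches(prog: &[u8]) -> Vec<Option<usize>> {
    (0..prog.len())
        .map(|i| {
            let mut depth = 0usize;
            for j in (i + 1)..prog.len() {
                match prog[j] {
                    MOO_START => depth += 1,
                    MOO_END if depth == 0 => return Some(j),
                    MOO_END => depth -= 1,
                    _ => {}
                }
            }
            None // no matching `moo` after index i
        })
        .collect()
}

/// For every index i: the index of the `MOO` that a `moo` sitting at i
/// would pair with, scanning backward with a nesting counter.
fn backward_matches(prog: &[u8]) -> Vec<Option<usize>> {
    (0..prog.len())
        .map(|i| {
            let mut depth = 0usize;
            for j in (0..i).rev() {
                match prog[j] {
                    MOO_END => depth += 1,
                    MOO_START if depth == 0 => return Some(j),
                    MOO_START => depth -= 1,
                    _ => {}
                }
            }
            None // no matching `MOO` before index i
        })
        .collect()
}
```

Entries exist for indices that hold no loop instruction at all, which is what lets an eval at any position look up its jump target without rescanning. The quadratic scan is acceptable for a sketch; a real implementation could do better.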
6) mOO (eval) semantics
In `cowcomp.cpp`:
- `mOO` emits a `switch(*p)` with cases 0..2 and 4..11
- it deliberately omits case 3; value 3 falls into `default: goto x;` (exit)
- unknown values also `goto x;` (exit)
`cowc.rs` matches that:
- cell value 3 exits
- unknown values exit
- otherwise it performs the instruction’s effect without advancing the program counter during the eval itself (except when the evaluated instruction causes a jump), and then execution continues to the next instruction.
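The exit rules above amount to a small classification of the cell value. The sketch below shows only that decision; `EvalOutcome` and `eval_dispatch` are hypothetical names, and the actual execution of the selected opcode is left out.

```rust
/// Hypothetical result of inspecting the current cell for `mOO`.
#[derive(Debug, PartialEq)]
enum EvalOutcome {
    /// Perform the effect of this opcode in place.
    Run(u8),
    /// Terminate the program.
    Exit,
}

/// Mirrors the shape of the generated `switch(*p)`:
/// cases 0..2 and 4..11 run, everything else (including 3) exits.
fn eval_dispatch(cell: i32) -> EvalOutcome {
    match cell {
        0..=2 | 4..=11 => EvalOutcome::Run(cell as u8),
        _ => EvalOutcome::Exit, // value 3 and unknown values
    }
}
```

Leaving 3 out of the match arms, rather than handling it separately, keeps the Rust version structurally identical to the C++ `default:` fallthrough.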
7) I/O semantics
Moo
Matches C++ compiler variant behavior:
- if cell != 0: output as a byte (`putchar(*p)`)
- else: read one byte and then flush until newline
oom
The reference reads up to 99 chars into a fixed buffer, then calls `atoi`.
It also tries to flush the remainder of an over-long line, but the condition guarding that flush never triggers (a small bug).
`cowc.rs` intentionally preserves this: it reads at most 99 bytes or until newline and does not flush extra input.
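A sketch of that read path, under stated assumptions: `read_int_line` and `atoi` are hypothetical helper names, the terminating newline is consumed, and out-of-range values wrap (C's `atoi` leaves that case undefined). Whitespace skipping uses `is_ascii_whitespace`, which is close to but not identical to C's `isspace`.

```rust
use std::io::Read;

/// Reads at most 99 bytes, stopping early at a newline (which is consumed).
/// Anything left on an over-long line stays in the input, unflushed.
fn read_int_line<R: Read>(input: &mut R) -> i32 {
    let mut line = Vec::with_capacity(99);
    let mut byte = [0u8; 1];
    while line.len() < 99 {
        match input.read(&mut byte) {
            Ok(1) if byte[0] != b'\n' => line.push(byte[0]),
            _ => break, // newline, EOF, or read error
        }
    }
    atoi(&line)
}

/// atoi-style parse: skip leading whitespace, accept one optional sign,
/// then consume digits until the first non-digit.
fn atoi(bytes: &[u8]) -> i32 {
    let mut it = bytes
        .iter()
        .copied()
        .skip_while(|b| b.is_ascii_whitespace())
        .peekable();
    let negative = match it.peek().copied() {
        Some(b'-') => { it.next(); true }
        Some(b'+') => { it.next(); false }
        _ => false,
    };
    let mut value: i32 = 0;
    for b in it {
        if !b.is_ascii_digit() {
            break;
        }
        // Assumption: overflow wraps instead of being undefined.
        value = value.wrapping_mul(10).wrapping_add((b - b'0') as i32);
    }
    if negative { value.wrapping_neg() } else { value }
}
```

Taking a generic `Read` rather than locking stdin directly makes the 99-byte cutoff easy to exercise with an in-memory buffer.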
8) “Fully Rust” output: why a match pc dispatch loop?
The original `cowcomp.cpp` emits gotos and relies on `g++ -O3` to build a fast binary.
Rust does not have `goto`, but the closest equivalent that optimizes well is:

```rust
loop {
    match pc {
        0 => { ... pc = 1; continue; }
        1 => { ... pc = 2; continue; }
        ...
        _ => break
    }
}
```
With `-C opt-level=3`, this typically becomes a compact jump table plus tight blocks. It avoids interpreter overhead
and stays close to the C++ compiler’s control-flow shape.
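To make the shape concrete, here is a hand-written sketch of what such a loop might look like for the COW program `MoO MoO MoO MOO OOM MOo moo`, which counts down from 3. It is not actual `cowc.rs` output: the function name `run_countdown` is hypothetical, and printed integers are collected into a `Vec` instead of being written to stdout so the result is easy to check.

```rust
/// Hypothetical hand-compiled form of `MoO MoO MoO MOO OOM MOo moo`.
fn run_countdown() -> Vec<i32> {
    let mut m: Vec<i32> = vec![0]; // tape
    let p: usize = 0;              // the pointer never moves in this program
    let mut printed = Vec::new();  // stands in for stdout
    let mut pc: usize = 0;
    loop {
        match pc {
            // MoO (three times): increment the cell.
            0 | 1 | 2 => { m[p] = m[p].wrapping_add(1); pc += 1; }
            // MOO: leave the loop when the cell is zero.
            3 => { pc = if m[p] == 0 { 7 } else { 4 }; }
            // OOM: print the cell as an integer.
            4 => { printed.push(m[p]); pc = 5; }
            // MOo: decrement the cell.
            5 => { m[p] = m[p].wrapping_sub(1); pc = 6; }
            // moo: jump back to the matching MOO.
            6 => { pc = 3; }
            // Past the last instruction: program ends.
            _ => break,
        }
    }
    printed
}
```

Each arm is a straight-line block that ends by assigning the next `pc`, which is the property that lets the optimizer turn the `match` into a jump table.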
9) Performance choices
`cowc.rs` emits Rust that is tuned for speed:
- Chunked stdin buffering: reads from stdin on demand (does not block at startup on interactive consoles).
- Buffered stdout: append to `Vec<u8>` and write once. (now optional with a flag after issues arose)
- Wrapping arithmetic: `wrapping_add/sub` keeps semantics stable and avoids debug-vs-release surprises.
- Unsafe cell access: `get_unchecked` removes bounds checks in the hot path (safe because pointer growth is guarded).
- Rustc flags:
  - `-C opt-level=3`
  - `-C codegen-units=1` (better optimization at the cost of compile time)
  - `-C panic=abort` (smaller + faster)
  - optional `--lto` (`-C lto=fat`)
  - optional native CPU (`-C target-cpu=native`) for host builds
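The "guarded pointer growth" argument behind the unchecked cell access can be sketched as follows. The `Tape` type and its methods are hypothetical, and returning `false` at the left edge is an assumption about how a caller might detect the error, not a statement of what the generated code does.

```rust
/// Hypothetical tape wrapper that maintains the invariant `p < m.len()`.
struct Tape {
    m: Vec<i32>,
    p: usize,
}

impl Tape {
    fn new() -> Self {
        Tape { m: vec![0], p: 0 }
    }

    /// moO: grow the tape before the pointer can step past its end.
    fn move_right(&mut self) {
        self.p += 1;
        if self.p == self.m.len() {
            self.m.push(0);
        }
    }

    /// mOo: refuse to move left of cell 0 (assumption: the caller
    /// treats `false` as a runtime error).
    fn move_left(&mut self) -> bool {
        if self.p == 0 {
            return false;
        }
        self.p -= 1;
        true
    }

    /// Hot-path cell access without a bounds check.
    #[inline(always)]
    fn cell(&mut self) -> &mut i32 {
        debug_assert!(self.p < self.m.len());
        // SAFETY: `new`, `move_right` and `move_left` all keep `p < m.len()`.
        unsafe { self.m.get_unchecked_mut(self.p) }
    }
}
```

The `debug_assert!` keeps the invariant checked in debug builds while costing nothing in release builds, where the bounds check is the thing being removed.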
10) Comparing cowc.rs to the C++ files
vs cow.cpp (interpreter)
- `cow.cpp` dispatches at runtime via a function / switch per step
- `cowc.rs` produces ahead-of-time code with a PC jump table
- Result: compiled output is generally much faster on loop-heavy programs
vs cowcomp.cpp (compiler variant)
- `cowcomp.cpp` emits gotos in generated C++
- `cowc.rs` emits a `match pc` loop in generated Rust
- Both produce straight-line blocks with explicit jumps
- `cowc.rs` removes the dependency on an external C/C++ toolchain (clang/g++)
Details were documented using the AIGEN toolset.