Phase 0: Scanner, parser & VM ready
Onigmo syntax scanner + parser → AST, a bytecode compiler, and a backtracking VM: literals, classes, . * + ? {m,n}, groups, alternation and anchors, with captures.
Onigmo in pure Go: Ruby's regexp engine, with the features RE2 leaves out, and no cgo.
go-ruby-regexp is a pure-Go (no cgo) reimplementation of Onigmo, the regular-expression
engine used by Ruby. Go's standard regexp package implements RE2 syntax. It guarantees
linear-time matching, but it has no backreferences or lookaround, and its match semantics
differ from Ruby's. A byte-compatible Ruby regexp therefore needs a backtracking engine.
go-ruby-regexp is that engine: a faithful backtracking VM, hardened against catastrophic
backtracking with memoization and a deterministic time/step budget. It is standalone and
reusable, and it is the regexp backend for go-embedded-ruby. The engine roadmap (Phases 0–4)
is complete: backreferences, lookaround, possessive/atomic quantifiers, recursive
subexpression calls, \p{…} and POSIX classes, rune-level /i folding, UTF-8 / ASCII-8BIT
encodings, and a transparent optimizer prefilter. The engine is differential-tested against
MRI, has 100% test coverage, and CI is green across 6 architectures.
Named groups (?<name>…), backreferences \1 / \k<name>, every quantifier mode (greedy, lazy *? +? ??, possessive *+ ++ ?+), and atomic groups (?>…).
Lookahead (?=…) (?!…), fixed/bounded-width lookbehind (?<=…) (?<!…), the \G anchor, and recursive subexpression calls \g<name> / \g<0>.
Unicode properties \p{…}, POSIX bracket classes [[:alpha:]], \h / \H, \R, rune-level /i case folding, inline flags (?imx), and UTF-8 / ASCII-8BIT multi-encoding with multibyte class members [é] / [à-ï].
(pc, sp) memoization, a step budget, a recursion cap and a wall-clock WithTimeout; a start-position / interior-literal prefilter (up to ~210× faster); a lazy-NFA + cached-DFA fast path that beats C Onigmo on literal/alternation/structured scans and brings inner loops to within ~1.6–5× of C (email ≈ RE2); and a benchmark suite.
Downstream, not part of this engine module: the full Ruby Regexp / MatchData surface and replacement DSL live in the go-embedded-ruby adapter that consumes this engine.
A faithful backtracking VM in pure Go with cgo disabled, so it cross-compiles and embeds anywhere. It implements the Onigmo features RE2 omits (backreferences, lookaround, possessive quantifiers, atomic groups, subexpression calls), along with named groups, using Ruby's leftmost-first semantics. It is hardened against ReDoS with memoization and a deterministic budget, as Ruby 3.2 and later are, and it is validated differentially against Onigmo/MRI. It is a standalone, reusable module and the regexp backend for the sibling org github.com/go-embedded-ruby.
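"Leftmost-first" means that among matches starting at the earliest position, the engine takes the one a backtracking search finds first, so alternatives are tried in order. Go's standard library happens to expose both policies, which makes the distinction easy to see without this module. `regexp.Compile` is documented as leftmost-first, the same policy Ruby uses. `regexp.CompilePOSIX` is leftmost-longest. The variable names below are illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Leftmost-first: alternatives are tried in order (backtracking/Ruby policy).
	leftmostFirst = regexp.MustCompile(`a|ab`)
	// Leftmost-longest: the longest match at the leftmost position (POSIX policy).
	leftmostLongest = regexp.MustCompilePOSIX(`a|ab`)
)

func main() {
	fmt.Println(leftmostFirst.FindString("ab"))   // a  (Ruby agrees: "ab"[/a|ab/] returns "a")
	fmt.Println(leftmostLongest.FindString("ab")) // ab
}
```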