Skip to content

ReDoS hardening

A backtracking VM is the price of Ruby compatibility, and the bill comes due as ReDoS — regular-expression denial of service. Certain patterns ((a+)+$ against a long non-matching input is the textbook case) make a naive backtracker explore an exponential number of paths. go-ruby-regexp/regexp treats this as a first-class concern and mitigates it the way Ruby ≥3.2 does, rather than relying on the caller to defend itself.

Threat model

The danger is not malformed input crashing the engine; it is a pattern and input pair that runs for an unbounded time. This matters most when either the pattern or the input comes from an untrusted source. An attacker who can supply a catastrophic pattern, or an input that triggers catastrophic backtracking in an existing pattern, can hang a request thread. The engine must therefore guarantee that every match terminates within a bounded amount of work.

Mitigations

Memoization where it is sound

The VM memoizes (instruction, input-position) pairs so that a path it has already proven to fail is not re-explored. For a large class of patterns this collapses the exponential search back to polynomial work. Memoization is applied only where it is safe — specifically, where the outcome does not depend on captured text. A subpattern that contains a backreference cannot be memoized on position alone, because the same (instruction, position) can succeed or fail depending on what was captured; those regions fall back to the budget.

A deterministic step budget

Independently of memoization, the VM carries a backtrack-step budget. Each unit of backtracking work decrements it; when it is exhausted the match is aborted deterministically (Match returns nil) rather than returning a wrong answer or running forever. "Deterministic" is the key word: the same pattern, input, and budget always abort at the same point, so behaviour is reproducible across runs and platforms — not dependent on wall-clock jitter. A recursion-depth cap bounds subexpression calls (\g<…>) the same way, so a non-terminating recursive grammar fails locally instead of exhausting the stack.

A wall-clock timeout

re.WithTimeout(d) returns a copy of the Regexp that aborts any single match exceeding d of real time (Ruby's Regexp.timeout equivalent) — the real-time backstop to the deterministic step budget, for the residual cases (notably backreference-bearing patterns) memoization cannot prune. The receiver is left unchanged, so a shared *Regexp stays concurrency-safe. The VM polls the monotonic clock only once every 4096 steps, so a search with no deadline pays nothing. A pathological pattern is bounded by whichever of the budget or the deadline it reaches first.

Never rely on a host watchdog alone

A common but fragile approach is to run the match on a goroutine and kill it from the outside. That is not sufficient here: a goroutine cannot be forcibly preempted mid-CPU-loop in a portable way, and an external watchdog leaks resources and is racy. The budget and timeout are enforced inside the VM's own loop, so termination is a property of the engine, not of how the caller happens to invoke it. The watchdog, if present, is a backstop — never the primary defence.