`<regex>`: Skip initial NFA nodes that do nothing during matching by muellerj2 · Pull Request #6026 · microsoft/STL · GitHub
Skip to content

<regex>: Skip initial NFA nodes that do nothing during matching#6026

Merged
StephanTLavavej merged 1 commit into
microsoft:mainfrom
muellerj2:regex-skip-initial-nfa-nodes
Jan 20, 2026
Merged

<regex>: Skip initial NFA nodes that do nothing during matching#6026
StephanTLavavej merged 1 commit into
microsoft:mainfrom
muellerj2:regex-skip-initial-nfa-nodes

Conversation

@muellerj2

Copy link
Copy Markdown

The NFA generated by the parser always starts with two nodes of type _N_begin and _N_capture for capturing group 0. However, the matcher does nothing when it encounters these nodes.

While we cannot avoid generating these nodes in the first place (see #6025), this PR skips these nodes during matching to avoid the comparatively expensive (and, in the case of regex_search(), potentially repeated) processing of these two nodes by the NFA interpreter.

The improvement vanishes in the noise in the regex_match benchmark on my machine (it probably saves about 5 ns per match and that is just too little), but it is noticeable in the regex_search benchmark (especially in those examples where regex matching is attempted in many different positions in the string).

benchmark before [ns] after [ns] speedup
bm_lorem_search/"^bibe"/2 69.7545 64.5229 1.08
bm_lorem_search/"^bibe"/3 73.2422 64.1741 1.14
bm_lorem_search/"^bibe"/4 74.986 61.3839 1.22
bm_lorem_search/"bibe"/2 3515.63 3222.66 1.09
bm_lorem_search/"bibe"/3 6835.94 5998.88 1.14
bm_lorem_search/"bibe"/4 12555.8 12276.8 1.02
bm_lorem_search/"bibe".collate/2 3278.46 3208.7 1.02
bm_lorem_search/"bibe".collate/3 6556.92 6138.39 1.07
bm_lorem_search/"bibe".collate/4 13113.8 12834.8 1.02
bm_lorem_search/"(bibe)"/2 7951.97 7847.38 1.01
bm_lorem_search/"(bibe)"/3 15694.8 15380.8 1.02
bm_lorem_search/"(bibe)"/4 30762.2 30133.8 1.02
bm_lorem_search/"(bibe)+"/2 13392.9 12555.7 1.07
bm_lorem_search/"(bibe)+"/3 24309.4 25111.6 0.97
bm_lorem_search/"(bibe)+"/4 50224.3 47571.3 1.06
bm_lorem_search/"(?:bibe)+"/2 7500 6975.45 1.08
bm_lorem_search/"(?:bibe)+"/3 14125.2 14195.1 1.00
bm_lorem_search/"(?:bibe)+"/4 27622.6 26994.9 1.02
bm_lorem_search/R"(\bbibe)"/2 133929 88936.9 1.51
bm_lorem_search/R"(\bbibe)"/3 260911 184168 1.42
bm_lorem_search/R"(\bbibe)"/4 516183 368369 1.40
bm_lorem_search/R"(\Bibe)"/2 324693 245857 1.32
bm_lorem_search/R"(\Bibe)"/3 645229 495550 1.30
bm_lorem_search/R"(\Bibe)"/4 1311380 976562 1.34
bm_lorem_search/R"((?=....)bibe)"/2 5468.75 5156.25 1.06
bm_lorem_search/R"((?=....)bibe)"/3 10498 10498 1.00
bm_lorem_search/R"((?=....)bibe)"/4 21972.7 19042.7 1.15
bm_lorem_search/R"((?=bibe)....)"/2 5156.25 5336.23 0.97
bm_lorem_search/R"((?=bibe)....)"/3 9591.24 9416.81 1.02
bm_lorem_search/R"((?=bibe)....)"/4 19496.1 19252.4 1.01
bm_lorem_search/R"((?!lorem)bibe)"/2 4865.38 4708.44 1.03
bm_lorem_search/R"((?!lorem)bibe)"/3 10044.6 9207.55 1.09
bm_lorem_search/R"((?!lorem)bibe)"/4 18415.3 18031.6 1.02

@muellerj2 muellerj2 requested a review from a team as a code owner January 18, 2026 19:29
@github-project-automation github-project-automation Bot moved this to Initial Review in STL Code Reviews Jan 18, 2026
@StephanTLavavej StephanTLavavej added performance Must go faster regex meow is a substring of homeowner labels Jan 18, 2026
@StephanTLavavej StephanTLavavej moved this from Initial Review to Ready To Merge in STL Code Reviews Jan 18, 2026
@StephanTLavavej StephanTLavavej moved this from Ready To Merge to Merging in STL Code Reviews Jan 20, 2026
@StephanTLavavej

Copy link
Copy Markdown
Member

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej StephanTLavavej merged commit 3f76681 into microsoft:main Jan 20, 2026
45 checks passed
@github-project-automation github-project-automation Bot moved this from Merging to Done in STL Code Reviews Jan 20, 2026
@StephanTLavavej

Copy link
Copy Markdown
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Must go faster regex meow is a substring of homeowner

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants