`<regex>`: Implement small buffer optimization for capture groups by muellerj2 · Pull Request #6127 · microsoft/STL · GitHub
Skip to content

<regex>: Implement small buffer optimization for capture groups#6127

Merged
StephanTLavavej merged 3 commits into
microsoft:mainfrom
muellerj2:regex-small-buffer-optimization-for-captures
Mar 4, 2026
Merged

<regex>: Implement small buffer optimization for capture groups#6127
StephanTLavavej merged 3 commits into
microsoft:mainfrom
muellerj2:regex-small-buffer-optimization-for-captures

Conversation

@muellerj2

@muellerj2 muellerj2 commented Mar 1, 2026

Copy link
Copy Markdown

Towards #5969.

This implements small buffer optimizations for capture group ranges and validity markers. The same storage is used as for the loop state. The elements stored in the different buffers might have different alignment because some of them store user-defined types, so the offset might have to be adjusted if the elements of the buffer initialized next have stricter alignment requirements. But in the default setting, all of these elements have the same alignment, so no offset adjustments will be applied (because the if constexpr conditions are all false).

Validity markers have been stored in vector<bool> until now, so this PR adds a minimal SBO-enabled fixed-size (but not constant-size) bitset for them.

The capture group indices in the NFA nodes start from 1 (as the whole match is represented by capture group 0), and the matcher stored the capture group data at the same indexes in the vectors. This meant that the space at index 0 was wasted, but it wasn't really worth the effort to avoid this because it didn't change the number of allocations. But with this shared stack storage, avoiding wasted space might mean less (or no) allocations by the matcher. For this reason, I also adjusted the matcher to use 0-based indexing for the capture groups and no longer 1-based indexing. This is achieved by subtracting 1 whenever a capture group index from the NFA is read (after checking that it isn't 0 if necessary) or the capture groups are assigned to the match_results object.

Benchmark

Some regex_search benchmarks reliably show a speedup because allocations are saved in these cases (highlighted in bold). In the other benchmarks, differences vanish in the noise.

benchmark before [ns] after [ns] speedup
bm_lorem_search/"^bibe"/2 62.5 53.125 1.18
bm_lorem_search/"^bibe"/3 58.5938 58.5938 1.00
bm_lorem_search/"^bibe"/4 59.9888 58.5938 1.02
bm_lorem_search/"bibe"/2 2982.01 3278.46 0.91
bm_lorem_search/"bibe"/3 5859.38 5937.5 0.99
bm_lorem_search/"bibe"/4 11718.8 11997.8 0.98
bm_lorem_search/"bibe".collate/2 2999.44 3114.54 0.96
bm_lorem_search/"bibe".collate/3 5719.87 5859.38 0.98
bm_lorem_search/"bibe".collate/4 11230.5 11962.9 0.94
bm_lorem_search/"(bibe)"/2 6975.45 5580.36 1.25
bm_lorem_search/"(bibe)"/3 13671.9 10986.3 1.24
bm_lorem_search/"(bibe)"/4 26681 21972.7 1.21
bm_lorem_search/"(bibe)+"/2 10498 9416.81 1.11
bm_lorem_search/"(bibe)+"/3 21309.7 18415.3 1.16
bm_lorem_search/"(bibe)+"/4 41992.2 35992.7 1.17
bm_lorem_search/"(?:bibe)+"/2 5859.38 6138.39 0.95
bm_lorem_search/"(?:bibe)+"/3 11230.5 11718.8 0.96
bm_lorem_search/"(?:bibe)+"/4 23437.5 22949.2 1.02
bm_lorem_search/R"(\bbibe)"/2 81609.1 87193.1 0.94
bm_lorem_search/R"(\bbibe)"/3 172631 168795 1.02
bm_lorem_search/R"(\bbibe)"/4 334821 336967 0.99
bm_lorem_search/R"(\Bibe)"/2 196725 209240 0.94
bm_lorem_search/R"(\Bibe)"/3 408082 384976 1.06
bm_lorem_search/R"(\Bibe)"/4 767299 802176 0.96
bm_lorem_search/R"((?=....)bibe)"/2 4973.5 5312.5 0.94
bm_lorem_search/R"((?=....)bibe)"/3 9416.81 10253.9 0.92
bm_lorem_search/R"((?=....)bibe)"/4 18833.9 20089.5 0.94
bm_lorem_search/R"((?=bibe)....)"/2 4603.8 4813.07 0.96
bm_lorem_search/R"((?=bibe)....)"/3 8789.02 8789.02 1.00
bm_lorem_search/R"((?=bibe)....)"/4 17578.3 17648 1.00
bm_lorem_search/R"((?!lorem)bibe)"/2 4394.53 4349.18 1.01
bm_lorem_search/R"((?!lorem)bibe)"/3 8300.78 8544.92 0.97
bm_lorem_search/R"((?!lorem)bibe)"/4 17264.3 16880.7 1.02

@muellerj2 muellerj2 requested a review from a team as a code owner March 1, 2026 15:29
@github-project-automation github-project-automation Bot moved this to Initial Review in STL Code Reviews Mar 1, 2026
Comment thread stl/inc/regex
Comment on lines +4225 to +4226
_Tgt_state._Grp_valid._Clear_range_at_least_until(
_Capture_first, _Capture_back + 1U);

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This call might clear bits beyond _Capture_back. But this is fine here, because all of these bits must already be unset in ECMAScript grammars due to the rule that capture groups are reset when repeating a subexpression. Other grammars don't have lookahead assertions.

Comment thread stl/inc/regex
_Offset = 0U;
_Buf_size = 0U;
}
}

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This protects against user-defined iterator types with an alignment of 8192. (Or bigger: MSVC errors on such an alignment, but Clang in general supports it; I haven't tried Clang-imitating-MSVC.)

It also ensures correct behavior if the buffer is changed to have a size that is not a power-of-two.

@StephanTLavavej StephanTLavavej added performance Must go faster regex meow is a substring of homeowner labels Mar 1, 2026
@StephanTLavavej StephanTLavavej self-assigned this Mar 1, 2026
Comment thread stl/inc/regex
return _Data._First[_Pos];
}

_NODISCARD _Ty* _Begin() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's conforming to use pretty names begin, end, and size, which is somehow similar to type and value in internal type traits.

Do we want to deliberately make this not a range type?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it could be nice to use range-for, algorithms, etc. so I would have no objection to making an exception to our usual rule that type and value are the only pretty names we take. That said, if we don't need to do that now, I also wouldn't go out of my way to make the exception pre-emptively.

Comment thread stl/inc/regex Outdated
Comment thread stl/inc/regex Outdated
@StephanTLavavej StephanTLavavej removed their assignment Mar 2, 2026
@StephanTLavavej StephanTLavavej moved this from Initial Review to Ready To Merge in STL Code Reviews Mar 2, 2026
@StephanTLavavej StephanTLavavej moved this from Ready To Merge to Merging in STL Code Reviews Mar 3, 2026
@StephanTLavavej

Copy link
Copy Markdown
Member

I'm mirroring this to the MSVC-internal repo. Please notify me if any further changes are pushed, otherwise no action is required.

@StephanTLavavej StephanTLavavej merged commit 93d7fb0 into microsoft:main Mar 4, 2026
49 checks passed
@github-project-automation github-project-automation Bot moved this from Merging to Done in STL Code Reviews Mar 4, 2026
@StephanTLavavej

Copy link
Copy Markdown
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Must go faster regex meow is a substring of homeowner

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants