Add support for <url_prefix> in http handler url#107492
Conversation
There was a problem hiding this comment.
Pull request overview
Adds prefix: matching support for custom http_handlers URL rules, enabling handlers to be selected by a base-path prefix on path-segment (/) boundaries (and ignoring the query string).
Changes:
- Implement
prefix:parsing/matching forurl/full_urlrequest filters and move filter implementations out of the header into a.cpp. - Add an integration test and per-test server config to validate
urlprefix matching behavior. - Update server config comments and docs to describe the new matching mode.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/integration/test_http_handlers_config/test.py | Adds integration coverage for url prefix: matching behavior. |
| tests/integration/test_http_handlers_config/test_url_prefix_handler/config.xml | New test config enabling a prefix: url rule with a static handler. |
| src/Server/ReplicasStatusHandler.cpp | Drops an unnecessary include after filter code moved out of the header. |
| src/Server/KeeperHTTPHandlerFactory.cpp | Adjusts includes to avoid relying on transitive includes from the filter header. |
| src/Server/HTTPHandlerRequestFilter.h | Refactors filter utilities into declarations + documentation of prefix: semantics. |
| src/Server/HTTPHandlerRequestFilter.cpp | New implementation file: adds prefix: support and centralizes filter logic. |
| src/Server/HTTPHandler.cpp | Adjusts includes to avoid relying on transitive includes from the filter header. |
| programs/server/config.xml | Documents prefix: support in the example http_handlers configuration comment. |
| docs/en/operations/server-configuration-parameters/_server_settings_outside_source.md | Documents prefix: behavior for http_handlers. |
813c6cd to
252d374
Compare
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
|
@azat Can you please take a look? |
|
@groeneai Investigate CI failures |
|
CI failure: |
| | Sub-tags | Definition | | ||
| |----------------------|---------------------------------------------------------------------------------------------------------------------------------------------------| | ||
| | `url` | To match the request URL, you can use the 'regex:' prefix to use regex match (optional) | | ||
| | `url` | To match the request URL path. The query string is ignored when matching. You can use the 'regex:' prefix for a regex match, or the 'prefix:' prefix to match a base path and everything below it on a path-segment boundary (e.g. 'prefix:/api/v1' matches /api/v1, /api/v1/ and /api/v1/write, but not /api/v1beta) (optional) | |
There was a problem hiding this comment.
I think better to introduce separate url_prefix, although we already have regex:, and not clear why we went this way.
There was a problem hiding this comment.
OK, I'll change approach here from <url>prefix:/api/v1</url> to <url_prefix>/api/v1</url_prefix>, and the same for <full_url>
There was a problem hiding this comment.
Thanks
Maybe while you are here you can also add a proper "url_regexp" (and mark existing as deperecated in the doc)
| "test_prefix/write", # a sub-path | ||
| "test_prefix/a/b/c", # a deeper sub-path | ||
| "test_prefix?param=value", # query string is ignored | ||
| "test_prefix/write?param=value", |
There was a problem hiding this comment.
Should /test_prefixfoo work? I guess no. Make sense to add a test
There was a problem hiding this comment.
No, /test_prefixfoo doesn't match here, I'll add a test
…x <url>regex: as obsolete.
3ceb5a5 to
cfea92e
Compare
|
|
||
| auto regex = compileRegex(expression); | ||
| if (hasAnyOfCapturingGroups(regex, group_names)) | ||
| result.headers_name_with_regex.emplace(header_name, regex); |
There was a problem hiding this comment.
headersFilter matches missing headers against an empty string via request.get(header_name, ""), so a predefined handler with <headers_regex><X>(?P<name>.*)</X></headers_regex> legitimately matches a request with no X header. This line then records that regex for PredefinedQueryHandler::customizeContext, but that path reads the header with request.get(header_name) and raises an exception after the rule has already matched. Please use the same defaulted lookup in the extraction path (or reject empty-matching header regexes) and add a small predefined-handler regression test with an absent header.
There was a problem hiding this comment.
I haven't touched headersFilter or PredefinedQueryHandler::customizeContext() in this PR, so I'll leave this for another PR
|
@groeneai Investigate CI failures |
|
@vitlibar All 3 CI failures on Stress test (amd_tsan) - Stress test (arm_debug) - Stress test (arm_asan_ubsan) - No SIGSEGV / heap-UAF among them. The red Mergeable Check is only these 3 stress jobs. All three are nondeterministic fuzzer/stress crashes outside this PR's code path, so re-running the stress jobs should clear them. (session cron:clickhouse-worker-slot-2:20260617-202500) |
azat
left a comment
There was a problem hiding this comment.
Apart from handlers order looks OK
| else if (filter_type == "url_regex") | ||
| filters.push_back(urlFilter(config, config_prefix + ".url_regex", HTTPRequestFilterMatchType::Regex)); | ||
| /// URL path matched as a base path | ||
| else if (filter_type == "url_prefix") | ||
| filters.push_back(urlFilter(config, config_prefix + ".url_prefix", HTTPRequestFilterMatchType::Prefix)); |
There was a problem hiding this comment.
I think the "url_prefix" should goes before "url_regex", since AFAIU this defines order in which handlers will be handled, and prefix should be handled before (AFAIR nginx does the same)
Basically I would say "url_regex" should be the last (also looks like more common term is "regexp" in clickhouse)
There was a problem hiding this comment.
Rules are checked in the same order as they appear in the config. The order of checking conditions like filter_type == url_prefix or filter_type == url_regex doesn't really matter here. I'll rename url_regex to url_regexp
There was a problem hiding this comment.
Hm, I mean imagine we have the following in config:
<http_handlers>
<rule>
<methods>GET</methods>
<url_regex>/test_regex/[0-9]+</url_regex>
<handler>
<type>static</type>
<status>200</status>
<response_content>regex handler matched</response_content>
</handler>
</rule>
<rule>
<methods>GET</methods>
<url_prefix>/test_regex</url_prefix>
<handler>
<type>static</type>
<status>200</status>
<response_content>handler matched</response_content>
</handler>
</rule>
</http_handlers>
In this case for "/test/regex" I would expect that the "url_prefix" will match
There was a problem hiding this comment.
The documentation says:
Rules are checked from top to bottom as defined, and the first match will run the handler.
So in your example the first rule (regex handler matched) will work
LLVM Coverage Report
Changed lines: Changed C/C++ lines covered by tests: 142/153 (92.81%) | Lost baseline coverage (was covered on master, now uncovered in this PR): 1 line(s) · Uncovered code |
b65793f

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Add support for new settings
<url_prefix>,<full_url_prefix>in http handler url:For example
Here
<url_prefix>/test_prefix</url_prefix>means all paths starting with/test_prefixare served.The functionality is required for example for prometheus http handlers, where a whole bunch of paths starting with some base path is usually served.
Also for consistency added syntax
<url_regexp>,<full_url_regexp>,<headers_regexp>(older syntax<url>regex:still works).Version info
26.6.1.1070