Match http_forbid_headers case-insensitively#108509
Conversation
HTTP header names are case-insensitive (RFC 7230 3.2), but http_forbid_headers matched them case-sensitively: the exact set used a verbatim std::unordered_set lookup, and each regexp was compiled case-sensitively on every check. So forbidding "Authorization" did not block "authorization", "AUTHORIZATION" or "aUtHoRiZaTiOn", and a regexp pattern had to carry an explicit (?i) to work. The blocklist used by url(), the URL/S3 engines, the HTTP dictionary source and DataLakeCatalog auth headers was therefore trivially bypassable. Store exact entries lower-cased at config load and lower-case the (already whitespace-stripped) incoming name before the lookup. Pre-compile each regexp once with RE2 set_case_sensitive(false) instead of recompiling it case-sensitively per check; case insensitivity comes from the RE2 option, not from lower-casing the pattern string, so metacharacters such as \d are preserved. An uncompilable pattern is now logged and skipped instead of silently never matching. The change is strictly more restrictive (more case variants blocked, never fewer), so no setting is needed to restore the old behavior; existing (?i) patterns keep working. Adds a unit test (src/Common/tests/gtest_http_header_filter.cpp) and extends 02752_forbidden_headers.sql with upper-case exact-header variants. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
cc @evillique @Algunenano — could you review this? |
|
Workflow [PR], commit [03e966c] Summary: ❌
AI ReviewSummaryThis PR makes Final VerdictStatus: ✅ Approve No required changes from review. |
The regexp branch was matching the lower-cased header name. Each pattern is compiled with set_case_sensitive(false), so a plain pattern still matches all case variants, but an inline (?-i) scope re-enables case-sensitive matching for that literal. Feeding it the lower-cased name meant an existing config like (?-i)Authorization no longer matched an incoming "Authorization", weakening the blocklist for that config (it matched on master). Pass the original-case normalized name and rely on the RE2 case-insensitive option for the default. The exact-set lookup stays lower-cased (literal set membership, no inline flags). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
LLVM Coverage Report
Changed lines: Changed C/C++ lines covered by tests: 120/125 (96.00%) | Lost baseline coverage: none · Uncovered code |
CI finish ledger — 03e966cPR diff is Session id: cron:our-pr-ci-monitor:20260625-220000 |

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Match
http_forbid_headerscase-insensitively. HTTP header names are case-insensitive, so forbiddingAuthorizationnow also blocksauthorization,AUTHORIZATIONand other case variants. Configuredheader_regexppatterns are now matched case-insensitively without needing an explicit(?i)flag.Description
http_forbid_headersmatched header names case-sensitively, but HTTP header names are case-insensitive (RFC 7230 3.2). The exact set used a verbatimstd::unordered_setlookup, and eachheader_regexpwas compiled case-sensitively. So an administrator forbiddingAuthorizationdid not blockauthorization,AUTHORIZATIONoraUtHoRiZaTiOn, and a regexp pattern had to carry an explicit(?i)to work. The blocklist enforced forurl(), theURL/S3engines, the HTTP dictionary source and DataLakeCatalog auth headers was therefore trivially bypassable.HTTPHeaderFilter::checkAndNormalizeHeadersis the single funnel every URL-based header check goes through, so the fix is contained tosrc/Common/HTTPHeaderFilter.{h,cpp}:re2::RE2::Options::set_case_sensitive(false)instead of being recompiled case-sensitively on every check. Case insensitivity comes from the RE2 option, not from lower-casing the pattern string, so metacharacters such as\dare preserved.The change is strictly more restrictive (more case variants blocked, never fewer), so no setting is needed to restore the old behavior, and existing
(?i)patterns keep working.Tests:
src/Common/tests/gtest_http_header_filter.cppcovering exact + regexp case insensitivity, lower-case config with mixed-case input, regexp without explicit(?i), composition with whitespace stripping, metacharacter preservation, and that unrelated headers are still allowed. Fails on master, passes with this change.02752_forbidden_headers.sqlextended with upper-case exact-header variants forurl/urlCluster/s3/s3Cluster.Possible follow-up (left out deliberately, since it changes server start/reload behavior): rejecting an invalid
<http_forbid_headers>regexp at config load withCANNOT_COMPILE_REGEXPinstead of logging and skipping it. Happy to do that in a separate PR if maintainers prefer.No related open issue found on the public tracker.