Match http_forbid_headers case-insensitively by groeneai · Pull Request #108509 · ClickHouse/ClickHouse · GitHub
Skip to content

Match http_forbid_headers case-insensitively#108509

Open
groeneai wants to merge 2 commits into
ClickHouse:masterfrom
groeneai:groeneai/fix-http-forbid-headers-case-insensitive
Open

Match http_forbid_headers case-insensitively#108509
groeneai wants to merge 2 commits into
ClickHouse:masterfrom
groeneai:groeneai/fix-http-forbid-headers-case-insensitive

Conversation

@groeneai

Copy link
Copy Markdown
Contributor

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Match http_forbid_headers case-insensitively. HTTP header names are case-insensitive, so forbidding Authorization now also blocks authorization, AUTHORIZATION and other case variants. Configured header_regexp patterns are now matched case-insensitively without needing an explicit (?i) flag.

Description

http_forbid_headers matched header names case-sensitively, but HTTP header names are case-insensitive (RFC 7230 3.2). The exact set used a verbatim std::unordered_set lookup, and each header_regexp was compiled case-sensitively. So an administrator forbidding Authorization did not block authorization, AUTHORIZATION or aUtHoRiZaTiOn, and a regexp pattern had to carry an explicit (?i) to work. The blocklist enforced for url(), the URL/S3 engines, the HTTP dictionary source and DataLakeCatalog auth headers was therefore trivially bypassable.

HTTPHeaderFilter::checkAndNormalizeHeaders is the single funnel every URL-based header check goes through, so the fix is contained to src/Common/HTTPHeaderFilter.{h,cpp}:

  • Exact set: entries are stored lower-cased at config load, and the (already whitespace/control-stripped) incoming name is lower-cased before the lookup.
  • Regexp: each pattern is pre-compiled once with re2::RE2::Options::set_case_sensitive(false) instead of being recompiled case-sensitively on every check. Case insensitivity comes from the RE2 option, not from lower-casing the pattern string, so metacharacters such as \d are preserved.
  • An uncompilable configured pattern is now logged and skipped (it previously also never matched, but silently).

The change is strictly more restrictive (more case variants blocked, never fewer), so no setting is needed to restore the old behavior, and existing (?i) patterns keep working.

Tests:

  • New unit test src/Common/tests/gtest_http_header_filter.cpp covering exact + regexp case insensitivity, lower-case config with mixed-case input, regexp without explicit (?i), composition with whitespace stripping, metacharacter preservation, and that unrelated headers are still allowed. Fails on master, passes with this change.
  • 02752_forbidden_headers.sql extended with upper-case exact-header variants for url/urlCluster/s3/s3Cluster.

Possible follow-up (left out deliberately, since it changes server start/reload behavior): rejecting an invalid <http_forbid_headers> regexp at config load with CANNOT_COMPILE_REGEXP instead of logging and skipping it. Happy to do that in a separate PR if maintainers prefer.

No related open issue found on the public tracker.

HTTP header names are case-insensitive (RFC 7230 3.2), but
http_forbid_headers matched them case-sensitively: the exact set used a
verbatim std::unordered_set lookup, and each regexp was compiled
case-sensitively on every check. So forbidding "Authorization" did not
block "authorization", "AUTHORIZATION" or "aUtHoRiZaTiOn", and a regexp
pattern had to carry an explicit (?i) to work. The blocklist used by
url(), the URL/S3 engines, the HTTP dictionary source and
DataLakeCatalog auth headers was therefore trivially bypassable.

Store exact entries lower-cased at config load and lower-case the
(already whitespace-stripped) incoming name before the lookup.
Pre-compile each regexp once with RE2 set_case_sensitive(false) instead
of recompiling it case-sensitively per check; case insensitivity comes
from the RE2 option, not from lower-casing the pattern string, so
metacharacters such as \d are preserved. An uncompilable pattern is now
logged and skipped instead of silently never matching.

The change is strictly more restrictive (more case variants blocked,
never fewer), so no setting is needed to restore the old behavior;
existing (?i) patterns keep working.

Adds a unit test (src/Common/tests/gtest_http_header_filter.cpp) and
extends 02752_forbidden_headers.sql with upper-case exact-header
variants.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

Copy link
Copy Markdown
Contributor Author

@groeneai

Copy link
Copy Markdown
Contributor Author

cc @evillique @Algunenano — could you review this? http_forbid_headers matched header names case-sensitively, but HTTP header names are case-insensitive (RFC 7230 3.2), so forbidding Authorization did not block authorization/AUTHORIZATION/etc. This stores the exact set lower-cased and pre-compiles each header_regexp once with RE2 set_case_sensitive(false) (case-insensitivity from the RE2 option, not from lower-casing the pattern, so metacharacters are preserved). You both have the most context on this file.

@alexey-milovidov alexey-milovidov added the can be tested Allows running workflows for external contributors label Jun 25, 2026
@clickhouse-gh

clickhouse-gh Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [03e966c]

Summary:

job_name test_name status info comment
AST fuzzer (amd_debug, targeted) FAIL
Logical error: Block structure mismatch in A stream: different columns: (STID: 0993-27f0) FAIL cidb, issue
Stress test (arm_ubsan) FAIL
Cannot start clickhouse-server FAIL cidb
Logical error: 'Unexpected exception in refresh scheduling' (STID: 2508-34af) FAIL cidb
Check failed FAIL cidb

AI Review

Summary

This PR makes http_forbid_headers exact and regexp checks case-insensitive for HTTP header names, while precompiling configured regexps. The earlier concern about matching regexps against a lower-cased subject is fixed in the current code and covered by a focused regression test. I found no remaining correctness findings.

Final Verdict

Status: ✅ Approve

No required changes from review.

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Jun 25, 2026
Comment thread src/Common/HTTPHeaderFilter.cpp Outdated
The regexp branch was matching the lower-cased header name. Each pattern is
compiled with set_case_sensitive(false), so a plain pattern still matches all
case variants, but an inline (?-i) scope re-enables case-sensitive matching for
that literal. Feeding it the lower-cased name meant an existing config like
(?-i)Authorization no longer matched an incoming "Authorization", weakening the
blocklist for that config (it matched on master). Pass the original-case
normalized name and rely on the RE2 case-insensitive option for the default.

The exact-set lookup stays lower-cased (literal set membership, no inline flags).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@clickhouse-gh

clickhouse-gh Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 85.40% 85.40% +0.00%
Functions 92.60% 92.60% +0.00%
Branches 77.60% 77.60% +0.00%

Changed lines: Changed C/C++ lines covered by tests: 120/125 (96.00%) | Lost baseline coverage: none · Uncovered code

Full report · Diff report

@groeneai

Copy link
Copy Markdown
Contributor Author

CI finish ledger — 03e966c

PR diff is src/Common/HTTPHeaderFilter.cpp + a gtest + a stateless test (HTTP header blocklist), which cannot affect the query engine or the refresh scheduler, so none of these are PR-caused.

Check / test Reason Owner / fixing PR
AST fuzzer (amd_debug, targeted) / Logical error: Block structure mismatch (STID 0993-27f0) crash, chronic (64 PRs / 9 master in 30d) #107719 (ours, open)
Stress test (arm_ubsan) / Logical error: Unexpected exception in refresh scheduling (STID 2508-34af) crash, chronic (106 PRs / 10 master in 30d) #105905 (ours, open)
Stress test (arm_ubsan) / Cannot start clickhouse-server + Check failed crash collateral of the STID 2508-34af refresh-scheduling abort on the same run #105905 (ours, open)

Session id: cron:our-pr-ci-monitor:20260625-220000

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-bugfix Pull request with bugfix, not backported by default

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants