Fix race in RestCatalog by SmitaRKulkarni · Pull Request #101216 · ClickHouse/ClickHouse · GitHub
Skip to content

Fix race in RestCatalog#101216

Merged
SmitaRKulkarni merged 26 commits into
masterfrom
datalake_rest_catalog_race
Jun 18, 2026
Merged

Fix race in RestCatalog#101216
SmitaRKulkarni merged 26 commits into
masterfrom
datalake_rest_catalog_race

Conversation

@SmitaRKulkarni

@SmitaRKulkarni SmitaRKulkarni commented Mar 30, 2026

Copy link
Copy Markdown
Member

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix race in RestCatalog

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Version info

  • Merged into: 26.6.1.970

@SmitaRKulkarni SmitaRKulkarni requested a review from alesapin March 30, 2026 12:54
@clickhouse-gh

clickhouse-gh Bot commented Mar 30, 2026

Copy link
Copy Markdown
Contributor

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Mar 30, 2026
Comment thread src/Databases/DataLake/RestCatalog.cpp Outdated

@alesapin alesapin left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What specifically this mutex protects? Let's use TSA annotations.

@alesapin alesapin self-assigned this Apr 1, 2026
@alexey-milovidov

Copy link
Copy Markdown
Member

The Stress test (arm_msan) failure is fixed by #101239, which should be merged first. After it is merged, please update the branch to include the fix.

Comment thread src/Databases/DataLake/RestCatalog.cpp Outdated
std::lock_guard lock(token_mutex);
if (!access_token.has_value() || update_token)
{
access_token = retrieveAccessToken();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will access network with this mutex, it's bad design pattern. Maybe we can avoid race with some kind of initialize method? So we don't do initialization in getCatalog.

Comment thread src/Databases/DataLake/DatabaseDataLake.cpp Outdated
Comment thread src/Databases/DataLake/RestCatalog.cpp Outdated
Comment thread src/Databases/DataLake/RestCatalog.cpp Outdated
Comment thread src/Databases/DataLake/DatabaseDataLake.cpp Outdated
@SmitaRKulkarni SmitaRKulkarni requested a review from alesapin May 12, 2026 09:52
@SmitaRKulkarni

Copy link
Copy Markdown
Member Author

Updated tests/queries/0_stateless/03913_datalake_restful_catalog_bad_format.sh as the behaviour is now changed to eager initialization (no longer lazy)

Comment thread src/Databases/DataLake/RestCatalog.h Outdated
@SmitaRKulkarni SmitaRKulkarni requested a review from alesapin May 18, 2026 13:05
, db_uuid(uuid)
{
validateSettings();
initialize();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling initialize from the constructor changes DataLakeCatalog from lazy to eager remote initialization for all attach paths.

loadMetadata attaches databases at server startup via InterpreterCreateQuery/DatabaseFactory; with this change, RestCatalog::loadConfig/token fetch now runs during startup and any transient remote catalog failure bubbles up and aborts loading the database (and startup sequence), instead of failing only when the database is queried.

This makes external catalog availability a hard startup dependency, which is a significant operational regression unrelated to the race fix. Please keep startup/attach non-blocking on remote catalog I/O (for example, lazy one-time init on first use with a race-safe pattern).

/// This function is intentionally not synchronized: it is invoked only from the
/// constructor, before the `DatabaseDataLake` instance becomes reachable by any
/// other thread.
if (settings[DatabaseDataLakeSetting::catalog_type].value == DatabaseDataLakeCatalogType::NONE)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new check also runs on ATTACH DATABASE because initialize is called from the constructor during metadata loading. Before this PR, DataLakeCatalog metadata with omitted catalog_type (default NONE) could be attached; now the same metadata throws BAD_ARGUMENTS at startup.

That is a backward-incompatible validation on existing objects. loadMetadata does not swallow this path, so one such database can block server startup after upgrade. Please keep strict validation for fresh CREATE queries while preserving attach compatibility (or gate the new enforcement behind a compatibility setting/migration path).

@clickhouse-gh

clickhouse-gh Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 85.20% 85.20% +0.00%
Functions 92.30% 92.30% +0.00%
Branches 77.50% 77.40% -0.10%

Changed lines: Changed C/C++ lines covered by tests: 7/31 (22.58%) | Lost baseline coverage (was covered on master, now uncovered in this PR): 14 line(s) · Uncovered code

Full report · Diff report

@SmitaRKulkarni SmitaRKulkarni added this pull request to the merge queue Jun 18, 2026
Merged via the queue into master with commit 94b406f Jun 18, 2026
162 of 165 checks passed
@SmitaRKulkarni SmitaRKulkarni deleted the datalake_rest_catalog_race branch June 18, 2026 08:54
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-bugfix Pull request with bugfix, not backported by default pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants