Fix race in RestCatalog#101216
Conversation
alesapin
left a comment
There was a problem hiding this comment.
What specifically this mutex protects? Let's use TSA annotations.
|
The Stress test (arm_msan) failure is fixed by #101239, which should be merged first. After it is merged, please update the branch to include the fix. |
| std::lock_guard lock(token_mutex); | ||
| if (!access_token.has_value() || update_token) | ||
| { | ||
| access_token = retrieveAccessToken(); |
There was a problem hiding this comment.
We will access network with this mutex, it's bad design pattern. Maybe we can avoid race with some kind of initialize method? So we don't do initialization in getCatalog.
|
Updated tests/queries/0_stateless/03913_datalake_restful_catalog_bad_format.sh as the behaviour is now changed to eager initialization (no longer lazy) |
| , db_uuid(uuid) | ||
| { | ||
| validateSettings(); | ||
| initialize(); |
There was a problem hiding this comment.
Calling initialize from the constructor changes DataLakeCatalog from lazy to eager remote initialization for all attach paths.
loadMetadata attaches databases at server startup via InterpreterCreateQuery/DatabaseFactory; with this change, RestCatalog::loadConfig/token fetch now runs during startup and any transient remote catalog failure bubbles up and aborts loading the database (and startup sequence), instead of failing only when the database is queried.
This makes external catalog availability a hard startup dependency, which is a significant operational regression unrelated to the race fix. Please keep startup/attach non-blocking on remote catalog I/O (for example, lazy one-time init on first use with a race-safe pattern).
| /// This function is intentionally not synchronized: it is invoked only from the | ||
| /// constructor, before the `DatabaseDataLake` instance becomes reachable by any | ||
| /// other thread. | ||
| if (settings[DatabaseDataLakeSetting::catalog_type].value == DatabaseDataLakeCatalogType::NONE) |
There was a problem hiding this comment.
This new check also runs on ATTACH DATABASE because initialize is called from the constructor during metadata loading. Before this PR, DataLakeCatalog metadata with omitted catalog_type (default NONE) could be attached; now the same metadata throws BAD_ARGUMENTS at startup.
That is a backward-incompatible validation on existing objects. loadMetadata does not swallow this path, so one such database can block server startup after upgrade. Please keep strict validation for fresh CREATE queries while preserving attach compatibility (or gate the new enforcement behind a compatibility setting/migration path).
LLVM Coverage ReportChanged lines: Changed C/C++ lines covered by tests: 7/31 (22.58%) | Lost baseline coverage (was covered on master, now uncovered in this PR): 14 line(s) · Uncovered code |

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fix race in RestCatalog
Documentation entry for user-facing changes
Version info
26.6.1.970