Fix max_memory_usage_soft_limit hot reload#104940
Conversation
There was a problem hiding this comment.
The fix as is seems correct (LGTM), but there are a couple or corner cases/paths not covered in tests:
-
If we set
max_memory_usage_soft_limitdirectly in the config thencalculateMemorySoftLimittakes thehasPropertybranch and there is no test for that:
We need a test that: -
Sets an explicit value at startup and verifies it is respected
-
Changes the explicit value on reload and verifies the new value is picked up
-
If we change ratio at startup. After the PR's fix, this should work: the
MapConfigurationis clean (as there is not call tosetUInt64now), sohasProperty("...soft_limit")is false at startup (correct ratio path), and true after reload with an explicit value (correct explicit path). -
When
calculateMemorySoftLimitreturns 0 (i.e ratio = 0). Callers ofgetKeeperMemorySoftLimit()must treat 0 as "no limit".
I haven't change any other existing logic in this PR, however I've added more test cases
I thought this is guaranteed by this part ClickHouse/src/Coordination/KeeperServer.cpp Line 795 in b6844b2 |
|
The failed test It is unrelated to this PR. |
I know, but as this part is a fragile defined behavior, the extra tests simplify detectin conflicting changes in apparently unrelated parts in the future.
Indeed, but like in crypto: "Don't Trust, Verify" ;) |
Hi @Ergus thanks your reply . In the original logic, when The only caller of Could you clarify what specific test case you have in mind here? I am happy to add it if there's a concrete scenario to cover. |
|
@nauu Normal users won't be able to decipher the changelog entry |
|
I would rephrase the changelog more or less like this: |
|
|
||
| LOG_INFO(log, "keeper_server.max_memory_usage_soft_limit is set to {}", formatReadableSizeWithBinarySuffix(memory_soft_limit)); | ||
| void KeeperContext::initializeKeeperMemorySoftLimit(Poco::Util::AbstractConfiguration & config, Poco::Logger * log) |
There was a problem hiding this comment.
The name of this function is misleading now because it is not changing the memory_soft_limit
|
|
||
| LOG_INFO(log, "keeper_server.max_memory_usage_soft_limit is set to {}", formatReadableSizeWithBinarySuffix(memory_soft_limit)); | ||
| void KeeperContext::initializeKeeperMemorySoftLimit(Poco::Util::AbstractConfiguration & config, Poco::Logger * log) |
There was a problem hiding this comment.
Also I think that the config argument can be const now. Which requires updatign the comment in the header
| const UInt64 expected_reload = static_cast<UInt64>(static_cast<double>(getMemoryAmount()) * 0.5); | ||
| ASSERT_EQ(ctx->getKeeperMemorySoftLimit(), expected_reload); | ||
| } | ||
|
|
There was a problem hiding this comment.
I would recommend to add a test for explicit value -> ratio-only config on reload.
Similar to ExplicitZeroFallsBackToRatio bit where max_memory_usage_soft_limit_ratio=0
| config.setUInt64("keeper_server.max_memory_usage_soft_limit", memory_soft_limit); | ||
| } | ||
| } | ||
| memory_soft_limit = calculateMemorySoftLimit(config); |
There was a problem hiding this comment.
Add a log line in this function so the actual stored value is always observable.
| void KeeperContext::initializeKeeperMemorySoftLimit(Poco::Util::AbstractConfiguration & config, Poco::Logger * log) | ||
| { | ||
| const auto limit = calculateMemorySoftLimit(config); | ||
| LOG_INFO(log, "keeper_server.max_memory_usage_soft_limit is set to {}", formatReadableSizeWithBinarySuffix(limit)); |
There was a problem hiding this comment.
I think that this is the log line needed in updateKeeperMemorySoftLimit
|
The failing test is unrelated. |
There was a problem hiding this comment.
@nauu Can you please write an integration test instead? A C++ unit test seems too low level for the job.
| @@ -104,3 +108,61 @@ def test_soft_limit_create(started_cluster): | |||
| return | |||
|
|
|||
| raise Exception("all records are inserted but no error occurs") | |||
|
|
|||
|
|
|||
| def test_soft_limit_hot_reload(started_cluster): | |||
There was a problem hiding this comment.
Needs a test-level comment for the next guy who will read this test in a couple of weeks from now, basically what is the test supposed to test.
LLVM Coverage ReportChanged lines: 56.00% (14/25) | lost baseline coverage: 4 line(s) · Uncovered code |
Ergus
left a comment
There was a problem hiding this comment.
I liked the unit tests, but I agree that with integration tests it is simpler to test.
Fix max_memory_usage_soft_limit hot reload

Changelog category (leave one):
When
max_memory_usage_soft_limitis not set explicitly and is auto-calculated frommax_memory_usage_soft_limit_ratio,initializeKeeperMemorySoftLimitcalledconfig.setUInt64()to cache the result into the writableMapConfigurationlayer(PRIO_APPLICATION = -100).On config reload triggered by
CgroupsMemoryUsageObserver(e.g., when the cgroup memory limit increases),LayeredConfiguration::replace()only swaps the file layer (PRIO_DEFAULT = 0) but leavesMapConfigurationintact.Because
MapConfigurationhas higher priority, the stale startup value shadows the new file config, andupdateKeeperMemorySoftLimitreads the old value instead of recalculating.Fix: extract
calculateMemorySoftLimitand remove theconfig.setUInt64()call, so each reload recalculates from the current config andgetMemoryAmount().Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fixed the
max_memory_usage_soft_limitwas not updated on-the-fly when the configuration changed. The soft memory limit was calculated once at startup and cached, so subsequent config reloads had no effect. After this fix, the limit is recalculated and applied immediately on every config reload, no Keeper restart required.Version info
26.5.1.893