`parts_to_delay_insert` counts parts on `prefer_not_to_merge` volumes, causing permanent insert throttling · Issue #97862 · ClickHouse/ClickHouse · GitHub
Description

@ku524

Describe the unexpected behaviour

delayInsertOrThrowIfNeeded compares the thresholds against the largest per-partition count of active parts, and that count includes parts on every volume, including prefer_not_to_merge volumes. Since parts on those volumes are never reduced through merges, the back-pressure feedback loop is broken: delaying inserts does not lead to fewer parts, so the throttling becomes permanent.

Which ClickHouse versions are affected?

All versions with tiered storage support. Confirmed on 26.1.2.

How to reproduce

Server version: 26.1.2, native client.

Storage policy (add to config.xml):

<storage_configuration>
    <disks>
        <cold_disk>
            <path>/var/lib/clickhouse/cold/</path>
        </cold_disk>
    </disks>
    <policies>
        <tiered>
            <volumes>
                <hot>
                    <disk>default</disk>
                </hot>
                <cold>
                    <disk>cold_disk</disk>
                    <prefer_not_to_merge>true</prefer_not_to_merge>
                </cold>
            </volumes>
        </tiered>
    </policies>
</storage_configuration>

Reproduce:

CREATE TABLE test_delay (
    date Date,
    id UInt64
) ENGINE = MergeTree()
PARTITION BY date
ORDER BY id
SETTINGS storage_policy = 'tiered',
         parts_to_delay_insert = 50,
         parts_to_throw_insert = 100;

-- Prevent merges so small parts accumulate
SYSTEM STOP MERGES test_delay;

-- Create 60 small parts in partition 2026-01-01 (run this loop from a shell, one INSERT per part):
-- for i in $(seq 1 60); do clickhouse-client -q "INSERT INTO test_delay SELECT '2026-01-01', number FROM numbers(100)"; done

-- Move all to cold volume (prefer_not_to_merge)
ALTER TABLE test_delay MOVE PARTITION '2026-01-01' TO VOLUME 'cold';
SYSTEM START MERGES test_delay;

-- Verify: 60 parts on cold volume, unmergeable
SELECT partition, disk_name, count() AS parts
FROM system.parts WHERE active AND table = 'test_delay'
GROUP BY partition, disk_name;

-- Insert into a different partition — this should NOT be delayed
INSERT INTO test_delay SELECT '2026-02-24', number FROM numbers(100);
-- But it IS delayed due to 60 unmergeable parts in 2026-01-01

Error message and/or stacktrace

Delaying inserting block by 126 ms. because there are 1251 parts and their average size is 1.00 KiB

The part count comes from a partition on the cold volume, but the insert targets a different partition with only a few parts on the hot volume.

Expected behavior

Parts on prefer_not_to_merge volumes should not contribute to the parts_to_delay_insert / parts_to_throw_insert threshold, since delaying inserts cannot reduce those parts.

Additional context

Root cause in code:

getMaxPartsCountAndSizeForPartitionWithState (MergeTreeData.cpp:5533) iterates all active parts without filtering by volume:

for (const auto & part : getDataPartsStateRange(state))
{
    ++cur_parts_count;  // no volume/disk awareness
}

The merge selector correctly excludes prefer_not_to_merge parts via shallParticipateInMerges (IMergeTreeDataPart.cpp:2066), but the delay/throw logic does not.

Why this matters: In tiered storage setups (hot local disk + cold S3 with prefer_not_to_merge), if parts reach the cold volume before being fully merged (e.g., due to merge backlog when TTL triggers), they become permanently unmergeable. The insert back-pressure feedback loop breaks — delay cannot reduce parts that will never merge.

Possible fix: Filter by shallParticipateInMerges in getMaxPartsCountAndSizeForPartitionWithState, or introduce a separate counting path for delayInsertOrThrowIfNeeded that excludes unmergeable volumes.
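
To illustrate the first option, here is a minimal, self-contained sketch of the filtered count. It is not ClickHouse code: `Part`, `maxPartsPerPartition`, and `makeReproParts` are hypothetical names made up for this example, and the boolean field only stands in for whatever `shallParticipateInMerges` would report for a real part.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a data part (not IMergeTreeDataPart).
struct Part
{
    std::string partition_id;
    bool on_prefer_not_to_merge_volume;  // assumption: models a part that shallParticipateInMerges would reject
};

// Returns the largest per-partition part count.
// With only_mergeable == true, parts on prefer_not_to_merge volumes are skipped,
// which is the filtering proposed above.
std::size_t maxPartsPerPartition(const std::vector<Part> & parts, bool only_mergeable)
{
    std::map<std::string, std::size_t> counts;
    for (const auto & part : parts)
    {
        if (only_mergeable && part.on_prefer_not_to_merge_volume)
            continue;
        ++counts[part.partition_id];
    }

    std::size_t max_count = 0;
    for (const auto & entry : counts)
        max_count = std::max(max_count, entry.second);
    return max_count;
}

// Mirrors the reproduction: 60 parts of one partition on the cold volume,
// plus a single freshly inserted part of another partition on the hot volume.
std::vector<Part> makeReproParts()
{
    std::vector<Part> parts(60, Part{"2026-01-01", true});
    parts.push_back(Part{"2026-02-24", false});
    return parts;
}
```

With the table settings from the reproduction (parts_to_delay_insert = 50), the unfiltered count for this data is 60 and would trigger a delay, while the filtered count is 1 and would not.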
