StorageKafka: extended configuration, parallel consumers, offset tracking#1654
Conversation
|
Cool! |
There was a problem hiding this comment.
We have support for multiple consumers right now?
There was a problem hiding this comment.
Yes, I ended up adding it. Let me update the comment.
There was a problem hiding this comment.
Strange code.
It's not guaranteed, that capacity is equal to the value that was requested in reserve.
There was a problem hiding this comment.
Right, let me fix that.
There was a problem hiding this comment.
-1 (with intention to implicitly convert to max long) looks weird.
There was a problem hiding this comment.
Is it guaranteed that there is at least one available consumer at this point?
There was a problem hiding this comment.
This should be guaranteed by the semaphore. There are as many semaphore slots as there are consumers. If the code reaches this line, it means that semaphore reserved at least one slot for this caller (otherwise it would throw an exception or returned nullptr).
- Make sure to use dynamic linking on macOS to avoid OpenSSL static linking bug - Use -std=c++17 in CFLAGS for files in contrib - Avoid bad support for thread_local on macOS with clang altogether
Previously the dependencies were updated only on DROP TABLE, so detaching a materialized view and inserting to source table thrown an exception.
…king This contains many fixes and corrections for the Kafka engine. Most notably it now supports extended configuration similarly to GraphiteMergeTree. Now it also allows specification of consumer count to parallelize consumption of multiple partitions both in materialized views and in SELECT queries. The offsets are now committed in the insertSuffix() method after all rows are successfully read. If an exception is thrown during reading, affected consumer unsubscribes from all assignments and rejoins the consumer group to rewind offsets. This means that the consumer won't lose messages in case of write failures.
…es" (ClickHouse#105591) This reverts the merge of PR ClickHouse#105591 (commit d562138), reversing changes made to 6519f67. ClickHouse#105591 broke 02354_vector_search_rescoring on master across all build configs with a deterministic Code 44 ILLEGAL_COLUMN ("The _distance column is an internal virtual column of vector search and cannot be referenced directly"). Root cause is a base-skew semantic conflict between two PRs merged 65 minutes apart, neither of which saw the other in CI: - ClickHouse#107985 (merged 10:05 UTC) added a hard ILLEGAL_COLUMN guard in both passes of useVectorSearch.cpp, making _distance internal-only. It closed customer incident clickhouse-core-incidents#1654 and added test 02354_vector_search_incident1654 asserting the guard. - ClickHouse#105591 (merged 11:10 UTC) added a feature that deliberately supports referencing _distance directly, with test queries and reference output that expect it to succeed. The guard from ClickHouse#107985 pre-empts ClickHouse#105591's new code path, so ClickHouse#105591's own tests fail on master. Reverting ClickHouse#105591 (the later, feature PR) restores master to green while preserving the customer-incident ClickHouse#1654 guard. The product decision of whether _distance should be user-referenceable is left to the vector-search owners to reconcile and re-land. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

This contains many fixes and corrections for the Kafka engine.
Most notably it now supports extended configuration similarly to GraphiteMergeTree.
Now it also allows specification of consumer count to parallelize consumption of
multiple partitions both in materialized views and in SELECT queries.
The offsets are now committed in the insertSuffix() method after all rows
are successfully read. If an exception is thrown during reading, affected consumer
unsubscribes from all assignments and rejoins the consumer group to rewind offsets.
This means that the consumer won't lose messages in case of write failures.
The 07f8502 fixes build with clang 5.0 on macOS, however it also needs confluentinc/librdkafka#1592 to be merged first and then vendored.
This should fix #1402 and #1625