Add silk fibers aware socket implementations by mstetsyuk · Pull Request #106078 · ClickHouse/ClickHouse · GitHub
Skip to content

Add silk fibers aware socket implementations#106078

Merged
nikitamikhaylov merged 10 commits into
masterfrom
silk-sockets
Jun 15, 2026
Merged

Add silk fibers aware socket implementations#106078
nikitamikhaylov merged 10 commits into
masterfrom
silk-sockets

Conversation

@mstetsyuk

@mstetsyuk mstetsyuk commented May 28, 2026

Copy link
Copy Markdown
Member

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Add silk fiber aware secure and plain socket implementations.

Version info

  • Merged into: 26.6.1.814

@clickhouse-gh

clickhouse-gh Bot commented May 28, 2026

Copy link
Copy Markdown
Contributor

@clickhouse-gh clickhouse-gh Bot added the pr-improvement Pull request with some product improvements label May 28, 2026
Comment thread src/IO/SilkFiberStreamSocketImpl.cpp Outdated
Comment thread src/IO/SilkFiberStreamSocketImpl.cpp

int FiberStreamSocketImpl::receiveBytes(void * buffer, int length, int flags)
{
if (flags != 0)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rejecting every non-zero receive flag makes this class incompatible with existing Poco socket callers. ClickHouse HTTP code uses receiveBytes with MSG_DONTWAIT | MSG_PEEK to check whether a peer is still connected, and SecureSocketImpl::receiveBytes forwards exactly that flag combination to the underlying socket before touching SSL. With this implementation that path throws LOGICAL_ERROR, so a fiber socket used there will report the peer as disconnected instead of peeking. Please support at least MSG_DONTWAIT | MSG_PEEK via recv and fiber-aware polling, or explicitly keep these sockets away from code paths that rely on Poco flags.

int r;
if (timeout.totalMicroseconds() > 0)
{
r = silk::FiberFuture::waitWithTimeout(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Poco::Net::StreamSocketImpl::sendBytes preserves nonblocking mode: after callers use setBlocking(false), it attempts one send and returns short progress or EAGAIN so paths such as WriteBufferFromPocoSocket can run their async_callback and enforce the external timeout. This implementation always waits on the fiber future and keeps looping until all bytes are written, so a fiber socket used in those async paths waits here instead of handing control back to the poller. receiveBytes and the secure BIO wrappers have the same contract issue. Please branch on getBlocking/MSG_DONTWAIT and make the nonblocking path return with the same Poco semantics, or reject those async uses before the socket is exposed.

@nikitamikhaylov

Copy link
Copy Markdown
Member

@mstetsyuk

Copy link
Copy Markdown
Member Author

2 changes on the silk side are required:

  1. Add silk::FiberScheduler::connect (Add FiberScheduler::connect and FiberScheduler::accept silk#52), otherwise this is required to make sure that the socket is unblocking - required by this logic.
  2. cancelIo should not do future->result = nullptr on cancel, as there is a race between our thread calling cancel and the kernel consuming bytes from TCP connection into our userspace buffer. Failure mode: kernel reads bytes from a TCP connection, writes them to our userspace buffer, and we set future->result = nullptr. The bytes are forever lost, so when we try to read the data again (with a higher timeout, for instance), we won't see them.

@mstetsyuk

Copy link
Copy Markdown
Member Author

what's the difference with https://github.com/ClickHouse/silk/blob/main/src/perf/fiber-http.h ?

This PR is basically https://github.com/ClickHouse/silk/blob/main/src/perf/fiber-http.h + TLS + cosmetic changes/simplifications/minor improvements (honouring the receive/send timeout) + tests

(I was under the impression that https://github.com/ClickHouse/silk/blob/main/src/perf/fiber-http.h is just for internal perf tests in the silk library - while this PR is meant for normal production usage in our codebase)

@mstetsyuk

Copy link
Copy Markdown
Member Author

This change is required to implement this feature: #106078

@mstetsyuk mstetsyuk linked an issue May 28, 2026 that may be closed by this pull request
@@ -0,0 +1,208 @@
#include <IO/SilkFiberStreamSocketImpl.h>

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to drop the silk part of a name - just FiberStreamSocket.
I hope there will be just one fiber library in use.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is another one, apparently (boost::context).

By the way, what do you think about the Silk namespace? I think it should be useful to gate silk-aware sockets, connection pool, throttlers etc.

Comment thread src/IO/SilkFiberStreamSocketImpl.cpp
}
peer.sendBytes("pong", 4);

std::this_thread::sleep_for(std::chrono::milliseconds(100));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes the test timing-dependent: after the server sends pong, if the client fiber is not scheduled until this 100 ms delay has elapsed, x is already queued and the preceding poll can return true. Please use explicit coordination, for example a FiberFuture or latch from the client after the negative poll, before sending x instead of sleeping to order the race.

Comment thread src/IO/SilkFiberStreamSocketImpl.cpp
@mstetsyuk

Copy link
Copy Markdown
Member Author

I believe this one is important. But note that the throttler function just sleeps now - need to changes this. Must return a sleep duration so fiber code can do FinerScheduler::sleep

Apparently, we can make it even simpler. It's the responsibility of the caller to supply the throttler, so I think we can just add Silk::Throttler which overrides sleep: 0790620

And if the user of this class wants to set a throttler, it's on them to use the right one.

Comment thread src/IO/SilkFiberStreamSocketImpl.cpp Outdated
@mstetsyuk

mstetsyuk commented Jun 11, 2026

Copy link
Copy Markdown
Member Author

I believe this one is important. But note that the throttler function just sleeps now - need to changes this. Must return a sleep duration so fiber code can do FinerScheduler::sleep

@vadimskipin, addressed by adding a one-liner throttler implementation that does silk::FiberScheduler::sleep and not std::this_thread::sleep.

@clickhouse-gh clickhouse-gh Bot added the submodule changed At least one submodule changed in this PR. label Jun 12, 2026
Comment thread src/IO/SilkSecureFiberStreamSocketImpl.cpp
@mstetsyuk mstetsyuk marked this pull request as ready for review June 15, 2026 10:02
@nikitamikhaylov

Copy link
Copy Markdown
Member

Unit tests failure should be fixed by: #106856

@nikitamikhaylov nikitamikhaylov added this pull request to the merge queue Jun 15, 2026
Merged via the queue into master with commit 63702bc Jun 15, 2026
164 of 166 checks passed
@nikitamikhaylov nikitamikhaylov deleted the silk-sockets branch June 15, 2026 12:10
@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 15, 2026
@clickgapai

Copy link
Copy Markdown
Contributor

Hi @mstetsyuk @nikitamikhaylov @vadimskipin — a couple of suggestions to make this PR easier for reviewers and the changelog reader.

Description (currently missing):

  • Why: PR body is essentially template-only; it does not explain the Throttler sleep virtualization, the Poco BIO_METHOD injection, or the supportsExternalPolling contract enforced in setAsyncCallback.

  • Suggested addition:

    Describe: (1) the new build flag USE_SILK and its glibc/x86-64-v2/liburing requirements; (2) that Throttler::sleep is now virtual so Silk::Throttler can yield the fiber instead of blocking the thread; (3) that setAsyncCallback now rejects callbacks on sockets without external-polling support (Silk fiber sockets), since they use the blocking-future model; (4) the custom BIO_METHOD injection used to route OpenSSL I/O through the fiber scheduler.

Ignore this if the current title/body is intentional.

@alexey-milovidov

Copy link
Copy Markdown
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-improvement Pull request with some product improvements pr-synced-to-cloud The PR is synced to the cloud repo submodule changed At least one submodule changed in this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants