fix: init mp pool & grpc client once, use os.sched_setaffinity by chandra-siri · Pull Request #1751 · googleapis/python-storage · GitHub
Skip to content
This repository was archived by the owner on Mar 31, 2026. It is now read-only.

fix: init mp pool & grpc client once, use os.sched_setaffinity#1751

Merged
chandra-siri merged 21 commits into
mainfrom
writes_fixes
Feb 19, 2026
Merged

fix: init mp pool & grpc client once, use os.sched_setaffinity#1751
chandra-siri merged 21 commits into
mainfrom
writes_fixes

Conversation

@chandra-siri

Copy link
Copy Markdown
Collaborator

fix: init mp pool & grpc client once, use os.sched_setaffinity

  • mp pool is initalize once per benchmark run, otherwise time to init pool is affecting throughput calculations.
  • grpc client should be intialized once per process in an event loop, otherwise processes get stuck .
  • os.sched_setaffinity helps in pinning the process to a set of cores to avoid hard irq from nic.

@product-auto-label product-auto-label Bot added size: l Pull request size is large. api: storage Issues related to the googleapis/python-storage API. labels Feb 12, 2026
@chandra-siri chandra-siri changed the base branch from main to time_based_benchmarks February 12, 2026 16:32
@gemini-code-assist

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the multiprocessing setup in the write benchmarks to initialize the process pool and clients only once per run, optimizing benchmark results by preventing setup overhead. From a security standpoint, the analysis could not be fully completed as several files listed in SECURITY_ANALYSIS_TODO.md were unreadable, meaning no vulnerabilities were found in the accessible files, but a comprehensive review was hindered. For code improvements, consider making the os.sched_setaffinity CPU affinity setting more robust and portable, and ensuring graceful cleanup of resources in worker processes.

Comment thread tests/perf/microbenchmarks/writes/test_writes.py Outdated
Comment on lines +327 to +338
def _worker_init(bucket_type):
"""Initializes a persistent event loop and client for each worker process."""
os.sched_setaffinity(0, {i for i in range(20, 190)}) # Pin to cores 20-189
global worker_loop, worker_client, worker_json_client
if bucket_type == "zonal":
worker_loop = asyncio.new_event_loop()
asyncio.set_event_loop(worker_loop)
worker_client = worker_loop.run_until_complete(create_client())
else: # regional
from google.cloud import storage

worker_json_client = storage.Client()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The _worker_init function initializes an event loop and clients for each worker process, but there's no corresponding cleanup logic. This can lead to resource leaks. While process termination will clean up resources, a graceful shutdown is better practice. Consider using atexit.register() in _worker_init to call a cleanup function that closes the clients and the event loop before the worker process exits.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chandra-siri, can you check if a cleanup is necessary? If not, let's resolve the comment. I think the client doesn't need to be closed, so we should be fine.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes close is possible in both json and grpc clients, but it's not urgent item. so created a b/479135274 few days earlier. Will fix it soon

@product-auto-label product-auto-label Bot added size: m Pull request size is medium. and removed size: l Pull request size is large. labels Feb 13, 2026
@chandra-siri

Copy link
Copy Markdown
Collaborator Author

/gcbrun(6bf0860)

@chandra-siri chandra-siri marked this pull request as ready for review February 13, 2026 10:39
@chandra-siri chandra-siri requested review from a team February 13, 2026 10:39
@chandra-siri chandra-siri requested a review from a team as a code owner February 13, 2026 10:39
@googlyrahman

Copy link
Copy Markdown
Contributor

Added some comments, Please take a look.

@chandra-siri

Copy link
Copy Markdown
Collaborator Author

Added some comments, Please take a look.

Addressed them

@googlyrahman

googlyrahman commented Feb 19, 2026

Copy link
Copy Markdown
Contributor

@chandra-siri presubmits are failing, please fix it.

googlyrahman
googlyrahman previously approved these changes Feb 19, 2026
Base automatically changed from time_based_benchmarks to main February 19, 2026 12:54
@chandra-siri chandra-siri dismissed googlyrahman’s stale review February 19, 2026 12:54

The base branch was changed.

@chandra-siri

Copy link
Copy Markdown
Collaborator Author

@chandra-siri chandra-siri merged commit a9eb82c into main Feb 19, 2026
15 checks passed
@chandra-siri chandra-siri deleted the writes_fixes branch February 19, 2026 14:55
chandra-siri added a commit that referenced this pull request Mar 18, 2026
PR created by the Librarian CLI to initialize a release. Merging this PR
will auto trigger a release.

Librarian Version: v1.0.2-0.20251119154421-36c3e21ad3ac
Language Image:
us-central1-docker.pkg.dev/cloud-sdk-librarian-prod/images-prod/python-librarian-generator@sha256:8e2c32496077054105bd06c54a59d6a6694287bc053588e24debe6da6920ad91
<details><summary>google-cloud-storage: 3.10.0</summary>

##
[3.10.0](v3.9.0...v3.10.0)
(2026-03-18)

### Features

* [Bucket Encryption Enforcement] add support for bucket encryption
enforcement config (#1742)
([2a6e8b0](2a6e8b0))

### Perf Improvments

* [Rapid Buckets Reads] Use raw proto access for read resumption
strategy (#1764)
([14cfd61](14cfd61))
* [Rapid Buckets Benchmarks] init mp pool & grpc client once, use
os.sched_setaffinity (#1751)
([a9eb82c](a9eb82c))
* [Rapid Buckets Writes] don't flush at every append, results in bad
perf (#1746)
([ab62d72](ab62d72))


### Bug Fixes

* [Windows] skip downloading blobs whose name contain `":" ` eg: `C:`
`D:` etc when application runs in Windows. (#1774)
([5581988](5581988))
* [Path Traversal] Prevent path traversal in `download_many_to_path`
(#1768)
([700fec3](700fec3))
* [Rapid Buckets] pass token correctly, '&' instead of ',' (#1756)
([d8dd1e0](d8dd1e0))


</details>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

api: storage Issues related to the googleapis/python-storage API. size: m Pull request size is medium.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants