feat(storage): Use raw proto access for read resumption strategy by googlyrahman · Pull Request #1764 · googleapis/python-storage · GitHub
This repository was archived by the owner on Mar 31, 2026. It is now read-only.

feat(storage): Use raw proto access for read resumption strategy #1764

Merged
googlyrahman merged 4 commits into googleapis:main from googlyrahman:proto-wrapper
Mar 17, 2026

Conversation

@googlyrahman

Contributor

The proto-plus library provides protocol buffer message classes that behave like native Python types. While convenient, my profiling confirms this abstraction comes with a significant performance penalty in our hot paths. Every time we access a field on a proto-plus object, the library triggers dynamic lookups and wrapper instantiation. In our hot data ingestion loop, this overhead accumulates rapidly over multiple chunks.

My benchmarking shows that accessing the underlying C++ Protobuf directly is ~2x faster than going through the proto-plus wrapper (measured over 30,000 iterations). While 30,000 operations might sound high, at one access of a single attribute per 2MB chunk it corresponds to downloading just 60GB of data. For high-performance workloads (e.g., downloading at 1GB/s), that takes only 60s. Additionally, this wrapper overhead introduces measurable latency not just for data access, but for every metadata check and state update that repeats per chunk.
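
A quick sanity check of that arithmetic, as a minimal sketch. It assumes decimal units (1 GB = 1000 MB) and exactly one attribute access per 2MB chunk; the variable names are illustrative only.

```python
# Back-of-the-envelope check of the figures quoted above.
ITERATIONS = 30_000        # attribute accesses measured in the benchmark
CHUNK_SIZE_MB = 2          # assumed chunk size, one access per chunk
THROUGHPUT_GB_PER_S = 1    # example download rate

# Decimal units are assumed here: 1 GB = 1000 MB.
total_gb = ITERATIONS * CHUNK_SIZE_MB / 1000
seconds = total_gb / THROUGHPUT_GB_PER_S

print(f"{ITERATIONS} chunks x {CHUNK_SIZE_MB} MB = {total_gb:.0f} GB")
print(f"at {THROUGHPUT_GB_PER_S} GB/s that takes {seconds:.0f} s")
```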

This PR bypasses proto-plus in our critical IO loops and interacts directly with the underlying C++ Protobuf structures. This eliminates the "wrapper tax" without changing the external behavior of the application.
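
The general pattern can be illustrated with a toy sketch. This is not the code in this PR and does not use proto-plus: `RawChunk` and `WrappedChunk` are hypothetical stand-ins, where the wrapper forwards every field access dynamically (loosely mimicking the per-access cost described above) and the faster loop unwraps once per chunk before reading fields.

```python
import timeit


class RawChunk:
    """Hypothetical stand-in for a raw protobuf message (not a real proto)."""

    __slots__ = ("read_offset", "data")

    def __init__(self, read_offset, data):
        self.read_offset = read_offset
        self.data = data


class WrappedChunk:
    """Hypothetical stand-in for a wrapper that forwards field access."""

    def __init__(self, raw):
        self._raw = raw

    def __getattr__(self, name):
        # Runs only when normal lookup fails, i.e. for every wrapped field.
        return getattr(self._raw, name)


def total_via_wrapper(chunks):
    total = 0
    for chunk in chunks:
        # Two forwarded lookups per chunk.
        total += chunk.read_offset + len(chunk.data)
    return total


def total_via_raw(chunks):
    total = 0
    for chunk in chunks:
        raw = chunk._raw  # unwrap once, then use plain attribute access
        total += raw.read_offset + len(raw.data)
    return total


chunks = [WrappedChunk(RawChunk(i, b"x" * 8)) for i in range(30_000)]

# Both loops compute the same result; only the access path differs.
assert total_via_wrapper(chunks) == total_via_raw(chunks)

t_wrapped = timeit.timeit(lambda: total_via_wrapper(chunks), number=20)
t_raw = timeit.timeit(lambda: total_via_raw(chunks), number=20)
print(f"wrapper: {t_wrapped:.3f}s  raw: {t_raw:.3f}s")
```

The timings depend on the machine and interpreter, and the toy wrapper is far simpler than proto-plus, so the ratio printed here says nothing about the ~2x figure above. The point is only that the external result is unchanged while per-field indirection is removed from the loop.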

@product-auto-label product-auto-label Bot added size: s Pull request size is small. api: storage Issues related to the googleapis/python-storage API. labels Mar 3, 2026
@gemini-code-assist

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request refactors the read resumption strategy to use raw protobuf message access instead of proto-plus wrappers, aiming for a significant performance improvement in critical I/O paths. The changes correctly adapt field presence checks to use the raw proto API (e.g., HasField), which is a good improvement for both performance and correctness. I have one point of feedback regarding a type inconsistency introduced in the shared state, which could lead to potential issues.

Note: Security Review did not run due to the size of the PR.

Comment thread google/cloud/storage/asyncio/retry/reads_resumption_strategy.py Outdated
@product-auto-label product-auto-label Bot added size: m Pull request size is medium. and removed size: s Pull request size is small. labels Mar 3, 2026
@googlyrahman googlyrahman changed the title Use raw proto access for read resumption strategy feat(storage): Use raw proto access for read resumption strategy Mar 3, 2026
@googlyrahman googlyrahman force-pushed the proto-wrapper branch 2 times, most recently from d06e240 to 08eaf33 Compare March 3, 2026 13:43
@googlyrahman googlyrahman marked this pull request as ready for review March 3, 2026 13:45
@googlyrahman googlyrahman requested review from a team as code owners March 3, 2026 13:45
@googlyrahman

Contributor Author

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request improves performance in the read resumption strategy by bypassing the proto-plus library and accessing the underlying raw protobuf messages directly in the hot path. The changes correctly use HasField for checking optional scalar fields and introduce local variables to reduce repeated attribute access, which are good micro-optimizations. The test files have also been updated with stylistic formatting improvements.

I have one suggestion to improve the robustness and consistency of the code when checking for the presence of a message field.

Comment thread google/cloud/storage/asyncio/retry/reads_resumption_strategy.py Outdated
Comment thread google/cloud/storage/asyncio/retry/reads_resumption_strategy.py
@googlyrahman googlyrahman merged commit 14cfd61 into googleapis:main Mar 17, 2026
15 checks passed
chandra-siri added a commit that referenced this pull request Mar 18, 2026
PR created by the Librarian CLI to initialize a release. Merging this PR
will auto trigger a release.

Librarian Version: v1.0.2-0.20251119154421-36c3e21ad3ac
Language Image:
us-central1-docker.pkg.dev/cloud-sdk-librarian-prod/images-prod/python-librarian-generator@sha256:8e2c32496077054105bd06c54a59d6a6694287bc053588e24debe6da6920ad91
<details><summary>google-cloud-storage: 3.10.0</summary>

## [3.10.0](v3.9.0...v3.10.0) (2026-03-18)

### Features

* [Bucket Encryption Enforcement] add support for bucket encryption
enforcement config (#1742)
([2a6e8b0](2a6e8b0))

### Perf Improvements

* [Rapid Buckets Reads] Use raw proto access for read resumption
strategy (#1764)
([14cfd61](14cfd61))
* [Rapid Buckets Benchmarks] init mp pool & grpc client once, use
os.sched_setaffinity (#1751)
([a9eb82c](a9eb82c))
* [Rapid Buckets Writes] don't flush at every append, results in bad
perf (#1746)
([ab62d72](ab62d72))


### Bug Fixes

* [Windows] skip downloading blobs whose names contain `":"` (e.g. `C:`,
`D:`) when the application runs on Windows. (#1774)
([5581988](5581988))
* [Path Traversal] Prevent path traversal in `download_many_to_path`
(#1768)
([700fec3](700fec3))
* [Rapid Buckets] pass token correctly, '&' instead of ',' (#1756)
([d8dd1e0](d8dd1e0))


</details>

Labels

api: storage Issues related to the googleapis/python-storage API. size: m Pull request size is medium.
