
chandra-siri · 2025-12-27T18:01:38Z

Add performance microbenchmarking suite for Sequential and Random Reads
This compares Regional (Standard Storage) with Zonal (Rapid Storage) buckets.
Regional buckets are accessed through the JSON API, whereas Zonal buckets use gRPC bidirectional streaming (bidi).
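To make the two access patterns concrete, here is a minimal sketch of how sequential and random read ranges over a single object could be generated. The function names, the `(offset, length)` tuple shape, and the choice of unaligned random offsets are assumptions for illustration, not the benchmark's actual code.

```python
from __future__ import annotations

import random


def sequential_ranges(object_size: int, chunk_size: int) -> list[tuple[int, int]]:
    """Return (offset, length) pairs that cover the object front to back."""
    return [
        (offset, min(chunk_size, object_size - offset))
        for offset in range(0, object_size, chunk_size)
    ]


def random_ranges(
    object_size: int, chunk_size: int, count: int, seed: int | None = None
) -> list[tuple[int, int]]:
    """Return `count` (offset, length) pairs at uniformly random offsets.

    Hypothetical helper: requires chunk_size <= object_size so every range
    fits entirely inside the object.
    """
    if chunk_size > object_size:
        raise ValueError("chunk_size must not exceed object_size")
    rng = random.Random(seed)  # seeded so a run can be reproduced
    max_offset = object_size - chunk_size
    return [(rng.randint(0, max_offset), chunk_size) for _ in range(count)]


OBJECT_SIZE = 100 * 1024 * 1024  # 100 MiB
CHUNK_SIZE = 1024 * 1024         # 1 MiB
seq = sequential_ranges(OBJECT_SIZE, CHUNK_SIZE)
rnd = random_ranges(OBJECT_SIZE, CHUNK_SIZE, count=100, seed=42)
```

Sequential reads touch every byte exactly once in order; random reads sample fixed-size windows anywhere in the object, which stresses per-request latency rather than streaming throughput.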

gemini-code-assist · 2025-12-27T18:02:05Z

Summary of Changes

Hello @chandra-siri, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a collection of new Python scripts and a pytest benchmark designed to thoroughly evaluate and measure the performance of asynchronous operations with Google Cloud Storage. The primary focus is on benchmarking parallel uploads and downloads, leveraging Python's asyncio for concurrent I/O and multiprocessing for CPU-bound parallelism, thereby providing comprehensive tools to assess throughput and latency across various scenarios.

Highlights

Asynchronous Download Benchmarks: Introduced async_tasks_downloade_mp.py, async_tasks_downloader.py, and parallel_downloader.py to measure the performance of parallel asynchronous downloads from Google Cloud Storage, utilizing both asyncio and multiprocessing.
Asynchronous Upload Benchmarks: Added parallel_uploader.py to benchmark parallel asynchronous uploads of data to Google Cloud Storage.
Pytest Read Benchmark: Integrated test_reads.py as a formal pytest-benchmark test to assess the throughput of single-object asynchronous downloads with configurable chunk sizes.
Utility Scripts: Included random_number.py for generating secure 64-bit random integers and test_asyncio_gc.py for exploring asyncio task garbage collection behavior.
move_blob Test: A new script move_blob_test.py was added to demonstrate and test the move_blob functionality for Google Cloud Storage objects.


Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP 8 for Python) by adding files to a .gemini/ folder at the root of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request adds a collection of benchmarking and testing scripts. My review focuses on improving code quality, fixing bugs, and increasing efficiency. Key findings include a critical bug in a time calculation, several instances of inefficient client creation within loops, and various opportunities for code cleanup such as removing unused imports and commented-out code. I've also suggested improvements for better script structure and more informative benchmark reporting.

I am having trouble creating individual review comments, so my feedback is listed below.

parallel_downloader.py (60)

There is a critical bug in the time calculation due to operator precedence. start_time / 10**9 is calculated before the subtraction. You need to wrap end_time - start_time in parentheses to get the correct duration.

        f"\nFinished all download attempts for {num_objects} objects: took - {(end_time - start_time) / 10**9}s"
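The difference is easy to see with concrete numbers. This sketch assumes the timestamps are integer nanoseconds (for example from `time.perf_counter_ns()`), which is what the `/ 10**9` divisor implies; the helper name `elapsed_seconds` is illustrative.

```python
import time


def elapsed_seconds(start_ns: int, end_ns: int) -> float:
    # Subtract first, then convert nanoseconds to seconds.
    return (end_ns - start_ns) / 10**9


start_ns, end_ns = 5_000_000_000, 7_500_000_000
buggy = end_ns - start_ns / 10**9          # only start_ns is divided: 7499999995.0
fixed = elapsed_seconds(start_ns, end_ns)  # 2.5

# Typical use with a monotonic nanosecond clock:
t0 = time.perf_counter_ns()
time.sleep(0.01)
print(f"took - {elapsed_seconds(t0, time.perf_counter_ns())}s")
```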

parallel_downloader.py (16)

Creating a new AsyncGrpcClient for each object download is inefficient as it involves setting up a new connection each time. The client should be created once per worker process and reused for all downloads within that process. Consider creating the client in download_worker and passing it to this function.
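A minimal sketch of the suggested shape, using a hypothetical `StandInClient` in place of `AsyncGrpcClient` (whose constructor and methods are not shown in this review). The worker builds one client, and every coroutine in that worker shares it; in the real script, `download_worker` would be the per-process entry point.

```python
from __future__ import annotations

import asyncio


class StandInClient:
    """Hypothetical stand-in for an expensive-to-create async client."""

    instances = 0  # counts constructions, i.e. connection setups

    def __init__(self) -> None:
        StandInClient.instances += 1

    async def download(self, name: str) -> bytes:
        await asyncio.sleep(0)  # placeholder for a network read
        return name.encode()


async def download_object_async(client: StandInClient, name: str) -> bytes:
    # Reuses the caller's client instead of constructing a new one.
    return await client.download(name)


async def _download_all(names: list[str]) -> list[bytes]:
    client = StandInClient()  # created once for the whole worker
    return await asyncio.gather(*(download_object_async(client, n) for n in names))


def download_worker(names: list[str]) -> list[bytes]:
    # Entry point for one worker process (e.g. a multiprocessing.Pool task).
    return asyncio.run(_download_all(names))


results = download_worker(["obj-0", "obj-1", "obj-2"])
```

Creating the client inside the coroutine, rather than at module import time, keeps it out of the parent process and on the event loop that `asyncio.run` starts in the worker.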

threaded_downloader.py (14)

Creating a new AsyncGrpcClient for each download is inefficient. Since each download_worker runs in its own thread and creates its own event loop, the client should be created once per thread in download_worker and passed to download_object_async.
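The same idea for the threaded variant, again with a hypothetical stand-in client: each worker call runs its own event loop via `asyncio.run`, creates one client inside that loop, and shares it across its downloads. The thread pool and batch layout are illustrative assumptions.

```python
from __future__ import annotations

import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor


class StandInClient:
    """Hypothetical stand-in for an async client bound to one event loop."""

    created = 0
    _lock = threading.Lock()

    def __init__(self) -> None:
        with StandInClient._lock:  # constructors may run concurrently
            StandInClient.created += 1

    async def download(self, name: str) -> int:
        await asyncio.sleep(0)  # placeholder for a network read
        return len(name)


async def download_object_async(client: StandInClient, name: str) -> int:
    return await client.download(name)


def download_worker(names: list[str]) -> list[int]:
    async def run() -> list[int]:
        # One client per download_worker call, created on the loop this call started.
        client = StandInClient()
        return await asyncio.gather(*(download_object_async(client, n) for n in names))

    return asyncio.run(run())


batches = [["a", "bb"], ["ccc"], ["dddd", "e"]]
with ThreadPoolExecutor(max_workers=len(batches)) as pool:
    results = list(pool.map(download_worker, batches))
```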

tests/perf/microbenchmarks/test_reads.py (66)

The output_buffer is initialized here and then immediately re-initialized on line 69. This first initialization is redundant and can be removed.

parallel_uploader.py (17)

Creating a new AsyncGrpcClient for each object upload is inefficient. This creates a new connection for every task. The client should be instantiated once per worker process in upload_worker and passed to this function to be reused.

parallel_downloader.py (23)

The object size is hardcoded here. It would be better to define it as a constant at the top of the file (e.g., OBJECT_SIZE = 100 * 1024 * 1024) for better readability and maintainability.

parallel_uploader.py (31)

The bucket name is hardcoded. It's better to define this as a constant at the top of the file for easier configuration.

async_tasks_downloader.py (65)

There is a typo in FINSHED. It should be FINISHED.

        f"FINISHED: total bytes downloaded - {num_objects*OBJECT_SIZE} in time {(end_time - start_time) / (10**9)}s"

move_blob_test.py (3-9)

The script's logic is executed at the module level. It's a best practice to encapsulate the main logic within a main() function and call it under an if __name__ == "__main__": guard. This prevents the code from running when the module is imported elsewhere and improves reusability and testability.

from google.cloud import storage


def main():
    gcs = storage.Client()
    bucket = gcs.bucket("chandrasiri-us-west1-hns-soft-del")
    blob = bucket.blob("test/blob.csv")
    blob.upload_from_string("")
    print("Uploaded blob:", blob.name)
    # Move (rename) the object within the same bucket.
    bucket.move_blob(blob, new_name="test/blob2.csv")


if __name__ == "__main__":
    main()

move_blob_test.py (4)

The bucket name is hardcoded. For better portability and reusability, consider defining it as a constant at the top of the file or passing it as a command-line argument.
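One way to act on the bucket-name comments is to take the name from a command-line flag with an environment-variable fallback. The flag names, the `BENCH_BUCKET` variable, and the placeholder default below are hypothetical.

```python
from __future__ import annotations

import argparse
import os

# Hypothetical defaults; override with flags or the BENCH_BUCKET variable.
DEFAULT_BUCKET = os.environ.get("BENCH_BUCKET", "my-benchmark-bucket")
DEFAULT_OBJECT_SIZE = 100 * 1024 * 1024  # 100 MiB


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="GCS benchmark helper")
    parser.add_argument("--bucket", default=DEFAULT_BUCKET,
                        help="bucket to read from or write to")
    parser.add_argument("--object-size", type=int, default=DEFAULT_OBJECT_SIZE,
                        help="object size in bytes")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(args.bucket, args.object_size)
```

Passing `argv` explicitly keeps `parse_args` testable without touching `sys.argv`.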

parallel_downloader.py (3)

The os module is imported but never used. It should be removed.

async_tasks_downloader.py (4)

The ThreadPoolExecutor is imported but not used in this file. It should be removed to keep the code clean.

async_tasks_downloade_mp.py (4)

The ThreadPoolExecutor is imported but never used in this file. It's good practice to remove unused imports to keep the code clean.

parallel_downloader.py (34)

The bucket name is hardcoded. Consider defining this as a constant at the top of the file to make it easier to change.

async_tasks_downloade_mp.py (83-84)

There's a typo in 'Throuput'. It should be 'Throughput'.

Additionally, the throughput calculation uses 10**-6 to convert from bytes to megabytes, which assumes 1 MB = 1,000,000 bytes. However, OBJECT_SIZE is defined using binary prefixes (1024*1024). For consistency, you should use (1024*1024) for the conversion to MiB/s.

        f"Throughput: {num_object*OBJECT_SIZE /((end_time_proc - start_time_proc) / (10**9))/(1024*1024)} MiB/s"
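The unit mismatch is easiest to see with numbers. This sketch assumes nanosecond timestamps (matching the `10**9` divisor above); the helper names are illustrative.

```python
MIB = 1024 * 1024
OBJECT_SIZE = 100 * MIB  # 100 MiB, binary prefix as in the script


def throughput_mib_s(total_bytes: int, start_ns: int, end_ns: int) -> float:
    seconds = (end_ns - start_ns) / 10**9
    return total_bytes / seconds / MIB    # mebibytes per second


def throughput_mb_s(total_bytes: int, start_ns: int, end_ns: int) -> float:
    seconds = (end_ns - start_ns) / 10**9
    return total_bytes / seconds / 10**6  # decimal megabytes per second


total = 10 * OBJECT_SIZE                      # 10 objects
print(throughput_mib_s(total, 0, 2 * 10**9))  # 500.0
print(throughput_mb_s(total, 0, 2 * 10**9))   # 524.288
```

The same transfer reads about 5% higher in MB/s than in MiB/s, so mixing the two makes results look better than they are.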

async_tasks_downloade_mp.py (76)

The results variable is assigned but its value is never used. It should be removed to avoid confusion.

        pool.starmap(async_runner, args)

async_tasks_downloader.py (36)

This docstring is empty. Please add a description of what the function does.

    """Creates and runs asyncio tasks to download a range of objects."""

random_number.py (12-20)

The script's logic is executed at the module level. It's a best practice to encapsulate this logic within a main() function and call it under an if __name__ == "__main__": guard. This prevents the code from running when the module is imported elsewhere.

def main():
    # Generate 1000 unique IDs
    # A set is the easiest way to guarantee uniqueness in the batch.
    request_ids = set()
    while len(request_ids) < 1000:
        request_ids.add(generate_random_64bit_int())

    # You can convert it to a list if needed
    id_list = list(request_ids)

    print(f"Generated {len(id_list)} unique 64-bit IDs.")
    print("First 5 IDs:", id_list[:5])

if __name__ == "__main__":
    main()
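The suggested `main()` calls `generate_random_64bit_int()`, whose body is not shown in this review. A minimal sketch, assuming an unsigned 64-bit value drawn from the `secrets` module; a signed variant is included in case the IDs must fit an int64 field (both are assumptions about the original):

```python
import secrets


def generate_random_64bit_int() -> int:
    """Return a cryptographically secure integer in [0, 2**64)."""
    return secrets.randbits(64)


def generate_random_signed_64bit_int() -> int:
    """Return a secure integer in [-2**63, 2**63), i.e. a two's-complement int64."""
    value = secrets.randbits(64)
    return value - (1 << 64) if value >= (1 << 63) else value
```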

tests/perf/microbenchmarks/test_reads.py (16-21)

These modules (math, google.api_core.exceptions, google.cloud.storage.blob.Blob) are imported but not used in the file. They should be removed to keep the code clean.

tests/perf/microbenchmarks/test_reads.py (42)

The comment indicates the object size is 1 GiB, but 100 * (1024 ** 2) is 100 MiB. The comment should be corrected to avoid confusion.

OBJECT_SIZE = 100 * (1024 ** 2)  # 100 MiB

async_tasks_downloade_mp.py (71-73)

These lines appear to be leftover debugging code. They should be removed before merging to keep the codebase clean.

tests/perf/microbenchmarks/test_reads.py (102-109)

This docstring appears to be a list of implementation notes rather than a description of the function's purpose. It should be updated to be a proper docstring that explains what my_setup does, its parameters, and what it returns.

tests/perf/microbenchmarks/test_reads.py (210)

The benchmark summary prints 'N/A' for standard deviation. pytest-benchmark calculates this value and it's available in benchmark.stats['stddev']. It would be more informative to include this statistic in the report. Note that the other metrics are in MiB/s, so you may need to convert the time-based standard deviation or adjust the table header.
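A time-based standard deviation cannot be turned into MiB/s by dividing the object size by it. A safer sketch is to convert each round's duration to MiB/s first and then summarize. How the per-round durations are pulled out of pytest-benchmark is left as an assumption here; the list below is sample input.

```python
from __future__ import annotations

import statistics

MIB = 1024 * 1024


def throughput_summary(object_size: int, round_seconds: list[float]) -> dict[str, float]:
    """Summarize per-round throughput in MiB/s (needs at least two rounds)."""
    rates = [object_size / MIB / s for s in round_seconds]
    return {
        "min": min(rates),
        "max": max(rates),
        "mean": statistics.mean(rates),
        "stddev": statistics.stdev(rates),  # sample standard deviation
    }


# Per-round rates here are 100, 50 and 25 MiB/s.
summary = throughput_summary(100 * MIB, [1.0, 2.0, 4.0])
print(summary)
```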

tests/perf/microbenchmarks/test_reads.py (219-234)

This large block of commented-out code should be removed to improve readability and reduce clutter.

async_tasks_downloade_mp.py (38)

This docstring is empty. Please add a meaningful description of what the download_objects_pool function does, its parameters, and what it returns, or remove the docstring if it's not needed.

    """Downloads a pool of objects asynchronously within a single process."""

threaded_downloader.py (32)

The bucket name is hardcoded. It's better to define this as a constant at the top of the file for easier configuration and maintenance.


…e analysis
- Adjusted file size and rounds in config.yaml for better benchmarking accuracy.
- Enhanced download object tests in test_reads.py to include random chunk downloading.
- Refactored upload and download worker functions for improved clarity and performance.


chandra-siri · 2026-01-15T18:08:54Z

gemini-code-assist

Code Review

This pull request introduces a comprehensive micro-benchmarking suite to compare read performance between regional (standard storage, JSON API) and zonal (rapid storage, gRPC API) buckets. The implementation is well-structured, effectively using pytest parameterization, multiprocessing, and asyncio to cover various scenarios. The inclusion of a resource monitor and a script to convert results to CSV are excellent additions. I have a few minor suggestions to enhance code clarity and maintainability.


PR created by the Librarian CLI to initialize a release. Merging this PR will auto-trigger a release.

Librarian Version: v1.0.2-0.20251119154421-36c3e21ad3ac
Language Image: us-central1-docker.pkg.dev/cloud-sdk-librarian-prod/images-prod/python-librarian-generator@sha256:8e2c32496077054105bd06c54a59d6a6694287bc053588e24debe6da6920ad91

google-cloud-storage: 3.9.0

3.9.0 (v3.8.0...v3.9.0) (2026-02-02)

Features

* update generation for MRD (#1730) (08bc708)
* add get_object method for async grpc client (#1735) (0e5ec29)
* Add micro-benchmarks for reads comparing standard (regional) vs rapid (zonal) buckets. (#1697) (1917649)
* Add support for opening via `write_handle` and fix `write_handle` type (#1715) (2bc15fa)
* add samples for appendable objects writes and reads (2e1a1eb)
* add samples for appendable objects writes and reads (#1705) (2e1a1eb)
* add context manager to mrd (#1724) (5ac2808)
* Move Zonal Buckets features of `_experimental` (#1728) (74c9ecc)
* add default user agent for grpc (#1726) (7b31946)
* expose finalized_time in blob.py applicable for GET_OBJECT in ZB (#1719) (8e21a7f)
* expose `DELETE_OBJECT` in `AsyncGrpcClient` (#1718) (c8dd7a0)
* send `user_agent` to grpc channel (#1712) (cdb2486)
* integrate writes strategy and appendable object writer (#1695) (dbd162b)
* Add micro-benchmarks for writes comparing standard (regional) vs rapid (zonal) buckets. (#1707) (dbe9d8b)
* add support for `generation=0` to avoid overwriting existing objects and add `is_stream_open` support (#1709) (ea0f5bf)
* add support for `generation=0` to prevent overwriting existing objects (ea0f5bf)
* add `is_stream_open` property to AsyncAppendableObjectWriter for stream status check (ea0f5bf)

Bug Fixes

* receive eof while closing reads stream (#1733) (2ef6339)
* update write handle on every recv() (#1716) (5d9fafe)
* implement requests_done method to signal end of requests in async streams. Gracefully close streams. (#1700) (6c16079)
* implement requests_done method to signal end of requests in async streams. Gracefully close streams. (6c16079)
* instance grpc client once per process in benchmarks (#1725) (721ea2d)
* Fix formatting in setup.py dependencies list (#1713) (cc4831d)
* Change constructors of MRD and AAOW AsyncGrpcClient.grpc_client to AsyncGrpcClient (#1727) (e730bf5)

chandra-siri added 3 commits December 3, 2025 15:33

local files for benchmarking

5d58213

Merge branch 'main' of github.com:googleapis/python-storage into bench

c797586

add test_reads.py for microbenchmarking reads

20d2d2d

product-auto-label bot added the "size: l" and "api: storage" labels on Dec 27, 2025

push local files

f493bd8

product-auto-label bot added the "size: xl" label and removed the "size: l" label on Dec 27, 2025

gemini-code-assist Bot reviewed Dec 27, 2025


chandra-siri added 17 commits December 28, 2025 10:30

1p 1c working copy

68c8ba0

Add microbenchmarking tests and utility functions for performance analysis

9e2afa8

upload local changes

bef9dcb

just upload one

75007a7

Refactor get_persisted_size_async to improve logging and update get_persisted_size_sync to accept command-line arguments

a85fff1

working copy

4c24f66

add regional tests

e216644

Add JSON to CSV conversion script and update benchmark tests for multiprocessing

80120a1

Refactor benchmark configuration and cleanup unused code in test_reads.py

99bc3eb

Merge branch 'main' of github.com:googleapis/python-storage into bench

f4a622b

Implement write benchmarks

af98e0e

Merge branch 'main' of github.com:googleapis/python-storage into bench

1405e92

Merge branch 'bench' of github.com:googleapis/python-storage into bench

3c7e7af

working copy

970b162

Add benchmarks for downloading and uploading large objects, and improve CRC32 performance measurement

0bf17c7

revert changes in samples/snippets/storage_list_files_with_prefix.py

a7309ac

chandra-siri changed the title from "Bench" to "feat: Add Micro Benchmarks for reads and writes comparing standard (regional) vs rapid (zonal) buckets." on Jan 11, 2026

chandra-siri changed the title from "feat: Add Micro Benchmarks for reads and writes comparing standard (regional) vs rapid (zonal) buckets." to "feat: Add micro-benchmarks for reads and writes comparing standard (regional) vs rapid (zonal) buckets." on Jan 11, 2026

Remove unused test utility file in asyncio tests

829f0f4

chandra-siri assigned suni72 Jan 12, 2026

chandra-siri added 2 commits January 15, 2026 15:52

move reads & writes into their folders

9873084

move writes benchmarking into another PR

3e5435e

chandra-siri changed the title from "feat: Add micro-benchmarks for reads and writes comparing standard (regional) vs rapid (zonal) buckets." to "feat: Add micro-benchmarks for reads comparing standard (regional) vs rapid (zonal) buckets." on Jan 15, 2026

chandra-siri added 5 commits January 15, 2026 21:34

Merge branch 'main' into bench

7dd5e00

refactor: reorganize benchmark parameters add each parameter.py in reads , writes folder

3c0dac2

move writes into another PR

11ff2f6

remove write related changes from contest.py

4a94ed6

fix: update bucket map to use environment variables for default values

1298818

gemini-code-assist Bot reviewed Jan 15, 2026


chandra-siri and others added 6 commits January 15, 2026 23:55

Apply suggestion from @gemini-code-assist[bot]

ffc7cd1

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

refactor: simplify throughput calculations and improve function documentation

799bc99

refactor: enhance docstrings for clarity and detail in read benchmarks

a20622c

add README for performance microbenchmarks with usage instructions

91553d8

add testing dependencies for benchmarking

a283c89

refactor: improve docstring for _get_params function and simplify parameter calculation

d5a7efd

chandra-siri assigned googlyrahman and jasha26 Jan 16, 2026

jasha26 previously approved these changes Jan 21, 2026


Merge branch 'main' into bench

51ec525

chandra-siri dismissed jasha26’s stale review via 51ec525 January 21, 2026 10:58

Pulkit0110 approved these changes Jan 21, 2026


Comment thread tests/perf/microbenchmarks/__init__.py

Comment thread tests/perf/microbenchmarks/conftest.py

chandra-siri enabled auto-merge (squash) January 21, 2026 11:06

chandra-siri merged commit 1917649 into main Jan 21, 2026
20 of 21 checks passed

chandra-siri deleted the bench branch January 21, 2026 11:21

release-please Bot mentioned this pull request Jan 21, 2026

chore(main): release 4.0.0 #1708

Closed

This was referenced Jan 29, 2026

chore: librarian release pull request: 20260129T115903Z #1731

Closed

chore: librarian release pull request: 20260202T123858Z #1736

Merged


feat: Add micro-benchmarks for reads comparing standard (regional) vs rapid (zonal) buckets.#1697

chandra-siri merged 44 commits into main from bench


5 participants
