feat(storage): Enhance Otel Span Attributes with BucketId and Locatio… · googleapis/google-cloud-python@a0da993 · GitHub
Skip to content

Commit a0da993

Browse files
authored
feat(storage): Enhance Otel Span Attributes with BucketId and Location details for every Bucket/Blob operation
# feat(storage): Enhance Otel Span Attributes with BucketId and Location details for every Bucket/Blob operation as part of ACO (App-centric Observability) This PR implements **App-centric Observability (ACO)** tracing compatibility for the GCS Python SDK (`google-cloud-storage`). All OpenTelemetry trace spans produced by bucket and blob operations now seamlessly incorporate mandatory destination resource annotations (`gcp.resource.destination.id` and `gcp.resource.destination.location`). --- ## Core Architecture & Design ### 1. Centralized, DRY Telemetry Helper (`_helpers.py`) - All OpenTelemetry span context generation, attribute injection, and exception trapping are centralized in a module-level context manager `create_trace_span_helper` in [`_helpers.py`](file:///usr/local/google/home/chandrasiri/storage_related/org-google-cloud-python/packages/google-cloud-storage/google/cloud/storage/_helpers.py). - **Zero modifications to the core tracing module**: [`_opentelemetry_tracing.py`](file:///usr/local/google/home/chandrasiri/storage_related/org-google-cloud-python/packages/google-cloud-storage/google/cloud/storage/_opentelemetry_tracing.py) remains completely pristine and identical to `main`. - Seamlessly wrapped all critical read/write operations across `blob.py`, `bucket.py`, and `client.py` (e.g., `download_as_bytes`, `upload_from_string`, `get_bucket`, `lookup_bucket`, etc.). ### 2. Bounded LRU Metadata Cache (`_lru_cache.py`, `_bucket_metadata_cache.py`) - **LRU Capacity Bounding**: Implemented `LRUCache` utilizing an `OrderedDict` to support O(1) operations and strict capacity bounding to eliminate memory leaks in long-running applications. - **Concurrent Singleflight Warming**: Implemented `BucketMetadataCache` to store bucket locations and project numbers. On cache misses, it spawns background threads (`_fetch_background`) using singleflight tracking (`_inflight_fetches`) to prevent server stampedes / thundering herds. - **Fallback Annotations on 403**: On GCS `403 Forbidden` permissions errors, the cache permanently registers fallback annotations (`projects/_/buckets/{name}`) to completely avoid retry storms on subsequent API calls. ### 3. Resilient 404 Existence Eviction (`_http.py`, `_helpers.py`, `bucket.py`) - **Smart Out-of-band 404 Verification**: When a `404 NotFound` error occurs during media transfers or REST calls, a background thread is spawned (with concurrency protection via `_inflight_checks`) to check if the bucket was deleted out-of-band (`bucket.exists()`). If `exists()` returns `False`, the bucket is cleanly evicted from the cache. - **Instant Synchronous Eviction**: Direct `Bucket.delete()` calls synchronously and instantly evict the bucket name from the cache, ensuring real-time consistency. --- ## Extensive Testing Suite ### 1. 100% Sleep-Free System Tests (`test_aco_observability.py`) Added a comprehensive system test suite [`test_aco_observability.py`](file:///usr/local/google/home/chandrasiri/storage_related/org-google-cloud-python/packages/google-cloud-storage/tests/system/test_aco_observability.py) executing against a live GCS backend: - **Sequential Priming**: Verifies cache miss return times, background priming, and subsequent span enrichment. - **403 Fallback**: Verifies minimal fallback registration on Forbidden responses. - **Cache Stampede Protection**: Simulates 15 concurrent threads on a cache miss and asserts only 1 GCS call is fired. - **Smart 404 Eviction**: Deletes a bucket out-of-band and verifies async cache clean-up on 404. - **Synchronous Delete Eviction**: Asserts immediate cache eviction on SDK deletion. - **LRU Capacity Bounding**: Populates the cache beyond its limits and verifies proper LRU eviction. - **Deterministic Synchronization**: Uses **`threading.Event` (zero static sleeps)** for thread coordination, guaranteeing thundering-fast execution and completely eliminating timing flakiness. ### 2. Robust Unit Tests - Added [`test__lru_cache.py`](file:///usr/local/google/home/chandrasiri/storage_related/org-google-cloud-python/packages/google-cloud-storage/tests/unit/test__lru_cache.py) (LRU correctness, bounding, eviction). - Added [`test__bucket_metadata_cache.py`](file:///usr/local/google/home/chandrasiri/storage_related/org-google-cloud-python/packages/google-cloud-storage/tests/unit/test__bucket_metadata_cache.py) (concurrency, location resolution, 403 fallback, singleflight). - Added `test_delete_hit_evicts_from_cache` inside [`test_bucket.py`](file:///usr/local/google/home/chandrasiri/storage_related/org-google-cloud-python/packages/google-cloud-storage/tests/unit/test_bucket.py). --- ## Validation Results All checks, unit tests, and live GCS system tests pass flawlessly: - **Unit Tests**: 835 passed in 17.82s - **System Tests**: 8 passed in 26.94s - **Format & Linter**: 100% clean (`black` / `flake8`)
1 parent 4d64ebc commit a0da993

14 files changed

Lines changed: 1584 additions & 86 deletions

File tree

Lines changed: 150 additions & 0 deletions

packages/google-cloud-storage/google/cloud/storage/_helpers.py

Lines changed: 102 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -19,20 +19,30 @@
1919

2020
import base64
2121
import datetime
22+
import logging
2223
import os
2324
import secrets
2425
import sys
26+
from contextlib import contextmanager
2527
from hashlib import md5
2628
from urllib.parse import urlsplit, urlunsplit
2729
from uuid import uuid4
2830

31+
from google.api_core import exceptions as api_exceptions
32+
from google.cloud.exceptions import NotFound
33+
2934
from google.auth import environment_vars
3035

3136
from google.cloud.storage.constants import _DEFAULT_TIMEOUT
3237
from google.cloud.storage.retry import (
3338
DEFAULT_RETRY,
3439
DEFAULT_RETRY_IF_METAGENERATION_SPECIFIED,
3540
)
41+
from google.cloud.storage._opentelemetry_tracing import (
42+
create_trace_span as _base_create_trace_span,
43+
)
44+
45+
_logger = logging.getLogger(__name__)
3646

3747
STORAGE_EMULATOR_ENV_VAR = "STORAGE_EMULATOR_HOST" # Despite name, includes scheme.
3848
"""Environment variable defining host for Storage emulator."""
@@ -137,6 +147,62 @@ def _validate_name(name):
137147
return name
138148

139149

150+
@contextmanager
151+
def create_trace_span_helper(client, bucket_name, name, attributes=None, **kwargs):
152+
span_attrs = dict(attributes) if attributes else {}
153+
154+
if (
155+
bucket_name
156+
and isinstance(bucket_name, str)
157+
and client
158+
and hasattr(client, "_bucket_metadata_cache")
159+
and client._bucket_metadata_cache
160+
):
161+
try:
162+
if name in (
163+
"Storage.Client.getBucket",
164+
"Storage.Client.lookupBucket",
165+
"Storage.Bucket.reload",
166+
"Storage.Bucket.exists",
167+
):
168+
cached = client._bucket_metadata_cache.get(bucket_name)
169+
else:
170+
cached = client._bucket_metadata_cache.get_or_queue_fetch(bucket_name)
171+
172+
if cached and isinstance(cached, tuple) and len(cached) == 2:
173+
dest_id, loc = cached
174+
span_attrs.update(
175+
{
176+
"gcp.resource.destination.id": dest_id,
177+
"gcp.resource.destination.location": loc,
178+
}
179+
)
180+
except Exception as e:
181+
_logger.debug(f"Failed cache lookup in create_trace_span_helper: {e}")
182+
183+
if "client" not in kwargs and client:
184+
kwargs["client"] = client
185+
186+
with _base_create_trace_span(name, attributes=span_attrs, **kwargs) as span:
187+
try:
188+
yield span
189+
except (NotFound, api_exceptions.NotFound):
190+
if (
191+
bucket_name
192+
and isinstance(bucket_name, str)
193+
and client
194+
and hasattr(client, "_bucket_metadata_cache")
195+
and client._bucket_metadata_cache
196+
):
197+
try:
198+
client._bucket_metadata_cache.check_and_evict(bucket_name)
199+
except Exception as e:
200+
_logger.debug(
201+
f"Failed cache eviction on 404 in create_trace_span_helper: {e}"
202+
)
203+
raise
204+
205+
140206
class _PropertyMixin(object):
141207
"""Abstract mixin for cloud storage classes with associated properties.
142208
@@ -185,6 +251,42 @@ def _require_client(self, client):
185251
client = self.client
186252
return client
187253

254+
@contextmanager
255+
def _create_trace_span(self, name, attributes=None, **kwargs):
256+
from google.cloud.storage.blob import Blob
257+
from google.cloud.storage.bucket import Bucket
258+
259+
if isinstance(self, Bucket):
260+
client = self.client
261+
bucket_name = self.name
262+
elif isinstance(self, Blob):
263+
bucket = getattr(self, "bucket", None)
264+
client = (
265+
getattr(bucket, "client", None)
266+
if bucket and hasattr(bucket, "client")
267+
else None
268+
)
269+
bucket_name = getattr(bucket, "name", None) if bucket else None
270+
else:
271+
client = None
272+
bucket_name = None
273+
274+
if callable(bucket_name):
275+
try:
276+
bucket_name = bucket_name()
277+
except Exception as e:
278+
_logger.debug(
279+
f"Failed callable bucket_name resolution in _create_trace_span: {e}"
280+
)
281+
282+
client_override = kwargs.pop("client", None)
283+
active_client = client_override or client
284+
285+
with create_trace_span_helper(
286+
active_client, bucket_name, name, attributes=attributes, **kwargs
287+
) as span:
288+
yield span
289+
188290
def _encryption_headers(self):
189291
"""Return any encryption headers needed to fetch the object.
190292

packages/google-cloud-storage/google/cloud/storage/_http.py

Lines changed: 52 additions & 3 deletions

0 commit comments

Comments
 (0)