Defaulting to anonymous Google Storage client fails in Google Colab notebooks #507

Open
chrisjkuch opened this issue Mar 3, 2025 · 0 comments


    else:
        try:
            self.client = StorageClient()
        except DefaultCredentialsError:
            self.client = StorageClient.create_anonymous_client()

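In this code block, StorageClient() succeeds without raising DefaultCredentialsError, because Colab notebooks automatically use the user's default credentials if they're logged in. As a result, the anonymous-client fallback is never reached.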

Instead, you end up getting a RefreshError on the first request against a public bucket:

---------------------------------------------------------------------------
RefreshError                              Traceback (most recent call last)
[<ipython-input-33-10663fe929d6>](https://localhost:8080/#) in <cell line: 0>()
     11 from cloudpathlib import GSPath
     12 path = GSPath("gs://some-public-bucket/some-public.file")
---> 13 path.download_to(".")

19 frames
[/usr/local/lib/python3.11/dist-packages/cloudpathlib/cloudpath.py](https://localhost:8080/#) in download_to(self, destination)
   1048         destination = Path(destination)
   1049 
-> 1050         if not self.exists():
   1051             raise CloudPathNotExistsError(f"Cannot download because path does not exist: {self}")
   1052 

[/usr/local/lib/python3.11/dist-packages/cloudpathlib/cloudpath.py](https://localhost:8080/#) in exists(self)
    429 
    430     def exists(self) -> bool:
--> 431         return self.client._exists(self)
    432 
    433     def is_dir(self, follow_symlinks=True) -> bool:

[/usr/local/lib/python3.11/dist-packages/cloudpathlib/gs/gsclient.py](https://localhost:8080/#) in _exists(self, cloud_path)
    173             return self.client.bucket(cloud_path.bucket).exists()
    174 
--> 175         return self._is_file_or_dir(cloud_path) in ["file", "dir"]
    176 
    177     def _list_dir(self, cloud_path: GSPath, recursive=False) -> Iterable[Tuple[GSPath, bool]]:

[/usr/local/lib/python3.11/dist-packages/cloudpathlib/gs/gsclient.py](https://localhost:8080/#) in _is_file_or_dir(self, cloud_path)
    150 
    151         bucket = self.client.bucket(cloud_path.bucket)
--> 152         blob = bucket.get_blob(cloud_path.blob)
    153 
    154         if blob is not None:

[/usr/lib/python3.11/contextlib.py](https://localhost:8080/#) in inner(*args, **kwds)
     79         def inner(*args, **kwds):
     80             with self._recreate_cm():
---> 81                 return func(*args, **kwds)
     82         return inner
     83 

[/usr/local/lib/python3.11/dist-packages/google/cloud/storage/bucket.py](https://localhost:8080/#) in get_blob(self, blob_name, client, encryption_key, generation, if_etag_match, if_etag_not_match, if_generation_match, if_generation_not_match, if_metageneration_match, if_metageneration_not_match, timeout, retry, soft_deleted, **kwargs)
   1341             #       Batch.finish() is called, the resulting `NotFound` will be
   1342             #       raised.
-> 1343             blob.reload(
   1344                 client=client,
   1345                 timeout=timeout,

[/usr/local/lib/python3.11/dist-packages/google/cloud/storage/_helpers.py](https://localhost:8080/#) in reload(self, client, projection, if_etag_match, if_etag_not_match, if_generation_match, if_generation_not_match, if_metageneration_match, if_metageneration_not_match, timeout, retry, soft_deleted)
    301             headers, if_etag_match=if_etag_match, if_etag_not_match=if_etag_not_match
    302         )
--> 303         api_response = client._get_resource(
    304             self.path,
    305             query_params=query_params,

[/usr/local/lib/python3.11/dist-packages/google/cloud/storage/client.py](https://localhost:8080/#) in _get_resource(self, path, query_params, headers, timeout, retry, _target_object)
    472                 If the bucket is not found.
    473         """
--> 474         return self._connection.api_request(
    475             method="GET",
    476             path=path,

[/usr/local/lib/python3.11/dist-packages/google/cloud/storage/_http.py](https://localhost:8080/#) in api_request(self, *args, **kwargs)
     88                 if retry:
     89                     call = retry(call)
---> 90             return call()

[/usr/local/lib/python3.11/dist-packages/google/api_core/retry/retry_unary.py](https://localhost:8080/#) in retry_wrapped_func(*args, **kwargs)
    291                 self._initial, self._maximum, multiplier=self._multiplier
    292             )
--> 293             return retry_target(
    294                 target,
    295                 self._predicate,

[/usr/local/lib/python3.11/dist-packages/google/api_core/retry/retry_unary.py](https://localhost:8080/#) in retry_target(target, predicate, sleep_generator, timeout, on_error, exception_factory, **kwargs)
    151         except Exception as exc:
    152             # defer to shared logic for handling errors
--> 153             _retry_error_helper(
    154                 exc,
    155                 deadline,

[/usr/local/lib/python3.11/dist-packages/google/api_core/retry/retry_base.py](https://localhost:8080/#) in _retry_error_helper(exc, deadline, next_sleep, error_list, predicate_fn, on_error_fn, exc_factory_fn, original_timeout)
    210             original_timeout,
    211         )
--> 212         raise final_exc from source_exc
    213     if on_error_fn is not None:
    214         on_error_fn(exc)

[/usr/local/lib/python3.11/dist-packages/google/api_core/retry/retry_unary.py](https://localhost:8080/#) in retry_target(target, predicate, sleep_generator, timeout, on_error, exception_factory, **kwargs)
    142     for sleep in sleep_generator:
    143         try:
--> 144             result = target()
    145             if inspect.isawaitable(result):
    146                 warnings.warn(_ASYNC_RETRY_WARNING)

[/usr/local/lib/python3.11/dist-packages/google/cloud/_http/__init__.py](https://localhost:8080/#) in api_request(self, method, path, query_params, data, content_type, headers, api_base_url, api_version, expect_json, _target_object, timeout, extra_api_info)
    480             content_type = "application/json"
    481 
--> 482         response = self._make_request(
    483             method=method,
    484             url=url,

[/usr/local/lib/python3.11/dist-packages/google/cloud/_http/__init__.py](https://localhost:8080/#) in _make_request(self, method, url, data, content_type, headers, target_object, timeout, extra_api_info)
    339         headers["User-Agent"] = self.user_agent
    340 
--> 341         return self._do_request(
    342             method, url, headers, data, target_object, timeout=timeout
    343         )

[/usr/local/lib/python3.11/dist-packages/google/cloud/_http/__init__.py](https://localhost:8080/#) in _do_request(self, method, url, headers, data, target_object, timeout)
    377         :returns: The HTTP response.
    378         """
--> 379         return self.http.request(
    380             url=url, method=method, headers=headers, data=data, timeout=timeout
    381         )

[/usr/local/lib/python3.11/dist-packages/google/auth/transport/requests.py](https://localhost:8080/#) in request(self, method, url, data, headers, max_allowed_time, timeout, **kwargs)
    535 
    536         with TimeoutGuard(remaining_time) as guard:
--> 537             self.credentials.before_request(auth_request, method, url, request_headers)
    538         remaining_time = guard.remaining_timeout
    539 

[/usr/local/lib/python3.11/dist-packages/google/auth/credentials.py](https://localhost:8080/#) in before_request(self, request, method, url, headers)
    226             self._non_blocking_refresh(request)
    227         else:
--> 228             self._blocking_refresh(request)
    229 
    230         metrics.add_metric_header(headers, self._metric_header_for_usage())

[/usr/local/lib/python3.11/dist-packages/google/auth/credentials.py](https://localhost:8080/#) in _blocking_refresh(self, request)
    189     def _blocking_refresh(self, request):
    190         if not self.valid:
--> 191             self.refresh(request)
    192 
    193     def _non_blocking_refresh(self, request):

[/usr/local/lib/python3.11/dist-packages/google/auth/compute_engine/credentials.py](https://localhost:8080/#) in refresh(self, request)
    132         except exceptions.TransportError as caught_exc:
    133             new_exc = exceptions.RefreshError(caught_exc)
--> 134             raise new_exc from caught_exc
    135 
    136     @property

RefreshError: ("Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true from the Google Compute Engine metadata service. Status: 404 Response:\nb''", <google.auth.transport.requests._Response object at 0x7a5ed126d850>)

A few possible options here (or this can be closed as #wontfix):

  • There's a workaround: create an anonymous client manually. We may want to document this somehow, or warn users (in Colab environments or in the docs) that this might be an issue:
    from cloudpathlib import GSClient
    from google.cloud.storage import Client as StorageClient
    anonymous_gclient = StorageClient.create_anonymous_client()
    client = GSClient(storage_client=anonymous_gclient)
    path = client.GSPath("gs://some-public-bucket/some-public.file")
    path.download_to(".")
  • Catch a RefreshError in addition to a DefaultCredentialsError as a condition for falling back to an anonymous client
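
A rough sketch of how the second option might look. Per the traceback above, the RefreshError is raised on the first request (inside download_to), not inside StorageClient(), so widening the except clause alone would probably not catch it. This sketch assumes the credentials can be refreshed eagerly to surface the error at construction time. The helper names client_with_anonymous_fallback and make_gs_storage_client are hypothetical and not part of cloudpathlib; using StorageClient.SCOPE as the scopes for google.auth.default() is also an assumption.

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def client_with_anonymous_fallback(
    make_default: Callable[[], T],
    make_anonymous: Callable[[], T],
    fallback_errors: Tuple[Type[BaseException], ...],
) -> T:
    """Return make_default(), or make_anonymous() if a listed error is raised."""
    try:
        return make_default()
    except fallback_errors:
        return make_anonymous()


def make_gs_storage_client():
    """Hypothetical wiring for Google Cloud Storage (not cloudpathlib's code)."""
    # Imported lazily so the generic helper above works without these packages.
    import google.auth
    from google.auth.exceptions import DefaultCredentialsError, RefreshError
    from google.auth.transport.requests import Request
    from google.cloud.storage import Client as StorageClient

    def make_default():
        # Assumption: requesting the storage client's own scopes up front.
        credentials, project = google.auth.default(scopes=StorageClient.SCOPE)
        # Refresh now so a RefreshError surfaces here, not on the first request.
        credentials.refresh(Request())
        return StorageClient(credentials=credentials, project=project)

    return client_with_anonymous_fallback(
        make_default,
        StorageClient.create_anonymous_client,
        (DefaultCredentialsError, RefreshError),
    )
```

The trade-off is an extra network round trip when the client is constructed, and any error outside the listed types still propagates to the caller.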