drop_storage() errors when using S3 interface for GCS #2148

Open
jorritsandbrink opened this issue Dec 14, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@jorritsandbrink
Collaborator

dlt version

1.4.1

Describe the problem

drop_storage() errors when using S3 interface for GCS:

OSError: [Errno 38] A header or query you provided requested a function that is not implemented.

Expected behavior

No error.

Steps to reproduce

Reproduce with:

import dlt

from dlt.destinations import filesystem
from dlt.common.utils import custom_environ


# filesystem destination pointed at a GCS bucket through the S3-compatible interface
pipe = dlt.pipeline(destination=filesystem("s3://ci-test-bucket"))
creds = {
    "CREDENTIALS__AWS_ACCESS_KEY_ID": "GOOG1EOM...3RQ",  # GCS HMAC key
    "CREDENTIALS__AWS_SECRET_ACCESS_KEY": "78lO...tcE",  # GCS HMAC secret
    "CREDENTIALS__PROJECT_ID": "chat...ci",
    "CREDENTIALS__ENDPOINT_URL": "https://storage.googleapis.com",
}
with custom_environ(creds):
    pipe.run([{"foo": "bar"}], table_name="foo")
    with pipe.destination_client() as client:
        client.drop_storage()  # raises OSError: [Errno 38] ... not implemented
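
The failure can also be reproduced with s3fs alone, without dlt. As the traceback below shows, a recursive rm is batched by s3fs into a single DeleteObjects call, which GCS's S3-compatible XML API does not implement (single-object deletes do work). A minimal sketch, using placeholder HMAC credentials and the same test bucket:

from s3fs import S3FileSystem

# placeholder HMAC credentials; same endpoint as in the dlt repro above
fs = S3FileSystem(
    key="GOOG1EOM...3RQ",
    secret="78lO...tcE",
    client_kwargs={"endpoint_url": "https://storage.googleapis.com"},
)

fs.touch("ci-test-bucket/foo/bar.txt")
# recursive rm goes through s3fs' _bulk_delete -> DeleteObjects and fails with NotImplemented
fs.rm("ci-test-bucket/foo", recursive=True)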

Traceback:

Traceback (most recent call last):
  File "/home/j/.cache/pypoetry/virtualenvs/dlt-2tG_aB2A-py3.9/lib/python3.9/site-packages/s3fs/core.py", line 113, in _error_wrapper
    return await func(*args, **kwargs)
  File "/home/j/.cache/pypoetry/virtualenvs/dlt-2tG_aB2A-py3.9/lib/python3.9/site-packages/aiobotocore/client.py", line 408, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (NotImplemented) when calling the DeleteObjects operation: A header or query you provided requested a function that is not implemented.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/j/repos/dlt/mre.py", line 18, in <module>
    client.drop_storage()
  File "/home/j/repos/dlt/dlt/destinations/impl/filesystem/filesystem.py", line 305, in drop_storage
    self.fs_client.rm(self.dataset_path, recursive=True)
  File "/home/j/.cache/pypoetry/virtualenvs/dlt-2tG_aB2A-py3.9/lib/python3.9/site-packages/fsspec/asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/home/j/.cache/pypoetry/virtualenvs/dlt-2tG_aB2A-py3.9/lib/python3.9/site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "/home/j/.cache/pypoetry/virtualenvs/dlt-2tG_aB2A-py3.9/lib/python3.9/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
  File "/home/j/.cache/pypoetry/virtualenvs/dlt-2tG_aB2A-py3.9/lib/python3.9/site-packages/s3fs/core.py", line 1924, in _rm
    out = await _run_coros_in_chunks(
  File "/home/j/.cache/pypoetry/virtualenvs/dlt-2tG_aB2A-py3.9/lib/python3.9/site-packages/fsspec/asyn.py", line 254, in _run_coros_in_chunks
    await asyncio.gather(*chunk, return_exceptions=return_exceptions),
  File "/usr/lib/python3.9/asyncio/tasks.py", line 442, in wait_for
    return await fut
  File "/home/j/.cache/pypoetry/virtualenvs/dlt-2tG_aB2A-py3.9/lib/python3.9/site-packages/s3fs/core.py", line 1898, in _bulk_delete
    out = await self._call_s3(
  File "/home/j/.cache/pypoetry/virtualenvs/dlt-2tG_aB2A-py3.9/lib/python3.9/site-packages/s3fs/core.py", line 348, in _call_s3
    return await _error_wrapper(
  File "/home/j/.cache/pypoetry/virtualenvs/dlt-2tG_aB2A-py3.9/lib/python3.9/site-packages/s3fs/core.py", line 140, in _error_wrapper
    raise err
OSError: [Errno 38] A header or query you provided requested a function that is not implemented.
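
A possible workaround until this is handled in the filesystem client is to delete objects one at a time, since single-object deletes work against the GCS S3 interface; only the bulk DeleteObjects request fails. Below is a minimal sketch of a hypothetical helper (not part of dlt); it relies on the fs_client and dataset_path attributes that appear in the traceback above:

def drop_storage_single_deletes(client) -> None:
    # delete every object under the dataset prefix with one request per key
    fs = client.fs_client
    path = client.dataset_path
    if fs.exists(path):
        for file in fs.find(path):
            fs.rm_file(file)  # single-object delete, no DeleteObjects batch
        fs.invalidate_cache(path)

with pipe.destination_client() as client:
    drop_storage_single_deletes(client)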

"Native" GCS interface does not have this problem, i.e. this works:

import dlt

from dlt.destinations import filesystem
from dlt.common.utils import custom_environ

pipe = dlt.pipeline(destination=filesystem("gs://ci-test-bucket"))
creds = {
    "CREDENTIALS__CLIENT_EMAIL": "cha...com",
    "CREDENTIALS__PRIVATE_KEY": "-----BEGIN PRIVATE KEY-----\nM...=\n-----END PRIVATE KEY-----\n",
    "CREDENTIALS__PRIVATE_KEY_ID": "066...d68",
    "CREDENTIALS__PROJECT_ID": "cha...-ci",
}
with custom_environ(creds):
    pipe.run([{"foo": "bar"}], table_name="foo")
    with pipe.destination_client() as client:
        client.drop_storage()

Operating system

Linux

Runtime environment

Local

Python version

3.9

dlt data source

No response

dlt destination

Filesystem & buckets

Other deployment details

No response

Additional information

No response

@jorritsandbrink added the bug label on Dec 14, 2024
jorritsandbrink added a commit that referenced this issue Dec 14, 2024
rudolfix pushed a commit that referenced this issue Dec 15, 2024
* make duckdb handle iceberg table with nested types

* replace duckdb views for iceberg tables

* remove unnecessary context closing and opening

* replace duckdb views for abfss protocol

* restore original destination for write path

* use dev_mode to work around leftover data from previous tests

leftover data caused by #2148