
Max retry exceeded when using DeltaTable with Azure Blob Storage #2669

Closed
erickfigueiredo opened this issue Jul 15, 2024 · 3 comments
Labels: bug (Something isn't working)

Comments

@erickfigueiredo

Environment

Delta-rs version: 0.16.0

Environment:

  • OS: Windows 11
  • Python: 3.11.8
  • Pyarrow: 13.0.0
  • adlfs: 2024.4.1

Bug

What happened: I'm facing an issue when using the deltalake lib to save and load data to/from Azure Blob Storage. Intermittently, I get the following error:

DatasetError: Failed while saving data to data set CustomDeltaTableDataset(file_example).
Failed to parse parquet: Parquet error: AsyncChunkReader::get_bytes error:
Generic MicrosoftAzure error: Error after 10 retries in 2.196683949s, max_retries:10, 
retry_timeout:180s, source:error sending request for url 
(https://<address>/file.parquet):
 error trying to connect: dns error: failed to lookup address information: Name or service not known

What you expected to happen: I expected to load the data from the Delta table and convert it to a Pandas DataFrame without any errors.

How to reproduce it:

from deltalake import DeltaTable

# Service-principal credentials for Azure Blob Storage; the angle-bracket
# values are placeholders for the real account details.
datalake_info = {
    'account_name': '<account>',
    'client_id': '<cli_id>',
    'tenant_id': '<tenant_id>',
    'client_secret': '<secret>',
    'timeout': '100000s'
}

# Load data from the Delta table and convert it to a Pandas DataFrame
dt = DeltaTable("abfs://<azure_address>", storage_options=datalake_info)
df = dt.to_pandas()

More details: I was looking for a parameter like max_retries but couldn't find anything related, including in the object_store docs: https://docs.rs/object_store/latest/object_store/azure/enum.AzureConfigKey.html. Does anyone know a solution or workaround for this issue?
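For now, the only workaround I've found is retrying at the application level. A rough sketch of what I mean (the load_with_retry helper and its backoff parameters are my own, not part of the library):

import time

from deltalake import DeltaTable


def load_with_retry(uri, storage_options, attempts=5, base_delay=2.0):
    """Retry DeltaTable construction on transient network failures."""
    for attempt in range(attempts):
        try:
            return DeltaTable(uri, storage_options=storage_options)
        except Exception:
            # Give up after the final attempt; otherwise back off exponentially
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)


dt = load_with_retry("abfs://<azure_address>", datalake_info)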

erickfigueiredo added the bug label on Jul 15, 2024
@erickfigueiredo (Author)

Has anyone ever faced this problem?

@martindut

I'm also getting this error lately:
Generic MicrosoftAzure error: Error after 10 retries in 3.296507315s, max_retries:10, retry_timeout:180s, source:HTTP status server error (503 Service Unavailable) for url (https://onelake.blob.fabric.microsoft.com/xxxxxxxxxxxxxx/Tables/dddddddddddd/_delta_log/_last_checkpoint).

The path I am using is: abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse/Tables/<table_name>

@djouallah

This should work, using deltalake 0.18.2 and above:

from azure.identity import ClientSecretCredential
from deltalake import DeltaTable

# Authenticate with a service principal and request a storage-scoped token
credential = ClientSecretCredential(
    client_id="appId",
    client_secret="secret",
    tenant_id="tenantId",
)
access_token = credential.get_token("https://storage.azure.com/.default").token

# Pass the token as a bearer credential; use_fabric_endpoint routes requests
# to the OneLake endpoint
storage_options = {"bearer_token": access_token, "use_fabric_endpoint": "true"}

scada = DeltaTable(
    "abfss://[email protected]/Lakehousename.Lakehouse/Tables/xxxxx",
    storage_options=storage_options,
)
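One caveat: get_token returns a static token string, so a long-running job may need to refresh it before it expires. A minimal sketch, assuming the same credential object as above (fresh_storage_options is a hypothetical helper, not a library function):

def fresh_storage_options(credential):
    # AccessToken exposes .token and .expires_on; re-fetching before each
    # table open keeps the bearer token valid (azure-identity caches tokens
    # internally and only hits the network when one is near expiry)
    token = credential.get_token("https://storage.azure.com/.default")
    return {"bearer_token": token.token, "use_fabric_endpoint": "true"}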

rtyler closed this as completed on Dec 16, 2024