
Problem writing to azure file share. #3053

Closed
PeterThramkrongart opened this issue Dec 12, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@PeterThramkrongart

PeterThramkrongart commented Dec 12, 2024

Environment

Python 3.11

Delta-rs version:
deltalake==0.21.0

Environment:

  • Cloud provider: Azure file share
  • OS: Windows and Debian
  • Other:

Bug

I have trouble writing to an Azure file share; to make writes work at all, I'm forced to set `storage_options={"allow_unsafe_rename": "true"}`.

What happened:
This is the error I get when I run it in Kubernetes:

```
File "/usr/local/lib/python3.11/site-packages/deltalake/writer.py", line 323, in write_deltalake
    write_deltalake_rust(
OSError: Generic LocalFileSystem error: Unable to rename file: Operation not supported (os error 95)
```

This is the error I get when I run it locally (with a mounted drive):

```
write_to_delta()
  File "c:\Users\PeterThramkrongart\RaptorScripts\raw-tracking-delta-converter\show_write_problem.py", line 38, in write_to_delta
    write_deltalake(
  File "C:\Users\PeterThramkrongart\anaconda3\envs\delta-test-env\Lib\site-packages\deltalake\writer.py", line 323, in write_deltalake
    write_deltalake_rust(
OSError: Generic LocalFileSystem error: Unable to rename file: The parameter is incorrect. (os error 87)
```
The Parquet files are written and partitioned correctly, but the delta log is never written.
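
For reference, a successful append should also create a `_delta_log/` directory containing a zero-padded commit JSON next to the partition folders, roughly like this (file names here are illustrative):

```
delta-problem-demo/
├── _delta_log/
│   └── 00000000000000000000.json
├── day=2024-03-15/
│   └── part-00001-xxxx.snappy.parquet
└── day=2024-03-16/
    └── part-00001-xxxx.snappy.parquet
```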

What you expected to happen:

I expected to be able to write without any problems.

How to reproduce it:

```python
import os
import pandas as pd
from deltalake import write_deltalake
from dotenv import load_dotenv

# Load environment variables
load_dotenv(override=True)


def write_to_delta():
    # Get file share path from environment variable
    file_share_path = os.environ.get("FILE_SHARE_PATH")
    if not file_share_path:
        raise ValueError("FILE_SHARE_PATH environment variable not set")

    # Define delta table path
    delta_table_path = os.path.join(
        file_share_path,
        "delta-problem-demo",
    )

    if not os.path.exists(delta_table_path):
        os.makedirs(delta_table_path)

    # Loop to append data
    for i in range(10):
        # Create sample data with dynamic day and hour
        data = {
            'user_id': [f'user{i}', f'user{i+1}', f'user{i+2}'],
            'event_type': ['view', 'click', 'purchase'],
            'product_id': [f'prod{i}', f'prod{i+1}', f'prod{i+2}'],
            'day': [f'2024-03-{15+i}', f'2024-03-{15+i+1}', f'2024-03-{15+i+2}'],
            'hour': [f'{12+i}', f'{12+i+1}', f'{12+i+2}'],
        }
        df = pd.DataFrame(data)

        # Write DataFrame to Delta table
        write_deltalake(
            delta_table_path,
            df,
            mode="append",
            partition_by=["day"],
        )
        print(f"Data written to Delta table at: {delta_table_path}")


if __name__ == "__main__":
    write_to_delta()
```

@ion-elgreco
Collaborator

It might be an easy fix, I think we just need to add a create flag

@PeterThramkrongart
Author

@ion-elgreco I'm not sure I understand. Can you elaborate?

@PeterThramkrongart
Author

PeterThramkrongart commented Dec 13, 2024

This doesn't make a difference, if that is what you mean:

```python
write_deltalake(
    delta_table_path,
    df,
    mode="append",
    partition_by=["day"],
    storage_options={"create": "true"},
)
```

@ion-elgreco
Collaborator

ion-elgreco commented Dec 13, 2024

> This doesn't make a difference, if that is what you mean:
>
> ```python
> write_deltalake(
>     delta_table_path,
>     df,
>     mode="append",
>     partition_by=["day"],
>     storage_options={"create": "true"},
> )
> ```

No that's not what I mean.

Can you run your script with the environment variable `RUST_LOG=DEBUG` set before deltalake is imported, then share those logs here?
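
A minimal sketch of one way to do this from Python (assuming the variable only needs to be in the process environment before the deltalake import):

```python
import os

# The Rust-side logger is initialized when deltalake is first imported,
# so RUST_LOG has to be in the environment before that import happens.
os.environ["RUST_LOG"] = "DEBUG"

from deltalake import write_deltalake  # imported only after setting RUST_LOG
```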

@PeterThramkrongart
Author

PeterThramkrongart commented Dec 16, 2024

@ion-elgreco Here is the output when running locally on Windows:

```
[2024-12-16T08:27:16Z DEBUG deltalake_core::table::builder] creating table builder with file://XXXXX.file.core.windows.net/dev/delta-problem-demo
[2024-12-16T08:27:16Z DEBUG deltalake_core::table::builder] build_storage() with file://XXXXX.file.core.windows.net/dev/delta-problem-demo
[2024-12-16T08:27:16Z DEBUG deltalake_core::table::builder] Loading a logstore based off the location: Url { scheme: "file", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("XXXXX.file.core.windows.net")), port: None, path: "/dev/delta-problem-demo", query: None, fragment: None }
[2024-12-16T08:27:16Z DEBUG deltalake_core::logstore] Found a storage provider for file:/// (file://XXXXX.file.core.windows.net/dev/delta-problem-demo)
[2024-12-16T08:27:16Z DEBUG deltalake_core::logstore] Found a logstore provider for file:///
[2024-12-16T08:27:17Z DEBUG deltalake_core::table::builder] creating table builder with file://XXXXX.file.core.windows.net/dev/delta-problem-demo
[2024-12-16T08:27:17Z DEBUG deltalake_core::table::builder] build_storage() with file://XXXXX.file.core.windows.net/dev/delta-problem-demo
[2024-12-16T08:27:17Z DEBUG deltalake_core::table::builder] Loading a logstore based off the location: Url { scheme: "file", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("XXXXX.file.core.windows.net")), port: None, path: "/dev/delta-problem-demo", query: None, fragment: None }
[2024-12-16T08:27:17Z DEBUG deltalake_core::logstore] Found a storage provider for file:/// (file://XXXXX.file.core.windows.net/dev/delta-problem-demo)
[2024-12-16T08:27:17Z DEBUG deltalake_core::logstore] Found a logstore provider for file:///
[2024-12-16T08:27:17Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2024-12-16T08:27:17Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2024-12-16T08:27:17Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
```

Here is the output when running on Kubernetes with Debian:

```
devuser_delta_table_conversion@app-0:~$ python show_write_problem.py
[2024-12-16T08:42:20Z DEBUG deltalake_core::table::builder] creating table builder with file:///mnt/azurefile/delta-problem-demo
[2024-12-16T08:42:20Z DEBUG deltalake_core::table::builder] build_storage() with file:///mnt/azurefile/delta-problem-demo
[2024-12-16T08:42:20Z DEBUG deltalake_core::table::builder] Loading a logstore based off the location: Url { scheme: "file", cannot_be_a_base: false, username: "", password: None, host: None, port: None, path: "/mnt/azurefile/delta-problem-demo", query: None, fragment: None }
[2024-12-16T08:42:20Z DEBUG deltalake_core::logstore] Found a storage provider for file:/// (file:///mnt/azurefile/delta-problem-demo)
[2024-12-16T08:42:20Z DEBUG deltalake_core::logstore] Found a logstore provider for file:///
[2024-12-16T08:42:20Z DEBUG deltalake_core::table::builder] creating table builder with file:///mnt/azurefile/delta-problem-demo
[2024-12-16T08:42:20Z DEBUG deltalake_core::table::builder] build_storage() with file:///mnt/azurefile/delta-problem-demo
[2024-12-16T08:42:20Z DEBUG deltalake_core::table::builder] Loading a logstore based off the location: Url { scheme: "file", cannot_be_a_base: false, username: "", password: None, host: None, port: None, path: "/mnt/azurefile/delta-problem-demo", query: None, fragment: None }
[2024-12-16T08:42:20Z DEBUG deltalake_core::logstore] Found a storage provider for file:/// (file:///mnt/azurefile/delta-problem-demo)
[2024-12-16T08:42:20Z DEBUG deltalake_core::logstore] Found a logstore provider for file:///
[2024-12-16T08:42:20Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2024-12-16T08:42:20Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2024-12-16T08:42:20Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
```

@ion-elgreco
Collaborator

ion-elgreco commented Dec 16, 2024

@PeterThramkrongart you didn't set `allow_unsafe_rename`; you need to do that since you are using mounted storage. Azure file share simply doesn't support these rename requests.

Since you are on Azure, you should write directly to ADLS.
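
A minimal sketch of both routes (the container, account, and key values are placeholders; `allow_unsafe_rename` gives up atomic commits, so it is only safe with a single writer):

```python
import pandas as pd
from deltalake import write_deltalake

df = pd.DataFrame({"user_id": ["user0"], "event_type": ["view"], "day": ["2024-03-15"]})

# Option 1: keep the mounted file share but opt out of atomic renames.
# Only safe when a single process writes to the table.
write_deltalake(
    "/mnt/azurefile/delta-problem-demo",
    df,
    mode="append",
    partition_by=["day"],
    storage_options={"allow_unsafe_rename": "true"},
)

# Option 2: write straight to ADLS Gen2 via an abfss:// URL
# ("mycontainer", "myaccount", and the key are placeholder values).
write_deltalake(
    "abfss://mycontainer@myaccount.dfs.core.windows.net/delta-problem-demo",
    df,
    mode="append",
    partition_by=["day"],
    storage_options={"azure_storage_account_key": "<storage-account-key>"},
)
```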

ion-elgreco closed this as not planned on Dec 16, 2024
@ion-elgreco
Collaborator

@PeterThramkrongart you can open a clarifying question in https://github.com/apache/arrow-rs/issues; they might be able to explain in more depth why Azure File Share doesn't support certain FS operations.

@PeterThramkrongart
Author

Thanks. I don't think I need to hunt this down any further. Ideally my company wanted to use Azure file share because of cost considerations. I just needed to confirm that my problem writing to Delta isn't due to my own coding errors, but rather a lack of support for mounted drives. ADLS is going to have to do for now.
