Support writes to mounted local path with blobfuse #1765

Closed
RobinLin666 opened this issue Oct 24, 2023 · 3 comments
Labels
enhancement New feature or request


@RobinLin666
Contributor

Description

Use Case
Reopening this issue to track #1418 (comment). Thank you.

import pandas as pd
from deltalake.writer import write_deltalake

df = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5],
        "fruits": ["banana", "orange", "mango", "apple", "banana"],
    }
)

# Overwrite the Delta table at a mounted local path (blobfuse-backed).
write_deltalake("/lakehouse/default/Files/fruits", df, mode="overwrite")

Related Issue(s)
#1418 (comment)
GlareDB/glaredb#1809

RobinLin666 added the enhancement (New feature or request) label Oct 24, 2023
@RobinLin666
Contributor Author

Hi @scsmithr, I tried to fix the rename issue, but hit another error when writing to a mounted path. Could you please take a look?

import pandas as pd
from deltalake import write_deltalake

df = pd.DataFrame({"id": [1, 2], "value": ["foo", "boo"]})
# Write to a mounted local path; this is where the error occurs.
write_deltalake("/synfs/test/delta/data/delta3", df)


RobinLin666 changed the title from "Support writes to blobfuse" to "Support writes to mounted local path with blobfuse" Oct 25, 2023
@scsmithr
Contributor

For what it's worth, I was testing with this change to object_store: https://github.com/GlareDB/arrow-rs/pull/2/files

The first diff chunk explicitly drops the file so that its metadata gets flushed to blob storage before the rename.

I'm not sure the second chunk is necessary. I was just trying out explicitly syncing the file, but I don't recall it actually fixing the issue.
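A minimal Python sketch of the pattern described above, assuming (as the comment suggests) that blobfuse only uploads a file's contents once its handle is released: write to a temporary file in the target directory, optionally `fsync` it, close it, and only then rename it into place. The helper name `write_then_rename` is hypothetical; the actual change lives in the Rust object_store code linked above.

```python
import os
import tempfile


def write_then_rename(final_path: str, data: bytes) -> None:
    """Write data to a temp file next to final_path, close it, then rename.

    Assumption: on a FUSE mount such as blobfuse, the contents are only
    flushed to blob storage when the file handle is released, so the handle
    must be closed before the rename.
    """
    directory = os.path.dirname(final_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            # Optional explicit sync, analogous to the second diff chunk;
            # per the comment above it may not be needed.
            os.fsync(f.fileno())
        # Leaving the `with` block has closed (released) the handle.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Remove the temp file if anything failed before the rename finished.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

The ordering is the point of the sketch: close first, rename second, so the rename never sees a file whose handle is still open.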

@RobinLin666
Copy link
Contributor Author

Thanks @scsmithr, I tried to mitigate the issue on the blobfuse side.

For #1, the rename issue can be mitigated: I upload the temp file before renaming it.

For #2, you said that because blobfuse doesn't support hard links, copy_if_not_exists will fail. I don't quite understand how that comes into play, or whether it is the cause of the error I encountered above:

OSError: Generic DeltaLocalObjectStore error: Function not implemented (os error 38)

If possible, please let me know where a hard link is needed, so we can think about available workarounds.

I saw the same error when writing to DBFS; perhaps this is one of the causes?
#987
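To check whether a given mount is the culprit, one could probe it directly. The sketch below is a hypothetical diagnostic helper (`supports_hard_links` is not part of deltalake or object_store): it creates a scratch file in the target directory and attempts `os.link`. On Linux, errno 38 is `ENOSYS` ("Function not implemented"), which matches the error above; treating `EPERM` and `EOPNOTSUPP` as "unsupported" as well is an assumption about how other FUSE file systems may report it.

```python
import errno
import os
import tempfile

# Assumed set of errnos meaning "this file system does not do hard links".
_NO_LINK_ERRNOS = {errno.ENOSYS, errno.EPERM, errno.EOPNOTSUPP}


def supports_hard_links(directory: str) -> bool:
    """Return True if os.link works inside `directory` (hypothetical helper)."""
    fd, src = tempfile.mkstemp(dir=directory)
    os.close(fd)
    dst = src + ".link"
    try:
        os.link(src, dst)
    except OSError as e:
        if e.errno in _NO_LINK_ERRNOS:
            return False
        raise
    else:
        os.remove(dst)
        return True
    finally:
        # Always clean up the scratch file.
        os.remove(src)
```

For example, `supports_hard_links("/synfs/test/delta/data")` returning `False` would suggest the mount rejects `link(2)`.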

ion-elgreco pushed a commit that referenced this issue Mar 15, 2024
…rd link (#1868)

Make writes compatible with local file systems that do not support hard links.

# Description

When writing to a local file system, hard links are sometimes not
supported (for example on blobfuse, goofys, or s3fs), so handle that
case with a compatible fallback.
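The fallback idea can be sketched in Python as follows. This illustrates the technique and is not the Rust code from #1868: `copy_if_not_exists` here is a hypothetical helper that first tries an atomic hard link (which fails if the destination already exists) and, when the file system reports that hard links are unsupported, falls back to creating the destination in exclusive-create mode and copying the bytes. Unlike the hard link, the fallback is not atomic: a crash mid-copy can leave a partial destination file.

```python
import errno
import os
import shutil

# Errnos taken to mean "no hard link support" (an assumption; ENOSYS is
# errno 38 on Linux, "Function not implemented").
_NO_HARD_LINK = {errno.ENOSYS, errno.EPERM, errno.EOPNOTSUPP}


def copy_if_not_exists(src: str, dst: str) -> None:
    """Copy src to dst, raising FileExistsError if dst already exists."""
    try:
        # link(2) fails with EEXIST if dst exists: "copy if not exists".
        os.link(src, dst)
        return
    except FileExistsError:
        raise
    except OSError as e:
        if e.errno not in _NO_HARD_LINK:
            raise
    # Fallback: "xb" opens with O_CREAT | O_EXCL, so an existing dst is
    # still detected; then stream the bytes across.
    with open(src, "rb") as fin, open(dst, "xb") as fout:
        shutil.copyfileobj(fin, fout)
```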

Note that blobfuse has another problem: rename reports errors,
because the file handle was not released before the rename.
See here for details: #1765

This also requires a corresponding change in arrow-rs, for example:
https://github.com/GlareDB/arrow-rs/pull/2/files
Because object_store has moved to 0.8, which has many breaking
changes, I haven't made that change for the time being. I will fix
it after upgrading to 0.8:
#1858

# Related Issue(s)

#1765
#1376

# Documentation