dbfs paths not supported #1376

Closed
MrPowers opened this issue May 17, 2023 · 6 comments
Labels
enhancement New feature or request on-hold Issues and Pull Requests that are on hold for some reason

Comments

@MrPowers
Contributor

Environment

Delta-rs version: 0.9.0

Binding: Python

Environment:

  • Cloud provider: Databricks
  • OS: ?
  • Other: ?

Bug

What happened: Tried to instantiate a DeltaTable from a DBFS path, like this: deltalake.DeltaTable("dbfs:/some-thing/some_dir")

What you expected to happen: I expected this to work. This works: spark.read.format("delta").load("dbfs:/some-thing/some_dir").show()

How to reproduce it: Create a Delta table in Databricks with a DBFS path and then try to instantiate a deltalake.DeltaTable. Should be relatively easy to reproduce.

More details: N/A

@MrPowers MrPowers added the bug Something isn't working label May 17, 2023
@rtyler
Member

rtyler commented May 17, 2023

@MrPowers to the best of my knowledge there is no REST API for DBFS, nor any open "file system provider" for what DBFS actually is. Does Databricks make third-party interoperability with DBFS possible?

@MrPowers
Contributor Author

@rtyler - yea, I'm not sure. Perhaps I have to figure out another way to get the path to the data.
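
One possible "other way to get the path" is sketched below. It assumes the code runs on a Databricks cluster where DBFS is also exposed as a local FUSE mount under `/dbfs`; nothing in this thread confirms that deltalake reads or writes reliably through that mount. The helper name `dbfs_uri_to_fuse_path` and the `FUSE_ROOT` constant are hypothetical and exist only for illustration.

```python
from pathlib import PurePosixPath

DBFS_SCHEME = "dbfs:/"
# Assumption: the cluster exposes DBFS as a local FUSE mount at this location.
FUSE_ROOT = "/dbfs"


def dbfs_uri_to_fuse_path(uri: str) -> str:
    """Translate a dbfs:/ URI into the corresponding local path under FUSE_ROOT."""
    if not uri.startswith(DBFS_SCHEME):
        raise ValueError(f"not a DBFS URI: {uri!r}")
    # Drop the scheme and any extra leading slashes (e.g. dbfs:///foo).
    relative = uri[len(DBFS_SCHEME):].lstrip("/")
    return str(PurePosixPath(FUSE_ROOT) / relative)


local_path = dbfs_uri_to_fuse_path("dbfs:/some-thing/some_dir")
print(local_path)  # /dbfs/some-thing/some_dir

# On a cluster where the mount exists, the local path could then be tried with:
#   from deltalake import DeltaTable
#   dt = DeltaTable(local_path)
```

Whether this works depends on the mount being present and on the file system operations it supports, so treat it as something to try rather than a fix.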

@rtyler rtyler added enhancement New feature or request on-hold Issues and Pull Requests that are on hold for some reason and removed bug Something isn't working labels Sep 15, 2023
@rtyler rtyler mentioned this issue Sep 15, 2023
@Lundez

Lundez commented Sep 29, 2023

I have the same issue when using a mounted ADLS2 in an Azure ML Studio job. I want to write a table, and it fails when writing the log. The Parquet file is written correctly.

This is ADLS2 with Hierarchical Storage.

@ion-elgreco
Collaborator

> I have the same issue when using a mounted ADLS2 in an Azure ML Studio job. I want to write a table, and it fails when writing the log. The Parquet file is written correctly.
>
> This is ADLS2 with Hierarchical Storage.

I also ran into this issue with AML; writing to mounted storage is not supported.

My workaround for now is to skip the mount and write to the ADLS2 container directly.
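
A minimal sketch of that workaround follows, assuming account-key authentication. The helper names `adls_table_uri` and `write_table_directly` are hypothetical, and the `storage_options` keys are the names the underlying object store is assumed to accept; check them against the deltalake version in use.

```python
def adls_table_uri(container: str, account: str, path: str) -> str:
    """Build an abfss:// URI that addresses the container directly (no mount)."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def write_table_directly(data, container: str, account: str, account_key: str, path: str) -> None:
    """Write `data` (e.g. a pandas DataFrame or pyarrow Table) as a Delta table."""
    # Imported lazily so the URI helper above works without deltalake installed.
    from deltalake import write_deltalake

    write_deltalake(
        adls_table_uri(container, account, path),
        data,
        storage_options={
            # Assumed key names; other authentication methods use different keys.
            "azure_storage_account_name": account,
            "azure_storage_account_key": account_key,
        },
    )


print(adls_table_uri("my-container", "myaccount", "/tables/events"))
```

The trade-off is that the write becomes a direct storage call instead of a declared output of the job.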

@Lundez

Lundez commented Oct 9, 2023

> > I have the same issue when using a mounted ADLS2 in an Azure ML Studio job. I want to write a table, and it fails when writing the log. The Parquet file is written correctly.
> >
> > This is ADLS2 with Hierarchical Storage.
>
> I also ran into this issue with AML; writing to mounted storage is not supported.
>
> My workaround for now is to skip the mount and write to the ADLS2 container directly.

I solved it the same way, but that means my jobs aren't as clear (output is not job output but a hidden API call) 😅

Thanks for responding!

ion-elgreco pushed a commit that referenced this issue Mar 15, 2024
…rd link (#1868)

Make writes compatible with local file systems that do not support hard links.

# Description

When writing to the local file system, hard links are sometimes not
supported (for example on blobfuse, goofys, or s3fs mounts), so this
change handles that case for compatibility.

Note that blobfuse has a second problem: rename reports errors because
the file handle was not released beforehand.
See #1765 for details.

A cooperating change in arrow-rs is required, for example:
https://github.com/GlareDB/arrow-rs/pull/2/files
Because object_store has been upgraded to 0.8, which brings many
breaking changes, I have not made that change yet. It will be fixed
after upgrading to 0.8:
#1858

# Related Issue(s)

#1765
 
#1376 

# Documentation
@ion-elgreco
Collaborator

This should now work for mounted storage with the change in #1868.
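
To illustrate the general idea behind the commit referenced above, here is a rough Python sketch of a "create only if absent" commit that prefers a hard link and falls back when the file system rejects linking. This is not the code from #1868 (that change is in Rust); the function name `copy_if_not_exists`, the set of errno values treated as "unsupported", and the exclusive-create fallback are all assumptions made for the example.

```python
import errno
import os
import shutil

# Assumption: errno values a FUSE mount may return when hard links are
# unsupported; the exact value varies by driver.
_LINK_UNSUPPORTED = {errno.EPERM, errno.ENOTSUP, errno.ENOSYS, errno.EOPNOTSUPP}


def copy_if_not_exists(src: str, dst: str, link=os.link) -> str:
    """Publish src at dst, failing with FileExistsError if dst already exists.

    Returns the strategy that was used, for illustration only.
    """
    try:
        # A hard link either creates dst or fails if dst already exists.
        link(src, dst)
        return "hard_link"
    except FileExistsError:
        raise
    except OSError as exc:
        if exc.errno not in _LINK_UNSUPPORTED:
            raise
    # Fallback: exclusive create, then copy the bytes. O_EXCL makes the open
    # fail if dst exists, but some mounts may not enforce it atomically.
    fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
        shutil.copyfileobj(inp, out)
    return "exclusive_copy"
```

The `link` parameter exists only so the fallback path can be exercised on a file system that does support hard links.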
