
@ghostiek ghostiek commented Sep 19, 2025

What changes are proposed in this pull request?

Currently, databricks-sdk-py does not handle recursive listing properly. For instance, with nested directories such as the following, the call shown below the tree errors out partway through the listing:

└── Volumes/
    └── DirA/
        └── DirB/
            └── DirC/
                ├── FileA.csv
                ├── FileB.csv
                └── FileC.csv

w.dbfs.list("/Volumes/DirA", recursive=True)  # will error out at some point in the loop

Here is the stack trace:

  File "/home/ptawil/test-dbx-sdk/.venv/lib/python3.13/site-packages/databricks/sdk/mixins/files.py", line 641, in list
    yield from p.list(recursive=recursive)
  File "/home/ptawil/test-dbx-sdk/.venv/lib/python3.13/site-packages/databricks/sdk/mixins/files.py", line 488, in list
    for file in self._api.list_directory_contents(next_path.as_string):
                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
  File "/home/ptawil/test-dbx-sdk/.venv/lib/python3.13/site-packages/databricks/sdk/service/files.py", line 981, in list_directory_contents
    json = self._api.do(
        "GET",
    ...<2 lines>...
        headers=headers,
    )
  File "/home/ptawil/test-dbx-sdk/.venv/lib/python3.13/site-packages/databricks/sdk/core.py", line 85, in do
    return self._api_client.do(
           ~~~~~~~~~~~~~~~~~~~^
        method=method,
        ^^^^^^^^^^^^^^
    ...<8 lines>...
        response_headers=response_headers,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/ptawil/test-dbx-sdk/.venv/lib/python3.13/site-packages/databricks/sdk/_base_client.py", line 196, in do
    response = call(
        method,
    ...<7 lines>...
        auth=auth,
    )
  File "/home/ptawil/test-dbx-sdk/.venv/lib/python3.13/site-packages/databricks/sdk/retries.py", line 57, in wrapper
    raise err
  File "/home/ptawil/test-dbx-sdk/.venv/lib/python3.13/site-packages/databricks/sdk/retries.py", line 36, in wrapper
    return func(*args, **kwargs)
  File "/home/ptawil/test-dbx-sdk/.venv/lib/python3.13/site-packages/databricks/sdk/_base_client.py", line 298, in _perform
    raise error from None
databricks.sdk.errors.platform.NotFound: The directory being accessed is not found.

It seems that, at some point, the loop tries to list the contents of Volumes/DirA/DirC/ instead of Volumes/DirA/DirB/DirC/.

The issue stems from the list function of _VolumePath in databricks/sdk/mixins/files.py. While looping, it replaces the last element in the deque with self.child(file.name). Because the base path of the _VolumePath is never modified, this appends DirC directly to the base volume path instead of appending DirB/DirC to it.
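
The failure mode described above can be reproduced with a small standalone model. This is only a sketch and not the SDK's actual code: `TREE`, `list_directory`, `list_recursive`, and the `join_to_current` flag are hypothetical names introduced for illustration, and the in-memory dictionary stands in for the remote volume. It contrasts building each child path from the fixed base path (the buggy behavior) with building it from the directory currently being listed.

```python
from collections import deque
from posixpath import join

# Hypothetical in-memory stand-in for the volume: directory -> [(name, is_dir)].
TREE = {
    "/Volumes/DirA": [("DirB", True)],
    "/Volumes/DirA/DirB": [("DirC", True)],
    "/Volumes/DirA/DirB/DirC": [
        ("FileA.csv", False),
        ("FileB.csv", False),
        ("FileC.csv", False),
    ],
}


def list_directory(path):
    """Return the entries of `path`, or raise if the directory is unknown."""
    if path not in TREE:
        raise FileNotFoundError(path)
    return TREE[path]


def list_recursive(base, join_to_current):
    """Breadth-first listing of all files under `base`.

    join_to_current=False mimics the bug: every child is joined to the
    fixed base path. join_to_current=True joins each child to the
    directory that is currently being listed.
    """
    queue = deque([base])
    while queue:
        current = queue.popleft()
        for name, is_dir in list_directory(current):
            parent = current if join_to_current else base
            child = join(parent, name)
            if is_dir:
                queue.append(child)
            else:
                yield child


if __name__ == "__main__":
    for path in list_recursive("/Volumes/DirA", join_to_current=True):
        print(path)
```

In this model, iterating with `join_to_current=False` raises `FileNotFoundError` for `/Volumes/DirA/DirC`, which mirrors the NotFound error in the stack trace, while `join_to_current=True` yields the three files under `/Volumes/DirA/DirB/DirC`.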

Specifically, try to answer the two following questions:

WHAT changes are being made in the PR?

  • Fixes the path appended to the deque when recursive listing is enabled.

WHY are these changes needed? This should provide the context that the
reader might be missing. For example, were there any decisions behind the
change that are not reflected in the code itself?

  • Fixes a bug where the SDK cannot traverse nested directories.

How is this tested?

Describe any tests you have done; especially if the tests are not part of
the unit tests (e.g. local tests).

I created a new project and ran the following code:

from databricks.sdk import WorkspaceClient


def main():
    w = WorkspaceClient()
    # list() yields entries lazily, so the error only surfaces while iterating.
    result = w.dbfs.list("/Volumes/DirA", recursive=True)

    li_result = list(result)
    for i in li_result:
        print(i)


if __name__ == "__main__":
    main()

I cloned databricks-sdk-py, installed it in editable mode, and ran the script before and after the change. With the change, it no longer errors out and correctly lists all files in the nested directories.


If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/sdk-py

Inputs:

  • PR number: 1050
  • Commit SHA: 5d987095fea3d7c156a0914f83fa370d5ecb2c38

Checks will be approved automatically on success.
