Optimize workspace tree building #149

bloodearnest
Member

Previously, we walked the file tree as we recursed down, calling
iterdir(), is_dir(), and is_file() on the way. For large workspace
trees, this was very slow.

Instead, we now get a list of all relative paths at the start,
and then walk down the paths, just processing them as strings, not
hitting disk. This is similar to how we built the tree for each
filegroup before, from a flat list of paths in the db. That method
has been extracted as get_path_tree(), made more general, and is now
used by both get_workspace_files and get_request_files.
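
A minimal sketch of building a tree from a flat list of relative paths, handling them purely as strings. This is an illustration only, not the code in this PR: `build_path_tree` and the nested-dict representation are hypothetical stand-ins for whatever get_path_tree() actually does.

```python
from pathlib import PurePosixPath


def build_path_tree(relpaths):
    """Build a nested dict from a flat iterable of relative path strings.

    Each key is one path segment; each value is the dict of its children.
    No filesystem calls are made: paths are only split and compared as text.
    (Hypothetical helper for illustration, not the PR's get_path_tree().)
    """
    tree = {}
    for relpath in sorted(relpaths):
        node = tree
        for part in PurePosixPath(relpath).parts:
            # Create the child node the first time we see it, reuse it after.
            node = node.setdefault(part, {})
    return tree


paths = ["output/a.csv", "output/sub/b.txt", "README.md"]
tree = build_path_tree(paths)
```

The only disk access would be the single up-front listing that produces `paths`; everything after that is dictionary work.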

As we do not want to call is_dir() to detect directories, we instead
use a heuristic based on the path.

  • if it has descendant paths, it is a directory
  • if it has no descendants and no suffix, it is a directory.
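
The two rules can be sketched roughly as follows. This assumes the suffix check behaves like `pathlib`'s `suffix` property; `looks_like_directory` is a hypothetical name and the real implementation may differ.

```python
from pathlib import PurePosixPath


def looks_like_directory(relpath, all_paths):
    """Guess whether relpath is a directory without touching the disk.

    Hypothetical helper for illustration; all_paths is the flat list of
    relative path strings gathered up front.
    """
    prefix = relpath.rstrip("/") + "/"
    # Rule 1: any path underneath this one means it must be a directory.
    if any(p.startswith(prefix) for p in all_paths):
        return True
    # Rule 2: no descendants, so fall back to "no suffix means directory".
    return PurePosixPath(relpath).suffix == ""


all_paths = ["output", "output/a.csv", "empty", "empty.d"]
results = {p: looks_like_directory(p, all_paths) for p in all_paths}
```

Here `results["empty.d"]` comes out as `False`: an empty directory with a dot in its name is treated as a file. With `pathlib`'s definition of suffix, a name like `.gitignore` has no suffix, so under this sketch it would be treated as a directory; whether the real code shares that behaviour is not stated in this PR.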

This heuristic is not perfect - an empty directory with a . in the
name will be misclassified as a file. However, this case seems very
rare, and I have added some protection in this case so we fail
gracefully.

@bloodearnest bloodearnest linked an issue Mar 6, 2024 that may be closed by this pull request
Contributor

@rebkwok rebkwok left a comment

LGTM. The UI doesn't quite work as usual if there is an empty directory with an extension - it gets displayed with the folder icon but no arrow (unlike a normal empty directory). But I don't think it's worth spending any time fixing.

@bloodearnest
Member Author

bloodearnest commented Mar 7, 2024

> LGTM. The UI doesn't quite work as usual if there is an empty directory with an extension - it gets displayed with the folder icon but no arrow (unlike a normal empty directory). But I don't think it's worth spending any time fixing.

Huh, for me it displayed as a file, i.e. no directory icon and no triangle to open it. When I click on it, I see the text I expect.

[image]

I think this case is very unlikely, so am merging as is.

@bloodearnest bloodearnest merged commit e89a133 into main Mar 7, 2024
8 checks passed
@bloodearnest bloodearnest deleted the bloodearnest/147/Optimize-workspace-tree-building-for-large-workspace-trees branch March 7, 2024 18:51
Development

Successfully merging this pull request may close these issues.

Optimize workspace tree building for large workspace trees
2 participants