
Calls to /disks slow #123

Open
jacdavi opened this issue May 18, 2023 · 4 comments

Comments

@jacdavi
Contributor

jacdavi commented May 18, 2023

We've been noticing calls to /disks can be pretty slow (20+ seconds).

The slowness seems to come mostly from the large number of files and directories. In one example we have 1,520 files and 1,727 directories. Of note, 1,687 of those directories have miniccc_responses in their path.

The main code that handles this is files.getAllFiles. I've tried a number of solutions that fix the issue, but I'm not sure which one is best.

  1. Limit the recursion depth to the base directory and one subdirectory
  2. Only search subdirectories for the experiment, not other subdirectories
     • The code is set up to ignore directories that belong to other experiments. However, if phenix loses track of an experiment but its files still exist, those files get included. That seems to be happening in this example.
  3. Explicitly exclude miniccc_responses directories from the search

I don't love the idea of hardcoding an exception for miniccc_responses, but it's the best option I've come up with that nearly guarantees a disk isn't missed. That said, I'm not sure how users typically arrange their disks; all of ours are in the base directory.

@jacdavi
Contributor Author

jacdavi commented May 18, 2023

@activeshadow I can make a PR for one of the above solutions (or something else), but would be interested in your thoughts first. Thanks!

@activeshadow
Collaborator

@jacdavi since phenix is pretty much only wired to work with minimega at this point, I don't mind hard-coding it to skip over miniccc_responses. I think the typical use case is to have all images in the base directory, but the code that checks for image files deeper in subdirectories was added by @eric-c-wood, who is a prominent user of phenix, so I would suggest leaving that in place unless he chimes in otherwise.

Long story short, let's just hard-code a skip for miniccc_responses directories for now.

@jacdavi
Contributor Author

jacdavi commented May 18, 2023

Actually, skipping miniccc_responses may not be a general enough solution. I know Arthur had this problem before, and I recall he had a container file system with ~5,000 directories, so that fix wouldn't help in his case.

@eric-c-wood
Contributor

@jacdavi @activeshadow Wow, 20+ seconds is way too slow, especially when feeding a UI. For our use cases, we really only need to enumerate disk images in the minimega files directory and disk images that exist in {minimega files directory}/{experiment directory}/"files". In addition, we need to enumerate disk images defined in a topology that may exist outside the minimega files directory path, but that appears to be handled by the getTopologyFiles function.

Would a solution that first enumerates disk images in the minimega files directory combined with an enumeration of disk images in {minimega files directory}/{experiment directory}/"files" solve the 20+ second response time for most use cases? Would that be too restrictive for other use cases?
