
Allow for a better archive hierarchy #1

Open
1 of 3 tasks
itkovian opened this issue May 24, 2019 · 4 comments
Labels
enhancement New feature or request
@itkovian
Owner

itkovian commented May 24, 2019

We currently have a very flat format, i.e., job.<jobid>_script and job.<jobid>_environment. While this suffices for finding job scripts, it has several drawbacks.

  • there can be many jobs in the archive, meaning the number of entries in the single archival directory will become quite large.
  • users may not always recall the exact job ID (there might be a few), and searching by time may help pin down the problematic job.

A better archive could be organised by

  • user
  • cluster
  • timestamps (e.g., yearly, monthly, daily, ...)
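A hierarchy along those lines could be sketched as follows (illustrative Python only; the `archive_path` helper and the exact layout are assumptions, not sarchive's actual API):

```python
import datetime

def archive_path(cluster: str, user: str, day: datetime.date,
                 root: str = "/archive") -> str:
    # Layer the archive as cluster/user/YYYY/MM/DD so a job can be
    # located by owner and approximate date, not only by job id.
    return f"{root}/{cluster}/{user}/{day:%Y/%m/%d}"

# e.g. archive_path("mycluster", "alice", datetime.date(2019, 5, 24))
#      -> /archive/mycluster/alice/2019/05/24
```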
@itkovian itkovian added the enhancement New feature or request label May 24, 2019
@itkovian itkovian mentioned this issue May 25, 2019
@itkovian
Owner Author

itkovian commented Jun 6, 2019

Not all environment files contain information about the user, due to the --export=NONE setting when calling sbatch. This means we cannot reliably place the user name in the archived file name or directory structure.
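To illustrate the problem: extracting a user from an archived environment dump has to cope with the identifying variables simply being absent (a sketch; the variable names checked are an assumption about what a given Slurm setup exports):

```python
def user_from_env(env_text):
    # Environment dumps are KEY=VALUE pairs; with sbatch --export=NONE
    # the user-identifying variables may be missing entirely.
    for line in env_text.replace("\0", "\n").splitlines():
        for key in ("SLURM_JOB_USER=", "USER=", "LOGNAME="):
            if line.startswith(key):
                return line.split("=", 1)[1]
    return None  # no user info, so no per-user hierarchy is possible
```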

@itkovian itkovian self-assigned this Jun 6, 2019
@itkovian
Owner Author

Adding the cluster to the hierarchy seems useful only if the archive resides on storage that is shared between masters.

@kcgthb

kcgthb commented May 14, 2020

For the job archival system we've developed locally, we use a multi-level hierarchy based on the job ids, not too different from what Slurm does in StateSaveLocation with the hash.{0..9} directories. That's the only way we found to store tens of millions of job scripts in a POSIX filesystem.

The idea is to reverse the job id and slice it like this:
jobid 67043328 -> /archive/82/33/40/76/
jobid 10123 -> /archive/32/10/10/00/

This ensures that consecutive job ids are spread evenly across the end-level archive directories, without overloading any single one.

Maybe something similar could be used for sarchive?
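For reference, the reverse-and-slice scheme above might look roughly like this in Python (an illustration of the idea, not code from our system):

```python
def archive_dir(jobid: int, root: str = "/archive", width: int = 8) -> str:
    # Reverse the digits of the job id so consecutive ids differ in their
    # leading path components, then zero-pad to a fixed width.
    reversed_id = str(jobid)[::-1].ljust(width, "0")
    # Slice into two-digit path components: "82334076" -> 82/33/40/76
    parts = [reversed_id[i:i + 2] for i in range(0, width, 2)]
    return "/".join([root] + parts)

# archive_dir(67043328) -> /archive/82/33/40/76
# archive_dir(10123)    -> /archive/32/10/10/00
```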

@itkovian
Owner Author

That's a nice suggestion, thanks. I would suggest not taking it down to the lowest level, so maybe not starting with /82/33 as in your example. That does put multiple consecutive jobs in the same directory, but limits it to e.g. 10K files, or even 1K files if we use jobid div 1000.

In our usage, we stick them in YYYYMMDD subdirectories, which then get tarred and zipped after 7 days or so. That may also avoid overloading, even though the number of files is not evenly distributed across the days.
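A rough sketch of this coarser scheme, combining jobid div 1000 buckets with daily subdirectories (a hypothetical helper, not current sarchive behaviour):

```python
import datetime

def bucketed_path(jobid: int, day: datetime.date,
                  root: str = "/archive", bucket: int = 1000) -> str:
    # Group consecutive job ids into buckets of at most `bucket` jobs,
    # under a YYYYMMDD directory that can later be tarred and zipped whole.
    return f"{root}/{day:%Y%m%d}/{jobid // bucket}"

# bucketed_path(67043328, datetime.date(2020, 5, 14))
#   -> /archive/20200514/67043
```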
