vine: disk allocation strategy improvements #3911

Open
colinthomas-z80 opened this issue Aug 9, 2024 · 2 comments
Comments

@colinthomas-z80
Contributor

After a discussion of various disk management problems, we have identified some fundamental issues in our disk allocation strategy. Improving it should resolve some immediate problems and make our disk management more effective in situations where disk is a constrained resource. In summary:

  • Our default strategy of allocating all of the available worker disk to a task is not effective, especially since the advent of the library task, which does not "complete" and release its allocated resources the way a regular task does.

  • We should give priority to the user's declared resource requirements. If a user assigns a quantity of disk to a task we should not override their choice.

  • The manager and the worker each maintain a number of different values indicating how much disk is available, how much is used, how much is currently allocated, and the sandbox and cache sizes. Keeping these values synchronized takes work and introduces the potential for error.

  • The worker's disk.total value is calculated as cache+sandboxes+available_space. We do not want this value to change; however, separate processes and/or files generated by a task outside of the sandbox will cause the value to shrink and cause problems.
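To make the last point concrete, here is a small worked example of how a total derived as cache+sandboxes+available_space drifts when something outside the managed directories writes to the same disk. The function name and all of the sizes (in MB) are hypothetical and chosen only for illustration; this is not the worker's actual code.

```python
# Hypothetical sketch: disk.total recomputed from measured values.
def computed_disk_total(cache_mb, sandbox_mbs, available_mb):
    """Derive the total the way the issue describes: cache + sandboxes + available."""
    return cache_mb + sum(sandbox_mbs) + available_mb

# Initial state: 10 GB cache, two sandboxes, 85 GB free on the filesystem.
before = computed_disk_total(10_000, [2_000, 3_000], 85_000)

# An unrelated process writes 20 GB outside the cache and sandboxes.
# Only the measured free space changes, so the derived total shrinks.
after = computed_disk_total(10_000, [2_000, 3_000], 85_000 - 20_000)

print(before, after)  # 100000 80000
```

Nothing the worker manages has changed between the two calls, yet the reported total drops by the size of the unmanaged write.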

@colinthomas-z80
Contributor Author

Our tentative plan for these issues:

  • Replace the default "whole disk" allocation with a heuristic: some sizeable fraction of the remaining available space, with a generous minimum size

  • Do not override task resource specifications, even if proportional_resources thinks it's a good idea

  • Refine the data structures maintaining disk usage statistics and make the manager distinctly aware of cache space used vs individual task sandboxes

  • Consider disk.total to be immutable at the worker. If the worker finds that something is consuming unmanaged disk space, it will need to take this seriously and perhaps halt operation while it reports new resource quantities to the manager. We will need to address the specific scenario where the manager and worker are running on the same machine and the manager logs are taking up the disk space. It may be the case that we should set a finite default disk value in the same way vine_factory does.
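The first two items of the plan could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name, the 50% fraction, and the 1024 MB minimum are hypothetical placeholders, not values taken from the TaskVine source or agreed on in this discussion.

```python
from typing import Optional

# Hypothetical tuning values, for illustration only.
DEFAULT_FRACTION = 0.5      # share of remaining space offered to an unlabeled task
MIN_ALLOCATION_MB = 1024    # "generous minimum" placeholder

def choose_disk_allocation(requested_mb: Optional[int],
                           available_mb: int,
                           fraction: float = DEFAULT_FRACTION,
                           minimum_mb: int = MIN_ALLOCATION_MB) -> Optional[int]:
    """Return the disk (MB) to allocate, or None if the task does not fit."""
    if requested_mb is not None:
        # The user's declared requirement is honored as-is, never overridden.
        return requested_mb if requested_mb <= available_mb else None

    # No declaration: offer a fraction of what remains, but at least the minimum.
    proposed = max(int(available_mb * fraction), minimum_mb)
    return proposed if proposed <= available_mb else None
```

With these placeholder values, an unlabeled task on a worker with 10000 MB free would be offered 5000 MB, and a task that declares 2000 MB would get exactly 2000 MB regardless of how much space remains.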

@dthain
Member

dthain commented Aug 12, 2024

One amendment to your previous comment: disk_total = cache_size + sum(sandbox_i) + disk_avail is a constraint to be obeyed, not an assignment statement. disk_total is constant and cache_size and sandbox_i are measured, and disk_avail is what's left over. If you prefer an assignment, then this: disk_avail = disk_total - cache_size - sum(sandbox_i)
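A minimal sketch of that accounting, assuming disk_total is fixed once at startup and everything else is measured or derived from it. The class name, field names, and the unmanaged() helper are hypothetical and exist only to illustrate the constraint; they do not correspond to actual worker data structures.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class WorkerDisk:
    disk_total: int                      # constant, set once at startup (MB)
    cache_size: int = 0                  # measured (MB)
    sandboxes: Dict[str, int] = field(default_factory=dict)  # task id -> measured MB

    def disk_avail(self) -> int:
        """disk_avail = disk_total - cache_size - sum(sandbox_i)"""
        return self.disk_total - self.cache_size - sum(self.sandboxes.values())

    def unmanaged(self, measured_free_mb: int) -> int:
        """Space consumed outside the cache and sandboxes.

        If the filesystem reports less free space than the accounting
        predicts, the difference is unmanaged usage.
        """
        return max(0, self.disk_avail() - measured_free_mb)

w = WorkerDisk(disk_total=100_000, cache_size=10_000,
               sandboxes={"t1": 2_000, "t2": 3_000})
print(w.disk_avail())        # 85000
print(w.unmanaged(65_000))   # 20000
```

Because disk_total never changes here, a shortfall in measured free space shows up as a nonzero unmanaged value instead of silently shrinking the total.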
