
Status quo caching extension #53

Merged

Conversation

tfesenbecker
Contributor

Current status of the caching extension of lapis.
The new code contains:

  • Storage class: describes storage objects such as caches and provides the basic functionality to place new files in the cache or update them
  • FileProvider class: instances handle the communication between jobs and storage objects; currently very basic, but could later take limited bandwidth into account
  • readout for storage object input files
  • drones have a new attribute "sitename" that can be used to select storage elements accessible to jobs running inside a drone

As this is only a draft, the code still contains various debug outputs to track the execution of jobs.
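A rough sketch of how the pieces described above might fit together; all class and method names here are illustrative only and not the draft's actual API:

```python
from typing import Dict, List


class Storage:
    """Illustrative storage element holding cached files (not the draft's actual class)."""

    def __init__(self, name: str, sitename: str, size: float):
        self.name = name
        self.sitename = sitename            # used to match storages to drones on the same site
        self.size = size                    # total capacity
        self.files: Dict[str, float] = {}   # filename -> cached file size

    def provides(self, filename: str) -> bool:
        return filename in self.files

    def add_or_update(self, filename: str, filesize: float) -> None:
        # place a new file in the cache or refresh an existing entry
        self.files[filename] = filesize


class FileProvider:
    """Illustrative mediator between jobs and storage elements."""

    def __init__(self):
        # sitename -> storage elements reachable from drones on that site
        self.storages: Dict[str, List[Storage]] = {}

    def add_storage(self, storage: Storage) -> None:
        self.storages.setdefault(storage.sitename, []).append(storage)

    def provides_input(self, sitename: str, filename: str) -> bool:
        # a job running in a drone on `sitename` can read `filename` from a local cache
        return any(s.provides(filename) for s in self.storages.get(sitename, []))
```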

@tfesenbecker
Contributor Author

I'll have a look at the unit tests.

@codecov

codecov bot commented Nov 2, 2019

Codecov Report

Merging #53 into master will decrease coverage by 1.9%.
The diff coverage is 18.05%.


@@            Coverage Diff             @@
##           master      #53      +/-   ##
==========================================
- Coverage   37.34%   35.43%   -1.91%     
==========================================
  Files          15       18       +3     
  Lines         739      807      +68     
  Branches      110      122      +12     
==========================================
+ Hits          276      286      +10     
- Misses        449      506      +57     
- Partials       14       15       +1
Impacted Files Coverage Δ
lapis/pool_io/htcondor.py 76.92% <ø> (ø) ⬆️
lapis/storage_io/storage.py 0% <0%> (ø)
lapis/cli/simulate.py 0% <0%> (ø) ⬆️
lapis/file_provider.py 0% <0%> (ø)
lapis/simulator.py 0% <0%> (ø) ⬆️
lapis/drone.py 60.41% <100%> (+0.84%) ⬆️
lapis/scheduler.py 15.68% <40%> (-0.48%) ⬇️
lapis/utilities/walltime_models.py 66.66% <66.66%> (ø)
lapis/job.py 73.91% <77.77%> (+0.14%) ⬆️
lapis/job_io/swf.py 90.9% <0%> (ø) ⬆️
... and 5 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a698820...32faa38. Read the comment docs.

Resolved review thread on lapis/job.py (outdated)
@pep8speaks

pep8speaks commented Nov 2, 2019

Hello @tfesenbecker! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-11-08 09:14:17 UTC

@eileen-kuehn (Member) left a comment

First, great progress! 🥇

General

Please try to attach docstrings to methods and classes. You can also include information on what you are still planning to do, what is not working yet, etc. This makes it easier to read the new code and can improve our feedback for you :)

Comments on the Caching Approach itself

It currently seems to me that (parts of) files can only be added to a storage but never deleted again.
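For illustration, a deletion/eviction step complementing the current add-only behaviour could look roughly like this; an LRU-style sketch with hypothetical data structures, not part of the draft:

```python
def evict_until_fits(files: dict, used: float, capacity: float, incoming: float) -> float:
    """Evict least-recently-used entries until `incoming` fits.

    `files` maps filename -> (size, last_access); returns the updated used volume.
    """
    # iterate over a sorted copy (oldest access first) so deleting from `files` is safe
    for filename, (size, _last_access) in sorted(files.items(), key=lambda item: item[1][1]):
        if used + incoming <= capacity:
            break
        del files[filename]
        used -= size
    return used
```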

Importing Storage Information

In general, I am not yet convinced that the storage data should be imported in the same way as the other input files. There is a difference: the other input files are exported from existing tools and have a defined format, so we define in- and output methods for them. The storage is different; it is about setting up the environment for the simulation, so it is more of an in-simulation configuration.
What I also don't know yet is whether we could combine that information with configuration data from Tardis (I am planning to support importing that as well).
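For illustration, such an in-simulation storage configuration could be a small JSON document parsed at simulation setup; the format and helper below are purely hypothetical and not an existing lapis interface:

```python
import json

# hypothetical configuration describing storage elements per site
STORAGE_CONFIG = """
[
    {"sitename": "siteA", "name": "cache-A", "size": 1000},
    {"sitename": "siteB", "name": "cache-B", "size": 500}
]
"""


def storages_from_config(raw: str) -> list:
    """Parse the hypothetical storage configuration into plain dictionaries."""
    return json.loads(raw)


for entry in storages_from_config(STORAGE_CONFIG):
    print(entry["sitename"], entry["name"], entry["size"])
```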

Resolved review thread on lapis/cli/simulate.py (outdated)
@@ -15,6 +15,7 @@ def __init__(
pool_resources: dict,
scheduling_duration: float,
ignore_resources: list = None,
sitename: str = None,
Member

Could you please explain your reasoning for including sitename in the drone? From my point of view, this might be too specific. I currently also don't see the advantage of having something like sitename. But I bet you have some very good reasons! :)

Member

I'd be careful about using sitename -- it is a very specific jargon/feature (CMS) and may not properly reflect how we need to model caches. E.g. what we've seen is that for a single "site" there may be different cache/storage requirements -- see Chris' results on benchmarking TSystems. If we need to identify the "site", that is basically the Pool of the drone -- which can already be used as a dict key etc.

Is there some advantage to using a string identifier?

Contributor Author

sitename is just a quick workaround for the mapping between pools/drones and storage elements. I want to replace this as soon as I understand what information/means of identification will be available once other changes to the overall system (adding more of the jobs' ClassAd structure, ...) are done. Passing this information directly to the drone was the option that needed the fewest changes to existing code.

@maxfischer2781 What do you mean by

which can already be used as a dict key etc.

?

Member

What do you mean by ...

A Pool instance can be used as a key in a mapping -- e.g. Dict[Pool, Storage] should be possible to map from pool (site) to storage and vice versa.

Which of course isn't of much use to you if you want to define this outside of the simulation, e.g. in a JSON/YAML configuration file...
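A minimal sketch of that mapping idea, using simplified stand-ins for Pool and Storage (the real lapis classes take more arguments):

```python
from typing import Dict


class Pool:
    """Stand-in for a lapis pool; instances are hashable by identity, so they work as dict keys."""

    def __init__(self, name: str):
        self.name = name


class Storage:
    """Stand-in for a storage/cache element."""

    def __init__(self, name: str):
        self.name = name


pool_a, pool_b = Pool("siteA"), Pool("siteB")

# map from pool ("site") to its storage without introducing a string identifier
site_storage: Dict[Pool, Storage] = {
    pool_a: Storage("cache-A"),
    pool_b: Storage("cache-B"),
}

print(site_storage[pool_a].name)  # -> cache-A
```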

Resolved review threads on lapis/storage.py (3), lapis/file_provider.py (2), lapis/job.py, lapis/scheduler.py, and lapis/simulator.py.
Co-Authored-By: Eileen Kuehn <[email protected]>
@maxfischer2781 (Member) left a comment

I've added a mixture of new proposals, food for thought, and change requests. Note that if you feel a change may be worthwhile but want to get basic functionality working first, just add a ticket so it can be added later. The label recommendation is likely such a case.


Resolved review threads on lapis/file_provider.py (3), lapis/scheduler.py, lapis/storage.py (4), and lapis/utilities/cache_algorithm_implementations.py.
@eileen-kuehn eileen-kuehn mentioned this pull request Nov 4, 2019
@eileen-kuehn
Member

Last night I thought again about the walltime in the jobs with regard to the monitoring. The monitoring itself only happens after a job is finished, so this value should not depend on an async execution.

So I must revert one of my previous comments: this must stay a value that you set once you know what its value will actually be. I think this means that the run method of the job itself needs to be extended, so that you have the connection to the Pipe there, which runs as long as not all data has been received. You should therefore remove the await self.walltime condition. See this as a placeholder, as we did not have a method to determine the actual walltime before :)

What do you think?

@tfesenbecker
Contributor Author

I'm a bit confused by your mention of the Pipe, but we'll get to that later.
I agree with the rest.

@tfesenbecker
Contributor Author

I decided to change the job's _walltime attribute in case a new walltime is calculated. I don't think this should cause any mishaps or difficulties down the road. Do you agree?
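A minimal sketch of that idea; everything except the _walltime attribute name is illustrative:

```python
class Job:
    """Illustrative job that overwrites its placeholder walltime once the real one is known."""

    def __init__(self, requested_walltime: float):
        self._walltime = requested_walltime  # placeholder taken from the job input

    @property
    def walltime(self) -> float:
        return self._walltime

    def finalize(self, actual_walltime: float) -> None:
        # overwrite the placeholder once the cache-aware walltime has been calculated,
        # so that monitoring (which runs after the job has finished) sees the final value
        self._walltime = actual_walltime
```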

@maxfischer2781 maxfischer2781 changed the base branch from master to feature/caching November 11, 2019 14:57
@maxfischer2781
Copy link
Member

@eileen-kuehn @tfesenbecker as discussed, I've changed the target branch to feature/caching so that we can merge early.

@tfesenbecker tfesenbecker merged commit 32faa38 into MatterMiners:feature/caching Nov 19, 2019
@maxfischer2781 maxfischer2781 mentioned this pull request Nov 28, 2019