-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support postprocessing of instances #116
base: master
Are you sure you want to change the base?
Support postprocessing of instances #116
Conversation
b7190ff
to
dc31070
Compare
PR ready for review |
print("simexpal: Restoring original instance files") | ||
os.rename(partial_path + '.post0', partial_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, if we restore the instance file to the post0
file, how do we know if any given instance has already been postprocessed or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old logic here was: the post0
files are obtained through download, and the instance files are always already postprocessed, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, if we restore the instance file to the
post0
file, how do we know if any given instance has already been postprocessed or not?
For each instance we create a .postprocessed
file (lines 781
and 868
) when the postprocessing was successful. Also, we print out status messages in the terminal. Although we probably shouldn't assume that it is enough. Maybe I could add an additional check in Instance.check_available()
which looks for the .postprocessed
file for instances that are supposed to be postprocessed. In this way, experiments can't be started on unprocessed instances that were supposed to be postprocessed.
The old logic here was: the
post0
files are obtained through download, and the instance files are always already postprocessed, right?
Yes. With this PR .post0
files are still obtained through download and instance files are postprocessed depending on the existence of the .postprocessed
file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concerning .postprocessed
and Instance.check_available()
: I think it is enough to inform the user when postprocessing has not been successful. Making successful postprocessing mandatory might lead to unexpected results for users (for example if the experiment does not really rely on it in practice).
…en postprocessing instances The .postprocessed file signals the successful postprocessing of an instance. The .original file is a copy of the instances before postprocessing
… types Also 'to_edgelist' was removed from instances where we do not support edge list conversion anyway
Right now it should suffice to check the existence of the files, which is done in check_available()
1dba381
to
a83a4b2
Compare
Fixed the merge conflicts for now. When the remarks above are resolved or ok with you I will mark this PR as ready for review again. |
^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
You can define arbitrary postprocessing steps by setting the ``postprocess`` key to a list of dictionaries | ||
containing the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... to a list of dictionaries containing the keys:
- ``environ``: dictionary of (environment variable, value)-pairs | ||
- ``workdir``: path of the working directory | ||
|
||
keys. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete this line then.
|
||
Assume you want to postprocess the ``facebook_combined`` and ``cit-HepTh`` network from | ||
`SNAP <https://snap.stanford.edu/data/>`_ using two executables ``postprocess1`` and ``postprocess2``, which | ||
take the path of the instance as parameter. Also, you have to prepend the path for ``postprocess1`` to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does only postprocess1
needs the PATH
variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just to demonstrate the possibility/necessity of adding the appropriate PATH-variable in simexpal in order to find the executable for the postprocessing. There might be cases where the PATH was added beforehand, so that is isn't necessary to do so anymore.
Re-Postprocessing | ||
^^^^^^^^^^^^^^^^^ | ||
|
||
To re-postprocess instances, you can simply delete the respective ``<instance_name>.postprocessed`` files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better name/reference one more time that instances are saved in inst_dir
.
``simex instances run-transform --transform='to_edgelist'`` if you already have them saved locally. | ||
execute supported actions, e.g, transforming the instances to edge list format via | ||
``simex instances run-transform --transform='to_edgelist'`` or :ref:`postprocess <PostprocessInstances>` | ||
them if you already have them saved locally. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would put if you already have them saved locally
to the beginning of the sentence. Otherwise it currently reads like: postprocessing is available if the instances are locally available (while the transform works all the time).
print("simexpal: Restoring original instance files") | ||
os.rename(partial_path + '.post0', partial_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concerning .postprocessed
and Instance.check_available()
: I think it is enough to inform the user when postprocessing has not been successful. Making successful postprocessing mandatory might lead to unexpected results for users (for example if the experiment does not really rely on it in practice).
fullpath = instance.fullpath | ||
util.try_rmfile(fullpath) | ||
util.try_rmfile(fullpath + '.postprocessed') | ||
util.try_rmfile(fullpath + '.original') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is with .post0
and .post1
?
This PR contains the following:
args
,workdir
andenviron
keys in thepostprocess
key (like the build arguments incompile
,configure
, ...)assert file_size > 0
while testing the installation process (so we can have empty (dummy) instances; also it should suffice to only check for the existence of instance files)