Proposal: Cross-references of jobs #96
Comments
Original comment by Carl Simon Adorf (Bitbucket: csadorf, GitHub: csadorf). @bdice I would argue that a many-to-many relationship in this context could be realized by storing many references to project A and many references to project B within one location.
Addressing this issue thoroughly requires quite a bit of additional work. For the moment I'm bumping this feature past the 2.0 milestone. I'm (possibly ambitiously) targeting #189 for version 2.0, but since there's more work to do to enable this proposal fully I don't think we need this for 2.0. I also don't think this feature needs to break any existing APIs, so we could add it to a 2.x release.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Reopening since there is still interest in this work.
Original report by Carl Simon Adorf (Bitbucket: csadorf, GitHub: csadorf).
About
This issue describes a proposed feature which would standardize the way that jobs may be referenced within and between different projects.
One typical use case is the need to store aggregated results or data that is shared among many different jobs within one larger data space.
Rationale
There is currently no standardized way to reference jobs from different projects in order to define relationships of jobs within or across projects.
This puts the burden on the user to conceptualize and implement such references, which leads to duplication of effort and possible complications when code is interfaced by 3rd parties.
Standardizing references will make it easier for users to set up data spaces with the above-mentioned relationships.
Example
Assume that a user has performed multiple computations at different state points and wants to generate aggregated results, such as a phase diagram, based on that data.
We propose that such a workflow would be supported with the following API:
The above mentioned workflow allows us to easily determine the origin data:
Definitions
Terms used in this proposal document:
Explicitly supported use-cases
The following use-cases should be supported by the proposed concept and implementation:
Specify the following relationships: one-to-one, one-to-many, many-to-one, many-to-many.
Specify a reference:
a) within the same project,
b) from one project to a sub-project,
c) from one project to a parent-project,
d) from one project to a neighbor-project.
Concept
We need two pieces of information to locate a job within or across projects: the project's root directory and the job id.
The project is referenced by a relative or an absolute path to its root directory.
A relative path is defined as relative to a specific project, where the default is the current project.
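As a rough illustration of the path rule above, a relative project reference can be resolved against the root directory of the referring project with the standard library alone. The helper name `resolve_project_root` and the example paths are hypothetical and not part of signac; `posixpath` is used only to keep the output platform-independent.

```python
import posixpath


def resolve_project_root(reference, base_root):
    """Resolve a project reference against the root of the referring project.

    Hypothetical helper: absolute references are only normalized,
    relative ones are interpreted relative to ``base_root``.
    """
    if posixpath.isabs(reference):
        return posixpath.normpath(reference)
    return posixpath.normpath(posixpath.join(base_root, reference))


base = "/data/projects/alpha"  # made-up root of the current project
print(resolve_project_root(".", base))               # same project
print(resolve_project_root("sub", base))             # sub-project
print(resolve_project_root("..", base))              # parent project
print(resolve_project_root("../beta", base))         # neighbor project
print(resolve_project_root("/scratch/gamma", base))  # absolute path
```

The five calls correspond to the reference targets listed under the supported use-cases, plus the absolute-path case.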
A link is a URI defined like this:
The URI scheme is called 'signac', the project root directory is defined as the combination of the netloc and path component, and the job id is specified through the fragment component.
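The decomposition described above can be sketched with `urllib.parse.urlparse` from the standard library. The function name `split_signac_uri`, the example URIs, and the choice to join netloc and path by plain concatenation are assumptions made for illustration; the job id is a made-up 32-character hex string.

```python
from urllib.parse import urlparse


def split_signac_uri(uri):
    """Split a signac-style URI into (project_root, job_id).

    Assumption: the project root is netloc and path concatenated as-is.
    """
    parts = urlparse(uri)
    if parts.scheme != "signac":
        raise ValueError(f"not a signac URI: {uri!r}")
    return parts.netloc + parts.path, parts.fragment


job_id = "9bfd29df07674bc4aa960cf661b5acd2"  # hypothetical job id

# Absolute project root: empty netloc, path carries the directory.
print(split_signac_uri(f"signac:///data/projects/alpha#{job_id}"))

# Relative project root: no netloc at all, path is relative.
print(split_signac_uri(f"signac:../beta#{job_id}"))
```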
A signac URI can be parsed, for example, with the `urllib.parse.urlparse` function.
Proposed API
The high-level API consists of project-based methods and root-namespace functions.
Project-based API
Using the project-based API, all links are generated relative to a specific project.
Generate a link document for `job` relative to `project`.
Look up the project referenced in `link` relative to `project`'s root directory and then return the referenced job. This function will raise a `LookupError` if the referenced project cannot be found and a `KeyError` if the referenced job does not exist in the looked-up project.
Look up the project referenced in `link` relative to `project`'s root directory. This function will raise a `LookupError` if the referenced project cannot be found.
Root-namespace API
The root-namespace API works like the project-based API, but always acts on the current project, that is, the project returned by `signac.get_project()`.
The root-namespace API can also be used if users want to specify an arbitrary path or even an absolute path.
This function will generate a link to `job` relative to `from`. If the argument for `from` is `None` (the default), then the link will be relative to the return value of `signac.get_project().root_directory()`; otherwise it will be relative to the path specified in `from`. Instead of a directory path, one can also pass an instance of `Project` as the `from` argument, in which case the link will be relative to the project's root directory.
This function will attempt to look up the `job` referenced in `link` relative to `from`. If no argument for `from` is provided, the link will be relative to the return value of `signac.get_project().root_directory()`. The argument for `from` can be a directory or an instance of `Project`.
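The three accepted forms of the `from` argument could be normalized to a single base directory roughly as follows. This is a sketch under stated assumptions: `from` is a reserved word in Python, so the parameter is spelled `from_` here; `base_directory` and `ToyProject` are hypothetical names; and the current working directory stands in for `signac.get_project().root_directory()`.

```python
import os


class ToyProject:
    """Hypothetical stand-in for a project object; not signac's Project."""

    def __init__(self, root):
        self._root = root

    def root_directory(self):
        return self._root


def base_directory(from_=None):
    """Return the directory that a link is interpreted relative to."""
    if from_ is None:
        # Stand-in for signac.get_project().root_directory().
        return os.getcwd()
    if hasattr(from_, "root_directory"):
        # Project-like object: use its root directory.
        return from_.root_directory()
    # Otherwise treat the argument as a directory path.
    return os.fspath(from_)


print(base_directory("/data/projects/alpha"))
print(base_directory(ToyProject("/data/projects/beta")))
```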
Automatic conversion of instances of `Job` to links
When storing an instance of `Job` within a job's state point or document, it is automatically converted to a link.
For example:
This is equivalent to:
This enables users to specify links with a concise API and predictable behavior.
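One way such a conversion could behave is sketched below with a toy document class that intercepts item assignment. `ToyJob`, `LinkingDocument`, and the link layout `signac://<project root>#<job id>` are assumptions for illustration only and do not reflect how signac stores documents.

```python
class ToyJob:
    """Hypothetical stand-in for a job; carries only a project root and an id."""

    def __init__(self, project_root, job_id):
        self.project_root = project_root
        self.job_id = job_id


class LinkingDocument(dict):
    """Toy document that stores ToyJob values as signac-style URI strings."""

    def __setitem__(self, key, value):
        if isinstance(value, ToyJob):
            # Assumed link layout: signac://<project root>#<job id>
            value = f"signac://{value.project_root}#{value.job_id}"
        super().__setitem__(key, value)


doc = LinkingDocument()
doc["parent"] = ToyJob("/data/projects/alpha", "9bfd29df07674bc4aa960cf661b5acd2")
doc["note"] = "plain values are stored unchanged"
print(doc["parent"])
```

Only item assignment is intercepted here; on a `dict` subclass, `update()` and the constructor bypass `__setitem__`, so a fuller version would need to cover those paths as well.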
To ensure that links are relative to the project of the job that contains the references, it is recommended to use `with`:
By entering the job's workspace prior to the look-up, we can guarantee that we use the same reference:
Examples
Single link to another job
To create a reference to another job you simply call:
To look up the referenced job, we use the complementary `Project.lookup` function:
In general the following relationship is always true:
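As a self-contained sketch of that round trip (not the proposed implementation), the toy helpers below build a link and resolve it again, and follow the error contract described for the look-up functions. `PROJECTS`, `make_link`, and `resolve_link` are hypothetical names, the data space is a plain dictionary, and the job ids are made up.

```python
from urllib.parse import urlparse

# Toy data space: project root -> set of job ids (hypothetical layout).
PROJECTS = {
    "/data/projects/alpha": {"9bfd29df07674bc4aa960cf661b5acd2"},
    "/data/projects/beta": {"0f1e2d3c4b5a69788796a5b4c3d2e1f0"},
}


def make_link(project_root, job_id):
    """Build a signac-style URI for a job (assumed layout)."""
    return f"signac://{project_root}#{job_id}"


def resolve_link(link):
    """Return (project_root, job_id) for a link.

    Raises LookupError if the project is unknown and KeyError if the
    project exists but does not contain the job.
    """
    parts = urlparse(link)
    root = parts.netloc + parts.path
    if root not in PROJECTS:
        raise LookupError(f"project not found: {root}")
    if parts.fragment not in PROJECTS[root]:
        raise KeyError(parts.fragment)
    return root, parts.fragment


job = ("/data/projects/alpha", "9bfd29df07674bc4aa960cf661b5acd2")
assert resolve_link(make_link(*job)) == job  # link generation and look-up are inverses

try:
    resolve_link("signac:///data/projects/alpha#missing")
except KeyError:       # must come first: KeyError is a subclass of LookupError
    print("job not found")
except LookupError:
    print("project not found")
```

Because `KeyError` derives from `LookupError` in Python, callers who want to tell the two failure modes apart have to catch `KeyError` before `LookupError`.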
Link across projects
Jobs and their references do not need to belong to the same project.
For example:
Caveats
Migration
Changing the state point of a job, for example by adding an additional key, changes its id and will therefore break the references.
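The effect can be reproduced with a toy id function that hashes the state point. This is an illustrative sketch: `toy_job_id` is a hypothetical helper, and signac's exact encoding of the state point may differ from the JSON serialization assumed here.

```python
import hashlib
import json


def toy_job_id(statepoint):
    """Hash a state point into a 32-character hex id (illustrative only)."""
    blob = json.dumps(statepoint, sort_keys=True).encode()
    return hashlib.md5(blob).hexdigest()


old_sp = {"temperature": 1.0, "pressure": 2.0}
new_sp = dict(old_sp, density=0.5)  # the migration adds one key

old_id = toy_job_id(old_sp)
new_id = toy_job_id(new_sp)

# The id is a function of the whole state point, so it changes with it;
# any stored link that still carries old_id no longer resolves.
print(old_id != new_id)
```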
Therefore, special care must be taken when migrating referenced jobs:
Assume that we have a one-to-many relationship, where one parent job is referenced by many child jobs:
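Such a layout might look like the following sketch, in which each child document stores one link to its parent and the children are then grouped by that link. The document contents, the key name `parent`, the short ids, and the relative link form `signac:.#<id>` are all made up for illustration; plain dictionaries stand in for job documents.

```python
from collections import defaultdict

# Hypothetical child documents, keyed by child job id.
children = {
    "c1": {"parent": "signac:.#p1"},
    "c2": {"parent": "signac:.#p1"},
    "c3": {"parent": "signac:.#p2"},
}

# Group the children by the parent link they store.
by_parent = defaultdict(list)
for child_id, doc in children.items():
    by_parent[doc["parent"]].append(child_id)

for parent_link, child_ids in by_parent.items():
    print(parent_link, child_ids)
```

Grouping by the stored link yields, for every parent, exactly the set of documents that would need updating when that parent's id changes.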
Then, to properly migrate all parent jobs, we could use the following recipe, where we take advantage of the `groupbydoc` function:
Fixing broken references
Assume that a user migrated jobs without taking care to update the references.
One could use the following recipe to repair those broken links:
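A hedged sketch of such a repair, assuming the user can reconstruct a mapping from old to new job ids (for example by recomputing ids from the pre-migration state points). `repair_links`, the key name `parent`, and the short ids are hypothetical, and plain dictionaries stand in for job documents.

```python
def repair_links(documents, id_map, key="parent"):
    """Rewrite stale job ids in stored links; return how many were changed.

    documents: mapping of job id -> document (plain dicts here)
    id_map:    mapping of old job id -> new job id
    """
    changed = 0
    for doc in documents.values():
        link = doc.get(key)
        if link is None:
            continue
        # The job id is the URI fragment, i.e. everything after the last '#'.
        base, sep, fragment = link.rpartition("#")
        if sep and fragment in id_map:
            doc[key] = f"{base}#{id_map[fragment]}"
            changed += 1
    return changed


documents = {
    "c1": {"parent": "signac:.#p1_old"},
    "c2": {"parent": "signac:.#p1_old"},
    "c3": {"parent": "signac:.#p2"},
    "c4": {},
}
print(repair_links(documents, {"p1_old": "p1_new"}))
print(documents["c1"]["parent"])
```

The link is edited as a string rather than rebuilt from parsed components, so the part before the fragment is preserved exactly as it was stored.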