-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Breaking changes and maintaining reproducibility #276
Comments
OK, excellent writeup and description of the problem, and nice summary of various solutions. I can immediately say that I would be sad to see the option to specifying groups and individuals going out, for
which is beautiful. I also sympathise with your solution 2, though. Perhaps we can perform two reading-runs: One package-reading with the usual strictness, and then another one which is far more lenient. Any packages that were read in the second run, but not in the first, get reported as "broken according to current standards" or something like that: Users can still list them in all APIs and even local listings, and download them, but perhaps not forge without additional |
We could even provide a base flag called |
OK, this issue became new fodder now with #283. The more I think about it, the more I think we should implement both a lenient and a strict way of loading packages. One could throw warnings, the other one would refuse to load. The main question is what should be the default, and I think you are leaning into a clear way of making the strict reading the default, which I would sympathize with. So indeed we should implement a package reading option |
Another thought: I guess we could also codify the changing parsing strictness in the schema itself (for example by changing a SHOULD to a MUST statement), and provide a new Poseidon Version for the stricter behavior. Then, we could implement the behavior in trident contingent on the Poseidon Version number. Concretely: Of course, that would be a bit of a paradigm shift, I guess. I'm just throwing it out there. |
A recent change added in trident v1.4.0.0 (#268) and again reverted in v.1.4.0.2 (#275) uncovered an interesting issue with our current package version management concept for Poseidon:
This change boils down to a new cross-file consistency check: The genotype ploidy (
Genotype_Ploidy
) documented in .janno files was supposed to be checked when reading a package, and if a sample marked ashaploid
had a heterozygote SNP, this would throw an error. So this change in trident did not reflect a change in the Poseidon schema, but only a more strict validation of an already formalized promise in the schema.Activating this check had unforeseen consequences, because some packages in our public community archive actually failed this validation step and had to be fixed (poseidon-framework/community-archive#135). They were, strictly speaking, not valid Poseidon packages before this update. With the new, adjusted versions of said packages, we considered this issue solved.
This was premature, though, because trident v1.4.0.0 and v1.4.0.1 were not able to read the old package versions any more. For a user this is not necessarily a problem: They can either switch to the new, updated package versions, or continue working with an old trident release, which doesn't check this property. But a big problem emerged for the Poseidon web server (
trident serve
): It could neither read nor serve the old package versions any more, thus breaking the reproducibility promise that motivated the entire shift to a versioned web API.We averted chaos by deactivating the ploidy check in v.1.4.0.2. But this is only a temporary solution, because we want to include this check eventually. A similar issue would probably also emerge, if we ever decided to update the Poseidon schema in a breaking way.
There are various ways to solve this issue permanently. Here are two potential solutions (formulated in a way that clearly highlights which one I prefer 😬):
trident
's reading process to a point where even broken or anachronistic packages could be loaded. Issues could be pointed out as warnings.validate
could continue to be strict. This way the server could easily read and serve every somewhat functional package and even outdated versions. Users may experience less issues when using trident, but may end up with potentially broken output (as a consequence of broken input). This explicitly includes packages that do not fulfill the guarantees of the Poseidon schema./zip_file
and/packages
endpoints. This could, for example, be implemented by relying on meaningful directory names, which encode package name and version, or by only parsing POSEIDON.yml file partially. The server could thus serve literally anything old and outdated, because it does not have to read it.While I think 2. is the better approach, there are some obvious drawbacks:
/packages
API could be pretty clear about which packages and package versions are old and outdated,/groups
and/individuals
would lack information on the unreadable, old packages. This mismatch might be confusing and it propagates directly tolist
.fetch
should probably not offer the option to list individuals and group entities in its selection language (-f
) any more.timetravel
is also affected by this issue. Just asserve
forzip_file
it could switch to a minimal package discovery workflow.The text was updated successfully, but these errors were encountered: