Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release version v1 of the file format #118

Open
10 of 15 tasks
cschwan opened this issue Feb 18, 2022 · 13 comments
Open
10 of 15 tasks

Release version v1 of the file format #118

cschwan opened this issue Feb 18, 2022 · 13 comments
Assignees
Milestone

Comments

@cschwan
Copy link
Contributor

cschwan commented Feb 18, 2022

Here's what I'd like to change in a newer version of the file format, v1, replacing the current (v0) one:

Originally posted by @cschwan in #83 (comment)

@cschwan
Copy link
Contributor Author

cschwan commented Sep 9, 2022

Technically we can have different file formats at the same time by versioning also the modules:

  • pineappl::v0:: would contain the old code
  • pineappl::v1:: would contain the new file format
  • and so on should we need even more changes.
  • in the module pineappl we can insert pub use ... statements to import the most recent file format.

Grid::read would be the only method that has access to the old v0 module, which it uses if the file header indicates a non-current version. It then elects the appropriate Grid, either from v0 or v1. Grids loaded from v0 would automatically be converted into objects from v1, so everything should go smoothly without the user even noticing.

In this way we can also easily get rid of old file format - if we decide to do that - by simply throwing away the v0 module and changing Grid::read to return an error. In that case the user would have to manually upgrade the file format, for instance by using an older version of the CLI.

@felixhekhorn
Copy link
Contributor

Since you want to guarantee backwards compatibility, you need to have both the old and the new code inside and also the converter (which could go in v0 since it can be thrown away together) ... (just a style comment: you maybe want to scope them a bit more then just pineappl, because this sounds like the whole library changes - e.g. pineappl::io or similar?)

@cschwan
Copy link
Contributor Author

cschwan commented Sep 9, 2022

Yet another way (no idea how that would work) is to let the pineappl crate depend on an older version of itself, or separate that part into a different crate, for instance pineappl-upgrade.

@alecandido
Copy link
Member

alecandido commented Sep 9, 2022

Personally, I don't care so much about splitting crates. But if the other is not obviously the good choice, I'd stick with one for simplicity.

The dependency on the older version is potentially a mess.

I've read in the Go ecosystem an advise for new major releases, that was to create a completely new module (in this case a new crate), and depend on the old one.
Unfortunately, in Go is simpler, because of the way modules are addressed. In Rust is not so common, so it would appear messy.
But given that we are talking about the file format, I would imitate inside the crate itself (essentially with ::io::v0 and ::io::v1).

@cschwan
Copy link
Contributor Author

cschwan commented Sep 10, 2022

I think I'll first go with v0 and v1 modules, and after some time we might decide split off the v0 part into a separate crate.

@cschwan
Copy link
Contributor Author

cschwan commented Nov 10, 2022

Yet another way (no idea how that would work) is to let the pineappl crate depend on an older version of itself, or separate that part into a different crate, for instance pineappl-upgrade.

This stackoverflow answer describes how to do it: https://stackoverflow.com/a/58739076/812178. That seems pretty straightforward.

@alecandido
Copy link
Member

alecandido commented Nov 10, 2022

Consider that, going this way, you might have to maintain also the older version, in case of bugs preventing some kind of applications.
It is possible, but not the easiest, for a project with a restricted number of maintainers.

In my opinion, I would try to keep the version management straight, and just increase the version number in new releases. Thus, I'm more in favor of pineappl::v0, pineappl::v1... (or pineappl::io::v0 or whatever...).
When you have a new one, you deprecate the old at some point, and eventually drop.

@cschwan
Copy link
Contributor Author

cschwan commented Nov 10, 2022

@alecandido those two ideas aren't exclusive. You'd write

use pineappl_v0 as v0;

in pineappl/src/lib.rs and in pineappl/Cargo.toml:

[dependencies]
pineappl_v0 = { package = "pineappl", version = "0.5.8" }

In this way we'd preserve an older version in pineappl::v0, while the newer version would be in pineappl. The main difference to explicitly keeping the old copy in pineappl/src/v0/... is that we can't modifiy the old source, but I think this is an advantage. With the method described above we'd rely on crates.io to get 0.5.8, which no one needs to maintain (you can't change published versions).

The only functionality that we need from pineappl::v0 is to load a Grid and update the structures to those in v1.

@alecandido
Copy link
Member

For as long as it is working, that's fine.

Problem is if Grid becomes incompatible, then you have to write a method to convert v0::Grid into Grid.
However, if the only operation allowed with the old crate in the new crate is loading, that's just the worst case: you need a converter. Then, all the operations and everything will be done by the new crate with the new logic, even if incompatible.

The only room remained is if you are also going to dump an old version, or just the new one (I'd recommend the later).
E.g., if you optimize an old grid, will it still be old? Or if you merge old grids?
This is a potential mess, since then you also have to deal with merging new and old grids and so on, so I'd say all the operations should be performed as if the grids were all new, and the old ones are converted.
Notice that this is potentially breaking: if your code depends on PineAPPL v0, and you optimize your grid with PineAPPL CLI v1, then you won't be able to use it any longer. However, I would say that this should be solved by the user either using CLI v0, or upgrading the dependency.

@alecandido
Copy link
Member

Another topic: from what you said it seems you want to keep file version and library version synced

In this way we'd preserve an older version in pineappl::v0 [...] we'd rely on crates.io to get 0.5.8 [...]

Are you sure?

Definitely, it has the advantage of being more clear.
But in principle, you can make an incompatible file format also maintaining compatibility of the library (it seems to me), and the opposite as well. Since compatibility should be maintained as far as possible, sharing the version you will occasionally declare as incompatible objects that actually are compatible.

@cschwan
Copy link
Contributor Author

cschwan commented Nov 10, 2022

Problem is if Grid becomes incompatible, then you have to write a method to convert v0::Grid into Grid.

The changes in #118 (comment) will require rewriting the Grid class, so in that sense a converter from a v0 to a v1 Grid will need to be written in Grid::read.

However, if the only operation allowed with the old crate in the new crate is loading, that's just the worst case: you need a converter. Then, all the operations and everything will be done by the new crate with the new logic, even if incompatible.

I don't think there will be incompatible logic; the logic shouldn't change, only the internal structure and the file format.

The only room remained is if you are also going to dump an old version, or just the new one (I'd recommend the later). E.g., if you optimize an old grid, will it still be old? Or if you merge old grids? This is a potential mess, since then you also have to deal with merging new and old grids and so on, so I'd say all the operations should be performed as if the grids were all new, and the old ones are converted. Notice that this is potentially breaking: if your code depends on PineAPPL v0, and you optimize your grid with PineAPPL CLI v1, then you won't be able to use it any longer. However, I would say that this should be solved by the user either using CLI v0, or upgrading the dependency.

We'll always guarantee backwards compatibility, but not forwards compatibility, so eventually you'll have to update your dependencies. But we'll only change the Rust API and add a new file format (v1), everything else (CAPI, CLI, Python interface, etc.) will stay the same. Basically, I will change a few things under the hood, so the car will look the same, although maybe it may be a bit faster and simpler 😄.

@cschwan
Copy link
Contributor Author

cschwan commented Nov 10, 2022

Another topic: from what you said it seems you want to keep file version and library version synced

In this way we'd preserve an older version in pineappl::v0 [...] we'd rely on crates.io to get 0.5.8 [...]

Are you sure?

Definitely, it has the advantage of being more clear. But in principle, you can make an incompatible file format also maintaining compatibility of the library (it seems to me), and the opposite as well. Since compatibility should be maintained as far as possible, sharing the version you will occasionally declare as incompatible objects that actually are compatible.

In an arbitrary version of PineAPPL (the Rust library) supporting the file format version x the function Grid::read will parse the file format number of the specific file that we want to read, let's call that p, and then we'll have three cases:

  • p > x; in that case the library is too old and we have to return an error (there's no forwards compatibility)
  • p == x; we deserialize Grid from the file as we do now
  • p < x; we use pineappl_v(x-1)::Grid::read (for instance, if x == 1 we use pineappl_v0::Grid::read) and convert the grid to a structure supported by the library with version x. Note that this enables us to support all file formats for all times for all values of p. Let's imagine that in 10 years we'll have file format 3. Then what happens is that pineappl::Grid::read calls pineappl_v2::Grid::read, which in turn calls pineappl_v1::Grid::read, and so on, and at each step the grid will be converted. So in a way the system bootstraps itself, without having to maintain all versions.

For each file version we just need to know the latest version of PineAPPL that supports it, and that will be stored in Cargo.toml as shown above.

@cschwan cschwan modified the milestones: v0.6.0, v1.0 Jan 3, 2023
@t7phy t7phy linked a pull request Jul 12, 2024 that will close this issue
21 tasks
@cschwan cschwan removed a link to a pull request Jul 31, 2024
21 tasks
@cschwan
Copy link
Contributor Author

cschwan commented Oct 9, 2024

The ideas outlined in #118 (comment) have been implemented in #299 and they seem to work fine. In particular, we now have a new v0 module in which read_uncompressed_v0 reads the old grids in the old file format.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants