Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add interfaces compatible with EKO v0.11.x #186

Closed
wants to merge 3 commits into from
Closed

Conversation

alecandido
Copy link
Member

@alecandido alecandido commented Oct 24, 2022

The best thing would be to be EKO independent, but this is not completely possible.

Furthermore, with the new structure will also be a not so clever idea, since we want to allow for very big EKOs, to be loaded one Q2 at a time.

So, for the future, we have two options:

  1. either we find a way to pass a callable with a certain interface, and PineAPPL will query for operators, when needed
  2. or we just make a Rust EKO library (needed in any case, Lazy loading for rust eko#97, but will be extremely tiny), and have PineAPPL directly depend on it

For the time being, we just stick to the single operator, but I propose to move the construction out of PineAPPLpy

  • receive directly a concatenated array in convolute_eko
  • same in evolve
  • recover tests

@cschwan
Copy link
Contributor

cschwan commented Oct 25, 2022

It seems you're changing the arguments of the function and thus the API. If this is the case we can't merge it into master without breaking semver compatibility.

@alecandido
Copy link
Member Author

If you wish, I can preserve the arguments, but that is not the only issue with compatibility.

In general, now we have a new memory structure, and maybe it is not 100% stable, because we just introduced and we need to do a few things. But the intention is to be as close as possible to the "final" one.

This was the first step towards the computation of jets EKOs, and in principle the structure is there to be used. Unfortunately, the way it works is not extremely simple to integrate with PineAPPL, since the way we did until now is to use rank 5 arrays:

operator: &Array5<f64>,

but that's the object being too big, so we are splitting by Q2 in a list of rank 4 arrays, and loading them in memory one by one (and unload as soon as it has been used).

So, we have two goals:

  1. in the short term, recover the old behavior (i.e. Array5), but starting from the new structure
  2. in the longer term (but not too long), use the new behavior (i.e. Array4 one by one) to save memory

This PR is trying to do 1., asking immediately for the Array5, in such a way not too depend any longer so much on the EKO internal structure (so the concatenation is done by someone else). This would also improve current memory usage, since we can try to drop the operator errors, without affecting PineAPPL any longer (it is only a factor of 2, but it matters for large objects).

@alecandido
Copy link
Member Author

Similar for the muf2_grid, since in this way PineAPPL doesn't need to rely on how EKO stores them.

@felixhekhorn
Copy link
Contributor

It seems you're changing the arguments of the function and thus the API. If this is the case we can't merge it into master without breaking semver compatibility.

Consider that we could even fake in pineko for the moment and only do a single PR against PineAPPL the moment our output is more stable (because I expect still further changes)

@alecandido alecandido marked this pull request as draft October 25, 2022 07:22
@alecandido
Copy link
Member Author

alecandido commented Oct 25, 2022

Consider that we could even fake in pineko for the moment and only do a single PR against PineAPPL the moment our output is more stable (because I expect still further changes)a

This I didn't consider, but it might be reasonable. On one side we are doing something not optimal twice: here, working with a legacy structure, and in Pineko, mocking to keep it.

But it's true that in the end we want to go for 2., so we can also go for mocking as a temporary solution, and face the change when we'll be ready for the proper one.

@cschwan how would you proceed for 2.?
(in principle we could even start working for it: we don't have the full product in EKO, but this mostly concern computation, the structure is already there - apart confidence on its stability)

@cschwan cschwan changed the title Update to EKO v0.11.x Add interfaces compatible with EKO v0.11.x Oct 25, 2022
@cschwan
Copy link
Contributor

cschwan commented Oct 25, 2022

At the moment option 1. (in the first comment) seems most attractive to me, mainly because that's what evolution::operators does in any case and because we could probably easily extend to also support slicing in the other dimensions.

@cschwan
Copy link
Contributor

cschwan commented Oct 25, 2022

For option 2. you basically have to copy what I did in pineappl evolve.

@alecandido
Copy link
Member Author

For option 2. you basically have to copy what I did in pineappl evolve.

Indeed, you have already done most of the job.

The practical question is actually more about if we want an explicit dependency on a single function in a given crate, or we just define an interface.

In any case, I would build on what you have already done, and I would also stick to just Grid.evolve for the complete version, and keep Grid.convolute_eko with the legacy behavior (i.e. Array5). In any case, it is already deprecated, and we will drop.

The proposal is to move that part of your code (the data loader in pineappl evolve) in the EKO repo, since it is related to the EKO format, so it makes sense that follows the natural evolution and development of the rest of EKO.
We can't use directly that one, because we are changing the EKO format right now, and in particular we need lazy loading (and early dropping) for the memory. Apart from this, that's the code we need.

So, let's finalize the interface vs dependency decision, and then I'll start sketching the loader in EKO NNPDF/eko#97 (maybe with the help of @andreab1997, while @felixhekhorn will take care of the completion of split computation NNPDF/eko#138, that is the main bottleneck)

@cschwan
Copy link
Contributor

cschwan commented Oct 25, 2022

In terms of memory it would probably be best to lazy load a three-dimensional array of the type used here:

operator
.slice(s![.., pid1_idx, .., pid0_idx, ..])
.select(Axis(0), &fac1_indices)
.select(Axis(1), &x1_indices)
.permuted_axes([0, 2, 1])
.as_standard_layout()
.into_owned()

This extracts (a slice of the) operators for each non-zero (pid1, pid0) tuple as follows:

  • in dimension 0, the factorization scale values active in the grid (fac1_indices),
  • in dimension 2, the x1 grid values active in the grid (x1_indices) and
  • all entries in dimension 4.

Storing three-dimensional operators would give you the benefit of throwing away operators that are zero (that's probably important if you have one big EKO where dimensions 0 and 2 are big).

As an interface we could have two callables:

  • pid0(pid1: Vec<i32>) -> Vec<i32> - given the vector of pid1, for which pid0 are there non-zero operators?
  • operator(fac1_slice, x1_slice) -> Array3<f64> - return the operator described above.

@alecandido
Copy link
Member Author

That's an interesting idea, but it is something more with respect to the existing solution.

For us, we have to compute rank 4 operators, but they are computed one final scale at a time. That's why it made sense to split by what you call fac1_indices.

So: the layout how they are produced will be a set of Array4. Nothing prevents from optimizing the layout before convolution, also once for all (optimize and dump). However, it is not my main goal right now, so if you wish we/you can do the offline optimization or virtualize such an interface (but I don't expect to be much convenient, if in the end relies on Array4 below).

Instead, about the interface vs dependency, maybe there is not really a conflict: I will start providing an interface like operator(fac1_slice) -> Array4<f64>.
If PineAPPL will depend on this directly (I believe more likely), or it will receive the callable from Pineko, we can decide afterwards.

The moment I have something, you can have a look and see if it is possible/sensible to implement your interface on top.

@cschwan
Copy link
Contributor

cschwan commented Oct 25, 2022

@alecandido sounds good!

@cschwan cschwan mentioned this pull request May 6, 2023
10 tasks
@alecandido
Copy link
Member Author

By now, this is definitely outdated. I'd propose to close and wait for NNPDF/eko#260, that would be the implementation of what is being discussed above.

Until then, I believe that there is no point to keep discussing this issue.

@cschwan cschwan closed this May 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants