Cache of operators #173
We considered this many times, and it was part of the initial concept behind EKO. The only reason why we gave up is the size of the EKOs. If this is not an issue any longer, and we have a lot of storage, it could be an interesting option (storing EKOs is essentially a space-time tradeoff, if you are actually reusing them).
The FK tables case: we discussed for quite some time compressing all the FK-table EKOs into a single one. Indeed, the terminology used was small EKO and big EKO for a theory. However, since EKOs are squared with respect to the DIS grid (not exactly squared for the double-hadronic case), the storage requirement was considered absurd (just think about merging all the jets EKOs), and we gave up...
More specifically, you could also reuse EKO subsets, or just recompute a subset, if some configurations match (the theory + evmod + scale variations + ...).
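As a minimal sketch of that matching idea (everything here is hypothetical and not existing pineko API; the list of relevant settings is an assumption standing in for "theory + evmod + scale variations + ..."):

```python
# Hypothetical illustration of "configurations match":
# settings assumed to actually enter the evolution.
RELEVANT_KEYS = ("theory_id", "evmod", "xif", "xir")

def same_evolution_config(card_a: dict, card_b: dict) -> bool:
    """True if two operator cards agree on every evolution-relevant setting;
    only then may (subsets of) their EKOs be reused across datasets."""
    return all(card_a.get(k) == card_b.get(k) for k in RELEVANT_KEYS)
```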
Why? I was thinking (hoping!) precisely of a way of reducing the number of EKOs needed since, for instance, most DY datasets probably share them.
At some point we also considered adding an interpolation in Q2 (or better ...).
They usually involve different scales. The relevant settings are controlled by the theory, so they should not change per dataset. *Well, there is a limit to which you can control the dynamical scale choice, but there are external recommendations on how to pick it, so it is not something we can lightheartedly use for computational optimization.
Many of the DY datasets just have a single scale (and even more so if we have the same process binned across some variable the scale does not depend on, like some of the 2D distributions).
However, these datasets are often not problematic: if your scale does not depend on the bin, you often have a single one per dataset, and those EKOs are small. All the big EKOs come from having many scales, to the best of my knowledge.
If I recall correctly, a big problem was the jet measurements at the LHC, where the xgrid also wasn't constant over Q2. That led to the biggest EKOs that I've seen so far. But we could try to implement a hybrid approach in which we have several EKOs, so the best of both worlds essentially.
This was due to them not being originally pineappl grids, right?
I remember this as well. However, it should not be a big deal: each Q2 value is computed separately in EKO. If each value of Q2 has its own xgrid, it could be up to 3x the computation (up-to-bottom evolution + bottom matching + from-bottom evolution, since the NNPDF Q0 is in 4 flavors, and ignoring above top); but if that's not the case, the overhead should be small.
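For reference, that counting corresponds to the usual factorization of the evolution operator across the bottom threshold (a sketch, with Q0 in the 4-flavor scheme, the target Q above m_b, and A_b denoting the bottom matching):

$$
\mathbf{E}(Q_0^2 \to Q^2) \;=\; \mathbf{E}^{(n_f=5)}(m_b^2 \to Q^2)\,\mathbf{A}_b(m_b^2)\,\mathbf{E}^{(n_f=4)}(Q_0^2 \to m_b^2)
$$

The "up to 3x" then means having to redo all three factors for every Q2 that carries its own xgrid, instead of only the final 5-flavor leg from m_b to Q2.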
For sure.
In principle and in practice we also have preferred Q2 values: if a dynamic scale is chosen, the Q2 points of newly generated grids should always be a subset of 40 fixed values. The only exception to this for new grids comes from datasets where we chose a static scale value, but then there is only one Q2 per dataset/bin. We could choose not to make a static-scale optimization, and then we already know which EKOs are needed: only the ones for the known 50 xgrid values and the 40 Q2 values.
With one Q2 value per dataset plus the 40, and 50 xgrid points, it would be a very reasonable EKO, even the "big one" per theory (the FK-table EKO, as opposed to the postfit EKO). It would certainly be sizeable, but reasonable. However, we still have many wild (imported) grids. Are we planning to recompute them soon? Have they already been recomputed?
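As a very rough order-of-magnitude check (a sketch only: the ~100 static-scale datasets, the 14-entry flavor basis, and uncompressed float64 storage are all assumptions, not numbers taken from this thread):

```python
# Back-of-the-envelope size of the "big" per-theory EKO under the numbers above.
n_q2 = 40 + 100      # 40 preferred Q2 values + ~100 static-scale datasets (assumed)
n_x = 50             # xgrid points
n_flav = 14          # assumed size of the flavor basis
bytes_per_op = (n_flav * n_x) ** 2 * 8   # one (flav*x) x (flav*x) float64 operator
print(f"~{n_q2 * bytes_per_op / 1e9:.2f} GB uncompressed")  # ~0.55 GB with these inputs
```

So indeed sizeable but manageable, consistent with the estimate above.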
For the old theories that ship has sailed, of course, but we are recomputing many grids as we prepare the theory for 4.1. I think only singletop, jets, and dijets will not be native pineappl grids, and both jets and dijets should be pineappl-able, since they are processes included in nnlojet.
Using separate EKOs for those seems like a good compromise.
Whatever you're doing, the proposal is of course for new theories. The old ones could at most be deprecated in favor of new ones (because of known bugs/limitations), and their files could be dropped in the very long term.
How much computation would be required to pinefarm them?
At the moment, in order to generate a theory, we need to generate an insane number of EKOs.
However, since many datasets share the scale and pineappl grids all share the same xgrid, it should be possible to generate a cache of EKOs (for a given theory).
So for instance, if I'm going to run the EKO generation for dataset_1 ... dataset_n, Pineko should be able to detect which of the required EKOs are already computed and take them directly from there (a rough sketch of such a cache lookup follows below). The (ideal) next step would be to not save all operators, but just the union of all operators requested in all operator cards.
I'm wondering whether this is a crazy idea or whether it could be doable. I'm particularly interested in the ideal next step, since I'm having storage problems...
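A minimal sketch of what such a cache lookup, together with the "union of all operators" step, could look like, assuming a flat on-disk cache keyed per theory, Q2, and xgrid. Every name here (eko_cache_key, collect_missing, the card layout, the .tar suffix) is hypothetical and not existing pineko or eko API:

```python
import hashlib
import json
from pathlib import Path

def eko_cache_key(theory_id: int, q2: float, xgrid: tuple) -> str:
    """Hypothetical key: one cached operator per (theory, Q2, xgrid)."""
    payload = json.dumps(
        {"theory": theory_id, "q2": q2, "xgrid": list(xgrid)}, sort_keys=True
    )
    return hashlib.sha256(payload.encode()).hexdigest()

def collect_missing(theory_id: int, operator_cards: list, cache_dir: str) -> dict:
    """Union of all (Q2, xgrid) pairs requested by the operator cards,
    minus those already present in the cache directory."""
    cache = Path(cache_dir)
    requested = {}
    for card in operator_cards:  # assumed layout: {"q2_points": [...], "xgrid": [...]}
        for q2 in card["q2_points"]:
            key = eko_cache_key(theory_id, q2, tuple(card["xgrid"]))
            requested[key] = (q2, card["xgrid"])
    # Only operators missing from the cache need to be freshly computed;
    # datasets sharing scale and xgrid collapse onto the same key.
    return {k: v for k, v in requested.items() if not (cache / f"{k}.tar").exists()}
```

Storing only the union found here, instead of one full EKO per dataset, is exactly the "ideal next step" above, and is what would ease the storage problem.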