Cache of operators #173
We considered this many times, and it was part of the initial concept behind EKO. The only reason why we gave up is the size of the EKOs. If this is not an issue any longer, and we have a lot of storage, it could be an interesting option (storing EKOs is essentially a space-time tradeoff, if you are actually reusing them).
The FK tables case: we discussed for quite some time compressing all the FK-table EKOs into a single one. Indeed, the terminology used was small EKO and big EKO for a theory. However, since EKOs are squared with respect to the DIS grid (not exactly squared for the double-hadronic case), the storage requirement was considered absurd (just think about merging all the jets EKOs), and we gave up...
More specifically, you could also reuse EKO subsets, or just recompute a subset, if some configurations match (the theory + evmod + scale variations + ...).
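As a minimal sketch of that matching idea (everything here is hypothetical and not existing pineko API; the list of relevant settings is an assumption standing in for "theory + evmod + scale variations + ..."):

```python
# Hypothetical illustration of "configurations match":
# settings assumed to actually enter the evolution.
RELEVANT_KEYS = ("theory_id", "evmod", "xif", "xir")

def same_evolution_config(card_a: dict, card_b: dict) -> bool:
    """True if two operator cards agree on every evolution-relevant setting;
    only then may (subsets of) their EKOs be reused across datasets."""
    return all(card_a.get(k) == card_b.get(k) for k in RELEVANT_KEYS)
```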
Why? I was thinking (hoping!) precisely of a way of reducing the number of EKOs needed since, for instance, most DY datasets probably share them.
At some point we also considered adding an interpolation in Q2 (or better ...).
They usually involve different scales. The relevant settings are controlled by the theory, so they should not change per dataset. *Well, there is a limit to which you can control the dynamical scale choice, but there are external recommendations on how to pick it, so it is not something we can lightheartedly use for computational optimization.
Many of the DY datasets just have a single scale (and even more so if we have the same process binned across some variable the scale does not depend on, like some of the 2D distributions).
However, these datasets are often not problematic: if your scale does not depend on the bin, you often have a single one per dataset, and those EKOs are small. All the big EKOs come from having many scales, to the best of my knowledge.
If I recall correctly, a big problem was the jet measurements at the LHC, where the xgrid also wasn't constant over Q2. That led to the biggest EKOs that I've seen so far. But we could try to implement a hybrid approach in which we have several EKOs, so the best of both worlds essentially.
This was due to them not being originally pineappl grids, right?
I remember this as well. However, it should not be a big deal: each Q2 value is computed separately in EKO. If each value of Q2 has its own xgrid, it could be up to 3x the computation (up-to-bottom evolution + bottom matching + from-bottom evolution, since the NNPDF Q0 is in 4 flavors, and ignoring above top); but if that's not the case, the overhead should be small.
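For reference, that counting corresponds to the usual factorization of the evolution operator across the bottom threshold (a sketch, with Q0 in the 4-flavor scheme, the target Q above m_b, and A_b denoting the bottom matching):

$$
\mathbf{E}(Q_0^2 \to Q^2) \;=\; \mathbf{E}^{(n_f=5)}(m_b^2 \to Q^2)\,\mathbf{A}_b(m_b^2)\,\mathbf{E}^{(n_f=4)}(Q_0^2 \to m_b^2)
$$

The "up to 3x" then means having to redo all three factors for every Q2 that carries its own xgrid, instead of only the final 5-flavor leg from m_b to Q2.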
For sure.
In principle and in practice we also have preferred Q2 values: if a dynamic scale is chosen, the Q2 points of newly generated grids should always be a subset of 40 fixed values. The only exception to this for new grids comes from datasets where we chose a static scale value, but then there is only one Q2 per dataset/bin. We could choose not to make a static-scale optimization, and then we already know which EKOs are needed: only the ones for the known 50 xgrid values and the 40 Q2 values.
With one Q2 value per dataset plus the 40, and 50 xgrid points, it would be a very reasonable EKO, even the "big one" per theory (the FK-table EKO, as opposed to the postfit EKO). It would certainly be sizeable, but reasonable. However, we still have many wild (imported) grids. Are we planning to recompute them soon? Have they already been recomputed?
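As a very rough order-of-magnitude check (a sketch only: the ~100 static-scale datasets, the 14-entry flavor basis, and uncompressed float64 storage are all assumptions, not numbers taken from this thread):

```python
# Back-of-the-envelope size of the "big" per-theory EKO under the numbers above.
n_q2 = 40 + 100      # 40 preferred Q2 values + ~100 static-scale datasets (assumed)
n_x = 50             # xgrid points
n_flav = 14          # assumed size of the flavor basis
bytes_per_op = (n_flav * n_x) ** 2 * 8   # one (flav*x) x (flav*x) float64 operator
print(f"~{n_q2 * bytes_per_op / 1e9:.2f} GB uncompressed")  # ~0.55 GB with these inputs
```

So indeed sizeable but manageable, consistent with the estimate above.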
For the old theories that ship has sailed, of course, but we are recomputing many grids as we prepare the theory for 4.1. I think only singletop, jets, and dijets will not be native pineappl grids, and both jets and dijets should be pineappl-able, since they are processes included in nnlojet.
Using separate EKOs for those seems like a good compromise.
Whatever you're doing, the proposal is of course for new theories. The old ones could at most be deprecated in favor of new ones (because of known bugs/limitations), and their files could be dropped in the very long term.
How much computation would be required to pinefarm them?
At the moment, in order to generate a theory, we need to generate an insane number of EKOs.
However, since many datasets share the scale and pineappl grids all share the same xgrid, it should be possible to generate a cache of EKOs (for a given theory).
So for instance, if I'm going to run the EKO generation for dataset_1 ... dataset_n, Pineko should be able to detect which of the required EKOs are already computed and take them directly from there (a rough sketch of such a cache lookup follows below). The (ideal) next step would be to not save all operators, but just the union of all operators requested in all operator cards.
I'm wondering whether this is a crazy idea or whether it could be doable. I'm particularly interested in the ideal next step, since I'm having storage problems...
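A minimal sketch of what such a cache lookup, together with the "union of all operators" step, could look like, assuming a flat on-disk cache keyed per theory, Q2, and xgrid. Every name here (eko_cache_key, collect_missing, the card layout, the .tar suffix) is hypothetical and not existing pineko or eko API:

```python
import hashlib
import json
from pathlib import Path

def eko_cache_key(theory_id: int, q2: float, xgrid: tuple) -> str:
    """Hypothetical key: one cached operator per (theory, Q2, xgrid)."""
    payload = json.dumps(
        {"theory": theory_id, "q2": q2, "xgrid": list(xgrid)}, sort_keys=True
    )
    return hashlib.sha256(payload.encode()).hexdigest()

def collect_missing(theory_id: int, operator_cards: list, cache_dir: str) -> dict:
    """Union of all (Q2, xgrid) pairs requested by the operator cards,
    minus those already present in the cache directory."""
    cache = Path(cache_dir)
    requested = {}
    for card in operator_cards:  # assumed layout: {"q2_points": [...], "xgrid": [...]}
        for q2 in card["q2_points"]:
            key = eko_cache_key(theory_id, q2, tuple(card["xgrid"]))
            requested[key] = (q2, card["xgrid"])
    # Only operators missing from the cache need to be freshly computed;
    # datasets sharing scale and xgrid collapse onto the same key.
    return {k: v for k, v in requested.items() if not (cache / f"{k}.tar").exists()}
```

Storing only the union found here, instead of one full EKO per dataset, is exactly the "ideal next step" above, and is what would ease the storage problem.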