
Replace Numpy files with zarr for resampling LUT caching #880

Merged · 18 commits · Sep 6, 2019

Conversation

@pnuu (Member) commented Aug 16, 2019

Another go at making a draft PR for using zarr instead of NumPy for storing resampling LUTs. The first try, made from the wrong branch, is kept for reference in #879.

  • Tests added and test suite added to parent suite
  • Tests passed
  • Passes flake8 satpy
  • Fully documented

@pnuu pnuu requested review from djhoese and mraspaud as code owners August 16, 2019 05:49
@pnuu pnuu changed the title Feature zarr resample luts Replace Numpy files with zarr for resampling LUT caching Aug 16, 2019
@pnuu (Member, Author) commented Aug 16, 2019

And naturally I forgot to select "Draft", and it seems already-opened PRs can't be converted to drafts. Blargh.

@pnuu pnuu self-assigned this Aug 16, 2019
Also, fix flake8 warnings
@pnuu (Member, Author) commented Aug 16, 2019

Having all the resampling LUTs in the same zarr store doesn't affect performance at all, as they end up in separate files on disk in any case. I combined the separate arrays into one zarr store anyway to keep cache_dir cleaner.

Some performance tests (HRIT SEVIRI, times in seconds) are below. The bilinear performance skyrockets in the later runs when the resampling data are computed to NumPy arrays. This is due to how the arrays are used in get_sample_from_bil_info() in pyresample, and needs to be fixed in a separate pyresample PR.

Running this PR also requires pytroll/pyresample#207, which fixes an issue in output index tiling.

| Configuration | first run (s) | cache run (s) | third run (s) |
| --- | --- | --- | --- |
| npz (current master) | 61.2 | 43.7 | 43.8 |
| zarr, multiple files | 60.6 | 41.1 | 41.4 |
| zarr, multiple files, no persist | 95.7 | 41.6 | 41.5 |
| zarr, one file | 61.9 | 41.1 | 41.2 |
| zarr, one file, no persist | 100 | 41.9 | 41.9 |
| zarr, bilinear, no env variables | 134 | 141.4 | 141.2 |
| zarr, bilinear, OMP=1 DASK=4, CHUNKS=1024 | 110.1 | 429.4 | - |
| zarr, bilinear, OMP=1 DASK=4, CHUNKS=1024, cache.compute() | - | 28.7 | 28.2 |
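For reference, a minimal sketch of what caching the LUTs as separate components of a single zarr store can look like with dask. The helper names and the `luts` dict of dask arrays are made up for illustration, not the actual satpy code:

```python
import dask.array as da


def store_luts(luts, zarr_path):
    """Write a dict of dask arrays into one zarr store, one component per LUT."""
    for name, arr in luts.items():
        # Note: each da.to_zarr() call computes its array separately (compute=True
        # by default), so intermediate results shared between the LUTs may be
        # evaluated several times.
        da.to_zarr(arr, zarr_path, component=name, overwrite=True)


def load_luts(names, zarr_path):
    """Read the cached LUTs back lazily as dask arrays."""
    return {name: da.from_zarr(zarr_path, component=name) for name in names}
```

On disk every component still gets its own chunk files inside the one zarr directory, which matches the observation above that a single store mostly just keeps `cache_dir` tidy.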

@djhoese (Member) commented Aug 16, 2019

Correct me if I'm wrong, but since you are running to_zarr three separate times it is still computing them three separate times. If they share any calculations, those calculations will be (unnecessarily) repeated each time to_zarr is executed. Not sure if an xarray Dataset would be the simplest alternative.

@pnuu (Member, Author) commented Aug 16, 2019

Indeed. I didn't find an example of writing several dask arrays in one go, only the way I used. But now that you mention it, I remember seeing something that used xarray.Dataset when I first looked at zarr. I'll have another look at this on Monday.
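For reference, a hedged sketch of the xarray.Dataset route mentioned above, so that all arrays share one dask graph and are written with a single to_zarr() call. Variable and dimension names are illustrative, not the ones used in satpy:

```python
import xarray as xr


def store_luts_as_dataset(luts, zarr_path):
    """Write a dict of dask arrays in one pass via a single Dataset.to_zarr()."""
    data_vars = {
        # Give every array its own dimension names so differing shapes don't clash.
        name: (tuple(f"{name}_dim{i}" for i in range(arr.ndim)), arr)
        for name, arr in luts.items()
    }
    # One compute pass: intermediate results shared between the LUTs are
    # evaluated only once instead of once per to_zarr() call.
    xr.Dataset(data_vars).to_zarr(zarr_path)


def load_luts_lazily(zarr_path):
    """Open the store lazily; the variables come back as dask arrays."""
    ds = xr.open_zarr(zarr_path)
    return {name: ds[name].data for name in ds.data_vars}
```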

@pnuu (Member, Author) commented Aug 19, 2019

When using xr.Dataset.to_zarr() the timings are the following (note that the target area for bilinear is one quarter of the nearest-neighbour one in each dimension):

| Configuration | first run (s) | cache run (s) | third run (s) |
| --- | --- | --- | --- |
| nearest, xr.Dataset.to_zarr | 60.3 | 38.3 | 37.3 |
| nearest, xr.Dataset.to_zarr, OMP=1 DASK=4, CHUNKS=1024 | 49.3 | 32.7 | 33.9 |
| bilinear, xr.Dataset.to_zarr, OMP=1 DASK=4, CHUNKS=1024 | 115.0 | 456.1 | 486.1 |
| bilinear, xr.Dataset.to_zarr, OMP=1 DASK=4, CHUNKS=1024, cache.compute() | 115.0 | 27.8 | 27.1 |

Note that the compute() call in the last test run does not affect the memory footprint.

@pnuu (Member, Author) commented Aug 19, 2019

I'm now forcing the resampling LUTs to be loaded into memory for the bilinear resampler to get usable performance.
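Roughly, forcing the cached LUTs into memory can look like the sketch below; the `cache` dict and the helper name are hypothetical, not the exact change in this PR:

```python
import dask


def load_cache_to_memory(cache):
    """Turn a dict of cached dask arrays into NumPy arrays in one compute pass."""
    names, lazy_arrays = zip(*cache.items())
    # dask.compute() evaluates all arrays together; the repeated indexing in the
    # bilinear resampling then runs on plain NumPy arrays.
    return dict(zip(names, dask.compute(*lazy_arrays)))
```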

@djhoese (Member) commented Aug 19, 2019

Was the nearest test case the same as the previous timings, i.e. those with master? If so, does this mean the zarr implementation is equal or at least similar in timing to the original NumPy one? Any idea what environment variables were set for the original master branch tests?

Those bilinear times are extremely concerning. How is it possible for the 2nd and 3rd runs to take that long just from the cache? The cache should be loading from the zarr store, shouldn't it? Your numbers make it seem like the cache is being computed multiple times, and from the original inputs rather than the cached arrays. Does that sound possible?

@pnuu (Member, Author) commented Aug 19, 2019

Yes, for nearest the baseline tests were run without any environment variables set, like the first one in this new set. So with the same settings zarr seems to be a bit faster.

Multiple compute calls could indeed be the explanation. And now that I look at the code, the BilinearResampler class doesn't implement any in-memory caching. I think I should add that in a similar way to how KDTreeResampler does it with setattr(resampler, idx_name, cache).
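A minimal sketch of that setattr-style in-memory caching, mirroring the KDTreeResampler pattern mentioned above (the helper names are illustrative, not actual satpy functions):

```python
def cache_on_resampler(resampler, luts):
    """Stash the loaded LUT arrays as attributes on the pyresample resampler."""
    for idx_name, cache in luts.items():
        setattr(resampler, idx_name, cache)


def get_cached_luts(resampler, idx_names):
    """Fetch previously stashed LUTs; missing ones come back as None."""
    return {name: getattr(resampler, name, None) for name in idx_names}
```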

@djhoese (Member) commented Aug 19, 2019

Yeah, that could be helpful. I'm not a fan of setting/getting attributes on the pyresample Resampler object, so if you can find a cleaner way to do it 👍.

@pnuu (Member, Author) commented Aug 19, 2019

Hmm. Caching the indices doesn't seem to help much. If the dask arrays are persisted, the cached runs are still slow (343.4 s). So I thought it was the indexing done with dask arrays instead of NumPy arrays that was slow, and computed only valid_input_index and index_array while keeping bilinear_s and bilinear_t as dask arrays. That was still slow. But doing the opposite and computing bilinear_* takes the cached runs down below 30 s. And those two arrays are only used in a multiplication when getting the output pixel values.

I think I need to do some testing with a custom dask scheduler to count the computations for different runs.

@pnuu (Member, Author) commented Aug 20, 2019

And now some results with a custom scheduler that keeps count of how many compute() calls are made in bilinear resampling (a sketch of such a counting scheduler is at the end of this comment):

  • precompute bilinear_?: 97 compute calls, 29.0 s
  • precompute all indices: 97 compute calls, 29.3 s
  • no precomputation of indices: 95 compute calls, 194.3 s

All the runs were made with OMP_NUM_THREADS=1 DASK_NUM_WORKERS=4 PYTROLL_CHUNK_SIZE=1024 using zarr as cache storage.

Things I see from these results:

  1. compared to the earlier similar test, and without any code changes, running without the explicit compute when loading the resampling LUTs is now much faster (see point 4)
  2. the version without any precomputation has the fewest computations, but is still by far the slowest
  3. it doesn't matter for the computation count whether only bilinear_{s,t} or all of the indices are precomputed, even if that is done one-by-one
  4. using a custom scheduler, even with the scheduler='threads' kwarg, ends up using the single-threaded scheduler, which seems to be faster for the "don't precompute anything" case
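For reference, one possible shape for such a compute-counting scheduler; this is an assumption on my part, not necessarily the exact scheduler used for the numbers above:

```python
import dask
import dask.array as da
import dask.threaded

compute_calls = 0


def counting_get(dsk, keys, **kwargs):
    """Count scheduler invocations, then delegate to dask's threaded scheduler."""
    global compute_calls
    compute_calls += 1
    return dask.threaded.get(dsk, keys, **kwargs)


# Route every compute()/persist() through the counting scheduler for a test run.
with dask.config.set(scheduler=counting_get):
    # Stand-in for the actual resampling run.
    da.ones((4, 4), chunks=2).sum().compute()

print("compute calls made:", compute_calls)
```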

@coveralls commented Aug 22, 2019

Coverage Status

Coverage increased (+0.03%) to 84.63% when pulling 365d761 on pnuu:feature-zarr-resample-luts into d90e7f5 on pytroll:master.

@codecov bot commented Sep 3, 2019

Codecov Report

Merging #880 into master will increase coverage by 0.03%.
The diff coverage is 87.96%.


@@            Coverage Diff             @@
##           master     #880      +/-   ##
==========================================
+ Coverage    84.6%   84.63%   +0.03%     
==========================================
  Files         169      171       +2     
  Lines       24946    25086     +140     
==========================================
+ Hits        21105    21231     +126     
- Misses       3841     3855      +14
| Impacted Files | Coverage Δ |
| --- | --- |
| satpy/tests/test_resample.py | 98.21% <100%> (-0.48%) ⬇️ |
| satpy/resample.py | 89.86% <80.3%> (-1.48%) ⬇️ |
| satpy/writers/cf_writer.py | 91.43% <0%> (-0.55%) ⬇️ |
| satpy/scene.py | 90.47% <0%> (-0.18%) ⬇️ |
| satpy/readers/mersi2_l1b.py | 96.38% <0%> (-0.09%) ⬇️ |
| satpy/tests/writer_tests/test_mitiff.py | 97.9% <0%> (-0.07%) ⬇️ |
| satpy/composites/abi.py | 100% <0%> (ø) ⬆️ |
| satpy/readers/hdf4_utils.py | 92.72% <0%> (ø) ⬆️ |
| satpy/tests/enhancement_tests/test_abi.py | 100% <0%> (ø) |
| satpy/enhancements/abi.py | 100% <0%> (ø) |
| ... and 6 more | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d90e7f5...365d761.

Review thread on this part of the diff:

```python
zarr_out[idx_name] = (coord, fid[idx_name])

# Write indices to Zarr file
zarr_out.to_zarr(fname_zarr)
```
Member:
Should the npz cache files be deleted after the conversion?

Member Author:
I was thinking about that, but was hesitant to do it (automatically) in case the user happens to have other streams still using them.

Member:
OK, fine by me.

Member:
What do you mean by "stream"? If there is an active process using the file then deleting it shouldn't be a problem (it should still be held in memory). I suppose we are assuming that two different versions of satpy may be using the same resampling cache? I guess it would be a problem then.

I'm fine with leaving them there, but I don't think it is necessary.

Member Author:
"Stream" = "processing chain". Someone might have another process running an older Satpy for whatever reason, e.g. for comparison before upgrading an operational chain to the latest versions.

@mraspaud (Member) left a review comment:

LGTM

@mraspaud (Member) commented Sep 3, 2019

@djhoese do you want to review this one?

@djhoese (Member) commented Sep 4, 2019

Short answer: I'm ok merging this as is.

Long answer: @pnuu Could you re-summarize the performance you're seeing? Is it as expected (one overall compute for to_zarr, future resamples much faster)? If the numbers don't make sense, could you provide updated information on timings and on when computations are occurring? At least for nearest neighbour resampling, since it is the default (and I'm not sure I understand everything you're doing in bilinear).

If you make your chunk size really large, how many computations occur? You mentioned ~100 compute calls, but that is because each chunk counts as a separate compute, right? I was thinking one chunk might make things easier to debug. Is the computation only during to_zarr? Is it possible for the user to force a persist of the cached arrays?

Random thought: Is it possible to make to_zarr a task in the dask chain so it isn't saved to disk until it is needed/computed?

@pnuu (Member, Author) commented Sep 4, 2019

Short summary:

  • one to_zarr call (61.9 s) vs. several calls (60.6 s) didn't matter much with nearest
  • the chunk size doesn't affect the number of compute() calls made. Not sure, but this might be due to dask forcing the single-threaded scheduler when things are wrapped in a custom scheduler; this is seen when dask profilers are used: only one thread is in use
  • for bilinear, using zarr makes it possible to use much larger areas without running out of memory
  • for bilinear, the resampling indices need to be computed; slicing with dask arrays produces huge result arrays with shapes of (target_area.size, target_area.size). The data also need to be computed (via accessing .values), as dask doesn't support fancy indexing and vindex is just slow (see the small example below)
  • for bilinear, this PR doesn't include caching the resampler, but that doesn't matter, as the same resampler (and the indices) will be used for all the channels with the same resolution, so no re-computations are needed
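To illustrate the fancy-indexing point in the list above, a toy example with made-up array names (not satpy code):

```python
import dask.array as da
import numpy as np

data = da.random.random((100, 100), chunks=50)
rows = np.array([3, 42, 7])
cols = np.array([0, 99, 17])

# dask: pointwise indexing with several index arrays has to go through .vindex,
# which is comparatively slow.
picked_lazy = data.vindex[rows, cols]

# NumPy: compute once (e.g. via .values on a DataArray), then repeated plain
# fancy indexing is cheap.
data_np = np.asarray(data)
picked_fast = data_np[rows, cols]
```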

@mraspaud (Member) commented Sep 4, 2019

OK, so the speed increase isn't the big deal here, it's the memory usage, right?

@pnuu (Member, Author) commented Sep 4, 2019

Indeed. And zarr should be more cluster-friendly. Not sure if the resamplers themselves are cluster-friendly, but anyway 😁

@djhoese (Member) left a review comment:

Looks good. I suppose the main thing is that this isn't any slower than before.

@pnuu pnuu merged commit ff5ed76 into pytroll:master Sep 6, 2019
@pnuu pnuu deleted the feature-zarr-resample-luts branch September 6, 2019 10:18