gher-uliege/DINCAE

DINCAE

DINCAE (Data-Interpolating Convolutional Auto-Encoder) is a neural network that reconstructs missing data in satellite observations. It is described in the following open access paper: https://doi.org/10.5194/gmd-13-1609-2020

Note that this code is no longer maintained and has been superseded by https://github.com/gher-ulg/DINCAE.jl

Installation

Python 3.6 or 3.7 with the modules netCDF4, numpy and TensorFlow 1.15.

Tested versions:

  • Python 3.6.8
  • netcdf4 1.4.2
  • numpy 1.15.4
  • TensorFlow 1.15 (DINCAE does not work with TensorFlow 2.0; TensorFlow 1.15 does not work on Python 3.8)

You can install those packages either with pip3 or with conda.
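
Before running DINCAE, it can help to confirm that the interpreter and TensorFlow versions match the constraints listed above. The following is a minimal sketch and not part of DINCAE: the helper names python_supported, tensorflow_supported and environment_problems are hypothetical, and the checks only encode what this README states (Python 3.6 or 3.7, TensorFlow 1.15 tested, TensorFlow 2.0 unsupported).

```python
import sys


def python_supported(version_info=None):
    """Return True for Python 3.6 or 3.7, the versions named in this README."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) in ((3, 6), (3, 7))


def tensorflow_supported(version):
    """Return True for a TensorFlow 1.15.x version string such as "1.15.0".

    Assumption: only the tested 1.15 series is accepted. Other 1.x releases
    are not covered by this README and are rejected here to be safe.
    """
    return version.split(".")[:2] == ["1", "15"]


def environment_problems():
    """Collect readable problems; an empty list means both checks passed."""
    problems = []
    if not python_supported():
        problems.append("Python %d.%d is not 3.6 or 3.7" % sys.version_info[:2])
    try:
        import tensorflow as tf
    except ImportError:
        problems.append("TensorFlow is not installed")
    else:
        if not tensorflow_supported(tf.__version__):
            problems.append("TensorFlow %s is not a 1.15 release" % tf.__version__)
    return problems
```

Calling environment_problems() in the target environment returns an empty list when both checks pass, and a list of messages otherwise.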

Documentation

The documentation is available at https://gher-uliege.github.io/DINCAE/.

Input format

The input data should be a netCDF file with the variables:

  • lon: longitude (degrees East)
  • lat: latitude (degrees North)
  • time: time (days since 1900-01-01 00:00:00)
  • mask: boolean mask where true means the data location is valid
  • SST (or any other variable name): the data

This is the example output from ncdump -h:

netcdf avhrr_sub_add_clouds {
dimensions:
	time = UNLIMITED ; // (5266 currently)
	lat = 112 ;
	lon = 112 ;
variables:
	double lon(lon) ;
	double lat(lat) ;
	double time(time) ;
		time:units = "days since 1900-01-01 00:00:00" ;
	int mask(lat, lon) ;
	float SST(time, lat, lon) ;
		SST:_FillValue = -9999.f ;
}

An example of how to create this file is in the examples folder.
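
To illustrate the layout listed above, the following sketch writes such a file with the netCDF4 and numpy modules. It is an illustration under stated assumptions, not the script shipped with DINCAE: the function names days_since_1900 and write_input_file are hypothetical, missing values are assumed to be given as NaN, and the script in the examples folder remains the reference.

```python
from datetime import datetime

EPOCH = datetime(1900, 1, 1)


def days_since_1900(date):
    """Convert a datetime to (fractional) days since 1900-01-01 00:00:00."""
    return (date - EPOCH).total_seconds() / 86400.0


def write_input_file(filename, lon, lat, dates, mask, sst, fill_value=-9999.0):
    """Write lon, lat, time, mask and SST in the layout described above.

    sst has shape (len(dates), len(lat), len(lon)); NaN marks missing data
    (an assumption of this sketch). mask has shape (len(lat), len(lon)).
    """
    # Imported here so the date helper above works without netCDF4 installed.
    import netCDF4
    import numpy as np

    with netCDF4.Dataset(filename, "w", format="NETCDF4") as ds:
        ds.createDimension("time", None)  # None creates an unlimited dimension
        ds.createDimension("lat", len(lat))
        ds.createDimension("lon", len(lon))

        nc_lon = ds.createVariable("lon", "f8", ("lon",))
        nc_lat = ds.createVariable("lat", "f8", ("lat",))
        nc_time = ds.createVariable("time", "f8", ("time",))
        nc_time.units = "days since 1900-01-01 00:00:00"
        nc_mask = ds.createVariable("mask", "i4", ("lat", "lon"))
        nc_sst = ds.createVariable(
            "SST", "f4", ("time", "lat", "lon"), fill_value=fill_value
        )

        nc_lon[:] = lon
        nc_lat[:] = lat
        nc_time[:] = [days_since_1900(d) for d in dates]
        nc_mask[:] = np.asarray(mask, dtype="i4")  # True/False become 1/0
        # Masked (NaN) entries are stored as the _FillValue of the variable.
        nc_sst[:] = np.ma.masked_invalid(np.asarray(sst, dtype="f4"))
```

Running ncdump -h on a file written this way should show the same dimensions, variables and attributes as the listing above, apart from the file name and sizes.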

Running DINCAE

Copy the template file run_DINCAE.py and adapt the filename, the variable name and the output directory, and possibly the optional arguments for the reconstruction method described in the documentation. The code can then be run as follows:

python3 run_DINCAE.py

The output NetCDF files contain the variables:

  • meandata: the time average of the input data, used to compute the anomalies
  • mean_rec and sigma_rec: the mean and standard deviation of the Gaussian probability distribution function of the reconstruction.
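
Because each reconstructed value is described by a Gaussian with mean mean_rec and standard deviation sigma_rec, an approximate confidence interval follows directly from the two variables. A minimal sketch (the helper name gaussian_interval is hypothetical; 1.96 is the standard normal quantile for a central 95 % interval):

```python
def gaussian_interval(mean, sigma, z=1.96):
    """Return (lower, upper) = (mean - z*sigma, mean + z*sigma).

    Works for plain floats and, element-wise, for numpy arrays such as the
    mean_rec and sigma_rec variables read from an output file.
    """
    half_width = z * sigma
    return mean - half_width, mean + half_width
```

For example, a pixel with mean_rec = 20.0 and sigma_rec = 0.5 gives bounds of 19.0 and 21.0 when z = 2.0 is used.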

In Barth et al. (2020), the best results were obtained by averaging all the output NetCDF files.
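
One possible way to carry out such an averaging is sketched below. This is an illustration under assumptions and not necessarily the procedure used in the paper: it takes a plain arithmetic mean of mean_rec over the files, does not attempt to combine sigma_rec, and the helper names read_fields and mean_of_fields are hypothetical.

```python
def read_fields(filenames, varname="mean_rec"):
    """Yield the variable `varname` from each NetCDF file, one file at a time."""
    # Imported lazily so that mean_of_fields can be used without netCDF4.
    import netCDF4

    for filename in filenames:
        with netCDF4.Dataset(filename) as ds:
            # Indexing with [:] reads the data as a (masked) numpy array.
            field = ds.variables[varname][:]
        yield field


def mean_of_fields(fields):
    """Arithmetic mean of an iterable of equally shaped arrays (or scalars).

    Fields are accumulated one by one, so only the current field and the
    running sum are held in memory.
    """
    total = None
    count = 0
    for field in fields:
        total = field if total is None else total + field
        count += 1
    if count == 0:
        raise ValueError("no fields to average")
    return total / count
```

With a list of output file names in filenames, mean_of_fields(read_fields(filenames)) returns the averaged mean_rec field. How the output files are named and located depends on the output directory chosen in run_DINCAE.py.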

Reducing GPU memory

Convolutional neural networks can require a large amount of GPU memory. The following changes can reduce GPU memory usage:

  • reduce the mini-batch size
  • use fewer layers (e.g. enc_nfilter_internal = [16,24,36] or [16,24])
  • use fewer filters (reduce the values of the optional parameter enc_nfilter_internal)
  • reduce frac_dense_layer, a parameter controlling the width of the dense layer in the bottleneck
  • use a smaller domain or lower resolution

Example results

Link to animation

More information about this result is given in the linked paper.