Sea Surface Temperature (SST) – Data Procurement, Analysis and Machine Learning (ML) Foundations

ash0545/sst-data-analysis-ml

sst-data-analysis-ml

Sea Surface Temperature (SST) – Data Procurement, Analysis and Machine Learning (ML) Foundations

Internship work done for the IP1201 course of our 3rd Semester

Reference Paper: Physical Knowledge-Enhanced Deep Neural Network for Sea Surface Temperature Prediction

Team Members:

Overview

Motivation

The primary motivation for this internship was to gain practical experience in data analysis for ML, specifically with the SST data provided by the Group for High Resolution Sea Surface Temperature (GHRSST). This README walks through the data preprocessing steps and describes the tools and methods used to ensure data quality and usability.

Tools / Libraries Used

  • podaac-data-subscriber : to access data from the Physical Oceanography Distributed Active Archive Center (PO.DAAC)
  • Panoply : to view plots and attributes of downloaded NetCDF files
  • xarray : for working with labelled multi-dimensional arrays
  • pandas : for data manipulation and analysis

Data Acquisition

  • The CMC0.2deg-CMC-L4-GLOB-v2.0 dataset was retrieved through podaac-data-subscriber

Note

The podaac-data-subscriber tool requires credentials from https://urs.earthdata.nasa.gov, which are to be stored in a .netrc file.

The exact command used for download was : podaac-data-downloader -c CMC0.2deg-CMC-L4-GLOB-v2.0 -d ./data --start-date 2007-05-01T00:00:00Z --end-date 2014-12-31T23:59:59Z
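As a minimal sketch of the credentials setup described in this note, the snippet below writes a .netrc entry in the standard `machine ... login ... password ...` form and reads it back with Python's standard-library `netrc` module. The helper name `read_earthdata_login`, the temporary file location, and the `demo_user` / `demo_password` values are placeholders for illustration only; real credentials would normally sit in the .netrc file in the home directory.

```python
import netrc
import os
import tempfile

EARTHDATA_HOST = "urs.earthdata.nasa.gov"


def read_earthdata_login(netrc_path):
    """Return (login, password) for the Earthdata host, or None if no entry exists."""
    auth = netrc.netrc(netrc_path).authenticators(EARTHDATA_HOST)
    if auth is None:
        return None
    login, _account, password = auth
    return login, password


# Demo against a throwaway file with placeholder credentials (not real ones).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, ".netrc")
    with open(demo_path, "w") as fh:
        fh.write(f"machine {EARTHDATA_HOST} login demo_user password demo_password\n")
    os.chmod(demo_path, 0o600)  # keep the file private to the current user
    demo_credentials = read_earthdata_login(demo_path)

print(demo_credentials)
```

Running a check like this before starting a long download can catch a missing or misspelled host entry early.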

  • This data contained SST records spanning the entire globe from May 2007 to December 2014
  • The data was present as netCDF files, with 2803 files having a total size of 5.56 GB

Data Preprocessing

  • The data had the following attributes:

    • analysed_sst : SST values, defined at all grid points
    • analysis_error : estimated error standard deviation of analysed SST
    • lat and lon
    • mask : sea / land / lake / ice field composite mask
    • sea_ice_fraction : sea ice area fraction
    • time : reference time of SST field
  • As no physical meaning is ascribed to values over land, they were masked out using the mask attribute and xarray

  • The resulting masked data was then sliced with a bounding box over India, using xarray's .sel() method

  • Data compression was applied using the zlib Python library

  • The final processed data was ~170 MB, down from the initial 5.56 GB, a reduction of more than 30X
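The preprocessing steps above can be sketched on a small synthetic grid as follows. This is an illustration under stated assumptions, not the exact code used in the project: the mask flag values (1 for water, 2 for land), the bounding box (5 to 40 degrees north, 65 to 100 degrees east), the coarse 5-degree grid, and the SST values are all made up for the example. It also assumes latitude is stored in ascending order; with descending coordinates the slice bounds would need to be reversed.

```python
import zlib

import numpy as np
import xarray as xr

# Synthetic stand-in for one daily granule on a coarse 5-degree grid.
lat = np.arange(0.0, 50.0, 5.0)    # 0, 5, ..., 45
lon = np.arange(60.0, 110.0, 5.0)  # 60, 65, ..., 105
lat2d, lon2d = np.meshgrid(lat, lon, indexing="ij")

# Assumed flag values for the mask variable: 1 = water, 2 = land.
WATER, LAND = 1, 2
is_land = (lat2d >= 20) & (lon2d >= 70) & (lon2d <= 90)
mask = np.where(is_land, LAND, WATER).astype("int8")
sst = (303.0 - 0.1 * lat2d).astype("float32")  # kelvin, made-up values

ds = xr.Dataset(
    {
        "analysed_sst": (("time", "lat", "lon"), sst[np.newaxis]),
        "mask": (("time", "lat", "lon"), mask[np.newaxis]),
    },
    coords={
        "time": np.array(["2007-05-01T12:00"], dtype="datetime64[ns]"),
        "lat": lat,
        "lon": lon,
    },
)

# Step 1: keep SST only where the mask marks water; other cells become NaN.
sea_only = ds["analysed_sst"].where(ds["mask"] == WATER)

# Step 2: label-based bounding box (hypothetical box, both bounds inclusive).
india = sea_only.sel(lat=slice(5, 40), lon=slice(65, 100))

# Step 3: lossless compression of the raw values with zlib, then a round trip.
raw = india.values.astype("float32").tobytes()
packed = zlib.compress(raw, level=9)
restored = np.frombuffer(zlib.decompress(packed), dtype="float32").reshape(india.shape)

print(india.shape, len(raw), len(packed))
```

Most of the size reduction in a pipeline like this comes from the spatial slice, since the bounding box keeps only a small part of the global grid; the zlib step then shrinks what remains without losing any values.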

ML Foundations

  • Our reference paper used three models:

    • Generative Adversarial Network (GAN)
    • Auto-Encoder
    • ConvLSTM

    We decided to focus on the ConvLSTM model, as the other two were used to enhance the data, and we wanted to get hands-on with the training process as soon as possible.

  • A previous implementation of ConvLSTM was used

  • The entire dataset was merged into a single xarray DataArray, to be fed into the model

  • Our final work for this internship was learning how to reshape data to a form expected by the model (as provided in the model's documentation)
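The merge and reshape steps can be sketched as below. The target layout is an assumption: the README does not say which ConvLSTM implementation was used, so the example targets the five-dimensional (samples, time steps, rows, columns, channels) layout that ConvLSTM layers such as Keras's `ConvLSTM2D` accept in channels-last mode. The helper `to_sequences`, the window length of 3, and the tiny 4 x 5 grid are hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr
from numpy.lib.stride_tricks import sliding_window_view

# Ten synthetic "daily files", each holding one time step on a 4 x 5 grid.
times = pd.date_range("2007-05-01", periods=10, freq="D")
daily = [
    xr.DataArray(
        np.full((1, 4, 5), float(i), dtype="float32"),
        dims=("time", "lat", "lon"),
        coords={"time": times[i : i + 1]},
        name="analysed_sst",
    )
    for i in range(len(times))
]

# Merge everything into a single DataArray along the time dimension.
sst = xr.concat(daily, dim="time")  # shape (10, 4, 5)


def to_sequences(array, seq_len):
    """Turn (time, rows, cols) into overlapping (samples, seq_len, rows, cols, 1) windows."""
    windows = sliding_window_view(array, window_shape=seq_len, axis=0)  # (n, rows, cols, seq_len)
    windows = np.moveaxis(windows, -1, 1)  # (n, seq_len, rows, cols)
    return windows[..., np.newaxis]  # trailing channel axis of size 1


samples = to_sequences(sst.values, seq_len=3)
print(sst.shape, samples.shape)
```

A different implementation may expect channels first or a different window length, so the shape should be checked against that model's documentation before training.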

Helpful Links
