Internship work done for the IP1201 course of our 3rd Semester
Reference Paper: Physical Knowledge-Enhanced Deep Neural Network for Sea Surface Temperature Prediction
- Ashwin Santhosh (CS22B1005, GitHub: ash0545)
- Aswin Valsaraj (CS22B1006, GitHub: aswinn03)
- Kaustub Pavagada (CS22B1042, GitHub: Kaustub26Pvgda)
The primary motivation for this internship was to gain practical experience in Data Analysis for ML, specifically with the SST data provided by the Group for High Resolution Sea Surface Temperature (GHRSST). The data preprocessing steps are elaborated below, with insights into the tools and methods used to ensure data quality and usability.
- podaac-data-subscriber : to access data from the Physical Oceanography Distributed Active Archive Center (PODAAC)
- Panoply : to view plots and attributes of downloaded NetCDF files
- xarray : for working with labelled multi-dimensional arrays
- pandas : for data manipulation and analysis
- The CMC0.2deg-CMC-L4-GLOB-v2.0 dataset was retrieved through podaac-data-subscriber
Note
The podaac-data-subscriber tool requires credentials from https://urs.earthdata.nasa.gov, which are to be stored in a `.netrc` file.
The exact command used for the download was: `podaac-data-downloader -c CMC0.2deg-CMC-L4-GLOB-v2.0 -d ./data --start-date 2007-05-01T00:00:00Z --end-date 2014-12-31T23:59:59Z`
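For reference, a minimal `.netrc` entry for Earthdata Login looks like the following (placeholder credentials, of course):

```
machine urs.earthdata.nasa.gov
    login <your_earthdata_username>
    password <your_earthdata_password>
```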
- This data contained SST records spanning the entire globe from May 2007 to December 2014
- The data came as 2803 NetCDF files, with a total size of 5.56 GB
- The data had the following attributes (a quick way to inspect them is sketched after this list):
  - analysed_sst : SST values, defined at all grid points
  - analysis_error : estimated error standard deviation of analysed SST
  - lat / lon : latitude and longitude grid coordinates
  - mask : sea / land / lake / ice field composite mask
  - sea_ice_fraction : sea ice area fraction
  - time : reference time of SST field
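A minimal way to inspect these variables with xarray (the filename below is hypothetical; actual files follow the GHRSST naming convention):

```python
import xarray as xr

# hypothetical path; real filenames follow the GHRSST convention
ds = xr.open_dataset("data/20070501120000-CMC-L4_GHRSST-SSTfnd-CMC0.2deg-GLOB-v02.0-fv02.0.nc")
print(ds.data_vars)  # analysed_sst, analysis_error, mask, sea_ice_fraction
print(ds.coords)     # lat, lon, time
```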
- As no physical meaning is ascribed to values over land, they were masked out using the mask attribute and xarray
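A sketch of the masking step, continuing from the inspection snippet above. We assume here that the flag value 1 marks open sea; the mask variable's flag_values / flag_meanings attributes should be checked to confirm this:

```python
# keep only open-sea grid points; everything else becomes NaN
# (assumes flag value 1 == sea; verify against the mask's flag_meanings)
sst_sea = ds["analysed_sst"].where(ds["mask"] == 1)
```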
- The resulting masked data was then sliced with a bounding box over India, using xarray's .sel() method
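For example, continuing from the previous snippet with illustrative (not our exact) bounding-box coordinates:

```python
# illustrative bounding box roughly covering India and the surrounding seas
sst_india = sst_sea.sel(lat=slice(0.0, 30.0), lon=slice(60.0, 100.0))
```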
- Data compression was applied using zlib
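One way to wire this up (an assumption about the exact mechanism) is through xarray's NetCDF encoding options, which apply zlib deflate compression per variable on write:

```python
# enable zlib deflate compression for the SST variable when writing out
sst_india.to_dataset(name="analysed_sst").to_netcdf(
    "sst_india_compressed.nc",
    encoding={"analysed_sst": {"zlib": True, "complevel": 4}},
)
```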
- The final processed data was ~170 MB, down from the initial 5.56 GB: a reduction of more than 30X
- Our reference paper consisted of three models:
  - Generative Adversarial Network (GAN)
  - Auto-Encoder
  - ConvLSTM
We decided to focus on the ConvLSTM model, as the other two were used to enhance the data, and we wanted to get hands-on with the training process as soon as possible
- A pre-existing implementation of ConvLSTM was used (linked under the references below)
- The entire dataset was merged into a single xarray DataArray, to be fed into the model, as sketched below
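A sketch of the merge, assuming the processed per-day files live in a processed/ directory (a hypothetical path):

```python
import glob

import xarray as xr

# lazily concatenate the daily files along the time dimension
ds_all = xr.open_mfdataset(sorted(glob.glob("processed/*.nc")), combine="by_coords")
sst = ds_all["analysed_sst"]  # DataArray with dims (time, lat, lon)
```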
- Our final work for this internship was learning how to reshape the data into the form expected by the model (as specified in the model's documentation); a sketch of one such reshape follows
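Per the linked implementation's documentation, the model takes 5-D input; with batch_first=True the expected shape is (batch, time, channels, height, width). A sketch of building such sequences from the merged DataArray above, with a hypothetical window length:

```python
import numpy as np
import torch

arr = np.nan_to_num(sst.values)  # (time, lat, lon); NaNs over land filled with 0 (an assumption)
window = 8                       # hypothetical input sequence length

# sliding windows over time -> (num_sequences, window, H, W)
seqs = np.stack([arr[i:i + window] for i in range(arr.shape[0] - window)])

# add the channel dimension: SST is a single scalar field, so channels == 1
x = torch.from_numpy(seqs).float().unsqueeze(2)  # (batch, time, 1, H, W)
```

Since SST is a single scalar field per grid point, the channel dimension is 1 (see the channels reference below).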
- Data Access : https://www.ncei.noaa.gov/access/ghrsst-long-term-stewardship-and-reanalysis-facility/
- Bounding over Certain Regions : https://medium.com/analytics-vidhya/how-to-read-and-visualize-netcdf-nc-geospatial-files-using-python-6c2ac8907c7c
- NetCDF Compression : https://stackoverflow.com/questions/49053692/csv-to-netcdf-produces-nc-files-4x-larger-than-the-original-csv
- ConvLSTM Implementation : /~https://github.com/ndrplz/ConvLSTM_pytorch
- Channels in CNNs : https://datascience.stackexchange.com/questions/64278/what-is-a-channel-in-a-cnn