From eb556ef41ebc270b942d430b3bb93963e047ef5a Mon Sep 17 00:00:00 2001 From: Eli Holmes Date: Wed, 21 Aug 2024 00:48:51 +0000 Subject: [PATCH] edits --- book/notebooks/CHL_prediction_CNN.ipynb | 3 +- book/notebooks/CHL_prediction_ConvLSTM_.ipynb | 2 +- book/notebooks/IO_Zarr.md | 33 +++++++---- book/notebooks/background.md | 55 ++++++++----------- 4 files changed, 47 insertions(+), 46 deletions(-) diff --git a/book/notebooks/CHL_prediction_CNN.ipynb b/book/notebooks/CHL_prediction_CNN.ipynb index 464811d..7242ee6 100644 --- a/book/notebooks/CHL_prediction_CNN.ipynb +++ b/book/notebooks/CHL_prediction_CNN.ipynb @@ -6,7 +6,8 @@ "source": [ "# Gap-filling with CNN\n", "\n", - "**Author:** Yifei Hang (UW)\n", + "**Author:** Yifei Hang (UW Varanasi intern 2024)\n", + "\n", "\n", "This notebook shows how to fit a basic Convolutional Neural Network for filling the gaps in the Chlorophyll-a data. Although you can run this tutorial on CPU, it will be much faster on GPU. We used the image `quay.io/pangeo/ml-notebook:2024.08.18` for running the notebook." ] diff --git a/book/notebooks/CHL_prediction_ConvLSTM_.ipynb b/book/notebooks/CHL_prediction_ConvLSTM_.ipynb index 0dc5e48..8ff5bed 100644 --- a/book/notebooks/CHL_prediction_ConvLSTM_.ipynb +++ b/book/notebooks/CHL_prediction_ConvLSTM_.ipynb @@ -6,7 +6,7 @@ "source": [ "# Gap-filling with ConvLSTM\n", "\n", - "**Author:** Yifei Hang (UW), Jiarui Yu (UW)\n", + "**Author:** Yifei Hang (UW Varanasi intern 2024), Jiarui Yu (UW Varanasi intern 2023)\n", "\n", "This notebook shows how to fit a basic ConvLSTM for filling the gaps in the Chlorophyll-a data. Although you can run this tutorial on CPU, it will be much faster on GPU. We used the image `quay.io/pangeo/ml-notebook:2024.08.18` for running the notebook." 
] diff --git a/book/notebooks/IO_Zarr.md b/book/notebooks/IO_Zarr.md index d088386..9e59db2 100644 --- a/book/notebooks/IO_Zarr.md +++ b/book/notebooks/IO_Zarr.md @@ -1,26 +1,31 @@ # Indian Ocean Dataset +**Author:** Minh Phan (UW Varanasi intern 2023), Eli Holmes (NOAA/UW) + Our Indian Ocean zarr dataset `INDIAN_OCEAN_025GRID_DAILY.zarr` or `IO.zarr` is a 1972-2022 blended dataset for the Arabian Sea and Bay of Bengal formated as a `.zarr` file, containing daily cleaned and interpolated data from variables across multiple sources, mostly from processed NASA/NOAA and Copernicus collections and the ERA5 reanalysis products. ### Variables -* `adt`: sea surface height above geoid (m) +* `adt`: sea surface height above geoid (m) (SL_TAC) * `air_temp`: air temperature at 2 meters above the surface (K), from 1979 (ERA5) -* `mlotst`: mean ocean mixed layer thickness (m) -* `sla`: sea level anomaly (m) -* `so`: sea salinity concentration (m**-3 or PSL) +* `mlotst`: mean ocean mixed layer thickness (m) (GLORY) +* `sla`: sea level anomaly (m) (SL_TAC) +* `so`: sea water salinity (PSU) (GLORY) * `sst`: sea surface temperature (K), from 1979 (ERA5) -* `topo`: topography (m) (USGS) -* `u_curr`: u-component of total surface currents (m/s) -* `v_curr`: v-component of total surface currents (m/s) -* `ug_curr`: u-component of geostrophic surface currents (m/s) +* `topo`: topography (m) (SRTM30+) +* `u_curr`: u-component of total surface currents (m/s) (OSCAR) +* `v_curr`: v-component of total surface currents (m/s) (OSCAR) +* `ug_curr`: u-component of geostrophic surface currents (m/s) (OSCAR) * `vg_curr`: v-component of geostrophic surface currents (m/s) * `u_wind`: u-component of surface wind (m/s), from 1979 (ERA5) * `v_wind`: v-component of surface wind (m/s), from 1979 (ERA5) -* `curr_speed`: total current speed (m/s) -* `curr_dir`: total current direction (degrees) -* `wind_speed`: surface wind speed (m/s), computed from ERA5, from 1979 -* `wind_dir`: surface
wind direction (degrees), computed from ERA5, from 1979 + +### Derived variables + +* `curr_speed`: total current speed (m/s), computed from `u_curr` and `v_curr` +* `curr_dir`: total current direction (degrees) (OSCAR), computed from `u_curr` and `v_curr` +* `wind_speed`: surface wind speed (m/s), computed from `u_wind` and `v_wind` +* `wind_dir`: surface wind direction (degrees), computed from `u_wind` and `v_wind` ### Chlorophyll variables @@ -44,5 +49,9 @@ All variables have been broadcasted to fit into the temporal range we have. Ther * ERA5: These are hourly data that have been averaged to daily data, with the addition of some additional hourly wind layers. * GlobColour: CHL from the [GlobColour project](https://www.globcolour.info/) and accessed from Copernicus. There are two products. A Level 4 gap-filled product which is derived from a gappy Level 3 multi-sensor product. Gappy means still has cloud (etc) NaNs. +* GLORY: Global Ocean Physics Reanalysis (product code name GLORYS12V1) from Copernicus Marine Environment Monitoring Service (CMEMS). +* OSCAR: Ocean Surface Current Analyses Real-time (OSCAR) Version 2.0. +* SL_TAC: Sea Level TAC product Global Ocean Gridded L4 Sea Surface Heights And Derived Variables Reprocessed 1993 Ongoing. +* SRTM30+: SRTM30+ Global 1-km Digital Elevation Model Version 11. * CCI: Ocean Color CCI product that merges multiple sensors. * DINEOF: NOAA MSL12 Ocean Color, science quality, VIIRS multi-sensor (SNPP + NOAA-20), chlorophyll DINEOF gap-filled analysis diff --git a/book/notebooks/background.md b/book/notebooks/background.md index 93ac90d..b7cf3c9 100644 --- a/book/notebooks/background.md +++ b/book/notebooks/background.md @@ -1,32 +1,40 @@ # Background +**Author:** Minh Phan (UW Varanasi intern 2023) + +This dataset is a 1972-2022 daily data cube with a spatial range of 12°S to 32°N, 42°E to 102°E.
Where necessary, we applied linear interpolation, both spatially and temporally, to all data variables so that they share the same daily-mean time step and 0.25° × 0.25° spatial grid. + +The dataset is in Zarr format. Zarr, short for “zarr array,” is a storage format specifically designed for efficient, scalable, and parallelizable access to multi-dimensional typed arrays (tensors), making it an ideal choice for managing Earth observation data, especially cloud-hosted data (Moore et al., 2023). Zarr is developed as an open-source project. It stores arrays as compressed binary blocks called “chunks” in structured directories, alongside metadata that describes and references them; this chunked, compressed layout optimizes storage and retrieval and reduces access latency. + +See the [documentation](https://safs-varanasi-internship.github.io/indian-ocean-zarr/) on how the Zarr dataset was created. + ## Data Sources Most of our remote sensing data are sourced from the Copernicus program. Copernicus program is an European flagship program providing reliable and open satellite-based imagery, models, and in situ (nonspace) data, and is a coordinated effort between many organizations, including the European Commission, the European Space Agency, the European Centre for Medium-Range Weather Forecasts (ECMWF), and European Union Agencies (Skoda & Adam, 2020). Furthermore, we also blended data from the National Aeronautics and Space Administration (NASA) EarthData and the National Centers for Environmental Information (NCEI)’s databases. ### ERA5 -The primary data source we used in this assembled dataset product is the Copernicus ERA5 Global Reanalysis, the fifth generation of an atmospheric reanalysis project from Copernicus and ECMWF (Hersbach et al., 2020). As an ongoing project, once completed, it will embody a “detailed record of the global atmosphere, land surface, and ocean waves from 1950 onwards” (Hersbach et al., 2020).
However, only 1979 data onwards is currently publicly available for download. The dataset’s high temporal and spatial resolution and ranges enables consistent, detailed, and concise detection and prediction tasks in our project. Comparing its performance to another popular reanalysis product, MERRA-2 by NASA, it overpowers the other in all aspects: resolution, time coverage, and accuracy (Olauson, 2018). Given its powerful capabilities, We collected five variables from this source, namely sea surface temperature, two-meter-high from surface level atmospheric temperature, and horizontal and vertical surface wind velocities. All of them are currently accessible using Amazon Simple Storage Service (S3) and updated regularly (Hersbach et al., 2020). +The primary data source for this assembled dataset is the Copernicus ERA5 Global Reanalysis, the fifth generation of the atmospheric reanalysis produced by Copernicus and ECMWF (Hersbach et al., 2020). ERA5 aims to provide a “detailed record of the global atmosphere, land surface, and ocean waves from 1950 onwards” (Hersbach et al., 2020). However, only data from 1979 onwards is currently publicly available for download. ERA5’s high temporal and spatial resolution and long record support consistent, detailed detection and prediction tasks in our project. We compared ERA5 with another popular reanalysis product, NASA’s MERRA-2, and chose ERA5 because it has better resolution, time coverage, and accuracy (Olauson, 2018). We used the following ERA5 variables: sea surface temperature (`sst`), air temperature 2 m above the surface (`air_temp`), and the u- and v-components of surface wind (`u_wind` and `v_wind`). #### Sea surface temperature -The temperature of the ocean’s surface, known as Sea Sur- face Temperature (SST), serves as a crucial gauge for assessing the Earth’s climate system (Reynolds et al., 2002).
Consequently, having precise information about SST is vital for monitoring, researching, and forecasting climate patterns. This also rings true in the case of coastal upwelling, as seasonally variable low SST compared to the average temperature at the same latitude may partially indicate an upwelling zone (Benazzouz et al., 2014; Alvarez et al., 2010; Izumo et al., 2008). +The temperature of the ocean’s surface, known as Sea Surface Temperature (SST), serves as a crucial gauge for assessing the Earth’s climate system (Reynolds et al., 2002). Consequently, having precise information about SST is vital for monitoring, researching, and forecasting climate patterns. SST is also a direct indicator of coastal upwelling: seasonally low SST near the coast, compared with offshore SST at the same latitude, indicates upwelling (Benazzouz et al., 2014; Alvarez et al., 2010; Izumo et al., 2008), because cold deep water is drawn up to the surface. #### Atmospheric temperature -Our choice to include atmospheric temperature was more or less a secondary addition. It does not have a instant, direct correlation on upwelling as SST; however, researches showed that its strength may be influenced by preceding air temperature records in the preceding seasons before the upwelling season (Sun et al., 2022). This complements our project, especially on prediction tasks. +Air temperature is not as directly linked to upwelling as SST is; however, research has shown that upwelling strength may be influenced by air temperatures in the seasons preceding the upwelling season (Sun et al., 2022). #### Vertical and horizontal components of the surface wind -One of the prominent coastal upwelling characteristics is a parallel wind direction along the coast (Lill, 1978). Strong winds can also cool the ocean surface, promoting the conditions and occurrence of upwelling (Kim et al., 2023).
Longshore surface wind is also a major factor in mass transport and upwelling intensity, especially in the case of wind stress (Nigam et al., 2018). In their paper, Nigam et al. (2018) mentioned a formula, τy = ρd|W|v, addressing the relationship between the meridional wind stress (τy ), wind speed (W ), and v (meridional wind component, or more informally known as the vertical component of the wind). +A prominent characteristic of coastal upwelling is wind blowing parallel to the coast (Lill, 1978). Strong winds can also cool the ocean surface, promoting the conditions and occurrence of upwelling (Kim et al., 2023). Longshore surface wind is also a major factor in mass transport and upwelling intensity, especially through wind stress (Nigam et al., 2018). -### Global Ocean Physics Reanalysis +### Global Ocean Physics Reanalysis (GLORY) -The Global Ocean Physics Reanalysis (product code name GLORYS12V1) is the first version of a Copernicus Marine Environment Monitoring Service (CMEMS)’s reanalysis product covering an 5000m elevation range from 1993, with models used for reanalysis simi- lar to ERA5’s (European Union-Copernicus Marine Service, 2018). It covers a wide range of variables such as sea surface temperature, salinity, or mixed layer thickness that is relevant to our project. The latter two are extracted from this data source into our composite dataset, whose data is resampled and interpolated using arithmetic mean to fit in our collection resolution from the original 1/12◦ horizontal resolution (Jean-Michel et al., 2021). +The Global Ocean Physics Reanalysis (product code name GLORYS12V1) is the first version of a Copernicus Marine Environment Monitoring Service (CMEMS) reanalysis product, covering a 5000 m depth range from 1993 onwards and using reanalysis models similar to ERA5’s (European Union-Copernicus Marine Service, 2018). It covers a wide range of variables relevant to our project.
Salinity (`so`) and mean mixed layer thickness (`mlotst`) were extracted from this data source. The GLORY data were down-sampled by arithmetic averaging from the original 1/12° resolution to our 0.25° resolution (Jean-Michel et al., 2021). #### Salinity -We used in-situ salinity covered at the most shallow point on the elevation range at 0.49 meters below the surface with bias reduced using 3D-VAR scheme correction (Jean- Michel et al., 2021). During coastal upwelling in the West Indian Ocean, subsurface water, which is more salined, rises up to the surface, bringing additional salinity to the surface water which is diluted by heavy precipitation in the monsoon season (Awo et al., 2022; Sreenath et al., 2022). Note that this is not always the case. For example, the Coast of Bengal, where river discharges and rainfall combined created a thick barrier and shallow mixed layer, preventing salinity to reach the surface (Vinayachandran et al., 2002; Lahiri and Vissa, 2022). Despite that, coastal upwelling still occurs in the area with the aid from the impact of seasonal monsoon (Ray et al., 2022). Note that the Coast of Bengal is not covered in our region of interest, so any concerns about the outcome of this variable is to not confused with the inherit nature of the area. +We used in-situ salinity at the shallowest depth level, 0.49 m below the surface, with bias reduced by a 3D-VAR correction scheme (Jean-Michel et al., 2021). During coastal upwelling in the West Indian Ocean, saltier subsurface water rises to the surface and raises the salinity of surface water that has been diluted by heavy monsoon precipitation (Awo et al., 2022; Sreenath et al., 2022). Note that this is not always the case.
For example, in the Bay of Bengal, river discharge and rainfall together create a thick barrier layer and a shallow mixed layer that prevent the saltier subsurface water from reaching the surface (Vinayachandran et al., 2002; Lahiri and Vissa, 2022). Nevertheless, coastal upwelling still occurs in this area, aided by the seasonal monsoon (Ray et al., 2022). #### Mixed Layer Thickness Defined by Sigma T @@ -34,19 +42,19 @@ During our search of possible variables to add in our blended dataset product, w ### Global Ocean Colour (Copernicus-GlobColour) -The Ocean Colour Thematic Assembly Centre (OCTAC) currently provide global and regional high quality data products used by mostly intergovermnetal bodies and EU insti- tuitons, focusing on mostly ecosystem model assimilation and validation (European Union- Copernicus Marine Service, 2022). The Global Ocean Colour dataset (code name OCEAN-COLOUR GLO BGC L4 MY 009 104), based on data validated using the GlobColour processor owned by Copernicus, output daily and monthly data on a 4km × 4km spatial resolution covering data from September 1997. This wide temporal extent provide our machine learning tasks with plentiful data to train and validate with, and better cover the exten- sive range that the ERA5 variables do comparing to comparable datasets such as NASA’s MODIS-Aqua (NASA Ocean Biology Processing Group, 2015). +The Ocean Colour Thematic Assembly Centre (OCTAC) provides high-quality global and regional data products, used mostly by intergovernmental bodies and EU institutions, with a focus on ecosystem model assimilation and validation (European Union-Copernicus Marine Service, 2022). The Global Ocean Colour dataset (code name `OCEANCOLOUR_GLO_BGC_L4_MY_009_104`), based on data validated with the Copernicus GlobColour processor, provides daily and monthly data at 4 km × 4 km spatial resolution starting in September 1997.
#### Gapfree chlorophyll-a concentration and uncertainty (Level 4) -When upwelling happens, we can also observe an increase in nutrient-rich near-surface waters (Benazzouz et al., 2014), in which wind (convective) mixing and upwad nutrient fluxes to the subsurface zone leads to phyto- plankton bloom and chlorophyll-a production (Lahiri and Vissa, 2022; Brock et al., 1991). Cold waters being mixed rise above the thermocline to the surface, promoting the growth of species in unfavorable environments, which also contains chlorophyll-a (Alvarez et al., 2010). Therefore, high chlorophyll-a concentrations at the sea surface level can imply whether up- welling is happening. However, based on empirical data processing, we noticed that there is a lot of missing data, which is also confirmed in Park et al. (2020)’s paper. Many factors are weighed in, including phytoplankton’ photosyntehtic parameters, seawater optical complexity, or flog and clouds peristence due to seasonal monsoon, leading to rain. S. Yu et al. (2022)’s dataset, while addressed this issue, does not issue a daily resolution dataset that we need to incorporate into our product. +When upwelling happens, we can also observe an increase in nutrient-rich near-surface waters (Benazzouz et al., 2014), where wind (convective) mixing and upward nutrient fluxes into the subsurface zone lead to phytoplankton blooms and chlorophyll-a production (Lahiri and Vissa, 2022; Brock et al., 1991). As cold water is mixed upward across the thermocline to the surface, it promotes the growth of chlorophyll-a-containing species in otherwise unfavorable environments (Alvarez et al., 2010). Therefore, high chlorophyll-a concentrations at the sea surface can indicate that upwelling is occurring. However, this 'gap-free' product still contains missing data, as also reported by Park et al. (2020).
Many factors contribute, including phytoplankton photosynthetic parameters, seawater optical complexity, and the persistence of fog and clouds during the rainy monsoon season. The dataset of S. Yu et al. (2022) addresses this issue but is not available at the daily resolution our product requires. ### Global Ocean Gridded L4 Sea Surface Heights And Derived Variables Reprocessed 1993 Ongoing -The dataset (code name SEALEVEL GLOB PHY L4 MY 008 047) is part of the Sea Level Altimeter product family, providing multiyear records of sea surface height anomalies and derived variables for the whole global ocean (European Union-Copernicus Marine Ser- vice, 2021). It has a 0.25◦ × 0.25◦ spatial resolution, similar to the standard ERA5’s that we based on, and covered an impressive temporal range from 1993 to 2022. +The dataset (code name `SEALEVEL_GLO_PHY_L4_MY_008_047`) is part of the Sea Level Altimeter product family, providing multiyear records of sea surface height anomalies and derived variables for the whole global ocean (European Union-Copernicus Marine Service, 2021). It has a 0.25° × 0.25° spatial resolution, the same as the standard ERA5 grid our product is based on, and covers 1993 to 2022. DOI: https://doi.org/10.48670/moi-00148 #### Sea surface height above geoid and sea surface height above level (sea level height anomaly) -Wind components may be a good starting point to investigate the state of coastal upwelling, but comparing to sea surface height anomaly, the latter is more directly involved, through changes in the the thermocline and isothermal layer depth changes (Zhang and Mochizuki, 2022; L. Yu, 2003). We have been searching to no avail for public D20 (20◦ isothermal layer depth) or D20 anomaly dataset that satisfies our resolution and coverage requirements. Zhang and Mochizuki (2022) calculated this variable using monthly ocean temp data, but no specific formulae/method is disclosed.
We resort to sea surface height anomalies data as the variable is somewhat related to the former variable itself, albeit not completely linear due to complex involvements of other variables in our blended data product, like salinity and temperature (L. Yu, 2003). +Wind components may be a good starting point for investigating coastal upwelling, but sea surface height anomaly is more directly linked to it through changes in thermocline and isothermal layer depth (Zhang and Mochizuki, 2022; L. Yu, 2003). We have not been able to find a public D20 (depth of the 20 °C isotherm) or D20 anomaly dataset that satisfies our resolution and coverage requirements. Zhang and Mochizuki (2022) calculated this variable from monthly ocean temperature data but did not disclose their method. We therefore use sea surface height anomaly, which is related to D20, although not linearly, because other variables in our blended data product, such as salinity and temperature, also affect it (L. Yu, 2003). ### Ocean Surface Current Analyses Real-time (OSCAR) Version 2.0 @@ -58,37 +66,20 @@ There are many papers discussing the correlation of currents, especially surface ### SRTM30+ Global 1-km Digital Elevation Model Version 11: Bathymetry -A product from the Scripts Institution of Oceanography of the University of California San Diego, the SRTM30 Plus’s bathymetry data is based on a satellite-gravity model with a heavily calibrated gravity-to-topography ratio from over two hundred millions soundings (Becker et al., 2009). Given the ultrahigh resolution and precision of the map, we subsetted and provided two bathymetry maps covering our region of interest with different resolu- tions, one with standard 0.25◦ × 0.25◦, and another one with finer resolution using as a basemap/map background due to any graphing of the variables.
The former one is included in our final assembled product, and therefore easier to extract and use within the scope of the dataset (which means, we can extract the bathymetry value at a predetermined point right away without having to interpolate from an outside source import). +A product of the Scripps Institution of Oceanography at the University of California San Diego, the SRTM30 Plus bathymetry is based on a satellite-gravity model whose gravity-to-topography ratio is carefully calibrated against over two hundred million soundings (Becker et al., 2009). Given the very high resolution and precision of this map, we subsetted it to our region of interest at two resolutions: a standard 0.25° × 0.25° grid included in the Zarr files, and a finer-resolution map for use as a basemap when plotting the variables. #### Bathymetry -There have been multiple studies on the relationship between coastal up- welling, such as Garvine (1973) or Lill (1978). They proposed that the ocean depth (and inherently the ocean floor shape or bathymetry) can determine the motion of the subsurface return flow, one of the two principle layers of the upwelling motion of homogeneous water. The topographic variation also influences the water circulation, such as disrupting or redi- recting flows along coasts, weakening them, or increasing their strength to enhance mixing (Pitcher et al., 2010). - -## Zarr Format - -Zarr, short for “zarr array,” is a storage format specifically designed for efficient, scalable, and parallelizable access to multi-dimensional typed arrays (tensors), making it an ideal choice for managing Earth observation data, especially for cloud-hosted data (cite gowen (Moore et al., 2023).
It is developed as an open-source project by using referencible associated metadata and binary data called “chunks” stored in “formatted” directories, it leverages modern data storage technologies, such as chunked and compressed arrays, to optimize storage and retrieval, and reducing access latency. - -Gobet and Lane (2012), in their “Encyclopedia of the Sciences of Learning,” referred to ”chunk” as a high-level, “meaningful unit of information built from smaller pieces of in- formation.” Therefore, chunking is the mechanism, or the process of creating those objects. While Miller (1956) introduced the term in his 1956 paper as a way of “breaking up long strings of information into units”, it is not until de Groot, in his study of chess experts and observed this phenomenon in nature, developed the concept as a mean of transforming information after observing their capabilities to retain precise brief information on presented chess positions (Frey & Adesman, 1976). Further research explores the concept as a mea- surement of human cognitive system, and how chunking can be interpreted as an automatic learning processing to “recode information in a more efficient way” (Gobet & Lane, 2012). This definition can somewhat explain the overall concept of chunking in informatics (the field of information), given how it essential divides large datasets into manageable blocks (comparable to “recode information” or more technically speaking, “encoding”). In the Xar- ray library we mostly used in this project, chunking, by default, is mostly handled by the underlying Dask arrays, where NumPy (or NumPy-like) arrays are broken down and rear- ranged to speed up certain algorithms (Hoyer et al., 2022; Dask Development Team, 2016). Resulted chunks then are mapped to metadata files in JSON (JavaScript Object Notation) so that the overall structure and content description can be read quickly (Vance et al., 2019; Miles et al., 2023). 
- -Therefore, understanding chunking, as a mechanism, is crucial in understanding how Zarr works so that it can provide seamless access to very large datasets, often in the terabyte and petabyte range. Additionally, its support for chunked storage, data compression over deduplication, and lightweight metadata management optimizes storage, enables parallel processing, and minimizes access latency, all of which are critical when dealing with extensive datasets which we will incorporate from the above data sources (Miles et al., 2023). - -Zarr’s special structure makes it ideal to be stored on the cloud (Pollack, 2023). As modern data storage technologies and strategies, including distributed storage systems and cloud-based solutions, play a crucial role in efficiently managing and accessing large datasets, cloud-based datasets can ensure researchers that they can work with these data resources effectively. - -By consolidating our data sources into a Zarr file, we aimed to create an easy-to-use analysis-ready data cube, bringing together multiple diverse datasets onto the same grid for seamless integration. This preprocessing step not only harmonized the data but also ensured that it adhered to a consistent spatial and temporal framework, facilitating efficient and comprehensive analyses of Earth’s oceans. This approach aligns with the broader shift in the remote-sensing community towards providing researchers with analysis-ready datasets, thus enabling more accessible and reproducible scientific investigations. - -## Data Blending and Processing +Multiple studies have examined the relationship between coastal upwelling and bathymetry, such as Garvine (1973) and Lill (1978). They proposed that ocean depth (and thus the shape of the ocean floor, or bathymetry) can determine the motion of the subsurface return flow, one of the two principal layers of the upwelling motion of homogeneous water.
Topographic variation also influences water circulation, for example by disrupting or redirecting flows along coasts, weakening them, or strengthening them to enhance mixing (Pitcher et al., 2010). ### Computed Variables -To alleviate the usage of the product, we also pre-computed absolute speed and direction using vertical and horizontal components of our wind and (near-)surface current variables. For speed, we utilized a simple Pythagorean theorem approach where +We pre-computed speed and direction from the u- and v-components of our wind and (near-)surface current variables. For speed, we used the Pythagorean theorem: + $$ v = \sqrt{v_x^2 + v_y^2} $$ with $v$ as vector-less speed, and $v_x$ and $v_y$ as horizontal and vertical velocity components, respectively. For direction, we utilized NumPy’s `arctan2()` function and then convert radians to degrees using their `rad2deg()` function, with the latter chosen as degrees are more commonly used in meteorology than radians due to its unique conventions comparing to the standard mathematical Cartesian plane’s (Harris et al., 2020; “Meteorological Concentions”, 2022). -### Interpolation - -Due to the mismatched spatial grid configurations between our original datasets, we also applied linear interpolation on, both spatially and temporally, on all of our data variables so that they all follow an average daily temporal, 0.25◦ × 0.25◦ spatial grid. While temporal ranges may vary among data sources, with an insistence on available data of at least twenty- one years from 2000-2020, we enforced a spatial range of −12◦S → 32◦N, 42◦E → 102◦E across all variables. -
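The Computed Variables section of `background.md` derives speed with the Pythagorean formula and direction with NumPy's `arctan2()` and `rad2deg()`. The snippet below is a minimal sketch of that computation, not the code used to build the Zarr store. The helper names are hypothetical. Direction is returned in the Cartesian convention of `np.arctan2(v, u)`: degrees counterclockwise from east, pointing where the flow is heading. The conversion to the meteorological "coming from, clockwise from north" convention is included as an assumption, because the text does not state which convention `wind_dir` and `curr_dir` use.

```python
import numpy as np


def speed_and_direction(u, v):
    """Speed and Cartesian direction from u (eastward) and v (northward) components.

    Hypothetical helper for illustration. Direction is in degrees
    counterclockwise from east, in [-180, 180], i.e. the direction
    the flow is heading toward.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    speed = np.hypot(u, v)                    # same as sqrt(u**2 + v**2)
    direction = np.rad2deg(np.arctan2(v, u))  # note argument order: (y, x)
    return speed, direction


def to_meteorological(direction_deg):
    """Convert a Cartesian 'heading toward' angle to the meteorological
    'coming from' convention: degrees clockwise from north, in [0, 360).

    Assumed convention; check the dataset's variable attributes.
    """
    return np.mod(270.0 - np.asarray(direction_deg, dtype=float), 360.0)


# Two samples: (u=3, v=4) and a purely northward flow (u=0, v=1)
speed, direction = speed_and_direction([3.0, 0.0], [4.0, 1.0])
print(speed)                         # [5. 1.]
print(to_meteorological(direction))  # second value ≈ 180: wind from the south
```

`np.hypot` gives the same result as writing out the square root. It avoids intermediate overflow when the components are very large.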
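GLORY is down-sampled from its native 1/12° resolution to the 0.25° grid by arithmetic mean. Since 0.25 / (1/12) = 3, this amounts to averaging non-overlapping 3 × 3 blocks of fine cells. The NumPy sketch below rests on several assumptions:

* the field is a 2-D (lat, lon) array whose shape is divisible by 3;
* the fine cells nest exactly inside the coarse ones;
* skipping NaN (land) cells with `np.nanmean` is our own choice, not a documented step of the pipeline.

The helper name is hypothetical.

```python
import warnings

import numpy as np


def block_mean(field, factor=3):
    """Average non-overlapping factor x factor blocks of a 2-D (lat, lon) array.

    Hypothetical helper: with factor=3 it maps a 1/12-degree grid onto a
    0.25-degree grid. NaN cells (e.g. land) are ignored; blocks that are
    entirely NaN stay NaN.
    """
    field = np.asarray(field, dtype=float)
    ny, nx = field.shape
    if ny % factor or nx % factor:
        raise ValueError("grid shape must be divisible by the block factor")
    # Split each axis into (coarse index, position inside the block)
    blocks = field.reshape(ny // factor, factor, nx // factor, factor)
    with warnings.catch_warnings():
        # np.nanmean warns "Mean of empty slice" for all-NaN blocks
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmean(blocks, axis=(1, 3))


fine = np.arange(36, dtype=float).reshape(6, 6)  # toy 6 x 6 fine grid
coarse = block_mean(fine)
print(coarse)  # [[ 7. 10.]
               #  [25. 28.]]
```

With xarray, `Dataset.coarsen(...).mean()` performs the same kind of block averaging on named dimensions. The dimension names to pass depend on how the source files label latitude and longitude.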
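The background text notes that linear interpolation was applied in time, among other steps. For a single grid cell, that step can be illustrated with `np.interp`, which fills NaN days from the nearest valid samples on either side. This is a simplified sketch under our own assumptions: a 1-D daily series, with times given as increasing day numbers. The helper name is hypothetical. Note also that `np.interp` does not extrapolate beyond the ends of the record: it holds the first and last valid values constant.

```python
import numpy as np


def fill_time_gaps(days, values):
    """Linearly interpolate NaN gaps in a 1-D daily series.

    Hypothetical helper. `days` must be increasing. Leading/trailing gaps
    are filled with the nearest valid value (np.interp's edge behaviour).
    """
    days = np.asarray(days, dtype=float)
    values = np.asarray(values, dtype=float)
    valid = ~np.isnan(values)
    if not valid.any():
        return values.copy()  # nothing to interpolate from
    return np.interp(days, days[valid], values[valid])


days = np.arange(6)  # day 0 .. day 5
sst = np.array([300.0, np.nan, np.nan, 303.0, 304.0, np.nan])
print(fill_time_gaps(days, sst))  # [300. 301. 302. 303. 304. 304.]
```

For a full (time, lat, lon) cube, xarray offers similar operations:

* `DataArray.interpolate_na(dim="time", method="linear")` fills gaps along the time dimension;
* `Dataset.interp(...)` interpolates onto new coordinates, such as a 0.25° target grid.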
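The Zarr description above rests on splitting an array into "chunks". The toy example below makes this concrete by splitting a small (time, lat, lon) array into regular blocks and reassembling it. That is roughly what a chunked store does when it writes each block as a separate compressed object. The example uses plain NumPy, not Zarr's API, and the function name and chunk shape are hypothetical.

```python
import itertools

import numpy as np


def iter_chunks(shape, chunks):
    """Yield tuples of slices covering an array of `shape` in blocks of `chunks`.

    Edge chunks are smaller when a dimension is not a multiple of the chunk size.
    """
    ranges = [range(0, n, c) for n, c in zip(shape, chunks)]
    for starts in itertools.product(*ranges):
        yield tuple(slice(s, min(s + c, n)) for s, c, n in zip(starts, chunks, shape))


# Toy (time, lat, lon) cube and a hypothetical chunk shape
cube = np.random.default_rng(0).random((10, 7, 9))
chunk_shape = (4, 4, 4)

# Stand-in for a store: a list of (slices, independent array piece) pairs
store = [(sl, cube[sl].copy()) for sl in iter_chunks(cube.shape, chunk_shape)]

# Reassemble the full array from its chunks
# (a real store would read only the chunks a query needs)
rebuilt = np.empty_like(cube)
for sl, block in store:
    rebuilt[sl] = block

print(len(store))                     # 18 chunks (3 x 2 x 3)
print(np.array_equal(rebuilt, cube))  # True
```

Here the chunks are kept in a list instead of a dict keyed by slices, because slice objects are only hashable from Python 3.12 onward.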