Causality for Earth science - A Review on Time-series and Spatiotemporal Causality Methods

big-data-lab-umbc/Causality-For-Earth-science

Abstract

This survey paper covers time-series and spatiotemporal causality methods and their applications in Earth science. The paper first introduces the concepts of causal discovery and causal inference, followed by the underlying causal assumptions, evaluation techniques, and key terminology of the two domains. It then presents the state-of-the-art methods for time-series and spatiotemporal causal analysis along with their strengths and limitations. The paper further describes existing applications of several methods to specific Earth science questions, such as extreme weather events, sea level rise, and teleconnections. Our survey will benefit members of the Earth science community interested in an AI-driven approach to studying the causality of different dynamic and thermodynamic processes, as we present the open challenges and opportunities in causality-based Earth science research. It will also serve as a primer for data science researchers interested in data-driven causal studies, as we share a comprehensive list of resources, including Earth science datasets (synthetic, simulated, and observational) and open-source tools for causal analysis.

Datasets for Causal Analysis

Title: Synthetic, simulated, and real-world datasets used for causal analysis

| Dataset | Description | Data Type | Causal Category | Dataset Type |
| --- | --- | --- | --- | --- |
| CausalWorld | An open-source platform for generating benchmark data for causal structure learning. It contains a robotic manipulation environment for different tasks. The generated datasets represent different causal structures of interacting objects, with variables such as robot and object masses, colors, and sizes. Causal studies such as do-interventions, counterfactual situations, structure learning, and inference can be performed and evaluated using this platform. | Time-series | Causal Discovery/Inference | Realistic Simulated |
| Harvard Dataverse | Contains six synthetic datasets representing different causal structures. The time-series datasets are generated using a nonlinear function of cause variables, linear self-causation, and additive Gaussian noise. | Time-series | Causal Discovery | Synthetic |
| FLAIRS | Contains 22 simulated time-series datasets. All datasets contain 20 continuous variables and 1000 time points with lags of 1 and 3 time units. | Time-series | Causal Discovery | Synthetic |
| Diffusion Data | Contains 4000 samples of diffusion-based spatiotemporal images. The dataset contains 3 variables: treatment, time-varying confounder, and (factual and counterfactual) outcomes. | Spatiotemporal | Causal Inference | Synthetic |
| North American Mesoscale (NAM) | Generated by the National Centers for Environmental Prediction (NCEP) using the WRF Non-Hydrostatic Mesoscale Model. This spatiotemporal dataset has a 12 km resolution, covers the continental United States, and is available every 6 hours from 2012-01-01 00:00 to 2023-10-15 18:00. Variables include air temperature, geopotential height, humidity, sea level pressure, snow, surface pressure, and upper-level winds. | Spatiotemporal | Causal Discovery | Realistic Simulated |
| NCEP-DOE Reanalysis 2 product | Provided by the US National Centers for Environmental Prediction (NCEP) and the Department of Energy (DOE) from 1979 to the present. All available data are fed into a complex climate model to generate reanalysis data for unobserved locations and missing time steps. The dataset includes almost 40 atmospheric variables. It covers 90N-90S, 0E-357.5E on a global grid of 2.5 degrees latitude x 2.5 degrees longitude (144x73). | Spatiotemporal | Causal Discovery | Realistic Simulated |
| Beijing Multi-Site Air-Quality Dataset | An observational dataset collected by the Beijing Municipal Environmental Monitoring Center. It contains hourly observations of 6 air pollutants (CO, PM2.5, PM10, O3, NO2, and SO2) and 6 meteorological variables (air temperature, wind direction, wind speed, pressure, dew point temperature, and precipitation). The data were collected from 2013 to 2017. | Time-series | Causal Discovery/Inference | Real-world |
| ERA5 | A global climate and weather dataset maintained by the European Centre for Medium-Range Weather Forecasts (ECMWF). Hourly data for atmospheric, land, and oceanic variables are available for the whole globe from 1940 to the present and are updated daily. | Spatiotemporal | Causal Discovery/Inference | Real-world |
| Sea Ice Data | A polar sea ice observational dataset maintained by the National Snow and Ice Data Center (NSIDC). The data are collected by the Scanning Multichannel Microwave Radiometer (SMMR) instrument on the Nimbus-7 satellite and the Special Sensor Microwave/Imager (SSM/I). Observational variables such as sea ice concentration and extent, sea surface temperature, wind stress, snow cover, and rainfall rate are recorded from 1978 to the present. | Spatiotemporal | Causal Discovery/Inference | Real-world |
| Metropolit Cohort | Contains data on 11,532 men born in 1953 who lived until 1968 in the Copenhagen Metropolitan area, Denmark. The dataset comprises physical, medical, mental, social, and diagnosis information from different stages of these men's lives, collected from nationwide social and health registers. It is described as a very reliable dataset with minimal measurement error and strong validity. | Time-series | Causal Discovery | Real-world |
| Lalonde | A popular observational dataset collected from the National Supported Work Demonstration. The study examined how a work training program (the treatment) affected participants' actual wages a few years after the program ended. Besides the treatment indicator (1 = treated, 0 = control/untreated), the dataset provides demographic variables such as age, race, academic background, and previous real earnings for 260 control and 185 treated subjects (445 in total), along with the response (real earnings in 1978). | Time-series | Causal Inference | Real-world |
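
The generation recipe described for the Harvard Dataverse entry (a nonlinear function of cause variables, linear self-causation, and additive Gaussian noise) can be illustrated with a short sketch. This is not the code used to produce those datasets: the function name `generate_synthetic_series`, the `tanh` nonlinearity, the lag of 1, and all coefficient values are assumptions chosen only for illustration.

```python
import numpy as np


def generate_synthetic_series(n_steps=1000, self_coef=0.5, cause_coef=0.8,
                              noise_std=0.1, seed=0):
    """Simulate two series in which x drives y at lag 1.

    The tanh nonlinearity and all parameter values are illustrative
    assumptions, not taken from the Harvard Dataverse datasets.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n_steps)
    y = np.zeros(n_steps)
    for t in range(1, n_steps):
        # x: linear self-causation plus additive Gaussian noise
        x[t] = self_coef * x[t - 1] + rng.normal(0.0, noise_std)
        # y: linear self-causation, a nonlinear effect of the lagged
        # cause x, and additive Gaussian noise
        y[t] = (self_coef * y[t - 1]
                + cause_coef * np.tanh(x[t - 1])
                + rng.normal(0.0, noise_std))
    return x, y


x, y = generate_synthetic_series()
# Lagged correlation in the causal direction (x at t-1 vs. y at t)
forward = np.corrcoef(x[:-1], y[1:])[0, 1]
# Lagged correlation in the non-causal direction (y at t-1 vs. x at t)
backward = np.corrcoef(y[:-1], x[1:])[0, 1]
print(f"corr(x[t-1], y[t]) = {forward:.2f}, corr(y[t-1], x[t]) = {backward:.2f}")
```

Because the ground-truth structure (x causes y, not the reverse) is known by construction, data of this kind can be used to check whether a causal discovery method recovers the correct direction. The lagged correlations printed here are only a sanity check, not a causal discovery method.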
