ClimateSpark

Overview

ClimateSpark is a Spark-based distributed computing framework for managing and analyzing big climate data. It has the following capabilities:

Native support for HDF4 and NetCDF4 datasets stored in HDFS
Efficient spatiotemporal queries over multi-dimensional array-based climate data, e.g. high data locality and no redundant data reads
ClimateRDD: a multi-dimensional array-based data model for Spark to organize climate data
Basic climate data analytics

Tutorial

https://docs.google.com/document/d/1JIMLhNzXA_Ay-0P6yxzGfvUhajMLfJFyyduillzpeSI/edit

Extract the metadata: hadoop jar sia-core/target/sia-core-0.1.0.jar properties/sia_merra2_preprocessor.properties
Build index: hadoop jar sia-indexer/target/sia-indexer-0.1.0.jar properties/sia_merra2_indexer.properties
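
The two steps above can be chained in a small wrapper script. This is only a sketch: it assumes the jars have already been built, that it is run from the repository root, and that metadata extraction should finish before indexing starts (the order in which the steps are listed). The `DRY_RUN` variable and the `run_step` and `preprocess` helpers are hypothetical names introduced for illustration and are not part of ClimateSpark. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of a wrapper for the two preprocessing steps listed above.
# Stop on the first failing command or undefined variable.
set -eu

# Jar and properties paths exactly as given in this README.
CORE_JAR="sia-core/target/sia-core-0.1.0.jar"
CORE_PROPS="properties/sia_merra2_preprocessor.properties"
INDEXER_JAR="sia-indexer/target/sia-indexer-0.1.0.jar"
INDEXER_PROPS="properties/sia_merra2_indexer.properties"

# Hypothetical switch: 1 (default) only prints the commands, 0 runs them.
DRY_RUN="${DRY_RUN:-1}"

# Print a labelled command, then run it unless in dry-run mode.
run_step() {
  label="$1"
  shift
  echo "==> ${label}: $*"
  if [ "${DRY_RUN}" != "1" ]; then
    "$@"
  fi
}

# Assumed order: extract the metadata first, then build the index.
preprocess() {
  run_step "Extract metadata" hadoop jar "$CORE_JAR" "$CORE_PROPS"
  run_step "Build index" hadoop jar "$INDEXER_JAR" "$INDEXER_PROPS"
}

preprocess
```

Saved under a name of your choice, the script prints both commands when run as is; setting `DRY_RUN=0` in the environment submits them to Hadoop, and `set -e` stops the script if the first step fails.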

