A library for parsing and querying shapefile data with Apache Spark, for Spark SQL and DataFrames.

mjohns-databricks/spark-shp

 
 

Shapefile Data Source for Apache Spark

A library for parsing and querying shapefile data with Apache Spark, for Spark SQL and DataFrames.

Requirements

This library requires Spark 2.0 or later.

Using with Spark shell

$SPARK_HOME/bin/spark-shell --packages com.esri:spark-shp:0.8

Features

This package reads shapefiles from a local or distributed filesystem into Spark DataFrames. The reader accepts the following options:

  • path: The location of the shapefile(s). As with Spark's built-in sources, standard Hadoop globbing expressions are accepted.
  • shape: An optional name for the shape column. The default value is shape.
  • columns: An optional comma-separated list of attribute column names. The default is blank, which selects all attribute fields.
  • format: An optional parameter that sets the output format of the shape field. The default value is SHP; the Python API example below uses GEOJSON.
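As a rough illustration of how these options fit together, the sketch below assembles them into a dictionary and passes it to the reader. The helper names `shp_options` and `read_shapefiles` are hypothetical and not part of this library; only the option keys and the `com.esri.spark.shp` format name come from this README. Options that are not set are left out so the data source applies its own defaults.

```python
def shp_options(path, shape=None, columns=None, fmt=None):
    """Build the reader options as a plain dict (hypothetical helper).

    Only the keys path, shape, columns and format are taken from the README;
    unset options are omitted so the data source falls back to its defaults.
    """
    opts = {"path": path}
    if shape:
        opts["shape"] = shape
    if columns:
        # The columns option is a single comma-separated string.
        opts["columns"] = ",".join(columns)
    if fmt:
        opts["format"] = fmt
    return opts


def read_shapefiles(spark, **kwargs):
    """Load shapefiles with an existing SparkSession (hypothetical helper)."""
    return (
        spark.read.format("com.esri.spark.shp")
        .options(**shp_options(**kwargs))
        .load()
    )
```

With a running session this might be called as `read_shapefiles(spark, path="data/*.shp", columns=["atext", "adate"], fmt="GEOJSON")`, where the glob pattern is only an example path.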

SQL API

CREATE TABLE gps
USING com.esri.spark.shp
OPTIONS (path "data/gps.shp")

Python API

df = spark.read \
    .format("com.esri.spark.shp") \
    .options(path="data/gps.shp", columns="atext,adate", format="GEOJSON") \
    .load() \
    .cache()
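When `format="GEOJSON"` is requested, the shape column can be inspected with the standard library once rows are collected. The sketch below assumes each shape value is a string holding a single GeoJSON geometry object; this README does not confirm that layout, so check it against real output first. The function name `parse_shape` is hypothetical.

```python
import json


def parse_shape(geojson_text):
    """Return (geometry type, coordinates) from a GeoJSON geometry string.

    Assumes the input is one GeoJSON geometry object, for example
    '{"type": "Point", "coordinates": [1.0, 2.0]}'.
    """
    geometry = json.loads(geojson_text)
    return geometry["type"], geometry["coordinates"]
```

For example, `parse_shape(row["shape"])` could be applied to each `row` returned by `df.take(5)`, assuming the default shape column name.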

Building From Source

This library is built using Apache Maven. To build the jar, execute the following command:

mvn clean install

Data

Create Conda Env

export ENV=spark-shp
conda remove --yes --all --name $ENV
conda create --yes --name $ENV python=3.6
source activate $ENV
conda install --yes --quiet -c conda-forge \
    jupyterlab \
    tqdm \
    future \
    matplotlib=3.1 \
    gdal=2.4 \
    pyproj=2.2 \
    shapely=1.6 \
    pyshp=2.1
