Big Data Analysis

1. Analyzing fishing activity from AIS (Automatic Identification System) data

Technologies:

Python
MongoDB

Dataset:

https://zenodo.org/record/1167595#.X_0AltgzaUl
Real AIS data recorded from October 1, 2015 to March 31, 2016 in the Celtic Sea, the Channel, and the Bay of Biscay (France).

Topics:

Fishing activity report
Fishing vessels trajectory clustering (using Hausdorff distance)
Fishing spots clustering
Finding illegal fishing activities
Clustering algorithms used:
   * KMeans
   * DBSCAN
   * OPTICS
   * BIRCH
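
As an illustration of the trajectory clustering topic, the Hausdorff distance between two trajectories can be sketched in plain Python. This is a minimal sketch, not the code from the notebooks: the function names (`directed_hausdorff`, `hausdorff`, `pairwise_hausdorff`) and the toy trajectories are hypothetical, and plain Euclidean distance on coordinate pairs is assumed as the point-to-point metric.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Trajectory = Sequence[Point]


def directed_hausdorff(a: Trajectory, b: Trajectory,
                       dist: Callable[[Point, Point], float] = math.dist) -> float:
    """Largest distance from a point of `a` to its nearest point in `b`."""
    return max(min(dist(p, q) for q in b) for p in a)


def hausdorff(a: Trajectory, b: Trajectory,
              dist: Callable[[Point, Point], float] = math.dist) -> float:
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(a, b, dist), directed_hausdorff(b, a, dist))


def pairwise_hausdorff(trajectories: Sequence[Trajectory]) -> List[List[float]]:
    """Symmetric matrix of Hausdorff distances between all trajectories."""
    n = len(trajectories)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = hausdorff(trajectories[i], trajectories[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


# Toy trajectories (made up for illustration, not taken from the dataset).
t1 = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
t2 = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
t3 = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]

distances = pairwise_hausdorff([t1, t2, t3])
print(distances[0][1])  # t1 vs t2
print(distances[0][2])  # t1 vs t3
```

A matrix like this could then be passed to a clustering algorithm that accepts precomputed distances, for example scikit-learn's `DBSCAN(metric="precomputed")`.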

Files:

FishingAnalysisAIS.pdf (Greek)
nosql_db.ipynb (loads the data into a local MongoDB instance)
mongodbProject.ipynb
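
To give an idea of what the loading step might involve, the sketch below converts one AIS CSV row into a MongoDB-style document with a GeoJSON point. It is an assumption-laden illustration, not the contents of nosql_db.ipynb: the column names (`sourcemmsi`, `speedoverground`, `lon`, `lat`, `t`), the document field names, and the sample row are all hypothetical.

```python
import csv
import io
from datetime import datetime, timezone


def ais_row_to_document(row: dict) -> dict:
    """Convert one CSV row (column names are assumed) into a document.

    The position is stored as a GeoJSON Point, in [longitude, latitude] order.
    """
    return {
        "mmsi": int(row["sourcemmsi"]),
        "timestamp": datetime.fromtimestamp(int(row["t"]), tz=timezone.utc),
        "speed_knots": float(row["speedoverground"]),
        "location": {
            "type": "Point",
            "coordinates": [float(row["lon"]), float(row["lat"])],
        },
    }


# Made-up sample row, used only to exercise the function.
sample_csv = (
    "sourcemmsi,speedoverground,lon,lat,t\n"
    "123456789,0.1,-4.5,48.4,1443657600\n"
)

documents = [ais_row_to_document(r) for r in csv.DictReader(io.StringIO(sample_csv))]
print(documents[0]["location"])
```

With pymongo, such documents could be inserted with `Collection.insert_many`, and a `2dsphere` index on `location` would allow geospatial queries.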

2. Geospatial Queries on Big Data

Technologies

Apache Spark
Scala

Dataset:

Testing dataset: Hotels and Restaurants Worldwide
Any two geospatial datasets can be used with a few modifications of the code
Data can be downloaded from https://www.dropbox.com/s/dis84tm0r5vyzy7/hotels_restaurants.rar?dl=0

Topics

Geospatial data indexing and partitioning in a cluster
Geospatial join between two datasets given a distance d
  e.g., find all hotel-restaurant pairs that are at most d apart
Manual implementation of a space-partitioning algorithm (for learning purposes; well-known libraries already exist for this task)
The Haversine formula is used as the distance metric
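
The topics above can be sketched on a single machine in Python. This is a simplified illustration, not a port of GeoQueries_Spark.scala: it assumes a mean Earth radius of 6371 km, partitions the data into latitude bands whose height equals d (rather than into n partitions as the Spark job does), and the names `haversine_km` and `distance_join` as well as the sample points are hypothetical.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def distance_join(left, right, d_km):
    """Return (left_name, right_name) pairs at most d_km apart.

    Points are (name, lat, lon) tuples. Two points within d_km of each other
    differ in latitude by at most one band height, so each left point only
    needs to be compared against right points in its own and adjacent bands.
    """
    band_height = math.degrees(d_km / EARTH_RADIUS_KM)  # d expressed in degrees of latitude

    def band(lat):
        return math.floor((lat + 90.0) / band_height)

    # Partition the right-hand dataset by latitude band.
    index = defaultdict(list)
    for item in right:
        index[band(item[1])].append(item)

    pairs = []
    for name_l, lat_l, lon_l in left:
        b = band(lat_l)
        for neighbour in (b - 1, b, b + 1):
            for name_r, lat_r, lon_r in index.get(neighbour, ()):
                if haversine_km(lat_l, lon_l, lat_r, lon_r) <= d_km:
                    pairs.append((name_l, name_r))
    return pairs


# Made-up sample points (name, latitude, longitude).
hotels = [("h1", 48.8566, 2.3522), ("h2", 51.5074, -0.1278)]
restaurants = [("r1", 48.8570, 2.3530), ("r2", 40.0, 0.0)]

print(distance_join(hotels, restaurants, d_km=1.0))
```

Banding by latitude alone keeps the sketch short; a real partitioner would usually split on both axes so that partitions stay balanced across a cluster.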

Files

GeoQueries_Spark.jar 
        or
GeoQueries_Spark.scala

Instructions (for jar)

run: spark-submit --master local[*] --class "app" JAR_FILE d n file1 file2
where:
    JAR_FILE: location of the jar file
    d: maximum distance in kilometers
    n: number of partitions to create
    file1, file2: locations of the two datasets to join

leandrosev/Big-Data-Analysis
