UofT course final project PDF and Jupyter notebook investigating how Neo4j can outperform PostgreSQL for large interconnected datasets

kcarmonamurphy/flightly

flightly

This repo is a collection of resources from my final project for the SCS 3252:017 Big Data Management Systems & Tools course (University of Toronto), taken in the last four months of 2019. It includes my final project submission PDF, "Comparing Neo4j with PostgreSQL", as well as the accompanying Jupyter notebook.

TL;DR

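Neo4j can be substantially faster for queries that traverse relationships between records, because of index-free adjacency: each node stores direct references to its neighbours, so following a relationship does not require an index lookup or a join.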
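
As a rough illustration of the idea (not Neo4j's actual storage engine), the sketch below answers the same question two ways in plain Python: once by matching keys across two record collections, as a join would, and once by following object references stored on each node. The airport and flight records, the class names, and the function names are all hypothetical toy data invented for this example.

```python
# Toy records (hypothetical data, for illustration only).
airports = {
    "JAC": {"name": "Jackson Hole", "state": "WY"},
    "DEN": {"name": "Denver International", "state": "CO"},
    "SLC": {"name": "Salt Lake City International", "state": "UT"},
}
flights = [
    {"origin": "JAC", "destination": "DEN"},
    {"origin": "JAC", "destination": "SLC"},
    {"origin": "DEN", "destination": "SLC"},
    {"origin": "JAC", "destination": "DEN"},
]


def destinations_by_key_matching(state):
    """Join-style: every flight is checked against the set of origin keys."""
    origins = {iata for iata, a in airports.items() if a["state"] == state}
    return {
        airports[f["destination"]]["name"]
        for f in flights
        if f["origin"] in origins
    }


class AirportNode:
    def __init__(self, name, state):
        self.name = name
        self.state = state
        self.departures = []  # direct references to FlightNode objects


class FlightNode:
    def __init__(self, destination):
        self.destination = destination  # direct reference to an AirportNode


def build_graph():
    """Wire the toy records into nodes that point at each other."""
    nodes = {i: AirportNode(a["name"], a["state"]) for i, a in airports.items()}
    for f in flights:
        flight = FlightNode(nodes[f["destination"]])
        nodes[f["origin"]].departures.append(flight)
    return nodes


def destinations_by_traversal(nodes, state):
    """Graph-style: start at matching airports and follow stored references."""
    return {
        flight.destination.name
        for airport in nodes.values()
        if airport.state == state
        for flight in airport.departures
    }


graph = build_graph()
print(destinations_by_key_matching("WY"))
print(destinations_by_traversal(graph, "WY"))
```

Both functions return the same set of names. The difference is in the work done: the first touches every flight record, while the second only visits flights reachable from the Wyoming airport nodes.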

Example: Get the names of destination airports from all flights originating in Wyoming

Cypher (Neo4j query language):

MATCH (hi:Airport {state: 'WY'})-[:HAS_DEPARTURE]->(fl:Flight)-[:FLIES_TO]->(ap:Airport)
RETURN DISTINCT ap.name

Postgres SQL:

SELECT name from airports
JOIN flights ON (airports.iata = flights.destination_airport)
WHERE flights.origin_airport IN
(SELECT iata from airports WHERE airports.state = 'WY')
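
To make the SQL query concrete, here is a minimal sketch that runs it against a tiny in-memory SQLite database using Python's standard `sqlite3` module. SQLite stands in for PostgreSQL purely for convenience. The table definitions are an assumption inferred from the column names the query uses, and the rows are hypothetical toy data, not the project's dataset.

```python
import sqlite3

# Assumed schema: only the columns referenced by the query are modelled.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE airports (iata TEXT PRIMARY KEY, name TEXT, state TEXT);
    CREATE TABLE flights (origin_airport TEXT, destination_airport TEXT);

    INSERT INTO airports VALUES
        ('JAC', 'Jackson Hole', 'WY'),
        ('DEN', 'Denver International', 'CO'),
        ('SLC', 'Salt Lake City International', 'UT');

    INSERT INTO flights VALUES
        ('JAC', 'DEN'),
        ('JAC', 'SLC'),
        ('DEN', 'SLC'),
        ('JAC', 'DEN');
    """
)

QUERY = """
SELECT name FROM airports
JOIN flights ON (airports.iata = flights.destination_airport)
WHERE flights.origin_airport IN
    (SELECT iata FROM airports WHERE airports.state = 'WY')
"""

# One row comes back per matching flight, so names can repeat.
rows = sorted(row[0] for row in conn.execute(QUERY))
print(rows)

# De-duplicating in Python mirrors the DISTINCT in the Cypher query.
print(sorted(set(rows)))
```

On this toy data the SQL returns one row per matching flight, so a destination served by two flights appears twice, whereas the Cypher query's `RETURN DISTINCT` lists each destination once.
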

| Attempt | 1 | 2 | 3 | Average |
| --- | --- | --- | --- | --- |
| Neo4j | 0.29222047s | 0.34756797s | 0.30003158s | 0.32918392s |
| PostgreSQL | 2.32982001s | 2.14832110s | 2.39080437s | 2.29873976s |
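
For readers who want to collect comparable timings, one possible harness is sketched below. It is not the benchmarking code used for the table above; `time_attempts` and its `run_query` argument are hypothetical names, and `run_query` stands for any zero-argument callable that executes a query against the database under test.

```python
import time
from statistics import mean


def time_attempts(run_query, attempts=3):
    """Run `run_query` several times and return (durations, average) in seconds."""
    durations = []
    for _ in range(attempts):
        start = time.perf_counter()  # monotonic, high-resolution clock
        run_query()
        durations.append(time.perf_counter() - start)
    return durations, mean(durations)


# Usage with a stand-in workload; swap in a real query call to benchmark.
durations, average = time_attempts(lambda: sum(range(100_000)))
print([f"{d:.8f}s" for d in durations], f"{average:.8f}s")
```

Timing each attempt separately, rather than only the total, makes it easier to spot a slow first run caused by cold caches.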
