forked from jimcrozier/fars_data
alexhwoods/fars_data
The FARS dataset contains information on every fatal accident in the US from 1975 to 2015. There are roughly 35k accidents per year, so the data gets large quickly. This analysis is the beginning skeleton of a project to pull down all data from the FARS dataset, load it into HDFS, and analyze it with sparklyr.

To run, make sure that Postgres, HDFS, and Hive are all set up with the specifications in the code, and that the absolute working directories are changed to your own. You will also need access to a shell, so I am not sure how well this will work on Windows.

Run the scripts in this order:

1. download_data_gen_shell.R (make sure to run the shell script that it creates)
2. load_data_into_postgres.R
3. sqoop_from_postgres_into_hive_shell_gen.R (make sure to run the shell script that it creates)
4. hive_concat_all_years_shell_gen.R (make sure to run the shell script that it creates)
5. spark_analysis.R
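Three of the scripts above generate a shell script that you then run by hand. As a rough illustration of that pattern only (not the code in this repository), here is a minimal sketch in R that writes one download command per year to a shell script. The function name `gen_download_script`, the `base_url` value, and the `fars_<year>.zip` file naming are all assumptions made for the example; the real scripts and FARS download paths may differ.

```r
# Hypothetical sketch: write a shell script with one download command per year.
# base_url and the file-name pattern are placeholders, not the real FARS paths.
gen_download_script <- function(years, base_url, out_file) {
  # sprintf() is vectorised over years, so this builds one command per year
  cmds <- sprintf("curl -fSL -o fars_%d.zip %s/%d.zip", years, base_url, years)
  # "set -e" makes the generated script stop at the first failed download
  writeLines(c("#!/bin/sh", "set -e", cmds), out_file)
  invisible(cmds)
}

# Example: generate the script in a temporary directory
script <- file.path(tempdir(), "download_fars.sh")
cmds <- gen_download_script(1975:2015, "https://example.com/fars", script)
length(cmds)  # one command per year
```

The generated file would then be run from the shell (for example with `sh download_fars.sh`) before moving on to the next step, which matches the "run the shell script that it creates" notes in the list.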