In organizations with limited resources and siloed teams, data sharing is a clunky process. This is especially true when one department produces a large file (say, 12 GB) and another team must deliver an analysis on a tight deadline.
Situations like these inspired me to explore what Apache Arrow can do. I drew on the following resources:
- R for Data Science Chapter 22: Arrow by Hadley Wickham
- Doing More With Data: An Introduction to Arrow for R Users by Danielle Navarro at Voltron Data
- Using the {arrow} and {duckdb} packages to wrangle medical datasets that are Larger than RAM by Peter Higgins at R Consortium
The Fire Department of New York City (FDNY) maintains data produced by its EMS dispatch system. The EMS Incident Dispatch Data file contains 27 million records covering incident location, perceived call severity, and Fire Department response time.
This project was produced with R version 4.3.1.