OpenCaseStudies

Important links

HTML: https://www.opencasestudies.org/ocs-bp-RTC-analysis
GitHub: https://github.com/opencasestudies/ocs-bp-RTC-analysis
Bloomberg American Health Initiative: https://americanhealth.jhu.edu/open-case-studies
Wrangling HTML: https://www.opencasestudies.org/ocs-bp-RTC-wrangling
Wrangling GitHub: https://github.com/opencasestudies/ocs-bp-RTC-wrangling

Disclaimer

The purpose of the Open Case Studies project is to demonstrate the use of various data science methods, tools, and software in the context of messy, real-world data. A given case study does not cover all aspects of the research process, is not claiming to be the most appropriate way to analyze a given dataset, and should not be used in the context of making policy decisions without external consultation from scientific experts.

License

This case study is part of the OpenCaseStudies project. This work is licensed under the Creative Commons Attribution-NonCommercial 3.0 (CC BY-NC 3.0) United States License.

Citation

To cite this case study:

Wright, Carrie and Ontiveros, Michael and Meng, Qier and Jager, Leah and Taub, Margaret and Hicks, Stephanie. (2020). https://github.com/opencasestudies/ocs-bp-RTC-analysis. Influence of Multicollinearity on Measured Impact of Right-to-Carry Gun Laws (Version v1.0.0).

Acknowledgments

We would like to acknowledge Daniel Webster for assisting in framing the major direction of the case study. We would also like to thank Elizabeth Stuart, Aboozar Hadavand, and Alexander McCourt for reviewing the case study.

We would like to acknowledge Michael Breshock for his contributions to this case study and for developing the OCSdata package.

We would also like to acknowledge the Bloomberg American Health Initiative for funding this work.

Reading Metrics

The total reading time for this case study was calculated with koRpus: **~ 60 minutes**

The Flesch-Kincaid Readability Index was also calculated with koRpus: Grade 11, Age 16

Title

Influence of Multicollinearity on Measured Impact of Right-to-Carry Gun Laws Part 2

Motivation

The influence of less restrictive right-to-carry gun laws on violent crime is a historically controversial topic. One reason for the controversy is concern that some earlier reports examining this topic may have used inappropriate methods.

One of the major concerns is that an earlier report included multiple demographic variables that were collinear with one another. This resulted in a very different coefficient estimate for right-to-carry gun law adoption than in other reports that did not include collinear variables. This phenomenon is called multicollinearity, and it can result in aberrant findings for particular explanatory variables, despite not altering the overall predictive power of a model.
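A small simulation can make this concrete. The sketch below uses base R and made-up data (the variables `x1`, `x2`, and `y` are hypothetical and are not taken from the case study). Because `x2` is nearly a copy of `x1`, adding it to the model inflates the uncertainty of the `x1` coefficient while the overall fit barely changes.

```r
# Toy illustration of multicollinearity (simulated data, not the case study data)
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.05)  # x2 is almost identical to x1
y  <- 2 * x1 + rnorm(n)         # only x1 truly drives y

fit_single <- lm(y ~ x1)        # model without the collinear variable
fit_both   <- lm(y ~ x1 + x2)   # model with the collinear variable

# Standard error of the x1 coefficient in each model
se_single <- summary(fit_single)$coefficients["x1", "Std. Error"]
se_both   <- summary(fit_both)$coefficients["x1", "Std. Error"]

# Overall fit (R squared) of each model
r2_single <- summary(fit_single)$r.squared
r2_both   <- summary(fit_both)$r.squared

print(c(se_single = se_single, se_both = se_both))
print(c(r2_single = r2_single, r2_both = r2_both))
```

Comparing the two printed vectors should show a much larger standard error for `x1` in the second model, with nearly the same R squared in both.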

In this case study we use data to perform simplified analyses, similar to those in earlier reports on this topic, to explore the influence of multicollinearity on the stability of coefficient estimates. However, we do not recreate the previous analyses. The reports that we use as a guide for our analysis are:

John J. Donohue et al., Right‐to‐Carry Laws and Violent Crime: A Comprehensive Assessment Using Panel Data and a State‐Level Synthetic Control Analysis. Journal of Empirical Legal Studies, 16,2 (2019).
David B. Mustard & John Lott. Crime, Deterrence, and Right-to-Carry Concealed Handguns. Coase-Sandor Institute for Law & Economics Working Paper No. 41, (1996).

Motivating question

What is the effect of multicollinearity on coefficient estimates from linear regression models when analyzing right-to-carry laws and violence rates?

Data

In this case study, we perform analyses similar to those in the Donohue, et al. article and the Lott and Mustard article. However, we do not try to recreate them. Instead, we perform simplified analyses that allow us to focus on multicollinearity.

We therefore use a subset of the explanatory variables used by each article, including:

Data about state demographics in terms of population compositions for age, sex, race, as well as overall population values from the US Census Bureau:

Data	Link
years 1977 to 1979	link
years 1980 to 1989	link * county data was used for this decade which also has state information
years 1990 to 1999	link
years 2000 to 2010	link technical documentation

Six demographic variables are created for the Donohue, et al.-like analysis and 36 are created for the Lott and Mustard-like analysis.

To use this data, we also need Federal Information Processing Standard (FIPS) state codes to identify which demographic data corresponds to which state. These codes are also available from the US Census Bureau.

Police staffing data, which was downloaded from the Federal Bureau of Investigation
Unemployment data, which was downloaded from the U.S. Bureau of Labor Statistics.
Poverty data, extracted from Table 21 from this US Census Bureau Poverty Data
Right-to-carry law data, which is available in a table in the Donohue paper

Finally, our outcome of interest is violent crime rates. The violent crime data was downloaded from the FBI Uniform Crime Reporting system.

Learning Objectives

The skills, methods, and concepts that students will be familiar with by the end of this case study are:

Data Science Learning Objectives:

Create correlation scatterplots and heatmaps (GGally, ggcorrplot)
Create interactive tables (DT)
Sample subsets of data (rsample)
Combine multiple plots (cowplot)
Create data visualizations with equations and text (ggplot2 and latex2exp)

Statistical Learning Objectives:

Understand what multicollinearity is and how it can influence linear regression coefficients
Know how to look for the presence of multicollinearity and determine its severity
Illustrate the difference between multicollinearity and correlation
Implement panel regression analysis in R (plm)
Define variance inflation factor (VIF) and know how to calculate it in R (car)

To see another case study about how the original raw data was imported and wrangled please see here.

Data import and wrangling

See the part 1 case study for the data import and data wrangling details.

Data Visualization

This case study demonstrates how to make correlation plots and scatter plots with error bars. We also show how to add formulas and arrows to plots. The instruction about data visualization assumes that students have some familiarity with ggplot2.

Analysis

This case study covers fixed-effects panel regression analysis of balanced panel data. In doing so we provide an introduction to longitudinal analysis in general, as well as to the use of the plm package. We also show how to calculate variance inflation factor (VIF) values using the car package to quantify the severity of multicollinearity. As another assessment of multicollinearity, we demonstrate how to perform simulations to evaluate the stability of coefficient estimates.
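As a rough illustration of what a fixed-effects panel model estimates, the sketch below builds a toy balanced panel and fits state and year fixed effects with dummy variables in base R (the least-squares dummy variable approach). This is a stand-in under stated assumptions, not the case study's code: the case study uses the plm package, and the data, the variable names (`policy`, `outcome`), and the effect size of 1.5 are all invented for the example.

```r
# Toy balanced panel: 5 hypothetical states, each observed in 10 years
set.seed(42)
panel <- expand.grid(state = paste0("S", 1:5), year = 2001:2010,
                     stringsAsFactors = TRUE)

# Balanced means every state-year combination appears exactly once
is_balanced <- all(table(panel$state, panel$year) == 1)

# Invented state-specific and year-specific shifts (the "fixed effects")
state_shift <- c(-2, -1, 0, 1, 2)[as.integer(panel$state)]
year_shift  <- 0.1 * (panel$year - 2001)

# Hypothetical policy indicator: states S1 and S2 adopt from 2006 onward
panel$policy <- as.numeric(panel$year >= 2006 & panel$state %in% c("S1", "S2"))

# Simulated outcome with an arbitrary policy effect of 1.5
panel$outcome <- 5 + 1.5 * panel$policy + state_shift + year_shift +
  rnorm(nrow(panel), sd = 0.2)

# State and year dummies absorb the fixed effects
fit_fe <- lm(outcome ~ policy + factor(state) + factor(year), data = panel)
policy_estimate <- unname(coef(fit_fe)["policy"])
print(policy_estimate)
```

On a balanced panel, the dummy-variable fit gives the same slope for `policy` as a two-way within (fixed effects) estimator, so the printed estimate should land close to the simulated value of 1.5.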

Other notes and resources

Tidyverse
Please see this case study for more details on using ggplot2
Longitudinal studies
Panel data
Confidence intervals
Linear regression
Panel regression analysis
Hausman test
Resampling
Variance inflation factor (VIF)
R² coefficient of determination
Ridge regression
LaTeX mathematical notation

For more information on linear regression see this book and this case study.

For more information on the different types of panel regression models see this book, here, and here.

For more information on implementing panel regression in R using the plm package, see here and here.

For more information on multicollinearity and VIF, see this article (DOI 10.1007/s11135-006-9018-6).

The articles used to motivate this case study are:
Lott and Mustard
Donohue, et al.
See here for a list of studies on this topic

Packages used in this case study:

Package	Use in this case study
here	to easily load and save data
dplyr	to arrange/filter/select/compare specific subsets of the data
magrittr	to use the compound assignment pipe operator `%<>%`
purrr	to import the data in all the different excel and csv files efficiently
tibble	to create data objects that we can manipulate with `dplyr`/`stringr`/`tidyr`/`purrr`
ggplot2	to create plots
ggrepel	to allow labels in figures not to overlap
plm	to fit fixed-effects and linear regression models to panel data
broom	to create nicely formatted model output
GGally	to extend ggplot2 functionality to easily create more complex plots
ggcorrplot	to easily visualize a correlation matrix
rsample	to split our sample for the simulation analysis
DT	to create interactive and searchable tables
car	to calculate VIF values on linear model output
stringr	to manipulate the character strings within the data
cowplot	to allow plots to be combined
latex2exp	to convert LaTeX math formulas to R’s plotmath expressions

For users

There is a Makefile in this folder. Typing `make` knits the case study contained in index.Rmd to index.html and also knits README.Rmd to a markdown file (README.md).

For instructors

If instructors want more details about the data import and wrangling for the data used in this analysis, start with this case study.

Target audience

For individuals or classes with some familiarity with regression and ggplot2. See this case study for an introduction to regression.

Suggested homework

Ask students to remove one or more of the demographic variables with high VIF values from the Lott-like panel data, perform the panel linear regression analysis again, and recalculate the VIF values.

Ask the students to discuss how this possibly changed the results.
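For orientation, a VIF can also be computed by hand: regress one explanatory variable on all the others and take 1 / (1 - R squared) from that regression. The sketch below does this in base R on invented data (the helper `vif_by_hand` and the variables `a`, `b`, and `c_var` are hypothetical). The case study itself uses the car package, so this only shows the general workflow of checking VIF values before and after dropping a collinear variable.

```r
# Toy explanatory variables (simulated, not the case study data)
set.seed(7)
n     <- 300
a     <- rnorm(n)
b     <- a + rnorm(n, sd = 0.1)  # nearly collinear with a
c_var <- rnorm(n)                # unrelated to a and b
toy   <- data.frame(a, b, c_var)

# Hypothetical helper: VIF of `target` given the other columns of `data`
vif_by_hand <- function(data, target) {
  others <- setdiff(names(data), target)
  r2 <- summary(lm(reformulate(others, response = target), data = data))$r.squared
  1 / (1 - r2)
}

# VIF for every variable when all three are included
vif_full <- sapply(names(toy), vif_by_hand, data = toy)

# VIF after dropping the collinear variable b
toy_reduced <- toy[, c("a", "c_var")]
vif_reduced <- sapply(names(toy_reduced), vif_by_hand, data = toy_reduced)

print(vif_full)
print(vif_reduced)
```

In this toy setting the VIF values for `a` and `b` should be large in the full set and fall to about 1 for `a` once `b` is removed.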

Estimate of RMarkdown Compilation Time:

About 39 to 49 seconds

This compilation time was measured on a PC running Windows 10. This range should only be used as an estimate, as compilation time will vary with different machines and operating systems.
