diff --git a/content/plots/descriptive_plots/Death_rate_over_time_(by state).png b/content/plots/descriptive_plots/Death_rate_over_time_(by state).png new file mode 100644 index 0000000..8e3652f Binary files /dev/null and b/content/plots/descriptive_plots/Death_rate_over_time_(by state).png differ diff --git a/content/plots/descriptive_plots/Deaths_Histogram.png b/content/plots/descriptive_plots/Deaths_Histogram.png new file mode 100644 index 0000000..a9e1671 Binary files /dev/null and b/content/plots/descriptive_plots/Deaths_Histogram.png differ diff --git a/content/plots/descriptive_plots/Deaths_and_Population.png b/content/plots/descriptive_plots/Deaths_and_Population.png new file mode 100644 index 0000000..13a5e34 Binary files /dev/null and b/content/plots/descriptive_plots/Deaths_and_Population.png differ diff --git a/content/plots/descriptive_plots/Gun_Deaths_and_Population_over_time_(by year).png b/content/plots/descriptive_plots/Gun_Deaths_and_Population_over_time_(by year).png new file mode 100644 index 0000000..cb3a65a Binary files /dev/null and b/content/plots/descriptive_plots/Gun_Deaths_and_Population_over_time_(by year).png differ diff --git a/content/plots/descriptive_plots/Gun_Regulation_Index_over_time_(by state).png b/content/plots/descriptive_plots/Gun_Regulation_Index_over_time_(by state).png new file mode 100644 index 0000000..dc64214 Binary files /dev/null and b/content/plots/descriptive_plots/Gun_Regulation_Index_over_time_(by state).png differ diff --git a/content/plots/descriptive_plots/Gun_Regulation_index_in_states.png b/content/plots/descriptive_plots/Gun_Regulation_index_in_states.png new file mode 100644 index 0000000..aa4656d Binary files /dev/null and b/content/plots/descriptive_plots/Gun_Regulation_index_in_states.png differ diff --git a/content/plots/descriptive_plots/Index_score_count.png b/content/plots/descriptive_plots/Index_score_count.png new file mode 100644 index 0000000..57b686c Binary files /dev/null and 
b/content/plots/descriptive_plots/Index_score_count.png differ diff --git a/content/plots/descriptive_plots/Violent_crime_rate_and_population.png b/content/plots/descriptive_plots/Violent_crime_rate_and_population.png new file mode 100644 index 0000000..0fa6690 Binary files /dev/null and b/content/plots/descriptive_plots/Violent_crime_rate_and_population.png differ diff --git a/content/plots/descriptive_plots/Violent_crime_rate_over_time_by_state.png b/content/plots/descriptive_plots/Violent_crime_rate_over_time_by_state.png new file mode 100644 index 0000000..0627318 Binary files /dev/null and b/content/plots/descriptive_plots/Violent_crime_rate_over_time_by_state.png differ diff --git a/content/plots/explorative_plots/Death_Rate_over_time.png b/content/plots/explorative_plots/Death_Rate_over_time.png new file mode 100644 index 0000000..fbfde5a Binary files /dev/null and b/content/plots/explorative_plots/Death_Rate_over_time.png differ diff --git a/content/plots/explorative_plots/Death_rate_vs_Gun_Regulation.png b/content/plots/explorative_plots/Death_rate_vs_Gun_Regulation.png new file mode 100644 index 0000000..db2abf3 Binary files /dev/null and b/content/plots/explorative_plots/Death_rate_vs_Gun_Regulation.png differ diff --git a/content/plots/explorative_plots/General_gdvsgr.png b/content/plots/explorative_plots/General_gdvsgr.png new file mode 100644 index 0000000..9d7812f Binary files /dev/null and b/content/plots/explorative_plots/General_gdvsgr.png differ diff --git a/content/plots/explorative_plots/Mean_Violent_Crime_Rate_Over_Time.png b/content/plots/explorative_plots/Mean_Violent_Crime_Rate_Over_Time.png new file mode 100644 index 0000000..68ad100 Binary files /dev/null and b/content/plots/explorative_plots/Mean_Violent_Crime_Rate_Over_Time.png differ diff --git a/content/plots/explorative_plots/Mean_gin_regulation_index_over_time.png b/content/plots/explorative_plots/Mean_gin_regulation_index_over_time.png new file mode 100644 index 0000000..7f3b1e2 
Binary files /dev/null and b/content/plots/explorative_plots/Mean_gin_regulation_index_over_time.png differ diff --git a/content/plots/explorative_plots/VCR_vs_DR_vs_GR.png b/content/plots/explorative_plots/VCR_vs_DR_vs_GR.png new file mode 100644 index 0000000..d011023 Binary files /dev/null and b/content/plots/explorative_plots/VCR_vs_DR_vs_GR.png differ diff --git a/content/plots/explorative_plots/Violent_Crime_vs_index_verylow.png b/content/plots/explorative_plots/Violent_Crime_vs_index_verylow.png new file mode 100644 index 0000000..5e25c43 Binary files /dev/null and b/content/plots/explorative_plots/Violent_Crime_vs_index_verylow.png differ diff --git a/content/plots/explorative_plots/Violent_crime_vs_Gun_death_allstates.png b/content/plots/explorative_plots/Violent_crime_vs_Gun_death_allstates.png new file mode 100644 index 0000000..f4f7df1 Binary files /dev/null and b/content/plots/explorative_plots/Violent_crime_vs_Gun_death_allstates.png differ diff --git a/content/plots/explorative_plots/Violent_crime_vs_gun_death_low.png b/content/plots/explorative_plots/Violent_crime_vs_gun_death_low.png new file mode 100644 index 0000000..2488fe6 Binary files /dev/null and b/content/plots/explorative_plots/Violent_crime_vs_gun_death_low.png differ diff --git a/content/plots/explorative_plots/Violent_crime_vs_gun_death_verylow.png b/content/plots/explorative_plots/Violent_crime_vs_gun_death_verylow.png new file mode 100644 index 0000000..e2c3f3a Binary files /dev/null and b/content/plots/explorative_plots/Violent_crime_vs_gun_death_verylow.png differ diff --git a/content/plots/explorative_plots/Violent_crime_vs_gun_deaths_medium.png b/content/plots/explorative_plots/Violent_crime_vs_gun_deaths_medium.png new file mode 100644 index 0000000..8183914 Binary files /dev/null and b/content/plots/explorative_plots/Violent_crime_vs_gun_deaths_medium.png differ diff --git a/content/plots/explorative_plots/Violent_crime_vs_index_allstates.png 
b/content/plots/explorative_plots/Violent_crime_vs_index_allstates.png new file mode 100644 index 0000000..45678a1 Binary files /dev/null and b/content/plots/explorative_plots/Violent_crime_vs_index_allstates.png differ diff --git a/content/plots/explorative_plots/Violent_crime_vs_index_comb.png b/content/plots/explorative_plots/Violent_crime_vs_index_comb.png new file mode 100644 index 0000000..814797f Binary files /dev/null and b/content/plots/explorative_plots/Violent_crime_vs_index_comb.png differ diff --git a/content/plots/explorative_plots/Violent_crime_vs_index_high.png b/content/plots/explorative_plots/Violent_crime_vs_index_high.png new file mode 100644 index 0000000..9c11782 Binary files /dev/null and b/content/plots/explorative_plots/Violent_crime_vs_index_high.png differ diff --git a/content/plots/explorative_plots/Violent_crime_vs_index_low.png b/content/plots/explorative_plots/Violent_crime_vs_index_low.png new file mode 100644 index 0000000..8bfa13f Binary files /dev/null and b/content/plots/explorative_plots/Violent_crime_vs_index_low.png differ diff --git a/content/plots/explorative_plots/Violent_crime_vs_index_medium.png b/content/plots/explorative_plots/Violent_crime_vs_index_medium.png new file mode 100644 index 0000000..3c4ecc1 Binary files /dev/null and b/content/plots/explorative_plots/Violent_crime_vs_index_medium.png differ diff --git a/content/plots/explorative_plots/Violent_crimes_vs_gun_death_high.png b/content/plots/explorative_plots/Violent_crimes_vs_gun_death_high.png new file mode 100644 index 0000000..9dc013f Binary files /dev/null and b/content/plots/explorative_plots/Violent_crimes_vs_gun_death_high.png differ diff --git a/content/plots/explorative_plots/Violent_crimes_vs_gun_deaths_comb.png b/content/plots/explorative_plots/Violent_crimes_vs_gun_deaths_comb.png new file mode 100644 index 0000000..1d0f562 Binary files /dev/null and b/content/plots/explorative_plots/Violent_crimes_vs_gun_deaths_comb.png differ diff --git 
a/content/plots/explorative_plots/comb_gdvsgr.png b/content/plots/explorative_plots/comb_gdvsgr.png new file mode 100644 index 0000000..72d36c1 Binary files /dev/null and b/content/plots/explorative_plots/comb_gdvsgr.png differ diff --git a/content/plots/explorative_plots/gd_gr_vc_3dplot.html b/content/plots/explorative_plots/gd_gr_vc_3dplot.html new file mode 100644 index 0000000..87ef429 --- /dev/null +++ b/content/plots/explorative_plots/gd_gr_vc_3dplot.html @@ -0,0 +1,1941 @@ + + + + +plotly + + + + + + + + + + + +
+
+
+
+
+
+
diff --git a/content/plots/explorative_plots/gd_gr_vc_plot.png b/content/plots/explorative_plots/gd_gr_vc_plot.png
new file mode 100644
index 0000000..f78fa68
Binary files /dev/null and b/content/plots/explorative_plots/gd_gr_vc_plot.png differ
diff --git a/content/plots/explorative_plots/high_gdvsgr.png b/content/plots/explorative_plots/high_gdvsgr.png
new file mode 100644
index 0000000..117796a
Binary files /dev/null and b/content/plots/explorative_plots/high_gdvsgr.png differ
diff --git a/content/plots/explorative_plots/low_gdvsgr.png b/content/plots/explorative_plots/low_gdvsgr.png
new file mode 100644
index 0000000..3bb38fe
Binary files /dev/null and b/content/plots/explorative_plots/low_gdvsgr.png differ
diff --git a/content/plots/explorative_plots/medium_gdvsgr.png b/content/plots/explorative_plots/medium_gdvsgr.png
new file mode 100644
index 0000000..ab8110b
Binary files /dev/null and b/content/plots/explorative_plots/medium_gdvsgr.png differ
diff --git a/content/plots/explorative_plots/verylow_gdvsgr.png b/content/plots/explorative_plots/verylow_gdvsgr.png
new file mode 100644
index 0000000..3494971
Binary files /dev/null and b/content/plots/explorative_plots/verylow_gdvsgr.png differ
diff --git a/content/posts/gun_deaths_paper.md b/content/posts/gun_deaths_paper.md
index d3199ae..15aa256 100644
--- a/content/posts/gun_deaths_paper.md
+++ b/content/posts/gun_deaths_paper.md
@@ -1,6 +1,436 @@
 ---
 title: "An analysis of the correlation between gun regulation and public safety in the United States of America"
 date: 2024-04-02T00:56:05+02:00
+author: Francesco Prem Solidoro, Michele Salvi
 ---
-{{< load-plotly >}}
-{{< plotly json="/plotly/gun_stuff.json" height="500px" modebar="false">}}
+
+   *Abstract*--- The aim of this paper is to provide a summary analysis
+of the data regarding the effectiveness of gun restriction policy in
+diminishing the number of gun-related deaths. We do so by conducting an
+analysis of three different datasets, and by combining the data we
+gathered from them, constructing a linear regression model. The
+constructed linear regression model shows a negative correlation between
+the gun-related death rate and the amount of legislation present within
+the state. This leads us to believe that it's possible for legislation
+to help reduce gun-related deaths.
+
+  *Index terms*---Gun regulation, Gun death rate, Violent crime rate
+
+# Introduction
+
+The topic of gun regulation has been discussed at length in the United
+States of America, because of the real and present danger that guns pose
+to public safety. There's little doubt that policy has the potential to
+do enormous harm as well as good, but it needs direction
+from data. The aim of this paper is to conduct a summary analysis of
+the correlation between the overall presence of gun restrictions and the
+rate of deaths caused by guns.
+
+## Paper overview
+
+In this paper, we analyse data coming from three different datasets: the
+first [@Lawprovisions] is a collection of all gun restriction laws that
+have been put in place by individual states, the second [@Gundeaths] is a
+collection of gun deaths by county, and the third [@Violentcrimes] tracks
+violent crime rates.
+
+The scope of the paper is national, and we aim to obtain valuable
+information by comparing the performance of different states within the
+same country: we expect the cultural differences to be (even if still
+present) less impactful than they would be in an inter-country
+comparison, and thus hope to eliminate most of the untraceable
+biases that such a comparison would introduce.
+
+To reach our conclusions, we conducted an analysis of the three datasets,
+using them as follows:
+
+- From the first dataset, we extracted an index to answer the question
+  'how much regulation around guns is present in this state?'
+
+- From the second dataset, we extracted aggregated data by year and by
+  state regarding deaths and death rate by guns
+
+- From the third dataset, we extracted information about the change in
+  violent crime rates over time, to be able to set up a multiple
+  regressor linear regression model
+
+Overall, we found a negative correlation between the gun provision index
+and the gun death rate, even when accounting for the changes in violent
+crime.
+
+# Methods
+
+## Preparing the Datasets
+
+### Dataset 1
+
+The first dataset consists of information regarding deaths by guns
+divided by county, but our analysis takes place at the state level.
+Therefore, the dataset was grouped by state and we removed superfluous
+variables like county code, state code and state initials.
+
+Additionally, our datasets contain data compiled at different
+points in time, so the information was filtered to the years
+overlapping across all three datasets: the nineteen years from 1999 to
+2017.
+
+The preparation was done in the following way:
+
+``` r
+library(dplyr)
+
+# Drop the columns not needed at the state level.
+gundeaths_cut <- gun_deaths_us_1999_2019[-c(1, 3:5, 7)]
+# Keep only the years shared by all three datasets (up to 2017).
+gundeaths_cut <- subset(gundeaths_cut, Year <= 2017)
+gundeaths_cut <- gundeaths_cut[-c(5:10)]
+# Aggregate the county rows: group by the character columns, sum the numeric ones.
+gundeaths_cond <- gundeaths_cut %>%
+  group_by(across(where(is.character))) %>%
+  summarise(across(where(is.numeric), ~ sum(.x, na.rm = TRUE)), .groups = "drop")
+```
+
+We then constructed a simple rate per 100,000 people, to
+account for the different populations of the states.
+
+``` r
+# Deaths per 100,000 inhabitants, computed for all rows at once.
+gundeaths_cond$Rate <- gundeaths_cond$Deaths / gundeaths_cond$Population * 100000
+```
+
+The final form of the dataset was:
+
+| **Year** | **State** | **Deaths** | **Population** | **Rate**  |
+| -------- | --------- | ---------- | -------------- | --------- |
+| 1999     | Alabama   | 605        | 3047241        | 19.854025 |
+| ...      | ...       | ...        | ...            | ...       |
+
+### Dataset 2
+
+The second dataset contains all of the laws restricting gun usage for
+each US state, represented by 133 binary variables. Aside from filtering
+for the correct years, an index (referred to as "Gun Regulation Index"
+from here on) was constructed from the law total using min-max
+normalization.
+
+``` r
+library(caret)
+
+# Keep the years from 1999 onwards.
+law_provision_norm <- subset(law_provision_norm, year >= 1999)
+# Min-max normalization of the law total to the [0, 1] range.
+lawtotal_df <- as.data.frame(law_provision_norm$lawtotal)
+proc_lawprov <- preProcess(lawtotal_df, method = c("range"))
+# predict() returns a data frame, hence the unlist() call further down.
+law_provision_norm$index <- predict(proc_lawprov, lawtotal_df)
+```
+
+To account for these differences, each observation was given a score
+("Very Low", "Low", "Medium" or "High") according to its Gun Regulation
+Index, with cutoffs at 0.25, 0.5 and 0.75.
+
+``` r
+idx <- unlist(merged$index)
+merged$Score <- ifelse(idx < 0.25, "Very Low",
+                ifelse(idx < 0.5, "Low",
+                ifelse(idx < 0.75, "Medium", "High")))
+```
+
+Final product:
+
+| **Year** | **State** | **Law Total** | **Index**   | **Score** |
+| -------- | --------- | ------------- | ----------- | --------- |
+| 1999     | Alabama   | 16            | 0.126213592 | Very Low  |
+| ...      | ...       | ...           | ...         | ...       |
+
+### Dataset 3
+
+The third dataset, mainly added for multiple linear regression purposes
+in an attempt to account for omitted variable bias, contains the raw
+numbers of different types of crimes committed in each state. To keep
+the data as closely related to guns as possible, and to make sure there
+was no instance of double counting (it wasn't made clear which of the
+specific crimes listed counted as a 'Violent Crime'), we included only
+"Violent Crimes".
+
+``` r
+# Keep 1999-2017, and drop the District of Columbia and rows with no state name.
+estimatedcrimes_cut <- subset(estimated_crimes, year >= 1999 & year <= 2017)
+estimatedcrimes_cut <- subset(estimatedcrimes_cut,
+                              state_name != "District of Columbia" & state_name != "")
+# Drop the columns not used in the analysis.
+estimatedcrimes_cut <- estimatedcrimes_cut[-c(2,4,6:15)]
+```
+
+A violent crime rate per 1,000 people was then constructed.
+
+``` r
+# Violent crimes per 1,000 inhabitants. expl_merge is assumed to be the crime
+# data merged with the Population column of the gun deaths data.
+expl_merge$VCrime_Rate <- expl_merge$violent_crime / expl_merge$Population * 1000
+```
+
+The data obtained was formatted in the following way:
+
+| **Year** | **State** | **Violent Crimes** | **VCrime Rate** |
+| -------- | --------- | ------------------ | --------------- |
+| 1999     | Alabama   | 21421              | 7.029638        |
+| ...      | ...       | ...                | ...             |
+
+## Descriptive Analysis
+
+### Dataset 1
+
+In the Gun Deaths dataset the mean number of deaths is 515.34, with a
+standard deviation of 605.09, while the death rate has a mean of
+11.90 per 100,000 and a standard deviation of 4.67. These descriptive
+statistics, although weak on their own, show considerable variance.
+
+![imd](/plots/descriptive_plots/Deaths_and_Population.png)
+![imd](/plots/descriptive_plots/Gun_Deaths_and_Population_over_time_(by year).png)
+
+The graphs show the variance and its evolution over time.
+
+To aid visualization we also constructed a histogram representing the
+death count for each state.
+![imd](/plots/descriptive_plots/Deaths_Histogram.png)
+
+### Dataset 2
+
+For the law provision dataset the mean of the law total is 24.92, with a
+standard deviation of 23.60, while the mean of the Gun Regulation
+Index is 0.21, with a standard deviation of 0.23, again showing a large
+variance in our data.
+
+![imd](/plots/descriptive_plots/Gun_Regulation_index_in_states.png)
+
+As is easy to see, most of the states kept a very low
+index throughout the time period analyzed in this study, but some
+states, for example California, kept
+very high index values across the whole time period. This wide gap was
+not a surprise, as the positions of prominent political
+figures in the respective states are fairly representative of their
+index score, be it high or low.
+![imd](/plots/descriptive_plots/Gun_Regulation_Index_over_time_(by state).png)
+
+To understand why we divide the states
+according to their scores, it helps to visualize how the US states
+are distributed across them, as the differences become obvious.
+
+![imd](/plots/descriptive_plots/Index_score_count.png)
+
+This bar chart shows the scores of all the US states across our
+19-year time period. These
+discrepancies shaped the rest of our analysis and prompted us to
+keep track of each state's score at all times.
+
+### Dataset 3
+
+The third dataset, the one containing the information about violent
+crimes in the US, presents itself in an analogous way to the previous
+datasets. The mean number of violent crimes is 26976.7, with a standard
+deviation of 34431.64, while the mean of the violent crime rate is
+6.18 per 1,000, with a standard deviation of 2.62.
+
+Once again, a high variance is observable across all of our data, again
+suggesting the importance of a criterion of distinction (in our case the
+index score) in the state-level analysis to capture
+correlations and statistically useful results efficiently.
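
The descriptive statistics reported in this section can also be computed per score bracket. What follows is a minimal sketch on made-up numbers, not the script used for the study: the `toy` data frame and its values are assumptions for illustration, and only the column names `Score` and `Rate` follow the tables shown earlier.

```r
# Toy data standing in for the merged state-year table (values are made up).
toy <- data.frame(
  Score = c("Very Low", "Very Low", "Low", "Low", "High", "High"),
  Rate  = c(14, 16, 11, 13, 4, 6)
)

# Mean and standard deviation of the death rate within each score bracket.
# split() groups the rates by bracket; sapply() returns one column per bracket.
stats_by_score <- sapply(split(toy$Rate, toy$Score),
                         function(x) c(mean = mean(x), sd = sd(x)))
print(stats_by_score)
```

The same pattern would apply to the violent crime rate or the index itself by swapping the column passed to `split()`.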
+
+![imd](/plots/descriptive_plots/Violent_crime_rate_and_population.png)
+![imd](/plots/descriptive_plots/Violent_crime_rate_over_time_by_state.png)
+
+## Explorative Analysis
+
+Having established the nature and characteristics of the three datasets,
+we now proceed to conduct an analysis of the combination of their data.
+The aim of this analysis is to establish the impact (or lack thereof)
+of legislation on gun death rates. In order to do so, we first
+check for the presence of correlation, and then formulate a linear
+regression model. We then try to integrate data from possibly
+correlated variables in an attempt to remove omitted variable bias, and
+thus create a multiple regressor linear regression model.\
+The explorative analysis began by plotting and calculating the mean gun
+death rate (per 100,000) over time, and comparing it with the mean Gun
+Regulation Index over time.
+![imd](/plots/explorative_plots/Death_rate_vs_Gun_Regulation.png)
+
+At first glance, looking at the graphs side by side may lead us
+to think that even though the Gun Regulation Index has been rising over
+time, deaths by guns still don't show any sign of decreasing. A closer
+inspection reveals, however, that over nineteen years the Gun Regulation
+Index has only increased by a meager 0.04%, so to discern a pattern we
+need to go deeper and look at the relationship between the two variables
+directly.
+
+Calculating the Pearson correlation between the Gun Regulation Index
+and the gun death rate returns a value of -0.624, suggesting a
+non-negligible negative correlation between them.
+
+![imd](/plots/explorative_plots/General_gdvsgr.png)
+
+By plotting them in the same graph the negative correlation becomes
+obvious, but we can also notice that the observations are skewed towards
+the low values of the Gun Regulation Index. In line with our
+earlier suggestion, we therefore split the observations according to
+their index score bracket and run the same analysis.
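
The split-and-correlate step just described can be sketched as follows. This is an illustrative example under stated assumptions rather than the authors' script: the `toy` data frame and its values are invented, and the column names `index`, `Rate` and `Score` are borrowed from the tables in the Methods section.

```r
# Toy observations for two score brackets (values are made up).
toy <- data.frame(
  Score = rep(c("Very Low", "Medium"), each = 4),
  index = c(0.05, 0.10, 0.15, 0.20, 0.55, 0.60, 0.65, 0.70),
  Rate  = c(15, 13, 14, 12, 10, 9, 7, 6)
)

# Pearson correlation over all observations.
overall <- cor(toy$index, toy$Rate, method = "pearson")

# The same correlation computed separately within each score bracket.
by_bracket <- sapply(split(toy, toy$Score),
                     function(d) cor(d$index, d$Rate, method = "pearson"))

print(overall)
print(by_bracket)
```

Comparing `overall` with the entries of `by_bracket` shows how a pooled correlation can differ from the within-group ones, which is the reason for carrying the score bracket through the rest of the analysis.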
+![imd](/plots/explorative_plots/comb_gdvsgr.png) By splitting the
+observations we get a clearer picture: the -0.624 correlation isn't
+consistent across all states, but the pattern can still be observed.
+
+The correlations for each group are as follows:
+
+- Very low score states = -0.375
+
+- Low score states = -0.254
+
+- Medium score states = -0.555
+
+- High score states = -0.412
+
+Here we also begin to see a pattern that repeats in the
+next steps: medium score states tend to have the strongest correlation
+among all of the groups.
+
+Fitting a linear model of the gun death rate on the index, we obtained
+an intercept of 14.63 and an index coefficient of -12.65.
+
+This coefficient may seem very large, but when interpreting it, it is
+useful to remember that our index is normalized to the range from 0 to
+1, so the full value applies only to a state moving from the lowest to
+the highest possible index.
+
+Because simple linear regression is susceptible to outliers and to
+omitted variables, it was at this point that we decided to aid our
+analysis with the inclusion of the third dataset, with the aim of
+constructing some additional linear models and useful plots.
+
+![imd](/plots/explorative_plots/VCR_vs_DR_vs_GR.png) Before committing
+ourselves to the actual model building, we included the evolution of
+the mean violent crime rate over time. Once again, an immediate
+analysis may lead us to think that as the Gun Regulation Index increased
+the violent crime rate decreased, and this is also supported by the
+correlation between them (-0.293) and by their plot.
+![imd](/plots/explorative_plots/Violent_crime_vs_index_allstates.png) But we
+very quickly found out that this plot is actually very misleading.
+Simply dividing the states by score lets us analyze much more
+interesting data.
+![imd](/plots/explorative_plots/Violent_crime_vs_index_comb.png) Now that
+the plots are much clearer, the size of the
+negative correlation in medium score states is
+visible to the naked eye, but we can also observe positive
+correlations in some of the brackets:
+
+- Very low score states = -0.0596
+
+- Low score states = 0.403
+
+- Medium score states = -0.654
+
+- High score states = 0.229
+
+While the social interpretation of these findings is outside the scope
+of this study, the data suggest that increasing the Gun Regulation
+Index has an ambiguous relationship with the Violent Crime Rate.
+Without claiming too much, we can still infer that violent crime is
+not a simple enough issue to be fixed just by
+increasing gun control.
+
+Building a linear model on all states returned an intercept of 6.896
+and a coefficient of -3.335 for our Gun Regulation Index, which, like
+the overall correlation, masks the differences between brackets.\
+\
+Now, we may also be interested in the relationship between gun deaths
+and the violent crime rate, so we calculated once again the correlation
+between them and obtained a value of 0.471, showing that they are indeed
+correlated.
+![imd](/plots/explorative_plots/Violent_crime_vs_Gun_death_allstates.png)
+
+Now, to understand the relationship on a deeper level, we once again
+divided the states according to their score.
+![imd](/plots/explorative_plots/Violent_crimes_vs_gun_deaths_comb.png)
+
+In this case we can observe that all of our brackets show some level
+of positive correlation, in particular:
+
+- Very low score states = 0.371
+
+- Low score states = 0.285
+
+- Medium score states = 0.685
+
+- High score states = 0.239
+
+As anticipated, medium score states continue to stand out in
+their correlations; this has been a constant in our analysis so far.
+
+Regressing the violent crime rate on the death rate gives an intercept
+of 3.029 and a Death Rate coefficient of 0.264. The coefficient is
+surprisingly low, but understandable, since the
+Death Rate is expressed per 100,000 people while the violent crime rate
+is expressed per 1,000.\
+\
+For the final step of our study we turned to multiple linear
+regression, in an attempt to correct possible mistakes introduced by
+the simple linear regression. In other words, by adding the violent
+crime rate as a regressor we try to remove some of the omitted variable
+bias that could have affected our results.\
+\
+With the multiple linear regression model we
+obtained the following values: 10.755 for the intercept, -10.775 for the
+coefficient of our Gun Regulation Index and 0.563 for the coefficient of
+the violent crime rate.
+
+Even though the coefficient of the violent crime rate is fairly small,
+both the intercept and the Gun Regulation Index coefficient shrank in
+magnitude compared with the simple model (from 14.63 to 10.755 and from
+-12.65 to -10.775), which suggests that the violent crime rate accounts
+for part of the variation previously attributed to the index.
+
+We then plotted these results in a 3D interactive plot (available for
+download in the GitHub repository for this study and
+[here](https://soliprems.web.app/gd_gr_vc_3dplot.html)),
+which, due to the limitations of PDFs, can be shown here only in the form
+of an image. ![imd](/plots/explorative_plots/gd_gr_vc_plot.png)
+
+# Conclusions
+
+The relationship that emerges from the data is clear, even if not
+perfectly consistent: there's a negative correlation between our gun
+provision index and the gun death rate. This correlation gets weaker
+with the introduction of violent crime rates as a regressor in the
+linear regression model, but it remains negative on the whole. It
+becomes positive in the case of low and high score states, but these
+are a much smaller sample than the very low score states, which remain
+negative, and this is reflected in the overall relationship remaining
+negative.\
+While the policy analysis is complicated, this would seem to suggest
+that it's possible, although obviously not guaranteed, to write
+legislation that aims at and achieves a reduction in the number of
+gun-related deaths.
+
+## Future work
+
+The work could benefit from expansion in a few areas:
+
+- Integration of more variables that might be correlated with the gun
+  death rate in order to escape possible omitted variable bias
+
+- Analysis of the effectiveness in relation to the intentionality
+  behind gun deaths: what kind of deaths does gun control prevent?
+
+# Appendix
+
+The full script and project contents are freely available [here](https://github.com/soliprem/statistics-project) under the GPLv3 license.
+
+## Contributions
+
+### Francesco Prem Solidoro
+
+Contributed to the writing of the code, to the search and reformatting
+of the datasets, to the writing of the document (in particular:
+abstract, conclusions) and to the management of the tooling used for the
+project (GitHub, git, typst.app, Firebase)
+
+### Michele Salvi
+
+Contributed to the writing of the code, to the search and reformatting
+of the datasets, and to the writing of the document (in particular:
+methodology, sections A, B and C)
+
+### Juan Calani
+
+Contributed to the ideation of the project, and provided a review of
+the work
+
+### Elena Rocco
+
+Provided a review of the work
diff --git a/public/en/index.xml b/public/en/index.xml
index 41786b2..286007c 100644
--- a/public/en/index.xml
+++ b/public/en/index.xml
@@ -14,7 +14,7 @@ http://localhost:1313/en/posts/gun_deaths_paper/ Tue, 02 Apr 2024 00:56:05 +0200 http://localhost:1313/en/posts/gun_deaths_paper/ - + Abstract&mdash; The aim of this paper is to provide a summary analysis of the data regarding the effectiveness of gun restriction policy in diminishing the number of gun-related deaths. We do so by conducting an analysis of three different datasets, and by combining the data we gathered from them, constructing a linear regression model. The constructed linear regression model shows a negative correlation between the gun-related death rate and the amount of legislation present within the state. 
Why diff --git a/public/en/plots/descriptive_plots/Death_rate_over_time_(by state).png b/public/en/plots/descriptive_plots/Death_rate_over_time_(by state).png new file mode 100644 index 0000000..8e3652f Binary files /dev/null and b/public/en/plots/descriptive_plots/Death_rate_over_time_(by state).png differ diff --git a/public/en/plots/explorative_plots/Death_Rate_over_time.png b/public/en/plots/explorative_plots/Death_Rate_over_time.png new file mode 100644 index 0000000..fbfde5a Binary files /dev/null and b/public/en/plots/explorative_plots/Death_Rate_over_time.png differ diff --git a/public/en/posts/gun_deaths_paper/index.html b/public/en/posts/gun_deaths_paper/index.html index 10e58c6..c48397f 100644 --- a/public/en/posts/gun_deaths_paper/index.html +++ b/public/en/posts/gun_deaths_paper/index.html @@ -17,20 +17,20 @@ - + - + - + - + - + An analysis of the correlation between gun regulation and public safety in the United States of America @@ -62,10 +62,10 @@
  • - +
  • - +
  • @@ -85,30 +85,442 @@
    Apr 2, 2024

    An analysis of the correlation between gun regulation and public safety in the United States of America

    -

    Francesco Prem Solidoro

    -

    0 Words +


    -

    - - - - - - -

    - -

    +

       Abstract— The aim of this paper is to provide a summary analysis +of the data regarding the effectiveness of gun restriction policy in +diminishing the number of gun-related deaths. We do so by conducting an +analysis of three different datasets, and by combining the data we +gathered from them, constructing a linear regression model. The +constructed linear regression model shows a negative correlation between +the gun-related death rate and the amount of legislation present within +the state. This leads us to believe that it’s possible for legislation +to help reduce gun-related deaths.

    +

      Index terms—Gun regulation, Gun death rate, Violent crime rate

    +

    Introduction

    +

    The topic of gun regulation has been discussed at length in the United +States of America, because of the real and present danger that guns pose +to public safety. There’s little doubt that policy has the potential to +do enormous harm as well as good, but it needs direction +from data. The aim of this paper is to conduct a summary analysis of +the correlation between the overall presence of gun restrictions and the +rate of deaths caused by guns.

    +

    Paper overview

    +

    In this paper, we analyse data from three different datasets: the law provisions dataset [@Lawprovisions] is a collection of all gun restriction laws that have been put in place by individual states, the gun deaths dataset [@Gundeaths] is a collection of gun deaths by county, and the violent crimes dataset [@Violentcrimes] tracks violent crime in each state.

    +

    The scope of the paper is national. We aim to obtain valuable information by comparing different states within the same country: we expect cultural differences, although still present, to be less impactful than they would be in an inter-country comparison, and thus hope to eliminate most of the untraceable biases that such a comparison would introduce.

    +

    To reach our conclusions, we analysed the three datasets, using them as follows:

    • From the law provisions dataset, we extracted an index to answer the question ‘how much regulation around guns is present in this state?’

    • From the gun deaths dataset, we extracted data on gun deaths and the gun death rate, aggregated by year and by state

    • From the violent crimes dataset, we extracted information about the change in violent crime rates over time, so that we could set up a multiple linear regression model

    Overall, we found a negative correlation between the Gun Regulation Index and the gun death rate, even when accounting for changes in violent crime.

    +

    Methods

    +

    Preparing the Datasets

    +

    Dataset 1

    +

    The first dataset comprises information on gun deaths broken down by county, but our analysis takes place at the state level. Therefore, we grouped the dataset by state and removed superfluous variables such as county code, state code and state initials.

    +

    Additionally, the three datasets cover different time spans, so the data was filtered to the years in which all three overlap: the nineteen years from 1999 to 2017.

    +

    The preparation was done in the following way:

    +
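    library(dplyr)

    # Drop the columns not needed at state level (selected by position)
    gundeaths_cut <- gun_deaths_us_1999_2019[-c(1, 3:5, 7)]
    # Keep only the years shared by all three datasets
    gundeaths_cut <- subset(gundeaths_cut, Year <= 2017)
    gundeaths_cut <- gundeaths_cut[-c(5:10)]
    # Sum the county rows; grouping uses every character column
    gundeaths_cond <- gundeaths_cut %>%
        group_by(across(where(is.character))) %>%
        summarise(across(where(is.numeric), \(x) sum(x, na.rm = TRUE)), .groups = "drop")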
    +

    We then constructed a simple rate, expressed per 100,000 people, to account for the different populations of the states.

    +
    # Deaths per 100,000 inhabitants, computed for all rows at once
    gundeaths_cond$Rate <- gundeaths_cond$Deaths / gundeaths_cond$Population * 100000
    +

    The final form of the dataset was:

    | Year | State   | Deaths | Population | Rate      |
    |------|---------|--------|------------|-----------|
    | 1999 | Alabama | 605    | 3047241    | 19.854025 |
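As a quick check of the rate definition, the Alabama row shown above can be reproduced by hand. This is only a sketch: the helper name `rate_per` is hypothetical and is not part of the original script.

```r
# Hypothetical helper (not in the original script): events per `per` inhabitants.
rate_per <- function(events, population, per = 100000) {
  events / population * per
}

# Deaths and population taken from the 1999 Alabama row quoted above.
alabama_rate <- rate_per(605, 3047241)
print(round(alabama_rate, 4))
```

The same helper with `per = 1000` would give a rate per 1,000 people, which is the scale used later for violent crime.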

    Dataset 2

    +

    The second dataset contains all of the laws restricting gun usage in each US state, represented by 133 binary variables. Aside from filtering for the correct years, we constructed an index (referred to as the “Gun Regulation Index” from here on) by applying min-max normalization to the total number of laws in force.

    +
    library(caret)

    # Keep 1999 onwards
    law_provision_norm <- subset(law_provision_norm, year >= 1999)
    # Min-max scale the law count to the [0, 1] range
    proc_lawprov <- preProcess(as.data.frame(law_provision_norm$lawtotal),
        method = c("range"))
    law_provision_norm$index <- predict(proc_lawprov, as.data.frame(law_provision_norm$lawtotal))
    +
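For readers without `caret`, the same min-max scaling can be sketched in base R. The function name `min_max` and the law counts below are made up for illustration. The bounds 3 and 106 were chosen only so that a law total of 16 maps to roughly 0.126, the value shown for Alabama in the table that follows; the actual minimum and maximum in the dataset are not stated in the text.

```r
# Min-max scaling: the smallest value maps to 0 and the largest to 1.
# `min_max` is a hypothetical helper, not part of the original script.
min_max <- function(x) {
  lo <- min(x, na.rm = TRUE)
  hi <- max(x, na.rm = TRUE)
  (x - lo) / (hi - lo)
}

# Made-up law counts, for illustration only.
lawtotal_example <- c(3, 16, 50, 106)
index_example <- min_max(lawtotal_example)
print(index_example)
```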

    To make states with different levels of regulation easier to compare, each observation was given a score (“Very Low”, “Low”, “Medium” or “High”) according to its Gun Regulation Index, using cut-offs at 0.25, 0.5 and 0.75.

    +
    idx <- unlist(merged$index)
    merged$Score <- ifelse(idx < 0.25, "Very Low",
                    ifelse(idx < 0.5, "Low",
                    ifelse(idx < 0.75, "Medium", "High")))
    +

    Final product:

    | Year | State   | Law Total | Index       | Score    |
    |------|---------|-----------|-------------|----------|
    | 1999 | Alabama | 16        | 0.126213592 | Very Low |

    Dataset 3

    +

    The third dataset, added mainly so that a multiple linear regression could account for some omitted variable bias, contains the raw counts of different types of crime committed in each state. To keep the data as closely related to guns as possible, and to avoid double counting (it was not made clear which of the specific crimes listed counted as a ‘Violent Crime’), we included only the “Violent Crimes” total.

    +
    # Keep 1999-2017 and drop the District of Columbia and unnamed rows
    estimatedcrimes_cut <- subset(estimated_crimes, year >= 1999 & year <= 2017)
    estimatedcrimes_cut <- subset(estimatedcrimes_cut,
        state_name != "District of Columbia" & state_name != "")
    # Drop the unused columns (selected by position)
    estimatedcrimes_cut <- estimatedcrimes_cut[-c(2, 4, 6:15)]
    +

    A violent crime rate was then constructed, expressed per 1,000 people.

    +
    # Violent crimes per 1,000 inhabitants; Population comes from the merged frame
    expl_merge$VCrime_Rate <- expl_merge$violent_crime / expl_merge$Population * 1000
    +

    The data obtained was formatted in the following way:

    | Year | State   | Violent Crimes | VCrime Rate |
    |------|---------|----------------|-------------|
    | 1999 | Alabama | 21421          | 7.029638    |
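The snippets above refer to combined frames (`merged`, `expl_merge`) without showing how they were built. A minimal sketch of one way to join the three prepared frames on year and state follows, using base R `merge()`. The frame names, the harmonised `Year` and `State` key names and the Alaska values are assumptions made for illustration; only the Alabama figures come from the tables above.

```r
# Toy stand-ins for the three prepared frames. Only the Alabama values are
# taken from the tables above; the Alaska row is made up.
deaths <- data.frame(Year = c(1999, 1999), State = c("Alabama", "Alaska"),
                     Deaths = c(605, 40), Population = c(3047241, 500000))
laws <- data.frame(Year = c(1999, 1999), State = c("Alabama", "Alaska"),
                   lawtotal = c(16, 10))
crimes <- data.frame(Year = c(1999, 1999), State = c("Alabama", "Alaska"),
                     violent_crime = c(21421, 3000))

# Inner joins on the shared keys keep only state-years present in all three.
merged_example <- merge(merge(deaths, laws, by = c("Year", "State")),
                        crimes, by = c("Year", "State"))

# Violent crimes per 1,000 inhabitants, using the population from `deaths`.
merged_example$VCrime_Rate <- merged_example$violent_crime /
  merged_example$Population * 1000
print(merged_example)
```

With the Alabama inputs, the computed violent crime rate agrees with the 7.029638 reported in the table, which suggests the population figure used for it is the one from the gun deaths data.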

    Descriptive Analysis

    +

    Dataset 1

    +

    In the gun deaths dataset, the mean number of deaths is 515.34, with a standard deviation of 605.09, while the death rate has a mean of 11.90 per 100,000 and a standard deviation of 4.67. These descriptive statistics, although of limited use on their own, show a considerable spread.

    +

    [Figure]

    ![imd](/plots/descriptive_plots/Gun_Deaths_and_Population_over_time_(by year).png)

    +

    The graphs show the variance and its evolution over time.

    +

    To aid visualization, we also constructed a histogram of the death count for each state.

    [Figure]

    +

    Dataset 2

    +

    In the law provisions dataset, the mean law total is 24.92, with a standard deviation of 23.60, while the mean Gun Regulation Index is 0.21, with a standard deviation of 0.23, again showing a large spread in our data.

    +

    [Figure]

    +

    As the plot shows, most states kept a very low index throughout the period analyzed in this study, but some states, for example California, kept very high index values across the whole period. This wide gap was not a surprise, as the positions of prominent political figures in each state tend to reflect its index score, be it high or low.

    ![imd](/plots/descriptive_plots/Gun_Regulation_Index_over_time_(by state).png)

    +

    To understand why we divide the states according to their scores, it helps to visualize where each US state stands, as the differences are striking.

    +

    [Figure]

    +

    This bar chart represents the scores of all US states across our 19-year period. Seeing these discrepancies shaped the rest of our analysis and prompted us to keep track of each state’s score at every step.

    +

    Dataset 3

    +

    The third dataset, which contains the information about violent crimes in the US, behaves much like the previous two. The mean number of violent crimes is 26976.7, with a standard deviation of 34431.64, while the mean violent crime rate is 6.18 per 1,000, with a standard deviation of 2.62.

    +

    Once again, a large spread is observable across all of our data, which again suggests the importance of a criterion of distinction (in our case the index score) in the state-level analysis to capture correlations and statistically useful results.

    +

    [Figure]

    [Figure]

    +

    Explorative Analysis

    +

    Having established the nature and characteristics of the three datasets, we now analyse the combination of their data. The aim is to establish the impact (or lack thereof) of legislation on gun death rates. To do so, we first check for the presence of correlation and then formulate a linear regression model. We then integrate data from a possibly correlated variable in an attempt to reduce omitted variable bias, producing a multiple linear regression model.

    The explorative analysis began by calculating and plotting the mean gun death rate (per 100,000) over time and comparing it with the mean Gun Regulation Index over time.

    [Figure]

    +

    At first glance, looking at the graphs side by side may suggest that, even though the Gun Regulation Index has been rising over time, deaths by guns do not show any sign of decreasing. A closer inspection, however, reveals that over nineteen years the Gun Regulation Index has increased by a meager 0.04, so to discern a pattern we need to go deeper and look at how the two variables relate.

    +

    Calculating the Pearson correlation between the Gun Regulation Index and the gun death rate returns a value of -0.624, suggesting a non-negligible negative correlation between them.
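As a reminder of what this coefficient measures, here is a small sketch in base R. The two vectors are made-up values, not observations from the datasets; `cor()` uses the Pearson method by default, and the second computation reproduces it from its definition.

```r
# Made-up index and rate values, for illustration only.
index_example <- c(0.05, 0.10, 0.20, 0.45, 0.80)
rate_example  <- c(18.0, 15.5, 13.0, 9.0, 4.5)

# cor() computes the Pearson coefficient by default.
r_builtin <- cor(index_example, rate_example)

# The same value from the definition: covariance over the product of the
# two standard deviations.
r_manual <- cov(index_example, rate_example) /
  (sd(index_example) * sd(rate_example))

print(c(builtin = r_builtin, manual = r_manual))
```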

    +

    [Figure]

    +

    Plotting them in the same graph makes the negative correlation obvious, but we can also see that the observations are skewed towards low values of the Gun Regulation Index. In line with our earlier suggestion, we therefore split the observations by index score bracket and ran the same analysis.

    [Figure]

    Splitting the observations gives a clearer picture: the -0.624 correlation is not consistent across all states, but the pattern can still be observed.

    +

    The correlations for each group are as follows:

    • Very low score states = -0.375

    • Low score states = -0.254

    • Medium score states = -0.555

    • High score states = -0.412
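A per-bracket correlation of this kind can be sketched in base R with `split()` and `sapply()`. The data frame below is made up; the column names `index`, `Rate` and `Score` mirror those used earlier, and `cut()` is used here as a compact stand-in for the nested `ifelse()` scoring.

```r
# Made-up observations, for illustration only.
obs <- data.frame(
  index = c(0.05, 0.10, 0.20, 0.30, 0.40, 0.45,
            0.55, 0.60, 0.70, 0.80, 0.90, 0.95),
  Rate  = c(18, 17, 15, 14, 12, 13, 11, 9, 8, 5, 6, 4)
)

# Same cut-offs as the scoring step: [0, 0.25), [0.25, 0.5), [0.5, 0.75), rest.
obs$Score <- cut(obs$index, breaks = c(-Inf, 0.25, 0.5, 0.75, Inf),
                 labels = c("Very Low", "Low", "Medium", "High"),
                 right = FALSE)

# One Pearson coefficient per score bracket.
cor_by_score <- sapply(split(obs, obs$Score),
                       function(d) cor(d$index, d$Rate))
print(round(cor_by_score, 3))
```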

    Here we also begin to see a pattern that recurs in the next steps: medium score states tend to have the strongest correlation of all the groups.

    +

    After fitting a linear model of the gun death rate on the Gun Regulation Index, we obtained an intercept of 14.63 and an index coefficient of -12.65.

    +

    This coefficient may seem very large, but our index is normalized to the range from 0 to 1, so it describes the difference in predicted death rate between the lowest and the highest possible index; a state is associated with the full value only if its index is close to 1.
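To make the scale of the coefficient concrete, the fitted line can be evaluated at a few index values. The sketch below only plugs the reported intercept and coefficient into the linear formula; the helper name `predict_rate` is hypothetical.

```r
# Coefficients as reported in the text for the single-regressor model.
intercept <- 14.63
slope     <- -12.65

# Hypothetical helper: fitted gun death rate (per 100,000) at a given index.
predict_rate <- function(index) intercept + slope * index

# Fitted rates at the bottom, middle and top of the index range.
fitted_rates <- predict_rate(c(0, 0.5, 1))
print(fitted_rates)
```

Moving from an index of 0 to an index of 1 spans the whole coefficient, while a change of 0.1 in the index corresponds to a change of about 1.27 in the fitted rate.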

    +

    Because a single-regressor linear model is sensitive to outliers and to omitted variables, at this point we decided to bring the third dataset into the analysis, with the aim of constructing some additional linear models and useful plots.

    +

    [Figure]

    Before committing ourselves to the actual model building, we looked at the evolution of the mean violent crime rate over time. Once again, a first reading may suggest that the violent crime rate decreased as the Gun Regulation Index increased, and this is supported by the correlation between them (-0.293) and by their plot.

    [Figure]

    But we quickly found that this plot is misleading: dividing the states by score reveals much more interesting data.

    [Figure]

    With the clearer plots, the large negative correlation in medium score states is visible to the naked eye, but we can also observe positive correlations in some of the other groups:

    • Very low score states = -0.0596

    • Low score states = 0.403

    • Medium score states = -0.654

    • High score states = 0.229

    While the social interpretation of these findings is outside the scope of this study, and without claiming anything definitive, the data show that the relationship between the Gun Regulation Index and the violent crime rate is inconsistent across groups. We can still conclude that violent crime is not a simple enough issue to be fixed just by increasing gun control.

    +

    Over all states together, a linear model of the violent crime rate on the Gun Regulation Index returned an intercept of 6.896 and a coefficient of -3.335.

    We may also be interested in the relationship between the gun death rate and the violent crime rate, so we once again calculated the correlation between them and obtained a value of 0.471, showing that they are indeed correlated.

    [Figure]

    +

    To understand the relationship at a deeper level, we once again divided the states according to their score.

    [Figure]

    +

    In this case, every bracket shows some level of positive correlation:

    • Very low score states = 0.371

    • Low score states = 0.285

    • Medium score states = 0.685

    • High score states = 0.239

    As anticipated, medium score states continue to stand out in their correlations; this has been a constant in our analysis so far. We do not yet have an explanation for this pattern.

    +

    Fitting a linear model of the violent crime rate on the gun death rate gives an intercept of 3.029 and a death rate coefficient of 0.264. The coefficient is surprisingly low, but this is understandable given that the death rate is expressed per 100,000 people while the violent crime rate is expressed per 1,000.

    For the final step of our study, we used multiple linear regression to draw some conclusions and to correct possible errors introduced by simple linear regression: by adding the violent crime rate as a regressor, we try to reduce some of the omitted variable bias that could have affected our results.

    The multiple linear regression model returned an intercept of 10.755, a coefficient of -10.775 for the Gun Regulation Index and a coefficient of 0.563 for the violent crime rate.

    +

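    Even though the coefficient of the violent crime rate is fairly small, the coefficient of the Gun Regulation Index shrank in magnitude (from -12.65 to -10.775) and the intercept fell (from 14.63 to 10.755) once it was included. This suggests that part of the association attributed to the index in the single-regressor model was in fact related to violent crime, which was our aim in adding it.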
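The mechanism behind this change in the index coefficient can be illustrated with a small synthetic example in base R. None of the numbers below come from the datasets: the data are constructed (without noise) so that violent crime falls as the index rises and the death rate depends on both, which lets the two fits be compared exactly.

```r
# Made-up data, constructed for illustration only.
index  <- seq(0, 1, length.out = 20)
vcrime <- 8 - 3 * index + rep(c(0.5, -0.5), 10)  # falls as the index rises
rate   <- 10 - 10 * index + 0.5 * vcrime         # depends on both regressors
toy    <- data.frame(index, vcrime, rate)

short_model <- lm(rate ~ index, data = toy)           # omits vcrime
full_model  <- lm(rate ~ index + vcrime, data = toy)  # includes it

print(coef(short_model))
print(coef(full_model))
```

In this toy setup the model that omits `vcrime` attributes part of its effect to `index`, so its index coefficient is more negative than the one recovered by the full model. This is the same direction of change as reported above, although the real data will of course not behave this cleanly.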

    +

    We then plotted these results in a 3D interactive plot (available for download in the GitHub repository for this study and here), which, due to the limitations of PDFs, can be shown here only as an image.

    [Figure]

    +

    Conclusions

    +

    The relationship that emerges from the data is clear, even if not perfectly consistent: there is a negative correlation between our Gun Regulation Index and the gun death rate. This correlation gets weaker with the introduction of the violent crime rate as a regressor in the linear regression model, but it remains negative on the whole. The correlation between the index and the violent crime rate does become positive for low and high score states, but these are a much smaller sample than the very low score states, where it stays negative, which is why the overall value remains negative.

    While the policy analysis is complicated, this would seem to suggest that it is possible, although obviously not guaranteed, to write legislation that aims at and achieves a reduction in the number of gun-related deaths.

    +

    Future work

    +

    The work would benefit from expansion in a few areas:

    • Integration of more variables that might be correlated with the gun death rate, in order to escape possible omitted variable bias

    • Analysis of the effectiveness in relation to the intentionality behind gun deaths: what kind of deaths does gun control prevent?

    Appendix

    +

    The full script and project contents are freely available here under the GPL_v3 license.

    +

    Contributions

    +

    Francesco Prem Solidoro

    +

    Contributed to the writing of the code, to the search and reformatting of the datasets, to the writing of the document (in particular: abstract, conclusions) and to the management of the tooling used for the project (GitHub, git, typst.app, Firebase)

    +

    Michele Salvi

    +

    Contributed to the writing of the code, to the search and reformatting of the datasets, and to the writing of the document (in particular: methodology, sections A, B and C)

    +

    Juan Calani

    +

    Contributed to the ideation of the project, and provided a review of the work

    +

    Elena Rocco

    +

    Provided a review of the work

    diff --git a/public/en/posts/index.html b/public/en/posts/index.html index 2581391..489d824 100644 --- a/public/en/posts/index.html +++ b/public/en/posts/index.html @@ -48,19 +48,19 @@
    diff --git a/public/en/posts/index.xml b/public/en/posts/index.xml index 7ee25d9..5dd83c3 100644 --- a/public/en/posts/index.xml +++ b/public/en/posts/index.xml @@ -16,20 +16,432 @@ http://localhost:1313/en/posts/gun_deaths_paper/ )]]> - - - - - - - -
    - -

    +    Abstract— The aim of this paper is to provde a summary analysis +of the data regarding the effectiveness of gunrestriction policy in +diminishing the numberof gun-related deaths. We do so by conducting an +analysis of three different datasets, and by combining the data we +gathered from them, constructing a linear regression model. The +constructed linear regression model shows a negative correlation between +the gun-related death rate and the amount of legislation present within +the state. This leads us to believe that it’s possible for legislation +to have a positive effect on gun-related deaths.

    +

      Index terms—Gun regulation, Gun death rate, Violent crime rate

    +

    Introduction

    +

    The topic of gun regulation has been discussed at length in the United +States of America, because of the real and present danger that it poses +to Public Safety. There’s little doubt that policy has potential to do +enourmous amounts of harm, as well as benefit, but it needs direction +from data. The aim of this paper is to conduct a summmary analysis of +the correlation between overall presence of gun restriction and the rate +of deaths caused by guns.

    +

    Paper overview

    +

    In this paper, we analyse data coming from three different datasets: one +[@Lawprovisions] is a collection of all gun restiction laws that have +been put in place by individual states, the second [@Gundeaths] is a +collection of gun deaths by county. The third [@Violentcrimes] trakcs +violent crime rates.

    +

    The scope of the paper is national, and we aim to obtain valuable +information by comparing the performance of different states within the +same country: we expect the cultural differences to be (even if still +present) less impactful than they would be in a inter-country +comparison, and thus hope to be able to eliminate most untraceable +biases that would arise from it.

    +

    To reach our conclusions, we conducted an analysis of the 3 datasets, +using them as follows:

    +
      +
    • +

      From the first dataset, we extracted an index to answer the question +‘how much regulation around guns is present in this state?’

      +
    • +
    • +

      From the second dataset, we extracted aggregated data by year and by +state regarding deaths and death rate by guns

      +
    • +
    • +

      From the third dataset, we extracted information about the change in +violent crime rates over time, to be able to set up a multiple +regressor linear regression model

      +
    • +
    +

    Overall, we found a negative correlation between the gun provision index +and the gun death rate, even when accounting for the changes in violent +crime.

    +

    Methods

    +

    Preparing the Datasets

    +

    Dataset 1

    +

    The first dataset is comprised of informations regarding deaths by guns +divided by county, but our analysis takes place at the state level. +Therefore, the dataset was grouped by state and we removed superfluous +variables like county code, state code and state initials.

    +

    Additionally all of our datasets contain data compiled at different +points in time so the information was filtered to be in years +overlapping across all three datasets, the nineteen years from 1999 to +2017.

    +

    The preparation was done in the following way:

    +
    gundeaths_cut <- gun_deaths_us_1999_2019[-c(1, 3:5, 7)]
    +gundeaths_cut <- subset(gundeaths_cut, !gundeaths_cut$Year > 2017)
    +gundeaths_cut <- gundeaths_cut[-c(5:10)]
    +gundeaths_cond <- gundeaths_cut %>%
    +    group_by(across(where(is.character))) %>%
    +    summarise(across(where(is.numeric), sum, na.rm = T), .groups = "drop")
    +

    We then constructed a simple rate, balanced over 100,000 people, to +provide for the different populations between states.

    +
    gundeaths_cond$Rate <- 0
    +for (i in 1:nrow(gundeaths_cond)) {
    +    r <- ((gundeaths_cond$Deaths[i] / gundeaths_cond$Population[i]) * 100000)
    +    gundeaths_cond$Rate[i] <- r
    +}
    +

    The the final form of the dataset was:

    + + + + + + + + + + + + + + + + + + + + + + + + + + +
    YearStateDeathsPopulationRate
    1999Alabama605304724119.854025
    +

    Dataset 2

    +

    The second dataset contains all of the laws restricting gun usage for +each US state, represented by 133 binary variables. Aside from filtering +for the correct years a Index (referred to as “Gun Regulation Index” +from here on) was constructed usin min-max normalization.

    +
    law_provision_norm <- subset(law_provision_norm, !law_provision_norm$year < 1999)
    +proc_lawprov <- preProcess(as.data.frame
    +(law_provision_norm$lawtotal),
    +    method = c("range"))
    +law_provision_norm$index <- predict(proc_lawprov, as.data.frame(law_provision_norm$lawtotal))
    +

    To account for differences, states were given scores, “Very Low” “Low” +“Medium” “High”, according to their Gun Regulation Index.

    +
    merged$Score <- ifelse(unlist(merged$index) <0.25, "Very Low",ifelse(unlist(merged$index) <0.5, "Low", ifelse(unlist(merged$index) <0.75, "Medium","High")))
    +

    Final product:

    + + + + + + + + + + + + + + + + + + + + + + + + + + +
    YearStateLaw TotalIndexScore
    1999Alabama160.126213592Very Low
    +

    Dataset 3

    +

    The third dataset, mainly added for multiple linear regression purposes +in an attempt to account for Omittted Variable Bias, contains the raw +numbers of different types of crimes committed in each state. To keep +the data as closely related to guns as possible, and to make sure there +was no instance of double counting (it wasn’t made clear what of the +specific crimes listed counted as a ‘Violent Crime’) we just included +“Violent Crimes”.

    +
    estimatedcrimes_cut <-  subset(estimated_crimes, estimated_crimes$year >= 1999)
    +estimatedcrimes_cut <-  subset(estimatedcrimes_cut, estimatedcrimes_cut$year <= 2017)
    +estimatedcrimes_cut <-  subset(estimatedcrimes_cut, !estimatedcrimes_cut$state_name == "District of Columbia")
    +estimatedcrimes_cut <-  subset(estimatedcrimes_cut, !estimatedcrimes_cut$state_name == "")
    +estimatedcrimes_cut <-  estimatedcrimes_cut[-c(2,4,6:15)]
    +

    A Violent crime rate was then constructed, balanced on thousands of +people.

    +
    estimatedcrimes_cut$VCrime_Rate = 0
    +for (i in 1:nrow(expl_merge)) {
    +  expl_merge$VCrime_Rate[i]<- ((expl_merge$violent_crime[i] / expl_merge$Population[i]) * 1000)}
    +

    The data obtained was formatted in the following way:

    + + + + + + + + + + + + + + + + + + + + + + + +
    YearStateViolent CrimesVCrime Rate
    1999Alabama214217.029638
    +

    Descriptive Analysis

    +

    Dataset 1

    +

    The Gun Deaths dataset shows the average deaths to be 515.34, with a +standard deviation of 605.09, the death rate instead has an average of +11.90% and a standard deviation of 4.67%, these descriptive statistics, +although weak alone, show a variance of considerabe size.

    +

    imd +![imd](/plots/descriptive_plots/Gun_Deaths_and_Population_over_time_(by year).png)

    +

    The graphs show the variance and its evolution over time.

    +

    To aid visualization we also constructed a histogram representing the +death count for each state +imd

    +

    Dataset 2

    +

    For the law provision dataset the mean of the law total is 24.92, with a +standard deviation of 23.60, instead the mean of the Gun Regualtion +Index is 0.21, with a standard deviation of 0.23, again showing a big +variance in our data.

    +

    imd

    +

    As it’s easily interpretable most of the states have a kept a very low +index for the time period analyzed in this study, but there are +instances of some states, for example the state of California, keeping +very high Index values across the whole time period. This wide gap was +not a surprise, as the difference of positions of prominent political +figures in the respective states are pretty representative of their -be +it high or low- index score. +![imd](/plots/descriptive_plots/Gun_Regulation_Index_over_time_(by state).png)

    +

    In order to understand the importance of the division of the states in +accordance to their scores, it’s important to visualize the position of +the US states, as the differences will become painfully obvious.

    +

    imd

    +

    This bar chart represents the scores of all the US states across our 19 +year time period, and it’s pretty easy to say that seeing these +discrepancies also shaped our future analyisis and prompted us towards +conducting it by keeping track at all times of each state’s score.

    +

    Dataset 3

    +

    The third dataset, the one containing the information about violent +crimes in the us presents itself in an analogous way to the previous +datasets. The mean of the violent crimes is 26976.7, with a standard +deviation of 34431.64, while the mean of the violent crime rate is +6.18%, with a standard deviation of 2.62%

    +

    Once again, a high variance is observable across all of our data, again +suggesting the importance of a criteria of distinction (in our case the +index score) in the state-level analysis to capture efficiently +correlations and statistically useful results.

    +

    imd +imd

    +

    Explorative Analysis

    +

    Having established the nature and characteristics of the three datasets, +we now proceed to conduct an analysis of the combination of their data. +The aim of this analysis is to establish the impact (or lack there of) +of legislation on gun death rates. In order to do so, we first seek to +check for the presence of correlation, and then formulate a linear +regression model. We then will try to integrate data from possibly +correlated variables in an attempt to remove omitted variable bias, and +thus create a multiple regressor linear regression model.
    +The explorative analysis began by plotting and calculating the mean gun +death rate (by 100,000) over time, and comparing it with the mean Gun +Regulation index over time. +imd

    +

    At a first glance just by looking at the graphs side to side may lead us +to think that even if the Gun Regulation Index has been rising over time +the Deaths by guns still don’show any signs of decreasing, but a closer +inspection reveals that the Gun Regulation Index over nineteen years has +only increased by a meager 0.04% and to discern a pattern we need to go +deeper and look for substantial causal effects.

    +

    By calulating the Pearson correlation between the Gun Regulation Index +and the gun death rate we’re returned with a value -0.624, suggesting a +non-negligible negative correlation between them

    +

    imd

    +

    By plotting them in the same graph the negative correlation becomes +obvious, but we can also notice the skewedness of observations towards +the low values of the Gun Regulation Index, so in accordance to our +prior suggestion we split the observations in accordance with their own +Index score bracket and run the same analysis. +imd By splitting the +observations we get a clearer image: the -0.624 correlation isn’t +consistent across all states, but the pattern can still be observed.

    +

    The correlations for each groups are as follows:

    +
      +
    • +

      Very low score states = -0.375

      +
    • +
    • +

      Low score states = -0.254

      +
    • +
    • +

      Medium score states = -0.555

      +
    • +
    • +

      High score states = -0.4123633

      +
    • +
    +

    Here we also begin to see the signs of a pattern repeating also in the +next steps: Medium score countries tend to have the highest correlation +among all of the groups.

    +

    Now, after setting up a linear model, we obtained the results of an +intercept equal to 14.63 and the index coefficient equal to -12.65.

    +

    This number may seem absurdly big, but during the interpretation it is +useful to remember that our index is normalized, so states will benefit +from the full value only if they have a high enough index.

    +

    Due to the general unreliability and susceptibility to outliers of +linear regression it was at this point that we decided to aid our +analysis with the inclusion of the third dataset, with the aim of +constructing some additional linear models and useful plots.

    +

    imd Before cimenting +ourselves into the actual model building, we included the evolution of +the mean violent crime rate over time and, once again an immediate +analysis may lead us to think that as the Gun Regulation Index increased +the violent crime rate decreased, and this is also supported by the +corrlation between them (-0.293), and their plot. +imd But we +very quickly found out that this plot is actually very misleading. It +simply takes a division of states by score and we can analyze much more +interesting data. +imd Now that +the plots are much clearer, while our pattern in the sheer size of the +negative correlation between our values in medium score states is +visible to the naked eye, we can also observe some instances of positive +correlations in some of the following cases:

    +
      +
    • +

      Very low score states = -0.0596

      +
    • +
    • +

      Low score states = 0.403

      +
    • +
    • +

      Medium score states = -0.654

      +
    • +
    • +

      High score states = 0.229

      +
    • +
    +

    While the social interpretation of these findings is out of the scope of +this study without claiming anythin we can still infer that the data +shows that increasing the Gun Regulation Index has some dubious effects +in relation with the Violent Crime Rate, we can still interpret that +Violent Crime is not a simple enough issue that can be fixed just by +increasing Gun Control.

    +

    Building a linear model emphasizes this finding since it returned us a +value of 6.896 for the intercept and of -3.335 for our Gun Regulation +Index.
    +
    +Now, we may also be interested in the relationship between gun deaths +and the violent crime rate, so we calculated once again the correlation +between them and obtained a value of 0.471, showing that indeed they are +correlated. +imd

    +

    Now, to understand these relationships on a deeper level, we once again divided the states according to their score.

    +

    In this case we can observe that all of our brackets show some level of positive correlation, in particular:

    • Very low score states = 0.371
    • Low score states = 0.285
    • Medium score states = 0.685
    • High score states = 0.239

    As anticipated, medium score states continue to stand out in their correlations; this has been a constant in our analysis so far.

    +

    Building a linear model gives the following results: a value of 3.029 for the intercept and 0.264 for the Death Rate coefficient. The coefficient is surprisingly low, but this is understandable, since the Death Rate is expressed per 100,000 people.
    +
    For the final step of our study we tried to draw some conclusions using multiple linear regression, in an attempt to correct possible mistakes made with simple linear regression. In other words, by adding the Violent Crime Rate as a regressor we try to reduce some of the omitted variable bias that could have distorted our earlier estimates.
    +
    With the multiple linear regression model we obtained the following values: 10.755 for the intercept, -10.775 for the coefficient of our Gun Regulation Index and 0.563 for the coefficient of the violent crime rate.
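    The effect of adding a second regressor can be sketched as follows. This is an illustrative example under stated assumptions, not the model fitted in the study: the helper `fit_ols`, the synthetic data and the coefficients used to generate them (-4.0 for the index, 0.5 for the crime rate) are all hypothetical. It compares a regression that omits the crime rate with one that includes it.

```python
import numpy as np


def fit_ols(X, y):
    """OLS with an intercept; returns [intercept, coef_1, ..., coef_k]."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a 1-D input as a single regressor
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef


# Hypothetical synthetic data, NOT the study's dataset.
rng = np.random.default_rng(0)
n = 500
index = rng.uniform(0.0, 1.0, n)                      # stand-in regulation index
crime = 5.0 - 3.0 * index + rng.normal(0.0, 0.5, n)   # correlated with the index
deaths = 10.0 - 4.0 * index + 0.5 * crime + rng.normal(0.0, 0.2, n)

short = fit_ols(index, deaths)                            # crime rate omitted
full = fit_ols(np.column_stack([index, crime]), deaths)   # crime rate included
print("omitting crime:", short)
print("including crime:", full)
```

    Because the omitted regressor is correlated with the index, the short model attributes part of its effect to the index, so the index coefficient moves towards its generating value once the crime rate is included.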

    +

    Even though the coefficient of the violent crime rate is fairly small, the fact that both the intercept and the coefficient for Gun Regulation also got smaller suggests that we achieved our aim of accounting for some of the previously unexplained variance.

    +

    We then plotted these results in an interactive 3D plot (available for download in the GitHub repository for this study and here) which, due to the limitations of PDFs, can be shown here only in the form of an image.

    +

    Conclusions

    +

    The relationship that emerges from the data is clear, even if not perfectly consistent: there is a negative correlation between our gun provision index and the gun death rate. This correlation gets weaker with the introduction of violent crime rates as a regressor in the linear regression model, but it remains negative on the whole. It becomes positive in the case of low and high score states, but these are a much smaller sample than the very low score states, where it remains negative, which is why the overall correlation stays negative.
    While the policy analysis is complicated, this would seem to suggest that it is possible, although obviously not guaranteed, to write legislation that aims at and achieves a reduction in the number of gun-related deaths.

    +

    Future work

    +

    The work could benefit from expansion in a few areas:

    • Integration of more variables that might be correlated with the gun death rate, in order to avoid possible omitted variable bias
    • Analysis of the effectiveness in relation to the intentionality behind gun deaths: what kind of deaths does gun control prevent?

    Appendix

    +

    The full script and project contents are freely available here under the GPL_v3 license.

    +

    Contributions

    +

    Francesco Prem Solidoro

    +

    Contributed to the writing of the code, to the search and reformatting of the datasets, to the writing of the document (in particular: abstract, conclusions) and to the management of the tooling used for the project (GitHub, Git, typst.app, Firebase)

    +

    Michele Salvi

    +

    Contributed to the writing of the code, to the search and reformatting of the datasets, and to the writing of the document (in particular: methodology, sections A, B and C)

    +

    Juan Calani

    +

    Contributed to the ideation of the project, and provided a review of the work

    +

    Elena Rocco

    +

    Provided a review of the work

    ]]>