JOM299

Wrangling general election 2019 data

Where do we find our data?

A good, if slightly cheeky, way to find data: open your browser's DevTools on colleagues' pieces and look at what the page loads.

Reuters GE2019 results

In this instance, the page loads all the map shapefiles and the results data, so we can pick both up directly.

Importing into R

jsonlite is an R package for reading and writing JSON data. curl is an R interface to the widely used libcurl library for fetching data from the internet. jsonlite relies on it when fromJSON() is given a URL.

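install.packages("jsonlite")
install.packages("curl")     # jsonlite uses curl when fromJSON() is given a URL
install.packages("dplyr")    # %>%, filter(), group_by(), joins
install.packages("ggplot2")  # charts

library(jsonlite)
library(dplyr)
library(ggplot2)
df <- fromJSON("https://graphics.thomsonreuters.com/ge2019/results/all_results.json")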

> head(df)
   constituencyName constituencyID winningParty sittingParty gainOrHold
1          Aberavon      W07000049          Lab          Lab       hold
2         Aberconwy      W07000058            C          Ind       gain
3    Aberdeen North      S14000001          SNP          SNP       hold
4    Aberdeen South      S14000002          SNP            C       gain
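Under the hood, fromJSON() turns a JSON array of records into a data frame: one row per object and one column per key. The minimal sketch below uses a tiny inline JSON string built from the first two rows printed above. It shows the same simplification without touching the network.

```r
library(jsonlite)

# Two records copied from the head(df) output above, written as a JSON array of objects
txt <- '[
  {"constituencyName": "Aberavon", "constituencyID": "W07000049",
   "winningParty": "Lab", "sittingParty": "Lab", "gainOrHold": "hold"},
  {"constituencyName": "Aberconwy", "constituencyID": "W07000058",
   "winningParty": "C", "sittingParty": "Ind", "gainOrHold": "gain"}
]'

# By default fromJSON() simplifies an array of records into a data frame
small <- fromJSON(txt)

str(small)  # a data.frame with 2 rows and 5 character columns
```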

Simple questions about the results

Now that our data is stored in df, can we quickly find:

the average electorate across all constituencies?
the median turnout (in percent)?
the average majority (in percent)?

Hint: it's all in summary().
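On a numeric column, summary() returns the minimum, quartiles, median and mean in one go, and you can pull out a single figure by name. This minimal sketch uses an invented turnout vector, not the real results.

```r
# Invented turnout figures (%), for illustration only
turnout <- c(61.2, 64.8, 66.5, 67.3, 69.9, 72.1)

s <- summary(turnout)
print(s)        # Min., 1st Qu., Median, Mean, 3rd Qu., Max.

s[["Median"]]   # the median, pulled out by name
s[["Mean"]]     # the mean

# On the real data the same idea applies column by column, e.g.
# summary(df$percentageTurnout)
```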

Distributions at a glance

A quick word about distributions

Left/right skew vs. a “normal” distribution
Too small a sample size
Multiple modes
Outliers and what they can mean about your data
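Two quick numeric checks go with these points:

- In a right-skewed variable, the mean is typically pulled above the median.
- boxplot.stats() flags as potential outliers any values more than 1.5 times the hinge spread (roughly the interquartile range) beyond the box.

This minimal base-R sketch uses an invented vector with one extreme value.

```r
# Invented majorities (%) with one extreme value, for illustration only
majority <- c(2, 5, 8, 10, 12, 15, 18, 22, 25, 70)

mean(majority)    # 18.7: pulled up by the extreme value
median(majority)  # 13.5: barely affected by it

# Values beyond 1.5 x the hinge spread from the box are returned in $out
boxplot.stats(majority)$out  # 70
```

An outlier flagged this way is a prompt to check the row (a data error, or a genuinely unusual seat?), not an automatic reason to drop it.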

Turnout and majority

We can take a quick look at turnout and majority across constituencies by plotting their distributions. This adds some depth to the headline figures and shows how much places varied in the way they voted.

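df %>% 
  ggplot() +
  # fill is a fixed colour rather than a mapped variable, so it goes outside aes()
  geom_density(aes(percentageTurnout), fill = "red", alpha = 0.5) +
  geom_density(aes(percentageMajority), fill = "orange", alpha = 0.5) +
  # annotate() draws each label once; geom_text() would draw one copy per row of df
  # label positions are approximate: turnout peaks in the high 60s, majorities much lower
  annotate("text", x = 68, y = 0.01, label = "Turnout (%)") +
  annotate("text", x = 20, y = 0.015, label = "Majority (%)") +
  labs(x = "%", y = "Density") +
  theme_minimal()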

Changes in turnout and majority

df %>% 
  ggplot() +
  geom_density(aes(percentageChangeTurnout), fill = "red", alpha = 0.5) +
  geom_density(aes(percentageChangeMajority), fill = "orange", alpha = 0.5) +
  # one label per curve; nudge x/y to taste
  annotate("text", x = 13, y = 0.15, label = "← Change in turnout (%)") +
  annotate("text", x = 0, y = 0.01, label = "Change in majority (%)") +
  labs(x = "Change", y = "Density") +
  theme_minimal()

Figure: changes in turnout and majority (figures/changeturnoutandmajority)

Winners and losers

How many seats changed hands (a “gain” for the winning party), and how many were held (i.e. kept by the incumbent party)?

df %>%
  group_by(gainOrHold) %>%
  summarise(total = n())

Or use the shorthand tally():

df %>%
  group_by(gainOrHold) %>%
  tally()
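dplyr also provides count(), which combines the group_by() and tally() steps in a single call. This sketch runs both routes on a small invented data frame and checks that they give the same numbers.

```r
library(dplyr)

# Invented rows for illustration only
toy <- data.frame(
  winningParty = c("Lab", "C", "SNP", "C", "Lab", "C"),
  gainOrHold   = c("hold", "gain", "hold", "hold", "gain", "hold")
)

via_tally <- toy %>%
  group_by(gainOrHold) %>%
  tally()

via_count <- toy %>%
  count(gainOrHold)   # one-step equivalent; the count column is also called n

via_count            # gain 2, hold 4
```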

Can we break these numbers down for Labour and the Conservatives? Yes: filter to those two parties and add the party name to the grouping we set up.

Note the %in% operator: winningParty %in% c("Lab", "C") is TRUE for rows where winningParty matches any value in the vector, so filter() keeps rows for either party.

df %>% 
  filter(winningParty %in% c("Lab", "C")) %>%
  group_by(winningParty, gainOrHold) %>%
  tally()
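The sketch below compares %in% with the same test written out using == and |. It uses a small invented vector of party codes.

```r
parties <- c("Lab", "C", "SNP", "LD", "C", "Green")

# For each element on the left: is it anywhere in the vector on the right?
keep_in <- parties %in% c("Lab", "C")

# The same test spelled out with == and | ("or")
keep_or <- parties == "Lab" | parties == "C"

keep_in                      # TRUE TRUE FALSE FALSE TRUE FALSE
identical(keep_in, keep_or)  # TRUE
```

One practical difference: for a missing value, %in% returns FALSE where == returns NA. That is why %in% is often the safer choice inside filter().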

Wins by party

How could we make a column chart, with one column per political party, that shows seats gained stacked on top of seats held?

Hint:

Calculate the number of won seats by party (we’ve done this above)
Use these counts with geom_bar(stat = "identity"), or equivalently geom_col(); by default geom_bar() counts rows itself
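Here is one possible sketch of the two hint steps. It uses a small invented data frame in place of df. The fill mapping on gainOrHold is what splits each party's column into gained and held seats.

```r
library(dplyr)
library(ggplot2)

# Invented rows for illustration only; the real data has one row per constituency
toy <- data.frame(
  winningParty = c("C", "C", "C", "Lab", "Lab", "SNP", "SNP", "C"),
  gainOrHold   = c("hold", "gain", "hold", "hold", "hold", "gain", "hold", "gain")
)

# Step 1: count seats per party, split by gain/hold
seats <- toy %>%
  count(winningParty, gainOrHold)

# Step 2: plot the pre-computed counts; stat = "identity" makes geom_bar()
# use n as the bar height instead of counting rows itself
p <- ggplot(seats, aes(x = winningParty, y = n, fill = gainOrHold)) +
  geom_bar(stat = "identity") +   # stacks gain and hold within each party
  labs(x = "Party", y = "Seats", fill = NULL) +
  theme_minimal()

p
```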

Another way to look at wins

Could we compare majority and turnout with a scatterplot across parties?

Hint:

We can compare parties against each other with facet_wrap()
We probably only want to compare parties that won a certain number of seats, which means counting each party's seats and using filter() to drop the small parties
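A minimal sketch of the hint follows. It assumes a hypothetical seat threshold (min_seats) and uses invented values in place of df. add_count() attaches each party's seat total to every row, which makes the filter a one-liner.

```r
library(dplyr)
library(ggplot2)

# Invented rows for illustration only
toy <- data.frame(
  winningParty       = c("C", "C", "C", "Lab", "Lab", "Lab", "SNP", "Green"),
  percentageTurnout  = c(70.1, 72.4, 68.0, 61.5, 63.2, 59.8, 66.7, 65.0),
  percentageMajority = c(25.3, 40.2, 12.8, 30.1, 5.4, 18.9, 9.7, 3.1)
)

min_seats <- 2  # hypothetical threshold; pick one that suits the real data

# Add a "seats" column holding each party's total, then keep the bigger parties
big_parties <- toy %>%
  add_count(winningParty, name = "seats") %>%
  filter(seats >= min_seats)

p <- ggplot(big_parties, aes(x = percentageTurnout, y = percentageMajority)) +
  geom_point() +
  facet_wrap(~winningParty) +  # one panel per remaining party
  theme_minimal()

p
```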

Let’s bring in the Census

At the time of writing, the most recent UK census was carried out in 2011, and its results are available from the ONS. The census runs every ten years, so the next one was due in 2021.

We could download the whole lot, but then we would have to match it against Westminster constituencies ourselves.

Fortunately, we can use parlitools, which contains census data matched to our constituencies.

install.packages("sf")
install.packages("parlitools")
library(parlitools)

census <- parlitools::census_11
View(census)

Combine with our election data

join <- left_join(df, census, by = c("constituencyID" = "ons_const_id"))

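Let's also try a couple of other joins and compare the results, e.g.:

full_join (keeps unmatched rows from both tables)
anti_join (keeps only the rows of df that have no match in census)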
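The sketch below contrasts left_join(), full_join() and anti_join() on two invented miniature tables. The constituency IDs mirror those in head(df); the census values are made up. It uses the same key pairing as the join above.

```r
library(dplyr)

# Invented miniature tables for illustration only
results <- data.frame(
  constituencyID = c("W07000049", "W07000058", "S14000001"),
  winningParty   = c("Lab", "C", "SNP")
)
census_toy <- data.frame(
  ons_const_id  = c("W07000049", "W07000058", "S14000002"),
  deprived_none = c(30.5, 41.2, 45.0)
)

key <- c("constituencyID" = "ons_const_id")

kept_left <- left_join(results, census_toy, by = key)  # 3 rows; NA where census has no match
kept_all  <- full_join(results, census_toy, by = key)  # 4 rows; unmatched rows from both sides
unmatched <- anti_join(results, census_toy, by = key)  # results rows with no census match

unmatched$constituencyID  # "S14000001"
```

Running anti_join() on the real df and census is a quick way to check that every constituency found a census match before you rely on the joined data.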

Let’s be creative!

Full list of variables available in our census data

For example, how did the Conservatives and Labour fare in constituencies with fewer deprived households (a higher deprived_none)?

join %>% 
  filter(winningParty %in% c("Lab", "C")) %>%
  ggplot(aes(x = percentageMajority,
             y = deprived_none)) +
  geom_point() +
  # straight-line fit per panel; stating the formula silences ggplot2's message
  geom_smooth(method = "lm", formula = y ~ x) +
  facet_wrap(~winningParty)
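To put a number next to each panel's trend line, you could compute the Pearson correlation between majority and deprived_none separately for each party. The sketch below uses invented values in a toy data frame; the real calculation would start from join instead.

```r
library(dplyr)

# Invented values for illustration only
toy <- data.frame(
  winningParty       = c("C", "C", "C", "C", "Lab", "Lab", "Lab", "Lab"),
  percentageMajority = c(10, 20, 30, 40, 40, 30, 20, 10),
  deprived_none      = c(40, 45, 50, 55, 40, 45, 50, 55)
)

# One correlation per party; cor() uses Pearson by default
by_party <- toy %>%
  group_by(winningParty) %>%
  summarise(r = cor(percentageMajority, deprived_none))

by_party  # r = 1 for C and -1 for Lab in this contrived example
```

If some rows of the real join have missing census values, cor(..., use = "complete.obs") skips them instead of returning NA.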
