---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# AustralianPoliticians <img src="man/figures/AustralianPoliticians.png" align="right" height="200" />
The goal of `AustralianPoliticians` is to provide access to biographical and political data about Australian federal politicians who served between 1901 and 2021. This enhances the reproducibility of research that uses this data. The datasets are:
* **all**: The main dataset.
* **mps**: Adds information about the division ('seat') of the politician.
* **allbyparty**: Adds information about the party of the politician.
* **senators**: Adds information about the state that a senator was representing.
The datasets are up-to-date as of 29 November 2021.
If you are using this for anything other than general interest, then please check the comments column in case there is a flag that could affect your results. You're also welcome to get in touch so that I can make sure that the aspect you're interested in is of a good enough quality for your purposes.
If you have suggestions on how I could improve the datasets, or corrections, please don't hesitate to get in touch.
## Installation
You can install the released version of `AustralianPoliticians` from CRAN:
``` r
install.packages("AustralianPoliticians")
```
or the development version from GitHub with:
``` r
devtools::install_github("RohanAlexander/AustralianPoliticians")
```
## Example
This is an example of how to load the data:
```{r loaddata, eval=FALSE, echo=TRUE}
library(tidyverse)
# Install the package first if needed (this only has to be run once):
# devtools::install_github("RohanAlexander/AustralianPoliticians")
all <- AustralianPoliticians::get_auspol('all')
party <- AustralianPoliticians::get_auspol('allbyparty')
mps <- AustralianPoliticians::get_auspol('mps')
senators <- AustralianPoliticians::get_auspol('senators')
```
You could then combine the tables using `left_join()`:
```{r combinetables, eval=FALSE, echo=TRUE}
all_individuals_with_their_division <- all %>%
  left_join(mps, by = "uniqueID")
```
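Because `mps` has one row per division-politician, the joined table can have more rows than `all`. The following is a minimal sketch of that behaviour using made-up rows rather than the real data. It uses base R's `merge()` with `all.x = TRUE`, which plays the same role as `left_join()` here, so that it runs without extra packages. The names `all_toy`, `mps_toy` and `joined` are hypothetical.

```r
# Toy stand-ins for the `all` and `mps` tables (made-up rows, not real data).
all_toy <- data.frame(
  uniqueID = c("Example1900", "Sample1910"),
  surname  = c("Example", "Sample")
)
mps_toy <- data.frame(
  uniqueID    = c("Example1900", "Example1900"),
  mpsDivision = c("Division A", "Division B")
)

# all.x = TRUE keeps every politician, including those with no row in mps_toy.
joined <- merge(all_toy, mps_toy, by = "uniqueID", all.x = TRUE)

nrow(joined)                    # 3: two rows for Example1900, one for Sample1910
sum(is.na(joined$mpsDivision))  # 1: Sample1910 has no division
```

If you need one row per politician after the join, you would have to summarise or filter the division rows first.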
Monica Alexander has written a brief blog post where she uses the package to look at life expectancy of Australian politicians:
https://www.monicaalexander.com/posts/2019-08-09-australian_politicians/
## Dataset details
### all
This is the main dataset and contains one row per politician, with columns: uniqueID, surname, allOtherNames, firstName, commonName, displayName, earlierOrLaterNames, title, gender, birthDate, birthYear, birthPlace, deathDate, member, senator, wikidataID, wikipedia, adb, and comments.
```{r}
head(AustralianPoliticians::get_auspol('all'))
```
uniqueID is usually the surname of the politician and the year that they were born, e.g. Abbott1859. In certain cases this is not enough to uniquely identify them and then we add the first name, e.g. AndersonCharles1897 and AndersonGordon1897. In cases where there is punctuation in the surname, e.g. Ashley-Brown or O'Brien, this has been removed but capitalisation has been retained, so those would become AshleyBrown or OBrien, respectively.
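The naming rule above can be sketched as a small helper. This is only an illustration of the rule as described: `make_unique_id()` is a hypothetical function that is not part of the package, the birth year used for the O'Brien example is invented, and stripping every non-letter character (rather than punctuation only) is an assumption.

```r
# Hypothetical helper illustrating the uniqueID rule; not part of the package.
make_unique_id <- function(surname, birth_year, first_name = NULL) {
  # Remove anything that is not a letter (hyphens, apostrophes), keeping capitalisation.
  clean <- function(x) gsub("[^[:alpha:]]", "", x)
  paste0(
    clean(surname),
    if (!is.null(first_name)) clean(first_name),  # only added when needed for uniqueness
    birth_year
  )
}

make_unique_id("Abbott", 1859)               # "Abbott1859"
make_unique_id("Anderson", 1897, "Charles")  # "AndersonCharles1897"
make_unique_id("O'Brien", 1900)              # "OBrien1900" (birth year is made up)
```

When working with the data, use the uniqueID values supplied in the datasets rather than reconstructing them.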
commonName records the name that the politician tended to be known by, e.g. Ted instead of Edward. It is used in displayName, which is a politician's surname followed by their common name (if they had one) or first name, e.g. Abbott, Richard. In cases where this would not be unique, e.g. Francis Baker, an additional name has been added.
earlierOrLaterNames is mostly used to keep track of women changing their names at marriage. Similarly, title is mostly used to keep track of 'Dr', but both have been used inconsistently and should only be used sparingly.
Some politicians don't have a complete birth date, and instead only have a year of birth. In these cases their entry for birthDate will be empty, but they will have a birthYear. All death dates are complete, but in the case of one politician -- John William Croft -- this has been inputted, as the circumstances and timing (even year) of his death are unknown. birthPlace is mostly taken from WikiData, with a few updates. There are some issues that need to be addressed here.
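One way to handle the missing birth dates is to fall back on birthYear whenever birthDate is empty. The sketch below uses made-up rows and assumes that birthDate and deathDate are stored as dates and birthYear as a number; check the column types in the real data before relying on it. The derived columns `birth_year_filled` and `approx_age_at_death` are hypothetical names.

```r
# Made-up rows: the second politician has only a year of birth.
toy <- data.frame(
  uniqueID  = c("Example1900", "Sample1910"),
  birthDate = as.Date(c("1900-03-15", NA)),
  birthYear = c(NA, 1910),
  deathDate = as.Date(c("1975-06-01", "1980-01-20"))
)

# Extract the calendar year from a Date.
year_of <- function(d) as.integer(format(d, "%Y"))

# Use the year from birthDate when present, otherwise fall back to birthYear.
toy$birth_year_filled <- ifelse(
  is.na(toy$birthDate), toy$birthYear, year_of(toy$birthDate)
)

# Approximate age at death as a difference of calendar years.
toy$approx_age_at_death <- year_of(toy$deathDate) - toy$birth_year_filled
toy[, c("uniqueID", "birth_year_filled", "approx_age_at_death")]
```

The result is only accurate to within a year, which is the best available for politicians without a full birth date.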
member and senator are binary indicator variables used to signify whether the politician was in the lower or upper house. Most politicians are only in one or the other, but some were in both. One politician in the dataset, Heather Elaine Hill, was neither a senator nor an MP. She remains in the dataset because she was elected to the Senate (and because this dataset needs to exactly match the AustralianElections one); however, her eligibility was challenged and her election was invalidated, so she was never a senator.
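The two indicators can be combined into a single label, which makes the "both" and "neither" cases easy to spot. This is a sketch on made-up rows, and it assumes the indicators are coded as 0 and 1. The column `house` is a hypothetical name.

```r
# Made-up rows covering the four possible combinations.
toy <- data.frame(
  uniqueID = c("A1900", "B1910", "C1920", "D1930"),
  member   = c(1, 0, 1, 0),
  senator  = c(0, 1, 1, 0)
)

# Classify each politician by the chamber(s) they sat in.
toy$house <- with(toy, ifelse(
  member == 1 & senator == 1, "both",
  ifelse(member == 1, "lower house only",
         ifelse(senator == 1, "upper house only", "neither"))
))

table(toy$house)
```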
adb is a link to the Australian Dictionary of Biography.
### mps
This dataset adds information about the division ('seat') of the politician. One row per division-politician, with columns: uniqueID; mpsDivision; mpsState; mpsEnteredAtByElection; mpsFrom; mpsTo; mpsEndReason; mpsChangedSeat; and mpsComments.
```{r}
head(AustralianPoliticians::get_auspol('mps'))
```
Certain divisions change name. Sometimes this is minor, for instance Kingsford-Smith to Kingsford Smith, and sometimes it is total. In all cases this is treated as a change in division -- the politician is treated as finishing with one division and moving to another -- but mpsChangedSeat can be used to identify these cases and adjust for them if necessary.
mpsEnteredAtByElection is a binary indicator variable for whether the politician entered the seat at a by-election.
mpsChangedSeat is a binary indicator variable for whether the politician left a division because they were moving to another division, as opposed to losing an election or retiring.
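As a sketch of how the changed-seat indicator might be used, the example below separates moves between divisions from genuine exits and sums time served across divisions. The rows and dates are made up, and it assumes mpsChangedSeat is coded as 0 and 1 and that mpsFrom and mpsTo are stored as dates. The names `moves`, `days` and `total_days` are hypothetical.

```r
# Made-up rows: Example1900 moves from Division A to Division B, then leaves.
mps_toy <- data.frame(
  uniqueID       = c("Example1900", "Example1900", "Sample1910"),
  mpsDivision    = c("Division A", "Division B", "Division C"),
  mpsFrom        = as.Date(c("1950-01-01", "1953-01-01", "1955-01-01")),
  mpsTo          = as.Date(c("1953-01-01", "1956-01-01", "1958-01-01")),
  mpsChangedSeat = c(1, 0, 0)
)

# Rows that end with a move to another division (this includes renamed divisions).
moves <- mps_toy[mps_toy$mpsChangedSeat == 1, ]

# Total days served per politician, summed across all of their divisions.
mps_toy$days <- as.numeric(mps_toy$mpsTo - mps_toy$mpsFrom)
total_days <- aggregate(days ~ uniqueID, data = mps_toy, FUN = sum)
total_days
```

Summing across divisions in this way avoids treating a renamed division as a break in service.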
### allbyparty
This dataset adds information about the party of the politician. One row per party-politician, with columns: uniqueID; partyAbbrev; partyName; partyFrom; partyTo; partyChangedName; partySimplifiedName; partySpecificDateInputted; and partyComments.
```{r}
head(AustralianPoliticians::get_auspol('allbyparty'))
```
Party can be a little confusing in cases where a politician changed party. In general, in this dataset, the to/from dates are set up so that when a politician is in parliament they will have the correct party. However, the dataset should not be used to say anything about when they are out of parliament. For instance, some politicians lost their seat, changed party, and then regained a seat in parliament. The dataset does not know when they changed party while they were out of parliament, and it assumes that they changed party either at the same time that they lost their seat or at the same time as they regained a seat. Similarly, there are plenty of cases where a politician ceased being a party member after leaving parliament; for instance, Malcolm Fraser left the Liberals. Again, that is not reflected in the dataset.
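Looking up a politician's party on a given date amounts to finding the row whose partyFrom and partyTo dates bracket that date. The sketch below shows one way to do this on made-up rows. `party_on()` is a hypothetical helper, and treating the interval as closed at partyFrom and open at partyTo, and a missing partyTo as "still current", are assumptions rather than documented behaviour.

```r
# Made-up rows: one politician who changed party once.
party_toy <- data.frame(
  uniqueID    = c("Example1900", "Example1900"),
  partyAbbrev = c("AAA", "BBB"),
  partyFrom   = as.Date(c("1950-01-01", "1960-01-01")),
  partyTo     = as.Date(c("1960-01-01", NA))
)

# Hypothetical helper: party abbreviation(s) recorded for `id` on `date`.
party_on <- function(data, id, date) {
  date <- as.Date(date)
  in_range <- data$uniqueID == id &
    data$partyFrom <= date &
    (is.na(data$partyTo) | date < data$partyTo)
  data$partyAbbrev[which(in_range)]
}

party_on(party_toy, "Example1900", "1955-06-30")  # "AAA"
party_on(party_toy, "Example1900", "1965-06-30")  # "BBB"
```

As noted above, the answer is only meaningful for dates on which the politician was actually in parliament.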
Certain parties, such as the Nationals, changed their name at various points in time. This is included as a party change for people at that time in partyAbbrev and partyName. However, partySimplifiedName abstracts away from that.
Party name changes:
* The Country Party changed to the National Country Party on 3 May 1975 according to http://nla.gov.au/nla.news-article110636121. It then changed from the National Country Party to the National Party of Australia on 17 October 1982 according to http://nla.gov.au/nla.news-article116476081. And finally, it changed from the National Party of Australia to The Nationals on 11 October 2003 according to the party website.
* The Nick Xenophon Team changed to Centre Alliance on 10 April 2018, according to ABC news reports.
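The Nationals' name changes listed above can be turned into a simple date lookup. This is a sketch only: `nationals_name_on()` is a hypothetical helper, it covers just the three changes listed here, and it returns "Country Party" for any date before 3 May 1975 without checking when that name was first used.

```r
# Dates of the name changes listed above, and the names before and after each one.
change_dates   <- as.Date(c("1975-05-03", "1982-10-17", "2003-10-11"))
names_in_order <- c(
  "Country Party",
  "National Country Party",
  "National Party of Australia",
  "The Nationals"
)

# Hypothetical helper: the party name in use on a given date.
nationals_name_on <- function(date) {
  # findInterval() counts how many change dates fall on or before `date`.
  idx <- findInterval(as.numeric(as.Date(date)), as.numeric(change_dates))
  names_in_order[idx + 1]
}

nationals_name_on("1970-01-01")  # "Country Party"
nationals_name_on("1975-05-03")  # "National Country Party"
nationals_name_on("2010-01-01")  # "The Nationals"
```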
### senators
This dataset adds information about the state that a senator was representing. The variables are: uniqueID; senatorsState; senatorsFrom; senatorsTo; senatorsEndReason; senatorsSec15Sel; and senatorsComments.
```{r}
head(AustralianPoliticians::get_auspol('senators'))
```
This dataset is fairly similar to mps, except that it also has senatorsSec15Sel. This is a binary indicator variable that indicates whether the senator was appointed rather than elected.
## Roadmap
* ~~Add dataset of ministers with dates.~~
* Add information about education.
* Add information about relationships, for instance father-son, etc.
* Add voting based on JFG's dataset.
## Sources
In the first instance, the Parliamentary Handbook was the main source of information. This was augmented with information from Wikipedia, the Australian Dictionary of Biography, and the Senate Biographies wherever possible. Limited information was obtained from other sources, such as state parliaments and newspapers (via Trove), and these have generally been specified in the comments. birthPlace is mostly from WikiData.
The uniqueID_to_aphID dataset was primarily drawn from a dataset put together by Patrick Leslie, and it was checked against a modern dataset from [Open Australia](https://github.com/openaustralia/openaustralia-parser/blob/master/data/people.csv), and Tim Sherratt's Historic Hansard records for the [Reps](http://historichansard.net/hofreps/people/) and [Senate](http://historichansard.net/senate/people/).
## Acknowledgements
In terms of developing the package I am especially grateful for the advice and help of:
* Ben Readshaw,
* Edward Howlett,
* Julia Haider,
* Kelly Lyons,
* Monica Alexander,
* Sharla Gelfand, and
* Simon Munzert.
Thank you to Patrick Leslie who generously donated data.
The icon of parliaments used in the hex sticker was made by <a href="https://www.flaticon.com/authors/freepik" title="Freepik">Freepik</a> from <a href="https://www.flaticon.com/" title="Flaticon">www.flaticon.com</a>.
## Author information
**Rohan Alexander** (corresponding author and repository maintainer)
University of Toronto
Faculty of Information and Department of Statistical Sciences
140 St George St
Toronto, ON, Canada
Email: [email protected]
**Paul A. Hodgetts**
University of Toronto
Faculty of Information
140 St George St
Toronto, ON, Canada