by Rodrigo Esteves de Lima-Lopes (Universidade Estadual de Campinas)
This is version 1.0 of the BrPoliCorpus (Brazilian Political Corpus) package. It is intended as a free repository of open data drawn from official documents of Brazilian politics.
Currently, the following datasets are available:
- Inaugural Speeches: a set of Brazilian presidents' inaugural speeches.
  - Updated until 01/01/2023
- Parliamentary Floor: all parliamentary speeches available from October 2000 onwards.
  - Updated until 31/12/2023
- Governmental Programmes: a set of candidates' programmes for Brazilian elections, from 2014 onwards.
  - Updated until 01/07/2024
- CPI: Brazilian Parliamentary Inquiry Commission.
  - Only one CPI available
  - Updated until 01/07/2024
Doc | Types | Tokens | Texts |
---|---|---|---|
CPI | 128.089 | 3.767.972 | 75.182 |
Parliamentary Committees | 386.534 | 44.668.908 | 2.577 |
Floor Parliamentary speeches | 1.218.926 | 184.115.811 | 428.445 |
Gov Programmes | 218.783 | 11.158.384 | 112 |
Inaugural Speeches | 15.103 | 75.918 | 35 |
Total | 1.967.435 | 243.786.993 | 506.351 |
The data is available both as CSV files for free download and through a set of R functions that load the data into an R environment.
- The CSV files can be found here.
- The R package has a set of functions for downloading specific pieces of data. Please see the vignette for a more detailed discussion.
Each module is distributed as a CSV file containing the texts and some metadata about each text. The CSV files can be downloaded individually (see how to do it here).
If you download the CSV files, keep in mind that one column contains the text, while the others hold metadata. See an example:
For use in ordinary corpus linguistics software, such as WordSmith Tools and AntConc, the text column has to be extracted and saved as individual text files. If you are interested in doing so, please let me know, so I can provide a single-file structure for you.
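As a rough sketch of that extraction step, the R code below writes each row of a module to its own plain-text file. The helper `export_texts()`, the column names `text` and `id`, and the toy data frame are assumptions made for illustration, not part of the package; check the actual column names of the CSV file you downloaded before adapting it.

```r
# Hypothetical helper: writes one .txt file per row of a corpus module.
# `text_col` and `id_col` are assumed column names, not documented ones.
export_texts <- function(corpus, out_dir, text_col = "text", id_col = "id") {
  stopifnot(text_col %in% names(corpus), id_col %in% names(corpus))
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  # Build a file-system-safe name from each identifier
  ids <- gsub("[^A-Za-z0-9_-]", "_", as.character(corpus[[id_col]]))
  paths <- file.path(out_dir, paste0(ids, ".txt"))
  for (i in seq_len(nrow(corpus))) {
    # Write the text as UTF-8 so accented characters survive
    writeLines(enc2utf8(as.character(corpus[[text_col]][i])), paths[i], useBytes = TRUE)
  }
  invisible(paths)
}

# Toy data standing in for a downloaded CSV module
# (in practice: corpus <- read.csv("module.csv", encoding = "UTF-8"))
toy <- data.frame(
  id = c("speech_001", "speech_002"),
  speaker = c("A", "B"),
  text = c("Primeiro discurso.", "Segundo discurso."),
  stringsAsFactors = FALSE
)
written <- export_texts(toy, file.path(tempdir(), "brpoli_txt"))
```

Note that rows sharing the same identifier would overwrite each other, so the identifier column should be unique within the module.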
Once you have installed the package (see instructions below), run:
```r
download_index()
View(IndexFunctions)
```
The first command downloads a general index of the available data, and the second opens the list of download commands. For example:
```r
Committees006 <- download_Committees_006_data()
```
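Assuming a downloaded module is a data frame with one row per text, it can be inspected with base R. The sketch below computes rough type, token, and text counts; the helper `corpus_counts()`, the column name `text`, and the simple tokenisation rule are assumptions for illustration, so the results will not necessarily match the figures reported in the table above.

```r
# Hypothetical helper: rough type/token/text counts for a module.
# `text_col` is an assumed column name; the tokenisation is a simplification.
corpus_counts <- function(corpus, text_col = "text") {
  stopifnot(text_col %in% names(corpus))
  # Lowercase everything and split on any run of non-letter characters
  tokens <- unlist(strsplit(tolower(corpus[[text_col]]), "[^[:alpha:]]+"))
  tokens <- tokens[nzchar(tokens)]  # drop empty strings left by leading separators
  c(
    types  = length(unique(tokens)),
    tokens = length(tokens),
    texts  = nrow(corpus)
  )
}

# Toy data standing in for a downloaded module
toy <- data.frame(
  text = c("O povo e o governo.", "O governo do povo."),
  stringsAsFactors = FALSE
)
counts <- corpus_counts(toy)
print(counts)
```

With a real module, the same call would be `corpus_counts(Committees006)`, after adjusting `text_col` to the actual name of the text column.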
In R, this package can be installed using devtools:
```r
library(devtools)
install_github("rll307/BrPoliCorpus")
```
The following researchers have contributed to this corpus:
- Prof. Dr. Rodrigo Esteves de Lima-Lopes (UNICAMP)
  - Coding, data scraping, corpus conceptualisation, package building
- Dr. Jörn Stegmeier (TU-Darmstadt)
  - Coding, data scraping, data scraping infrastructure
- Ni Yan (TU-Darmstadt)
  - Coding, data scraping
- Dariia Shamgunova (TU-Darmstadt)
  - Coding, data scraping
- Rodrigo Dornelles (MSc Candidate at Hertie School)
  - Coding, data scraping
I would like to acknowledge CAPES and the Alexander von Humboldt Foundation for financing this version of the corpus, and to thank the whole team at TU-Darmstadt, who made this project possible.
```bibtex
@software{BrPoliCorpus,
  author = {Rodrigo Esteves {de Lima-Lopes}},
  title = {BrPoliCorpus: Brazilian Political Corpus},
  url = {https://github.com/rll307/BrPoliCorpus.git},
  version = {1.0},
  date = {2024-07-01},
}
```