
BrPoliCorpus: Brazilian Political Corpus

by Rodrigo Esteves de Lima-Lopes (Universidade Estadual de Campinas)

Introduction

This is version 1.0 of the BrPoliCorpus (Brazilian Political Corpus) package. It is intended as a free repository of open data drawn from official documents of Brazilian politics.

Data

Currently, the following datasets are available:

  • Inaugural Speeches: The inaugural speeches of Brazilian presidents.
    • Updated until 1 January 2023
  • Parliamentary Floor: All parliamentary speeches available from October 2000 onwards.
    • Updated until 31 December 2023
  • Governmental Programmes: Candidates' programmes for Brazilian elections, from 2014 onwards.
    • Updated until 1 July 2024
  • CPI: Brazilian Parliamentary Inquiry Commission (Comissão Parlamentar de Inquérito).
    • Only one CPI available
    • Updated until 1 July 2024

Corpus size

| Doc | Types | Tokens | Texts |
|---|---|---|---|
| CPI | 128,089 | 3,767,972 | 75,182 |
| Parliamentary Committees | 386,534 | 44,668,908 | 2,577 |
| Floor Parliamentary speeches | 1,218,926 | 184,115,811 | 428,445 |
| Gov Programmes | 218,783 | 11,158,384 | 112 |
| Inaugural Speeches | 15,103 | 75,918 | 35 |
| Total | 1,967,435 | 243,786,993 | 506,351 |
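As a quick sanity check, the totals row can be recomputed from the per-subcorpus figures. The sketch below re-enters the figures from the table as plain numbers, with the thousands separators removed, and also derives the mean number of tokens per text. The data frame and its column names are illustrative only and are not part of the package.

```r
# Figures copied from the corpus size table, thousands separators removed.
corpus_size <- data.frame(
  doc    = c("CPI", "Parliamentary Committees", "Floor Parliamentary speeches",
             "Gov Programmes", "Inaugural Speeches"),
  types  = c(128089, 386534, 1218926, 218783, 15103),
  tokens = c(3767972, 44668908, 184115811, 11158384, 75918),
  texts  = c(75182, 2577, 428445, 112, 35),
  stringsAsFactors = FALSE
)

# Column sums should reproduce the "Total" row of the table.
totals <- colSums(corpus_size[, c("types", "tokens", "texts")])
print(totals)

# Mean length of a text in each subcorpus, in tokens.
corpus_size$tokens_per_text <- round(corpus_size$tokens / corpus_size$texts)
print(corpus_size[, c("doc", "tokens_per_text")])
```

Note that the "Types" total is a plain sum of the per-subcorpus counts, so a word form that occurs in several subcorpora is counted once in each.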

Availability

The data is available both as CSV files for free download and through a set of R functions that load the data into an R environment.

  • The CSV files can be found here.
  • The R package provides a set of functions for downloading specific pieces of data. Please see the vignette for a more detailed discussion.

Organisation of each CSV module

Each module is distributed as a CSV file containing the text and some metadata about it. The CSV files can be downloaded individually (see how to do it here).

If you download a CSV file, keep in mind that one column contains the text, while the others hold metadata. See an example:

Example CSV
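To make the split between text and metadata concrete, here is a minimal sketch in R. It writes a tiny stand-in CSV to a temporary file and reads it back. The column names (`date`, `speaker`, `text`) are assumptions for illustration and may not match the actual modules.

```r
# Build a tiny stand-in CSV; the column names are hypothetical.
toy <- data.frame(
  date    = c("2023-01-01", "2023-01-02"),
  speaker = c("Speaker A", "Speaker B"),
  text    = c("Primeiro discurso de exemplo.", "Segundo discurso de exemplo."),
  stringsAsFactors = FALSE
)
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, csv_path, row.names = FALSE, fileEncoding = "UTF-8")

# Read the module; replace csv_path with the path of a downloaded file.
module <- read.csv(csv_path, stringsAsFactors = FALSE, encoding = "UTF-8")

# Separate the text column from the metadata columns.
text_col <- "text"
texts    <- module[[text_col]]
metadata <- module[, setdiff(names(module), text_col), drop = FALSE]

str(metadata)
print(nchar(texts))
```

With a real module, check `names(module)` first to see which column actually holds the text.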

For use in ordinary Corpus Linguistics software, such as WordSmith Tools and AntConc, the text column has to be extracted and each text saved as a separate plain-text file. If you are interested in doing so, please let me know, so that I can provide the data as individual text files.
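For readers who prefer to do the extraction themselves, a minimal sketch in R follows. `export_texts()` is a hypothetical helper, not part of the package, and the `text` column name is an assumption; adjust it to the module you downloaded.

```r
# Hypothetical helper: writes each row's text to its own UTF-8 .txt file.
export_texts <- function(corpus, out_dir, text_col = "text") {
  stopifnot(text_col %in% names(corpus))
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  # One numbered file per row, e.g. text_00001.txt
  paths <- file.path(out_dir, sprintf("text_%05d.txt", seq_len(nrow(corpus))))
  for (i in seq_len(nrow(corpus))) {
    con <- file(paths[i], open = "w", encoding = "UTF-8")
    writeLines(as.character(corpus[[text_col]][i]), con)
    close(con)
  }
  invisible(paths)
}

# Toy data standing in for a downloaded module (column names are assumptions).
toy <- data.frame(
  speaker = c("A", "B"),
  text    = c("Primeiro discurso.", "Segundo discurso."),
  stringsAsFactors = FALSE
)
out_dir <- file.path(tempdir(), "brpolicorpus_txt")
paths   <- export_texts(toy, out_dir)
print(basename(paths))
```

The resulting folder can then be opened directly in a concordancer. The metadata is dropped in this sketch; if you need it, one option is to encode selected fields in the file names.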

R Package and commands

Once you have installed the package (see the instructions below), run:

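library(BrPoliCorpus)  # load the installed package
download_index()       # download the general index of available data
View(IndexFunctions)   # open the table of download functions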

The first command downloads a general index of the available data, and the second opens the list of functions for downloading each dataset. For example:

Committees006 <- download_Committees_006_data()

R package

Installation

In R, the package can be installed with devtools:

library(devtools)
install_github("rll307/BrPoliCorpus")
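If devtools is not installed yet, the commands above stop at `library(devtools)`. A slightly more defensive sketch is shown here, wrapped in a hypothetical helper function (`install_brpolicorpus()` is not part of the package) so that nothing is installed until you call it:

```r
# Hypothetical convenience wrapper; not provided by BrPoliCorpus itself.
install_brpolicorpus <- function() {
  # Install devtools from CRAN only when it is missing.
  if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
  }
  # Install the corpus package from its GitHub repository.
  devtools::install_github("rll307/BrPoliCorpus")
}
```

Call `install_brpolicorpus()` once in an R session with internet access, then load the package as usual.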

Contribution

The following researchers have contributed to this corpus:

Acknowledgments

I would like to thank CAPES and the Alexander von Humboldt Foundation for financing this version of the corpus, and the whole team at TU Darmstadt, who made this project possible.

How to cite

@software{BrPoliCorpus,
  author = {Rodrigo Esteves {de Lima-Lopes}},
  title = {BrPoliCorpus: Brazilian Political Corpus},
  url = {https://github.com/rll307/BrPoliCorpus.git},
  version = {1.0},
  date = {2024-07-01},
}
