-
Notifications
You must be signed in to change notification settings - Fork 0
Free polling data from pollster PDFs.
License
pindograma/pdf-poll-parser
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Welcome to pdf-poll-parser! This is a series of Python scripts that uses pdfminer to extract polling data from PDFs published by Brazilian pollsters. The PDFs generally follow patterns that allow for this process. Example usage: >>> from pathlib import Path >>> from multidados import MultidadosParser >>> pathlist = Path('.').glob('data/multidados/*.pdf') >>> MultidadosParser.parse(pathlist, 'multidados') Currently supported pollsters are: - Ibope (2012, 2014, 2016, 2018) - Escutec (2012, 2016) * - Paraná Pesquisas (2016, 2018) * - Datafolha (2012, 2014, 2016, 2018) * - Multidados (2016) - Veritá (2012, 2014, 2016, 2018) ** Pollster PDFs were obtained from the following sources: - Ibope: http://eleicoes.ibopeinteligencia.com/ - Escutec: http://www.escutec.com/ - Paraná Pesquisas: https://www.paranapesquisas.com.br/ - Datafolha: http://datafolha.folha.uol.com.br/ - Multidados: https://www.multidadospesquisa.com.br/ *: Parser fails on 5% or less of the PDF set. These outliers will require manual parsing. **: Not exactly a PDF parser. Instead, it parses a JSON obtained from https://institutoverita.com.br/inc/js/psq.php.
About
Free polling data from pollster PDFs.
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published