GitHub - pindograma/pdf-poll-parser: Free polling data from pollster PDFs.

Welcome to pdf-poll-parser!

This is a set of Python scripts that use pdfminer to extract polling data
from PDFs published by Brazilian pollsters. Each pollster's PDFs generally
follow a consistent layout, which is what makes automated extraction possible.
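
As a rough illustration of what "following a pattern" means in practice, the
sketch below pulls label/percentage rows out of text that has already been
extracted from a PDF. It is not the code used by the parsers in this
repository: the row layout, the ROW_PATTERN regex and the parse_rows helper
are all hypothetical, chosen only to show the idea.

```python
import re

# Hypothetical row layout (an assumption, not taken from any real pollster):
# a label, some whitespace, then a percentage such as "35%" or "28,5%".
ROW_PATTERN = re.compile(r"^(?P<label>.+?)\s+(?P<value>\d{1,3}(?:[.,]\d+)?)\s*%$")


def parse_rows(text):
    """Return (label, percentage) pairs for every line matching ROW_PATTERN."""
    rows = []
    for line in text.splitlines():
        match = ROW_PATTERN.match(line.strip())
        if match:
            # Accept a decimal comma as well as a decimal point.
            value = float(match.group("value").replace(",", "."))
            rows.append((match.group("label"), value))
    return rows


# Made-up sample text standing in for the output of a PDF text extractor.
sample = """Intenção de voto
Candidato A    35%
Candidato B    28,5%
Não sabe       12 %
"""

for label, value in parse_rows(sample):
    print(label, value)
```

Lines that do not fit the pattern (such as the heading in the sample) are
simply skipped, which is also why an unusual layout can make a parser miss
data.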

Example usage:
>>> from pathlib import Path
>>> from multidados import MultidadosParser
>>> pathlist = Path('.').glob('data/multidados/*.pdf')
>>> MultidadosParser.parse(pathlist, 'multidados')

Currently supported pollsters are:
- Ibope (2012, 2014, 2016, 2018)
- Escutec (2012, 2016) *
- Paraná Pesquisas (2016, 2018) *
- Datafolha (2012, 2014, 2016, 2018) *
- Multidados (2016)
- Veritá (2012, 2014, 2016, 2018) **

Pollster PDFs were obtained from the following sources:
- Ibope: http://eleicoes.ibopeinteligencia.com/
- Escutec: http://www.escutec.com/
- Paraná Pesquisas: https://www.paranapesquisas.com.br/
- Datafolha: http://datafolha.folha.uol.com.br/
- Multidados: https://www.multidadospesquisa.com.br/

*: The parser fails on 5% or fewer of the PDFs in the set. These outliers
must be parsed manually.

**: Not strictly a PDF parser. Instead, it parses JSON obtained from
https://institutoverita.com.br/inc/js/psq.php.
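
For the pollsters marked with * above, one possible way to find the PDFs that
need manual parsing is to parse files one at a time and set aside the ones
that raise an error. This is only a sketch: parse_with_fallback and the
parse_one callable are hypothetical and are not part of this repository's
interface.

```python
from pathlib import Path


def parse_with_fallback(paths, parse_one):
    """Apply parse_one to each path; collect failures instead of stopping.

    parse_one is a hypothetical callable that takes a single Path and either
    returns parsed data or raises an exception.
    """
    parsed, failed = [], []
    for path in paths:
        try:
            parsed.append((path, parse_one(path)))
        except Exception as exc:
            # Keep the error so the file can be reviewed by hand later.
            failed.append((path, exc))
    return parsed, failed


def fake_parse(path):
    """Stand-in parser used only to demonstrate the control flow."""
    if "bad" in path.name:
        raise ValueError("unexpected layout")
    return {"file": path.name}


parsed, failed = parse_with_fallback([Path("a.pdf"), Path("bad.pdf")], fake_parse)
for path, error in failed:
    print(f"needs manual parsing: {path} ({error})")
```

Catching every Exception is deliberately broad here, since the goal is just to
produce a list of files for manual review.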