This repository has been archived by the owner on Sep 26, 2024. It is now read-only.
Add utils to measure portal's uptime #71
Open: bcbernardo wants to merge 21 commits into okfn-brasil:main from bcbernardo:main
Commits
d5ca708 Add utils skeleton (bcbernardo)
66c54d8 Get portals using Querido Diario Census as source (bcbernardo)
ea7ad3c Add portal fetchers (bcbernardo)
87226ec Add kaggle-related callbacks. (bcbernardo)
978163f Add main logic (bcbernardo)
77b876b Fix style (bcbernardo)
f6c6c4d Ignore zip files (bcbernardo)
d9a8c08 Move models.py module (bcbernardo)
1d710c3 Fix style (bcbernardo)
1f5db63 Package for distribution (bcbernardo)
b287d8f Fix typing issues (bcbernardo)
c0be8db Fix log level for warnings (bcbernardo)
7c0df86 Fix logging level selection (bcbernardo)
fdc9035 Enhance response logging (bcbernardo)
cffdf94 Add Azure Function to ping portals (bcbernardo)
d95dc8f Remove space characters around attribution operator (bcbernardo)
a02e42a Add subrepo documentation (bcbernardo)
95d1463 Dockerize portal fetching (bcbernardo)
29e4514 Update gitignore (bcbernardo)
3f53321 Merge pull request #1 from bcbernardo/reports (bcbernardo)
cb9368f Merge branch 'okfn-brasil:main' into main (bcbernardo)
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
__pycache* | ||
.*cache | ||
.venv |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
# Environment variables | ||
|
||
### IMPORTANT: when done editing this file, rename it to ".env" | ||
### (without the ".template" ending) | ||
|
||
## General settings | ||
|
||
### log verbosity level - choose between 'error', 'warn', 'info', 'debug' | ||
FETCHPORTALS_LOG_LEVEL="debug" | ||
|
||
### set source of portal URLs and geo IDs - only 'census' is currently accepted | ||
FETCHPORTALS_SOURCE="census" | ||
|
||
### control whether to only ping portal status, or to fetch its source code | ||
FETCHPORTALS_MODE="ping" | ||
|
||
### set maximum number of tries and time waiting | ||
FETCHPORTALS_MAX_RETRIES=3 | ||
FETCHPORTALS_TIMEOUT=10.0 | ||
|
||
### control callback to process and/or save the retrieved data | ||
FETCHPORTALS_CALLBACK="kaggle" | ||
|
||
### control what to do when destination file already exists - must be one of | ||
### 'replace', 'append' or 'skip' | ||
FETCHPORTALS_EXISTING="replace" | ||
|
||
### set a local directory where retrieved files may persist | ||
FETCHPORTALS_LOCALDIR="./data" | ||
|
||
## Kaggle settings | ||
|
||
KAGGLE_USERNAME="exampleuser" | ||
KAGGLE_KEY="12345678abcdefgh" | ||
KAGGLE_DATASET="bcbernardo/censusqd2020" | ||
KAGGLE_FILE="portals-availability.csv" |
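The template above only lists the variables; how `fetch_portals` actually reads them is not visible in this part of the diff. As a rough sketch of what validating these settings could look like, the snippet below reads the `FETCHPORTALS_*` variables from a mapping and rejects values outside the sets named in the template comments. The `Settings` class, the `load_settings` function, and the use of the template values as defaults are all assumptions made for illustration, not the PR's implementation.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Allowed values, copied from the comments in the template above.
LOG_LEVELS = {"error", "warn", "info", "debug"}
SOURCES = {"census"}
EXISTING_POLICIES = {"replace", "append", "skip"}


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the FETCHPORTALS_* variables."""

    log_level: str
    source: str
    mode: str
    max_retries: int
    timeout: float
    callback: str
    existing: str
    localdir: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from `env`, falling back to the template values (an assumption)."""

    def get(name: str, default: str) -> str:
        return env.get(f"FETCHPORTALS_{name}", default)

    settings = Settings(
        log_level=get("LOG_LEVEL", "debug").lower(),
        source=get("SOURCE", "census").lower(),
        mode=get("MODE", "ping").lower(),
        max_retries=int(get("MAX_RETRIES", "3")),
        timeout=float(get("TIMEOUT", "10.0")),
        callback=get("CALLBACK", "kaggle").lower(),
        existing=get("EXISTING", "replace").lower(),
        localdir=get("LOCALDIR", "./data"),
    )
    # Fail early on values the template does not list as accepted.
    if settings.log_level not in LOG_LEVELS:
        raise ValueError(f"invalid log level: {settings.log_level!r}")
    if settings.source not in SOURCES:
        raise ValueError(f"invalid source: {settings.source!r}")
    if settings.existing not in EXISTING_POLICIES:
        raise ValueError(f"invalid 'existing' policy: {settings.existing!r}")
    return settings
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise in tests, for example `load_settings({"FETCHPORTALS_EXISTING": "append"})`.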
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
.env | ||
.venv | ||
local.settings.json | ||
__azurite* | ||
__pycache__ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,134 @@ | ||
# Querido Diário Census Utilities | ||
|
||
This sub-repository includes routines and functions developed by the community to | ||
process, analyze and save the results of the Querido Diário Census. | ||
|
||
Currently, the only utility available is the `fetch_portals` package, which | ||
contacts every web address registered in the Census to check which | ||
portals are *online* and/or to retrieve their source code. | ||
|
||
Contributions in the form of new packages and utilities for processing Census | ||
data are welcome. See the project's [CONTRIBUTING.md](../CONTRIBUTING.md) | ||
for details on how to help with the different Census tasks, and the | ||
[Adding a new utility](#adding-a-new-utility) section | ||
for specific instructions on creating a new preprocessing routine. | ||
|
||
If you have questions or want an overview of the next steps for the | ||
Census, feel free to visit the project's | ||
[issues](https://github.com/okfn-brasil/censo-querido-diario/issues) | ||
or get in touch on [Discord](https://discord.gg/M6ep5VED). | ||
|
||
## Installation and usage | ||
|
||
### With Docker (recommended) | ||
|
||
The simplest way to run the utilities is with Docker. You need | ||
Docker Community Edition installed. Find the right version for | ||
your system [here][Docker CE]. You also need [git] installed to download the repository. | ||
|
||
To install the utilities, open a command-line terminal and run the | ||
following commands: | ||
|
||
```bash | ||
$ git clone https://github.com/okfn-brasil/censo-querido-diario.git | ||
$ cd censo-querido-diario/utils | ||
``` | ||
|
||
In a file explorer, find the directory where you downloaded the | ||
repository and open the file `censo-querido-diario/utils/.env.template`. Adjust | ||
the settings in the file according to the data you intend to collect | ||
(especially those starting with `KAGGLE_*`, if you will export to Kaggle). | ||
Save the modified file and rename it to `.env` (without the trailing `.template`). | ||
|
||
To start checking the portals, go back to the terminal and run the | ||
command: | ||
|
||
```bash | ||
$ docker-compose up | ||
``` | ||
|
||
[Docker CE]: https://hub.docker.com/search?offering=community&type=edition | ||
[git]: https://git-scm.com/ | ||
|
||
### As a Python package | ||
|
||
The utilities in this sub-repository can be installed as standalone Python | ||
packages. To do so, you need a compatible Python version (3.7 or later) | ||
installed on your machine. | ||
Review comment: I see you use |
||
|
||
To install from the repository, run in a command-line terminal: | ||
|
||
```bash | ||
$ git clone https://github.com/okfn-brasil/censo-querido-diario.git | ||
$ cd censo-querido-diario/utils | ||
$ python -m venv .venv | ||
$ source .venv/bin/activate # in PowerShell: .venv\Scripts\Activate.ps1 | ||
(.venv) $ python -m pip install . | ||
``` | ||
|
||
For the installation to work and for the `fetch-portals` command to be usable | ||
from the command line, you first need to export a few environment | ||
variables that control how the program behaves. | ||
|
||
To do so, edit the `.env.template` file in the | ||
`censo-querido-diario/utils` directory, changing the settings as needed. | ||
**Important:** to run the current version of the portal fetcher, you must, at | ||
a minimum, change the environment variables starting with `KAGGLE_*`. You need | ||
write permission on the dataset used to save the results. | ||
|
||
When you finish editing, save the `.env.template` file and rename it to | ||
just `.env`. | ||
<!-- | ||
|
||
In the terminal, go to the `censo-querido-diario/utils` folder and run | ||
the following command: | ||
|
||
On Unix/macOS: | ||
|
||
```bash | ||
$ set -a; . .env; set +a | ||
``` | ||
|
||
On Windows/PowerShell, you may need to install an additional package to | ||
read environment variables from the `.env` file (administrator privileges | ||
may be required): | ||
|
||
```powershell | ||
PS> Install-Module -Name Set-PsEnv | ||
``` | ||
|
||
--> | ||
|
||
With the utility installed as a package and its virtual environment | ||
activated, just run the `fetch-portals` command on the command line. This | ||
command makes requests to every official gazette publication portal | ||
mapped in the Census and saves the results to the Kaggle dataset specified in | ||
the `.env` file. | ||
|
||
```bash | ||
(.venv) $ fetch-portals | ||
``` | ||
|
||
## Adding a new utility | ||
|
||
To develop a Python package that consumes and processes Querido Diário Census | ||
data, | ||
[*fork*](https://github.com/okfn-brasil/censo-querido-diario/fork) the | ||
repository to your own account and add the scripts in a subdirectory | ||
of the `censo-querido-diario/utils/src` folder. | ||
|
||
For the directory and module names, use only lowercase letters and | ||
*underscores* (\_). Also add an empty `__init__.py` file to the directory | ||
you created, and add the dependencies you use to the package list under the | ||
`install_requires` item in | ||
[`censo-querido-diario/utils/setup.cfg`](./setup.cfg). | ||
|
||
If you want the utility to be accessible through Docker, create a | ||
file named `<UTILITY_NAME>.Dockerfile` in | ||
`censo-querido-diario/utils`, containing the container build instructions | ||
(see the [Dockerfile reference]). Then add an entry to the | ||
`docker-compose.yml` file located in the same directory (see the | ||
[Docker Compose reference] for more details). | ||
|
||
[Dockerfile reference]: https://docs.docker.com/engine/reference/builder/ | ||
[Docker Compose reference]: https://docs.docker.com/compose/compose-file/ |
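The README describes `fetch-portals` as making requests to every mapped portal, and the environment template exposes `FETCHPORTALS_MAX_RETRIES` and `FETCHPORTALS_TIMEOUT`. The fetcher modules themselves are not shown in this part of the diff, so the following is only a sketch of how a bounded retry with a per-attempt timeout might be written with `asyncio`. The names `ping_with_retries` and `make_flaky` are hypothetical, and the real request (presumably made with `aiohttp`, which is listed in `setup.cfg`) is replaced by a pluggable coroutine.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def ping_with_retries(
    url: str,
    fetch_status: Callable[[str], Awaitable[int]],
    max_retries: int = 3,
    timeout: float = 10.0,
) -> Optional[int]:
    """Return the status reported by `fetch_status`, or None if every attempt fails."""
    for _attempt in range(max_retries):
        try:
            # Give up on this attempt if it takes longer than `timeout` seconds.
            return await asyncio.wait_for(fetch_status(url), timeout=timeout)
        except (asyncio.TimeoutError, OSError):
            continue  # try again until the attempts are exhausted
    return None


def make_flaky(failures: int) -> Callable[[str], Awaitable[int]]:
    """Build a fake fetcher that fails `failures` times, then reports HTTP 200."""
    state = {"left": failures}

    async def fetch_status(url: str) -> int:
        if state["left"] > 0:
            state["left"] -= 1
            raise OSError("simulated connection error")
        return 200

    return fetch_status
```

With the defaults, `asyncio.run(ping_with_retries("https://example.org", make_flaky(2)))` succeeds on the third attempt, while a fetcher that fails more than three times yields `None`.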
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
# Copyright 2020 Open Knowledge Brasil | ||
Review comment: Maybe 2021? |
||
|
||
# Use of this source code is governed by an MIT-style | ||
# license that can be found in the LICENSE file or at | ||
# https://opensource.org/licenses/MIT. | ||
|
||
"""Periodically check the availability of Official Gazettes portals. | ||
""" | ||
|
||
import logging | ||
|
||
import azure.functions as func | ||
from fetch_portals.main import main as fetch | ||
|
||
|
||
def main(timer: func.TimerRequest): | ||
    """Ping Querido Diario Census portals to check their availability.""" | ||
    logging.info(f"Starting function (past due {timer.past_due})") | ||
    fetch(mode="ping", existing="append", callback="kaggle") | ||
    logging.info("Finished checking portals from Census.") |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
{ | ||
  "scriptFile": "__init__.py", | ||
  "bindings": [ | ||
    { | ||
      "name": "timer", | ||
      "type": "timerTrigger", | ||
      "direction": "in", | ||
      "schedule": "0 0 */3 * * *" | ||
    } | ||
  ] | ||
} |
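The `schedule` value above uses the six-field NCRONTAB format accepted by Azure Functions timer triggers, where the first field is seconds: `{second} {minute} {hour} {day} {month} {day-of-week}`. To make the firing times of `0 0 */3 * * *` explicit, here is a small sketch that expands each field into the values it matches. The helper names are hypothetical, and the parser only handles `*`, `*/n`, plain numbers, ranges and comma lists, not month or weekday names.

```python
from typing import Dict, List

# Field names and their numeric bounds, in NCRONTAB order.
FIELD_RANGES = [
    ("second", 0, 59),
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day", 1, 31),
    ("month", 1, 12),
    ("day_of_week", 0, 6),
]


def expand_field(field: str, low: int, high: int) -> List[int]:
    """Expand one schedule field into the sorted list of values it matches."""
    values = set()
    for part in field.split(","):
        base, _, step = part.partition("/")
        step_n = int(step) if step else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            first, last = base.split("-")
            start, end = int(first), int(last)
        else:
            start = end = int(base)
        values.update(range(start, end + 1, step_n))
    return sorted(values)


def expand_schedule(expression: str) -> Dict[str, List[int]]:
    """Expand a six-field schedule expression into per-field value lists."""
    fields = expression.split()
    if len(fields) != len(FIELD_RANGES):
        raise ValueError("expected six fields: second minute hour day month day-of-week")
    return {
        name: expand_field(field, low, high)
        for field, (name, low, high) in zip(fields, FIELD_RANGES)
    }
```

For the schedule in this binding, `expand_schedule("0 0 */3 * * *")["hour"]` lists the hours 0, 3, 6, and so on up to 21, with minute and second both fixed at 0, so the function is triggered eight times a day.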
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
version: '3.8' | ||
services: | ||
  fetch-portals: | ||
    build: | ||
      context: . | ||
      dockerfile: fetch_portals.Dockerfile | ||
    volumes: | ||
      - ./cache:/usr/src/data | ||
    env_file: .env | ||
    environment: | ||
      - FETCHPORTALS_LOCALDIR=/usr/src/data |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
FROM python:3.8.5-slim | ||
|
||
# Setup env | ||
ENV LANG C.UTF-8 | ||
ENV LC_ALL C.UTF-8 | ||
ENV PYTHONDONTWRITEBYTECODE 1 | ||
ENV PYTHONFAULTHANDLER 1 | ||
|
||
RUN mkdir /usr/src/app | ||
WORKDIR /usr/src/app | ||
|
||
COPY . . | ||
|
||
RUN python -m pip install . | ||
|
||
# Run the executable | ||
ENTRYPOINT ["fetch-portals"] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
{ | ||
  "functionTimeout": "00:10:00", | ||
  "version": "2.0", | ||
  "watchDirectories": [ "src" ] | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
[build-system] | ||
requires = [ | ||
    "setuptools>=42", | ||
    "wheel" | ||
] | ||
build-backend = "setuptools.build_meta" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
.[dev] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
[metadata] | ||
name = censusqdutils | ||
version = 0.1.0 | ||
url = https://github.com/okfn-brasil/censo-querido-diario | ||
author = Open Knowledge Brasil | ||
author_email = [email protected] | ||
classifiers = | ||
    Programming Language :: Python :: 3 | ||
    License :: OSI Approved :: MIT License | ||
    Operating System :: OS Independent | ||
description = Utils for processing Querido Diario Census data. | ||
long_description = file: README.md | ||
long_description_content_type = text/markdown | ||
license = MIT | ||
|
||
[options] | ||
python_requires = >=3.7 | ||
package_dir = | ||
    =src | ||
packages = find: | ||
install_requires = | ||
    aiohttp >= 3.7 | ||
    kaggle >= 1.5 | ||
    pandas >= 1.2 | ||
    python-dotenv >= 0.15 | ||
|
||
[options.extras_require] | ||
dev = | ||
    black == 20.8b1 | ||
    flake8 >= 3.8.4 | ||
    isort >= 5.7.0 | ||
    mypy >= 0.800 | ||
    pandas-stubs >= 1.0.4.4 | ||
    pytest >= 6.2.2 | ||
|
||
[options.packages.find] | ||
where = src | ||
|
||
[options.entry_points] | ||
console_scripts = | ||
    fetch-portals = fetch_portals.main:main |
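The `console_scripts` entry above is what makes the `fetch-portals` command available after `pip install .`: the text before `=` is the command name and the text after it is a `module:attribute` reference to the callable that the generated script invokes. The sketch below illustrates that lookup with `importlib`. The function `resolve_entry_point` is a hypothetical helper written for this note, not part of setuptools, and the example resolves a standard-library function so that it runs without `fetch_portals` installed.

```python
import importlib
from typing import Any, Tuple


def resolve_entry_point(spec: str) -> Tuple[str, Any]:
    """Split 'name = module:attr' and return the command name and the target object."""
    name, separator, target = spec.partition("=")
    if not separator:
        raise ValueError("expected 'name = module:attribute'")
    module_name, colon, attr_path = target.strip().partition(":")
    if not colon or not attr_path.strip():
        raise ValueError("expected a 'module:attribute' target")
    # Import the module, then walk the (possibly dotted) attribute path.
    obj = importlib.import_module(module_name.strip())
    for attr in attr_path.strip().split("."):
        obj = getattr(obj, attr)
    return name.strip(), obj


# Illustration with a standard-library target instead of fetch_portals.main:main.
command, func = resolve_entry_point("dump-json = json:dumps")
```

Applied to the entry in this file, the same lookup would import `fetch_portals.main` and return its `main` function, which the installed script then calls without arguments.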
Empty file.
You might want to take a look at Createnv to easily automate that.
Disclaimer: I am the creator of that package! But, I mean… I wrote it exactly to make it easy to onboard newcomers to projects using `.env` files ✨