This repository contains the source code of a joint project of the AI + Automation Lab of Bayerischer Rundfunk (BR), Mitteldeutscher Rundfunk (mdr) and ida. The project identifies user comments that address the newsrooms directly, in order to foster constructive exchange with our audiences.
This repository documents the state of the project work during the JournalismAI fellowship in 2022 (more information below in the chapter On the fellowship). The project team used the fellowship to explore technical solutions that support the comment moderation teams of mdr and BR. The goal was to enable the moderation teams to communicate with their audiences in real time. For this purpose, we built a system that brings comments directly mentioning the media house to the immediate attention of the moderation team. This involves the following steps:
- fetch the comments instantaneously
- preprocess the comments
- store the comments
- classify them into relevant and irrelevant comments
- publish the relevant comments to the moderation team's instance
- forward the moderation team to the comment in the moderation tool
- collect feedback from the moderation team to improve the model
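The steps above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the project's implementation: the `Comment` fields, the function names, and the keyword rule that stands in for the trained classifier are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Comment:
    # Hypothetical minimal record; the real format is the Comment class in src/models.py.
    id: str
    text: str


# Hypothetical stand-in for the text classification model: flags comments
# that mention "BR" or "mdr" as whole words. The running system uses a trained model.
MENTION_PATTERN = re.compile(r"\b(br|mdr)\b", re.IGNORECASE)


def preprocess(comment: Comment) -> Comment:
    # Collapse runs of whitespace and strip leading/trailing spaces.
    return Comment(comment.id, " ".join(comment.text.split()))


def is_relevant(comment: Comment) -> bool:
    return MENTION_PATTERN.search(comment.text) is not None


def run_pipeline(raw: List[Comment]) -> Tuple[List[Comment], List[Comment]]:
    """Preprocess and store every comment; publish only the relevant ones."""
    stored: List[Comment] = []
    published: List[Comment] = []
    for comment in raw:
        cleaned = preprocess(comment)
        stored.append(cleaned)  # stand-in for the database write
        if is_relevant(cleaned):
            published.append(cleaned)  # stand-in for notifying the moderation team
    return stored, published


def record_feedback(feedback: Dict[str, bool], comment_id: str, relevant: bool) -> None:
    # Moderator verdicts, keyed by comment id, can later serve as training labels.
    feedback[comment_id] = relevant
```

For example, `run_pipeline([Comment("1", "Dear  BR newsroom, why?"), Comment("2", "Nice weather today")])` stores both comments but publishes only the first one.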
Part of the running project is a text classification model that was released on Hugging Face.
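If the model is loaded with the `transformers` `pipeline("text-classification", ...)` helper, each prediction is a dict with a `label` and a `score`. The sketch below shows one way to turn such predictions into a relevant/irrelevant decision; the label name `LABEL_1` and the threshold of 0.5 are hypothetical and depend on how the released model was trained.

```python
from typing import Dict, List, Union

# Shape of one text-classification prediction: {"label": str, "score": float}.
Prediction = Dict[str, Union[str, float]]


def relevant_indices(
    predictions: List[Prediction],
    relevant_label: str = "LABEL_1",  # hypothetical label for "addresses the newsroom"
    threshold: float = 0.5,  # hypothetical decision threshold
) -> List[int]:
    """Return the positions of comments the model labels relevant with enough confidence."""
    return [
        i
        for i, p in enumerate(predictions)
        if p["label"] == relevant_label and float(p["score"]) >= threshold
    ]
```

Raising the threshold trades recall for precision: fewer irrelevant comments reach the moderators, but more relevant ones are missed.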
Create a virtualenv using
python3.9 -m venv .venv
source .venv/bin/activate
Install dependencies using
pip3 install -r requirements.txt
Run the API with uvicorn api:app --host <ip_address> --port <port>
The settings required to use this project can be found in settings.py.
The project's APIs are documented at the /docs endpoint.
This repository is connected via GitHub Actions to BR's Google Cloud Kubernetes cluster. Access to the BR infrastructure is restricted to BR members.
The settings required for the deployment can be found in config.yml.
NOTE: To reduce the deployment time of this repository, the dependencies listed in requirements.txt were moved into the base image wtwm-application-base-image, from which the application image inherits. The base image can be adapted to match the dependencies in requirements.txt.
image:
  imageFrom: europe-west3-docker.pkg.dev/brdata-dev/cloud-deploy-images/wtwm-application-base-image
The comment data from BR and mdr is provided through APIs external to this repository. To connect your own comment data APIs, follow the example of the endpoints /v1/get_mdr_comments and /v1/get_latest_br_comments. The comment data must match the format of the Comment class in src/models.py.
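A new data source has to deliver its comments in the `Comment` format. The sketch below shows one way to adapt a raw JSON payload from an external API. The field names on both sides (`commentId`, `body`, `created` in the payload and the dataclass fields) are hypothetical; check src/models.py for the actual schema.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Comment:
    # Hypothetical fields; the authoritative definition lives in src/models.py.
    id: str
    text: str
    created_at: datetime
    source: str


def to_comments(payload: str, source: str) -> List[Comment]:
    """Map a JSON array from a hypothetical external comment API to Comment objects."""
    comments = []
    for item in json.loads(payload):
        comments.append(
            Comment(
                id=str(item["commentId"]),  # normalise numeric ids to strings
                text=item["body"].strip(),
                # fromisoformat handles offsets like "+00:00" (but not "Z" before Python 3.11)
                created_at=datetime.fromisoformat(item["created"]),
                source=source,
            )
        )
    return comments
```

Keeping the mapping in one function per source isolates the differences between external APIs from the rest of the pipeline.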
Data files and the various models are stored in a persistent Google Cloud Storage bucket that the deployment routine connects to the pod.
The classification model last used in the running system can be found here.
The processed comments and their mentions are stored in a PostgreSQL database instance that the deployment routine connects to the pod.
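The relation between comments and their mentions can be illustrated with a two-table schema. This sketch uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL so that it runs without a database server; the table and column names are hypothetical and not taken from the project.

```python
import sqlite3
from typing import List


def create_schema(conn: sqlite3.Connection) -> None:
    # One row per comment, plus one row per mention found in that comment.
    conn.executescript(
        """
        CREATE TABLE comments (
            id   TEXT PRIMARY KEY,
            text TEXT NOT NULL
        );
        CREATE TABLE mentions (
            comment_id TEXT NOT NULL REFERENCES comments(id),
            mention    TEXT NOT NULL
        );
        """
    )


def store_comment(conn: sqlite3.Connection, comment_id: str, text: str, mentions: List[str]) -> None:
    with conn:  # commits on success, rolls back if an insert fails
        conn.execute("INSERT INTO comments (id, text) VALUES (?, ?)", (comment_id, text))
        conn.executemany(
            "INSERT INTO mentions (comment_id, mention) VALUES (?, ?)",
            [(comment_id, m) for m in mentions],
        )


def comments_with_mention(conn: sqlite3.Connection, mention: str) -> List[str]:
    rows = conn.execute(
        "SELECT c.id FROM comments c JOIN mentions m ON m.comment_id = c.id "
        "WHERE m.mention = ? ORDER BY c.id",
        (mention,),
    )
    return [row[0] for row in rows]
```

Against PostgreSQL the same statements work with a driver such as psycopg, which uses `%s` instead of `?` as the parameter placeholder.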
The API endpoints are secured by a bearer token. Requests without a valid token are rejected.
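A client request might look like the sketch below, assuming the token is sent in the usual `Authorization: Bearer <token>` header. It only uses Python's standard library; the base URL is hypothetical, and the path is one of the endpoints named earlier in this README.

```python
import urllib.request


def build_request(base_url: str, path: str, token: str) -> urllib.request.Request:
    """Build a GET request that carries a bearer token (the request is not sent here)."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def fetch(request: urllib.request.Request, timeout: float = 10.0) -> bytes:
    # Sends the request; raises urllib.error.HTTPError (e.g. 401/403) if the token is rejected.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

For example, `fetch(build_request("http://localhost:8000", "/v1/get_latest_br_comments", token))` would query a locally running instance, where `token` is a value you supply.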
JournalismAI is a project of Polis, the journalism think-tank at the London School of Economics and Political Science, and it is sponsored by the Google News Initiative. If you want to know more about the Fellowship and the other JournalismAI activities, sign up for the newsletter or get in touch with the team via email.