Goal: export IPCC authors data from the MySQL database back to CSV, to perform QA checks in ipcc-facts-checking.
Differences between the structure of the source documents listing authors in IPCC assessment reports and the structure of the database where this data is stored make Quality Assurance analyses hard to perform.
This project fills this gap by formatting data from the database in chunks that match the contents and organization of the source documents. For each chapter, authors are sorted first by role, then by last name and first name, and are listed with full names and countries. In contrast, only the lists of authors in annexes mention institutions.
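The project performs this ordering in SQL. As a language-neutral illustration of the same ordering, here is a minimal Python sketch that groups author rows by chapter and sorts them by role, then last name, then first name. The row keys and the role ranking in ROLE_ORDER are hypothetical assumptions for illustration, not the project's actual schema or ranking.

```python
from itertools import groupby

# Hypothetical role ranking; the order used by the project's SQL may differ.
ROLE_ORDER = {
    "Coordinating Lead Author": 1,
    "Lead Author": 2,
    "Review Editor": 3,
    "Contributing Author": 4,
}


def sort_key(row):
    """Sort by chapter, then role rank (unknown roles last), last name, first name."""
    rank = ROLE_ORDER.get(row["role"], len(ROLE_ORDER) + 1)
    return (row["chapter"], rank, row["last_name"], row["first_name"])


def authors_by_chapter(rows):
    """Return {chapter: [(role, full name, country), ...]} in listing order.

    Each row is a dict with hypothetical keys: chapter, role,
    last_name, first_name, country.
    """
    ordered = sorted(rows, key=sort_key)
    return {
        chapter: [
            (r["role"], f'{r["first_name"]} {r["last_name"]}', r["country"])
            for r in group
        ]
        for chapter, group in groupby(ordered, key=lambda r: r["chapter"])
    }
```

Sorting on a single tuple key keeps the whole ordering in one place, much like a single ORDER BY clause with several columns.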
Run feedback.sh, providing the optional user, host and password to connect to the MySQL database (by default, as root@localhost without a password).
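The following is a minimal Python sketch of how optional connection settings with these defaults could be turned into a mysql client command line. The function and its parameter names are hypothetical. It does not reproduce how feedback.sh actually parses its arguments. The --batch, --user, --host and --password options are standard mysql client options. Including --batch is an assumption here, made because it prints tab-separated rows.

```python
def mysql_connection_args(user="root", host="localhost", password=""):
    """Build a mysql client command line using the documented defaults
    (root@localhost without a password). Parameter names are hypothetical."""
    # Assumed: batch mode, which prints one tab-separated line per row.
    args = ["mysql", "--batch", f"--user={user}", f"--host={host}"]
    if password:
        # The value must be attached with '='; a bare --password makes
        # the client prompt for it interactively.
        args.append(f"--password={password}")
    return args
```

With no arguments, the sketch connects as root@localhost and omits the password option entirely. This avoids an interactive prompt when no password is set.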
The CSV data is exported as separate files, one for each chapter and annex, into the ipcc-facts-checking folder of the git submodule within this repository.
The exported data can then be committed in the ipcc-facts-checking submodule and pushed to the remote repository on GitHub.
The selection and formatting of data is performed in SQL, with a distinct query for each exported dataset. Common parts are extracted into separate files containing queries with variables. These files are shared by multiple scripts, and each script sets its own values for the variables to customize the common behavior before running the shared file with the source command.
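To illustrate this pattern, here is a minimal Python sketch that composes a per-dataset MySQL client script: it sets user variables and then runs a shared query file with the client's source command. The shared file path and the variable names are hypothetical examples. The escaping assumes MySQL's default SQL mode, in which a backslash is an escape character inside string literals.

```python
def dataset_script(shared_file, variables):
    """Compose a MySQL client script that sets user variables, then runs
    a shared query file with `source`. Names here are hypothetical."""
    lines = []
    for name, value in variables.items():
        if isinstance(value, str):
            # Escape backslashes, then double single quotes, to form a SQL literal.
            escaped = value.replace("\\", "\\\\").replace("'", "''")
            literal = f"'{escaped}'"
        else:
            literal = str(value)
        lines.append(f"SET @{name} = {literal};")
    # `source` is a mysql client command, not SQL; no trailing semicolon needed.
    lines.append(f"source {shared_file}")
    return "\n".join(lines) + "\n"
```

For example, dataset_script("common/chapter-authors.sql", {"chapter": 3}) yields a two-line script that sets @chapter and then sources the shared file, which would read @chapter in its WHERE clause.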
The output of each SQL script is filtered by the shell script tsv2csv.sh, which converts it from TSV to CSV. The result is then saved as back.csv in the same folder as the reference PDF document used for the QA checks, within the ipcc-facts-checking subdirectory.
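The conversion is done by tsv2csv.sh. As a rough equivalent of that step, here is a Python sketch built on the standard csv module that reads tab-separated text and writes CSV, quoting only the fields that need it (for example, names containing commas or quotes). It assumes plain TSV with no quoting. mysql batch output escapes tabs, newlines and backslashes inside values with backslash sequences, and this sketch leaves those sequences unchanged.

```python
import csv
import io
import sys


def tsv_to_csv(tsv_text):
    """Convert tab-separated text to CSV with minimal quoting.

    Quote characters in the input are treated as ordinary text
    (QUOTE_NONE); the writer quotes fields containing commas,
    quotes or newlines.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


if __name__ == "__main__":
    # Usage as a filter: python3 tsv_to_csv.py < query-output.tsv > back.csv
    sys.stdout.write(tsv_to_csv(sys.stdin.read()))
```

This sketch ends lines with "\n" rather than the CRLF suggested by RFC 4180. That choice keeps diffs clean in git, but the line ending produced by tsv2csv.sh itself may differ.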