Merge pull request #16 from tibonto/13-versioning-the-dfg-fachsystematik
DRAFT: 13 versioning the dfg fachsystematik
André Castro authored Jul 4, 2024
2 parents 9ade52b + b609f21 commit 9cd607b
Showing 14 changed files with 2,265 additions and 685 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -0,0 +1,3 @@
venv
zz
*#
49 changes: 28 additions & 21 deletions README.md
@@ -9,52 +9,59 @@ We decided to build upon this work and build and RDF based ontology, for the *DF

![](./docs/dfgfo-hierarchies.png)

## Ontology


## Ontology
* **Ontology TTL**: [dfgfo.ttl](./dfgfo.ttl)
* **Ontology IRI**: https://github.com/tibonto/dfgfo/
* **Ontology IRI**: <https://github.com/tibonto/dfgfo/>
* **Ontology PURL**: <https://raw.githubusercontent.com/tibonto/DFG-Fachsystematik-Ontology/main/dfgfo.ttl>
* **ontology prefix/id**: `dfgfo`


## Create/update ontology

## Create/update ontology

**The [dfgfo.ttl](./dfgfo.ttl) ontology file is created by the [scripts/create_ontology.py](./scripts/create_ontology.py) Python script**, which
* parses the DFG classification system encoded [csv/Fachsystematik_2020-2024.csv](./csv/Fachsystematik_2020-2024.csv) (in EN/DE)

* parses the DFG classification system encoded in csv/Fachsystematik_20XX-20XX.csv (in EN/DE) (cf. directory [csv/](/csv/) and [csv/README.md](/csv/README.md))
* encodes each of the DFG's classification subjects (in .csv cells) into RDF graph triples
* of type `owl:Class`
* with `rdfs:label` in EN and `skos:altLabel` in DE
* subsumed under its parent subject with `rdfs:subClassOf`, according to the DFG Classification hierarchy
* of type `owl:Class`
* with `rdfs:label` in EN and `skos:altLabel` in DE
* subsumed under its parent subject with `rdfs:subClassOf`, according to the DFG Classification hierarchy
* parses the metadata triples from [metadata.ttl](./metadata.ttl) into a graph
* joins metadata and DFG classification graphs into [dfgfo.ttl](./dfgfo.ttl)
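
The encoding step above can be sketched as follows. This is a minimal illustration using only the Python standard library, and it does not reproduce what `scripts/create_ontology.py` actually does. The base IRI, the helper names `escape` and `subject_to_turtle`, and the sample subject number and labels are all hypothetical placeholders.

```python
# Hypothetical namespace: the IRI scheme used by the real ontology may differ.
BASE = "https://example.org/dfgfo/"


def escape(text):
    """Escape a string for use inside a double-quoted Turtle literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def subject_to_turtle(number, label_en, label_de, parent_number=None):
    """Return a Turtle snippet describing one subject as an owl:Class."""
    lines = [
        f"<{BASE}{number}> a owl:Class ;",
        f'    rdfs:label "{escape(label_en)}"@en ;',
        f'    skos:altLabel "{escape(label_de)}"@de',
    ]
    if parent_number is not None:
        # Link the subject to its parent in the classification hierarchy.
        lines[-1] += " ;"
        lines.append(f"    rdfs:subClassOf <{BASE}{parent_number}>")
    lines[-1] += " ."
    return "\n".join(lines)


# Placeholder values, not taken from the DFG classification.
snippet = subject_to_turtle(
    "000-01", "Example subject", "Beispielfach", parent_number="000"
)
print(snippet)
```

The snippet assumes the `owl:`, `rdfs:` and `skos:` prefixes are declared elsewhere in the Turtle file. How the parent of each subject is derived from the CSV columns is left out here.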


**Run**
### Run

Create a python3 Virtual Environment

Install requirements `pip install -r scripts/requirements.txt`

Run script to create ontology `python scripts/create_ontology.py`. Make sure to use end of line sequence `LF` for [/csv/Fachsystematik_2020-2024.csv](/csv/Fachsystematik_2020-2024.csv).
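
The `LF` requirement can be checked before running the script. The following is a minimal sketch using only the standard library. The helper names `has_crlf` and `convert_to_lf` are hypothetical and not part of this repo, and the demo runs on a temporary file rather than the real CSV.

```python
import os
import tempfile


def has_crlf(path):
    """Return True if the file contains any CRLF (Windows) line endings."""
    with open(path, "rb") as handle:
        return b"\r\n" in handle.read()


def convert_to_lf(path):
    """Rewrite the file in place, replacing every CRLF with LF."""
    with open(path, "rb") as handle:
        data = handle.read()
    with open(path, "wb") as handle:
        handle.write(data.replace(b"\r\n", b"\n"))


# Demo on a throwaway file with CRLF line endings.
fd, demo_path = tempfile.mkstemp(suffix=".csv")
os.close(fd)
with open(demo_path, "wb") as handle:
    handle.write(b"a,b\r\n1,2\r\n")

if has_crlf(demo_path):
    convert_to_lf(demo_path)
print(has_crlf(demo_path))
os.remove(demo_path)
```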


## Other scripts

* [scripts/parse_csv.py](./scripts/parse_csv.py) parses the CSV and ensures that the columns `Subject Number` and `Fachnummer` have the same values
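
A consistency check of this kind could look like the sketch below. It is an illustration only, not the code in `scripts/parse_csv.py`. The function name `mismatched_rows` and the sample rows are hypothetical.

```python
import csv
import io


def mismatched_rows(csv_text):
    """Return (row_number, subject_number, fachnummer) for each row where the
    `Subject Number` and `Fachnummer` columns disagree. Row numbers count data
    rows starting at 2, so that row 1 is the header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row_number, row["Subject Number"], row["Fachnummer"])
        for row_number, row in enumerate(reader, start=2)
        if row["Subject Number"].strip() != row["Fachnummer"].strip()
    ]


# Placeholder data, not taken from the DFG classification.
sample = (
    '"Subject Number","Subject","Fachnummer","Fach"\n'
    '"000-01","Example A","000-01","Beispiel A"\n'
    '"000-02","Example B","000-03","Beispiel B"\n'
)
for problem in mismatched_rows(sample):
    print(problem)
```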

## Ontology contributions:
Contributions are welcome.

At every push or pull_request a [ROBOT report](http://robot.obolibrary.org/report) and [ROBOT validate OWL DL profile](http://robot.obolibrary.org/validate-profile)test will be run from [.github/workflows/main.yml](.github/workflows/main.yml).
## Ontology contributions

Contributions are welcome.

On every push or pull request, a [ROBOT report](http://robot.obolibrary.org/report) and a [ROBOT validate OWL DL profile](http://robot.obolibrary.org/validate-profile) test are run from [.github/workflows/main.yml](.github/workflows/main.yml).

## DFG Classification of Scientific Disciplines

* [PDF(en)](https://www.dfg.de/download/pdf/dfg_im_profil/gremien/fachkollegien/amtsperiode_2020_2024/fachsystematik_2020-2024_en_grafik.pdf)
* [PDF(de)](https://www.dfg.de/download/pdf/dfg_im_profil/gremien/fachkollegien/amtsperiode_2020_2024/fachsystematik_2020-2024_de_grafik.pdf)
* [HTML page](https://www.dfg.de/en/dfg_profile/statutory_bodies/review_boards/subject_areas/index.jsp)
* [Edited CSV - combining both German and English labels](./csv/Fachsystematik_2020-2024.csv) (this repo)


* [HTML page](https://www.dfg.de/en/research-funding/proposal-funding-process/interdisciplinarity/subject-area-structure)
* PDFs
* 2020-2024
* [PDF(en)](https://www.dfg.de/download/pdf/dfg_im_profil/gremien/fachkollegien/amtsperiode_2020_2024/fachsystematik_2020-2024_en_grafik.pdf)
* [PDF(de)](https://www.dfg.de/download/pdf/dfg_im_profil/gremien/fachkollegien/amtsperiode_2020_2024/fachsystematik_2020-2024_de_grafik.pdf)
* 2024-2028
* [PDF(en)](https://www.dfg.de/resource/blob/331950/85717c3edb9ea8bd453d5110849865d3/fachsystematik-2024-2028-en-data.pdf)
* [PDF(de)](https://www.dfg.de/resource/blob/331944/33422f091e941592cdc355038a865e03/fachsystematik-2024-2028-de-data.pdf)
* Edited CSV - combining both German and English labels
* [2020-2024](/csv/2020-2024/Fachsystematik_2020-2024.csv) (this repo)
* [2024-2028](/csv/2024-2028/Fachsystematik_2024-2028.csv) (this repo)

## Releases:

For previous versions (2020-2024), see <https://github.com/tibonto/DFG-Fachsystematik-Ontology/releases>
File renamed without changes.
File renamed without changes.
36 changes: 36 additions & 0 deletions csv/2024-2028/CVS_Creation_Process.md
@@ -0,0 +1,36 @@
# CSV creation from DFG's XLSXs

## For each of the Excel (.xlsx) files

1. remove rows 1 and 2 (title, empty)
2. remove empty columns A and G
3. save both DE and EN sheets into a CSV (to allow the following operations)
   1. CSV export: check "Quote all text cells" to avoid issues with commas within the cells
   2. **from this point onward we shall only work on the CSVs and not the .xlsx**

## in CSVs (easier to edit and see errors)

1. add headers for columns A and B: `Subject Number` and `Subject` in EN, `Fachnummer` and `Fach` in DE
2. add to header (row 1) "Subject Area" and "Scientific Discipline" in columns D, E
3. remove header rows (except row 1): 57, 137, 169
4. remove empty rows (search in column A)
5. fill in the missing values (in the Review Board, Subject Area, Scientific Discipline columns). This is tedious but important, as we cannot rely on merged cells in the CSV, and these values are at the core of the tree structure (@SArndt-TIB let me know if this needs clarification)
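
The fill-in step could also be scripted instead of done by hand. The sketch below is an assumption about how that might work, not part of the documented process. It assumes each formerly merged cell keeps its value in the first row of the merged range and is empty in the rows below it. The function name `forward_fill` and the sample values are hypothetical.

```python
FILL_COLUMNS = ["Review Board", "Subject Area", "Scientific Discipline"]


def forward_fill(rows, columns=FILL_COLUMNS):
    """Copy the last non-empty value of each column down into empty cells.

    `rows` is a list of dicts (for example from csv.DictReader). The input
    is left untouched and a new list of dicts is returned.
    """
    last_seen = {}
    filled = []
    for row in rows:
        row = dict(row)
        for column in columns:
            value = (row.get(column) or "").strip()
            if value:
                last_seen[column] = value
            else:
                row[column] = last_seen.get(column, "")
        filled.append(row)
    return filled


# Placeholder data, not taken from the DFG classification.
rows = [
    {"Subject Number": "000-01", "Review Board": "RB 1",
     "Subject Area": "SA 1", "Scientific Discipline": "SD 1"},
    {"Subject Number": "000-02", "Review Board": "",
     "Subject Area": "", "Scientific Discipline": ""},
    {"Subject Number": "000-03", "Review Board": "RB 2",
     "Subject Area": "", "Scientific Discipline": ""},
]
for row in forward_fill(rows):
    print(row)
```

A manual review of the result is still advisable, since a genuinely empty cell would be filled with the value from the row above.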

## Join both CSVs

* just a copy-paste
* ensure that EN comes before the DE terms
* headers should be in the following sequence:
```
Subject Number
Subject
Review Board
Subject Area
Scientific Discipline
Fachnummer
Fach
Fachkollegium
Fachgebiet
Wissenschaftsbereich
```
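
As an alternative to copy-pasting, the join could be scripted. The sketch below assumes both CSVs list the same subjects in the same order, one row per subject, and places the EN columns before the DE columns in the header sequence given above. The function names `join_rows` and `to_csv` and the sample values are hypothetical.

```python
import csv
import io

EN_HEADERS = ["Subject Number", "Subject", "Review Board",
              "Subject Area", "Scientific Discipline"]
DE_HEADERS = ["Fachnummer", "Fach", "Fachkollegium",
              "Fachgebiet", "Wissenschaftsbereich"]


def join_rows(en_rows, de_rows):
    """Combine EN and DE rows side by side, EN columns first."""
    if len(en_rows) != len(de_rows):
        raise ValueError("EN and DE files must have the same number of rows")
    joined = []
    for en, de in zip(en_rows, de_rows):
        row = {header: en[header] for header in EN_HEADERS}
        row.update({header: de[header] for header in DE_HEADERS})
        joined.append(row)
    return joined


def to_csv(rows):
    """Serialize joined rows with every cell quoted and LF line endings."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=EN_HEADERS + DE_HEADERS,
        quoting=csv.QUOTE_ALL,
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder data, not taken from the DFG classification.
en_rows = [dict(zip(EN_HEADERS, ["000-01", "Example", "RB", "SA", "SD"]))]
de_rows = [dict(zip(DE_HEADERS, ["000-01", "Beispiel", "FK", "FG", "WB"]))]
print(to_csv(join_rows(en_rows, de_rows)))
```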
