Various DCAT tools for harvesting metadata from Belgian open data portals, converting metadata to DCAT-AP files and updating the Belgian data.gov.be portal.
The portal itself is a Drupal 9 website, based on Fedict's Openfed distribution.
Only interested in the result? The N-Triples and XML files (DCAT-AP) used to update data.gov.be can be found in the dcat repository.
These tools can be used with a Java runtime 17 or newer, on a headless machine, i.e. there is no fancy GUI.
An internet connection is required, although a proxy can be used.
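
As an illustration of what going through a proxy looks like on a plain Java 17 runtime, here is a minimal sketch using only the JDK's `java.net.http` classes. It is not taken from these tools: the class name `ProxySketch`, its helper methods and the proxy host and port are hypothetical, and the tools' own proxy settings may be configured differently.

```java
import java.net.InetSocketAddress;
import java.net.ProxySelector;
import java.net.http.HttpClient;

class ProxySketch {
    // Hypothetical helper: a selector that sends every HTTP(S) request
    // through a single proxy. The address is left unresolved, so no DNS
    // lookup happens until a request is actually made.
    static ProxySelector selectorFor(String host, int port) {
        return ProxySelector.of(InetSocketAddress.createUnresolved(host, port));
    }

    // Hypothetical helper: an HTTP client that uses the selector above.
    static HttpClient clientVia(String host, int port) {
        return HttpClient.newBuilder()
                .proxy(selectorFor(host, port))
                .build();
    }
}
```

A call such as `ProxySketch.clientVia("proxy.example.org", 8080)` (placeholder values) would then return a client whose requests are routed through that proxy.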
- Helper classes: for storing scraped pages locally, conversion tools etc.
- Various scrapers: getting metadata from various repositories and websites, and turning the metadata into DCAT files
- The scrapers also include a series of SPARQL scripts that turn DCAT into DCAT-AP, e.g. by mapping site-specific themes, adding missing properties and preparing the files for updating data.gov.be
- Data.gov.be updater: update the data.gov.be (currently Drupal 7) website using the enhanced DCAT files
- Some tools: link checker, EDP converter tool
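
To make the idea of mapping site-specific themes more concrete, here is a small sketch in Java. The project itself does this with SPARQL scripts, so this is only an illustration of the concept, not the actual implementation. The class `ThemeMappingSketch` and the local labels ("environment", "mobility", "health") are hypothetical; the target URIs follow the EU data-theme authority list used by DCAT-AP.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

class ThemeMappingSketch {
    // Base URI of the EU data-theme vocabulary used by DCAT-AP.
    static final String EU_THEME =
            "http://publications.europa.eu/resource/authority/data-theme/";

    // Hypothetical site-specific labels; a real portal has its own list.
    static final Map<String, String> LOCAL_TO_EU = Map.of(
            "environment", EU_THEME + "ENVI",
            "mobility", EU_THEME + "TRAN",
            "health", EU_THEME + "HEAL");

    // Looks up a local theme label, ignoring case and surrounding whitespace.
    // Returns an empty Optional when no mapping is known.
    static Optional<String> toDcatApTheme(String localTheme) {
        String key = localTheme.strip().toLowerCase(Locale.ROOT);
        return Optional.ofNullable(LOCAL_TO_EU.get(key));
    }
}
```

Returning an empty `Optional` for unknown labels keeps unmapped themes visible, so they can be reported rather than silently dropped.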
There is also a separate, stand-alone RDF validator project which can be used to validate DCAT metadata, regardless of whether the metadata is to be published on data.gov.be or not.
- The various portals (except `all`) should be harvested using the scrapers.
- The enhanced files can be uploaded to the data.gov.be portal using the updater
- Then use the `all` enhancer to merge all the files from the various portals into one file, `datagovbe.nt`
- Convert the merged file using the EDP tool to an XML file called `datagovbe_edp.xml`
- Upload both `datagovbe.nt` and `datagovbe_edp.xml` to GitHub
- This will be used as input for the European Data Portal (scheduled Thursday morning, every week)
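
Because N-Triples is a line-based format (one triple per line), the merge step can be pictured as concatenating the per-portal files and dropping duplicate lines. The sketch below shows that idea with the JDK only. It is an assumption about what merging could look like, not the code of the actual enhancer: the class `NTriplesMergeSketch` and its `merge` method are hypothetical, and no RDF parsing or validation is done. Blank node labels that happen to clash across files would also need extra care.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class NTriplesMergeSketch {
    // Merges several N-Triples files into one, skipping blank lines,
    // comment lines and exact duplicate triples. Returns the number of
    // triples written. Input order is preserved.
    static int merge(List<Path> inputs, Path output) throws IOException {
        Set<String> triples = new LinkedHashSet<>();
        for (Path input : inputs) {
            for (String line : Files.readAllLines(input, StandardCharsets.UTF_8)) {
                String trimmed = line.strip();
                if (!trimmed.isEmpty() && !trimmed.startsWith("#")) {
                    triples.add(trimmed);
                }
            }
        }
        Files.write(output, triples, StandardCharsets.UTF_8);
        return triples.size();
    }
}
```

For example, `NTriplesMergeSketch.merge(List.of(Path.of("a.nt"), Path.of("b.nt")), Path.of("datagovbe.nt"))` would write the combined triples of two hypothetical input files to `datagovbe.nt`.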
See also the Notes