Update metadata-catalogue.md
pvgenuchten authored Feb 15, 2024
1 parent b2560f8 commit 50077d7
Showing 1 changed file with 36 additions and 13 deletions.
49 changes: 36 additions & 13 deletions tech/docs/dmc/metadata-catalogue.md
@@ -1,15 +1,38 @@
# Catalogue Server

- store metadata
- process queries
- findability (fulltext, spatial and temporal search)
- searchable items - definitions DCAT,...
- link to metadata, external
- semantic consistency
- validation capabilities
- relationship with OGC APIs, OGC CSW, GeoDCAT

- connections with:harvester, scheme, presentation, processing, storage & structure
- technologies used: Virtuoso or Fuseki or Pycsw, GeoNetwork
- responsible person:
- participating: Tomas Reznik; Luís de Sousa
- query metadata
- M: filter by (configurable set of) properties (AND/OR/NOT, FullTextSearch, by geography)
- M: Sorting and pagination
- S: aggregate results (faceted search, dashboarding)
- W: customise ranking of the results
- OGC:CSW, OGCAPI:Records, OAI-PMH
- Search engine discoverability / Schema.org
- Link to data download / data preview
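
The query requirements above (filtering, sorting, pagination) map onto request parameters of an OGC API Records endpoint. The following is a minimal sketch of building such a request URL, not a definitive client. The base URL and the collection id are hypothetical, and the `offset` paging parameter is an assumption that should be checked against the catalogue actually deployed.

```python
from urllib.parse import urlencode


def build_records_query(base_url, collection, text=None, bbox=None,
                        sortby=None, limit=10, offset=0):
    """Build an OGC API Records items URL with filter, sort and paging."""
    params = {}
    if text:
        params["q"] = text  # full-text search term
    if bbox:
        # Spatial filter, ordered: min lon, min lat, max lon, max lat
        params["bbox"] = ",".join(str(v) for v in bbox)
    if sortby:
        params["sortby"] = sortby  # property to sort on
    params["limit"] = limit  # page size
    params["offset"] = offset  # assumption: offset-based paging
    query = urlencode(params, safe=",")  # keep bbox commas readable
    return f"{base_url.rstrip('/')}/collections/{collection}/items?{query}"


# Hypothetical endpoint and collection id, for illustration only
url = build_records_query(
    "https://example.org/catalogue",
    "metadata:main",
    text="soil",
    bbox=(3.3, 50.7, 7.2, 53.6),
    sortby="title",
    limit=20,
)
print(url)
```

Boolean combinations (AND/OR/NOT) are not covered by these simple parameters and would need a filter language such as CQL2, where the server supports it.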

## Relationship
- Storage
- Metadata harvesting
- Metadata processing
- Link checking
- Metadata content authoring
- Metadata consistency
- Git participatory content moderation
- Metadata validation
- Data quality validation
- Metadata transformations

## Technology

To date, two technologies are relevant for the catalogue server:
- [pycsw](https://pycsw.org) is a Python implementation of OGC API Records (and CSW, OAI-PMH,...) with a tailored HTML output used in the [ejpsoil project](https://catalogue.ejpsoil.eu). The implementation at EJPSoil has a GitHub backend using the [mcf format](https://geopython.github.io/pygeometa/reference/mcf/) (a subset of ISO 19115 in YAML encoding) to facilitate participatory content creation. Harvesting is managed via CI-CD pipelines, using the [geodatacrawler](https://pypi.org/project/geodatacrawler/) tool. Content queries and faceted search are managed by a PostgreSQL database. Ranking is not available.
- [GeoNetwork](https://geonetwork-opensource.org) is a catalogue implementation in Java. The backend is a PostgreSQL database; queries are managed by an Elasticsearch index. It supports ranking and faceting. GeoNetwork contains harvesters which run at intervals, metadata transformations and metadata authoring workflows. JRC INSPIRE has built a number of extensions to GeoNetwork to facilitate the INSPIRE GeoPortal, such as bulk CSW harvesting, metadata validation and link checking.
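
Faceting, which both tools offer, amounts to counting how many matching records carry each value of a property. The sketch below illustrates the idea in memory with made-up records. It does not reflect how PostgreSQL or Elasticsearch compute facets internally, and the field names are hypothetical.

```python
from collections import Counter


def facet_counts(records, fields):
    """Count occurrences of each value per field across the records."""
    counts = {field: Counter() for field in fields}
    for record in records:
        for field in fields:
            value = record.get(field)
            # Treat single values and lists of values uniformly
            values = value if isinstance(value, list) else [value]
            for v in values:
                if v is not None:
                    counts[field][v] += 1
    return counts


# Made-up records, for illustration only
records = [
    {"title": "Soil map", "type": "dataset", "keywords": ["soil", "map"]},
    {"title": "Soil API", "type": "service", "keywords": ["soil"]},
    {"title": "Land cover", "type": "dataset", "keywords": ["land cover"]},
]

facets = facet_counts(records, ["type", "keywords"])
for field, counter in facets.items():
    print(field, counter.most_common())
```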

## Considerations

- Both technologies are oriented to the geospatial governmental domain and have limited options to interact with academic repositories (Zenodo, Dataverse, OpenAIRE, DataCite), open data catalogues (CKAN, European Data Portal), semantic web portals (DCAT/schema.org), Earth observation catalogues (STAC and EO OpenSearch) and biodiversity portals (GBIF EML).
- GeoNetwork is a convenient one-stop solution, but presents some challenges for participatory content creation and maintainability.

## People

- responsible person: Paul van Genuchten
- participating: Tomas Reznik; Luís de Sousa
