
GCM Configuration

Anna Bernasconi edited this page Jul 30, 2018 · 7 revisions

The GCM configurations include specific files used to configure:

  • The connection parameters for the database
  • The mappings used in the Mapper step
  • Preferences to run the Mapper (which builds the GCM global schema)
  • Preferences to run the Enricher (which builds the local knowledge base, LKB, and builds references from the GCM to the LKB)
  • Preferences to run the Flattener

Database

To connect to the database, the URL, username, password, and driver need to be specified. For example:

database {
    url = "jdbc:postgresql://localhost/gecotest"
    username = "geco_user"
    password = "***"
    driver = "org.postgresql.Driver"
}
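The url follows the usual JDBC form for PostgreSQL, jdbc:postgresql://host[:port]/database. As a minimal sketch (the helper parse_jdbc_url and the DbTarget type are hypothetical, not part of GeCo), the following Python splits such a URL into its parts, which can be handy to sanity-check the setting before a run. It assumes the PostgreSQL default port 5432 when the URL gives none.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

DEFAULT_PG_PORT = 5432  # PostgreSQL default, used when the URL omits the port


@dataclass
class DbTarget:
    """Hypothetical container for the parts of a JDBC PostgreSQL URL."""
    host: str
    port: int
    database: str


def parse_jdbc_url(url: str) -> DbTarget:
    """Split jdbc:postgresql://host[:port]/database into its parts."""
    prefix = "jdbc:"
    if not url.startswith(prefix):
        raise ValueError(f"not a JDBC URL: {url!r}")
    # Dropping "jdbc:" leaves an ordinary URL that urlparse understands.
    parts = urlparse(url[len(prefix):])
    if parts.scheme != "postgresql":
        raise ValueError(f"unexpected driver scheme: {parts.scheme!r}")
    return DbTarget(
        host=parts.hostname or "localhost",
        port=parts.port or DEFAULT_PG_PORT,
        database=parts.path.lstrip("/"),
    )


if __name__ == "__main__":
    print(parse_jdbc_url("jdbc:postgresql://localhost/gecotest"))
```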

Mappings

To create mappings between attributes of the source schema (which we call "cleaned keys", since they are produced during the Cleaning phase) and attributes of the global schema (i.e., the GCM) we use an XML file with the following structure.

(Figure: xml_schema, the structure of the mappings XML file)

The first level contains a number of table elements, one for each entity of the GCM, each identified by its "name" attribute. Each table element contains an arbitrary number of mapping elements, and each mapping element describes a single mapping between a row in the source file (SourceKey) and an attribute of the corresponding GCM table (GlobalKey).
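Since the schema figure does not survive in this text, here is a hedged Python sketch of reading such a file with the standard library's xml.etree.ElementTree. Only the table and mapping elements and the "name" and "method" attributes come from the description above; the root element schema, the child elements source_key and global_key, the sample key names, and the choice to keep method parameters as XML attributes of mapping are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Illustrative file: <schema>, <source_key>, <global_key> and the key names
# are assumptions; <table name=...> and <mapping method=...> follow the text.
SAMPLE = """
<schema>
  <table name="donor">
    <mapping>
      <source_key>cleaned_gender</source_key>
      <global_key>gender</global_key>
    </mapping>
  </table>
  <table name="item">
    <mapping method="CONCAT" CONCAT_CHARACTER=" ">
      <source_key>cleaned_assay</source_key>
      <source_key>cleaned_platform</source_key>
      <global_key>technique</global_key>
    </mapping>
  </table>
</schema>
"""


@dataclass
class Mapping:
    table: str                # GCM entity, from <table name="...">
    source_keys: List[str]    # one or more cleaned keys
    global_key: str           # GCM attribute to fill
    method: Optional[str]     # None means a direct mapping
    params: Dict[str, str] = field(default_factory=dict)  # e.g. CONCAT_CHARACTER


def load_mappings(xml_text: str) -> List[Mapping]:
    """Collect every mapping element, remembering the table it belongs to."""
    root = ET.fromstring(xml_text.strip())
    mappings = []
    for table in root.findall("table"):
        for m in table.findall("mapping"):
            # Every attribute except "method" is treated as a method parameter.
            params = {k: v for k, v in m.attrib.items() if k != "method"}
            mappings.append(Mapping(
                table=table.get("name"),
                source_keys=[(e.text or "").strip() for e in m.findall("source_key")],
                global_key=(m.findtext("global_key") or "").strip(),
                method=m.get("method"),
                params=params,
            ))
    return mappings
```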

A mapping can be of different kinds, as specified through the attribute "method". In most cases no method is specified, which corresponds to a direct mapping: the value of the source key becomes the value of the GCM attribute. In other cases the user may want to specify a value manually (method = "MANUALLY"). Other available methods are CONCAT (to concatenate the values of multiple source keys), CHECKPREC (to define a second-choice source key, used when the first one has a null value), REMOVE (to remove a string from the value of the source key), SUB (to replace an existing string with a new one), and UPPERCASE/LOWERCASE (to convert the value to upper or lower case).

These methods can be composed and applied together on the same mapping by joining their names with a dash character. For example, to concatenate words that have first been converted to uppercase, write method = "CONCAT-UPPERCASE".

Depending on the chosen method, other attributes may be required to further define its behavior. These include:

  • CONCAT_CHARACTER: defines the concatenation string
  • SUB_CHARACTER: defines the string to be substituted
  • NEW_CHARACTER: defines the string which will replace the string to be substituted
  • REM_CHARACTER: defines the string to be removed
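The behavior of these methods can be illustrated with a minimal Python sketch. It assumes that each method transforms the list of values read from a mapping's source keys, that the parameter attributes above arrive as a dictionary, and, following the CONCAT-UPPERCASE example, that composed methods are applied right to left (UPPERCASE first). The function apply_method and its signature are hypothetical, and MANUALLY is left out because it supplies a fixed value instead of transforming one.

```python
from typing import Callable, Dict, List, Optional

Values = List[Optional[str]]
Params = Dict[str, str]


def _upper(values: Values, params: Params) -> Values:
    return [v.upper() if v is not None else None for v in values]


def _lower(values: Values, params: Params) -> Values:
    return [v.lower() if v is not None else None for v in values]


def _remove(values: Values, params: Params) -> Values:
    rem = params["REM_CHARACTER"]
    return [v.replace(rem, "") if v is not None else None for v in values]


def _sub(values: Values, params: Params) -> Values:
    old, new = params["SUB_CHARACTER"], params["NEW_CHARACTER"]
    return [v.replace(old, new) if v is not None else None for v in values]


def _concat(values: Values, params: Params) -> Values:
    sep = params.get("CONCAT_CHARACTER", "")  # assumed default: no separator
    return [sep.join(v for v in values if v is not None)]


def _checkprec(values: Values, params: Params) -> Values:
    # Keep the first non-null value, i.e. fall back to the next source key.
    return [next((v for v in values if v is not None), None)]


STEPS: Dict[str, Callable[[Values, Params], Values]] = {
    "UPPERCASE": _upper,
    "LOWERCASE": _lower,
    "REMOVE": _remove,
    "SUB": _sub,
    "CONCAT": _concat,
    "CHECKPREC": _checkprec,
}


def apply_method(values: Values, method: Optional[str] = None,
                 params: Optional[Params] = None, separator: str = "-") -> Values:
    """Hypothetical evaluator for a (possibly composed) mapping method."""
    if not method:
        return list(values)  # direct mapping: values pass through unchanged
    params = params or {}
    # "CONCAT-UPPERCASE" -> UPPERCASE first, then CONCAT (right to left).
    for name in reversed(method.split(separator)):
        if name not in STEPS:
            raise ValueError(f"unsupported method: {name}")
        values = STEPS[name](values, params)
    return values
```

For instance, apply_method(["homo", "sapiens"], "CONCAT-UPPERCASE", {"CONCAT_CHARACTER": " "}) yields ["HOMO SAPIENS"]. For these two methods the order does not change the result; it matters for combinations such as SUB and UPPERCASE.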

Preferences for Mapper

The Mapper runs with specific settings, whose file location should be specified as a parameter in the Configuration. In these settings the user can, for example, set the following:

import_pairs = true
multiple_value_concatenation = ", "
method_character_separation = "-"
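As an illustration only, the sketch below parses flat key = value lines like these into a Python dictionary and uses two of the settings. Reading multiple_value_concatenation as the string that joins several values collected for the same key, and method_character_separation as the separator used in composed methods such as CONCAT-UPPERCASE, are interpretations suggested by the names, not documented behavior; import_pairs is parsed but not interpreted. The parser (parse_settings, hypothetical) handles only this flat subset of the configuration syntax.

```python
import json
from typing import Dict, List, Union

Value = Union[bool, str]


def parse_settings(text: str) -> Dict[str, Value]:
    """Parse flat `key = value` lines; a small subset of the config syntax."""
    settings: Dict[str, Value] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, raw = line.partition("=")
        key, raw = key.strip(), raw.strip()
        if raw in ("true", "false"):
            settings[key] = raw == "true"
        elif raw.startswith('"'):
            settings[key] = json.loads(raw)  # quoted string, JSON-style escapes
        else:
            settings[key] = raw
    return settings


def join_values(values: List[str], settings: Dict[str, Value]) -> str:
    """Join several values of one key (assumed meaning of the setting)."""
    return str(settings["multiple_value_concatenation"]).join(values)
```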

Preferences for Enricher

The Enricher runs with specific settings, whose file location should be specified as a parameter in the Configuration. In these settings the user can, for example, set the following:

  • ontological depth for ancestors and descendants
  • preferred ontologies for each enrichable GCM attribute
  • match score threshold
  • API keys for calls to external services
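The match score threshold, the preferred ontologies, and the ontological depth could interact as in the following hedged Python sketch. The Candidate type, the selection policy (first the order of preferred ontologies, then the highest score), and the parent map used for the depth-limited ancestor search are all assumptions for illustration; descendants can be collected the same way by passing a child map instead of a parent map.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Candidate:
    ontology: str   # ontology the candidate term comes from
    term_id: str    # hypothetical term identifier
    score: float    # match score returned by the annotator


def best_match(candidates: List[Candidate], preferred: List[str],
               threshold: float) -> Optional[Candidate]:
    """Keep scores >= threshold from preferred ontologies; earlier ontology wins."""
    rank = {name: i for i, name in enumerate(preferred)}
    eligible = [c for c in candidates
                if c.score >= threshold and c.ontology in rank]
    if not eligible:
        return None
    # Lower rank first; among equal ranks, higher score first.
    return min(eligible, key=lambda c: (rank[c.ontology], -c.score))


def ancestors(term: str, parents: Dict[str, List[str]], depth: int) -> Set[str]:
    """Breadth-first collection of ancestors at most `depth` levels up."""
    found: Set[str] = set()
    frontier = deque([(term, 0)])
    while frontier:
        node, level = frontier.popleft()
        if level == depth:
            continue  # do not go further up than the configured depth
        for parent in parents.get(node, []):
            if parent not in found:
                found.add(parent)
                frontier.append((parent, level + 1))
    return found
```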

Preferences for Flattener

The Flattener runs with specific settings, whose file location should be specified as a parameter in the Configuration. In these settings the user can, for example, set the following:

separation = "__"
prefix = "integrated"
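Assuming, without confirmation from this page, that the Flattener turns GCM attribute values into flat keys by joining the prefix, the table name, and the attribute name with the separation string, a minimal Python sketch would be the following; the function flatten and the sample table and attribute names are hypothetical.

```python
from typing import Dict


def flatten(record: Dict[str, Dict[str, str]],
            prefix: str = "integrated", separation: str = "__") -> Dict[str, str]:
    """Hypothetical flattening: {table: {attribute: value}} -> {flat_key: value}."""
    flat: Dict[str, str] = {}
    for table, attributes in record.items():
        for attribute, value in attributes.items():
            # prefix + table + attribute, e.g. integrated__donor__gender
            flat[separation.join([prefix, table, attribute])] = value
    return flat
```

Under this assumption, with the settings above a donor attribute gender would become the key integrated__donor__gender.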