GCM Configuration
The GCM configurations include specific files used to configure:
- The connection parameters for the database
- The mappings used in the Mapper step
- Preferences to run the Mapper (which builds the GCM global schema)
- Preferences to run the Enricher (which builds the local knowledge base, LKB, and creates references from the GCM to the LKB)
To connect to the database, the url, username, password, and driver need to be specified.
As an example:
database {
  url = "jdbc:postgresql://localhost/gecotest"
  username = "geco_user"
  password = "***"
  driver = "org.postgresql.Driver"
}
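As a rough illustration of the four required settings, the following Python sketch reads a database block like the one above and reports any missing key. It is only a minimal, regex-based reader written for this example; it is not the project's configuration loader and does not handle the full configuration syntax. The names `CONFIG_TEXT`, `REQUIRED`, and `read_database_block` are hypothetical.

```python
import re

# Sample text copied from the example above (assumed layout).
CONFIG_TEXT = '''
database {
  url = "jdbc:postgresql://localhost/gecotest"
  username = "geco_user"
  password = "***"
  driver = "org.postgresql.Driver"
}
'''

# The four settings the documentation says must be specified.
REQUIRED = ("url", "username", "password", "driver")


def read_database_block(text):
    """Return the key/value pairs of the database block, or raise ValueError."""
    block = re.search(r"database\s*\{(.*?)\}", text, re.DOTALL)
    if block is None:
        raise ValueError("no database block found")
    # Collect every  key = "value"  pair inside the block.
    settings = dict(re.findall(r'(\w+)\s*=\s*"([^"]*)"', block.group(1)))
    missing = [key for key in REQUIRED if key not in settings]
    if missing:
        raise ValueError("missing database settings: " + ", ".join(missing))
    return settings


db = read_database_block(CONFIG_TEXT)
print(db["url"], db["driver"])
```

A check of this kind fails early, before any connection attempt, if one of the four settings has been left out.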
To create mappings between attributes of the source schema (which we call "cleaned keys", since they are produced during the Cleaning phase) and attributes of the global schema (i.e., the GCM), we use an XML file with the following structure.
The first level contains a number of table elements, one for each entity in the GCM; each is identified by its attribute "name".
Each table element contains an arbitrary number of mapping elements. Each mapping element describes a single mapping between a row in the source file (SourceKey) and an attribute of a GCM table (GlobalKey).
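The two-level structure described above can be sketched with a small, made-up mapping file read through Python's standard `xml.etree.ElementTree` module. Only the `table` and `mapping` elements and the "name" and "method" attributes come from the description; the root element name, the child elements `source_key` and `global_key`, and all table and key names are assumptions made for this illustration and may differ from the real files.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping file: root, source_key and global_key are assumed names.
MAPPINGS_XML = """
<root>
  <table name="donors">
    <mapping>
      <source_key>donor__age</source_key>
      <global_key>age</global_key>
    </mapping>
  </table>
  <table name="datasets">
    <mapping method="CONCAT-UPPERCASE" CONCAT_CHARACTER="_">
      <source_key>assay</source_key>
      <source_key>platform</source_key>
      <global_key>name</global_key>
    </mapping>
  </table>
</root>
"""


def read_mappings(xml_text):
    """Return {table name: [(source keys, global key, method), ...]}."""
    root = ET.fromstring(xml_text)
    result = {}
    for table in root.findall("table"):
        rows = []
        for mapping in table.findall("mapping"):
            sources = [elem.text for elem in mapping.findall("source_key")]
            target = mapping.findtext("global_key")
            # No "method" attribute (None here) stands for a direct mapping.
            method = mapping.get("method")
            rows.append((sources, target, method))
        result[table.get("name")] = rows
    return result


mappings = read_mappings(MAPPINGS_XML)
for table_name, rows in mappings.items():
    for sources, target, method in rows:
        print(table_name, sources, "->", target, "method:", method or "direct")
```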
A mapping can be of different kinds, as specified through the attribute "method". In the majority of cases the method is not specified, which corresponds to a direct mapping: the value of the source key is copied to the GCM attribute. Alternatively, the user may specify a manual value (method = "MANUALLY"). Other "method" functions are CONCAT (to concatenate the values of multiple source keys), CHECKPREC (to define a second-choice source key, used when the first has a null value), REMOVE (to remove a string from the value of the source key), SUB (to replace one string with another), and UPPERCASE/LOWERCASE.
These methods can be composed and used together on the same mapping by joining them with a dash character. For example, to concatenate words that have first been put in uppercase, we should write: method = "CONCAT-UPPERCASE".
Depending on the "method" choice, other attributes may be required to further define the behavior of the function. These include:
- CONCAT_CHARACTER: defines the concatenation string
- SUB_CHARACTER: defines the string to be substituted
- NEW_CHARACTER: defines the string which will replace the string to be substituted
- REM_CHARACTER: defines the string to be removed
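To make the behavior of the method functions and their extra attributes more concrete, here is a minimal Python sketch of how a composed method could be evaluated. It is an interpretation of the description above, not the Mapper's actual code. In particular, it assumes that per-value transformations (UPPERCASE, LOWERCASE, REMOVE, SUB) are applied before CONCAT, and that the manual value is passed in through a hypothetical `manual_value` parameter. The function name `apply_mapping` and all parameter names are likewise hypothetical.

```python
def apply_mapping(values, method=None, manual_value=None,
                  concat_character="", sub_character="", new_character="",
                  rem_character="", separator="-"):
    """Apply a (possibly composed) method to the values of the source keys.

    values: source values in source-key order; None marks a missing value.
    """
    steps = method.split(separator) if method else []
    if "MANUALLY" in steps:
        # Assumption: the manual value is supplied by the caller.
        return manual_value
    present = [v for v in values if v is not None]
    # Per-value transformations are applied first (assumed order).
    for step in steps:
        if step == "UPPERCASE":
            present = [v.upper() for v in present]
        elif step == "LOWERCASE":
            present = [v.lower() for v in present]
        elif step == "REMOVE":
            present = [v.replace(rem_character, "") for v in present]
        elif step == "SUB":
            present = [v.replace(sub_character, new_character) for v in present]
    if not present:
        return None
    if "CONCAT" in steps:
        return concat_character.join(present)
    # Direct mapping and CHECKPREC both yield the first non-null value.
    return present[0]


print(apply_mapping(["chip", "seq"], "CONCAT-UPPERCASE", concat_character="-"))
print(apply_mapping([None, "hg19"], "CHECKPREC"))
print(apply_mapping(["GRCh38.p12"], "REMOVE", rem_character=".p12"))
```

Under these assumptions, CHECKPREC needs no special branch: filtering out null values and taking the first remaining one already gives the second-choice key when the first is missing.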
The Mapper is run with its own settings file, whose location should be specified as a parameter in Configuration. In this file the user can set, for example, the following:
import_pairs = true
multiple_value_concatenation = ", "
method_character_separation = "-"
The Enricher is run with its own settings file, whose location should be specified as a parameter in Configuration. In this file the user can set, for example, the following:
- ontological depth for ancestors and descendants
- preferred ontologies for each enrichable GCM attribute
- match score threshold
- specific API keys for calls to external services
The Flattener is run with its own settings file, whose location should be specified as a parameter in Configuration. In this file the user can set, for example, the following:
separation = "__"
prefix = "integrated"