The default ckan_harvester runs into trouble if the harvesting CKAN instance uses a custom ckanext-scheming schema. The incompatibility lies in the handling of extra fields: scheming uses extra fields to store its custom fields, while the ckan_harvester creates or overrides extra fields as found on the harvested instance. Scheming's package_read templates, however, fail if any extra fields not defined in the schema are present.
Assume an unknown schema A (the CKAN default, a hard-coded schema like data.gov.au's, or a ckanext-scheming schema) is harvested into a custom ckanext-scheming schema B using the ckan_harvester. The two sets of fields overlap, which gives three cases:
fields in A and in B
fields in A not in B
fields in B not in A
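The three cases are plain set operations on field names. A minimal sketch, assuming the field names of A and B are already available as lists (the function name classify_fields and the example field names are hypothetical):

```python
def classify_fields(source_fields, target_fields):
    """Split field names into the three cases: in both, only in A, only in B."""
    a, b = set(source_fields), set(target_fields)
    return {
        "in_both": a & b,    # present in A and B: can be transferred directly
        "only_in_a": a - b,  # A sends it, but schema B has no slot for it
        "only_in_b": b - a,  # schema B defines it, but A never sends it
    }
```

For example, classify_fields(["title", "notes", "spatial"], ["title", "notes", "contact_email"]) puts "spatial" in only_in_a and "contact_email" in only_in_b.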
Would it make sense to modify the ckan_harvester's behaviour around extra fields as follows:
parse ckan config
if no ckanext.scheming custom schemas are set, continue as normal, otherwise:
if ckanext.scheming custom schemas are present, use (first?) schema as "B"
iterate over present keys in schema B
fields identical in A and B: direct transfer
fields in A not in B: suggestion: append to B.notes?
fields in B not in A: if optional, leave blank. If mandatory, suggestion: set dummy value and append warning to B.notes?
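The proposed per-field handling could look roughly like the sketch below. It assumes the remote package is a package_show-style dict with an "extras" list of {"key", "value"} pairs, and that schema B is a ckanext-scheming schema dict whose "dataset_fields" entries carry "field_name" and an optional "required" flag. Only remote extras are treated as "fields in A not in B"; core fields such as resources are assumed to be handled by the harvester as usual. The function name map_to_schema and the placeholder value are hypothetical.

```python
PLACEHOLDER = "N/A"  # hypothetical dummy value for mandatory fields missing in A


def map_to_schema(remote_pkg, schema):
    """Map a harvested package dict onto scheming schema B (sketch only)."""
    extras = {e["key"]: e["value"] for e in remote_pkg.get("extras", [])}
    # Look up A's values by name; core fields take precedence over extras.
    source = dict(extras)
    source.update({k: v for k, v in remote_pkg.items() if k != "extras"})

    target_names = {f["field_name"] for f in schema["dataset_fields"]}
    result, warnings = {}, []
    for field in schema["dataset_fields"]:
        name = field["field_name"]
        if name in source:
            result[name] = source[name]  # in A and B: direct transfer
        elif field.get("required"):
            result[name] = PLACEHOLDER  # only in B, mandatory: dummy value
            warnings.append(
                f"Required field '{name}' missing at source; set to {PLACEHOLDER!r}."
            )
        # only in B, optional: left blank (omitted)

    # Extras only in A: appended to notes, followed by any warnings.
    unmapped = {k: v for k, v in extras.items() if k not in target_names}
    lines = [f"{k}: {v}" for k, v in sorted(unmapped.items())] + warnings
    if lines:
        notes = result.get("notes") or ""
        result["notes"] = (notes + "\n\n" if notes else "") + "\n".join(lines)
    return result
```

Appending to notes keeps the unmapped values visible without creating undefined extras, which is the condition that breaks scheming's templates.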
It should be possible to add this sort of logic to a custom harvester, right?
Also, sites using ckanext-scheming advertise the schemas they have installed through the actions scheming_dataset_schema_list and scheming_dataset_schema_show, so it's possible to query the schema in use on both ends instead of checking the config.
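Querying those actions only needs the standard CKAN action API (GET /api/3/action/<name>, which returns a JSON envelope with "success" and "result"). A minimal sketch using only the standard library, assuming scheming_dataset_schema_show is called with its "type" parameter and returns a schema dict with "dataset_fields"; the helper names call_action and dataset_field_names are hypothetical, and opener is injectable so the logic can be tried without network access:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def call_action(base_url, action, params=None, opener=urlopen):
    """Call a CKAN action via GET and return its "result" payload."""
    url = f"{base_url.rstrip('/')}/api/3/action/{action}"
    if params:
        url += "?" + urlencode(params)
    with opener(url) as resp:
        body = json.load(resp)
    if not body.get("success"):
        raise RuntimeError(f"{action} failed: {body.get('error')}")
    return body["result"]


def dataset_field_names(base_url, dataset_type="dataset", opener=urlopen):
    """Return the dataset field names of a scheming schema on a remote site."""
    schema = call_action(
        base_url, "scheming_dataset_schema_show", {"type": dataset_type}, opener
    )
    return [f["field_name"] for f in schema["dataset_fields"]]
```

Running dataset_field_names against both the source and the target site would give the two field lists needed to work out which fields fall into each of the three cases.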
Pinging @amercader and @wardi for advice.