Validators ignored for fields with repeating subfields #394

themowski · 2023-12-05T16:41:36Z

Environment

CKAN version: 2.10.1
ckanext-scheming version: release-3.0.0

Description

Our CKAN implementation is being used to index datasets across a variety of domains. We have a schema that defines a core set of fields that all datasets need to specify. To support flexibility for dataset authors to add additional key-value metadata that is outside of the required core set of fields, our schema also has a field called custom_fields that uses repeating_subfields to allow users to specify additional key-value pairs to associate with their dataset. The key subfield has a validator that makes sure that the key's name meets certain requirements:

- field_name: custom_fields
  label: Custom Fields
  repeating_label: Custom Field
  repeating_subfields:
  - field_name: key
    label: Key
    validators: key_validator
    required: true
  - field_name: value
    label: Value
    required: true

We want to block users from specifying the same key value in multiple entries in custom_fields. The obvious way to do this is to define a validator function, keys_are_unique(), that can be applied to custom_fields to verify that the keys are unique.

- field_name: custom_fields
  label: Custom Fields
  validators: keys_are_unique
  repeating_label: Custom Field
  repeating_subfields:
  - field_name: key
    label: Key
    validators: key_validator
    required: true
  - field_name: value
    label: Value
    required: true

After trying this & spending some time in my debugger, I discovered that the extension initially saves the top-level field's validator, but then replaces it with the subfields' validator(s). When this happens, the data type of the validators list also changes from list[str] to dict. I haven't had time yet to dig into why/how this works in the grand scheme of CKAN dataset creation, but that's probably my next step.

One example of this behavior is in the _field_validators() function in ckanext/scheming/plugins.py:

ckanext-scheming/ckanext/scheming/plugins.py

Lines 582 to 608 in 8646a9d

    
           def _field_validators(f, schema, convert_extras): 
        
               """ 
        
               Return the validators for a scheming field f 
        
               """ 
        
               if 'validators' in f: 
        
                   validators = validation.validators_from_string( 
        
                       f['validators'], 
        
                       f, 
        
                       schema 
        
                   ) 
        
               elif helpers.scheming_field_required(f): 
        
                   validators = [not_empty] 
        
               else: 
        
                   validators = [ignore_missing] 
        
               if convert_extras: 
        
                   validators.append(convert_to_extras) 
        
               # If this field contains children, we need a special validator to handle 
        
               # them. 
        
               if 'repeating_subfields' in f: 
        
                   validators = { 
        
                       sf['field_name']: _field_validators(sf, schema, False) 
        
                       for sf in f['repeating_subfields'] 
        
                   } 
        
               return validators

As a user, this behavior is unexpected -- I would expect that validators can be applied to both the top-level & sub-level fields. I think that it's worth updating the documentation for repeating_subfields and/or validators to explain that if a field has repeating_subfields, then validators for that field are ignored in favor of validators defined for the subfields.

I also have two specific questions, any insight would be appreciated:

We are trying to allow users to add an arbitrary number of "extra" key-value pairs to a dataset that is defined by a ckanext-scheming schema. Is our current approach (custom_fields field with repeating_subfields) the "correct" way to do what we are trying to do, or is there another method that is better supported, either by ckanext-scheming or by CKAN itself?
Assuming that what we are doing is correct, roughly how difficult would it be to modify ckanext-scheming to support validation of both the subfields and the top-level field? Any tips on what we would need to do to support that? It almost seems like this would require changes to the way the data is modelled, which I assume would be nontrivial.

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Validators ignored for fields with repeating subfields #394

Validators ignored for fields with repeating subfields #394

themowski commented Dec 5, 2023

Validators ignored for fields with repeating subfields #394

Validators ignored for fields with repeating subfields #394

Comments

themowski commented Dec 5, 2023

Environment

Description