Add instrument (e.g., CRF form) to VLMD schema #39

Closed
mbkranz opened this issue Oct 11, 2023 · 9 comments

@mbkranz
Collaborator

mbkranz commented Oct 11, 2023

The standardsMappings property has been developed over time into a property general enough to encompass both HEAL CDEs and external CDEs. However, as we get further into development, there is a need to add instrument-level (e.g., form-level) details. This proposed change nests two sub-properties under standardsMappings: item and instrument. The Word document below contains this proposal as well.

Note: see the comments for an amendment to this proposal that adds a root-level standardsMappings property.

standardsMappings-proposed-changes-datataskforce-10-17.docx

Current

The current standardsMappings property was designed with the NIH CDE repository in mind, with the idea that folks would look up a common data element on the website and map it to the VLMD (or we would do “fuzzy matching”). The structure below is used in the current filled-out examples (this example uses a race variable from the NIH CDE repository). Note that an investigator can fill in any one of these properties (see the attached doc).

Examples

JSON heal vlmd schema
CSV heal vlmd schema

Json example:

...
"standardsMappings": [
    {
        "type": "cde",
        "url": "https://cde.nlm.nih.gov/deView?tinyId=Fakc6Jy2x",
        "label": "NLM race",
        "source": "NLM",
        "id": "Fakc6Jy2x"
    }
]
...

CSV example:
name,standardsMappings.type,standardsMappings.label,standardsMappings.url,standardsMappings.source,standardsMappings.id
race,cde,NLM race,https://cde.nlm.nih.gov/deView?tinyId=Fakc6Jy2x,NLM,Fakc6Jy2x

Proposed

In the proposed schema, we introduce two nested properties, instrument and item, under standardsMappings. These sub-properties allow for flexibility and accommodate additional sources and IDs as needed (i.e., CDEs other than the NIH HEAL CDEs, as discussed in the data task force). The properties are not required, so users can fill them out as desired (except that source is expected, or url if there is no source). Additionally, title is included in instrument because no other VLMD property addresses instrument information. We also reduce complexity and confusion by removing potentially ambiguous properties such as type. Below are snippets showing combinations of valid entries for this property.

All Fields Mapped (Both Instrument and Item)

"standardsMappings": [
    {
        "instrument": {
            "url": "https://www.heal.nih.gov/files/CDEs/2023-05/adult-demographics-cdes.xlsx",
            "source": "heal-cde",
            "title": "adult-demographics",
            "id": "<drupal id here>"
        },
        "item": {
            "url": "https://evs.nci.nih.gov/ftp1/CDISC/SDTM/SDTM%20Terminology.html#CL.C74457.RACE",
            "source": "CDISC",
            "id": "C74457"
        }
    }
]

Only Instrument Title of Form CDE File Mapped

In this scenario, only instrument information is given. This is especially relevant because the CDE variables do not have associated CDISC IDs listed.

"standardsMappings": [
    {
        "instrument": {
            "source": "heal-cde",
            "title": "adult-demographics"
        }
    }
]

Only Instrument ID of HEAL CDE Mapped

"standardsMappings": [
    {
        "instrument": {
            "source": "heal-cde",
            "id": "<drupal id here>"
        }
    }
]

Other Non-HEAL CDE Use Cases

Only item matched (for example if found in the NIH (not HEAL) CDE repository). Folks would enter the information in the "Identifier" section. Similar to the above, they could also just enter the "url".

"standardsMappings": [
    {
        "item": {
            "source": "NLM",
            "id": "Fakc6Jy2x"
        }
    }
]

Multiple CDE Mappings

Two separate records. If desired, multiple standard mappings can be entered, say from the NIH HEAL CDE repo and the NIH CDE lookup (NLM) by way of two separate records in the list.

"standardsMappings": [
    {
        "instrument": {
            "source": "heal-cde",
            "title": "adult-demographics"
        },
        "item": {
            "source": "CDISC",
            "id": "C74457"
        }
    },
    {
        "item": {
            "source": "NLM",
            "id": "Fakc6Jy2x"
        }
    }
]
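
The rules described in this proposal can be sketched as a small check over one standardsMappings record. This is only a minimal sketch, not the schema's actual validation: the function name `check_mapping`, the allowed-key sets, and the reading of "source, or url if no source" as a hard requirement are all assumptions made for illustration.

```python
# Keys seen in the examples above; treating them as the full allowed set is an assumption.
ALLOWED_KEYS = {
    "instrument": {"url", "source", "title", "id"},
    "item": {"url", "source", "id"},
}


def check_mapping(record):
    """Return a list of problems found in one standardsMappings record."""
    if not isinstance(record, dict) or not record:
        return ["record must be a non-empty object"]
    problems = []
    for level, props in record.items():
        if level not in ALLOWED_KEYS:
            problems.append(f"unknown sub-property: {level}")
            continue
        if not isinstance(props, dict):
            problems.append(f"{level}: must be an object")
            continue
        extra = set(props) - ALLOWED_KEYS[level]
        if extra:
            problems.append(f"{level}: unexpected keys {sorted(extra)}")
        # Assumed reading of "except source or url if no source".
        if not props.get("source") and not props.get("url"):
            problems.append(f"{level}: needs a source, or a url if no source")
    return problems


print(check_mapping({"instrument": {"source": "heal-cde", "title": "adult-demographics"}}))
print(check_mapping({"item": {"id": "C74457"}}))
```

Under these assumptions, the first call reports no problems and the second reports that item has neither a source nor a url.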
@mbkranz
Collaborator Author

mbkranz commented Oct 11, 2023

@gaurav @artinthetrees

@mbkranz
Collaborator Author

mbkranz commented Oct 11, 2023

@pschumm

@pschumm
Contributor

pschumm commented Oct 18, 2023

I have reviewed this Mike, and think it reflects our discussion accurately and will work well. My only final comment would be that we might consider adding/restoring the standardsMappings property at the root level as well as the field level, but with only instrument (not item) permitted. If information is present at both the root and the field level, then the information at the field level would take precedence (i.e., it would cascade).

@mbkranz
Collaborator Author

mbkranz commented Oct 18, 2023

> I have reviewed this Mike, and think it reflects our discussion accurately and will work well. My only final comment would be that we might consider adding/restoring the standardsMappings property at the root level as well as the field level, but with only instrument (not item) permitted. If information is present at both the root and the field level, then the information at the field level would take precedence (i.e., it would cascade).

I think that makes sense @pschumm. Previously we had proposed a references property at the root, but it was a bit vague as to what to put in there. Note: "fields" will replace "data_dictionary" to reduce ambiguity and increase alignment with Frictionless.

{
    "title": "Adult Demographics",
    "standardsMappings": [
        {
            "instrument": {
                "source": "heal-cde",
                "title": "adult-demographics"
            }
        }
    ],
    "fields": [
        {
            "name": "race",
            "title": "Racial Category",
            "description": "This is an example CDE using racial category",
            "type": "string",
            "constraints": {"enum": ["White", "Black", ..., ...]},
            ....
            "standardsMappings": [
                {
                    "item": {
                        "source": "CDISC",
                        "id": "C74457"
                    }
                }
            ]
        }
    ]
}
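
The cascade discussed in this exchange (field-level information takes precedence over root-level information) could be resolved along these lines. This is a sketch under assumptions: `effective_instrument` is a hypothetical helper, and it treats a field-level instrument as replacing the root-level one wholesale rather than merging individual keys, which the thread does not settle.

```python
def effective_instrument(root, field):
    """Return the instrument that applies to a field.

    A field-level instrument wins; otherwise fall back to the root level.
    Whole-object override (no key-by-key merge) is an assumption.
    """
    for level in (field, root):
        for mapping in level.get("standardsMappings", []):
            if "instrument" in mapping:
                return mapping["instrument"]
    return None


root = {
    "title": "Adult Demographics",
    "standardsMappings": [
        {"instrument": {"source": "heal-cde", "title": "adult-demographics"}}
    ],
}
race = {
    "name": "race",
    "standardsMappings": [{"item": {"source": "CDISC", "id": "C74457"}}],
}

# The race field has no instrument of its own, so the root-level one applies.
print(effective_instrument(root, race))
```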

@pschumm
Contributor

pschumm commented Oct 18, 2023

> Previously we had proposed a references property at the root, but it was a bit vague as to what to put in there.

Right. Moreover, the unnecessary use of a different property name (i.e., references as opposed to standardsMappings) may have contributed to confusion and would have made cascading less natural.

@mbkranz mbkranz added the v0.2.0 label Dec 1, 2023
mbkranz added a commit to norc-heal/heal-metadata-schemas that referenced this issue Dec 4, 2023
mbkranz added a commit to norc-heal/heal-metadata-schemas that referenced this issue Dec 4, 2023
@mbkranz
Collaborator Author

mbkranz commented Dec 8, 2023

Upon implementing this, I realized that using relative JSON paths as the CSV variable names would increase interoperability and ease conversion. For example, this standardsMappings entry with an instrument and an item:

"standardsMappings": [
    {
        "instrument": {
            "source": "heal-cde",
            "title": "adult-demographics"
        },
        "item": {
            "source": "CDISC",
            "id": "C74457"
        }
    }
  ]

Corresponds to:


standardsMappings[0].instrument.source,standardsMappings[0].instrument.title,standardsMappings[0].item.source,standardsMappings[0].item.id
heal-cde,adult-demographics,CDISC,C74457
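
The JSON-to-CSV correspondence above can be reproduced with a short recursive flattening routine. This is a minimal sketch, not the converter used in the actual implementation; the function name `flatten` is hypothetical.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts and lists into {relative JSON path: scalar}."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            flat.update(flatten(child, path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}[{index}]"))
    else:
        flat[prefix] = value
    return flat


field = {
    "standardsMappings": [
        {
            "instrument": {"source": "heal-cde", "title": "adult-demographics"},
            "item": {"source": "CDISC", "id": "C74457"},
        }
    ]
}

row = flatten(field)
print(",".join(row.keys()))    # CSV header made of relative JSON paths
print(",".join(row.values()))  # heal-cde,adult-demographics,CDISC,C74457
```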
 

@mbkranz
Collaborator Author

mbkranz commented Dec 8, 2023

Implementation rule for CSV to JSON conversion: if a property that is defined at the root level is also defined at the field level, then carry it up to the root level? This will work well for defining standardsMappings instrument values (and also schema versions, but we would need to clearly define the difference between schemaVersion and version, i.e., the version of the instance).

@mbkranz mbkranz mentioned this issue Dec 8, 2023
@mbkranz
Collaborator Author

mbkranz commented Dec 20, 2023

> Implementation rule for CSV to JSON conversion: if a property that is defined at the root level is also defined at the field level, then carry it up to the root level? This will work well for defining standardsMappings instrument values (and also schema versions, but we would need to clearly define the difference between schemaVersion and version, i.e., the version of the instance).

Could also add another rule: if a variable name in the CSV begins with $., then the first unique, non-missing record in that column populates the root-level JSON property at that JSON path (e.g., $.standardsMappings[0].instrument).
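
One way the `$.` rule might look in a CSV to JSON converter is sketched below. This is an illustration under assumptions, not the implemented behavior: `set_path` and `root_from_csv` are hypothetical helpers, only simple `name`, `.name`, and `[index]` path segments are handled, and "first unique, non-missing record" is read as "first non-empty cell in the column".

```python
import csv
import io
import re

# Matches either a property name or a bracketed list index.
TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def set_path(target, path, value):
    """Set value inside nested dicts/lists at a path like a[0].b.c."""
    tokens = [int(index) if index else name for name, index in TOKEN.findall(path)]
    node = target
    for pos, token in enumerate(tokens):
        last = pos == len(tokens) - 1
        if last:
            new = value
        else:
            new = [] if isinstance(tokens[pos + 1], int) else {}
        if isinstance(token, int):
            while len(node) <= token:  # pad the list up to the index
                node.append(None)
            if last or node[token] is None:
                node[token] = new
        elif last or token not in node:
            node[token] = new
        node = node[token]


def root_from_csv(text):
    """Build root-level properties from columns whose names start with '$.'."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    root = {}
    for column in reader.fieldnames or []:
        if not column.startswith("$."):
            continue
        # Assumed reading: the first non-empty cell in the column wins.
        value = next((row[column] for row in rows if row[column]), None)
        if value is not None:
            set_path(root, column[2:], value)
    return root


sample = (
    "name,$.standardsMappings[0].instrument.source,$.standardsMappings[0].instrument.title\n"
    "race,,adult-demographics\n"
    "sex,heal-cde,adult-demographics\n"
)
print(root_from_csv(sample))
```

In the sample, the source cell is empty on the first row, so the value from the second row is carried up to the root.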

@mbkranz
Collaborator Author

mbkranz commented Jan 28, 2024

Closing out #50

@mbkranz mbkranz closed this as completed Jan 28, 2024