From 163e75007d70999fb0d366db68203b2b93b2bc95 Mon Sep 17 00:00:00 2001
From:  <>
Date: Fri, 8 Sep 2023 22:34:42 +0000
Subject: [PATCH] Deployed 84f723d with MkDocs version: 1.5.2

---
 search/search_index.json          |   2 +-
 sitemap.xml.gz                    | Bin 336 -> 336 bytes
 vlmd/extract/exceldata/index.html |  19 ++++++++++---------
 3 files changed, 11 insertions(+), 10 deletions(-)
diff --git a/search/search_index.json b/search/search_index.json
index c1b6017..411cbff 100644
--- a/search/search_index.json
+++ b/search/search_index.json
@@ -1 +1 @@
-{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"HEAL Data Utilities","text":"<p>The HEAL Data Utilities python package provides data packaging tools for the HEAL Data Ecosystem to facilitate data discovery, sharing, and harmonization on the HEAL Platform.</p> <p>Currently, the focus of this repository is generating standardized variable level metadata (VLMD) in the form of data dictionaries. See the quick start section to get started without installing any of the prerequisites. (Click here for the Variable-level Metadata documentation section).</p> <p>However, in the future, this will be expanded for all HEAL-specific data packaging functions (e.g., study- and file-level metadata and data).</p>"},{"location":"#quick-start","title":"Quick start","text":"<p>Note</p> <p>If using the quick start option, no prerequisites are required.</p> <p>Double click on the <code>vlmd</code> (or <code>vlmd.exe</code>) executable or run the <code>vlmd</code> executable without any arguments to quickly start using this tool. This \"quick start\" will take walk you through step by step by prompting you of the various options.</p> <p>Important</p> <p>Stand alone applications for different operating systems are available here. These allow you to run the <code>vlmd</code> tool without needing to install anything else. Just (1) download, (2) unzip, and (3) double click on the <code>vlmd</code> application icon.</p>"},{"location":"#prerequisites","title":"Prerequisites","text":""},{"location":"#python","title":"Python","text":"<p>While the HEAL Data Utilities should be compatible with most versions of Python, you can download the latest version of Python here and install it on your local computer. 
We recommend installing Python version 3.10.</p>"},{"location":"#installation","title":"Installation","text":"<p>To install the latest official release of healdata-utils, from your computer's command prompt, run:</p> <p><code>pip install healdata-utils</code></p> <p>OR for the most up-to-date unreleased version run: </p> <p><code>pip install git+https://github.com/norc-heal/healdata-utils.git</code></p> <p>Note</p> <p>Installing the unreleased version requires having <code>git</code> software installed.</p>"},{"location":"vlmd/","title":"Variable-level Metadata (Data Dictionaries)","text":""},{"location":"vlmd/#motivation","title":"Motivation","text":"<p>Variable level metadata (VLMD), in the form of standardized data dictionaries, provides an exciting opportunity:</p> <ul> <li>a way to search, understand, and compare datasets before (potentially sensitive) data is shared. </li> </ul> <p>For an example of this searchability in the context of study level metadata, see the platform's discovery page</p> <ul> <li> <p>When data is available, VLMD provides a way to validate the data as well.</p> </li> <li> <p>Supports HEAL projects and goals such as the common data elements program</p> </li> </ul>"},{"location":"vlmd/#functions","title":"Functions","text":"<p><code>extract</code>: Extract the variable level metadata from an existing file with a specific   type/format</p> <p><code>start</code>: Start a data dictionary from an empty template</p> <p><code>validate</code>: Check (validate) an existing HEAL data dictionary file to see if it follows the HEAL specifications after filling out a template or further annotation after extracting from a different format.</p> <p>Typical workflows for creating a HEAL-compliant data dictionary include:</p> <ol> <li> <p>Create your data dictionary</p> <p>(a) Run the <code>vlmd extract</code> command (or <code>convert_to_vlmd</code> if in python) to generate a HEAL-compliant data dictionary via your desired input format </p> <p>(b) Run 
the <code>vlmd template</code> command to start from an empty template.</p> </li> <li> <p>Add/annotate with additional information in your preferred HEAL data dictionary format (either <code>json</code> or <code>csv</code>).</p> <ul> <li>To further annotate and use the data dictionary, see the variable-level metadata field property information below:<ul> <li><code>csv</code> data dictionary</li> <li><code>json</code> data dictionary</li> </ul> </li> </ul> </li> <li> <p>Run the <code>vlmd validate</code> command  with your HEAL data dictioanry as the input to validate.</p> </li> <li> <p>Repeat (2) and (3) until you are ready to submit. Please note, currently only <code>name</code> and <code>description</code> are required.</p> </li> </ol>"},{"location":"vlmd/#definitions","title":"Definitions","text":"<p>Important</p> <p>The main difference* between the CSV and JSON definitions lies in the way the data dictionaries are structured and the additional metadata included in the JSON data dictionary.</p> <p>The CSV data dictionary is a plain tabular representation with no additional metadata, while the JSON dataset includes fields along with additional metadata in the form of a root description and title.</p> <ul> <li>for field-specific differences, see the schemas in the documentation. </li> </ul> <p>For more information on variable-level metadata properties (fields), see the <code>csv</code> field specification and <code>json</code> data dictionary specification. </p>"},{"location":"vlmd/extract/","title":"<code>Extract</code> VLMD from another data type and format","text":"<p>The healdata-utils variable-level metadata (vlmd) tool inputs a variety of different input file types and extracts HEAL-compliant data dictionaries (JSON and CSV formats). 
Additionally, exported validation (i.e., \"error\") reports provide the user information as to a) if the exported data dictionary is valid according to HEAL specifications and b) how to modify one's data dictionary to make it HEAL-compliant.</p> <p>Warning</p> <p>Currently the python subcommand is <code>convert</code> but will be changed to <code>extract_to_vlmd</code> to be consistent with CLI. <code>extract</code> was chosen to better reflect the functionality.</p> Command Line Interface (CLI)Python <pre><code>vlmd extract --inputtype spss myproject/myfile.sav\n</code></pre> <p>Note</p> <p>To continue, it's recommended to go to the input types and formats. Also, for more details on the different flags/options, run <code>vlmd --help</code></p> <pre><code>from healdata_utils import convert_to_vlmd\n\nconvert_to_vlmd(input_filepath=\"myproject/myfile.sav\",inputtype=\"spss\")\n</code></pre> <p>Note</p> <p>To continue, it's recommended to go to the input types and formats. For a complete set of options with <code>convert_to_vlmd</code> see the docstring (if in a notebook, one can enter <code>convert_to_vlmd?</code>)</p>"},{"location":"vlmd/extract/#input-types-and-formats","title":"Input Types and Formats","text":"<p>This section provides the specific syntax for running each of the supported types for generating HEAL-compliant data dictionaries are listed. Additional instructions on how to obtain the necessary input files/software are also provided. 
</p> <p>Note</p> <p>To further annotate your outputted data dictionaries, see the variable-level metadata field properties (with examples) for either the <code>csv data dictionary</code> click here or the <code>json data dictionary</code> click here.</p> <p>Extract variable level metadata from your data:</p> <ul> <li>CSV datasets</li> <li>SPSS datasets</li> <li>SAS datasets</li> <li>Stata datasets</li> <li>REDCap data dictionary</li> <li>Frictionless Table Schema</li> <li>Excel dataset</li> </ul>"},{"location":"vlmd/extract/#output","title":"Output","text":"<p>Both the python and command line routes will result in a JSON and CSV version of the HEAL data dictionary in the output folder along with the validation reports in the <code>errors</code> folder. See below:</p> <ul> <li><code>errors/heal-csv-errors.json</code>: outputted validation report for table in csv file against frictionless schema</li> </ul> <p>If valid, this file will contain: <pre><code>{\n\"valid\": true,\n\"errors\": []\n}\n</code></pre> - <code>errors/heal-json-errors.json</code>:  outputted jsonschema validation report.</p> <ul> <li>If valid, this file will contain: <pre><code>{\n\"valid\": true,\n\"errors\": []\n}\n</code></pre></li> </ul> <p>If no <code>outputdir</code> specified, the resulting HEAL-compliant data dictionaries will be named:</p> <ul> <li><code>heal-csvtemplate-data-dictionary.csv</code>: This is the CSV data dictionary</li> <li><code>heal-jsontemplate-data-dictionary.json</code>: This is the JSON version of the data dictionary</li> </ul>"},{"location":"vlmd/extract/csvdata/","title":"<code>csv</code> Datasets","text":"<p>CSV (comma-separated values) is the main open tabular data format for storage and exchange. It is easy to create and understand using basic text editors in addition to popular spreadsheet software like Google Sheets and Excel. 
Importantly, CSVs are simple and can be easily integrated into web applications and just about any software.</p> <p>Currently, the HEAL Data Utilities <code>vlmd</code> function can infer a minimal, HEAL-compliant dataset by inferring <code>name</code>, <code>type</code>, and <code>enum</code> (i.e., possible values). After this minimal data dictionary is generated, the researcher can further annotate it with fields' <code>description</code> and other optional properties in either the HEAL-compliant <code>csv</code>- or <code>json</code>-formatted data dictionary (see the HEAL data dictionary template sections below for more information).</p>"},{"location":"vlmd/extract/exceldata/","title":"Excel (xlsx) dataset","text":"<p>Excel workbooks contain tabular data tables across named worksheets.</p> <p>This vlmd extraction tool provides the ability to extract vlmd from all of these worksheets either as a combined data dictionary or as multiple data dictionaries.</p>"},{"location":"vlmd/extract/exceldata/#run-the-vlmd-command","title":"Run the <code>vlmd</code> command","text":"CLIPython <pre><code>vlmd extract --inputtype excel-data myexcelfile.xlsx\n</code></pre>"},{"location":"vlmd/extract/exceldata/#to-output-multiple-sheets-as-separate-data-dictionaries","title":"To output multiple sheets as separate data dictionaries","text":"<pre><code>from healdata_utils import convert_to_vlmd\n\nconvert_to_vlmd(input_filepath=\"myexcelfile.xlsx\",inputtype=\"excel-data\")\n</code></pre>"},{"location":"vlmd/extract/exceldata/#to-extract-multiple-sheets-as-one-data-dictionary","title":"To extract multiple sheets as one data dictionary","text":"<p>Note</p> <p>Be careful about using the <code>multiple_data_dicts=False</code>. In most instances, one sheet should correspond to one separate data table and thus have one corresponding data dictionary.  </p> <p>Note, this combines (ie concatenates all data tables) and then infers fields. 
This use case is when sheets are viewed as \"chunks\" of one resource/dataset. </p> <pre><code>from healdata_utils import convert_to_vlmd\n\nconvert_to_vlmd(\n    filepath=\"myexcelfile.xlsx\",\n    inputtype=\"excel-data\",\n    multiple_data_dicts=False\n    )\n</code></pre>"},{"location":"vlmd/extract/exceldata/#to-extract-a-subset-of-sheets-as-one-data-dictionary","title":"To extract a subset of sheets as one data dictionary","text":"<p>```python</p> <p>from healdata_utils import convert_to_vlmd</p> <p>convert_to_vlmd(     filepath=\"myexcelfile.xlsx\",     inputtype=\"excel-data\",     multiple_data_dicts=False,     sheet_name=[\"mysheet1\",\"mysheet2\"]     )</p>"},{"location":"vlmd/extract/frictionlessschema/","title":"Frictionless Table Schema","text":"<p>While vlmd specifications are designed (and still being developed), to support interoperability with the heal platform, minor naming translations may be needed. This function supports any of said translations (eg., frictionless <code>fields</code> name --&gt; heal <code>data_dictionary</code>)</p> <p>Note, this conversion supports either <code>yaml</code> or <code>json</code> format (currently only tests for <code>json</code> format but should work with yaml). 
</p>"},{"location":"vlmd/extract/frictionlessschema/#creating-a-frictionless-table-schema","title":"Creating a frictionless table schema","text":"<p>Below are the official frictionless table schema specifications, which you will notice a high degree of overlap with the heal variable level metadata specifications.</p> <p>See here for the frictionless table schema specs</p>"},{"location":"vlmd/extract/frictionlessschema/#run-the-vlmd-command","title":"Run the <code>vlmd</code> command","text":"<pre><code>vlmd extract --inputtype frictionless data/frictionless_dataset1.frictionless.schema.json\n</code></pre>"},{"location":"vlmd/extract/redcapcsv/","title":"REDCap: Data Dictionary CSV Export","text":"<p>For users collecting data in a REDCap data management system, HEAL-compliant data dictionaries can be generated directly from REDCap exports. </p> <p>The REDCap data dictionary export serves the purpose of providing variable-level metadata in a standardized, tabular format and is generally easy to export. The HEAL data utilities leverages this user experience and standardized format to enable HEAL researchers to generate a Heal-compliant data dictionary. </p>"},{"location":"vlmd/extract/redcapcsv/#export-your-redcap-data-dictionary","title":"Export your Redcap data dictionary","text":"<p>To download a REDCap CSV export, do the following*: </p> <ol> <li>After logging in to your REDCap project page, locate the <code>Data dictionary</code> page. 
A link to this page may be available on the project side bar (see image below) or in the <code>Project Setup tab</code> at the top of your page.</li> </ol> <p></p> <ol> <li>After arriving at the <code>Data dictionary</code> page, click on <code>Download the current data dictionary</code> to export the dictionary (see below).</li> </ol> <p></p> <p>*there may be slight differences depending on your specific REDCap instance and version</p>"},{"location":"vlmd/extract/redcapcsv/#run-the-vlmd-command","title":"Run the <code>vlmd</code> command","text":"<pre><code>vlmd extract --inputtype redcap input/example_redcap_demo.redcap.csv </code></pre>"},{"location":"vlmd/extract/sas/","title":"SAS <code>sas7bdat</code> (and <code>sas7bcat</code>) files","text":"<p>To accommodate SAS users, the HEAL Data Utilities supports the binary <code>sas7bdat</code> file format, which contains the actual data values (observations/records). This file also includes variable metadata (variable <code>names</code> and variable labels/ <code>descriptions</code>).</p> <p>The HEAL Data Utilities also provides the option to include a catalog file \u2013 <code>sas7bcat</code> format - with the <code>sas7bdat</code>.  A <code>sas7bcat</code> file contains variable value labels, or <code>encodings</code>, that can be mapped onto the corresponding data from a <code>sas7bdat</code> file.</p>"},{"location":"vlmd/extract/sas/#creating-a-sas7bdat-and-a-sas7bcat-file","title":"Creating a <code>sas7bdat</code> and a <code>sas7bcat</code> file","text":"<p>Many SAS users build formats and labels into their data processing and analysis scripts. In this section, we provide syntax that can be easily copy-pasted into these existing workflows to create <code>sas7bdat</code> and <code>sas7bcat</code> files to input into the <code>vlmd</code> tool. </p> <p>This script template can be run separately or inserted directly at the end of a SAS user's workflow. 
</p> <p>Note</p> <p>If inserted directly, remember to delete the lines with <code>%INCLUDE</code>)</p> Template template.sas<pre><code>/*1. Read in data file without value labels and run full code. \n        Note: The most important pieces to run here are the PROC FORMAT statement(s) and any data steps \n        that assign formats and variable labels which are needed for the data dictionary. You may have defined variable labels and values in separate scripts for different analyses. In order to capture all your defined variable labels and values across scripts, you will need an %INCLUDE statement for each SAS script that defines unique variable labels or value labels.*/\n\n%INCLUDE \"&lt;INSERT SAS SCRIPT HERE FILE PATH HERE&gt;\"; /* THIS WILL RUN A SEPARATE SAS SCRIPT*/\n%INCLUDE \"&lt;INSERT SAS SCRIPT HERE FILE PATH HERE&gt;\"; /* THIS WILL RUN A SECOND SEPARATE SAS SCRIPT*/ \n\n/*2. Output the format catalog (sas7bcat) */\n/*2a. If you do not have an out directory, assign one to output the SAS catalog and data file. If you already have an out directory assigned, skip this step and replace \u201cout\u201d with your out directory libname in the flow*/\n\nlibname out \"&lt;INSERT THE DESIRED LOCATION (FILE PATH) TO YOUR SAS7BCAT AND SAS7BDAT FILES HERE&gt;\";\n\n/*2b. Output the format catalog.\n        Note: The format catalog is automatically stored in work.formats. This step copies the format file to the \n        out directory as a sas7bcat file.*/\nproc catalog cat=work.FORMATS;\n    copy out=out.FORMATS;\n    run;\n\n/*3. Output the data file (sas7bdat) */\ndata out.yourdata;\n    set &lt;INSERT THE NAME OF YOUR FINAL SAS DATASET HERE&gt;;\n    run;\n</code></pre> <p>The below SAS syntax is an example of how to use the template within your SAS workflow.</p> <p>The below sample script creates all of our variable and value labels. 
Your workflow may include multiple SAS scripts with multiple format statements and may include analyses and other PROC calls for data exploration,  but for demonstration purposes, this example only uses one script and focuses on defining the variable and value labels.</p> Example my_existing_sas_workflow.sas<pre><code>/*1. Read in input data */\nproc import datafile=\"myprojectfolder/input/mydata.csv\"\n    out=raw\n    dbms=csv replace;\n    getnames=yes;\nrun;\n\n/*2. Set up proc format and apply formats and variable labels in data step */\n/*Create encodings (value labels)*/\nproc format;\n    VALUE YESNO\n    0       =\"No\"\n    1       =\"Yes\"\n\n    VALUE PUBLIC\n    1='State mental health authority (SMHA)'\n    2='Other state government agency or department'\n    3='Regional/district authority or county, local, or municipal government'\n    4='Tribal government'\n    5='Indian Health Service'\n    6='Department of Veterans Affairs'\n    7='Other'\n\n    VALUE FOCUS\n    1='Mental health treatment'\n    2='Substance abuse treatment'\n    3='Mix of mental health and substance abuse treatment (neither is primary)'\n    4='General health care'\n    5='Other service focus';\n\n**Apply formats to dataset;\ndata processed;\n    set raw;\n\n    /*Assign formats*/\n    format YOUNGADULTS TREATPSYCHOTHRPY TREATTRAUMATHRPY YESNO. FOCUS FOCUS. PUBLIC PUBLIC.;\n    /*Add variable labels*/\n    label YOUNGADULTS=\"Accepts young adults (aged 18-25 years old) for Tx\"\n            TREATPSYCHOTHRPY=\"Facility offers individual psychotherapy\"\n            TREATTRAUMATHRPY=\"Facility offers trauma therapy\"\n            FOCUS=\"Primary treatment focus of facility\"\n            PUBLIC=\"Public agency or department that operates facility\";\nrun;\n</code></pre> <p>This second script called <code>my_output.sas</code> is the filled out template. Note the <code>%INCLUDE</code> function that calls <code>my_existing_sas_workflow.sas</code></p> my_output.sas<pre><code>/*1. 
Read in data file without value labels and run full code. \n        Note: The most important pieces to run here are the PROC FORMAT statement(s) and any data steps \n        that assign formats and variable labels which are needed for the data dictionary. You may have defined variable labels and values in separate scripts for different analyses. In order to capture all your defined variable labels and values across scripts, you will need an %INCLUDE statement for each SAS script that defines unique variable labels or value labels.*/*/\n\n%INCLUDE \"myprojectfolder/my_existing_workflow.sas\"; /* THIS WILL RUN A SEPARATE SAS SCRIPT*/\n\n/*2. Output the format catalog (sas7bcat) */\n/*2a. If you do not have an out directory, assign one to output the SAS catalog and data file.*/\nlibname out \"myprojectfolder/output\";\n\n/*2b. Output the format catalog.\n        Note: The format catalog is automatically stored in work.formats. This step copies the format file to the \n        out directory as a sas7bcat file.*/\nproc catalog cat=work.FORMATS;\n    copy out=out.FORMATS;\n    run;\n\n/*3. Output the data file (sas7bdat) to your output folder*/\ndata out.yourdata;\n    set processed;\n    run;\n</code></pre>"},{"location":"vlmd/extract/sas/#run-the-vlmd-command","title":"Run the <code>vlmd</code> command","text":"<p>After creating the necessary <code>sas7bdat</code> and <code>sas7bcat</code> files, you can then run the <code>vlmd</code> command. The tool, will automatically detect the sas7bcat file if located in the same directory as your data file. 
If not detected, the command will run without the sas7bcat catalog file and the <code>encodings</code> (i.e., value labels) will not be extracted from the catalog file.</p> <pre><code>vlmd extract --inputtype sas input/data.sas7bdat </code></pre>"},{"location":"vlmd/extract/spss/","title":"SPSS <code>.sav</code> files","text":"<p>For SPSS users, the HEAL Data Utilities generates HEAL-compliant data dictionaries from SPSS's default file format for storing datasets: a <code>SAV</code> file. It stores not only the data itself but also metadata such as variable names, variable labels, types, and value labels. The HEAL Data Utilities extracts these data and metadata to create HEAL-compliant data dictionaries.</p>"},{"location":"vlmd/extract/spss/#run-the-vlmd-command","title":"Run the <code>vlmd</code> command","text":"<pre><code>vlmd extract --inputtype spss data/example_pyreadstat_output.sav </code></pre>"},{"location":"vlmd/extract/stata/","title":"Stata <code>.dta</code> files","text":"<p>For Stata users, the HEAL Data Utilities generates HEAL-compliant data dictionaries through Stata's default file format: a <code>DTA</code> file. 
<code>DTA</code> files store not only the data itself but also metadata such as variable names, variable labels, types, and value labels.</p>"},{"location":"vlmd/extract/stata/#run-the-vlmd-command","title":"Run the <code>vlmd</code> command","text":"<pre><code>vlmd extract --inputtype stata data/mydatafile.dta </code></pre>"},{"location":"vlmd/schemas/","title":"HEAL data dictionary schemas","text":"<p>Click on each data dictionary schema below to view information about each format's data dictionary properties (such as a description, examples, etc).</p> <p>CSV fields</p> <p>JSON data dictionary</p> <p>Note</p> <p><code>enum</code> type means that a field can only be one of a certain set of possible values.</p>"},{"location":"vlmd/schemas/csv-fields/","title":"Tabular (CSV) data dictionary","text":"HEAL Variable Level Metadata Fields HEAL Variable Level Metadata Fields Type: object <p>Variable level metadata individual fields integrated into the variable level metadata object within the HEAL platform metadata service.</p> <p>Note, only <code>name</code> and <code>description</code> are required. Listed at the end of the description are suggested \"priority\" levels in brackets (e.g., []): 1. [Required]: Needs to be filled out to be valid. 2. [Highly recommended]: Greatly help using the data dictionary but not required. 3. [Optional, if applicable]: May only be applicable to certain fields. 4. [Autopopulated, if not filled]: These fields are intended to be autopopulated from other fields but can be filled out if desired. 5. [Experimental]: These fields are not currently used but are in development. 
module root  moduleType: string <p>The section, form, survey instrument, set of measures or other broad category used to group variables.</p> Examples: <pre>\"Demographics\"\n</pre> <pre>\"PROMIS\"\n</pre> <pre>\"Substance use\"\n</pre> <pre>\"Medical History\"\n</pre> <pre>\"Sleep questions\"\n</pre> <pre>\"Physical activity\"\n</pre> name Required root  nameType: string <p>The name of a variable (i.e., field) as it appears in the data. </p> <p>[Required]</p> title root  titleType: string <p>The human-readable title or label of the variable. </p> <p>[Highly recommended]</p> Example: <pre>\"My Variable (for name of my_variable)\"\n</pre> description Required root  descriptionType: string <p>An extended description of the variable. This could be the definition of a variable or the question text (e.g., if a survey). </p> <p>[Required]</p> Examples: <pre>\"Definition\"\n</pre> <pre>\"Question text (if a survey)\"\n</pre> type root  typeType: enum (of string) <p>A classification or category of a particular data element or property expected or allowed in the dataset.</p> <ul> <li><code>number</code> (A numeric value with optional decimal places. (e.g., 3.14))</li> <li><code>integer</code> (A whole number without decimal places. (e.g., 42))</li> <li><code>string</code> (A sequence of characters. (e.g., \\\"test\\\"))</li> <li><code>any</code> (Any type of data is allowed. (e.g., true))</li> <li><code>boolean</code> (A binary value representing true or false. (e.g., true))</li> <li><code>date</code> (A specific calendar date. (e.g., \\\"2023-05-25\\\"))</li> <li><code>datetime</code> (A specific date and time, including timezone information. (e.g., \\\"2023-05-25T10:30:00Z\\\"))</li> <li><code>time</code> (A specific time of day. (e.g., \\\"10:30:00\\\"))</li> <li><code>year</code> (A specific year. (e.g., 2023)</li> <li><code>yearmonth</code> (A specific year and month. (e.g., \\\"2023-05\\\"))</li> <li><code>duration</code> (A length of time. 
(e.g., \\\"PT1H\\\")</li> <li><code>geopoint</code> (A pair of latitude and longitude coordinates. (e.g., [51.5074, -0.1278]))</li> </ul> Must be one of: <ul><li>\"number\"</li><li>\"integer\"</li><li>\"string\"</li><li>\"any\"</li><li>\"boolean\"</li><li>\"date\"</li><li>\"datetime\"</li><li>\"time\"</li><li>\"year\"</li><li>\"yearmonth\"</li><li>\"duration\"</li><li>\"geopoint\"</li></ul> format root  format <p>A format taken from one of the frictionless specification schemas. For example, for tabular data, there is the Table Schema specification</p> <p>Each format is dependent on the <code>type</code> specified. For example: If <code>type</code> is \"string\", then see the String formats. If <code>type</code> is one of the date-like formats, then see Date formats.</p> Any of <ul><li> String Format </li><li> Date Format </li><li> Geopoint Format </li><li> geojson </li></ul> root  format anyOf String FormatType: enum (of string) Must be one of: <ul><li>\"uri\"</li><li>\"email\"</li><li>\"binary\"</li><li>\"uuid\"</li></ul> root  format anyOf Date FormatType: object <p>A format for a date variable (<code>date</code>,<code>time</code>,<code>datetime</code>). \\n\\t* default: An ISO8601 format string. \\n\\t* any: Any parsable representation of a date/time/datetime. The implementing library can attempt to parse the datetime via a range of strategies. 
\\n\\t* {PATTERN}: The value can be parsed according to <code>{PATTERN}</code>, which <code>MUST</code> follow the date formatting syntax of C / Python strftime.</p> <p>\\nExamples:</p> <p><code>%Y-%m-%d</code> (for date, e.g., 2023-05-25) <code>%Y%-%d</code> (for date, e.g., 20230525) for date without dashes\" <code>%Y-%m-%dT%H:%M:%S</code> (for datetime, e.g., 2023-05-25T10:30:45) <code>%Y-%m-%dT%H:%M:%SZ</code> (for datetime with UTC timezone, e.g., 2023-05-25T10:30:45Z) <code>%Y-%m-%dT%H:%M:%S%z</code> (for datetime with timezone offset, e.g., 2023-05-25T10:30:45+0300) <code>%Y-%m-%dT%H:%M</code> (for datetime without seconds, e.g., 2023-05-25T10:30) <code>%Y-%m-%dT%H</code> (for datetime without minutes and seconds, e.g., 2023-05-25T10) <code>%H:%M:%S</code> (for time, e.g., 10:30:45) <code>%H:%M:%SZ</code> (for time with UTC timezone, e.g., 10:30:45Z) <code>%H:%M:%S%z</code> (for time with timezone offset, e.g., 10:30:45+0300)</p> root  format anyOf Geopoint Format <p>The two types of formats for <code>geopoint</code> (describing a geographic point).</p> One of <ul><li> Option 1 </li><li> Option 2 </li></ul> root  format anyOf Geopoint Format oneOf item 0Type: array <p>A JSON array or a string parsable as a JSON array where each item is a number with the first as the latitude and the second as longitude. </p> root  format anyOf Geopoint Format oneOf item 1Type: object <p>Contains latitude and longitude with two keys (\"lat\" and \"long\") with number items mapped to each key.</p> root  format anyOf geojsonType: enum (of string) <p>The JSON object according to the geojson spec.</p> Must be one of: <ul><li>\"topojson\"</li><li>\"default\"</li></ul> constraints.maxLength root  constraints.maxLengthType: integer <p>Indicates the maximum length of an iterable (e.g., array, string, or object). 
For example, if 'Hello World' is the longest value of a categorical variable, this would be a maxLength of 11.</p> <p>[Optional,if applicable]</p> constraints.enum root  constraints.enumType: string <p>Constrains possible values to a set of values.</p> <p>[Optional,if applicable]</p> Must match regular expression: <code>^(?:[^|]+\\||[^|]*)(?:[^|]*\\|)*[^|]*$</code> constraints.pattern root  constraints.patternType: string <p>A regular expression pattern the data MUST conform to.</p> <p>[Optional,if applicable]</p> constraints.maximum root  constraints.maximumType: integer <p>Specifies the maximum value of a field (e.g., maximum -- or most recent -- date, maximum integer etc). Note, this is different then maxLength property.</p> <p>[Optional,if applicable]</p> encodings root  encodingsType: string <p>Variable value encodings provide a way to further annotate any value within a any variable type, making values easier to understand. </p> <p>Many analytic software programs (e.g., SPSS,Stata, and SAS) use numerical encodings and some algorithms only support numerical values. Encodings (and mappings) allow categorical values to be stored as numerical values.</p> <p>Additionally, as another use case, this field provides a way to store categoricals that are stored as \"short\" labels (such as abbreviations).</p> <p>[Optional,if applicable]</p> Must match regular expression: <code>^(?:.*?=.*?(?:\\||$))+$</code> Examples: <pre>\"0=No|1=Yes\"\n</pre> <pre>\"HW=Hello world|GBW=Good bye world|HM=Hi,Mike\"\n</pre> ordered root  orderedType: boolean <p>Indicates whether a categorical variable is ordered. 
This variable is relevant for variables that have an ordered relationship but not necessarily a numerical relationship (e.g., Strongly disagree &lt; Disagree &lt; Neutral &lt; Agree).</p> <p>[Optional,if applicable]</p> missingValues root  missingValuesType: string <p>A list of missing values specific to a variable.</p> <p>[Optional, if applicable]</p> Must match regular expression: <code>^(?:[^|]+\\||[^|]*)(?:[^|]*\\|)*[^|]*$</code> trueValues root  trueValuesType: string <p>For boolean (true) variable (as defined in type field), this field allows a physical string representation to be cast as true (increasing readability of the field). It can include one or more values.</p> <p>[Optional, if applicable]</p> Must match regular expression: <code>^(?:[^|]+\\||[^|]*)(?:[^|]*\\|)*[^|]*$</code> Examples: <pre>\"Required|REQUIRED\"\n</pre> <pre>\"required|Yes|Y|Checked\"\n</pre> <pre>\"Checked\"\n</pre> <pre>\"Required\"\n</pre> falseValues root  falseValuesType: string <p>For boolean (false) variable (as defined in type field), this field allows a physical string representation to be cast as false (increasing readability of the field) that is not a standard false value. It can include one or more values.</p> Must match regular expression: <code>^(?:[^|]+\\||[^|]*)(?:[^|]*\\|)*[^|]*$</code> repo_link root  repo_linkType: string <p>A link to the variable as it exists on the home repository, if applicable</p> cde_id.source root  cde_id.sourceType: string cde_id.id root  cde_id.idType: string ontology_id.relation root  ontology_id.relationType: string ontology_id.source root  ontology_id.sourceType: string ontology_id.id root  ontology_id.idType: string standardsMappings.type root  standardsMappings.typeType: string <p>The type of mapping linked to a published set of standard variables such as the NIH Common Data Elements program. 
[Autopopulated, if not filled]</p> Examples: <pre>\"cde\"\n</pre> <pre>\"ontology\"\n</pre> <pre>\"reference_list\"\n</pre> standardsMappings.label root  standardsMappings.labelType: string <p>A free text label of a mapping indicating a mapping(s) to a published set of standard variables such as the NIH Common Data Elements program.</p> <p>[Autopopulated, if not filled]</p> Examples: <pre>\"substance use\"\n</pre> <pre>\"chemical compound\"\n</pre> <pre>\"promis\"\n</pre> standardsMappings.url root  standardsMappings.urlType: stringFormat: uri <p>The url that links out to the published, standardized mapping.</p> <p>[Autopopulated, if not filled]</p> Example: <pre>\"https://cde.nlm.nih.gov/deView?tinyId=XyuSGdTTI\"\n</pre> standardsMappings.source root  standardsMappings.sourceType: string <p>The source of the standardized variable.</p> Example: <pre>\"TBD (will have controlled vocabulary)\"\n</pre> standardsMappings.id root  standardsMappings.idType: string <p>The id locating the individual mapping within the given source.</p> relatedConcepts.type root  relatedConcepts.typeType: string <p>The type of mapping to a published set of concepts related to the given field such as ontological information (eg., NCI thesaurus, bioportal etc)</p> <p>[Autopopulated, if not filled]</p> relatedConcepts.label root  relatedConcepts.labelType: string <p>A free text label of mapping to a published set of concepts related to the given field such as ontological information (eg., NCI thesaurus, bioportal etc)</p> <p>[Autopopulated, if not filled]</p> relatedConcepts.url root  relatedConcepts.urlType: stringFormat: uri <p>The url that links out to the published, standardized concept.</p> <p>[Autopopulated, if not filled]</p> Example: <pre>\"https://cde.nlm.nih.gov/deView?tinyId=XyuSGdTTI\"\n</pre> relatedConcepts.source root  relatedConcepts.sourceType: string <p>The source of the related concept.</p> <p>[Autopopulated, if not filled]</p> Example: <pre>\"TBD (will have controlled 
vocabulary)\"\n</pre> relatedConcepts.id root  relatedConcepts.idType: string <p>The id locating the individual mapping within the given source.</p> <p>[Autopopulated, if not filled]</p> univarStats.median root  univarStats.medianType: number univarStats.mean root  univarStats.meanType: number univarStats.std root  univarStats.stdType: number univarStats.min root  univarStats.minType: number univarStats.max root  univarStats.maxType: number univarStats.mode root  univarStats.modeType: number univarStats.count root  univarStats.countType: integer <p>Value must be greater or equal to <code>0</code></p> univarStats.twentyFifthPercentile root  univarStats.twentyFifthPercentileType: number univarStats.seventyFifthPercentile root  univarStats.seventyFifthPercentileType: number univarStats.categoricalMarginals.name root  univarStats.categoricalMarginals.nameType: string univarStats.categoricalMarginals.count root  univarStats.categoricalMarginals.countType: integer Additional Properties <p>Additional Properties of any type are allowed.</p> root  additionalPropertiesType: object <p>Generated using json-schema-for-humans on 2023-07-05 at 17:11:06 -0500</p>"},{"location":"vlmd/schemas/json-data-dictionary/","title":"JSON data dictionary","text":"Variable Level Metadata (Data Dictionaries) Variable Level Metadata (Data Dictionaries) Type: object <p>This schema defines the variable level metadata for one data dictionary for a given study.Note a given study can have multiple data dictionaries</p> title Required root  titleType: string description root  descriptionType: string data_dictionary Required root  data_dictionaryType: array of object Each item of this array must be: root  data_dictionary HEAL Variable Level Metadata FieldsType: object <p>Variable level metadata individual fields integrated into the variable level metadata object within the HEAL platform metadata service.</p> <p>Note, only <code>name</code> and <code>description</code> are required. 
Listed at the end of the description are suggested \"priority\" levels in brackets (e.g., []): 1. [Required]: Needs to be filled out to be valid. 2. [Highly recommended]: Greatly help using the data dictionary but not required. 3. [Optional, if applicable]: May only be applicable to certain fields. 4. [Autopopulated, if not filled]: These fields are intended to be autopopulated from other fields but can be filled out if desired. 5. [Experimental]: These fields are not currently used but are in development. module root  data_dictionary HEAL Variable Level Metadata Fields moduleType: string <p>The section, form, survey instrument, set of measures or other broad category used to group variables.</p> Examples: <pre>\"Demographics\"\n</pre> <pre>\"PROMIS\"\n</pre> <pre>\"Substance use\"\n</pre> <pre>\"Medical History\"\n</pre> <pre>\"Sleep questions\"\n</pre> <pre>\"Physical activity\"\n</pre> name Required root  data_dictionary HEAL Variable Level Metadata Fields nameType: string <p>The name of a variable (i.e., field) as it appears in the data. </p> <p>[Required]</p> title root  data_dictionary HEAL Variable Level Metadata Fields titleType: string <p>The human-readable title or label of the variable. </p> <p>[Highly recommended]</p> Example: <pre>\"My Variable (for name of my_variable)\"\n</pre> description Required root  data_dictionary HEAL Variable Level Metadata Fields descriptionType: string <p>An extended description of the variable. This could be the definition of a variable or the question text (e.g., if a survey). </p> <p>[Required]</p> Examples: <pre>\"Definition\"\n</pre> <pre>\"Question text (if a survey)\"\n</pre> type root  data_dictionary HEAL Variable Level Metadata Fields typeType: enum (of string) <p>A classification or category of a particular data element or property expected or allowed in the dataset.</p> <ul> <li><code>number</code> (A numeric value with optional decimal places. 
(e.g., 3.14))</li> <li><code>integer</code> (A whole number without decimal places. (e.g., 42))</li> <li><code>string</code> (A sequence of characters. (e.g., \\\"test\\\"))</li> <li><code>any</code> (Any type of data is allowed. (e.g., true))</li> <li><code>boolean</code> (A binary value representing true or false. (e.g., true))</li> <li><code>date</code> (A specific calendar date. (e.g., \\\"2023-05-25\\\"))</li> <li><code>datetime</code> (A specific date and time, including timezone information. (e.g., \\\"2023-05-25T10:30:00Z\\\"))</li> <li><code>time</code> (A specific time of day. (e.g., \\\"10:30:00\\\"))</li> <li><code>year</code> (A specific year. (e.g., 2023)</li> <li><code>yearmonth</code> (A specific year and month. (e.g., \\\"2023-05\\\"))</li> <li><code>duration</code> (A length of time. (e.g., \\\"PT1H\\\")</li> <li><code>geopoint</code> (A pair of latitude and longitude coordinates. (e.g., [51.5074, -0.1278]))</li> </ul> Must be one of: <ul><li>\"number\"</li><li>\"integer\"</li><li>\"string\"</li><li>\"any\"</li><li>\"boolean\"</li><li>\"date\"</li><li>\"datetime\"</li><li>\"time\"</li><li>\"year\"</li><li>\"yearmonth\"</li><li>\"duration\"</li><li>\"geopoint\"</li></ul> format root  data_dictionary HEAL Variable Level Metadata Fields format <p>A format taken from one of the frictionless specification schemas. For example, for tabular data, there is the Table Schema specification</p> <p>Each format is dependent on the <code>type</code> specified. For example: If <code>type</code> is \"string\", then see the String formats. 
If <code>type</code> is one of the date-like formats, then see Date formats.</p> Any of <ul><li> String Format </li><li> Date Format </li><li> Geopoint Format </li><li> geojson </li></ul> root  data_dictionary HEAL Variable Level Metadata Fields format anyOf String FormatType: enum (of string) Must be one of: <ul><li>\"uri\"</li><li>\"email\"</li><li>\"binary\"</li><li>\"uuid\"</li></ul> root  data_dictionary HEAL Variable Level Metadata Fields format anyOf Date FormatType: object <p>A format for a date variable (<code>date</code>,<code>time</code>,<code>datetime</code>). \\n\\t* default: An ISO8601 format string. \\n\\t* any: Any parsable representation of a date/time/datetime. The implementing library can attempt to parse the datetime via a range of strategies. \\n\\t* {PATTERN}: The value can be parsed according to <code>{PATTERN}</code>, which <code>MUST</code> follow the date formatting syntax of C / Python strftime.</p> <p>\\nExamples:</p> <p><code>%Y-%m-%d</code> (for date, e.g., 2023-05-25) <code>%Y%-%d</code> (for date, e.g., 20230525) for date without dashes\" <code>%Y-%m-%dT%H:%M:%S</code> (for datetime, e.g., 2023-05-25T10:30:45) <code>%Y-%m-%dT%H:%M:%SZ</code> (for datetime with UTC timezone, e.g., 2023-05-25T10:30:45Z) <code>%Y-%m-%dT%H:%M:%S%z</code> (for datetime with timezone offset, e.g., 2023-05-25T10:30:45+0300) <code>%Y-%m-%dT%H:%M</code> (for datetime without seconds, e.g., 2023-05-25T10:30) <code>%Y-%m-%dT%H</code> (for datetime without minutes and seconds, e.g., 2023-05-25T10) <code>%H:%M:%S</code> (for time, e.g., 10:30:45) <code>%H:%M:%SZ</code> (for time with UTC timezone, e.g., 10:30:45Z) <code>%H:%M:%S%z</code> (for time with timezone offset, e.g., 10:30:45+0300)</p> root  data_dictionary HEAL Variable Level Metadata Fields format anyOf Geopoint Format <p>The two types of formats for <code>geopoint</code> (describing a geographic point).</p> One of <ul><li> Option 1 </li><li> Option 2 </li></ul> root  data_dictionary HEAL Variable Level 
Metadata Fields format anyOf Geopoint Format oneOf item 0Type: array <p>A JSON array or a string parsable as a JSON array where each item is a number with the first as the latitude and the second as longitude. </p> root  data_dictionary HEAL Variable Level Metadata Fields format anyOf Geopoint Format oneOf item 1Type: object <p>Contains latitude and longitude with two keys (\"lat\" and \"long\") with number items mapped to each key.</p> root  data_dictionary HEAL Variable Level Metadata Fields format anyOf geojsonType: enum (of string) <p>The JSON object according to the geojson spec.</p> Must be one of: <ul><li>\"topojson\"</li><li>\"default\"</li></ul> constraints root  data_dictionary HEAL Variable Level Metadata Fields constraintsType: object maxLength root  data_dictionary HEAL Variable Level Metadata Fields constraints maxLengthType: integer <p>Indicates the maximum length of an iterable (e.g., array, string, or object). For example, if 'Hello World' is the longest value of a categorical variable, this would be a maxLength of 11.</p> <p>[Optional,if applicable]</p> enum root  data_dictionary HEAL Variable Level Metadata Fields constraints enumType: array <p>Constrains possible values to a set of values.</p> <p>[Optional,if applicable]</p> pattern root  data_dictionary HEAL Variable Level Metadata Fields constraints patternType: string <p>A regular expression pattern the data MUST conform to.</p> <p>[Optional,if applicable]</p> maximum root  data_dictionary HEAL Variable Level Metadata Fields constraints maximumType: integer <p>Specifies the maximum value of a field (e.g., maximum -- or most recent -- date, maximum integer etc). Note, this is different then maxLength property.</p> <p>[Optional,if applicable]</p> encodings root  data_dictionary HEAL Variable Level Metadata Fields encodingsType: object <p>Variable value encodings provide a way to further annotate any value within a any variable type, making values easier to understand. 
</p> <p>Many analytic software programs (e.g., SPSS,Stata, and SAS) use numerical encodings and some algorithms only support numerical values. Encodings (and mappings) allow categorical values to be stored as numerical values.</p> <p>Additionally, as another use case, this field provides a way to store categoricals that are stored as \"short\" labels (such as abbreviations).</p> <p>[Optional,if applicable]</p> Examples: <pre>{\n\"0\": \"No\",\n\"1\": \"Yes\"\n}\n</pre> <pre>{\n\"HW\": \"Hello world\",\n\"GBW\": \"Good bye world\",\n\"HM\": \"Hi, Mike\"\n}\n</pre> ordered root  data_dictionary HEAL Variable Level Metadata Fields orderedType: boolean <p>Indicates whether a categorical variable is ordered. This variable is relevant for variables that have an ordered relationship but not necessarily a numerical relationship (e.g., Strongly disagree &lt; Disagree &lt; Neutral &lt; Agree).</p> <p>[Optional,if applicable]</p> missingValues root  data_dictionary HEAL Variable Level Metadata Fields missingValuesType: array <p>A list of missing values specific to a variable.</p> <p>[Highly recommended]</p> trueValues root  data_dictionary HEAL Variable Level Metadata Fields trueValuesType: array of string <p>For boolean (true) variable (as defined in type field), this field allows a physical string representation to be cast as true (increasing readability of the field). 
It can include one or more values.</p> <p>[Optional, if applicable]</p> Each item of this array must be: root  data_dictionary HEAL Variable Level Metadata Fields trueValues trueValues itemsType: string Examples: <pre>\"Required\"\n</pre> <pre>\"REQUIRED\"\n</pre> <pre>\"required\"\n</pre> <pre>\"Yes\"\n</pre> <pre>\"Checked\\\"\"\n</pre> falseValues root  data_dictionary HEAL Variable Level Metadata Fields falseValuesType: array <p>For boolean (false) variable (as defined in type field), this field allows a physical string representation to be cast as false (increasing readability of the field) that is not a standard false value. It can include one or more values.</p> repo_link root  data_dictionary HEAL Variable Level Metadata Fields repo_linkType: string <p>A link to the variable as it exists on the home repository, if applicable</p> cde_id root  data_dictionary HEAL Variable Level Metadata Fields cde_idType: array of object <p>[FUTURE WARNING: WILL BE DEPRECATED] Use <code>standardsMapping</code>. The source and id for the NIH Common Data Elements program.</p> Each item of this array must be: root  data_dictionary HEAL Variable Level Metadata Fields cde_id cde_id itemsType: object source root  data_dictionary HEAL Variable Level Metadata Fields cde_id cde_id items sourceType: string id root  data_dictionary HEAL Variable Level Metadata Fields cde_id cde_id items idType: string ontology_id root  data_dictionary HEAL Variable Level Metadata Fields ontology_idType: array of object <p>[FUTURE WARNING: WILL BE DEPRECATED] - Use <code>relatedConcepts</code>. Ontological information for the given variable as indicated by the source, id, and relation to the specified classification. One or more ontology classifications can be specified. 
</p> Each item of this array must be: root  data_dictionary HEAL Variable Level Metadata Fields ontology_id ontology_id itemsType: object relation root  data_dictionary HEAL Variable Level Metadata Fields ontology_id ontology_id items relationType: string source root  data_dictionary HEAL Variable Level Metadata Fields ontology_id ontology_id items sourceType: string id root  data_dictionary HEAL Variable Level Metadata Fields ontology_id ontology_id items idType: string standardsMappings root  data_dictionary HEAL Variable Level Metadata Fields standardsMappingsType: array of object <p>A published set of standard variables such as the NIH Common Data Elements program. [Autopopulated, if not filled]</p> Each item of this array must be: root  data_dictionary HEAL Variable Level Metadata Fields standardsMappings standardsMappings itemsType: object type root  data_dictionary HEAL Variable Level Metadata Fields standardsMappings standardsMappings items typeType: string <p>The type of mapping linked to a published set of standard variables such as the NIH Common Data Elements program. 
[Autopopulated, if not filled]</p> Examples: <pre>\"cde\"\n</pre> <pre>\"ontology\"\n</pre> <pre>\"reference_list\"\n</pre> label root  data_dictionary HEAL Variable Level Metadata Fields standardsMappings standardsMappings items labelType: string <p>A free text label of a mapping indicating a mapping(s) to a published set of standard variables such as the NIH Common Data Elements program.</p> <p>[Autopopulated, if not filled]</p> Examples: <pre>\"substance use\"\n</pre> <pre>\"chemical compound\"\n</pre> <pre>\"promis\"\n</pre> url root  data_dictionary HEAL Variable Level Metadata Fields standardsMappings standardsMappings items urlType: stringFormat: uri <p>The url that links out to the published, standardized mapping.</p> <p>[Autopopulated, if not filled]</p> Example: <pre>\"https://cde.nlm.nih.gov/deView?tinyId=XyuSGdTTI\"\n</pre> source root  data_dictionary HEAL Variable Level Metadata Fields standardsMappings standardsMappings items sourceType: string <p>The source of the standardized variable.</p> Example: <pre>\"TBD (will have controlled vocabulary)\"\n</pre> id root  data_dictionary HEAL Variable Level Metadata Fields standardsMappings standardsMappings items idType: string <p>The id locating the individual mapping within the given source.</p> relatedConcepts root  data_dictionary HEAL Variable Level Metadata Fields relatedConceptsType: array of object <p>Mappings to a published set of concepts related to the given field such as ontological information (eg., NCI thesaurus, bioportal etc) [Autopopulated, if not filled]</p> Each item of this array must be: root  data_dictionary HEAL Variable Level Metadata Fields relatedConcepts relatedConcepts itemsType: object type root  data_dictionary HEAL Variable Level Metadata Fields relatedConcepts relatedConcepts items typeType: string <p>The type of mapping to a published set of concepts related to the given field such as ontological information (eg., NCI thesaurus, bioportal etc)</p> <p>[Autopopulated, if not 
filled]</p> label root  data_dictionary HEAL Variable Level Metadata Fields relatedConcepts relatedConcepts items labelType: string <p>A free text label of mapping to a published set of concepts related to the given field such as ontological information (eg., NCI thesaurus, bioportal etc)</p> <p>[Autopopulated, if not filled]</p> url root  data_dictionary HEAL Variable Level Metadata Fields relatedConcepts relatedConcepts items urlType: stringFormat: uri <p>The url that links out to the published, standardized concept.</p> <p>[Autopopulated, if not filled]</p> Example: <pre>\"https://cde.nlm.nih.gov/deView?tinyId=XyuSGdTTI\"\n</pre> source root  data_dictionary HEAL Variable Level Metadata Fields relatedConcepts relatedConcepts items sourceType: string <p>The source of the related concept.</p> <p>[Autopopulated, if not filled]</p> Example: <pre>\"TBD (will have controlled vocabulary)\"\n</pre> id root  data_dictionary HEAL Variable Level Metadata Fields relatedConcepts relatedConcepts items idType: string <p>The id locating the individual mapping within the given source.</p> <p>[Autopopulated, if not filled]</p> univarStats root  data_dictionary HEAL Variable Level Metadata Fields univarStatsType: object <p>Univariate statistics inferred from the data about the given variable </p> <p>[Experimental]</p> median root  data_dictionary HEAL Variable Level Metadata Fields univarStats medianType: number mean root  data_dictionary HEAL Variable Level Metadata Fields univarStats meanType: number std root  data_dictionary HEAL Variable Level Metadata Fields univarStats stdType: number min root  data_dictionary HEAL Variable Level Metadata Fields univarStats minType: number max root  data_dictionary HEAL Variable Level Metadata Fields univarStats maxType: number mode root  data_dictionary HEAL Variable Level Metadata Fields univarStats modeType: number count root  data_dictionary HEAL Variable Level Metadata Fields univarStats countType: integer <p>Value must be greater or 
equal to <code>0</code></p> twentyFifthPercentile root  data_dictionary HEAL Variable Level Metadata Fields univarStats twentyFifthPercentileType: number seventyFifthPercentile root  data_dictionary HEAL Variable Level Metadata Fields univarStats seventyFifthPercentileType: number categoricalMarginals root  data_dictionary HEAL Variable Level Metadata Fields univarStats categoricalMarginalsType: array of object Each item of this array must be: root  data_dictionary HEAL Variable Level Metadata Fields univarStats categoricalMarginals categoricalMarginals itemsType: object name root  data_dictionary HEAL Variable Level Metadata Fields univarStats categoricalMarginals categoricalMarginals items nameType: string count root  data_dictionary HEAL Variable Level Metadata Fields univarStats categoricalMarginals categoricalMarginals items countType: integer Additional Properties <p>Additional Properties of any type are allowed.</p> root  data_dictionary HEAL Variable Level Metadata Fields additionalPropertiesType: object <p>Generated using json-schema-for-humans on 2023-07-03 at 09:08:41 -0500</p>"},{"location":"vlmd/start/","title":"<code>Start</code> from a template","text":"<p>Some folks may prefer to create their HEAL data dictionary from scratch. To support this, we have created a utility that creates either a json or csv template. 
</p> <p>Warning</p> <p>Currently, the command is <code>template</code> but will change to <code>start</code> to be consistent with the verb subcommand vocabulary.</p>"},{"location":"vlmd/start/#csv-template","title":"<code>csv</code> template","text":"<p>The HEAL Data Utilities can also input a <code>csv</code> HEAL data dictionary either from a manually filled out template or  as an additional step after further annotation (e.g., from the <code>csv</code> HEAL data dictionary output of the other file formats).</p> <p>To create a template <code>csv</code> version with 10 fields (variables):</p> Command line interface (CLI)Python <pre><code>vlmd template myhealdd.csv --numfields 10\n</code></pre> <pre><code>from healdata_utils import write_vlmd_template\n\nwrite_vlmd_template(tmpdir.joinpath(\"heal.csv\"),numfields=10)\n</code></pre> <p>Click here to download an example of a filled out csv HEAL data dictionary template</p>"},{"location":"vlmd/start/#json-template","title":"<code>json</code> template","text":"<p>While the <code>csv</code> HEAL data dictionary provides a tabular format for HEAL-compliant data dictionaries, ultimately,  these csv data dictionary files are converted to a json file (the most common format to store and exchange data within web applications such as the HEAL Data Platform). </p> <p>Another advantage of <code>json</code> HEAL data dictionaries is that one can specify metadata describing the data dictionary as a whole (e.g., the <code>description</code> and <code>title</code>). 
</p> <p>To create a template <code>json</code> version with 10 fields (variables):</p> Command line interface (CLI)Python <pre><code>vlmd template myhealdd.json --numfields 10\n</code></pre> <pre><code>from healdata_utils import write_vlmd_template\n\nwrite_vlmd_template(tmpdir.joinpath(\"heal.json\"),numfields=10)\n</code></pre> <p>Click here to download an example of filled out json HEAL data dictionary template</p>"},{"location":"vlmd/validate/","title":"<code>Validate</code> Check (validate) an existing HEAL data dictionary file","text":"<p>Will indicate if the data dictionary complies with the HEAL specifications.</p> Command line interface (CLI)Python <pre><code>vlmd validate data/myhealcsvdd.csv\n\nvlmd validate data/myhealjsondd.json\n</code></pre> <pre><code>from healdata_utils import validate_vlmd_csv,validate_vlmd_json\n\nvalidate_vlmd_csv(\"data/myhealcsvdd.csv\")\n\nvalidate_vlmd_json(\"data/myhealjsondd.json\")\n</code></pre>"}]}
\ No newline at end of file
+{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"HEAL Data Utilities","text":"<p>The HEAL Data Utilities python package provides data packaging tools for the HEAL Data Ecosystem to facilitate data discovery, sharing, and harmonization on the HEAL Platform.</p> <p>Currently, the focus of this repository is generating standardized variable level metadata (VLMD) in the form of data dictionaries. See the quick start section to get started without installing any of the prerequisites. (Click here for the Variable-level Metadata documentation section).</p> <p>However, in the future, this will be expanded for all HEAL-specific data packaging functions (e.g., study- and file-level metadata and data).</p>"},{"location":"#quick-start","title":"Quick start","text":"<p>Note</p> <p>If using the quick start option, no prerequisites are required.</p> <p>Double click on the <code>vlmd</code> (or <code>vlmd.exe</code>) executable or run the <code>vlmd</code> executable without any arguments to quickly start using this tool. This \"quick start\" will take walk you through step by step by prompting you of the various options.</p> <p>Important</p> <p>Stand alone applications for different operating systems are available here. These allow you to run the <code>vlmd</code> tool without needing to install anything else. Just (1) download, (2) unzip, and (3) double click on the <code>vlmd</code> application icon.</p>"},{"location":"#prerequisites","title":"Prerequisites","text":""},{"location":"#python","title":"Python","text":"<p>While the HEAL Data Utilities should be compatible with most versions of Python, you can download the latest version of Python here and install it on your local computer. 
We recommend installing Python version 3.10.</p>"},{"location":"#installation","title":"Installation","text":"<p>To install the latest official release of healdata-utils, from your computer's command prompt, run:</p> <p><code>pip install healdata-utils</code></p> <p>OR for the most up-to-date unreleased version run: </p> <p><code>pip install git+https://github.com/norc-heal/healdata-utils.git</code></p> <p>Note</p> <p>Installing the unreleased version requires having <code>git</code> software installed.</p>"},{"location":"vlmd/","title":"Variable-level Metadata (Data Dictionaries)","text":""},{"location":"vlmd/#motivation","title":"Motivation","text":"<p>Variable level metadata (VLMD), in the form of standardized data dictionaries, provides an exciting opportunity:</p> <ul> <li>a way to search, understand, and compare datasets before (potentially sensitive) data is shared. </li> </ul> <p>For an example of this searchability in the context of study level metadata, see the platform's discovery page</p> <ul> <li> <p>When data is available, VLMD provides a way to validate the data as well.</p> </li> <li> <p>Supports HEAL projects and goals such as the common data elements program</p> </li> </ul>"},{"location":"vlmd/#functions","title":"Functions","text":"<p><code>extract</code>: Extract the variable level metadata from an existing file with a specific   type/format</p> <p><code>start</code>: Start a data dictionary from an empty template</p> <p><code>validate</code>: Check (validate) an existing HEAL data dictionary file to see if it follows the HEAL specifications after filling out a template or further annotation after extracting from a different format.</p> <p>Typical workflows for creating a HEAL-compliant data dictionary include:</p> <ol> <li> <p>Create your data dictionary</p> <p>(a) Run the <code>vlmd extract</code> command (or <code>convert_to_vlmd</code> if in python) to generate a HEAL-compliant data dictionary via your desired input format </p> <p>(b) Run 
the <code>vlmd template</code> command to start from an empty template.</p> </li> <li> <p>Add/annotate with additional information in your preferred HEAL data dictionary format (either <code>json</code> or <code>csv</code>).</p> <ul> <li>To further annotate and use the data dictionary, see the variable-level metadata field property information below:<ul> <li><code>csv</code> data dictionary</li> <li><code>json</code> data dictionary</li> </ul> </li> </ul> </li> <li> <p>Run the <code>vlmd validate</code> command with your HEAL data dictionary as the input to validate.</p> </li> <li> <p>Repeat (2) and (3) until you are ready to submit. Please note, currently only <code>name</code> and <code>description</code> are required.</p> </li> </ol>"},{"location":"vlmd/#definitions","title":"Definitions","text":"<p>Important</p> <p>The main difference* between the CSV and JSON definitions lies in the way the data dictionaries are structured and the additional metadata included in the JSON data dictionary.</p> <p>The CSV data dictionary is a plain tabular representation with no additional metadata, while the JSON data dictionary includes fields along with additional metadata in the form of a root description and title.</p> <ul> <li>for field-specific differences, see the schemas in the documentation. </li> </ul> <p>For more information on variable-level metadata properties (fields), see the <code>csv</code> field specification and <code>json</code> data dictionary specification. </p>"},{"location":"vlmd/extract/","title":"<code>Extract</code> VLMD from another data type and format","text":"<p>The healdata-utils variable-level metadata (vlmd) tool accepts a variety of input file types and extracts HEAL-compliant data dictionaries (JSON and CSV formats). 
Additionally, exported validation (i.e., \"error\") reports tell the user a) whether the exported data dictionary is valid according to HEAL specifications and b) how to modify the data dictionary to make it HEAL-compliant.</p> <p>Warning</p> <p>Currently the python subcommand is <code>convert</code> but will be changed to <code>extract_to_vlmd</code> to be consistent with the CLI. <code>extract</code> was chosen to better reflect the functionality.</p> Command Line Interface (CLI)Python <pre><code>vlmd extract --inputtype spss myproject/myfile.sav\n</code></pre> <p>Note</p> <p>To continue, it's recommended to go to the input types and formats. Also, for more details on the different flags/options, run <code>vlmd --help</code></p> <pre><code>from healdata_utils import convert_to_vlmd\n\nconvert_to_vlmd(input_filepath=\"myproject/myfile.sav\",inputtype=\"spss\")\n</code></pre> <p>Note</p> <p>To continue, it's recommended to go to the input types and formats. For a complete set of options with <code>convert_to_vlmd</code> see the docstring (if in a notebook, one can enter <code>convert_to_vlmd?</code>)</p>"},{"location":"vlmd/extract/#input-types-and-formats","title":"Input Types and Formats","text":"<p>This section provides the specific syntax for each of the supported input types for generating HEAL-compliant data dictionaries. Additional instructions on how to obtain the necessary input files/software are also provided. 
</p> <p>Note</p> <p>To further annotate your outputted data dictionaries, see the variable-level metadata field properties (with examples) for either the <code>csv data dictionary</code> click here or the <code>json data dictionary</code> click here.</p> <p>Extract variable level metadata from your data:</p> <ul> <li>CSV datasets</li> <li>SPSS datasets</li> <li>SAS datasets</li> <li>Stata datasets</li> <li>REDCap data dictionary</li> <li>Frictionless Table Schema</li> <li>Excel dataset</li> </ul>"},{"location":"vlmd/extract/#output","title":"Output","text":"<p>Both the python and command line routes will result in a JSON and CSV version of the HEAL data dictionary in the output folder along with the validation reports in the <code>errors</code> folder. See below:</p> <ul> <li><code>errors/heal-csv-errors.json</code>: outputted validation report for table in csv file against frictionless schema</li> </ul> <p>If valid, this file will contain: <pre><code>{\n\"valid\": true,\n\"errors\": []\n}\n</code></pre> - <code>errors/heal-json-errors.json</code>:  outputted jsonschema validation report.</p> <ul> <li>If valid, this file will contain: <pre><code>{\n\"valid\": true,\n\"errors\": []\n}\n</code></pre></li> </ul> <p>If no <code>outputdir</code> specified, the resulting HEAL-compliant data dictionaries will be named:</p> <ul> <li><code>heal-csvtemplate-data-dictionary.csv</code>: This is the CSV data dictionary</li> <li><code>heal-jsontemplate-data-dictionary.json</code>: This is the JSON version of the data dictionary</li> </ul>"},{"location":"vlmd/extract/csvdata/","title":"<code>csv</code> Datasets","text":"<p>CSV (comma-separated values) is the main open tabular data format for storage and exchange. It is easy to create and understand using basic text editors in addition to popular spreadsheet software like Google Sheets and Excel. 
Importantly, CSVs are simple and can be easily integrated into web applications and just about any software.</p> <p>Currently, the HEAL Data Utilities <code>vlmd</code> function can generate a minimal, HEAL-compliant data dictionary by inferring <code>name</code>, <code>type</code>, and <code>enum</code> (i.e., possible values). After this minimal data dictionary is generated, the researcher can further annotate it with fields' <code>description</code> and other optional properties in either the HEAL-compliant <code>csv</code>- or <code>json</code>-formatted data dictionary (see the HEAL data dictionary template sections below for more information).</p>"},{"location":"vlmd/extract/exceldata/","title":"Excel (xlsx) dataset","text":"<p>Excel workbooks contain tabular data tables across named worksheets.</p> <p>This vlmd extraction tool provides the ability to extract vlmd from all of these worksheets either as a combined data dictionary or as multiple data dictionaries.</p>"},{"location":"vlmd/extract/exceldata/#run-the-vlmd-command","title":"Run the <code>vlmd</code> command","text":"CLIPython <pre><code>vlmd extract --inputtype excel-data myexcelfile.xlsx\n</code></pre>"},{"location":"vlmd/extract/exceldata/#to-output-multiple-sheets-as-separate-data-dictionaries","title":"To output multiple sheets as separate data dictionaries","text":"<pre><code>from healdata_utils import convert_to_vlmd\n\nconvert_to_vlmd(input_filepath=\"myexcelfile.xlsx\",inputtype=\"excel-data\")\n</code></pre>"},{"location":"vlmd/extract/exceldata/#to-extract-multiple-sheets-as-one-data-dictionary","title":"To extract multiple sheets as one data dictionary","text":"<p>Note</p> <p>Be careful when using <code>multiple_data_dicts=False</code>. In most instances, one sheet should correspond to one separate data table and thus have one corresponding data dictionary.  </p> <p>Note, this combines (i.e., concatenates) all data tables and then infers fields. 
This use case applies when sheets are viewed as \"chunks\" of one resource/dataset. </p> <pre><code>from healdata_utils import convert_to_vlmd\n\nconvert_to_vlmd(\n    input_filepath=\"myexcelfile.xlsx\",\n    inputtype=\"excel-data\",\n    multiple_data_dicts=False\n    )\n</code></pre>"},{"location":"vlmd/extract/exceldata/#to-extract-a-subset-of-sheets-as-one-data-dictionary","title":"To extract a subset of sheets as one data dictionary","text":"<pre><code>from healdata_utils import convert_to_vlmd\n\nconvert_to_vlmd(\n    input_filepath=\"myexcelfile.xlsx\",\n    inputtype=\"excel-data\",\n    multiple_data_dicts=False,\n    sheet_name=[\"mysheet1\",\"mysheet2\"]\n    )\n</code></pre>"},{"location":"vlmd/extract/frictionlessschema/","title":"Frictionless Table Schema","text":"<p>While the vlmd specifications are designed (and still being developed) to support interoperability with the HEAL platform, minor naming translations may be needed. This function supports these translations (e.g., frictionless <code>fields</code> name --&gt; HEAL <code>data_dictionary</code>).</p> <p>Note, this conversion supports either <code>yaml</code> or <code>json</code> format (currently only the <code>json</code> format is tested, but yaml should also work). 
</p>"},{"location":"vlmd/extract/frictionlessschema/#creating-a-frictionless-table-schema","title":"Creating a frictionless table schema","text":"<p>Below are the official frictionless table schema specifications, which overlap to a high degree with the HEAL variable level metadata specifications.</p> <p>See here for the frictionless table schema specs</p>"},{"location":"vlmd/extract/frictionlessschema/#run-the-vlmd-command","title":"Run the <code>vlmd</code> command","text":"<pre><code>vlmd extract --inputtype frictionless data/frictionless_dataset1.frictionless.schema.json\n</code></pre>"},{"location":"vlmd/extract/redcapcsv/","title":"REDCap: Data Dictionary CSV Export","text":"<p>For users collecting data in a REDCap data management system, HEAL-compliant data dictionaries can be generated directly from REDCap exports. </p> <p>The REDCap data dictionary export provides variable-level metadata in a standardized, tabular format and is generally easy to export. The HEAL Data Utilities leverage this user experience and standardized format to enable HEAL researchers to generate a HEAL-compliant data dictionary. </p>"},{"location":"vlmd/extract/redcapcsv/#export-your-redcap-data-dictionary","title":"Export your REDCap data dictionary","text":"<p>To download a REDCap CSV export, do the following*: </p> <ol> <li>After logging in to your REDCap project page, locate the <code>Data dictionary</code> page. 
A link to this page may be available on the project side bar (see image below) or in the <code>Project Setup tab</code> at the top of your page.</li> </ol> <p></p> <ol> <li>After arriving at the <code>Data dictionary</code> page, click on <code>Download the current data dictionary</code> to export the dictionary (see below).</li> </ol> <p></p> <p>*there may be slight differences depending on your specific REDCap instance and version</p>"},{"location":"vlmd/extract/redcapcsv/#run-the-vlmd-command","title":"Run the <code>vlmd</code> command","text":"<pre><code>vlmd extract --inputtype redcap input/example_redcap_demo.redcap.csv </code></pre>"},{"location":"vlmd/extract/sas/","title":"SAS <code>sas7bdat</code> (and <code>sas7bcat</code>) files","text":"<p>To accommodate SAS users, the HEAL Data Utilities supports the binary <code>sas7bdat</code> file format, which contains the actual data values (observations/records). This file also includes variable metadata (variable <code>names</code> and variable labels/ <code>descriptions</code>).</p> <p>The HEAL Data Utilities also provides the option to include a catalog file \u2013 <code>sas7bcat</code> format - with the <code>sas7bdat</code>.  A <code>sas7bcat</code> file contains variable value labels, or <code>encodings</code>, that can be mapped onto the corresponding data from a <code>sas7bdat</code> file.</p>"},{"location":"vlmd/extract/sas/#creating-a-sas7bdat-and-a-sas7bcat-file","title":"Creating a <code>sas7bdat</code> and a <code>sas7bcat</code> file","text":"<p>Many SAS users build formats and labels into their data processing and analysis scripts. In this section, we provide syntax that can be easily copy-pasted into these existing workflows to create <code>sas7bdat</code> and <code>sas7bcat</code> files to input into the <code>vlmd</code> tool. </p> <p>This script template can be run separately or inserted directly at the end of a SAS user's workflow. 
</p> <p>Note</p> <p>If inserted directly, remember to delete the lines with <code>%INCLUDE</code>.</p> Template template.sas<pre><code>/*1. Read in data file without value labels and run full code. \n        Note: The most important pieces to run here are the PROC FORMAT statement(s) and any data steps \n        that assign formats and variable labels which are needed for the data dictionary. You may have defined variable labels and values in separate scripts for different analyses. In order to capture all your defined variable labels and values across scripts, you will need an %INCLUDE statement for each SAS script that defines unique variable labels or value labels.*/\n\n%INCLUDE \"&lt;INSERT SAS SCRIPT FILE PATH HERE&gt;\"; /* THIS WILL RUN A SEPARATE SAS SCRIPT*/\n%INCLUDE \"&lt;INSERT SAS SCRIPT FILE PATH HERE&gt;\"; /* THIS WILL RUN A SECOND SEPARATE SAS SCRIPT*/ \n\n/*2. Output the format catalog (sas7bcat) */\n/*2a. If you do not have an out directory, assign one to output the SAS catalog and data file. If you already have an out directory assigned, skip this step and replace \u201cout\u201d with your out directory libname in the flow*/\n\nlibname out \"&lt;INSERT THE DESIRED LOCATION (FILE PATH) TO YOUR SAS7BCAT AND SAS7BDAT FILES HERE&gt;\";\n\n/*2b. Output the format catalog.\n        Note: The format catalog is automatically stored in work.formats. This step copies the format file to the \n        out directory as a sas7bcat file.*/\nproc catalog cat=work.FORMATS;\n    copy out=out.FORMATS;\n    run;\n\n/*3. Output the data file (sas7bdat) */\ndata out.yourdata;\n    set &lt;INSERT THE NAME OF YOUR FINAL SAS DATASET HERE&gt;;\n    run;\n</code></pre> <p>The SAS syntax below is an example of how to use the template within your SAS workflow.</p> <p>The sample script below creates all of our variable and value labels. 
Your workflow may include multiple SAS scripts with multiple format statements and may include analyses and other PROC calls for data exploration, but for demonstration purposes, this example only uses one script and focuses on defining the variable and value labels.</p> Example my_existing_sas_workflow.sas<pre><code>/*1. Read in input data */\nproc import datafile=\"myprojectfolder/input/mydata.csv\"\n    out=raw\n    dbms=csv replace;\n    getnames=yes;\nrun;\n\n/*2. Set up proc format and apply formats and variable labels in data step */\n/*Create encodings (value labels); each VALUE statement ends with a semicolon*/\nproc format;\n    VALUE YESNO\n    0       =\"No\"\n    1       =\"Yes\";\n\n    VALUE PUBLIC\n    1='State mental health authority (SMHA)'\n    2='Other state government agency or department'\n    3='Regional/district authority or county, local, or municipal government'\n    4='Tribal government'\n    5='Indian Health Service'\n    6='Department of Veterans Affairs'\n    7='Other';\n\n    VALUE FOCUS\n    1='Mental health treatment'\n    2='Substance abuse treatment'\n    3='Mix of mental health and substance abuse treatment (neither is primary)'\n    4='General health care'\n    5='Other service focus';\n\n**Apply formats to dataset;\ndata processed;\n    set raw;\n\n    /*Assign formats*/\n    format YOUNGADULTS TREATPSYCHOTHRPY TREATTRAUMATHRPY YESNO. FOCUS FOCUS. PUBLIC PUBLIC.;\n    /*Add variable labels*/\n    label YOUNGADULTS=\"Accepts young adults (aged 18-25 years old) for Tx\"\n            TREATPSYCHOTHRPY=\"Facility offers individual psychotherapy\"\n            TREATTRAUMATHRPY=\"Facility offers trauma therapy\"\n            FOCUS=\"Primary treatment focus of facility\"\n            PUBLIC=\"Public agency or department that operates facility\";\nrun;\n</code></pre> <p>This second script, called <code>my_output.sas</code>, is the filled-out template. Note the <code>%INCLUDE</code> statement that calls <code>my_existing_sas_workflow.sas</code>.</p> my_output.sas<pre><code>/*1. 
Read in data file without value labels and run full code. \n        Note: The most important pieces to run here are the PROC FORMAT statement(s) and any data steps \n        that assign formats and variable labels which are needed for the data dictionary. You may have defined variable labels and values in separate scripts for different analyses. In order to capture all your defined variable labels and values across scripts, you will need an %INCLUDE statement for each SAS script that defines unique variable labels or value labels.*/\n\n%INCLUDE \"myprojectfolder/my_existing_sas_workflow.sas\"; /* THIS WILL RUN A SEPARATE SAS SCRIPT*/\n\n/*2. Output the format catalog (sas7bcat) */\n/*2a. If you do not have an out directory, assign one to output the SAS catalog and data file.*/\nlibname out \"myprojectfolder/output\";\n\n/*2b. Output the format catalog.\n        Note: The format catalog is automatically stored in work.formats. This step copies the format file to the \n        out directory as a sas7bcat file.*/\nproc catalog cat=work.FORMATS;\n    copy out=out.FORMATS;\n    run;\n\n/*3. Output the data file (sas7bdat) to your output folder*/\ndata out.yourdata;\n    set processed;\n    run;\n</code></pre>"},{"location":"vlmd/extract/sas/#run-the-vlmd-command","title":"Run the <code>vlmd</code> command","text":"<p>After creating the necessary <code>sas7bdat</code> and <code>sas7bcat</code> files, you can then run the <code>vlmd</code> command. The tool will automatically detect the sas7bcat file if it is located in the same directory as your data file. 
If not detected, the command will run without the sas7bcat catalog file and the <code>encodings</code> (i.e., value labels) will not be extracted from the catalog file.</p> <pre><code>vlmd extract --inputtype sas input/data.sas7bdat </code></pre>"},{"location":"vlmd/extract/spss/","title":"SPSS <code>.sav</code> files","text":"<p>For SPSS users, the HEAL Data Utilities generates HEAL-compliant data dictionaries from SPSS's default file format for storing datasets: a <code>SAV</code> file. It stores not only the data itself but also metadata such as variable names, variable labels, types, and value labels. The HEAL Data Utilities extracts these data and metadata to create HEAL-compliant data dictionaries.</p>"},{"location":"vlmd/extract/spss/#run-the-vlmd-command","title":"Run the <code>vlmd</code> command","text":"<pre><code>vlmd extract --inputtype spss data/example_pyreadstat_output.sav </code></pre>"},{"location":"vlmd/extract/stata/","title":"Stata <code>.dta</code> files","text":"<p>For Stata users, the HEAL Data Utilities generates HEAL-compliant data dictionaries through Stata's default file format: a <code>DTA</code> file. 
<code>DTA</code> files store not only the data itself but also metadata such as variable names, variable labels, types, and value labels.</p>"},{"location":"vlmd/extract/stata/#run-the-vlmd-command","title":"Run the <code>vlmd</code> command","text":"<pre><code>vlmd extract --inputtype stata data/mydatafile.dta </code></pre>"},{"location":"vlmd/schemas/","title":"HEAL data dictionary schemas","text":"<p>Click on each data dictionary schema below to view information about each format's data dictionary properties (such as a description, examples, etc).</p> <p>CSV fields</p> <p>JSON data dictionary</p> <p>Note</p> <p><code>enum</code> type means that a field can only be one of a certain set of possible values.</p>"},{"location":"vlmd/schemas/csv-fields/","title":"Tabular (CSV) data dictionary","text":"HEAL Variable Level Metadata Fields HEAL Variable Level Metadata Fields Type: object <p>Variable level metadata individual fields integrated into the variable level metadata object within the HEAL platform metadata service.</p> <p>Note, only <code>name</code> and <code>description</code> are required. Listed at the end of the description are suggested \"priority\" levels in brackets (e.g., []): 1. [Required]: Needs to be filled out to be valid. 2. [Highly recommended]: Greatly help using the data dictionary but not required. 3. [Optional, if applicable]: May only be applicable to certain fields. 4. [Autopopulated, if not filled]: These fields are intended to be autopopulated from other fields but can be filled out if desired. 5. [Experimental]: These fields are not currently used but are in development. 
module root  moduleType: string <p>The section, form, survey instrument, set of measures or other broad category used to group variables.</p> Examples: <pre>\"Demographics\"\n</pre> <pre>\"PROMIS\"\n</pre> <pre>\"Substance use\"\n</pre> <pre>\"Medical History\"\n</pre> <pre>\"Sleep questions\"\n</pre> <pre>\"Physical activity\"\n</pre> name Required root  nameType: string <p>The name of a variable (i.e., field) as it appears in the data. </p> <p>[Required]</p> title root  titleType: string <p>The human-readable title or label of the variable. </p> <p>[Highly recommended]</p> Example: <pre>\"My Variable (for name of my_variable)\"\n</pre> description Required root  descriptionType: string <p>An extended description of the variable. This could be the definition of a variable or the question text (e.g., if a survey). </p> <p>[Required]</p> Examples: <pre>\"Definition\"\n</pre> <pre>\"Question text (if a survey)\"\n</pre> type root  typeType: enum (of string) <p>A classification or category of a particular data element or property expected or allowed in the dataset.</p> <ul> <li><code>number</code> (A numeric value with optional decimal places. (e.g., 3.14))</li> <li><code>integer</code> (A whole number without decimal places. (e.g., 42))</li> <li><code>string</code> (A sequence of characters. (e.g., \\\"test\\\"))</li> <li><code>any</code> (Any type of data is allowed. (e.g., true))</li> <li><code>boolean</code> (A binary value representing true or false. (e.g., true))</li> <li><code>date</code> (A specific calendar date. (e.g., \\\"2023-05-25\\\"))</li> <li><code>datetime</code> (A specific date and time, including timezone information. (e.g., \\\"2023-05-25T10:30:00Z\\\"))</li> <li><code>time</code> (A specific time of day. (e.g., \\\"10:30:00\\\"))</li> <li><code>year</code> (A specific year. (e.g., 2023)</li> <li><code>yearmonth</code> (A specific year and month. (e.g., \\\"2023-05\\\"))</li> <li><code>duration</code> (A length of time. 
(e.g., \\\"PT1H\\\")</li> <li><code>geopoint</code> (A pair of latitude and longitude coordinates. (e.g., [51.5074, -0.1278]))</li> </ul> Must be one of: <ul><li>\"number\"</li><li>\"integer\"</li><li>\"string\"</li><li>\"any\"</li><li>\"boolean\"</li><li>\"date\"</li><li>\"datetime\"</li><li>\"time\"</li><li>\"year\"</li><li>\"yearmonth\"</li><li>\"duration\"</li><li>\"geopoint\"</li></ul> format root  format <p>A format taken from one of the frictionless specification schemas. For example, for tabular data, there is the Table Schema specification</p> <p>Each format is dependent on the <code>type</code> specified. For example: If <code>type</code> is \"string\", then see the String formats. If <code>type</code> is one of the date-like formats, then see Date formats.</p> Any of <ul><li> String Format </li><li> Date Format </li><li> Geopoint Format </li><li> geojson </li></ul> root  format anyOf String FormatType: enum (of string) Must be one of: <ul><li>\"uri\"</li><li>\"email\"</li><li>\"binary\"</li><li>\"uuid\"</li></ul> root  format anyOf Date FormatType: object <p>A format for a date variable (<code>date</code>,<code>time</code>,<code>datetime</code>). \\n\\t* default: An ISO8601 format string. \\n\\t* any: Any parsable representation of a date/time/datetime. The implementing library can attempt to parse the datetime via a range of strategies. 
\\n\\t* {PATTERN}: The value can be parsed according to <code>{PATTERN}</code>, which <code>MUST</code> follow the date formatting syntax of C / Python strftime.</p> <p>\\nExamples:</p> <p><code>%Y-%m-%d</code> (for date, e.g., 2023-05-25) <code>%Y%m%d</code> (for date without dashes, e.g., 20230525) <code>%Y-%m-%dT%H:%M:%S</code> (for datetime, e.g., 2023-05-25T10:30:45) <code>%Y-%m-%dT%H:%M:%SZ</code> (for datetime with UTC timezone, e.g., 2023-05-25T10:30:45Z) <code>%Y-%m-%dT%H:%M:%S%z</code> (for datetime with timezone offset, e.g., 2023-05-25T10:30:45+0300) <code>%Y-%m-%dT%H:%M</code> (for datetime without seconds, e.g., 2023-05-25T10:30) <code>%Y-%m-%dT%H</code> (for datetime without minutes and seconds, e.g., 2023-05-25T10) <code>%H:%M:%S</code> (for time, e.g., 10:30:45) <code>%H:%M:%SZ</code> (for time with UTC timezone, e.g., 10:30:45Z) <code>%H:%M:%S%z</code> (for time with timezone offset, e.g., 10:30:45+0300)</p> root  format anyOf Geopoint Format <p>The two types of formats for <code>geopoint</code> (describing a geographic point).</p> One of <ul><li> Option 1 </li><li> Option 2 </li></ul> root  format anyOf Geopoint Format oneOf item 0Type: array <p>A JSON array or a string parsable as a JSON array where each item is a number with the first as the latitude and the second as longitude. </p> root  format anyOf Geopoint Format oneOf item 1Type: object <p>Contains latitude and longitude with two keys (\"lat\" and \"long\") with number items mapped to each key.</p> root  format anyOf geojsonType: enum (of string) <p>The JSON object according to the geojson spec.</p> Must be one of: <ul><li>\"topojson\"</li><li>\"default\"</li></ul> constraints.maxLength root  constraints.maxLengthType: integer <p>Indicates the maximum length of an iterable (e.g., array, string, or object). 
For example, if 'Hello World' is the longest value of a categorical variable, this would be a maxLength of 11.</p> <p>[Optional,if applicable]</p> constraints.enum root  constraints.enumType: string <p>Constrains possible values to a set of values.</p> <p>[Optional,if applicable]</p> Must match regular expression: <code>^(?:[^|]+\\||[^|]*)(?:[^|]*\\|)*[^|]*$</code> constraints.pattern root  constraints.patternType: string <p>A regular expression pattern the data MUST conform to.</p> <p>[Optional,if applicable]</p> constraints.maximum root  constraints.maximumType: integer <p>Specifies the maximum value of a field (e.g., maximum -- or most recent -- date, maximum integer etc). Note, this is different than the maxLength property.</p> <p>[Optional,if applicable]</p> encodings root  encodingsType: string <p>Variable value encodings provide a way to further annotate any value within any variable type, making values easier to understand. </p> <p>Many analytic software programs (e.g., SPSS, Stata, and SAS) use numerical encodings and some algorithms only support numerical values. Encodings (and mappings) allow categorical values to be stored as numerical values.</p> <p>Additionally, as another use case, this field provides a way to store categoricals that are stored as \"short\" labels (such as abbreviations).</p> <p>[Optional,if applicable]</p> Must match regular expression: <code>^(?:.*?=.*?(?:\\||$))+$</code> Examples: <pre>\"0=No|1=Yes\"\n</pre> <pre>\"HW=Hello world|GBW=Good bye world|HM=Hi,Mike\"\n</pre> ordered root  orderedType: boolean <p>Indicates whether a categorical variable is ordered. 
This variable is relevant for variables that have an ordered relationship but not necessarily a numerical relationship (e.g., Strongly disagree &lt; Disagree &lt; Neutral &lt; Agree).</p> <p>[Optional,if applicable]</p> missingValues root  missingValuesType: string <p>A list of missing values specific to a variable.</p> <p>[Optional, if applicable]</p> Must match regular expression: <code>^(?:[^|]+\\||[^|]*)(?:[^|]*\\|)*[^|]*$</code> trueValues root  trueValuesType: string <p>For boolean (true) variable (as defined in type field), this field allows a physical string representation to be cast as true (increasing readability of the field). It can include one or more values.</p> <p>[Optional, if applicable]</p> Must match regular expression: <code>^(?:[^|]+\\||[^|]*)(?:[^|]*\\|)*[^|]*$</code> Examples: <pre>\"Required|REQUIRED\"\n</pre> <pre>\"required|Yes|Y|Checked\"\n</pre> <pre>\"Checked\"\n</pre> <pre>\"Required\"\n</pre> falseValues root  falseValuesType: string <p>For boolean (false) variable (as defined in type field), this field allows a physical string representation to be cast as false (increasing readability of the field) that is not a standard false value. It can include one or more values.</p> Must match regular expression: <code>^(?:[^|]+\\||[^|]*)(?:[^|]*\\|)*[^|]*$</code> repo_link root  repo_linkType: string <p>A link to the variable as it exists on the home repository, if applicable</p> cde_id.source root  cde_id.sourceType: string cde_id.id root  cde_id.idType: string ontology_id.relation root  ontology_id.relationType: string ontology_id.source root  ontology_id.sourceType: string ontology_id.id root  ontology_id.idType: string standardsMappings.type root  standardsMappings.typeType: string <p>The type of mapping linked to a published set of standard variables such as the NIH Common Data Elements program. 
[Autopopulated, if not filled]</p> Examples: <pre>\"cde\"\n</pre> <pre>\"ontology\"\n</pre> <pre>\"reference_list\"\n</pre> standardsMappings.label root  standardsMappings.labelType: string <p>A free text label of a mapping indicating a mapping(s) to a published set of standard variables such as the NIH Common Data Elements program.</p> <p>[Autopopulated, if not filled]</p> Examples: <pre>\"substance use\"\n</pre> <pre>\"chemical compound\"\n</pre> <pre>\"promis\"\n</pre> standardsMappings.url root  standardsMappings.urlType: stringFormat: uri <p>The url that links out to the published, standardized mapping.</p> <p>[Autopopulated, if not filled]</p> Example: <pre>\"https://cde.nlm.nih.gov/deView?tinyId=XyuSGdTTI\"\n</pre> standardsMappings.source root  standardsMappings.sourceType: string <p>The source of the standardized variable.</p> Example: <pre>\"TBD (will have controlled vocabulary)\"\n</pre> standardsMappings.id root  standardsMappings.idType: string <p>The id locating the individual mapping within the given source.</p> relatedConcepts.type root  relatedConcepts.typeType: string <p>The type of mapping to a published set of concepts related to the given field such as ontological information (eg., NCI thesaurus, bioportal etc)</p> <p>[Autopopulated, if not filled]</p> relatedConcepts.label root  relatedConcepts.labelType: string <p>A free text label of mapping to a published set of concepts related to the given field such as ontological information (eg., NCI thesaurus, bioportal etc)</p> <p>[Autopopulated, if not filled]</p> relatedConcepts.url root  relatedConcepts.urlType: stringFormat: uri <p>The url that links out to the published, standardized concept.</p> <p>[Autopopulated, if not filled]</p> Example: <pre>\"https://cde.nlm.nih.gov/deView?tinyId=XyuSGdTTI\"\n</pre> relatedConcepts.source root  relatedConcepts.sourceType: string <p>The source of the related concept.</p> <p>[Autopopulated, if not filled]</p> Example: <pre>\"TBD (will have controlled 
vocabulary)\"\n</pre> relatedConcepts.id root  relatedConcepts.idType: string <p>The id locating the individual mapping within the given source.</p> <p>[Autopopulated, if not filled]</p> univarStats.median root  univarStats.medianType: number univarStats.mean root  univarStats.meanType: number univarStats.std root  univarStats.stdType: number univarStats.min root  univarStats.minType: number univarStats.max root  univarStats.maxType: number univarStats.mode root  univarStats.modeType: number univarStats.count root  univarStats.countType: integer <p>Value must be greater or equal to <code>0</code></p> univarStats.twentyFifthPercentile root  univarStats.twentyFifthPercentileType: number univarStats.seventyFifthPercentile root  univarStats.seventyFifthPercentileType: number univarStats.categoricalMarginals.name root  univarStats.categoricalMarginals.nameType: string univarStats.categoricalMarginals.count root  univarStats.categoricalMarginals.countType: integer Additional Properties <p>Additional Properties of any type are allowed.</p> root  additionalPropertiesType: object <p>Generated using json-schema-for-humans on 2023-07-05 at 17:11:06 -0500</p>"},{"location":"vlmd/schemas/json-data-dictionary/","title":"JSON data dictionary","text":"Variable Level Metadata (Data Dictionaries) Variable Level Metadata (Data Dictionaries) Type: object <p>This schema defines the variable level metadata for one data dictionary for a given study. Note that a given study can have multiple data dictionaries.</p> title Required root  titleType: string description root  descriptionType: string data_dictionary Required root  data_dictionaryType: array of object Each item of this array must be: root  data_dictionary HEAL Variable Level Metadata FieldsType: object <p>Variable level metadata individual fields integrated into the variable level metadata object within the HEAL platform metadata service.</p> <p>Note, only <code>name</code> and <code>description</code> are required. 
Listed at the end of the description are suggested \"priority\" levels in brackets (e.g., []): 1. [Required]: Needs to be filled out to be valid. 2. [Highly recommended]: Greatly help using the data dictionary but not required. 3. [Optional, if applicable]: May only be applicable to certain fields. 4. [Autopopulated, if not filled]: These fields are intended to be autopopulated from other fields but can be filled out if desired. 5. [Experimental]: These fields are not currently used but are in development. module root  data_dictionary HEAL Variable Level Metadata Fields moduleType: string <p>The section, form, survey instrument, set of measures or other broad category used to group variables.</p> Examples: <pre>\"Demographics\"\n</pre> <pre>\"PROMIS\"\n</pre> <pre>\"Substance use\"\n</pre> <pre>\"Medical History\"\n</pre> <pre>\"Sleep questions\"\n</pre> <pre>\"Physical activity\"\n</pre> name Required root  data_dictionary HEAL Variable Level Metadata Fields nameType: string <p>The name of a variable (i.e., field) as it appears in the data. </p> <p>[Required]</p> title root  data_dictionary HEAL Variable Level Metadata Fields titleType: string <p>The human-readable title or label of the variable. </p> <p>[Highly recommended]</p> Example: <pre>\"My Variable (for name of my_variable)\"\n</pre> description Required root  data_dictionary HEAL Variable Level Metadata Fields descriptionType: string <p>An extended description of the variable. This could be the definition of a variable or the question text (e.g., if a survey). </p> <p>[Required]</p> Examples: <pre>\"Definition\"\n</pre> <pre>\"Question text (if a survey)\"\n</pre> type root  data_dictionary HEAL Variable Level Metadata Fields typeType: enum (of string) <p>A classification or category of a particular data element or property expected or allowed in the dataset.</p> <ul> <li><code>number</code> (A numeric value with optional decimal places. 
(e.g., 3.14))</li> <li><code>integer</code> (A whole number without decimal places. (e.g., 42))</li> <li><code>string</code> (A sequence of characters. (e.g., \\\"test\\\"))</li> <li><code>any</code> (Any type of data is allowed. (e.g., true))</li> <li><code>boolean</code> (A binary value representing true or false. (e.g., true))</li> <li><code>date</code> (A specific calendar date. (e.g., \\\"2023-05-25\\\"))</li> <li><code>datetime</code> (A specific date and time, including timezone information. (e.g., \\\"2023-05-25T10:30:00Z\\\"))</li> <li><code>time</code> (A specific time of day. (e.g., \\\"10:30:00\\\"))</li> <li><code>year</code> (A specific year. (e.g., 2023)</li> <li><code>yearmonth</code> (A specific year and month. (e.g., \\\"2023-05\\\"))</li> <li><code>duration</code> (A length of time. (e.g., \\\"PT1H\\\")</li> <li><code>geopoint</code> (A pair of latitude and longitude coordinates. (e.g., [51.5074, -0.1278]))</li> </ul> Must be one of: <ul><li>\"number\"</li><li>\"integer\"</li><li>\"string\"</li><li>\"any\"</li><li>\"boolean\"</li><li>\"date\"</li><li>\"datetime\"</li><li>\"time\"</li><li>\"year\"</li><li>\"yearmonth\"</li><li>\"duration\"</li><li>\"geopoint\"</li></ul> format root  data_dictionary HEAL Variable Level Metadata Fields format <p>A format taken from one of the frictionless specification schemas. For example, for tabular data, there is the Table Schema specification</p> <p>Each format is dependent on the <code>type</code> specified. For example: If <code>type</code> is \"string\", then see the String formats. 
If <code>type</code> is one of the date-like formats, then see Date formats.</p> Any of <ul><li> String Format </li><li> Date Format </li><li> Geopoint Format </li><li> geojson </li></ul> root  data_dictionary HEAL Variable Level Metadata Fields format anyOf String FormatType: enum (of string) Must be one of: <ul><li>\"uri\"</li><li>\"email\"</li><li>\"binary\"</li><li>\"uuid\"</li></ul> root  data_dictionary HEAL Variable Level Metadata Fields format anyOf Date FormatType: object <p>A format for a date variable (<code>date</code>,<code>time</code>,<code>datetime</code>). \\n\\t* default: An ISO8601 format string. \\n\\t* any: Any parsable representation of a date/time/datetime. The implementing library can attempt to parse the datetime via a range of strategies. \\n\\t* {PATTERN}: The value can be parsed according to <code>{PATTERN}</code>, which <code>MUST</code> follow the date formatting syntax of C / Python strftime.</p> <p>\\nExamples:</p> <p><code>%Y-%m-%d</code> (for date, e.g., 2023-05-25) <code>%Y%m%d</code> (for date without dashes, e.g., 20230525) <code>%Y-%m-%dT%H:%M:%S</code> (for datetime, e.g., 2023-05-25T10:30:45) <code>%Y-%m-%dT%H:%M:%SZ</code> (for datetime with UTC timezone, e.g., 2023-05-25T10:30:45Z) <code>%Y-%m-%dT%H:%M:%S%z</code> (for datetime with timezone offset, e.g., 2023-05-25T10:30:45+0300) <code>%Y-%m-%dT%H:%M</code> (for datetime without seconds, e.g., 2023-05-25T10:30) <code>%Y-%m-%dT%H</code> (for datetime without minutes and seconds, e.g., 2023-05-25T10) <code>%H:%M:%S</code> (for time, e.g., 10:30:45) <code>%H:%M:%SZ</code> (for time with UTC timezone, e.g., 10:30:45Z) <code>%H:%M:%S%z</code> (for time with timezone offset, e.g., 10:30:45+0300)</p> root  data_dictionary HEAL Variable Level Metadata Fields format anyOf Geopoint Format <p>The two types of formats for <code>geopoint</code> (describing a geographic point).</p> One of <ul><li> Option 1 </li><li> Option 2 </li></ul> root  data_dictionary HEAL Variable Level 
Metadata Fields format anyOf Geopoint Format oneOf item 0Type: array <p>A JSON array or a string parsable as a JSON array where each item is a number with the first as the latitude and the second as longitude. </p> root  data_dictionary HEAL Variable Level Metadata Fields format anyOf Geopoint Format oneOf item 1Type: object <p>Contains latitude and longitude with two keys (\"lat\" and \"long\") with number items mapped to each key.</p> root  data_dictionary HEAL Variable Level Metadata Fields format anyOf geojsonType: enum (of string) <p>The JSON object according to the geojson spec.</p> Must be one of: <ul><li>\"topojson\"</li><li>\"default\"</li></ul> constraints root  data_dictionary HEAL Variable Level Metadata Fields constraintsType: object maxLength root  data_dictionary HEAL Variable Level Metadata Fields constraints maxLengthType: integer <p>Indicates the maximum length of an iterable (e.g., array, string, or object). For example, if 'Hello World' is the longest value of a categorical variable, this would be a maxLength of 11.</p> <p>[Optional,if applicable]</p> enum root  data_dictionary HEAL Variable Level Metadata Fields constraints enumType: array <p>Constrains possible values to a set of values.</p> <p>[Optional,if applicable]</p> pattern root  data_dictionary HEAL Variable Level Metadata Fields constraints patternType: string <p>A regular expression pattern the data MUST conform to.</p> <p>[Optional,if applicable]</p> maximum root  data_dictionary HEAL Variable Level Metadata Fields constraints maximumType: integer <p>Specifies the maximum value of a field (e.g., maximum -- or most recent -- date, maximum integer etc). Note, this is different than the maxLength property.</p> <p>[Optional,if applicable]</p> encodings root  data_dictionary HEAL Variable Level Metadata Fields encodingsType: object <p>Variable value encodings provide a way to further annotate any value within any variable type, making values easier to understand. 
</p> <p>Many analytic software programs (e.g., SPSS, Stata, and SAS) use numerical encodings and some algorithms only support numerical values. Encodings (and mappings) allow categorical values to be stored as numerical values.</p> <p>Additionally, as another use case, this field provides a way to store categoricals that are stored as \"short\" labels (such as abbreviations).</p> <p>[Optional,if applicable]</p> Examples: <pre>{\n\"0\": \"No\",\n\"1\": \"Yes\"\n}\n</pre> <pre>{\n\"HW\": \"Hello world\",\n\"GBW\": \"Good bye world\",\n\"HM\": \"Hi, Mike\"\n}\n</pre> ordered root  data_dictionary HEAL Variable Level Metadata Fields orderedType: boolean <p>Indicates whether a categorical variable is ordered. This variable is relevant for variables that have an ordered relationship but not necessarily a numerical relationship (e.g., Strongly disagree &lt; Disagree &lt; Neutral &lt; Agree).</p> <p>[Optional,if applicable]</p> missingValues root  data_dictionary HEAL Variable Level Metadata Fields missingValuesType: array <p>A list of missing values specific to a variable.</p> <p>[Highly recommended]</p> trueValues root  data_dictionary HEAL Variable Level Metadata Fields trueValuesType: array of string <p>For boolean (true) variable (as defined in type field), this field allows a physical string representation to be cast as true (increasing readability of the field). 
It can include one or more values.</p> <p>[Optional, if applicable]</p> Each item of this array must be: root  data_dictionary HEAL Variable Level Metadata Fields trueValues trueValues itemsType: string Examples: <pre>\"Required\"\n</pre> <pre>\"REQUIRED\"\n</pre> <pre>\"required\"\n</pre> <pre>\"Yes\"\n</pre> <pre>\"Checked\\\"\"\n</pre> falseValues root  data_dictionary HEAL Variable Level Metadata Fields falseValuesType: array <p>For boolean (false) variable (as defined in type field), this field allows a physical string representation to be cast as false (increasing readability of the field) that is not a standard false value. It can include one or more values.</p> repo_link root  data_dictionary HEAL Variable Level Metadata Fields repo_linkType: string <p>A link to the variable as it exists on the home repository, if applicable</p> cde_id root  data_dictionary HEAL Variable Level Metadata Fields cde_idType: array of object <p>[FUTURE WARNING: WILL BE DEPRECATED] Use <code>standardsMapping</code>. The source and id for the NIH Common Data Elements program.</p> Each item of this array must be: root  data_dictionary HEAL Variable Level Metadata Fields cde_id cde_id itemsType: object source root  data_dictionary HEAL Variable Level Metadata Fields cde_id cde_id items sourceType: string id root  data_dictionary HEAL Variable Level Metadata Fields cde_id cde_id items idType: string ontology_id root  data_dictionary HEAL Variable Level Metadata Fields ontology_idType: array of object <p>[FUTURE WARNING: WILL BE DEPRECATED] - Use <code>relatedConcepts</code>. Ontological information for the given variable as indicated by the source, id, and relation to the specified classification. One or more ontology classifications can be specified. 
</p> Each item of this array must be: root  data_dictionary HEAL Variable Level Metadata Fields ontology_id ontology_id itemsType: object relation root  data_dictionary HEAL Variable Level Metadata Fields ontology_id ontology_id items relationType: string source root  data_dictionary HEAL Variable Level Metadata Fields ontology_id ontology_id items sourceType: string id root  data_dictionary HEAL Variable Level Metadata Fields ontology_id ontology_id items idType: string standardsMappings root  data_dictionary HEAL Variable Level Metadata Fields standardsMappingsType: array of object <p>A published set of standard variables such as the NIH Common Data Elements program. [Autopopulated, if not filled]</p> Each item of this array must be: root  data_dictionary HEAL Variable Level Metadata Fields standardsMappings standardsMappings itemsType: object type root  data_dictionary HEAL Variable Level Metadata Fields standardsMappings standardsMappings items typeType: string <p>The type of mapping linked to a published set of standard variables such as the NIH Common Data Elements program. 
[Autopopulated, if not filled]</p> Examples: <pre>\"cde\"\n</pre> <pre>\"ontology\"\n</pre> <pre>\"reference_list\"\n</pre> label root  data_dictionary HEAL Variable Level Metadata Fields standardsMappings standardsMappings items labelType: string <p>A free text label of a mapping indicating a mapping(s) to a published set of standard variables such as the NIH Common Data Elements program.</p> <p>[Autopopulated, if not filled]</p> Examples: <pre>\"substance use\"\n</pre> <pre>\"chemical compound\"\n</pre> <pre>\"promis\"\n</pre> url root  data_dictionary HEAL Variable Level Metadata Fields standardsMappings standardsMappings items urlType: stringFormat: uri <p>The url that links out to the published, standardized mapping.</p> <p>[Autopopulated, if not filled]</p> Example: <pre>\"https://cde.nlm.nih.gov/deView?tinyId=XyuSGdTTI\"\n</pre> source root  data_dictionary HEAL Variable Level Metadata Fields standardsMappings standardsMappings items sourceType: string <p>The source of the standardized variable.</p> Example: <pre>\"TBD (will have controlled vocabulary)\"\n</pre> id root  data_dictionary HEAL Variable Level Metadata Fields standardsMappings standardsMappings items idType: string <p>The id locating the individual mapping within the given source.</p> relatedConcepts root  data_dictionary HEAL Variable Level Metadata Fields relatedConceptsType: array of object <p>Mappings to a published set of concepts related to the given field such as ontological information (eg., NCI thesaurus, bioportal etc) [Autopopulated, if not filled]</p> Each item of this array must be: root  data_dictionary HEAL Variable Level Metadata Fields relatedConcepts relatedConcepts itemsType: object type root  data_dictionary HEAL Variable Level Metadata Fields relatedConcepts relatedConcepts items typeType: string <p>The type of mapping to a published set of concepts related to the given field such as ontological information (eg., NCI thesaurus, bioportal etc)</p> <p>[Autopopulated, if not 
filled]</p> label root  data_dictionary HEAL Variable Level Metadata Fields relatedConcepts relatedConcepts items labelType: string <p>A free text label of mapping to a published set of concepts related to the given field such as ontological information (eg., NCI thesaurus, bioportal etc)</p> <p>[Autopopulated, if not filled]</p> url root  data_dictionary HEAL Variable Level Metadata Fields relatedConcepts relatedConcepts items urlType: stringFormat: uri <p>The url that links out to the published, standardized concept.</p> <p>[Autopopulated, if not filled]</p> Example: <pre>\"https://cde.nlm.nih.gov/deView?tinyId=XyuSGdTTI\"\n</pre> source root  data_dictionary HEAL Variable Level Metadata Fields relatedConcepts relatedConcepts items sourceType: string <p>The source of the related concept.</p> <p>[Autopopulated, if not filled]</p> Example: <pre>\"TBD (will have controlled vocabulary)\"\n</pre> id root  data_dictionary HEAL Variable Level Metadata Fields relatedConcepts relatedConcepts items idType: string <p>The id locating the individual mapping within the given source.</p> <p>[Autopopulated, if not filled]</p> univarStats root  data_dictionary HEAL Variable Level Metadata Fields univarStatsType: object <p>Univariate statistics inferred from the data about the given variable </p> <p>[Experimental]</p> median root  data_dictionary HEAL Variable Level Metadata Fields univarStats medianType: number mean root  data_dictionary HEAL Variable Level Metadata Fields univarStats meanType: number std root  data_dictionary HEAL Variable Level Metadata Fields univarStats stdType: number min root  data_dictionary HEAL Variable Level Metadata Fields univarStats minType: number max root  data_dictionary HEAL Variable Level Metadata Fields univarStats maxType: number mode root  data_dictionary HEAL Variable Level Metadata Fields univarStats modeType: number count root  data_dictionary HEAL Variable Level Metadata Fields univarStats countType: integer <p>Value must be greater or 
equal to <code>0</code></p> twentyFifthPercentile root  data_dictionary HEAL Variable Level Metadata Fields univarStats twentyFifthPercentileType: number seventyFifthPercentile root  data_dictionary HEAL Variable Level Metadata Fields univarStats seventyFifthPercentileType: number categoricalMarginals root  data_dictionary HEAL Variable Level Metadata Fields univarStats categoricalMarginalsType: array of object Each item of this array must be: root  data_dictionary HEAL Variable Level Metadata Fields univarStats categoricalMarginals categoricalMarginals itemsType: object name root  data_dictionary HEAL Variable Level Metadata Fields univarStats categoricalMarginals categoricalMarginals items nameType: string count root  data_dictionary HEAL Variable Level Metadata Fields univarStats categoricalMarginals categoricalMarginals items countType: integer Additional Properties <p>Additional Properties of any type are allowed.</p> root  data_dictionary HEAL Variable Level Metadata Fields additionalPropertiesType: object <p>Generated using json-schema-for-humans on 2023-07-03 at 09:08:41 -0500</p>"},{"location":"vlmd/start/","title":"<code>Start</code> from a template","text":"<p>Some folks may prefer to create their HEAL data dictionary from scratch. To support this, we have created a utility that creates either a json or csv template. 
</p> <p>Warning</p> <p>Currently, the command is <code>template</code> but will change to <code>start</code> to be consistent with the verb subcommand vocabulary.</p>"},{"location":"vlmd/start/#csv-template","title":"<code>csv</code> template","text":"<p>The HEAL Data Utilities can also input a <code>csv</code> HEAL data dictionary either from a manually filled out template or  as an additional step after further annotation (e.g., from the <code>csv</code> HEAL data dictionary output of the other file formats).</p> <p>To create a template <code>csv</code> version with 10 fields (variables):</p> Command line interface (CLI)Python <pre><code>vlmd template myhealdd.csv --numfields 10\n</code></pre> <pre><code>from healdata_utils import write_vlmd_template\n\nwrite_vlmd_template(tmpdir.joinpath(\"heal.csv\"),numfields=10)\n</code></pre> <p>Click here to download an example of a filled out csv HEAL data dictionary template</p>"},{"location":"vlmd/start/#json-template","title":"<code>json</code> template","text":"<p>While the <code>csv</code> HEAL data dictionary provides a tabular format for HEAL-compliant data dictionaries, ultimately,  these csv data dictionary files are converted to a json file (the most common format to store and exchange data within web applications such as the HEAL Data Platform). </p> <p>Another advantage of <code>json</code> HEAL data dictionaries is that one can specify metadata describing the data dictionary as a whole (e.g., the <code>description</code> and <code>title</code>). 
</p> <p>To create a template <code>json</code> version with 10 fields (variables):</p> Command line interface (CLI)Python <pre><code>vlmd template myhealdd.json --numfields 10\n</code></pre> <pre><code>from healdata_utils import write_vlmd_template\n\nwrite_vlmd_template(tmpdir.joinpath(\"heal.json\"),numfields=10)\n</code></pre> <p>Click here to download an example of filled out json HEAL data dictionary template</p>"},{"location":"vlmd/validate/","title":"<code>Validate</code> Check (validate) an existing HEAL data dictionary file","text":"<p>Will indicate if the data dictionary complies with the HEAL specifications.</p> Command line interface (CLI)Python <pre><code>vlmd validate data/myhealcsvdd.csv\n\nvlmd validate data/myhealjsondd.json\n</code></pre> <pre><code>from healdata_utils import validate_vlmd_csv,validate_vlmd_json\n\nvalidate_vlmd_csv(\"data/myhealcsvdd.csv\")\n\nvalidate_vlmd_json(\"data/myhealjsondd.json\")\n</code></pre>"}]}
\ No newline at end of file
diff --git a/sitemap.xml.gz b/sitemap.xml.gz
index 77def968778ef4a35e2ef438d57c1ec98c1226a5..71e91679102e7b2a40153c6612fedf0fa02319bc 100644
GIT binary patch
delta 15
Wcmcb>bb*OYzMF%iY2ijTFGc_-as-M1

delta 15
Wcmcb>bb*OYzMF%?YW_wxFGc_*xCB4|

diff --git a/vlmd/extract/exceldata/index.html b/vlmd/extract/exceldata/index.html
index 412a258..d92922d 100644
--- a/vlmd/extract/exceldata/index.html
+++ b/vlmd/extract/exceldata/index.html
@@ -942,20 +942,21 @@ <h3 id="to-extract-multiple-sheets-as-one-data-dictionary">To extract multiple s
 <div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">healdata_utils</span> <span class="kn">import</span> <span class="n">convert_to_vlmd</span>
 
 <span class="n">convert_to_vlmd</span><span class="p">(</span>
-    <span class="n">filepath</span><span class="o">=</span><span class="s2">&quot;myexcelfile.xlsx&quot;</span><span class="p">,</span>
+    <span class="n">input_filepath</span><span class="o">=</span><span class="s2">&quot;myexcelfile.xlsx&quot;</span><span class="p">,</span>
     <span class="n">inputtype</span><span class="o">=</span><span class="s2">&quot;excel-data&quot;</span><span class="p">,</span>
     <span class="n">multiple_data_dicts</span><span class="o">=</span><span class="kc">False</span>
     <span class="p">)</span>
 </code></pre></div>
 <h3 id="to-extract-a-subset-of-sheets-as-one-data-dictionary">To extract a subset of sheets as one data dictionary<a class="headerlink" href="#to-extract-a-subset-of-sheets-as-one-data-dictionary" title="Permanent link">&para;</a></h3>
-<p>```python</p>
-<p>from healdata_utils import convert_to_vlmd</p>
-<p>convert_to_vlmd(
-    filepath="myexcelfile.xlsx",
-    inputtype="excel-data",
-    multiple_data_dicts=False,
-    sheet_name=["mysheet1","mysheet2"]
-    )</p>
+<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">healdata_utils</span> <span class="kn">import</span> <span class="n">convert_to_vlmd</span>
+
+<span class="n">convert_to_vlmd</span><span class="p">(</span>
+    <span class="n">input_filepath</span><span class="o">=</span><span class="s2">&quot;myexcelfile.xlsx&quot;</span><span class="p">,</span>
+    <span class="n">inputtype</span><span class="o">=</span><span class="s2">&quot;excel-data&quot;</span><span class="p">,</span>
+    <span class="n">multiple_data_dicts</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
+    <span class="n">sheet_name</span><span class="o">=</span><span class="p">[</span><span class="s2">&quot;mysheet1&quot;</span><span class="p">,</span><span class="s2">&quot;mysheet2&quot;</span><span class="p">]</span>
+    <span class="p">)</span>
+</code></pre></div>
 </div>
 </div>
 </div>