diff --git a/docs/tsdf_dbpd_channels_and_units.md b/docs/tsdf_dbpd_channels_and_units.md new file mode 100644 index 0000000..6c06a34 --- /dev/null +++ b/docs/tsdf_dbpd_channels_and_units.md @@ -0,0 +1,155 @@ +# Channels and units in Digital Biomarkers for Parkinson's Disease (`DBPD`) schemas + +Within the `DBPD` project, some of the field types are further specialised to provide a better description of the data. These are described in the following sections. + +## Field: `channels` +**Type:** `channel_type[]` +**Description:** Describes the content of the data written. `channel_type` is specific to the `Digital biomarkers for PD` extension. + +--- + +**General types** + +| `channel_type` name | Recommended `unit` | Description | +|--------------------------|--------------------|------------------------------------------------------------------------------------| +| `time` | `time_relative_ms` | Time corresponding to each datapoint (also see units below). | +| `acceleration_x` | `m/s^2` | Acceleration along the x-axis. | +| `acceleration_y` | `m/s^2` | Acceleration along the y-axis. | +| `acceleration_z` | `m/s^2` | Acceleration along the z-axis. | +| `rotation_x` | `deg/s` | Angular rotation rate around the x-axis. | +| `rotation_y` | `deg/s` | Angular rotation rate around the y-axis. | +| `rotation_z` | `deg/s` | Angular rotation rate around the z-axis. | + + +--- + +
+ +PPG-related types + +| `channel_type` name | Recommended `unit` | Description | +|----------------------------|-----------------------|---------------------------------------------------------------------------------------| +| `ppg_quality_post_prob` | `probability` | `[TODO]` Posterior probability that the corresponding PPG signal is of high quality (0 to 1). | + +
+ +--- + +
+ +Tremor-related types + +| `channel_type` name | Recommended `unit` | Description | +|--------------------------|--------------------|-------------------------------------------------------------------------------------| +| `gyro_tremor_prob` | `probability` | Probability values (0 to 1) indicating the likelihood of tremor activity for each sample. | +| `gyro_tremor_hat` | `boolean_num` | Estimated values representing the presence or absence of tremor activity for each sample. | +| `gyro_arm_actv_prob` | `probability` | Probability values (0 to 1) indicating the likelihood of arm activity for each sample. | +| `gyro_arm_actv_hat` | `boolean_num` | Estimated values representing the presence or absence of arm activity for each sample. | +| `GyMeanDx` | `unitless` | Mean gyro derivative in the x axis. | +| `GyMeanDy` | `unitless` | Mean gyro derivative in the y axis. | +| `GyMeanDz` | `unitless` | Mean gyro derivative in the z axis. | +| `GyLTreDomPowerX` | `unitless` | Gyro low tremor (range [3.5-8 Hz]) dominant power in the x axis. | +| `GyLTreDomPowerY` | `unitless` | Gyro low tremor (range [3.5-8 Hz]) dominant power in the y axis. | +| `GyLTreDomPowerZ` | `unitless` | Gyro low tremor (range [3.5-8 Hz]) dominant power in the z axis. | +| `GyGaitBandPower` | `unitless` | Gyro gait bandpower (range [0.4 – 2] Hz) – PSD: sum of the axes. | +| `GyGaitBandpowerRatio` | `unitless` | Gyro gait bandpower sum / total bandpower sum up to 15 Hz – PSD: sum of the axes. | +| `GyGaitFreqPeak` | `unitless` | Frequency peak in the gyro gait range – PSD: sum of the axes. | +| `GyGaitFixedDomPower` | `unitless` | `[TODO]` Gyro dominant power in a fixed range (specific frequency range not provided). | +| `GyGaitFixedDomPowerRatio` | `unitless` | `[TODO]` Ratio of dominant power in the gyro gait range to total power. | +| `GyGaitDomPower` | `unitless` | `[TODO]` Dominant power in the gyro gait range. 
| +| `GyGaitDomPowerRatio` | `unitless` | `[TODO]` Ratio of dominant power in the gyro gait range to total power. | +| `GyGaitPeakFreqWidth` | `unitless` | `[TODO]` Width of the frequency peak in the gyro gait range. | +| `GyLTreBandPower` | `unitless` | `[TODO]` Low tremor bandpower (specific frequency range not provided). | +| `GyLTreBandpower` | `unitless` | `[TODO]` Low tremor bandpower (specific frequency range not provided). | +| `GyLTreFreqPeak` | `unitless` | `[TODO]` Frequency peak in the low tremor range. | +| `GyLTreFixedDomP` | `unitless` | `[TODO]` Low tremor dominant power in a fixed range (specific frequency range not provided). | +| `GyLTreFixedDomP` | `unitless` | `[TODO]` Low tremor dominant power in a fixed range (specific frequency range not provided). | +| `GyLTreDomPower` | `unitless` | `[TODO]` Low tremor dominant power (specific frequency range not provided). | +| `GyLTreDomPowerR` | `unitless` | `[TODO]` Ratio of low tremor dominant power to total power. | +| `GyLTrePeakFreqW` | `unitless` | `[TODO]` Width of the frequency peak in the low tremor range. | +| `GyHTreBandPower` | `unitless` | `[TODO]` High tremor bandpower (specific frequency range not provided). | +| `GyHTreBandpower` | `unitless` | `[TODO]` High tremor bandpower (specific frequency range not provided). | +| `GyHTreFreqPeak` | `unitless` | `[TODO]` Frequency peak in the high tremor range. | +| `GyHTreFixedDomP` | `unitless` | `[TODO]` High tremor dominant power in a fixed range (specific frequency range not provided). | +| `GyHTreFixedDomP` | `unitless` | `[TODO]` High tremor dominant power in a fixed range (specific frequency range not provided). | +| `GyHTreDomPower` | `unitless` | `[TODO]` High tremor dominant power (specific frequency range not provided). | +| `GyHTreDomPowerR` | `unitless` | `[TODO]` Ratio of high tremor dominant power to total power. | +| `GyHTrePeakFreqW` | `unitless` | `[TODO]` Width of the frequency peak in the high tremor range. 
| +| `GyMFCC1` | `unitless` | `[TODO]` Mel-frequency cepstral coefficient 1. | +| `GyMFCC2` | `unitless` | `[TODO]` Mel-frequency cepstral coefficient 2. | +| `GyMFCC3` | `unitless` | `[TODO]` Mel-frequency cepstral coefficient 3. | +| `GyMFCC4` | `unitless` | `[TODO]` Mel-frequency cepstral coefficient 4. | +| `GyMFCC5` | `unitless` | `[TODO]` Mel-frequency cepstral coefficient 5. | +| `GyMFCC6` | `unitless` | `[TODO]` Mel-frequency cepstral coefficient 6. | +| `GyMFCC7` | `unitless` | `[TODO]` Mel-frequency cepstral coefficient 7. | +| `GyMFCC8` | `unitless` | `[TODO]` Mel-frequency cepstral coefficient 8. | +| `GyMFCC9` | `unitless` | `[TODO]` Mel-frequency cepstral coefficient 9. | + + +
+ +--- + +
+ +Gait-related types + +| `channel_type` name | Recommended `unit` | Description | +|--------------------------|--------------------|-------------------------------------------------------------------------------------| +| `std_accel_norm` | `m/s^2` | Standard deviation of the norm of the accelerometer axes in the temporal domain. | +| `x_accel_grav_mean` | `m/s^2` | Mean of the x-axis acceleration gravity component. | +| `y_accel_grav_mean` | `m/s^2` | Mean of the y-axis acceleration gravity component. | +| `z_accel_grav_mean` | `m/s^2` | Mean of the z-axis acceleration gravity component. | +| `x_accel_grav_std` | `m/s^2` | Standard deviation of the x-axis acceleration gravity component. | +| `y_accel_grav_std` | `m/s^2` | Standard deviation of the y-axis acceleration gravity component. | +| `z_accel_grav_std` | `m/s^2` | Standard deviation of the z-axis acceleration gravity component. | +| `x_accel_power_below_gait` | `(m/s^2)^2/Hz` | Total power in the [0, 0.7] Hz range of the x-axis accelerometer. | +| `y_accel_power_below_gait` | `(m/s^2)^2/Hz` | Total power in the [0, 0.7] Hz range of the y-axis accelerometer. | +| `z_accel_power_below_gait` | `(m/s^2)^2/Hz` | Total power in the [0, 0.7] Hz range of the z-axis accelerometer. | +| `x_accel_power_gait` | `(m/s^2)^2/Hz` | Total power in the [0.7, 3.5] Hz range of the x-axis accelerometer. | +| `y_accel_power_gait` | `(m/s^2)^2/Hz` | Total power in the [0.7, 3.5] Hz range of the y-axis accelerometer. | +| `z_accel_power_gait` | `(m/s^2)^2/Hz` | Total power in the [0.7, 3.5] Hz range of the z-axis accelerometer. | +| `x_accel_power_tremor` | `(m/s^2)^2/Hz` | Total power in the [3.5, 8] Hz range of the x-axis accelerometer. | +| `y_accel_power_tremor` | `(m/s^2)^2/Hz` | Total power in the [3.5, 8] Hz range of the y-axis accelerometer. | +| `z_accel_power_tremor` | `(m/s^2)^2/Hz` | Total power in the [3.5, 8] Hz range of the z-axis accelerometer. 
| +| `x_accel_power_above_tremor` | `(m/s^2)^2/Hz` | Total power in the [8, 50] Hz range of the x-axis accelerometer. | +| `y_accel_power_above_tremor` | `(m/s^2)^2/Hz` | Total power in the [8, 50] Hz range of the y-axis accelerometer. | +| `z_accel_power_above_tremor` | `(m/s^2)^2/Hz` | Total power in the [8, 50] Hz range of the z-axis accelerometer. | +| `x_accel_dominant_frequency` | `Hz` | Dominant frequency of the x-axis accelerometer. | +| `y_accel_dominant_frequency` | `Hz` | Dominant frequency of the y-axis accelerometer. | +| `z_accel_dominant_frequency` | `Hz` | Dominant frequency of the z-axis accelerometer. | +| `accel_norm_cc_{n}` | `?` | Cepstral coefficient n with n $\in$ [1,2,...,16] of the accelerometer. | +| `gd_pred_gait_proba` | `probability` | Predicted probability of gait being the predominant activity within the window span. | +| `gyro_norm_cc_{n}` | `?` | Cepstral coefficient n with n $\in$ [1,2,...,16] of the gyroscope. | +| `x_gyro_dominant_frequency` | `Hz` | Dominant frequency of the x-axis gyroscope. | +| `y_gyro_dominant_frequency` | `Hz` | Dominant frequency of the y-axis gyroscope. | +| `z_gyro_dominant_frequency` | `Hz` | Dominant frequency of the z-axis gyroscope. | +| `angle_mean_amplitude` | `deg` | Mean of the sum of consecutive minima and maxima angles (angle amplitude is often referred to as range of motion). | +| `angle_std_amplitude` | `deg` | Standard deviation of the sum of consecutive minima and maxima angles. | +| `angle_sum_amplitude` | `deg` | Sum of the sum of consecutive minima and maxima angles. | +| `ange_perc_95_amplitude` | `deg` | 95th percentile of the sum of consecutive minima and maxima angles. | +| `forward_peak_ang_vel_mean` | `deg/s` | Angular velocity mean in forward direction of the first principal component | +| `forward_peak_ang_vel_std` | `deg/s` | Angular velocity standard deviation in forward direction of the first principal component | +| `backward_peak_ang_vel_mean` | `deg/s` | Angular velocity mean in backward 
direction of the first principal component | +| `backward_peak_ang_vel_std` | `deg/s` | Angular velocity standard deviation in backward direction of the first principal component | +| `angle_perc_power` | `percentage` | Percentage of total power in the arm swing frequency band [0.3 - 3 Hz] | +
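
As a rough illustration of how the `channels` field pairs with recommended units, the sketch below builds matching `channels` and `units` lists for the general types listed above. The lookup table `RECOMMENDED_UNITS` and the helper `build_channel_metadata` are hypothetical names introduced only for this example; they are not part of the TSDF library, and the sketch assumes one unit per channel.

```python
# Recommended units for the general channel types, copied from the table above.
# This mapping is an illustrative assumption, not a TSDF library object.
RECOMMENDED_UNITS = {
    "time": "time_relative_ms",
    "acceleration_x": "m/s^2",
    "acceleration_y": "m/s^2",
    "acceleration_z": "m/s^2",
    "rotation_x": "deg/s",
    "rotation_y": "deg/s",
    "rotation_z": "deg/s",
}


def build_channel_metadata(channels):
    """Hypothetical helper: return `channels` and `units` entries of equal length."""
    unknown = [name for name in channels if name not in RECOMMENDED_UNITS]
    if unknown:
        raise ValueError(f"No recommended unit known for: {unknown}")
    return {
        "channels": list(channels),
        "units": [RECOMMENDED_UNITS[name] for name in channels],
    }


# Example: the three gyroscope axes.
gyro_metadata = build_channel_metadata(["rotation_x", "rotation_y", "rotation_z"])
print(gyro_metadata)
```

Keeping the two lists in the same order means that the unit at position `i` describes the channel at position `i`.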
+ +--- + +## Field: `units` + +**Type:** `unit_type[]` + +**Description:** Describes the format of the data written. `unit_type` is specific to the `Digital biomarkers for PD` extension. + +| `unit_type` | Description | +|-----------------|-----------------------------------------------------------------------------------------------------| +| `time_relative_ms` | Time in milliseconds, relative to the `start_iso8601`. | +| `time_absolute_unix_s` | Absolute time in seconds, relative to the Unix epoch. | +| `time_absolute_unix_ms` | `[TODO]` Absolute time in milliseconds, relative to the Unix epoch. | +| `probability` | Probability values between 0 and 1. | +| `boolean_num` | `[TODO]` Integer values (0 or 1) representing the true (1) or false (0) presence of an activity. | +| `unitless` | Numerical values without units. | +| `m/s^2` | Acceleration in meters per second squared. | +| `deg/s` | Angular velocity in degrees per second. | diff --git a/docs/tsdf_dbpd_schemas.md b/docs/tsdf_dbpd_schemas.md new file mode 100644 index 0000000..f2f7209 --- /dev/null +++ b/docs/tsdf_dbpd_schemas.md @@ -0,0 +1,66 @@ + +# TSDF fields in Digital Biomarkers for Parkinson's Disease (`DBPD`) schemas + +## Mandatory fields + +This is a preliminary list of mandatory fields (to be shaped into schemas) that are used in the `DBPD` project. The list will be updated based on the upcoming discussions. + +| Field | Type | Description | +|----------------------------|--------------|-----------------------------------------------------------------------------------| +| `window_size_sec` | `float` | Size of the window (in seconds) used in the analysis. | +| `step_size_sec` | `float` | Duration in seconds for each segment in the written data. | +| `freq_sampling` | `int` | Sampling frequency (in Hz) of the input data. | +| `channels` | [channel_type](tsdf_field_types.md)`[]` | Description of the content of the data written. 
`channel_type` is specific to the `Digital biomarkers for PD` extension. | +| `units` | [unit_type](tsdf_field_types.md)`[]` | Description of the format of the data written. `unit_type` is specific to the `Digital biomarkers for PD` extension. | + + + +## **Tremor** pipeline specific fields + +Non-mandatory fields used in the tremor pipeline. + +| Field | Type | Description | +|----------------------------|--------------|------------------------------------------------------------------------------| +| `mfcc_num_filters` | `int` | Number of filters used for estimating the mel-frequency cepstral coefficients. | +| `mfcc_num_mel_coeff` | `int` | Number of coefficients used for estimating the mel-frequency cepstral coefficients. | +| `mfcc_max_freq_filter` | `float` | Maximum frequency (in Hz) used for filtering in mel-frequency cepstral coefficients. | +| `mfcc_window_size` | `float` | Size of the sub-window in seconds used to estimate the spectrogram used in the evaluation of the mel-frequency cepstral coefficients. | +| `excluded_hours` | `int[]` | `[TODO]` List of the excluded hours from the analysis (vector scaling?) | +| `sum_features_gyro_scale` | `float[]` | `[TODO]` Scaling factors for the sum of tremor-related features (from gyro) | +| `sum_squared_features_gyro_scale` | `float[]` | `[TODO]` Scaling factors for the sum of squared tremor-related features (from gyro) | +| `n_features_gyro_scale` | `int` | `[TODO]` Scaling factor for the number of gyro features | + +## **PPG** pipeline specific fields + +Non-mandatory fields used in the PPG pipeline. + +| Field | Type | Description | +|----------------------------|----------------------|------------------------------------------------------------------------------| +| `segment_number` | `int` | Order number of the analyzed data segment. | +| `freq_sampling_original` | `int` | Sampling frequency (in Hz) of the original data (before adjustments for the analysis). 
| + +## **Gait** pipeline specific fields + +Non-mandatory fields used in the gait pipeline. We currently do not have information about the fields used in the gait pipeline, but we will update this section as soon as we have more information. + +| Field | Type | Description | +|----------------------------|----------------------|------------------------------------------------------------------------------| +| `side_watch` | `string` | `[TODO]` Possible values: ['left', 'right']. | + +## Additional generic fields + +These fields are non-mandatory, and provide standardised vocabulary for describing the data. + + +| Field | Type | Description | +|------------------------------|--------------|------------------------------------------------------| +| `week_number` | `int` | Denotes the specific study week number used for tracking or comparing data. | +| `columns` | `int` | Number of columns in the data matrix. | +| `interpolated` | `bool` | Indicates whether interpolation was performed on the data. | +| `high_pass_filter_applied` | `bool` | Indicates whether a high-pass filter was applied to remove low-frequency noise. | +| `high_pass_filter_cutoff` | `float` | Cutoff frequency (in Hz) for the high-pass filter, in case it was applied. | +| `z_score_normalised` | `bool` | Indicates whether z-score normalization was applied to the data. | +| `start_datetime_unix_ms` | `string` | UNIX timestamp for the start of the recording (milliseconds). Equivalent to `start_iso8601` in UNIX format. | +| `end_datetime_unix_ms` | `string` | UNIX timestamp for the end of the recording (milliseconds). Equivalent to `end_iso8601` in UNIX format. | + + diff --git a/docs/tsdf_fields_table.md b/docs/tsdf_fields_table.md index f8d9849..7958af9 100644 --- a/docs/tsdf_fields_table.md +++ b/docs/tsdf_fields_table.md @@ -1,8 +1,8 @@ -# TSDF fields +# TSDF schema - metadata fields -TSDF metadata is represented as a dictionary. 
In this section, we will comprehensively list the mandatory and optional fields within the TSDF format. +TSDF metadata is represented as a dictionary (or a JSON object). In this section, we will comprehensively list the mandatory and optional fields within the TSDF format. -## TSDF mandatory fields +## TSDF v0.1 mandatory fields | Field | Type | Description | |------------------|--------------|-----------------------------------------------------------------------------| @@ -15,49 +15,16 @@ TSDF metadata is represented as a dictionary. In this section, we will comprehen | `start_iso8601` | `str` | [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) time stamp for the start of the recording, with ms precision. | | `end_iso8601` | `str` | Same as `start_iso8601`, but for the end of the recording. | | `file_name` | `str` | The name of the file in consideration, e.g., "eeee.bin". | -| `channels` | `str[]` | Labels for each data channel (_e.g.:_ `["time", "X", "Y", "Z"]` for 3D accelerometry). | +| `channels` | `str[]` | Labels for each data channel (_e.g.:_ `[time]` for time data or `[X, Y, Z]` for 3D accelerometry). | | `time_encode` | `str` | Encoding type for time, e.g., "difference". | | `units` | `str[]` | Units for each channel in the data, e.g., "ms" for milliseconds. | -| `data_type` | `str` | Number format of the measured data (_e.g.:_ `"float"`). | +| `data_type` | `str` | Number format of the measured data (_e.g.:_ `float`). | | `bits` | `int` | Bit-length of the number format (e.g., 32-bit). | -| `columns` | `int` | Number of columns in the data matrix. | | `rows` | `int` | Number of rows in the data matrix. | -## TSDF domain-specific fields - - -| Field | Type | Description | -|------------------------------|--------------|------------------------------------------------------| -| `start_datetime_unix_ms` | `string` | UNIX timestamp for the start of the recording (milliseconds). Equivalent to `start_iso8601` in UNIX format. 
| -| `end_datetime_unix_ms` | `string` | UNIX timestamp for the end of the recording (milliseconds). Equivalent to `end_iso8601` in UNIX format. | -| `scale_factors` | `float[]` | Scale factors applied to each data channel to adjust their values. | -| `week_number` | `int` | Denotes the specific week for tracking or comparing weekly data. | -| `freq_sampling_original` | `int` | Represents the original sampling frequency at which the data was recorded. | -| `freq_sampling_adjusted` | `int` | The adjusted sampling frequency optimized for data processing or analysis. | -| `interpolate` | `bool` | If set to true, missing or irregular data points will be estimated and filled. | -| `gravity_removal` | `bool` | When true, the gravity component is removed to isolate user motion. | -| `apply_high_pass_filter` | `bool` | Indicates if a high-pass filter should be applied to remove low-frequency noise. | -| `normalize_acceleration` | `bool` | If true, accelerometer data is normalized using z-score normalization. | -| `motion_intensity_thresholds` | `int[]` | List of percentage thresholds for categorizing motion intensity. | -| `accelerometer_burst_thresholds` | `float[]` | Threshold values for detecting 'burst' or sudden motion in the accelerometer. | -| `gyroscope_burst_thresholds` | `float[]` | Threshold values for detecting 'burst' or sudden motion in the gyroscope. | -| `active_burst_threshold_percentile` | `int` | Chosen percentile index for active burst threshold from the list of percentile values. | -| `average_acceleration_across_weeks` | `float` | Average accelerometer reading across multiple weeks. | -| `acceleration_stddev_across_weeks` | `float` | Standard deviation of accelerometer readings across weeks. | -| `average_gyroscope_across_weeks` | `float` | Average gyroscope reading across weeks. | -| `gyroscope_stddev_across_weeks` | `float` | Standard deviation of gyroscope readings across weeks. 
| -| `window_size_sec` | `int` | Duration in seconds for each data window used in segmented analysis. | -| `num_ECDE_coeff` | `int` | Number of ECDE coefficients considered in the analysis. | -| `num_filters` | `int` | Number of filters applied to refine or isolate specific frequency bands. | -| `num_ME1_coeff` | `int` | Number of ME1 coefficients considered during data processing. | -| `max_frequency_filter` | `int` | Highest frequency limit for any applied filter. | - - - - -## Legacy fields +# Legacy fields The following table lists the legacy fields from the time when the format was called TSDB, along with their updated counterparts: diff --git a/mkdocs.yml b/mkdocs.yml index cbe76fa..8b9784a 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -15,6 +15,8 @@ nav: - The format: - TSDF fields: tsdf_fields_table.md + - DBPD schemas: tsdf_dbpd_schemas.md + - DBPD channels and units: tsdf_dbpd_channels_and_units.md - Usage: - Basic reading and writing: basic_reading_and_writing.ipynb @@ -24,3 +26,6 @@ nav: - About: - Contact: contact.md + +markdown_extensions: + - extra diff --git a/src/tsdf/parse_metadata.py b/src/tsdf/parse_metadata.py index a312801..dc080d8 100644 --- a/src/tsdf/parse_metadata.py +++ b/src/tsdf/parse_metadata.py @@ -19,6 +19,8 @@ def read_data(data: Any, source_path: str) -> Dict[str, 'tsdfmetadata.TSDFMetada :param source_path: path to the metadata file. :return: list of TSDFMetadata objects. + + :raises tsdf_metadata.TSDFMetadataFieldValueError: if the TSDF metadata file is missing a mandatory field. """ # Check if the version is supported