An esoteric library
Thank you for listening
HealthKit Export Version: 13
The only metadata we collect from the export.xml is the export date
# Sample Metadata (AppleHealthKit.metadata)
{
'export_date': '2024-08-11 21:34:30 -0700'
}
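As a rough illustration of where this value could come from, the sketch below reads the export date from a tiny stand-in for export.xml. It assumes the date lives in an `ExportDate` element directly under the root. The `sample_xml` string and the `metadata` variable are made up for the example and are not taken from the library.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for export.xml; the real file is far larger.
sample_xml = """<HealthData locale="en_US">
  <ExportDate value="2024-08-11 21:34:30 -0700"/>
</HealthData>"""

root = ET.fromstring(sample_xml)

# Assumption: ExportDate is a direct child of the root element.
export_date_element = root.find("ExportDate")
metadata = {
    "export_date": (
        export_date_element.get("value")
        if export_date_element is not None
        else None
    )
}
print(metadata)
```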
Here you can find a full list of AppleHealthKit characteristics
# Sample characteristics (AppleHealthKit.characteristics)
{
'DateOfBirth': '2000-01-01',
'BiologicalSex': 'HKBiologicalSexMale',
'BloodType': 'HKBloodTypeNotSet',
'FitzpatrickSkinType': 'HKFitzpatrickSkinTypeNotSet',
'CardioFitnessMedicationsUse': 'None'
}
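One plausible way to build a dictionary like this is sketched below. It assumes the characteristics are stored as attributes of a single `Me` element whose names start with `HKCharacteristicTypeIdentifier`. The helper name `read_characteristics` is hypothetical.

```python
import xml.etree.ElementTree as ET

PREFIX = "HKCharacteristicTypeIdentifier"

# Shortened stand-in for the Me element in export.xml.
sample_xml = """<HealthData locale="en_US">
  <Me HKCharacteristicTypeIdentifierDateOfBirth="2000-01-01"
      HKCharacteristicTypeIdentifierBiologicalSex="HKBiologicalSexMale"
      HKCharacteristicTypeIdentifierBloodType="HKBloodTypeNotSet"/>
</HealthData>"""


def read_characteristics(root):
    """Return the Me attributes with the long identifier prefix removed."""
    me = root.find("Me")
    if me is None:
        return {}
    return {
        (key[len(PREFIX):] if key.startswith(PREFIX) else key): value
        for key, value in me.attrib.items()
    }


characteristics = read_characteristics(ET.fromstring(sample_xml))
print(characteristics)
```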
There are two different types of quantities found in AppleHealthKit records
Here is a sample list of quantities found while processing the export.xml
quantities = [
'Height',
'BodyMass',
'HeartRate',
'RespiratoryRate',
'StepCount',
'DistanceWalkingRunning',
'BasalEnergyBurned',
'ActiveEnergyBurned',
'FlightsClimbed',
'AppleExerciseTime',
'WaistCircumference',
'RestingHeartRate',
'VO2Max',
'WalkingHeartRateAverage',
'EnvironmentalAudioExposure',
'HeadphoneAudioExposure',
'WalkingDoubleSupportPercentage',
'SixMinuteWalkTestDistance',
'AppleStandTime',
'WalkingSpeed',
'WalkingStepLength',
'WalkingAsymmetryPercentage',
'StairAscentSpeed',
'StairDescentSpeed',
'SleepDurationGoal',
'AppleWalkingSteadiness',
'SleepAnalysis',
'AppleStandHour',
'HeartRateVariabilitySDNN'
]
Each quantity comes from a record in the xml
<Record type="HKQuantityTypeIdentifierHeartRate" sourceName="Mikian’s Apple Watch" sourceVersion="7.6.2" device="<<HKDevice: 0x282d03020>, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch5,4, software:7.6.2>" unit="count/min" creationDate="2021-12-11 18:04:48 -0700" startDate="2021-12-11 18:04:47 -0700" endDate="2021-12-11 18:04:47 -0700" value="92">
<MetadataEntry key="HKMetadataKeyHeartRateMotionContext" value="2"/>
</Record>
In this case we store the record attributes in a dataframe and ignore any MetadataEntry inside the record. Note that device isn't always present in every record type.
quantities['HeartRate'].columns
'type',
'sourceName',
'sourceVersion',
'device',
'unit',
'creationDate',
'startDate',
'endDate',
'value'
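The sketch below shows one way the record attributes could be grouped by type while ignoring nested MetadataEntry elements. It uses only the standard library. The prefix stripping and the names `short_name` and `group_records` are assumptions for illustration, not the library's actual code.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumption: short names come from dropping these identifier prefixes.
PREFIXES = ("HKQuantityTypeIdentifier", "HKCategoryTypeIdentifier")

sample_xml = """<HealthData>
  <Record type="HKQuantityTypeIdentifierHeartRate" sourceName="Watch"
          unit="count/min" creationDate="2021-12-11 18:04:48 -0700"
          startDate="2021-12-11 18:04:47 -0700"
          endDate="2021-12-11 18:04:47 -0700" value="92">
    <MetadataEntry key="HKMetadataKeyHeartRateMotionContext" value="2"/>
  </Record>
  <Record type="HKCategoryTypeIdentifierAppleStandHour" sourceName="Watch"
          creationDate="2021-12-11 19:00:00 -0700"
          startDate="2021-12-11 18:00:00 -0700"
          endDate="2021-12-11 19:00:00 -0700"
          value="HKCategoryValueAppleStandHourStood"/>
</HealthData>"""


def short_name(record_type):
    """Strip a known identifier prefix, or return the type unchanged."""
    for prefix in PREFIXES:
        if record_type.startswith(prefix):
            return record_type[len(prefix):]
    return record_type


def group_records(root):
    """Map each short type name to a list of attribute dicts."""
    rows_by_type = defaultdict(list)
    for record in root.findall("Record"):
        # Only the attributes are copied; child elements are not visited.
        rows_by_type[short_name(record.get("type"))].append(dict(record.attrib))
    return dict(rows_by_type)


rows = group_records(ET.fromstring(sample_xml))
print(sorted(rows))
```

Each list of dicts can then be passed to `pandas.DataFrame(...)`. Attributes missing from some rows, such as device, show up as empty cells in that column.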
There are many different types of workouts defined with HKWorkoutActivityType
Here is a sample list of workouts found while processing the export.xml
workout_types = [
'CardioDance',
'Walking',
'Swimming',
'Cycling',
'Running',
'Basketball',
'FunctionalStrengthTraining',
'Pickleball',
'Other',
'StairClimbing'
]
Each workout has the following attributes as defined by the DTD
workoutActivityType
duration
durationUnit
totalDistance
totalDistanceUnit
totalEnergyBurned
totalEnergyBurnedUnit
sourceName
sourceVersion
device
creationDate
startDate
endDate
Here is an abbreviated sample workout XML
<Workout workoutActivityType="HKWorkoutActivityTypeWalking" duration="123.4971656322479" durationUnit="min" sourceName="Oddish’s Apple Watch" sourceVersion="7.3.3" device="<<HKDevice: 0x282df23f0>, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch5,4, software:7.3.3>" creationDate="2021-06-18 07:36:53 -0700" startDate="2021-06-18 05:33:19 -0700" endDate="2021-06-18 07:36:49 -0700">
<MetadataEntry key="HKIndoorWorkout" value="0"/>
<MetadataEntry key="HKAverageMETs" value="4.85252 kcal/hr·kg"/>
<MetadataEntry key="HKWeatherTemperature" value="91.4 degF"/>
<MetadataEntry key="HKWeatherHumidity" value="1500 %"/>
<MetadataEntry key="HKTimeZone" value="America/Los_Angeles"/>
<MetadataEntry key="HKElevationAscended" value="5348 cm"/>
<WorkoutEvent type="HKWorkoutEventTypeSegment" date="2021-06-18 05:33:19 -0700" duration="12.20846013824145" durationUnit="min"/>
<WorkoutEvent type="HKWorkoutEventTypeSegment" date="2021-06-18 05:33:19 -0700" duration="21.29455310106277" durationUnit="min"/>
<WorkoutEvent type="HKWorkoutEventTypeSegment" date="2021-06-18 05:45:32 -0700" duration="13.6915263513724" durationUnit="min"/>
<WorkoutEvent type="HKWorkoutEventTypeSegment" date="2021-06-18 07:20:40 -0700" duration="11.98405149181684" durationUnit="min"/>
<WorkoutEvent type="HKWorkoutEventTypeSegment" date="2021-06-18 07:28:41 -0700" duration="8.060033742586771" durationUnit="min"/>
<WorkoutStatistics type="HKQuantityTypeIdentifierActiveEnergyBurned" startDate="2021-06-18 05:33:19 -0700" endDate="2021-06-18 07:36:49 -0700" sum="920.929" unit="Cal"/>
<WorkoutStatistics type="HKQuantityTypeIdentifierDistanceWalkingRunning" startDate="2021-06-18 05:33:19 -0700" endDate="2021-06-18 07:36:49 -0700" sum="6.45289" unit="mi"/>
<WorkoutStatistics type="HKQuantityTypeIdentifierBasalEnergyBurned" startDate="2021-06-18 05:33:19 -0700" endDate="2021-06-18 07:36:49 -0700" sum="338.297" unit="Cal"/>
<WorkoutRoute sourceName="Mikian’s Apple Watch" sourceVersion="7.3.3" creationDate="2021-06-18 07:36:58 -0700" startDate="2021-06-18 05:33:19 -0700" endDate="2021-06-18 07:36:48 -0700">
<MetadataEntry key="HKMetadataKeySyncVersion" value="2"/>
<MetadataEntry key="HKMetadataKeySyncIdentifier" value="D1344CCA-5D84-49CE-99E7-8393A36BA4FE"/>
<FileReference path="/workout-routes/route_2021-06-18_7.36am.gpx"/>
</WorkoutRoute>
</Workout>
Note that in Apple HealthKit export version 13 we see new HKQuantityTypeIdentifiers with min/max/avg fields, not just sum
These are some of the most common values for Workout MetadataEntry
HKIndoorWorkout
HKAverageMETs
HKWeatherTemperature
HKWeatherHumidity
HKTimeZone
HKElevationAscended
WorkoutRoute points to the GPX file, in the workout-routes folder, that holds the route taken for an outdoor walk/run
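As a small sketch of how the route file could be located, the code below reads the `path` attribute of each FileReference and joins it onto the export folder. The function name `route_files` and the folder name passed to it are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

workout_xml = """<Workout workoutActivityType="HKWorkoutActivityTypeWalking">
  <WorkoutRoute sourceName="Apple Watch">
    <FileReference path="/workout-routes/route_2021-06-18_7.36am.gpx"/>
  </WorkoutRoute>
</Workout>"""


def route_files(workout, export_dir):
    """Return the GPX paths referenced by a Workout element."""
    paths = []
    for reference in workout.findall("WorkoutRoute/FileReference"):
        # The path attribute starts with "/", so strip it before joining;
        # otherwise the join would discard export_dir.
        paths.append(Path(export_dir) / reference.get("path").lstrip("/"))
    return paths


workout = ET.fromstring(workout_xml)
# "apple_health_export" is an assumed name for the unzipped export folder.
gpx_paths = route_files(workout, "apple_health_export")
print(gpx_paths)
```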
Workout Statistics provide info for calories burned and distance walked/ran
HKQuantityTypeIdentifierActiveEnergyBurned
HKQuantityTypeIdentifierDistanceWalkingRunning
HKQuantityTypeIdentifierBasalEnergyBurned
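The following sketch flattens a workout's attributes and its WorkoutStatistics sums into one row. The column naming (for example `ActiveEnergyBurned` and `ActiveEnergyBurnedUnit`) and the helper `workout_row` are hypothetical choices, not necessarily what the library does.

```python
import xml.etree.ElementTree as ET

workout_xml = """<Workout workoutActivityType="HKWorkoutActivityTypeWalking"
    duration="123.4971656322479" durationUnit="min">
  <WorkoutStatistics type="HKQuantityTypeIdentifierActiveEnergyBurned"
      sum="920.929" unit="Cal"/>
  <WorkoutStatistics type="HKQuantityTypeIdentifierDistanceWalkingRunning"
      sum="6.45289" unit="mi"/>
  <WorkoutStatistics type="HKQuantityTypeIdentifierBasalEnergyBurned"
      sum="338.297" unit="Cal"/>
</Workout>"""


def workout_row(workout):
    """Combine Workout attributes and WorkoutStatistics sums in one dict."""
    row = dict(workout.attrib)
    activity = row.get("workoutActivityType", "")
    row["workoutActivityType"] = activity.replace("HKWorkoutActivityType", "", 1)
    for stat in workout.findall("WorkoutStatistics"):
        name = stat.get("type").replace("HKQuantityTypeIdentifier", "", 1)
        # Some statistics may carry no sum attribute; skip those here.
        if stat.get("sum") is not None:
            row[name] = float(stat.get("sum"))
            row[name + "Unit"] = stat.get("unit")
    return row


row = workout_row(ET.fromstring(workout_xml))
print(row)
```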
WorkoutEvents are not needed for my use case, so I have chosen not to ingest them. They record information about segments, laps, pauses, markers, etc.
Apple HealthKit data ingestion complete in 28.92 seconds (1335.04 MB)
[2024-09-12] We now try to cast values in the quantities table to numeric. In the run below, memory drops from 1335.04 MB to 996.98 MB (about 25%), while ingestion time rises from 28.92 to 51.77 seconds.
Apple HealthKit data ingestion complete in 51.77 seconds (996.98 MB)
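The source does not show how the cast is done, so the following is only a plain-Python sketch of the idea: convert a value to a number when it parses as one, and keep it unchanged otherwise (category values such as stand hours are strings). The helper name `to_number` is hypothetical. With a dataframe, something like `pandas.to_numeric` would probably be used per column instead.

```python
def to_number(value):
    """Return value as a float when it parses as one, else unchanged."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return value


raw_values = ["92", "61.5", "HKCategoryValueAppleStandHourStood", None]
converted = [to_number(value) for value in raw_values]
print(converted)
```

The memory saving in a dataframe comes from storing a numeric column as a compact array rather than as individual Python string objects.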
The export.xml file has some Document Type Definition (DTD) text at the beginning of the file. This makes ET.fromstring() fail to parse it, so we need to remove these lines from the string before parsing. There are several ways to do this.
One option is to just skip the first 213 lines
with open(apple_health_export_xml_file, 'r') as f:
    for _ in range(213):
        next(f)
    xml_string = f.read()
Another option is to use a regex to find the start and end of the DTD section and remove that section.
Credit to @eotles for this much better implementation
import re

with open(apple_health_export_xml_file, 'r') as f:
    xml_string = f.read()
# Cut everything from the opening '<!DOCTYPE' through the first ']>' that closes the DTD.
start_strip = re.search('<!DOCTYPE', xml_string).span()[0]
end_strip = re.search(r'\]>', xml_string).span()[1]
xml_string = xml_string[:start_strip] + xml_string[end_strip:]
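A variation on the regex approach, sketched below, does the removal in a single substitution and leaves the string untouched when no DOCTYPE block is present. The function name `strip_dtd` and the sample string are made up for the example. It assumes the DTD does not itself contain the sequence `]>` before its end.

```python
import re
import xml.etree.ElementTree as ET

# Non-greedy match from '<!DOCTYPE' to the first ']>', across line breaks.
DOCTYPE_PATTERN = re.compile(r"<!DOCTYPE.*?\]>", re.DOTALL)

sample = """<!DOCTYPE HealthData [
<!ELEMENT HealthData (ExportDate)>
<!ELEMENT ExportDate EMPTY>
<!ATTLIST ExportDate value CDATA #REQUIRED>
]>
<HealthData locale="en_US">
 <ExportDate value="2024-08-11 21:34:30 -0700"/>
</HealthData>"""


def strip_dtd(xml_string):
    """Remove the first internal DTD block, if any."""
    return DOCTYPE_PATTERN.sub("", xml_string, count=1)


cleaned = strip_dtd(sample)
root = ET.fromstring(cleaned)
print(root.tag)
```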
Other developers have found examples of vertical tabs (\x0b) in the xml string that prevent parsing. This may only be an issue in earlier versions of Apple Health Kit export. I have not seen any in HealthKit Export Version: 12. Regardless, we filter it out.
xml_string = xml_string.replace("\x0b", "")
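To illustrate the problem on a made-up string, the sketch below tries to parse XML containing a vertical tab, then parses it again after filtering. Note that `str.replace` returns a new string, so the result has to be assigned. The sample data and variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up record with a vertical tab inside an attribute value.
dirty = '<HealthData><Record sourceName="My\x0bWatch" value="92"/></HealthData>'

try:
    ET.fromstring(dirty)
    parsed_dirty = True
except ET.ParseError:
    # Vertical tab is not a legal XML 1.0 character, so parsing is rejected.
    parsed_dirty = False

# Strings are immutable: keep the value returned by replace().
clean = dirty.replace("\x0b", "")
root = ET.fromstring(clean)
print(parsed_dirty, root.find("Record").get("sourceName"))
```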
export_cda.xml follows Clinical Document Architecture (CDA) formatting. I believe it contains a strict subset of the information in export.xml. It doesn't look like it contains any of the workout or Apple-specific information.
This is probably a better source for creating tables like heart rate and body weight. Other exports are likely to follow this structure.
I don't have a use case that justifies developing an import function for it now, so I will stick with export.xml.
When getting the MetadataEntry values (humidity, temperature, etc.) for a workout, the query also picks up metadata inside the WorkoutRoute XML object. I don't care about storing or parsing these, but they come along for the ride until it's fixed.
<MetadataEntry key="HKMetadataKeySyncVersion" value="2"/>
<MetadataEntry key="HKMetadataKeySyncIdentifier" value="D1344CCA-5D84-49CE-99E7-8393A36BA4FE"/>
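The source does not show the query itself, so this is only a guess at the cause: a descendant search returns the WorkoutRoute metadata too, while a direct-child search does not. The sketch below contrasts the two with ElementTree on a shortened workout.

```python
import xml.etree.ElementTree as ET

workout_xml = """<Workout workoutActivityType="HKWorkoutActivityTypeWalking">
  <MetadataEntry key="HKIndoorWorkout" value="0"/>
  <MetadataEntry key="HKTimeZone" value="America/Los_Angeles"/>
  <WorkoutRoute sourceName="Apple Watch">
    <MetadataEntry key="HKMetadataKeySyncVersion" value="2"/>
    <FileReference path="/workout-routes/route_2021-06-18_7.36am.gpx"/>
  </WorkoutRoute>
</Workout>"""

workout = ET.fromstring(workout_xml)

# iter() walks every descendant, so the WorkoutRoute metadata is included.
all_keys = [entry.get("key") for entry in workout.iter("MetadataEntry")]

# findall() with a bare tag name matches direct children only.
own_keys = [entry.get("key") for entry in workout.findall("MetadataEntry")]

print(all_keys)
print(own_keys)
```

If the stray keys do come from a descendant search, restricting the lookup to direct children of Workout would drop them.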
There are some records like heart rate variability where individual recordings are included in the XML object. I don't need the individual records, so we do not record them.
I do not need ActivitySummary data so there is no function parsing it
<ActivitySummary dateComponents="2021-06-15" activeEnergyBurned="0" activeEnergyBurnedGoal="0" activeEnergyBurnedUnit="Cal" appleMoveTime="0" appleMoveTimeGoal="0" appleExerciseTime="0" appleExerciseTimeGoal="30" appleStandHours="0" appleStandHoursGoal="12"/>
Some data was defined in the DTD that was not found in the xml
- Correlation
- ClinicalRecord
- Audiogram
- SensitivityPoint
- VisionPrescription
- RightEye
- LeftEye