In the dataset's documentation, it is mentioned that the data contains information for 40,000 patients.
However, when computing the number of unique values in the subject_id column I receive a much greater number.
For example, I would like to know the number of patients who had a blood glucose test:
glucose_df = labevents[labevents['itemid'] == 50931]  # 50931 is the itemid of the glucose test
print(glucose_df['subject_id'].nunique())
The output is 247,005, which is much greater than 40,000.
Thanks
Hi! Correct me if I'm wrong, but I think there are 2 modules: hosp and icu.
hosp contains information about all patients admitted to the hospital, and icu is a subset of hosp: patients who were admitted to the ICU, so every icu patient has a hadm_id for the hospital admission and a stay_id for the ICU stay.
Unique number of subject ids in hosp is 180733, and icu has 50920 unique subject ids. The documentation says "over 40000 ICU patients", so this checks out :)
The lab events table includes patients that weren't necessarily admitted to the hospital/icu, and there are 255876 unique subject ids in the lab events table.
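If you want to check these counts yourself, something like the following should work. It is only a rough sketch on made-up toy data: the three small DataFrames stand in for the real patients, icustays and labevents tables, and the helper name unique_subjects is my own, not part of any MIMIC tooling.

```python
import pandas as pd

# Toy stand-ins for the real tables (all values invented for illustration).
patients = pd.DataFrame({"subject_id": [1, 2, 3, 4, 5]})
icustays = pd.DataFrame({"subject_id": [2, 2, 4], "stay_id": [10, 11, 12]})
labevents = pd.DataFrame({"subject_id": [1, 2, 2, 3, 5, 5]})


def unique_subjects(df: pd.DataFrame) -> int:
    """Count distinct patients in a table (hypothetical helper)."""
    return df["subject_id"].nunique()


# Distinct patients per table.
tables = {"patients": patients, "icustays": icustays, "labevents": labevents}
counts = {name: unique_subjects(df) for name, df in tables.items()}
print(counts)

# Check that every ICU patient also appears in the hospital-level table.
icu_ids = set(icustays["subject_id"])
icu_is_subset = icu_ids <= set(patients["subject_id"])
print(icu_is_subset)
```

On the real data you would load the actual tables instead of building them by hand; the counting and the subset check stay the same.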
If you only need glucose values for ICU patients, I'd recommend filtering by subject ids from the icu.icustays table:
SELECT DISTINCT lab.*
FROM mimiciv_hosp.labevents as lab, mimiciv_icu.icustays as icu
WHERE lab.subject_id = icu.subject_id
AND lab.itemid = 50931
This query returns 50738 unique subject_ids. You can rewrite it for pandas like:
glucose_df = labevents[(labevents['itemid'] == 50931) & (labevents['subject_id'].isin(icustays_df['subject_id']))]
print(glucose_df['subject_id'].nunique())
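To see what the filter does, here is a small self-contained sketch on invented toy data. The rows, the second itemid (99999) and the variable names glucose_icu and n_patients are made up for illustration; only itemid 50931 and the column names come from the thread above.

```python
import pandas as pd

GLUCOSE_ITEMID = 50931  # itemid of the glucose test, as stated above

# Toy lab events: 99999 is a placeholder for "some other lab item".
labevents = pd.DataFrame({
    "subject_id": [1, 2, 2, 3, 4, 5, 5],
    "itemid": [50931, 50931, 99999, 50931, 99999, 50931, 50931],
})

# Toy ICU stays: only patients 2, 4 and 5 were ever in the ICU.
icustays_df = pd.DataFrame({"subject_id": [2, 4, 5], "stay_id": [10, 11, 12]})

# Keep glucose rows whose patient appears in icustays.
is_glucose = labevents["itemid"] == GLUCOSE_ITEMID
in_icu = labevents["subject_id"].isin(icustays_df["subject_id"])
glucose_icu = labevents[is_glucose & in_icu]

n_patients = glucose_icu["subject_id"].nunique()
print(n_patients)
```

Note that this, like the SQL query, matches on subject_id only, so it keeps every glucose result for a patient who ever had an ICU stay, not just the results measured during the stay itself.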
Hope this provides you with more insight into the schema :)