MongoDB Structure
DPdash and DPimport use a unique structure for the collections stored in the MongoDB "DataDB" database, which is called dpdata by default. (Note that this entry does not refer to the MongoDB "AppDB" database that stores users/sessions/configs, which is called dpdmongo by default.) Here, individual entries in each collection will be referred to as "documents," as MongoDB is a document database.
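For example, assuming the default database names above, you can switch to the data database and list its collections from the MongoDB shell:

use dpdata
show collections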
The metadata collection contains one document per metadata CSV file imported by DPimport. Fields to note:
- study: The name of the study, gleaned from the metadata CSV file.
- collection: The name of the study-level collection for this study.
- subjects: A list of embedded documents corresponding to each line in the metadata CSV file, in other words, to individual subjects. Each contains a subject ID, the number of days, the name of the study, and the sync date.
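As a rough sketch, a document in the metadata collection might look something like the following; the subject values and the exact field names inside subjects are hypothetical here, not guaranteed by DPimport:

db.metadata.findOne({ "study": "STUDY_A" })
// Hypothetical shape of the result:
// {
//   "_id": ObjectId("..."),
//   "study": "STUDY_A",
//   "collection": "80c147ab-72a2-4e9b-ba13-fae49a79e46b",
//   "subjects": [
//     { "study": "STUDY_A", "subject": "SUB001", "days": 42, "synced": "..." }
//   ]
// }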
Study-level collections contain one document per subject in the study, in other words, one document per line of the relevant metadata CSV file. Their names take the form of Version 4 UUIDs: 36-character strings of hexadecimal digits and dashes, for example 80c147ab-72a2-4e9b-ba13-fae49a79e46b.
Let's say we want to find the relevant study-level collection for a study called STUDY_A. To do this, we can use the following MongoDB shell command, which uses a projection:
db.metadata.find({ "study": "STUDY_A" }, { "collection": 1 })
Let's say that resulted in the following:
{ "_id" : ObjectId("60b91f8e439692cf714b72a3"), "collection" : "80c147ab-72a2-4e9b-ba13-fae49a79e46b" }
We can then locate a single document in the study-level collection like so:
db["80c147ab-72a2-4e9b-ba13-fae49a79e46b"].findOne()
This will result in a single subject's document in the study-level collection. You could also use find() to see all subjects' documents.
Fields to note:
- Subject ID: The subject's ID. Required for the subject to appear on DPdash.
- Consent or Consent Date: The consent date for the subject. Required for dates to populate in a DPdash subject-level view.
- path: The path to the metadata CSV file used by DPimport to generate this document.
- All other fields: All other fields should be identical to the headers in the metadata CSV. In fact, that goes for all fields in all documents in this collection, with the exception of path and _id.
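As a quick sketch, using the collection name from the earlier example and a hypothetical subject ID, you could pull just these fields for one subject with a projection (your metadata CSV may use Consent rather than Consent Date):

db["80c147ab-72a2-4e9b-ba13-fae49a79e46b"].findOne(
  { "Subject ID": "SUB001" },
  { "Subject ID": 1, "Consent Date": 1, "path": 1 }
)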
The toc collection contains one document per non-metadata CSV imported by DPimport. Fields to note:
- study: The name of the study for this CSV.
- subject: The ID of the subject for this CSV.
- assessment: The name of the assessment for this CSV. Note that these first three fields are generated from the CSV's filename.
- collection: The name of the assessment-level collection for this CSV.
- path: The path of the assessment CSV used to generate this document.
- glob: A pattern with a wildcard that should match all CSVs to be grouped under the same assessment-level collection.
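To see these fields for one imported CSV (the study, subject, and assessment names below are hypothetical), you can query the toc collection directly:

db.toc.findOne(
  { "study": "STUDY_A", "subject": "SUB001", "assessment": "exampleAssessment" },
  { "collection": 1, "path": 1, "glob": 1 }
)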
You can verify that all assessments matching the glob are being stored in the same assessment-level collection with the following MongoDB shell command, where TOC_GLOB is the value of the glob field from a document in the toc collection:
db.toc.find({ "glob": "TOC_GLOB" }, { "collection": 1 })
All results should display the same collection. (There may be only one result.)
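As an alternative sketch of the same check, distinct() collapses the results for you; if the grouping is correct, it returns an array containing a single collection name:

db.toc.distinct("collection", { "glob": "TOC_GLOB" })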
Assessment-level collections contain one document per line of data in a given assessment CSV. Their names are 64-character hexadecimal strings, for example 49c6e92b06cbf0b0d6bedadd3a447de4e1ded61fe4498452eb5302d58b1e98a3. The names are built by hashing the concatenation of the study name, subject ID, and assessment name with the SHA-256 hashing function.
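As a sketch, you can reproduce such a name in mongosh (or Node.js) with the built-in crypto module; the exact concatenation of the three values (separators, case) is an assumption here, so verify against the collection field in the matching toc document:

const crypto = require("crypto")
// Assumed: plain concatenation of study name, subject ID, and assessment name
const name = crypto
  .createHash("sha256")
  .update("STUDY_A" + "SUB001" + "exampleAssessment")
  .digest("hex")
db[name].findOne()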
Fields to note:
- study: The name of the study for this CSV.
- subject: The ID of the subject for this CSV.
- path: The path of the assessment CSV used to generate this collection.
- There should also be a field for each column in the CSV referred to in path.
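Putting it together, once you have the assessment-level collection name (from the toc document's collection field, or from the hash sketch above), you can inspect a few rows of assessment data; the name below is the example hash from earlier:

db["49c6e92b06cbf0b0d6bedadd3a447de4e1ded61fe4498452eb5302d58b1e98a3"].find().limit(5)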