
MongoDB

joshunrau edited this page Nov 10, 2022 · 2 revisions


Accessing the MongoDB Shell

If you followed the instructions on the Getting Started page, you can access the MongoDB shell with the following command:

$ docker exec -it mongo mongosh

Running a Script

To run a script located in the /scripts directory, you can use the following command:

> load( "scripts/index.js" )

Old Notes

Run MongoDB Shell in Docker

$ docker run --rm --name mongodb -d -p 27017:27017 mongo
$ docker exec mongodb bash -c "apt-get update && apt-get install -y nano"
$ docker exec --env EDITOR=nano -it mongodb mongosh

Schemas

With MongoDB, schemas are not mandatory; a collection could technically contain documents with arbitrary, unrelated structures. At the other extreme, it is also possible to use an SQL-like format, in which every document has the same fields and all of them are required. For most use cases, however, documents will share a core schema, with possible additional fields.
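To make the idea of a "core schema with possible additional fields" concrete, here is a minimal sketch in plain JavaScript of an application-side check. The field list `CORE_PATIENT_FIELDS` and the function `checkCoreSchema` are hypothetical names used only for illustration. MongoDB can also enforce rules on the server through a `$jsonSchema` validator set when a collection is created, which this sketch does not use.

```javascript
// Hypothetical core fields that every patient document is assumed to need.
// MongoDB does not enforce this list by itself.
const CORE_PATIENT_FIELDS = { firstName: "string", lastName: "string" };

// Returns a list of problems; fields beyond the core set are allowed.
function checkCoreSchema(doc, coreFields) {
  const problems = [];
  for (const [field, expectedType] of Object.entries(coreFields)) {
    if (!(field in doc)) {
      problems.push(`missing field: ${field}`);
    } else if (typeof doc[field] !== expectedType) {
      problems.push(`field ${field} should be a ${expectedType}`);
    }
  }
  return problems;
}

// An extra "referredBy" field is fine; a missing core field is reported.
console.log(checkCoreSchema({ firstName: "John", lastName: "Smith", referredBy: "GP" }, CORE_PATIENT_FIELDS)); // []
console.log(checkCoreSchema({ firstName: "Jane" }, CORE_PATIENT_FIELDS)); // [ 'missing field: lastName' ]
```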

How to Derive Data Structure Requirements

First, we should determine which data our app needs or generates. This defines the fields we need and how they relate.

Second, we should determine where we need this data (e.g., displaying lists of specific objects). This defines our required collections and field groupings.

Third, we should determine which kind of information we want to display. This defines which queries we will need.

Fourth, we should determine how often we will need to fetch our data. This determines whether we should optimize for easy fetching. For example, for data that needs to be fetched on every page reload, it would probably be a good idea to optimize for fetching performance, even at the cost of some duplication.

Fifth, we should determine how often we need to write or change our data. This defines whether we should optimize for easy writing (e.g., avoiding duplicates as much as possible, even if some join-like operations are required). For example, patient demographics will rarely, if ever, need to be changed, whereas something like diagnoses may need to be changed more frequently.
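The fourth and fifth questions boil down to comparing how often a piece of data is read with how often it is written. The sketch below expresses that comparison as a small function; the name `suggestOptimization` and the default ratio of 10 are illustrative assumptions, not a MongoDB rule, and real decisions should come from measured access patterns.

```javascript
// Hypothetical helper: the ratio threshold is an illustrative assumption.
function suggestOptimization({ readsPerDay, writesPerDay, readHeavyRatio = 10 }) {
  // Data that is rarely written (e.g., demographics shown on every page
  // load) can tolerate duplication in exchange for faster reads.
  if (writesPerDay === 0 || readsPerDay / writesPerDay >= readHeavyRatio) {
    return "reads";
  }
  // Data that changes often (e.g., diagnoses) is easier to keep consistent
  // when it is stored once and referenced.
  return "writes";
}

console.log(suggestOptimization({ readsPerDay: 5000, writesPerDay: 1 })); // reads
console.log(suggestOptimization({ readsPerDay: 50, writesPerDay: 20 })); // writes
```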

Modeling Relations

There are two ways to model relations: nested (embedded) documents and references. If the related data is typically fetched together, the embedded approach is generally favored: it is more flexible, easier to manage, and faster to query, since only a single read operation is required. However, embedding can also result in substantial data duplication or unnecessary overhead, in which case a reference-based approach may be favored.
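The rules of thumb in this section can be summarized as a small decision function. This is only a sketch: `chooseRelationModel` and its boolean inputs are hypothetical, and each input stands for a judgment you make about your own access patterns rather than something MongoDB computes.

```javascript
// Hypothetical helper encoding the embed-vs-reference heuristics.
function chooseRelationModel({ fetchedTogether, childAccessedAlone, childSharedByManyParents, veryManyChildren }) {
  // Shared children would be duplicated; huge fan-out bloats the parent.
  if (childSharedByManyParents || veryManyChildren) return "reference";
  // A child read on its own would drag the whole parent along with it.
  if (childAccessedAlone) return "reference";
  // Otherwise, data that is read together is cheapest to store together.
  return fetchedTogether ? "embed" : "reference";
}

console.log(chooseRelationModel({ fetchedTogether: true })); // embed
console.log(chooseRelationModel({ fetchedTogether: true, childSharedByManyParents: true })); // reference
```

For example, visit dates that are only read with their patient come out as "embed", while diagnoses shared by many patients come out as "reference".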

One-To-One Relationships

In general, you should use an embedded document unless there is a compelling reason to do otherwise. For example, if the child document is frequently accessed independently of the parent document, there is an application-driven reason to split them up, because accessing the child through the parent adds unnecessary overhead, particularly if the parent document is very large.
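When a one-to-one child does need to be split off, one common pattern is to give the child document the same `_id` as its parent so the two can be fetched independently. The sketch below performs that split in memory; the `clinicalNotes` field, the sample values, and the `splitOneToOne` helper are hypothetical. In mongosh, `parent` and `child` would then be inserted into two separate collections.

```javascript
// Hypothetical large patient document whose clinicalNotes are often read alone.
const patient = {
  _id: 1,
  firstName: "John",
  lastName: "Smith",
  clinicalNotes: { text: "Long free-text notes", lastUpdated: "2022-01-01" },
};

// Moves one embedded field into its own document that shares the parent's _id.
function splitOneToOne(parentDoc, field) {
  // Rest destructuring copies every field except the one being split off.
  const { [field]: childData, ...rest } = parentDoc;
  return { parent: rest, child: { ...childData, _id: parentDoc._id } };
}

const { parent, child } = splitOneToOne(patient, "clinicalNotes");
console.log(parent);
console.log(child);
```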

One-To-Many Relationships

As with one-to-one relationships, embedded documents may make sense if the child documents are always, or almost always, accessed in the context of the parent. For example, the dates on which an individual attended the Douglas should probably be embedded in the patient document, as they are rarely, if ever, accessed outside the context of that individual.
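Here is a small sketch of that shape, assuming visits are stored as an embedded `visits` array of dates (the field name and dates are hypothetical). Because the dates live inside the patient document, questions such as "when was the last visit?" need no second query. In mongosh, a new visit could be appended with an update such as `db.patients.updateOne({ _id: id }, { $push: { visits: new Date() } })`.

```javascript
// Hypothetical patient document with embedded visit dates.
const patientWithVisits = {
  firstName: "John",
  lastName: "Smith",
  visits: [new Date("2021-03-02"), new Date("2022-06-15"), new Date("2021-11-20")],
};

// Most recent visit, computed from the embedded array; null if there are none.
function lastVisit(patient) {
  if (patient.visits.length === 0) return null;
  return new Date(Math.max(...patient.visits.map((d) => d.getTime())));
}

console.log(lastVisit(patientWithVisits).toISOString()); // 2022-06-15T00:00:00.000Z
```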

On the other hand, the results of a complex instrument will almost certainly be unique to each individual, so embedding them would not cause duplication. However, if these results will be retrieved independently of any particular patient, there is a strong case for using references.

References also make sense when one document relates to a very large number of other documents, for example, a clinic and its patients. Embedding every patient in the clinic document would make that document grow without bound, and MongoDB limits a single document to 16 MB.
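A sketch of the clinic/patient case, assuming each patient stores a hypothetical `clinicId` field rather than the clinic storing an array of patients. The reference sits on the "many" side, so adding a patient never grows the clinic document. The filter below mirrors what `db.patients.find({ clinicId: "c1" })` would return in mongosh, where an index created with `db.patients.createIndex({ clinicId: 1 })` would keep that query fast.

```javascript
// Hypothetical patients, each referencing its clinic by _id.
// A clinic document would look like { _id: "c1", name: "..." }.
const patients = [
  { _id: 1, lastName: "Smith", clinicId: "c1" },
  { _id: 2, lastName: "Jones", clinicId: "c2" },
  { _id: 3, lastName: "Brown", clinicId: "c1" },
];

// In-memory equivalent of querying patients by their clinic reference.
function patientsOfClinic(allPatients, clinicId) {
  return allPatients.filter((p) => p.clinicId === clinicId);
}

console.log(patientsOfClinic(patients, "c1").map((p) => p.lastName)); // [ 'Smith', 'Brown' ]
```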

Nested/Embedded Documents

Here, we insert a new person into a patients collection (MongoDB creates the collection automatically on the first insert). As shown below, the diagnoses are embedded in the document.

db.patients.insertOne({
  firstName: "John",
  lastName: "Smith",
  diagnoses: [
    {
      code: "F30.0",
      name: "Hypomania",
    },
    {
      code: "F40.0",
      name: "Agoraphobia",
    },
  ],
});

This structure makes it easy to retrieve the diagnoses for the patient:

databasePlanning> db.patients.findOne().diagnoses
[
  { code: 'F30.0', name: 'Hypomania' },
  { code: 'F40.0', name: 'Agoraphobia' }
]
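The embedded structure also makes it easy to find every patient with a given diagnosis. In mongosh this is written with dot notation, `db.patients.find({ "diagnoses.code": "F40.0" })`, which matches documents where any element of the `diagnoses` array has that code. The sketch below reproduces that matching logic in plain JavaScript over a hypothetical in-memory list of patients.

```javascript
// Hypothetical in-memory patients using the embedded structure.
const patients = [
  { firstName: "John", lastName: "Smith", diagnoses: [{ code: "F30.0", name: "Hypomania" }, { code: "F40.0", name: "Agoraphobia" }] },
  { firstName: "Jane", lastName: "Doe", diagnoses: [{ code: "F40.0", name: "Agoraphobia" }] },
  { firstName: "Ali", lastName: "Khan", diagnoses: [] },
];

// A patient matches if any embedded diagnosis has the requested code.
function findByDiagnosisCode(collection, code) {
  return collection.filter((p) => p.diagnoses.some((d) => d.code === code));
}

console.log(findByDiagnosisCode(patients, "F40.0").map((p) => p.lastName)); // [ 'Smith', 'Doe' ]
```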

References

On the other hand, if there are many patients and a diagnosis has many fields, the embedded approach will lead to significant data duplication. For example, if 100 patients have major depression, the general properties of this diagnosis (e.g., its name) will be duplicated 100 times. To avoid this, the following reference-based approach could be used instead:

db.diagnoses.insertMany([
  {
    _id: "F30.0",
    name: "Hypomania",
  },
  {
    _id: "F40.0",
    name: "Agoraphobia",
  },
]);
db.patients.insertOne({
  firstName: "John",
  lastName: "Smith",
  diagnoses: ["F30.0", "F40.0"],
});

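This structure is more efficient when a large number of individuals share the same diagnoses: each diagnosis is stored, and updated, in only one place. The trade-off is that displaying diagnosis names now requires a second query or a join-like operation.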
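One way to resolve the references is on the application side: fetch the referenced diagnoses (in mongosh, for example, with `db.diagnoses.find({ _id: { $in: patient.diagnoses } })`), then merge them into the patient. The sketch below shows only the merge step in plain JavaScript; `resolveDiagnoses` is a hypothetical helper, and turning a dangling reference into `{ _id, name: null }` is an assumed design choice. The `$lookup` aggregation stage can perform a similar merge on the server instead.

```javascript
// Documents shaped like the reference-based example.
const diagnoses = [
  { _id: "F30.0", name: "Hypomania" },
  { _id: "F40.0", name: "Agoraphobia" },
];
const patient = { firstName: "John", lastName: "Smith", diagnoses: ["F30.0", "F40.0"] };

// Application-side join: index the referenced documents by _id once,
// then replace each reference with the full document.
function resolveDiagnoses(patientDoc, diagnosisDocs) {
  const byId = new Map(diagnosisDocs.map((d) => [d._id, d]));
  return {
    ...patientDoc,
    // Unknown codes are kept visible instead of being silently dropped.
    diagnoses: patientDoc.diagnoses.map((id) => byId.get(id) ?? { _id: id, name: null }),
  };
}

console.log(resolveDiagnoses(patient, diagnoses).diagnoses.map((d) => d.name)); // [ 'Hypomania', 'Agoraphobia' ]
```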

Many-To-Many Relationships

These types of relationships are often modeled with references. Unlike in SQL, no linking table is required; each document can store an array of references to documents in the other collection. However, it is also possible to use an embedded approach here. Although this duplicates data, in some use cases this is not necessarily a bad thing. Consider, for example, an instrument that is updated to include new questions: in this case, it might be desirable to preserve the version of the instrument that each individual originally completed.
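The instrument example can be sketched as a hybrid: each response keeps a reference to the instrument (`instrumentId`) and also embeds a copy of the questions as they were when the response was recorded. All names and values below (`recordResponse`, `instrument-1`, the question texts) are hypothetical. Copying each question object means later edits to the instrument leave past responses untouched.

```javascript
// Hypothetical instrument document.
const instrument = {
  _id: "instrument-1",
  version: 1,
  questions: [{ id: "q1", text: "Question one" }, { id: "q2", text: "Question two" }],
};

// Stores a reference to the instrument plus a snapshot of its questions.
function recordResponse(instrumentDoc, patientId, answers) {
  return {
    patientId,
    instrumentId: instrumentDoc._id,
    instrumentVersion: instrumentDoc.version,
    // New array of copied question objects, independent of the original.
    questions: instrumentDoc.questions.map((q) => ({ ...q })),
    answers,
  };
}

const response = recordResponse(instrument, 1, { q1: 2, q2: 0 });

// The instrument is later updated with a new question...
instrument.version = 2;
instrument.questions.push({ id: "q3", text: "Question three" });

// ...but the stored response still reflects what was actually asked.
console.log(response.questions.length); // 2
```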

Using "$lookup" for Merging Reference Relations