This repository has been archived by the owner on Aug 5, 2024. It is now read-only.

fix: Add content about jsonb and nested tables
rsavoye committed Feb 24, 2024
1 parent 56ee732 commit f78bbb1
Showing 1 changed file with 63 additions and 7 deletions.
70 changes: 63 additions & 7 deletions docs/schema.md
@@ -21,6 +21,36 @@ run to make sure nothing breaks. If the config change is to support a
new internal API, then that new function should be added to the test
cases.

## Merging Tables

I've been experimenting with the tradeoffs between JSONB columns and
nested tables in Postgres. The current Tasking Manager database schema
has multiple small two-column tables; most of those got turned into
arrays as I refactored the schema. But some of the tables in TM are
larger, like project_team or task_history. Those I had been using as
nested table arrays. The advantages are that they are easy to query
and you can see all the columns in each table. Nested tables are being
used to avoid needing multiple SQL queries in series to get basic
information. The downside is that a nested table has to exist in the
database when the primary table is created, so there is a dependency
problem. It's also a touch confusing, since when you list the tables,
several of these add clutter. When doing a bulk insert of multiple
tables, it's messy dealing with the dependencies.

Since Postgres v12, there are new functions for dealing with JSONB
columns that I think make them better than nested tables. The contents
of a JSONB column don't need to be defined ahead of time; the
structure is created dynamically, so it's also easier to update in the
future. They're also fast, since JSONB columns can have their own
index. The downside is that the syntax for accessing a JSONB column
reminds me of AWK: write-once, difficult to debug. You wind up with
SQL like this:

    SELECT jsonb_path_query(team_members, '$.members[*] ? (@.function[*] == "MEMBER" && @.active=="false")') AS player FROM teams WHERE "id" = 3;
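
For anyone who finds the path expression hard to read, the filter can
be spelled out as a Python list comprehension. This is only a sketch of
what the expression selects, not how TM Admin runs the query. The
sample records and the `user_id` key are made up for illustration, and
it assumes *function* is stored as an array and *active* as a string
(the SQL compares against the string "false", not a JSON boolean).

```python
import json

# Hypothetical contents of teams.team_members for one row. The key names
# "members", "function" and "active" come from the path expression; the
# values and the "user_id" key are invented for this example.
team_members = json.loads("""
{"members": [
  {"user_id": 1, "function": ["MEMBER"],  "active": "false"},
  {"user_id": 2, "function": ["MANAGER"], "active": "false"},
  {"user_id": 3, "function": ["MEMBER"],  "active": "true"}
]}
""")


def inactive_members(doc: dict) -> list:
    """Mimic '$.members[*] ? (@.function[*] == "MEMBER" && @.active == "false")'."""
    return [
        member
        for member in doc["members"]          # $.members[*]
        if "MEMBER" in member["function"]     # any @.function[*] == "MEMBER"
        and member["active"] == "false"       # compared as a string, as in the SQL
    ]


players = inactive_members(team_members)
print(players)
```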

I'm burying the SQL queries in a Python module, and luckily, at some
point implementing new queries becomes mostly cut & paste with minor
editing for column names.
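
One way such a module could look is sketched below. It is an
assumption-laden sketch, not the actual TM Admin code: the function
name `member_query`, the `%s` placeholder style, and the use of the
optional *vars* argument of `jsonb_path_query()` to pass the values are
all choices made for this example. The statement is only built here,
not run against a database.

```python
import json

# The path is fixed; the values are passed in through jsonpath variables
# ($function, $active) instead of being pasted into the string.
PATH = '$.members[*] ? (@.function[*] == $function && @.active == $active)'


def member_query(table: str, column: str, function: str, active: str, row_id: int):
    """Return (sql, params) for a jsonb_path_query() lookup (hypothetical helper)."""
    # Table and column names cannot be bound as parameters, so only plain
    # identifiers are accepted before they are placed in the statement.
    for name in (table, column):
        if not name.isidentifier():
            raise ValueError(f"not a plain identifier: {name!r}")
    sql = (
        f"SELECT jsonb_path_query({column}, %s::jsonpath, %s::jsonb) AS player "
        f'FROM {table} WHERE "id" = %s;'
    )
    variables = json.dumps({"function": function, "active": active})
    return sql, (PATH, variables, row_id)


sql, params = member_query("teams", "team_members", "MEMBER", "false", 3)
print(sql)
print(params)
```

With a helper like this, a new query is a matter of changing the table
and column names, which is the cut & paste step described above.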

## The Types files

All of the enums have been extracted from FMTM and TM, and are defined
@@ -139,8 +169,8 @@ tables that then need to be imported. Since importing these updates an
existing record instead of inserting it, the primary table's data
obviously must be imported first.

To import the remaining tables into the array columns or nested
tables, each base class has support for their format. For example, to
To import the remaining tables into the array columns or a jsonb
column, each base class has support for their format. For example, to
import all the utility tables for the primary *users* table, do this:

    users/users.py -v
@@ -200,7 +230,7 @@ has been removed as it's possible to just query the database for users
with or without email addresses.

The *users* table also absorbed the team_members table, adding these
columns to the *users* table as a nested table array. This lets a user
columns to the *users* table as a jsonb array. This lets a user
have different functions or activity across multiple projects, which
is currently not supported by TM.

@@ -246,7 +276,7 @@ From **project_priority_areas** tables

From **project_teams** table

* Add *team_id*, *team_role* to nested *teams* table
* Add *team_id*, *team_role* to *teams* jsonb column.

#### Organizations Table

@@ -257,7 +287,10 @@ organizations table.

#### Teams Table

Added columns from the **team_members** to TM Admin *teams* nested table.
The teams table is based on OSM teams, but has been modified to
support any team.
Added columns from the **team_members** table to the TM Admin *teams*
jsonb column.

* Team ID
* function (mapper or manager)
@@ -268,6 +301,15 @@ out as this will be in the notification table.

#### Tasks Table

This is obviously a heavily used table, and in the Tasking Manager it
is actually five tables. The goal here is to merge the tables together
using Postgres arrays and jsonb columns to reduce the number of
sequential queries that need to be made for some API endpoints.

This table also has two indexes, as the task ID is not unique across
all projects, only within a single project. Both the task ID and
project ID need to be used in queries to get the right one.
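
The effect of a task ID that is only unique within a project can be
shown with a small sketch. It uses Python's built-in `sqlite3` as a
stand-in for Postgres so it runs anywhere, and the table layout, index
name, and status values are assumptions for this example rather than
the real TM Admin schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (id INTEGER NOT NULL, project_id INTEGER NOT NULL, status TEXT)"
)
# A task is identified by the pair (project_id, id), not by id alone.
conn.execute("CREATE UNIQUE INDEX tasks_idx ON tasks (project_id, id)")
conn.executemany(
    "INSERT INTO tasks VALUES (?, ?, ?)",
    [(1, 10, "READY"), (1, 11, "MAPPED"), (2, 10, "LOCKED")],
)

# Task ID 1 exists in two projects, so this query is ambiguous.
by_task_only = conn.execute(
    "SELECT project_id, status FROM tasks WHERE id = ?", (1,)
).fetchall()

# Using both the task ID and the project ID finds exactly one row.
exact = conn.execute(
    "SELECT status FROM tasks WHERE id = ? AND project_id = ?", (1, 11)
).fetchone()

print(len(by_task_only), exact)
```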

* task_mapping_issues is now an Enum instead of a table

##### Task Annotations
@@ -276,15 +318,29 @@ This table appears not to be used by TM yet.

##### Task History Table

The *task_history* table is now a nested table within the tasks
The *task_history* table is now a jsonb column within the tasks
table. The *id* column is no longer needed, and the *project_id* and
*task_id* are already in the tasks table. In TM the action is a
string; TM Admin uses a proper Enum instead. The
*action_text*, *action_date* and *user_id* are all preserved in the
nested table.
jsonb column.
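
The conversion described above can be sketched as follows. The Enum
name `Taskaction` and its members, the sample row, and the choice to
store the Enum name and an ISO date string in the JSON are all
assumptions for illustration, not the actual TM Admin definitions.

```python
import json
from datetime import datetime
from enum import Enum


class Taskaction(Enum):
    """Hypothetical stand-in for the TM Admin action Enum."""
    LOCKED_FOR_MAPPING = 1
    STATE_CHANGE = 2
    COMMENT = 3


def history_entry(row: dict) -> dict:
    """Turn one TM task_history row into an entry for the jsonb column."""
    return {
        # id, project_id and task_id are dropped: the tasks row already has them.
        "action": Taskaction[row["action"]].name,   # validates the TM string
        "action_text": row["action_text"],
        "action_date": row["action_date"].isoformat(),
        "user_id": row["user_id"],
    }


# A made-up row shaped like the TM task_history table.
tm_row = {
    "id": 77, "project_id": 10, "task_id": 1,
    "action": "STATE_CHANGE", "action_text": "MAPPED",
    "action_date": datetime(2024, 2, 24, 12, 0, 0), "user_id": 42,
}
history_json = json.dumps([history_entry(tm_row)])
print(history_json)
```

An unknown action string raises a `KeyError` here, which is one benefit
of going through the Enum instead of copying the string as-is.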

##### Task Mapping Issues

This table in Tasking Manager must be new, as it contains little
data. It's basically a summary of the issue, like "Missed
Feature(s)" or "Feature Geometry", with a count of how many features
have this issue.

Currently in the Tasking Manager each row has an index into the history
table. That table no longer exists in TM Admin, so the details of the
issue are merged into the task history jsonb column. This way the
actual issue data is part of the history.
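
A minimal sketch of that merge is below. The key names `issues`,
`issue`, `count`, and `history_index` (the position in the history list
that stands in for the old TM index into the history table) are
hypothetical and chosen only for this example.

```python
import copy


def merge_issues(history: list, issues: list) -> list:
    """Attach each mapping issue to the history entry it used to point at."""
    merged = copy.deepcopy(history)   # leave the caller's data untouched
    for issue in issues:
        entry = merged[issue["history_index"]]
        entry.setdefault("issues", []).append(
            {"issue": issue["issue"], "count": issue["count"]}
        )
    return merged


# Made-up history entries and one mapping issue for illustration.
history = [
    {"action": "STATE_CHANGE", "action_text": "INVALIDATED", "user_id": 42},
    {"action": "COMMENT", "action_text": "Please fix", "user_id": 42},
]
issues = [{"history_index": 0, "issue": "Feature Geometry", "count": 3}]

merged = merge_issues(history, issues)
print(merged)
```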

##### Task Invalidation History Table

This table has been merged into the *tasks* table as a jsonb column.

##### Notifications

TODO: not implemented yet