From f78bbb1b17aded9cbe5063380581b09654ab1752 Mon Sep 17 00:00:00 2001
From: Rob Savoye
Date: Sat, 24 Feb 2024 12:25:46 -0700
Subject: [PATCH] fix: Add content about jsonb and nested tables

---
 docs/schema.md | 70 +++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 63 insertions(+), 7 deletions(-)

diff --git a/docs/schema.md b/docs/schema.md
index 58e0da4e..f7250570 100644
--- a/docs/schema.md
+++ b/docs/schema.md
@@ -21,6 +21,36 @@ run to make sure nothing breaks. If the config change is to support a
 new internal API, then that new function should be added to the test
 cases.
 
+## Merging Tables
+
+I've been experimenting with the tradeoffs between JSONB columns and
+nested tables in Postgres. The current Tasking Manager database schema
+has multiple small two-column tables; most of those got turned into
+arrays as I refactored the schema. But some of the tables in TM are
+larger, like project_teams or task_history. Those I had been using as a
+nested array. The advantages: they're easy to query, and you can see
+all the columns in each table. Nested tables reduce the need for
+multiple SQL queries in series to get basic information. The downside
+is a nested table has to exist in the database when the primary table
+is created, so there is a dependency problem. It's also a touch
+confusing, since when you list the tables, you have several of these
+adding clutter. When doing a bulk insert of multiple tables, it's
+messy dealing with the dependencies.
+
+Since Postgres v12, there are new functions for dealing with JSONB
+columns that I think make them better than nested tables. The layout
+inside a JSONB column doesn't need to be defined ahead of time; it's
+created dynamically, so it's also easier to update in the future. They're
+also fast, as JSONB columns can have their own GIN index. The downside is
+the syntax for accessing a JSONB column reminds me of AWK: write-once,
+difficult to debug. You wind up with SQL like this:
+
+    SELECT jsonb_path_query(team_members, '$.members[*] ? (@.function[*] == "MEMBER" && @.active == "false")') AS player FROM teams WHERE "id" = 3;
+
+I'm burying the SQL queries in a Python module, but luckily, after a
+certain point, implementing new queries is mostly cut & paste with
+minor editing for column names.
+
 ## The Types files
 
 All of the enums have been extracted from FMTM and TM, and are defined
@@ -139,8 +169,8 @@ tables that then need to be imported. Since importing these updates an
 existing record instead of inserting it, the primary table's data
 obviously must be imported first.
 
-To import the remaining tables into the array columns or nested
-tables, each base class has support for their format. For example, to
+To import the remaining tables into the array columns or a jsonb
+column, each base class has support for their format. For example, to
 import all the utility tables for the primary *users* table, do this:
 
     users/users.py -v
@@ -200,7 +230,7 @@ has been removed as it's possible to just query the database for users
 with or without email addresses.
 
 The *users* tables also absorbed the team_members table, adding these
-columns to the *users* table as a nested table array. This lets a user
+columns to the *users* table as a jsonb array. This lets a user
 have different functions or activity across multiple projects, which
 is currently not supported by TM.
 
@@ -246,7 +276,7 @@ From **project_priority_areas** tables
 
 From **project_teams** table
 
-* Add *team_id*, *team_role* to nested *teams* table
+* Add *team_id*, *team_role* to the *teams* jsonb column.
 
 #### Organizations Table
 
@@ -257,7 +287,10 @@ organizations table.
 
 #### Teams Table
 
-Added columns from the **team_members** to TM Admin *teams* nested table.
+The teams table is based on OSM teams, but has been modified to
+support any team.
+Added columns from the **team_members** table to the TM Admin *teams*
+jsonb column.
 
 * Team ID
 * function (mapper or manager)
@@ -268,6 +301,15 @@ out as this will be in the notification table.
 
 #### Tasks Table
 
+This is obviously a heavily used table, and in the Tasking Manager it is
+actually 5 tables. The goal here is to merge the tables together using
+Postgres arrays and jsonb columns to reduce the number of sequential
+queries that need to be made for some API endpoints.
+
+This table also has two indexes, as the task ID is not unique across
+all projects, only within a single project. Both the task ID and
+project ID need to be used in queries to get the right one.
+
 * task_mapping_issues is now an Enum instead of a table
 
 ##### Task Annotations
@@ -276,15 +318,29 @@ This table appears not to be used by TM yet.
 
 ##### Task History Table
 
-The *task_history* table is now a nested table within the tasks
+The *task_history* table is now a jsonb column within the tasks
 table. The *id* column is no longer needed, and the *project_id* and
 *task_id* are already in the tasks table. In TM the action is a
 string, which in TM Admin is a proper Enum, which is used instead. The
 *action_text*, *action_date* and *user_id* are all preserved in the
-nested table.
+jsonb column.
+
+##### Task Mapping Issues
+
+This table in Tasking Manager must be new, as it contains little
+data. It's basically a summary of the issue, like "Missed
+Feature(s)" or "Feature Geometry", with a count of how many features
+have this issue.
+
+Currently in the Tasking Manager there is an index into the history
+table, which no longer exists in TM Admin. So the details of the issue
+are merged with the task history jsonb column. This way the actual issue
+data is part of the history.
 
 ##### Task Invalidation History Table
 
+This table has been merged into the *tasks* table as a jsonb column.
+
 ##### Notifications
 
 TODO: not implemented yet
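
A note on the *Merging Tables* text in this patch: the JSONB path query it shows could be wrapped in a Python helper along the following lines. This is only a sketch, not the module's actual code. The function name `member_query` and the jsonpath variables `$func` and `$active` are hypothetical; the table, column, and key names are taken from the SQL in the patch. It relies on the optional third `vars` argument of `jsonb_path_query`, so the values travel as query parameters instead of being pasted into the path string.

```python
import json


def member_query(team_id, function="MEMBER", active="false"):
    """Return (sql, params) for one team's matching team_members entries.

    Hypothetical helper: it only builds the query, it does not run it.
    """
    # The filter mirrors the one in the patch. $func and $active are
    # jsonpath variables, resolved from the JSON object that is passed
    # as the third argument of jsonb_path_query().
    sql = (
        "SELECT jsonb_path_query(team_members, "
        "'$.members[*] ? (@.function[*] == $func && @.active == $active)', "
        "%s::jsonb) AS player "
        'FROM teams WHERE "id" = %s;'
    )
    # "active" stays a string because the original query compares it
    # against the string "false", not a JSON boolean.
    variables = json.dumps({"func": function, "active": active})
    return sql, (variables, team_id)


if __name__ == "__main__":
    sql, params = member_query(3)
    print(sql)
    print(params)
```

With a DB-API driver such as psycopg2, the returned pair would be passed as `cursor.execute(sql, params)`. Keeping the path text fixed and varying only the parameters is one way to make the cut & paste step less error-prone.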