From f78bbb1b17aded9cbe5063380581b09654ab1752 Mon Sep 17 00:00:00 2001
From: Rob Savoye <rob@senecass.com>
Date: Sat, 24 Feb 2024 12:25:46 -0700
Subject: [PATCH] fix: Add content about jsonb and nested tables

---
 docs/schema.md | 70 +++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 63 insertions(+), 7 deletions(-)

diff --git a/docs/schema.md b/docs/schema.md
index 58e0da4e..f7250570 100644
--- a/docs/schema.md
+++ b/docs/schema.md
@@ -21,6 +21,36 @@ run to make sure nothing breaks. If the config change is to support a
 new internal API, then that new function should be added to the test
 cases.
 
+## Merging Tables
+
+I've been experimenting with the tradeoffs between JSONB columns and
+nested tables in Postgres. The current Tasking Manager database schema
+has multiple small two-column tables, and most of those got turned into
+arrays as I refactored the schema. But some of the tables in TM are
+larger, like project_teams or task_history. Those I had been using as
+a nested array. The advantages are that they're easy to query and you
+can see all the columns in each table. Nested tables reduce the need for
+multiple SQL queries in series to get basic information. The downside
+is that a nested table has to exist in the database when the primary
+table is created, so there is a dependency problem. It's also a touch
+confusing, since listing the tables shows several of these adding
+clutter. When doing a bulk insert of multiple tables, it's messy
+dealing with the dependencies.
+
+Since Postgres v12, there are new functions for dealing with JSONB
+columns that I think make them better than nested tables. The layout
+of a JSONB column doesn't need to be defined ahead of time, it's
+created dynamically, so it's also easier to update in the future.
+They're also fast, as JSONB columns can have their own index. The
+downside is the syntax for accessing a JSONB column reminds me of AWK:
+write-once and difficult to debug. You wind up with SQL like this:
+
+	SELECT jsonb_path_query(team_members, '$.members[*] ? (@.function[*] == "MEMBER" && @.active=="false")') AS player FROM teams WHERE "id" = 3;
+
+I'm burying the SQL queries in a Python module, and luckily, past a
+certain point, implementing new queries is mostly cut & paste with
+minor editing for column names.
+
 ## The Types files
 
 All of the enums have been extracted from FMTM and TM, and are defined
@@ -139,8 +169,8 @@ tables that then need to be imported. Since importing these updates an
 existing record instead of inserting it, the primary table's data
 obviously must be imported first.
 
-To import the remaining tables into the array columns or nested
-tables, each base class has support for their format. For example, to
+To import the remaining tables into the array columns or a jsonb
+column, each base class has support for their format. For example, to
 import all the utility tables for the primary *users* table, do this:
 
 	users/users.py -v
@@ -200,7 +230,7 @@ has been removed as it's possible to just query the database for users
 with or without email addresses.
 
 The *users* tables also absorbed the team_members table, adding these
-columns to the *users* table as a nested table array. This lets a user
+columns to the *users* table as a jsonb array. This lets a user
 have different functions or activity across multiple projects, which
 is currently not supported by TM.
 
@@ -246,7 +276,7 @@ From **project_priority_areas** tables
 
 From **project_teams** table
 
-* Add *team_id*, *team_role* to nested *teams* table
+* Add *team_id*, *team_role* to *teams* jsonb column.
 
 #### Organizations Table
 
@@ -257,7 +287,10 @@ organizations table.
 
 #### Teams Table
 
-Added columns from the **team_members** to TM Admin *teams* nested table.
+The teams table is based on OSM teams, but has been modified to
+support any team.
+Columns from the **team_members** table were added to the TM Admin
+*teams* jsonb column.
 
 * Team ID
 * function (mapper or manager)
@@ -268,6 +301,15 @@ out as this will be in the notification table.
 
 #### Tasks Table
 
+This is obviously a heavily used table, and in the Tasking Manager it
+is actually 5 tables. The goal here is to merge the tables together
+using Postgres arrays and jsonb columns to reduce the number of
+sequential queries that need to be made for some API endpoints.
+
+This table also has two indexes, as the task ID is not unique across
+all projects, only within a single project. Both the task ID and
+project ID need to be used in queries to get the right task.
+
 * task_mapping_issues is now an Enum instead of a table
 
 ##### Task Annotations
@@ -276,15 +318,29 @@ This table appears not to be used by TM yet.
 
 ##### Task History Table
 
-The *task_history* table is now a nested table within the tasks
+The *task_history* table is now a jsonb column within the tasks
 table. The *id* column is no longer needed, and the *project_id* and
 *task_id* are already in the tasks table. In TM the action is a
 string, which in TM Admin is a proper Enum, which is used instead. The
 *action_text*, *action_date* and *user_id* are all preserved in the
-nested table.
+jsonb column.
+
+##### Task Mapping Issues
+
+This table in Tasking Manager must be new, as it contains little
+data. It's basically a summary of the issue, like "Missed
+Feature(s)", or "Feature Geometry", with a count of how many features
+have this issue.
+
+Currently in the Tasking Manager there is an index into the history
+table, which no longer exists in TM Admin. So the details of the issue
+are merged into the task history jsonb column. This way the actual
+issue data is part of the history.
 
 ##### Task Invalidation History Table
 
+This table has been merged into the *tasks* table as a jsonb column.
+
 ##### Notifications
 
 TODO: not implemented yet