optimize map marker fetching #1171

CollinBeczak · 2025-02-25T07:54:11Z

This pull request refactors the code for the tasks/box and tasks/markers endpoints, optimizing the underlying SQL query for better performance. The original query combined spatial filtering with other conditions directly in the WHERE clause, which often resulted in inefficient execution plans, such as full-table scans or a large bitmap built over every row of the tasks table within the specified map bounds. This inefficiency is evident in the staging query below, which took approximately 1.1 seconds to execute and processed an excessive number of rows during the spatial filter, as shown in its execution plan.

Example of old query in staging:

SELECT tasks.id, tasks.name, tasks.parent_id, c.name, tasks.instruction, tasks.status, tasks.mapped_on,
          tasks.completed_time_spent, tasks.completed_by,
          tasks.bundle_id, tasks.is_bundle_primary, tasks.cooperative_work_json::TEXT as cooperative_work,
          task_review.review_status, task_review.review_requested_by, task_review.reviewed_by, task_review.reviewed_at,
          task_review.review_started_at, task_review.meta_review_status, task_review.meta_reviewed_by,
          task_review.meta_reviewed_at, task_review.additional_reviewers,
          ST_AsGeoJSON(tasks.location) AS location, priority,
          CASE WHEN task_review.review_started_at IS NULL
                THEN 0
                ELSE EXTRACT(epoch FROM (task_review.reviewed_at - task_review.review_started_at)) END
          AS reviewDuration
    FROM tasks
    
        INNER JOIN challenges c ON c.id = tasks.parent_id
        INNER JOIN projects p ON p.id = c.parent_id
        LEFT OUTER JOIN task_review ON task_review.task_id = tasks.id
      
   WHERE (tasks.id NOT IN (select item_id from locked WHERE
                  item_id = tasks.id AND item_type = 2
                  AND user_id != 1)) AND (c.deleted = false AND p.deleted = false) AND (tasks.location && ST_MakeEnvelope (-115.57617187500001, 32.54681317351517, -87.5390625, 52.908902047770255, 4326)) AND (tasks.status IN (0,1,2,3,4,5,6,9)) AND ((tasks.id IN (SELECT task_id FROM task_review WHERE task_review.task_id = tasks.id AND task_review.review_status IN (0,1,2,3,4,5,6,7,-1)) OR NOT tasks.id IN (SELECT task_id FROM task_review task_review WHERE task_review.task_id = tasks.id))) AND ((tasks.id IN (SELECT task_id FROM task_review WHERE (task_review.task_id = tasks.id) AND ((task_review.meta_review_status IN (0,1,2,3,5,6,7,-2,-1) OR task_review.meta_review_status IS NULL))) OR NOT tasks.id IN (SELECT task_id FROM task_review task_review WHERE task_review.task_id = tasks.id))) AND (tasks.priority IN (0,1,2)) AND (c.id IN (40400)) LIMIT 1001;

This is its reads/workflow (take note of the number of rows read in the location filter):

It takes roughly 1.1 seconds to complete.

This is the new query:

WITH filtered_tasks AS (
          SELECT tasks.id
          FROM tasks
          INNER JOIN challenges c ON c.id = tasks.parent_id
          INNER JOIN projects p ON p.id = c.parent_id
          LEFT OUTER JOIN task_review ON task_review.task_id = tasks.id
          WHERE (tasks.id NOT IN (select item_id from locked WHERE
                  item_id = tasks.id AND item_type = 2
                  AND user_id != 1)) AND (c.deleted = false AND p.deleted = false) AND (tasks.status IN (0,1,2,3,4,5,6,9)) AND ((tasks.id IN (SELECT task_id FROM task_review WHERE task_review.task_id = tasks.id AND task_review.review_status IN (0,1,2,3,4,5,6,7,-1)) OR NOT tasks.id IN (SELECT task_id FROM task_review task_review WHERE task_review.task_id = tasks.id))) AND ((tasks.id IN (SELECT task_id FROM task_review WHERE (task_review.task_id = tasks.id) AND ((task_review.meta_review_status IN (0,1,2,3,5,6,7,-2,-1) OR task_review.meta_review_status IS NULL))) OR NOT tasks.id IN (SELECT task_id FROM task_review task_review WHERE task_review.task_id = tasks.id))) AND (tasks.priority IN (0,1,2)) AND (c.id IN (40400))
        )
        SELECT  tasks.id,
    tasks.name,
    tasks.parent_id,
    c.name,
    tasks.instruction,
    tasks.status,
    tasks.mapped_on,
    tasks.completed_time_spent,
    tasks.completed_by,
    tasks.bundle_id,
    tasks.is_bundle_primary,
    tasks.cooperative_work_json::TEXT as cooperative_work,
    task_review.review_status,
    task_review.review_requested_by,
    task_review.reviewed_by,
    task_review.reviewed_at,
    task_review.review_started_at,
    task_review.meta_review_status,
    task_review.meta_reviewed_by,
    task_review.meta_reviewed_at,
    task_review.additional_reviewers,
    ST_AsGeoJSON(tasks.location) AS location,
    priority,
    CASE 
        WHEN task_review.review_started_at IS NULL THEN 0
        ELSE EXTRACT(epoch FROM (task_review.reviewed_at - task_review.review_started_at))
    END AS reviewDuration
        FROM filtered_tasks
        INNER JOIN tasks ON tasks.id = filtered_tasks.id
        INNER JOIN challenges c ON c.id = tasks.parent_id
        INNER JOIN projects p ON p.id = c.parent_id
        LEFT OUTER JOIN task_review ON task_review.task_id = tasks.id
        WHERE tasks.location && ST_MakeEnvelope(-115.57617187500001, 32.54681317351517, -87.5390625, 52.908902047770255, 4326)
        LIMIT 1001
      ;

This is its reads/workflow:

It takes roughly 150-200 ms.

The refactored approach introduces a Common Table Expression (CTE) to streamline the process. By pre-filtering tasks based on non-spatial conditions (e.g., status, priority, and review criteria) in the CTE, the query reduces the dataset before applying the spatial filter. This significantly cuts down the rows processed by the spatial operation, leading to a more efficient execution plan. The updated query, now completing in 150–200 milliseconds, leverages a CTE scan to ensure the spatial filter (ST_MakeEnvelope) operates only on the pre-filtered subset of tasks, avoiding the bloated bitmap index issue seen previously. The new execution plan confirms this optimization, showing a marked reduction in rows read compared to the original.
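To make the shape of the rewrite concrete, here is a minimal sketch that checks the two query forms return the same rows. It is a toy stand-in, not the MapRoulette schema: it uses SQLite (which has no PostGIS), so the `location && ST_MakeEnvelope(...)` test is approximated with plain `lon`/`lat` range checks, and the table, columns, and sample rows are assumptions made up for illustration. It says nothing about performance; it only shows that moving the non-spatial filters into a CTE does not change the result set.

```python
import sqlite3

# Toy stand-in for the real schema (assumption): lon/lat columns replace the
# PostGIS geometry column, and only a few of the PR's filters are modelled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tasks (id INTEGER PRIMARY KEY, parent_id INTEGER,
                        status INTEGER, priority INTEGER, lon REAL, lat REAL);
    INSERT INTO tasks VALUES
        (1, 40400, 0, 0, -100.0, 40.0),  -- matches every filter
        (2, 40400, 7, 0, -100.0, 40.0),  -- status 7 is not in the allowed list
        (3, 40400, 0, 0,   10.0, 50.0),  -- outside the bounding box
        (4, 99999, 0, 0, -100.0, 40.0),  -- belongs to a different challenge
        (5, 40400, 1, 2,  -90.0, 35.0);  -- matches every filter
""")

# Same bounding box as the staging example in this PR.
BOUNDS = {"xmin": -115.57617187500001, "ymin": 32.54681317351517,
          "xmax": -87.5390625, "ymax": 52.908902047770255}

# Original shape: spatial and non-spatial conditions in one WHERE clause.
SINGLE_WHERE = """
    SELECT id FROM tasks
    WHERE lon BETWEEN :xmin AND :xmax AND lat BETWEEN :ymin AND :ymax
      AND status IN (0,1,2,3,4,5,6,9) AND priority IN (0,1,2)
      AND parent_id = 40400
    ORDER BY id
"""

# Refactored shape: non-spatial filters in the CTE, bounding box applied after.
CTE_FORM = """
    WITH filtered_tasks AS (
        SELECT id FROM tasks
        WHERE status IN (0,1,2,3,4,5,6,9) AND priority IN (0,1,2)
          AND parent_id = 40400
    )
    SELECT tasks.id
    FROM filtered_tasks
    INNER JOIN tasks ON tasks.id = filtered_tasks.id
    WHERE tasks.lon BETWEEN :xmin AND :xmax
      AND tasks.lat BETWEEN :ymin AND :ymax
    ORDER BY tasks.id
"""

def run(sql):
    """Execute a query with the shared bounds and return the task ids."""
    return [row[0] for row in conn.execute(sql, BOUNDS)]

single_rows = run(SINGLE_WHERE)
cte_rows = run(CTE_FORM)
print(single_rows, cte_rows)
```

Whether the CTE is actually evaluated before the spatial filter is up to the PostgreSQL planner. Since PostgreSQL 12, a CTE that is referenced only once can be inlined into the outer query unless it is declared `AS MATERIALIZED`, so the plan should be confirmed with `EXPLAIN (ANALYZE, BUFFERS)` on the target server.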

sonarqubecloud · 2025-02-27T19:53:52Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

CollinBeczak · 2025-02-27T20:26:19Z

Here is a more extreme example, one that actually occurs in a specific workflow at the moment. When the entire map is selected (bounds -180, -85, 180, 85), the query gets much slower.

SELECT tasks.id, tasks.name, tasks.parent_id, c.name, tasks.instruction, tasks.status, tasks.mapped_on,
          tasks.completed_time_spent, tasks.completed_by,
          tasks.bundle_id, tasks.is_bundle_primary, tasks.cooperative_work_json::TEXT as cooperative_work,
          task_review.review_status, task_review.review_requested_by, task_review.reviewed_by, task_review.reviewed_at,
          task_review.review_started_at, task_review.meta_review_status, task_review.meta_reviewed_by,
          task_review.meta_reviewed_at, task_review.additional_reviewers,
          ST_AsGeoJSON(tasks.location) AS location, priority,
          CASE WHEN task_review.review_started_at IS NULL
                THEN 0
                ELSE EXTRACT(epoch FROM (task_review.reviewed_at - task_review.review_started_at)) END
          AS reviewDuration
    FROM tasks
    
        INNER JOIN challenges c ON c.id = tasks.parent_id
        INNER JOIN projects p ON p.id = c.parent_id
        LEFT OUTER JOIN task_review ON task_review.task_id = tasks.id
      
   WHERE (tasks.id NOT IN (select item_id from locked WHERE
                  item_id = tasks.id AND item_type = 2
                  AND user_id != 1)) AND (c.deleted = false AND p.deleted = false) AND (tasks.location && ST_MakeEnvelope (-180, -85, 180, 85, 4326)) AND (tasks.status IN (0,1,2,3,4,5,6,9)) AND ((tasks.id IN (SELECT task_id FROM task_review WHERE task_review.task_id = tasks.id AND task_review.review_status IN (0,1,2,3,4,5,6,7,-1)) OR NOT tasks.id IN (SELECT task_id FROM task_review task_review WHERE task_review.task_id = tasks.id))) AND ((tasks.id IN (SELECT task_id FROM task_review WHERE (task_review.task_id = tasks.id) AND ((task_review.meta_review_status IN (0,1,2,3,5,6,7,-2,-1) OR task_review.meta_review_status IS NULL))) OR NOT tasks.id IN (SELECT task_id FROM task_review task_review WHERE task_review.task_id = tasks.id))) AND (tasks.priority IN (0,1,2)) AND (c.id IN (40400)) LIMIT 1001;

This takes 7 seconds to complete on the smaller staging environment. Here are the rows read:

As you can see, more than 40 million rows are read because of how the filters are combined, which greatly slows the query down.

The new query using the CTE scan is much faster. Its performance is virtually the same for the narrowed-down map bounds and the global map bounds: it takes only 200 ms to return the same results, instead of 7 seconds:

WITH filtered_tasks AS (
          SELECT tasks.id
          FROM tasks
          INNER JOIN challenges c ON c.id = tasks.parent_id
          INNER JOIN projects p ON p.id = c.parent_id
          LEFT OUTER JOIN task_review ON task_review.task_id = tasks.id
          WHERE (tasks.id NOT IN (select item_id from locked WHERE
                  item_id = tasks.id AND item_type = 2
                  AND user_id != 1)) AND (c.deleted = false AND p.deleted = false) AND (tasks.status IN (0,1,2,3,4,5,6,9)) AND ((tasks.id IN (SELECT task_id FROM task_review WHERE task_review.task_id = tasks.id AND task_review.review_status IN (0,1,2,3,4,5,6,7,-1)) OR NOT tasks.id IN (SELECT task_id FROM task_review task_review WHERE task_review.task_id = tasks.id))) AND ((tasks.id IN (SELECT task_id FROM task_review WHERE (task_review.task_id = tasks.id) AND ((task_review.meta_review_status IN (0,1,2,3,5,6,7,-2,-1) OR task_review.meta_review_status IS NULL))) OR NOT tasks.id IN (SELECT task_id FROM task_review task_review WHERE task_review.task_id = tasks.id))) AND (tasks.priority IN (0,1,2)) AND (c.id IN (40400))
        )
        SELECT  tasks.id,
    tasks.name,
    tasks.parent_id,
    c.name,
    tasks.instruction,
    tasks.status,
    tasks.mapped_on,
    tasks.completed_time_spent,
    tasks.completed_by,
    tasks.bundle_id,
    tasks.is_bundle_primary,
    tasks.cooperative_work_json::TEXT as cooperative_work,
    task_review.review_status,
    task_review.review_requested_by,
    task_review.reviewed_by,
    task_review.reviewed_at,
    task_review.review_started_at,
    task_review.meta_review_status,
    task_review.meta_reviewed_by,
    task_review.meta_reviewed_at,
    task_review.additional_reviewers,
    ST_AsGeoJSON(tasks.location) AS location,
    priority,
    CASE 
        WHEN task_review.review_started_at IS NULL THEN 0
        ELSE EXTRACT(epoch FROM (task_review.reviewed_at - task_review.review_started_at))
    END AS reviewDuration
        FROM filtered_tasks
        INNER JOIN tasks ON tasks.id = filtered_tasks.id
        INNER JOIN challenges c ON c.id = tasks.parent_id
        INNER JOIN projects p ON p.id = c.parent_id
        LEFT OUTER JOIN task_review ON task_review.task_id = tasks.id
        WHERE tasks.location && ST_MakeEnvelope(-180, -85, 180, 85, 4326)
        LIMIT 1001
      ;

jake-low

Great writeup. Had a look at the code, LGTM.

CollinBeczak · 2025-02-28T23:13:28Z

Converting this to a draft: I uncovered an underlying issue with this approach. The earlier queries performed so quickly because the challenge ID filter was applied before the location filter. For queries that lack the challenge ID filter, this approach is significantly slower. To address this, I need to find a way to prioritize only the challenge ID filter over the location filter.
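One possible direction, sketched here purely as an illustration and not as what this PR or its follow-up implements: pick the query shape based on whether a challenge ID filter is present. The function name `build_marker_query`, the `challenge_tasks` CTE, the trimmed column list, and the psycopg-style `%s` placeholders are all assumptions. The real query would carry the remaining filters and joins as well.

```python
def build_marker_query(bounds, challenge_ids=None):
    """Return (sql, params) for a hypothetical marker query.

    bounds is (xmin, ymin, xmax, ymax). When challenge_ids is given, only
    that filter goes into the CTE so it can run ahead of the location
    filter; otherwise the plain single-WHERE form is used.
    """
    envelope = "tasks.location && ST_MakeEnvelope(%s, %s, %s, %s, 4326)"
    bound_params = list(bounds)

    if challenge_ids:
        placeholders = ", ".join(["%s"] * len(challenge_ids))
        sql = (
            "WITH challenge_tasks AS (\n"
            "  SELECT id FROM tasks WHERE parent_id IN (" + placeholders + ")\n"
            ")\n"
            "SELECT tasks.id FROM challenge_tasks\n"
            "INNER JOIN tasks ON tasks.id = challenge_tasks.id\n"
            "WHERE " + envelope + "\n"
            "LIMIT 1001"
        )
        # The CTE placeholders appear first in the SQL text, so the
        # challenge ids come first in the parameter list.
        return sql, list(challenge_ids) + bound_params

    # No challenge filter: keep the location filter in a single WHERE clause.
    sql = "SELECT tasks.id FROM tasks\nWHERE " + envelope + "\nLIMIT 1001"
    return sql, bound_params


sql_with, params_with = build_marker_query((-180, -85, 180, 85), [40400])
sql_without, params_without = build_marker_query((-180, -85, 180, 85))
```

As with any CTE-based rewrite, whether the challenge filter really runs first depends on the planner, so each shape would need to be checked against its execution plan.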

CollinBeczak added 3 commits February 25, 2025 01:35

optimize map marker fetching

52f06e2

fix bounding box table data fetching

16c1fa8

consolidate code

79be77e

CollinBeczak requested review from ljdelight, jake-low and jschwarz2030 February 27, 2025 19:55

CollinBeczak marked this pull request as ready for review February 27, 2025 19:57

jake-low approved these changes Feb 27, 2025

CollinBeczak marked this pull request as draft February 28, 2025 23:06

CollinBeczak mentioned this pull request Mar 3, 2025

Improve location filtering in commonly used map related sql queries #1173

Draft
