Remove includes course sites on provider courses page
This generates the default Rails query, in which the main nested loop
processes far more rows than needed, making the query take more than
500ms on average and sometimes more than 2 seconds.

Looking at the query:

```
EXPLAIN ANALYZE SELECT
    "course_site".*
FROM
    "course_site"
INNER JOIN
    "site"
    ON "site"."id" = "course_site"."site_id"
INNER JOIN
    "course_site" AS "site_statuses"
    ON "site_statuses"."site_id" = "site"."id"
WHERE
    "site_statuses"."status" IN ('N', 'R')
    AND "course_site"."course_id" IN (
    );
```

If you run the query above you will find some interesting
info.

My interpretation, in summary:

For reasons that need deeper investigation, the default Rails
includes generates the query above, and that query does not
use the composite index on course_site.course_id and
course_site.site_id.

Details from the EXPLAIN output:

1. The Nested Loop join is iterating over the results of the first part of
the join (the outer loop), which is the result of scanning course_site.

  For each row in this result, the system performs an inner scan (the inner
  loop) to match records from site_statuses and PK_ucas_campus. This can
  result in a high number of rows if the tables being joined are large or
  if there's a large number of qualifying rows in the tables.

2. Row count (rows=5397 / rows=483361) is too high!

  The actual row count suggests that the query planner was
  likely underestimating the number of rows returned from the join.

  This can occur when many different combinations of records match
  between the tables involved in the join.

  The high row count suggests that each course_site row is joining with
  multiple rows in site_statuses.

  If each course_site record matches many site_statuses rows, the number
  of output rows is multiplied by the number of matches per record.

3. Repeated access to site (5775 loops):

  The inner scans over site and site_statuses were executed 5775 times
  (loops=5775), once for each row coming from the outer side of the
  join, which increases the overall row count.
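
The fan-out described in points 1 to 3 can be sketched in plain Ruby. This is only a toy model with made-up data, not the real schema or the real planner: the `CourseSite` struct, the course ids and the site ids are all hypothetical. It mirrors the shape of the query above, where the `site_statuses` alias is joined on `site_id` alone, so every course_site row for the same site matches, whichever course it belongs to.

```ruby
# Toy model of the self-join in the query above. All data is hypothetical.
CourseSite = Struct.new(:course_id, :site_id, :status)

# Hypothetical data: 3 courses, each offered at the same 4 sites.
course_sites = [1, 2, 3].product([10, 20, 30, 40]).map do |course_id, site_id|
  CourseSite.new(course_id, site_id, "R")
end

# Inner side of the join: the "site_statuses" alias, filtered on status
# and looked up by site_id only (as in the ON clause of the query).
statuses_by_site = course_sites
  .select { |row| %w[N R].include?(row.status) }
  .group_by(&:site_id)

# Outer side: course_site rows for the requested courses.
wanted_course_ids = [1, 2]
outer_rows = course_sites.select { |row| wanted_course_ids.include?(row.course_id) }

# Nested loop: one inner lookup per outer row, one output row per match.
inner_loops = 0
joined = outer_rows.flat_map do |outer|
  inner_loops += 1
  statuses_by_site.fetch(outer.site_id, []).map { |inner| [outer, inner] }
end

puts "outer rows:  #{outer_rows.size}"
puts "inner loops: #{inner_loops}"
puts "joined rows: #{joined.size}"
puts "distinct course_site rows: #{joined.map(&:first).uniq.size}"
```

In this sketch the join emits three times as many rows as there are distinct course_site records, because each site is shared by three courses. The effect grows with the number of courses per site.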
tomas-stefano committed Dec 17, 2024
1 parent 05dfb41 commit 472db60
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion app/controllers/publish/courses_controller.rb
@@ -117,7 +117,7 @@ def fetch_course

def provider
@provider ||= recruitment_cycle.providers
- .includes(courses: %i[sites site_statuses enrichments provider])
+ .includes(courses: %i[site_statuses enrichments provider])
.find_by!(provider_code: params[:provider_code])
end
