Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PERFORMANCE] Optimize container based proficiency calculation [MER-2447] #4095

Merged
merged 8 commits into from
Aug 9, 2023

Conversation

darrensiegel
Copy link
Contributor

The original reported issue (https://eliterate.atlassian.net/browse/MER-2447) noted course sections on Argos Prod that would not load the Instructor Dashboard "Reports" view, with the UI eventually getting a "Gateway Timeout" error.

Part 1: Query Optimization and Post Processing

I pinned down the problem to be the query used to calculate student proficiency, per container, from Metrics.proficiency_per_container/1. The raw SQL for that query originally was:

SELECT c2.container_id, 
  CAST(COUNT(CASE WHEN s0.correct THEN 1 END) as float) / CAST(COUNT(*) as float) 
FROM snapshots AS s0 
INNER JOIN sections AS s1 ON s0.section_id = s1.id 
INNER JOIN contained_pages AS c2 ON c2.page_id = s0.resource_id 
WHERE (((s0.attempt_number = 1) AND (s0.part_attempt_number = 1)) 
  AND (s1.slug = 'bio_100_on_demand__self_paced_')) 
GROUP BY c2.container_id

This query runs just fine on Torus Prod with its 200K snapshot records. On Argos Prod there are 109 million snapshot records and this query doesn't finish even after letting it go for close to 5 minutes from within pgAdmin.

To address the problem I've reworked the overall approach here to no longer attempt to do the container "bucketing" within the query, and instead to do that in code as a post-processing step after the query finishes. I also removed the JOIN to sections that was necessary because the query was slug based. So the new query now looks like:

SELECT s0.resource_id, 
  CAST(COUNT(CASE WHEN s0.correct THEN 1 END) as float),
  CAST(COUNT(*) as float) 
 FROM snapshots AS s0 
 WHERE (((s0.attempt_number = 1) AND (s0.part_attempt_number = 1)) 
		AND (s0.section_id = 725)) 
GROUP BY s0.resource_id

The above query, on Argos Prod, runs anywhere between 2 seconds to about 25 seconds (due to cache warmth, I believe). The 25 seconds upper bound is concerning as we could still face the issue of this view still timing out if this grows. There would also be a negative UX by forcing the user to wait this long before the view would display. So, on to:

Part 2: Async Processing

This PR also changes the Instructor and Student Details dashboards LiveViews to invoke the potentially expensive proficiency calculation as an async, background task. When that task finishes it will notify the LV and the assigns are updated. The UX now will be a quick loading view, with some "Loading..." placeholder text in place for the proficiency values until those calculations finish.

Copy link
Contributor

@nicocirio nicocirio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
Loved the async implementation 😄

I tested it locally adding a :timer.sleep in the async function to see the behavior when receiving the message after navigating to another table page and worked fine!

I left some minor NITs that may improve a little bit the performance by reducing the amounts of nested Enum

lib/oli/delivery/metrics.ex Outdated Show resolved Hide resolved
lib/oli/delivery/metrics.ex Outdated Show resolved Hide resolved
darrensiegel and others added 2 commits August 9, 2023 15:09
Co-authored-by: Nicolás Cirio <[email protected]>
Co-authored-by: Nicolás Cirio <[email protected]>
lib/oli/delivery/metrics.ex Outdated Show resolved Hide resolved
@darrensiegel darrensiegel merged commit 20221bc into hotfix-v0.24.4 Aug 9, 2023
5 checks passed
@darrensiegel darrensiegel deleted the optimized-proficiency branch August 9, 2023 20:08
@nicocirio
Copy link
Contributor

@darrensiegel It seems liveview is about to support async assigns natively, so we should keep an eye on this PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants