[Data rearchitecture] Implement UpdateWikidataStatsTimeslice #6059
What this PR does
This PR creates a new `UpdateWikidataStatsTimeslice` class (based on the existing `UpdateWikidataStats`) to import Wikidata stats through the wikidata-diff-analyzer gem and create `course_stats` rows for courses using `wikidata`, without using the Revisions table.
Revisions behavior:
As part of the daily update, we call `import_wikidata_summaries` (only for Wiki Education Dashboard), which imports Wikidata revision summaries through the `WikidataSummaryImporter`.
On the other hand, as part of the `UpdateCourseStats` class, we call `update_wikidata_stats`, which invokes the `UpdateWikidataStats` class. It has two main tasks:
- Populate the `summary` field for revisions without a summary, through the `WikidataDiffAnalyzer` gem.
- Update the `course_stats` table through the `UpdateWikidataStats` class.
The result is a mix of revisions: some carry statistics from the `WikidataSummaryImporter`, and others carry statistics derived from the `WikidataDiffAnalyzer` gem.
Timeslices behavior:
In the timeslice-based approach, the daily update does not generate any revision summaries because the Revisions table contains no entries. Instead, statistics for revisions are fetched dynamically using the WikidataDiffAnalyzer gem when revisions are loaded into RAM.
To store the (partial) Wikidata statistics, we introduced a stats field in the CourseWikiTimeslice model. This field is used to save the Wikidata statistics for a specific timeslice. Once statistics for individual timeslices are available, course-level statistics can be calculated by aggregating them.
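The aggregation step described above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: the `Timeslice` struct, the `aggregate_wikidata_stats` helper, and the stat keys are hypothetical, and it assumes each timeslice's stats field holds a hash of counters (or nil when there was no Wikidata activity).

```ruby
# Hypothetical stand-in for a CourseWikiTimeslice row; only the stats field matters here.
Timeslice = Struct.new(:start, :stats)

# Sum the per-timeslice counters key by key into one course-level hash.
# Timeslices whose stats are nil contribute nothing.
def aggregate_wikidata_stats(timeslices)
  timeslices.each_with_object(Hash.new(0)) do |timeslice, totals|
    (timeslice.stats || {}).each { |key, count| totals[key] += count }
  end
end

# Illustrative data; the keys are assumptions, not the gem's actual output.
timeslices = [
  Timeslice.new('2024-01-01', { 'claims created' => 2, 'labels added' => 1 }),
  Timeslice.new('2024-01-02', { 'claims created' => 3 }),
  Timeslice.new('2024-01-03', nil) # timeslice with no Wikidata activity
]

course_totals = aggregate_wikidata_stats(timeslices)
```

Because each timeslice keeps its own partial counters, only the timeslices that changed need to be reprocessed before the course-level totals are summed again.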
Screenshots
Stats using revisions version:
Stats using timeslices version:
Open questions and concerns
< anything you learned that you want to share, or questions you're wondering about related to this PR >