[Data rearchitecture] Implement UpdateWikidataStatsTimeslice #6059

@gabina gabina commented Dec 12, 2024

What this PR does

This PR creates a new UpdateWikidataStatsTimeslice class (based on the existing UpdateWikidataStats) that imports Wikidata stats through the wikidata-diff-analyzer gem and creates course_stats rows for courses that use Wikidata, without relying on the Revisions table.

Revisions behavior:

As part of the daily update, we call import_wikidata_summaries (only on the Wiki Education Dashboard), which imports Wikidata revision summaries through the WikidataSummaryImporter.

Separately, as part of the UpdateCourseStats class, we call update_wikidata_stats, which invokes the UpdateWikidataStats class. It has two main tasks:

  • It fills in the summary field, through the WikidataDiffAnalyzer gem, for revisions that have no summary.
  • It updates the course_stats table.

The result is a mix of revisions: some carry statistics from the WikidataSummaryImporter, and others carry statistics derived from the WikidataDiffAnalyzer gem.
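
The "fill in only what is missing" step can be sketched as follows. This is a minimal illustration, not the dashboard's code: `Revision` is a plain Struct standing in for the ActiveRecord model, and `fake_analyzer` is a hypothetical lambda standing in for the call into the WikidataDiffAnalyzer gem.

```ruby
# Hypothetical stand-ins: Revision is a plain Struct, not the dashboard's
# model, and the analyzer is a lambda in place of the real gem call.
Revision = Struct.new(:mw_rev_id, :summary)

# Fill in a summary only for revisions that do not have one yet.
def fill_missing_summaries(revisions, analyzer)
  revisions.each do |revision|
    next unless revision.summary.nil?

    revision.summary = analyzer.call(revision.mw_rev_id)
  end
end

revisions = [
  Revision.new(1, 'imported summary'), # e.g. imported during the daily update
  Revision.new(2, nil)                 # still missing a summary
]
fake_analyzer = ->(rev_id) { "diff stats for #{rev_id}" }

fill_missing_summaries(revisions, fake_analyzer)
```

After the call, the first revision keeps its imported summary and only the second one receives an analyzer-derived value, which is how the two sources end up side by side.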

Timeslices behavior:

In the timeslice-based approach, the daily update does not generate any revision summaries because the Revisions table contains no entries. Instead, statistics for revisions are fetched dynamically using the WikidataDiffAnalyzer gem when revisions are loaded into memory.
To store these partial Wikidata statistics, we introduced a stats field in the CourseWikiTimeslice model, which holds the statistics for a specific timeslice. Once statistics for the individual timeslices are available, course-level statistics can be calculated by aggregating them.
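
As a rough sketch of that aggregation step, assuming each timeslice's stats field holds a hash of counters: the stat names below are illustrative only, and `aggregate_stats` is a hypothetical helper, not a method from this PR.

```ruby
# Hypothetical per-timeslice stats: each hash stands in for the `stats` field
# of one CourseWikiTimeslice. The keys are illustrative, not the real ones.
timeslice_stats = [
  { 'claims created' => 3, 'labels added' => 1 },
  { 'claims created' => 2, 'descriptions added' => 4 },
  {} # a timeslice with no Wikidata activity
]

# Sum the counters key by key to get course-level totals.
def aggregate_stats(stats_per_timeslice)
  stats_per_timeslice.each_with_object(Hash.new(0)) do |stats, totals|
    stats.each { |key, count| totals[key] += count }
  end
end

course_totals = aggregate_stats(timeslice_stats)
```

Because each timeslice contributes independently, reprocessing a single timeslice only requires recomputing its own hash and summing again.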

Screenshots

Stats using revisions version:
image

Stats using timeslices version:
image


@gabina gabina changed the title from "[WIP] [Data rearchitecture] Implement UpdateWikidataStatsTimeslice" to "[Data rearchitecture] Implement UpdateWikidataStatsTimeslice" Dec 13, 2024
@gabina gabina marked this pull request as ready for review December 18, 2024 23:17
@gabina gabina merged commit da0da8f into WikiEducationFoundation:data-rearchitecture-for-dashboard Dec 19, 2024
1 check passed
@gabina gabina deleted the data-rearchitecture-implement-update-wikidata-stats branch December 19, 2024 02:34