[Data rearchitecture] Stop creating complete universe of article course timeslices #6069
What this PR does
This PR stops calling `create_timeslices_for_new_article_course_records` when new articles are ingested. For courses with a very large number of articles, keeping the complete universe of article course timeslices consumes a lot of disk space to little benefit, since most of those timeslices are empty. Instead, this PR creates an article course timeslice only when it is non-empty, i.e. when there is a revision for that article on that date. This should considerably reduce the disk space used by the data-rearchitecture instance.
Open questions and concerns
This approach could be replicated for course user wiki timeslices in the future, although the `course_user_wiki_timeslices` table does not appear to take up considerable space.
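To illustrate the difference between the two strategies described under "What this PR does", here is a minimal plain-Ruby sketch. It is not the code in this PR: `Revision`, `Timeslice`, `complete_universe`, and `non_empty_timeslices` are hypothetical names, the structs stand in for the real database records, and one-day timeslices are assumed.

```ruby
require 'date'

# Hypothetical stand-ins for the real records (assumption, not the actual models).
Revision = Struct.new(:article_id, :date)
Timeslice = Struct.new(:article_id, :start, :revision_count)

# Old approach (sketch): one timeslice per article per day in the course range,
# whether or not any revision exists for that article and day.
def complete_universe(article_ids, start_date, end_date, revisions)
  counts = revisions.group_by { |r| [r.article_id, r.date] }.transform_values(&:size)
  article_ids.flat_map do |article_id|
    (start_date..end_date).map do |day|
      Timeslice.new(article_id, day, counts.fetch([article_id, day], 0))
    end
  end
end

# New approach (sketch): a timeslice is built only for (article, date) pairs
# that have at least one revision, so no empty timeslices are produced.
def non_empty_timeslices(revisions)
  revisions.group_by { |r| [r.article_id, r.date] }.map do |(article_id, day), revs|
    Timeslice.new(article_id, day, revs.size)
  end
end

# Usage: 3 articles over a 7-day course, with revisions on only 2 (article, date) pairs.
revisions = [
  Revision.new(1, Date.new(2024, 1, 1)),
  Revision.new(1, Date.new(2024, 1, 1)),
  Revision.new(2, Date.new(2024, 1, 3))
]
full = complete_universe([1, 2, 3], Date.new(2024, 1, 1), Date.new(2024, 1, 7), revisions)
sparse = non_empty_timeslices(revisions)
puts full.size   # 21 (3 articles x 7 days)
puts sparse.size # 2
```

In this toy case 19 of the 21 complete-universe rows carry no revisions, which is the kind of waste the sparse approach avoids. Any reader of the sparse data then has to treat a missing timeslice as "no activity" rather than expect a row for every date.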