Fix sorting of dataset drafts and minor versions when sorting by "newest first" #11180

Open
wants to merge 4 commits into
base: develop
from

Conversation

vera
Contributor

@vera vera commented Jan 23, 2025

What this PR does / why we need it:

This PR fixes an issue where draft and minor versions of datasets were sorted using the release timestamp of their most recent major version.
This caused newer drafts or minor versions to appear incorrectly alongside their corresponding major version, instead of at the top, when sorted by "newest first". This affects the search results page and the "My data" page, both of which are sorted by newest by default.
Sorting now consistently uses the last update timestamp for all dataset versions (draft, minor, and major).
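The difference between the two sort keys can be sketched as follows. This is a minimal illustration under stated assumptions, not the Dataverse implementation: the `Version` class, its fields, the helper method names, and the sample dates are all hypothetical and chosen only for the example.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model for illustration only; these are not Dataverse's classes.
class NewestFirstSketch {

    static final class Version {
        final String label;
        final Instant lastMajorRelease; // release time of the most recent major version
        final Instant lastUpdate;       // time this version itself was last updated

        Version(String label, Instant lastMajorRelease, Instant lastUpdate) {
            this.label = label;
            this.lastMajorRelease = lastMajorRelease;
            this.lastUpdate = lastUpdate;
        }
    }

    // Behavior described as the bug: every version sorts by the last major release.
    static List<String> sortByLastMajorRelease(List<Version> versions) {
        return newestFirst(versions, Comparator.comparing((Version v) -> v.lastMajorRelease));
    }

    // Behavior described in this PR: every version sorts by its own last update.
    static List<String> sortByLastUpdate(List<Version> versions) {
        return newestFirst(versions, Comparator.comparing((Version v) -> v.lastUpdate));
    }

    // Sorts a copy in descending key order and returns the labels.
    private static List<String> newestFirst(List<Version> versions, Comparator<Version> key) {
        List<Version> copy = new ArrayList<>(versions);
        copy.sort(key.reversed());
        List<String> labels = new ArrayList<>();
        for (Version v : copy) {
            labels.add(v.label);
        }
        return labels;
    }

    // Made-up sample data: dataset A has an older major version and a fresh draft.
    static List<Version> sample() {
        Instant older = Instant.parse("2024-09-24T00:00:00Z");
        Instant middle = Instant.parse("2024-12-01T00:00:00Z");
        Instant recent = Instant.parse("2025-01-29T00:00:00Z");
        return List.of(
                new Version("A 1.0", older, older),
                new Version("B 1.0", middle, middle),
                new Version("A draft", older, recent));
    }

    public static void main(String[] args) {
        System.out.println(sortByLastMajorRelease(sample())); // draft sinks next to A 1.0
        System.out.println(sortByLastUpdate(sample()));       // draft rises to the top
    }
}
```

With the first key the draft inherits the timestamp of `A 1.0` and lands below the newer dataset B; with the second key it appears first.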

See the bug description, including a screenshot, in #11178.

Which issue(s) this PR closes:

Special notes for your reviewer:

/

Suggestions on how to test this:

I've added a test that can be run with: mvn test -Dtest="DataRetrieverApiIT#testRetrieveMyDataAsJsonStringSortOrder"

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

/

Is there a release notes update needed for this change?:

I think it would be good to include a release note for this bug fix, since it affects the default sorting of datasets on the search results and "My data" pages. I've added a release note as part of this PR.

Additional documentation:

/

@pdurbin
Member

pdurbin commented Jan 23, 2025

TODO: Review the sorting rules from https://docs.google.com/document/d/1DWsEqT8KfheKZmMB3n_VhJpl9nIxiUjai_AIQPAjiyA/edit?usp=sharing and update this comment.

@ofahimIQSS added the "Size: 3" label (a percentage of a sprint; 2.1 hours) on Jan 28, 2025
@cmbz added the "FY25 Sprint 15" label (2025-01-15 - 2025-01-29) on Jan 29, 2025
@cmbz

cmbz commented Jan 29, 2025

2025/01/29: Julian will review and decide if/when to move forward back into the Sprint queue.

@vera
Contributor Author

vera commented Jan 29, 2025

If it makes the merging decision easier, we could also leave the sorting of minor versions untouched for now (possibly split a sorting change for minor versions into a second PR). But it would be nice if the sorting of the draft versions could be fixed. As I mentioned in the issue, it's been noted by one of our curators that the sorting of draft versions in review based on the publication date of the most recent major version is confusing.

This bug was discovered by one of our curators who used "My data" to get a list of datasets to be reviewed. A dataset that had been submitted for review today did not appear on page 1, as expected, but on one of the last pages.

It seems that the dataset was sorted based on the dateSort timestamp which was copied from the latest published version (which was published September 24). This means the draft was sorted as if it was created/submitted last September instead of today.

@jggautier
Contributor

Thanks for your comment in the GitHub issue @vera.

I've always thought, and I might've heard this from someone years ago, that sorting works the way it has because folks thought that insignificant dataset updates shouldn't make the dataset more "new" than newly published datasets and datasets with significant updates.

This reasoning isn't in the Google Doc that @pdurbin shared. The effects of this decision are discussed a bit in #2607, but the why isn't discussed there either.

@vera what you shared from your curator makes perfect sense to me, too. It sounds like "new" is being thought of in different ways.

And of course it's possible that no one will mind that minor versions start causing datasets to appear at the top when sorting by Newest. I just wasn't sure if this reasoning had been considered here and wanted to make sure that it was before it's changed.

@vera
Contributor Author

vera commented Jan 30, 2025

> I've always thought, and I might've heard this from someone years ago, that sorting works the way it has because folks thought that insignificant dataset updates shouldn't make the dataset more "new" than newly published datasets and datasets with significant updates.

That does make sense. Perhaps that means minor versions should stay sorted as they currently are, but drafts should be sorted according to their own timestamp. I would say that if you are able to see a draft, you are usually either a curator or a contributor, and in both cases it makes sense for a new draft to show up on the top, because you are interested in seeing that a new draft exists, checking what's changed, making further edits, reviewing the draft for publication, etc.
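The compromise suggested here could be expressed as a single rule for choosing the sort key. The sketch below only illustrates the idea and is not code from this PR; the `State` enum, the `sortKey` method, and its parameters are hypothetical names.

```java
import java.time.Instant;

// Hypothetical sketch of the proposed compromise; not Dataverse code.
class DraftAwareSortKey {

    enum State { DRAFT, MINOR, MAJOR }

    // Drafts sort by their own last update. Released versions (minor and major)
    // keep sorting by the release time of the most recent major version, so a
    // minor update does not make a dataset look "newer".
    static Instant sortKey(State state, Instant lastMajorRelease, Instant lastUpdate) {
        return state == State.DRAFT ? lastUpdate : lastMajorRelease;
    }

    public static void main(String[] args) {
        // Made-up timestamps for illustration.
        Instant majorRelease = Instant.parse("2024-09-24T00:00:00Z");
        Instant updated = Instant.parse("2025-01-29T00:00:00Z");
        System.out.println(sortKey(State.DRAFT, majorRelease, updated));
        System.out.println(sortKey(State.MINOR, majorRelease, updated));
    }
}
```

Under this rule a new draft would surface at the top for curators and contributors, while the position of published datasets would stay as it is today.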

Labels
FY25 Sprint 15 FY25 Sprint 15 (2025-01-15 - 2025-01-29) Size: 3 A percentage of a sprint. 2.1 hours.
Projects
Status: On Hold ⌛
Development

Successfully merging this pull request may close these issues.

Bug?: unexpected sorting of results when sorting by "newest first" (Search + My Data)
5 participants