-
Notifications
You must be signed in to change notification settings - Fork 498
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix sorting of dataset drafts and minor versions when sorting by "newest first" #11180
base: develop
Are you sure you want to change the base?
Conversation
… of release time of most recent major version)
TODO: Review the sorting rules from https://docs.google.com/document/d/1DWsEqT8KfheKZmMB3n_VhJpl9nIxiUjai_AIQPAjiyA/edit?usp=sharing and update this comment. |
2025/01/29: Julian will review and decide if/when to move forward back into the Sprint queue. |
If it makes the merging decision easier, we could also leave the sorting of minor versions untouched for now (possibly split a sorting change for minor versions into a second PR). But it would be nice if the sorting of the draft versions could be fixed. As I mentioned in the issue, it's been noted by one of our curators that the sorting of draft versions in review based on the publication date of the most recent major version is confusing.
|
Thanks for your comment in the GitHub issue @vera. I've always thought, and I might've heard this from someone years ago, that sorting works the way it has because folks thought that insignificant dataset updates shouldn't make the dataset more "new" than newly published datasets and datasets with significant updates. This reasoning isn't in the Google Doc that @pdurbin shared. The effects of this decision are discussed a bit in #2607, but the why isn't discussed there either. @vera what you shared from your curator makes perfect sense to me, too. It sounds like "new" is being thought of in different ways. And of course it's possible that no one will mind that minor versions start causing datasets to appear at the top when sorting by Newest. I just wasn't sure if this reasoning had been considered here and wanted to make sure that it was before it's changed. |
That does make sense. Perhaps that means minor versions should stay sorted as they currently are, but drafts should be sorted according to their own timestamp. I would say that if you are able to see a draft, you are usually either a curator or a contributor, and in both cases it makes sense for a new draft to show up on the top, because you are interested in seeing that a new draft exists, checking what's changed, making further edits, reviewing the draft for publication, etc. |
It's really helpful for me that you grounded this change in a user story here (and in your GitHub issue). And I think it'll be helpful for evaluating this change of having drafts appear at the top when sorted by "Newest". And then minor versions should stay sorted as they currently, as you wondered, until we're able to think about this change by grounding it in other user stories, like if and how it would affect users who are searching for datasets and how it might affect the display of datasets by "Newest" or latest. For example, some repository homepages display the top x number of latest datasets. @vera, so could this PR be adjusted so that minor versions are sorted as they currently are, while new draft versions show up at the top so that it's easier for curators and other contributors to see that a new draft exists, check what's changed, make further edits and review drafts for publication? |
Still seems odd to me that you'll be able to see your draft and then, when you publish, the new minor version disappears back to the major version's date. I'll also note that we now have the update-current-version option to help avoid truly trivial edits from requiring new minor versions. |
I've just pushed three commits, reverting the sorting behavior of non-draft datasets (so it will not be changed by this PR). I've also added a test confirming that minor versions are sorted based on the publication date of their most recent major versions. |
Thanks for the heads up @vera! I think it's good we recorded why things may have been designed the way they were; the effects of relatively newer features, like being able to overwrite minor versions; and concerns to look out for as folks experience this change. Hopefully it helps folks who have questions or are able to review how this different types of users. |
@vera - looks like your new test is failing: |
@qqmyers that's strange, I ran the test before pushing and just ran it again to be sure, it's passing for me:
(edit: I've run it a few times now to make sure it's not a flakey test, but it always passes) I will try to investigate tomorrow. |
@qqmyers Unfortunately, I haven't been able to reproduce the test failure, so I am unsure how to fix it. (I just pushed a commit renaming some variables but that's unrelated to fixing the test failure.) Could it be some interaction with one of the other tests that is run before it in the CI pipeline? (Although the test should be encapsulated since it is using its own newly created user account, dataverse collection etc. so that shouldn't be a problem, I think.) Or do you have some other idea why the test might be failing? |
I'm not positive, but one guess would be that the order of the returned values isn't fixed, so your dataset may be in [0] or [1] in the array. I recently had to fix another test (in #11081) where the order was random, causing the test to fail intermittently. (In that test, there was already code to verify the entries existed via a path-based lookup, so all I did was delete the lines relying on the array order. In your case you might need something similar using paths if you need to find each dataset versus just verifying that there are two.) |
Hmm, the order should be fixed in this case. The results on the "my data" page should be ordered according to the This is the line that the test is failing at according to your screenshot:
In the lines leading up to it, dataset 1 is published first, and then dataset 2:
So dataset 2 should receive the newer timestamp and be sorted first. I could try adding a sleep inbetween the publish calls, to ensure they really cannot occur in the same millisecond, although I think that's already highly unlikely...? If it's not that, I'm not sure what the failure reason might be. Could be a bug in the code, although in that case I wonder why the test is passing when I run it. It's a little hard to debug without being able to look at what's indexed in |
Ah - sorry - looking further into the error response from the test: I think you're getting the wrong value because there is still a draft version in Solr (see below). With the new optimizations, we rely on solr using soft commits with a 1 second default lag time (#11206 is related). So you may need to wait 1 second for solr to be up-to-date before making your call. If adding a delay doesn't work, let me know and I can cut/paste the whole error response for you and/or see if we can get you direct access to Jenkins to dig further. {
} |
Ahh I see, thanks for checking again! I just added some sleeps, let's see if that fixes things. |
That worked - all tests passing. Thanks! |
What this PR does / why we need it:
This PR fixes an issue where draft versions of datasets were sorted using the release timestamp of their most recent major version.
This caused newer drafts to appear incorrectly alongside their corresponding major version, instead of at the top, when sorted by "newest first". This affects the search results page and the "My data" page, both of which are sorted by newest by default.
Sorting now uses the last update timestamp when sorting draft datasets. The sorting behavior of published dataset versions (major and minor) is unchanged.
See bug description with screenshot etc in #11178
Which issue(s) this PR closes:
Special notes for your reviewer:
/
Suggestions on how to test this:
I've added a test that can be run with:
mvn test -Dtest="DataRetrieverApiIT#testRetrieveMyDataAsJsonStringSortOrder"
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
/
Is there a release notes update needed for this change?:
I think it would be good to include a release note for this bug fix, since it affects the default sorting of datasets on the search results and "My data" page. I've added a release note as part of this PR
Additional documentation:
/