Bug?: unexpected sorting of results when sorting by "newest first" (Search + My Data) #11178

Open
vera opened this issue Jan 22, 2025 · 5 comments · May be fixed by #11180
Labels
Type: Bug a defect

Comments

@vera
Contributor

vera commented Jan 22, 2025

I noticed the following unexpected behaviour related to sorting by "newest first" (either in the search results or on the "My Data" page).

When a dataset is created and published, and then a new draft version of that dataset is created, I expected the new draft to be sorted first (because it is newer than the published version). However, this is not the case:

Image
(The draft of dataset 10.5072/FK2/NCG6QT is sorted below the published version of the same dataset, even though we are sorting by "newest".)

Looking at the index, it seems the draft version receives a copy of the dateSort value from the published version, instead of its own timestamp:

{
      "id":"dataset_29644",
      "dateSort":"2025-01-22T15:59:40.150Z",
      ...
},
{
      "id":"dataset_29644_draft",
      "dateSort":"2025-01-22T15:59:40.150Z",
      ...
}

Is this intended?

This bug was discovered by one of our curators, who used "My data" to get a list of datasets to be reviewed. A dataset that had just been submitted for review today was not on page 1, as expected, but on one of the last pages.

It seems that the draft was sorted by the dateSort timestamp copied from the latest published version (which was published September 24). This means the draft was sorted as if it had been created/submitted last September instead of today.
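
To illustrate the effect described above, here is a minimal sketch in plain Java, not Dataverse's actual indexing or search code. The `IndexDoc` class, the ids, and all timestamps are invented for illustration; only the idea of sorting by `dateSort` descending comes from the index excerpt. When the draft carries a copy of the published version's `dateSort`, any dataset published in between sorts above it.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class DateSortDemo {

    /** Hypothetical stand-in for an index document; only the two fields quoted in the issue. */
    static final class IndexDoc {
        final String id;
        final Instant dateSort;

        IndexDoc(String id, String dateSort) {
            this.id = id;
            this.dateSort = Instant.parse(dateSort);
        }
    }

    /** "Newest first": order by dateSort descending and return the ids. */
    static List<String> newestFirst(List<IndexDoc> docs) {
        return docs.stream()
                .sorted(Comparator.comparing((IndexDoc d) -> d.dateSort).reversed())
                .map(d -> d.id)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // All ids and timestamps below are invented for illustration.
        List<IndexDoc> copiedTimestamp = List.of(
                new IndexDoc("dataset_1", "2024-09-24T10:00:00Z"),
                new IndexDoc("dataset_1_draft", "2024-09-24T10:00:00Z"), // copy of the published value
                new IndexDoc("dataset_2", "2024-12-01T10:00:00Z"));

        List<IndexDoc> ownTimestamp = List.of(
                new IndexDoc("dataset_1", "2024-09-24T10:00:00Z"),
                new IndexDoc("dataset_1_draft", "2025-01-22T10:00:00Z"), // the draft's own timestamp
                new IndexDoc("dataset_2", "2024-12-01T10:00:00Z"));

        System.out.println(newestFirst(copiedTimestamp)); // [dataset_2, dataset_1, dataset_1_draft]
        System.out.println(newestFirst(ownTimestamp));    // [dataset_1_draft, dataset_2, dataset_1]
    }
}
```

With the copied timestamp the draft ties with its published version and falls behind everything published since; with its own timestamp it comes first.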

What steps does it take to reproduce the issue?

I've added a test for the behaviour that I expected here: https://github.com/vera/dataverse/blob/ef8ab7ee6202a2101d17a634c0d61e7cc87c1d5c/src/test/java/edu/harvard/iq/dataverse/api/DataRetrieverApiIT.java#L108-L260

Run with: mvn test -Dtest="DataRetrieverApiIT#testRetrieveMyDataAsJsonStringSortOrder"

When does this issue occur?

When looking at datasets on the search results or my data page.

Which page(s) does it occur on?

  • Search results page
  • My data page

What happens?

New draft versions are not sorted before the published versions.

To whom does it occur (all users, curators, superusers)?

All users.

What did you expect to happen?

Dataset draft versions to be sorted according to their creation time.

Which version of Dataverse are you using?

6.5

Any related open or closed issues to this bug report?

I didn't find any.

Screenshots:

/

Are you thinking about creating a pull request for this issue?

Yes, possibly

@vera vera added the Type: Bug a defect label Jan 22, 2025
@pdurbin
Member

pdurbin commented Jan 22, 2025

I agree it sounds like a bug. 🐞 Good catch!

@qqmyers
Member

qqmyers commented Jan 22, 2025

The code seems to be picking the date of the last major release (so presumably the same thing happens with v1.1 etc that you're seeing with :draft). So it seems intentional, but probably a choice from long ago. Sorting by version dates seems more useful to me, though I'm not sure if anyone in the community relies on it working as it does now.

@vera
Contributor Author

vera commented Jan 23, 2025

Thanks for confirming! I've opened a PR for this issue: #11180

@jggautier
Contributor

jggautier commented Jan 28, 2025

As of v6.5, a newer minor version of a dataset, like 1.1, doesn't appear above its major version, like 1.0, when sorting by "newest first".

Will the PR #11180 change this as well, so that if I publish a minor version, that version will appear at the top when sorting by "newest first"?

I think so from what @qqmyers wrote: "so presumably the same thing happens with v1.1 etc that you're seeing with :draft".

If PR #11180 is changing this behavior, I think it's worth discussing more. Otherwise, apologies for the noise.

@vera
Contributor Author

vera commented Jan 29, 2025

Yes, the PR #11180 will also change the sorting of minor versions, so they're sorted at the top. With that PR, every dataset (draft, minor or major version) will be sorted based on its own last update timestamp. I suggest this way of sorting because it's consistent, but I don't really have a strong opinion about where minor versions should be sorted, so it would be totally fine by me to do this differently.
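
As a rough sketch of the two options discussed in this thread (this is not the code from the PR or from Dataverse): `Version`, `lastMajorReleaseDate`, and `ownDate` are hypothetical names, and the version history and timestamps are made up. The first method models the behaviour described earlier in the thread, where every version is sorted by the date of the last major release; the second models sorting each version by its own last update timestamp.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class VersionSortDate {

    /** Hypothetical, simplified view of a dataset version. */
    static final class Version {
        final String label;
        final boolean majorRelease;
        final Instant lastUpdated;

        Version(String label, boolean majorRelease, String lastUpdated) {
            this.label = label;
            this.majorRelease = majorRelease;
            this.lastUpdated = Instant.parse(lastUpdated);
        }
    }

    /** Behaviour as described in the thread: use the date of the most recent major release. */
    static Optional<Instant> lastMajorReleaseDate(List<Version> history) {
        return history.stream()
                .filter(v -> v.majorRelease)
                .map(v -> v.lastUpdated)
                .max(Comparator.naturalOrder()); // empty if nothing has been published yet
    }

    /** Suggested behaviour: each version (draft, minor, or major) uses its own timestamp. */
    static Instant ownDate(Version version) {
        return version.lastUpdated;
    }

    public static void main(String[] args) {
        // Invented history: 1.0 (major), 1.1 (minor), then a new draft.
        Version major = new Version("1.0", true, "2024-09-24T10:00:00Z");
        Version minor = new Version("1.1", false, "2024-11-05T10:00:00Z");
        Version draft = new Version(":draft", false, "2025-01-22T10:00:00Z");
        List<Version> history = List.of(major, minor, draft);

        System.out.println(lastMajorReleaseDate(history)); // Optional[2024-09-24T10:00:00Z]
        System.out.println(ownDate(minor));                // 2024-11-05T10:00:00Z
        System.out.println(ownDate(draft));                // 2025-01-22T10:00:00Z
    }
}
```

Under the first rule, 1.0, 1.1, and the draft all share one sort date; under the second, each moves to the top when it is updated.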


4 participants