Investigate missing "Filter by" dropdown menu #9199

Closed
jggautier opened this issue Nov 30, 2022 · 18 comments

Comments

@jggautier
Contributor

jggautier commented Nov 30, 2022

What steps does it take to reproduce the issue?

  • When does this issue occur?
    Not sure what causes this

  • Which page(s) does it occur on?
    Above the file table on version 1 of the dataset page at https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/V50UIP&version=1.0

  • What happens?
    There are no "Filter by" dropdown menus above the file table. They're missing when I visit the page using the newest versions of Chrome, Firefox and Safari.

  • To whom does it occur (all users, curators, superusers)?
    All users

  • What did you expect to happen?
    The "Filter by" dropdown menus above the file table are there so that users can find certain types of files (by file type and by access)

Which version of Dataverse are you using?
v5.13 on the Harvard Dataverse Repository

Any related open or closed issues to this bug report?
IQSS/dataverse.harvard.edu#178

Screenshots:

Above the file table (below the search box), the "Filter by" dropdown menus are missing:
Screen Shot 2022-11-30 at 9 31 07 AM

Here's what the "Filter by" dropdown menus look like:
Screen Shot 2022-11-30 at 9 43 59 AM

@jggautier
Contributor Author

jggautier commented Nov 30, 2022

I'm emailing the depositor of this new dataset to figure out what kind of data this is, and I might eventually delete or deaccession this dataset. But since this is the second time I've seen this missing "Filter by" dropdown menu bug, I think it might be helpful to keep the dataset up so that someone can look into what's causing it.

Definition of done:

  • Explore and fix the issues causing this dataset, and maybe other datasets, to lose the "Filter by" dropdown menus above their file tables.
  • This dataset might be deleted or deaccessioned, so the definition of done may not be to restore the "Filter by" dropdown menus for this dataset.

@qqmyers
Member

qqmyers commented Nov 30, 2022

Just happened to be working in this part of the code. It looks to me like these only show if the dataset is indexed so perhaps there has been some indexing problem with it.

@ErykKul
Collaborator

ErykKul commented Jun 20, 2023

#9647 was caused by missing facets. But here, as can be seen in the screenshot, the "Filter by" text is also missing. That text is hidden when DatasetPage.indexedVersion returns false. It can be true only for the latest released version and for draft versions; no other version will have filters. To test whether a specific version is indexed, Solr is queried for that version.

In short, exactly what @qqmyers said: the dataset version in the screenshot is not indexed, either because of a fault or because it is neither the latest released version nor a draft version.
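A minimal sketch of that rule, written as a standalone model rather than the actual DatasetPage code. The class, enum, and method names are hypothetical, and the real check queries Solr for the version instead of inspecting a list:

```java
import java.util.List;

// Hypothetical model of the rule described above; not the Dataverse implementation.
final class FilterVisibilitySketch {

    enum State { DRAFT, RELEASED, DEACCESSIONED }

    /**
     * Versions are ordered oldest to newest. A version counts as indexed
     * (and so shows "Filter by") only if it is a draft or the latest released version.
     */
    static boolean isIndexedVersion(List<State> versions, int index) {
        State state = versions.get(index);
        if (state == State.DRAFT) {
            return true;
        }
        if (state != State.RELEASED) {
            return false; // deaccessioned versions never show filters
        }
        // A released version is indexed only when no newer released version exists.
        for (int i = index + 1; i < versions.size(); i++) {
            if (versions.get(i) == State.RELEASED) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<State> versions = List.of(State.RELEASED, State.RELEASED, State.DRAFT);
        for (int i = 0; i < versions.size(); i++) {
            System.out.println("version " + (i + 1) + " shows filters: "
                    + isIndexedVersion(versions, i));
        }
    }
}
```

With two released versions followed by a draft, only the second released version and the draft would show the filters in this model.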

@jggautier
Contributor Author

Ah, thanks @ErykKul. I changed the example dataset I linked to in the first comment in this GitHub issue. (The example dataset I linked to was deaccessioned.)

It does look like the "Filter by" dropdowns show only for the latest released versions and draft versions of datasets that have more than one file. If the version is deaccessioned, the "Filter by" dropdowns also don't show.

So far, I've seen this with every dataset in Harvard Dataverse I've checked that has more than one version and a few datasets in other installations, like https://dataverse.theacss.org/dataset.xhtml?persistentId=doi:10.25825/FK2/GTR0HI&version=1.0.

I'm still confused about whether this is by design, a bug, or a limitation of how versions are indexed. Should each version of every dataset with more than one file have a "Filter by" dropdown? I'd imagine that users might want to be able to filter files in previously published versions of their datasets, but I'm not aware of any research into this (or users who have complained that they can't do this now).

@ErykKul
Collaborator

ErykKul commented Jun 22, 2023

@jggautier
It is by design: you cannot search for files in versions other than the latest published version or the draft, so no filters can be shown for other versions.

I did run across some of the history behind it; I hope it helps (#762):

                /**
                 * It sounds weird but the first thing we'll do is preemptively
                 * delete the Solr documents of all published files. Don't
                 * worry, published files will be re-indexed later along with
                 * the dataset. We do this so users can delete files from
                 * published versions of datasets and then re-publish a new
                 * version without fear that their old published files (now
                 * deleted from the latest published version) will be
                 * searchable. See also
                 * https://github.com/IQSS/dataverse/issues/762
                 */

@jggautier
Contributor Author

This is interesting! Thanks @ErykKul.

It seems to me that these indexing decisions were made to support search when using the search boxes and facets on the Dataverse collection page. The "Filter by" dropdown menu on the dataset page didn't exist when these indexing decisions were being made, so the effect of those decisions on the file table filters couldn't have been known then?

That's what I mean by asking if this is a limitation of how versions are indexed now. I looked around some more and saw that when the "Filter by" dropdown menu was being developed, it was acknowledged that:

As we only index the latest published and draft versions ("present versions"), we'd only be able to use Solr for those versions. It had previously been decided that this was ok (at least, as a first batch).
#5584 (comment)

I think you've helped wrap up this investigation :) And I think the question now is if there should be more batches. Would users find it helpful to be able to filter files in previously published versions of their datasets?

@pdurbin
Member

pdurbin commented Jul 11, 2023

Yep, sounds right. What is the value of the additional functionality vs. the effort required? I'd say it's a lot of effort to index more versions. Indexing is already slow and fairly complicated.

@scolapasta
Contributor

There does seem to be a legitimate issue (likely due to the async nature of indexing) with the index time being earlier than the publish time. So let's fix that for this issue.

@scolapasta
Contributor

As a short term fix for datasets where it is currently happening, we'll run a query and reindex those.
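A rough sketch of what such a query might select, assuming the affected datasets are the ones whose last index time is missing or earlier than their publish time. The `Times` class and the `needsReindex` method are hypothetical and work on in-memory data; the real fix would query the database:

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the query; not the Dataverse implementation.
final class StaleIndexSketch {

    /** Hypothetical pair of timestamps for one published dataset. */
    static final class Times {
        final Instant publishTime;
        final Instant indexTime; // null means "never indexed"

        Times(Instant publishTime, Instant indexTime) {
            this.publishTime = publishTime;
            this.indexTime = indexTime;
        }
    }

    /** Returns, in ascending ID order, datasets whose index time is missing or before the publish time. */
    static List<Long> needsReindex(Map<Long, Times> datasets) {
        List<Long> stale = new ArrayList<>();
        for (Map.Entry<Long, Times> entry : new TreeMap<>(datasets).entrySet()) {
            Times t = entry.getValue();
            if (t.indexTime == null || t.indexTime.isBefore(t.publishTime)) {
                stale.add(entry.getKey());
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        Instant publish = Instant.parse("2024-03-15T10:00:00Z");
        Map<Long, Times> datasets = Map.of(
                1L, new Times(publish, publish.plusSeconds(30)),  // indexed after publish: fine
                2L, new Times(publish, publish.minusSeconds(30)), // indexed before publish: stale
                3L, new Times(publish, null));                    // never indexed
        System.out.println(needsReindex(datasets)); // prints [2, 3]
    }
}
```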

@cmbz cmbz moved this to SPRINT- NEEDS SIZING in IQSS Dataverse Project Mar 12, 2024
@cmbz

cmbz commented Mar 12, 2024

2024/03/13

  • Prioritized and moved to Needs Sizing in advance of final sprint in 6.1

@cmbz cmbz added this to the 6.2 milestone Mar 12, 2024
@ErykKul
Collaborator

ErykKul commented Mar 13, 2024

I think that reindexing might be part of the problem: on a reindex you drop the old index, i.e., you delete the index of previously published versions and build a new index that includes only the currently published version and the draft version, if it exists. This would mean that a short-term solution of indexing old published versions would be undone by a full reindex. Also, there is currently no API (that I am aware of) that allows indexing an arbitrary version; all the code indexes at the Dataset level, ending up with an index of the latest published version and the draft version. The only code I found that could index any version is the test code in IndexServiceBeanTest.java, namely the TestIndexing() method:

...
        final IndexableDataset indexableDataset = new IndexableDataset(datasetVersion);
...
        final SolrInputDocuments docs = indexService.toSolrDocs(indexableDataset, null);
...
        // needs to call the code to send it to solr

If that is the case, the short-term solution would require creating a new API call that would, for example, index a specific version synchronously as in the code above, and calling it sequentially (reducing the load on the server), e.g. from a script, on the dataset versions you want to index.
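The sequential calling pattern could look roughly like this. No such API exists yet, so the `indexOneVersion` callback is a hypothetical stand-in for the synchronous per-version call, and the class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

// Hypothetical driver for a per-version indexing call that does not exist yet.
final class SequentialVersionIndexSketch {

    /**
     * Indexes the given version IDs one at a time, in order, so that only one
     * indexing job loads the server at any moment. Returns the IDs that failed.
     */
    static List<Long> indexSequentially(List<Long> versionIds, LongConsumer indexOneVersion) {
        List<Long> failed = new ArrayList<>();
        for (long versionId : versionIds) {
            try {
                indexOneVersion.accept(versionId); // assumed to block until indexing is done
            } catch (RuntimeException e) {
                failed.add(versionId); // keep going; failures can be retried later
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<Long> failed = indexSequentially(List.of(10L, 11L, 12L), id -> {
            if (id == 11L) {
                throw new IllegalStateException("simulated Solr failure");
            }
            System.out.println("indexed version " + id);
        });
        System.out.println("failed: " + failed);
    }
}
```

Collecting failures instead of stopping at the first one is a design choice for this sketch, so that a single bad version would not block the rest of the batch.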

If we decide to index all published versions and the current draft version on dataset reindex (e.g., as a long-term solution), this would require serious refactoring of the code and would have various side effects, like making indexing heavier and making the Solr index itself larger, as it would now include all versions of datasets. Especially for datasets containing many files, the size increase could be significant.

@cmbz cmbz removed the status in IQSS Dataverse Project Mar 14, 2024
@jggautier
Contributor Author

jggautier commented Mar 14, 2024

Thanks so much for taking a look, too @ErykKul.

We were talking in today's sprint planning meeting about closing this issue, since it's mostly clear why the dropdown menu is missing. And as part of a later effort, the community can think about how to learn more about the value to users of being able to filter files in previously published versions of their datasets, so that that value can be weighed against the technical constraints that you and @pdurbin have mentioned. I'll bring this up as something the UX working group can help look into, maybe as part of more general research into how users explore the data they find in Dataverse.

But it looks like @scolapasta's comment, which I don't really understand, is about a problem that's still outstanding:

There does seem to be a legitimate issue (likely due to the async nature of indexing) with the index time being earlier than the publish time. So let's fix that for this issue.

Is that right? Could we open a new GitHub issue about the "legitimate" issue instead?

@ErykKul
Collaborator

ErykKul commented Mar 15, 2024

I did implement the async indexing; we have used it since 5.14, I think. It should not be a problem to publish while indexing: it adds a new index request to the queue. If the published version does not end up indexed, then it is a bug that I need to solve, so let me know if it happens and I will investigate it. The only case I can think of right now where it could happen is if the application is stopped before the indexing is done. This also might be worth fixing.

@jggautier
Contributor Author

Ah perfect, thanks @ErykKul. From my understanding there's nothing outstanding in this GitHub issue, so I'm going to close it.

@scolapasta
Contributor

@jggautier @ErykKul I'm a little confused as to why this issue got closed. As far as I can tell there still is an issue (unless the idea is to open a new issue?). If you go to Harvard Dataverse right now and look at the datasets on the front page, most of them do not show the filters (on the currently released versions). This is because (at least from past investigation) the index time is getting set to an earlier time than the publish time, which is why running a manual reindex fixes it for a particular dataset.

@ErykKul is this something you can look into? Should we reopen this or open a new issue? (Ideally I'd like to see this solved for 6.2 since it was introduced recently)

@jggautier
Contributor Author

Ah okay. I'd prefer if a new issue was opened. @scolapasta, could you open a new issue? Or could you @ErykKul?

@ErykKul
Collaborator

ErykKul commented Mar 15, 2024

OK, it is clear now. The fix should be easy. I will create a new issue and I will pick it up.

@cmbz cmbz moved this to SPRINT- NEEDS SIZING in IQSS Dataverse Project Aug 20, 2024
@cmbz cmbz removed the status in IQSS Dataverse Project Aug 20, 2024
@jggautier
Contributor Author

I think the bug that caused filters to be missing from some datasets has been fixed by the PR that @ErykKul opened and closed.
