Investigate missing "Filter by" dropdown menu #9199

jggautier · 2022-11-30T14:48:58Z

What steps does it take to reproduce the issue?

When does this issue occur?
Not sure what causes this
Which page(s) does it occur on?
Above the file table on version 1 of the dataset page at https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/V50UIP&version=1.0
What happens?
There are no "Filter by" dropdown menus above the file table. They're missing when I visit the page using the newest versions of Chrome, Firefox and Safari.
To whom does it occur (all users, curators, superusers)?
All users
What did you expect to happen?
The "Filter by" dropdown menus above the file table are there so that users can find certain types of files (by file type and by access)

Which version of Dataverse are you using?
v5.13 on the Harvard Dataverse Repository

Any related open or closed issues to this bug report?
IQSS/dataverse.harvard.edu#178

Screenshots:

Above the file table (below the search box), the "Filter by" dropdown menus are missing:

Here's what the "Filter by" dropdown menus look like:

jggautier · 2022-11-30T14:50:20Z

I'm emailing the depositor of this new dataset to figure out what kind of data this is, and I might eventually delete or deaccession this dataset. But since this is the second time I've seen this missing "Filter by" dropdown menu bug, I think it might be helpful to keep the dataset up so that someone can look into what's causing it.

Definition of done:

Explore and fix the issues causing this dataset, and maybe other datasets, to lose the "Filter by" dropdown menus above their file tables.
This dataset might be deleted or deaccessioned, so the definition of done may not be to restore the "Filter by" dropdown menus for this dataset.

qqmyers · 2022-11-30T14:54:23Z

Just happened to be working in this part of the code. It looks to me like these only show if the dataset is indexed so perhaps there has been some indexing problem with it.

ErykKul · 2023-06-20T16:05:51Z

#9647 was caused by missing facets. But here, as the screenshot shows, the "Filter by" text itself is also missing. That text is hidden when DatasetPage.indexedVersion returns false. It can be true only for the latest released version and for draft versions; no other version will have filters. To test whether a specific version is indexed, Solr is queried for that version.

In short: exactly what @qqmyers said, the dataset version in the screenshot is not indexed, either because of a fault or because it is neither the latest released version nor a draft.
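
The rule described in this comment can be sketched in Java as follows. This is a minimal illustration only, not the actual DatasetPage code: the class name, the method signature, and the `solrHasVersion` callback (which stands in for the Solr query) are all assumptions made for the example.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of the "is this version indexed?" rule; not Dataverse code.
final class FilterVisibilitySketch {

    /**
     * Returns true only when the version is a draft or the latest released
     * version AND the (assumed) Solr lookup confirms it is in the index.
     */
    static boolean isIndexedVersion(boolean isDraft,
                                    boolean isLatestReleased,
                                    BooleanSupplier solrHasVersion) {
        // Older published versions are never indexed, so skip the Solr query.
        if (!isDraft && !isLatestReleased) {
            return false;
        }
        // Eligible versions can still be missing from the index (e.g. after a fault).
        return solrHasVersion.getAsBoolean();
    }
}
```

Under this sketch, an older published version always returns false, and an eligible version returns false when the index lookup fails. Those are the two cases named above.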

jggautier · 2023-06-20T16:36:43Z

Ah, thanks @ErykKul. I changed the example dataset I linked to in the first comment in this GitHub issue. (The example dataset I linked to was deaccessioned.)

It does look like the "Filter by" dropdowns show only for the latest released versions and draft versions of datasets that have more than one file. If the version is deaccessioned, the "Filter by" dropdowns also don't show.

So far, I've seen this with every dataset in Harvard Dataverse I've checked that has more than one version and a few datasets in other installations, like https://dataverse.theacss.org/dataset.xhtml?persistentId=doi:10.25825/FK2/GTR0HI&version=1.0.

I'm still confused about whether this is by design, a bug, or a limitation of how versions are indexed. Should each version of every dataset with more than one file have a "Filter by" dropdown? I'd imagine that users might want to be able to filter files in previously published versions of their datasets, but I'm not aware of any research into this (or users who have complained that they can't do this now).

ErykKul · 2023-06-22T07:25:52Z

@jggautier
It is by design: you cannot search for files in versions other than latest published or draft, therefore no filters can be shown for other versions.

I did run across some of the history behind it; I hope it helps (#762):

                /**
                 * It sounds weird but the first thing we'll do is preemptively
                 * delete the Solr documents of all published files. Don't
                 * worry, published files will be re-indexed later along with
                 * the dataset. We do this so users can delete files from
                 * published versions of datasets and then re-publish a new
                 * version without fear that their old published files (now
                 * deleted from the latest published version) will be
                 * searchable. See also
                 * https://github.com/IQSS/dataverse/issues/762
                 */

jggautier · 2023-06-26T14:54:05Z

This is interesting! Thanks @ErykKul.

It seems to me that these indexing decisions were made to support search when using the search boxes and facets on the Dataverse collection page. The "Filter by" dropdown menu on the dataset page didn't exist when these indexing decisions were being made, so those decisions' effect on the file table filters couldn't be known then?

That's what I mean by asking if this is a limitation of how versions are indexed now. I looked around some more and saw that when the "Filter by" dropdown menu was being developed, it was acknowledged that:

As we only index the latest published and draft versions ("present versions"), we'd only be able to use Solr for those versions. It had previously been decided that this was ok (at least, as a first batch).
#5584 (comment)

I think you've helped wrap up this investigation :) And I think the question now is if there should be more batches. Would users find it helpful to be able to filter files in previously published versions of their datasets?

pdurbin · 2023-07-11T20:34:17Z

Yep, sounds right. What is the value of the additional functionality vs. the effort required? I'd say it's a lot of effort to index more versions. Indexing is already slow and fairly complicated.

scolapasta · 2024-02-22T22:16:03Z

There does seem to be a legitimate issue (likely due to the async nature of indexing) of the index time being earlier than the publish time. So let's fix that for this issue.

scolapasta · 2024-02-22T22:17:05Z

As a short-term fix for datasets where it is currently happening, we'll run a query and reindex those.

cmbz · 2024-03-12T19:40:51Z

2024/03/13

Prioritized and moved to Needs Sizing in advance of final sprint in 6.1

ErykKul · 2024-03-13T09:12:54Z

I think that reindexing might be part of the problem: on a reindex you drop the old index, i.e., you delete the index entries of previously published versions and build a new index that includes only the currently published version and the draft version, if it exists. This would mean that a short-term solution of indexing old published versions would be undone by a full reindex. Also, there is currently no API (that I am aware of) that allows indexing an arbitrary version; all code indexes at the Dataset level, which ends up indexing the latest published version and the draft version. The only code I found that could index any version is the test code in IndexServiceBeanTest.java, namely the TestIndexing() method:

...
        final IndexableDataset indexableDataset = new IndexableDataset(datasetVersion);
...
        final SolrInputDocuments docs = indexService.toSolrDocs(indexableDataset, null);
...
        // needs to call the code to send it to solr

If that is the case, the short-term solution would require creating a new API call that would, for example, index a specific version synchronously as in the code above. It could then be called sequentially (reducing the load on the server), e.g. from a script, on the dataset versions you want to index.
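
A rough sketch of what "call it sequentially" could look like is below. Everything here is hypothetical: no such API exists in the thread, and the `indexOneVersion` callback only stands in for whatever call would index a single dataset version synchronously.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

// Hypothetical driver for indexing chosen dataset versions one at a time.
final class SequentialVersionIndexerSketch {

    /**
     * Calls indexOneVersion for each version id in order, never in parallel,
     * and returns the ids whose indexing threw, so they can be retried later.
     */
    static List<Long> indexSequentially(List<Long> versionIds,
                                        LongConsumer indexOneVersion) {
        List<Long> failed = new ArrayList<>();
        for (long versionId : versionIds) {
            try {
                // Assumed to block until the version's documents are sent to Solr.
                indexOneVersion.accept(versionId);
            } catch (RuntimeException e) {
                // Keep going: one bad version should not stop the whole batch.
                failed.add(versionId);
            }
        }
        return failed;
    }
}
```

Processing one version at a time keeps the load on the server bounded, and collecting failures instead of aborting lets a script retry only the versions that did not index.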

If we decide to index all published versions and the current draft version on dataset reindex (e.g., as a form of long-term solution), this would require serious refactoring of the code and would have various side effects, like making indexing heavier and making the Solr index itself larger, as it would now include all versions of datasets. Especially for datasets containing many files, the size increase could be significant.

jggautier · 2024-03-14T20:03:29Z

Thanks so much for taking a look, too, @ErykKul.

We were talking in today's sprint planning meeting about closing this issue, since it's mostly clear why the dropdown menu is missing. And as part of a later effort, the community can think about how to learn more about the value to users of being able to filter files in previously published versions of their datasets, so that that value can be weighed against the technical constraints that you and @pdurbin have mentioned. I'll bring this up as something the UX working group can help look into, maybe as part of more general research into how users explore the data they find in Dataverse.

But it looks like @scolapasta's comment, which I don't really understand, is about a problem that's still outstanding:

There does seem to be a legitimate issue (likely due to the async nature of indexing) of the index time being earlier than the publish time. So let's fix that for this issue.

Is that right? Could we open a new GitHub issue about the "legitimate" issue instead?

ErykKul · 2024-03-15T08:48:54Z

I implemented the async indexing; we have used it since 5.14, I think. Publishing while indexing should not be a problem: it adds a new index request to the queue. If the published version does not end up indexed, then it is a bug that I need to solve, so let me know if it happens and I will investigate. The only case I can think of right now where it could happen is if the application is stopped before the indexing is done. This also might be worth fixing.

jggautier · 2024-03-15T14:04:08Z

Ah perfect, thanks @ErykKul. From my understanding there's nothing outstanding in this GitHub issue, so I'm going to close it.

scolapasta · 2024-03-15T15:58:04Z

@jggautier @ErykKul I'm a little confused as to why this issue got closed. As far as I can tell there still is an issue (unless the idea is to open a new issue?). If you go to Harvard Dataverse right now and look at the datasets on the front page, most of them do not show the filters (on the currently released versions). This is because (at least from past investigation) the index time is getting set to an earlier time than the publish time, which is why running a manual reindex fixes it for a particular dataset.

@ErykKul is this something you can look into? Should we reopen this or open a new issue? (Ideally I'd like to see this solved for 6.2 since it was introduced recently)
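
The symptom described here (an index time earlier than the publish time) suggests a simple staleness check, sketched below. This is an assumption-laden illustration: the class and method names are made up, and the thread does not show how Dataverse stores or compares these timestamps.

```java
import java.time.Instant;

// Hypothetical staleness check; not taken from the Dataverse code base.
final class IndexFreshnessSketch {

    /**
     * A published dataset looks stale when it was never indexed or when its
     * last index time is earlier than its publish time.
     */
    static boolean needsReindex(Instant indexTime, Instant publishTime) {
        if (indexTime == null) {
            return true; // never indexed at all
        }
        // Indexed before publication: the index cannot reflect the released version.
        return indexTime.isBefore(publishTime);
    }
}
```

A check like this could be used to select the affected datasets for the manual reindex mentioned earlier in the thread.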

jggautier · 2024-03-15T16:03:00Z

Ah okay. I'd prefer if a new issue was opened. @scolapasta, could you open a new issue? Or could you @ErykKul?

ErykKul · 2024-03-15T16:13:49Z

OK, it is clear now. The fix should be easy. I will create a new issue and I will pick it up.

jggautier · 2025-01-13T15:32:55Z

I think the bug that caused filters to be missing from some datasets has been fixed by the PR that @ErykKul opened and closed.

jggautier added the "Feature: Search/Browse" and "Type: Bug" labels Nov 30, 2022

jggautier mentioned this issue Jun 20, 2023

Filters for files in dataset page are gone #9647

Closed

scolapasta added this to IQSS Dataverse Project Feb 22, 2024

cmbz moved this to SPRINT- NEEDS SIZING in IQSS Dataverse Project Mar 12, 2024

cmbz added this to the 6.2 milestone Mar 12, 2024

cmbz removed the status in IQSS Dataverse Project Mar 14, 2024

jggautier closed this as completed Mar 15, 2024

ErykKul mentioned this issue Mar 15, 2024

Index after publish #10381

Closed

sbarbosadataverse reopened this Aug 20, 2024

cmbz moved this to SPRINT- NEEDS SIZING in IQSS Dataverse Project Aug 20, 2024

cmbz removed the status in IQSS Dataverse Project Aug 20, 2024

sbarbosadataverse closed this as completed Aug 20, 2024

jggautier removed this from IQSS Dataverse Project Jan 13, 2025
