Investigate missing "Filter by" dropdown menu #9199
Comments
I'm emailing the depositor of this new dataset to figure out what kind of data this is, and I might eventually delete or deaccession this dataset. But since this is the second time I've seen this missing "Filter by" dropdown menu bug, I think it might be helpful to keep the dataset up so that someone can look into what's causing it. Definition of done:
|
Just happened to be working in this part of the code. It looks to me like these only show if the dataset is indexed so perhaps there has been some indexing problem with it. |
#9647 was caused by missing facets. But here, as can be seen in the screenshot, the "Filter by" text is also missing. This one is hidden if In short: exactly what @qqmyers said, the dataset version in the screenshot is not indexed, either because of a fault or because it is neither the latest version nor a draft version. |
Ah, thanks @ErykKul. I changed the example dataset I linked to in the first comment in this GitHub issue. (The example dataset I linked to was deaccessioned.) It does look like the "Filter by" dropdowns show only for the latest released versions and draft versions of datasets that have more than one file. If the version is deaccessioned, the "Filter by" dropdowns also don't show. So far, I've seen this with every dataset in Harvard Dataverse I've checked that has more than one version and a few datasets in other installations, like https://dataverse.theacss.org/dataset.xhtml?persistentId=doi:10.25825/FK2/GTR0HI&version=1.0. I'm still confused about whether this is by design, a bug, or a limitation of how versions are indexed. Should each version of every dataset with more than one file have a "Filter by" dropdown? I'd imagine that users might want to be able to filter files in previously published versions of their datasets, but I'm not aware of any research into this (or users who have complained that they can't do this now). |
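The behavior observed in the comment above can be summarized as a small predicate. This is only a sketch of the observation, not the Dataverse implementation: the class, enum, and method names are hypothetical, and in the real application the dropdowns appear to depend on whether the version is indexed rather than on an explicit rule like this one.

```java
// Hypothetical sketch of the observed behavior; none of these names come from the Dataverse codebase.
final class FilterByVisibility {

    enum VersionState { DRAFT, RELEASED, DEACCESSIONED }

    /**
     * Returns true when, according to the observations in this thread, the
     * "Filter by" dropdowns would be shown above the file table.
     */
    static boolean showsFilterBy(VersionState state, boolean isLatestReleased, int fileCount) {
        if (fileCount <= 1) {
            return false; // dropdowns were only seen on datasets with more than one file
        }
        switch (state) {
            case DRAFT:
                return true;             // draft versions are indexed
            case RELEASED:
                return isLatestReleased; // only the latest released version is indexed
            default:
                return false;            // deaccessioned versions showed no dropdowns
        }
    }

    public static void main(String[] args) {
        // An older released version (not the latest) with several files.
        System.out.println(showsFilterBy(VersionState.RELEASED, false, 12));
    }
}
```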
@jggautier I did run across some of the history behind this; I hope it helps (#762):
|
This is interesting! Thanks @ErykKul. It seems to me that these indexing decisions were made to support search when using the search boxes and facets on the Dataverse collection page. The "Filter by" dropdown menu on the dataset page didn't exist when these indexing decisions were being made, so the effect of those decisions on the file table filters couldn't have been known then? That's what I mean by asking if this is a limitation of how versions are indexed now. I looked around some more and saw that when the "Filter by" dropdown menu was being developed, it was acknowledged that:
I think you've helped wrap up this investigation :) And I think the question now is if there should be more batches. Would users find it helpful to be able to filter files in previously published versions of their datasets? |
Yep, sounds right. What is the value of the additional functionality vs. the effort required? I'd say it's a lot of effort to index more versions. Indexing is already slow and fairly complicated. |
There does seem to be a legitimate issue (likely due to the async nature of indexing) with the index time being earlier than the publish time. So let's fix that for this issue. |
As a short term fix for datasets where it is currently happening, we'll run a query and reindex those. |
2024/03/13
|
I think reindexing might be part of the problem: a reindex drops the old index, i.e., it deletes the index entries of previously published versions and builds a new index that includes only the currently published version and the draft version, if one exists. This would mean that a short-term fix that indexes old published versions would be undone by a full reindex. Also, there is currently no API (that I am aware of) for indexing an arbitrary version; all code indexes at the dataset level, which ends up indexing only the latest published version and the draft version. The only code I have found that can index an arbitrary version is the test code in ...
final IndexableDataset indexableDataset = new IndexableDataset(datasetVersion);
...
final SolrInputDocuments docs = indexService.toSolrDocs(indexableDataset, null);
...
// needs to call the code to send it to solr
If that is the case, the short-term solution would require a new API call that would, for example, index a specific version synchronously, as in the code above, and that could be called sequentially (reducing the load on the server), e.g., from a script, on the dataset versions you want to index. If we decide to index all published versions and the current draft version on dataset reindex (e.g., as a long-term solution), this would require serious refactoring of the code and would have various side effects, such as making indexing heavier and the Solr index itself larger, since it would then include all versions of datasets. Especially for datasets containing many files, the size increase could be significant. |
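The "call it sequentially" idea from the comment above could be sketched roughly as follows. This is a minimal sketch under stated assumptions: `VersionIndexer` is a hypothetical stand-in for the not-yet-existing call that indexes one dataset version synchronously, and the pause between calls is an assumed way of keeping the load on the server low. Nothing here is existing Dataverse API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: indexes dataset versions one at a time to limit server load.
final class SequentialVersionReindex {

    /** Hypothetical hook standing in for "index this one version synchronously". */
    interface VersionIndexer {
        void indexVersion(long datasetVersionId);
    }

    /**
     * Indexes the given versions strictly one after another, pausing between calls.
     * Returns the IDs whose indexing failed so they can be retried later.
     */
    static List<Long> reindexSequentially(List<Long> versionIds, VersionIndexer indexer, long pauseMillis) {
        List<Long> failed = new ArrayList<>();
        for (long id : versionIds) {
            try {
                indexer.indexVersion(id); // synchronous: returns only when this version is done
            } catch (RuntimeException e) {
                failed.add(id);           // keep going; one bad version should not stop the run
            }
            if (pauseMillis > 0) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;                // stop early if the run is interrupted
                }
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Stand-in indexer that only logs; a real one would build and send the Solr documents.
        List<Long> failed = reindexSequentially(List.of(101L, 102L, 103L),
                id -> System.out.println("indexing version " + id), 0);
        System.out.println("failed: " + failed);
    }
}
```

As the comment points out, a later full reindex would still discard entries created this way unless the reindex logic itself were changed.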
Thanks so much for taking a look, too @ErykKul. We were talking in today's sprint planning meeting about closing this issue, since it's mostly clear why the dropdown menu is missing. And as part of a later effort, the community can think about how to learn more about the value to users of being able to filter files in previously published versions of their datasets, so that that value can be weighed against the technical constraints that you and @pdurbin have mentioned. I'll bring this up as something the UX working group can help look into, maybe as part of more general research into how users explore the data they find in Dataverse. But it looks like @scolapasta's comment, which I don't really understand, is about a problem that's still outstanding:
Is that right? Could we open a new GitHub issue about the "legitimate" issue instead? |
I did implement the async indexing; we have used it since 5.14, I think. Publishing while indexing should not be a problem: it adds a new index request to the queue. If the published version does not end up indexed, then it is a bug that I need to solve, so let me know if it happens and I will investigate. The only case I can think of right now where it could happen is if the application is stopped before indexing is done. This might also be worth fixing. |
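To illustrate the queueing behavior described above, here is a generic sketch using a single-threaded executor from the Java standard library. It is not the Dataverse implementation: the class and method names are hypothetical, and the "indexing" step only records a string. It shows why a publish that arrives while an earlier index request is pending is simply processed afterwards, and why stopping the application without draining the queue could leave a version unindexed.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a FIFO queue of index requests handled by one worker thread.
final class IndexRequestQueue {

    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final List<String> processed = new CopyOnWriteArrayList<>();

    /** Enqueues a request; it runs after every request submitted before it. */
    void requestIndex(long datasetId, String reason) {
        worker.execute(() -> processed.add(datasetId + ":" + reason));
    }

    /**
     * Stops accepting new requests and waits for queued ones to finish.
     * Returns false if the queue was not drained in time, which is the
     * "application stopped before indexing is done" case.
     */
    boolean drainAndStop(long timeoutSeconds) {
        worker.shutdown();
        try {
            return worker.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    List<String> processed() {
        return List.copyOf(processed);
    }

    public static void main(String[] args) {
        IndexRequestQueue queue = new IndexRequestQueue();
        queue.requestIndex(42L, "edit");
        queue.requestIndex(42L, "publish"); // queued behind the earlier request
        System.out.println("drained: " + queue.drainAndStop(5));
        System.out.println(queue.processed());
    }
}
```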
Ah perfect, thanks @ErykKul. From my understanding there's nothing outstanding in this GitHub issue, so I'm going to close it. |
@jggautier @ErykKul I'm a little confused as to why this issue got closed. As far as I can tell, there still is an issue (unless the idea is to open a new issue?). If you go to Harvard Dataverse right now and look at the datasets on the front page, most of them do not show the filters on their currently released versions. This is because (at least from past investigation) the index time is getting set to an earlier time than the publish time, which is why running a manual reindex fixes it for a particular dataset. @ErykKul is this something you can look into? Should we reopen this or open a new issue? (Ideally I'd like to see this solved for 6.2, since it was introduced recently.) |
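The timing problem described above can be expressed as a simple timestamp comparison. The sketch below is an assumption about how such a check might look, not code taken from Dataverse: the class name, method name, and parameters are hypothetical. A check like this could also be used to select the datasets that need the manual reindex mentioned earlier in the thread.

```java
import java.time.Instant;

// Hypothetical sketch: decides whether a published version's index entry looks out of date.
final class IndexFreshness {

    /**
     * A published version is treated as stale when it has never been indexed
     * or when its last index time is earlier than its publish time.
     */
    static boolean isIndexStale(Instant indexTime, Instant publishTime) {
        if (publishTime == null) {
            return false; // not published yet, so this check does not apply
        }
        return indexTime == null || indexTime.isBefore(publishTime);
    }

    public static void main(String[] args) {
        Instant indexed = Instant.parse("2024-03-13T09:59:58Z");
        Instant published = Instant.parse("2024-03-13T10:00:00Z");
        // The index time precedes the publish time, as in the bug described in this thread.
        System.out.println(isIndexStale(indexed, published));
    }
}
```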
Ah okay. I'd prefer if a new issue was opened. @scolapasta, could you open a new issue? Or could you @ErykKul? |
OK, it is clear now. The fix should be easy. I will create a new issue and I will pick it up. |
I think the bug that caused filters to be missing from some datasets has been fixed by the PR that @ErykKul opened and closed. |
What steps does it take to reproduce the issue?
When does this issue occur?
Not sure what causes this
Which page(s) does it occur on?
Above the file table on version 1 of the dataset page at https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/V50UIP&version=1.0
What happens?
There are no "Filter by" dropdown menus above the file table. They're missing when I visit the page using the newest versions of Chrome, Firefox and Safari.
To whom does it occur (all users, curators, superusers)?
All users
What did you expect to happen?
The "Filter by" dropdown menus above the file table are there so that users can find certain types of files (by file type and by access)
Which version of Dataverse are you using?
v5.13 on the Harvard Dataverse Repository
Any related open or closed issues to this bug report?
IQSS/dataverse.harvard.edu#178
Screenshots:
Above the file table (below the search box), the "Filter by" dropdown menus are missing:
Here's what the "Filter by" dropdown menus look like: