Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use RecordDataFormatter for EDS #4240

Open
wants to merge 12 commits into
base: dev
Choose a base branch
from

Conversation

ThoWagen
Copy link
Contributor

@ThoWagen ThoWagen commented Feb 12, 2025

In this PR the RecordDataFormatter is used for EDS.

It also changes the templates of EDS to better match the ones of DefaultRecord. My hope would be that we can extract more of the common layout for result lists and the core view into driver independent templates in the future.

I also realized that the change here could/should be done for EPF. Please let me know if you think that it should also be done in this PR or in a followup PR.

TODO

  • Also implement changes for EPF after the EDS part is done
  • Update changelog when merging (changes to EDS record driver, signature changes to \VuFind\View\Helper\Root\RecordDataFormatter\SpecBuilder.php constructor and methods)
  • Document new SpecBuilder "item" functionality in wiki when merging

@demiankatz demiankatz requested a review from sturkel89 February 12, 2025 13:41
@demiankatz
Copy link
Member

@sturkel89, this PR is designed to make EDS records behave more consistently with other types of records in the system, which should make both code maintenance and customization easier. However, it involves a lot of changes, and since EDS is somewhat strange, it could have side effects. Do you mind looking at a variety of different records in EDS and comparing how they look in this PR vs. the dev branch both in search results and on the record screen? I expect that some minor differences may be unavoidable, but please pay particular attention for missing or incorrectly formatted data.

I'll give this a deeper technical review when time permits, but this seems like the type of change where starting with end-user testing might be the best way to point us to potential problem areas quickly.

Regarding the question about EPF, I think it would probably make sense to do that as part of this PR as well, but it may be easier to refine the EDS code first and then adjust EPF to match at the end, to minimize the amount of work needed while things are in flux. I'd also be interested to hear @maccabeelevine's thoughts on all of this.

@maccabeelevine
Copy link
Member

Regarding the question about EPF, I think it would probably make sense to do that as part of this PR as well, but it may be easier to refine the EDS code first and then adjust EPF to match at the end, to minimize the amount of work needed while things are in flux. I'd also be interested to hear @maccabeelevine's thoughts on all of this.

Yeah I agree, probably the same PR but do EDS first. The EPF templates are (for the most part) trivial extensions of the EDS ones, so changes to EDS should mostly carry over and I'm happy to help if there are issues. That said if you prefer to leave EPF to a separate PR, I'm happy to take care of it. Very glad to have this EDS cleanup started, thank you @ThoWagen for taking it on!

Copy link
Member

@demiankatz demiankatz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ThoWagen, I haven't done anything approaching an exhaustive review yet, but here are a couple of comments from my first quick pass through. I have also added a couple of TODO checkboxes for things we'll want to remember when we finish this PR.

Copy link
Contributor

@sturkel89 sturkel89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reviewed the test branch. Thanks for all of your work on this, @ThoWagen! I have noticed several differences between the test branch and the dev branch, and will list them here. You will probably want to address them in separate comments.


Author name display in Results browse

In many cases, the test branch improves the display of author names in results browse!

On dev, terms like "edited by" are connected to the actual name as a hyperlink so those searches lead to empty results. On test, the "edited by" is not hyperlinked, so the author name is clickable and leads to good results. (I've noticed extra repeats of one of the author/editor names on dev, but I think those are extraneous so the way you're doing it on test is also better.)

Example: http://localhost/vufind_test/EDS/Search?lookfor=edsjbk.j.ctt2ttrqn&type=AllFields&filter%5B%5D=LIMIT%7CFT%3Ay&filter%5B%5D=EXPAND%3Afulltext&limit=20

Test: (name links are clickable)
image

Dev: (name links don't work)
image

(NB: in item record view, the authors are identical between test and dev for this record.)

However, there's one area where author name display on test could use improvement to look more like dev. When there is extra text including superscripts and email addresses, it would be better to suppress those from the results browse page, as on test.

Example: http://localhost/vufind_test/EDS/Search?lookfor=174337969&type=AllFields&filter%5B%5D=LIMIT%7CFT%3Ay&filter%5B%5D=EXPAND%3Afulltext&limit=20

Test: (author list has too much content displayed)
image

Dev: (author list has brief display - seems better)
image

UPDATE: Superscripts are gone, but email addresses and text like (AUTHOR) still appears on the results browse page.


Author name display in item record view

Using the same example, I would prefer if the author email address were NOT hyperlinked in item record view. It's easier to pick out the author names if they are the only hyperlinked content.

Example: http://localhost/vufind_test/EdsRecord/buh,174337969

Test: (author email addresses are mailto: links)
image

Dev: (author email addresses are plain text; this seems better)
image

UPDATE: This is fixed!


Icon for "Export Record" is missing in item view

In item record display, the test branch doesn't display the square with arrow icon on the Export Record button. Can you add it in?

Test: (no icon on Export button)
image

Dev: (icon on Export button - better)
image

UPDATE: This is fixed!


Subject list is incomplete (and missing important content) in some cases on test

The subjects lists look good when the source database uses the Subject Terms field to hold the terms.

However, when the source database uses the Subjects field to hold the subject terms, the test branch displays ONLY the last subject heading on the list. This difference is visible in both search results display and item record display.

Example: http://localhost/vufind_test/EDS/Search?filter%5B%5D=EXPAND%3A%22fulltext%22&filter%5B%5D=LIMIT%7CFT%3A%22y%22&filter%5B%5D=%7EContentProvider%3A%22OmniFile+Full+Text+Mega+%28H.W.+Wilson%29%22&dfApplied=1&lookfor=%22Assemblage+%28Art%29%22&type=SU

Test: (only one subject per result)

Dev: (full subject list for each result - much better, and shows why these results were relevant)

Here is a similar example for a different database family. In this case, the database is MEDLINE and the problematic subject field is the MeSH Terms field. In the example, you'll see that only the last part of each subject heading is displayed, rather than the full heading (which includes the part before the slash).

Example:
http://localhost/vufind_test/EdsRecord/cmedm,37578047

Test: (the subject terms displayed are incomplete)

image

Dev: (the full subject terms are displayed - better)

image

UPDATE: This all seems to be fixed!


Formatting of multi-item lists in core record

When there are multiple values in a core record field, they are displayed as a comma-separated list on test. The values are displayed on separate lines on dev. I think it's easier to read the items when shown on separate lines; this type of display also matches the non-EDS VuFind display.

Example : http://localhost/vufind_test/EdsRecord/edselc,edselc.2-52.0-85192383171

Note the Author Affiliation, Author Keywords and ISSN fields:

Test:

Dev:

UPDATE: This is fixed in for the fields I identified.


Description tab is not functioning properly in the test branch

I've found that in every record I've viewed in the test branch, the Description tab has displayed "Description not available." Almost every record has Abstract, DOI, ISSN in the dev branch.

Test:

Dev:

UPDATE: This is fixed!

@ThoWagen
Copy link
Contributor Author

ThoWagen commented Feb 18, 2025

@sturkel89 Thanks for the review! I will just address some of your points and look into the rest tomorrow.

I created a separate PR #4259 that should allow us to remove the links from the email addresses.

Should the superscripts also be removed in the record view?

The icon was probably missing in the dev version I created this branch from. I merged dev again and it is being displayed now.

@sturkel89
Copy link
Contributor

@ThoWagen - I would say that the superscripts can remain in record view as long as the Affiliation field is also displayed, since that's what the superscripts map on to.

@ThoWagen
Copy link
Contributor Author

ThoWagen commented Feb 19, 2025

@sturkel89 - Since #4259 was merged the email addresses do not get linkified anymore.

The superscripts are now hidden in the results. And I'm not sure if we should hide them in the core based on the existence of the affiliation field. We can probably just assume that it always exists.

The subject list should now be complete. I could not access your first example because our clients don't have access to that database. Can you confirm that it is also fixed?

Regarding the formatting of multi-item lists in core record, now each item is displayed on a new line except for authors and subjects. Should we also have one line per item for those fields?

The description tab is also fixed but I have a question for @demiankatz: In #4195 I did not realize that the RecordDataFormatter is already used for all types of Records in the descriptions. What do you think of this commit aa1bbc7 that moves the description specs to the base class that is not abstract anymore? If you like that approach I would create a separate PR for it to reduce the changes here.

@sturkel89
Copy link
Contributor

Thanks for all of this good work, @ThoWagen! Please see above; I've added a note at the bottom of each section of my review to confirm which parts have been addressed.

See below for my few remaining requests and questions.


Display of author names in results browse - simplify further?

I see that the superscripts are hidden now in search results - that's a good improvement! I would like to see the email addresses and text like (AUTHOR) also removed in results browse, so all we see are hyperlinked names (as in my example from dev). The other details will appear when the user views the record. (We can ask others for opinions on this if you'd like.)

Example: http://localhost/vufind_test/EDS/Search?filter%5B%5D=LIMIT%7CFT%3A%22y%22&filter%5B%5D=EXPAND%3A%22fulltext%22&dfApplied=1&lookfor=10.3390%2Fs24248141&type=AllFields

Test:

Dev:


Location of Affiliation field in core record - can this be programmatically set?

In the record I used as an example, the Affiliation field is immediately below the author names. In many other records, it seems to display at the bottom of the record. (Example: http://localhost/vufind_test/EdsRecord/buh,179256203?sid=28 )

Is it possible for you to specify the location of that field so that it's always close to the author name field?


Subject terms - can they be one per row in item record view?

In item record view in "standard" VuFind, the lists of Subjects are displayed with one subject per line, while authors are comma separated. Can you update the EDS branch so that subject lists display each item on a new line?.

I'm curious to see how this would look in practice, because in many cases the source databases list MANY MANY subject terms, sometimes in multiple fields, for some items. This is different from the book catalog, where cataloging rules keep the number of headings fairly small.

I would still like to see how it looks to have the subject headings on one line each.


Thanks again for all of this work!

@demiankatz
Copy link
Member

The description tab is also fixed but I have a question for @demiankatz: In #4195 I did not realize that the RecordDataFormatter is already used for all types of Records in the descriptions. What do you think of this commit aa1bbc7 that moves the description specs to the base class that is not abstract anymore? If you like that approach I would create a separate PR for it to reduce the changes here.

I definitely like the idea of using inheritance to avoid redundant configuration. Just one question, though: would it be better to make the EDS specs a subclass of the DefaultRecord specs instead of moving logic all the way up to the base class? It feels to me like it might be more logical to have the specs class hierarchy as parallel to the record driver hierarchy as possible. The default description specs should make sense for anything that inherits from the DefaultRecord record driver, but maybe not for every record driver in existence.

@demiankatz
Copy link
Member

Display of author names in results browse - simplify further?

I see that the superscripts are hidden now in search results - that's a good improvement! I would like to see the email addresses and text like (AUTHOR) also removed in results browse, so all we see are hyperlinked names (as in my example from dev). The other details will appear when the user views the record. (We can ask others for opinions on this if you'd like.)

There is an [AuthorDetails] section in EDS.ini that is meant to control the way authors are displayed in different contexts. Does the refactored code still respect this? If not, maybe that needs to be factored in somehow. See: https://github.com/vufind-org/vufind/blob/dev/config/vufind/EDS.ini#L442-L461

@ThoWagen
Copy link
Contributor Author

Thanks @sturkel89,

"Display of author names in results browse - simplify further?"

On dev the behavior is as follows. If either "ResultListFormat" in EDS.ini is set to "Long" or the record only contains one author the complete info is displayed. Otherwise, only the hyperlinked names are displayed. I wonder if that is intentional or if one of the following behaviors are better:

  1. Only the hyperlinked names are displayed in the result list no matter what. The "ResultListFormat" only changes the maximum number of displayed authors.
  2. The whole info is always displayed if the config is "Long" and never if it is "Short", even if there is only one author.

@demiankatz Yes, the [AuthorDisplay] section of the INI is also used in this PR but currently only controls the maximum number of displayed authors.

"Location of Affiliation field in core record - can this be programmatically set?"
Yes, it should be possible. But it most likely is out of scope for this PR. So I created a Jira ticket for it https://openlibraryfoundation.atlassian.net/browse/VUFIND-1759

"Subject terms - can they be one per row in item record view?"

Fortunately this is really simple to do now. I changed it in this PR.

@ThoWagen
Copy link
Contributor Author

The description tab is also fixed but I have a question for @demiankatz: In #4195 I did not realize that the RecordDataFormatter is already used for all types of Records in the descriptions. What do you think of this commit aa1bbc7 that moves the description specs to the base class that is not abstract anymore? If you like that approach I would create a separate PR for it to reduce the changes here.

I definitely like the idea of using inheritance to avoid redundant configuration. Just one question, though: would it be better to make the EDS specs a subclass of the DefaultRecord specs instead of moving logic all the way up to the base class? It feels to me like it might be more logical to have the specs class hierarchy as parallel to the record driver hierarchy as possible. The default description specs should make sense for anything that inherits from the DefaultRecord record driver, but maybe not for every record driver in existence.

Okay, I will create a separate PR and we can discuss the details there.

@sturkel89
Copy link
Contributor

I took another look at this PR, @ThoWagen.

I see that you put the subject terms each on a separate line. That looks great!

I also see the JIRA ticket re: moving Author Affiliation up in the record. I hope that can happen. The affiliation field might be labelled differently in different database sources that feed EDS, so let me know if I can help you identify the relevant fields.

Re: the display of author names in results browse, I am still not sure why test does not behave the same as dev. I checked my [AuthorDisplay] settings in EDS.ini, and ResultListFormat had been set as 'Short.' But I still see all those extra words and email addresses in test but not in dev. There are definitely records where the data are formatted strangely, so the author affiliation or email appears in results browse regardless of the setting, but the example I provided has the potential to appear with only hyperlinked author names (I think that's the behavior #1 that you describe).

Thanks for continuing to improve this! I think this PR has the potential to really improve the display of EDS content through VuFind.

@ThoWagen
Copy link
Contributor Author

@sturkel89 Sorry, if I created some confusion here. The behavior in this PR is indeed different from dev.

On dev the behavior is as follows. If either "ResultListFormat" in EDS.ini is set to "Long" or the record only contains one author the complete info is displayed. Otherwise, only the hyperlinked names are displayed. I wonder if that is intentional or if one of the following behaviors are better:

In this part I just described the current state on dev, questioned if that makes sense and provided the following two alternatives.

1. Only the hyperlinked names are displayed in the result list no matter what. The "ResultListFormat" only changes the maximum number of displayed authors.

2. The whole info is always displayed if the config is "Long" and never if it is "Short", even if there is only one author.

When we decide if we want the same behavior as dev or one of those alternatives I have to implement it here.

@sturkel89
Copy link
Contributor

Thank you for clarifying!

I think the Results List view is much cleaner if we consistently only show hyperlinked author names, regardless of the number of author names displayed. More information about the authors will be available in Item Record view, and/or on the article or ebook itself.

So, my preference is that we implement your option #1 for author names in Results List browse. It will not always be perfect because of issues with metadata in some cases, but it will mostly be better.

@demiankatz
Copy link
Member

@ThoWagen, regarding the long and short author formats, I think one motivation for the original settings was concern over inconsistencies between the data presented in the two different parts of the EDS record, and a desire to control which version is used. I'm honestly not sure how much that matters, but if nothing else, if we're no longer pulling data from different places for different values of the setting, we should be sure to adjust the relevant notes in EDS.ini. A lot of this logic came from @alexklbuckley, so it might also be valuable to get his input on the approach taken here.

@alexklbuckley
Copy link
Contributor

@ThoWagen, regarding the long and short author formats, I think one motivation for the original settings was concern over inconsistencies between the data presented in the two different parts of the EDS record, and a desire to control which version is used. I'm honestly not sure how much that matters, but if nothing else, if we're no longer pulling data from different places for different values of the setting, we should be sure to adjust the relevant notes in EDS.ini. A lot of this logic came from @alexklbuckley, so it might also be valuable to get his input on the approach taken here.

Thanks @ThoWagen and @demiankatz just acknowledging I have received this and it is on my list to look at asap!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants