IQSS/7349-4 creator updates in schema.org #9089
it does not appear to be useful given the tests in PersonOrOrgUtilTest
FWIW: The 4 changes to the schema.org output are live on https://data.qdr.syr.edu now if anyone wants to take a look at the results.
After some testing at QDR, it appears that organizations whose names end in 'Project' get coded as a Person. The PR now adds a JVM option that lets the algorithm assume all Person names are entered in the recommended "Family Name, Given Name" format, which enables it to correctly code 'John Smith Project', and any other organization name without a comma, as an Organization. The option is off by default and may not be useful for non-curated repositories. An alternative or additional approach would be a configurable list of non-person terms (like 'Project'), which would allow finer-grained control over specific cases.
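A minimal sketch of the comma rule described in this comment, in Java. The class name, method name, and boolean parameter are hypothetical stand-ins (the actual JVM option name is not given here), and the word-count fallback is only a placeholder for the DataCite-derived algorithm, not a copy of it.

```java
final class CommaRuleSketch {

    /**
     * Returns "Person" or "Organization" for a creator name.
     * assumeCommaInPersonName is a hypothetical parameter standing in
     * for the JVM option mentioned in the comment above.
     */
    static String inferType(String name, boolean assumeCommaInPersonName) {
        // Simplification: any comma is read as "Family Name, Given Name".
        if (name.contains(",")) {
            return "Person";
        }
        // With the option on, a name without a comma cannot be a Person.
        if (assumeCommaInPersonName) {
            return "Organization";
        }
        // Placeholder heuristic (an assumption, NOT the real algorithm):
        // treat two- or three-word names as persons.
        int words = name.trim().split("\\s+").length;
        return (words == 2 || words == 3) ? "Person" : "Organization";
    }
}
```

With the flag off, `inferType("John Smith Project", false)` falls through to the placeholder and is coded as `Person`, mirroring the miscoding reported above; with the flag on, the same name is coded as `Organization`.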
@qqmyers I'm a little confused. I see you mention this issue above... ... but this PR doesn't close it. Can you please explain what else is needed? I'll leave a comment over there as well if it makes more sense to have the conversation there. Thanks.
Looks good. Thanks for adding the ToDos for code consolidation. @qqmyers It looks like there are merge conflicts that need to be resolved, so I pulled it back into Review.
IQSS/7349-4_creator_updates
Issues found:
Jim was not able to reproduce the above issue, and on retesting I was not able to either. The theory is a temporary caching issue.
What this PR does / why we need it: Per discussion in #7349 and #5029, Google ignores creator entries from Dataverse because they don't include an @type of Person or Organization. Since we don't collect that directly, we have to use some mechanism to infer the @type. The OpenAire export format does this by leveraging an algorithm developed by DataCite. This PR abstracts that algorithm from the code producing the OpenAire XML format into a new PersonOrOrgUtil class, which is then used to identify an @type for creators in Dataverse. The new functionality is used to provide the @type in the schema.org export format and in the metadata embedded in the dataset page, along with the givenName and familyName for a person if/when the algorithm determines them.
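To make the target output concrete, here is a small Java sketch of a creator entry carrying an @type and, for a Person entered as "Family Name, Given Name", the split name parts. It is an illustration under assumptions: `CreatorTypeSketch` and `toCreator` are hypothetical names, the Person/Organization decision is passed in rather than inferred, and plain maps stand in for whatever JSON API the export code actually uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CreatorTypeSketch {

    /**
     * Builds a schema.org-style creator entry as an ordered map.
     * isPerson is supplied by the caller; the real inference is done
     * by PersonOrOrgUtil and is not reproduced here.
     */
    static Map<String, Object> toCreator(String name, boolean isPerson) {
        Map<String, Object> creator = new LinkedHashMap<>();
        creator.put("@type", isPerson ? "Person" : "Organization");
        creator.put("name", name);
        if (isPerson) {
            int comma = name.indexOf(',');
            if (comma > 0) {
                String family = name.substring(0, comma).trim();
                String given = name.substring(comma + 1).trim();
                // Only emit the parts when both could be determined.
                if (!family.isEmpty() && !given.isEmpty()) {
                    creator.put("familyName", family);
                    creator.put("givenName", given);
                }
            }
        }
        return creator;
    }
}
```

A name such as "Smith, John" yields an entry with `@type` Person plus `familyName` and `givenName`; a Person name without a comma gets only `@type` and `name`, matching the "if/when the algorithm determines them" caveat.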
The PR also updates how the 'affiliation' is handled. According to schema.org, only Persons have an 'affiliation', and its value must be an 'Organization'. The PR makes this change: for Persons, the affiliation is sent as an object of @type 'Organization' with the 'name' specified. Since Organizations don't have an 'affiliation' but do have a 'parentOrganization' key, for Organizations the code now encodes the affiliation as a 'parentOrganization' object of @type 'Organization' with the 'name' as specified.
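The affiliation rule in this paragraph can be sketched in Java as follows. This is an illustrative sketch only: `AffiliationSketch` and `creator` are hypothetical names, and ordered maps stand in for the JSON objects the export actually builds.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class AffiliationSketch {

    /**
     * Builds a creator entry and attaches the affiliation under the key
     * that matches the creator's @type ("Person" or "Organization").
     */
    static Map<String, Object> creator(String type, String name, String affiliation) {
        Map<String, Object> creator = new LinkedHashMap<>();
        creator.put("@type", type);
        creator.put("name", name);
        if (affiliation != null && !affiliation.trim().isEmpty()) {
            // In both cases the value is an Organization object with a name.
            Map<String, Object> org = new LinkedHashMap<>();
            org.put("@type", "Organization");
            org.put("name", affiliation);
            // Persons get 'affiliation'; Organizations get 'parentOrganization'.
            String key = "Person".equals(type) ? "affiliation" : "parentOrganization";
            creator.put(key, org);
        }
        return creator;
    }
}
```

The only thing that changes between the two cases is the key; the nested object has the same shape either way, and a creator without an affiliation gets neither key.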
Which issue(s) this PR closes:
Special notes for your reviewer:
FWIW: I think the new PersonOrOrgUtil can replace the code in the OpenAire export but I didn't do that in this PR. The algorithm is used in that code with two minor variations that I think can be handled by an appropriate entry for the organizationIfTied param.
Suggestions on how to test this: Unit tests cover most of this. As with others, visually inspecting the schema.org export and/or the JSON-LD embedded in the dataset page can be done to verify that Person or Organization is added and that affiliation is handled as described here. Note that miscategorizations (a person with @type Organization or vice versa) are not considered a bug; sending an occasionally wrong type is considered better than not sending a type at all. If there are consistent issues, the algorithm can potentially be improved.
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Is there a release notes update needed for this change?: included.
Additional documentation: