Duplicate detection during external source import should escape metadata #3712

Open
KingKrimmson opened this issue Dec 3, 2024 · 0 comments
Labels
bug component: configurable entities related to configurable entities help wanted Needs a volunteer to claim to move forward integration: OpenAIRE Related to integration with OpenAIRE tools:import-sources Related to "Live Import" Sources feature, allowing import of content via external APIs.

Describe the bug

When importing an item from an external source (e.g. OpenAIRE), DSpace now checks whether the item already exists locally and alerts the user. However, the check is simply a Solr query on the name metadata. This works in most cases, but fails when the title contains characters that have special meaning in Solr query syntax and are not escaped, e.g. a colon (:).

This means any titles with colons will necessarily create duplicates every time they are imported.

I suspect these other characters may present issues as well: +, --, -, &&, ||, !, (, ), ", ~, *, ?, :
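
As a rough illustration of the kind of escaping meant here, the sketch below backslash-escapes Solr/Lucene query metacharacters in a title before it is placed in a query string. It is not the DSpace code: the class `SolrTitleEscaper`, the method `titleQuery`, and the field name passed to it are hypothetical. The character set is intended to mirror what SolrJ's `ClientUtils.escapeQueryChars` handles, and a real fix would most likely call that utility directly instead of reimplementing it.

```java
// Hypothetical helper, for illustration only; not part of DSpace.
final class SolrTitleEscaper {

    // Characters with special meaning in Solr/Lucene query syntax.
    // Assumption: this matches the set handled by SolrJ's
    // ClientUtils.escapeQueryChars (which also escapes whitespace).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    private SolrTitleEscaper() {
    }

    // Prefix every special character and every whitespace character
    // with a backslash so Solr treats it as literal text.
    static String escapeQueryChars(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Build a "field:value" query. The field name is supplied by the
    // caller; which Solr field DSpace actually queries is not stated
    // in this issue.
    static String titleQuery(String field, String title) {
        return field + ":" + escapeQueryChars(title);
    }
}
```

With this sketch, `SolrTitleEscaper.escapeQueryChars("Adriatic Perspectives: Memory")` yields `Adriatic\ Perspectives\:\ Memory`, so the colon is no longer parsed as a field separator.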

To Reproduce

  1. Set up OpenAIRE to be used during Publication submission to attach a related Project entity.
  2. During submission, do a lookup using the "Funding OpenAIRE API" tab with the query 655609. You should see an item with the title "Adriatic Perspectives: Memory and Identity on a Transnational European Periphery".
  3. Import this item and ensure the Project is created and installed as a DSpace entity.
  4. Ensure the Solr core is up to date.
  5. Repeat step 2. The first Project should appear in the "Select a local match" section, but the section is blank because the colon is not escaped and the query returns no results.

Expected behavior

During step 5, the original Project should be detected and displayed for the user to select.

Related work

TBD

@KingKrimmson KingKrimmson added bug needs triage New issue needs triage and/or scheduling labels Dec 3, 2024
@github-project-automation github-project-automation bot moved this to 🆕 Triage in DSpace Backlog Dec 3, 2024
@tdonohue tdonohue added tools:import-sources Related to "Live Import" Sources feature, allowing import of content via external APIs. integration: OpenAIRE Related to integration with OpenAIRE help wanted Needs a volunteer to claim to move forward component: configurable entities related to configurable entities and removed needs triage New issue needs triage and/or scheduling labels Dec 3, 2024
@tdonohue tdonohue removed this from DSpace Backlog Dec 3, 2024