Scraper fixes #1137

Merged
merged 5 commits into main from scraper-fixes
Mar 1, 2024

@ewan-escience ewan-escience commented Mar 1, 2024

Scraper fixes

Changes proposed in this pull request:

  • Fix bug where an OpenAlex raw author was wrongly assumed to be non-null
  • Adapt to the changed DataCite GraphQL API, where the publisher now has subfields
  • Escape colons when searching in the DataCite GraphQL API on title
  • Show DataCite instead of Crossref when DataCite was used as the source in bulk import
  • Set the container names for Swagger and CodeMeta in production
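
The null-author and colon-escaping fixes above can be illustrated with a minimal sketch. This is not the code from this pull request: the class `ScraperFixSketch` and both method names are hypothetical. It also assumes that the search backend accepts a backslash as the escape character for a colon, as in Lucene-style query strings.

```java
import java.util.Optional;

// Hypothetical helper class for illustration only; the names are not taken from the RSD code base.
final class ScraperFixSketch {

    private ScraperFixSketch() {
    }

    // Prefix every colon with a backslash so that the search backend reads it as
    // literal text instead of a field separator (assumed Lucene-style query syntax).
    static String escapeColons(String title) {
        return title.replace(":", "\\:");
    }

    // Treat a missing (null) or blank raw author name as absent instead of dereferencing it.
    static Optional<String> rawAuthorName(String rawAuthorName) {
        if (rawAuthorName == null || rawAuthorName.isBlank()) {
            return Optional.empty();
        }
        return Optional.of(rawAuthorName.strip());
    }
}
```

If the escaped title is embedded in a GraphQL or JSON string literal, the backslash itself would still need to be escaped for that outer layer. Returning an `Optional` forces callers to decide explicitly what to do when no raw author name is available.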

How to test:

  • docker compose down --volumes && docker compose build --parallel && docker compose up --scale data-generation=0
  • Create a software or project page, publish it
  • Add the following DataCite mentions directly, then remove them, then add them with bulk import:
    • 10.48550/arxiv.2310.12084
    • 10.5281/zenodo.1140396
  • Search for the title "CellProfiler and KNIME: Open-Source Tools for High-Content Screening"; a DataCite result should be shown. Add it.
  • Run the mentions scraper: docker compose exec scrapers java -cp /usr/myjava/scrapers.jar nl.esciencecenter.rsd.scraper.doi.MainMentions
  • The scraper should run without errors, and the scraped_at fields should be set in the entries returned by http://localhost/api/v1/mention
  • Add the following reference papers:
    • 10.1021/acs.jpclett.9b01634
    • 10.1021/ja026939x
  • Run the citation scraper, which should complete without errors: docker compose exec scrapers java -cp /usr/myjava/scrapers.jar nl.esciencecenter.rsd.scraper.doi.MainCitations

PR Checklist:

  • Increase version numbers in docker-compose.yml
  • Link to a GitHub issue
  • Update documentation
  • Tests

@ewan-escience ewan-escience self-assigned this Mar 1, 2024

sonarcloud bot commented Mar 1, 2024

@ewan-escience ewan-escience merged commit e82e831 into main Mar 1, 2024
5 checks passed
@ewan-escience ewan-escience deleted the scraper-fixes branch March 21, 2024 10:34