Conduct data experiment comparing name queries to ORCID iD queries #133

Open
jacobthill opened this issue Feb 6, 2025 · 0 comments

Querying researchers by name is problematic because name strings are ambiguous and return many false positives. Querying by ORCID iD returns no false positives (in theory), but it misses some publications. In short, there is a precision-recall trade-off between these two querying strategies: name queries favor recall, while ORCID iD queries favor precision. As ORCID adoption increases and data providers improve their metadata, we expect querying by ORCID iD to replace the need to query by name strings. We need an experiment (in a Jupyter notebook) that can be re-run periodically to measure the current state of querying by ORCID iD.

  • Export a sample of SUL-Pub authors who have reviewed all of their publications AND gone through the ORCID integration.
  • Export all publications from SUL-Pub for these authors.
  • Query Dimensions, OpenAlex, PubMed, and WoS by ORCID iD for these authors.
  • Compare the resulting publication sets with and without the WoS publications.
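
The comparison step could be sketched roughly as follows. This is only an illustration, assuming publications are matched by DOI and that the reviewed SUL-Pub export is treated as the ground truth for each author. The function names `normalize_doi` and `precision_recall` and the sample DOIs are hypothetical, not part of any existing code, and publications without a DOI would need a separate matching strategy.

```python
def normalize_doi(doi):
    """Lowercase a DOI and strip common URL or 'doi:' prefixes (hypothetical helper)."""
    doi = doi.strip().lower()
    prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    for prefix in prefixes:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi


def precision_recall(retrieved, reviewed):
    """Return (precision, recall) of retrieved DOIs against the reviewed DOIs.

    Assumption: `reviewed` is the complete, author-confirmed publication list,
    so any retrieved DOI outside it counts as a false positive.
    """
    retrieved_set = {normalize_doi(d) for d in retrieved}
    reviewed_set = {normalize_doi(d) for d in reviewed}
    true_positives = retrieved_set & reviewed_set
    precision = len(true_positives) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(true_positives) / len(reviewed_set) if reviewed_set else 0.0
    return precision, recall


# Made-up DOIs for one author, purely for illustration.
reviewed_dois = ["10.1000/a", "10.1000/b", "10.1000/c", "10.1000/d"]
orcid_query_dois = ["https://doi.org/10.1000/A", "doi:10.1000/b", "10.1000/x"]

precision, recall = precision_recall(orcid_query_dois, reviewed_dois)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Running the same function once per author and per data source would give the per-source numbers that the notebook can track over time.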

The results will help us answer the following questions:

  • Does querying by ORCID iD alone return sufficiently complete results to move away from name querying?
  • Does querying Dimensions, OpenAlex, and PubMed return sufficiently complete results to end our expensive WoS subscription?
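
One way to approach the WoS question is to measure recall over the union of all sources, then again with WoS left out, and list the reviewed publications that only WoS found. The sketch below assumes identifiers have already been normalized so they compare equal across sources. The function names, the source keys, and the sample identifiers are hypothetical.

```python
def recall(found, reviewed):
    """Fraction of reviewed publications that appear in `found`."""
    return len(found & reviewed) / len(reviewed) if reviewed else 0.0


def compare_with_and_without(results_by_source, reviewed, excluded="wos"):
    """Compare recall of all sources combined against all sources minus `excluded`.

    `results_by_source` maps a source name to the set of publication
    identifiers returned by an ORCID iD query against that source.
    """
    all_found = set().union(*results_by_source.values())
    without_excluded = set().union(
        *(ids for source, ids in results_by_source.items() if source != excluded)
    )
    return {
        "recall_all": recall(all_found, reviewed),
        "recall_without": recall(without_excluded, reviewed),
        # Reviewed publications that would be lost if the source were dropped.
        "unique_to_excluded": (all_found - without_excluded) & reviewed,
    }


# Made-up identifiers for one author, purely for illustration.
reviewed_pubs = {"a", "b", "c", "d", "e"}
results = {
    "dimensions": {"a", "b"},
    "openalex": {"b", "c"},
    "pubmed": {"a"},
    "wos": {"c", "d"},
}

summary = compare_with_and_without(results, reviewed_pubs)
print(summary)
```

What counts as "sufficiently complete" is a judgment call that the experiment informs but does not settle, so the notebook would report these numbers rather than apply a fixed threshold.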
@jacobthill jacobthill self-assigned this Feb 6, 2025