Conduct data experiment comparing name queries to ORCID iD queries #133

jacobthill · 2025-02-06T18:00:47Z

Querying researchers by name is problematic because it returns many false positives. Querying by ORCID iD should, in theory, return no false positives, but it misses some publications. In short, the two querying strategies trade precision against recall. As ORCID adoption increases and data providers improve their metadata, querying by ORCID iD will replace the need to query by name strings. We need an experiment, in a Jupyter notebook, that can be re-run periodically to measure the current state of querying by ORCID iD.
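To make the precision-recall trade-off concrete, the notebook could treat the publications that authors have reviewed in SUL-Pub as the ground truth and score each query strategy against it. The sketch below assumes each publication is identified by an already-normalized DOI string. `precision_recall` is a hypothetical helper, not part of any existing SUL-Pub code. One caveat: a correct publication that is missing from SUL-Pub would be counted as a false positive here, so precision is only as good as the reviewed set.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Score a query strategy against a ground-truth set of publication IDs.

    retrieved: IDs (assumed normalized DOIs) returned by a query strategy.
    relevant:  IDs the author has confirmed in SUL-Pub (treated as ground truth).
    """
    true_positives = len(retrieved & relevant)
    # Precision: share of returned publications that truly belong to the author.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: share of the author's confirmed publications that the query found.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Toy, hypothetical data illustrating the expected pattern.
sulpub = {"10.1/a", "10.1/b", "10.1/c", "10.1/d"}
orcid_hits = {"10.1/a", "10.1/b"}  # precise but incomplete
name_hits = sulpub | {"10.1/x", "10.1/y", "10.1/z", "10.1/w"}  # complete but noisy
print(precision_recall(orcid_hits, sulpub))  # (1.0, 0.5)
print(precision_recall(name_hits, sulpub))   # (0.5, 1.0)
```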

Export a sample of SUL-Pub authors who have reviewed all of their publications AND gone through the ORCID integration.
Export all publications from SUL-Pub for these authors.
Query Dimensions, OpenAlex, PubMed, and WoS by ORCID iD for these authors.
Compare the resulting publication sets against the SUL-Pub publications, both with and without the WoS results.
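The comparison in the last step could be sketched as follows. This assumes publications from every source are matched on DOI after light normalization; records without a DOI would need another key (for example a PubMed ID), which this sketch does not handle. The function names and the source labels (`"dimensions"`, `"openalex"`, `"pubmed"`, `"wos"`) are hypothetical.

```python
from collections.abc import Iterable, Mapping

# Common DOI prefixes stripped before matching (assumed, not exhaustive).
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and remove a URL or 'doi:' prefix so sources can be matched."""
    doi = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):].strip()
    return doi


def coverage_report(sulpub_dois: Iterable[str],
                    source_results: Mapping[str, Iterable[str]],
                    exclude: Iterable[str] = ()) -> dict:
    """Union the ORCID-query results of the non-excluded sources and
    measure how much of the SUL-Pub publication set they recover."""
    excluded = set(exclude)
    relevant = {normalize_doi(d) for d in sulpub_dois}
    retrieved: set[str] = set()
    for source, dois in source_results.items():
        if source not in excluded:
            retrieved.update(normalize_doi(d) for d in dois)
    found = retrieved & relevant
    return {
        "sources": sorted(s for s in source_results if s not in excluded),
        "retrieved": len(retrieved),
        "found": len(found),
        "missed": len(relevant - retrieved),
        "recall": len(found) / len(relevant) if relevant else 0.0,
    }
```

Running it twice, once with all sources and once with `exclude={"wos"}`, shows directly how much recall would be lost without the WoS subscription.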

The results will help us understand the following questions:

Does querying by ORCID iD alone return sufficiently complete results to move away from name querying?
Does querying Dimensions, OpenAlex, and PubMed return sufficiently complete results to end our expensive WoS subscription?

jacobthill self-assigned this Feb 6, 2025

Comments

jacobthill commented Feb 6, 2025