After all the initial harvesting of publications by ORCID is complete, we want to examine every publication that has a DOI but lacks OpenAlex metadata, to see whether OpenAlex metadata is actually available when querying by DOI. The rationale for this previously was that some publications may be in OpenAlex without being associated with an ORCID.
We had some code for this, which took advantage of the fact that you can query for more than one DOI at a time:
```python
# ... (excerpt; earlier lines omitted)
            logging.warning(f"Found multiple publications for DOI {doi}")
        if len(pubs) > 0:
            yield normalize_publication(pubs[0])
    except api.QueryError as e:
        logging.error(f"OpenAlex QueryError for {doi}: {e}")
        continue
```
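Querying several DOIs at once means splitting the DOI list into batches and joining each batch with `|`, which OpenAlex treats as OR within a filter value. The sketch below shows only that batching step and a DOI normalizer for matching results back to database rows; the names `doi_batches`, `doi_or_filter` and `normalize_doi` are hypothetical, and the batch size of 50 is an assumed conservative limit on OR'd values (check the current OpenAlex docs). Each joined string would then become the value of a `doi` filter, for example via pyalex's `Works().filter(doi=...)`, assuming that call passes the `|` syntax through unchanged.

```python
from itertools import islice
from typing import Iterable, Iterator

# Assumed limit on OR'd values in one OpenAlex filter; verify against current docs.
MAX_DOIS_PER_QUERY = 50


def doi_batches(dois: Iterable[str], size: int = MAX_DOIS_PER_QUERY) -> Iterator[list[str]]:
    """Yield lists of at most `size` DOIs from any iterable, including a generator."""
    it = iter(dois)
    while batch := list(islice(it, size)):
        yield batch


def doi_or_filter(batch: list[str]) -> str:
    """Join DOIs with '|', OpenAlex's OR syntax for filter values."""
    return "|".join(batch)


def normalize_doi(doi: str) -> str:
    """Lower-case a DOI and strip a URL or 'doi:' prefix so results can be matched to rows."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi
```

Normalizing matters because a batched response is not keyed by the input DOIs: OpenAlex records typically carry the DOI as a full `https://doi.org/` URL, so each result has to be mapped back to the bare DOI stored in the database.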
We will want to query the database in a way that doesn't pull all the results of `SELECT * FROM publications WHERE openalex_json IS NULL AND doi IS NOT NULL` into memory at once.
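One way to avoid materializing the whole result set is to select only the `doi` column and fetch rows in fixed-size chunks from a cursor. The sketch below uses the standard-library `sqlite3` module purely so it runs standalone; the project's real database and driver may differ. The function name `dois_missing_openalex` is hypothetical, and the `openalex_json` column is assumed to be the one holding OpenAlex metadata, since the task targets rows that lack it.

```python
import sqlite3
from typing import Iterator


def dois_missing_openalex(conn: sqlite3.Connection, chunk_size: int = 1000) -> Iterator[str]:
    """Lazily yield DOIs of publications without OpenAlex metadata.

    Only the doi column is selected, and at most chunk_size rows are held
    in memory at a time.
    """
    cur = conn.execute(
        "SELECT doi FROM publications WHERE openalex_json IS NULL AND doi IS NOT NULL"
    )
    while rows := cur.fetchmany(chunk_size):
        for (doi,) in rows:
            yield doi
```

Because this is a generator, it can feed a DOI batcher directly without ever building a full list. With SQLAlchemy on PostgreSQL, the analogous approach would be a server-side cursor, e.g. the `yield_per` or `stream_results` execution options; which one fits depends on whether ORM or Core queries are used.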
It might be interesting to keep a count of how often we find metadata this way and log it at the end, to help evaluate how much this fill-in step adds.
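Keeping the count only needs two counters and a single log line once the loop finishes. In this sketch `fill_in_by_doi` and its `lookup` callable are hypothetical stand-ins: `lookup` is assumed to return an OpenAlex record (as a dict) or `None` when nothing is found for a DOI, and saving the record back to the database is left out.

```python
import logging
from typing import Callable, Iterable, Optional

logger = logging.getLogger(__name__)


def fill_in_by_doi(
    dois: Iterable[str], lookup: Callable[[str], Optional[dict]]
) -> tuple[int, int]:
    """Look up each DOI, count how many yield metadata, and log the totals at the end."""
    checked = found = 0
    for doi in dois:
        checked += 1
        if lookup(doi) is not None:
            found += 1
            # A real task would store the metadata on the publication row here.
    logger.info("OpenAlex fill-in: found metadata for %d of %d DOIs", found, checked)
    return checked, found
```

Returning the counts as well as logging them makes it easy to assert on them in tests or to push them to task metrics later.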
edsu changed the title from "Create a new DAG task to enhance metadata by DOI for Dimensions" to "Create "fill-in" task for OpenAlex" on Mar 3, 2025.
The code excerpt above comes from `rialto-airflow/rialto_airflow/harvest/openalex.py`, lines 83 to 109 at commit 94c1a28.