Make ROR data dump available as CSV & RDF #113

mariagould · 2020-08-27T22:22:32Z

GRID data is currently published as a package of files including:

JSON
RDF (as .ttl)
CSV (institute list and full tables)

https://www.grid.ac/format

ROR should also publish these file formats, so that GRID users can easily migrate.

paulmillar · 2021-10-10T13:23:09Z

If it helps any, I've created a mapping from JSON to RDF here. It uses the existing GRID ontology, but with ROR IRIs.

LynneD424 · 2022-09-13T13:15:14Z

It would be excellent to have this resource available as a .csv file if possible!

amandafrench · 2022-11-15T13:52:53Z

Just a note that I spoke with Otto Lange from the library at Universiteit Utrecht today and he's very interested in having the ROR registry as RDF.

joachimBrindeau · 2022-12-11T14:42:45Z

CSV would be greatly appreciated.
Or at least a tutorial on how to import the data as it exists into an SQL database.
I found this but it's too difficult for a no-coder like me.
Thanks a lot

amandafrench · 2022-12-12T18:04:03Z

@joachimBrindeau Gotcha! As you can see above, we've moved this issue from Backlog to Planned, which means that we are now planning to provide ROR data in CSV. Meanwhile, if you're familiar with OpenRefine, that's a reasonably easy way to convert it to CSV.

If you're not familiar with OpenRefine, you can email [email protected] and ask for a CSV of the most recent release of the ROR data.

Note that ROR data is in the JSON format, so you could look for tutorials on importing JSON into SQL.
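As a rough illustration of the JSON-to-SQL route, here is a minimal sketch using only Python's standard library (`json` and `sqlite3`). It assumes the data dump is a single JSON array of records. The table name `organizations`, the choice of columns, and the sample record are all made up for the example and are not an official ROR layout.

```python
import json
import sqlite3


def load_records(conn, records):
    """Insert ROR-style records into a small SQLite table.

    The table layout is an assumption made for this sketch: a few
    top-level fields as columns, plus the full record kept as JSON text.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS organizations ("
        "id TEXT PRIMARY KEY, name TEXT, status TEXT, "
        "country_code TEXT, raw_json TEXT)"
    )
    rows = [
        (
            r["id"],
            r.get("name"),
            r.get("status"),
            (r.get("country") or {}).get("country_code"),
            json.dumps(r),
        )
        for r in records
    ]
    conn.executemany(
        "INSERT OR REPLACE INTO organizations VALUES (?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)


# Invented sample standing in for the contents of a downloaded dump file.
sample_json = """
[{"id": "https://ror.org/00example0",
  "name": "Example University",
  "status": "active",
  "country": {"country_code": "XX"}}]
"""
records = json.loads(sample_json)

conn = sqlite3.connect(":memory:")  # use a file path to keep the database
inserted = load_records(conn, records)
result = conn.execute(
    "SELECT name, country_code FROM organizations"
).fetchall()
print(inserted, result)
```

With a real dump, `records` would come from `json.load()` on the downloaded file instead of the inline sample.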

adambuttrick · 2023-01-19T19:42:36Z

Script I've used for various projects is available here:

https://github.com/ror-community/curation_ops/tree/main/utilities/data_dump_to_csv

lizkrznarich · 2023-01-20T15:45:57Z

Due to the need to create/maintain an RDF ontology in order to generate a ttl version, the initial scope of this project will be CSV only. GRID created its own ontology for most fields, and this is now defunct since the GRID website no longer exists. Initially, we will produce a single CSV with a subset of fields (those that are not deprecated or empty in all records):

id
name
status
types
established
country.country_name
country.country_code
addresses.geonames_city.id
addresses.geonames_city.name
addresses.geonames_city.geonames_admin1.code
addresses.geonames_city.geonames_admin1.name
aliases (delimited list in one field)
acronyms (delimited list in one field)
labels (delimited list in one field)
links
wikipedia_url
external_ids (GRID, ISNI, Funder Registry and Wikidata only, as columns with [id_name].preferred and [id_name].all as headings)
relationships (delimited list in one field)
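
To make the shape of such a file concrete, here is a minimal sketch of flattening one JSON record into a CSV row using the dotted field names listed above. It is not the script ROR will use. The sample record, the `"; "` delimiter, the choice to take only the first address, and the `external_ids.GRID.*` key names are assumptions made for illustration.

```python
import csv
import io
import json

# Column names follow the field list above; the external_ids keys are assumed.
COLUMNS = [
    "id", "name", "status", "types", "established",
    "country.country_name", "country.country_code",
    "addresses.geonames_city.id", "addresses.geonames_city.name",
    "addresses.geonames_city.geonames_admin1.code",
    "addresses.geonames_city.geonames_admin1.name",
    "aliases", "acronyms", "labels", "links", "wikipedia_url",
    "external_ids.GRID.preferred", "external_ids.GRID.all",
    "relationships",
]


def lookup(record, path):
    """Follow a dotted path; when a list is met mid-path, use its first item."""
    value = record
    for key in path.split("."):
        if isinstance(value, list):
            value = value[0] if value else None
        if not isinstance(value, dict):
            return None
        value = value.get(key)
    return value


def to_cell(value, delimiter="; "):
    """Render a value as one CSV cell; lists become a delimited string."""
    if value is None:
        return ""
    if isinstance(value, list):
        return delimiter.join(to_cell(v, delimiter) for v in value)
    if isinstance(value, dict):
        # Nested objects (e.g. a relationship) are kept as compact JSON.
        return json.dumps(value, sort_keys=True)
    return str(value)


def records_to_csv(records, columns=COLUMNS):
    """Return CSV text with a header row and one row per record."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for record in records:
        writer.writerow([to_cell(lookup(record, c)) for c in columns])
    return buffer.getvalue()


# Invented record, shaped after the field list above.
SAMPLE = {
    "id": "https://ror.org/00example0",
    "name": "Example University",
    "status": "active",
    "types": ["Education"],
    "established": 1900,
    "country": {"country_name": "Exampleland", "country_code": "XX"},
    "addresses": [{"geonames_city": {
        "id": 1, "name": "Example City",
        "geonames_admin1": {"code": "XX.01", "name": "Example Region"},
    }}],
    "aliases": ["Ex U", "University of Example"],
    "acronyms": ["EU"],
    "external_ids": {"GRID": {"preferred": "grid.0.0", "all": "grid.0.0"}},
}

csv_text = records_to_csv([SAMPLE])
print(csv_text)
```

Fields missing from a record (here `wikipedia_url`, `labels`, `links` and `relationships`) come out as empty cells, so every row has the same number of columns.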

paulmillar · 2023-01-28T22:22:24Z

Hi @lizkrznarich ,

Thanks for the update. I have a question and a comment.

First, on a purely procedural question, will this issue (#113) be closed once ROR supports CSV data dumps?

There is (seemingly) some interest in providing data dumps in RDF. I think that interest should be represented by some kind of open issue. That way, it would be possible to track the progress towards supporting RDF. Therefore, I suggest that, whatever happens to this issue, there is an open issue that requests RDF data dumps.

Second, you're right that supporting RDF would require having some kind of ontology. Without this, the RDF data wouldn't have much meaning. In the past, GRID provided a (rather minimal) ontology for their RDF data dumps.

However, I think you paint a somewhat too negative view.

Although GRID no longer supports RDF data dumps, the GRID ontology works reasonably well with the ROR data dumps. This (I think) reflects that ROR hasn't yet made major changes to the information model.

In my own modest efforts, I've adopted the GRID ontology and made only some minimal changes to support ROR IDs. Using this, I'm able to generate GRID-like RDF data dumps from the ROR data dumps published on Zenodo.

I've also created a new version of the ontology that supports two new relationships (hasSuccessor and hasPredecessor). This is for the forthcoming changes in which ROR can describe organisations whose work has been adopted by some new organisation.

Continuing to use the GRID namespace is, perhaps, not ideal. However, I wanted to be backwards compatible with the GRID issued RDF dumps.
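
For readers unfamiliar with what such a mapping produces, here is a minimal sketch that turns one record into N-Triples, using the ROR ID as the subject IRI. It is not the mapping described above. The namespace `http://example.org/ror-ontology#` is a placeholder rather than the GRID ontology's real IRI, and the relationship type strings `Successor` and `Predecessor` and the sample record are assumptions.

```python
# Placeholder namespace for illustration only; NOT the GRID ontology IRI.
ONT = "http://example.org/ror-ontology#"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Assumed relationship type strings mapped to the two predicates named above.
PREDICATES = {
    "Successor": ONT + "hasSuccessor",
    "Predecessor": ONT + "hasPredecessor",
}


def escape_literal(text):
    """Escape a string for use inside an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def record_to_ntriples(record):
    """Emit a label triple plus one triple per successor/predecessor link."""
    subject = "<" + record["id"] + ">"
    name = escape_literal(record["name"])
    lines = [f'{subject} <{RDFS_LABEL}> "{name}" .']
    for rel in record.get("relationships", []):
        predicate = PREDICATES.get(rel.get("type"))
        if predicate:  # other relationship types are skipped in this sketch
            lines.append(f"{subject} <{predicate}> <{rel['id']}> .")
    return "\n".join(lines)


# Invented record for the example.
sample = {
    "id": "https://ror.org/00example0",
    "name": 'Example "Old" Institute',
    "relationships": [
        {"type": "Successor", "label": "Example New Institute",
         "id": "https://ror.org/00example1"},
        {"type": "Related", "label": "Example Lab",
         "id": "https://ror.org/00example2"},
    ],
}

ntriples = record_to_ntriples(sample)
print(ntriples)
```

Because the ROR IDs are already IRIs, they can be used directly as subjects and objects, which is what lets other graphs link to the same organisation without matching on names.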

lizkrznarich · 2023-01-30T17:24:17Z

@paulmillar We're planning to keep this issue open and I've split the CSV work off onto a separate issue (140). While you're correct that we could just adapt the GRID ontology for now, the ROR schema will soon diverge from GRID, so the ontology would need to be redeveloped in short order. Before undertaking this, we'd like to know more about the use cases for ROR in RDF. Can you tell us more about how you're using ROR/GRID in RDF?

paulmillar · 2023-02-15T18:57:40Z

Thanks for the feedback.

My use-case involves working with the EU's public data of projects it funds (CORDIS) to understand how different people and different organisations are involved and connected. Using ROR to identify organisations seems a natural choice (although the lack of PIC identifiers is unfortunate). I would be interested in using ROR as identifiers (e.g., linking with ORCID information), but also as a source of metadata about the organisations: name(s), geographic location, etc.

The goal is two-fold. First, to provide a tool that allows people working within the project to keep track of everyone else (on larger projects, this can be tricky). Second, to provide a method to create something like an organigram, allowing people outside the project to understand the project's structure and the key people involved at the different levels (something that projects seem to struggle to maintain).

At the moment, there is little concrete to show for this: I don't have much spare time to focus on this project.

rmfranken · 2023-06-22T10:57:20Z

Any update on this? I'm also using ROR ID, ORCID ID and a bunch of other RDF resources in one RDF graph. Would love to be able to just refer to ROR IDs instead of strings to increase the expressiveness! In the meantime I will try to use @paulmillar's rml file. Thanks for that!

amandafrench · 2023-06-23T15:44:02Z

@rmfranken Thanks for asking! Essentially the task of producing ROR RDF is on hold until after we launch the improved ROR metadata schema v2.0 later this year -- see https://ror.readme.io/docs/schema-v20 for details on that. Can you tell us more about the purpose of your RDF graph?

rmfranken · 2023-06-26T06:27:05Z

I work for the Swiss Data Science Center, and we are making a knowledge graph to capture some information about our skills, languages, experience and the clients we work with (similar to LinkedIn, except not proprietary). The ROR IDs come in handy when identifying organizations that we work with or have worked for before. For instance, we want to be able to ask a question like: Which people in our organization have worked with client X before, and speak German, have knowledge of the Biomedical domain, and know how to program in Python, or work closely with someone that knows Python? Or: Which office is located closer to client X?

Of course we can individually define a bunch of organizations, give them names and some metadata, but this would be a lot of work and a waste of such a nice resource as ROR. Not every organization is in ROR, but it gives a good starting point for a searchable and append-able pick list.

mariagould added the enhancement label (Improvement/change to existing functionality) Aug 27, 2020

mariagould added the data dump label Feb 24, 2021

lizkrznarich changed the title ~~Make ROR data dump available as CSV or TSV~~ Make ROR data dump available as CSV & RDF Oct 4, 2021

lizkrznarich mentioned this issue Feb 7, 2022

Support RDF via content negotiation #182

Closed

mariagould removed the data dump label Feb 8, 2022

lizkrznarich mentioned this issue Jan 30, 2023

[FEATURE] Publish ROR data dump as CSV (in addition to JSON) ror-community/ror-roadmap#140

Closed

lizkrznarich mentioned this issue Sep 27, 2023

[ROADMAP] Make data dumps available in RDF ror-community/ror-roadmap#125

Open

mariagould added this to ROR Product Development Aug 15, 2024

mariagould moved this to Under development in ROR Product Development Aug 15, 2024
