Make ROR data dump available as CSV & RDF #113

Open
mariagould opened this issue Aug 27, 2020 · 13 comments
Labels
enhancement Improvement/change to existing functionality

Comments

@mariagould
Contributor

mariagould commented Aug 27, 2020

GRID data is currently published as a package of files including:

  • JSON
  • RDF (as .ttl)
  • CSV (institute list and full tables)

https://www.grid.ac/format

ROR should also publish these file formats, so that GRID users can easily migrate.

@mariagould added the enhancement (Improvement/change to existing functionality) label on Aug 27, 2020
@lizkrznarich changed the title from "Make ROR data dump available as CSV or TSV" to "Make ROR data dump available as CSV & RDF" on Oct 4, 2021
@paulmillar

If it helps any, I've created a mapping from JSON to RDF here. It uses the existing GRID ontology, but with the ROR IRIs.

@LynneD424

It would be excellent to have this resource available as a .csv file if possible!

@amandafrench

Just a note that I spoke with Otto Lange from the library at Universiteit Utrecht today and he's very interested in having the ROR registry as RDF.

@joachimBrindeau

CSV would be greatly appreciated.
Or at least a tutorial on how to import the data as it exists into an SQL database.
I found this but it's too difficult for a no-coder like me.
Thanks a lot

@amandafrench

@joachimBrindeau Gotcha! As you can see above, we've moved this issue from Backlog to Planned, which means that we are now planning to provide ROR data in CSV. Meanwhile, if you're familiar with OpenRefine, that's a reasonably easy way to convert it to CSV.

If you're not familiar with OpenRefine, you can email [email protected] and ask for a CSV of the most recent release of the ROR data.

Note that ROR data is in the JSON format, so you could look for tutorials on importing JSON into SQL.
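As a minimal sketch of the JSON-to-SQL route, assuming the dump is a JSON array of records with `id`, `name`, `status` and `country.country_code` fields: the table name `organizations`, the function name `load_ror_json`, and the sample record are all hypothetical and chosen only for illustration. It uses Python's built-in `json` and `sqlite3` modules and keeps the full record as text so nothing is lost.

```python
import json
import sqlite3


def load_ror_json(records, conn):
    """Insert a list of ROR-style record dicts into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS organizations ("
        "id TEXT PRIMARY KEY, name TEXT, status TEXT, "
        "country_code TEXT, raw_json TEXT)"
    )
    rows = [
        (
            rec["id"],
            rec.get("name"),
            rec.get("status"),
            (rec.get("country") or {}).get("country_code"),
            json.dumps(rec),  # keep the whole record so no field is lost
        )
        for rec in records
    ]
    conn.executemany(
        "INSERT OR REPLACE INTO organizations VALUES (?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)


# Hypothetical sample; a real dump would be read with json.load(open(path))
sample = [
    {
        "id": "https://ror.org/0example",
        "name": "Example University",
        "status": "active",
        "country": {"country_name": "Exampleland", "country_code": "XX"},
    }
]
conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
load_ror_json(sample, conn)
```

Once loaded, the table can be queried with ordinary SQL, for example `SELECT name FROM organizations WHERE country_code = 'XX'`.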

@adambuttrick
Contributor

adambuttrick commented Jan 19, 2023

Script I've used for various projects is available here:

https://github.com/ror-community/curation_ops/tree/main/utilities/data_dump_to_csv

@lizkrznarich
Contributor

Due to the need to create/maintain an RDF ontology in order to generate a ttl version, the initial scope of this project will be CSV only. GRID created its own ontology for most fields, and this is now defunct since the GRID website no longer exists. Initially, we will produce a single CSV with a subset of fields (those that are not deprecated or empty in all records):

  • id
  • name
  • status
  • types
  • established
  • country.country_name
  • country.country_code
  • addresses.geonames_city.id
  • addresses.geonames_city.name
  • addresses.geonames_city.geonames_admin1.code
  • addresses.geonames_city.geonames_admin1.name
  • aliases (delimited list in one field)
  • acronyms (delimited list in one field)
  • labels (delimited list in one field)
  • links
  • wikipedia_url
  • external_ids (GRID, ISNI, Funder Registry and Wikidata only, as columns with [id_name].preferred and [id_name].all as headings)
  • relationships (delimited list in one field)

@paulmillar

Hi @lizkrznarich ,

Thanks for the update. I have a question and a comment.

First, a purely procedural question: will this issue (#113) be closed once ROR supports CSV data dumps?

There is (seemingly) some interest in providing data dumps in RDF. I think that interest should be represented by some kind of open issue. That way, it would be possible to track the progress towards supporting RDF. Therefore, I suggest that, whatever happens to this issue, there is an open issue that requests RDF data dumps.

Second, you're right that supporting RDF would require having some kind of ontology. Without this, the RDF data wouldn't have much meaning. In the past, GRID provided a (rather minimal) ontology for their RDF data dumps.

However, I think you paint a somewhat too negative view.

Although GRID no longer supports RDF data dumps, the GRID ontology works reasonably well with the ROR data dumps. This (I think) reflects that ROR hasn't yet made major changes to the information model.

In my own modest efforts, I've adopted the GRID ontology and made only some minimal changes to support ROR IDs. Using this, I'm able to generate GRID-like RDF data dumps from the ROR data dumps published on Zenodo.

I've also created a new version of the ontology that supports two new relationships (hasSuccessor and hasPredecessor). This is for the forthcoming changes in which ROR can describe organisations whose work has been adopted by some new organisation.

Continuing to use the GRID namespace is, perhaps, not ideal. However, I wanted to be backwards compatible with the GRID issued RDF dumps.
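
To make the idea of RDF keyed on ROR IRIs concrete, here is a minimal sketch that emits N-Triples from a record. It does not use the GRID ontology or the mapping mentioned above: it swaps in two standard properties, `rdfs:label` and `skos:altLabel`, purely for illustration, and the function names and the assumed `id`, `name` and `aliases` keys are hypothetical.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SKOS_ALT_LABEL = "http://www.w3.org/2004/02/skos/core#altLabel"


def nt_literal(text):
    """Quote a string as an N-Triples literal, escaping special characters."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def record_to_ntriples(rec):
    """Return N-Triples lines using the record's ROR ID as the subject IRI."""
    subject = f"<{rec['id']}>"
    lines = [f"{subject} <{RDFS_LABEL}> {nt_literal(rec['name'])} ."]
    for alias in rec.get("aliases") or []:
        lines.append(f"{subject} <{SKOS_ALT_LABEL}> {nt_literal(alias)} .")
    return lines


# Hypothetical record, for illustration only
for line in record_to_ntriples(
    {"id": "https://ror.org/0example", "name": "Example University", "aliases": ["EU"]}
):
    print(line)
```

A real conversion would map many more fields and would need an agreed ontology for them, which is the open question in this thread.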

@lizkrznarich
Contributor

lizkrznarich commented Jan 30, 2023

@paulmillar We're planning to keep this issue open and I've split the CSV work off onto a separate issue (140). While you're correct that we could just adapt the GRID ontology for now, the ROR schema will soon diverge from GRID, so the ontology would need to be redeveloped in short order. Before undertaking this, we'd like to know more about the use cases for ROR in RDF. Can you tell us more about how you're using ROR/GRID in RDF?

@paulmillar

Thanks for the feedback.

My use-case involves working with the EU's public data on the projects it funds (CORDIS) to understand how different people and different organisations are involved and connected. Using ROR to identify organisations seems a natural choice (although the lack of PIC identifiers is unfortunate). I would be interested in using ROR as identifiers (e.g., linking with ORCID information), but also as a source of metadata about the organisations: name(s), geographic location, etc.

The goal is two-fold. First, to provide a tool that allows people working within the project to keep track of everyone else (on larger projects, this can be tricky). Second, to provide a method to create something like an organigram, allowing people outside the project to understand the project's structure and the key people involved at the different levels (something that projects seem to struggle to maintain).

At the moment, there is little concrete to show for this: I don't have much spare time to focus on this project.

@rmfranken

Any update on this? I'm also using ROR IDs, ORCID iDs and a bunch of other RDF resources in one RDF graph. Would love to be able to just refer to ROR IDs instead of strings to increase the expressiveness! In the meantime I will try to use @paulmillar's RML file. Thanks for that!

@amandafrench

@rmfranken Thanks for asking! Essentially the task of producing ROR RDF is on hold until after we launch the improved ROR metadata schema v2.0 later this year -- see https://ror.readme.io/docs/schema-v20 for details on that. Can you tell us more about the purpose of your RDF graph?

@rmfranken

I work for the Swiss Data Science Center, and we are making a knowledge graph to capture some information about our skills, languages, experience and the clients we work with (similar to LinkedIn, except not proprietary). The ROR IDs come in handy when identifying organizations that we work with or have worked for before. For instance, we want to be able to ask a question like: Which people in our organization have worked with client X before, speak German, have knowledge of the biomedical domain, and know how to program in Python or work closely with someone who knows Python? Or: Which office is located closer to client X?

Of course we can individually define a bunch of organizations, give them names and some metadata, but this would be a lot of work and a waste of such a nice resource like the ROR. Of course, not every organization is in the ROR, but this gives a good starting point for a searchable and append-able pick list.

Projects
Status: Under development
Development

No branches or pull requests

8 participants