Make ROR data dump available as CSV & RDF #113
If it helps any, I've created a mapping from JSON to RDF here. It uses the existing GRID ontology, but with ROR IRIs.
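To illustrate the general idea of such a mapping (this is not the RML mapping linked above), here is a minimal Python sketch that emits N-Triples for one ROR record, using the ROR ID itself as the subject IRI. It assumes the ROR v1 JSON field names `id`, `name`, `aliases` and `links`. Because the GRID ontology terms are not reproduced here, standard RDF, RDFS, SKOS and FOAF properties stand in for them, and the sample record is hypothetical.

```python
# Stand-in vocabulary (assumption): RDF/RDFS/SKOS/FOAF terms replace the GRID ontology IRIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SKOS_ALT_LABEL = "http://www.w3.org/2004/02/skos/core#altLabel"
FOAF_HOMEPAGE = "http://xmlns.com/foaf/0.1/homepage"
FOAF_ORGANIZATION = "http://xmlns.com/foaf/0.1/Organization"


def nt_literal(text: str) -> str:
    """Quote a string as an N-Triples literal, escaping backslash, quote, LF and CR."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def record_to_ntriples(record: dict) -> list:
    """Emit N-Triples lines for one ROR record, with its ROR ID as the subject IRI."""
    subject = f"<{record['id']}>"
    lines = [
        f"{subject} <{RDF_TYPE}> <{FOAF_ORGANIZATION}> .",
        f"{subject} <{RDFS_LABEL}> {nt_literal(record['name'])} .",
    ]
    for alias in record.get("aliases", []):
        lines.append(f"{subject} <{SKOS_ALT_LABEL}> {nt_literal(alias)} .")
    for link in record.get("links", []):
        lines.append(f"{subject} <{FOAF_HOMEPAGE}> <{link}> .")
    return lines


# Hypothetical record shaped like a ROR v1 JSON entry (not real ROR data).
sample = {
    "id": "https://ror.org/0example0",
    "name": 'Example "Test" University',
    "aliases": ["Example Uni"],
    "links": ["https://example.org"],
}

if __name__ == "__main__":
    print("\n".join(record_to_ntriples(sample)))
```

Using the ROR IRI directly as the subject means other graphs (ORCID-linked data, for example) can point at the same node without any string matching.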
It would be excellent to have this resource available as a .csv file if possible!
Just a note that I spoke with Otto Lange from the library at Universiteit Utrecht today, and he's very interested in having the ROR registry as RDF.
CSV would be greatly appreciated.
@joachimBrindeau Gotcha! As you can see above, we've moved this issue from Backlog to Planned, which means that we are now planning to provide ROR data in CSV. Meanwhile, if you're familiar with OpenRefine, it offers a reasonably easy way to convert the data to CSV. If you're not familiar with OpenRefine, you can email [email protected] and ask for a CSV of the most recent release of the ROR data. Note that ROR data is in JSON format, so you could also look for tutorials on importing JSON into SQL.
A script I've used for various projects is available here: https://github.com/ror-community/curation_ops/tree/main/utilities/data_dump_to_csv
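For anyone who wants to do the conversion themselves, here is a minimal sketch of flattening ROR JSON records into CSV with only the Python standard library. It is not the linked script. The column choice is an assumption made for illustration, the field names follow the ROR v1 JSON schema (with the country code nested under `country`), and multi-valued fields are joined into one cell with `"; "`. The sample record is hypothetical.

```python
import csv
import io

# Illustrative column choice (assumption); names follow the ROR v1 JSON schema.
COLUMNS = ["id", "name", "status", "types", "country_code",
           "aliases", "acronyms", "links"]


def join_values(values) -> str:
    """Join a multi-valued field into one cell, using '; ' as the separator."""
    return "; ".join(values or [])


def flatten_record(record: dict) -> dict:
    """Turn one nested ROR JSON record into a flat dict of CSV cell values."""
    return {
        "id": record.get("id", ""),
        "name": record.get("name", ""),
        "status": record.get("status", ""),
        "types": join_values(record.get("types")),
        "country_code": (record.get("country") or {}).get("country_code", ""),
        "aliases": join_values(record.get("aliases")),
        "acronyms": join_values(record.get("acronyms")),
        "links": join_values(record.get("links")),
    }


def records_to_csv(records: list) -> str:
    """Serialize parsed ROR records (e.g. the JSON array in a data dump) as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    for record in records:
        writer.writerow(flatten_record(record))
    return buffer.getvalue()


# Hypothetical record (not real ROR data).
sample = {
    "id": "https://ror.org/0example0",
    "name": "Example University",
    "status": "active",
    "types": ["Education"],
    "country": {"country_code": "NL", "country_name": "Netherlands"},
    "aliases": [],
    "acronyms": ["EXU"],
    "links": ["https://example.org"],
}

if __name__ == "__main__":
    print(records_to_csv([sample]))
```

With a real release, `records` would come from calling `json.load` on the dump's JSON file; `csv.DictWriter` takes care of quoting any names that contain commas.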
Because generating a Turtle (.ttl) version would require creating and maintaining an RDF ontology, the initial scope of this project will be CSV only. GRID created its own ontology for most fields, and that ontology is now defunct since the GRID website no longer exists. Initially, we will produce a single CSV with a subset of fields (those that are not deprecated and that have a value in at least one record).
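The field-selection rule above can be expressed as a short filter over flattened rows. This is a minimal sketch: the `DEPRECATED` set below is a hypothetical example, not ROR's actual list, and "empty" is taken to mean `None`, an empty string, an empty list or an empty dict.

```python
# Hypothetical deprecated columns; the real list is ROR's decision, not shown here.
DEPRECATED = {"email_address", "ip_addresses"}


def is_empty(value) -> bool:
    """Treat None, empty strings, empty lists and empty dicts as empty cells."""
    return value in (None, "", [], {})


def select_columns(rows: list, deprecated: set) -> list:
    """Keep columns that are not deprecated and hold a value in at least one row."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:  # preserve first-seen column order
                columns.append(key)
    return [
        col for col in columns
        if col not in deprecated
        and any(not is_empty(row.get(col)) for row in rows)
    ]


if __name__ == "__main__":
    rows = [
        {"id": "a", "email_address": "x@example.org", "wikipedia_url": ""},
        {"id": "b", "email_address": None, "wikipedia_url": "", "established": 1900},
    ]
    print(select_columns(rows, DEPRECATED))  # ['id', 'established']
```

A column missing from some rows counts as empty there, so it survives as long as at least one row has a value for it.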
Hi @lizkrznarich, thanks for the update. I have a question and a comment.

First, a purely procedural question: will this issue (#113) be closed once ROR supports CSV data dumps? There is (seemingly) some interest in providing data dumps in RDF, and I think that interest should be represented by an open issue, so that it is possible to track progress towards supporting RDF. Therefore, I suggest that, whatever happens to this issue, there remains an open issue requesting RDF data dumps.

Second, you're right that supporting RDF would require some kind of ontology; without one, the RDF data wouldn't have much meaning. In the past, GRID provided a (rather minimal) ontology for its RDF data dumps. However, I think you paint a somewhat too negative picture. Although GRID no longer supports RDF data dumps, the GRID ontology works reasonably well with the ROR data dumps. This (I think) reflects that ROR hasn't yet made major changes to the information model.

In my own modest efforts, I've adopted the GRID ontology and made only some minimal changes to support ROR IDs. Using this, I'm able to generate GRID-like RDF data dumps from the ROR data dumps on Zenodo. I've also created a new version of the ontology that supports two new relationships (…). Continuing to use the GRID namespace is, perhaps, not ideal. However, I wanted to be backwards compatible with the GRID-issued RDF dumps.
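Backwards compatibility with GRID-based RDF also depends on knowing which ROR ID replaces each GRID ID. A minimal sketch, assuming the ROR v1 JSON layout where each record lists its GRID ID under `external_ids` → `GRID` → `preferred`. The GRID IRI prefix used here is a hypothetical placeholder, since the exact IRI pattern of the old GRID dumps is not given in this thread, and the sample records are made up.

```python
# Hypothetical IRI prefix standing in for whatever the old GRID RDF dumps used.
HYPOTHETICAL_GRID_PREFIX = "https://grid.example/institutes/"


def grid_to_ror_index(records: list) -> dict:
    """Map each record's preferred GRID ID to its ROR ID (ROR v1 layout assumed)."""
    index = {}
    for record in records:
        grid = (record.get("external_ids") or {}).get("GRID") or {}
        preferred = grid.get("preferred")
        if preferred:
            index[preferred] = record["id"]
    return index


def rewrite_grid_iri(iri: str, index: dict, grid_prefix: str) -> str:
    """Swap a GRID-based IRI for the matching ROR IRI; leave it unchanged if unknown."""
    if iri.startswith(grid_prefix):
        return index.get(iri[len(grid_prefix):], iri)
    return iri


# Hypothetical records (not real ROR or GRID data).
SAMPLE_RECORDS = [
    {"id": "https://ror.org/0example0",
     "external_ids": {"GRID": {"preferred": "grid.0000.0", "all": "grid.0000.0"}}},
    {"id": "https://ror.org/0example1", "external_ids": {}},
]

if __name__ == "__main__":
    idx = grid_to_ror_index(SAMPLE_RECORDS)
    print(rewrite_grid_iri(HYPOTHETICAL_GRID_PREFIX + "grid.0000.0", idx,
                           HYPOTHETICAL_GRID_PREFIX))
```

Unknown GRID IDs are passed through unchanged rather than dropped, so a partial mapping never loses data.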
@paulmillar We're planning to keep this issue open, and I've split the CSV work off into a separate issue (#140). While you're correct that we could adapt the GRID ontology for now, the ROR schema will soon diverge from GRID, so the ontology would need to be redeveloped in short order. Before undertaking this, we'd like to know more about the use cases for ROR in RDF. Can you tell us more about how you're using ROR/GRID in RDF?
Thanks for the feedback. My use case involves working with the EU's public data on the projects it funds (CORDIS) to understand how different people and organisations are involved and connected. Using ROR to identify organisations seems a natural choice (although the lack of PIC identifiers is unfortunate). I would be interested in using ROR as identifiers (e.g., linking with ORCID information), but also as a source of metadata about the organisations: name(s), geographic location, etc. The goal is two-fold. First, to provide a tool that allows people working within a project to keep track of everyone else (on larger projects, this can be tricky). Second, to provide a way to create something like an organigram, allowing people outside the project to understand its structure and the key people involved at the different levels (something that projects seem to struggle to maintain). At the moment, there is little concrete to show for this: I don't have much spare time to focus on this project.
Any update on this? I'm also using ROR IDs, ORCID IDs and a bunch of other RDF resources in one RDF graph. I would love to be able to refer to ROR IDs instead of strings to increase the expressiveness! In the meantime, I will try to use @paulmillar's RML file. Thanks for that!
@rmfranken Thanks for asking! Essentially, the task of producing ROR RDF is on hold until after we launch the improved ROR metadata schema v2.0 later this year; see https://ror.readme.io/docs/schema-v20 for details. Can you tell us more about the purpose of your RDF graph?
I work for the Swiss Data Science Center, and we are building a knowledge graph that captures information about our skills, languages, experience and the clients we work with (similar to LinkedIn, except not proprietary). ROR IDs come in handy for identifying organizations that we work with or have worked for. For instance, we want to be able to ask a question like: Which people in our organization have worked with client X before, speak German, have knowledge of the biomedical domain, and know how to program in Python (or work closely with someone who knows Python)? Or: Which office is located closest to client X? Of course, we could define a bunch of organizations ourselves and give them names and some metadata, but this would be a lot of work and a waste of such a nice resource as ROR. Not every organization is in ROR, but it gives a good starting point for a searchable and appendable pick list.
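The first question above is essentially a conjunctive graph pattern. In a triple store it would be written in SPARQL; as a dependency-free stand-in, the sketch below keeps triples as Python tuples and answers a simplified version of the question (worked with the client, speaks the language, knows the skill) by intersecting subject sets. All IRIs, predicates and people are hypothetical. The point is that matching on the client's ROR IRI avoids comparing organization-name strings, which can vary.

```python
# Hypothetical knowledge-graph IRIs and predicates, for illustration only.
EX = "https://example.org/kg/"
CLIENT_X = "https://ror.org/0example0"  # client identified by a (hypothetical) ROR ID

TRIPLES = {
    (EX + "alice", EX + "workedWith", CLIENT_X),
    (EX + "alice", EX + "speaks", "German"),
    (EX + "alice", EX + "knows", "Python"),
    (EX + "bob", EX + "workedWith", CLIENT_X),
    (EX + "bob", EX + "speaks", "German"),
    (EX + "carol", EX + "speaks", "German"),
    (EX + "carol", EX + "knows", "Python"),
}


def subjects_with(triples: set, predicate: str, obj: str) -> set:
    """Return every subject that has the given predicate/object pair."""
    return {s for (s, p, o) in triples if p == predicate and o == obj}


def people_for(triples: set, client: str, language: str, skill: str) -> set:
    """People who worked with `client`, speak `language` and know `skill`."""
    return (
        subjects_with(triples, EX + "workedWith", client)
        & subjects_with(triples, EX + "speaks", language)
        & subjects_with(triples, EX + "knows", skill)
    )


if __name__ == "__main__":
    print(people_for(TRIPLES, CLIENT_X, "German", "Python"))
    # → {'https://example.org/kg/alice'}
```

The "works closely with someone who knows Python" variant would add one more hop (a join through a colleague relation), which is where a real SPARQL endpoint becomes more convenient than hand-written set logic.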
GRID data is currently published as a package of files in several formats, described at:
https://www.grid.ac/format
ROR should also publish these file formats, so that GRID users can easily migrate.