20170925 Ontology Change Improvement Call

marijane white edited this page Sep 25, 2017 · 1 revision

Date: 25 Sept 2017

Attendees: Mike Conlon, Christian Hauschke, Anna Kasprzik, Graham Triggs, Marijane White, Muhammad Javed

Agenda:

Mike: Still trying to create an ontology file that represents VIVO. I have made one from the files in the filegraph directory, Javed has made one, Anna has worked with it. Not sure what the process is to settle on one.

Javed: I wasn't able to try making my own version of the file, but I promise I will do it before the next call. We should move forward assuming the one you made is the correct file.

Mike: We need to decide how we're going to work with it. Anna has some concerns about Protege and how it interacts with the file.

Anna: Yes, I was just experimenting. I was working with the file we're using here at TIB. I loaded it into Protege and pressed "save", and then ROBOT found a lot of differences, lots of added axioms, not sure why. Marijane suggested that it's due to the nondeterministic output of the OWLAPI, but if ROBOT is working on the graph, that's not what's going on.

Javed: I have seen differences as well, things saved as Annotation properties that should not be saved that way.

Anna: I have also noticed Protege adds lots of axioms about classes.

Mike: I have seen that behavior as well, it's inferring things and attempting to be more complete than the file.

Anna: If we're using Protege we need to be careful, we need to understand exactly what it is doing with the file.

Javed: When I did this a year ago, I decided not to use Protege, I converted everything to triples and combined them as an N-Triples file.
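As a rough illustration of the triple-level comparison Javed describes, the sketch below treats each N-Triples line as one statement and reports which statements were added or removed between two serializations, independent of line order. It is not how ROBOT or Protege compare files. The function names and the `example.org` IRIs are hypothetical, and the approach assumes the files contain no blank nodes (blank node labels can change between saves, which would show up as false differences).

```python
def load_triples(text):
    """Return the set of statement lines in an N-Triples document."""
    triples = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        triples.add(line)
    return triples


def diff_triples(before, after):
    """Return (added, removed) statements as sorted lists."""
    old, new = load_triples(before), load_triples(after)
    return sorted(new - old), sorted(old - new)


# Hypothetical data: the same two triples in a different order, plus one new one.
BEFORE = """\
<http://example.org/A> <http://www.w3.org/2000/01/rdf-schema#label> "A" .
<http://example.org/B> <http://www.w3.org/2000/01/rdf-schema#label> "B" .
"""
AFTER = """\
<http://example.org/B> <http://www.w3.org/2000/01/rdf-schema#label> "B" .
<http://example.org/A> <http://www.w3.org/2000/01/rdf-schema#label> "A" .
<http://example.org/A> <http://www.w3.org/2000/01/rdf-schema#comment> "added on save" .
"""

added, removed = diff_triples(BEFORE, AFTER)
for statement in added:
    print("+", statement)
for statement in removed:
    print("-", statement)
```

Because the comparison is on sets, reordering alone produces no differences; only statements that are actually new or missing are reported.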

OK, Mike, let's suppose your file is the correct version. What's next?

Mike: There are a couple things that immediately follow. It needs to have an IRI and a version number in the file. And then that file needs to be posted to the web as the file that is the resolver for URLs that express semantic web locations. We know the file on the web is not in correspondence with what we're working with, but we can correct that, and once the file is correct, we can post it to the web.

Javed: and when is the next VIVO release? Is that file going to be in it?

Mike: That is a good question, we currently distribute 43 files, I'm not sure it's necessary to do so. I know you would like to break them out. They're not very useful, they're in multiple formats, they're not organized by domain, so I would like to replace them with the same material all in one file.

Javed: What's the plan for the next release?

Graham: At the moment I'm just going through testing and making sure everything is ready to release.

Javed: I'm sorry, I'm talking about the next release after that.

Graham: The plan might be around the conference next year, but that might be optimistic since we've moved the conference earlier.

Javed: [missing transcription]

Graham: That is something I would need to look at to determine the impact.

Mike: Yes, we don't have enough testing harness to know that there's no impact. But I think shortly after the release we can try substituting the files with the new file and work with that.

Christian: So it would be part of 1.11?

Mike: Yes.

And the second thing we'd need to do is merge things with source.owl in VIVO-ISF.

Javed: So VIVO-ISF is source.owl?

Mike: The file I've been working with, I've been calling filegraph.owl, which is not a great name, but it's the files in the filegraph directory. The VIVO-ISF repository contains everything, eagle-i, etc., and it used to claim to contain everything in the VIVO ontology, but we've found it is currently missing some assertions. It should be possible to take the missing assertions and add them to source.owl. We have also been discussing whether we want to extract the VIVO ontology from source.owl. Once we have a filegraph.owl

Javed: and how is VIVO-ISF used?

Marijane: It's mostly used by eagle-i; the tools that Shahim wrote extract eagle-i from source.owl somehow. I am not sure how it works, but I know Juliane at Harvard is working on that.

Javed: So we figure out how to extract the filegraph from source.owl and then we proceed with the small change?

Mike: yes, then we'd want to start actual work on the ontology. We have a collection of JIRA tickets, some have small impact, others are more significant, and we'd like to make some low-impact changes to figure out how the process should work. And then we'd work up to more impactful changes, that might require software changes and communication with users, etc.

I had a very good conversation at RDA with Pierre Robert (sp?) from Montreal about internationalization and the issue of labels. Almost all of the labels in the ontology are English-only, without a language tag. That is not good. So it seems at the very least, we should put English language tags on all of the labels. And then there's the work of providing alternate language tags. Providing alternate language tags is good work, and it is true that VIVO uses some ontology labels in presentation to the user. I believe what we saw in the demos was class name labels in particular.

Anna: Are there any labels from other languages that are not tagged right now?

Christian: Not that I've seen.

Anna: so it should be safe just to tag all labels?

Mike: Yes.

Christian: Yes.
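A minimal sketch of the tagging step discussed here, assuming the ontology has been serialized as N-Triples and that every untagged `rdfs:label` really is English (as Christian and Mike indicate). The function name, the regular expression, and the `example.org` IRIs are hypothetical. Labels that already carry a language tag or a datatype are left untouched.

```python
import re

RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"

# Matches an rdfs:label statement whose object is a plain literal,
# i.e. a quoted string with no @lang tag and no ^^datatype suffix.
UNTAGGED_LABEL = re.compile(
    r'^(?P<subject>\S+)\s+' + re.escape(RDFS_LABEL) +
    r'\s+(?P<literal>"(?:[^"\\]|\\.)*")\s*\.\s*$'
)


def tag_label(line, lang="en"):
    """Add a language tag to an untagged rdfs:label line; return other lines unchanged."""
    match = UNTAGGED_LABEL.match(line)
    if match is None:
        return line
    return f'{match["subject"]} {RDFS_LABEL} {match["literal"]}@{lang} .'


# Hypothetical input lines.
untagged = '<http://example.org/Person> ' + RDFS_LABEL + ' "Person" .'
german = '<http://example.org/Person> ' + RDFS_LABEL + ' "Person"@de .'

print(tag_label(untagged))
print(tag_label(german))
```

A line-based pass like this is only one option; the same change could be made with an RDF library or a SPARQL update, which would also cope with other serializations.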

Javed: So it seems we could proceed with adding English tags, and then add other languages and then work on the UI.

Christian: We have translated most of it, yes. I think every label is translated.

Javed: Will a translation of the ontology suffice for presentation, or is anything else needed?

Anna: We have a problem with plurals.

Christian: Yes, but that is a software problem. Some of the grammar logic is implemented in templates. We could solve this with labels in singular or plural, or there might be another way.

Javed: Can you explain a little more?

Christian: When you have the stats on the front page, it just puts an 's' on the end.

[laughter]

Mike: That's pretty primitive, that's not going to work in Chinese.

Graham: it doesn't even work in English.

Christian: Yes, sheeps, childs, are examples in English where it doesn't work. There are eight ways to make a plural in German. This change would have an impact on the software of course.
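One possible shape for the "labels in singular or plural" idea, as a sketch only: store both forms per class and language and look them up, instead of appending an "s". The table contents, the function name, and the fallback to English are all assumptions for illustration, not VIVO's actual data or template logic. The "singular only when the count is 1" rule is itself a simplification that does not hold for every language.

```python
# Hypothetical label table: (class name, language) -> (singular, plural).
LABELS = {
    ("Person", "en"): ("Person", "People"),
    ("Person", "de"): ("Person", "Personen"),
    ("Child", "en"): ("Child", "Children"),
    ("Sheep", "en"): ("Sheep", "Sheep"),
}


def count_label(class_name, count, lang="en"):
    """Return e.g. '3 People', falling back to English when a language is missing."""
    forms = LABELS.get((class_name, lang)) or LABELS[(class_name, "en")]
    singular, plural = forms
    # Simplification: singular for exactly one item, plural otherwise.
    form = singular if count == 1 else plural
    return f"{count} {form}"


print(count_label("Child", 2))
print(count_label("Sheep", 5))
print(count_label("Person", 3, lang="de"))
```

Keeping the forms in data rather than in template logic means a translation can supply its own plurals without any change to the code that renders the counts.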

Mike: There are likely some other almost-trivial changes that we should be making. I have an open question, though. There are errors in some of the ontologies we are importing; FAO comes to mind. It contains information about the countries. There are some trivial errors, URLs that are broken and that are easy to fix, and my question is, what is our role in fixing those? We inherit this material from someone else, we see the problem, and we correct it? Or, if we don't correct it, we suffer the consequences?

Anna: Can't we contact the owners of those ontologies?

Mike: That's a start, but what if we find they're not maintaining them?

Javed: Maybe we should change to something else.

Mike: We could do that, we could also publish updates to the ontologies we're using.

Anna: We should be up front about that on the wiki and in our documentation.

Christian: Maybe we catalog the ontologies on the wiki with their maintenance status and contact information.

Mike: That's a good question, we have about 20 we're using.

Marijane: Check them at LOV, if they are listed there, there will be notes with contact information and maintenance status. http://lov.okfn.org

Graham: If countries change over time, that has consequences for what we're representing.

Mike: Absolutely, the FAO material we are using is a combination of facts and ontology. It includes facts about the countries.

Javed: If we decide to use something else, LOC could be a source for that.

Mike: Wikidata could also be a source. They tend to be very actively maintained. Especially data about countries.

Javed: And thinking about internationalization, is anyone else working on other languages?

Mike: There is a group in Costa Rica that has done a full translation. We have three universities in Spain -- I don't know how different they will be. We have the French Canadians, several sites in France, and we're about to start a development effort in Italy. So there's quite a bit of active internationalization thinking going on right now. The people in India seem content to work in English.

Javed: We can work on the internationalization and the UI changes needed in parallel.

Mike: Maybe for next time, we can start reviewing issues? Go through them and indicate whether there are any in particular we should work on?

Javed: In JIRA or in the filegraph? Or should we add those also in JIRA?

Mike: Yes, so right now there are 59 issues in the GitHub and a bunch are marked VIVO. Some of them apply beyond VIVO.

Javed: How is the level of severity assigned?

Mike: There isn't one; they are just things that people have asked for, and it's up to this group to figure it out. Some of them are quite trivial. For example, right now VIVO does not assert a type for a phone number unless it's a Fax number. The query being used has a MINUS clause.

Marijane: Is this related to the fact that vcard removed the Telephone class?

Mike: No, that's a different issue. But right now we are not asserting a phone type in the data.

Graham: It looks like currently there is a Telephone with subtypes.

Mike: and it's multi-dimensional, there are technology and purpose classifications.

Javed: I looked at the VCard ontology and did not see a telephone. https://www.w3.org/TR/vcard-rdf/#classes

Graham: We also don't have all of FOAF, so we're not distinguishing between things like Twitter accounts.

Mike: These are all the sorts of issues that are in the Github. Stuff that looks like clerical issues. Anyway, I think we should proceed as Javed suggests, and proceed as if the file is correct. And if we find problems, we file new issues.

Javed: So if we have 59 issues, there are 6 of us, we could each take 10 and assign impact.

Mike: Yeah, that would be useful. Some of them are incredibly high impact and we're not going to be addressing them soon.

Javed: I don't know if these are all valid or not.

Mike: Yes, some can be closed, some are obsolete, some are things that we're no longer interested in.

I would be happy to volunteer to go through some issues and categorize them as high or low impact. And then we could discuss them at the next meeting.

Marijane: How do we want to split them up?

Javed: we could do that right now.

Mike: Sort by newest.

Javed: So I will take the first ten. Mike will take the next ten. Anna, can you take the next ten?

Anna: I'm sorry, I will be traveling and then I will be on holiday.

Javed: Christian can you take some?

Christian: Early October maybe.

Javed: Graham? 31-40? Marijane? 41-50?

Marijane: And we'll leave the last 10 for later.

Javed: And we'll evaluate each as low, medium, or high.

Mike: Low impact is no impact on running the software.

Javed: and how do we indicate this?

Mike: With a label. I will make those.

Javed: And before the next meeting, I will do my compares with the ontology file, and I'll add those as I find them.

Is there anything else we can do? Marijane, any status from Juliane?

Marijane: I haven't heard anything recently, I will ping her.

Does this seem like a good place to wrap up for this week?

Mike: I think so.

Marijane: OK, see everyone in two weeks.
