Update README and rename directories in line with new terminology #35
Changes from 9 commits
b50d776
299bfc6
4a1b793
ea0b7c6
e07d5f3
a032a07
869f182
6f6c142
352bfeb
e97a54f
8d09e1b
6fe80b2
df81eb6
5e1ec27
5580fbd
2c120a1
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,25 +1,24 @@ | ||
# Phyloreferencing Curation Workflow | ||
The Phyloreferencing curation workflow serves three main purposes: | ||
# Clade Ontology | ||
|
||
1. It provides a set of exemplar curated phyloreferences in JSON and OWL. | ||
2. It provides a test space for trying different approaches to generating phyloreferences from JSON to OWL, although these will be moved into their own repositories if they prove to be successes. | ||
3. It provides a test suite of phyloreferences along with expected resolved nodes, allowing reasoning to be continually tested as ontologies and software tools are updated. | ||
The Clade Ontology is an ontology of exemplar phyloreferences curated from peer-reviewed publications. Phyloreferences in this ontology include their verbatim clade definition and the phylogeny upon which they were initially defined. The ontology therefore acts both as a catalogue of computable clade definitions and as a test suite for checking whether each phyloreference resolves as expected. This ontology is expressed in the [Web Ontology Language (OWL)](https://en.wikipedia.org/wiki/Web_Ontology_Language) and is available for reuse under the terms of the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0). | ||
|
||
My feeling is we should standardize on a "default" software license, because I don't see much point in having different licenses for different pieces of software we develop, given that we're not defaulting to the GPL, which we might otherwise have reason to relax depending on the specific tool or code. So are you suggesting the Apache 2.0 license as our default, and do you have reasons to choose it over, say, MIT or BSD2/3, the most commonly used "permissive" licenses?
Actually, it looks like we've standardized on the MIT license across all Phyloref software products, as we decided to in phyloref/phylo2owl#1 -- except for the Clade Ontology, which ended up with the Apache License before we'd closed that issue. I've now relicensed it with the MIT license in 5e1ec27. |
||
[![Build Status](https://travis-ci.org/phyloref/curation-workflow.svg?branch=master)](https://travis-ci.org/phyloref/curation-workflow) | ||
[![Build Status](https://travis-ci.org/phyloref/clade-ontology.svg?branch=master)](https://travis-ci.org/phyloref/clade-ontology) | ||
|
||
## Currently curated phyloreferences | ||
## Executing phyloreferences as a test suite | ||
|
||
| Curated paper | DOI | Phyloreferences | Status | | ||
|---------------|-----|-----------------|--------| | ||
| [Fisher et al, 2007](testcases/Fisher%20et%20al,%202007) | [doi](https://doi.org/10.1639/0007-2745%282007%29110%5B46%3APOTCWA%5D2.0.CO%3B2#https://doi.org/10.1639/0007-2745%282007%29110%5B46%3APOTCWA%5D2.0.CO%3B2) | 11 phyloreferences | All resolved correctly, but one resolved to a different node from paper | | ||
| [Hillis and Wilcox, 2005](testcases/Hillis%20and%20Wilcox,%202005) | [doi](https://doi.org/10.1016/j.ympev.2004.10.007) | 16 phyloreferences | All resolved correctly, but in two cases the correct resolution was no nodes | | ||
To generate all OWL files and test all phyloreferences, you will need [pytest](https://docs.pytest.org/), which you can install by running `pip install -r requirements.txt`. Note that you will also need to have [Java](https://java.com/) installed to test the phyloreferences. | ||
|
||
## Executing phyloreferences as a test suite | ||
Once pytest and all other required libraries are installed, you can execute all tests by running `py.test tests/` in the root directory of this project. We support two optional marks: | ||
|
||
* `py.test tests/ -m json` executes the scripts to create OWL representations of the test suite. This tests the content of the JSON files and ensures that they can be converted into OWL. | ||
* `py.test tests/ -m owl` reasons over the created OWL files and ensures that the expected nodes are correctly resolved by the phyloreferences. | ||
|
||
## Data workflow | ||
|
||
Curated phyloreferences produced by the [Curation Tool](https://github.com/phyloref/curation-tool) as Phyloreference eXchange (PHYX) files are currently stored in the [`phyx`](phyx/) directory (see [Brochu 2003](phyx/Brochu%202003/paper.json) as an example). When executed as a test suite, these files are converted into the Web Ontology Language (OWL) in the following steps: | ||
|
||
You can execute and test all phyloreferences by running `py.test` in the root directory | ||
of this project. We support two marks: | ||
1. PHYX files are converted to JSON-LD files using the [`phyx2owl`](phyx2owl/) Python tool. This tool translates [phylogenies represented in Newick](https://en.wikipedia.org/wiki/Newick_format) into a series of statements describing individual nodes and their relationships, and translates phyloreferences into OWL class restrictions that describe the nodes they resolve to. | ||
2. The produced JSON-LD files can be transformed by any standards-compliant converter into OWL files. In the test suite, we use the [`rdfpipe`](http://rdflib.readthedocs.io/en/stable/apidocs/rdflib.tools.html#module-rdflib.tools.rdfpipe) tool included in the [`rdflib`](http://rdflib.readthedocs.io/) Python library. | ||
3. Any compliant [OWL 2 DL reasoner](https://www.w3.org/TR/2012/REC-owl2-direct-semantics-20121211/) should be able to reason over this OWL file and provide information on which nodes each phyloreference resolved to. In the test suite, we use [`jphyloref`](https://github.com/phyloref/jphyloref), a Java application that uses the [JFact++ 1.2.4 OWL reasoner](http://jfact.sourceforge.net/) to reason over input OWL files. `jphyloref` can also read the annotations that indicate where each phyloreference was expected to resolve on any of the included phylogenies, and test whether phyloreferences resolved to the expected nodes. | ||
|
||
* `py.test -m json` executes the scripts to create OWL representations of the test suite. | ||
This tests the content of the JSON file and ensures that they can be converted into OWL. | ||
* `py.test -m owl` reasons over the created OWL files and ensures that the expected nodes | ||
are correctly resolved by the phyloreferences. | ||
We are currently working on a complete workflow that would allow us to [merge separate PHYX files into a single Clade Ontology](https://github.com/phyloref/clade-ontology/projects/3) distributed as a single downloadable OWL file. At the moment, therefore, OWL files need to be generated by running the test suite on your own computer. |
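Step 1 of the data workflow in the README diff above describes translating a Newick phylogeny into statements about individual nodes and their relationships. The following is a minimal sketch of that idea only, not the actual `phyx2owl` implementation: the function name `newick_to_statements` and the `(node_id, label, parent_id)` tuple layout are hypothetical, and the parser assumes unquoted labels and ignores branch lengths.

```python
from itertools import count


def newick_to_statements(newick):
    """Return one (node_id, label, parent_id) tuple per node, in preorder.

    Hypothetical helper for illustration; not part of phyx2owl.
    Assumes unquoted labels; branch lengths (":0.1") are skipped.
    """
    text = newick.strip().rstrip(";")
    ids = count(1)          # sequential node identifiers: 1, 2, 3, ...
    statements = []
    pos = 0

    def parse_node(parent_id):
        nonlocal pos
        node_id = next(ids)
        index = len(statements)
        statements.append(None)  # reserve a slot so parents precede children

        # An opening parenthesis introduces this node's children.
        if pos < len(text) and text[pos] == "(":
            pos += 1
            while True:
                parse_node(node_id)
                if pos >= len(text):
                    raise ValueError("unbalanced parentheses in Newick string")
                if text[pos] == ",":
                    pos += 1
                elif text[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError(f"unexpected character at position {pos}")

        # An optional label follows the children (or is the whole leaf).
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        label = text[start:pos].strip() or None

        # Skip an optional branch length.
        if pos < len(text) and text[pos] == ":":
            pos += 1
            while pos < len(text) and text[pos] not in ",()":
                pos += 1

        statements[index] = (node_id, label, parent_id)

    parse_node(None)
    if pos != len(text):
        raise ValueError("trailing characters after the root node")
    return statements


if __name__ == "__main__":
    for statement in newick_to_statements("((A,B)AB,C)root;"):
        print(statement)
```

Each tuple corresponds to one statement of the form "node N has label L and parent P". A real converter would emit these as JSON-LD or OWL individuals rather than Python tuples.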
This file was deleted.
Where's the Apache license coming from?? OBO recommends CC-BY. I would suggest CC0, unless we have good arguments as to why we cannot waive copyright (for example, because someone other than us has it), or why our terms of reuse have to include a legal requirement for attribution (as opposed to a scientific norm mandating proper citation).
The Apache license is intended to apply to the software in this repository -- the phyx2owl script and the testing scripts -- which is clearly not what the text in the README file says. I've updated the README file to clarify that, and added a copy of the CC0 legal code into the `phyx` directory to make it clear that a separate license applies there.
As for the question of other authors' copyright, I don't think you could argue that the phylogeny (which is a series of assertions) or the raw phyloreference (which is a list of specifiers) count as creative works, so CC0 sounds great to me. We are currently recording information that helps to explain why the clade was defined in a particular way (e.g. "Because sister group relationships are better resolved and supported than basally-branching ingroup relationships within this clade, a branch-modified node-based definition was used to maximize definitional and compositional stability (Cantino et al., 2007).", see reference below). This is arguably copyrightable, but I think it's small enough that we could argue fair use or remove it on a case-by-case basis if there are specific complaints.
clade-ontology/testcases_too_slow/Wojciechowski, 2013/paper.json
Line 74 in 187c78c
So fair use permitting us to reuse the definition texts here does not mean they are in the public domain, and fair use does mean you can claim intellectual property rights. So I'm skeptical that we have the rights to release the clade definition texts into the public domain.
Maybe this needs some consultation first with corresponding experts. Given the legal uncertainty, I'd argue we cannot at this point attach a CC0 waiver if the definition texts are included, because once we do we've waived our rights and we can't just take that back.