Update README and rename directories in line with new terminology #35

Merged: 16 commits, Jul 17, 2018
Changes from 9 commits
7 changes: 7 additions & 0 deletions .gitignore
@@ -1,3 +1,10 @@
# Ignore files produced by the test suite
phyx/*/paper_as_owl.json
phyx/*/paper.owl

# Ignore files produced by pytest
.pytest_cache/

# Ignore Mac-specific files
.DS_Store

2 changes: 1 addition & 1 deletion .travis.yml
@@ -8,5 +8,5 @@ install:

# command to run tests
script:
- cd testcase2owl; python -m pytest; cd ..
- cd phyx2owl; python -m pytest; cd ..
- travis_wait 40 py.test ./tests
35 changes: 17 additions & 18 deletions README.md
@@ -1,25 +1,24 @@
# Phyloreferencing Curation Workflow
The Phyloreferencing curation workflow serves three main purposes:
# Clade Ontology

1. It provides a set of exemplar curated phyloreferences in JSON and OWL.
2. It provides a test space for trying different approaches to generating phyloreferences from JSON to OWL, although these will be moved into their own repositories if they prove to be successes.
3. It provides a test suite of phyloreferences along with expected resolved nodes, allowing reasoning to be continually tested as ontologies and software tools are updated.
The Clade Ontology is an ontology of exemplar phyloreferences curated from peer-reviewed publications. Phyloreferences in this ontology include their verbatim clade definition and the phylogeny upon which they were initially defined. The ontology therefore acts as both a catalogue of computable clade definitions as well as a test suite of phyloreferences that can be tested to determine if each phyloreference resolves as expected. This ontology is expressed in the [Web Ontology Language (OWL)](https://en.wikipedia.org/wiki/Web_Ontology_Language) and is available for reuse under the terms of the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).

Member

Where's the Apache license coming from?? OBO recommends CC-BY. I would suggest CC0, unless we have good arguments as to why we cannot waive copyright (for example, because someone other than us has it), or why our terms of reuse have to include a legal requirement for attribution (as opposed to a scientific norm mandating proper citation).

Member Author

The Apache license is intended to apply to the software in this repository -- the phyx2owl script and the testing scripts -- which is clearly not what the text in the README file says. I've updated the README file to clarify that, and added a copy of the CC0 legal code into the phyx directory to make it clear that a separate license applies there.

As for the question of other authors' copyright, I don't think you could argue that the phylogeny (which is a series of assertions) or the raw phyloreference (which is a list of specifiers) count as creative works, so CC0 sounds great to me. We are currently recording information that helps to explain why the clade was defined in a particular way (e.g. "Because sister group relationships are better resolved and supported than basally-branching ingroup relationships within this clade, a branch-modified node-based definition was used to maximize definitional and compositional stability (Cantino et al., 2007).", see reference below). This is arguably copyrightable, but I think it's small enough that we could argue fair use or remove it on a case-by-case basis if there are specific complaints.

"cladeDefinition": "Papilionoideae (L.) DC. [M. F. Wojciechowski], converted clade name.\n\nDefinition (branch-modified, node-based): The most inclusive crown clade containing Castanospermum australe A. Cunn. ex Mudie 1829 and Vicia faba L. 1753 but not Caesalpinia gilliesii (Wall. ex Hook.) D. Dietr. 1840, Gleditsia triacanthos L., or Dialium guianense (Aubl.) Sandwith 1939. Because sister group relationships are better resolved and supported than basally-branching ingroup relationships within this clade, a branch-modified node-based definition was used to maximize definitional and compositional stability (Cantino et al., 2007).\n\nComments on name. Papilionoideae (L.) DC. 1825 is a preexisting scientific name established under the rank-based ICBN and applied to the subfamily of the Leguminosae Jussieu 1789 that corresponds to this clade. Faboideae Rudd 1968 is an alternative name for this subfamily of Fabaceae Lindl. 1836 or Leguminosae, whereas the similar Papilionaceae Giseke 1792 is the appropriate name for this taxon when the subfamily is treated as a separate family from the Caesalpiniaceae R. Br. 1814 and Mimosaceae R. Br. 1814.\n\nReference phylogeny: Cardoso et al. (2012; Fig. 1). The monophyly of Papilionoideae has been demonstrated in all higher-level molecular phylogenetic analyses published to date beginning with Käss and Wink (1996) and Doyle et al. (1997), albeit sometimes without robust statistical support. Later analyses such as Wojciechowski et al. (2004;Fig. 2) and Cardoso et al. (2012; Fig. 3) have provided unequivocal, robust support for this clade.\n\nDiagnostic apomorphy: Flowers generally zygomorphic, or some actinomorphic, adaxial petal generally outside the adjacent lateral petals, sepals generally united at base.\n\nComposition: This clade, comprised of more than 478 genera and 13,800 species (Lewis et al., 2005), is cosmopolitan in distribution and includes the vast majority of agriculturally important legumes such as Pisum sativum L. 
(pea), Medicago sativa L. (alfalfa), Trifolium L. (clovers), Vicia L. (vetches), Lens Mill. (lentils), Lupinus L. (lupins), Glycine max (L.) Merr. (soybean), Phaseolus L. (beans), Lablab purpureus (L.) Sweet (hyacinth bean), and Arachis hypogaea L. (peanut).\n\nSynonyms/Etymology: Papilionoideae (L.) DC. 1825, Faboideae Rudd 1968, and the informal name “Papilionoids”. The name “papilionoid” most likely comes from the resemblance of the typical or characteristic flower to a butterfly (papilio, from Latin), especially when the flower is dissected into its component five petals.",
)

Member

So fair use permitting us to reuse the definition texts here does not mean they are in the public domain, and fair use does not mean you can claim intellectual property rights. So I'm skeptical that we have the rights to release the clade definition texts into the public domain.

Maybe this needs some consultation first with corresponding experts. Given the legal uncertainty, I'd argue we cannot at this point attach a CC0 waiver if the definition texts are included, because once we do we've waived our rights and we can't just take that back.

Member

My feeling is we should standardize on a "default" software license. I don't see much point in having different licenses for different pieces of software we develop, given that we're not defaulting to the GPL, which we might otherwise have reason to relax depending on the specific tool or code.

So are you suggesting the Apache 2.0 license as our default, and do you have reasons to choose it over, say, MIT or BSD2/3, the most commonly used "permissive" licenses?

Member Author

Actually, it looks like we've standardized on the MIT license across all Phyloref software products, as we decided to in phyloref/phylo2owl#1 -- except for the Clade Ontology, which ended up with the Apache License before we'd closed that issue. I've now relicensed it with the MIT license in 5e1ec27.

[![Build Status](https://travis-ci.org/phyloref/curation-workflow.svg?branch=master)](https://travis-ci.org/phyloref/curation-workflow)
[![Build Status](https://travis-ci.org/phyloref/clade-ontology.svg?branch=master)](https://travis-ci.org/phyloref/clade-ontology)

## Currently curated phyloreferences
## Executing phyloreferences as a test suite

| Curated paper | DOI | Phyloreferences | Status |
|---------------|-----|-----------------|--------|
| [Fisher et al, 2007](testcases/Fisher%20et%20al,%202007) | [doi](https://doi.org/10.1639/0007-2745%282007%29110%5B46%3APOTCWA%5D2.0.CO%3B2#https://doi.org/10.1639/0007-2745%282007%29110%5B46%3APOTCWA%5D2.0.CO%3B2) | 11 phyloreferences | All resolved correctly, but one resolved to a different node from paper |
| [Hillis and Wilcox, 2005](testcases/Hillis%20and%20Wilcox,%202005) | [doi](https://doi.org/10.1016/j.ympev.2004.10.007) | 16 phyloreferences | All resolved correctly, but in two cases the correct resolution was no nodes |
To generate all OWL files and test all phyloreferences, you will need [pytest](https://docs.pytest.org/), which you can install by running `pip install -r requirements.txt`. Note that you will also need to have [Java](https://java.com/) installed to test the phyloreferences.

## Executing phyloreferences as a test suite
Once pytest and all other required libraries are installed, you can execute all tests by running `py.test tests/` in the root directory of this project. We support two optional marks:

* `py.test tests/ -m json` executes the scripts to create OWL representations of the test suite. This tests the content of the JSON file and ensures that they can be converted into OWL.
* `py.test tests/ -m owl` reasons over the created OWL files and ensures that the expected nodes are correctly resolved by the phyloreferences.
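
As an illustration of how the two marks described above could be attached to tests, here is a minimal sketch. The test names and their bodies are hypothetical stand-ins and are not taken from this repository's `tests/` directory; only the mark names `json` and `owl` come from the README text. Recent pytest versions may also expect custom marks to be registered (for example in `pytest.ini`) to avoid unknown-mark warnings.

```python
import pytest


@pytest.mark.json
def test_phyx_file_converts_to_jsonld():
    """Hypothetical check selected by `py.test tests/ -m json`."""
    # Stand-in for the output of a PHYX-to-JSON-LD conversion step.
    converted = {"@context": "http://example.org/context.json"}
    assert "@context" in converted


@pytest.mark.owl
def test_phyloreferences_resolve_as_expected():
    """Hypothetical check selected by `py.test tests/ -m owl`."""
    # Stand-ins for the nodes a reasoner reported and the nodes
    # the curated file said each phyloreference should resolve to.
    resolved_nodes = {"node3"}
    expected_nodes = {"node3"}
    assert resolved_nodes == expected_nodes
```

With tests marked this way, `-m json` would run only the first function and `-m owl` only the second; running without `-m` would run both.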

## Data workflow

Curated phyloreferences produced by the [Curation Tool](https://github.com/phyloref/curation-tool) as Phyloreference eXchange (PHYX) files are currently stored in the [`phyx`](phyx/) directory (see [Brochu 2003](phyx/Brochu%202003/paper.json) as an example). When executed as a test suite, these files are converted into the Web Ontology Language (OWL) in the following steps:

You can execute and test all phyloreferences by running `py.test` in the root directory
of this project. We support two marks:
1. PHYX files are converted to JSON-LD files using the [`phyx2owl`](phyx2owl/) Python tool. This tool translates [phylogenies represented in Newick](https://en.wikipedia.org/wiki/Newick_format) into a series of statements describing individual nodes and their relationships, and translates phyloreferences into OWL class restrictions that describe the nodes they resolve to.
2. The produced JSON-LD files can be transformed by any standards-compliant converter into OWL files. In the test suite, we use the [`rdfpipe`](http://rdflib.readthedocs.io/en/stable/apidocs/rdflib.tools.html#module-rdflib.tools.rdfpipe) tool included in the [`rdflib`](http://rdflib.readthedocs.io/) Python library.
3. Any compliant [OWL 2 DL reasoner](https://www.w3.org/TR/2012/REC-owl2-direct-semantics-20121211/) should be able to reason over this OWL file and provide information on which nodes each phyloreference resolved to. In the test suite, we use [`jphyloref`](https://github.com/phyloref/jphyloref), a Java application that uses the [JFact++ 1.2.4 OWL reasoner](http://jfact.sourceforge.net/) to reason over input OWL files. `jphyloref` can also read the annotations that indicate where each phyloreference was expected to resolve on any of the included phylogenies, and test whether phyloreferences resolved to the expected nodes.

* `py.test -m json` executes the scripts to create OWL representations of the test suite.
This tests the content of the JSON file and ensures that they can be converted into OWL.
* `py.test -m owl` reasons over the created OWL files and ensures that the expected nodes
are correctly resolved by the phyloreferences.
We are currently working on a complete workflow that would allow us to [merge separate PHYX files into a single Clade Ontology](https://github.com/phyloref/clade-ontology/projects/3) available as a single OWL file available for individual download. At the moment, therefore, OWL files need to be generated by running the test suite on your own computer.
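
To make step 1 of the data workflow more concrete, the sketch below shows one way a Newick string could be turned into a flat list of node records with parent and child links. It is not the `phyx2owl` implementation: the function name `newick_to_nodes`, the `nodeN` identifiers, and the record keys are all assumptions chosen for illustration, and the parser only handles unquoted labels and optional branch lengths.

```python
import itertools


def newick_to_nodes(newick):
    """Parse a simple Newick string into a flat list of node records.

    Illustrative only: each record has a made-up identifier, its parent's
    identifier (None for the root), a label (or None), and child identifiers.
    """
    text = newick.strip().rstrip(";")
    counter = itertools.count()
    nodes = []
    pos = 0

    def parse_node(parent_id):
        nonlocal pos
        node = {"id": "node%d" % next(counter), "parent": parent_id,
                "label": None, "children": []}
        nodes.append(node)

        # An opening parenthesis means this node has children.
        if pos < len(text) and text[pos] == "(":
            pos += 1
            while True:
                child = parse_node(node["id"])
                node["children"].append(child["id"])
                if pos >= len(text):
                    raise ValueError("unbalanced parentheses")
                if text[pos] == ",":
                    pos += 1
                elif text[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError("unexpected character at %d" % pos)

        # Read the (optional) label up to the next delimiter.
        start = pos
        while pos < len(text) and text[pos] not in "(),:":
            pos += 1
        node["label"] = text[start:pos].strip() or None

        # Skip an optional branch length such as ":0.1".
        if pos < len(text) and text[pos] == ":":
            while pos < len(text) and text[pos] not in "(),":
                pos += 1
        return node

    parse_node(None)
    if pos != len(text):
        raise ValueError("trailing characters after tree")
    return nodes


if __name__ == "__main__":
    for record in newick_to_nodes("((A,B)AB,C)root;"):
        print(record)
```

Each record corresponds to one statement about a node and its relationships; a real converter would additionally serialize these as JSON-LD and attach the OWL class restrictions for each phyloreference.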
8 files renamed without changes.
2 changes: 1 addition & 1 deletion testcase2owl/testcase2owl.py → phyx2owl/phyx2owl.py
@@ -1,7 +1,7 @@
#!/usr/bin/env python

"""
testcase2owl.py: Converts a Phyloreference curated test case into a
phyx2owl.py: Converts a Phyloreference curated test case into a
JSON-LD file with node information. It carries out two conversions:

1. Converts all phylogenies into a node-based representation in OWL,
29 changes: 0 additions & 29 deletions testcases/Fisher et al, 2007/README.md

This file was deleted.