Separate classification of areas under environmental zone causes confusion: merge these or make these a truly distinct hierarchy #1583

Open
cmungall opened this issue Dec 16, 2024 · 0 comments

ENVO frequently has shadow classifications between zones and other branches such as landform.

An example is "desert", which is in both the zone and landform branches:

image

Things in the zone branch are often labeled "area", but not consistently. For example, the class "rocky desert" is defined as "A desert plain characterized by a surface veneer of rock.", which would lead us to think it belongs in this branch:

  • [] ENVO:00000191 ! solid astronomical body part
    • [i] ENVO:01001886 ! landform
      • [i] ENVO:01001884 ! surface landform
        • [i] ENVO:01001357 ! desert

However, it's in this separate branch:

  • [] ENVO:01001199 ! terrestrial environmental zone
    • [i] ENVO:01000752 ! area of barren land
      • [i] ENVO:00000097 ! desert area
        • [i] ENVO:00000172 ! sandy desert
        • [i] ENVO:00000173 ! rocky desert
        • [i] ENVO:00000183 ! stony desert

It's not clear when curators should use one branch or another. What we see right now is people picking a mix of these, but this means that the hierarchies don't line up, and when groups build faceted browsing tools, "rocky desert" samples don't roll up under "desert" samples.
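The roll-up failure can be illustrated with a minimal sketch. It assumes only the is-a links quoted in the two hierarchies above; the real ontology may contain further axioms, and the function names `ancestors` and `rolls_up_under` are hypothetical helpers, not part of any ENVO tooling.

```python
# is-a parent links copied from the two hierarchies quoted in this issue.
# Assumption: no other parents exist; the real ontology may differ.
PARENTS = {
    "ENVO:01001886": ["ENVO:00000191"],  # landform -> solid astronomical body part
    "ENVO:01001884": ["ENVO:01001886"],  # surface landform -> landform
    "ENVO:01001357": ["ENVO:01001884"],  # desert -> surface landform
    "ENVO:01000752": ["ENVO:01001199"],  # area of barren land -> terrestrial environmental zone
    "ENVO:00000097": ["ENVO:01000752"],  # desert area -> area of barren land
    "ENVO:00000172": ["ENVO:00000097"],  # sandy desert -> desert area
    "ENVO:00000173": ["ENVO:00000097"],  # rocky desert -> desert area
    "ENVO:00000183": ["ENVO:00000097"],  # stony desert -> desert area
}


def ancestors(term):
    """Return every term reachable from `term` by following is-a links upward."""
    seen = set()
    stack = [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def rolls_up_under(term, grouping):
    """True if a sample annotated with `term` would be counted under `grouping`."""
    return term == grouping or grouping in ancestors(term)


# A faceted browser grouping by "desert" (ENVO:01001357) misses "rocky desert".
print(rolls_up_under("ENVO:00000173", "ENVO:01001357"))
# It only rolls up under "desert area" (ENVO:00000097).
print(rolls_up_under("ENVO:00000173", "ENVO:00000097"))
```

Under these assumptions, a tool that groups samples by the landform "desert" term never sees samples annotated with "rocky desert", because the two terms share no ancestor in the quoted hierarchies.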

This is repeated elsewhere, for tundra, wetlands, grasslands, etc.

It looks like many of the area terms were added to provide precise equivalents to NLCD (National Land Cover Database) terms. There is an argument that a classification system based on land cover should be kept distinct, since it reflects a different perspective and use case, e.g. annotating remote sensing data.

The NLCD mapped terms are in bold here, interwoven with existing terms:

image

It looks like grouping classes such as "wetland area" were added to provide some kind of structure for the NLCD terms, which causes concept duplication.

These were added in 2017: #458 (comment)

It's not clear from the original request whether the use case dictated that these be modeled as a distinct branch or woven into the existing ENVO hierarchy.

I propose that we make all of this more consistent and less confusing for users, by picking one of the following strategies:

  1. Merge concepts
  2. Continue to have separate branches, but have this be more systematic and better documented
  3. Separate out alternative classification schemes into orthogonal mapped ontologies

Merge concepts

  • We would merge "X area" into X where an existing X term exists
  • If no "X" exists, then pick "X environment" or "X ecosystem" where these exist
    • This would mean, e.g., that "area of woody wetland" would become a subclass of the existing "wetland ecosystem"
  • I propose we also make the naming more consistent, so "area of woody wetland" would simply be named "woody wetland"
    • We would keep the NLCD names as tagged synonyms. This is consistent with what we do with other ontologies.

This is my favored approach. It makes for a simpler ontology.
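The merge rule in the bullets above could be sketched roughly as follows. This is an illustration only: `strip_area`, `merge_target`, and the `EXISTING` label set are hypothetical, and the set contains just two labels this issue mentions as already existing.

```python
# Assumption: a tiny stand-in for the set of existing ENVO labels.
EXISTING = {"desert", "wetland ecosystem"}


def strip_area(label):
    """Drop the 'area of ' prefix or ' area' suffix from a label, if present."""
    if label.startswith("area of "):
        return label[len("area of "):]
    if label.endswith(" area"):
        return label[: -len(" area")]
    return label


def merge_target(label, existing_labels):
    """Pick the term to merge into: X first, then 'X environment', then 'X ecosystem'."""
    base = strip_area(label)
    for candidate in (base, f"{base} environment", f"{base} ecosystem"):
        if candidate in existing_labels:
            return candidate
    return None  # no existing term: keep the class, renamed to its stripped label


print(merge_target("desert area", EXISTING))
print(merge_target("wetland area", EXISTING))
print(strip_area("area of woody wetland"))
```

A term with no merge target, such as "area of woody wetland", would keep its own class under the renamed label "woody wetland"; placing it under "wetland ecosystem" is a separate curation decision that this sketch does not attempt.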

Separate branches, but systematic

Here we would keep separate branches, but make the naming and the coordination across branches consistent. We would have clear and simple top-level documentation, plus inline documentation in the ontology, that specifies what this separate branch is for. There would be clear use cases for when one branch should be picked over another.

As well as providing simple curator documentation, we'd need to work with external groups to make sure that reporting standards and submission tools pick from the correct branch. For example, when would someone submitting metagenomic sample data to INSDC use "grassland area" vs "grassland ecosystem"?

Naming should be consistent, so that it's clear when one is picking an area vs an ecosystem. I suggest a rule that if a term has the suffix "area" then all of its is-a children should also be suffixed this way.
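A check for the suggested suffix rule might look like the sketch below. It assumes the zone hierarchy quoted earlier in this issue, keyed by label for readability; `area_suffix_violations` is a hypothetical helper, not an existing ENVO QC check.

```python
# is-a children by parent label, copied from the zone branch quoted in this issue.
CHILDREN = {
    "terrestrial environmental zone": ["area of barren land"],
    "area of barren land": ["desert area"],
    "desert area": ["sandy desert", "rocky desert", "stony desert"],
}


def area_suffix_violations(children_by_parent):
    """List (parent, child) pairs where the parent ends in ' area' but the child does not."""
    violations = []
    for parent, children in children_by_parent.items():
        if not parent.endswith(" area"):
            continue  # the rule only constrains children of "... area" terms
        for child in children:
            if not child.endswith(" area"):
                violations.append((parent, child))
    return violations


for parent, child in area_suffix_violations(CHILDREN):
    print(f"{child!r} is a child of {parent!r} but lacks the 'area' suffix")
```

Run against the quoted hierarchy, this flags "sandy desert", "rocky desert", and "stony desert" under "desert area". Labels of the form "area of X" are not covered by the rule as stated, so a real check would need a decision on whether the prefix form counts.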

Separate out alternative classification schemes

Recognize that it is hard, if not impossible, to superimpose multiple alternative classification schemes in one ontology. Pick one broad and uncontroversial way of doing things, work with other groups to make ontological representations of the alternative schemes, and then map between the systems.
