
Identifying high value dataset (UC) #17

Open
vgole001 opened this issue Sep 30, 2024 · 2 comments

Comments

@vgole001

Added a methodology here for identifying whether a dataset is of high value.

A dataset is considered of high value if the bbox it defines applies to a whole country.
The methodology has a limitation: it only takes into account datasets (XML files) that contain a single bbox.
We have defined a threshold: a dataset is considered of high value if its bounding box covers at least 70% of the country. This threshold can be changed based on our needs.
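A minimal sketch of this 70% check, assuming both the dataset bbox and (for simplicity) the country are axis-aligned boxes in the same projected, equal-area CRS. In practice the country would be an actual polygon (e.g. via shapely), and the names `is_high_value` and `overlap_fraction` are illustrative, not from the repository:

```python
# A bbox is (minx, miny, maxx, maxy); coordinates are assumed to be in a
# projected, equal-area CRS so that area ratios are meaningful.

HIGH_VALUE_THRESHOLD = 0.70  # bbox must cover at least 70% of the country

def bbox_area(b):
    """Area of an axis-aligned box; 0 if the box is empty or degenerate."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

def overlap_fraction(dataset_bbox, country_bbox):
    """Fraction of the country's area covered by the dataset bbox."""
    inter = (max(dataset_bbox[0], country_bbox[0]),
             max(dataset_bbox[1], country_bbox[1]),
             min(dataset_bbox[2], country_bbox[2]),
             min(dataset_bbox[3], country_bbox[3]))
    return bbox_area(inter) / bbox_area(country_bbox)

def is_high_value(dataset_bbox, country_bbox, threshold=HIGH_VALUE_THRESHOLD):
    return overlap_fraction(dataset_bbox, country_bbox) >= threshold
```

For example, a bbox covering 80% of a 10×10 country, `is_high_value((0, 0, 8, 10), (0, 0, 10, 10))`, returns `True`, while one covering 50% returns `False`.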

Next Steps
Extending the methodology to identify whether a dataset is of high value when a dataset (XML file) contains multiple bboxes gets more complicated.
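One way this could be handled, sketched under the same axis-aligned-bbox assumption as above (function names are hypothetical): compute the area of the *union* of all bboxes, clipped to the country, so that overlapping bboxes are not double-counted. A simple coordinate-compression grid suffices for a handful of boxes:

```python
def clip(b, frame):
    """Clip box b = (minx, miny, maxx, maxy) to `frame`; result may be empty."""
    return (max(b[0], frame[0]), max(b[1], frame[1]),
            min(b[2], frame[2]), min(b[3], frame[3]))

def union_area(boxes):
    """Area of the union of axis-aligned boxes, via coordinate compression:
    split the plane into grid cells at all box edges, then count each cell
    once if any box contains its midpoint."""
    boxes = [b for b in boxes if b[2] > b[0] and b[3] > b[1]]
    xs = sorted({x for b in boxes for x in (b[0], b[2])})
    ys = sorted({y for b in boxes for y in (b[1], b[3])})
    area = 0.0
    for i in range(len(xs) - 1):
        for j in range(len(ys) - 1):
            cx, cy = (xs[i] + xs[i + 1]) / 2, (ys[j] + ys[j + 1]) / 2
            if any(b[0] <= cx <= b[2] and b[1] <= cy <= b[3] for b in boxes):
                area += (xs[i + 1] - xs[i]) * (ys[j + 1] - ys[j])
    return area

def multi_bbox_coverage(bboxes, country_bbox):
    """Fraction of the country covered by the union of the dataset's bboxes."""
    clipped = [clip(b, country_bbox) for b in bboxes]
    country_area = (country_bbox[2] - country_bbox[0]) * (country_bbox[3] - country_bbox[1])
    return union_area(clipped) / country_area
```

With a real country polygon, `shapely.ops.unary_union` over the boxes followed by an intersection with the polygon would replace the grid.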

FYI @DajanaSnopkova @Max-at-Vlaanderen @pvgenuchten @Tomas-Pavelka

@pvgenuchten
Contributor

National spatial coverage is one of the aspects to be considered for a high value label.

What do we do with Belgium, Italy, Poland and Germany, which are regionally oriented?

@Max-at-Vlaanderen
Contributor

We could also do the approach differently; instead of detecting whether something is ‘national’, try to find out what level the bbox is at:

  1. Take the midpoint of the bbox (possibly expandable to multiple points should we find this safer).
  2. Query OSM with these coordinates (or another database with hierarchical administrative units).
  3. Compare the overlapping area with all administrative units found.
  4. Save the level with the best overlap, so that use case by use case everyone can look at what is most valuable.
     Extra: use decimal numbers to add extra nuance; e.g. a dataset overlapping 40% of a country gets level 3.6.
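The steps above could be sketched roughly like this, assuming the administrative units (levels plus geometries, here simplified to bboxes in a projected CRS) have already been fetched from OSM/Nominatim for the bbox midpoint. `fractional_level` is a hypothetical name, and the level numbering follows the example above (40% coverage of a level-3 unit → 3.6):

```python
def overlap_fraction(dataset_bbox, unit_bbox):
    """Fraction of the admin unit's area covered by the dataset bbox
    (boxes are (minx, miny, maxx, maxy))."""
    inter_w = min(dataset_bbox[2], unit_bbox[2]) - max(dataset_bbox[0], unit_bbox[0])
    inter_h = min(dataset_bbox[3], unit_bbox[3]) - max(dataset_bbox[1], unit_bbox[1])
    inter = max(0.0, inter_w) * max(0.0, inter_h)
    unit = (unit_bbox[2] - unit_bbox[0]) * (unit_bbox[3] - unit_bbox[1])
    return inter / unit

def fractional_level(dataset_bbox, admin_units):
    """admin_units: [(level, unit_bbox), ...] for the hierarchy containing
    the bbox midpoint. Returns the level of the best-overlapping unit,
    refined with a decimal: covering 40% of a level-3 unit gives
    3 + (1 - 0.4) = 3.6; full coverage gives the level exactly."""
    best_level, best_frac = max(
        ((level, overlap_fraction(dataset_bbox, ub)) for level, ub in admin_units),
        key=lambda t: t[1])
    return best_level + (1.0 - best_frac)
```

A real implementation would use the units' actual polygons and a proper area metric, but the level-plus-decimal scoring carries over unchanged.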

Beyond that, I also agree with Paul that ‘High Value’ should certainly not equate only to national coverage. I would very much like to see a data density factor added in the future.

But the further approach should probably still be discussed in an upcoming meeting?
