
Identifying high value dataset (UC) #17

Open
vgole001 opened this issue Sep 30, 2024 · 2 comments

Comments

@vgole001

Added a methodology here for identifying whether a dataset is of high value.

A dataset is considered of high value if the bbox it defines applies to a whole country.
The methodology has a limitation: it only takes into account datasets (XML files) that contain a single bbox.
We have defined a threshold: a dataset is considered of high value if its bounding box covers at least 70% of the country. This threshold can be changed based on our needs.
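A minimal sketch of this 70% check, assuming both the dataset bbox and (for simplicity) the country are axis-aligned boxes in the same projected, equal-area CRS. In practice the country would be an actual polygon (e.g. via shapely), and the names `is_high_value` and `overlap_fraction` are illustrative, not from the repository:

```python
# A bbox is (minx, miny, maxx, maxy); coordinates are assumed to be in a
# projected, equal-area CRS so that area ratios are meaningful.

HIGH_VALUE_THRESHOLD = 0.70  # bbox must cover at least 70% of the country

def bbox_area(b):
    """Area of an axis-aligned box; 0 if the box is empty or degenerate."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

def overlap_fraction(dataset_bbox, country_bbox):
    """Fraction of the country's area covered by the dataset bbox."""
    inter = (max(dataset_bbox[0], country_bbox[0]),
             max(dataset_bbox[1], country_bbox[1]),
             min(dataset_bbox[2], country_bbox[2]),
             min(dataset_bbox[3], country_bbox[3]))
    return bbox_area(inter) / bbox_area(country_bbox)

def is_high_value(dataset_bbox, country_bbox, threshold=HIGH_VALUE_THRESHOLD):
    return overlap_fraction(dataset_bbox, country_bbox) >= threshold
```

For example, a bbox covering 80% of a 10×10 country, `is_high_value((0, 0, 8, 10), (0, 0, 10, 10))`, returns `True`, while one covering 50% returns `False`.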

Next Steps
Extending the methodology to identify whether a dataset is of high value when a dataset (XML file) contains multiple bboxes gets more complicated.
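One way this could be handled, sketched under the same axis-aligned-bbox assumption as above (function names are hypothetical): compute the area of the *union* of all bboxes, clipped to the country, so that overlapping bboxes are not double-counted. A simple coordinate-compression grid suffices for a handful of boxes:

```python
def clip(b, frame):
    """Clip box b = (minx, miny, maxx, maxy) to `frame`; result may be empty."""
    return (max(b[0], frame[0]), max(b[1], frame[1]),
            min(b[2], frame[2]), min(b[3], frame[3]))

def union_area(boxes):
    """Area of the union of axis-aligned boxes, via coordinate compression:
    split the plane into grid cells at all box edges, then count each cell
    once if any box contains its midpoint."""
    boxes = [b for b in boxes if b[2] > b[0] and b[3] > b[1]]
    xs = sorted({x for b in boxes for x in (b[0], b[2])})
    ys = sorted({y for b in boxes for y in (b[1], b[3])})
    area = 0.0
    for i in range(len(xs) - 1):
        for j in range(len(ys) - 1):
            cx, cy = (xs[i] + xs[i + 1]) / 2, (ys[j] + ys[j + 1]) / 2
            if any(b[0] <= cx <= b[2] and b[1] <= cy <= b[3] for b in boxes):
                area += (xs[i + 1] - xs[i]) * (ys[j + 1] - ys[j])
    return area

def multi_bbox_coverage(bboxes, country_bbox):
    """Fraction of the country covered by the union of the dataset's bboxes."""
    clipped = [clip(b, country_bbox) for b in bboxes]
    country_area = (country_bbox[2] - country_bbox[0]) * (country_bbox[3] - country_bbox[1])
    return union_area(clipped) / country_area
```

With a real country polygon, `shapely.ops.unary_union` over the boxes followed by an intersection with the polygon would replace the grid.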

FYI @DajanaSnopkova @Max-at-Vlaanderen @pvgenuchten @Tomas-Pavelka

@pvgenuchten
Contributor

National spatial coverage is one of the aspects to be considered for a high value label.

What do we do with Belgium, Italy, Poland and Germany, which are regionally oriented?

@Max-at-Vlaanderen
Contributor

We could also do the approach differently; instead of detecting whether something is ‘national’, try to find out what level the bbox is at:

  1. Take the midpoint of the bbox (possibly expandable to multiple points should we find this safer).
  2. Query OSM with these coordinates (or another database with hierarchical administrative units).
  3. Compare the overlapping area with all administrative units found.
  4. Save the level with the best overlap, so that use case by use case everyone can look at what is most valuable.
     Extra: use decimal numbers to add extra nuance; e.g. a dataset overlapping 40% of a country gets level 3.6.
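The steps above could be sketched roughly like this, assuming the administrative units (levels plus geometries, here simplified to bboxes in a projected CRS) have already been fetched from OSM/Nominatim for the bbox midpoint. `fractional_level` is a hypothetical name, and the level numbering follows the example above (40% coverage of a level-3 unit → 3.6):

```python
def overlap_fraction(dataset_bbox, unit_bbox):
    """Fraction of the admin unit's area covered by the dataset bbox
    (boxes are (minx, miny, maxx, maxy))."""
    inter_w = min(dataset_bbox[2], unit_bbox[2]) - max(dataset_bbox[0], unit_bbox[0])
    inter_h = min(dataset_bbox[3], unit_bbox[3]) - max(dataset_bbox[1], unit_bbox[1])
    inter = max(0.0, inter_w) * max(0.0, inter_h)
    unit = (unit_bbox[2] - unit_bbox[0]) * (unit_bbox[3] - unit_bbox[1])
    return inter / unit

def fractional_level(dataset_bbox, admin_units):
    """admin_units: [(level, unit_bbox), ...] for the hierarchy containing
    the bbox midpoint. Returns the level of the best-overlapping unit,
    refined with a decimal: covering 40% of a level-3 unit gives
    3 + (1 - 0.4) = 3.6; full coverage gives the level exactly."""
    best_level, best_frac = max(
        ((level, overlap_fraction(dataset_bbox, ub)) for level, ub in admin_units),
        key=lambda t: t[1])
    return best_level + (1.0 - best_frac)
```

A real implementation would use the units' actual polygons and a proper area metric, but the level-plus-decimal scoring carries over unchanged.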

Beyond that, I also agree with Paul that ‘High Value’ should certainly not equate only to national coverage. I would very much like to see a data density factor added in the future.

But the further approach should probably still be discussed in an upcoming meeting?
