[FEATURE REQUEST] Persistent administrative entity identifiers #3672
Comments
This is a really hard problem, because we want to ensure that unique geometries have unique codes - i.e., if you have the same geoBoundaries ID, then you should be able to assume that the underlying geometry has not changed. Today, we actually hash the geometry itself to make the code, which is why you see changes - a change in our ID means that the geometry has changed. The problem here is that, of course, most changes are fairly small, which results in an ID system that is highly unstable (which is also undesirable). We've discussed this a bunch with a range of actors, and what we're currently thinking is something like (lots of details that need to be figured out):
This would also allow us to provide a geometry-based join to other identifier systems (e.g., UN SALB or P-Codes from OCHA) by applying a similar matching process to their datasets. Basically: a "coarser resolution" version of what we do now, which would give more stability at the cost of IDs no longer changing with every geometric shift. Edit: Also, keep in mind that for much of the world we do not have place-names (or they are highly uncertain / unstable). So the ID has to be generated without text-based metadata, which is where the challenge comes in. This sounds like a good dissertation chapter, by the way :)
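To make the trade-off above concrete, here is a minimal sketch. It is not geoBoundaries' actual hashing code: the function names `geometry_id` and `coarse_geometry_id`, the choice of SHA-256, the 16-character truncation, and the rounding precision are all assumptions made for illustration, and coordinate rounding is only one possible reading of a "coarser resolution" ID.

```python
import hashlib


def geometry_id(coords):
    """Hash the exact coordinates, so any change at all yields a new ID.

    `coords` is a list of (x, y) float pairs. This is a hypothetical stand-in
    for hashing a real geometry; it is not the project's implementation.
    """
    text = ";".join(f"{x!r},{y!r}" for x, y in coords)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def coarse_geometry_id(coords, decimals=2):
    """Round each coordinate before hashing, so small shifts keep the same ID.

    `decimals` is an assumed tuning knob: fewer decimals means a coarser grid
    and a more stable (but less specific) identifier.
    """
    snapped = [(round(x, decimals), round(y, decimals)) for x, y in coords]
    return geometry_id(snapped)


# Made-up vertices; the second ring nudges one vertex by a tiny amount.
original = [(-77.4361, 37.5407), (-77.4200, 37.5601), (-77.4502, 37.5533)]
shifted = [(-77.4363, 37.5408), (-77.4200, 37.5601), (-77.4502, 37.5533)]

print(geometry_id(original) == geometry_id(shifted))                # exact IDs differ
print(coarse_geometry_id(original) == coarse_geometry_id(shifted))  # coarse IDs match
```

Note that rounding does not remove instability entirely: a vertex sitting near the edge of a rounding cell can still flip the coarse ID after a very small shift, so a real scheme would likely need something more tolerant (for example, matching on overlap rather than on an exact hash).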
Thanks for your response @DanRunfola
This is an excellent idea, and geoBoundaries should continue to create geometry-specific identifiers. If a shape changes even a little bit, I think it is valuable for data consumers to see this change reflected in that shape's identifier.
I think this approach could work as a supplement to the geometry hashes (or UUIDs) you established a need for above. This would provide sufficient persistence, making it worth the time to join geoBoundaries with a dataset like Wikidata. However, I wonder why a geometry-based approach is the best solution here. Why would geoBoundaries avoid direct relations with well-known administrative entity identifiers such as those I listed above?
I understand that it may not always be possible to provide an administrative entity identifier. However, I suspect a vast majority of geoBoundaries data could be directly linked to existing entries in the databases I listed above. Am I underestimating how many places have uncertain names?
Haha, I'd be happy to write a paper on this.
TL;DR: I would like to associate geoBoundaries data with other datasets. This is difficult to do because geoBoundaries does not persist boundary identifiers across versions. I suggest that geoBoundaries introduce persistent identifiers for administrative entities.
The 2020 geoBoundaries paper states in its opening paragraph:
The "globally unique ID" for each shape is described in this table:
...which glosses over the volatility of shape identifiers:
USA-ADM2-3_0_0-B672
USA-ADM2-92793851B43358342
52423323B78509502983349
52423323B61032845323419
There is no (documented) way to reliably link geoBoundaries entities with data from other sources, or even with those from previous versions of geoBoundaries. Matching boundaries based on shapeName is bound to run into difficulties with regard to formatting, language differences, and formal name changes.
I will continue using Richmond as an example. Many databases catalog administrative entities; here are a few:
10045969
to Richmond0000000405094755
E39PBJtCCxFgd9Kg99KkHbYgKd
2592301390
101728675
Many more are listed on Richmond's Wikidata item page, which itself has the permanent reference Q43421.
If Richmond annexes more of Chesterfield, its identifier in the above databases is unlikely to change. I understand that many boundaries tracked in geoBoundaries are ever-changing, yet there remains a need for persistent identifiers. Persistent identifiers would make it much easier to link shapes in geoBoundaries with their corresponding entities in the above databases.
I believe there are two options for accomplishing this:
The first option might be the easiest to implement. The persistent identifiers could be added to Wikidata for example, enabling cross-dataset queries. This would allow for complex metadata to be associated with geoBoundaries.
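As a rough sketch of what such a link could enable, consider a crosswalk keyed by a persistent identifier. Everything structural here is hypothetical: the `CROSSWALK` layout and the `build_reverse_index` and `resolve` helpers are illustrative names, not part of geoBoundaries or Wikidata. The identifiers are the ones quoted in this issue, and the sketch assumes that the four shape IDs all denote Richmond across releases.

```python
# Hypothetical crosswalk: one persistent key per administrative entity,
# mapped to every versioned shape ID that has represented it.
CROSSWALK = {
    "Q43421": [  # Wikidata item for Richmond, as cited in this issue
        # Assumption: these four shape IDs all refer to Richmond.
        "USA-ADM2-3_0_0-B672",
        "USA-ADM2-92793851B43358342",
        "52423323B78509502983349",
        "52423323B61032845323419",
    ],
}


def build_reverse_index(crosswalk):
    """Invert the crosswalk so a shape ID can be looked up directly."""
    reverse = {}
    for persistent_id, shape_ids in crosswalk.items():
        for shape_id in shape_ids:
            reverse[shape_id] = persistent_id
    return reverse


REVERSE = build_reverse_index(CROSSWALK)


def resolve(shape_id):
    """Return the persistent ID for a shape ID, or None if it is unknown."""
    return REVERSE.get(shape_id)


# Two shape IDs from different releases resolve to the same entity.
print(resolve("USA-ADM2-3_0_0-B672") == resolve("52423323B61032845323419"))
```

With a table like this, the geometry-specific shape IDs can keep changing between releases while joins against external datasets go through the stable key.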
Thank you for your consideration!