
Population data for 247 AGS that don't exist? #31

Open
curiousleo opened this issue Apr 14, 2022 · 2 comments


@curiousleo

curiousleo commented Apr 14, 2022

I was trying to better understand GermanZero-de/localzero-generator-core#200, so I took a look at the population and AGS data used in that PR.

What I found is that population/2018.csv gives population data for AGS which according to ags/master.csv do not exist.

# Extract AGS from ags/master.csv
$ tail --lines=+2 ags/master.csv | cut -d, -f1 | sort >ags

# Extract AGS from population/2018.csv
$ tail --lines=+2 population/2018.csv | cut -d, -f1 | sort >pop

# Count the number of AGS in population/2018.csv but not in ags/master.csv
$ comm -13 ags pop | wc -l
248


Note that one of the AGS in population/2018.csv but not in ags/master.csv is "DG000000", which seems fine. That leaves 247 AGS that we have population data for at the end of 2018 but that are not in the list of valid AGS at the end of 2018.
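A minimal sketch of the same comparison with "DG000000" filtered out, so that only the unexplained AGS remain. It assumes, like the commands above, that both CSVs have a one-line header and the AGS in the first comma-separated column. The function name `extra_ags` is hypothetical, and the process substitution (`<(...)`) requires bash.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every AGS that appears in a dataset CSV but not
# in the master CSV, ignoring the "DG000000" row.
# Assumes a one-line header and the AGS in the first comma-separated column.
extra_ags() {
  local master="$1" dataset="$2"
  # comm -13 keeps only lines unique to the second (dataset) input;
  # both inputs are sorted with the same locale, as comm requires.
  comm -13 \
    <(tail -n +2 "$master" | cut -d, -f1 | sort -u) \
    <(tail -n +2 "$dataset" | cut -d, -f1 | sort -u) \
    | grep -Fvx 'DG000000'
}
```

Running `extra_ags ags/master.csv population/2018.csv | wc -l` should then count only the 247 AGS in question, going by the numbers above.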

There may well be a simple explanation for this, but nothing comes to mind right now. Any idea what's going on here?

@curiousleo
Author

For a moment I thought this might have something to do with unincorporated areas (gemeindefreie Gebiete). But there seem to be only just under 80 of those, at least according to the Destatis AGS history, when searching the municipality name for "gemfr.":

select
  AGS,
  GemeindenameMitZusatz
from
  ags_historie
where
  GemeindenameMitZusatz like '%gemfr.%'
group by
  AGS,
  GemeindenameMitZusatz

Result: gemfr.csv.txt

@bgrundmann
Collaborator

I think it helps to know that master.csv is meant to be the list of all AGS for which we think the generator can produce reasonable data (indeed, the web frontend requires you to enter an AGS from that list). So it is fine for individual datasets (such as population) to contain entries for additional AGS; they will simply not be used until we add them to master.csv.

The other way around, an AGS in master.csv that is missing from a dataset, is not fine, for what it's worth (writing a check for that is still on the to-do list).
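A rough sketch of what that missing check might look like, under the same CSV-layout assumptions as the commands in the first comment (one-line header, AGS in the first column). The function name `check_master_covered` and its error message are hypothetical. It prints every AGS that is listed in the master file but has no row in the given dataset, and returns non-zero in that case so that, for example, a CI job could fail on it.

```shell
#!/usr/bin/env bash
# Hypothetical check: every AGS in the master CSV must have a row in the
# dataset CSV. Prints the missing AGS to stderr and returns 1 if any exist.
check_master_covered() {
  local master="$1" dataset="$2" missing
  # comm -23 keeps only lines unique to the first (master) input.
  missing=$(comm -23 \
    <(tail -n +2 "$master" | cut -d, -f1 | sort -u) \
    <(tail -n +2 "$dataset" | cut -d, -f1 | sort -u))
  if [ -n "$missing" ]; then
    printf 'AGS in %s without data in %s:\n%s\n' "$master" "$dataset" "$missing" >&2
    return 1
  fi
}
```

Usage would be e.g. `check_master_covered ags/master.csv population/2018.csv`, repeated for each dataset the generator reads.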
