 Timeout for certain tables (Regionaldatenbank) #147

Open
sjockers opened this issue Sep 17, 2024 · 4 comments

Comments

@sjockers
Collaborator

sjockers commented Sep 17, 2024

Requesting certain tables from regionalstatistik.de takes several minutes and eventually causes a timeout.

I've encountered this with the following tables: 41120-12-01-4, 41141-03-01-4, 41141-03-02-4-B – but other tables may have the same issue.

import pystatis

# Requesting the table 41120-12-01-4 takes several minutes and will eventually cause a timeout
t = pystatis.Table(name="41120-12-01-4")
t.get_data()

Result:

ReadTimeout: HTTPSConnectionPool(host='www.regionalstatistik.de', port=443): Read timed out. (read timeout=300)

Requests for other tables (from regionalstatistik.de) run through and have much shorter request times, including some from the same statistics as the problematic ones, e.g. 41141-03-02-4, 41141-03-02-4-B.
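
To find out which tables are affected, each one could be probed in turn and the outcome recorded. The following is only a minimal sketch: `probe_tables`, `fetch` and `timeout_errors` are hypothetical names introduced here for illustration and are not part of pystatis.

```python
from typing import Callable, Dict, Iterable, Tuple, Type


def probe_tables(
    names: Iterable[str],
    fetch: Callable[[str], object],
    timeout_errors: Tuple[Type[BaseException], ...] = (TimeoutError,),
) -> Dict[str, str]:
    """Call fetch(name) for every table name and record 'ok' or 'timeout'.

    Both `fetch` and `timeout_errors` are supplied by the caller, so this
    helper (a hypothetical illustration) has no dependency on pystatis.
    """
    results: Dict[str, str] = {}
    for name in names:
        try:
            fetch(name)
        except timeout_errors:
            # The request did not finish within the client's read timeout.
            results[name] = "timeout"
        else:
            results[name] = "ok"
    return results
```

With pystatis, `fetch` would presumably be something like `lambda name: pystatis.Table(name=name).get_data()`, and `timeout_errors` would be `(requests.exceptions.ReadTimeout,)`, going by the error shown above. Note that each affected table blocks for the full read timeout (300 seconds in the output above), so probing many tables can take a long time.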

@pmayd
Collaborator

pmayd commented Oct 21, 2024

I am currently looking into the issue, but I am having general trouble with the Regionalstatistik API: at the moment I cannot make any request against it. I always get a Code 6 in return, telling me that I am executing too many parallel requests, which is not the case...

@pmayd
Collaborator

pmayd commented Oct 23, 2024

@sjockers I investigated further: it only happens for format=ffcsv, which is very bad for us, as this is the only format we support. The problem is not even the background job. The initial request against the tablefile endpoint never finishes for ffcsv, so the API neither returns a response nor starts a background job as it did in the past.

It seems the whole system is currently broken, probably because they want to switch from background jobs to the archive format, but I don't understand why this happens without a new API version. In general, the whole API is very unstable right now and a lot of things do not work with Regionalstatistik. The csv download works (I tried it with Postman), but I don't think it makes much sense for us to also support this format, as it requires a different parsing approach.
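
For anyone who wants to reproduce the comparison outside of Postman, the two requests differ only in the `format` parameter. The sketch below just builds the two URLs. The base URL and the parameter names `username`, `password` and `name` are assumptions about the GENESIS REST interface, not taken from this thread, and `tablefile_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed location of the tablefile endpoint; check it against the API docs.
BASE_URL = "https://www.regionalstatistik.de/genesisws/rest/2020/data/tablefile"


def tablefile_url(name: str, fmt: str, username: str = "USER", password: str = "PASS") -> str:
    """Build a tablefile request URL for the given table and format.

    The parameter names are assumptions; USER and PASS are placeholders.
    """
    params = {"username": username, "password": password, "name": name, "format": fmt}
    return f"{BASE_URL}?{urlencode(params)}"


# Same table, two formats: according to the comment above, only ffcsv hangs.
ffcsv_url = tablefile_url("41120-12-01-4", "ffcsv")
csv_url = tablefile_url("41120-12-01-4", "csv")
```

Requesting both URLs with the same client timeout would show whether the hang is specific to ffcsv for a given table.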

@sjockers
Collaborator Author

sjockers commented Oct 24, 2024

Thanks for looking into this, Michael! I agree that it would not make sense to support the CSV format instead, as it would make parsing much harder (and would probably cause more issues down the line).

I have contacted regionalstatistik.de (IT.NRW) about the problem, let's see what they say.

@sjockers
Collaborator Author

Update from regionalstatistik.de: This is apparently a known issue that affects certain tables with complex headers. These tables cannot currently be rendered as ffcsv, which causes the present bug.

Upgrading to a newer version of GENESIS should fix the problem, which will happen "in the coming months". There's no concrete date for the migration, though.

I'd suggest ignoring this bug until regionalstatistik.de has been migrated to the 'new' GENESIS.
