Repair BDC pipeline runs with forceRefresh=False #235
Comments
The issue with regionalatlas seems to be related to the hashtables, as the error only occurs when running the first time (without any hashtables).
The address scraping step was a very early experiment and is not "production ready". It was never meant to end up in the final pipeline, as we get the address from Google. I'll create a special demo pipeline config for the BDC.
The GPT errors look worse than they are. They just mean that the data was not present in the cache files. I adjusted the error logging.
Good to know. However, the error message is still appearing:
This seems to be S3-specific, I'll check again.
It's just ungraceful error handling: when a Google place has no reviews, we don't save any to S3. The sentiment analyzer just assumes where the review file should be but can't find it. The sentiment score will be
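A minimal sketch of how the missing-review case described above could be handled more gracefully. It is not the project's actual code: `load_reviews_or_empty`, `is_missing_key`, and the `fetch` callable are hypothetical names introduced for illustration. The only assumption about the S3 client is that the raised exception carries a `response` dict with `["Error"]["Code"]`, as botocore's `ClientError` does.

```python
import logging
from typing import Callable, List

log = logging.getLogger(__name__)


def is_missing_key(exc: Exception) -> bool:
    """Return True if exc looks like an S3 'NoSuchKey' error.

    Duck-typed on the `response` attribute so this sketch does not
    need to import botocore itself.
    """
    response = getattr(exc, "response", None)
    if not isinstance(response, dict):
        return False
    return response.get("Error", {}).get("Code") == "NoSuchKey"


def load_reviews_or_empty(
    fetch: Callable[[str], List[dict]], place_id: str
) -> List[dict]:
    """Load reviews for a place, treating a missing file as 'no reviews'.

    `fetch` is a hypothetical stand-in for whatever function reads the
    review file from S3. Any error other than NoSuchKey is re-raised.
    """
    try:
        return fetch(place_id)
    except Exception as exc:
        if is_missing_key(exc):
            # A place without reviews has no file in S3; this is expected.
            log.info("No reviews stored for place %s, skipping.", place_id)
            return []
        raise
```

With a wrapper like this, a place without reviews would produce an info-level log line instead of an ERROR entry, while genuine S3 failures (permissions, network) would still surface.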
When running the pipeline run_all_steps.json (but with forceRefresh set to false everywhere), several errors happen in the different steps. These need to be fixed or the affected pipeline steps should be taken out.
List of errors
Ordered by severity
| ERROR | pipeline.py:57 | Step Regional_Atlas failed! Columns must be same length as key
| ERROR | s3_repository.py:212 | Error loading review from S3 with id ChIJkdTnnsMzs1IRlCF2m6bKYsU. Error: An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist. (might indicate a problem with the Google step)
Getting addresses from custom domains...: 47%|████████████████████████▏ | 241/518 [17:00<19:32, 4.23s/it]
Note
The current pipeline run_all_steps.json should be changed to have forceRefresh: false set everywhere. The current configuration can optionally be copied to a new pipeline config force_refresh_all_steps.json.
Acceptance Criteria
run_all_steps.json with forceRefresh set to false everywhere
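The config change described in the Note could be applied with a small script along these lines. This is only a sketch: the internal structure of run_all_steps.json is not shown in this issue, so the helper walks the JSON generically and touches any key named `forceRefresh`, wherever it appears. The function names `set_force_refresh` and `rewrite_config` are hypothetical.

```python
import json
from pathlib import Path
from typing import Any


def set_force_refresh(node: Any, value: bool = False) -> int:
    """Recursively set every "forceRefresh" key to `value`.

    Works on any nesting of dicts and lists, since the real config
    layout is an unknown here. Returns the number of keys changed.
    """
    changed = 0
    if isinstance(node, dict):
        for key, child in list(node.items()):
            if key == "forceRefresh":
                if child is not value:
                    node[key] = value
                    changed += 1
            else:
                changed += set_force_refresh(child, value)
    elif isinstance(node, list):
        for child in node:
            changed += set_force_refresh(child, value)
    return changed


def rewrite_config(src: Path, dst: Path, value: bool = False) -> int:
    """Read a JSON pipeline config, set forceRefresh everywhere, write to dst."""
    config = json.loads(src.read_text(encoding="utf-8"))
    changed = set_force_refresh(config, value)
    dst.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return changed
```

To keep the current behaviour available, the existing file could first be copied to force_refresh_all_steps.json and `rewrite_config` then run on run_all_steps.json in place.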