Data has gone stale #127
Comments
Hi, any answer to this?
Still stale. :-(
Hello @anuveyatsu I hope you are well. I would love to take up the task of updating the data in this repo. I am currently working on it.
Hello @anuveyatsu I found an issue. The GitHub Actions workflow fails because the CSV files are over 100MB (the max file size for GitHub). I think the results should either be written to CSV and then compressed to reduce the size, OR Parquet should be used as the file format. Please let me know what you think.
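For illustration, the compressed-CSV option could look roughly like the sketch below. It uses only the Python standard library; the function names, the column names in the usage note, and the exact byte value used for the limit are assumptions made for this example, not code from this repo's workflow.

```python
import csv
import gzip
import os

# Assumed byte value for the 100MB per-file limit mentioned above.
GITHUB_MAX_BYTES = 100 * 1024 * 1024


def write_gzipped_csv(rows, header, path):
    """Write header + rows to a gzip-compressed CSV; return the file size in bytes."""
    # "wt" opens the gzip stream in text mode so csv.writer can write to it directly.
    with gzip.open(path, "wt", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(rows)
    return os.path.getsize(path)


def fits_on_github(path, limit=GITHUB_MAX_BYTES):
    """Return True if the file at path is no larger than the assumed size limit."""
    return os.path.getsize(path) <= limit
```

A workflow step could call `write_gzipped_csv(rows, ["Date", "Country", "Confirmed"], "out.csv.gz")` (hypothetical columns and file name) and check `fits_on_github("out.csv.gz")` before committing. The trade-off is that consumers then need to decompress the file before reading it.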
Thank you @seun-beta for spending time to investigate this issue 👍🏼 I think the best option would be to use git lfs (https://docs.github.com/en/repositories/working-with-files/managing-large-files/configuring-git-large-file-storage) so that we can keep the data in a consistent format. I'm not sure you'd be able to complete it because I think we need to wire up an external blob storage here (e.g., S3, Google Cloud Storage etc.).
Hello @anuveyatsu Thank you for your response. I also researched Git LFS initially, but the overall setup was a little too much. An idea about using S3 and Boto3 just came to mind: when the workflow run is triggered by the cron configuration, the code could push the results directly into S3. What do you think about that?
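A minimal sketch of that S3 idea, under stated assumptions: the bucket name comes from a hypothetical `DATA_BUCKET` environment variable, the output files live in a hypothetical `data` directory, and the `data/` key prefix is made up for the example. The only Boto3 calls used are `boto3.client("s3")` and the client's `upload_file(Filename, Bucket, Key)`; the client is passed in so the upload logic can be exercised without AWS credentials.

```python
import os
from pathlib import Path


def upload_outputs(client, bucket, data_dir, prefix="data/"):
    """Upload every CSV in data_dir to the bucket; return the object keys used."""
    keys = []
    # Sort so the upload order (and the returned list) is deterministic.
    for path in sorted(Path(data_dir).glob("*.csv")):
        key = f"{prefix}{path.name}"
        client.upload_file(str(path), bucket, key)
        keys.append(key)
    return keys


def main():
    # boto3 is a third-party package; it reads credentials from the environment,
    # which in a workflow would typically come from repository secrets.
    import boto3

    s3 = boto3.client("s3")
    # DATA_BUCKET and the "data" directory are assumptions for this sketch.
    upload_outputs(s3, os.environ["DATA_BUCKET"], "data")
```

The scheduled job would call `main()` as its last step, after the CSV files have been generated. Because the files never enter the Git history, the 100MB limit would no longer apply to them.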
Hello, I've tried deploying Git LFS and am getting this error.
Apparently Git LFS refuses to push against forks of a non-Git LFS parent repo. See git-lfs/git-lfs#1906. What about
@anuveyatsu I think we can go with 2 approach here either using trying
@gradedSystem I don't believe it makes sense, as the upstream repo has been archived: https://github.com/CSSEGISandData/COVID-19
@anuveyatsu noted
The last update appears to be 4/16.
Q: Do you have an ETA for making the data current?