xlsx file "COVID-19 ISS open data" #1127
Comments
Hi, it's not perfect, but here it is: https://github.com/floatingpurr/covid-19_sorveglianza_integrata_italia
Thanks
You can download the dump from 2022 up to today using this code. The script performs the following operations:
#!/usr/bin/env python
# coding: utf-8
# In[1]:
import pandas as pd
from tqdm.notebook import tqdm
import requests
import os
import zipfile
import glob
# In[2]:
base_path = "data/"
csv_path = "csv/"
base_url = "https://www.epicentro.iss.it/coronavirus/open-data/OPENDATA-{}.zip"
sheets_name = [
    "casi_prelievo_diagnosi",
    "casi_inizio_sintomi",
    "casi_inizio_sintomi_sint",
    "casi_regioni",
    "casi_provincie",
    "ricoveri",
    "decessi",
    "sesso_eta",
    "stato_clinico",
]
dfs = {}
# In[9]:
# Create the working folders; makedirs also creates base_path itself if it is missing
for folder in ["extracted", "raw_extracted", csv_path]:
    os.makedirs(os.path.join(base_path, folder), exist_ok=True)
# In[4]:
# Download all data from ISS
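for year in ["2020", "2021", "2022"]:
    r = requests.get(base_url.format(year), allow_redirects=True)
    r.raise_for_status()  # stop on a failed download instead of saving an error page
    # write the archive and close the file handle explicitly
    with open(os.path.join(base_path, "{}.zip".format(year)), "wb") as fh:
        fh.write(r.content)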
# In[5]:
# Unpack every yearly archive into raw_extracted
for year in tqdm(["2020", "2021", "2022"]):
    with zipfile.ZipFile(os.path.join(base_path, "{}.zip".format(year))) as zf:
        zf.extractall(os.path.join(base_path, "raw_extracted"))
# In[6]:
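# Collect the workbooks from the unpacked archives into one flat folder
# (search only raw_extracted, so unrelated .xlsx files elsewhere are left alone)
for file in glob.glob(os.path.join(base_path, "raw_extracted", "**", "*.xlsx"), recursive=True):
    os.replace(file, os.path.join(base_path, "extracted", os.path.basename(file)))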
# In[7]:
for f in tqdm(os.listdir(os.path.join(base_path, "extracted"))):
    xls = pd.ExcelFile(os.path.join(base_path, "extracted", f))
    for sheet_name in sheets_name:
        if sheet_name not in dfs:
            dfs[sheet_name] = pd.DataFrame()
        # read this sheet of the Excel workbook
        temp = pd.read_excel(xls, sheet_name=sheet_name)
        # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
        dfs[sheet_name] = pd.concat([dfs[sheet_name], temp])
# In[15]:
# Sort each sheet's history by snapshot date and write it to its own CSV
for k in tqdm(dfs):
    dfs[k]["iss_date"] = pd.to_datetime(dfs[k]["iss_date"])
    dfs[k].sort_values("iss_date", inplace=True)
    dfs[k].to_csv("{}.csv".format(os.path.join(base_path, csv_path, k)), index=False)
# In[16]:
# Preview the first rows of each sheet (display is available inside Jupyter)
for k in dfs:
    display(dfs[k].head())
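Once the CSVs are written, a quick sanity check is to count how many rows each snapshot date contributes, to confirm that the history actually spans the period of interest. The sketch below is only an illustration using the standard library: the helper name `rows_per_snapshot` and the sample rows (including the `value` column) are made up here, and the only column taken from the script above is `iss_date`.

```python
import csv
import io
from collections import Counter


def rows_per_snapshot(csv_file, date_column="iss_date"):
    """Count how many rows each snapshot date contributes to an exported CSV."""
    reader = csv.DictReader(csv_file)
    counts = Counter(row[date_column] for row in reader)
    # Sorting the date strings is chronological as long as they are
    # in ISO form (YYYY-MM-DD); this is an assumption about the file.
    return dict(sorted(counts.items()))


# Tiny made-up sample standing in for the exported stato_clinico CSV;
# the "value" column is hypothetical.
sample = io.StringIO(
    "iss_date,value\n"
    "2022-01-12,9\n"
    "2022-01-05,10\n"
    "2022-01-05,12\n"
)
print(rows_per_snapshot(sample))  # {'2022-01-05': 2, '2022-01-12': 1}
```

With the paths used by the script, the real file would be opened as `open("data/csv/stato_clinico.csv", newline="")` and passed to the same function.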
Is it possible to get the historical data for the "Stato_Clinico" sheet of the "COVID-19 ISS open data" file?
I need it to check how the clinical status of currently positive patients has been changing over the last few months relative to the % vaccinated by age.
Thanks in advance, Andrea