generated from dataforgoodfr/d4g-project-template
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #8 from dataforgoodfr/scorreia/poc_analysis_notebook
Data analysis team: notebooks with the v1 questionnaire and first Streamlit application
Showing 9 changed files with 2,948 additions and 1,396 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -161,5 +161,5 @@ cython_debug/ | |
|
||
# Precommit hooks: ruff cache | ||
.ruff_cache | ||
|
||
.DS_Store | ||
data/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
To run the Streamlit application, run the following command from this directory: | ||
`streamlit run odi_streamlit.py` | ||
|
||
Note: | ||
The Streamlit application `analyse_app_OLD.py` is to be deleted. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,63 @@ | ||
""" | ||
# My first app with Streamlit https://streamlit.io/ | ||
Here's our first attempt at using data to create a table: | ||
""" | ||
|
||
import matplotlib.pyplot as plt | ||
import pandas as pd | ||
import streamlit as st | ||
|
||
st.title("Observatoire des Imaginaires") | ||
st.divider() | ||
st.header("Made by the _Data Analysis_ dream team") | ||
st.write( | ||
( | ||
"This application analyzes the PoC data. Feel free to add" | ||
" as many charts as needed. " | ||
"The code needs cleaning up for easier maintenance ;-) " | ||
), | ||
) | ||
|
||
|
||
st.container() | ||
st.header("Data overview") | ||
# Load the data | ||
file_path = "../data/Analyse réponses.xlsx - Treated data.csv" | ||
|
||
# Skip the first line of the file | ||
data = pd.read_csv(file_path, skiprows=1) | ||
|
||
# Drop rows whose "TITRE" column contains "Contenu XXX", | ||
# where XXX is a number, | ||
# and drop rows where all values are NaN | ||
df = data[~data["TITRE"].str.contains(r"Contenu \d+", na=False)].dropna(how="all") | ||
|
||
# Keep only one row in four (which amounts to dropping the information | ||
# for characters 2, 3 and 4 when they exist). Copy the slice so that the | ||
# column assignments below write to an independent DataFrame. | ||
df_truncated = df.iloc[::4].copy() | ||
# Clean the dataset | ||
|
||
# Convert titles to uppercase | ||
df_truncated["TITRE"] = df_truncated["TITRE"].str.upper() | ||
|
||
# Convert data types here | ||
# Convert years to integers | ||
annee = "ANNEE" | ||
df_truncated[annee] = pd.to_numeric(df_truncated[annee], errors="coerce").fillna(0).astype(int) | ||
# Find titles that appear more than once in the "TITRE" column | ||
# (the raw data has 4 rows per title, one per character, but only one row in four was kept above) | ||
titles_more_than_once = df_truncated["TITRE"].value_counts() | ||
titles_more_than_once = titles_more_than_once[titles_more_than_once > 1] | ||
|
||
# Display a horizontal bar chart of the most frequent titles | ||
|
||
|
||
st.header("Most frequent films") | ||
# Create the chart | ||
fig, ax = plt.subplots() | ||
t = titles_more_than_once.sort_values(ascending=True) | ||
t.plot(kind="barh", color="skyblue", ax=ax) | ||
ax.set_xlabel("Count") | ||
ax.set_title("Frequency of films/series") | ||
st.pyplot(fig) |
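
The cleaning steps in this script could also be wrapped in small functions, so they can be checked on a toy table without the CSV file or a running Streamlit session. The following is only a minimal sketch: the function names `clean_responses` and `frequent_titles`, the `PERSONNAGE` column, and the toy rows are hypothetical and not part of this commit. Only the `TITRE` and `ANNEE` columns and the individual pandas operations come from the script above.

```python
import pandas as pd


def clean_responses(data: pd.DataFrame) -> pd.DataFrame:
    """Apply the script's cleaning steps to an already-loaded DataFrame."""
    # Drop the "Contenu <number>" marker rows, then rows that are entirely NaN
    df = data[~data["TITRE"].str.contains(r"Contenu \d+", na=False)].dropna(how="all")
    # Keep one row in four; copy so the assignments below do not write
    # into a slice of `df`
    out = df.iloc[::4].copy()
    out["TITRE"] = out["TITRE"].str.upper()
    # Unparseable years become 0, as in the script
    out["ANNEE"] = pd.to_numeric(out["ANNEE"], errors="coerce").fillna(0).astype(int)
    return out


def frequent_titles(df: pd.DataFrame) -> pd.Series:
    """Return the titles that occur more than once, in ascending order of count."""
    counts = df["TITRE"].value_counts()
    return counts[counts > 1].sort_values(ascending=True)


# Hypothetical toy data: one marker row, then 4 character rows per response
toy = pd.DataFrame(
    {
        "TITRE": ["Contenu 1", "dune", None, None, None,
                  "Contenu 2", "Amélie", None, None, None,
                  "Contenu 3", "Dune", None, None, None],
        "ANNEE": [None, "2021", None, None, None,
                  None, "2001", None, None, None,
                  None, "n/a", None, None, None],
        "PERSONNAGE": ["", "P1", "P2", "P3", "P4"] * 3,
    }
)

cleaned = clean_responses(toy)
frequent = frequent_titles(cleaned)
```

In the Streamlit script, the result of `frequent_titles` would play the role of the Series passed to `plot(kind="barh", ...)`.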