
Commit

Adjust obvious data errors in TYOAIKA (h/t tvainika)
akx committed Sep 25, 2023
1 parent 6bb7038 commit 2049638
Showing 3 changed files with 5 additions and 0 deletions.
Binary file modified data/2023/results-en.xlsx
Binary file not shown.
Binary file modified data/2023/results-fi.xlsx
Binary file not shown.
5 changes: 5 additions & 0 deletions pulkka/data_ingest.py
@@ -147,6 +147,11 @@ def read_data() -> pd.DataFrame:
    df[SUKUPUOLI_COL] = df[SUKUPUOLI_COL].apply(map_sukupuoli).astype("category")
    df[IKA_COL] = df[IKA_COL].astype("category")

    # Assume that people entering 37.5 (hours) as their työaika means 100%
    df.loc[df[TYOAIKA_COL] == 37.5, TYOAIKA_COL] = 100
    # Assume there is no actual 10x koodari among us
    df.loc[df[TYOAIKA_COL] == 1000, TYOAIKA_COL] = 100

    df[TYOAIKA_COL] = to_percentage(df[TYOAIKA_COL], 100)
    df[LAHITYO_COL] = to_percentage(df[LAHITYO_COL], 100)
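
For illustration only, the two corrections added in this hunk can be tried on a toy frame. This is a minimal sketch, not the project's code: the column name `"tyoaika"`, the helper `fix_tyoaika`, and the sample values are assumptions, and `to_percentage` is left out because its behaviour is not shown in this diff.

```python
import pandas as pd

# Hypothetical column name; the real value of TYOAIKA_COL is not visible in this diff.
TYOAIKA_COL = "tyoaika"


def fix_tyoaika(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the same two boolean-mask corrections as the commit, on a copy."""
    out = df.copy()
    # 37.5 is read as weekly hours (full time), so it is treated as 100%.
    out.loc[out[TYOAIKA_COL] == 37.5, TYOAIKA_COL] = 100
    # 1000 is read as a typo for 100%.
    out.loc[out[TYOAIKA_COL] == 1000, TYOAIKA_COL] = 100
    return out


# Made-up sample values, chosen to hit both corrections.
sample = pd.DataFrame({TYOAIKA_COL: [100.0, 37.5, 1000.0, 80.0]})
fixed = fix_tyoaika(sample)
print(fixed[TYOAIKA_COL].tolist())
```

Only rows that match exactly 37.5 or 1000 are rewritten; other values, such as 80, pass through unchanged. The sketch works on a copy so the input frame is left as it was, whereas the commit itself modifies `df` in place.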

