Remove records that are mistaken #76
Some cases/events have exactly two records, but the first one is Some events have exactly two records, but both of them have status |
In fact, I can't find any records with |
These are the combinations I can find for events/cases with exactly two records:
|
Messaged Lien about this to hear her opinion |
For my own future reference, you can get these records out as follows:

library(dplyr)

readr::read_csv("data/raw/rato_data.csv") %>%
  group_by(Dossier_ID) %>%
  # keep dossiers with exactly two records
  mutate(n_records = n()) %>%
  filter(n_records == 2) %>%
  # most recently edited record is closed, earliest edited record is "Opvolging"
  filter(
    Dossier_Status[which.max(Laatst_Bewerkt_Datum)] == "Verwerkt en afgesloten" &
      Dossier_Status[which.min(Laatst_Bewerkt_Datum)] == "Opvolging") %>%
  arrange(Dossier_ID)

And count the combinations like this:

readr::read_csv("data/raw/rato_data.csv") %>%
  group_by(Dossier_ID) %>%
  mutate(n_records = n()) %>%
  filter(n_records == 2) %>%
  # one row per dossier: its statuses pasted together in file order
  summarise(comb_status = paste(Dossier_Status, collapse = "|")) %>%
  count(comb_status, sort = TRUE)
|
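The counting step pastes the statuses in whatever order the rows appear in the file, so the same pair of statuses could in principle show up as two different combinations. A minimal sketch on made-up data, assuming that sorting by `Laatst_Bewerkt_Datum` within each dossier is the intended order (the toy values and the name `status_combinations` are hypothetical; only the column names come from the thread):

```r
library(dplyr)

# Hypothetical toy data; only the column names are taken from the thread.
toy <- tibble::tibble(
  Dossier_ID = c(1, 1, 2, 2, 3, 3, 4),
  Dossier_Status = c(
    "Verwerkt en afgesloten", "Opvolging",  # dossier 1, rows not in date order
    "Opvolging", "Verwerkt en afgesloten",  # dossier 2
    "Opvolging", "Opvolging",               # dossier 3
    "Opvolging"                             # dossier 4, single record
  ),
  Laatst_Bewerkt_Datum = as.Date(c(
    "2021-03-02", "2021-03-01",
    "2021-04-01", "2021-04-05",
    "2021-05-01", "2021-05-02",
    "2021-06-01"
  ))
)

status_combinations <- toy %>%
  group_by(Dossier_ID) %>%
  # keep dossiers with exactly two records
  filter(n() == 2) %>%
  # assumption: order records by last-edited date before pasting
  arrange(Laatst_Bewerkt_Datum, .by_group = TRUE) %>%
  summarise(comb_status = paste(Dossier_Status, collapse = "|")) %>%
  count(comb_status, sort = TRUE)

print(status_combinations)
```

With the sort in place, dossiers 1 and 2 are counted under the same combination even though their rows are stored in opposite order.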
Thanks for raising this issue. Anyhow, you're correct that the only two remaining statuses are I checked your table, I believe that the 319 records with the consecutive statuses |
Overview of dossier_status orders that are not logical to us: https://docs.google.com/spreadsheets/d/1oIM33fpTivNYAsbVWW-MxHTpXD4_FCxSZMTyNQCwE4E/edit?usp=sharing

We expect:

To create:

library(dplyr)

readr::read_csv("data/raw/rato_data.csv") %>%
  filter(Domein != "Werken") %>%
  # keep only the two columns needed below
  mutate(.keep = "used", Dossier_Status, Dossier_ID) %>%
  # one row per dossier: its sequence of statuses, separated by "|"
  group_by(Dossier_ID) %>%
  summarise(status_col = glue::glue_collapse(Dossier_Status, sep = "|")) %>%
  ungroup() %>%
  # one row per status sequence: all dossiers that follow it
  group_by(status_col) %>%
  summarise(dossiers_col = glue::glue_collapse(Dossier_ID, sep = ",")) %>%
  # drop sequences that start with "Opvolging" and end closed
  filter(!stringr::str_detect(status_col, "^Opvolging.*Verwerkt en afgesloten$")) %>%
  # drop sequences that consist only of "Opvolging"
  filter(!stringr::str_detect(status_col, "^Opvolging(\\|Opvolging)*$")) %>%
  tidyr::separate_wider_delim(status_col, delim = "|", names_sep = "_", too_few = "align_start") %>%
  mutate(n_dossiers = stringr::str_count(dossiers_col, ",") + 1) %>%
  arrange(-n_dossiers) %>%
  dplyr::relocate(n_dossiers, dossiers_col, .before = status_col_1)

Remove the filters to see all combinations. |
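To make the two regular-expression filters easier to check, here is a small sketch that applies them to a handful of made-up status sequences. The example sequences and the variable names are hypothetical; the two patterns are copied from the pipeline, and only the two status values mentioned in this thread are used.

```r
library(stringr)

# Hypothetical status sequences, collapsed with "|" as in the pipeline.
status_col <- c(
  "Opvolging|Verwerkt en afgesloten",
  "Opvolging|Opvolging|Verwerkt en afgesloten",
  "Opvolging",
  "Opvolging|Opvolging",
  "Verwerkt en afgesloten",
  "Verwerkt en afgesloten|Opvolging"
)

# Starts with "Opvolging" and ends with "Verwerkt en afgesloten".
opened_then_closed <- str_detect(status_col, "^Opvolging.*Verwerkt en afgesloten$")

# Consists of one or more "Opvolging" records and nothing else.
still_open <- str_detect(status_col, "^Opvolging(\\|Opvolging)*$")

# What remains after both filters: the sequences flagged as not logical.
unexpected <- status_col[!opened_then_closed & !still_open]
print(unexpected)
```

Note that the first pattern uses `.*`, so it accepts anything between the first `Opvolging` and the final `Verwerkt en afgesloten`, including a dossier that was closed and reopened in between. Whether that is intended is not stated here.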
@LienReyserhove I remember we brought this up during a meeting with RATO, but I can't find any action points anywhere. We currently have 1482 cases where there are exactly two records in a case, and the first one is

raw_data %>%
  group_by(Dossier_ID) %>%
  mutate(n_records = n()) %>%
  filter(n_records == 2) %>%
  filter(
    Dossier_Status[which.max(Laatst_Bewerkt_Datum)] == "Verwerkt en afgesloten" &
      Dossier_Status[which.min(Laatst_Bewerkt_Datum)] == "Opvolging")

I've updated this Google doc to the most recent situation: https://docs.google.com/spreadsheets/d/1oIM33fpTivNYAsbVWW-MxHTpXD4_FCxSZMTyNQCwE4E/

We currently have 203 different patterns that do not make sense, for a total of 4191 cases (dossiers), most of which (3277) have been closed without opening them. Questions:
The dataset is growing quickly with the rats, and this phenomenon has grown with it |
Records from the raw data that:
Are mistakes. Lien and Karel have decided that they should be removed.
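Once the mistaken dossiers have been identified, dropping them from the raw data could look roughly like the sketch below. The toy data and the `mistaken_ids` vector are hypothetical placeholders, since the actual selection criteria are not reproduced here. The sketch also assumes whole dossiers are removed rather than individual records.

```r
library(dplyr)

# Hypothetical toy data standing in for the raw export.
raw_toy <- tibble::tibble(
  Dossier_ID = c(1, 1, 2, 3, 3),
  Dossier_Status = c(
    "Opvolging", "Verwerkt en afgesloten",
    "Opvolging",
    "Opvolging", "Verwerkt en afgesloten"
  )
)

# Hypothetical: IDs of dossiers that were flagged as mistakes.
mistaken_ids <- c(3)

# Keep every record whose dossier is not flagged.
cleaned <- raw_toy %>%
  filter(!Dossier_ID %in% mistaken_ids)

print(cleaned)
```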