Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove records that are mistaken #76

Open
PietrH opened this issue Sep 26, 2023 · 8 comments
Open

Remove records that are mistaken #76

PietrH opened this issue Sep 26, 2023 · 8 comments
Assignees
Labels
bug Something isn't working help wanted Extra attention is needed mapping

Comments

@PietrH
Copy link
Member

PietrH commented Sep 26, 2023

Records from the raw data that:

  • have the same Dossier_ID
  • Have exactly 2 records for that ID
  • Have Dossier_Status: Ingepland for the first and Vewerkt en afgesloten for the second

Are mistakes. Lien and Karel have decided that they should be removed.

@PietrH PietrH added bug Something isn't working enhancement New feature or request mapping labels Sep 26, 2023
@PietrH PietrH self-assigned this Sep 26, 2023
@PietrH
Copy link
Member Author

PietrH commented Nov 8, 2023

Some cases/events have exactly two records, but the first one is Opvolging and the second one Verwerkt en afgesloten, eg Dossier_ID == 26732

Some events have exactly two records, but both of them have status Verwerkt en afgesloten, such as Dossier_ID == 27590

@PietrH
Copy link
Member Author

PietrH commented Nov 8, 2023

In fact, I can't find any records with Dossier_Status == "Ingepland", I've probably misunderstood something. Do we mean Opvolging?

@PietrH PietrH added help wanted Extra attention is needed and removed enhancement New feature or request labels Nov 8, 2023
@PietrH
Copy link
Member Author

PietrH commented Nov 8, 2023

These are the combinations I can find for events/cases with exactly two records:

status1|status2 n
Opvolging|Verwerkt en afgesloten 319
Opvolging|Opvolging 33
Verwerkt en afgesloten|Verwerkt en afgesloten 20
Verwerkt en afgesloten|Opvolging 1

@PietrH
Copy link
Member Author

PietrH commented Nov 8, 2023

Messaged Lien about this to hear her opinion

@PietrH
Copy link
Member Author

PietrH commented Nov 8, 2023

For my own future reference, you can get these records out as follows:

readr::read_csv("data/raw/rato_data.csv") %>% 
  group_by(Dossier_ID) %>% 
  mutate(n_records = n()) %>% 
  filter(n_records == 2) %>% 
  filter(
    Dossier_Status[which.max(Laatst_Bewerkt_Datum)] == "Verwerkt en afgesloten" &
      Dossier_Status[which.min(Laatst_Bewerkt_Datum)] == "Opvolging") %>% 
  arrange(Dossier_ID)

And count the combinations like this:

readr::read_csv("data/raw/rato_data.csv") %>% 
group_by(Dossier_ID) %>% 
mutate(n_records = n()) %>% 
filter(n_records == 2) %>%
group_by(Dossier_ID) %>%
summarise(comb_status = paste(Dossier_Status, collapse = "|")) %>%
count(comb_status, sort = TRUE)

@LienReyserhove
Copy link
Contributor

Thanks for raising this issue.
It's a bit odd that there's no status ingepland anymore. Perhaps I'm mistaken or it was removed from the data in the WFS.

Anyhow, you're correct that the only two remainig statuses are Opvolging and Verwerkt en afgesloten.

I checked your table, i believe that the 319 records with the consecutive statuses Opvolging and Opvolging are the ones that are opened by mistake and closed afterwards. The others records do not make any sense, in my opinion, and should be considered as errors as well. To be verified by RATO/Emiel.

@PietrH
Copy link
Member Author

PietrH commented Dec 1, 2023

Overview of dossier_status orders that are not logical to us:

https://docs.google.com/spreadsheets/d/1oIM33fpTivNYAsbVWW-MxHTpXD4_FCxSZMTyNQCwE4E/edit?usp=sharing

We expect:

  • always start with Opvolging
  • Verwerkt en afgesloten can only be the last element

To create:

readr::read_csv("data/raw/rato_data.csv") %>%
  filter(Domein != "Werken") %>%
  mutate(.keep = "used", Dossier_Status, Dossier_ID) %>%
  group_by(Dossier_ID) %>%
  summarise(status_col = glue::glue_collapse(Dossier_Status, sep = "|")) %>%
  ungroup() %>%
  group_by(status_col) %>%
  summarise(dossiers_col = glue::glue_collapse(Dossier_ID, sep = ",")) %>%
  filter(!stringr::str_detect(status_col, "^Opvolging.*Verwerkt en afgesloten$")) %>%
  filter(!stringr::str_detect(status_col, "^Opvolging(\\|Opvolging)*$")) %>%
  tidyr::separate_wider_delim(status_col, delim = "|", names_sep = "_", too_few = "align_start") %>%
  mutate(n_dossiers = stringr::str_count(dossiers_col, ",") + 1) %>%
  arrange(-n_dossiers) %>%
  dplyr::relocate(n_dossiers, dossiers_col, .before = status_col_1)
  

Remove the filters to see all combinations.

@PietrH
Copy link
Member Author

PietrH commented Aug 19, 2024

@LienReyserhove I remember we brought this up during a meeting with RATO, but I can't find any action points anywhere.

We currently have 1482 cases where there are exactly two records in a case, and the first one is Opvolging and the second one is Verwerkt en afgesloten, to replicate:

raw_data %>% group_by(Dossier_ID) %>% 
    mutate(n_records = n()) %>% 
    filter(n_records == 2) %>% 
    filter(
        Dossier_Status[which.max(Laatst_Bewerkt_Datum)] == "Verwerkt en afgesloten" &
            Dossier_Status[which.min(Laatst_Bewerkt_Datum)] == "Opvolging")

I've updated this google doc: https://docs.google.com/spreadsheets/d/1oIM33fpTivNYAsbVWW-MxHTpXD4_FCxSZMTyNQCwE4E/

To the most recent situation, we currently have 203 different patterns that do not make sense, for a total of 4191 cases (dossiers), most of which have been closed without opening them (3277).

Questions:

  • Did we ever get an answer from RATO?
  • Can we identify combinations that are not allowed, and then remove those records from the publication

The dataset is growing quickly with the rats, and this phenomenon has grown with it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed mapping
Projects
None yet
Development

No branches or pull requests

2 participants