Multiple type = value pairs in waarneming #23

Open
damianooldoni opened this issue Jul 6, 2023 · 4 comments

@damianooldoni
Contributor

There are some occurrences with multiple type = value pairs in column waarneming. Example: "Secundair nest vastgesteld = 1; Primair nest vastgesteld = 1; Grootte van het nest = 15; Hoogte van het nest = 8;"

This means that splitting on = should probably be preceded by a pivot_longer() step. But I have no quick idea how to map all the information when there are multiple inputs. Are the height and the size of the nest, for example, something for the measurementOrFact extension? And what should we do when information about both primary and secondary nests is available?

damianooldoni added a commit that referenced this issue Jul 6, 2023
@PietrH
Member

PietrH commented Jul 18, 2023

A possible solution would be to first separate longer on the delimiter ";" and then separate wider on "=".

For example:

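library(dplyr)
library(tidyr)
raw_data %>%
    # one row per "type = value" pair, then one column for the type and one for the value
    separate_longer_delim(cols = Waarneming, delim = "; ") %>%
    separate_wider_delim(cols = Waarneming, delim = " = ", names = c("waarneming_type","waarneming_kwaliteit")) %>%
    # keep only the digits, which also discards a trailing ";"
    mutate(waarneming_kwaliteit = as.numeric(stringr::str_extract(waarneming_kwaliteit,"[0-9]+")))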
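To make the longer-then-wider idea concrete, here is a minimal sketch applied to the example value quoted at the top of this issue. The one-row `raw_data` tibble, the `id` column and the `waarneming_value` name are assumptions for illustration only, and stripping the trailing ";" up front is one possible choice, not necessarily what the mapping script should do. It needs tidyr 1.3.0 or later for the `separate_*_delim()` functions.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical one-row input that reuses the example string from this issue.
raw_data <- tibble(
  id = 1L,
  waarneming = paste(
    "Secundair nest vastgesteld = 1; Primair nest vastgesteld = 1;",
    "Grootte van het nest = 15; Hoogte van het nest = 8;"
  )
)

long <- raw_data %>%
  # Remove the trailing ";" so the last pair splits like the others.
  mutate(waarneming = str_remove(waarneming, ";\\s*$")) %>%
  # One row per "type = value" pair.
  separate_longer_delim(cols = waarneming, delim = "; ") %>%
  # One column for the type, one for the value.
  separate_wider_delim(
    cols = waarneming,
    delim = " = ",
    names = c("waarneming_type", "waarneming_value")
  ) %>%
  mutate(waarneming_value = as.numeric(waarneming_value))

print(long)
```

The single input row becomes four rows that share the same `id`, so every pair can then be mapped on its own.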

Currently rows with multiple records for Waarneming are dropped as per:

input_data %<>%
  filter(is.na(.data$waarneming) | 
           !str_detect(.data$waarneming, pattern = "; "))

It's not clear to me how rows where Waarneming is NA should be handled here: as written, the is.na() clause keeps them rather than dropping them.
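As a point of reference for the NA question, the sketch below runs the same filter on a small made-up tibble (the three rows are assumptions, not real data). `dplyr::filter()` drops rows where the condition evaluates to `NA`, and `str_detect()` returns `NA` for missing input, so the `is.na()` clause is what retains the `NA` rows.

```r
library(dplyr)
library(stringr)

# Hypothetical input: one NA, one single pair, one value with multiple pairs.
input_data <- tibble(
  id = 1:3,
  waarneming = c(
    NA,
    "Vastgesteld = 1",
    "Grootte van het nest = 15; Hoogte van het nest = 8;"
  )
)

# Same condition as in the mapping script: NA rows and single-pair rows are kept.
kept <- input_data %>%
  filter(is.na(waarneming) | !str_detect(waarneming, pattern = "; "))

# Without the is.na() clause, the NA row is dropped as well,
# because the condition evaluates to NA for that row.
kept_without_na_clause <- input_data %>%
  filter(!str_detect(waarneming, pattern = "; "))

print(kept$id)
print(kept_without_na_clause$id)
```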

@PietrH PietrH self-assigned this Jul 18, 2023
@PietrH
Member

PietrH commented Jul 18, 2023

And we could do the same for Materiaal_vast.

@PietrH
Member

PietrH commented Jul 19, 2023

Actually, we already use pipe separation for this when mapping Materiaal_Vast to samplingProtocol and samplingEffort:

| datasetID | eventID | occurrenceID | samplingProtocol | samplingEffort |
| --- | --- | --- | --- | --- |
| https://doi.org/10.15468/fw2rbx | 28839 | 52988 | conibear trap \| bait trap \| fike \| raft trap \| ground trap | 2 conibear trap \| 11 bait trap \| 2 fike \| 5 raft trap \| 1 ground trap |
| https://doi.org/10.15468/fw2rbx | 28839 | 65845 | conibear trap \| bait trap \| fike \| raft trap \| ground trap | 2 conibear trap \| 11 bait trap \| 2 fike \| 5 raft trap \| |
| https://doi.org/10.15468/fw2rbx | 28839 | 79317 | conibear trap \| bait trap \| fike \| raft trap \| ground trap | 2 conibear trap \| 11 bait trap \| 2 fike \| 5 raft trap \| |
| https://doi.org/10.15468/fw2rbx | 28839 | 93328 | conibear trap \| bait trap \| fike \| raft trap \| ground trap | 2 conibear trap \| 11 bait trap \| 2 fike \| 5 raft trap \| |
| https://doi.org/10.15468/fw2rbx | 35730 | 237365 | conibear trap \| bait trap \| fike \| raft trap \| ground trap | 3 conibear trap \| 7 bait trap \| 6 fike \| 8 raft trap \| 2 ground trap |

@PietrH
Member

PietrH commented Jul 19, 2023

Currently waarneming is mapped to organismQuantity and organismQuantityType, as well as occurrenceStatus. A number of other values are not mapped. A suggestion to map them all, using measurementOrFact:

| waarneming_type | n | currently_mapped | mapped_to |
| --- | --- | --- | --- |
| Haard vastgesteld | 6067 | TRUE | occurrenceStatus |
| Waarneming onzeker | 794 | TRUE | occurrenceStatus |
| Vastgesteld (aantal) | 626 | TRUE | organismQuantity/organismQuantityType/occurrenceStatus |
| Vastgesteld (in m²) | 559 | TRUE | organismQuantity/organismQuantityType/occurrenceStatus |
| Hoogte van het nest | 206 | FALSE | measurementType/measurementValue/measurementUnit |
| Grootte van het nest | 192 | FALSE | measurementType/measurementValue/measurementUnit |
| Secundair nest vastgesteld | 160 | TRUE | organismQuantityType |
| Geen haard vastgesteld | 117 | TRUE | occurrenceStatus |
| Niet vastgesteld | 117 | TRUE | occurrenceStatus |
| Vastgesteld | 65 | TRUE | occurrenceStatus |
| Beverdam vastgesteld (aantal) | 32 | FALSE | canonicalName |
| Primair nest vastgesteld | 28 | FALSE | organismQuantityType |
| Geen Aziatische hoornaar | 9 | FALSE | occurrenceStatus |
| Nest vastgesteld | 4 | TRUE | occurrenceStatus |
| Uitgebouwd primair nest vastgesteld | 4 | FALSE | organismQuantityType |
| Embryonest vastgesteld | 1 | FALSE | organismQuantityType |
| Rode Amerikaanse rivierkreeft | 1 | TRUE | canonicalName |
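For the two nest measurements listed above, a rough sketch of how long-format rows could be turned into measurementOrFact records follows. Everything outside the `waarneming_type` values is an assumption: the `long` tibble, the `"occ-1"` identifier and the English labels "nest height" and "nest size" are hypothetical, and `measurementUnit` is left as `NA` because the unit is not stated in this issue.

```r
library(dplyr)

# Hypothetical long-format input: one row per "type = value" pair.
long <- tibble(
  occurrenceID = c("occ-1", "occ-1", "occ-1"),
  waarneming_type = c(
    "Secundair nest vastgesteld",
    "Grootte van het nest",
    "Hoogte van het nest"
  ),
  waarneming_value = c(1, 15, 8)
)

# Hypothetical English labels for measurementType.
type_labels <- c(
  "Hoogte van het nest" = "nest height",
  "Grootte van het nest" = "nest size"
)

measurement_or_fact <- long %>%
  # Keep only the rows that describe a measurement.
  filter(waarneming_type %in% names(type_labels)) %>%
  transmute(
    occurrenceID,
    measurementType = unname(type_labels[waarneming_type]),
    measurementValue = waarneming_value,
    # Unit is not given in the issue, so it is deliberately left missing.
    measurementUnit = NA_character_
  )

print(measurement_or_fact)
```

The remaining rows (for example "Secundair nest vastgesteld") would still go through the existing occurrence mapping; only the measurement rows are routed to the extension in this sketch.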
