Multiple type = value pairs in `waarneming` #23

damianooldoni · 2023-07-06T10:17:47Z

There are some occurrences with multiple type = value pairs in column waarneming. Example: "Secundair nest vastgesteld = 1; Primair nest vastgesteld = 1; Grootte van het nest = 15; Hoogte van het nest = 8;"

This means that the chunk that splits on = should probably be preceded by a pivot_longer() step (or similar) that puts each pair on its own row. I have no quick idea, though, how to map all the information when there are multiple inputs. Are the height and the size of the nest, for example, something for the measurementOrFact extension? And what should we do when information about both primary and secondary nests is available?

PietrH · 2023-07-18T14:31:09Z

A possible solution would be to first separate longer on the delimiter `; `, and then separate wider on ` = `.

For example:

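library(dplyr)
library(tidyr) # separate_longer_delim() and separate_wider_delim() need tidyr >= 1.3.0
raw_data %>%
    # one row per "type = value" pair
    separate_longer_delim(cols = Waarneming, delim = "; ") %>%
    # split each pair into a type column and a value column, then keep only the digits of the value
    separate_wider_delim(cols = Waarneming, delim = " = ", names = c("waarneming_type","waarneming_kwaliteit")) %>%
    mutate(waarneming_kwaliteit = as.numeric(stringr::str_extract(waarneming_kwaliteit,"[0-9]+")))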
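To make the suggestion concrete, here is a minimal, self-contained sketch that runs the same two steps on the example string from the issue description. The `id` column and the toy `raw_data` tibble are assumptions for illustration only. The trimming of the trailing `;` is an extra step that is not part of the suggestion above.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy input (hypothetical): one row holding the multi-pair example from the issue
raw_data <- tibble(
  id = 1L,
  Waarneming = "Secundair nest vastgesteld = 1; Primair nest vastgesteld = 1; Grootte van het nest = 15; Hoogte van het nest = 8;"
)

parsed <- raw_data %>%
  # assumption: drop surrounding whitespace and the trailing ";" before splitting
  mutate(Waarneming = str_remove(str_trim(Waarneming), ";$")) %>%
  # one row per "type = value" pair
  separate_longer_delim(cols = Waarneming, delim = "; ") %>%
  # split each pair into type and value
  separate_wider_delim(
    cols = Waarneming,
    delim = " = ",
    names = c("waarneming_type", "waarneming_kwaliteit")
  ) %>%
  # keep the numeric part of the value
  mutate(waarneming_kwaliteit = as.numeric(str_extract(waarneming_kwaliteit, "[0-9]+")))

print(parsed)
```

The single input row becomes four rows, one per pair, with `id` repeated on each so the pairs can still be traced back to the original record.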

Currently rows with multiple records for Waarneming are dropped as per:

input_data %<>%
  filter(is.na(.data$waarneming) | 
           !str_detect(.data$waarneming, pattern = "; "))

It's not clear to me why rows where Waarneming == NA also need to be dropped.
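As a possible answer to the question about NA: as far as I can tell, the `is.na()` term keeps the NA rows rather than dropping them. `str_detect()` returns NA for NA input, and `filter()` drops rows where the condition evaluates to NA, so without that term the NA rows would be lost. A small sketch on toy data (the three values below are made up for illustration):

```r
library(dplyr)
library(stringr)

# Toy data (hypothetical): a single value, a missing value, and a multi-pair value
input_data <- tibble(
  waarneming = c(
    "Vastgesteld",
    NA,
    "Grootte van het nest = 15; Hoogte van het nest = 8"
  )
)

# Without the is.na() term: the condition is NA for the NA row, so filter() drops it
without_na_term <- input_data %>%
  filter(!str_detect(waarneming, pattern = "; "))

# With the is.na() term, as in the filter above: TRUE | NA is TRUE, so the NA row is kept
with_na_term <- input_data %>%
  filter(is.na(waarneming) | !str_detect(waarneming, pattern = "; "))

print(nrow(without_na_term))
print(nrow(with_na_term))
```

In both cases the multi-pair row is removed. Only the second form retains the row with a missing `waarneming`.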

PietrH · 2023-07-18T14:32:03Z

And we could do the same for Materiaal_vast

PietrH · 2023-07-19T13:01:07Z

Actually, we already use pipe separation for Materiaal_Vast in samplingProtocol/samplingEffort:

| datasetID | eventID | occurrenceID | samplingProtocol | samplingEffort |
| --- | --- | --- | --- | --- |
| https://doi.org/10.15468/fw2rbx | 28839 | 52988 | conibear trap \| bait trap \| fike \| raft trap \| ground trap | 2 conibear trap \| 11 bait trap \| 2 fike \| 5 raft trap \| 1 ground trap |
| https://doi.org/10.15468/fw2rbx | 28839 | 65845 | conibear trap \| bait trap \| fike \| raft trap \| ground trap | 2 conibear trap \| 11 bait trap \| 2 fike \| 5 raft trap \| |
| https://doi.org/10.15468/fw2rbx | 28839 | 79317 | conibear trap \| bait trap \| fike \| raft trap \| ground trap | 2 conibear trap \| 11 bait trap \| 2 fike \| 5 raft trap \| |
| https://doi.org/10.15468/fw2rbx | 28839 | 93328 | conibear trap \| bait trap \| fike \| raft trap \| ground trap | 2 conibear trap \| 11 bait trap \| 2 fike \| 5 raft trap \| |
| https://doi.org/10.15468/fw2rbx | 35730 | 237365 | conibear trap \| bait trap \| fike \| raft trap \| ground trap | 3 conibear trap \| 7 bait trap \| 6 fike \| 8 raft trap \| 2 ground trap |

PietrH · 2023-07-19T13:41:56Z

Currently waarneming is mapped to organismQuantity and organismQuantityType, as well as occurrenceStatus. A number of other values are not mapped. A suggestion to map them all, using measurementOrFact:

| waarneming_type | n | currently_mapped | mapped_to |
| --- | --- | --- | --- |
| Haard vastgesteld | 6067 | TRUE | occurrenceStatus |
| Waarneming onzeker | 794 | TRUE | occurrenceStatus |
| Vastgesteld (aantal) | 626 | TRUE | organismQuantity/organismQuantityType/occurrenceStatus |
| Vastgesteld (in m²) | 559 | TRUE | organismQuantity/organismQuantityType/occurrenceStatus |
| Hoogte van het nest | 206 | FALSE | measurementType/measurementValue/measurementUnit |
| Grootte van het nest | 192 | FALSE | measurementType/measurementValue/measurementUnit |
| Secundair nest vastgesteld | 160 | TRUE | organismQuantityType |
| Geen haard vastgesteld | 117 | TRUE | occurrenceStatus |
| Niet vastgesteld | 117 | TRUE | occurrenceStatus |
| Vastgesteld | 65 | TRUE | occurrenceStatus |
| Beverdam vastgesteld (aantal) | 32 | FALSE | canonicalName |
| Primair nest vastgesteld | 28 | FALSE | organismQuantityType |
| Geen Aziatische hoornaar | 9 | FALSE | occurrenceStatus |
| Nest vastgesteld | 4 | TRUE | occurrenceStatus |
| Uitgebouwd primair nest vastgesteld | 4 | FALSE | organismQuantityType |
| Embryonest vastgesteld | 1 | FALSE | organismQuantityType |
| Rode Amerikaanse rivierkreeft | 1 | TRUE | canonicalName |
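A rough sketch of how the two nest measurements could be routed to measurementOrFact once `waarneming` is in long format (one row per type = value pair). The toy `long_data` tibble and its `occurrenceID` values are hypothetical. The measurement unit is not stated anywhere in this thread, so the sketch leaves `measurementUnit` empty rather than guessing.

```r
library(dplyr)

# Toy long-format input (hypothetical IDs and values)
long_data <- tibble(
  occurrenceID = c("a", "a", "b"),
  waarneming_type = c(
    "Hoogte van het nest",
    "Grootte van het nest",
    "Vastgesteld (aantal)"
  ),
  waarneming_kwaliteit = c(8, 15, 3)
)

# Types that the table above proposes to map to measurementOrFact
mof_types <- c("Hoogte van het nest", "Grootte van het nest")

measurement_or_fact <- long_data %>%
  filter(waarneming_type %in% mof_types) %>%
  transmute(
    occurrenceID,
    measurementType = waarneming_type,
    measurementValue = waarneming_kwaliteit,
    # assumption: unit unknown, deliberately left as NA until confirmed
    measurementUnit = NA_character_
  )

print(measurement_or_fact)
```

The remaining types would stay in the occurrence core and follow the other mappings listed in the table.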

damianooldoni added a commit that referenced this issue Jul 6, 2023

Remove multiple type - value pairs in waarneming

6c44e17

Patch for #23

PietrH self-assigned this Jul 18, 2023

PietrH mentioned this issue Nov 9, 2023

[AUTO] Update data #104

Merged

This was referenced Jan 22, 2024

[AUTO] Update data #151

Closed

waarneming now has trailing whitespace by default, tripping multiple value detection #152

Closed
