gbif codes outdated or not correct: manually map taxonomic information? #24

Open
damianooldoni opened this issue Jul 6, 2023 · 2 comments
Assignees
Labels
enhancement New feature or request mapping

Comments

@damianooldoni
Contributor

It seems that some GBIF codes are not pointing to what the Dutch names refer to, e.g. Waterteunisbloem should point to Ludwigia grandiflora, while the provided gbif_code is pointing to the genus.

My opinion is not to use the gbif_code column, as it can become outdated, and RATO could also drop it from their database. As the number of species is quite limited (<30), a manual mapping is the best solution, I think. We do so in the POV datasets as well. This way we easily get a warning when new species pop up in the raw data: they would produce NA in the DwC output file, which the specific test will detect.
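
To illustrate the NA-based warning, here is a minimal base R sketch, not the actual mapping script. The names `species_map` and `map_scientific_name` are hypothetical, and the "Muskusrat" entry is an illustrative assumption; only the Waterteunisbloem pairing comes from this issue.

```r
# Hypothetical manual mapping: Dutch vernacular name -> scientific name.
# Only the Waterteunisbloem entry is taken from this issue; the Muskusrat
# entry is an illustrative assumption.
species_map <- c(
  "Waterteunisbloem" = "Ludwigia grandiflora",
  "Muskusrat" = "Ondatra zibethicus"
)

# Look up each vernacular name; names missing from the mapping become NA.
map_scientific_name <- function(soort, mapping = species_map) {
  unname(mapping[soort])
}

# Example raw values, including one species the mapping does not know yet.
raw_soort <- c("Waterteunisbloem", "Muskusrat", "Mantsjoerese wilde rijst")
scientific_name <- map_scientific_name(raw_soort)

# Vernacular names that still need an entry in the mapping.
unmapped <- unique(raw_soort[is.na(scientific_name)])
if (length(unmapped) > 0) {
  warning("Unmapped species: ", paste(unmapped, collapse = ", "))
}
```

Any new species in the raw data ends up as NA in `scientific_name`, so a test that forbids NA values fails until the mapping is extended.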

As we need to publish fast now, I will correct the GBIF codes in the mapping as a patch.

@damianooldoni changed the title from "gbif codes outdated or not correct: manually map taxonomic information" to "gbif codes outdated or not correct: manually map taxonomic information?" on Jul 6, 2023
damianooldoni added a commit that referenced this issue Jul 6, 2023
@PietrH
Member

PietrH commented Oct 24, 2023

As is the case for this record:

| Dossier_ID | OBJECTID | Dossier_Status | Domein | Soort | Waarneming | Actie | Materiaal_Vast | Opmerkingen_admin | Opmerkingen | Melder_Naam | Melder_Klant | Planning_Datum | X | Y | Gemeente | Aard_Locatie | GBIF_Code | Dossier_Link | Dossier_Link_ID | Hoofddossier_ID | Aangemaakt_Datum | Laatst_Bewerkt_Datum | Datum_Van | Geometrie_Type | Shape |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 460271028 | 589775 | Opvolging | Plant | Mantsjoerese wilde rijst | NA | NA | NA | NA | NA | NA | Andere | NA | 95383.03 | 189125.1 | Deinze | Publiek | 7901745 | 0 | NA | -1 | 2023-10-09 15:19:50 | 2023-10-09 15:20:11 | 2023-10-09 15:19:50 | Point | POINT (95383.02510000 189125.06350000) |

It was mapped to the wrong taxon key. The taxon keys used for RIPARIAS can be looked up on this page: https://alert.riparias.be/about-data

You can then look these keys up as follows:

c('2978552', '2489005', '3190653', '2498252', '3084923', '2340977', '2706080', '5328593', '2502792', '3170247', '2440934', '3129663', '2882443', '2437394', '2437399', '3189935', '3169169', '4284921', '4417558', '2704521', '2482499', '5362054', '5329263', '2702865', '2765942', '5329212', '2225776', '7346102', '8930656', '8721209', '8909595', '8979506', '8971201', '5712056', '2350580', '2350570', '2984306', '6063677', '7287606', '3034825', '3628745', '3642949', '2434271', '5384931', '2984537', '7978544', '2891770', '8848208', '2865565', '9799308', '2394486', '8114276', '5855350', '2427091', '5421039', '5420991', '2650436', '2869311', '5289808', '2394604', '2440946', '4264680', '5361785', '5361762', '2433536', '2434552', '5219858', '2498305', '2226990', '3086784', '5828232', '2390064', '4033648', '3088310', '2870583', '7965247', '2766030', '2227289', '2227300', '9442269', '5218786', '5358460', '2362868', '2977647', '2486131', '5824863', '5274863', '5384932', '5219681', '5219683', '5035187', '5035230', '5035017', '2437450', '2480764', '2443002', '3054399', '1311477', '1315391', '5217334', '10919373') %>%
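    # Query GBIF for each taxon key and row-bind the results into one data frame
    purrr::map_dfr(~rgbif::name_usage(.x)$data) %>%
    # Open the combined table in the data viewer
    View("riparias-taxa")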

So I suggest manually mapping the vernacular name in "Soort", via a lookup table, to the table created by parsing the list of LIFE RIPARIAS target species.
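
A rough base R sketch of what such a lookup table could look like, under some assumptions: `species_lookup` and `add_taxonomy` are hypothetical names, and the "Rode Amerikaanse rivierkreeft" row assumes that value appears in "Soort", which is not confirmed here. The taxon keys are the ones already used in the mapping overrides.

```r
# Hypothetical lookup table: one row per vernacular name, carrying the
# taxon key and scientific name to assign.
# Assumption: "Rode Amerikaanse rivierkreeft" occurs as a value of Soort.
species_lookup <- data.frame(
  soort = c("Waterteunisbloem", "Rode Amerikaanse rivierkreeft"),
  taxon_key = c(5421039, 2227300),
  scientific_name = c("Ludwigia grandiflora", "Procambarus clarkii"),
  stringsAsFactors = FALSE
)

# Add taxon_key and scientific_name to the records by matching on soort.
# Records with a soort that is not in the lookup table get NA in both columns.
add_taxonomy <- function(records, lookup = species_lookup) {
  idx <- match(records$soort, lookup$soort)
  records$taxon_key <- lookup$taxon_key[idx]
  records$scientific_name <- lookup$scientific_name[idx]
  records
}

# Example input with one mapped and one not yet mapped species.
records <- data.frame(
  soort = c("Waterteunisbloem", "Mantsjoerese wilde rijst"),
  stringsAsFactors = FALSE
)
mapped <- add_taxonomy(records)
```

Matching on the vernacular name ignores the provided GBIF_Code entirely, so an outdated or wrong code in the source cannot leak into the output.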

We currently already have a hardcoded list of species we expect in the tests:

testthat::test_that("scientificName is never NA and one of the list", {
  species <- c(
    "Ondatra zibethicus",
    "Fallopia japonica",
    "Castor fiber",
    "Gallus gallus domesticus",
    "Myriophyllum aquaticum",
    "Alopochen aegyptiaca",
    "Ludwigia peploides",
    "Martes foina",
    "Hydrocotyle ranunculoides",
    "Vespa velutina",
    "Heracleum mantegazzianum",
    "Rattus norvegicus",
    "Cairina moschata",
    "Anser anser domesticus",
    "Neovison vison",
    "Trachemys scripta",
    "Psittacula krameri",
    "Oryctolagus cuniculus",
    "Branta canadensis",
    "Branta leucopsis",
    "Anatidae",
    "Anser anser",
    "Impatiens glandulifera",
    "Myocastor coypus",
    "Lysichiton americanus",
    "Procambarus clarkii",
    "Ludwigia grandiflora",
    "Sciurus",
    "Crassula helmsii"
  )
  testthat::expect_true(all(!is.na(dwc_occurrence$scientificName)))
  testthat::expect_true(all(dwc_occurrence$scientificName %in% species))
})

We are also already overwriting some of the provided taxon IDs (GBIF_Code):

input_data %<>%
  mutate(gbif_code = case_when(
    soort == "Waterteunisbloem" ~ 5421039,
    soort == "Rivierkreeft" & 
      (str_detect(waarneming, "Rode Amerikaanse rivierkreeft") | 
         str_detect(opmerkingen, "Amerikaanse")) ~ 2227300, 
    TRUE ~ gbif_code
  )
)

In short, I support this idea. I think we should switch over to a manual mapping via a lookup table. I will do this, but see this as medium priority.

@PietrH PietrH added enhancement New feature or request mapping labels Oct 24, 2023
@PietrH PietrH self-assigned this Oct 24, 2023
@PietrH
Member

PietrH commented Aug 19, 2024

Related to #207


2 participants