gbif codes outdated or not correct: manually map taxonomic information? #24
Comments
As is the case for this record:
It was mapped to the wrong taxon key. The taxon keys used for RIPARIAS are listed on this page: https://alert.riparias.be/about-data. You can then look them up like so:

c('2978552', '2489005', '3190653', '2498252', '3084923', '2340977', '2706080', '5328593', '2502792', '3170247', '2440934', '3129663', '2882443', '2437394', '2437399', '3189935', '3169169', '4284921', '4417558', '2704521', '2482499', '5362054', '5329263', '2702865', '2765942', '5329212', '2225776', '7346102', '8930656', '8721209', '8909595', '8979506', '8971201', '5712056', '2350580', '2350570', '2984306', '6063677', '7287606', '3034825', '3628745', '3642949', '2434271', '5384931', '2984537', '7978544', '2891770', '8848208', '2865565', '9799308', '2394486', '8114276', '5855350', '2427091', '5421039', '5420991', '2650436', '2869311', '5289808', '2394604', '2440946', '4264680', '5361785', '5361762', '2433536', '2434552', '5219858', '2498305', '2226990', '3086784', '5828232', '2390064', '4033648', '3088310', '2870583', '7965247', '2766030', '2227289', '2227300', '9442269', '5218786', '5358460', '2362868', '2977647', '2486131', '5824863', '5274863', '5384932', '5219681', '5219683', '5035187', '5035230', '5035017', '2437450', '2480764', '2443002', '3054399', '1311477', '1315391', '5217334', '10919373') %>%
purrr::map_dfr(~rgbif::name_usage(.x)$data) %>%
View("riparias-taxa")

So I suggest using a lookup table to map the vernacular name in "Soort" manually to the table created by parsing the list of LIFE RIPARIAS target species. We already have a hardcoded list of the species we expect in the tests:

testthat::test_that("scientificName is never NA and one of the list", {
  species <- c(
    "Ondatra zibethicus",
    "Fallopia japonica",
    "Castor fiber",
    "Gallus gallus domesticus",
    "Myriophyllum aquaticum",
    "Alopochen aegyptiaca",
    "Ludwigia peploides",
    "Martes foina",
    "Hydrocotyle ranunculoides",
    "Vespa velutina",
    "Heracleum mantegazzianum",
    "Rattus norvegicus",
    "Cairina moschata",
    "Anser anser domesticus",
    "Neovison vison",
    "Trachemys scripta",
    "Psittacula krameri",
    "Oryctolagus cuniculus",
    "Branta canadensis",
    "Branta leucopsis",
    "Anatidae",
    "Anser anser",
    "Impatiens glandulifera",
    "Myocastor coypus",
    "Lysichiton americanus",
    "Procambarus clarkii",
    "Ludwigia grandiflora",
    "Sciurus",
    "Crassula helmsii"
  )
  testthat::expect_true(all(!is.na(dwc_occurrence$scientificName)))
  testthat::expect_true(all(dwc_occurrence$scientificName %in% species))
})
We are also already overwriting some of the provided taxon IDs:

input_data %<>%
  mutate(gbif_code = case_when(
    soort == "Waterteunisbloem" ~ 5421039,
    soort == "Rivierkreeft" &
      (str_detect(waarneming, "Rode Amerikaanse rivierkreeft") |
        str_detect(opmerkingen, "Amerikaanse")) ~ 2227300,
    TRUE ~ gbif_code
  )
  )

In short, I support this idea. I think we should switch over to a manual mapping via a lookup table. I will do this, but I see it as medium priority.
Related to #207
It seems that some GBIF codes do not point to what the Dutch names refer to, e.g. Waterteunisbloem should point to Ludwigia grandiflora, while the provided gbif_code points to the genus.

My opinion is to not use the gbif_code column, as it can become outdated. RATO could also drop it from their database. As the number of species is quite limited (<30), a manual mapping is the best solution, I think. We do so in the POV datasets as well. This way we easily get a warning if a new species pops up in the raw data: an NA would occur in the DwC output file, and this will be detected by the specific test.

As we need to publish fast now, I will correct the GBIF codes in the mapping as a patch.
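A minimal sketch of such a manual mapping in base R, to show how an unmapped species ends up as NA. This is not the actual mapping script: `species_lookup`, the toy `input_data`, and the name "Onbekende soort" are made up for illustration. Only the Waterteunisbloem row comes from this issue; the Muskusrat row is added as a second example.

```r
# Hypothetical lookup table: Dutch vernacular name -> scientific name.
# Only the Waterteunisbloem row is taken from this issue; Muskusrat is illustrative.
species_lookup <- data.frame(
  soort = c("Waterteunisbloem", "Muskusrat"),
  scientificName = c("Ludwigia grandiflora", "Ondatra zibethicus"),
  stringsAsFactors = FALSE
)

# Toy input standing in for the raw data.
# "Onbekende soort" is a made-up name that the lookup table does not cover.
input_data <- data.frame(
  soort = c("Muskusrat", "Waterteunisbloem", "Onbekende soort"),
  stringsAsFactors = FALSE
)

# match() returns NA for names missing from the lookup table,
# so any unmapped species gets scientificName = NA.
input_data$scientificName <-
  species_lookup$scientificName[match(input_data$soort, species_lookup$soort)]

# Vernacular names that still need a row in the lookup table.
unmapped <- unique(input_data$soort[is.na(input_data$scientificName)])
print(unmapped)
```

In a dplyr pipeline, a left join on soort would give the same result. Either way, the existing test that scientificName is never NA would then fail as soon as a new species appears in the raw data.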