Missing Date_Types in 12 packages #97

Open
stschiff opened this issue Oct 1, 2022 · 2 comments


@stschiff
Member

stschiff commented Oct 1, 2022

Lots of packages have missing Date_Types in their Janno files. In my opinion, we should be able to fill many of those easily:

a. If there are entries in the C14-type columns, set Date_Type to C14.
b. If there are entries in the calibrated columns, but not in the C14 columns, set Date_Type to contextual.
c. If they are modern samples, set Date_Type to modern.
d. If the sample is ancient but has no date at all, keep it at n/a for now. Of course, we should fill those soon as well, at least with a contextual range, which should always be possible by looking into the paper.
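Rules a. to c. could be applied in a single pass, roughly as in this minimal R sketch. It assumes the column layout used by the `poseidonR::read_janno()` queries later in this thread (`Date_Type` with n/a read as `NA`, the list column `Date_C14_Uncal_BP`, and `Date_BC_AD_Median`). Since modern samples cannot be detected automatically, it also assumes a hypothetical, hand-curated logical column `is_modern`. Rule c. is checked first so that a sample flagged as modern is not relabelled by a. or b.

```r
# Minimal sketch (assumption, not part of any Poseidon tool): fill missing
# Date_Type values following rules a.-c. Works on a data frame (or list) with
# Date_Type (NA = "n/a"), Date_C14_Uncal_BP (list column of numeric vectors),
# Date_BC_AD_Median (numeric) and a HYPOTHETICAL logical column is_modern.

# TRUE if a sample has at least one non-missing uncalibrated C14 date
has_c14 <- function(x) length(x) > 0 && !all(is.na(x))

fill_date_type <- function(janno) {
  c14 <- vapply(janno$Date_C14_Uncal_BP, has_c14, logical(1))
  calibrated <- !is.na(janno$Date_BC_AD_Median)
  modern <- janno$is_modern %in% TRUE   # NA in the flag counts as "not modern"
  missing <- is.na(janno$Date_Type)     # only rows that are n/a are touched

  # c. samples flagged as modern by hand -> "modern"
  janno$Date_Type[missing & modern] <- "modern"
  # a. uncalibrated C14 dates present -> "C14"
  janno$Date_Type[missing & !modern & c14] <- "C14"
  # b. calibrated dates but no C14 dates -> "contextual"
  janno$Date_Type[missing & !modern & !c14 & calibrated] <- "contextual"
  # d. ancient samples without any date keep NA (n/a)
  janno
}
```

Existing Date_Type values are never overwritten, so running the function twice gives the same result.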

```
published_data % trident list --individuals -d . -j Date_Type --raw | awk '$4 == "n/a"' | cut -f1 | sort | uniq -c
   5 2020_Brunel_France
   1 2020_Cassidy_IrishDynastic
  12 2020_Furtwaengler_Switzerland
  20 2020_Nakatsuka_SouthPatagonia
  30 2020_Ning_China
   1 2020_Wang_subSaharanAfrica
  24 2020_Yang_China
  40 2021_Kilinc_northeastAsia
 826 2021_PattersonNature
  18 2021_Saag_EastEuropean
  22 2021_SaupeCurrBiol
 383 2021_Wang_EastAsia
```
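On the R side, a similar per-package tally could look like the sketch below. `package` is a hypothetical column holding each sample's package name; the actual column name depends on how the Janno files are read in.

```r
# Sketch (assumption): per-package count of samples whose Date_Type is NA
# ("n/a"), similar to `awk '$4 == "n/a"' | cut -f1 | sort | uniq -c`.
# `package` is a HYPOTHETICAL column with each sample's package name.
count_missing_date_type <- function(janno) {
  missing <- is.na(janno$Date_Type)
  # table() lists package names in sorted order, like `sort | uniq -c`
  table(janno$package[missing])
}
```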
@nevrome
Member

nevrome commented Oct 4, 2022

I'm slowly crawling out of my hole and thought I'd take a quick peek at this. Dana and I concluded back then in #25 that, unfortunately, there is a lot of d. in the mix. This might have changed by now, so let's see. c. is trivial (although I think there is no automatic way to find these samples, right?), so let's check a. and b.

a. should be an impossible state of the system, so I would be surprised if it existed:

https://github.com/poseidon-framework/poseidon-hs/blob/6be96d0a933b564cfa017471aedaa30a32a7ebd0/src/Poseidon/Janno.hs#L820-L831

I checked anyway:

```r
library(magrittr) # provides the %>% pipe

janno <- poseidonR::read_janno("~/agora/published_data/")

### If there are entries in the C14-type columns, put Date_Type to C14.

janno_with_actual_C14_dates <- janno %>% dplyr::filter(
  # exclude samples for which one of the following applies
  !purrr::map_lgl(Date_C14_Uncal_BP, \(x) {
    is.null(x) ||           # date is NULL
      if (length(x) == 1) { # if there is exactly one date value
        is.na(x)            # date is NA
      } else {
        FALSE
      }
  })
)

janno_with_actual_C14_dates %>% nrow # 3606
janno %>% dplyr::filter(Date_Type == "C14") %>% nrow # 3607
janno_with_actual_C14_dates %>%
  dplyr::filter(is.na(Date_Type) | Date_Type != "C14") %>% nrow() # 0
```

So I think such samples indeed do not exist. b. is a lot more likely.

```r
### If there are entries in the calibrated columns, but not in the C14-columns, put Date_Type to contextual.

janno_with_result_dates <- janno %>% dplyr::filter(
  !is.na(Date_BC_AD_Median)
)

janno_potentially_contextual <- dplyr::anti_join(
  janno_with_result_dates,
  janno_with_actual_C14_dates,
  by = "Poseidon_ID"
)

janno_potentially_contextual %>%
  dplyr::filter(is.na(Date_Type) | Date_Type != "contextual") %>%
  nrow # 840
```

OK! So we could automatically fill these 840 samples (826 of them from 2021_PattersonNature) with contextual. I fear this will often be factually incorrect, but it makes our DB consistent. We should also make sure that b. is caught by the validation and cannot emerge anymore in the future.
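A consistency check along the following lines could flag both a. and b. This is a hypothetical R sketch that reuses the conditions of the queries above; it is not the validation poseidon-hs actually performs. Like the query for b., it flags every sample with calibrated but no C14 dates whose Date_Type is not contextual.

```r
# Hypothetical consistency check (illustration only, not poseidon-hs code).
# Returns the Poseidon_IDs of samples that break rule a. or b.

# TRUE if a sample has at least one non-missing uncalibrated C14 date
has_c14 <- function(x) length(x) > 0 && !all(is.na(x))

find_inconsistent_dates <- function(janno) {
  c14 <- vapply(janno$Date_C14_Uncal_BP, has_c14, logical(1))
  calibrated <- !is.na(janno$Date_BC_AD_Median)
  type <- janno$Date_Type
  # a. C14 dates present, but Date_Type is missing or not "C14"
  violates_a <- c14 & (is.na(type) | type != "C14")
  # b. calibrated dates without C14 dates, but Date_Type is not "contextual"
  violates_b <- !c14 & calibrated & (is.na(type) | type != "contextual")
  janno$Poseidon_ID[violates_a | violates_b]
}
```

A non-empty result could then be treated as a validation failure.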

Btw. my brain is still pretty mushy so take this with a grain of salt.

@stschiff
Member Author

stschiff commented Nov 2, 2022

OK, good catch that 826 of the samples with calibrated dates but a missing Date_Type are from Patterson. I think we should then open a separate issue to fill in the uncalibrated dates for these, as I think they must have C14-dated most, if not all, of their samples.
