Get updated ARTIS tables to Friends of the Web #43

Open
1 task
theamarks opened this issue Jan 9, 2025 · 2 comments
Labels: 🎨 Data Viz / Reporting (Create a beautiful story)


@theamarks (Member) commented Jan 9, 2025

Tables are needed to display ARTIS data on the website. Use the tables corresponding to the KNB 2024_07_31 V1 release. I created a local, untracked script to assess the tables: `theamarks-computer-dirs/git-projects/artis-model/AM_local/scripts/website_tables_2025_01_13.R`

Here are the final cleaned tables for the website --> This Google Drive Folder

SciName

SciName metadata table with complete taxonomic info (all taxa fields filled out). Should we remove the sub- and super-rank fields, or will we need to provide specific guidance for building the taxonomic tree?

```r
library(dplyr)

# fread() expects the full file path as its first argument
web_sciname <- data.table::fread(file.path(data_dir_knb, "sciname.csv"))

# Rows that have a subfamily but no genus
subfam_rows <- web_sciname %>%
  filter(!is.na(subfamily) & is.na(genus))
# result - no rows meet the criteria
# could remove subfamily without disrupting the data

subfam_rows_1 <- web_sciname %>%
  filter(!is.na(subfamily))

subfam_rows_2 <- web_sciname %>%
  filter(!is.na(subfamily) & subfamily != family)
# result - there are ~1,200 rows where subfamily is not equal to family
# looks like subfamily is important to retain
```

Recommendation for use on website

If subfamily is not NA, family should point to subfamily, and subfamily should then point to genus (genus always has a value in these rows, per the first check above). If subfamily is NA, family should point directly to genus.
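The pointing rule above can be expressed as a parent-child edge list, which is a common input shape for tree visualizations. This is a minimal base-R sketch, not the website's actual implementation: it assumes a sciname-like data frame with `family`, `subfamily`, and `genus` columns, and both the `build_family_edges` helper and the toy rows are hypothetical.

```r
# Hypothetical helper: build family -> (subfamily ->) genus edges.
# Assumes columns `family`, `subfamily`, `genus`, with genus always filled.
build_family_edges <- function(taxa) {
  has_sub <- !is.na(taxa$subfamily)
  # family -> subfamily, only where a subfamily exists
  fam_to_sub <- data.frame(parent = taxa$family[has_sub],
                           child  = taxa$subfamily[has_sub])
  # genus hangs off the subfamily when present, otherwise off the family
  genus_parent <- ifelse(has_sub, taxa$subfamily, taxa$family)
  to_genus <- data.frame(parent = genus_parent, child = taxa$genus)
  # drop repeated edges (several genera can share one family -> subfamily link)
  edges <- unique(rbind(fam_to_sub, to_genus))
  rownames(edges) <- NULL
  edges
}

# Toy rows (illustrative only, not ARTIS output)
taxa <- data.frame(
  family    = c("pleuronectidae", "pleuronectidae", "gadidae"),
  subfamily = c("hippoglossinae", "hippoglossinae", NA),
  genus     = c("hippoglossus", "reinhardtius", "gadus")
)
edges <- build_family_edges(taxa)
print(edges)
```

Keeping the edges in a plain two-column table leaves the choice of tree library open on the website side.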

Products

Commodity metadata table (from 04-create-metadata.R) covering all HS versions

Recommendation for use on website

Here is a filtered products table containing only the HS product codes that appear in the ARTIS FAO trade data on KNB: Google Drive Folder Link

```r
library(arrow)  # read_parquet()
library(dplyr)

knb_prod <- read_parquet(file.path("~/Documents/UW-SAFS/ARTIS/data/KNB_2024_07_31/data",
                                   "artis_midpoint_all_HS_all_yrs_knb_v1.parquet"))

# Only want HS product codes that are in trade data
web_products_clean <- web_products %>%
  filter(hs6 %in% knb_prod$hs6)
```
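One thing worth checking with the `%in%` filter above: if either table stores `hs6` as a number, codes in HS chapters 01-09 lose their leading zero (e.g. `030211` becomes `30211`) and silently fail to match. The base-R sketch below normalizes both sides before filtering. It assumes codes are purely numeric strings; the `pad_hs6` helper and the toy tables are hypothetical and not taken from the ARTIS code.

```r
# Hypothetical helper (not from the ARTIS codebase): normalize HS codes to
# 6-character, zero-padded strings so numeric and character codes compare equal
pad_hs6 <- function(x) {
  num <- as.integer(as.character(x))   # "030211" and 30211 both become 30211L
  out <- sprintf("%06d", num)          # 30211L -> "030211"
  out[is.na(num)] <- NA_character_     # keep missing codes missing
  out
}

# Toy tables (illustrative only): products stored as numbers, trade codes as text
toy_products <- data.frame(hs6 = c(30211, 30212, 160414))
toy_trade    <- data.frame(hs6 = c("030211", "160414", "160414"))

# Naive filter: %in% compares "30211" with "030211" and misses the match
naive <- toy_products[toy_products$hs6 %in% toy_trade$hs6, , drop = FALSE]

# Padded filter: both sides normalized before matching
toy_products$hs6 <- pad_hs6(toy_products$hs6)
clean <- toy_products[toy_products$hs6 %in% pad_hs6(toy_trade$hs6), , drop = FALSE]
print(clean)
```

In the dplyr pipeline this would amount to a `mutate(hs6 = pad_hs6(hs6))` on each table before the `filter()`.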

Nutrient content data

  • Get the most recent nutrient table version from @whitneyf, with comments on any necessary cleaning steps

Recommendation for use on website

@theamarks theamarks moved this to 🏗 In Progress in ARTIS Maintence & Analysis Jan 9, 2025
@theamarks theamarks self-assigned this Jan 13, 2025
@theamarks theamarks added the 🎨 Data Viz / Reporting Create a beautiful story label Jan 14, 2025
@theamarks (Member, Author) commented Jan 14, 2025

❓Questions

@jagephart Do we want all possible HS codes on the website, or only the HS codes that appear in ARTIS? The current products.csv output in the sql_database/ directory contains model_inputs_raw/All_HS_Codes.csv joined with the presentation and state information pulled from all "hs-hs-match" files output by the model.

```r
library(dplyr)
library(stringr)

# Creating Product metadata table
# hs codes, descriptions, FMFO status, product form
# Read in list of HS codes found in K Drive Data folder;
# str_pad() restores leading zeros lost when codes were read as numbers
products <- read.csv(file.path(model_inputs_raw, "All_HS_Codes.csv")) %>%
  mutate(Code = str_pad(as.character(Code), width = 6, side = "left", pad = "0"))

# Read in all hs-hs-match files, concentrating on:
# code_pre, code_post, presentation_pre, presentation_post, state_pre, state_post
prep_state_files <- list.files(path = model_inputs_dir, pattern = "hs-hs-match",
                               full.names = TRUE)

# Stack the pre- and post-conversion columns of one file into hs6/presentation/state
read_prep_state <- function(curr_file) {
  curr_prep_state <- read.csv(curr_file) %>%
    select(Code_pre, Code_post, Presentation_pre, Presentation_post,
           State_pre, State_post)
  data.frame(
    hs6 = c(curr_prep_state$Code_pre, curr_prep_state$Code_post),
    presentation = c(curr_prep_state$Presentation_pre, curr_prep_state$Presentation_post),
    state = c(curr_prep_state$State_pre, curr_prep_state$State_post)
  ) %>%
    mutate(hs6 = str_pad(as.character(hs6), width = 6, side = "left", pad = "0"))
}

# Read every file, then de-duplicate across files so repeated
# hs6/presentation/state combinations do not duplicate rows in the join
prep_state <- lapply(prep_state_files, read_prep_state) %>%
  bind_rows() %>%
  distinct()

# An hs6 code with several presentation/state combinations still
# yields one row per combination
products <- products %>%
  left_join(prep_state, by = c("Code" = "hs6")) %>%
  rename(hs6 = Code)
names(products) <- tolower(names(products))

# Writing out results
write.csv(products, file.path(outdir, "products.csv"), row.names = FALSE)
```
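Because one HS code can carry more than one presentation/state pair across the hs-hs-match files, the left join above can return several rows per `hs6`. Before deciding which codes go on the website, a quick check can list those codes. This is a base-R sketch on toy data; it assumes the lower-cased `hs6`, `presentation`, and `state` columns produced above, and the `multi_hs6` helper and toy values are hypothetical.

```r
# Hypothetical helper: return hs6 codes that appear on more than one row
multi_hs6 <- function(products) {
  counts <- table(products$hs6)
  sort(names(counts)[counts > 1])
}

# Toy table (illustrative values, not ARTIS categories)
toy_products <- data.frame(
  hs6          = c("030211", "030211", "160414"),
  presentation = c("whole", "fillet", "other"),
  state        = c("fresh", "fresh", "preserved")
)
dup_codes <- multi_hs6(toy_products)
print(dup_codes)
```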

ONLY HS codes that appear in ARTIS in any HS version

This is a somewhat separate issue

@theamarks (Member, Author)

Questions 2025_01_21

@jagephart

For the sciname table, I pulled missing taxonomic ranks from the WoRMS API and retained only the rank columns already in sciname. However, limiting the rank columns this way makes me question whether the remaining taxonomic information makes sense for actinopteri and hippoglossinae. As we discussed last week, actinopteri is a superclass, so under the current sciname classification schema its row ends up with a sciname value that does not match any of the rank columns.

This isn't appropriate for the website tree viz, right? The sciname would need to match a value in one of the taxonomic classification columns? Would it be a problem if I changed the sciname value now, or could that break the website's taxa filtering, depending on how it is set up?
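The condition raised above (each sciname should match a value in one of the rank columns) can be checked directly to find every row like actinopteri. This is a minimal base-R sketch; the rank column names and toy rows are assumptions rather than the real sciname schema, and `flag_unmatched_sciname` is a hypothetical helper.

```r
# Hypothetical helper: TRUE for rows whose sciname appears in none of rank_cols
flag_unmatched_sciname <- function(taxa, rank_cols) {
  ranks <- as.matrix(taxa[, rank_cols, drop = FALSE])
  # A length-nrow vector is recycled down each column, so cell [i, j] is
  # compared with taxa$sciname[i]; NA cells are ignored via na.rm
  matched <- rowSums(ranks == taxa$sciname, na.rm = TRUE) > 0
  unname(!matched)
}

# Toy rows (illustrative only; rank columns are assumed)
toy_sciname <- data.frame(
  sciname   = c("gadus morhua", "hippoglossinae", "actinopteri"),
  family    = c("gadidae", "pleuronectidae", NA),
  subfamily = c(NA, "hippoglossinae", NA),
  genus     = c("gadus", NA, NA),
  species   = c("gadus morhua", NA, NA)
)
unmatched <- flag_unmatched_sciname(toy_sciname,
                                    c("family", "subfamily", "genus", "species"))
print(toy_sciname$sciname[unmatched])
```

Running the same check on the full table would show how many rows a sciname change would touch before anything on the website is altered.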


Projects: Status 🏗 In Progress
Development: No branches or pull requests
1 participant