start on fishbc issue NewGraphEnvironment/fish_passage_fraser_2023_reporting#75
lucy-schick · Jul 11, 2024 · 4b1dc06
1 parent 3bb4f4b

Showing 6 changed files with 25,782 additions and 0 deletions.
diff --git a/R/cdc.Rmd b/R/cdc.Rmd
@@ -0,0 +1,89 @@
+
+
+1. Determine which columns in `data-raw/cdc/cdc.csv` are needed to tie Species Code to Element Code. Make a new csv called `xref_sp_element_codes.csv` (or something similar) and burn just those columns to it.
+
+```{r select-columns}
+library(dplyr)
+
+## read the existing cdc.csv
+cdc <- readr::read_csv("data-raw/cdc/cdc.csv")
+
+## select species codes and element codes from cdc.csv
+xref_sp_element_codes <- cdc |>
+  select(`Species Code`, `Element Code`)
+
+## burn to csv
+readr::write_csv(xref_sp_element_codes, "data-raw/cdc/xref_sp_element_codes.csv")
+```
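A minimal base-R sketch of the crosswalk step, run on a made-up table in place of the real `cdc.csv`. The helper names `make_xref()` and `ambiguous_elements()` and the placeholder codes are hypothetical. Dropping repeated Species Code/Element Code pairs, and listing any Element Code tied to more than one Species Code, are assumptions about what is worth checking before the later join, because a duplicated key in the right-hand table makes `left_join()` return one row per match.

```r
# Hypothetical helper: keep only the two linking columns and drop repeated pairs
make_xref <- function(cdc) {
  xref <- unique(cdc[, c("Species Code", "Element Code")])
  rownames(xref) <- NULL
  xref
}

# Hypothetical helper: Element Codes linked to more than one Species Code
# (these would duplicate rows when the crosswalk is joined back on Element Code)
ambiguous_elements <- function(xref) {
  counts <- table(xref[["Element Code"]])
  names(counts)[counts > 1]
}

# Made-up rows standing in for cdc.csv (element codes are placeholders)
cdc_toy <- data.frame(
  `Species Code` = c("CO", "CO", "CH", "BT"),
  `Element Code` = c("ELEM1", "ELEM1", "ELEM2", "ELEM3"),
  `Other Column` = c("a", "a", "b", "c"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

xref <- make_xref(cdc_toy)
xref
ambiguous_elements(xref)
```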
+
+2. Download both exports from the CDC website into `data-raw/cdc/`. Keep their file names as is if they are descriptive (can't remember).
+
+3. In a new `data-raw/cdc.R` file, read in the Excel files with `readxl` (I think you need to open them first and close them, or something weird like that, to avoid an error; link to the help URL in your .R file if you run into it) and export both as csvs with their original exported names in `data-raw/cdc/`.
+
+```{r import-data}
+## Read in results from cdc website
+results_raw <- readr::read_csv("data-raw/cdc/resultsExport.csv")
+
+## Read in conservation status info from cdc website
+constat_raw <- readr::read_csv("data-raw/cdc/ConsStatusRptExport.csv")
+```
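The chunk above reads csv copies of the two exports; one way to produce those copies from the downloaded Excel files is sketched here. It assumes the exports sit in `data-raw/cdc/` with an `.xls` or `.xlsx` extension (not confirmed) and that `readxl` and `readr` are installed. `csv_path_for()` and `xl_to_csv()` are hypothetical names. The loop does nothing when no Excel files are found, so the sketch is safe to source anywhere.

```r
# Hypothetical helper: same folder and base name as the Excel export, .csv extension
csv_path_for <- function(xl_path) {
  paste0(tools::file_path_sans_ext(xl_path), ".csv")
}

# Hypothetical helper: read one Excel export with readxl and burn it to csv with readr
xl_to_csv <- function(xl_path) {
  dat <- readxl::read_excel(xl_path)
  out <- csv_path_for(xl_path)
  readr::write_csv(dat, out)
  invisible(out)
}

# Assumes the downloaded exports live in data-raw/cdc/ as .xls or .xlsx files
xl_files <- list.files("data-raw/cdc", pattern = "\\.xlsx?$", full.names = TRUE)
for (f in xl_files) {
  xl_to_csv(f)
}
```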
+
+
+4. In `data-raw/cdc.R`, join both exports from the CDC website together by Element Code, excluding any duplicated columns, then join to `xref_sp_element_codes`. If any rows are missing a Species Code, we have new codes to find (don't know how yet), and if there are fewer ``unique(cdc$`Species Code`)`` than before, we lost some. Can document that in data.
+
+```{r join-data}
+
+## join the two data frames
+cdc_prep1 <- left_join(results_raw, constat_raw,
+                       by = c("Element Code",
+                              "Scientific Name",
+                              "English Name")) |>
+  select(-Provincial) ## remove duplicated column
+```
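The two checks described in step 4 (rows left without a Species Code after the join, and Species Codes from the old `cdc.csv` that disappeared) could be summarised with a small helper. This is a base-R sketch: `check_species_join()` and the toy tables are hypothetical, and it assumes the joined table keeps `Element Code` and `Species Code` columns named as in `cdc.csv`.

```r
# Hypothetical helper: what the join to xref_sp_element_codes gained or lost
check_species_join <- function(joined, cdc_old) {
  sp_new <- joined[["Species Code"]]
  list(
    # Element Codes with no Species Code: new codes to find
    missing_codes = joined[["Element Code"]][is.na(sp_new)],
    # Species Codes in the old cdc.csv that no longer appear
    lost_codes = setdiff(unique(cdc_old[["Species Code"]]), unique(sp_new))
  )
}

# Made-up tables: one new Element Code without a Species Code, one species dropped
cdc_old <- data.frame(`Species Code` = c("CO", "CH", "BT"),
                      check.names = FALSE, stringsAsFactors = FALSE)
joined_toy <- data.frame(`Element Code` = c("ELEM1", "ELEM2", "ELEM9"),
                         `Species Code` = c("CO", "CH", NA),
                         check.names = FALSE, stringsAsFactors = FALSE)

check <- check_species_join(joined_toy, cdc_old)
check$missing_codes
check$lost_codes
```

Comparing counts of unique codes flags a drop, but a lost code and a new one would cancel out; `setdiff()` names the codes that went missing.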
+
+## Some issues
+- We need to remove all columns that are not present in `cdc.csv`.
+- Then we need to rename all columns to match those in `cdc.csv`.
+- We need to separate the global review date in parentheses from the global ranking.
+
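The third issue, pulling the year in parentheses out of the global rank, can be tried on a few sample strings before running it on the real data. This base-R sketch assumes values shaped like `G5 (2016)`, the format targeted by the `\\(\\d{4}\\)` pattern in the chunk that follows; `split_global_status()` and the sample values are hypothetical. Ranks without a year keep their text and get `NA` for the date.

```r
# Hypothetical helper: split "G5 (2016)" into a rank ("G5") and a review year ("2016")
split_global_status <- function(x) {
  has_year <- grepl("\\(\\d{4}\\)", x)
  # capture the four digits inside the parentheses; NA when there is no year
  year <- ifelse(has_year, sub(".*\\((\\d{4})\\).*", "\\1", x), NA_character_)
  # drop the "(yyyy)" part and any space before it to leave just the rank
  rank <- trimws(sub("\\s*\\(\\d{4}\\)", "", x))
  data.frame(`Global Status` = rank,
             `Global Status Review Date` = year,
             check.names = FALSE, stringsAsFactors = FALSE)
}

# Made-up sample values, including one without a year and one missing value
status_split <- split_global_status(c("G5 (2016)", "G3G4", NA))
status_split
```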
+```{r tidy-columns}
+library(stringr)
+
+## let's compare column names to see what we need to remove
+dplyr::setdiff(names(cdc_prep1), names(cdc))
+
+dplyr::setdiff(names(cdc), names(cdc_prep1))
+
+## START HERE
+
+cdc_prep2 <- cdc_prep1 |>
+  ## rename the columns to match those in cdc.csv
+  rename(`Prov Status` = `Provincial Status`,
+         `Prov Status Review Date` = `Date Status Last Reviewed`,
+         `Global Status` = Global) |>
+  ## separate the global review year in parentheses from the global ranking.
+  ## backticks refer to the column; a quoted "Global Status" is a literal string and never contains a year
+  mutate(`Global Status Review Date` = str_extract(`Global Status`, "(?<=\\()\\d{4}(?=\\))"),
+         `Global Status` = str_trim(str_remove(`Global Status`, "\\s*\\(\\d{4}\\)"))) |>
+  relocate(`Global Status Review Date`, .after = `Global Status`) |>
+  ## remove all columns that are not present in cdc.csv (also matches its column order)
+  select(any_of(names(cdc)))
+```
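+
+```{r join-xref}
+## join to the crosswalk to pick up Species Code, then keep only the cdc.csv columns
+cdc_updated <- left_join(cdc_prep2, xref_sp_element_codes, by = "Element Code") |>
+  select(all_of(names(cdc)))
+
+## compare the rebuilt table against the original cdc.csv
+waldo::compare(cdc, cdc_updated)
+```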
+
+5. Burn over `data-raw/cdc/cdc.csv`.
+
+6. Run through `fishbc/data-raw/data-raw.R`, run `usethis::use_data(cdc, overwrite = TRUE)`, and build the repo locally (just like fpr).
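For step 5, a guard that refuses to write when the column names or their order differ from the existing file would keep a half-finished `cdc_updated` from replacing good data. `overwrite_if_same_columns()` is a hypothetical helper, demonstrated on a temporary file. It uses base `utils` so the sketch runs without extra packages; in the repo, `readr::write_csv()` (as used above) only quotes fields when needed, whereas `write.csv()` quotes every character field by default, so readr keeps the file format and git diffs consistent.

```r
# Hypothetical guard: overwrite a csv only if the new data has identical column names, in order
overwrite_if_same_columns <- function(new, path) {
  old_names <- names(utils::read.csv(path, check.names = FALSE, nrows = 1))
  if (!identical(old_names, names(new))) {
    stop("Column names differ from ", path, "; not overwriting")
  }
  utils::write.csv(new, path, row.names = FALSE)
  invisible(path)
}

# Demo on a temporary file rather than the real data-raw/cdc/cdc.csv
tmp <- tempfile(fileext = ".csv")
old <- data.frame(`Species Code` = "CO", `Prov Status` = "placeholder",
                  check.names = FALSE, stringsAsFactors = FALSE)
utils::write.csv(old, tmp, row.names = FALSE)

# same columns: the file is overwritten
written <- overwrite_if_same_columns(old, tmp)
# different columns: the guard stops and the error message is captured
blocked <- tryCatch(overwrite_if_same_columns(data.frame(x = 1), tmp),
                    error = function(e) conditionMessage(e))
```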
+
+Worst that can happen is we redo it on a new branch.
+
+
+