forked from poissonconsulting/fishbc
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
start on fishbc issue NewGraphEnvironment/fish_passage_fraser_2023_re…
1 parent 3bb4f4b, commit 4b1dc06
Showing 6 changed files with 25,782 additions and 0 deletions.
@@ -0,0 +1,89 @@

1. Determine which columns in `data-raw/cdc/cdc.csv` are needed to tie Species Code to Element Code. Make a new csv called `xref_sp_element_codes.csv` (or something similar) and burn just those columns to it.

```{r select-columns}
## select species codes and element codes from cdc.csv
xref_sp_element_codes <- cdc |>
  dplyr::select("Species Code", "Element Code")
## burn to csv
readr::write_csv(xref_sp_element_codes, "data-raw/cdc/xref_sp_element_codes.csv")
```
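
Since the xref is later joined by Element Code, it may be worth checking that each Element Code maps to a single Species Code before burning it. A minimal sketch on made-up data follows; `xref_toy`, the codes in it, and the idea that some rows lack a Species Code are assumptions for illustration, not facts about `cdc.csv`.

```r
library(dplyr)

# Hypothetical toy stand-in for the two columns pulled from cdc.csv
xref_toy <- tibble(
  `Species Code` = c("BT", "CH", "CH", NA),
  `Element Code` = c("E1", "E2", "E2", "E3")
)

xref_clean <- xref_toy |>
  filter(!is.na(`Species Code`)) |> # rows without a species code cannot be cross-referenced
  distinct()                        # drop exact duplicate pairs

# Element codes that still map to more than one species code (ideally none)
codes_ambiguous <- xref_clean |>
  count(`Element Code`) |>
  filter(n > 1) |>
  pull(`Element Code`)
```

If `codes_ambiguous` is not empty, a later `left_join()` by Element Code would duplicate rows, so those codes would need sorting out first.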

2. Download both exports from the cdc website into `data-raw`. Keep their names as is if they are descriptive (can't remember).

3. In a new `data-raw/cdc.R` file, read in the xl files with `readxl` (I think you need to open them first and close them, or some weird thing, to avoid an error; link to the help url in your .R file if you run into it) and export both of those as csvs with their original exported names in `data-raw`.

```{r import-data}
## Read in results from cdc website
results_raw <- readr::read_csv("data-raw/cdc/resultsExport.csv")
## Read in conservation status info from cdc website
constat_raw <- readr::read_csv("data-raw/cdc/ConsStatusRptExport.csv")
```
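
The xl-to-csv part of step 3 could look something like the sketch below. The helper names `csv_path_for()` and `xl_to_csv()` are made up here, and the `.xlsx` extension and download location are assumptions; only `readxl::read_excel()` and `readr::write_csv()` are real package functions.

```r
# Build the csv path that keeps the export's original file name
csv_path_for <- function(path_xl, dir_out = "data-raw/cdc") {
  file.path(dir_out, paste0(tools::file_path_sans_ext(basename(path_xl)), ".csv"))
}

# Read the first sheet of an Excel export and write it out as csv
xl_to_csv <- function(path_xl, dir_out = "data-raw/cdc") {
  path_csv <- csv_path_for(path_xl, dir_out)
  readr::write_csv(readxl::read_excel(path_xl), path_csv)
  invisible(path_csv)
}

# Hypothetical download locations; adjust to wherever the exports were saved
# xl_to_csv("data-raw/cdc/resultsExport.xlsx")
# xl_to_csv("data-raw/cdc/ConsStatusRptExport.xlsx")
```

Keeping the path logic in its own small function makes it easy to confirm the csv names match what the `import-data` chunk reads.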

4. In `data-raw/cdc.R` join both exports from the cdc website together by Element Code, excluding any duplicated columns, then join to `xref_sp_element_codes`. If there are missing Species Code entries in any rows we have new codes to find (don't know how yet), and if there are fewer ``unique(cdc$`Species Code`)`` than before we lost some. Can document that in data

```{r join-data}
## Join the two data frames
cdc_prep1 <- dplyr::left_join(results_raw, constat_raw,
                              by = c("Element Code",
                                     "Scientific Name",
                                     "English Name")) |>
  dplyr::select(-Provincial) ## remove duplicated column
```
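
The two checks described in step 4 (new codes to find, and codes we lost) could be sketched roughly as below. The data is a toy stand-in: `xref_toy`, `joined_toy` and every code in them are invented for illustration and are not taken from the cdc exports.

```r
library(dplyr)

# Hypothetical toy data standing in for the xref and the joined exports
xref_toy <- tibble(
  `Species Code` = c("BT", "CH", "CO"),
  `Element Code` = c("E1", "E2", "E3")
)
joined_toy <- tibble(
  `Element Code` = c("E1", "E2", "E4"),
  `Scientific Name` = c("sp. one", "sp. two", "sp. four")
)

cdc_toy <- left_join(joined_toy, xref_toy, by = "Element Code")

# Rows with no Species Code: element codes we need to find a species code for
codes_new <- cdc_toy |>
  filter(is.na(`Species Code`)) |>
  pull(`Element Code`)

# Species codes in the xref that no longer appear after the join: codes we lost
codes_lost <- setdiff(xref_toy$`Species Code`, cdc_toy$`Species Code`)
```

Here `codes_new` holds the element code with no match in the xref and `codes_lost` holds the species code that dropped out, which is the kind of thing that could be documented in the data.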

## Some issues:
- We need to remove all columns that are not present in cdc.csv
- Then we need to rename all columns to match those in cdc.csv
- We need to separate the Global review date in parentheses from the global ranking

```{r}
## let's compare column names to see what we need to remove
dplyr::setdiff(names(cdc_prep1), names(cdc))
dplyr::setdiff(names(cdc), names(cdc_prep1))
## Dates were not being extracted because "Global Status" in quotes is a
## string literal inside mutate(); backticks refer to the column itself
cdc_prep2 <- cdc_prep1 |>
  ## We need to rename the columns to match those in cdc.csv
  dplyr::rename("Prov Status" = "Provincial Status",
                "Prov Status Review Date" = "Date Status Last Reviewed",
                "Global Status" = "Global") |>
  ## we need to separate the Global review date in parentheses from the global ranking
  dplyr::mutate(
    ## str_extract() returns NA where there is no "(yyyy)", so case_when() is not needed
    "Global Status Review Date" = stringr::str_extract(`Global Status`, "\\(\\d{4}\\)"),
    "Global Status" = stringr::str_replace(`Global Status`, "\\s*\\(\\d{4}\\)", "")
  ) |>
  dplyr::relocate("Global Status Review Date", .after = "Global Status") |>
  ## We need to remove all columns that are not present in cdc.csv
  dplyr::select(dplyr::any_of(names(cdc)))
```
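
A small sketch of the date extraction on made-up rank strings, showing why the column has to be referenced with backticks rather than quotes inside `mutate()`. The values in `global_raw` are assumptions about what the export looks like; the real format may differ.

```r
library(stringr)

# Hypothetical rank strings; the real format in the export may differ
global_raw <- c("G5 (2016)", "G4G5 (2010)", "GNR")

# Year in parentheses, or NA where there is none
review_date <- str_extract(global_raw, "\\(\\d{4}\\)")

# Rank with the parenthesised year (and any whitespace before it) removed
global_status <- str_replace(global_raw, "\\s*\\(\\d{4}\\)", "")

# A quoted column name is tested as literal text, not as the column's data,
# so this never finds a date no matter what the column holds
literal_hit <- str_detect("Global Status", "\\(\\d{4}\\)")
```

Note the extracted value keeps its parentheses; if `cdc.csv` stores the bare year, the pattern would need a further tweak.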
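
```{r update-and-compare}
## add Species Code from the xref and keep the cdc.csv column order
## (uses cdc_prep2 because its columns are already renamed to match cdc.csv)
cdc_updated <- dplyr::left_join(cdc_prep2, xref_sp_element_codes, by = "Element Code") |>
  dplyr::select(dplyr::all_of(names(cdc)))

## compare the existing data with the update
waldo::compare(cdc, cdc_updated)
```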
5. Burn over `data-raw/cdc/cdc.csv`.
6. Run through `fishbc/data-raw/data-raw.R` and run `usethis::use_data(cdc, overwrite = TRUE)`, then build the repo locally (just like fpr).

Worst that can happen is we redo using a new branch.