Portage data workflow #11

Closed
wants to merge 87 commits into from

bpbond (Member) commented Nov 15, 2022

Sample workflow for processing raw data to L0. This includes:

  • A Quarto file that takes all the *.dat files in the Raw/ folder, reads them in, parses out the Logger and Table info, and writes them to the L0/ folder. It then generates a summary table.
  • A driver script that renders the Quarto file, putting the output html into the L0 folder as documentation.
  • Sample Raw/ data (see below)

Sample html documentation file: https://rpubs.com/bpbond/970986

Raw folder before processing:

Compass_PTR_UP_313_WaterLevel600A.dat
Compass_PTR_UP_313_TerosTableA.dat
Compass_PTR_UP_313_SapflowA.dat

L0 folder after processing:

Compass_PTR_UP_313_SapflowA_L0.csv
Compass_PTR_UP_313_TerosTableA_L0.csv
Compass_PTR_UP_313_WaterLevel600A_L0.csv
raw_to_L0_20221115145825.html   <--- see link above
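To make the renaming concrete, here is a minimal Python sketch of the raw-to-L0 naming step (the workflow itself is written in R/Quarto, so this is only an illustration). It assumes the table name is whatever follows the last underscore in the file stem and the logger name is everything before it, which fits the `Compass_PTR_UP_313` logger name used for the L1a folders later in this thread; the real code may instead read this information from the .dat file header. `parse_raw_name` and `l0_name` are hypothetical helper names.

```python
from pathlib import Path


def parse_raw_name(filename: str) -> tuple[str, str]:
    """Split a raw .dat file name into (logger, table).

    Assumption: the table name follows the last underscore, e.g.
    Compass_PTR_UP_313_SapflowA.dat -> ("Compass_PTR_UP_313", "SapflowA").
    """
    stem = Path(filename).stem
    logger, sep, table = stem.rpartition("_")
    if not sep:
        raise ValueError(f"Unexpected raw file name: {filename}")
    return logger, table


def l0_name(filename: str) -> str:
    """Map a raw .dat file name to its L0 CSV name."""
    return f"{Path(filename).stem}_L0.csv"


if __name__ == "__main__":
    for raw in ["Compass_PTR_UP_313_WaterLevel600A.dat",
                "Compass_PTR_UP_313_TerosTableA.dat",
                "Compass_PTR_UP_313_SapflowA.dat"]:
        logger, table = parse_raw_name(raw)
        print(logger, table, l0_name(raw))
```

Under that naming assumption, `parse_raw_name("Compass_PTR_UP_313_SapflowA.dat")` returns `("Compass_PTR_UP_313", "SapflowA")`, and `l0_name` reproduces the L0 file names listed above.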

This is very much a first pass; feedback welcome, @stephpenn1.

bpbond (Member, Author) commented Nov 16, 2022

Added what the SERC folks call the "normalization" step: matching loggernet variables with their design_link entries; the updated data files are moved to L1_normalize/.
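A minimal Python sketch of what the normalization match could look like, assuming the observations are already in long format and the design table can be keyed on logger, table, and LoggerNet variable name. Only `design_link` comes from the comment above; the field names `Logger`, `Table`, and `loggernet_variable` and the function `normalize` are hypothetical, and the actual workflow is R code that may do this differently. Unmatched rows are returned separately rather than silently dropped, so they can be logged.

```python
# Hypothetical key: (logger, table, LoggerNet variable name)
DesignKey = tuple[str, str, str]


def normalize(rows: list[dict],
              design: dict[DesignKey, str]) -> tuple[list[dict], list[dict]]:
    """Attach a design_link to each long-format observation.

    Returns (matched, unmatched); input rows are not modified.
    """
    matched: list[dict] = []
    unmatched: list[dict] = []
    for row in rows:
        key = (row["Logger"], row["Table"], row["loggernet_variable"])
        link = design.get(key)
        if link is None:
            unmatched.append(row)  # no design_link entry for this variable
        else:
            matched.append({**row, "design_link": link})
    return matched, unmatched
```

For example, with a made-up design entry mapping `("Compass_PTR_UP_313", "SapflowA", "Sapflow_1")` to `"sapflow_tree_1"`, a `Sapflow_1` row gains that `design_link`, while a `Sapflow_2` row ends up in `unmatched`.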

Example html output: https://rpubs.com/bpbond/971152

Need to put the expand_df code somewhere it'll get tested, probably in compasstools?

bpbond changed the title "Beginning of Portage data workflow" to "Portage data workflow" Nov 30, 2022
bpbond (Member, Author) commented Nov 30, 2022

@stephpenn1 FYI, in e86ca27 I added a skeleton L1a.qmd file. All it does is move/split the data into yyyy_mm_<logger> folders, nothing else, since we agreed you'd write the L1a code. I just wanted to get this in place so I can proceed to the L1b step.
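The yyyy_mm_<logger> split can be sketched in Python as below. This is an illustration, not the L1a.qmd code: the `TIMESTAMP`, `Logger`, and `Table` field names and both functions are assumptions, and the `<logger>_<table>_yyyy_mm.csv` file-name pattern is taken from the L1a folder listing in the pipeline visualization later in this thread.

```python
from collections import defaultdict
from datetime import datetime
from pathlib import PurePosixPath


def l1a_path(logger: str, table: str, timestamp: datetime) -> PurePosixPath:
    """Output path for one observation:
    yyyy_mm_<logger>/<logger>_<table>_yyyy_mm.csv"""
    ym = timestamp.strftime("%Y_%m")
    return PurePosixPath(f"{ym}_{logger}") / f"{logger}_{table}_{ym}.csv"


def split_by_month(rows: list[dict]) -> dict[PurePosixPath, list[dict]]:
    """Group rows (each with Logger, Table, TIMESTAMP) by output path."""
    groups: dict[PurePosixPath, list[dict]] = defaultdict(list)
    for row in rows:
        path = l1a_path(row["Logger"], row["Table"], row["TIMESTAMP"])
        groups[path].append(row)
    return dict(groups)
```

Because the path depends only on logger, table, and year-month, data that spans a month boundary (say, late August into September) lands in two separate folders.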

bpbond (Member, Author) commented Dec 2, 2022

Visualization of files moving through the processing pipeline:


=========================================
 20221202.1317 Starting L0 
|
|- Raw/ 
|	|- Compass_PTR_UP_313_SapflowA.dat 
|	|- Compass_PTR_UP_313_TerosTableA.dat 
|	|- Compass_PTR_UP_313_WaterLevel600A.dat 
|	|- README.md 
|	|
|	|- done/ 
|		|- README.md 
|
|- L0/ 
|	|- README.md 
|
|- L1_normalize/ 
|	|- README.md 
|
|- L1a/ 
|	|- README.md 
|
|- L1b/ 
	|- README.md 

=========================================
 20221202.1317 Starting L1_normalize 
|
|- Raw/ 
|	|- Compass_PTR_UP_313_SapflowA.dat 
|	|- Compass_PTR_UP_313_TerosTableA.dat 
|	|- Compass_PTR_UP_313_WaterLevel600A.dat 
|	|- README.md 
|	|
|	|- done/ 
|		|- README.md 
|
|- L0/ 
|	|- Compass_PTR_UP_313_SapflowA_L0.csv 
|	|- Compass_PTR_UP_313_TerosTableA_L0.csv 
|	|- Compass_PTR_UP_313_WaterLevel600A_L0.csv 
|	|- README.md 
|
|- L1_normalize/ 
|	|- README.md 
|
|- L1a/ 
|	|- README.md 
|
|- L1b/ 
	|- README.md 

=========================================
 20221202.1317 Starting L1a 
|
|- Raw/ 
|	|- Compass_PTR_UP_313_SapflowA.dat 
|	|- Compass_PTR_UP_313_TerosTableA.dat 
|	|- Compass_PTR_UP_313_WaterLevel600A.dat 
|	|- README.md 
|	|
|	|- done/ 
|		|- README.md 
|
|- L0/ 
|	|- README.md 
|
|- L1_normalize/ 
|	|- Compass_PTR_UP_313_SapflowA_L1_norm.csv 
|	|- Compass_PTR_UP_313_TerosTableA_L1_norm.csv 
|	|- Compass_PTR_UP_313_WaterLevel600A_L1_norm.csv 
|	|- README.md 
|
|- L1a/ 
|	|- README.md 
|
|- L1b/ 
	|- README.md 

=========================================
 20221202.1317 Starting L1b 
|
|- Raw/ 
|	|- Compass_PTR_UP_313_SapflowA.dat 
|	|- Compass_PTR_UP_313_TerosTableA.dat 
|	|- Compass_PTR_UP_313_WaterLevel600A.dat 
|	|- README.md 
|	|
|	|- done/ 
|		|- README.md 
|
|- L0/ 
|	|- README.md 
|
|- L1_normalize/ 
|	|- README.md 
|
|- L1a/ 
|	|- README.md 
|	|
|	|- 2022_11_Compass_PTR_UP_313/ 
|		|- Compass_PTR_UP_313_SapflowA_2022_11.csv 
|		|- Compass_PTR_UP_313_WaterLevel600A_2022_11.csv 
|		|- README.md 
|
|- L1b/ 
	|- README.md 

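A directory listing in roughly this style can be produced by a short recursive function. The Python sketch below is hypothetical (the driver script's actual code isn't shown here): it lists files before subfolders, writes a `|` spacer line before each subfolder, and drops the continuing bar under the last subfolder, as in the listing above. Entries are sorted by name, so folders come out alphabetically (L0/ before Raw/) rather than in the pipeline order used above.

```python
import tempfile
from pathlib import Path


def tree_lines(root: Path, prefix: str = "") -> list[str]:
    """List a folder's contents in the driver-log style:
    files first, then each subfolder preceded by a '|' spacer line."""
    entries = sorted(root.iterdir(), key=lambda p: p.name)
    files = [p for p in entries if p.is_file()]
    dirs = [p for p in entries if p.is_dir()]
    lines = [f"{prefix}|- {f.name}" for f in files]
    for i, d in enumerate(dirs):
        lines.append(f"{prefix}|")
        lines.append(f"{prefix}|- {d.name}/")
        # Children of the last subfolder get no continuing '|' bar
        child_prefix = prefix + ("\t" if i == len(dirs) - 1 else "|\t")
        lines.extend(tree_lines(d, child_prefix))
    return lines


if __name__ == "__main__":
    # Build a tiny sample tree in a temporary folder and print it
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "Raw" / "done").mkdir(parents=True)
        (root / "Raw" / "Compass_PTR_UP_313_SapflowA.dat").touch()
        (root / "Raw" / "done" / "README.md").touch()
        (root / "L0").mkdir()
        (root / "L0" / "README.md").touch()
        print("\n".join(tree_lines(root)))
```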
bpbond added 27 commits March 5, 2023 19:42
* Name all chunks
* Add GitHub Action and badge
* Move units and OOB checks from L1a to L1_normalize
* Add missing design_link data
* Change Quarto file params to relative to DATA_ROOT
* Clean up L1a functionality; move timestamp rounding to L1b
* Remove other new variables code
* L1a don't use templates; written to site-year-month folders
* Rearrange columns
* Disable L1b
* Fix L0 data-writing location problem
* Update test data: spans August and September, adds CheckTable
* Use ALL Portage data for testing
* Update design table to be consistent with Google Sheet
* Update metadata_vars to be consistent with google sheet
* readr/vroom error on GA; try write.csv
* Improve handling and logging of unexpected errors
* Quarto files now write error info to driver log
* L0 pivots data to long and assigns unique observation IDs
* Minor fixes to read_csv_group
* Initial pass at writing L1 metadata
* Update helpers.R
* OK to have missing year-month combinations; remove dupes in L0
* Print Git commit in reproducibility section
* Drop site column per @stephpenn1 #49
* Update L1_metadata_template.txt
* Guarantee files not overwritten by adding a short hash
* Remove dplyr from L1_normalize
* Update check-workflow.yaml
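The commit that pivots L0 data to long format and assigns unique observation IDs can be illustrated with a minimal Python sketch. The ID scheme here (a caller-supplied prefix, such as the source file name, plus a running counter) and the names `to_long`, `variable`, `value`, and `ID` are assumptions for illustration; the workflow's actual R code may construct IDs differently.

```python
def to_long(rows: list[dict], id_cols: tuple[str, ...], id_prefix: str) -> list[dict]:
    """Convert wide records (one column per measured variable) into long
    records holding one value each, tagged with a unique observation ID."""
    long_rows: list[dict] = []
    for row in rows:
        for col, value in row.items():
            if col in id_cols:
                continue  # identifying columns are copied, not pivoted
            rec = {c: row[c] for c in id_cols}
            rec["variable"] = col
            rec["value"] = value
            # Hypothetical ID scheme: prefix plus a running counter
            rec["ID"] = f"{id_prefix}-{len(long_rows) + 1}"
            long_rows.append(rec)
    return long_rows
```

A wide table with n rows and k measured columns becomes n × k long rows, each with its own ID.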
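The out-of-bounds (OOB) checks that were moved into L1_normalize can be sketched as flagging each long-format value against a per-variable `[low, high]` range. This Python sketch is illustrative only: the `BOUNDS` table, its variable names and limits, and the `F_OOB` flag column are made up; in the workflow the limits would presumably come from the design/metadata tables. Missing (NaN) values and variables without bounds get `None` rather than being declared in or out of bounds.

```python
import math

# Made-up bounds for illustration; real limits would come from the
# workflow's metadata/design tables, whose columns aren't shown here.
BOUNDS: dict[str, tuple[float, float]] = {
    "soil_temp": (-20.0, 50.0),
    "water_level": (0.0, 5.0),
}


def add_oob_flags(rows: list[dict],
                  bounds: dict[str, tuple[float, float]]) -> list[dict]:
    """Return copies of long-format rows with an F_OOB flag:
    1 = outside [low, high], 0 = inside, None = no bounds or NaN value."""
    flagged = []
    for row in rows:
        flag = None
        limits = bounds.get(row["variable"])
        value = row["value"]
        if limits is not None and not math.isnan(value):
            low, high = limits
            flag = 0 if low <= value <= high else 1
        flagged.append({**row, "F_OOB": flag})
    return flagged
```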
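One way to guarantee that output files are never overwritten, as in the short-hash commit in this list, is to append a few hex digits of a hash of the file's contents to its name. This Python sketch assumes a content hash (SHA-256 truncated to six characters); the commit message doesn't say what is actually hashed, and `hashed_name` and `write_unique` are hypothetical.

```python
import hashlib
from pathlib import Path


def hashed_name(base: str, data: bytes, n: int = 6, ext: str = ".csv") -> str:
    """Append the first n hex digits of the data's SHA-256 digest to base."""
    digest = hashlib.sha256(data).hexdigest()[:n]
    return f"{base}_{digest}{ext}"


def write_unique(folder: Path, base: str, data: bytes) -> Path:
    """Write data under a content-hashed name, refusing to overwrite an
    existing file whose contents differ (i.e. a truncated-hash collision)."""
    path = folder / hashed_name(base, data)
    if path.exists() and path.read_bytes() != data:
        raise FileExistsError(path)
    path.write_bytes(data)
    return path
```

With these defaults, `hashed_name("X_L0", b"abc")` returns `"X_L0_ba7816.csv"`. Identical data always maps to the same name, so re-running a step is idempotent, while changed data gets a new file instead of replacing the old one.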
bpbond (Member, Author) commented Nov 15, 2023

Superseded by #57

bpbond closed this Nov 15, 2023
bpbond deleted the portage branch November 15, 2023 15:36
1 participant