Separate scripts for correcting typos and renaming domains #173

Open
andkov opened this issue Feb 27, 2017 · 5 comments

@andkov
Member

andkov commented Feb 27, 2017

Currently these two tasks are accomplished by a single script, `./manipulation/rename-classify.R`.

Such practice is far from optimal for the following reasons:

  • once all models are generated automatically, spelling correction will be obsolete
  • we may need to reorganize domains for a specific study to increase the bin size within domains
  • different tracks may require different domain groupings

For these and other reasons, it is advisable to develop a function that takes a catalog and an external CSV with grouping instructions, so that this procedure can be applied immediately before table or graph production and NOT during the manipulation phase.
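A minimal base-R sketch of such a function is below. It assumes, hypothetically, that the catalog has a `domain` column and that the grouping CSV has `domain_old` and `domain_new` columns; the function name `apply_domain_grouping()` and the domain labels are illustrative, not taken from the project.

```r
# Hypothetical sketch: regroup domains in a model catalog right before
# producing a table or graph. The column names `domain`, `domain_old`, and
# `domain_new` are assumptions, not the project's actual schema.
apply_domain_grouping <- function(catalog, grouping) {
  # Position of each catalog domain in the grouping instructions (NA if absent)
  idx <- match(catalog$domain, grouping$domain_old)
  # Domains without an instruction keep their original label
  catalog$domain <- ifelse(is.na(idx), catalog$domain, grouping$domain_new[idx])
  catalog
}

# Made-up example data
catalog <- data.frame(
  model_number = c("b1", "b2", "b3"),
  domain       = c("memory", "fluid", "knowledge"),
  stringsAsFactors = FALSE
)
# In practice this would come from the external CSV, e.g.
# grouping <- read.csv("./grouping.csv", stringsAsFactors = FALSE)
grouping <- data.frame(
  domain_old = c("memory", "fluid"),
  domain_new = c("cognition", "cognition"),
  stringsAsFactors = FALSE
)

regrouped <- apply_domain_grouping(catalog, grouping)
regrouped$domain
```

Because the grouping lives in a CSV rather than in the manipulation script, each track could keep its own grouping file and call the same function just before building its tables or graphs.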

@andkov andkov self-assigned this Feb 27, 2017
@wibeasley
Member

@andkov, for the renaming part of the script (currently at line 172), consider pulling that out into a metadata CSV with three columns: name_old, name_new, and comments.

It may not be worth messing with now, unless there are multiple name_olds that map to a single name_new. For instance, say one set of scripts produces aa_TAU_00_est, while another (renegade) set of scripts had produced aa_TAU_est_00. Assuming a third set of scripts didn't use both aa_TAU_00_est and aa_TAU_est_00, this should work.
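To illustrate the many-to-one case, here is a base-R sketch rather than the project's code. The metadata rows, the lowercase target name `aa_tau_00_est`, and the helper `rename_columns()` are hypothetical. Both old spellings map onto one new name, and the helper stops if a single dataset contains both old names, which is the situation the assumption above rules out.

```r
# Hypothetical metadata: two old spellings map to one new name
renames <- data.frame(
  name_old = c("aa_TAU_00_est", "aa_TAU_est_00"),
  name_new = c("aa_tau_00_est", "aa_tau_00_est"),
  comments = c("standard scripts", "renegade scripts"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rename any columns listed in name_old
rename_columns <- function(d, renames) {
  idx <- match(names(d), renames$name_old)
  hit <- !is.na(idx)
  names(d)[hit] <- renames$name_new[idx[hit]]
  # Guard: both old spellings present in one dataset would collide
  if (anyDuplicated(names(d)) > 0) {
    stop("two old names were mapped onto the same new name in one dataset")
  }
  d
}

ds_standard <- data.frame(aa_TAU_00_est = 1.2)
ds_renegade <- data.frame(aa_TAU_est_00 = 3.4)
ds_both     <- data.frame(aa_TAU_00_est = 1.2, aa_TAU_est_00 = 3.4)

names(rename_columns(ds_standard, renames))  # "aa_tau_00_est"
names(rename_columns(ds_renegade, renames))  # "aa_tau_00_est"
# rename_columns(ds_both, renames) stops with an error
```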

@andkov
Member Author

andkov commented Feb 27, 2017

Good point, thank you, @wibeasley. I would very much like a registry of names of model components. This would especially be useful for different tiers of coordination:

  • 1 - drivers prepare data and run models on their own (like we did for Portland-2015)
  • 2 - drivers use automation scripts for modeling and submit models through GitHub (what we do now)
  • 3 - drivers use a REDCap API to run and submit models (the envisioned future)

The next pass through the existing scripts will help me identify where the renaming you've mentioned fits most naturally.

@wibeasley
Member

Cool. Then here's a regex script that will pull out those values and put them into a CSV. Copy & paste the meat of that dplyr::rename() snippet so it looks like:

 column_renames <- '
  # general model information
    "study_name"                  = "`study_name`"
  , "model_number"                = "`model_number`"
  , "subgroup"                    = "`subgroup`"
  , "model_type"                  = "`model_type`"
...
  , "b_gamma_16_se"               = "`b_GAMMA_16_se`"
  , "b_gamma_16_wald"             = "`b_GAMMA_16_wald`"
  , "b_gamma_16_pval"             = "`b_GAMMA_16_pval`"
'

Then run the code below, and rename/move the resulting column-renames.csv into some metadata directory.

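library(magrittr) # provides the %>% pipe

# Group 1 captures the new name (left side), group 2 the backticked old name.
# Each pair becomes an "old,new," CSV line with an empty comments field.
pattern <- '(?s).+?"(\\w+)"\\s+=\\s*"`(\\w+)`".*?'
rearranged <- gsub(pattern, "\\2,\\1,\n", column_renames, perl = TRUE)
rearranged # inspect the CSV text before parsing it

# I() marks the string as literal CSV data (readr 2.x expects this)
ds <- I(rearranged) %>% 
  readr::read_csv(col_names = c("name_old", "name_new", "comments"))

readr::write_csv(ds, "./column-renames.csv")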

This is a handy little script for converting code into proper metadata. I'm surprised we haven't needed to write something like this yet.
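One way to sanity-check the pattern is to run it on a two-pair sample. This sketch uses base R's `read.csv(text = ...)` in place of readr so it runs without extra packages; the sample string is made up for illustration.

```r
# Made-up two-pair sample in the same layout as column_renames
sample_renames <- '
    "study_name"      = "`study_name`"
  , "b_gamma_16_se"   = "`b_GAMMA_16_se`"
'

# Same pattern as above: swap each pair into "old,new," CSV lines
pattern <- '(?s).+?"(\\w+)"\\s+=\\s*"`(\\w+)`".*?'
rearranged <- gsub(pattern, "\\2,\\1,\n", sample_renames, perl = TRUE)

# Parse the CSV text; blank lines are skipped, the empty third field becomes NA
ds_sample <- read.csv(
  text = rearranged, header = FALSE,
  col.names = c("name_old", "name_new", "comments"),
  stringsAsFactors = FALSE
)
ds_sample[, c("name_old", "name_new")]
```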

@wibeasley
Member

wibeasley commented Feb 27, 2017

This is code that should work (I haven't tested it) for reading the metadata and applying the column-name changes.

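library(magrittr) # provides the %>% pipe

ds_renames <- readr::read_csv("./column-renames.csv")
# dplyr expects a named vector of the form c(new_name = "old_name")
renaming_vector        <- ds_renames$name_old
names(renaming_vector) <- ds_renames$name_new

# ds_names_old is the dataset whose columns still carry the old names;
# all_of() errors if any name_old is missing (any_of() would skip it instead)
ds_names_new <- ds_names_old %>% 
  dplyr::rename(dplyr::all_of(renaming_vector))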

Edit: don't be afraid to add extra columns to this file if they help.
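As an example of an extra column, the sketch below adds a hypothetical `applies_to` column so one metadata file can hold renames for several sets of scripts. The column, its values (`automated`, `renegade`, `all`), and the helper `lookup_for()` are assumptions for illustration. The helper returns a vector in the same new-name = old-name form as `renaming_vector`, and the last lines apply it with base R.

```r
# Hypothetical metadata with an extra `applies_to` column
renames <- data.frame(
  name_old   = c("aa_TAU_00_est", "aa_TAU_est_00", "b_GAMMA_16_se"),
  name_new   = c("aa_tau_00_est", "aa_tau_00_est", "b_gamma_16_se"),
  applies_to = c("automated", "renegade", "all"),
  stringsAsFactors = FALSE
)

# Keep only the rules for one source, returned as c(new = "old")
lookup_for <- function(renames, source) {
  keep <- renames$applies_to %in% c(source, "all")
  stats::setNames(renames$name_old[keep], renames$name_new[keep])
}

lookup <- lookup_for(renames, "renegade")

# Apply the lookup to a made-up dataset using base R
ds_renegade <- data.frame(aa_TAU_est_00 = 0.5, b_GAMMA_16_se = 0.1)
hit <- names(ds_renegade) %in% lookup
names(ds_renegade)[hit] <- names(lookup)[match(names(ds_renegade)[hit], lookup)]
names(ds_renegade)
```

With dplyr, the same vector could instead be passed as `dplyr::rename(ds_renegade, dplyr::all_of(lookup))`.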

@andkov
Member Author

andkov commented Feb 27, 2017

Great regex example for studying. I've finally gotten over my initial fear of using regexes and can now learn more elaborate applications. I can't imagine efficient data manipulation without regexes anymore. Thanks for pushing me down that hill!
