Removing Trailing Comma from Header
Background: Occasionally a Meditech extract has an extra comma at the end of the first line. For each subsequent line, readr::read_csv() appropriately throws a warning that the line is missing a column. This flood of warnings can mask real problems.
Explanation: This snippet (a) reads the csv as plain text, (b) removes the final comma, and (c) passes the plain text to readr::read_csv() to convert it into a data.frame.
Instruction: Change Dx50 Name to the name of the final (real) column.
Real Example: truong-pharmacist-transition-1 (Accessible only to CDW members.)
Last Modified: 2019-12-12 by Will
-
# The next two lines remove the trailing comma at the end of the 1st line.
-raw_text <- readr::read_file(path_in)
-raw_text <- sub("^(.+Dx50 Name),", "\\1", raw_text)
-
-ds <- readr::read_csv(raw_text, col_types=col_types)
+
# The next two lines remove the trailing comma at the end of the 1st line.
+raw_text <- readr::read_file(path_in)
+raw_text <- sub("^([^\n]*Dx50 Name),", "\\1", raw_text)
+
+ds <- readr::read_csv(I(raw_text), col_types=col_types) # I() marks raw_text as literal data, not a file path
+
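Below is a minimal, self-contained sketch of the same three steps. It uses a made-up three-column extract; the column names and values are hypothetical. To avoid depending on readr here, it parses with base R's `utils::read.csv(text = ...)`. With readr, the last step would be `readr::read_csv(I(raw_text), col_types = col_types)`, because recent readr versions expect literal text to be wrapped in `I()`. The pattern uses `[^\n]*` rather than `.+`, so the match cannot reach past the header line.

```r
# (a) Stand-in for readr::read_file(path_in): a hypothetical extract whose
#     header ends with a stray comma after the final real column, "Dx50 Name".
raw_text <- paste0(
  "id,Dx1 Name,Dx50 Name,\n",
  "1,asthma,copd\n",
  "2,influenza,sepsis\n"
)

# (b) Remove the comma that directly follows the final real column name.
#     [^\n]* cannot cross a line break, so only the header line can match.
raw_text <- sub("^([^\n]*Dx50 Name),", "\\1", raw_text)

# (c) Parse the cleaned text; check.names = FALSE keeps names like "Dx1 Name".
ds <- utils::read.csv(text = raw_text, check.names = FALSE)

str(ds)
```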
Reading Large Files with vroom
Background: When incoming data files are too large to read comfortably with readr, we use vroom. The two packages are developed by the same group and might be combined in the future.
Explanation: This snippet defines the col_types list with names to mimic our approach of using readr. There are some small differences from our readr approach:
@@ -600,78 +607,78 @@
Removing Trailing Comma fro
1. If the data file contains columns we don’t need, we define them in col_types anyway; vroom needs to know the file structure if it’s missing a header row.
Real Example: akande-medically-complex-1 (Accessible only to CDW members.) These files did not have a header of variable names; the first line of the file is the first data row.
Last Modified: 2020-08-21 by Will
-
# ---- declare-globals ---------------------------------------------------------
-config <- config::get()
-
-col_types <- list(
- sak = vroom::col_integer(), # "system-assigned key"
- aid_category_id = vroom::col_character(),
- age = vroom::col_integer(),
- service_date_first = vroom::col_date("%m/%d/%Y"),
- service_date_lasst = vroom::col_date("%m/%d/%Y"),
- claim_type = vroom::col_character(),
- provider_id = vroom::col_character(),
- provider_lat = vroom::col_double(),
- provider_long = vroom::col_double(),
- provider_zip = vroom::col_character(),
- cpt = vroom::col_integer(),
- revenue_code = vroom::col_integer(),
- icd_code = vroom::col_character(),
- icd_sequence = vroom::col_integer(),
- vocabulary_coarse_id = vroom::col_integer()
-)
-
-# ---- load-data ---------------------------------------------------------------
-ds <- vroom::vroom(
- file = config$path_ohca_patient,
- delim = "\t",
- col_names = names(col_types),
- col_types = col_types
-)
-
-rm(col_types)
+
# ---- declare-globals ---------------------------------------------------------
+config <- config::get()
+
+col_types <- list(
+ sak = vroom::col_integer(), # "system-assigned key"
+ aid_category_id = vroom::col_character(),
+ age = vroom::col_integer(),
+ service_date_first = vroom::col_date("%m/%d/%Y"),
+ service_date_lasst = vroom::col_date("%m/%d/%Y"),
+ claim_type = vroom::col_character(),
+ provider_id = vroom::col_character(),
+ provider_lat = vroom::col_double(),
+ provider_long = vroom::col_double(),
+ provider_zip = vroom::col_character(),
+ cpt = vroom::col_integer(),
+ revenue_code = vroom::col_integer(),
+ icd_code = vroom::col_character(),
+ icd_sequence = vroom::col_integer(),
+ vocabulary_coarse_id = vroom::col_integer()
+)
+
+# ---- load-data ---------------------------------------------------------------
+ds <- vroom::vroom(
+ file = config$path_ohca_patient,
+ delim = "\t",
+ col_names = names(col_types),
+ col_types = col_types
+)
+
+rm(col_types)
+
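The sketch below shows why `col_names = names(col_types)` matters when a file lacks a header row. It is a base-R analogue (a swapped-in technique, not vroom itself) that keeps one named specification and reuses its names as the column names. The columns and values are hypothetical. With vroom, the named list of `vroom::col_*()` collectors above plays the same role.

```r
# Hypothetical spec: names are the column names, values are the classes.
# The file has no header row, so these names are the only source of column names.
col_classes <- c(
  sak                = "integer",    # "system-assigned key"
  claim_type         = "character",
  provider_lat       = "numeric",
  service_date_first = "character"   # converted to Date below
)

# Stand-in for a tab-delimited extract whose first line is already data.
raw_text <- paste(
  "101\tinpatient\t35.47\t01/06/2019",
  "102\toutpatient\t36.15\t12/30/2019",
  sep = "\n"
)

ds <- utils::read.delim(
  text       = raw_text,
  header     = FALSE,               # treat line 1 as data, not as names
  col.names  = names(col_classes),  # analogous to col_names = names(col_types)
  colClasses = unname(col_classes)  # analogous to col_types = col_types
)

# Parse the US-style dates explicitly, like vroom::col_date("%m/%d/%Y").
ds$service_date_first <- as.Date(ds$service_date_first, format = "%m/%d/%Y")

rm(col_classes)
str(ds)
```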
Grooming
-
+
Correct for misinterpreted two-digit year
Background: Sometimes the Meditech dates are specified like 1/6/54 instead of 1/6/1954. readr::read_csv() has to choose whether the year is ‘1954’ or ‘2054’. A human can use context to infer that a birth date is in the past (so they choose 1954), but readr can’t (so it chooses 2054). To avoid this and other problems, request dates in an ISO-8601 format (e.g., 1954-01-06).
Explanation: Correct for this in a dplyr::mutate() clause; compare the date value against today. If the date is today or earlier, use it; if the date is in the future, subtract 100 years.
Instruction: For variables expected to be in the future, such as loan payment dates, flip the direction of the comparison.
Last Modified: 2019-12-12 by Will
-
ds %>%
- dplyr::mutate(
- dob = dplyr::if_else(dob <= Sys.Date(), dob, dob - lubridate::years(100))
- )
+
ds <- ds %>%
+ dplyr::mutate(
+ dob = dplyr::if_else(dob <= Sys.Date(), dob, dob - lubridate::years(100))
+ )
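Below is a base-R sketch of the same correction, useful where dplyr and lubridate are not loaded. The helper name `fix_century()` and its `reference` argument are hypothetical. The sketch also shows where the wrong century comes from: R's `%y` format maps 00 to 68 onto 20xx, so `1/6/54` parses as 2054.

```r
# Hypothetical helper: move any date later than `reference` back 100 years.
# Assumes every value should be on or before `reference` (e.g., birth dates).
fix_century <- function(x, reference = Sys.Date()) {
  lt     <- as.POSIXlt(x)                   # exposes the year as a field
  future <- !is.na(x) & x > reference       # NA dates are left untouched
  lt$year[future] <- lt$year[future] - 100L # subtract 100 years only where needed
  as.Date(lt)
}

# With %y, "54" is read as 2054, while "05" is read as 2005.
dob <- as.Date(c("1/6/54", "3/15/05"), format = "%m/%d/%y")
dob_fixed <- fix_century(dob, reference = as.Date("2019-12-12"))
print(dob_fixed) # Expected: 1954-01-06 and 2005-03-15
```

For future-dated variables, the test and the adjustment would both flip: values before `reference` would move forward 100 years.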
-
+
Identification
-
-
+
Correspondence with Collaborators
-
+
Excel files
Receiving and storing Excel files should almost always be avoided for the reasons explained in this letter.
We frequently receive extracts as Excel files, and have the following request ready to email to the person sending them. Adapt the bold values like “109.19” to your situation. If you are familiar with their tools, suggest an alternative for saving the file as a csv. Once presented with these Excel gotchas, almost everyone has an ‘aha’ moment and recognizes the problem. Unfortunately, not everyone has software flexible enough to adapt easily.
diff --git a/docs/style.html b/docs/style.html
index 0584f67..7973212 100644
--- a/docs/style.html
+++ b/docs/style.html
@@ -24,7 +24,7 @@
-
+
@@ -456,6 +456,13 @@
20.2 Training to Data Science
20.3 Bridges Outside the Team
+
21 Material for REDCap Users
+
+
22 Material for REDCap Developers
+
23 Material for REDCap Admins
Appendix
A Git & GitHub
@@ -571,16 +578,13 @@ Bridges Outside the Team
-
-
-
-
+
diff --git a/docs/testing-and-validation.html b/docs/testing-and-validation.html
index edc1059..13a8faf 100644
--- a/docs/testing-and-validation.html
+++ b/docs/testing-and-validation.html
@@ -24,7 +24,7 @@
-
+
@@ -456,6 +456,13 @@
20.2 Training to Data Science
20.3 Bridges Outside the Team
+
21 Material for REDCap Users
+
+
22 Material for REDCap Developers
+
23 Material for REDCap Admins
Appendix
A Git & GitHub
@@ -619,6 +626,17 @@ Azure Data Studio
- Data | Sql | Show Connection Info In Title: uncheck {
"sql.showConnectionInfoInTitle": false
}
- Data | Sql | Copy Include Headers: check {
"sql.copyIncludeHeaders": true
}
+{
+ "workbench.enablePreviewFeatures": true,
+ "workbench.colorTheme": "Default Dark Azure Data Studio",
+ "editor.tabSize": 2,
+ "editor.detectIndentation": false,
+ "files.insertFinalNewline": true,
+ "files.trimFinalNewlines": true,
+ "files.trimTrailingWhitespace": true,
+ "queryEditor.showConnectionInfoInTitle": false,
+ "queryEditor.results.copyIncludeHeaders": true
+}
Visual Studio Code
@@ -633,48 +651,48 @@
Visual Studio Code
markdownlint has linting and style checking.
These extensions can be installed from the command line.
-
code --list-extensions
-code --install-extension GrapeCity.gc-excelviewer
-code --install-extension mechatroner.rainbow-csv
-code --install-extension ms-mssql.mssql
-code --install-extension streetsidesoftware.code-spell-checker
-code --install-extension yzhang.markdown-all-in-one
-code --install-extension yzane.markdown-pdf
-code --install-extension DavidAnson.vscode-markdownlint
+
code --list-extensions
+code --install-extension GrapeCity.gc-excelviewer
+code --install-extension mechatroner.rainbow-csv
+code --install-extension ms-mssql.mssql
+code --install-extension streetsidesoftware.code-spell-checker
+code --install-extension yzhang.markdown-all-in-one
+code --install-extension yzane.markdown-pdf
+code --install-extension DavidAnson.vscode-markdownlint
Note: here are some non-default changes that facilitate our workflow. Either copy this configuration into settings.json, or manually specify the options with the settings editor.
-
{
- "diffEditor.ignoreTrimWhitespace": false,
- "diffEditor.maxComputationTime": 0,
- "editor.acceptSuggestionOnEnter": "off",
- "editor.renderWhitespace": "all",
- "explorer.confirmDragAndDrop": false,
- "files.associations": {
- "*.Rmd": "markdown"
- },
- "files.trimFinalNewlines": true,
- "files.trimTrailingWhitespace": true,
- "git.autofetch": true,
- "git.confirmSync": false,
- "window.zoomLevel": 2,
-
- "markdown.extension.orderedList.autoRenumber": false,
- "markdown.extension.orderedList.marker": "one",
- "markdownlint.config": {
- "MD003": { "style": "setext_with_atx" },
- "MD007": { "indent": 2 },
- "MD022": { "lines_above": 1,
- "lines_below": 1 },
- "MD024": { "siblings_only": true },
- "no-bare-urls": false,
- "no-inline-html": {
- "allowed_elements": [
- "mermaid",
- "a",
- "img"
- ]
- }
- }
-}
+
{
+ "diffEditor.ignoreTrimWhitespace": false,
+ "diffEditor.maxComputationTime": 0,
+ "editor.acceptSuggestionOnEnter": "off",
+ "editor.renderWhitespace": "all",
+ "explorer.confirmDragAndDrop": false,
+ "files.associations": {
+ "*.Rmd": "markdown"
+ },
+ "files.trimFinalNewlines": true,
+ "files.trimTrailingWhitespace": true,
+ "git.autofetch": true,
+ "git.confirmSync": false,
+ "window.zoomLevel": 2,
+
+ "markdown.extension.orderedList.autoRenumber": false,
+ "markdown.extension.orderedList.marker": "one",
+ "markdownlint.config": {
+ "MD003": { "style": "setext_with_atx" },
+ "MD007": { "indent": 2 },
+ "MD022": { "lines_above": 1,
+ "lines_below": 1 },
+ "MD024": { "siblings_only": true },
+ "no-bare-urls": false,
+ "no-inline-html": {
+ "allowed_elements": [
+ "mermaid",
+ "a",
+ "img"
+ ]
+ }
+ }
+}
- Settings | Extensions | Markdown All in One | Ordered List | Auto Renumber: false {
"markdown.extension.orderedList.autoRenumber": false
}
- Settings | Extensions | Markdown All in One | Ordered List | Marker: one {
"markdown.extension.orderedList.marker": "one"
}
@@ -702,7 +720,7 @@ Python
Python is used by some analysts. The prototypical installation involves two options.
Anaconda, which includes Jupyter Notebooks, Jupyter Lab, and Spyder, plus two programs that are already on this list: RStudio and VS Code. In Windows, open “Anaconda Prompt” with administrative privileges and run:
-conda install numpy pandas scikit-learn matplotlib
+conda install numpy pandas scikit-learn matplotlib
Standard Python, installing packages through pip3 in the terminal. If the pip3 command is unrecognized because it’s missing from the OS path variable, an alternative is py -3 -mpip install pysftp; this calls pip through the py command, which is sometimes in the path variable after installation.
@@ -769,98 +787,98 @@
Installation Troubleshooting
Ubuntu Installation
Ubuntu desktop 19.04 used these instructions for R and RStudio, and required these Debian packages to be installed before the R packages. The --yes option avoids manual confirmation for each line, so you can copy & paste this into the terminal.
Add the following to the sources with sudo nano /etc/apt/sources.list. The ‘eoan’ version may be updated, and the ‘metrocast’ mirror could also be swapped for another one from this list. I found it worked better for a new Ubuntu release than ‘cloud.r-project.org’.
-
# For R 4.0
-deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/
-deb http://mirror.genesisadaptive.com/ubuntu/ focal-backports main restricted universe
-
-# For R 3.5 & #.6
-deb https://cloud.r-project/bin/linux/ubuntu/ eoan-cran35/
-deb-src https://cloud.r-project/bin/linux/ubuntu/ eoan-cran35/
-deb http://mirror.metrocast.net/ubuntu/ eoan-backports main restricted universe
+
# For R 4.0
+deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/
+deb http://mirror.genesisadaptive.com/ubuntu/ focal-backports main restricted universe
+
+# For R 3.5 & 3.6
+deb https://cloud.r-project.org/bin/linux/ubuntu/ eoan-cran35/
+deb-src https://cloud.r-project.org/bin/linux/ubuntu/ eoan-cran35/
+deb http://mirror.metrocast.net/ubuntu/ eoan-backports main restricted universe
This next block can be copied and pasted (ctrl-shift-v) into the console in its entirety. Alternatively, lines can be pasted individually (without the ( function install-packages { line or the last three lines).
-
( function install-packages {
- ### Add the key, update the list, then install base R.
- sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
- sudo apt-get update
- sudo apt-get install r-base r-base-dev
-
- ### Git
- sudo apt-get install git-core
- git config --global user.email "wibeasley@hotmail.com"
- git config --global user.name "Will Beasley"
- git config --global credential.helper 'cache --timeout=3600000'
-
- ### Ubuntu & Bioconductor packages that are indirectly needed for packages and BBMC scripts
-
- # Supports the `locate` command in bash
- sudo apt-get install mlocate
-
- # The genefilter package is needed for 'modeest' on CRAN.
- # No longer a modeest dependency: Rscript -e 'BiocManager::install("genefilter")'
-
- ### CRAN packages that are also on the Ubuntu repositories
-
- # The 'xml2' package; https://CRAN.R-project.org/package=xml2
- sudo apt-get --yes install libxml2-dev r-cran-xml
-
- # The 'curl' package, and others; https://CRAN.R-project.org/package=curl
- sudo apt-get --yes install libssl-dev libcurl4-openssl-dev
-
- # The 'udunits2' package: https://cran.r-project.org/web/packages/udunits2/index.html
- sudo apt-get --yes install libudunits2-dev
-
- # The 'odbc' package: https://github.com/r-dbi/odbc#linux---debian--ubuntu
- sudo apt-get --yes install unixodbc-dev tdsodbc odbc-postgresql libsqliteodbc
-
- # The 'rgl' package; https://stackoverflow.com/a/39952771/1082435
- sudo apt-get --yes install libcgal-dev libglu1-mesa-dev
-
- # The 'magick' package; https://docs.ropensci.org/magick/articles/intro.html#build-from-source
- sudo apt-get --yes install 'libmagick++-dev'
-
- # To compress vignettes when building a package; https://kalimu.github.io/post/checklist-for-r-package-submission-to-cran/
- sudo apt-get --yes install qpdf
-
- # The 'pdftools' and 'Rpoppler' packages, which involve PDFs
- sudo apt-get --yes install libpoppler-cpp-dev libpoppler-glib-dev
-
- # The 'sys' package
- sudo apt-get --yes install libapparmor-dev
-
- # The 'sf' and other spatial packages: https://github.com/r-spatial/sf#ubuntu; https://github.com/r-spatial/sf/pull/1208
- sudo apt-get --yes install libudunits2-dev libgdal-dev libgeos-dev libproj-dev libgeos++-dev
-
- # For Cairo package, a dependency of Shiny & plotly; https://gykovacsblog.wordpress.com/2017/05/15/installing-cairo-for-r-on-ubuntu-17-04/
- sudo apt-get --yes install libcairo2-dev
-
- # 'rJava' and others; https://www.r-bloggers.com/installing-rjava-on-ubuntu/
- sudo apt-get --yes install default-jre default-jdk
- sudo R CMD javareconf
- sudo apt-get --yes install r-cran-rjava
-
- # For reprex and sometimes ssh keys; https://github.com/tidyverse/reprex#installation
- sudo apt-get --yes install xclip
-
- # gifski -apparently the rust compiler is necessary
- sudo apt-get --yes install cargo
-
- # For databases
- sudo apt-get --yes install sqlite sqliteman
- sudo apt-get --yes install postgresql postgresql-contrib pgadmin3
-
- # pandoc
- sudo apt-get --yes install pandoc
-
- # For checking packages. Avoid `/usr/bin/texi2dvi: not found` warning.
- sudo apt-get install texinfo
-}
-install-packages
-)
+
( function install-packages {
+ ### Add the key, update the list, then install base R.
+ sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
+ sudo apt-get update
+ sudo apt-get --yes install r-base r-base-dev
+
+ ### Git
+ sudo apt-get --yes install git-core
+ git config --global user.email "wibeasley@hotmail.com"
+ git config --global user.name "Will Beasley"
+ git config --global credential.helper 'cache --timeout=3600000'
+
+ ### Ubuntu & Bioconductor packages that are indirectly needed for packages and BBMC scripts
+
+ # Supports the `locate` command in bash
+ sudo apt-get --yes install mlocate
+
+ # The genefilter package is needed for 'modeest' on CRAN.
+ # No longer a modeest dependency: Rscript -e 'BiocManager::install("genefilter")'
+
+ ### CRAN packages that are also on the Ubuntu repositories
+
+ # The 'xml2' package; https://CRAN.R-project.org/package=xml2
+ sudo apt-get --yes install libxml2-dev r-cran-xml
+
+ # The 'curl' package, and others; https://CRAN.R-project.org/package=curl
+ sudo apt-get --yes install libssl-dev libcurl4-openssl-dev
+
+ # The 'udunits2' package: https://cran.r-project.org/web/packages/udunits2/index.html
+ sudo apt-get --yes install libudunits2-dev
+
+ # The 'odbc' package: https://github.com/r-dbi/odbc#linux---debian--ubuntu
+ sudo apt-get --yes install unixodbc-dev tdsodbc odbc-postgresql libsqliteodbc
+
+ # The 'rgl' package; https://stackoverflow.com/a/39952771/1082435
+ sudo apt-get --yes install libcgal-dev libglu1-mesa-dev
+
+ # The 'magick' package; https://docs.ropensci.org/magick/articles/intro.html#build-from-source
+ sudo apt-get --yes install 'libmagick++-dev'
+
+ # To compress vignettes when building a package; https://kalimu.github.io/post/checklist-for-r-package-submission-to-cran/
+ sudo apt-get --yes install qpdf
+
+ # The 'pdftools' and 'Rpoppler' packages, which involve PDFs
+ sudo apt-get --yes install libpoppler-cpp-dev libpoppler-glib-dev
+
+ # The 'sys' package
+ sudo apt-get --yes install libapparmor-dev
+
+ # The 'sf' and other spatial packages: https://github.com/r-spatial/sf#ubuntu; https://github.com/r-spatial/sf/pull/1208
+ sudo apt-get --yes install libudunits2-dev libgdal-dev libgeos-dev libproj-dev libgeos++-dev
+
+ # For Cairo package, a dependency of Shiny & plotly; https://gykovacsblog.wordpress.com/2017/05/15/installing-cairo-for-r-on-ubuntu-17-04/
+ sudo apt-get --yes install libcairo2-dev
+
+ # 'rJava' and others; https://www.r-bloggers.com/installing-rjava-on-ubuntu/
+ sudo apt-get --yes install default-jre default-jdk
+ sudo R CMD javareconf
+ sudo apt-get --yes install r-cran-rjava
+
+ # For reprex and sometimes ssh keys; https://github.com/tidyverse/reprex#installation
+ sudo apt-get --yes install xclip
+
+ # gifski -apparently the rust compiler is necessary
+ sudo apt-get --yes install cargo
+
+ # For databases
+ sudo apt-get --yes install sqlite sqliteman
+ sudo apt-get --yes install postgresql postgresql-contrib pgadmin3
+
+ # pandoc
+ sudo apt-get --yes install pandoc
+
+ # For checking packages. Avoid `/usr/bin/texi2dvi: not found` warning.
+ sudo apt-get --yes install texinfo
+}
+install-packages
+)
The version of pandoc in the Ubuntu repository may lag behind the latest release. To install the latest version, download the .deb file, then install it from the same directory. Finally, verify the version.
-
sudo dpkg -i pandoc-*
-pandoc -v
+
sudo dpkg -i pandoc-*
+pandoc -v
The Postman native app for Ubuntu is installed through snap, which is updated daily automatically.
-
+
Retired Tools
@@ -897,7 +915,7 @@
Retired Tools
git-plus: Do git things without the terminal (I don’t think this is necessary anymore).
The packages can be installed through Atom, or through the apm utility on the command line:
-
apm install sublime-style-column-selection atom-language-r language-csv atom-beautify atom-wrap-in-tag minimap script
+
apm install sublime-style-column-selection atom-language-r language-csv atom-beautify atom-wrap-in-tag minimap script
And the following settings keep files consistent among developers.
- File | Settings | Editor | Tab Length: 2 (As opposed to 3 or 4, used in other conventions)