Open Agenda

SCUG, Dec 2018

Actual Topics

Shiny

  1. Review of Oct 2015 presentation
  2. Deployment scenarios
    • public vs pro
    • hosts
    • authentication
  3. Comparison with similar and/or overlapping reporting solutions, including:
    • JavaScript-enhanced graphs using
      • ggplot + plotly
      • straight plotly
    • 'static' knitr html reports
      • cron job refreshing the underlying dataset every ~10 min
      • saved to a secured file server accessible by your team
    • Power BI and others

Possible Topics (that weren't covered today)

  1. plumber package:

    Automatically generates and serves an HTTP API from R functions, using the special comment annotations written above each function.
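
    A minimal sketch of the idea, assuming the plumber package is installed. The /sum endpoint, its parameters, and the temporary file are hypothetical; the #* comments above the function are what plumber reads to build the routes.

    ```r
    # Write a small annotated R file (hypothetical endpoint and parameter names).
    api_file <- tempfile(fileext = ".R")
    writeLines(c(
      "#* Return the sum of two numbers",
      "#* @param a The first number",
      "#* @param b The second number",
      "#* @get /sum",
      "function(a, b) {",
      "  as.numeric(a) + as.numeric(b)",
      "}"
    ), api_file)

    # Parse the annotations into a router object.
    api <- plumber::plumb(api_file)

    # Serve it locally; guarded so the script doesn't block when run non-interactively.
    if (interactive()) {
      api$run(port = 8000)  # then visit http://localhost:8000/sum?a=1&b=2
    }
    ```

    In a real project the annotated functions would likely live in their own file (often called plumber.R) rather than being written to a temp file.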

  2. Creating thumbnails in an html page by scanning for all graphics in a subdirectory
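
    One way this could look, as a base-R sketch. The function name, its arguments, the file extensions, and the 200-pixel width are all hypothetical choices, not something settled in the meeting.

    ```r
    # Build an html page of thumbnails for every graphic found under `dir_graphics`.
    build_thumbnail_page <- function(dir_graphics, path_out, width_px = 200L) {
      # Scan the subdirectory (recursively) for common graphic formats.
      paths <- list.files(
        path        = dir_graphics,
        pattern     = "\\.(png|jpe?g|gif|svg)$",
        recursive   = TRUE,
        full.names  = TRUE,
        ignore.case = TRUE
      )

      # One linked thumbnail per graphic; clicking opens the full-size file.
      tags <- sprintf(
        '<a href="%s"><img src="%s" width="%d" alt="%s"></a>',
        paths, paths, width_px, basename(paths)
      )

      writeLines(c("<html>", "<body>", tags, "</body>", "</html>"), path_out)
      invisible(paths)
    }
    ```

    The links use the paths exactly as list.files() returns them, so the page resolves its images only if it is opened from a location where those paths are valid.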

  3. Technical Debt

  4. yaml & csv

    • flatten/denormalize list to data.frame example
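
    A small sketch of the flattening step, assuming the yaml package is installed. The county names and values are made up for illustration.

    ```r
    # A small yaml document: a list of records, each with the same keys.
    yaml_text <- paste(
      "- county: Adair",
      "  county_id: 1",
      "  fte: 4.5",
      "- county: Alfalfa",
      "  county_id: 2",
      "  fte: 2.0",
      sep = "\n"
    )

    # Parse into a nested R list (one element per record).
    l <- yaml::yaml.load(yaml_text)

    # Flatten: one row per top-level element, one column per key.
    ds <- do.call(
      rbind,
      lapply(l, function(x) as.data.frame(x, stringsAsFactors = FALSE))
    )

    # Save the rectangular version as a csv.
    path_csv <- tempfile(fileext = ".csv")
    write.csv(ds, path_csv, row.names = FALSE)
    ```

    This works only when every record has the same keys; ragged records would need extra handling before the rbind.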
  5. controlling long pipelines with flow files, such as reproduce.R

  6. config package

    • centralize your project-wide settings so they're available & consistent across multiple files.
    • similar to a project-wide 'declare-globals' chunk.
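
    A minimal sketch, assuming the config package is installed. The setting names and the path are hypothetical; in a real project the settings would usually sit in a config.yml at the project root instead of a temp file.

    ```r
    # Write a small config file (hypothetical setting names and values).
    config_file <- tempfile(fileext = ".yml")
    writeLines(c(
      "default:",
      "  path_input: \"data-public/raw/county.csv\"",
      "  county_id_max: 77"
    ), config_file)

    # Any script in the project can read the same settings.
    cfg <- config::get(file = config_file)

    cfg$path_input      # the shared input path
    cfg$county_id_max   # the shared upper bound
    ```

    Because every file reads from the same place, changing a value once changes it everywhere it is used.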
  7. tight text control

  8. Landing page for documentation across projects, such as BbmcResources

  9. writing style guides with your team

  10. Use skeleton repos to jumpstart your projects, such as RAnalysisSkeleton

  11. verify-values

    # ---- verify-values -----------------------------------------------------------
    # Sniff out problems.  Each assertion stops the script if a column violates its expected type or range.
    # OuhscMunge::verify_value_headstart(ds)
    checkmate::assert_integer(ds$county_month_id    , lower=          1L              , any.missing=FALSE, unique=TRUE)
    checkmate::assert_integer(ds$county_id          , lower=          1L   , upper=77L, any.missing=FALSE, unique=FALSE)
    checkmate::assert_date(   ds$month              , lower="2012-01-01"              , any.missing=FALSE)
    checkmate::assert_integer(ds$region_id          , lower=          1L   , upper=20L, any.missing=FALSE)
    checkmate::assert_numeric(ds$fte                , lower=          0    , upper=40L, any.missing=FALSE)
    checkmate::assert_logical(ds$fte_approximated                                     , any.missing=FALSE)
  12. inequality joins with sqldf

    Bounded by another table, using a join

    library(magrittr) # Provides the %>% pipe used in these examples.

    d2 <- "
      SELECT
        o.[.record_matching_id],
        o.gender,
        o.age_months,
        o.bmi,
        p.percentile     AS percentile_lower,
        p.value
      FROM d_observed AS o
        LEFT OUTER JOIN d_pop_long AS p ON
          o.age_months = p.age_months AND
          o.gender     = p.gender     AND
          p.value      < o.bmi          -- keeps every population row below the observed bmi
      " %>%
      sqldf::sqldf(
        stringsAsFactors = FALSE
      )

    Cumulative counts, by joining a table to itself

    ds_visit_cumulative_count <- "
      SELECT
        b.week, b.program_code, b.worker_name,
        COUNT(DISTINCT a.case_number) AS client_distinct_cumulative_by_worker
      FROM ds_visit_3 a
        JOIN ds_visit_3 b ON
          (a.week <= b.week)
          AND (a.program_code = b.program_code AND a.worker_name = b.worker_name)
      GROUP BY b.program_code, b.worker_name, b.week
      ORDER BY b.program_code, b.worker_name, b.week
    " %>%
      sqldf::sqldf()

    Windows of time, using a join

    ds_client_week_visit_goal <- "
      SELECT
        p.case_number,
        p.program_code,
        p.worker_name_last                AS worker_name,
        p.week_start_inclusive,
        --COUNT(v.visit_date)              AS visit_week_scheduled_count,
        SUM(v.visit_completed)           AS visit_week_completed_count
      FROM ds_possible_client_week p
        LEFT JOIN ds_visit v ON (
          p.case_number=v.case_number
          AND
          (p.week_start_inclusive <= v.visit_date AND v.visit_date<p.week_stop_exclusive)
        )
      GROUP BY p.case_number, p.week_start_inclusive
      ORDER BY p.case_number, p.week_start_inclusive
    " %>%
      sqldf::sqldf()    

Recently Discussed

June 2018

  1. Shiny, especially deployment & security

  2. text editors

    • my favorites: RStudio, Atom, and Notepad++.
    • find & replace across files with regexes: Atom
    • easy zooming in & out, which is especially nice when sharing screens: a tie between Atom & Notepad++
    • Markdown preview in Atom
    • multicolumn select: 1st place is RStudio; 2nd place is Atom (with the Sublime-Style-Column-Selection package)
  3. Snippets

    • pros & cons vs functions
  4. GitHub Gists

    • pros & cons vs full repos
  5. Diversion about Microsoft's acquisition of GitHub.

April 2018

  1. Computations on server vs local machines

March 2018

  1. Python & R tradeoffs on the following dimensions:

    • production system vs research
    • computer science background vs stats background
    • data manipulation vs analysis
    • propagation of ideas/manuscripts to external audiences
    • development costs
  2. knitr & automated reports

  3. GitHub

  4. benefits of promoting consistency of files/patterns across projects, and using skeletons (example).

  5. REDCap & research