ps2014.Rmd

---
title: "ps2014 Public Speaking 2014 Data"
output: html_notebook
---

This notebook serves the following functions:

- load and parse human rating results on the ps2014 data set
- analyze rating quality
- link human ratings on both PS and IV tasks.


```{r lib}
library(tidyverse)
library(stringr)
library(magrittr)

```
# Public Speaking Data Set

## Data reading
Using tidyverse::readxl to load human ratings stored in Excel
```{r hs_load}
library(readxl)
ps<-read_excel('hsraw/PublicSpeakingSkillsAssessment-data.xlsx')

# Used non-verbal behavior that reinforced the (verbal) message	Had a cohesive presentation	Was easy to understand	Used monotonous speech	Used appropriately sophisticated expressions	Was nervous	Kept my attention throughout the entire presentation	Did not have a well-paced presentation
ps <- ps %>%
  setNames(., 
           c("rater_id",
             "video_id",
             "rubric_ps1",
             "rubric_ps2",
             "rubric_ps3",
             "rubric_ps4",
             "holistic",
             "pers_1",
             "pers_2",
             "pers_3",
             "pers_4",
             "pers_5",
             "char_nonverbal",
             "char_cohesion",
             "char_understandable",
             "char_monotone",
             "char_expression",
             "char_nervous",
             "char_attention",
             "char_pace"
))
# remove 1st row with sub-category names
ps <- ps[-1,]
# convert char cols to dbl.
# http://bit.ly/2mXJz7V
ps <- ps %>% 
  mutate_at(vars(matches("rubric|pers|char")),funs(as.numeric))

```

## Rating Quality
```{r irr_funs}
library(irr)
my.irr <- function(mx){
  icc <- icc(mx, model="twoway", type="consistency", unit="average")$value
  corr <- meancor(mx)$value
  c(icc, corr)
  
}
```

```{r one_rating}
one_rating <- function(score_name) {
  one_tb <- ps %>%
  select_(.dots = c('rater_id', 'video_id', score_name))
  one_tb <-
  one_tb[!duplicated(one_tb[, c("rater_id", "video_id")]),] %>%
  spread_(key_col = 'rater_id', value_col = score_name)
  return(my.irr(one_tb[2:5]))
}

ratings <- colnames(ps)[3:20]
irr_df <- sapply(ratings, FUN = one_rating)
irr_df <- t(irr_df)
colnames(irr_df) <- c('ICC', 'meanR')
show(irr_df)
```

As for rating quality on interview data, ICC values can be found in vi2014.html.

# Cross-analysis

## Cross two tasks

For each subject, averaged public speaking scores vs. averaged interview scores.
```{r}
# for persubj ratings, get subj and cat
getsubj <- function(name){
  subj <- str_sub(name, 2, 3)
  subj

}

ps_subj <- ps %>%
    mutate(subj=getsubj(video_id)) %>%
    group_by(subj) %>%
    summarise(hscore_ave = mean(holistic))

```

Using bars_persubj_final (36x4) object, to compute correlations. The very low correlation
between interview and presentation shows that it is important to measure job-related skill during interviews.

```{r}
load(file="vi2014.RData")
twotasks_subj <- merge(ps_subj, bars_persubj_final, by="subj")
cor(twotasks_subj[,2:5])

```

## Within a task, check features' inter-correlations.

### interview
```{r}
# from pers_wide (vid, big 5, holistic)
#      bars_indv (videoID, mean)
interview_item <-
  merge(pers_wide[, 1:7], bars_indv[c("videoID", "mean", "gp")], by.x = "vid", by.y =
  "videoID")
  cor(interview_item[, 2:8], use = "complete.obs")


```

### presentation
```{r}
ps_item <- ps %>%
  select(video_id:char_pace) %>%
  group_by(video_id) %>%
  summarise_each(funs(mean))

cor(ps_item[,2:19])
```

Analyze motion features' contributions on predicting human rated scores.
```{r vamp_corr, echo=TRUE}
load("../vamp_postproc/ps2014_vamp.RData")

tidy_id <- function(x){
  tks <- str_match(x, "SELTCAWRS_(\\d+)([enpi_]+)")
  if(str_length(tks[2]) == 3){
    vid = str_sub(tks[2], 2,3)
  }
  else{
    vid = tks[2]
  }
  new_id <- paste("P", vid, tks[3], sep = "")
  new_id <- str_replace(new_id, "(_)$", "") # _ before string end

}

feats_persession <- feats_persession %>%
  mutate(video_id = unlist(lapply(labsession, tidy_id))) %>%
  select(-(starts_with("symmetry")))

ps_item_vamp <- merge(ps_item, feats_persession[,2:46], by="video_id")
vamp_tb<- cor(ps_item_vamp[,20:63], ps_item_vamp[,2:6], use = "na.or.complete")

```

Only showing correlation absolute value is higher than $0.2$

```{r}
vamp_tb_high <- vamp_tb
vamp_tb_high[abs(vamp_tb_high) < 0.2] <- ""
show(vamp_tb_high)

```