Describe key user journeys / workflows in vignette #89

Merged
merged 25 commits into from
Aug 22, 2024
Commits
94aebb4
updates: intro
mdingemanse Aug 19, 2024
6575ef5
rewrite in progress
mdingemanse Aug 20, 2024
c509aab
+ describe quality plots
mdingemanse Aug 20, 2024
ed1d2b0
updates to Workflow part B
mdingemanse Aug 20, 2024
c72dfac
merge from main and fix issue with example inadvertently being executed
mdingemanse Aug 20, 2024
c85b33e
keep strip label for facets
mdingemanse Aug 21, 2024
d55c808
+ first version of plots
mdingemanse Aug 21, 2024
8d2de91
workflow updates
mdingemanse Aug 21, 2024
d20e534
rename to reflect multiple workflows
mdingemanse Aug 21, 2024
5d338be
add kableExtra to get nicer table output in vignette
mdingemanse Aug 21, 2024
aef9f72
intro
mdingemanse Aug 21, 2024
43fbd40
merge and fix conflicts
mdingemanse Aug 21, 2024
90d9ed3
,
mdingemanse Aug 21, 2024
fb0e876
plots
mdingemanse Aug 21, 2024
ac5345e
Merge branch 'main' into user-journeys
mdingemanse Aug 21, 2024
a6c06fa
+ final set of examples using geom_token
mdingemanse Aug 21, 2024
08a4429
Suggests: we don't use kableExtra current but we do use ggrepel
mdingemanse Aug 21, 2024
72d32f5
Update R/theme_turnPlot.R
mdingemanse Aug 22, 2024
f867c58
change to list
mdingemanse Aug 22, 2024
b582a27
simplify
mdingemanse Aug 22, 2024
f47693c
fix init description
mdingemanse Aug 22, 2024
2cb5118
load dplyr and remove dplyr:: from function calls in sample code
mdingemanse Aug 22, 2024
ded2d87
begin > end
mdingemanse Aug 22, 2024
2ac722f
rm viridis closes #103
mdingemanse Aug 22, 2024
458eee5
provide more info on `tokenize`, closes #102
mdingemanse Aug 22, 2024
+ describe quality plots
mdingemanse committed Aug 20, 2024
commit c509aab6e6175492b8227bd229dcd27ef987c48c
19 changes: 12 additions & 7 deletions vignettes/workflow.Rmd
@@ -25,7 +25,7 @@ library(talkr)

We will be using the IFADV corpus as example data for the workflow of `talkr`. This is a corpus consisting of 20 dyadic conversations in Dutch, published by the Nederlandse Taalunie in 2007 ([source](https://fon.hum.uva.nl/IFA-SpokenLanguageCorpora/IFADVcorpus/)). A prepared dataset can be downloaded by installing the `ifadv` package:

```{r install data package}
```{r install_data_package}
# install.packages("devtools")
devtools::install_github("elpaco-escience/ifadv")
```
@@ -42,7 +42,7 @@ The `init()` function takes these minimal fields and generates a few more based

The `init()` function can be used to rename columns if necessary. For example, if the column `participant` is named `speaker`, we can rename it as follows:

``` r
```{r init_demo}
talkr_data <- init(data,
participant = "speaker")
```
@@ -55,17 +55,17 @@ A dataset can contain additional fields. For instance, the IFADV sample dataset

The `report_stats` function provides a simple summary of a dataset, including the total number of utterances, the total duration of the conversation, the number of participants, and the number of sources.

```{r}
```{r report_stats}
report_stats(data)
```

### Visual quality checks

The `plot_quality` function provides a visual check of the quality of the data, by visualizing the distribution of turn duration, and transition timing.
The `plot_quality` function provides a visual check of the nature of the data by visualizing the distributions of turn durations and transition timing.

Transition timing is similar to FTO (floor transfer offset), but calculated without additional quality checks: a transition is identified whenever the participant changes from one turn to the next. The transition time is then calculated as the difference between the beginning of the new participant's turn and the end of the previous participant's turn.
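
The calculation described above can be sketched in base R. This is a minimal illustration on toy data, not the code `talkr` uses internally; the column names `participant`, `begin` and `end` (in milliseconds) are assumptions made for the example.

```r
# Toy data: three columns assumed for illustration (times in ms)
turns <- data.frame(
  participant = c("A", "B", "B", "A"),
  begin       = c(0, 1200, 2500, 3400),
  end         = c(1000, 2400, 3500, 4000)
)

# Work in chronological order
turns <- turns[order(turns$begin), ]
n <- nrow(turns)

# Participant and end time of the preceding turn (NA for the first turn)
prev_participant <- c(NA, turns$participant[-n])
prev_end         <- c(NA, turns$end[-n])

# A transition occurs only when the participant changes
is_transition <- !is.na(prev_participant) &
  turns$participant != prev_participant

# Transition time: begin of the new turn minus end of the previous turn
turns$transition_time <- ifelse(is_transition, turns$begin - prev_end, NA)

turns
```

In this sketch, a positive transition time is a gap between speakers and a negative one is an overlap; turns that continue the same participant get `NA`.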

By default, `plot_quality()` will plot the quality of the entire dataset:
By default, `plot_quality()` will plot the entire dataset:

```{r}
plot_quality(data)
@@ -75,10 +75,15 @@ plot_quality(data)
Quality plots can also be run for a specific source:

```{r}
plot_quality(data, source = "/dutch2/DVA9M")

plot_quality(data, source = "/dutch2/DVA8K")
```

A quality plot consists of three separate visualizations, each designed to allow rapid visual inspection and spotting of oddities:

1. A density plot of turn durations. This is normally expected to peak around 2000 ms (2 seconds), with maximum durations that do not far exceed 10000 ms (10 seconds) (Liesenfeld & Dingemanse 2022). The goal of this plot is to allow eyeballing of oddities like turns of extreme duration or sets of turns with exactly the same duration (unlikely in carefully segmented conversational data).
2. A density plot of turn transition times. This is expected to look like a normal distribution centered around 0-200 ms (Stivers et al. 2009). Deviations from this may signal problems in the dataset, for instance due to imprecise or automated annotation methods.
3. A scatterplot of turn transition (x) by turn duration (y). This combines both distributions and is expected to look like a cloud of datapoints that is thickest in the middle region. Any standout patterns (for instance, turns whose duration is equal to their transition time) are indicative of problems in the segmentation or timing data.
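
The oddities these plots are meant to reveal can also be flagged numerically. The following is a rough sketch in base R on toy data, not part of `talkr`; the column names `duration` and `transition_time` and the 10000 ms cutoff are assumptions chosen for illustration.

```r
# Toy data in milliseconds; column names are assumed for illustration
turns <- data.frame(
  duration        = c(1800, 2100, 500, 500, 500, 15000, 900),
  transition_time = c(NA, 150, -200, 300, 500, 100, 900)
)

# 1. Turns of extreme duration (illustrative cutoff of 10000 ms)
too_long <- turns$duration > 10000

# 2. Turns that share exactly the same duration with another turn
repeated_values <- turns$duration[duplicated(turns$duration)]
dup_duration    <- turns$duration %in% repeated_values

# 3. Turns whose duration equals their transition time
same_as_transition <- !is.na(turns$transition_time) &
  turns$duration == turns$transition_time

# Rows worth a closer look
turns[too_long | dup_duration | same_as_transition, ]
```

A few identical durations can occur by chance in real data, so flags like these are best read as pointers for manual inspection rather than as proof of an error.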


## Workflow 2: Plot conversations

Another key use of `talkr` is to visualize conversational patterns.