Clarity on summary statistics #33

Open
bvreede opened this issue Oct 10, 2023 · 0 comments
Labels
documentation Improvements or additions to documentation

Comments

@bvreede
Collaborator

bvreede commented Oct 10, 2023

To arrive at more deliberate summary statistics, this is copied from the project draft:

Tabular summary measures

For tabular summary measures, we provide at least the following key characteristics of the corpus:

  • turns: number of annotations with timing information in the corpus, which in most corpora corresponds to the number of turns at talk
  • turnduration: mean duration of turns in this corpus
  • density: sum of all annotation durations divided by the length of the source. A value >1 indicates a densely annotated recording with considerable overlap; a value <0.7 indicates a less densely annotated recording, possibly with untranscribed parts.
  • people: total number of distinct participants encountered in all source records for this corpus
  • corpus_length: total duration of transcribed data in h:m:s (for each source, counting from the first transcription to the last)
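
To make the density measure concrete, here is a minimal sketch for a single recording. It assumes annotations are given as (begin, end) pairs in milliseconds and that the length of the source is the span from the first begin to the last end, as in the length definition above. The function names `annotation_density` and `describe_density` are hypothetical and are not taken from the project code.

```python
def annotation_density(annotations):
    """Sum of annotation durations divided by the length of the source.

    `annotations` is a list of (begin, end) pairs in milliseconds (assumed format).
    Source length is taken as last end minus first begin (an assumption).
    """
    if not annotations:
        raise ValueError("no timed annotations")
    total_duration = sum(end - begin for begin, end in annotations)
    source_length = max(end for _, end in annotations) - min(begin for begin, _ in annotations)
    if source_length <= 0:
        raise ValueError("source length must be positive")
    return total_duration / source_length


def describe_density(value):
    """Apply the thresholds from the draft: >1 is dense, <0.7 is sparse."""
    if value > 1:
        return "dense, with overlap"
    if value < 0.7:
        return "sparse, possibly untranscribed parts"
    return "intermediate"


# Three overlapping turns: durations 1000 + 1200 + 1500 = 3700 ms over a 3000 ms span.
overlapping = [(0, 1000), (800, 2000), (1500, 3000)]
print(describe_density(annotation_density(overlapping)))
```

With overlapping turns, the summed durations exceed the span of the recording, which is why the value can rise above 1.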

In addition, we provide a table at source level for a number of these measures:

  • turns: number of annotations with timing information
  • turnduration: mean duration of turns in this source
  • density: sum of all annotation durations divided by the length of the source. A value >1 indicates a densely annotated recording with considerable overlap; a value <0.7 indicates a less densely annotated recording, possibly with untranscribed parts.
  • people: total number of distinct participants encountered in this source
  • source_length: duration in h:m:s transcribed in this source (counting from the first transcription to the last in this source)
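
The source-level table could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: each annotation is a dict with hypothetical keys `source`, `participant`, `begin`, and `end`; times are in milliseconds; annotations without timing carry `None`; and participants are counted over all annotations in a source, timed or not. The helper names `format_hms` and `summarize_sources` are made up for this example.

```python
from collections import defaultdict
from statistics import mean


def format_hms(milliseconds):
    """Format a duration in milliseconds as h:mm:ss, rounded to whole seconds."""
    seconds = round(milliseconds / 1000)
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def summarize_sources(rows):
    """Return a dict mapping each source to its summary measures."""
    by_source = defaultdict(list)
    for row in rows:
        by_source[row["source"]].append(row)

    table = {}
    for source, source_rows in by_source.items():
        # Only annotations with timing information count as turns.
        timed = [r for r in source_rows
                 if r["begin"] is not None and r["end"] is not None]
        durations = [r["end"] - r["begin"] for r in timed]
        # Source length: first transcription until the last in this source.
        length = (max(r["end"] for r in timed)
                  - min(r["begin"] for r in timed)) if timed else 0
        table[source] = {
            "turns": len(timed),
            "turnduration": mean(durations) if durations else float("nan"),
            "density": sum(durations) / length if length > 0 else float("nan"),
            "people": len({r["participant"] for r in source_rows}),
            "source_length": format_hms(length),
        }
    return table


rows = [
    {"source": "a", "participant": "A", "begin": 0, "end": 2000},
    {"source": "a", "participant": "B", "begin": 1500, "end": 4000},
    {"source": "a", "participant": "A", "begin": None, "end": None},
    {"source": "b", "participant": "C", "begin": 0, "end": 3_600_000},
]
for source, measures in summarize_sources(rows).items():
    print(source, measures)
```

Whether untimed annotations should contribute to `people`, and which time unit `turnduration` is reported in, are open choices that the documentation would need to settle.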
@mdingemanse mdingemanse added the documentation Improvements or additions to documentation label Aug 19, 2024