Commit

CMV Corpus doc - fix metadata list, fix main page access
seanzhangkx8 committed Jun 14, 2024
1 parent 8656371 commit 5d76f65
Showing 2 changed files with 6 additions and 6 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -57,7 +57,7 @@ Available as an interactive notebook: [full version (fine-tuning + inference)](h
ConvoKit ships with several datasets ready for use "out-of-the-box".
These datasets can be downloaded using the `convokit.download()` [helper function](https://github.com/CornellNLP/ConvoKit/blob/master/convokit/util.py). Alternatively you can access them directly [here](http://zissou.infosci.cornell.edu/convokit/datasets/).

- ### [Conversations Gone Awry Datasets]([Wikipedia](https://convokit.cornell.edu/documentation/awry.html)/[CMV](https://convokit.cornell.edu/documentation/awry_cmv.html))
+ ### Conversations Gone Awry Datasets ([Wikipedia](https://convokit.cornell.edu/documentation/awry.html)/[CMV](https://convokit.cornell.edu/documentation/awry_cmv.html))

Two related corpora of conversations that derail into antisocial behavior. One corpus (CGA-WIKI) consists of Wikipedia talk page conversations that derail into personal attacks as labeled by crowdworkers (4,188 conversations containing 30,021 comments). The other (CGA-CMV) consists of discussion threads on the subreddit ChangeMyView (CMV) that derail into rule-violating behavior as determined by the presence of a moderator intervention (6,842 conversations containing 42,964 comments).
Name for download: `conversations-gone-awry-corpus` (for CGA-WIKI) or `conversations-gone-awry-cmv-corpus` (for CGA-CMV)
10 changes: 5 additions & 5 deletions docs/source/awry_cmv.rst
@@ -52,11 +52,11 @@ Metadata for each conversation include:
* has_removed_comment: whether the final comment in this thread was removed by CMV moderators for violation of Rule 2
* split: which split (train, val, or test) this conversation was used in for the experiments described in "Trouble on the Horizon"
* summary_meta: metadata related to conversation summaries, a list of dictionaries (one per summary available, possibly empty) with the following keys:
- * * summary_text: the text of the summary;
- * * summary_type: whether the summary is humman written by humans;(human_written_SCD) or generated automatically using the procedural prompt ("machine_generated_SCD") ;
- * * up_to_utterance_id: the last utterance considered when creating the summary;
- * * truncated_by: the number of utterances the transcript was truncated by when creating the summary (starting from the end);
- * * scd_split: whether the summary was in the train/test/validation split in the 2024 Summarizing Conversations Dynamics paper;
+ * summary_text: the text of the summary;
+ * summary_type: whether the summary is humman written by humans;(human_written_SCD) or generated automatically using the procedural prompt ("machine_generated_SCD") ;
+ * up_to_utterance_id: the last utterance considered when creating the summary;
+ * truncated_by: the number of utterances the transcript was truncated by when creating the summary (starting from the end);
+ * scd_split: whether the summary was in the train/test/validation split in the 2024 Summarizing Conversations Dynamics paper;


Usage