Releases: CornellNLP/ConvoKit

ConvoKit Version 3.1.0

31 Dec 02:43
7a7e9f6

We are excited to announce the release of ConvoKit 3.1.0! This version introduces a framework for measuring redirection in conversation flow, as described in this paper. We also release a demonstration of the framework on Supreme Court oral arguments via Google Colab. In addition to redirection, we provide a generalized transformer for annotating utterance-level likelihoods given a defined conversation context. For more information, check the PR for the new features #250.

ConvoKit Version 3.0.2

28 Dec 05:03
af8adcd

We are excited to release ConvoKit 3.0.2! This minor update resolves installation issues related to older versions of SciPy by updating the package dependency to require a more recent version. We found that Google Colab, which pre-loads packages at runtime, may still produce errors. Restarting the session and re-running the code blocks resolves this by ensuring the correct package versions are imported. For more details, please refer to our Troubleshooting page and pull request #257.

ConvoKit Version 3.0.1

20 Nov 06:10
a506040

We are excited to announce the release of ConvoKit 3.0.1, which focuses on bug fixes, adding new datasets, and dependency upgrades. Key updates include:

  • Fixed issue with ConvoKit's download method that prevented datasets from being downloaded to the configured directory.
  • Fixed support for downloading non-corpus objects.
  • Updated the conversational forecasting transformer to make it more flexible.
  • Added five new datasets, with documentation available on our website and documentation site.
  • Addressed compatibility issues related to Numpy by building against Numpy 2.0+ and upgrading dependency packages accordingly.

Our Troubleshooting page covers some potential issues, especially those involving NumPy. If you encounter any issues, feel free to join our Discord community for more support, or submit an issue on GitHub. Thank you!

Note that we no longer support Python 3.8 (EOL) or 3.9 (not supported by NumPy 2.0.0+).

You can refer to the following pull requests for more details:

  • Fixing bugs:

    • [1] Fixing ConvoKit download method #225 #217
    • [2] New Forecaster Framework #217
  • New datasets:

    • [1] CANDOR corpus #201
    • [2] DeliData corpus #238
    • [3] FORA corpus #238
    • [4] NPR-2P corpus #238
    • [5] FOMC corpus #238
  • Dependency packages:

    • [1] Building ConvoKit to work with Numpy 2.0.0+ #229 #251 #247

Contributors:

  • Kaixiang Zhang (Sean)
  • Ethan Xia
  • Yash Chatha
  • Laerdon Yah-Sung Kim
  • Jonathan P. Chang

ConvoKit Version 3.0.0

26 Jul 02:41
dbca34f

We're excited to announce the public release of ConvoKit 3.0!

ConvoKit now supports MongoDB as a backend for working with corpus data. This update provides several benefits, such as taking advantage of MongoDB's lazy loading to handle extremely large corpora, and ensuring resilience to unexpected crashes by continuously writing all changes to the database.

To learn more about using MongoDB as a backend choice, refer to our documentation at https://convokit.cornell.edu/documentation/storage_options.html.

Database Backend

Historically, ConvoKit has let you work with conversational data directly in program memory through the Corpus class, with long-term storage provided by dumping the contents of a Corpus to disk in JSON format. This paradigm works well for distributing and storing static datasets, and for workloads that compute on some or all of the data over a short time period and optionally store the results on disk. For example, ConvoKit distributes datasets included with the library in JSON format, which you can load into program memory to explore and compute with.

In ConvoKit version 3.0.0, we introduce a new option for working with conversational data: the MongoDB backend. Consider a use case where you want to collect conversational data over a long time period and maintain a persistent representation of the dataset even if your data collection program unexpectedly crashes. In the memory backend paradigm, this would require regularly dumping your corpus to JSON files, incurring repeated, expensive write operations. With the new database backend, by contrast, all your data is automatically saved to the database for long-term storage as it is added to the corpus.
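The contrast between the two paradigms can be sketched with a toy example. The block below does not use ConvoKit's API: MemoryStore and WriteThroughStore are hypothetical classes, and Python's built-in sqlite3 module stands in for MongoDB so that the sketch runs without a database server. It only illustrates why write-through storage survives a crash, while periodic JSON dumps lose anything added since the last dump.

```python
import json
import os
import sqlite3
import tempfile


class MemoryStore:
    """Hypothetical in-memory store: data reaches disk only when dump() is called."""

    def __init__(self, path):
        self.path = path
        self.utterances = {}

    def add(self, utt_id, text):
        self.utterances[utt_id] = text  # lives only in program memory

    def dump(self):
        # Rewrites the whole dataset on every call.
        with open(self.path, "w") as f:
            json.dump(self.utterances, f)


class WriteThroughStore:
    """Hypothetical database-backed store: every add() is committed immediately.

    sqlite3 is used here as a stand-in for MongoDB.
    """

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS utterances (id TEXT PRIMARY KEY, text TEXT)"
        )
        self.conn.commit()

    def add(self, utt_id, text):
        self.conn.execute(
            "INSERT OR REPLACE INTO utterances VALUES (?, ?)", (utt_id, text)
        )
        self.conn.commit()  # persisted as soon as it is added

    def count(self):
        return self.conn.execute("SELECT COUNT(*) FROM utterances").fetchone()[0]


def simulate_crash():
    """Add three utterances, 'crash' before a final dump, and count what survived."""
    workdir = tempfile.mkdtemp()
    json_path = os.path.join(workdir, "corpus.json")
    db_path = os.path.join(workdir, "corpus.db")

    mem = MemoryStore(json_path)
    db = WriteThroughStore(db_path)
    for i, text in enumerate(["hello", "hi there", "how are you?"]):
        mem.add(f"u{i}", text)
        db.add(f"u{i}", text)
        if i == 0:
            mem.dump()  # only the first utterance reaches the JSON file
    db.conn.close()
    del mem  # simulated crash: the in-memory state is gone

    with open(json_path) as f:
        survived_json = len(json.load(f))
    survived_db = WriteThroughStore(db_path).count()
    return survived_json, survived_db


if __name__ == "__main__":
    print(simulate_crash())
```

In this sketch the JSON file holds only what was present at the last dump, while the database holds every utterance that was added. Dumping after every addition would close the gap, but at the cost of rewriting the full dataset each time.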

Documentation

Please refer to this database setup document to set up a MongoDB database and this storage document for a further explanation of how the database backend option works.

Tests

Updated tests to include db_mode testing.

Examples

Updated examples to include demonstration of db_mode usage.

Bug Fixes

  • Fixed an issue where accessing corpus.utterances in politenessAPI threw an error; the code should call corpus.iter_utterances() instead. Corpus items should be accessed through the public "getters" rather than through private variables.
  • Fixed bug in coordination.py for the usage of metadata mutability.
  • Fixed issue in Pairer with pair_mode set to maximize causing the pairing function to return an integer, which causes an error in pairing objects.

Breaking Changes

Modified ConvoKit.Metadata to disallow in-place mutation of metadata fields. This is implemented by returning a deep copy of the stored field value every time the field is accessed, and is intended to align behavior between memory and DB modes. #197
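The practical effect of copy-on-access can be seen in a small sketch. ReadOnlyCopyMeta below is a hypothetical toy class, not ConvoKit's metadata implementation; it only mimics the behavior described above, where each read returns a deep copy of the stored value.

```python
from copy import deepcopy


class ReadOnlyCopyMeta:
    """Toy metadata container (hypothetical, not ConvoKit's class).

    Reading a field returns a deep copy, so in-place edits to the returned
    object never reach the stored value.
    """

    def __init__(self):
        self._fields = {}

    def __setitem__(self, key, value):
        self._fields[key] = value

    def __getitem__(self, key):
        return deepcopy(self._fields[key])


meta = ReadOnlyCopyMeta()
meta["tags"] = ["greeting"]

# In-place mutation acts on a temporary copy and is silently lost.
meta["tags"].append("question")
unchanged = meta["tags"]

# To change a field: read it, modify the copy, and assign it back.
tags = meta["tags"]
tags.append("question")
meta["tags"] = tags
updated = meta["tags"]

print(unchanged)  # ['greeting']
print(updated)    # ['greeting', 'question']
```

Under this model, code that relied on patterns such as appending to a list stored in metadata would need to switch to the read, modify, and reassign pattern shown in the second half of the sketch.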

Change Log

Added:

  • Added DB backend mode to allow working with corpora using database as a supporting backend. #175 #184
  • Extended __init__ in model/corpus.py with parameters for DB functionality. #175
  • Updated model/backendMapper to separate memory and DB transactions. #175
  • Introduced a new layer of abstraction between Corpus components (Utterance, Speaker, Conversation, ConvoKitMeta) and concrete data mapping. Data mapping is now handled by a BackendMapper instance variable in the Corpus. #169

Changed:

  • Modified files in the ConvoKit model to support both memory mode and DB mode backends. #175
  • Removed deprecated arguments and functions from ConvoKit model. #176
  • Updated demo examples that used object references from older versions of ConvoKit. #192

Fixed:

  • Fixed usage of the mutability of metadata within coordination.py. #197
  • Fixed issue in the Pairer module when pair_mode was set to maximize, causing the pairing function to return an integer and subsequently leading to an error. #197
  • Fixed issue that caused corpus.utterances to throw an error within politenessAPI. #170
  • Fixed FightingWords to allow overlapping classes. #189

Python Version Requirement Update:

  • With Python 3.7 having reached end of life (EOL) on June 27, 2023, ConvoKit now requires Python 3.8 or above.

ConvoKit version 2.5.3

18 Jan 03:18
4cc7ff2

v2.5.2 release adds support for Chinese politeness strategy extraction. Currently, ConvoKit's politenessStrategies supports three politeness strategy collections covering two languages.

v2.5.3 release fixes a minor bug that occurs when using TextParser with SpaCy>3.2.0.

ConvoKit version 2.5.2

07 Jan 18:05

This release adds support for Chinese politeness strategy extraction. Currently, ConvoKit's politenessStrategies supports three politeness strategy collections covering two languages.

ConvoKit version 2.5.1

07 Oct 21:13
c135374

This release includes a new method from_pandas in the Corpus class that should simplify the Corpus creation process.

It generates a ConvoKit corpus from pandas dataframes of speakers, utterances, and conversations.

A notebook demonstrating the use of this method can be found here.

ConvoKit version 2.5

05 Jul 22:38

This release contains an implementation of the Expected Conversational Context Framework, and associated demos.

ConvoKit version 2.4

17 Aug 07:40

These notes describe the changes implemented as part of the v2.4 release.

Public-facing functionality

ConvoKitMatrix and Vectors

Vectors and Matrices now get first-class treatment in ConvoKit. Vector data can now be stored in a ConvoKitMatrix object that is integrated with the Corpus and its objects, allowing for straightforward access from Corpus component objects, user-friendly display of vectors data, and more. Read our introduction to vectors for more details.

Accordingly, we have re-implemented the Transformers that were already using array or vector-like data to take advantage of this new data structure, namely:

  • PromptTypes
  • HyperConvo
  • BoWTransformer
  • BoWClassifier - now renamed to VectorClassifier
  • PairedBoW - now renamed to PairedVectorClassifier

The last two Transformers can now be used for any general vector data, as opposed to just bag-of-words vector data.

Metadata deletion

We have implemented a formal way to delete metadata attributes from a Corpus component object. Prior to this, metadata attributes were deleted from objects individually -- leading to possible inconsistencies between the ConvoKitIndex (that tracks what metadata attributes currently exist) and the Corpus component objects. To rectify this, we now disallow deletion of metadata attributes from objects individually. Such deletion should instead be carried out using the Corpus method delete_metadata().
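The inconsistency described above can be illustrated with a toy model. ToyCorpus is hypothetical and far simpler than ConvoKit's Corpus and ConvoKitIndex; in particular, its delete_metadata() signature is an assumption made for this sketch and is not taken from the library. The sketch shows how object-by-object deletion leaves an index stale, while a single corpus-level method keeps objects and index in step.

```python
class ToyCorpus:
    """Hypothetical miniature corpus; not ConvoKit's implementation."""

    def __init__(self):
        self.utterance_meta = {}  # utterance id -> metadata dict
        self.index = set()        # metadata attributes believed to exist

    def add_meta(self, utt_id, attribute, value):
        self.utterance_meta.setdefault(utt_id, {})[attribute] = value
        self.index.add(attribute)

    def delete_metadata(self, attribute):
        """Remove an attribute from every utterance and from the index together."""
        for meta in self.utterance_meta.values():
            meta.pop(attribute, None)
        self.index.discard(attribute)


corpus = ToyCorpus()
corpus.add_meta("u0", "score", 3)
corpus.add_meta("u1", "score", 5)

# Deleting object by object bypasses the index, which still lists "score".
for meta in corpus.utterance_meta.values():
    del meta["score"]
stale = "score" in corpus.index

# Re-add the attribute, then delete it through the corpus-level method.
corpus.add_meta("u0", "score", 3)
corpus.add_meta("u1", "score", 5)
corpus.delete_metadata("score")
consistent = "score" not in corpus.index and all(
    "score" not in meta for meta in corpus.utterance_meta.values()
)

print(stale, consistent)  # True True
```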

Other changes

  • FightingWords and BoWTransformer now have default text_func values for the three main component types: utterance, speaker, and conversation.
  • corpus.iterate_by() is now deprecated.
  • The API of PromptTypes has been modified: rather than selecting types of prompt and response utterances to use in the constructor, we now give users the option to select prompts and responses as arguments to the fit and transform calls.

Other internal changes

  • In light of SIGDIAL 2020, we have a new video introduction and Jupyter notebook tutorial introducing new users to ConvoKit.
  • ConvoKitIndex now tracks a list of class types for each metadata attribute, instead of a single class type. This will lead to changes in index.json during dumps of any currently existing corpora, but will have no compatibility issues with loading from existing corpora.
  • We updated the following demos that make use of Vectors and PromptTypes: PromptTypes and Predicting conversations gone awry

ConvoKit version 2.3.2

03 Jun 07:18

These notes describe the changes made since the v2.3 release, and include changes from both v2.3.1 and v2.3.2.

Functionality

Naming changes

  • Utterance.root has been renamed to Utterance.conversation_id
  • User has been renamed to Speaker. Functions with 'user' in the name have been renamed accordingly
  • User.name has been renamed to Speaker.id

(Backwards compatibility will be maintained for all the deprecated attributes and functions.)

Corpus

  • Corpus now allows users to generate pandas DataFrames for its internal components using get_conversations_dataframe(), get_utterances_dataframe(), and get_speakers_dataframe().
  • Conversation objects have a get_chronological_speaker_list() method for getting a chronological list of conversation participants.
  • Conversation's print_conversation_structure() method has a new limit argument that caps the number of utterances displayed.

Transformers

  • New invalid_val argument for HyperConvo that automatically replaces NaN values with the default value specified in invalid_val.
  • FightingWords.summarize() now provides labelled plots

Bug fixes

  • Fixed minor bug in download() when downloading Reddit corpora.
  • Fixed bugs in HyperConvo that were causing NaN warnings and incorrect calculation. Fixed minor bug that was causing HyperConvo annotations to not be JSON-serializable.
  • Fixed bug in Classifier and BoWClassifier that was causing inconsistent behaviour for compressed vs. uncompressed vector metadata

Other changes

  • Warnings in ConvoKit for deprecation have been made more consistent.
  • We now have continuous integration for pushes and pull requests! Thanks to @mwilbz for helping set this up.