tkarabatic/automatic-sentence-pair-tagger

Automatic sentence pair tagger

PEP8

A simple app that leverages an SQLite database and synonym searches to locate semantic pair candidates.

It supports reading verified sentence pairs from an existing .csv file.

Notes:

Inverse sentence pairs are not stored in the DB: s1-s2 is considered equivalent to s2-s1. Same-sentence pairs are also excluded, since they require no evaluation. The initial number of unrated pairs is therefore the number of unordered pairs of distinct sentences:

item_count! / (2!(item_count - 2)!)

For example, a corpus consisting of 3875 unique sentences should result in

3875! / (2 x 3873!) = (3875 x 3874) / 2 = 7505875

unrated pairs.

If the verified sentence file contains 250 rows, each describing a distinct pair of corpus sentences, the number of unrated pairs after the call to the initialize() function should be 7505875 - 250 = 7505625.
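The counting rules above can be checked on a small corpus with a minimal sketch like the one below. It is not the app's actual schema: the `pair` table, its columns, and the helper names `build_pairs`, `mark_verified` and `unrated_count` are all assumptions made for illustration. The sketch stores each pair once, with the smaller sentence id first, so that inverse and same-sentence pairs cannot be inserted.

```python
import sqlite3
from itertools import combinations
from math import comb


def build_pairs(sentences):
    """Create an in-memory DB holding every unordered pair of distinct sentences.

    Hypothetical schema: sentences are referenced by their list index.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pair ("
        " s1 INTEGER NOT NULL,"
        " s2 INTEGER NOT NULL,"
        " rating REAL,"              # NULL means the pair is still unrated
        " PRIMARY KEY (s1, s2),"
        " CHECK (s1 < s2))"          # rejects s1-s1 and the inverse s2-s1
    )
    # combinations() yields (i, j) with i < j, i.e. each pair exactly once
    conn.executemany(
        "INSERT INTO pair (s1, s2) VALUES (?, ?)",
        combinations(range(len(sentences)), 2),
    )
    return conn


def mark_verified(conn, a, b, rating):
    """Store a rating for a pair, regardless of the order the ids arrive in."""
    s1, s2 = sorted((a, b))
    conn.execute(
        "UPDATE pair SET rating = ? WHERE s1 = ? AND s2 = ?", (rating, s1, s2)
    )


def unrated_count(conn):
    """Number of pairs that have no rating yet."""
    return conn.execute(
        "SELECT COUNT(*) FROM pair WHERE rating IS NULL"
    ).fetchone()[0]


if __name__ == "__main__":
    conn = build_pairs(["a", "b", "c", "d"])
    print(unrated_count(conn), comb(4, 2))
    mark_verified(conn, 2, 0, 1.0)   # same pair as 0-2
    print(unrated_count(conn))
    print(comb(3875, 2))             # count for the corpus size used above
```

Because `mark_verified` sorts the ids before updating, a verified row given as s2-s1 lands on the same stored pair as s1-s2, and a duplicate row in the verified file does not reduce the unrated count twice.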

TODO:

Implement the generation of two .csv files:

  • a list of automatically rated sentence pairs that can then be manually verified
  • a list of all the sentence pairs in the corpus, which should include both same-sentence pairs (s1-s1) and inverse pairs (s1-s2 and s2-s1)
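The second file could be produced along the following lines. This is only a sketch of one possible approach, not an implemented feature: the function name `write_all_pairs` and the `s1`/`s2` column headers are assumptions. Unlike the DB contents, this list is the full Cartesian product of the corpus with itself, so a corpus of 3875 sentences would yield 3875 x 3875 = 15015625 rows.

```python
import csv
import io
from itertools import product


def write_all_pairs(sentences, out):
    """Write every ordered sentence pair, including s1-s1 and both s1-s2 and s2-s1.

    `out` is any writable text file object. Returns the number of pairs written.
    """
    writer = csv.writer(out)
    writer.writerow(["s1", "s2"])  # hypothetical header names
    count = 0
    # product(..., repeat=2) yields all ordered pairs, same-sentence ones included
    for s1, s2 in product(sentences, repeat=2):
        writer.writerow([s1, s2])
        count += 1
    return count


if __name__ == "__main__":
    buffer = io.StringIO()
    written = write_all_pairs(["a", "b", "c"], buffer)
    print(written)
    print(buffer.getvalue())
```

When writing to a real file, open it with `newline=""` as the `csv` module documentation recommends, so that line endings are not doubled on Windows.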
