Automatic sentence pair tagger

A simple app that uses an SQLite database and synonym searches to find candidate semantic sentence pairs.

It supports reading verified sentence pairs from an existing .csv file.

Notes:

Inverse sentence pairs are not stored in the DB; s1-s2 is considered equivalent to s2-s1. Same-sentence pairs (s1-s1) are also excluded, since they require no evaluation. The initial number of unrated pairs is therefore the number of unordered pairs of distinct sentences:

item_count! / (2!(item_count - 2)!)

For example, a corpus consisting of 3875 unique sentences should result in

3875! / (2 x 3873!) = (3875 x 3874) / 2 = 7505875

unrated pairs.
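The count above can be checked with a short Python sketch. The function name `initial_unrated_pairs` is made up for this illustration and is not part of the app; `math.comb` computes the same binomial coefficient as the formula.

```python
from itertools import combinations
from math import comb


def initial_unrated_pairs(item_count: int) -> int:
    """Number of unordered pairs of distinct sentences: C(item_count, 2)."""
    return comb(item_count, 2)


# The corpus size used in the example above.
print(initial_unrated_pairs(3875))  # 7505875

# Cross-check on a tiny corpus by enumerating the pairs explicitly.
# combinations() never yields (a, a) and yields (a, b) only once,
# which matches the "no same-sentence, no inverse pairs" rule.
small_corpus = ["s1", "s2", "s3", "s4", "s5"]
explicit_pairs = list(combinations(small_corpus, 2))
print(len(explicit_pairs), initial_unrated_pairs(len(small_corpus)))  # 10 10
```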

If the verified sentence file contains 250 rows, each describing a distinct pair of corpus sentences, the number of unrated pairs after the call to the initialize() function should be 7505875 - 250 = 7505625.
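As a rough illustration of how the unrated count behaves, here is a minimal in-memory SQLite sketch. The table name `pairs`, its columns, and the helper functions are all assumptions made for this example; they are not taken from the app's db.py, and the real initialize() may work differently. Each pair is stored once with the smaller id first, so a verified row given in inverse order, or given twice, only rates one stored pair.

```python
import sqlite3
from itertools import combinations


def build_pairs(conn, sentence_ids):
    """Create a hypothetical pairs table holding each unordered pair once."""
    conn.execute(
        "CREATE TABLE pairs ("
        " s1_id INTEGER NOT NULL,"
        " s2_id INTEGER NOT NULL,"
        " rating REAL,"  # NULL means the pair is still unrated
        " PRIMARY KEY (s1_id, s2_id),"
        " CHECK (s1_id < s2_id))"  # rejects same-sentence and inverse pairs
    )
    conn.executemany(
        "INSERT INTO pairs (s1_id, s2_id) VALUES (?, ?)",
        combinations(sorted(sentence_ids), 2),
    )


def apply_verified(conn, verified_rows):
    """Store ratings from (id_a, id_b, rating) rows, in either id order."""
    for id_a, id_b, rating in verified_rows:
        low, high = sorted((id_a, id_b))
        conn.execute(
            "UPDATE pairs SET rating = ? WHERE s1_id = ? AND s2_id = ?",
            (rating, low, high),
        )


def unrated_count(conn):
    """Count the pairs that have no rating yet."""
    query = "SELECT COUNT(*) FROM pairs WHERE rating IS NULL"
    return conn.execute(query).fetchone()[0]


conn = sqlite3.connect(":memory:")
build_pairs(conn, range(1, 7))  # 6 sentences -> 15 stored pairs
# Three verified rows, but (2, 1) and (1, 2) refer to the same stored pair.
apply_verified(conn, [(2, 1, 0.8), (3, 5, 0.1), (1, 2, 0.9)])
print(unrated_count(conn))  # 13
```

In this sketch, the unrated count only drops by the number of distinct verified pairs, which is why the 7505625 figure holds only if the 250 rows are all distinct.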

TODO:

Implement the generation of two .csv files:

- a list of automatically rated sentence pairs that can then be manually verified
- a list of all the sentence pairs in the corpus, which should include both same-sentence pairs (s1-s1) and inverse pairs (s1-s2 and s2-s1)
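One possible shape for the second file is sketched below. This is not implemented in the app: the function name `write_all_pairs` and the `s1`/`s2` header are placeholders chosen for illustration. Unlike the DB, this output keeps same-sentence and inverse pairs, so a corpus of n sentences yields n x n rows.

```python
import csv
import io
from itertools import product


def write_all_pairs(sentences, out):
    """Write every ordered pair, including s1-s1, as CSV rows to `out`.

    Returns the number of data rows written (header excluded).
    """
    writer = csv.writer(out)
    writer.writerow(["s1", "s2"])  # hypothetical header names
    count = 0
    # product(..., repeat=2) yields (a, a), (a, b) and (b, a) alike.
    for s1, s2 in product(sentences, repeat=2):
        writer.writerow([s1, s2])
        count += 1
    return count


# Write to an in-memory buffer here; a real run would pass an open file
# created with open(path, "w", newline="").
buffer = io.StringIO()
rows = write_all_pairs(["a cat sat", "a dog ran", "birds fly"], buffer)
print(rows)  # 9
```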

tkarabatic/automatic-sentence-pair-tagger

Folders and files

Latest commit

History

Repository files navigation

Automatic sentence pair tagger

Notes:

TODO:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages