Inist-CNRS/text-quality-indicator

Text Quality Indicator

Returns a quality indicator (in %) for any text, using dictionaries.

The aim

TQI is a Node.js module that takes any text and returns a number indicating its quality.

How does it work?

TQI compares your text against word lists derived from large affix dictionaries in several languages.
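The idea can be sketched with a toy example. This is only an illustration, not TQI's actual implementation: it swaps Hunspell's affix dictionaries for a small in-memory word list, and the names `wordList`, `tokenize` and `scoreText` are hypothetical.

```javascript
// Toy stand-in for a Hunspell dictionary: a plain in-memory word list.
// (Assumption for illustration; TQI itself relies on Hunspell dictionaries.)
const wordList = new Set(['some', 'english', 'words', 'text']);

// Hypothetical helper: split text into lowercase word tokens.
function tokenize(text) {
  return text.toLowerCase().match(/\p{L}+/gu) || [];
}

// Hypothetical helper: count known and unknown words, then derive a percentage.
function scoreText(text, dictionary) {
  const words = tokenize(text);
  const correct = words.filter((word) => dictionary.has(word));
  const misspelled = words.filter((word) => !dictionary.has(word));
  // An empty text gets a rate of 0 here; that choice is arbitrary.
  const rate = words.length === 0 ? 0 : (correct.length / words.length) * 100;
  return {
    correct: correct.length,
    misspelled: misspelled.length,
    rate,
    words: { correct, misspelled },
  };
}

// Two of the three words are in the list, so the rate is about 66.7 %.
console.log(scoreText('Some English wrds', wordList));
```

A real affix dictionary also recognises inflected forms (plurals, conjugations), which a flat word list like this one does not.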

Which languages does TQI support?

TQI supports all languages present in the list of dictionaries for Hunspell.

You can use any language available in node_modules/dictionaries, or a personal dictionary.

How to use it?

Requirements

Hunspell (version >= 1.3)

sudo apt-get install hunspell

Using the module in your project:

npm install --save text-quality-indicator

// Load the npm module
const Tqi = require('text-quality-indicator'),
    tqi = new Tqi();

// Lists of correct/misspelled words are disabled by default; set wordsResult to true to enable them.
// You can also set a custom timeout for hunspell calls (defaults to 5 seconds).
const options = { wordsResult: true, timeout: 5 };

// Analyze a file
tqi.analyze('file.txt', options).then((result) => {
  console.log("result : ", result);
});

// Will log something like:
{ correct: 3,
  misspelled: 0,
  rate: 100,
  words: { correct: [ 'somme', 'english', 'words' ], mispelled: [] } 
}
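As a sketch of what you might do with such a result, the hypothetical helper below turns it into a one-line summary. `summarize` and its `threshold` parameter (90 by default, an arbitrary value) are not part of TQI. Because this README spells the result key both `misspelled` and `mispelled`, the helper accepts either.

```javascript
// Hypothetical helper: turn a TQI-style result object into a one-line summary.
function summarize(name, result, threshold = 90) {
  // Accept both spellings of the key; default to 0 if neither is present.
  const misspelled = result.misspelled ?? result.mispelled ?? 0;
  const total = result.correct + misspelled;
  // The threshold is an arbitrary cut-off chosen for this example.
  const verdict = result.rate >= threshold ? 'OK' : 'needs review';
  return `${name}: ${result.rate.toFixed(1)}% (${result.correct}/${total} words) ${verdict}`;
}

// prints "file.txt: 100.0% (3/3 words) OK"
console.log(summarize('file.txt', { correct: 3, misspelled: 0, rate: 100 }));
```

You could call it from the `.then()` callback of `tqi.analyze()`, for instance to flag files that fall below the threshold.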

When you initialize TQI, you can pass a language code, an array of language codes, a path to a personal dictionary, or a mix of these:

const Tqi = require('text-quality-indicator'),
      tqi = new Tqi("en"),
      tqiEnFr = new Tqi(["en", "fr"]),
      tqiEnFrAndMyDictionary = new Tqi(["en", "fr", "/path/to/my/dictionary"]);

Using the CLI program

npm install -g text-quality-indicator
tqi --help

CLI examples:

  • On a sample French text file containing one "bad word":

      cat ./test/data/fr-sample.txt
      -> En se réveillant un matin après des rêves agités, Gregor Samsa se retrouva, dans son lit, métamorphosé en un monstrueux insecte.

    Launch TQI with the fr language option:

      tqi -d fr ./test/data/fr-sample.txt 

    This returns:

      fr-sample.txt => { correct: 20, mispelled: 1, rate: 95.23809523809523 }

  • On a folder of English text files:

      tqi /path/to/folder

    English is the default language.

You can ask the CLI to also return the correct/misspelled words:

./bin/cli.js -w ./pathToTxt.txt