Skip to content

TextpressoDevelopers/textpresso_classifiers

Repository files navigation

Build Status Coverage Status Codacy Badge Documentation Status

Introduction

Tpclassifier is a Python library that contains functions to train and apply classifiers for textual documents. It is based on Python scikit-learn library, and it provides an easy interface to train and use its classifiers. In addition, tpclassifier includes functions to transform documents from pdf and Textpresso CAS files (both generated from pdf or xml files) into text and simplify the way they are imported in the library and used by the classifiers for training, testing, and prediction.

Installing tpclassifier library

To install tpclassifier, run the following command from the root directory of the project:

pip3 install .

The installation requires Python3 and pip3 to be installed in the system.

Using the library from Python

The library can be imported as a regular Python package:

from tpclassifier import TextpressoDocumentClassifier

classifier = TextpressoDocumentClassifier()

The complete documentation of the classes and functions provided by the library can be found here.

Using the executable scripts provided by the library

tpclassifier comes with a set of executable programs that use the library as a backend to provide an easy interface to train, test, and apply classifiers for pdf or CAS documents. Go to the project wiki to see the complete documentation of these programs and for some example use cases.

About

Textpressocentral classifiers - library and scripts

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published