C. elegans Paper Classification with Neural Networks

goldturtle edited this page Apr 18, 2024 · 20 revisions

These instructions are based on an updated version of the Neural Network software.

Deploy Working Directory

Change to the directory that should host the classifier directory and unpack the tar file:

tar xvfz classifier.tgz

Then change into the classifier directory:

cd classifier

Load the Docker image:

docker load < tpnn.docker.image.tar.gz
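The deployment steps above can be combined into one function. This is a minimal sketch, not part of the classifier distribution: the function name deploy_classifier is hypothetical, and it assumes that classifier.tgz has already been copied into the host directory and that the docker command is available.

```shell
#!/usr/bin/env bash
# Sketch: unpack the classifier and load the Docker image in one step.
# ASSUMPTION: classifier.tgz sits in the host directory passed as $1.
# deploy_classifier is a hypothetical helper, not shipped with the classifier.
deploy_classifier() {
    local host_dir="$1"
    if [ ! -f "$host_dir/classifier.tgz" ]; then
        echo "classifier.tgz not found in $host_dir" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is left unchanged.
    (
        cd "$host_dir" &&
        tar xvfz classifier.tgz &&
        cd classifier &&
        docker load < tpnn.docker.image.tar.gz
    )
}

# Example:
#   deploy_classifier /path/to/host/directory
```

Because the function runs in a subshell, you still need to change into the classifier directory yourself before continuing with the next steps.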

Populate the Text Directory

Put the plain-text files of the documents to be classified in the directory

fulltext.text/

which is located in the classifier/ directory. The system supports incremental classification: you can add more files to the directory and run the classification again. The script compares the files in fulltext.text/ with those in fulltext.span/ and processes only the new ones. To run a classification from scratch, delete all files in the fulltext.span/ directory.
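To preview which files the next run would pick up, or to reset for a run from scratch, something along the following lines may help. This is a sketch under one loudly stated assumption: that a processed document appears in fulltext.span/ under the same filename as in fulltext.text/. The actual classification script may match files differently. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list files in the text directory that have no counterpart in the
# span directory.
# ASSUMPTION: processed files keep the same filename in fulltext.span/.
list_unprocessed() {
    local text_dir="$1" span_dir="$2" path name
    for path in "$text_dir"/*; do
        [ -f "$path" ] || continue          # skip subdirectories and an unmatched glob
        name=$(basename "$path")
        [ -e "$span_dir/$name" ] || printf '%s\n' "$name"
    done
}

# Sketch: delete all files in the span directory so that the next run
# classifies everything from scratch. Directories themselves are kept.
reset_span() {
    local span_dir="$1"
    find "$span_dir" -type f -exec rm -f {} +
}

# Example (run from the classifier/ directory):
#   list_unprocessed fulltext.text fulltext.span
#   reset_span fulltext.span
```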

Start the Docker Image

Run

./run_tpnn.sh <full path of classifier directory>

This gets you into a bash shell within the Docker container.
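Since run_tpnn.sh expects the full path of the classifier directory, a small wrapper can resolve a relative path first. This is a sketch only: start_tpnn is a hypothetical helper, and it assumes that run_tpnn.sh is located in the classifier directory and is executable.

```shell
#!/usr/bin/env bash
# Sketch: resolve the classifier directory to an absolute path and pass it
# to run_tpnn.sh.
# ASSUMPTION: run_tpnn.sh lives in the classifier directory.
# start_tpnn is a hypothetical helper name.
start_tpnn() {
    local dir abs
    dir="${1:-.}"
    abs=$(cd "$dir" && pwd -P) || return 1
    if [ ! -x "$abs/run_tpnn.sh" ]; then
        echo "run_tpnn.sh not found or not executable in $abs" >&2
        return 1
    fi
    ( cd "$abs" && ./run_tpnn.sh "$abs" )
}

# Example (from inside the classifier/ directory):
#   start_tpnn .
```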

Run the Classification

Change to the directory /data/textpresso/ with

cd /data/textpresso

and then run

./classifyPapers.3.0.sh

The script runs inference with models stored in /data/textpresso/models4production. Which of these models are used is specified in the ./classifyPapers.3.0.sh script itself.
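To see which models are available and where the script refers to them, a quick inspection inside the container might look like the sketch below. It assumes that classifyPapers.3.0.sh mentions the models by a path containing the string models4production, which is a guess; if the script selects models some other way, the grep finds nothing. The function name show_model_refs is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list the stored models and the script lines that mention them.
# ASSUMPTION: the script refers to models via the string "models4production".
# show_model_refs is a hypothetical helper name.
show_model_refs() {
    local base="${1:-/data/textpresso}"
    echo "Available models:"
    ls -1 "$base/models4production"
    echo "References in classifyPapers.3.0.sh:"
    grep -n "models4production" "$base/classifyPapers.3.0.sh"
}

# Example (inside the container):
#   show_model_refs
```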

Results

Results are stored in two directories, records/ and html/. Each run gets its own subdirectory in both, named after the date and time (UTC) at which the classification script was run. The records/ directory contains, for each class, a list of papers with scores ranging from 0 to 100: a score of 100 strongly indicates that the paper belongs to the class, and a score of 0 indicates that it does not. The html/ directory contains HTML files that display the results. There, the scores are mapped to one of the confidence levels NEG, LOW, MEDIUM, or HIGH, and the papers are linked to a curation-specific site. The base URL of these links can be changed in the ./classifyPapers.3.0.sh script. Note that the filenames in the fulltext.text/ directory are used as paper identifiers to form the links; this behavior may also have to be adapted in the ./classifyPapers.3.0.sh script to accommodate particular requirements.
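The following sketch shows one way to locate the most recent run and to illustrate the idea of mapping a score to a confidence level. Two things here are assumptions. First, the newest run is found by directory modification time, because the exact format of the date-and-time directory names is not given here. Second, the cutoff values are hypothetical placeholders chosen only for illustration; the real cutoffs are whatever the classification script uses. Both function names are hypothetical as well.

```shell
#!/usr/bin/env bash
# Sketch: print the most recently modified run directory under records/ or html/.
# ASSUMPTION: modification time reflects run order.
latest_run() {
    local parent="$1" dir newest=""
    for dir in "$parent"/*/; do
        [ -d "$dir" ] || continue
        if [ -z "$newest" ] || [ "$dir" -nt "$newest" ]; then
            newest="$dir"
        fi
    done
    [ -n "$newest" ] && printf '%s\n' "${newest%/}"
}

# HYPOTHETICAL cutoffs, for illustration only. They are NOT taken from
# classifyPapers.3.0.sh; check the script for the real values.
HIGH_CUTOFF=75
MEDIUM_CUTOFF=50
LOW_CUTOFF=25

# Sketch: map an integer score (0 to 100) to a confidence level.
score_to_level() {
    local score="$1"
    if   [ "$score" -ge "$HIGH_CUTOFF" ];   then echo HIGH
    elif [ "$score" -ge "$MEDIUM_CUTOFF" ]; then echo MEDIUM
    elif [ "$score" -ge "$LOW_CUTOFF" ];    then echo LOW
    else echo NEG
    fi
}

# Example (run from the directory that contains records/):
#   latest_run records
#   score_to_level 80
```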