C. elegans Paper Classification with Neural Networks
These instructions are based on an updated version of the Neural Network software.
Change to the directory you want the classifier directory to be hosted in and unpack the tar file:
tar xvfz classifier.tgz
Then change into the classifier directory and load the Docker image:
cd classifier
docker load < tpnn.docker.image.tar.gz
Put plain text files of documents that are to be classified in the directory
fulltext.text
that is located in the classifier/ directory. The system supports incremental classification: you can add more files to the directory and run the classification again. The script compares the files in fulltext.text/ with those in fulltext.span/ and processes only the new ones. To run a classification from scratch, delete all files in the fulltext.span/ directory.
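To preview which files the next run would treat as new, the comparison described above can be approximated with a short shell function. This is only a sketch: the function name list_pending is hypothetical, and it assumes that processed files appear in fulltext.span/ under the same file name as in fulltext.text/, which the actual script may handle differently.

```shell
# List files in the text directory that have no same-named counterpart
# in the span directory.
# ASSUMPTION: fulltext.span/ reuses the file names of fulltext.text/;
# the real classification script may match files differently.
list_pending() {
    local text_dir="$1" span_dir="$2" path name
    for path in "$text_dir"/*; do
        # Skip anything that is not a regular file (this also covers
        # the unexpanded glob when the directory is empty).
        [ -f "$path" ] || continue
        name="$(basename "$path")"
        # Print the name only if the span directory has no such entry.
        [ -e "$span_dir/$name" ] || printf '%s\n' "$name"
    done
}
```

Run from inside the classifier/ directory, list_pending fulltext.text fulltext.span prints one pending file name per line and prints nothing when every file already has a counterpart.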
- Start the Docker Image
Run
./run_tpnn.sh <full path of classifier directory>
This gets you into a bash shell within the Docker container.
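Because run_tpnn.sh expects a full path, it can help to resolve the directory first rather than typing it by hand. The helper below is a minimal sketch; the name abs_dir is hypothetical and not part of the classifier package.

```shell
# Resolve a directory to its absolute, symlink-free path.
# ASSUMPTION: abs_dir is an illustrative helper, not shipped with the classifier.
abs_dir() {
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$1" && pwd -P )
}

# Usage, from inside the classifier directory:
#   ./run_tpnn.sh "$(abs_dir .)"
```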
- Run the Classification
Change to the directory /data/textpresso/ with
cd /data/textpresso
and then run
./classifyPapers.3.0.sh
The script runs inference with the models stored in /data/textpresso/models4production. Which of these models are used is specified in the ./classifyPapers.3.0.sh script itself.
Results are stored in two directories, records/ and html/. Each run gets its own subdirectory in both, named after the date and time (UTC) at which the classification script was run.
The records/ directory contains one list per class, each with scores ranging from 0 to 100: a score of 100 strongly indicates that the paper belongs to the class, and a score of 0 indicates that it does not.
The html/ directory contains HTML pages that display the results. There, the scores are mapped to the confidence levels NEG, LOW, MEDIUM and HIGH, and each paper is linked to a curation-specific site. The base URL can be changed in the ./classifyPapers.3.0.sh script. Note that the filenames in the fulltext.text/ directory are used as paper identifiers to form these links; this behavior may also have to be adjusted in the ./classifyPapers.3.0.sh script to accommodate particular requirements.
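The mapping from scores to confidence levels can be illustrated with a small shell function. This is a sketch only: the cutoffs of 25, 50 and 75 are placeholders chosen for illustration, not the values used by ./classifyPapers.3.0.sh, the function name confidence_label is hypothetical, and scores are assumed to be integers.

```shell
# Map a 0-100 score to one of NEG, LOW, MEDIUM, HIGH.
# ASSUMPTION: the default cutoffs (25/50/75) are illustrative placeholders;
# the real thresholds are defined by the classification script.
# ASSUMPTION: scores are integers (the -ge test does not accept decimals).
confidence_label() {
    local score="$1" low="${2:-25}" medium="${3:-50}" high="${4:-75}"
    if [ "$score" -ge "$high" ]; then
        echo HIGH
    elif [ "$score" -ge "$medium" ]; then
        echo MEDIUM
    elif [ "$score" -ge "$low" ]; then
        echo LOW
    else
        echo NEG
    fi
}
```

The cutoffs are passed as optional arguments so that the placeholders can be replaced without editing the function, for example confidence_label 62 30 60 90.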
© 2024 Hans-Michael Müller