Skip to content

Latest commit

 

History

History
177 lines (117 loc) · 9.23 KB

README.md

File metadata and controls

177 lines (117 loc) · 9.23 KB

SpectralMachine: SpectraKeras

Machine learning software for rapid spectral analysis. While Raman spectra were the initilal focus, SpectraKeras is flexible to be applied for classification using any spectra (from XRD, FTIR and beyond). The previous generation software (SpectraLearnPredict) is no longer developed.

SpectraKeras

  • Currently supported ML architectures:
    • DNNClassifier (TensorFlow, TensorFlow-Lite)
    • Convolutional Neural Networks (TensorFlow, TensorFlow-Lite)
  • Required libraries for prediction:

Credits and References

If you use SpectralMachine or SpectraKeras, we request that you reference the papers/resources on which SpectralMachine is based:

Installation

Installation from available wheel package

If available from the main site, you can install SpectraKeras by running:

python3 -m pip install --upgrade spectrakeras-2024.11.14.2-py3-none-any.whl

SpectraKeras_CNN and Spectrakeras_MLP are available directly from the command line. NOTE: The Utilities in the Utilities folder are not included in the package, and can be run locally as needed.

Make your own wheel package

Make sure you have the PyPA build package installed:

python3 -m pip install --upgrade build

To build the wheel package rom the SpectraKeras folder run:

python3 -m build

A wheel package is available in the subfolder dir. You can install it following the instructions shown above.

Compatibility and dependences

This software requires Python (3.9 or higher). It has been tested with Python 3.9 or higher which is the recommended platform. It is not compatible with python 2.x. Additional required packages:

numpy
scikit-learn (>=0.18)
scipy
matplotlib
pandas
pydot
graphviz
h5py
tables
tensorflow (>=2.16.2)

In addition, these packages may be needed depending on your platform (via apt-get in debian/ubuntu or port in OSX):

python3-tk
graphviz

These are found in Unix based systems using common repositories (apt-get for Debian/Ubuntu Linux, or MacPorts for MacOS). More details in the scikit-learn installation page.

TensorFlow is needed only if flag is activated. Instructions for Linux and MacOS can be found in TensorFlow installation page. Pip installation is the easiest way to get going. Tested with TensorFlow 2.x (2.15 or higher preferred). TF 2.15 is the currently supported release.

Inference can be carried out using the regular tensorflow, or using tensorflow-lite for quantized models. Loading times of tflite (direct or via tflite-runtime) are significantly faster than tensorflow with minimal loss in accuracy. SpectraKeras provides an option to convert tensorflow models to quantized tflite models. TFlite models have been tested in Linux x86-64, arm7 (including Raspberry Pi3) and aarm64, MacOS, Windows. To use quantized models, TF 2.13 or higher is recommended.

Inference using the Coral EdgeTPU tensorflow-lite requires the libedgetpu libraries compatible with the supported and current version of tflite-runtime (version v2.13.0 or hiigher, v2.16.1 recommended), which is also required (Instructions and binaries can be found here. More information on installation of such libraries at Coral EdgeTPU.

Usage (SpectraKeras)

SpectraKeras_CNN

Train (Random cross validation):

python3 SpectraKeras_CNN.py -t <learningFile>

Train (with external validation):

python3 SpectraKeras_CNN.py -t <learningFile> <validationFile>

Predict:

python3 SpectraKeras_CNN.py -p <testFile>

Batch predict:

python3 SpectraKeras_CNN.py -b <folder>

Display Neural Netwrok Configuration:

python3 SpectraKeras_CNN.py -n <learningFile>

Convert model to quantized tflite:

python3 SpectraKeras_CNN.py -l <learningFile>

Determine accuracy using h5 testing file with spectra:

python3 SpectraKeras_CNN.py -a <testFile>

SpectraKeras_MLP

Train (Random cross validation):

python3 SpectraKeras_MLP.py -t <learningFile>

Train (with external validation):

python3 SpectraKeras_MLP.py -t <learningFile> <validationFile>

Predict:

python3 SpectraKeras_MLP.py -p <testFile>

Batch predict:

python3 SpectraKeras_MLP.py -b <folder>

Convert model to quantized tflite:

python3 SpectraKeras_MLP.py -l <learningFile>

Determine accuracy using h5 testing file with spectra:

python3 SpectraKeras_MLP.py -a <testFile>

Formatting input file for training

The main idea behind the software is to train classification or regression models from plain spectra (which can be Raman, but they can be any spectra or diffraction profiles, as long as the model is consistent), rather than from manually selected features (such as bands, peaks, widths, etc). So, suppose one has training files similar to this, where the first column is the Raman shiift, the second is intensity:

1000  123
1001  140
1002  180
1003  150
...

Let's say this file correspond to label 1, and now one has a collection of files that will be used for training each with its own label, the input file will look like this:

0  1000  1001  1002  1003 ...
lab1  123 140  180  150  ...
lab2 ... ... ... ... ...

Essentially each line in the input file corresponds to a training file with its label. during training the model will learn (either through a simple deep MLP network using SpectraKeras_MLP.py, or through a Convolutional Network using SpectraKeras_CNN.py, which is recommended) to extract features needed for prediction. Note that all spectra needs to have the same Raman shifts max min and step.

Of course it is not expected that the user manually compiles the training file from a possibly large collection of spectra. For that, GenericDataMaker.py is available in the Utilities folder, that can be used to automatically create such files. Basically you can run from the folder where you have your spectra:

python3 GenericDataMaker.py <learnfile> <enInitial> <enFinal> <enStep>

The script will interpolate each spectra within the Raman shifts parameters you set above. Note that there are some basic configuration that you may need to change in the GenericDataMakerp.py for your case (such as delimiter between data, extension of the files, etc).

One can use the same to create a validation file, or you can use other scripts also provided to split the training set into training+validation. That can be done randomly within SpectraKeras, but the split will be different every time you run it.

Once models are trained trained, prediction on individual files can be made using simply formatted ASCII files (like in the example above).

Training data

We do not provide advanced training sets, some of which can be found online. We only provide a simple Raman dataset mainly for testing purposes: it is loosely based on 633nm data from Ferralis et al. Carbon 108 (2016) 440.

More on Machine Learning tools used

Known Issues

None