Java Developer's Guide: Advanced Topics
Feature extraction and the creation of a featuregram (aka spectrogram) is core to training and classification on audio or other time series data. Each IClassifier implementation generally implements this featuregram extraction pipeline. The basic steps in creating the featuregram on which the model/classifier operates are as follows (a construction sketch follows the list):
- Augment the data - optional and only performed during training to expand the base training data set in one or more ways. In the Java Object model this is an implementation of the ITrainingWindowTransform.
- FeatureGram creation - this is always performed on both the labeled (training)
and unlabeled data and is implemented via the Java FeatureGramExtractor class.
The steps taken by the FeatureGramExtractor are as follows.
- Break the time series segment (audio clip, etc.) up into smaller sub-windows on which the features will ultimately be computed.
- Extract feature(s) from each of the sub-windows. For example, an FFT might be the feature extractor applied to produce a vector of powers at different frequencies for the sub-window. This is an implementation of the Java IFeatureExtractor interface.
- Group all features from all sub-windows into a matrix, with the y-dimension being the feature dimension (e.g. frequency) and the x-dimension being time.
- Optionally, apply a feature processor that examines the full featuregram matrix to apply additional computations/transforms. Computing differences across time between adjacent feature vectors is a common technique here. This is an implementation of the Java IFeatureProcessor interface.
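As a minimal sketch, the pipeline above might be assembled as follows. The FeatureGramExtractor constructor matches the toy classifier later in this guide; the MFCCFeatureExtractor and DeltaFeatureProcessor arguments mirror the JavaScript model specification near the end of this document, so treat those exact signatures as assumptions.

    // Feature extractor applied to each sub-window (40 MFCC coefficients, as in
    // the featurewriter.js example later in this guide).
    IFeatureExtractor<double[], double[]> extractor = new MFCCFeatureExtractor(40);
    // Feature processor applied across the whole featuregram (deltas across
    // adjacent features appended to the featuregram).
    IFeatureProcessor<double[]> processor = new DeltaFeatureProcessor(2, new double[] { 1, 1, 1 });
    // 40 msec sub-windows advanced by 40 msec (i.e., no overlap between sub-windows).
    IFeatureGramExtractor<double[], double[]> fgExtractor =
            new FeatureGramExtractor<double[], double[]>(40, 40, extractor, processor);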
The Java framework provided by the aisp-core projects is based on a few key classes and interfaces, as follows (a brief construction sketch follows the list):
- IDataWindow - defines a window of time containing a generic data type within that window. Data windows have a start and stop time and a sampling rate, and allow the computation of sub-windows.
- DoubleWindow is an IDataWindow implementation that has a vector of double values for its data type.
- SoundClip is a DoubleWindow supporting raw PCM audio data.
- ILabeledDataWindow - defines a pairing of an IDataWindow with a set of labels represented in a Java Properties instance.
- SensorRecording is an ILabeledDataWindow that uses DoubleWindow (i.e. a double[] for its data in the window).
- SoundRecording is a SensorRecording that further uses a SoundClip.
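To make the pairing concrete, a labeled sound sample might be assembled as sketched below. The pairing of a data window with a java.util.Properties label set is as described above, but the exact SoundClip and SoundRecording constructor signatures shown here are assumptions.

    // PCM audio for the window; a second of silence as placeholder data.
    double[] samples = new double[44100];
    // SoundClip is a DoubleWindow over raw PCM data; the (startMsec, samplingRate,
    // data) constructor is an assumed signature.
    SoundClip clip = new SoundClip(0.0, 44100, samples);

    // Labels are a standard java.util.Properties instance.
    Properties labels = new Properties();
    labels.setProperty("status", "normal");

    // SoundRecording pairs the clip with its labels (an ILabeledDataWindow).
    SoundRecording recording = new SoundRecording(clip, labels);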
This basic model gives us the ability to represent and model any sort of data that might be captured in a window of time. For now we have focused on vectors of scalar data as the data within a window, but nothing requires this in the model.
Now that we have a data model, we can start to define the classification mechanisms used on a set of training data and/or an unlabeled data window to be classified. The training and classification process is described below.
Data windows are passed through a feature creation process to develop a set of relevant data on which the classifier is trained and makes classification predictions. Features are generally smaller than the original data but contain important information on which decisions can be made. In the sound/vibration space the frequency spectrum is an important feature element.
To represent our features we define the following Java classes:
- IFeature - an empty extension of IDataWindow to provide type distinction.
- DoubleFeature - an implementation that uses a vector of doubles as its data type.
- ILabeledFeature - a pairing of an IFeature with a set of labels (again a Java Properties object).
- LabeledFeature - a simple generic implementation.
Features may be computed over the whole data window, or the window may be broken into sub-windows with features extracted from each sub-window. This leads to the next level of detail described below.
We identify two distinct steps in the feature development process after optional sub-windowing of the data window. First, a process called feature extraction is applied to each sub-window, resulting in an array of features (aka a spectrogram). Feature extraction operates only on the individual sub-windows of data; examples include FFT and MFCC. Next, this array of features is operated on in aggregate by a feature processing step, which is able to address the whole spectrogram and perform processing across all sub-features. The following Java classes represent these processes (a sketch of a custom extractor follows the list):
- IFeatureExtractor - a simple interface defining a function that computes a feature from a data window. This is a generic type and allows for data of any type in both the data window and the feature result. Some specific feature extractors mapping double[] to double[] include the following:
- FFTFeatureExtractor
- MFCCFeatureExtractor
- IFeatureProcessor - defines the processing of the array of sub-features (i.e. spectrogram) to produce a new spectrogram.
- DeltaFeatureProcessor - provides the ability to take deltas (differences) between adjacent features and append them to the spectrogram.
- NormalizingFeatureProcessor - allows features to be scaled or zero-biased in either or both the feature and time dimensions.
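To make the extractor contract concrete, here is a hypothetical IFeatureExtractor that reduces each sub-window to a single RMS energy value. The apply() method name, the IDataWindow accessors, and the DoubleFeature constructor are assumptions about the API, so treat this as a sketch rather than a verified implementation.

    // Hypothetical extractor: one RMS (root-mean-square) energy value per sub-window.
    public class RMSFeatureExtractor implements IFeatureExtractor<double[], double[]> {

        private static final long serialVersionUID = 1L;

        @Override
        public IFeature<double[]> apply(IDataWindow<double[]> window) { // assumed method name
            double[] data = window.getData();                           // assumed accessor
            double sumOfSquares = 0;
            for (double d : data)
                sumOfSquares += d * d;
            double rms = Math.sqrt(sumOfSquares / data.length);
            // DoubleFeature is the double[]-valued IFeature implementation; the
            // (startMsec, endMsec, data) constructor is an assumed signature.
            return new DoubleFeature(window.getStartTimeMsec(), window.getEndTimeMsec(),
                    new double[] { rms });
        }
    }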
The IFeatureExtractor, the IFeatureProcessor, and the sub-windowing definitions are combined into an IFeatureGramExtractor, which is ultimately associated with an algorithm/classifier instance.
Finally, the IClassifier implementation generally uses one or more IFeatureGramExtractor instances to process the training data and data to be classified.
- IClassifier - provides the definition of a fully trainable classifier that can perform classifications once trained. There are many implementations of IClassifier; here are a few of the key ones that operate on SensorRecordings (a usage sketch follows the list):
- GMMClassifier
- L1/Lp/EuclidianDistanceMergeKNNClassifier
- DCASEClassifier - a neural network
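Training and classification follow the same pattern regardless of implementation. The following minimal sketch uses the train() and classify() signatures of the toy classifier shown below, together with the extractor and processor built earlier; the Classification accessor getLabelValue() is an assumption.

    // Labeled training data, e.g. a collection of SoundRecordings loaded elsewhere.
    List<SoundRecording> trainingData = new ArrayList<SoundRecording>();
    // Any IClassifier implementation follows this pattern; here, the toy classifier
    // defined below.
    IClassifier<double[]> classifier = new ToyExampleClassifier(extractor, processor, 40, 40, 0.0);
    // Learn a model that predicts values of the "status" label.
    classifier.train("status", trainingData);
    // Classify an unlabeled data window (e.g. a SoundClip); the result map is
    // keyed by label name.
    Map<String, Classification> results = classifier.classify(clip);
    Classification c = results.get("status");
    System.out.println("status=" + c.getLabelValue()); // accessor name is an assumption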
A classifier builder interface, IClassifierBuilder, is provided to enable easy re/creation of classifiers. Each IClassifier implementation is generally responsible for creating a corresponding IClassifierBuilder implementation.
The library supports the creation of classifiers that can be trained in the server and migrated to the edge. These are IFixableClassifiers, and they create IFixedClassifiers. However, before moving on to this more advanced topic, we will build a simple classifier that implements all the key methods (i.e., train() and classify()). Then we show how to adapt the simple classifier for use on the edge.
    public class ToyExampleClassifier extends TaggedEntity implements IClassifier<double[]> {

        private static final long serialVersionUID = 2966528639796200129L;

        private String primaryTrainingLabel = null;
        private IFeatureGramExtractor<double[], double[]> fgExtractor;

        /** Holds our model of the training data. */
        private Map<String, OnlineStats> labelStats = new HashMap<String, OnlineStats>();

        public ToyExampleClassifier(IFeatureExtractor<double[], double[]> extractor,
                IFeatureProcessor<double[]> processor,
                int windowSizeMsec, int windowShiftMsec, double someParameter) {
            fgExtractor = new FeatureGramExtractor<double[], double[]>(windowSizeMsec, windowShiftMsec,
                    extractor, processor);
        }

        /**
         * This simply keeps statistics (mean, stdev, etc.) on features grouped by label
         * value and then stores the model of statistics used during classification.
         */
        public void train(String trainingLabel, Iterable<? extends ILabeledDataWindow<double[]>> data)
                throws AISPException {
            this.primaryTrainingLabel = trainingLabel;
            // Create a feature extractor that applies the feature extraction pipeline
            // to the iterable of data windows in a streaming fashion. Each data window
            // is partitioned into sub-windows, the feature extractor is called on each,
            // and if present, the processor works on the whole array of features for a
            // given data window.
            LabeledFeatureIterable<double[], double[]> features =
                    new LabeledFeatureIterable<double[], double[]>(data, Arrays.asList(fgExtractor));
            // This loop pulls the features out of the data. Here we use an OnlineStats
            // object to calculate the mean for each set of features for a given label value.
            for (ILabeledFeatureGram<double[]>[] lfArray : features) {
                // We only have one feature extractor and processor, producing a single feature gram.
                ILabeledFeatureGram<double[]> featureGram = lfArray[0];
                // Get the labelValue from the first feature. All features in the
                // array will have the same labels.
                String labelValue = featureGram.getLabels().getProperty(trainingLabel);
                // Get the statistics for this label value and create one if needed.
                OnlineStats stats = labelStats.get(labelValue);
                if (stats == null) {
                    stats = new OnlineStats();
                    labelStats.put(labelValue, stats);
                }
                // Add the features to the statistics for this label value.
                for (IFeature<double[]> f : featureGram.getFeatureGram().getFeatures())
                    stats.addSamples(f.getData());
            }
        }

        @Override
        public Map<String, Classification> classify(IDataWindow<double[]> sample) throws AISPException {
            // Extract the features from this window.
            IFeatureGram<double[]> featureGram = fgExtractor.extract(sample);
            IFeature<double[]>[] features = featureGram.getFeatures();
            // Compute the statistics on these features so we can compare them with
            // those computed during training.
            OnlineStats featureStats = new OnlineStats();
            for (IFeature<double[]> f : features)
                featureStats.addSamples(f.getData());
            // Now look for the label value whose stats are closest to the stats
            // for the feature being classified.
            double minDist = Double.MAX_VALUE;
            String minLabelValue = null;
            for (String labelValue : labelStats.keySet()) {
                OnlineStats stats = labelStats.get(labelValue);
                double distance = Math.abs(stats.getMean() - featureStats.getMean());
                if (distance < minDist) {
                    minDist = distance;
                    minLabelValue = labelValue;
                }
            }
            // We have the best label, so create a single Classification and put it in
            // the returned map. You may provide other classifications as candidates
            // if you like, but we don't here.
            Map<String, Classification> cmap = new HashMap<String, Classification>();
            Classification c = new Classification(this.primaryTrainingLabel, minLabelValue, 1.0);
            cmap.put(this.primaryTrainingLabel, c);
            return cmap;
        }

        @Override
        public String getTrainedLabel() {
            return this.primaryTrainingLabel;
        }
    }
The code for this example is in the samples/src/java source tree of the aisp-core/aisp-core-samples project.
The goal here is to build an implementation of IClassifier that can be trained and then provide classifications on the edge. We will reuse existing abstract helper classes and adapt the simple classifier above to build our IFixableClassifier, enabling us to easily deploy an instance of IFixedClassifier on an edge device. First we show the IFixableClassifier implementation, which is trainable and creates an instance of IFixedClassifier. The following is based on the simple classifier created above, but implements trainFixedClassifierOnFeatures() instead of train() in order to create the instance of IFixedClassifier. The super class provides the classify() method for instances of this IFixableClassifier.
    public class ToyFixableExampleClassifier extends AbstractFixableFeatureExtractingClassifier<double[], double[]>
            implements IFixableClassifier<double[]> {

        private static final long serialVersionUID = 2966528639796200129L;

        public ToyFixableExampleClassifier(String primaryTrainingLabel,
                IFeatureExtractor<double[], double[]> extractor,
                IFeatureProcessor<double[]> processor,
                int windowSizeMsec, int windowShiftMsec, double someParameter) {
            super(primaryTrainingLabel, extractor, processor, windowSizeMsec, windowShiftMsec, false, false);
        }

        /**
         * This simply keeps statistics (mean, stdev, etc.) on features grouped by label
         * value and then creates the fixed classifier to use these during classification.
         */
        @Override
        protected IFixedClassifier<double[]> trainFixedClassifierOnFeatures(
                Iterable<? extends ILabeledFeature<double[]>[]> features) throws AISPException {
            Map<String, OnlineStats> labelStats = new HashMap<String, OnlineStats>();
            // This loop pulls the features out of the data. The super class creates the
            // iterable used here, which computes the features in a parallel and streaming
            // manner. Here we use an OnlineStats object to calculate the mean for each
            // set of features for a given label value.
            for (ILabeledFeature<double[]>[] lfArray : features) {
                // Get the labelValue from the first feature. All features in the array
                // will have the same labels.
                String labelValue = lfArray[0].getLabels().getProperty(primaryTrainingLabel);
                // Get the statistics for this label value and create one if needed.
                OnlineStats stats = labelStats.get(labelValue);
                if (stats == null) {
                    stats = new OnlineStats();
                    labelStats.put(labelValue, stats);
                }
                // Add the features to the statistics for this label value.
                for (ILabeledFeature<double[]> lf : lfArray)
                    stats.addSamples(lf.getFeature().getData());
            }
            // Create and return an instance of an IFixedClassifier that will use the
            // statistics we just computed.
            return new ToyFixedExampleClassifier(this.primaryTrainingLabel, this.featureExtractor,
                    this.featureProcessor, this.windowSizeMsec, this.windowShiftMsec, labelStats);
        }
    }
Next we show the implementation of IFixedClassifier created during training. The adaptation from the simple classifier is to move the body of its classify(data) method into the classify(features) method required by the super class.
    public class ToyFixedExampleClassifier extends AbstractFixedFeatureExtractingClassifier<double[], double[]>
            implements IFixedClassifier<double[]> {

        private static final long serialVersionUID = -3239508962940231962L;

        private final String trainingLabel;
        private final Map<String, OnlineStats> labelStats;

        protected ToyFixedExampleClassifier(String trainingLabel,
                List<IFeatureGramExtractor<double[], double[]>> fgeList,
                Map<String, OnlineStats> labelStats) {
            super(fgeList);
            this.trainingLabel = trainingLabel;
            this.labelStats = labelStats;
        }

        /**
         * Called by the super class after extracting the features from the data
         * passed to {@link #classify(com.ibm.watson.iot.sound.IDataWindow)}.
         */
        @Override
        protected List<Classification> classify(IFeatureGram<double[]>[] features) throws AISPException {
            // Compute the statistics on these features so we can compare them with
            // those computed during training.
            OnlineStats featureStats = new OnlineStats();
            for (IFeatureGram<double[]> fg : features)
                for (IFeature<double[]> f : fg.getFeatures())
                    featureStats.addSamples(f.getData());
            // Now look for the label value whose stats are closest to the stats
            // for the feature being classified.
            double minDist = Double.MAX_VALUE;
            String minLabelValue = null;
            for (String labelValue : labelStats.keySet()) {
                OnlineStats stats = labelStats.get(labelValue);
                double distance = Math.abs(stats.getMean() - featureStats.getMean());
                if (distance < minDist) {
                    minDist = distance;
                    minLabelValue = labelValue;
                }
            }
            // We have the best label, so create a single Classification and put it in
            // the returned list. You may provide other classifications as candidates
            // if you like, but we don't here.
            List<Classification> clist = new ArrayList<Classification>();
            Classification c = new Classification(trainingLabel, minLabelValue, 1.0);
            clist.add(c);
            return clist;
        }

        @Override
        public String getTrainedLabel() {
            return this.trainingLabel;
        }
    }
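Once trained, the fixed classifier can be serialized and shipped to the edge. The following rough sketch assumes the trained IFixedClassifier is obtained via a getFixedClassifier() accessor (the actual accessor may differ) and that standard Java serialization is used; the serialVersionUID fields above suggest these classes are Serializable.

    // Train on the server using the fixable classifier defined above.
    ToyFixableExampleClassifier fixable =
            new ToyFixableExampleClassifier("status", extractor, processor, 40, 40, 0.0);
    fixable.train("status", trainingData);

    // Obtain the trained fixed classifier; getFixedClassifier() is an assumed accessor.
    IFixedClassifier<double[]> fixed = fixable.getFixedClassifier();

    // Serialize it for deployment to an edge device.
    try (ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream("model.ser"))) {
        oos.writeObject(fixed);
    }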
A special classifier is provided to enable the saving of extracted features to a comma-separated value (CSV) file. Saving via a classifier allows the features to be exported through the use of the existing model training tool. The first thing to do is define a model specification JavaScript file. For example, create the following in the file featurewriter.js:
    var extractor = new MFCCFeatureExtractor(40);
    var processor1 = new NormalizingFeatureProcessor(true, true, false, true);
    var processor2 = new DeltaFeatureProcessor(2, [1,1,1]);
    var processor = new PipelinedFeatureProcessor([processor1, processor2]);
    var classifier = new FeatureWritingClassifier("features.csv", extractor, processor, 40, 40);
and then run this classifier using the train tool as follows:
train -label "anything" -sound-dir yoursounds/. -model jsfile:featurewriter.js
The training label is ignored as it is not really used to train any model. However, you could use it to pass in the name of the features file on the command line by changing the classifier to be something like the following:
var classifier = new FeatureWritingClassifier(trainingLabel + ".csv", extractor, processor, 40,40);
With the following then being equivalent to the first example:
train -label features -sound-dir yoursounds/. -model jsfile:featurewriter.js
The output format is a header-less CSV (comma-separated value) file with each column defined as follows (a reading sketch follows the list):
- Column 1: Window index (0-based). Windows appear in time order within the CSV file.
- Column 2: Sub-window index - 0-based index of the subwindow within the window with the index in column 1. Subwindows are listed in time order within the CSV file.
- Column 3: text specification of the layout of the feature data, in the form P[xQ[xR...]]. When more than one dimension is present, data is flattened in row-major order (C-style), so a 2x3 tensor [[0, 1, 2], [3, 4, 5]] becomes [0, 1, 2, 3, 4, 5].
- Column 4: List of labels associated with this feature - a semi-colon separated list of name=value pairs. These will be the same for all windows with the same index (from column 1). Windows with different window indexes need not have the same set of label names.
- Columns 5-N: Double-valued features extracted from the window, where N is fixed within a CSV file.
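To make the format concrete, the following sketch reads the file back using only standard Java; the column interpretation follows the list above.

    // Sketch: read back the per-sub-window feature rows written by the
    // FeatureWritingClassifier, assuming the 5-column prefix described above.
    try (BufferedReader reader = new BufferedReader(new FileReader("features.csv"))) {
        String line;
        while ((line = reader.readLine()) != null) {
            String[] cols = line.split(",");
            int windowIndex = Integer.parseInt(cols[0]);     // column 1
            int subWindowIndex = Integer.parseInt(cols[1]);  // column 2
            String layout = cols[2];                         // column 3, e.g. "40" or "40x3"
            String labels = cols[3];                         // column 4, e.g. "status=normal"
            double[] feature = new double[cols.length - 4];  // columns 5-N
            for (int i = 4; i < cols.length; i++)
                feature[i - 4] = Double.parseDouble(cols[i]);
            // ... use windowIndex, subWindowIndex, layout, labels and feature ...
        }
    }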
Finally, the TrainingSetInfo utility class reads through an Iterable of labeled data windows to provide information about the total time and label content of a training set.
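A hypothetical usage sketch; the static getInfo() factory method is an assumption about the API.

    // Summarize a training set; getInfo() is an assumed factory method, and the
    // summary is printed via toString().
    TrainingSetInfo info = TrainingSetInfo.getInfo(trainingData);
    System.out.println(info);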