GitHub - heerme/SAX-SEQL

Description

This page describes the usage of the SAX-SEQL software and supports our publication:

Thach Le Nguyen, Severin Gsponer, Georgiana Ifrim (Insight Centre for Data Analytics, University College Dublin). Time Series Classification by Sequence Learning in All-Subsequence Space. IEEE International Conference on Data Engineering (ICDE), San Diego, 2017.

The original SEQL software can be found here: https://github.com/heerme/seql-sequence-learner

Installation

To compile, execute the following commands in the src directory:

mkdir -p build
cd build
mkdir -p Release
cd Release
cmake -DCMAKE_BUILD_TYPE=Release ../../
make

How to Use

Convert time series data to SAX using ./sax_convert. Usage:

./sax_convert [-n token_type] [-s reduction_strategy] [-N window_size] [-w word_length] [-a alphabet_size] [-i train_input] [-o train_output] [-I test_input] [-O test_output]
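
To give a feel for what the window size, word length and alphabet size control, here is a minimal Python sketch of textbook SAX. It is not the code of sax_convert: the function names are hypothetical, lowercase letters are assumed as output symbols, and the token type (-n) and reduction strategy (-s) options are not modelled.

```python
from statistics import NormalDist, mean, pstdev

def sax_word(window, word_length, alphabet_size):
    """Textbook SAX for one window: z-normalize, average, discretize.

    Assumes len(window) >= word_length and alphabet_size <= 26.
    """
    mu, sigma = mean(window), pstdev(window)
    # A constant window has zero spread; map it to all zeros instead of dividing by 0.
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in window]
    # Piecewise aggregate approximation: average over word_length segments.
    n = len(z)
    bounds = [round(i * n / word_length) for i in range(word_length + 1)]
    paa = [mean(z[bounds[i]:bounds[i + 1]]) for i in range(word_length)]
    # Breakpoints split the standard normal into equiprobable regions.
    cuts = [NormalDist().inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]
    letters = "abcdefghijklmnopqrstuvwxyz"
    # The symbol index is the number of breakpoints below the segment average.
    return "".join(letters[sum(v > c for c in cuts)] for v in paa)

def sax_words(series, window_size, word_length, alphabet_size):
    """Slide a window over the series and emit one SAX word per position."""
    return [sax_word(series[i:i + window_size], word_length, alphabet_size)
            for i in range(len(series) - window_size + 1)]

print(sax_words(list(range(10)), 8, 4, 4))
```

Under these assumptions, a larger word length keeps more of the shape of each window, and a larger alphabet gives a finer quantization of the values.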

Train using ./seql_learn. Usage:

./seql_learn    [-o objective_function] [-A alphabet_size] [-m minsup] [-l minpat] [-L maxpat] [-g maxgap] [-r traversal_strategy]
                [-T #round] [-n token_type] [-c convergence_threshold] [-C regularizer_value] [-a l1_vs_l2_regularizer_weight]
                [-v verbosity] train_file model_file

Default values for parameters:
    [-o objective: 0 or 2] Objective function. Choice between logistic regression (-o 0) and squared-hinge support vector
     machines (-o 2). By default set to logistic regression.
    [-A alphabet_size] Should be consistent with the alphabet size passed to sax_convert (-a).
    [-g maxgap >= 0] Maximum number of consecutive gaps or wildcards allowed in a feature, e.g., a**b
     is a feature of size 4 with any 2 characters from the input alphabet in the middle. By default
     set to 0.
    [-C regularizer value > 0] Value of the regularization parameter. By default set to 1.
    [-a alpha in [0,1]] Weight of l1 vs l2 regularizer for the elastic-net penalty. By default set to 0.2, i.e., 0.8*l1 + 0.2*l2 regularization.
    [-l minpat >= 1] Threshold on the minimum length of any feature. By default set to 1.
    [-L maxpat] Threshold on the maximum length of any feature. By default the maximum length
     is unrestricted, i.e., at most as long as the longest sequence in the training set.
    [-m minsup >= 1] Threshold on the minimum support of features, i.e., number of sequences containing
     a given feature. By default set to 1.
    [-n token type: 0 or 1] Word or character-level token. Words are delimited by white spaces. By default
     set to 1, character-level tokens.
    [-r traversal strategy: 0 or 1] Breadth First Search or Depth First Search traversal of the search tree.
     By default set to BFS.
    [-c convergence threshold >= 0] Stopping threshold based on change in aggregated score predictions.
     By default set to 0.005.
    [-T maxitr] Number of optimization iterations. By default set to the maximum between 5,000
     and the number of iterations that results from applying a convergence threshold to the
     aggregated change in score predictions.
    [-v verbosity: 1 to 5] Amount of printed detail about the training of the classifier. By default set to 1
     (light profiling information).
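
The maxgap and minsup descriptions above can be illustrated with a small Python sketch. It only mirrors the definitions given in the parameter list (a wildcard stands for any single character; support is the number of sequences containing a feature). The function names and the use of `*` as the wildcard symbol are assumptions for illustration, not part of seql_learn.

```python
def longest_gap(feature, wildcard="*"):
    """Length of the longest run of consecutive wildcards in a feature."""
    longest = run = 0
    for ch in feature:
        run = run + 1 if ch == wildcard else 0
        longest = max(longest, run)
    return longest

def occurs(feature, sequence, wildcard="*"):
    """True if the feature matches some contiguous stretch of the sequence."""
    m = len(feature)
    return any(
        all(f == wildcard or f == s for f, s in zip(feature, sequence[i:i + m]))
        for i in range(len(sequence) - m + 1)
    )

def support(feature, sequences):
    """Number of sequences that contain the feature at least once."""
    return sum(occurs(feature, s) for s in sequences)

train = ["abcb", "abba", "ddadcb"]
print(longest_gap("a**b"))     # a**b would need maxgap >= 2
print(support("a**b", train))  # compared against minsup
```

In this reading, a feature such as a**b is only a candidate when maxgap is at least 2, and it is kept only if its support reaches minsup.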

Prepare the final model using ./seql_mkmodel (this builds a trie on the features of the model for fast classification). Usage:

./seql_mkmodel [-i model_file] [-o binary_model_file] [-O predictors_file]
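
As a rough illustration of why a trie speeds up classification, the Python sketch below stores weighted features in a nested-dictionary trie and scores a sequence with one walk per start position. This is not the repository's implementation. The function names, the `$weight` marker, and the rule that every occurrence of a feature adds its weight to the score are all assumptions made for the example.

```python
def build_trie(weighted_features):
    """Build a nested-dict trie from (feature, weight) pairs."""
    root = {}
    for feature, weight in weighted_features:
        node = root
        for ch in feature:
            node = node.setdefault(ch, {})
        node["$weight"] = weight  # marks the end of a stored feature
    return root

def score(sequence, trie):
    """Sum the weights of all stored features found in the sequence.

    Assumption: each occurrence contributes its weight once.
    """
    total = 0.0
    for start in range(len(sequence)):
        node = trie
        # One walk finds every stored feature that begins at this position.
        for ch in sequence[start:]:
            node = node.get(ch)
            if node is None:
                break
            total += node.get("$weight", 0.0)
    return total

trie = build_trie([("ab", 1.0), ("abc", 0.5), ("b", -0.25)])
print(score("abcab", trie))
```

Features that share a prefix share a path in the trie, so the walk from a given position stops as soon as no stored feature can match.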

Classify using ./seql_classify (apply the learned model on new examples). Usage:

./seql_classify [-n token_type: 0 word tokens, 1 char tokens; by default set to 1] [-t classif_threshold: default 0] [-v verbosity level: default 0] test_file binary_model_file

For multiclass data:

./seql_multiclass [train data] [test data] [output directory] [window size] [word length] [alphabet size]

Example

For binary data:

./sax_convert -n 0 -s 1 -N 60 -w 16 -a 4 -i data/Coffee_TRAIN -o sax.train -I data/Coffee_TEST -O sax.test
./seql_learn -n 1 -v 1 -A 4 -d 1 sax.train seql.model
./seql_mkmodel -i seql.model -o seql.model.bin -O seql.predictor
./seql_classify -n 1 -v 0 -p seql.predictor -d 1 sax.test seql.model.bin

Optionally one can tune the classification threshold on the training set, to minimize the number of training errors:

./seql_classify_tune_threshold_min_errors -n 1 -v 2 sax.train seql.model.bin

    Best threshold:0.0746284

and use the best threshold for classifying the test set:

./seql_classify -n 1 -v 0 -t 0.0746284 -p seql.predictor -d 1 sax.test seql.model.bin
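
The idea behind threshold tuning can be sketched in a few lines of Python. This is a simplified illustration, not the logic of seql_classify_tune_threshold_min_errors: it assumes labels are +1/-1, that a sequence is classified as +1 when its score exceeds the threshold, and that candidate thresholds are midpoints between sorted training scores. The function names and the example scores are made up.

```python
def count_errors(scores, labels, threshold):
    """Number of examples whose thresholded score disagrees with the label."""
    return sum((1 if s > threshold else -1) != y for s, y in zip(scores, labels))

def best_threshold(scores, labels):
    """Return the candidate threshold with the fewest training errors."""
    distinct = sorted(set(scores))
    # Candidates: below all scores, between neighbouring scores, above all scores.
    candidates = [distinct[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    candidates.append(distinct[-1] + 1.0)
    return min(candidates, key=lambda t: count_errors(scores, labels, t))

scores = [-1.2, -0.3, 0.05, 0.1, 0.9]  # hypothetical training scores
labels = [-1, -1, -1, 1, 1]
t = best_threshold(scores, labels)
print(count_errors(scores, labels, 0.0), count_errors(scores, labels, t))
```

In this toy data the default threshold of 0 misclassifies the example with score 0.05, while the tuned threshold separates the two classes. A threshold tuned this way fits the training set, so the improvement may not carry over to the test set.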

Alternatively, all of the above steps can be run with a single command:

./sax_seql -t data/Coffee_TRAIN -T data/Coffee_TEST -d [directory for output] -n 60 -w 16 -a 4

Disclaimer

These software distributions are open source, licensed under the GNU General Public License (v3 or later). Note that this is the full GPL, which allows many free uses, but does not allow its incorporation (even in part or in translation) into any type of proprietary software which you distribute. Commercial licensing is also available; please contact us if you are interested.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

