Each experiment described in the report can be replicated with the following steps:
- 1 - Launch one of the four Init.sh scripts
- 2 - Use the newly generated training and test sets with the CRF++ commands crf_learn and crf_test
- 3 - Evaluate the performance with the conlleval.pl script
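Steps 2 and 3 can be sketched as follows. This is a minimal illustration, not necessarily the exact commands used for the report. The helper `build_pipeline` and the file names `model.Baseline` and `output.Baseline` are hypothetical. Passing the hyperparameter through `-c` is an assumption. The options themselves are standard: `crf_learn` takes `-c` (cost) and `-p` (threads), `crf_test` takes `-m` (model file), and `conlleval.pl` reads from standard input, with `-d` setting the column delimiter (CRF++ output is tab-separated).

```python
import shlex


def build_pipeline(template, train, test, cost=1.0, threads=1,
                   model="model.tmp", output="output.tmp"):
    """Return the three shell command lines: train, tag, evaluate.

    The model and output file names are hypothetical placeholders.
    """
    # crf_learn [options] template_file train_file model_file
    learn = ["crf_learn", "-c", str(cost), "-p", str(threads),
             template, train, model]
    # crf_test -m model_file test_file (tagged output goes to stdout)
    tag = ["crf_test", "-m", model, test]
    return [
        shlex.join(learn),
        shlex.join(tag) + " > " + shlex.quote(output),
        # conlleval.pl reads the tagged file from stdin; -d sets the delimiter
        "perl conlleval.pl -d '\\t' < " + shlex.quote(output),
    ]


if __name__ == "__main__":
    for command in build_pipeline("template.Baseline", "train.Baseline",
                                  "test.Baseline", cost=2, threads=20,
                                  model="model.Baseline",
                                  output="output.Baseline"):
        print(command)
```

The printed lines can be pasted into a shell once the chosen Init.sh script has generated the training and test sets.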
Cross-validation (CV) can be run by launching the cross-validation.sh script. Each time it runs, the script asks the user a series of questions in order to set up the CV properly. Here's an example:
Enter k for cross-validation > 10
Enter the hyperparameter > 2
Enter number of threads > 20
Specify the template to be used > template.Baseline
Training set (file name)> train.Baseline
Test set (file name)> test.Baseline
What features strategy did you use? (1 - Baseline, 2 - Lemma, 3 - Suffix, 4 - Last 2 Chars)> 1
Do you want to keep shuffle? (y/n) > y
Do you want to keep shuffle? (y/n) > y
Do you want to keep shuffle? (y/n) > y
Do you want to keep shuffle? (y/n) > y
Do you want to keep shuffle? (y/n) > n
Once the user answers n to the shuffle prompt, the CV is performed. An output similar to this one will be returned:
AVERAGE SCORES
Accuracy: 98.315%
Precision: 96.63%
Recall: 93.599%
F-1: 95.08%
UNBIASED VARIANCES
Accuracy: 8.57224%
Precision: 39.7916%
Recall: 156.189%
F-1: 90.7904%
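The shuffling, fold splitting, and summary statistics can be sketched as below. This is an illustration under assumptions, not the actual logic of cross-validation.sh: the function names are hypothetical, and sentences are assumed to be separated by blank lines, as in CRF++ data files. The unbiased variance is the sample variance, which divides by k - 1 rather than k.

```python
import random
import statistics


def read_sentences(text):
    """Split CRF++-style text into sentences (blocks separated by blank lines)."""
    sentences, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


def k_folds(sentences, k, seed=None):
    """Shuffle a copy of the sentences and deal them into k folds."""
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def summarize(scores):
    """Return (mean, unbiased variance) of the per-fold scores.

    statistics.variance uses the n - 1 denominator and needs at least
    two scores.
    """
    return statistics.mean(scores), statistics.variance(scores)


if __name__ == "__main__":
    # Hypothetical per-fold F-1 scores, for illustration only.
    mean, variance = summarize([94.0, 96.0, 95.0, 95.0])
    print(f"F-1 mean: {mean}, unbiased variance: {variance:.4f}")
```

Each fold in turn would serve as the test set while the remaining folds are concatenated into the training set.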
The per-fold output of the CV and the final average scores and unbiased variances are stored, respectively, in files named as follows:
- outputFold10.param2.strategy1
- score.variancesFolds10Param2strategy1
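The naming pattern appears to combine the answers given for k, the hyperparameter, and the feature strategy. A small sketch, assuming that reading of the two example names is correct (the function `output_file_names` is hypothetical):

```python
def output_file_names(k, param, strategy):
    """Build the two result file names from the answers given at the prompts.

    The pattern is inferred from the example names and is an assumption.
    """
    fold_output = f"outputFold{k}.param{param}.strategy{strategy}"
    summary = f"score.variancesFolds{k}Param{param}strategy{strategy}"
    return fold_output, summary


if __name__ == "__main__":
    # Answers from the example session: k = 10, hyperparameter = 2, strategy = 1
    for name in output_file_names(10, 2, 1):
        print(name)
```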
Several templates are already available. It is recommended to use the one that matches the feature strategy adopted. The available templates are:
- template.Baseline
- template.Lemma
- template.Suffix
- template.Last2chars
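The correspondence between the strategy numbers asked by the script and the templates can be written as a simple lookup. The mapping below assumes that strategies 1 to 4 pair with the templates in the order listed; the helper `template_for` is hypothetical.

```python
# Assumed pairing of strategy numbers (as asked by the script) with templates.
TEMPLATES = {
    1: "template.Baseline",
    2: "template.Lemma",
    3: "template.Suffix",
    4: "template.Last2chars",
}


def template_for(strategy):
    """Return the template file name for a strategy number from 1 to 4."""
    try:
        return TEMPLATES[strategy]
    except KeyError:
        raise ValueError(f"unknown feature strategy: {strategy}") from None


if __name__ == "__main__":
    print(template_for(1))
```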
- Alessandro Rizzuto - ID 187156 - Balthus1989