Evaluation Measures
janvanrijn edited this page Aug 15, 2018 · 2 revisions
When uploading a run, evaluation measures can be specified. There are two forms of evaluation measures:
- Measures that are also calculated by the Evaluation Engine (see 'Evaluation Engine' for a list). These measures can still be submitted, but the value computed by the Evaluation Engine takes precedence. The two values are compared, and if they differ by more than 0.00001 (10^-5), a warning is recorded in the database.
- Measures that are not calculated by the Evaluation Engine, for example information about the operating system or the run time (see 'User Measures'). These measures are stored in the database.
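The precedence rule for engine-calculated measures can be sketched as follows. This is a minimal illustration under assumptions, not the OpenML server code: `reconcile_measure` is a hypothetical helper, the list `warning_log` stands in for the warning recorded in the database, and only the 10^-5 threshold and the "engine value wins" rule come from this page.

```python
# Threshold from this page: a submitted value that differs from the
# Evaluation Engine's value by more than this triggers a warning.
TOLERANCE = 1e-5


def reconcile_measure(name, submitted, engine_value, warning_log):
    """Hypothetical helper: decide which value to store for one measure.

    The Evaluation Engine's value always wins. If the workbench also
    submitted a value and it deviates by more than TOLERANCE, a message is
    appended to warning_log (a stand-in for the database warning record).
    """
    if submitted is not None and abs(submitted - engine_value) > TOLERANCE:
        warning_log.append(
            f"{name}: submitted {submitted} differs from engine value {engine_value}"
        )
    return engine_value


warning_log = []
stored = reconcile_measure("predictive_accuracy", 0.8512, 0.85, warning_log)
reconcile_measure("kappa", 0.700001, 0.7, warning_log)  # within tolerance: no warning
print(stored, len(warning_log))  # 0.85 1
```

The comparison is strict ("more than"), so a difference of exactly 10^-5 would not produce a warning in this sketch.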
Measures that are not calculated by the Evaluation Engine can be uploaded freely by the workbenches. The following are of interest:
- usercpu_time_millis, usercpu_time_millis_training, usercpu_time_millis_testing: the number of milliseconds the CPU was busy in total, during training, and during testing, respectively. Note that CPU time is hard to measure (it requires low-level libraries) and is not widely supported across platforms.
- wall_clock_time_millis, wall_clock_time_millis_training, wall_clock_time_millis_testing: the number of milliseconds that elapsed between the start and the end of training and testing combined, of training only, and of testing only, respectively. Wall-clock time does not take the number of cores into account.
- os_information: used in Weka-based runs; records information about the operating system the JVM ran on.
- scimark_benchmark: used in Weka-based runs; benchmarks the JVM on 5 different measures.
- run_cpu_time: legacy, old expdb measure
- run_memory: legacy, old expdb measure
- run_virtual_memory: legacy, old expdb measure
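A workbench written in Python might collect the timing measures above roughly as follows. This is a sketch under stated assumptions: `timed`, `train` and `predict` are hypothetical stand-ins for a real workbench's code, user CPU time is read from `os.times().user` (which can be coarse-grained on some platforms), and the totals are assumed to be the sum of the training and testing parts rather than a separate measurement. The dictionary keys follow the `usercpu_time_millis_*` / `wall_clock_time_millis_*` naming.

```python
import os
import time


def timed(fn, *args):
    """Run fn(*args); return (result, user CPU ms, wall-clock ms)."""
    cpu_start = os.times().user          # user CPU seconds consumed by this process
    wall_start = time.perf_counter()     # monotonic wall-clock seconds
    result = fn(*args)
    usercpu_ms = (os.times().user - cpu_start) * 1000.0
    wall_ms = (time.perf_counter() - wall_start) * 1000.0
    return result, usercpu_ms, wall_ms


def train(n):  # hypothetical stand-in for fitting a model
    return sum(i * i for i in range(n))


def predict(model):  # hypothetical stand-in for predicting the test set
    return model % 7


model, cpu_train, wall_train = timed(train, 200_000)
_, cpu_test, wall_test = timed(predict, model)

# Assumption: totals are the sum of the training and testing parts.
measures = {
    "usercpu_time_millis_training": cpu_train,
    "usercpu_time_millis_testing": cpu_test,
    "usercpu_time_millis": cpu_train + cpu_test,
    "wall_clock_time_millis_training": wall_train,
    "wall_clock_time_millis_testing": wall_test,
    "wall_clock_time_millis": wall_train + wall_test,
}
print(measures)
```

Because wall-clock time ignores the number of cores while the process's user CPU time is summed over its threads, a multi-threaded learner can report more user CPU time than wall-clock time.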
The following evaluation measures are currently calculated by the Evaluation Engine:
- mean_absolute_error
- mean_prior_absolute_error
- number_of_instances
- root_mean_squared_error
- root_mean_prior_squared_error
- relative_absolute_error
- root_relative_squared_error
- average_cost (based on cost matrix)
- total_cost (based on cost matrix)
- prior_entropy
- kb_relative_information_score
- predictive_accuracy
- kappa
- precision (per class)
- recall (per class)
- f_measure (per class)
- area_under_roc_curve (per class)
- confusion_matrix
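Several of the measures listed above can be sketched in plain Python, assuming hard class labels for the classification measures and numeric targets for the error measures. This is an illustration, not the Evaluation Engine's implementation: the function names are hypothetical, `prior_prediction` is assumed to be a constant such as the mean target value of the training data, and the cost-based measures, prior_entropy, kb_relative_information_score and area_under_roc_curve are omitted. Note that for nominal targets Weka computes the absolute and squared error measures on predicted class-probability distributions, which this sketch does not cover.

```python
import math


def confusion_matrix(actual, predicted, labels):
    """Rows index the actual class, columns the predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def classification_measures(actual, predicted, labels):
    cm = confusion_matrix(actual, predicted, labels)
    k = len(labels)
    n = len(actual)
    row = [sum(cm[i]) for i in range(k)]                       # actual class counts
    col = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # predicted class counts
    accuracy = sum(cm[i][i] for i in range(k)) / n
    # Cohen's kappa: observed agreement corrected for chance agreement.
    chance = sum(row[i] * col[i] for i in range(k)) / (n * n)
    kappa = (accuracy - chance) / (1 - chance) if chance < 1 else 0.0
    per_class = {}
    for i, label in enumerate(labels):
        precision = cm[i][i] / col[i] if col[i] else 0.0
        recall = cm[i][i] / row[i] if row[i] else 0.0
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[label] = {"precision": precision, "recall": recall, "f_measure": f}
    return {"number_of_instances": n, "predictive_accuracy": accuracy,
            "kappa": kappa, "confusion_matrix": cm, "per_class": per_class}


def error_measures(actual, predicted, prior_prediction):
    """Error measures for numeric targets, relative to a constant prior prediction."""
    n = len(actual)
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    prior_abs = sum(abs(prior_prediction - a) for a in actual)
    prior_sq = sum((prior_prediction - a) ** 2 for a in actual)
    nan = float("nan")  # returned when the prior predicts every instance exactly
    return {
        "number_of_instances": n,
        "mean_absolute_error": abs_err / n,
        "mean_prior_absolute_error": prior_abs / n,
        "root_mean_squared_error": math.sqrt(sq_err / n),
        "root_mean_prior_squared_error": math.sqrt(prior_sq / n),
        "relative_absolute_error": abs_err / prior_abs if prior_abs else nan,
        "root_relative_squared_error": math.sqrt(sq_err / prior_sq) if prior_sq else nan,
    }


cls = classification_measures(["a", "a", "b", "b", "b"],
                              ["a", "b", "b", "b", "a"], labels=["a", "b"])
print(cls["predictive_accuracy"], cls["confusion_matrix"])  # 0.6 [[1, 1], [1, 2]]

err = error_measures([3.0, 5.0, 7.0], [2.5, 5.0, 8.0], prior_prediction=5.0)
print(err["mean_absolute_error"], err["relative_absolute_error"])  # 0.5 0.375
```

The relative measures are returned as ratios here; some tools, Weka among them, display them as percentages.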