Running Experiment
The following are the steps that we use to run our batch experiments.
Acquire the target method list by running ListMethodsBatch#justRun with your desired method filter (for more information on how to write a method filter, see here) and taking the target method list generated (typically found in D:\linyun\git_space\SF100-clean\evoTest-reports). You may want to further restrict the target method list by taking a random sample (this has to be done manually).
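The sampling step is not automated; below is a minimal sketch of one way to do it, assuming the method list is a plain text file with one target method per line (the file names and sample size are illustrative, not part of the project):

```python
import random

# Assumptions: plain text method list, one target method per line.
# File names and SAMPLE_SIZE are illustrative.
SAMPLE_SIZE = 2000

with open("target-methods-full.txt") as f:
    methods = [line.strip() for line in f if line.strip()]

random.seed(42)  # fixed seed so the sample can be reproduced
sample = random.sample(methods, min(SAMPLE_SIZE, len(methods)))

with open("target-methods-sample.txt", "w") as f:
    f.write("\n".join(sample) + "\n")
```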
Modify the Recorders in EvoSuite if required, to record the information you need. The two main Recorders used are:
- FitnessEffectiveRecorder: records the results of each run, one row per run, in the file <project name>_evotest.xlsx
- IterFitnessEffectiveRecorder: records the results of all iterations for each method, one row per method, in the file <project name>_evotest_<# of iterations>times.xlsx
Modifying a Recorder is mostly a matter of adding information available in the EvoTestResult object to the rowData list. You might need to go deeper and modify EvoTestResult, or even the algorithms (e.g. DynaMOSA, MOSA), if the information you need is not available in EvoTestResult.
In EvoSuite, navigate to 'Run' -> 'Run Configurations' -> 'Maven Build' and run 'evosuite generate jars'. The jar files will be generated in the folder <EvoSuite project directory>/generated-jars. Only the evosuite-shell-1.0.7-SNAPSHOT.jar file is needed to run EvoSuite.
- Connect to SOC VPN
- Use Remote Desktop Connection to access the NUS research lab computer
- Log in to ncl.sg
- Click on 'Running Experiments'
- If any of the nodes have non-running status, try to stop and then start the node again. This might take 5-10 minutes.
- Additionally, connect to each node to check that it works
- Connect to the NCL gateway (using XShell)
- SSH to each experiment using the command you see under 'Realization' -> 'View' in the ncl.sg website:
n0.e<experiment #>.SNCyberP2-CS.ncl.sg
- If any of the nodes are down (either it takes very long to connect or you get the error 'No route to host'), contact NCL support through email
Use the following formula to find out how long it will take to finish the experiment, based on the number of available nodes.
Total number of days to run =
(# methods) * (# iterations) * (time budget/s) * (# experiment types) / [(# working nodes) * (# scripts per node) * 60s * 60min * 24h]
- e.g. 2000 * 5 * 200s * 6 / (20 * 6 * 60 * 60 * 24) ~= 1.16 days
- # experiment types: if you are testing a new approach against EvoSuite, you are comparing a total of 2 approaches. Running each approach with 3 different search algorithms gives 2 * 3 = 6 experiment types.
- # scripts per node: this depends on the resources available on each of your nodes. Previous experiments have shown that the 'sweet spot' for running EvoSuite is at least 4 CPU cores and 16GB of memory.
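As a quick sanity check, the estimate can be reproduced with a few lines of Python; the values below are the ones from the example, not fixed recommendations:

```python
# Worked example from above, restated as a quick calculation.
methods = 2000           # number of target methods
iterations = 5           # iterations per method
time_budget_s = 200      # time budget per run, in seconds
experiment_types = 6     # e.g. 2 approaches * 3 search algorithms
working_nodes = 20
scripts_per_node = 6

seconds_per_day = 60 * 60 * 24
total_days = (methods * iterations * time_budget_s * experiment_types) / (
    working_nodes * scripts_per_node * seconds_per_day)
print(f"{total_days:.2f} days")  # -> 1.16 days
```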
Adjust the configurations (number of methods, number of iterations, etc.) as required so that the experiment can finish in time. Then, calculate the number of scripts needed for each experiment type. Note that each experiment type should use the same number of scripts, as they will share the same method lists, which will be split equally.
For example, you might have 5 working nodes and plan to run 6 scripts on each node, giving a total of 30 scripts running in parallel. However, with 4 experiment types you can only split your method list into 30 // 4 = 7 parts (integer division), which means that only 4 * 7 = 28 scripts will actually run in parallel.
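The same slot arithmetic can be written out directly (values taken from the example above):

```python
# Slot allocation from the example above.
working_nodes = 5
scripts_per_node = 6
experiment_types = 4

parallel_slots = working_nodes * scripts_per_node         # 30
scripts_per_type = parallel_slots // experiment_types     # 30 // 4 = 7
scripts_in_parallel = scripts_per_type * experiment_types # 4 * 7 = 28
print(scripts_per_type, scripts_in_parallel)
```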
Prepare the first script for each experiment type, then use the copy-scripts.py file to multiply the scripts according to the number of iterations set in the file. The generated scripts should only have 2 differences:
- TARGET_METHOD_FILE=$ROOT/graph-final-<script #>.txt: the name of the method list file that this script will use.
- FOLDER_NAME=graph-final-<experiment type>-<script #>: the name of the folder that this script will record the results into.
Split the method list into text files so that each file has approximately (# methods) / (# scripts per experiment type) methods. There is no script to do this; a minimal sketch is given below. By default the file names should be graph-final-<1, 2, ...>.txt.
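Since there is no script for the splitting step, here is a minimal sketch of one way to do it; the input file name and the number of parts are assumptions to adjust to your setup:

```python
# Minimal sketch: split the method list into N roughly equal parts,
# writing them out with the graph-final-<#>.txt naming convention.
N = 7  # scripts per experiment type (assumption)

with open("target-methods-sample.txt") as f:  # input file name is an assumption
    methods = [line.strip() for line in f if line.strip()]

chunk_size = -(-len(methods) // N)  # ceiling division
for i in range(N):
    chunk = methods[i * chunk_size:(i + 1) * chunk_size]
    with open(f"graph-final-{i + 1}.txt", "w") as out:
        out.write("\n".join(chunk) + "\n")
```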
- Transfer the required files to the NCL file system, namely the shell .jar file, the scripts, and the method list files
- If using XShell, you can use Xftp to transfer files
- If using Remote Desktop, add local file folder to access your local files
- The NCL gateway and all the nodes share the same file system
- Run these commands in the NCL gateway or on one of the nodes:
  - sed -i -e 's/\r$//' ./graph-final-*.sh: fixes the carriage-return issue in the scripts (needed if you ran copy-scripts.py on Windows)
  - chmod +x ./graph-final-*.sh: gives the scripts execution rights
- To do a trial run, modify the first method list file to contain only 2-3 methods, then run the first script(s). When it finishes, remember to restore the full method list file before starting the actual experiment.
Useful tmux commands:
- tmux: start a new tmux session
- Ctrl + B -> %: split the current pane left/right
- Ctrl + B -> ": split the current pane top/bottom
- Ctrl + B -> Up/Down/Left/Right: move between panes
- exit: close the current pane
- Ctrl + B -> d: detach from the current session
- tmux ls: list all active sessions
- tmux attach -t <session #>: attach to session <session #>
Run one script per tmux pane. If successful, there should be a Progress and a Complete bar in each pane showing the progress of the currently running method.
Transfer all the result folders back to your local system. Run merge-results.py in the same directory as the result folders. This will create Excel files for each approach (named all-<your approach name>.xlsx) as well as an Excel file containing averages for each approach (named merged_results.xlsx).
The all-<your approach name>.xlsx files contain statistics for each iteration, as well as averages of these statistics in the rightmost columns. It may be the case that some cells are empty due to errors during execution - these entries will not be included in the averages. Typical approaches to interpreting the results include analysing the execution time and initialisation overhead to see if the timings are reasonable, or looking at the missing branches for debugging purposes.
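As an illustration of how these per-iteration statistics might be inspected programmatically, the sketch below uses pandas; the column names are placeholders, since the actual headers depend on the Recorder configuration:

```python
import pandas as pd

# File name follows the all-<your approach name>.xlsx pattern described above.
df = pd.read_excel("all-my-approach.xlsx")

# "coverage" and "execution_time" are placeholder column names; check the
# actual headers in your sheet. pandas skips empty (NaN) cells when averaging,
# mirroring how failed runs are excluded from the averages.
print(df["coverage"].mean())
print(df["execution_time"].mean())
```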