
Running Experiment


The following are the steps that we use to run our batch experiments.

0. Acquire target method list

Acquire the target method list by running ListMethodsBatch#justRun with your desired method filter (for more information on how to write a method filter, see here) and taking the generated target method list (typically found in D:\linyun\git_space\SF100-clean\evoTest-reports). You may want to further restrict the target method list by taking a random sample; this has to be done manually, as in the sketch below.
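
For reference, the manual sampling can be done with a few lines of Python (a sketch only; the file names and sample size here are placeholders):

```python
import random

# Hypothetical file names; substitute your actual method list.
SOURCE = "target_methods.txt"
OUTPUT = "target_methods_sampled.txt"
SAMPLE_SIZE = 2000

with open(SOURCE) as f:
    methods = [line.strip() for line in f if line.strip()]

# random.sample picks SAMPLE_SIZE distinct methods without replacement.
sampled = random.sample(methods, min(SAMPLE_SIZE, len(methods)))

with open(OUTPUT, "w") as f:
    f.write("\n".join(sampled) + "\n")
```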

1. Modify Recorders

If required, modify the Recorders in EvoSuite to record the information you need. The two main Recorders used are:

  • FitnessEffectiveRecorder: records the result of each run as a row in the file <project name>_evotest.xlsx
  • IterFitnessEffectiveRecorder: records the results of all iterations of each method as a single row in the file <project name>_evotest_<# of iterations>times.xlsx

Usually this just means adding information already available in the EvoTestResult object to the rowData list. You might need to go deeper and modify EvoTestResult, or even the algorithms themselves (e.g. DynaMOSA, MOSA), if the information you need is not available in EvoTestResult.

2. Generate jar file

In EvoSuite, navigate to 'Run' -> 'Run Configurations' -> 'Maven Build' and run 'evosuite generate jars'. The jar files will be generated in the folder <EvoSuite project directory>/generated-jars. Only the evosuite-shell-1.0.7-SNAPSHOT.jar file is needed to run EvoSuite.

3. Check the number of working NCL nodes

  • Connect to the SOC VPN
  • Use Remote Desktop Connection to access the NUS research lab computer
  • Log in to ncl.sg
  • Click on 'Running Experiments'
  • If any of the nodes have a non-running status, try stopping and then starting the node again. This might take 5-10 minutes.
  • Additionally, connect to each node to check that it works (a quick loop for this is sketched after the list):
    • Connect to the NCL gateway (using XShell)
    • SSH to each experiment node using the command shown under 'Realization' -> 'View' on the ncl.sg website: n0.e<experiment #>.SNCyberP2-CS.ncl.sg
  • If any of the nodes are down (it takes very long to connect, or you get the error 'No route to host'), contact NCL support through email
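
To check many nodes quickly, a small loop helps (a sketch only; the experiment numbers are placeholders, and the hostname pattern is the one shown above):

```python
import subprocess

# Hypothetical list of experiment numbers; adjust to your realization.
EXPERIMENTS = [1, 2, 3]

for n in EXPERIMENTS:
    host = f"n0.e{n}.SNCyberP2-CS.ncl.sg"
    # Run a trivial command over ssh; a non-zero exit code or a timeout
    # suggests the node is down ('No route to host', long hangs, etc.).
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10", host, "true"],
        capture_output=True,
    )
    status = "up" if result.returncode == 0 else "DOWN"
    print(f"{host}: {status}")
```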

4. Calculate number of scripts required

Use the following formula to find out how long it will take to finish the experiment, based on the number of available nodes.

Total number of days to run =

(# methods * # iterations * time budget in seconds * # experiment types) / [(# working nodes) * (# scripts per node) * 60 * 60 * 24]

  • e.g. 2000 * 5 * 200s * 6 / (20 * 6 * 60 * 60 * 24) ≈ 1.16 days
  • # experiment types: If you are testing a new approach against EvoSuite, you are experimenting with a total of 2 approaches. Running each of them with 3 different search algorithms gives 2 * 3 = 6 experiment types.
  • # scripts per node: This depends on the resources available on each node. Previous experiments have shown that the 'sweet spot' for running EvoSuite is at least 4 CPU cores and 16GB of memory.

Adjust the configuration (number of methods, number of iterations, etc.) as required so that the experiment can finish in time. Then, calculate the number of scripts needed for each experiment type. Note that each experiment type should use the same number of scripts, as they will share the same method lists, split equally.

For example, you might have 5 working nodes and plan to run 6 scripts on each node, for a total of 30 scripts running in parallel. However, with 4 experiment types you can only split each method list into 30 // 4 = 7 parts, so only 4 * 7 = 28 scripts will actually run in parallel. The sketch below works through this arithmetic.
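
As a quick sanity check, both calculations above can be scripted (the numbers match the examples and are otherwise placeholders):

```python
# Estimated experiment duration (example numbers from above).
methods = 2000
iterations = 5
time_budget_s = 200
experiment_types = 6
working_nodes = 20
scripts_per_node = 6

total_cpu_seconds = methods * iterations * time_budget_s * experiment_types
parallel_scripts = working_nodes * scripts_per_node
days = total_cpu_seconds / (parallel_scripts * 60 * 60 * 24)
print(f"~{days:.2f} days")  # ~1.16 days

# Script split for the second example: 5 nodes x 6 scripts, 4 experiment types.
total_slots = 5 * 6
experiment_types = 4
scripts_per_type = total_slots // experiment_types  # 7
print(f"{experiment_types * scripts_per_type} scripts run in parallel")  # 28
```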

5. Create scripts used to run the experiment

Prepare the first script for each experiment type, then use the copy-scripts.py file to duplicate the scripts according to the number set in the file. The generated scripts should differ in only 2 places (a sketch of this duplication step follows them):

TARGET_METHOD_FILE=$ROOT/graph-final-<script #>.txt: The name of the method list file that this script will use.

FOLDER_NAME=graph-final-<experiment type>-<script #>: The name of the folder that this script will record the results into.
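
If you need to recreate or adapt this step, the core idea looks roughly like this (a sketch only, not the actual copy-scripts.py; the template name and script count are assumptions):

```python
# Sketch: duplicate a template script, renumbering the method list file
# and the results folder in each copy. Not the actual copy-scripts.py.
NUM_SCRIPTS = 7                           # scripts per experiment type (assumed)
TEMPLATE = "graph-final-myapproach-1.sh"  # hypothetical first script

with open(TEMPLATE) as f:
    template = f.read()

for i in range(2, NUM_SCRIPTS + 1):
    # Replace the two per-script values described above.
    copy = template.replace("graph-final-1.txt", f"graph-final-{i}.txt")
    copy = copy.replace("graph-final-myapproach-1", f"graph-final-myapproach-{i}")
    with open(f"graph-final-myapproach-{i}.sh", "w") as f:
        f.write(copy)
```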

6. Create method list files

Each text file should contain approximately (# methods / # scripts per experiment type) methods. There is no script to do this, so split the list manually (a sketch follows below). By default the file names should be graph-final-<1, 2, ...>.txt.
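
A minimal sketch of such a split, assuming the full method list has one method per line (the input file name is a placeholder; the output names follow the default above):

```python
# Split the full method list into roughly equal parts,
# named graph-final-1.txt, graph-final-2.txt, ...
NUM_PARTS = 7  # scripts per experiment type (assumed)

with open("target_methods.txt") as f:  # hypothetical input file
    methods = [line for line in f if line.strip()]

chunk = -(-len(methods) // NUM_PARTS)  # ceiling division
for i in range(NUM_PARTS):
    part = methods[i * chunk:(i + 1) * chunk]
    with open(f"graph-final-{i + 1}.txt", "w") as f:
        f.write("".join(part))
```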

7. Transfer files to the NCL gateway

  • Namely the shell .jar file, the scripts, and the method list files
  • If using XShell, you can use Xftp to transfer the files
  • If using Remote Desktop, add a local file folder to access your local files
  • The NCL gateway and all the nodes share the same file system
  • You will then have to run these commands on the NCL gateway or one of the nodes:

sed -i -e 's/\r$//' ./graph-final-*.sh: Strips the carriage returns from the scripts (needed if you ran copy-scripts.py on Windows)

chmod +x ./graph-final-*.sh: Gives the scripts execution rights

8. Test with a few methods (optional)

  • Modify the first method list file to contain only 2-3 methods, then run the first script(s). When done, remember to restore the original method list file before starting the full experiment.

9. Use tmux to run scripts in parallel

  • tmux: Start a new tmux session
  • Ctrl + B -> %: Split the current pane left/right
  • Ctrl + B -> ": Split the current pane top/bottom
  • Ctrl + B -> Up/Down/Left/Right: Navigate between panes
  • exit: Close the current pane
  • Ctrl + B -> d: Detach from the session
  • tmux ls: List all active sessions
  • tmux attach -t <session #>: Attach to session <session #>

10. Run the scripts

Run one script per tmux pane. If successful, there should be a Progress bar and a Complete bar in each pane showing the progress of the currently running method.

11. Merge all results

Transfer all the result folders back to your local machine. Run merge-results.py in the same directory as the result folders. This creates an Excel file for each approach (named all-<your approach name>.xlsx) as well as an Excel file containing averages for each approach (named merged_results.xlsx).
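
If you ever need to reproduce or adapt the merge, its core can be done with pandas (a sketch only, not the actual merge-results.py; the folder layout and approach name are assumptions):

```python
import glob
import pandas as pd

# Sketch: concatenate all per-run .xlsx files under one approach's
# result folders into a single workbook. Not the actual merge-results.py.
frames = [pd.read_excel(path) for path in glob.glob("graph-final-myapproach-*/*.xlsx")]
merged = pd.concat(frames, ignore_index=True)
merged.to_excel("all-myapproach.xlsx", index=False)

# Per-approach averages over the numeric columns only.
print(merged.mean(numeric_only=True))
```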

12. Interpret results

The all-<your approach name>.xlsx files contain statistics for each iteration, as well as averages of these statistics in the rightmost columns. Some cells may be empty due to errors during execution; these entries are not included in the averages. Typical ways to interpret the results include analysing the execution time and initialisation overhead to check that the timings are reasonable, or looking at the missing branches for debugging purposes.
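
For example, when loading a result sheet with pandas, empty cells become NaN and are skipped by default when averaging, which matches how the averages above are computed (the column names here are hypothetical):

```python
import pandas as pd

df = pd.read_excel("all-myapproach.xlsx")  # hypothetical approach name

# Empty cells (failed runs) load as NaN and are excluded from the mean
# by default, so failed iterations do not drag the averages down.
print(df[["coverage", "execution_time"]].mean())
print("failed runs per column:")
print(df.isna().sum())
```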