Skip to content
This repository has been archived by the owner on Dec 14, 2020. It is now read-only.

karelvaculik/ewaldis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

EWALDIS

An algorithm for mining discriminative patterns in dynamic graphs. Accompanying paper was submitted to Discovery Science (DS) 2018 conference.

Building EWALDIS (in the project directory) in Linux

  • Requires cmake of version >= 3.9 and c++17 compiler
cmake .
make

Running the experiments for DS 2018 paper

  • Results will be written into results/* directories
./ewaldis -v --vertices data/dblp1/dblp_vertices.csv --edges data/dblp1/dblp_edges.csv --train_pos data/dblp1/dblp_train_positive_events.csv --train_neg data/dblp1/dblp_train_negative_events.csv -o results/dblp1 -e edges -a nominal -p 20 -n 30 --test_pos data/dblp1/dblp_test_positive_events.csv --test_neg data/dblp1/dblp_test_negative_events.csv -u -m 5 -w 1000 -r 0.3 -pu 1.0 -su 0.5 -ep 25 -se 25 -ew 10 --unicross --baseline -s 1

./ewaldis -v --vertices data/dblp2/dblp_vertices.csv --edges data/dblp2/dblp_edges.csv --train_pos data/dblp2/dblp_train_positive_events.csv --train_neg data/dblp2/dblp_train_negative_events.csv -o results/dblp2 -e edges -a nominal -p 20 -n 30 --test_pos data/dblp2/dblp_test_positive_events.csv --test_neg data/dblp2/dblp_test_negative_events.csv -u -m 5 -w 1000 -r 0.3 -pu 1.0 -su 0.5 -ep 25 -se 25 -ew 10 --unicross --baseline -s 1

./ewaldis -v --vertices data/dblp3/dblp_vertices.csv --edges data/dblp3/dblp_edges.csv --train_pos data/dblp3/dblp_train_positive_events.csv --train_neg data/dblp3/dblp_train_negative_events.csv -o results/dblp3 -e edges -a nominal -p 20 -n 30 --test_pos data/dblp3/dblp_test_positive_events.csv --test_neg data/dblp3/dblp_test_negative_events.csv -u -m 5 -w 1000 -r 0.3 -pu 1.0 -su 0.5 -ep 25 -se 25 -ew 10 --unicross --baseline -s 1

./ewaldis -v --vertices data/enron/enron_vertices.csv --edges data/enron/enron_edges.csv --train_pos data/enron/enron_train_positive_events.csv --train_neg data/enron/enron_train_negative_events.csv -o results/enron -e edges -a nominal -p 10 -n 10 --test_pos data/enron/enron_test_positive_events.csv --test_neg data/enron/enron_test_negative_events.csv -m 5 -w 3000 -r 0.3 -pu 1728000 -su 345600 -ep 25 -se 25 -ew 10 --unicross --baseline -s 1

Description of EWALDIS output files

  • For each experiment, there is a directory with the following three files for each discriminative pattern:
  • results_K.txt with information about the pattern itself:
    • DATA - name of the experiment
    • TOTAL_TIME_PREPARATION - running time of data preparation in seconds (this phase is common for all patterns from one experiment and thus should be counted only once in total time)
    • TOTAL_TIME_MINING - running time of pattern mining in seconds
    • TOTAL_TIME_EVALUATION - running time of pattern evaluation in seconds
    • TIME_UNIT_PRIMARY - primary time unit parameter
    • TIME_UNIT_SECONDARY - secondary time unit parameter
    • RANDOM_WALKS - number of random walks
    • PROB_RESTART - probability of restart
    • MAX_PATTERN_EDGES - maximum number of pattern edges to be found
    • EVOLUATION_EPOCHS - number of epochs in the genetic algorith
    • EVOLUATION_SUBEPOCHS - number of subepochs in the genetic algorith
    • EVALUATION_RANDOM_WALKS - number of random walks used for evaluation
    • PATTERN_EDGES - list of pattern edges in the form of (source_vertex, destination_vertex)
    • PATTERN_SCORES - scores of pattern edges (their fitness from the genetic algorithm)
    • PATTERN_TIMESTAMPS - distribution of relative timestamps for each pattern edge
    • PATTERN_ATTRIBUTES - distribution of attributes for each pattern edge
    • PATTERN_DIRECTIONS - True = edge directed from source to destination, False = the opposite direction; makes sense only for directed graph
    • PATTERN_UNDIRECTED - whether the patten is an undirected graph
    • SAMPLE_EVALUATION_POSITIVE - matching scores on positive events from sample set
    • SAMPLE_EVALUATION_NEGATIVE - matching scores on negative events from sample set
    • TRAIN_EVALUATION_POSITIVE - matching scores on positive events from train set
    • TRAIN_EVALUATION_NEGATIVE - matching scores on negative events from train set
    • TEST_EVALUATION_POSITIVE - matching scores on positive events from test set
    • TEST_EVALUATION_NEGATIVE - matching scores on negative events from test set
  • results_K_positive_population.txt
    • For each pattern edge, there are summary statistics of fitness score for each epoch
    • For example, if there are 20 epochs, then first 20 rows belong to the first pattern edge, etc.
    • Each row contains 0%, 10%, 50%, 90%, and 100% percentile of the fitness
    • There are exactly MAX_PATTERN_EDGES * EVOLUATION_EPOCHS rows in the file; if the algorithm finished earlier for a given pattern, then there are missing value
  • results_K_negative_population.txt
    • Similar information as in the previous file, but now for each pattern edge and each epoch, there are summary statistics of negative population fitness for each subepoch
    • Thus, there are MAX_PATTERN_EDGES * EVOLUATION_EPOCHS * EVOLUATION_SUBEPOCHS rows
    • Again, if the computation finishes earlier, there may be missing values
  • Moreover, there are two csv files used for creating baseline dataset by R scripts: baseline_edges_data_train.csv and baseline_edges_data_test.csv

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published