- numpy v1.16.4
- pandas v0.25.0
- loguru v0.3.2
- saxpy v1.0.1.dev167
- matplotlib v3.1.3
In the datasets folder there are two different example datasets. All datasets must be .csv files and must have the column names in the first row.
Our example datasets:
- Single test
Run the main.py file:
[*] Usage: python kp-anonymity.py k_value p_value paa_value max_level dataset.csv
Here k_value must be greater than p_value, and max_level must be greater than 2 and lower than 20.
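The constraints above can be checked before any work starts. The following is a minimal sketch, not the project's actual argument handling: the function name `parse_args` is hypothetical, and it assumes all four numeric arguments are integers.

```python
def parse_args(argv):
    """Validate: k_value p_value paa_value max_level dataset.csv

    Hypothetical helper; `argv` is the argument list without the script name.
    """
    if len(argv) != 5:
        raise ValueError(
            "expected: k_value p_value paa_value max_level dataset.csv"
        )
    # Assumption: the four numeric parameters are integers.
    k_value, p_value, paa_value, max_level = (int(a) for a in argv[:4])
    dataset_path = argv[4]

    if k_value <= p_value:
        raise ValueError("k_value must be greater than p_value")
    if not 2 < max_level < 20:
        raise ValueError("max_level must be greater than 2 and lower than 20")
    return k_value, p_value, paa_value, max_level, dataset_path
```

For example, `parse_args(["6", "2", "5", "3", "dataset.csv"])` passes, while swapping the first two values raises a `ValueError`.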
This program generates the kp-anonymization of the input dataset in a new file in the outputs folder.
- Multiple test
Run the test.py file:
[*] Usage: python test.py dataset.csv multitest
This test runs main.py many times and creates a .csv file, saved in the final_table folder, containing the average NCP (Normalized Certainty Penalty) of each table.
- Plot
Run the test.py file:
[*] Usage: python test.py dataset.csv plot
This test can be run only if the final_table folder already contains an output.
It plots the output data of the multiple test step in a three-dimensional or two-dimensional chart.
The program chooses the pair (paa_value, max_level) that produces the best NCP, then plots only the tuples with those values of (paa_value, max_level).
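The selection step can be sketched as follows. This is an illustration under stated assumptions, not the code in test.py: the function name `best_pair_rows` and the dictionary keys are hypothetical, "best" is taken to mean the single row with the lowest NCP (NCP is a penalty, so lower is assumed better), and the rows are assumed to be already loaded from the file in final_table.

```python
def best_pair_rows(rows):
    """Return the best (paa_value, max_level) pair and the rows that use it.

    `rows` is a list of dicts with the hypothetical keys
    k_value, p_value, paa_value, max_level and ncp.
    """
    # Assumption: the best result is the row with the lowest NCP.
    best = min(rows, key=lambda r: r["ncp"])
    pair = (best["paa_value"], best["max_level"])
    # Keep only the tuples that share the winning pair.
    selected = [r for r in rows if (r["paa_value"], r["max_level"]) == pair]
    return pair, selected


# Made-up values, for illustration only.
rows = [
    {"k_value": 6, "p_value": 2, "paa_value": 5, "max_level": 3, "ncp": 0.40},
    {"k_value": 8, "p_value": 2, "paa_value": 5, "max_level": 3, "ncp": 0.55},
    {"k_value": 6, "p_value": 2, "paa_value": 4, "max_level": 5, "ncp": 0.62},
]
pair, selected = best_pair_rows(rows)
```

With these made-up rows, `pair` is `(5, 3)` and `selected` holds the first two rows, which are the ones that would be plotted.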
- k_value: value of k-anonymity
- p_value: value of p-anonymity, the pattern
- paa_value: length of the string that can be used to describe the pattern
- max_level: number of letters in the alphabet used to describe the pattern
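
To make paa_value and max_level concrete, here is a small sketch of how a time series can be turned into a pattern string. It assumes the pattern is a SAX word (suggested by the saxpy dependency, but not stated here), and it is written with the standard library only, so it does not reproduce saxpy's own implementation. The function name `sax_word` is hypothetical, and `statistics.NormalDist` requires Python 3.8 or later.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax_word(series, paa_value, max_level):
    """Describe `series` with `paa_value` letters from a `max_level`-letter alphabet."""
    n = len(series)
    if not 1 <= paa_value <= n:
        raise ValueError("paa_value must be between 1 and len(series)")

    # Z-normalise the series (a constant series maps to all zeros).
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]

    # PAA: average the normalised values over paa_value consecutive segments.
    segments = [
        mean(z[i * n // paa_value:(i + 1) * n // paa_value])
        for i in range(paa_value)
    ]

    # Breakpoints that split the standard normal into max_level equal-probability bins.
    cuts = [NormalDist().inv_cdf(j / max_level) for j in range(1, max_level)]

    # Map each segment average to the letter of the bin it falls into.
    return "".join(chr(ord("a") + bisect_right(cuts, v)) for v in segments)


word = sax_word([0, 0, 10, 10], paa_value=2, max_level=3)  # "ac"
```

The result always has paa_value characters, and only the first max_level letters of the alphabet can appear in it.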