This project was developed for the master's thesis of my cybersecurity degree. It implements a variant of the algorithm proposed in "Distributed Query Execution under Access Restrictions".
- Clone this repo
- `cd` into the project root folder and run `python3.9 -m venv env` to create a virtual env
- Run `source env/bin/activate` to activate the virtual env
- Run `pip install -r requirements.txt` to install all the packages needed to run the project
- If you don't have pip installed, you can find information here
- The script has five command line arguments:
  - -p PATH, --path PATH: Path where the PDF containing the tree resulting from the computation is saved (e.g. '../' to save the PDF in the directory containing the script folder)
  - -m ASSIGNMENT, --manual ASSIGNMENT: Manually assign candidates to nodes, in the form 'XYZ', where the candidates are assigned to the nodes in pre-order visit of the query tree plan
  - -i INPUT, --input INPUT: Path from which the algorithm reads its input
  - -v, --verbose: Enables verbose logging
  - -d, --debug: Enables debug logging
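As an illustration only, the five options listed above could be declared with `argparse` roughly as follows. This is a sketch based on the option names in the list; the description string, help texts and defaults are assumptions, and the project's actual parser may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the CLI described above; help texts
    # and defaults are assumptions, not taken from the project's source.
    parser = argparse.ArgumentParser(description="Query plan assignment (sketch)")
    parser.add_argument("-p", "--path", metavar="PATH",
                        help="where to save the PDF of the resulting tree")
    parser.add_argument("-m", "--manual", metavar="ASSIGNMENT",
                        help="candidates for the nodes in pre-order, e.g. 'XYZ'")
    parser.add_argument("-i", "--input", metavar="INPUT",
                        help="path from which the input CSV files are read")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="enable verbose logging")
    parser.add_argument("-d", "--debug", action="store_true",
                        help="enable debug logging")
    return parser


# Parse an explicit argument list instead of sys.argv to keep the sketch testable.
args = build_parser().parse_args(["-p", "../", "-m", "XYZ", "-v"])
print(args.path, args.manual, args.input, args.verbose, args.debug)
```

Options that are not passed on the command line simply keep their defaults (`None` for the valued options, `False` for the two flags).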
Inputs to the algorithm are given by four CSV files: relations.csv, tree.csv, subjects.csv and authorizations.csv.
The CSV_data folder contains an example of these files.
relations.csv is the file modeling the base relations of the query, structured as follows:
- name: Name of the base relation
- primary_key: Primary key of the relation
- provider: Storage provider storing the relation
- plain_attr: Attributes of the relation stored in plaintext
- enc_attr: Attributes of the relation stored encrypted
- attr: All the attributes of the relation (plain and enc)
- enc_costs: Semicolon-separated list of the encryption costs of the attributes in the relation
- dec_costs: Semicolon-separated list of the decryption costs of the attributes in the relation
- size: Semicolon-separated list of the attribute sizes (used to estimate the computational cost of a node)
- node_id: ID of the leaf node in the tree with which the base relation is associated (see nodes.csv)
Parsing of relations.csv produces the following two base relations:
- relation flight(NDPC) assigned to storage provider F
  - N is stored encrypted, it has an encryption cost of 1, a decryption cost of 4 and a size of 7
  - D is stored in plain text, it has an encryption cost of 2, a decryption cost of 5 and a size of 8
  - P is stored in plain text, it has an encryption cost of 3, a decryption cost of 6 and a size of 9
  - C is stored encrypted, it has an encryption cost of 4, a decryption cost of 7 and a size of 9
- relation company(SJI) assigned to storage provider C
  - S is stored encrypted, it has an encryption cost of 1, a decryption cost of 4 and a size of 7
  - J is stored in plain text, it has an encryption cost of 2, a decryption cost of 5 and a size of 8
  - I is stored encrypted, it has an encryption cost of 3, a decryption cost of 6 and a size of 9
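The mapping from a row of relations.csv to the per-attribute values listed above can be sketched as follows. The sample row is reconstructed from the flight relation; the comma delimiter, the one-letter-per-attribute encoding and the function name `parse_relation` are assumptions, and the primary_key and node_id columns are left out because their values are not given here.

```python
import csv
import io

# Hypothetical file content rebuilt from the flight example above.
# Delimiter and attribute encoding (one letter each) are assumptions.
SAMPLE = """name,provider,plain_attr,enc_attr,attr,enc_costs,dec_costs,size
flight,F,DP,NC,NDPC,1;2;3;4,4;5;6;7,7;8;9;9
"""


def parse_relation(row):
    """Pair each attribute with its storage form, costs and size."""
    attrs = list(row["attr"])
    enc_costs = [int(x) for x in row["enc_costs"].split(";")]
    dec_costs = [int(x) for x in row["dec_costs"].split(";")]
    sizes = [int(x) for x in row["size"].split(";")]
    return {
        attr: {
            "encrypted": attr in row["enc_attr"],
            "enc_cost": enc,
            "dec_cost": dec,
            "size": size,
        }
        for attr, enc, dec, size in zip(attrs, enc_costs, dec_costs, sizes)
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
flight = parse_relation(rows[0])
print(flight["N"])
print(flight["D"])
```

The three semicolon-separated lists are read positionally, so the i-th cost and size belong to the i-th attribute in attr.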
tree.csv is the file modeling the query tree plan, structured as follows:
- ID: Node identifier, used to associate a base relation with a leaf node
- operation: Operation of the query associated with the node
  - Projection
  - Selection
  - Cartesian
    - The Cartesian product requires manually filling in Ap, Ae and enc_attr with the attributes taken from the children
  - Join
  - Group-by
  - Encryption
  - Decryption
  - Re-encryption
- Ap: Set of attributes that need to be in plaintext to evaluate the operation associated with the node
- Ae: Set of attributes that need to be re-encrypted to evaluate the operation associated with the node
- As: Remaining set of attributes
- print_label: Label printed for the node when the tree is exported
- group_attr: if the operation associated with the node is a group-by, this is the set of attributes on which the group-by clause is evaluated
- parent: parent node of current node, used to build the tree
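To illustrate how the ID and parent columns are enough to rebuild the tree, and how a manual assignment string such as the one passed with -m lines up with a pre-order visit, here is a minimal sketch. The rows below are made up for the example and are not the contents of the project's tree.csv; the function name `preorder` and the assumption that every node receives one candidate are also hypothetical.

```python
from collections import defaultdict

# Hypothetical (ID, parent) pairs; the root has an empty parent field.
rows = [("1", ""), ("2", "1"), ("3", "1"), ("4", "2")]


def preorder(rows):
    """Rebuild the tree from parent links and return node IDs in pre-order."""
    children = defaultdict(list)
    root = None
    for node_id, parent in rows:
        if parent:
            children[parent].append(node_id)
        else:
            root = node_id
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children in reverse so the first child is visited first.
        stack.extend(reversed(children[node]))
    return order


order = preorder(rows)
# Assumption: one candidate letter per node, matched in visit order.
assignment = dict(zip(order, "XYZU"))
print(order)
print(assignment)
```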
Parsing of tree.csv produces the following tree:
subjects.csv is the file modeling the subjects involved in the query computation with their authorizations, structured as follows:
- subject: Name of the subject
- comp_price: computational price of the subject
- transfer_price: transfer price of the subject
Parsing of subjects.csv produces the following subjects:
- U with computational price 1 and transfer price 1
- X with computational price 2 and transfer price 2
- Y with computational price 3 and transfer price 3
- Z with computational price 4 and transfer price 4
- F with computational price 5 and transfer price 6
- C with computational price 6 and transfer price 7
authorizations.csv is the file modeling the authorizations involved in the query computation, structured as follows:
- subject: Name of the subject (already specified in subjects.csv)
- plain: Attributes that the subject is authorized to view in plaintext form
- enc: Attributes that the subject is authorized to view in encrypted form
Parsing of authorizations.csv produces the following authorizations:
- [NCPSJI,-]→U
- [PC,NSJI]→X
- [DPJI,CNS]→Y
- [NCS,PJI]→Z
- [-,NDPCJI]→F
- [-,NCSJI]→C
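The authorizations listed above can be read as a pair of attribute sets per subject. The following sketch checks whether a subject may see a given set of attributes; the function name `is_authorized` is hypothetical, and the rule that plaintext visibility also covers the encrypted form is an assumption made for this example, not something stated in this README.

```python
# Authorizations copied from the list above: subject -> (plaintext, encrypted).
AUTHORIZATIONS = {
    "U": ("NCPSJI", ""),
    "X": ("PC", "NSJI"),
    "Y": ("DPJI", "CNS"),
    "Z": ("NCS", "PJI"),
    "F": ("", "NDPCJI"),
    "C": ("", "NCSJI"),
}


def is_authorized(subject, plain_needed, enc_needed):
    """Return True if the subject may view the requested attributes."""
    plain, enc = (set(attrs) for attrs in AUTHORIZATIONS[subject])
    # Assumption: an attribute visible in plaintext is also visible encrypted.
    return set(plain_needed) <= plain and set(enc_needed) <= plain | enc


print(is_authorized("X", "P", "NS"))   # X sees P in plaintext, N and S encrypted
print(is_authorized("F", "D", ""))     # F has no plaintext visibility
print(is_authorized("Y", "N", ""))     # Y sees N only in encrypted form
```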