
Post processing

Rahmadi Trimananda edited this page Oct 27, 2023 · 42 revisions

This page explains OVRseen's workflow for post-processing the network traffic collected from VR apps.

Dependencies

The following dependencies are already installed in the provided VM. On other Debian-based systems (e.g., Ubuntu), they can be installed with apt-get.

  • wireshark-common
  • tshark
$ apt-get install wireshark-common tshark

Please also run the following commands to set up and activate a Python virtual environment (with the right dependencies) before using OVRseen.

OVRseen/virtualenv $ ./python3_venv.sh
OVRseen/virtualenv $ source python3_venv/bin/activate
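Before running the scripts, it can help to confirm that tshark is on the `PATH` and that the virtual environment is active. The sketch below is a hypothetical helper (not part of OVRseen) that uses only the Python standard library. It assumes python3_venv.sh creates the environment with either the standard `venv` module (where `sys.prefix` differs from `sys.base_prefix`) or an older `virtualenv` (which sets `sys.real_prefix`).

```python
import shutil
import sys


def tshark_available() -> bool:
    """Return True if a `tshark` executable can be found on the PATH."""
    return shutil.which("tshark") is not None


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment.

    Older virtualenv releases set sys.real_prefix; the standard venv module
    makes sys.prefix point at the environment while sys.base_prefix still
    points at the base installation.
    """
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("tshark found:", tshark_available())
    print("virtualenv active:", in_virtualenv())
```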

Post-processing PCAP Files

Setup

OVRseen can also post-process network traffic newly collected from other VR apps. However, the following steps assume that we use our datasets.

1) Please copy the PCAP files from pcaps (i.e., PCAPs.zip) in our datasets. If our extract_datasets.sh script has been run, it should already have copied the right files into the right locations in OVRseen's directory structure. Otherwise, please move PCAPs.zip into OVRseen/network_traffic/post-processing/ and unzip it. Since the extract_datasets.sh script currently only copies and unzips the PCAP files for Oculus-Paid, we recommend doing the same when copying PCAPs.zip manually: keep only the Oculus-Paid captures and remove the other two folders, as follows.

OVRseen/network_traffic/post-processing $ unzip PCAPs.zip
OVRseen/network_traffic/post-processing $ cd PCAPs/Oculus-Paid/; unzip "*.zip"; cd ../../
OVRseen/network_traffic/post-processing $ rm -rf PCAPs/Oculus-Free
OVRseen/network_traffic/post-processing $ rm -rf PCAPs/SideQuest
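To confirm that only the Oculus-Paid captures remain after the commands above, one can count the capture files in each store subdirectory of PCAPs/. This is a hypothetical helper, not part of OVRseen; it assumes the unzipped captures end in `.pcap` or `.pcapng`, so `extensions` may need adjusting if the dataset uses other names.

```python
from collections import Counter
from pathlib import Path


def count_captures(pcap_root, extensions=(".pcap", ".pcapng")):
    """Count capture files per top-level subdirectory (e.g., per app store)."""
    root = Path(pcap_root)
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in extensions:
            # The first path component below the root is the store folder.
            store = path.relative_to(root).parts[0]
            counts[store] += 1
    return dict(counts)


if __name__ == "__main__":
    # Run from OVRseen/network_traffic/post-processing.
    print(count_captures("PCAPs"))
```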

2) Please also copy the CSV files that contain app information from both the Oculus and SideQuest app stores (i.e., all_150_top_apps.csv, oculus_store_apps.csv, and sidequest_store_apps.csv). These files can be found in list_of_apps in our datasets. If our extract_datasets.sh script has been run, it should already have copied them into the right locations in OVRseen's directory structure. Otherwise, please move all_150_top_apps.csv, oculus_store_apps.csv, and sidequest_store_apps.csv into OVRseen/network_traffic/post-processing/.
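A quick way to verify this step is to check that the three app-list CSV files are present in the post-processing directory. The helper below is a hypothetical sketch (not part of OVRseen); only the file names come from the step above.

```python
from pathlib import Path

# File names from the list_of_apps folder described above.
REQUIRED_APP_LISTS = (
    "all_150_top_apps.csv",
    "oculus_store_apps.csv",
    "sidequest_store_apps.csv",
)


def missing_app_lists(directory="."):
    """Return the required app-list CSVs that are absent from `directory`."""
    base = Path(directory)
    return [name for name in REQUIRED_APP_LISTS if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_app_lists()
    print("All app lists present." if not missing else f"Missing: {missing}")
```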

3) OVRseen users can use samples of the blocklists we previously used. For convenience, we provide copies of these blocklists; please download them here and unzip them into OVRseen/network_traffic/post-processing/filter_lists.

OVRseen/network_traffic/post-processing $ unzip filter_lists.zip

We also welcome OVRseen users to try other blocklists. New blocklists should follow the format of the three blocklists we used and be placed in OVRseen/network_traffic/post-processing/filter_lists.

Running the Post-processing Analysis

4) Please run the process_pcaps.py script to post-process the PCAP files. When running it more than once, please delete the temporary outputs of the previous run before rerunning the script.

OVRseen/network_traffic/post-processing $ rm -r PCAPs/*.csv; rm -r PCAPs/temp_output # Delete temporary outputs before running the script
OVRseen/network_traffic/post-processing $ python3 process_pcaps.py PCAPs .
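The cleanup-and-rerun step can also be scripted. The sketch below is a hypothetical Python equivalent of the two commands above (not part of OVRseen): it deletes the top-level CSV files in PCAPs/ (like `rm -r PCAPs/*.csv`, which also removes a previous merged output) and the temp_output directory, then invokes process_pcaps.py with the same arguments through the current interpreter.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def clean_previous_outputs(pcap_dir="PCAPs"):
    """Delete temporary outputs of a previous run; return the removed names."""
    root = Path(pcap_dir)
    removed = []
    for csv_file in root.glob("*.csv"):  # top level only, like PCAPs/*.csv
        csv_file.unlink()
        removed.append(csv_file.name)
    temp_output = root / "temp_output"
    if temp_output.is_dir():
        shutil.rmtree(temp_output)
        removed.append(temp_output.name)
    return sorted(removed)


def rerun(pcap_dir="PCAPs", output_dir="."):
    """Clean up, then run process_pcaps.py with the arguments shown above."""
    clean_previous_outputs(pcap_dir)
    subprocess.run(
        [sys.executable, "process_pcaps.py", pcap_dir, output_dir], check=True
    )
```

Calling `rerun()` from OVRseen/network_traffic/post-processing with the virtual environment active mirrors the commands above; `check=True` makes a failed run raise `subprocess.CalledProcessError` instead of passing silently.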

When the script finishes, the output file all-merged-with-esld-engine-privacy-developer-party.csv should be in PCAPs/. Note, however, that this output CSV file only contains the statistics for the Oculus-Paid apps, because the extract_datasets.sh script currently only copies and unzips the PCAP files for Oculus-Paid (please see Table 1 in our paper). Running OVRseen's post-processing on our entire network traffic dataset requires around 20GB of RAM and preferably 80-100GB of disk space, whereas the provided VM only has 4GB of RAM and 30GB of disk space. If you can provide enough RAM and disk space to post-process the entire network traffic dataset, please uncomment these two lines in the extract_datasets.sh script

#unzip "../network_traffic/post-processing/PCAPs/Oculus-Paid/*.zip" -d ../network_traffic/post-processing/PCAPs/Oculus-Paid/
#unzip "../network_traffic/post-processing/PCAPs/SideQuest/*.zip" -d ../network_traffic/post-processing/PCAPs/SideQuest/

and comment out the following two lines in the script.

rm -rf ../network_traffic/post-processing/PCAPs/Oculus-Free/
rm -rf ../network_traffic/post-processing/PCAPs/SideQuest/
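Before enabling the full dataset, the resource figures above (about 20GB of RAM and 80-100GB of disk space) can be checked programmatically. This hypothetical helper is not part of OVRseen. It uses `os.sysconf`, which is available on Linux and other Unix-like systems such as the provided VM, and `shutil.disk_usage`. Treating 80GB of *free* space on the working filesystem as the threshold is an assumption, and 1GB is taken as 1024**3 bytes.

```python
import os
import shutil

# Thresholds from the requirements above; using the lower end of the
# preferred disk range as required *free* space is an assumption.
MIN_RAM_GB = 20
MIN_FREE_DISK_GB = 80
GB = 1024**3


def total_ram_gb():
    """Total physical memory in GB (Unix-only: relies on POSIX sysconf)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / GB


def free_disk_gb(path="."):
    """Free disk space in GB on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / GB


def can_process_full_dataset(path="."):
    """True if both the RAM and free-disk thresholds are met."""
    return total_ram_gb() >= MIN_RAM_GB and free_disk_gb(path) >= MIN_FREE_DISK_GB


if __name__ == "__main__":
    print(f"RAM: {total_ram_gb():.1f} GB, free disk: {free_disk_gb():.1f} GB")
    print("Full dataset feasible:", can_process_full_dataset())
```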

Disclaimer

In the provided VM, we have set up and installed the necessary dependencies to run OVRseen's post-processing scripts. However, we recently found that the tshark and Wireshark versions we previously used had bugs that mislabeled HTTP traffic as TLS traffic. We believe the versions installed in the provided VM do not have the same bugs. As a result, however, re-running OVRseen's post-processing on our entire network traffic dataset in the provided VM will produce minor discrepancies with the results in our paper. Fortunately, this issue affected fewer than 5 apps, so the main claims and conclusions of our paper still hold.
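Because the results can depend on the tshark version, it may be worth recording which version produced a given run. The sketch below is a hypothetical helper (not part of OVRseen) that calls `tshark --version` and extracts the version number from its output. The regular expression assumes the usual `TShark (Wireshark) X.Y.Z` banner and may need adjusting for other builds.

```python
import re
import shutil
import subprocess

# Matches banners such as "TShark (Wireshark) 3.2.3 (Git v3.2.3 packaged as 3.2.3-1)".
_VERSION_RE = re.compile(r"TShark \(Wireshark\) (\d+(?:\.\d+)*)")


def parse_tshark_version(banner):
    """Extract the version number from tshark's version banner, or None."""
    match = _VERSION_RE.search(banner)
    return match.group(1) if match else None


def installed_tshark_version():
    """Run `tshark --version` and return the parsed version, or None."""
    if shutil.which("tshark") is None:
        return None
    result = subprocess.run(
        ["tshark", "--version"], capture_output=True, text=True, check=False
    )
    return parse_tshark_version(result.stdout)


if __name__ == "__main__":
    print("tshark version:", installed_tshark_version())
```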

Regenerating Data for Plots and Tables

5) We provide the script we used to extract data and statistics from all-merged-with-esld-engine-privacy-developer-party.csv to create Tables 1, 2, and 3 and Figure 2. We also provide the CSV file all-merged-with-esld-engine-privacy-developer-party.csv that we extracted from our entire network traffic dataset. If the extract_datasets.sh script has been run, this file can be found in OVRseen/network_traffic/post-processing/. Please execute the following command to run the create_data_for_tables_and_figures.py script.

OVRseen/network_traffic/post-processing/figs_and_tables $ python3 create_data_for_tables_and_figures.py --csv_file_path ../all-merged-with-esld-engine-privacy-developer-party.csv --output_directory .

After the script finishes, the following CSV files should appear in figs_and_tables/:

  • Table_1_NetworkTrafficDataSetSummary.csv is comparable with Table 1.
  • Table_2_MissingedByBlocklists.csv is comparable with Table 2 (and Table 7 in our paper's extended version/technical report).
  • Table_3_DatatypesExposed.csv is comparable with Table 3.
  • Figure_2a.csv is comparable with Figure 2a.
  • Figure_2b.csv is comparable with Figure 2b.
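To check that all five outputs were produced, one can count the data rows in each file. The helper below is a hypothetical sketch (not part of OVRseen); the file names come from the list above, and it assumes each CSV has exactly one header row.

```python
import csv
from pathlib import Path

# Output file names listed above.
EXPECTED_OUTPUTS = (
    "Table_1_NetworkTrafficDataSetSummary.csv",
    "Table_2_MissingedByBlocklists.csv",
    "Table_3_DatatypesExposed.csv",
    "Figure_2a.csv",
    "Figure_2b.csv",
)


def summarize_outputs(directory="."):
    """Map each expected output to its number of data rows, or None if missing."""
    summary = {}
    for name in EXPECTED_OUTPUTS:
        path = Path(directory) / name
        if not path.is_file():
            summary[name] = None
            continue
        with path.open(newline="") as f:
            rows = list(csv.reader(f))
        # Assumes a single header row per file.
        summary[name] = max(len(rows) - 1, 0)
    return summary


if __name__ == "__main__":
    for name, rows in summarize_outputs().items():
        print(f"{name}: {'missing' if rows is None else f'{rows} rows'}")
```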

Please note the following discrepancies between these CSV files and the tables and figures in our paper.

  • Figure_2a.csv: the order of rows may differ when values are tied. For example, mixpanel.com is the last eSLD in Figure 2 of the paper, but other eSLDs contacted by the same number of apps could equally have appeared in the figure, so Figure_2a.csv may order them differently.
  • Table_1_NetworkTrafficDataSetSummary.csv: the number of apps for the SideQuest app store differs (48 apps in Table 1 of our paper vs. 46 apps in Table_1_NetworkTrafficDataSetSummary.csv). The difference of 2 apps comes from one app that belongs to both the Oculus and SideQuest app stores (this script counts it only once) and another app that did not generate any traffic (OVRseen extracted no TCP flows for it, so none of its traffic appears in all-merged-with-esld-engine-privacy-developer-party.csv).
  • Table_2_MissingedByBlocklists.csv: our paper shows only the top 19 FQDNs in Table 7 and the top 5 in Table 2. We also excluded a few exceptions from those tables; e.g., the script misclassifies VirtualAge.com as third-party (our script has some limitations in such corner cases).
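The first two discrepancies can be illustrated on toy data. The functions below are hypothetical and do not reproduce the script's actual logic: `rank_eslds` breaks ties in the per-eSLD app count alphabetically so a top-N cut is deterministic, and `count_apps_once` attributes an app listed in several stores only to the first store in a given order. The app and eSLD names are made up. An app with no extracted TCP flows would simply be absent from the input, which is why it cannot be counted.

```python
from collections import Counter


def rank_eslds(app_to_eslds, top_n):
    """Rank eSLDs by the number of distinct apps that contact them.

    Ties are broken alphabetically; without a secondary key, which tied
    eSLDs survive the top-N cut depends on input order.
    """
    counts = Counter()
    for eslds in app_to_eslds.values():
        counts.update(set(eslds))  # each app counts at most once per eSLD
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


def count_apps_once(store_to_apps, store_order):
    """Count apps per store, attributing a multi-store app to the first store."""
    seen = set()
    counts = {}
    for store in store_order:
        apps = set(store_to_apps.get(store, ())) - seen
        counts[store] = len(apps)
        seen |= apps
    return counts


if __name__ == "__main__":
    toy = {"app1": ["a.example", "b.example"], "app2": ["b.example", "c.example"]}
    print(rank_eslds(toy, 1))  # c.example ties with b.example but is cut
    stores = {"Oculus": {"A", "B"}, "SideQuest": {"B", "C", "D"}}
    print(count_apps_once(stores, ["Oculus", "SideQuest"]))
```

In the toy store data, app B appears in both stores, so SideQuest is credited with 2 apps rather than 3, mirroring how a shared app lowers the SideQuest count.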