A repository for analyzing the results of an OpenWPM web crawl to find different types of fingerprinting being performed.
We also maintain the repository `openwpm-mods`, which details in more depth what settings to use for OpenWPM. In short, ensure that:
- Instrument, at a minimum, what is covered in the following `js_instrument_settings` JSON
- HTTP network request instrumenting is enabled, and `save_content` includes `"script"` to save the JavaScript files' source code (a configuration sketch covering these settings follows this list)
- The structured data is stored in a SQLite database under the name `crawl-data.sqlite`
  - This is not a rigid requirement; another SQL implementation besides SQLite could easily be adapted for use. The Python code in this repository uses SQLAlchemy to interact with the SQL database, so switching to another SQL implementation should only require changing the `sqlalchemy.create_engine()` call (see the connection-URL sketch after this list)
- The unstructured data is stored in a LevelDB database
  - The default directory name we use is `"leveldb"`, but a user can specify a different directory for the LevelDB via the `--leveldb` CLI argument
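For reference, here is a minimal sketch of how those settings map onto OpenWPM's Python configuration. It follows the pattern of OpenWPM's demo script; exact field names can shift between OpenWPM versions, and the browser count and display mode below are illustrative, so treat this as a sketch rather than an authoritative configuration:

```python
from openwpm.config import BrowserParams, ManagerParams

# One ManagerParams per crawl, one BrowserParams per browser instance.
NUM_BROWSERS = 1
manager_params = ManagerParams(num_browsers=NUM_BROWSERS)
browser_params = [BrowserParams(display_mode="headless") for _ in range(NUM_BROWSERS)]

for bp in browser_params:
    bp.http_instrument = True   # enable HTTP network request instrumenting
    bp.js_instrument = True     # enable JavaScript instrumenting
    # Cover, at a minimum, the js_instrument_settings JSON above:
    # bp.js_instrument_settings = [...]
    bp.save_content = "script"  # save the JavaScript files' source code
```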
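And because the analysis code goes through SQLAlchemy, pointing it at another SQL backend is, in principle, just a different connection URL in the `create_engine()` call. A sketch (the PostgreSQL URL and credentials are placeholders):

```python
from sqlalchemy import create_engine

# Default: the SQLite database produced by the crawl.
engine = create_engine("sqlite:///crawl-data.sqlite")

# Another SQL implementation should only need a different URL, e.g.:
# engine = create_engine("postgresql://user:password@localhost/crawl_data")
```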
Only supports Linux. Tested on a Debian-based Linux distribution.
Requirements:
- micromamba
  - micromamba can be installed via their installation instructions, or by running `bash -i scripts/micromamba-install.sh`
  - Some other conda environment manager (conda, miniconda, mamba) may be used. Our bash scripts under the `scripts/` directory are all set up around micromamba. However, they are quite simple scripts, so performing these actions yourself is more than feasible (a manual sketch follows the install steps below).
- Install the virtual environment: `bash -i scripts/install.sh`
- Enable the virtual environment via `micromamba activate openwpmdata`
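For anyone replicating the scripts by hand with another conda-compatible manager, the steps amount to roughly the following. This is a sketch: the environment file name is an assumption, so check `scripts/install.sh` for what it actually consumes:

```bash
# Create and activate the environment (swap micromamba for conda/mamba as preferred).
# NOTE: environment.yml is an assumed file name; see scripts/install.sh.
micromamba create -n openwpmdata -f environment.yml
micromamba activate openwpmdata
```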
The two Python files a user would run are `main/run_analysis.py` and `main/view_results.py`.
`main/run_analysis.py` is intended to run all the analyses on the crawl data from OpenWPM, then store the results in a new table in the SQL database. View the CLI arguments for this by running `python main/run_analysis.py --help`.
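For example, a run against a non-default LevelDB directory might look like the following (illustrative; `--leveldb` is the only argument named in this README, so consult `--help` for the full list):

```bash
python main/run_analysis.py --leveldb my-leveldb-dir
```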
`main/view_results.py` is intended to be run after `main/run_analysis.py`, to view the results from the analysis. View the CLI arguments for this by running `python main/view_results.py --help`.