Introduction

This project analyzes open source projects for malware.

Due to high demand from the community, we decided to open source the code as it is now to allow collaboration. Most of the code was last updated in May 2019, so some components may no longer work, especially those that depend on external tools (e.g. Sysdig, Airflow) or APIs (e.g. Npm).

We are actively working on testing and improvements. Please find the todo list here. To run commands, refer to the howto section. To deploy on machines, refer to the deploy instructions. To request access to the supply chain attack samples, refer to the request instructions.

This repository is open sourced under the MIT license. If you find this repository helpful, please cite our paper:

@inproceedings{duan2021measuring,
  title={Towards Measuring Supply Chain Attacks on Package Managers for Interpreted Languages},
  author={Duan, Ruian and Alrawi, Omar and Kasturi, Ranjita Pai and Elder, Ryan and Saltaformaggio, Brendan and Lee, Wenke},
  booktitle = {28th Annual Network and Distributed System Security Symposium, {NDSS}},
  month     = Feb,
  year      = {2021},
  url       = {https://www.ndss-symposium.org/wp-content/uploads/ndss2021_1B-1_23055_paper.pdf}
}

Prerequisite

Basics

  • docker
  • basic setup for ubuntu
    • sudo ./setup.sh
  • for other OSes (e.g. macOS and Windows), please look at setup.sh and work out the equivalent steps

Dependencies

  • To test and run the project locally, you need its dependencies. There are two ways to prepare them:
  • build the maloss docker image and test inside it
    • build docker image
      • sudo docker build -t maloss .
    • re-build docker image without cache (used when re-building image)
      • sudo docker build -t maloss . --no-cache
    • run the docker image and map your local source root to it
      • sudo docker run -it --rm -v $(pwd):/code maloss /bin/bash
    • change to the mounted source root and start making changes
      • cd /code
  • install dependencies locally and test it
    • the instructions are for Ubuntu 16.04. If they do not work on other systems, please fix them and commit the necessary changes. These instructions are copied from the Dockerfile; look into it for troubleshooting.
    • for js and python static analysis development
      • pip install -r src/requirements.txt --user
    • for the others (TODO: simplify this giant list)
      • sudo apt-get install -yqq curl php git ruby-full rubygems-integration nuget python python-pip python3-pip npm jq strace
      • sudo ./src/install_dep.sh

Development

Structure

  • registries folder contains source code for mirroring package managers. To run the program, you would need 10TB for Npm, 5TB for PyPI and 5TB for RubyGems.
  • src folder contains source code for static, dynamic and metadata analysis.
  • main folder contains source code for dynamic orchestration.
  • airflow folder contains source code for static orchestration.
  • sysdig folder contains setup and config for dynamic tracing.
  • data contains honeypot setup and statistics.
  • config contains config for static analysis.
  • doc contains manually labeled APIs, which are used to derive config.
  • testdata contains test samples.
  • ref contains related work.
  • benignware contains some benign packages.
  • malware contains the list of malicious samples, which can be used for protection.
  • maloss-samples is a private repo that contains the supply chain attack samples and is updated periodically. Please fill out the Google Form to request access. We will respond ASAP.

Instructions

  • In this project, we currently use celery + rabbitmq to run our metadata and dynamic analyses in a distributed manner, and airflow + celery to run our static analyses.
    • The src/ folder contains the code for each individual analysis and should be kept minimal and self-contained.
      • In particular, for static/dynamic/metadata analysis, the jobs in the src/ folder should handle only one package and one version.
      • Each individual analysis should be developed and contained in this folder.
    • The main/ folder handles distributed computing for metadata and dynamic analyses.
      • The master node loads the list of jobs (packages and their versions to analyze) and sends them to the rabbitmq broker.
      • The slave nodes connect to the broker and fetch jobs from it.
      • Each individual analysis may need to change .env in this folder.
    • The airflow/ folder handles distributed computing for static analyses.
      • The master node loads the DAG of jobs (packages connected by dependency relations) and sends them to the redis broker.
      • The slave nodes connect to the broker and fetch jobs from it.
      • Each individual analysis may need to change .env in this folder.
  • In this project, we run each analysis using docker. The following steps show how to start or debug the distributed jobs for metadata and dynamic analyses.
    • on worker
      • create customized main/config from main/config.tmpl
      • build docker image
        • sudo docker build -t maloss .
      • re-build docker image without cache (used when re-building image)
        • sudo docker build -t maloss . --no-cache
      • for testing, run docker image and attach to it
        • sudo docker run -it --rm --cap-add=SYS_PTRACE -v /tmp/result:/home/maloss/result -v /tmp/metadata:/home/maloss/metadata maloss /bin/bash
      • for production, refer to DEPLOY.md
    • on master
      • create customized main/config from main/config.tmpl
      • start rabbitmq
        • cd main && sudo docker-compose --compatibility -f docker-compose-master.yml up -d
      • add jobs to the queue
        • python detector.py install -i ../data/pypi.csv
    • debugging
      • comment out the QUEUING = Celery line in main/config; the jobs will then run locally and sequentially.
      • the entry point for celery workers is main/celery_tasks.py and the entry point for the master is main/detector.py.
  • TODO: how to debug static analyses
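
The master/worker flow described in this section can be sketched with the standard library alone. This is not the project's Celery code: `queue.Queue` stands in for the rabbitmq broker, threads stand in for worker nodes, and `analyze` is a hypothetical placeholder for a single-package, single-version job from src/. The package versions in the demo are illustrative only.

```python
import queue
import threading


def analyze(name, version):
    """Hypothetical stand-in for one src/ job: one package, one version."""
    return f"{name}=={version}: analyzed"


def run_jobs(jobs, num_workers=2):
    """Master enqueues jobs; workers pull them until the queue is drained."""
    broker = queue.Queue()      # stand-in for the rabbitmq broker
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            job = broker.get()
            if job is None:     # sentinel: no more work for this worker
                break
            outcome = analyze(*job)
            with lock:
                results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for job in jobs:            # master side: send jobs to the broker
        broker.put(job)
    for _ in threads:           # one sentinel per worker
        broker.put(None)
    for t in threads:
        t.join()
    return sorted(results)


if __name__ == "__main__":
    print(run_jobs([("protobuf", "3.5.1"), ("scrapy", "1.5.0")]))
```

Keeping `analyze` free of any queueing logic mirrors the rule that jobs in src/ stay self-contained, so the same function can be called directly when debugging sequentially.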

HowTo

select_pm

  • select the package managers to inspect based on num_pkg threshold
    • python main.py select_pm

select_pkg

  • select popular packages based on specified criteria, such as downloads or uses
    • python main.py select_pkg ../data/pypi.with_stats.csv ../data/pypi.with_stats.popular.csv -n 10000
    • python main.py select_pkg ../data/maven.csv ../data/maven.popular.csv -n 10000 -f use_count
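
As a rough illustration of what a top-N selection like the commands above involves, here is a minimal sketch. The column names `package_name` and `downloads` and the sample rows are assumptions made for the example; the real CSV layout and the behavior of `-n` and `-f` may differ.

```python
import csv
import io


def select_popular(rows, n, field="downloads"):
    """Return the n rows with the largest numeric value in `field`."""
    ranked = sorted(rows, key=lambda r: int(r[field] or 0), reverse=True)
    return ranked[:n]


# Hypothetical sample data; not taken from the project's data files.
SAMPLE = """package_name,downloads
requests,900
left-pad,15
numpy,700
"""

if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(SAMPLE)))
    for row in select_popular(rows, 2):
        print(row["package_name"], row["downloads"])
```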

crawl

  • crawl the specified package manager and save the package names
    • python main.py crawl $package_manager $outfile
  • crawl the specified package manager for package names, lookup download stats, and save to file
    • python main.py crawl $package_manager $outfile -s -p 24

edit_dist

  • run edit distance for package names
    • python main.py edit_dist $source -t $target $outfile
    • python main.py edit_dist ../data/pypi.with_stats.csv ../data/edit_dist/pypi_edist_dist.out -a c_edit_distance_batch -p 16
    • python main.py edit_dist ../data/pypi.with_stats.popular.csv ../data/edit_dist/pypi_pop_vs_all.out -t ../data/pypi.with_stats.csv -a c_edit_distance_batch -p 16 --pair_outfile ../data/edit_dist/pypi_pop_vs_all.csv
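
For readers unfamiliar with the metric, the sketch below computes plain Levenshtein distance and pairs popular names with near-identical candidates. It is a pure-Python illustration under the assumption that the job uses standard edit distance; it is not the project's `c_edit_distance_batch` algorithm, and `near_matches` is a hypothetical helper.

```python
def edit_distance(a, b):
    """Levenshtein distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def near_matches(popular, candidates, max_dist=1):
    """Pair each popular name with distinct candidates within max_dist edits."""
    pairs = []
    for p in popular:
        for c in candidates:
            if p != c and edit_distance(p, c) <= max_dist:
                pairs.append((p, c))
    return pairs


if __name__ == "__main__":
    print(near_matches(["requests"], ["requests", "request", "reqeusts"]))
```

Note that a swapped pair of letters counts as two edits under this metric, so catching transposition typos needs `max_dist=2` or a different distance.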

download

  • download tarball file using pip, link
    • pip download --no-binary :all: --no-deps package
  • download tgz file using npm, link
    • npm pack package
  • download php packages using composer
    • composer require -d ../testdata/php --prefer-source --no-scripts package
  • download ruby packages using gem
    • gem fetch package
  • download java packages using maven
    • mvn dependency:get -Dartifact=com.google.protobuf:protobuf-java:3.5.1 -Dtransitive=false && cp ~/.m2/repository/com/google/protobuf/protobuf-java/3.5.1/protobuf-java-3.5.1.jar ./

get_versions

  • run get_versions job to get major versions for list of packages
    • python main.py get_versions ../data/pypi.with_stats.popular.csv ../data/pypi.with_stats.popular.versions.csv -l python -c /data/maloss/info/python
    • python main.py get_versions ../data/maven.popular.csv ../data/maven.popular.versions.csv -c /data/maloss/info/java -l java
  • run get_versions job to get all versions for list of packages
    • python main.py get_versions ../data/2019.07/pypi.csv ../data/2019.07/pypi.versions.csv -c /data/maloss/info-2019.07/python -l python --max_num -1
  • run get_versions job to get all versions for list of packages and include their time as well
    • python main.py get_versions ../data/2019.07/pypi.csv ../data/2019.07/pypi.versions.csv -c /data/maloss/info-2019.07/python -l python --max_num -1 --with_time
  • run get_versions job to get recent versions for list of packages
    • python main.py get_versions ../data/2019.07/pypi.csv ../data/2019.07/pypi.versions.csv -c /data/maloss/info-2019.07/python -l python --max_num 100 --min_gap_days 1

get_author

  • run get_author job to get the author for a list of packages
    • python main.py get_author ../data/pypi.with_stats.popular.csv ../data/pypi.with_stats.with_author.popular.csv -l python -c /data/maloss/info/python

get_dep

  • run get_dep job to list dependencies for python packages
    • python main.py get_dep -l python -n protobuf -c ../testdata
    • python main.py get_dep -l python -n scrapy -c ../testdata
  • run get_dep job to list dependencies for javascript packages
    • python main.py get_dep -l javascript -n eslint -c ../testdata
  • run get_dep job to list dependencies for ruby packages
    • python main.py get_dep -l ruby -n protobuf -c ../testdata
  • run get_dep job to list dependencies for php packages
    • python main.py get_dep -l php -n designsecurity/progpilot -c ../testdata
  • run get_dep job to list dependencies for java packages
    • python main.py get_dep -l java -n com.google.protobuf/protobuf-java -c ../testdata

get_stats

  • get the stats for specified packages
    • python main.py get_stats ../malware/npmjs-mal-pkgs.june2019.txt ../malware/npmjs-mal-pkgs.june2019.with_stats.txt.new -m npmjs
  • get the stats for specified packages
    • python main.py get_stats ../malware/pypi-mal-pkgs.txt ../malware/pypi-mal-pkgs.with_stats.txt -m pypi

build_dep

  • build the dependency graph
    • python main.py build_dep -c /data/maloss/info/python -l python ../data/pypi.with_stats.csv ../airflow/data/pypi.with_stats.dep_graph.pickle
  • build the dependency graph with versions (the --record_version option)
    • python main.py build_dep -c /data/maloss/info/python -v -l python ../data/pypi.with_stats.popular.versions.csv ../airflow/data/pypi.with_stats.popular.versions.dep_graph.pickle

build_author

  • build the author package graph for popular packages in pypi/npmjs/rubygems/packagist
    • python main.py build_author ../data/author_pkg_graph.popular.pickle -i ../data/pypi.with_stats.with_author.popular.csv ../data/npmjs.with_stats.with_author.popular.csv ../data/rubygems.with_stats.with_author.popular.csv ../data/packagist.with_stats.with_author.popular.csv -l python javascript ruby php -t ../data/top_authors.popular.json
  • build the author package graph for all packages in pypi/npmjs/rubygems/packagist/maven
    • python main.py build_author ../data/author_pkg_graph.pickle -i ../data/pypi.with_stats.with_author.csv ../data/npmjs.with_stats.with_author.csv ../data/rubygems.with_stats.with_author.csv ../data/packagist.with_stats.with_author.csv ../data/maven.with_author.csv -l python javascript ruby php java -t ../data/top_authors.json
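
One way to picture an author package graph is a mapping from each author to the packages they publish across ecosystems. The sketch below assumes input records of the form (ecosystem, package, author); that shape, the function names, and the ranking rule are illustrative assumptions, not the project's actual data model.

```python
from collections import defaultdict


def build_author_graph(records):
    """Group (ecosystem, package) pairs under the author who publishes them."""
    graph = defaultdict(set)
    for ecosystem, package, author in records:
        graph[author].add((ecosystem, package))
    return graph


def top_authors(graph, n):
    """Rank authors by package count, breaking ties alphabetically."""
    ranked = sorted(graph.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(author, len(pkgs)) for author, pkgs in ranked[:n]]


if __name__ == "__main__":
    # Hypothetical records for demonstration only.
    demo = [
        ("python", "pkg-a", "alice"),
        ("javascript", "pkg-b", "alice"),
        ("ruby", "pkg-c", "bob"),
    ]
    print(top_authors(build_author_graph(demo), 2))
```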

split_graph

  • split the dependency graph
    • extract the pickle archives first
      • tar -zxf ../airflow/data/pypi.with_stats.dep_graph.pickle.tgz
    • split into N copies
      • python main.py split_graph ../airflow/data/pypi.with_stats.dep_graph.pickle ../airflow/pypi_dags/ -d ../airflow/data/pypi_static.py -n 20
      • python main.py split_graph ../airflow/data/pypi.with_stats.popular.versions.dep_graph.pickle ../airflow/pypi_version_dags/ -d ../airflow/data/pypi_static_versions.py -n 10
      • python main.py split_graph ../airflow/data/maven.dep_graph.pickle ../airflow/maven_dags/ -d ../airflow/data/maven_static.py -n 20
    • split into N copies and K folders
      • python main.py split_graph ../airflow/data/maven.popular.versions.dep_graph.pickle.tgz ../airflow/maven_version_dags/ -d ../airflow/data/maven_static_versions.py -n 80 -k 4
    • split out the subgraph that contains seed nodes
      • python main.py split_graph ../airflow/data/pypi.with_stats.dep_graph.pickle ../airflow/pypi_dags/ -d ../airflow/data/pypi_static.py -s ../data/pypi.with_stats.popular.csv
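
The seed-based split can be illustrated as a graph traversal. This sketch assumes the dependency graph is a dict mapping each package to the list of packages it depends on, and interprets "the subgraph that contains seed nodes" as the seeds plus everything they transitively depend on. Both are assumptions; the pickled graph format and the exact semantics of `-s` are not shown in this document.

```python
from collections import deque


def seed_subgraph(graph, seeds):
    """Keep the seeds and every package reachable through dependency edges."""
    keep = set()
    todo = deque(s for s in seeds if s in graph)
    while todo:
        node = todo.popleft()
        if node in keep:
            continue
        keep.add(node)
        todo.extend(dep for dep in graph.get(node, []) if dep not in keep)
    # Rebuild adjacency lists restricted to the kept nodes.
    return {n: [d for d in graph.get(n, []) if d in keep] for n in sorted(keep)}


if __name__ == "__main__":
    # Hypothetical dependency graph for demonstration only.
    demo = {
        "scrapy": ["twisted", "lxml"],
        "twisted": ["zope.interface"],
        "lxml": [],
        "zope.interface": [],
        "flask": ["jinja2"],
        "jinja2": [],
    }
    print(seed_subgraph(demo, ["scrapy"]))
```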

install

  • run install job to install python packages and capture traces
    • python main.py install -n protobuf -l python -c ../testdata -o ../testdata
  • run install job to install javascript packages and capture traces
    • python main.py install -n eslint -l javascript -c ../testdata -o ../testdata
  • run install job to install ruby packages and capture traces
    • python main.py install -n protobuf -l ruby -c ../testdata -o ../testdata
  • run install job to install php packages and capture traces
    • python main.py install -n designsecurity/progpilot -l php -c ../testdata -o ../testdata
  • run install job to install java packages and capture traces
    • python main.py install -n com.google.protobuf/protobuf-java -l java -c ../testdata -o ../testdata

astgen

  • run astgen job to compute ast for python and python3 packages
    • python main.py astgen ../testdata/test-eval-exec.py ../testdata/test-eval-exec.py.out -c ../config/test_astgen_python.config
    • python main.py astgen ../testdata/html5lib-1.0.1.tar.gz ../testdata/html5lib-1.0.1.tar.gz.out -c ../config/test_astgen_python.config
    • python main.py astgen ../testdata/python-taint-0.40.tar.gz ../testdata/python-taint-0.40.tar.gz.out -c ../config/test_astgen_python.config
  • run astgen job to compute ast for javascript packages
    • python main.py astgen ../testdata/test-eval.js ../testdata/test-eval.js.out -c ../config/test_astgen_javascript.config -l javascript
    • python main.py astgen ../testdata/urlgrey-0.4.4.tgz ../testdata/urlgrey-0.4.4.tgz.out -c ../config/test_astgen_javascript.config -l javascript
  • run astgen job to compute ast for php packages
    • cd static_proxy && php astgen.php -c ../../config/test_astgen_php.config.bin -i ../../testdata/test-eval-exec.php -o ../../testdata/test-eval-exec.php.out.bin && cd ..
    • python main.py astgen ../testdata/test-eval-exec.php ../testdata/test-eval-exec.php.out -c ../config/test_astgen_php.config -l php
    • python main.py astgen ../testdata/test-backtick.php ../testdata/test-backtick.php.out -c ../config/test_astgen_php.config -l php
    • python main.py astgen ../testdata/php/vendor/guzzlehttp/guzzle/ ../testdata/guzzlehttp_guzzle.out -c ../config/test_astgen_php.config -l php
  • run astgen job to compute ast for ruby packages
    • cd static_proxy && ruby astgen.rb -c ../../config/test_astgen_ruby.config.bin -i ../../testdata/test-eval.rb -o ../../testdata/test-eval.rb.out.bin && cd ..
    • python main.py astgen ../testdata/test-eval.rb ../testdata/test-eval.rb.out -c ../config/test_astgen_ruby.config -l ruby
  • run astgen job to compute ast for java packages
    • cd static_proxy/astgen-java && java -jar target/astgen-java-1.0.0-jar-with-dependencies.jar -help && cd ../../
    • cd static_proxy/astgen-java && java -jar target/astgen-java-1.0.0-jar-with-dependencies.jar -inpath ../../../testdata/Test.jar -outfile ../../../testdata/Test.jar.out -intype JAR -config ../../../config/astgen_java_smt.config -process_dir ../../../testdata/Test.jar && cd ../../
    • python main.py astgen ../testdata/protobuf-java-3.5.1.jar ../testdata/protobuf-java-3.5.1.jar.out -c ../config/test_astgen_java.config -l java
    • python main.py astgen ../testdata/Test.jar ../testdata/Test.jar.out -c ../config/astgen_java_smt.config -l java
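
To give a feel for what an AST-based API scan does, the sketch below uses Python 3's standard `ast` module to locate calls to a small set of sensitive built-ins. The `SENSITIVE_CALLS` set is a hypothetical example, not the contents of the project's config files, and the astgen job's actual output format is not reproduced here.

```python
import ast

# Hypothetical API list for illustration; the real lists live in ../config.
SENSITIVE_CALLS = {"eval", "exec", "compile"}


def find_sensitive_calls(source):
    """Return sorted (line, name) pairs for direct calls to sensitive names."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in SENSITIVE_CALLS:
                hits.append((node.lineno, node.func.id))
    return sorted(hits)


if __name__ == "__main__":
    sample = "import os\nx = eval('1+1')\nexec('print(x)')\nos.system('ls')\n"
    for line, name in find_sensitive_calls(sample):
        print(f"line {line}: {name}")
```

This version only matches bare names, so attribute calls such as `os.system` are skipped; covering those would require resolving `ast.Attribute` nodes against imports.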

astfilter

  • use the configs titled ../config/astgen_XXX_smt.config for each language (e.g. ../config/astgen_javascript_smt.config) in astfilter job
  • run astfilter job to evaluate api usage for python/pypi package and its dependent packages
    • python main.py astfilter -n protobuf -c $python_config -d ../testdata/ -o ../testdata/
  • run astfilter job to evaluate api usage for javascript/npmjs package and its dependent packages
    • python main.py astfilter -n eslint-scope -c $javascript_config -d ../testdata/ -o ../testdata/ -l javascript
  • run astfilter job to evaluate api usage for php/packagist package and its dependent packages
    • python main.py astfilter -n designsecurity/progpilot -c $php_config -d ../testdata/ -o ../testdata/ -l php
  • run astfilter job to evaluate api usage for ruby/rubygems package and its dependent packages
    • python main.py astfilter -n protobuf -c $ruby_config -d ../testdata/ -o ../testdata -l ruby
  • run astfilter job to evaluate api usage for java/maven package and its dependent packages
    • python main.py astfilter -n com.google.protobuf/protobuf-java -c $java_config -d ../testdata/ -o ../testdata -l java

taint

  • run taint analysis for specific packages
    • python main.py taint -n json -d /data/maloss/info/ruby -o /data/maloss/result/ruby -l ruby -c ../config/astgen_ruby_smt.config
  • run taint analysis for specific packages and ignore their dependencies
    • python main.py taint -n urllib -i ../malware/pypi-samples/urllib-1.21.1.tgz -d /data/maloss/info/python -o ./ -l python -c ../config/astgen_python_smt.config
    • python main.py taint -n django-server -i ../malware/pypi-samples/django-server-0.1.2.tgz -d /data/maloss/info/python -o ./ -l python -c ../config/astgen_python_smt.config
    • pip download --no-binary :all: --no-deps trustme && python main.py taint -n trustme -i trustme-0.5.1.tar.gz -d /data/maloss/info/python -o ./ -l python -c ../config/astgen_python_smt.config
    • python main.py taint -n eslint-scope -i ../malware/npmjs-samples/eslint-scope-3.7.2.tgz -d /data/maloss/info/javascript -o ./ -l javascript -c ../config/astgen_javascript_smt.config
    • python main.py taint -n custom8 -i static_proxy/jsprime/jsprimetests/custom8.js -d /data/maloss/info/javascript -o ./ -l javascript -c ../config/astgen_javascript_smt.config
    • python main.py taint -n stream-combine -i ../malware/npmjs-samples/stream-combine-2.0.2.tgz -d /data/maloss/info/javascript -o ./ -l javascript -c ../config/astgen_javascript_smt.config
    • python main.py taint -n test-eval-exec -i ../testdata/test-eval-exec.php -d /data/maloss/info/php -o ./ -l php -c ../config/astgen_php_smt.config
    • python main.py taint -n test-multiple-flows -i static_proxy/progpilot/projects/tests/tests/flows/ -d /data/maloss/info/php -o ./ -l php -c ../config/astgen_php_smt.config
    • python main.py taint -n test-flow -i ../testdata/test-flow.php -d /data/maloss/info/php -o ./ -l php -c ../config/astgen_php_smt.config
  • run taint analysis for specific input file
    • python main.py taint -n active-support -l ruby -c ../config/astgen_ruby_smt.config -i ../malware/rubygems-samples/active-support-5.2.0.gem -o ./
    • python main.py taint -n bootstrap-sass -l ruby -c ../config/astgen_ruby_smt.config -i ../malware/rubygems-samples/bootstrap-sass-3.2.0.3.gem -o ./
    • python main.py taint -n brakeman-rails4 -l ruby -c ../config/astgen_ruby_smt.config -i ../testdata/rails4/ -o ./

filter_pkg

  • filter packages based on the api usage or flow presence
    • python main.py filter_pkg ../data/pypi.with_stats.csv ../data/pypi.with_stats.with_taint_apis.csv -c ../config/astgen_python_taint_apis.config -o /data/maloss/result/python -d /data/maloss/info/python -l python
    • python main.py filter_pkg ../data/rubygems.with_stats.csv ../data/rubygems.with_stats.with_taint_apis.csv -c ../config/astgen_ruby_taint_apis.config -o /data/maloss/result/ruby -d /data/maloss/info/ruby -l ruby
    • python main.py filter_pkg ../data/npmjs.with_stats.csv ../data/npmjs.with_stats.with_taint_apis.csv -c ../config/astgen_javascript_taint_apis.config -o /data/maloss/result/javascript -d /data/maloss/info/javascript -l javascript
    • python main.py filter_pkg ../data/packagist.with_stats.csv ../data/packagist.with_stats.with_taint_apis.csv -c ../config/astgen_php_taint_apis.config -o /data/maloss/result/php -d /data/maloss/info/php -l php
    • python main.py filter_pkg ../data/maven.csv ../data/maven.with_taint_apis.csv -c ../config/astgen_java_taint_apis.config -o /data/maloss/result/java -d /data/maloss/info/java -l java

static

  • run static job to perform astfilter, taint and danger analysis for python and python3 packages
    • python main.py static -n protobuf -c $python_config -d ../testdata/ -o ../testdata/

dynamic

  • run dynamic job to install, main and exercise python packages and capture traces
    • python main.py dynamic -n protobuf -l python -c ../testdata -o ../testdata

interpret_trace

  • run interpret trace job to parse dynamic traces and dump them into per pkg/version protobuf output files
    • NOTE: sudo is needed for starting falco to parse traces
    • sudo python main.py interpret_trace -l python --trace_dir /data/maloss1/sysdig/pypi -c /data/maloss/info/python -o /data/maloss/result/python -p 8

compare_ast

  • compare the ast of specified input files and packages for permissions, apis etc.
    • python main.py compare_ast -i ../malware/npmjs-samples/flatmap-stream-0.1.1.tgz ../benignware/npmjs-samples/flatmap-stream-0.1.0.tgz -o ../testdata/ ../testdata/flatmap-stream.json -l javascript -c ../config/astgen_javascript_smt.config
    • python main.py compare_ast -i ../testdata/test-backtick.php ../testdata/test-eval-exec.php -o tempout/ tempout/test_eval_backtick.json -l php -c ../config/astgen_php_smt.config
    • python main.py compare_ast -i ../malware/rubygems-samples/bootstrap-sass-3.2.0.3.gem ../benignware/rubygems-samples/bootstrap-sass-3.2.0.2.gem -l ruby -c ../config/astgen_ruby_smt.config -o ../testdata/ --outfile ../testdata/bootstrap-sass-compare.txt
    • python main.py compare_ast -i ../malware/rubygems-samples/active-support-5.2.0.gem ../benignware/rubygems-samples/activesupport-5.2.3.gem -c ../config/astgen_ruby_smt.config -o ../testdata/ --outfile ../testdata/activesupport-compare.txt -l ruby
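
Conceptually, a cross-version comparison boils down to diffing the APIs each version uses. The following sketch does this for two Python sources using the standard `ast` module. It is an assumption-laden illustration: the function names and the "added"/"removed" report shape are invented here, and the examples above target JavaScript, PHP and Ruby rather than Python.

```python
import ast


def called_names(source):
    """Collect the names of all functions and methods called in the source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):
                names.add(func.id)
            elif isinstance(func, ast.Attribute):
                names.add(func.attr)
    return names


def compare_versions(old_source, new_source):
    """Report which called names appear or disappear between two versions."""
    old, new = called_names(old_source), called_names(new_source)
    return {"added": sorted(new - old), "removed": sorted(old - new)}


if __name__ == "__main__":
    old = "print(len('a'))\n"
    new = "import os\nprint('a')\nos.system('id')\neval('1')\n"
    print(compare_versions(old, new))
```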

filter_versions

  • filter package versions based on compare_ast results, to allow further analysis such as taint analysis
    • python main.py filter_versions ../data/2019.07/packagist.versions.with_time.csv ../data/2019.07/packagist_ast_stats.apis.json ../data/2019.07/packagist.versions.with_time.filtered_loose_apis.csv

compare_hash

  • compare the hash value of same package versions across different package managers
    • python main.py compare_hash -i ../data/maven.csv ../data/jcenter.csv -d /data/maloss/info/java /data/maloss/info/jcenter -o ../data/maven_jcenter.json
    • python main.py compare_hash -i ../data/jitpack.csv ../data/jcenter.csv -d /data/maloss/info/jitpack /data/maloss/info/jcenter -o ../data/jitpack_jcenter.json
  • compare the hash value of same package versions and their content hashes or api permissions across different package managers
    • python main.py compare_hash -i ../data/jitpack.csv ../data/jcenter.csv -d /data/maloss/info/jitpack /data/maloss/info/jcenter -o ../data/jitpack_jcenter_filtered.json --inspect_content
    • python main.py compare_hash -i ../data/jitpack.csv ../data/jcenter.csv -d /data/maloss/info/jitpack /data/maloss/info/jcenter -o ../data/jitpack_jcenter_filtered.json --inspect_api -c ../config/astgen_java_smt.config
    • python main.py compare_hash -i ../data/jitpack.csv ../data/jcenter.csv -d /data/maloss/info/jitpack /data/maloss/info/jcenter -o ../data/jitpack_jcenter_filtered_api.json --inspect_api -c ../config/astgen_java_smt.config --compare_hash_cache ../data/jitpack_jcenter_filtered.json
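
The core idea of comparing the same artifact across registries can be sketched with `hashlib`. This example assumes each registry mirror is a flat directory and that matching artifacts share a file name; the real job takes CSV package lists and cache directories, so treat the directory layout and function names as hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_dirs(dir_a, dir_b):
    """Map each file name present in both dirs to True if the hashes match."""
    names_a = {p.name for p in Path(dir_a).iterdir() if p.is_file()}
    names_b = {p.name for p in Path(dir_b).iterdir() if p.is_file()}
    return {
        name: sha256_of(Path(dir_a) / name) == sha256_of(Path(dir_b) / name)
        for name in sorted(names_a & names_b)
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        Path(a, "demo-1.0.jar").write_bytes(b"same bytes")
        Path(b, "demo-1.0.jar").write_bytes(b"same bytes")
        Path(a, "demo-2.0.jar").write_bytes(b"original")
        Path(b, "demo-2.0.jar").write_bytes(b"tampered")
        print(compare_dirs(a, b))
```

A mismatch only shows that the bytes differ; deciding whether the difference is benign (for example, rebuilt archives with new timestamps) is presumably what options like --inspect_content and --inspect_api are for.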

interpret_result

  • collect and plot api stats
    • python main.py interpret_result --data_type api -c /data/maloss/info/python -o /data/maloss/result/python -l python ../data/2019.01/pypi.with_stats.csv ../data/pypi_api_stats.json
    • python main.py interpret_result --data_type api -c /data/maloss/info/python -o /data/maloss/result/python -l python ../data/2019.01/pypi.with_stats.popular.csv ../data/pypi_pop_api_stats.json
    • python main.py interpret_result --data_type api -c /data/maloss/info/python -o /data/maloss/result/python -l python ../data/2019.01/pypi.with_stats.csv ../data/pypi_api_mapping.json -d --detail_filename
  • collect and plot domain stats
    • python main.py interpret_result --data_type domain -c /data/maloss/info/python -o /data/maloss/result/python -l python ../data/2019.06/pypi.csv ../data/2019.06/pypi_domain_stats.json
    • python main.py interpret_result --data_type domain -c /data/maloss/info/python -o /data/maloss/result/python -l python ../data/2019.06/pypi.csv ../data/2019.06/pypi_domain_mapping.json -d
  • collect the pre-generated dependency stats
    • python main.py interpret_result --data_type dependency -l python ../data/pypi.with_stats.popular.csv ../data/pypi_pop_dep_stats.json
  • collect the cross version comparison results, can filter by permissions, apis etc.
    • python main.py interpret_result --data_type compare_ast -c /data/maloss/info/python -o /data/maloss/result/python -l python ../data/2019.06/pypi.with_stats.popular.csv ../data/2019.06/pypi_compare_ast_stats.json
    • python main.py interpret_result --data_type compare_ast -c /data/maloss/info-2019.07/javascript -o /data/maloss/result-2019.07/javascript -l javascript ../data/2019.07/npmjs.csv ../data/2019.07/npmjs_ast_stats.json --compare_ast_options_file ../data/2019.07/compare_ast_options.json
  • collect metadata/static/dynamic results and dump suspicious packages
    • python main.py interpret_result --data_type install_with_network -c /data/maloss/info/javascript -o /data/maloss/result/javascript -l javascript -m npmjs ../data/2019.06/npmjs.csv ../data/2019.06/npmjs.install_with_network.json
  • collect the reverse dependency results
    • python main.py interpret_result --data_type reverse_dep -l javascript -m npmjs ../airflow/data/high_impact.csv ../airflow/data/high_impact_npmjs.json
    • python main.py interpret_result --data_type reverse_dep -l python -m pypi ../airflow/data/high_impact.csv ../airflow/data/high_impact_pypi.json
    • python main.py interpret_result --data_type reverse_dep -l ruby -m rubygems ../airflow/data/high_impact.csv ../airflow/data/high_impact_rubygems.json
  • collect metadata/static/compare_ast results and dump suspicious packages
    • python main.py interpret_result --data_type correlate_info_api_compare_ast -c /data/maloss/info-2019.07/javascript -o /data/maloss/result-2019.07/javascript -l javascript -m npmjs -s ../data/2019.07/npmjs_skip_list.json ../data/2019.07/npmjs_ast_stats.json ../data/2019.07/npmjs_correlate_info_api_compare_ast.json
    • python main.py interpret_result --data_type correlate_info_api_compare_ast -c /data/maloss/info-2019.07/php -o /data/maloss/result-2019.07/php -l php -m packagist -s ../data/2019.07/packagist_skip_list.json ../data/2019.07/packagist_ast_stats.json ../data/2019.07/packagist_correlate_info_api_compare_ast.json
    • python main.py interpret_result --data_type taint -c /data/maloss/info-2019.07/php -o /data/maloss/result-2019.07/php -l php ../data/2019.07/packagist.csv ../data/2019.07/packagist_flow_stats.json

grep_pkg

  • grep through packages
    • python main.py grep_pkg ../data/2019.07/rubygems.csv ../data/2019.07/rubygems.csv.pastebin.com pastebin.com -l ruby -p 80
    • python main.py grep_pkg ../data/2019.07/npmjs.csv ../data/2019.07/npmjs.csv.pastebin.com pastebin.com -l javascript -p 20
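
Searching a package for a string such as pastebin.com amounts to scanning every file in its archive. The sketch below does this for a tar archive using the standard `tarfile` module. It is a simplified illustration: `grep_tarball` and `make_demo_tarball` are hypothetical helpers, and other formats the job would have to handle (e.g. .gem or .jar) are out of scope.

```python
import io
import tarfile


def grep_tarball(fileobj, needle):
    """Return names of archive members whose bytes contain `needle`."""
    needle_bytes = needle.encode("utf-8")
    matches = []
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue            # skip directories, links, devices
            data = tar.extractfile(member).read()
            if needle_bytes in data:
                matches.append(member.name)
    return matches


def make_demo_tarball(files):
    """Build an in-memory .tar.gz from {name: bytes}; used only for the demo."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, content in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    demo = make_demo_tarball({
        "pkg/setup.py": b"url = 'https://pastebin.com/raw/abc'",
        "pkg/README": b"hello",
    })
    print(grep_tarball(demo, "pastebin.com"))
```

Reading members in memory avoids unpacking untrusted archives to disk, which matters when the packages being searched may be malicious.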

speedup

  • measure the speedup benefits from summaries
    • python main.py speedup ../data/2019.01/pypi.with_stats.popular.csv speedup.log -l python

Tool

Internet-wide scanning

Statistics for different package managers

Static analysis tools for different languages

AST parsers for different languages

Resource

Reference