Ronnie Gandhi edited this page Apr 7, 2019 · 33 revisions

Background

Fuzzers are programs that detect security holes and other bugs in a program by sending it inputs that may fall outside the expected inputs, thereby revealing subtle errors in the code. Most such fuzzers are guided by code coverage, and some learn from previous inputs and results to produce inputs that exercise the code under test more effectively. DeepState is a testing framework that makes it easy to test C and C++ programs with sophisticated fuzzers. R has some simple random testers, but no coverage-driven fuzzers that learn to produce interesting inputs.

Related work

  • Filip Krikava presented related work on coverage-guided automatic test generation (genthat) at useR! 2018: http://bit.ly/user18-genthat
  • fuzzr provides less sophisticated random testing of R functions (and no framework for writing unit tests).
  • covr instruments R code to measure the coverage of unit tests. (Instrumentation is required for more sophisticated fuzzers to learn a mapping from inputs to the execution path through the code.)
  • testthat provides functions for defining tests, similar to DeepState.

GSoC coding project: “DeepState” for R

The goal of this GSoC project is to implement new features for defining unit tests in R code with coverage-driven fuzzing support. If feasible, a DeepState-like approach that supports multiple back-end fuzzers would be good; however, the instrumentation needed to make this useful may be hard to achieve. For an example of such an effort, see python-afl, which instruments pure (non-compiled) Python code so that AFL, a well-known and very successful fuzzer, can test it.

After implementing the framework, we will use it to detect bugs in several widely used R packages (including base R).

Project idea 1: mutated inputs to R functions

The fuzzr package generates random inputs from scratch in order to test R functions. For this idea we would use a different approach to generating inputs: start with a random input (arg1, arg2, arg3) for an R function, then mutate those arguments guided by code coverage.

  • interesting inputs are those that increase code coverage.
  • keep a queue of interesting inputs, ranked by how interesting they are.
  • use coverage to select which of the arguments to mutate.
  • define R functions for mutating basic types (atomic vectors) and complex types (list, data frame).
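The workflow in these bullets can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not a design for the package: `toy_target` is a hypothetical function that reports the IDs of the branches it executed, standing in for real coverage data (for example from covr); `mutate_vector` and `fuzz` are likewise hypothetical names, and only numeric (double or integer) vectors are mutated, so lists and data frames are left out.

```r
# Hypothetical stand-in for an instrumented target: returns the IDs of the
# branches it executed. A real tool would get this from coverage
# instrumentation (e.g. covr) rather than hand-written IDs.
toy_target <- function(x) {
  hit <- "entry"
  if (length(x) > 3) {
    hit <- c(hit, "long")
    if (any(x < 0, na.rm = TRUE)) hit <- c(hit, "negative")
    if (anyNA(x)) hit <- c(hit, "has_na")
  }
  hit
}

# Hypothetical mutator for a numeric vector; preserves double vs integer type.
mutate_vector <- function(x) {
  stopifnot(is.numeric(x))
  op <- if (length(x) == 0) "grow" else sample(c("perturb", "grow", "shrink", "na"), 1)
  if (op == "perturb") {
    i <- sample(length(x), 1)
    delta <- if (is.integer(x)) sample(-5L:5L, 1) else rnorm(1, sd = 10)
    x[i] <- x[i] + delta
  } else if (op == "grow") {
    new_value <- if (is.integer(x)) sample(-100L:100L, 1) else rnorm(1, sd = 100)
    x <- c(x, new_value)
  } else if (op == "shrink") {
    x <- x[-sample(length(x), 1)]
  } else {
    x[sample(length(x), 1)] <- NA
  }
  x
}

# Coverage-guided loop: keep inputs that reach new branches, pick parents
# with probability proportional to how many branches they cover, and apply
# a few stacked mutations to each parent.
fuzz <- function(target, seed_input, iterations = 500) {
  score_of <- function(entry) as.numeric(entry$score)
  covered <- target(seed_input)
  queue <- list(list(input = seed_input, score = length(covered)))
  for (iter in seq_len(iterations)) {
    scores <- vapply(queue, score_of, numeric(1))
    parent <- queue[[sample(length(queue), 1, prob = scores)]]
    child <- parent$input
    for (k in seq_len(sample(4, 1))) child <- mutate_vector(child)
    hit <- target(child)
    if (length(setdiff(hit, covered)) > 0) {
      # New coverage: this input is "interesting", so keep it in the queue.
      covered <- union(covered, hit)
      queue[[length(queue) + 1]] <- list(input = child, score = length(hit))
    }
  }
  # Return the queue ranked by interest (most branches covered first).
  queue <- queue[order(-vapply(queue, score_of, numeric(1)))]
  list(covered = covered, queue = queue)
}

set.seed(42)
result <- fuzz(toy_target, seed_input = c(1L, 2L))
print(result$covered)
print(length(result$queue))
```

Selecting parents in proportion to their coverage score is just one simple way to favor interesting inputs; real fuzzers such as AFL use more elaborate heuristics to decide how much effort to spend on each queue entry. Stacking several mutations per child matters here: with a single mutation per iteration, the length-2 seed could never produce a length-4 input, because intermediate length-3 inputs add no new coverage and are never queued.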

Project idea 2: interfacing DeepState with R

This idea is more complicated and should probably be attempted only by a very talented student.

Create an R package that provides a function fuzz_C(Rfun) which takes as input an R function Rfun that uses .C to call compiled C/C++ code.

  • analyze the source code of Rfun to determine valid input types to the .C call (types and lengths of atomic vectors).
  • analyze if-stop/stopifnot R code to determine valid input values.
  • generate DeepState C++ code based on the inferred valid inputs.
  • use DeepState with a fuzzer (e.g. AFL or libFuzzer) to detect new crashes.
  • may also try correctness testing by analyzing R test cases.
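The first two analysis steps can be prototyped with base R's tools for inspecting code as data (`body()`, `is.call()`, `as.list()` on a call). The sketch below assumes a hypothetical wrapper `my_sum` around a made-up C routine `my_sum_c` (it is only inspected, never called, so no compiled code is needed) and a hypothetical helper `find_calls`. It recovers only the coercion applied to each `.C` argument and the argument-checking calls; turning those into value constraints and generating DeepState C++ code is not shown. The simple recursion also assumes the body contains no empty arguments such as `x[, 1]`.

```r
# Hypothetical wrapper of the kind fuzz_C would analyze. "my_sum_c" is a
# made-up C routine; the function is only inspected, never called.
my_sum <- function(x, na_rm = FALSE) {
  stopifnot(is.numeric(x), length(x) > 0)
  if (!is.logical(na_rm)) stop("na_rm must be logical")
  out <- .C("my_sum_c",
            x = as.double(x),
            n = as.integer(length(x)),
            na_rm = as.logical(na_rm),
            result = double(1))
  out$result
}

# Recursively collect every call to one of `fun_names` inside an expression.
find_calls <- function(expr, fun_names) {
  found <- list()
  if (is.call(expr)) {
    fn <- expr[[1]]
    if (is.name(fn) && as.character(fn) %in% fun_names) {
      found <- list(expr)
    }
    for (arg in as.list(expr)[-1]) {
      found <- c(found, find_calls(arg, fun_names))
    }
  }
  found
}

# Step 1: the coercion applied to each .C argument hints at its C type.
c_call <- find_calls(body(my_sum), ".C")[[1]]
c_args <- as.list(c_call)[-(1:2)]  # drop `.C` itself and the routine name
arg_types <- vapply(c_args, function(a) {
  if (is.call(a)) as.character(a[[1]]) else NA_character_
}, character(1))
print(arg_types)

# Step 2: stop()/stopifnot() calls that constrain valid input values.
checks <- find_calls(body(my_sum), c("stopifnot", "stop"))
for (chk in checks) print(chk)
```

Here `arg_types` maps each named `.C` argument (`x`, `n`, `na_rm`, `result`) to the function used to build it, which is a starting point for choosing the matching DeepState input generators.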

Examples of code that uses .C to call compiled code, passing atomic vectors:

Expected impact

Currently, R package developers do not systematically apply random testing to their code; this project would make it much easier to do so. If we detect new bugs in R packages, fixing them will improve those packages.

Non-mentors

These people have already been contacted and are not interested in co-mentoring, so please do not bother them.

Mentors

Please get in touch with mentors after completing at least one of the tests below.

Tests

Do one or several; completing more of the harder tests makes you more likely to be selected.

  • Easy: use fuzzr on one of your favorite R functions, and post the results to a gist.
  • Medium: write R functions for mutating input values of various types (numeric, integer, data.frame). The function should take one R object as input, and return an object of the same type, but with mutated values.
  • Hard: use covr to write an R function that takes an R package as input and outputs the unique coverage set for each R function in that package, i.e. the lines of code that only that function runs.
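One reading of the hard test is: given the set of lines each function executes, keep the lines that no other function executes. Assuming those per-function sets have already been collected (for example with covr, a step not shown here), the remaining work is set arithmetic in base R. The helper `unique_coverage` and the `cov` data below are hypothetical, made up for illustration.

```r
# Hypothetical helper: for each function, return the lines it executes
# that no other function in the list executes.
unique_coverage <- function(cov_by_fun) {
  fun_names <- names(cov_by_fun)
  result <- lapply(fun_names, function(f) {
    others <- unlist(cov_by_fun[setdiff(fun_names, f)], use.names = FALSE)
    setdiff(cov_by_fun[[f]], others)
  })
  names(result) <- fun_names
  result
}

# Made-up per-function coverage, as "file:line" identifiers.
cov <- list(
  f = c("R/a.R:1", "R/a.R:2", "R/util.R:10"),
  g = c("R/b.R:5", "R/util.R:10"),
  h = c("R/util.R:10", "R/util.R:11")
)
print(unique_coverage(cov))
```

Line `R/util.R:10` is run by all three functions, so it appears in no function's unique set.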

Solutions of tests

  • Students, please post a link to your test results here.
    • Name: Ronnie Gandhi

      Email: [email protected]

      Website: RonnieGandhi

      Github: github.io

      University: Indian Institute of Technology, Roorkee

      Course: Computer Science and Engineering (B.tech)

      Solution to Easy Test: Easy

      Solution to Medium Test: Medium

      Solution to Hard Test: Hard

