
Programmer's Guide

Niranjhana Narayanan edited this page Sep 12, 2019 · 3 revisions

NOTE: This section is subject to change due to the evolving nature of TensorFI.

Existing files and functions:

The main files in TensorFI and their brief descriptions follow:

  • tensorFI.py

    The main fault injector class with the externally callable functions and the statistics gathering functions.

  • modifyGraph.py

    Contains functions to walk the TensorFlow graph and insert fault injection nodes corresponding to the old ones.

  • injectFault.py

    Contains functions to inject faults into different TensorFlow operation nodes in the graph, taking into account the options in the configuration file for operations and instances.

  • fiConfig.py

    Fault configuration file options, and parsing routine. Also, the main class to store the fault configuration options.

  • faultTypes.py

    Different types of supported faults and the functions corresponding to each fault type for both scalars and tensors.

  • fiStats.py

    Collects statistics for different fault injection runs. Currently, only one default statistics gatherer is supported.

  • printGraph.py

    Utility function to print the TensorFlow graph for debugging purposes.

  • fiLog.py

    Logging fault injection runs for debugging and analysis.

Under normal circumstances, the details of these modules are hidden from developers, who simply use the TensorFI package, which exposes all of them. However, it is important to know the module names when extending TensorFI.

Supporting new additions:

We now take the perspective of the programmer who wishes to extend the core functionality of TensorFI in different ways, and explain how to do so:

A. Adding a new fault type

You should add two new functions to faultTypes.py that define how the fault is injected into scalars and tensors respectively. Then make the following changes to fiConfig.py:

  1. add the new fault type to the FaultTypes enum, and

  2. map the new fault type to the fault type functions in FIConfig.faultTypesMap.
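The steps above can be sketched as follows. This is a minimal, self-contained illustration, not TensorFI's actual code: the fault type Negate, the functions negateScalar and negateTensor, and the stand-in FaultTypes enum and faultTypesMap dictionary (including its tuple layout) are assumptions made for the example. To avoid extra dependencies, a tensor is modelled here as a nested Python list.

```python
from enum import Enum


# Hypothetical fault functions; in TensorFI these would go into faultTypes.py.
def negateScalar(value):
    """Inject a fault into a scalar by flipping its sign."""
    return -value


def negateTensor(tensor):
    """Inject a fault into a tensor (modelled as a nested list)
    by flipping the sign of every element."""
    if isinstance(tensor, list):
        return [negateTensor(item) for item in tensor]
    return negateScalar(tensor)


# Stand-in for the FaultTypes enum in fiConfig.py.
class FaultTypes(Enum):
    NONE = "None"
    NEGATE = "Negate"  # step 1: register the new fault type


# Stand-in for FIConfig.faultTypesMap (assumed layout):
# fault type name -> (scalar function, tensor function).
faultTypesMap = {
    FaultTypes.NEGATE.value: (negateScalar, negateTensor),  # step 2
}

# Look up the functions the same way a config entry "Negate" might be resolved.
scalarFault, tensorFault = faultTypesMap["Negate"]
print(scalarFault(3.0))
print(tensorFault([[1.0, -2.0], [0.5, 4.0]]))
```

Keeping the scalar and tensor variants side by side in one map entry means the injector can pick the right one based on the shape of the value it is perturbing.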

B. Adding a new operation to inject

To support new operations, you should add the operation name to the OPS enum in fiConfig.py, along with a string representation (this would be the name used to refer to it in the fiConfig.yaml file). If the operation is already supported for emulation in injectFault.py, then no more changes are needed. Otherwise, you also need to add fault injection functions for the operation (see D below).
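As a rough sketch of what such an enum entry might look like, the example below uses a stand-in OPS enum. The member names, the string values, and the helper opFromConfigName are assumptions for illustration and are not taken from fiConfig.py.

```python
from enum import Enum


# Stand-in for the OPS enum in fiConfig.py. Each member's value is the
# string used to refer to the operation in the YAML configuration file.
class OPS(Enum):
    ADD = "ADD"
    MATMUL = "MATMUL"
    LEAKY_RELU = "LEAKY-RELU"  # hypothetical newly added operation


def opFromConfigName(name):
    """Hypothetical helper: map a name from the config file to an OPS member.
    Enum lookup by value raises ValueError for unknown names."""
    return OPS(name)


print(opFromConfigName("LEAKY-RELU"))
```

With this layout, an operation name that is missing from the enum fails loudly at configuration time instead of being silently ignored.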

C. Adding a new parameter to the config file

To add a new parameter to the config file, first ensure that the parameter can be expressed using standard YAML syntax (no other format is supported at this time). Then add the parameter name to the Fields enum in fiConfig.py, and add the code that reads the parameter to the constructor of the FIConfig class. You may also want to add separate methods to the FIConfig class for parsing the parameter value, to keep the code modular. Finally, it is strongly recommended that you define a default value for the parameter in case it is not specified, and store it in the FIConfig object so that later uses do not result in errors.
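The pattern described above might look like the following sketch. It is not the real FIConfig class: the parameter MaxInjections, the class FIConfigSketch, and the method parseMaxInjections are hypothetical, and the constructor takes an already-parsed dictionary in place of the output of a YAML parser.

```python
from enum import Enum


# Stand-in for the Fields enum in fiConfig.py.
class Fields(Enum):
    Seed = "Seed"
    MaxInjections = "MaxInjections"  # hypothetical new parameter


class FIConfigSketch:
    """Illustrative stand-in for the fault configuration class."""

    maxInjectionsDefault = 1  # default used when the parameter is absent

    def __init__(self, params):
        # params is a dict such as a YAML parser would produce.
        self.seed = params.get(Fields.Seed.value)
        self.maxInjections = self.parseMaxInjections(
            params.get(Fields.MaxInjections.value)
        )

    def parseMaxInjections(self, raw):
        """Parse and validate the new parameter in its own method."""
        if raw is None:
            return self.maxInjectionsDefault
        value = int(raw)
        if value < 0:
            raise ValueError("MaxInjections must be non-negative")
        return value


print(FIConfigSketch({}).maxInjections)
print(FIConfigSketch({"MaxInjections": "5"}).maxInjections)
```

Because the default is applied inside the constructor, code that later reads maxInjections never has to check whether the parameter was present in the file.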

D. Supporting new kinds of TensorFlow operations

Supporting new kinds of TensorFlow operations is fairly straightforward. You need to add the operation to the opTable in injectFault.py along with a function that actually performs the operation. This function must match exactly the format in which TensorFlow would invoke it (i.e., number of arguments passed, return type, etc.). If it does not, you will run into a runtime exception.

To pass the arguments of the original TensorFlow operation to the FI function, you might need to extract the necessary attributes from the original operation (in modifyGraph.py, where the FI function is created). This is because some values (e.g., strides, padding) are stored in the operation's attributes rather than in its inputs. If they are not extracted, the FI function will be missing some of its arguments.
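One way to picture this attribute extraction is the sketch below. It is purely illustrative: the operation is modelled as a plain dictionary, and the names injectFaultMaxPool1D and createFIFunction are hypothetical. In TensorFlow itself the attributes would be read from the operation object, for example through tf.Operation.get_attr.

```python
from functools import partial


def injectFaultMaxPool1D(values, ksize, strides):
    """Hypothetical FI function for a 1-D max pooling operation.
    ksize and strides are attributes, not inputs, of the original operation."""
    lastStart = len(values) - ksize
    return [max(values[i:i + ksize]) for i in range(0, lastStart + 1, strides)]


def createFIFunction(op):
    """Bind the attributes of the original operation to the FI function,
    so that it can later be invoked with the operation's inputs only."""
    attrs = op["attrs"]
    return partial(injectFaultMaxPool1D,
                   ksize=attrs["ksize"],
                   strides=attrs["strides"])


# Toy description of an operation: inputs and attributes are kept separately.
originalOp = {"type": "MaxPool1D", "attrs": {"ksize": 2, "strides": 2}}

fiFunction = createFIFunction(originalOp)
print(fiFunction([1, 5, 2, 4, 3, 0]))
```

Binding the attributes when the FI function is created keeps its call signature identical to the inputs of the original operation.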

By convention, please name this function injectFaultOperation, where Operation is the name of the TensorFlow operation (feel free to abbreviate it, as some operation names are rather long). Check first that the function has not already been defined in the table.

As for what the function needs to do, take a look at the other injectFault functions and follow their template. As a general rule, the numpy library implements most of the TensorFlow operations, so you should be able to leverage it in most cases. Alternatively, you can use the built-in TensorFlow implementation (e.g., tf.nn.conv2d) within the FI function. The TensorFlow graph in the main program does not interfere with the one in the injectFault.py module, which is why the TensorFlow implementation can be used there.

For the actual fault injection, you need to call the condPerturb function with the operation name passed as an argument. This means you also need to add the operation name to the OPS enum in fiConfig.py if it is not already there, along with a string representation of it (see B above).
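The overall shape of such a function can be sketched as follows. This is a simplified illustration under stated assumptions: condPerturb here is a stand-in whose signature and behaviour are guesses (the real one consults the fault configuration), opsToInject is a hypothetical replacement for the config file, and the operations work on plain Python numbers instead of tensors.

```python
# Hypothetical stand-in for the operations selected in the config file.
opsToInject = {"ABS"}


def condPerturb(opName, result):
    """Stand-in for TensorFI's condPerturb: perturb the result only if
    the operation is selected for injection; otherwise pass it through."""
    if opName in opsToInject:
        return 0.0  # placeholder fault: zero out the result
    return result


def injectFaultAbs(x):
    """FI counterpart of a unary absolute-value operation.
    Takes one input and returns one output, like the original."""
    result = abs(x)
    return condPerturb("ABS", result)


def injectFaultSub(a, b):
    """FI counterpart of a binary subtraction operation."""
    result = a - b
    return condPerturb("SUB", result)


# Stand-in for opTable: operation type -> FI function.
opTable = {
    "Abs": injectFaultAbs,
    "Sub": injectFaultSub,
}

print(opTable["Sub"](5.0, 3.0))   # SUB is not selected, so the result is unchanged
print(opTable["Abs"](-4.0))       # ABS is selected, so the placeholder fault applies
```

Each FI function first computes the correct result and only then hands it to condPerturb, so an operation that is not selected for injection behaves exactly like the original.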

E. Supporting new statistics gathering

We only support simple statistics gathering for the time being, namely the number of injections, incorrect outputs, differences, etc. If you want to add more sophisticated capabilities, you will have to modify the Stats enum in the fiStats.py file. Better still, you can create derived classes of FiStats and provide your own update methods for the other statistics.
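A derived statistics class might be structured as in the sketch below. The base class FiStatsSketch, its methods, and the MaxDiffStats subclass are hypothetical stand-ins written for this example and do not reproduce the code in fiStats.py.

```python
from enum import Enum


# Stand-in for the Stats enum in fiStats.py.
class Stats(Enum):
    Injected = "Injected"
    Incorrect = "Incorrect"
    Diff = "Diff"


class FiStatsSketch:
    """Illustrative stand-in for the default statistics gatherer."""

    def __init__(self, name=""):
        self.name = name
        self.stats = {stat.value: 0 for stat in Stats}

    def incr(self, stat):
        """Increment a counter-style statistic by one."""
        self.stats[stat.value] += 1

    def updateDiff(self, diff):
        """Accumulate the difference between correct and faulty outputs."""
        self.stats[Stats.Diff.value] += diff


class MaxDiffStats(FiStatsSketch):
    """Hypothetical derived class that also tracks the largest difference."""

    def __init__(self, name=""):
        super().__init__(name)
        self.maxDiff = 0.0

    def updateDiff(self, diff):
        super().updateDiff(diff)
        self.maxDiff = max(self.maxDiff, abs(diff))


stats = MaxDiffStats("example run")
stats.incr(Stats.Injected)
stats.updateDiff(0.5)
stats.updateDiff(-2.0)
print(stats.stats, stats.maxDiff)
```

Overriding the update method in a subclass leaves the default gatherer untouched, so existing callers keep working.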

F. Adding new log entries or changing the format of the log file

You can add new fields to be logged to the LogFields enum in fiLog.py. You may also want to add corresponding functions to the FILog class and call these functions at the appropriate places to do the logging.

To change the log entry format, you can modify the getLogEntry method of the FILog class in fiLog.py. You don't need to make any changes to this method if all you're doing is adding new fields to the logEntry.
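The sketch below illustrates why adding a field need not touch the formatting method. It is an assumed design, not the actual fiLog.py code: the FILogSketch class, the field names, the update methods, and the "name=value" entry format are all hypothetical.

```python
from enum import Enum


# Stand-in for the LogFields enum in fiLog.py.
class LogFields(Enum):
    RunCount = "Run"
    OpName = "Operation"
    FaultType = "FaultType"  # hypothetical newly added field


class FILogSketch:
    """Illustrative stand-in for the FILog class."""

    def __init__(self):
        self.logEntry = {}

    def updateOpName(self, opName):
        self.logEntry[LogFields.OpName] = opName

    def updateFaultType(self, faultType):
        """Function added together with the new FaultType field."""
        self.logEntry[LogFields.FaultType] = faultType

    def getLogEntry(self):
        """Format every field that was set, in enum definition order.
        Adding a field to LogFields requires no change here."""
        parts = [
            "%s=%s" % (field.value, self.logEntry[field])
            for field in LogFields
            if field in self.logEntry
        ]
        return "; ".join(parts)


log = FILogSketch()
log.updateFaultType("Zero")
log.updateOpName("ADD")
print(log.getLogEntry())
```

Only a change to the entry format itself, such as a different separator, would require editing getLogEntry in this design.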