Programmer's Guide
NOTE: This section is subject to change due to the evolving nature of TensorFI.
The main files in TensorFI and their brief descriptions follow:
-
tensorFI.py
The main fault injector class with the externally callable functions and the statistics gathering functions.
-
modifyGraph.py
Contains functions to walk the TensorFlow graph and insert fault injection nodes corresponding to the old ones.
-
injectFault.py
Contains functions to inject faults into different TensorFlow operation nodes in the graph, taking into account the options in the configuration file for operations and instances.
-
fiConfig.py
Fault configuration file options, and parsing routine. Also, the main class to store the fault configuration options.
-
faultTypes.py
Different types of supported faults and the functions corresponding to each fault type for both scalars and tensors.
-
fiStats.py
Collects statistics for different fault injection runs. Currently, only one default statistics gatherer is supported.
-
printGraph.py
Utility function to print the TensorFlow graph for debugging purposes.
-
fiLog.py
Logging fault injection runs for debugging and analysis.
Under normal circumstances, developers do not need to know the details of these modules, because they simply use the TensorFI package, which exposes all of them. However, it is important to know the module names when extending TensorFI.
We now take the perspective of the programmer who wishes to extend the core functionality of TensorFI in different ways, and explain how to do so:
To add a new fault type, add two new functions to faultTypes.py that define how the fault is injected into scalars and tensors, respectively. Then make the following changes to fiConfig.py:
-
add the new fault type to the FaultTypes enum, and
-
map the new fault type to the fault type functions in FIConfig.faultTypesMap
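The steps above can be sketched as follows. This is only an illustration under stated assumptions: the fault type NEGATE, the functions negateScalar and negateTensor, and the stand-in FaultTypes enum and faultTypesMap dictionary are hypothetical, and the (dtype, val) signature is assumed rather than taken from faultTypes.py.

```python
from enum import Enum

import numpy as np


# Hypothetical fault functions (in TensorFI these would go in faultTypes.py).
def negateScalar(dtype, val):
    """Return the scalar with its sign flipped, cast to the given dtype."""
    return np.dtype(dtype).type(-val)


def negateTensor(dtype, val):
    """Return a copy of the tensor with the sign of every element flipped."""
    return (-np.asarray(val)).astype(dtype)


# Stand-in for the FaultTypes enum in fiConfig.py (assumed layout).
class FaultTypes(Enum):
    NONE = "None"
    NEGATE = "Negate"  # the new, hypothetical fault type


# Stand-in for FIConfig.faultTypesMap: fault type -> (scalar fn, tensor fn).
faultTypesMap = {
    FaultTypes.NONE: (lambda dtype, val: val, lambda dtype, val: val),
    FaultTypes.NEGATE: (negateScalar, negateTensor),
}

# Look up the functions from the string a config file would contain.
scalarFn, tensorFn = faultTypesMap[FaultTypes("Negate")]
print(scalarFn(np.float32, 2.5))
print(tensorFn(np.float32, [[1.0, -2.0], [0.5, 4.0]]))
```

Keeping the scalar and tensor functions as a pair in one map entry means the rest of the injector can look up both from a single fault type value.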
To support fault injection into a new operation, add the operation name to the OPS enum in fiConfig.py, along with a string representation (this is the name used to refer to the operation in the fiConfig.yaml file). If the operation is already supported for emulation in injectFault.py, no further changes are needed. Otherwise, you also need to add a fault injection function for the operation, as described below for supporting new kinds of TensorFlow operations.
To add a new parameter to the config file, first ensure that the parameter can be expressed using standard YAML syntax (no other format is supported at this time). Then add the parameter name to the Fields enum in fiConfig.py, and add the code to read the parameter in the fiConfig class's constructor. You may also want to add other methods to the fiConfig class for parsing the parameter value, to keep the code modular. Finally, it is strongly recommended that you choose a default value for the parameter in case it is not specified, and add that to the fiConfig object so that future uses don't result in errors.
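A minimal sketch of this pattern, assuming the YAML file has already been parsed into a dictionary. The parameter MaxInjections, the class ConfigSketch, and its method names are hypothetical and only stand in for the real Fields enum and configuration class.

```python
from enum import Enum


# Stand-in for the Fields enum in fiConfig.py (assumed layout).
class Fields(Enum):
    Seed = "Seed"
    MaxInjections = "MaxInjections"  # hypothetical new parameter


class ConfigSketch:
    """Illustrative stand-in for the configuration class; not TensorFI's code."""

    DEFAULT_MAX_INJECTIONS = 1  # default used when the field is absent

    def __init__(self, params):
        # `params` is the dictionary obtained by parsing the YAML file.
        self.seed = params.get(Fields.Seed.value)
        self.maxInjections = self.parseMaxInjections(params)

    def parseMaxInjections(self, params):
        """Read and validate the new parameter, falling back to the default."""
        raw = params.get(Fields.MaxInjections.value, self.DEFAULT_MAX_INJECTIONS)
        value = int(raw)
        if value < 0:
            raise ValueError("MaxInjections must be non-negative")
        return value


print(ConfigSketch({"Seed": 1000}).maxInjections)
print(ConfigSketch({"Seed": 1000, "MaxInjections": "5"}).maxInjections)
```

Because the default is applied inside the parsing method, every later use of maxInjections can assume the attribute exists and holds a valid integer.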
Supporting new kinds of TensorFlow operations is fairly straightforward. You need to add the operation to the opTable in injectFault.py, along with a function that actually performs the operation. This function must match exactly the way TensorFlow would invoke the operation (i.e., number of arguments passed, return type, etc.). If it does not, you will run into a runtime exception.
To pass the arguments of the original TensorFlow operation to the FI function, you might need to extract the necessary attributes from the original operation (in modifyGraph.py, where the FI function is created). This is because some values (e.g., strides, padding) are stored in the operation's attributes rather than in its inputs. If they are not extracted, the FI function will be missing some of its arguments.
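One way to carry such attributes over is to bind them to the FI function when it is created. The sketch below illustrates the idea with a stand-in FakeOp class that only mimics the get_attr method of a TensorFlow operation. The names bindAttributes and injectFaultConv2DSketch are hypothetical, and this is not how modifyGraph.py is necessarily written.

```python
import functools


class FakeOp:
    """Stand-in for a TensorFlow operation; it only mimics get_attr()."""

    def __init__(self, opType, attrs):
        self.type = opType
        self._attrs = attrs

    def get_attr(self, name):
        return self._attrs[name]


def injectFaultConv2DSketch(inputs, filters, strides, padding):
    """Hypothetical FI function that needs strides and padding as arguments."""
    return ("conv2d", inputs, filters, tuple(strides), padding)


def bindAttributes(op, fiFunc, attrNames):
    """Read the named attributes off the op and fix them as keyword arguments."""
    attrs = {name: op.get_attr(name) for name in attrNames}
    return functools.partial(fiFunc, **attrs)


op = FakeOp("Conv2D", {"strides": [1, 2, 2, 1], "padding": "SAME"})
fiFunc = bindAttributes(op, injectFaultConv2DSketch, ["strides", "padding"])

# The bound function now only takes the operation's inputs.
print(fiFunc("x", "w"))
```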
By convention, please call this function injectFaultOperation, where Operation is the name of the TensorFlow operation (feel free to abbreviate it, as some of the operation names can be rather long). Make sure that it has not already been defined in the table. As for what the function needs to do, take a look at the other injectFault functions and follow their template. As a general rule, the numpy library implements most of the TensorFlow operations, so you should be able to leverage it in most cases. Alternatively, you can use the built-in TensorFlow implementation, e.g., tf.nn.conv2d, within the FI function. This works because the TensorFlow graph in the main program does not interfere with the one in the injectFault.py module. For the actual fault injection, you need to call the condPerturb function with the operation name passed as an argument. This means you also need to add the operation name to the OPS enum in fiConfig.py if it is not already there, along with a string representation of it (as described above for supporting fault injection into new operations).
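The overall shape of such a function might look like the sketch below. Everything here is a stand-in: the MAXIMUM entry, injectFaultMaximum, and the table key are hypothetical, and condPerturb is replaced by a placeholder that returns its input unchanged, since its real signature and behavior are defined in injectFault.py.

```python
from enum import Enum

import numpy as np


# Stand-in for the OPS enum in fiConfig.py; the value is the config-file name.
class OPS(Enum):
    ADD = "ADD"
    MAXIMUM = "MAXIMUM"  # hypothetical new entry


def condPerturb(op, res):
    """Placeholder for TensorFI's condPerturb; here it returns res unchanged.

    The real function decides, based on the configuration, whether to
    inject a fault into the result.
    """
    return res


def injectFaultMaximum(a, b):
    """Hypothetical FI function for an element-wise maximum of two inputs."""
    res = np.maximum(a, b)  # emulate the operation with numpy
    return condPerturb(OPS.MAXIMUM, res)  # hand the result to the injector


# Stand-in for opTable in injectFault.py: operation type -> FI function.
opTable = {"Maximum": injectFaultMaximum}

print(opTable["Maximum"](np.array([1, 5, 3]), np.array([4, 2, 3])))
```

The function takes the same two inputs the emulated operation would receive and returns a single array, which is the matching-signature requirement described above.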
We only support simple statistics gathering for the time being, namely the number of injections, incorrect outputs, differences, etc. If you want to add more sophisticated capabilities, you will have to modify the Stats enum in the fiStats.py file. Better still, you can create derived classes of FiStats and provide your own update methods for the other statistics.
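The derived-class approach could look roughly like this. The base class StatsSketch is only a stand-in for the statistics class in fiStats.py, and DiffStats, updateDiff, and the enum members shown are hypothetical names chosen for illustration.

```python
from enum import Enum


# Stand-in for the Stats enum in fiStats.py (assumed members).
class Stats(Enum):
    Injected = "Injected"
    Incorrect = "Incorrect"


class StatsSketch:
    """Illustrative stand-in for the default statistics gatherer."""

    def __init__(self, name=""):
        self.name = name
        self.counts = {stat: 0 for stat in Stats}

    def incr(self, stat):
        self.counts[stat] += 1


class DiffStats(StatsSketch):
    """Hypothetical derived class that also tracks the largest output difference."""

    def __init__(self, name=""):
        super().__init__(name)
        self.maxDiff = 0.0

    def updateDiff(self, diff):
        """Custom update method for the additional statistic."""
        self.maxDiff = max(self.maxDiff, abs(diff))


stats = DiffStats("run-1")
stats.incr(Stats.Injected)
stats.updateDiff(-0.75)
stats.updateDiff(0.25)
print(stats.counts[Stats.Injected], stats.maxDiff)
```

Deriving a class leaves the default counters untouched, so existing code that relies on them keeps working.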
You can add new fields to be logged to the LogFields enum in fiLog.py. You may also want to add corresponding functions to the FILog class and call these functions at the appropriate places to do the logging.
To change the log entry format, you can modify the getLogEntry method of the FILog class in fiLog.py. You don't need to make any changes to this method if all you're doing is adding new fields to the logEntry.
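The sketch below shows why a generic formatting method needs no changes when a field is added. It is an assumption-laden illustration: LogSketch stands in for the FILog class, and the Seed field, the updateSeed method, and the "name = value" output format are hypothetical.

```python
from enum import Enum


# Stand-in for the LogFields enum in fiLog.py (assumed members).
class LogFields(Enum):
    OpName = "OpName"
    FaultType = "FaultType"
    Seed = "Seed"  # hypothetical new field


class LogSketch:
    """Illustrative stand-in for the logging class; not TensorFI's code."""

    def __init__(self):
        self.logEntry = {}

    def updateOpName(self, name):
        self.logEntry[LogFields.OpName] = name

    def updateSeed(self, seed):
        # New function that records the new field.
        self.logEntry[LogFields.Seed] = seed

    def getLogEntry(self):
        """Format every recorded field; new fields are picked up automatically."""
        parts = ["{} = {}".format(f.value, v) for f, v in self.logEntry.items()]
        return "; ".join(parts)


log = LogSketch()
log.updateOpName("ADD")
log.updateSeed(1000)
print(log.getLogEntry())
```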
Copyright (2019) Dependable Systems Lab at UBC