SSD consists of three parts:
- Preprocessing. (Resize the image to 300x300.)
- Detection. (Find candidate boxes.)
- Non-Maximum Suppression. (Produce the final detections.)
The split model divides the graph into the parts before and after non-maximum suppression.
This split position gives excellent performance and works around the poor performance of tf.where on GPU.
He did nice work!
For realtime object detection, this is the most important part.
tensorflow/models#3270
Graph diagram before the split, ssd_mobilenet_v1_coco_2017_11_17:
Graph diagram after the split, ssd_mobilenet_v1_coco_2017_11_17:
The shape of ExpandDims_1 is ?x1917x1x4. (See the output shape.)
"?" means the input array length is not fixed.
At training time, this input array length is the mini-batch size, "24".
At prediction time, the input image is passed as an array, [[image]], so the input array length is "1".
(At prediction time, you can also predict multiple images at once.)
Because the input array length differs between training time and prediction time, the input is defined with tf.placeholder and its first dimension is "None" (meaning a non-fixed array length).
That "None" appears as "?".
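As a minimal sketch of this batch dimension (plain NumPy stand-ins, not the detector itself; only the shapes matter here, and 300x300x3 matches the SSD input size above):

```python
import numpy as np

# Hypothetical stand-in for one preprocessed input image.
image = np.zeros((300, 300, 3), dtype=np.float32)

train_batch = np.stack([image] * 24)  # training: mini-batch of 24
predict_batch = np.array([image])     # prediction: [[image]] -> batch of 1

# The first dimension is the "?" / None dimension.
print(train_batch.shape)    # (24, 300, 300, 3)
print(predict_batch.shape)  # (1, 300, 300, 3)
```

Any array length fits that first dimension, which is why the placeholder declares it as None.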
Divide here.
Write the definition of this split point in the source code lib/load_graph_nms_v1.py as follows.
""" SPLIT TARGET NAME """
SPLIT_TARGET_NAME = ['Postprocessor/convert_scores',
'Postprocessor/ExpandDims_1',
]
The shape of convert_scores is ?x1917x90. (See the output shape.)
Divide here.
This split point is already covered by the SPLIT_TARGET_NAME definition above.
Write the new inputs into the default graph with tf.placeholder. Source code: lib/load_graph_nms_v1.py
tf.reset_default_graph()
""" ADD CPU INPUT """
target_in = [tf.placeholder(tf.float32, shape=(None, split_shape, num_classes), name=SPLIT_TARGET_NAME[0]),
             tf.placeholder(tf.float32, shape=(None, split_shape, 1, 4), name=SPLIT_TARGET_NAME[1]),
             ]
First, I reset the default graph; this makes explicit that the graph is empty at this point.
The shapes come from the previous graph diagram.
Use the same name for the name argument. When this placeholder is later imported next to the original node, TensorFlow automatically appends "_1" to its name, so we will use that suffixed name.
Now the new inputs exist in the default graph, so get the graph def from there.
After getting the graph def of the new inputs, reset the default graph again. The tf.placeholder inputs were created only to obtain their graph def; they are no longer needed.
"""
Load placeholder's graph_def.
"""
target_def = []
for node in tf.get_default_graph().as_graph_def().node:
for stn in SPLIT_TARGET_NAME:
if node.name == stn:
target_def += [node]
tf.reset_default_graph()
Load the frozen graph into the graph_def variable.
graph_def = tf.GraphDef()
with tf.gfile.GFile(model_path, 'rb') as fid:
    serialized_graph = fid.read()
    graph_def.ParseFromString(serialized_graph)
For a non-split model, the loaded graph_def is simply imported into the default graph, which is then returned.
def load_frozen_graph_without_split(self):
    """
    Load frozen_graph.
    """
    model_path = self.cfg['model_path']
    tf.reset_default_graph()
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(model_path, 'rb') as fid:
        serialized_graph = fid.read()
        graph_def.ParseFromString(serialized_graph)
    tf.import_graph_def(graph_def, name='')
    """
    return
    """
    return tf.get_default_graph()
In the split model, however, processing continues.
Next comes the most important code for operating on the model.
Load the inputs of every node and record them in edges[NODE_NAME].
"""
Check the connection of all nodes.
edges[] variable has input information for all nodes.
"""
edges = {}
name_to_node_map = {}
node_seq = {}
seq = 0
for node in graph_def.node:
n = self.node_name(node.name)
if n in SPLIT_TARGET_NAME:
print(node)
name_to_node_map[n] = node
edges[n] = [self.node_name(x) for x in node.input]
if n in SPLIT_TARGET_NAME:
print(edges[n])
node_seq[n] = seq
seq += 1
The node 'Postprocessor/ExpandDims_1' has 2 inputs.
Node of Postprocessor/ExpandDims_1:
name: "Postprocessor/ExpandDims_1"
op: "ExpandDims"
input: "Postprocessor/Reshape_2"
input: "Postprocessor/ExpandDims_1/dim"
attr {
  key: "T"
  value {
    type: DT_FLOAT
  }
}
attr {
  key: "Tdim"
  value {
    type: DT_INT32
  }
}
Therefore, edges['Postprocessor/ExpandDims_1'] has 2 input node names.
Edge of Postprocessor/ExpandDims_1:
['Postprocessor/Reshape_2', 'Postprocessor/ExpandDims_1/dim']
The node 'Postprocessor/convert_scores' has 1 input.
Node of Postprocessor/convert_scores:
name: "Postprocessor/convert_scores"
op: "Sigmoid"
input: "Postprocessor/scale_logits"
attr {
  key: "T"
  value {
    type: DT_FLOAT
  }
}
Therefore, edges['Postprocessor/convert_scores'] has 1 input node name.
Edge of Postprocessor/convert_scores:
['Postprocessor/scale_logits']
As you can see, the edges[] variable has input information for all nodes.
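The loop above relies on self.node_name() to normalize node input strings before using them as edge names. A common implementation of this helper (a sketch; the exact code in lib/load_graph_nms_v1.py may differ) strips the "^" control-input prefix and the ":N" output index:

```python
def node_name(n):
    """Normalize a node input string to a plain node name."""
    # "^name" marks a control input; keep only the node name.
    if n.startswith("^"):
        return n[1:]
    # "name:1" refers to output 1 of that node; keep only the node name.
    return n.split(":")[0]

print(node_name("^Postprocessor/Slice"))            # Postprocessor/Slice
print(node_name("Postprocessor/Reshape_2:1"))       # Postprocessor/Reshape_2
print(node_name("Postprocessor/ExpandDims_1/dim"))  # Postprocessor/ExpandDims_1/dim
```

Without this normalization, control inputs and multi-output tensors would not match the keys in name_to_node_map.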
Warn if a split target is not in the graph.
Raising an error would also be fine.
"""
Alert if split target is not in the graph.
"""
dest_nodes = SPLIT_TARGET_NAME
for d in dest_nodes:
assert d in name_to_node_map, "%s is not in graph" % d
Follow all input nodes back from the split point and add them to the keep list. This is the GPU part.
"""
Making GPU part.
Follow all input nodes from the split point and add it into keep_list.
"""
nodes_to_keep = set()
next_to_visit = dest_nodes
while next_to_visit:
n = next_to_visit[0]
del next_to_visit[0]
if n in nodes_to_keep:
continue
nodes_to_keep.add(n)
next_to_visit += edges[n]
nodes_to_keep_list = sorted(list(nodes_to_keep), key=lambda n: node_seq[n])
keep = graph_pb2.GraphDef()
for n in nodes_to_keep_list:
keep.node.extend([copy.deepcopy(name_to_node_map[n])])
Making the CPU part is simple: remove the GPU part from the loaded graph and add the new inputs.
"""
Making CPU part.
It removes GPU part from loaded graph and add new inputs.
"""
nodes_to_remove = set()
for n in node_seq:
if n in nodes_to_keep_list: continue
nodes_to_remove.add(n)
nodes_to_remove_list = sorted(list(nodes_to_remove), key=lambda n: node_seq[n])
remove = graph_pb2.GraphDef()
for td in target_def:
remove.node.extend([td])
for n in nodes_to_remove_list:
remove.node.extend([copy.deepcopy(name_to_node_map[n])])
Finally, add device info, import both graph defs into the default graph, and return the default graph.
"""
Import graph_def into default graph.
"""
with tf.device('/gpu:0'):
tf.import_graph_def(keep, name='')
with tf.device('/cpu:0'):
tf.import_graph_def(remove, name='')
return tf.get_default_graph()
The input of the primary graph (GPU part) is unchanged: the image array. Its output operation names are ExpandDims_1 and convert_scores.
The inputs of the secondary graph (CPU part) become expand_in and score_in, created with tf.placeholder. Its output operation names do not change: detection_boxes, detection_scores, detection_classes and num_detections.
load_graph() could return expand_in and score_in for use as the secondary graph's input tensors, but I fetch them with graph.get_tensor_by_name() like any other operations.
source code:lib/detection_nms_v1.py
if SPLIT_MODEL:
    SPLIT_TARGET_NAME = ['Postprocessor/convert_scores',
                         'Postprocessor/ExpandDims_1',
                         ]
    split_out = []
    split_in = []
    for stn in SPLIT_TARGET_NAME:
        split_out += [graph.get_tensor_by_name(stn+':0')]
        split_in += [graph.get_tensor_by_name(stn+'_1:0')]
New Output: ExpandDims_1 and convert_scores.
New Input: ExpandDims_1_1 and convert_scores_1.
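The handoff between the two parts then looks like the following. This is a runnable sketch with a stub in place of the real tf.Session, just to show the data flow; image_tensor and the detection output names follow the text above, and the stub's dummy return values stand in for real tensors:

```python
class StubSession:
    """Stand-in for tf.Session: returns one dummy value per fetched tensor."""
    def run(self, fetches, feed_dict):
        return [0.0 for _ in fetches]

sess = StubSession()
split_out = ['Postprocessor/convert_scores:0', 'Postprocessor/ExpandDims_1:0']
split_in = ['Postprocessor/convert_scores_1:0', 'Postprocessor/ExpandDims_1_1:0']

# Stage 1 (GPU part): run the graph up to the split point.
gpu_results = sess.run(split_out, feed_dict={'image_tensor:0': None})

# Stage 2 (CPU part): feed the split results into the "_1" placeholders.
cpu_feed = dict(zip(split_in, gpu_results))
outputs = sess.run(['detection_boxes:0', 'detection_scores:0',
                    'detection_classes:0', 'num_detections:0'],
                   feed_dict=cpu_feed)
print(len(cpu_feed))  # 2
print(len(outputs))   # 4
```

Because the two stages communicate only through these fetched values, they can also run in separate sessions or threads, which is what makes the GPU/CPU pipeline possible.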
In 2018, we know ssd_mobilenet_v1 changed somewhat.
We also know ssd_mobilenet_v2 was uploaded.
Ok, let's check ssd_mobilenet_v2 first.
Looking at the graph, I can see that there are three inputs.
Graph diagram of ssd_mobilenet_v2_2018_03_29:
See the type and output shape.
ExpandDims_1 is the same as before.
And what is Slice? It appears to be convert_scores, just renamed.
And what is stack_1? This one is new!
stack_1 seems to be an array of floats, i.e., a tf.placeholder whose shape is None.
source code:lib/load_graph_nms_v2.py
""" SPLIT TARGET NAME """
SPLIT_TARGET_NAME = ['Postprocessor/Slice', # Tensor
'Postprocessor/ExpandDims_1', # Tensor
'Postprocessor/stack_1', # Float array
]
""" ADD CPU INPUT """
target_in = [tf.placeholder(tf.float32, shape=(None, split_shape, num_classes), name=SPLIT_TARGET_NAME[0]),
tf.placeholder(tf.float32, shape=(None, split_shape, 1, 4), name=SPLIT_TARGET_NAME[1]), # shape=output shape
tf.placeholder(tf.float32, shape=(None), name=SPLIT_TARGET_NAME[2]), # array of float
]
Operations.
source code:lib/detection_nms_v2.py
if SPLIT_MODEL:
    SPLIT_TARGET_NAME = ['Postprocessor/Slice',
                         'Postprocessor/ExpandDims_1',
                         'Postprocessor/stack_1',
                         ]
    split_out = []
    split_in = []
    for stn in SPLIT_TARGET_NAME:
        split_out += [graph.get_tensor_by_name(stn+':0')]
        split_in += [graph.get_tensor_by_name(stn+'_1:0')]
Of course, the arguments and return values of sess.run() use these tensors.
- ssdlite_mobilenet_v2_coco_2018_05_09
- ssd_inception_v2_coco_2018_01_28
- ssd_mobilenet_v1_coco_2018_01_28
These have the same BatchMultiClassNonMaxSuppression inputs as ssd_mobilenet_v2_coco_2018_03_29.
ssdlite_mobilenet_v2_coco_2018_05_09:
ssd_inception_v2_coco_2018_01_28:
ssd_mobilenet_v1_coco_2018_01_28: