Minor spelling changes and cleaned download_vott_json #48

Open
wants to merge 4 commits into master
14 changes: 7 additions & 7 deletions README.md
@@ -1,10 +1,10 @@
 # Active learning + object detection
-Labeling images for object detection is commonly required task to get started with Computer Vision related project.
-Good news that you do not have to label all images (draw bounding boxes) from scratch --- the goal of this project is to add (semi)automation to the process.
-Please refer to this blog post that describes Active Learning and semi-automated flow:
+Labeling images for object detection is a commonly required task to get started with Computer Vision related projects.
+The good news is that you do not have to label all images (draw bounding boxes) from scratch --- the goal of this project is to add (semi)automation to the process.
+Please refer to this blog post that describes Active Learning and the semi-automated flow:
 [Active Learning for Object Detection in Partnership with Conservation Metrics](https://www.microsoft.com/developerblog/2018/11/06/active-learning-for-object-detection/)
-We will use Transfer Learning and Active Learning as core Machine Learning components of the pipeline.
--- Transfer Learning: use powerful pre-trained on big dataset (COCO) model as a startining point for fine-tuning foe needed classes.
+We will use Transfer Learning and Active Learning as the core Machine Learning components of the pipeline.
+-- Transfer Learning: use a powerful model pre-trained on a big dataset (COCO) as a starting point for fine-tuning for the needed classes.
 -- Active Learning: human annotator labels small set of images (set1), trains Object Detection Model (model1) on this set1 and then uses model1 to predict bounding boxes on images (thus pre-labeling those). Human annotator reviews mode1's predictions where the model was less confident -- and thus comes up with new set of images -- set2. Next phase will be to train more powerful model2 on bigger train set that includes set1 and set2 and use model2 prediction results as draft of labeled set3…
 The plan is to have 2 versions of pipeline set-up.

@@ -40,12 +40,12 @@ The flow below assumes the following:
 1) We use Tensorflow Object Detection API (Faster RCNN with Resnet 50 as default option) to fine tune object detection.
 2) Tensorflow Object Detection API is setup on Linux box (Azure DSVM is an option) that you can ssh to. See docs for Tensorflow Object Detection API regarding its general config.
 3) Data(images) is in Azure blob storage
-4) Human annotators use [VOTT](https://github.com/Microsoft/VoTT) to label\revise images. To support another tagging tool it's output (boudin boxes) need to be converted to csv form -- pull requests are welcomed!
+4) Human annotators use [VOTT](https://github.com/Microsoft/VoTT) to label/revise images. To support another tagging tool, its output (bounding boxes) needs to be converted to csv format -- pull requests are welcome!
 
 Here is general flow has 2 steps:
 1) Environments setup
 2) Active Learnining cycle: labeling data and running scipts to update model and feed back results for human annotator to review.
-The whole flow is currenly automated with **4 scrips** user needs to run.
+The whole flow is currently automated with **4 scripts** that the user needs to run.
 
 
 ### General prep
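
Aside (not part of the PR): point 4 above invites converters from other tagging tools to the pipeline's csv format. Here is a minimal sketch of such a converter, assuming a made-up per-image JSON layout and function name; the column order is the one documented in the comment inside tag/download_vott_json.py further down this page:

import csv
import json

def tool_json_to_csv(json_path, csv_path, image_directory=""):
    # Hypothetical converter sketch. Target column order comes from the comment
    # in tag/download_vott_json.py:
    # filename,tag,x1,x2,y1,y2,true_height,true_width,image_directory
    with open(json_path) as fin:
        images = json.load(fin)  # assumed: [{"file", "height", "width", "boxes": [...]}, ...]
    with open(csv_path, "w", newline="") as fout:
        writer = csv.writer(fout)
        for image in images:
            for box in image["boxes"]:  # assumed: {"label", "x1", "x2", "y1", "y2"}
                writer.writerow([image["file"], box["label"],
                                 box["x1"], box["x2"], box["y1"], box["y2"],
                                 image["height"], image["width"], image_directory])
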
8 changes: 4 additions & 4 deletions config_description.md
@@ -17,13 +17,13 @@ This determines whether or not the images in blob storage are within separate folders
 - classes:
 This is a comma separated list of all classes that are being tagged. Please ensure that there are no spaces in the list and only commas are used to separate names.
 - ideal_class_balance
-This is a comma separated list of requested classes distribution in images being reviewed by human expert.
+This is a comma separated list of the requested class distribution in the images being reviewed by the human expert.
 Example (for 2-class scenario):
 `ideal_class_balance=0.6,0.3,0.1`
 In this example:
-60% of images that use will be reviewing will have at least one bbox with object class1,
-30% images that have bboxes for class (defects),
-10% of images get class "NULL" -- were neither knots nor defects were detected by the model.
+60% of images that the human expert will be reviewing will have at least one detected object of type class1 (knots),
+30% of images will have at least one detected object of type class2 (defects),
+10% of images will be of class "NULL" -- where neither knots nor defects were detected by the model.
 
 - filetype:
 This is the type of image file used. The format is a glob pattern, so *.jpg for a .jpg file or *.png for a .png file. Note that only JPEG or PNG filetypes can be used with tensorflow.
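
Aside (not part of the PR): to make the ideal_class_balance arithmetic concrete, here is a small sketch, with a made-up function name, of turning the ratio string into per-class image counts for a review batch:

def images_per_class(ideal_class_balance, batch_size):
    # "0.6,0.3,0.1" -> [0.6, 0.3, 0.1]; one ratio per class, NULL last.
    ratios = [float(r) for r in ideal_class_balance.split(",")]
    counts = [int(r * batch_size) for r in ratios]
    # Hand any rounding leftover to the first class so the batch stays full.
    counts[0] += batch_size - sum(counts)
    return counts

print(images_per_class("0.6,0.3,0.1", 200))  # [120, 60, 20] -> knots, defects, NULL
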
4 changes: 2 additions & 2 deletions init_pred_desription.md
@@ -10,7 +10,7 @@ We could use pretrained model that can detect decently few dozens or more object
 of objects are on the images. The model might not provide super-accurate results however some of those might be
 useful for more target image sampling.
 For example if you dataset has common scenes of nature or city life than using model trained on [COCO dataset](https://github.com/amikelive/coco-labels/blob/master/coco-labels-paper.txt)
-might give you an idea what images have objects that _resembles_ person, car, deer and so on. And depedning on your
+might give you an idea what images have objects that _resemble_ person, car, deer and so on. And depending on your
 scenario you might focus you initial labeling efforts on images that have or don't have a particular class.
 
 ![Flow](images/init_predict.PNG)
@@ -73,4 +73,4 @@ SSH to DSVM, activate needed Tensorflow virtul environment if needed and run:
"class mapping json" as 3rd parameter:
` D:\repo\active-learning-detect\tag>python download_vott_json.py 200 ..\config.ini ..\sample_init_classes_map.json`

![Flow](images/VOTT_animal.PNG)
![Flow](images/VOTT_animal.PNG)
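
Aside (not part of the PR): the command above passes a class mapping json so that labels predicted by the COCO-pretrained model can be folded into the project's own classes. The real format of sample_init_classes_map.json is not shown on this page, so the layout assumed below is purely illustrative:

import json

def map_coco_detections(mapping_path, detections):
    # Assumed mapping layout (hypothetical): {"dog": "animal", "cat": "animal"}
    with open(mapping_path) as f:
        class_map = json.load(f)
    # Drop detections whose COCO label has no project class; relabel the rest.
    return [(class_map[label], box) for label, box in detections if label in class_map]

# Usage sketch:
# map_coco_detections("../sample_init_classes_map.json",
#                     [("dog", (10, 60, 20, 80)), ("car", (0, 5, 0, 5))])
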
62 changes: 32 additions & 30 deletions tag/download_vott_json.py
@@ -53,6 +53,30 @@ def add_bkg_class_name(tag_names):
 def remove_bkg_class_name(tag_names):
     return remove_class_name(tag_names, "NULL")
 
+def parse_prediction(prediction):
+    x_1, x_2, y_1, y_2, height, width = map(float, prediction[TAG_STARTING_LOCATION:TAG_ENDING_LOCATION+1])
+    x_1 = int(x_1*width)
+    x_2 = int(x_2*width)
+    y_1 = int(y_1*height)
+    y_2 = int(y_2*height)
+    valid_pred = prediction[TAG_LOCATION]!="NULL" and (x_1,x_2,y_1,y_2)!=(0,0,0,0)
+    return valid_pred, x_1, x_2, y_1, y_2, height, width
+
+def get_frame(j, coordinates, tags):
+    x_1, x_2, y_1, y_2, height, width = coordinates
+    curframe = {}
+    curframe["x1"] = x_1
+    curframe["y1"] = y_1
+    curframe["x2"] = x_2
+    curframe["y2"] = y_2
+    curframe["id"] = j
+    curframe["width"] = width
+    curframe["height"] = height
+    curframe["type"] = "Rectangle"
+    curframe["tags"] = tags
+    curframe["name"] = j
+    return curframe
+
 def get_image_loc(prediction, user_folders, image_loc):
     if user_folders:
         if image_loc == "":
@@ -126,43 +150,21 @@ def make_vott_output(all_predictions, output_location_param, user_folders, image
         set_predictions = defaultdict(list)
         if max_tags_per_pixel is None:
             for prediction in predictions:
-                x_1, x_2, y_1, y_2, height, width = map(float, prediction[TAG_STARTING_LOCATION:TAG_ENDING_LOCATION+1])
-                if prediction[TAG_LOCATION]!="NULL" and (x_1,x_2,y_1,y_2)!=(0,0,0,0):
-                    x_1 = int(x_1*width)
-                    x_2 = int(x_2*width)
-                    y_1 = int(y_1*height)
-                    y_2 = int(y_2*height)
+                valid_pred, x_1, x_2, y_1, y_2, height, width = parse_prediction(prediction)
+                if valid_pred:
                     set_predictions[(x_1, x_2, y_1, y_2, height, width)].append(prediction[TAG_LOCATION])
                     file_name = prediction[FILENAME_LOCATION]
         else:
             if predictions:
                 num_tags = np.zeros((int(predictions[0][HEIGHT_LOCATION]),int(predictions[0][WIDTH_LOCATION])), dtype=int)
                 for prediction in sorted(predictions, key=lambda x: float(x[TAG_CONFIDENCE_LOCATION]), reverse=True):
-                    x_1, x_2, y_1, y_2, height, width = map(float, prediction[TAG_STARTING_LOCATION:TAG_ENDING_LOCATION+1])
-                    if prediction[TAG_LOCATION]!="NULL" and (x_1,x_2,y_1,y_2)!=(0,0,0,0):
-                        x_1 = int(x_1*width)
-                        x_2 = int(x_2*width)
-                        y_1 = int(y_1*height)
-                        y_2 = int(y_2*height)
-                        if np.amax(num_tags[y_1:y_2, x_1:x_2])<max_tags_per_pixel:
-                            num_tags[y_1:y_2, x_1:x_2]+=1
-                            set_predictions[(x_1, x_2, y_1, y_2, height, width)].append(prediction[TAG_LOCATION])
-                            file_name = prediction[FILENAME_LOCATION]
+                    valid_pred, x_1, x_2, y_1, y_2, height, width = parse_prediction(prediction)
+                    if valid_pred and np.amax(num_tags[y_1:y_2, x_1:x_2])<max_tags_per_pixel:
+                        num_tags[y_1:y_2, x_1:x_2]+=1
+                        set_predictions[(x_1, x_2, y_1, y_2, height, width)].append(prediction[TAG_LOCATION])
+                        file_name = prediction[FILENAME_LOCATION]
         for j,(coordinates, tags) in enumerate(set_predictions.items(), 1):
             # filename,tag,x1,x2,y1,y2,true_height,true_width,image_directory
-            x_1, x_2, y_1, y_2, height, width = coordinates
-            curframe = {}
-            curframe["x1"] = x_1
-            curframe["y1"] = y_1
-            curframe["x2"] = x_2
-            curframe["y2"] = y_2
-            curframe["id"] = j
-            curframe["width"] = width
-            curframe["height"] = height
-            curframe["type"] = "Rectangle"
-            curframe["tags"] = tags
-            curframe["name"] = j
-            all_frames.append(curframe)
+            all_frames.append(get_frame(j, coordinates, tags))
         dirjson["frames"][file_name] = all_frames
     dirjson["framerate"] = "1"
     dirjson["inputTags"] = ",".join(tag_names)
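
Aside (not part of the PR): to see what the refactored max_tags_per_pixel branch keeps and drops, here is a standalone toy run of the same per-pixel cap, with invented box values:

import numpy as np

# Boxes arrive highest-confidence first, as in make_vott_output's sorted() loop.
# A box is kept only while every pixel it covers is still under the cap.
max_tags_per_pixel = 1
num_tags = np.zeros((100, 100), dtype=int)  # toy 100x100 image

boxes = [(10, 50, 10, 50), (20, 40, 20, 40), (60, 90, 60, 90)]  # (x_1, x_2, y_1, y_2)
kept = []
for x_1, x_2, y_1, y_2 in boxes:
    if np.amax(num_tags[y_1:y_2, x_1:x_2]) < max_tags_per_pixel:
        num_tags[y_1:y_2, x_1:x_2] += 1
        kept.append((x_1, x_2, y_1, y_2))

print(kept)  # [(10, 50, 10, 50), (60, 90, 60, 90)]; the fully overlapped box is dropped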