Minor spelling changes and cleaned download_vott_json #48

Open
wants to merge 4 commits into master
14 changes: 7 additions & 7 deletions README.md
@@ -1,10 +1,10 @@
 # Active learning + object detection
-Labeling images for object detection is commonly required task to get started with Computer Vision related project.
-Good news that you do not have to label all images (draw bounding boxes) from scratch --- the goal of this project is to add (semi)automation to the process.
-Please refer to this blog post that describes Active Learning and semi-automated flow:
+Labeling images for object detection is a commonly required task to get started with Computer Vision related projects.
+The good news is that you do not have to label all images (draw bounding boxes) from scratch --- the goal of this project is to add (semi)automation to the process.
+Please refer to this blog post that describes Active Learning and the semi-automated flow:
 [Active Learning for Object Detection in Partnership with Conservation Metrics](https://www.microsoft.com/developerblog/2018/11/06/active-learning-for-object-detection/)
-We will use Transfer Learning and Active Learning as core Machine Learning components of the pipeline.
--- Transfer Learning: use powerful pre-trained on big dataset (COCO) model as a startining point for fine-tuning foe needed classes.
+We will use Transfer Learning and Active Learning as the core Machine Learning components of the pipeline.
+-- Transfer Learning: use a powerful model pre-trained on a big dataset (COCO) as a starting point for fine-tuning for the needed classes.
 -- Active Learning: human annotator labels small set of images (set1), trains Object Detection Model (model1) on this set1 and then uses model1 to predict bounding boxes on images (thus pre-labeling those). Human annotator reviews mode1's predictions where the model was less confident -- and thus comes up with new set of images -- set2. Next phase will be to train more powerful model2 on bigger train set that includes set1 and set2 and use model2 prediction results as draft of labeled set3…
 The plan is to have 2 versions of pipeline set-up.

@@ -40,12 +40,12 @@ The flow below assumes the following:
 1) We use Tensorflow Object Detection API (Faster RCNN with Resnet 50 as default option) to fine tune object detection.
 2) Tensorflow Object Detection API is setup on Linux box (Azure DSVM is an option) that you can ssh to. See docs for Tensorflow Object Detection API regarding its general config.
 3) Data(images) is in Azure blob storage
-4) Human annotators use [VOTT](https://github.com/Microsoft/VoTT) to label\revise images. To support another tagging tool it's output (boudin boxes) need to be converted to csv form -- pull requests are welcomed!
+4) Human annotators use [VOTT](https://github.com/Microsoft/VoTT) to label/revise images. To support another tagging tool, its output (bounding boxes) needs to be converted to csv format -- pull requests are welcome!
 
 Here is general flow has 2 steps:
 1) Environments setup
 2) Active Learnining cycle: labeling data and running scipts to update model and feed back results for human annotator to review.
-The whole flow is currenly automated with **4 scrips** user needs to run.
+The whole flow is currently automated with **4 scripts** that the user needs to run.
 
 
 ### General prep
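
Aside (not part of the PR): point 4 above invites converters from other tagging tools to the pipeline's csv format. Here is a minimal sketch of such a converter, assuming a made-up per-image JSON layout and function name; the column order is the one documented in the comment inside tag/download_vott_json.py further down this page:

import csv
import json

def tool_json_to_csv(json_path, csv_path, image_directory=""):
    # Hypothetical converter sketch. Target column order comes from the comment
    # in tag/download_vott_json.py:
    # filename,tag,x1,x2,y1,y2,true_height,true_width,image_directory
    with open(json_path) as fin:
        images = json.load(fin)  # assumed: [{"file", "height", "width", "boxes": [...]}, ...]
    with open(csv_path, "w", newline="") as fout:
        writer = csv.writer(fout)
        for image in images:
            for box in image["boxes"]:  # assumed: {"label", "x1", "x2", "y1", "y2"}
                writer.writerow([image["file"], box["label"],
                                 box["x1"], box["x2"], box["y1"], box["y2"],
                                 image["height"], image["width"], image_directory])
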
8 changes: 4 additions & 4 deletions config_description.md
@@ -17,13 +17,13 @@ This determines whether or not the images in blob storage are within separate folders
 - classes:
 This is a comma separated list of all classes that are being tagged. Please ensure that there are no spaces in the list and only commas are used to separate names.
 - ideal_class_balance
-This is a comma separated list of requested classes distribution in images being reviewed by human expert.
+This is a comma separated list of the requested class distribution in the images being reviewed by the human expert.
 Example (for 2-class scenario):
 `ideal_class_balance=0.6,0.3,0.1`
 In this example:
-60% of images that use will be reviewing will have at least one bbox with object class1,
-30% images that have bboxes for class (defects),
-10% of images get class "NULL" -- were neither knots nor defects were detected by the model.
+60% of images that the human expert will be reviewing will have at least one detected object of type class1 (knots),
+30% of images will have at least one detected object of type class2 (defects),
+10% of images will be of class "NULL" -- where neither knots nor defects were detected by the model.
 
 - filetype:
 This is the type of image file used. The format is a glob pattern, so *.jpg for a .jpg file or *.png for a .png file. Note that only JPEG or PNG filetypes can be used with tensorflow.
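
Aside (not part of the PR): to make the ideal_class_balance arithmetic concrete, here is a small sketch, with a made-up function name, of turning the ratio string into per-class image counts for a review batch:

def images_per_class(ideal_class_balance, batch_size):
    # "0.6,0.3,0.1" -> [0.6, 0.3, 0.1]; one ratio per class, NULL last.
    ratios = [float(r) for r in ideal_class_balance.split(",")]
    counts = [int(r * batch_size) for r in ratios]
    # Hand any rounding leftover to the first class so the batch stays full.
    counts[0] += batch_size - sum(counts)
    return counts

print(images_per_class("0.6,0.3,0.1", 200))  # [120, 60, 20] -> knots, defects, NULL
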
4 changes: 2 additions & 2 deletions init_pred_desription.md
@@ -10,7 +10,7 @@ We could use pretrained model that can detect decently few dozens or more object
 of objects are on the images. The model might not provide super-accurate results however some of those might be
 useful for more target image sampling.
 For example if you dataset has common scenes of nature or city life than using model trained on [COCO dataset](https://github.com/amikelive/coco-labels/blob/master/coco-labels-paper.txt)
-might give you an idea what images have objects that _resembles_ person, car, deer and so on. And depedning on your
+might give you an idea what images have objects that _resemble_ person, car, deer and so on. And depending on your
 scenario you might focus you initial labeling efforts on images that have or don't have a particular class.
 
 ![Flow](images/init_predict.PNG)
@@ -73,4 +73,4 @@ SSH to DSVM, activate needed Tensorflow virtul environment if needed and run:
"class mapping json" as 3rd parameter:
` D:\repo\active-learning-detect\tag>python download_vott_json.py 200 ..\config.ini ..\sample_init_classes_map.json`

![Flow](images/VOTT_animal.PNG)
![Flow](images/VOTT_animal.PNG)
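
Aside (not part of the PR): the command above passes a class mapping json so that labels predicted by the COCO-pretrained model can be folded into the project's own classes. The real format of sample_init_classes_map.json is not shown on this page, so the layout assumed below is purely illustrative:

import json

def map_coco_detections(mapping_path, detections):
    # Assumed mapping layout (hypothetical): {"dog": "animal", "cat": "animal"}
    with open(mapping_path) as f:
        class_map = json.load(f)
    # Drop detections whose COCO label has no project class; relabel the rest.
    return [(class_map[label], box) for label, box in detections if label in class_map]

# Usage sketch:
# map_coco_detections("../sample_init_classes_map.json",
#                     [("dog", (10, 60, 20, 80)), ("car", (0, 5, 0, 5))])
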
62 changes: 32 additions & 30 deletions tag/download_vott_json.py
@@ -53,6 +53,30 @@ def add_bkg_class_name(tag_names):
 def remove_bkg_class_name(tag_names):
     return remove_class_name(tag_names, "NULL")
 
+def parse_prediction(prediction):
+    x_1, x_2, y_1, y_2, height, width = map(float, prediction[TAG_STARTING_LOCATION:TAG_ENDING_LOCATION+1])
+    x_1 = int(x_1*width)
+    x_2 = int(x_2*width)
+    y_1 = int(y_1*height)
+    y_2 = int(y_2*height)
+    valid_pred = prediction[TAG_LOCATION]!="NULL" and (x_1,x_2,y_1,y_2)!=(0,0,0,0)
+    return valid_pred, x_1, x_2, y_1, y_2, height, width
+
+def get_frame(j, coordinates, tags):
+    x_1, x_2, y_1, y_2, height, width = coordinates
+    curframe = {}
+    curframe["x1"] = x_1
+    curframe["y1"] = y_1
+    curframe["x2"] = x_2
+    curframe["y2"] = y_2
+    curframe["id"] = j
+    curframe["width"] = width
+    curframe["height"] = height
+    curframe["type"] = "Rectangle"
+    curframe["tags"] = tags
+    curframe["name"] = j
+    return curframe
+
 def get_image_loc(prediction, user_folders, image_loc):
     if user_folders:
         if image_loc == "":
@@ -126,43 +150,21 @@ def make_vott_output(all_predictions, output_location_param, user_folders, image
         set_predictions = defaultdict(list)
         if max_tags_per_pixel is None:
             for prediction in predictions:
-                x_1, x_2, y_1, y_2, height, width = map(float, prediction[TAG_STARTING_LOCATION:TAG_ENDING_LOCATION+1])
-                if prediction[TAG_LOCATION]!="NULL" and (x_1,x_2,y_1,y_2)!=(0,0,0,0):
-                    x_1 = int(x_1*width)
-                    x_2 = int(x_2*width)
-                    y_1 = int(y_1*height)
-                    y_2 = int(y_2*height)
+                valid_pred, x_1, x_2, y_1, y_2, height, width = parse_prediction(prediction)
+                if valid_pred:
                     set_predictions[(x_1, x_2, y_1, y_2, height, width)].append(prediction[TAG_LOCATION])
                     file_name = prediction[FILENAME_LOCATION]
         else:
             if predictions:
                 num_tags = np.zeros((int(predictions[0][HEIGHT_LOCATION]),int(predictions[0][WIDTH_LOCATION])), dtype=int)
                 for prediction in sorted(predictions, key=lambda x: float(x[TAG_CONFIDENCE_LOCATION]), reverse=True):
-                    x_1, x_2, y_1, y_2, height, width = map(float, prediction[TAG_STARTING_LOCATION:TAG_ENDING_LOCATION+1])
-                    if prediction[TAG_LOCATION]!="NULL" and (x_1,x_2,y_1,y_2)!=(0,0,0,0):
-                        x_1 = int(x_1*width)
-                        x_2 = int(x_2*width)
-                        y_1 = int(y_1*height)
-                        y_2 = int(y_2*height)
-                        if np.amax(num_tags[y_1:y_2, x_1:x_2])<max_tags_per_pixel:
-                            num_tags[y_1:y_2, x_1:x_2]+=1
-                            set_predictions[(x_1, x_2, y_1, y_2, height, width)].append(prediction[TAG_LOCATION])
-                            file_name = prediction[FILENAME_LOCATION]
+                    valid_pred, x_1, x_2, y_1, y_2, height, width = parse_prediction(prediction)
+                    if valid_pred and np.amax(num_tags[y_1:y_2, x_1:x_2])<max_tags_per_pixel:
+                        num_tags[y_1:y_2, x_1:x_2]+=1
+                        set_predictions[(x_1, x_2, y_1, y_2, height, width)].append(prediction[TAG_LOCATION])
+                        file_name = prediction[FILENAME_LOCATION]
         for j,(coordinates, tags) in enumerate(set_predictions.items(), 1):
             # filename,tag,x1,x2,y1,y2,true_height,true_width,image_directory
-            x_1, x_2, y_1, y_2, height, width = coordinates
-            curframe = {}
-            curframe["x1"] = x_1
-            curframe["y1"] = y_1
-            curframe["x2"] = x_2
-            curframe["y2"] = y_2
-            curframe["id"] = j
-            curframe["width"] = width
-            curframe["height"] = height
-            curframe["type"] = "Rectangle"
-            curframe["tags"] = tags
-            curframe["name"] = j
-            all_frames.append(curframe)
+            all_frames.append(get_frame(j, coordinates, tags))
         dirjson["frames"][file_name] = all_frames
     dirjson["framerate"] = "1"
     dirjson["inputTags"] = ",".join(tag_names)
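
Aside (not part of the PR): to see what the refactored max_tags_per_pixel branch keeps and drops, here is a standalone toy run of the same per-pixel cap, with invented box values:

import numpy as np

# Boxes arrive highest-confidence first, as in make_vott_output's sorted() loop.
# A box is kept only while every pixel it covers is still under the cap.
max_tags_per_pixel = 1
num_tags = np.zeros((100, 100), dtype=int)  # toy 100x100 image

boxes = [(10, 50, 10, 50), (20, 40, 20, 40), (60, 90, 60, 90)]  # (x_1, x_2, y_1, y_2)
kept = []
for x_1, x_2, y_1, y_2 in boxes:
    if np.amax(num_tags[y_1:y_2, x_1:x_2]) < max_tags_per_pixel:
        num_tags[y_1:y_2, x_1:x_2] += 1
        kept.append((x_1, x_2, y_1, y_2))

print(kept)  # [(10, 50, 10, 50), (60, 90, 60, 90)]; the fully overlapped box is dropped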