Object Detection Module With Python (Not for Newbies)
The backend of the Object Detection module is heavy on mathematical operations, numerical computation, and data manipulation, so having libraries suited to those tasks is essential. The Anaconda distribution of Python ships these libraries out of the box, so we installed Python 3.7 together with the data science libraries bundled with the Anaconda distribution. The alternative is to use the Python package manager pip and install each required library one by one. In short, Anaconda costs disk space while pip costs time.
The final code runs in a Linux environment, hosted on GCP[1] as an Ubuntu VM instance. Some of the most popular and widely used object detection models (algorithms) are SSD[2], Fast RCNN[3], Faster RCNN[4], and YOLO[5]. In this project we used SSD; the reasons for that choice are explained later in this article.
# Protobuf is used here, but why?
JSON and Protobuf are both commonly used for transferring data between services or systems. JSON stands for JavaScript Object Notation, while Protobuf stands for Protocol Buffers. JSON is better known than Protobuf because it is human-readable, self-contained, and extensible. It is not ideal everywhere, though: in heavy software like TensorFlow, where serialization and deserialization happen at high volume, the cost of JSON becomes non-negligible. Protobuf was designed by Google and is the format used in this system.
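As a toy illustration (not taken from the project's code), the same record can be serialized both ways. Real systems such as TensorFlow define dedicated `.proto` schemas with numbered fields, which is where Protobuf's size and parsing-speed advantage comes from; the generic `Struct` message below is used only to keep the sketch self-contained.

```python
# Sketch: serialize the same payload with JSON and with Protobuf.
# Struct is a generic catch-all message from the protobuf package, used here
# only for illustration; real systems define dedicated .proto message types.
import json
from google.protobuf.struct_pb2 import Struct

payload = {"label": "person", "score": 0.87, "box": [0.1, 0.2, 0.5, 0.9]}

json_bytes = json.dumps(payload).encode("utf-8")  # human-readable text
message = Struct()
message.update(payload)                           # copy the dict into the message
pb_bytes = message.SerializeToString()            # binary wire-format encoding

print(len(json_bytes), "bytes as JSON")
print(len(pb_bytes), "bytes as Protobuf")
```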
# Protobuf Compilation:
The TensorFlow Object Detection API uses Protobufs to configure model and training parameters. Before the framework can be used, the Protobuf libraries must be compiled.
P.S.: This compilation turns each .proto file into a Python file.
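In practice this means running `protoc` over every `.proto` file under `object_detection/protos` from the `models/research` directory. A minimal sketch of that step in Python, assuming `protoc` is installed and on the PATH:

```python
# Compile the TF Object Detection API .proto files into Python modules.
# Assumes the script is run from the models/research directory and that the
# protoc binary is installed and available on the PATH.
import glob
import subprocess

proto_files = glob.glob("object_detection/protos/*.proto")
subprocess.run(["protoc", *proto_files, "--python_out=."], check=True)
```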
# Tensorflow detection model zoo
For object detection there are a number of ready-made models. The TF detection model zoo[6] provides a collection of detection models trained on the COCO dataset[7], the Kitti dataset, and the Open Images dataset. Collectively these models are trained on over 330k images, of which more than 200k are labeled. Training on such varied data let us use them in our system and cut out the overhead of training our own models from scratch.
These models are very useful for out-of-the-box detection. Categorizing detected objects is also possible, since the categories already exist in these datasets. Inside the data directory, the “mscoco_label_map.pbtxt” file holds the label definitions used to categorize the objects; it contains around 90 different labels.
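A short sketch of how those labels are typically consumed, assuming the TF Object Detection API's `label_map_util` helpers and the standard path inside the models repository:

```python
# Load the COCO label map and build an id -> category lookup.
# Assumes the TensorFlow Object Detection API is installed and the
# mscoco_label_map.pbtxt path matches your checkout of the models repo.
from object_detection.utils import label_map_util

PATH_TO_LABELS = "object_detection/data/mscoco_label_map.pbtxt"
NUM_CLASSES = 90

label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(
    label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)

print(category_index[1])  # e.g. {'id': 1, 'name': 'person'}
```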
# Tensorflow object detection API
The TensorFlow Object Detection API provides a number of models trained on the COCO dataset, built on architectures such as SSD, Faster RCNN, and Mask RCNN with Inception-based feature extractors.
So, how should we actually choose the object detection model? Well, it depends primarily on the following factors:
- The system the model has to run on (hardware support and software specifications)
- The trade-off between speed and accuracy: models with higher mAP scores give more accurate predictions but are generally slower, so the choice depends on which matters more for the application (a small illustration follows this list).
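Here is a small, purely illustrative sketch of that decision: pick the most accurate candidate that still fits the latency budget of the target hardware. The model names are real zoo entries, but the latency and mAP numbers are placeholders; use the actual figures from the model zoo table.

```python
# Illustrative only: choose the most accurate model that fits a latency budget.
# The (latency_ms, coco_mAP) values are placeholders, NOT the official model
# zoo numbers; substitute the figures from the detection model zoo table.
CANDIDATES = {
    "ssd_mobilenet_v1_coco":         (30, 21),   # fast, lighter accuracy
    "faster_rcnn_inception_v2_coco": (60, 28),   # slower, more accurate
    "faster_rcnn_resnet101_coco":    (110, 32),  # slowest, most accurate
}

def pick_model(latency_budget_ms):
    viable = {name: (ms, m) for name, (ms, m) in CANDIDATES.items()
              if ms <= latency_budget_ms}
    if not viable:
        raise ValueError("No model fits the latency budget")
    return max(viable, key=lambda name: viable[name][1])  # highest mAP wins

print(pick_model(50))  # -> 'ssd_mobilenet_v1_coco' on these placeholder numbers
```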
# Why SSD?
The primary reason for choosing SSD (Single Shot MultiBox Detector) is its accuracy. Technostacks[8] makes the same point: SSD is the better recommendation when accuracy matters, but if accuracy is not a major concern and raw speed is, YOLO is the way to go, and it is the speed-versus-accuracy trade-off that really separates the two. The YOLO referred to there is most likely YOLO v1.
# Where is SSD?
```python
use_model = 'ssd_mobilenet_v1_coco_2018_01_28'
path_to_checkpoint = use_model + '/frozen_inference_graph.pb'
```
The above snippet is from the model.py file. The use_model variable names the SSD model, downloaded from the TF Object Detection model zoo. Of the files shipped with ssd_mobilenet_v1_coco_2018_01_28, only “frozen_inference_graph.pb” is used here, which matters because TensorFlow works on a graph principle: TensorFlow is built around the concept of a data flow graph, whose nodes represent operations and whose edges are tensors (in TensorFlow terms, a tensor is just a multidimensional array). Each data flow graph computation runs within a session on a CPU or GPU. The frozen detection graph is the actual model used for object detection.
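A sketch of how such a frozen graph is typically loaded and attached to a session (the standard TensorFlow 1.x pattern; on TensorFlow 2.x the same calls live under `tf.compat.v1`):

```python
# Load the frozen SSD graph and bind it to a session (TensorFlow 1.x style).
import tensorflow as tf

use_model = 'ssd_mobilenet_v1_coco_2018_01_28'
path_to_checkpoint = use_model + '/frozen_inference_graph.pb'

detection_graph = tf.Graph()
with detection_graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(path_to_checkpoint, 'rb') as fid:
        graph_def.ParseFromString(fid.read())    # deserialize the protobuf graph
        tf.import_graph_def(graph_def, name='')  # register its ops and tensors

# Every computation then runs inside a session bound to this graph.
with tf.Session(graph=detection_graph) as sess:
    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')  # input placeholder
```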
# Why is Tensorflow used after all?
TensorFlow seems to have a faster compile time than other leading libraries such as Torch and Theano, and its computational graphs can be distributed across a cluster for computation. Theano and Torch are primarily deep learning R&D frameworks, whereas TensorFlow is both an R&D and a deployment framework. This makes TensorFlow the more practical choice for an object detection system.
# mAP Calculation and Process Time
Looking at the stats, the system appears reliable with an mAP of 0.8; however, in one out of five cases it was completely blank and failed to detect anything.
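For reference, a minimal sketch of how average precision can be computed from ranked detections. This is the simple non-interpolated form; COCO's mAP additionally averages over IoU thresholds and classes, and the example flags below are made up, not the project's measurements.

```python
# Average precision from detections sorted by confidence (highest first).
# Each entry is True for a true positive, False for a false positive.
def average_precision(tp_flags, num_ground_truth):
    tp = fp = 0
    precisions = []
    for is_tp in tp_flags:
        if is_tp:
            tp += 1
            precisions.append(tp / (tp + fp))  # precision at each recall step
        else:
            fp += 1
    return sum(precisions) / num_ground_truth if num_ground_truth else 0.0

# Five detections, four correct, against five ground-truth objects (illustrative).
print(average_precision([True, True, False, True, True], num_ground_truth=5))
```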
Below is an example of a detected image with a true-positive value of 1.
# Server Side
For server-side programming, Python's Flask library is used; it is a minimalistic framework that provides just enough structure (an MVC-like pattern) for this system. The server is hosted on the Google Cloud Platform.
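A minimal sketch of what such a Flask service could look like; the route, port, and `run_detection` helper below are illustrative placeholders rather than the project's actual code:

```python
# Minimal Flask service exposing the detector over HTTP (illustrative sketch).
from flask import Flask, request, jsonify

app = Flask(__name__)

def run_detection(image_bytes):
    # Placeholder: the real handler would decode the image, feed it through the
    # frozen SSD graph inside a TensorFlow session, and return boxes/labels/scores.
    return {"detections": []}

@app.route("/detect", methods=["POST"])
def detect():
    image_bytes = request.files["image"].read()  # multipart/form-data upload
    return jsonify(run_detection(image_bytes))

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```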
# References
[1] GCP: Google Cloud Platform
[2] SSD: Single Shot MultiBox Detector, a unified framework for object detection with a single network
[3] Fast RCNN: Fast Region-based Convolutional Neural Network
[4] Faster RCNN: Faster Region-based Convolutional Neural Network
[5] YOLO: You Only Look Once
[6] TF detection model zoo: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md
[7] COCO dataset: http://cocodataset.org/
[8] https://technostacks.com/blog/yolo-vs-ssd/