Technical Details
The car make and model classifiers that we offer are standalone neural network models shipped as binary files. The classification models are delivered in the following formats: TensorFlow protobuf, TensorFlow saved_model, ONNX, MNN, TFLite, and OpenVINO. No object detector is included; developers can use any object detector of their preference, such as YOLO or SSD, to find the cars in each frame. The detected cars must be cropped and resized to 224x224 pixels, which is the input image size of the classifier.

The car classifier is based on the MobileNetV3 neural network architecture. It is very fast and runs in real time on the CPU of a regular PC: one car image classification takes 35 milliseconds on an Intel Core i5 CPU. For faster inference, an NVIDIA GPU is recommended. The speedup of GPU over CPU depends on the type of graphics card, the type of CPU, and the batch size (the number of simultaneously processed images). With a multi-core CPU, very high recognition speed is possible even without a GPU. Since object detection is the most computationally intensive task, one practical option is to run the detector on the GPU and the classification on the CPU.

There are many ways to integrate the car classifier into your software. Suitable runtime libraries include TensorFlow (as a standalone library or a model server), Microsoft ONNX Runtime, NVIDIA TensorRT, the Alibaba MNN lightweight deep learning framework, the TFLite library, and the Intel OpenVINO toolkit. The classifier can be run from C++ or Python; a minimal Python sketch of the crop-and-classify pipeline is shown below.

Another option is TensorFlow Serving, a high-performance serving system for machine learning models designed for production environments. It exposes a RESTful API (on port 8501) and a gRPC interface (on port 8500); a sketch of a REST request follows the pipeline example below. The model server can be packaged in a Docker container and hosted on cloud or on-premises servers.
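As a concrete illustration of the detect-crop-classify pipeline described above, here is a minimal Python sketch using Microsoft ONNX Runtime. It is a sketch under stated assumptions, not the vendor's reference code: the file name car_classifier.onnx, the [0, 1] input scaling, and the NHWC layout are placeholders to verify against the documentation delivered with the models.

```python
# Minimal sketch: classify one detected car crop with ONNX Runtime.
# "car_classifier.onnx" is a placeholder file name; the [0, 1] input
# scaling and NHWC layout are assumptions to check against the model docs.
import numpy as np
import onnxruntime as ort
from PIL import Image

session = ort.InferenceSession("car_classifier.onnx")
input_name = session.get_inputs()[0].name

def classify_crop(crop: Image.Image) -> int:
    # The classifier expects a 224x224 image of the cropped car.
    img = crop.convert("RGB").resize((224, 224))
    x = np.asarray(img, dtype=np.float32) / 255.0  # assumed normalization
    x = np.expand_dims(x, axis=0)                  # batch of one, NHWC
    scores = session.run(None, {input_name: x})[0]
    return int(np.argmax(scores))                  # index into the label list

# Usage: crop each box returned by your detector (YOLO, SSD, ...) and classify:
#   label_id = classify_crop(frame.crop((x1, y1, x2, y2)))
```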
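For deployments behind TensorFlow Serving, the RESTful API on port 8501 can be queried as in the sketch below. The model name car_classifier is hypothetical, and the exact request shape must match the signature of the served model.

```python
# Minimal sketch: query the classifier through TensorFlow Serving's REST API
# on port 8501. The model name "car_classifier" is a placeholder.
import json
import requests

def classify_via_rest(image_batch):
    # image_batch: nested Python lists with shape [N, 224, 224, 3]
    payload = json.dumps({"instances": image_batch})
    resp = requests.post(
        "http://localhost:8501/v1/models/car_classifier:predict",
        data=payload,
    )
    resp.raise_for_status()
    return resp.json()["predictions"]
```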
The choice of inference engine is important for optimal results. On Intel CPUs, the best performance is achieved with the OpenVINO runtime library; for ARM processors, TFLite or MNN is more suitable; and for inference on an NVIDIA GPU, the optimized NVIDIA TensorRT library gives the best performance. Besides the hardware platform, many other factors must be considered: the mode of operation (optimized for throughput or for latency), the batch size, the architecture of the software, and the image processing pipeline. Quantizing the models is a common way to boost performance at the expense of a slight decrease in accuracy; a quantization sketch is shown below. Other factors include whether the model runs on an edge device or on a server, the need to scale the software in the cloud, and so on. Using a model server such as TensorFlow Serving may be the best option for some use cases.
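As one example of quantization, TensorFlow's TFLite converter supports post-training dynamic-range quantization of a saved_model. This is a minimal sketch, assuming the model is delivered in saved_model format; the directory path and output file name are placeholders.

```python
# Minimal sketch: post-training quantization with the TFLite converter.
# "saved_model_dir" is a placeholder for the delivered saved_model directory.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # dynamic-range quantization
tflite_model = converter.convert()

with open("car_classifier_quant.tflite", "wb") as f:
    f.write(tflite_model)
```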
Specifications:
Recommended open-source object detectors with state-of-the-art accuracy:
Business Applications
Intelligent Video Analytics
Public safety and security organizations can integrate advanced search and car analytics functionality into their software to find or redact relevant information in video records.
Traffic Analytics
Cities are getting smarter: using the Big Data supplied by traffic cameras, transportation systems can be managed more efficiently.
Digital Asset Management
Organize, store, and retrieve multimedia content such as photos and videos, and build searchable car image databases for video and image archives.