Training convolutional neural networks for classification or regression requires a supervisory signal: ground truth. It is therefore essential to be able to generate ground truth at scale for localization and detection tasks.
There are two crucial steps to achieving this.
1. Large scale data augmentation.
We developed a pipeline that automates the construction of ground truth and its superimposition onto diverse background images, with image augmentation, to facilitate training of generalizable computer vision neural networks for detection- and regression-type tasks. It can augment 2D ground truth and superimpose it onto backgrounds to generate a massive number of training samples for computer vision projects, and it is particularly applicable to customized 2D planar object detection.
In addition, the pipeline can be extended to 3D object data: assuming a 3D object model exists, 2D views can be exported and superimposed to train neural networks to properly detect different views of the 3D object.
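The augment-and-superimpose step above can be sketched as follows. This is a minimal illustration, not our production pipeline: it assumes a rectangular foreground crop, uses only random flips, brightness jitter, and random placement as the augmentations, and treats the paste location itself as the ground-truth bounding box. The function name `superimpose` and all shapes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def superimpose(foreground, background):
    """Paste an augmented foreground onto a background and return
    the composite image plus the ground-truth bounding box."""
    fg = foreground.copy()
    # Random horizontal flip.
    if rng.random() < 0.5:
        fg = fg[:, ::-1]
    # Random brightness jitter.
    fg = np.clip(fg * rng.uniform(0.8, 1.2), 0, 255).astype(np.uint8)
    fh, fw = fg.shape[:2]
    bh, bw = background.shape[:2]
    # Random placement; the paste location IS the label.
    y = int(rng.integers(0, bh - fh + 1))
    x = int(rng.integers(0, bw - fw + 1))
    composite = background.copy()
    composite[y:y + fh, x:x + fw] = fg
    bbox = (x, y, x + fw, y + fh)  # (x_min, y_min, x_max, y_max)
    return composite, bbox

# Generate a batch of labelled samples from one object crop.
obj = rng.integers(0, 255, (32, 32, 3), dtype=np.uint8)
dataset = []
for _ in range(100):
    bg = rng.integers(0, 255, (128, 128, 3), dtype=np.uint8)
    dataset.append(superimpose(obj, bg))
```

Because the label falls out of the compositing step for free, one object crop and a pool of background images yield arbitrarily many labelled detection samples.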
If you are interested in applying this pipeline to your project, or in discussing its other potential applications, feel free to contact us.
2. Heuristic ground truth generation.
Supervised learning is one of the fastest ways to get good computer vision results. However, getting there requires a massive amount of labelled data. Once a large scale data augmentation process is in place, it is imperative that an equally large amount of ground truth can be generated (even if imperfectly, as properly designed neural networks are capable of generalizing beyond flaws in the labels).
For this reason, traditional machine learning can be applied here, and it is a necessary stepping stone BEFORE it can be replaced with a deep learning computer vision neural network.
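As one hedged sketch of what such a traditional, hand-crafted labeller might look like (not the specific heuristics we use): a simple intensity threshold plus a bounding box over the surviving pixels can label images where the target is known to be brighter than the background. The function name `heuristic_bbox` and the threshold value are assumptions for illustration.

```python
import numpy as np

def heuristic_bbox(image, threshold=200):
    """Label an image by classical thresholding: assume the target is
    brighter than the background and box the bright pixels."""
    gray = image.mean(axis=2)           # crude grayscale conversion
    ys, xs = np.nonzero(gray > threshold)
    if ys.size == 0:
        return None                     # heuristic failed; drop this sample
    # (x_min, y_min, x_max, y_max), exclusive on the max side.
    return (int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1)

# A dark background with one bright rectangle: the heuristic recovers its box.
img = np.zeros((64, 64, 3), dtype=np.uint8)
img[10:30, 20:44] = 255
print(heuristic_bbox(img))  # → (20, 10, 44, 30)
```

Labels produced this way are imperfect, but as noted above, a properly designed network trained on enough of them can generalize past the heuristic's failure cases.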
The videos below showcase our current prototype object tracking and detection in action.
Rotation Inference App1:
For more information on the overall project, you might want to check out here. In short, these hand-crafted computer vision components are used to generate a labelled positional and rotational dataset, which provides the necessary input for the PoseTracker.