PoseTracker was a simple proof-of-concept project to leverage the power of convolutional neural networks for effective 3D rigid-body pose detection. The goal of this experiment was to provide a seamless, highly automated way to generate labelled data sets for pose detection. This is quite challenging and requires overcoming two major limitations:

  • the lack of prior information about the object of interest, which forces the neural network to both detect the object and deduce its pose.
  • even assuming the previous step succeeds, validation requires ground truth: the pose must be accurately measured by external means and synchronized with the captured 2D images so that the network's performance can be assessed.

Convolutional neural networks have made significant strides in recent years in object recognition, classification, and segmentation, driving major advances in self-driving vehicles and a great variety of computer vision applications.

However, there have been very few practical implementation frameworks that apply these advances to 3D object pose estimation, and in particular few practical workflows for training a customized network for object-specific pose detection. Recognizing and tracking an object in 3D reference space remains a difficult problem.

In particular, the following issues need to be addressed effectively:

  1. accurate quantitative 3D pose information is itself hard to capture, typically requiring complicated setups involving stereo optical or magnetic localization apparatus.
  2. the lack of prior information about the object of interest requires the neural network to simultaneously detect the object and deduce its pose.
  3. labelled datasets with proper pose information are very hard to obtain in large quantities, because image–pose pairs cannot be effectively augmented in massive quantity by simple image manipulation while preserving the 3D pose labels; traditional manipulations such as single- or multi-axis scaling and other transformations inevitably corrupt the 3D pose information.
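To illustrate point 3: only a narrow class of augmentations can be compensated in the pose label at all. The sketch below (plain NumPy; the function names are ours for illustration, not part of PoseTracker) shows how an in-plane image rotation can, approximately for an object near the optical axis, be folded into the rotation label as a roll about the camera's z axis, while scalings, flips, and shears admit no such correction and simply corrupt the label:

```python
import numpy as np

def rot_z(theta_deg):
    """Rotation matrix for a roll about the camera's optical (z) axis."""
    t = np.deg2rad(theta_deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def rotate_pose_label(R_obj, theta_deg):
    """Update a 3x3 rotation label when the training image is rotated
    in-plane by theta_deg. This is only approximately valid for an object
    near the optical axis; scaling, flips, and shears have no analogous
    label correction, which is why naive augmentation breaks 3D pose data.
    """
    return rot_z(theta_deg) @ R_obj
```

Rotating the image back by the same angle recovers the original label, so this particular augmentation is invertible in label space; nothing comparable exists for the other standard 2D manipulations.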

We demonstrate a proof of concept for a simplified, general-purpose object pose detection pipeline, integrating rotation information from our proprietary 3D pose tracking solution. With this information, we apply supervised training to fit a small neural network that attempts to regress the pose of the 3D object in a 2D image relative to a reference. We assume: 1) there is a reference object orientation against which ALL subsequent images are compared; 2) the reference position is fixed relative to the camera orientation/position, and our interest is the rotation of the new image relative to that reference.
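Under these two assumptions, the regression target is the relative rotation that carries the reference orientation into the observed one, and the geodesic angle between rotations is a natural error metric for the network's output. A minimal sketch in NumPy (names are illustrative, not the project's API; `rot_z` here only fabricates example orientations):

```python
import numpy as np

def rot_z(theta_deg):
    """Rotation about the camera's optical (z) axis, used here only to
    fabricate example orientations."""
    t = np.deg2rad(theta_deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def relative_rotation(R_ref, R_new):
    """Rotation carrying the reference orientation into the new one in
    the fixed camera frame, i.e. R_new = R_rel @ R_ref."""
    return R_new @ R_ref.T

def geodesic_angle_deg(Ra, Rb):
    """Angular distance between two rotations; a common error metric
    (and loss candidate) for rotation regression."""
    c = (np.trace(Ra.T @ Rb) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))
```

Because the reference frame is fixed to the camera, each captured image pairs with a single relative-rotation label, which is exactly the quantity the small network is trained to regress.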