Stream-based ORB Feature Extractor with Dynamic Power Optimization

Tran Phong1, Thinh Hung Pham1, Siew-Kei Lam1, Meiqing Wu1, and Bhavan A. Jasani2

1Nanyang Technological University, Singapore


2Carnegie Mellon University, United States

Figure 1. Feature Extraction Architecture using ORB algorithm.
Abstract
The Oriented FAST and Rotated BRIEF (ORB) feature extractor, which consists of key-point detection and descriptor computation, is a key module in many computer vision systems. Existing hardware implementations of the ORB feature extractor focus only on increasing performance and treat power optimization as an afterthought. In this paper, we present a stream-based ORB feature extractor that incorporates mechanisms to lower dynamic power consumption. These mechanisms exploit the fact that the number of detected key-points is typically small. The proposed solution significantly lowers the switching activity of the key-point detection and descriptor computation stages by pruning unlikely key-points early and gating the descriptor computation stages. Further power reduction and resource minimization are achieved by employing a threshold-guided bit-width optimization strategy to truncate the redundant bits in the key-point detection stage. Finally, we propose an approximation method to achieve rotation invariance of the descriptors. An FPGA implementation targeting the Altera Arria V device shows that the proposed strategies lead to over 25% reduction in dynamic power and lower resource utilization, with only a marginal loss in accuracy.
Introduction
We want to implement a power-optimized and stream-based ORB feature extractor to achieve real-time performance in computer vision systems. Yet, there are two challenges:
  1. The key-point detection algorithm is computationally intensive because of the numerous addition and multiplication operators needed to calculate the corner response.
  2. Stream processing typically uses a large number of row buffers to cache the input pixel stream, which consumes significant dynamic power and hardware resources.
We thus introduce four main contributions to address the aforementioned challenges.
Accuracy Evaluation
First, we measure the accuracy of the key-point detector using the repeatability criterion, which measures robustness against a variety of changes in image conditions, i.e. rotation, scaling, and illumination. Here we report the difference in repeatability between Prop2-KD and Prop1-KD, which is computed as in Equation (1).
$$ \Delta r = \frac{|\text{repeat}_{\text{Prop2-KD}} - \text{repeat}_{\text{Prop1-KD}}|}{\text{repeat}_{\text{Prop1-KD}}} \times 100 \% \tag{1} $$
Figure 2 illustrates the difference in repeatability rate between Prop2-KD and Prop1-KD, where the x-axis is the truncated bit-width of the gradient values (G = 11 to 5). The results show that the repeatability difference increases with larger bit-width truncation, which implies higher accuracy degradation. When 6 bits of the gradients are truncated, the loss of accuracy of the proposed architecture is marginal (less than 6%) for all datasets, but the gains in terms of power and resource utilization reduction are significant (as shown in the next subsection).
Figure 2. The difference in repeatability rate of Prop2-KD and Prop1-KD.
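As a small worked illustration of Equation (1), the relative repeatability difference can be computed as sketched below. The function name and the two input values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def repeatability_difference(repeat_prop2: float, repeat_prop1: float) -> float:
    """Relative repeatability difference of Equation (1), in percent.

    repeat_prop1 is the baseline (Prop1-KD) and must be non-zero.
    """
    if repeat_prop1 == 0:
        raise ValueError("baseline repeatability must be non-zero")
    return abs(repeat_prop2 - repeat_prop1) / repeat_prop1 * 100.0


# Hypothetical repeatability rates, for illustration only.
delta_r = repeatability_difference(repeat_prop2=0.57, repeat_prop1=0.60)
print(f"Delta r = {delta_r:.2f}%")
```

Because the numerator is an absolute value, the metric reports the size of the deviation from Prop1-KD, not its direction.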
Second, we compare descriptor accuracy, measured as the Hamming distance between the 256-bit descriptor vectors produced by each implementation (Existing and Prop2) and those of a double-precision implementation.
Figure 3. Hamming distances with respect to the implementation using double precision.
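The Hamming distance used in this comparison counts the bit positions in which two descriptors differ. The sketch below assumes each 256-bit descriptor is stored as 32 bytes; the function name and the sample descriptors are hypothetical and serve only to show the computation.

```python
def hamming_distance(desc_a: bytes, desc_b: bytes) -> int:
    """Number of differing bits between two equal-length binary descriptors."""
    if len(desc_a) != len(desc_b):
        raise ValueError("descriptors must have the same length")
    # XOR leaves a 1 in every bit position where the descriptors disagree.
    return sum(bin(a ^ b).count("1") for a, b in zip(desc_a, desc_b))


# Two made-up 256-bit (32-byte) descriptors that differ in three bits.
reference = bytes(32)
candidate = bytes([0b00000111]) + bytes(31)
print(hamming_distance(reference, candidate))
```

A distance of 0 means the hardware descriptor matches the reference bit for bit; larger values mean more error bits.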
Figure 3 shows the average Hamming distance of Existing and Prop2. It is evident that our design results in fewer error bits than Existing. In particular, the proposed ORB feature descriptor achieves a Hamming distance about 50% lower than that of Existing for all four datasets. One of the main reasons for the higher accuracy of our design is the rounding scheme employed in the Angle unit for approximating the nearest index of the \(\cos \theta \) and \(\sin \theta \) LUT, and for determining the point-pairs for the binary tests. Unlike Existing, we round to the nearest integer, which produces more accurate results.
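To see why rounding to the nearest LUT index helps, the sketch below compares it with plain truncation. Everything here is an assumption made for illustration: the LUT size of 30 entries, the function names, and the use of truncation as the contrasting scheme (the behaviour of Existing is not specified on this page). It is not the Angle unit of the proposed design.

```python
import math

NUM_BINS = 30                      # assumed LUT size, for illustration only
STEP = 2.0 * math.pi / NUM_BINS    # angular spacing between LUT entries


def index_nearest(theta: float) -> int:
    """Index of the LUT entry closest to theta (round to nearest)."""
    return int(math.floor(theta / STEP + 0.5)) % NUM_BINS


def index_truncated(theta: float) -> int:
    """Index obtained by discarding the fractional part (truncation)."""
    return int(math.floor(theta / STEP)) % NUM_BINS


def angular_error(theta: float, index: int) -> float:
    """Smallest absolute angle between theta and the selected LUT entry."""
    diff = abs(theta - index * STEP) % (2.0 * math.pi)
    return min(diff, 2.0 * math.pi - diff)


theta = 0.9 * STEP  # an orientation that lies close to the next LUT entry
print(angular_error(theta, index_nearest(theta)))
print(angular_error(theta, index_truncated(theta)))
```

With rounding, the selected entry is never more than half a step from the true angle, whereas truncation can be off by almost a full step; a smaller angular error keeps the rotated point-pairs closer to their intended positions.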
Paper

Link to the latest version of the paper

Acknowledgement

This research project is partially funded by the National Research Foundation Singapore under its Campus for Research Excellence and Technological Enterprise (CREATE) programme.

Discussion

For any questions regarding the publications, please contact me or post a comment below and I will respond shortly.