Truncated Gray-Coded Bit-Plane Matching Based Motion Estimation and its Hardware Architecture

Anil Çelebi, Student Member, IEEE, Orhan Akbulut, Oğuzhan Urhan, Member, IEEE, Sarp Ertürk, Member, IEEE

Abstract — This paper proposes an efficient low bit-depth representation based motion estimation approach which is particularly suitable for low-power consumer electronics devices. In the proposed approach motion estimation is carried out using bit truncated gray-coded image pixels. The corresponding hardware architecture is also designed and presented in this paper to show the effectiveness of the proposed approach. It is shown that the proposed approach provides improved motion estimation accuracy compared to conventional bit-truncation based approaches that are directly applied to binary coded pixel values. The proposed approach uses simple Gray-coding, that has very low-complexity and can be applied on a pixel-by-pixel basis. Hence, the comparatively more complex transformation processes required in One Bit-Transform or Two-Bit Transform based low bit-depth representation ME approaches are avoided. Experimental results show that the proposed approach also outperforms such low bit-depth representation based motion estimation methods previously presented in the literature, in terms of motion estimation accuracy.

Index Terms — Motion estimation, gray-coding, bit truncation, hardware architecture, systolic arrays.

I. INTRODUCTION

Recent increases in the computational processing capabilities of microprocessors, highly effective dedicated hardware implementations, and emerging video coding standards have together increased the use of video applications in consumer electronics devices. Moreover, widespread Internet access and the increased capacity of storage media have contributed to increases in video content distribution. However, mobile devices typically have limited processing capabilities and battery life, and therefore require low-complexity video compression techniques to enable efficient transmission of captured video over bandwidth-limited channels.

Many consumer electronics devices, such as mobile phones and camcorders, typically use H.263 [1] or H.264/AVC [2] based techniques for video compression.

Motion estimation (ME) is typically the most complex and processing-power-intensive part of the encoder. Although various low-complexity ME approaches have been presented in the literature to reduce the computational complexity of ME, only a few efficient hardware implementations of such approaches have been proposed.

One-bit transform (1BT) based ME has been proposed in [3] as a low-complexity ME approach for video compression. In 1BT, video frames are converted into a single bit-plane by comparing them with their multi-band pass filtered version. In 1BT based ME, block matching is performed using Exclusive-OR (EX-OR) matching of bit-planes. ME using EX-OR matching of 1BTs enables efficient hardware implementation compared to conventional SAD (sum of absolute difference) matching, decreasing the computational load by nearly sixteen times at the expense of some accuracy loss.

A multiplication-free one-bit transform (MF1BT) is proposed in [4] to decrease the computational load of the 1BT presented in [3] by making use of a novel multiplication-free kernel to obtain bit-planes. In [5] a two-bit transform (2BT) based ME approach is introduced to improve the performance of 1BT based methods using an additional bit-plane, but thereby increasing computational load compared to 1BT. A constrained one-bit transform (C-1BT) based ME approach is proposed in [6] to introduce a constraint mask so as to use only 1BTs of reliable pixels in the matching process. The C-1BT based ME approach is shown to provide improved matching compared to other 1BT and 2BT based ME approaches.

Low bit-depth matching based ME approaches can also be combined with additional complexity reduction methods for further reduction in ME complexity. In [7], a partial distortion search approach combined with a sparse search point approach is utilized with C-1BT based ME to further reduce the computational load for software implementation, at the expense of a slight accuracy loss. An early termination scheme is combined with MF1BT based ME in [8], to further reduce computational complexity.

Various hardware implementations of ME algorithms are proposed in the literature. It is stated in [9] that one of the most important aspects influencing power consumption in ME hardware architectures is system memory bandwidth, and a higher system memory bandwidth increases power consumption. Most hardware architectures proposed for the block matching algorithm (BMA) in ME are designed using parallel architectures based on systolic or semi-systolic arrays [10-14].

In [10] a hardware architecture utilizing SAD reuse to reduce the system memory bandwidth for variable block size motion estimation is proposed. An efficient hardware architecture for variable block size ME is obtained in [10] by removing the data dependency between the sub-partitions of macroblocks and modifying the prediction flow accordingly. In [11], a detailed architectural analysis of variable block size ME for H.264/AVC is presented. Instead of using a 1D adder tree as in [10], a 2D adder tree architecture that increases parallelism and a new search scheme that improves data reuse are utilized in [11]. In [12] a hardware-oriented fast ME algorithm is proposed with intra-/inter-candidate data reuse considerations. In [13], an adaptive search range algorithm is proposed on the software side and a SAD-tree based architecture is introduced on the hardware side, as a software/hardware co-solution to achieve high-throughput ME for H.264/AVC HDTV. The common purpose of these hardware implementations is to reduce complexity and/or power consumption.

One way to reduce processing complexity and/or power consumption is to reduce the amount of data to be processed. Architectures that make use of low bit-depth representations such as 1BT and 2BT can therefore provide efficient hardware implementations. Several hardware implementations of binary ME approaches are presented in [3, 14-18]. In [3], all-binary ME using 1BT and a hardware architecture based on a 1D linear PE (processing element) array are presented. An all-binary hierarchical ME approach and the corresponding hardware design are presented in [14]. In [15], a platform based implementation of the approach proposed in [14] is presented with a bus interlaced architecture. A fast binary ME algorithm for MPEG-4 shape encoding is presented in [16] together with its hardware architecture. In [17], hardware architectures for 1BT based ME methods are proposed together with an efficient data flow scheme, where the power consumption is reduced by about 50% compared to the hardware architecture proposed in [3]. Recently, an extension of the 1BT based ME hardware architecture presented in [17] to sub-pixel level has been proposed in [18].

The number of arithmetic operations carried out in the PE array and adder structures is another aspect influencing the complexity and power consumption of ME hardware architectures. In the hardware architecture presented in [13], a 2D PE array composed of 256 PEs is used with an SAD tree and variable block size (VBS) adder tree, requiring 6368 full adders (FAs) in total. On the other hand, the 1BT based ME architecture presented in [17] requires only a total of 199 FAs.

Instead of using all 8 bits of the pixel values for ME, it is proposed in [19] to use bit truncation, i.e., to utilize only a certain number of the most significant bits (MSBs) and discard the lower bits, in order to reduce the computational load. Alternatively, an adaptive pixel bit-depth reduction technique based ME approach and its VLSI architecture are presented in [9]; in this case, however, additional processing is required to obtain the reduced bit-depth representations compared to simple bit truncation. Bit truncation is furthermore applied to variable block size motion estimation in [20].

Bit truncation based ME hardware architectures are proposed in literature to reduce the computational complexity of 8-bits/pixel based BMA at the expense of some loss in ME accuracy [19, 20, 22]. In [19], the VLSI implementation of bit-truncation based ME is accomplished using a well known parallel architecture proposed in [21]. A variable length bit truncation technique based ME approach and its VLSI architecture is proposed in [22]. However, the hardware complexity of the architecture presented in [22] is relatively high compared to low bit-depth based ME architectures because the PE architecture proposed in [22] is designed to process both low bit-depth as well as full bit-depth pixels.

In [23], Gray-coded pixel values are used to obtain global motion in image sequences for video stabilization. This approach is employed for block matching based ME in [24] to investigate its possible utilization in ME for video coding.

This paper proposes to employ bit truncation on Gray-coded pixel values for low-complexity ME, and it is shown that this approach provides improved prediction performance compared to existing low bit-depth representation based ME approaches such as 1BT, 2BT, MF1BT, and C-1BT. Furthermore, the proposed approach significantly simplifies the binarization process, which is comparatively more complex in 1BT based ME methods due to the filtering process. It is also shown that the proposed method outperforms conventional bit truncation based ME approaches in terms of ME accuracy. The proposed approach enables a low-complexity and power-efficient ME hardware architecture implementation.

II. TRUNCATED GRAY-CODED BIT-PLANE MATCHING BASED MOTION ESTIMATION

Gray-coding based block motion estimation is presented in [24] to reduce computational load of the motion estimation process particularly in hardware implementations. In this paper, it is proposed to use bit-truncation with Gray-coding to further reduce ME complexity and at the same time facilitate efficient hardware design.

A pixel value that is quantized to 2^K gray levels and located at position (x,y) of frame f at time t can be represented in the form

\[ f^t(x,y) = a_{K-1} 2^{K-1} + a_{K-2} 2^{K-2} + \ldots + a_1 2^1 + a_0 2^0 \]

where the coefficients \( a_k \), \( 0 \leq k \leq K-1 \), form the natural binary code and take only binary values. If the \( k \)th bit-plane of frame \( t \) is denoted \( b_k^t(x,y) \), it contains all \( a_k \) bits of level \( k \). If a bit-depth of 8 bits/pixel is used, \( K \) is 8, and \( b_0^t(x,y) \) is the least significant bit-plane, while \( b_7^t(x,y) \) is the most significant bit-plane.
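As a small illustration of the bit-plane notation, the following Python sketch splits an 8-bit pixel value into its binary coefficients and extracts a bit-plane from a frame. The function names and the list-of-rows frame layout are illustrative assumptions, not part of the proposed method.

```python
K = 8  # bit depth assumed for this sketch (8 bits/pixel)

def binary_coefficients(value, k_bits=K):
    """Return [a_0, ..., a_{K-1}] for a pixel value, least significant bit first."""
    return [(value >> k) & 1 for k in range(k_bits)]

def bit_plane(frame, k):
    """Extract bit-plane k from a frame given as a list of pixel rows."""
    return [[(pixel >> k) & 1 for pixel in row] for row in frame]

# 200 = 128 + 64 + 8, so a_7, a_6 and a_3 are set.
coeffs = binary_coefficients(200)
# Weighted sum of the coefficients gives back the pixel value.
rebuilt = sum(a << k for k, a in enumerate(coeffs))
msb_plane = bit_plane([[200, 7], [128, 127]], K - 1)
```

Here `rebuilt` recovers the original value, and `msb_plane` holds the most significant bit of every pixel.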

The Gray-coded version of a pixel value can be computed from its natural binary code as

\[ g_{K-1} = a_{K-1}, \]
\[ g_k = a_k \oplus a_{k+1}, \quad \text{for } 0 \leq k \leq K - 2 \]
where ⊕ denotes the EX-OR operation. Since the Gray codes of adjacent gray levels differ in only a single bit, it is more appropriate to use Gray-coded pixel values in EX-OR matching based ME.
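The single-bit property can be checked with a few lines of Python. This is a minimal sketch: the helper names are illustrative, and the shift-and-XOR form is the standard software equivalent of applying the per-bit rule to all bits at once.

```python
def to_gray(value):
    """Gray code of a non-negative integer.

    XOR-ing the value with itself shifted right by one pairs every bit a_k
    with the next higher bit a_{k+1}; the MSB is paired with 0 and is unchanged.
    """
    return value ^ (value >> 1)

def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

# Going from gray level 127 to 128 flips all eight natural-binary bits,
# but only one bit of the Gray-coded values.
natural_flip = hamming(127, 128)
gray_flip = hamming(to_gray(127), to_gray(128))
```

With natural binary codes a one-level intensity change can therefore produce mismatches in every bit-plane, whereas with Gray codes it affects exactly one plane.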

In block matching (BM), the block of the current frame is searched for inside a search window in the reference frame (which is usually the previous frame), using a certain similarity measure. In Gray-coded bit-plane matching based ME, the similarity between the current block of size $N \times N$ pixels located in frame $t$ and the reference block located in frame $t-1$ can be calculated using a correlation measure ($CM_{GC}$), which is defined as

$$CM_{GC}(m,n) = \sum_{k=0}^{K-1} 2^{k} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left( g_k^t(i,j) \oplus g_k^{t-1}(i+m, j+n) \right), \quad -s \leq m,n \leq s-1$$

where $(m,n)$ and $s$ denote the candidate displacement and search range, respectively, and $g_k^t(i,j)$ is the $k$th Gray-coded bit of the pixel at position $(i,j)$ in frame $t$. The displacement resulting in the lowest $CM_{GC}$ value is assigned as the motion vector of the current block. Note that a scaling factor of $2^k$ is utilized to include the weight of level $k$ when computing the similarity measure, so that higher order bit-planes have higher weight.

The proposed truncated Gray-coding based bit-plane matching approach does not use all $K$ bit-planes; it makes use of only the highest $M$ bit-planes to compute the similarity measure. If the number of truncated bits is denoted $NTB$, then the highest $M=K-NTB$ bit-planes are used in the matching process. Thus, the new correlation metric $CM_{TGC}$ is defined as

$$CM_{TGC}(m,n) = \sum_{k=NTB}^{K-1} 2^{k-NTB} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left( g_k^t(i,j) \oplus g_k^{t-1}(i+m, j+n) \right), \quad -s \leq m,n \leq s-1$$

where $(m,n)$ and $s$ again denote the candidate displacement and search range, respectively. The displacement resulting in the lowest $CM_{TGC}$ value is assigned as the motion vector of the current block. The scaling factor $2^{k-NTB}$ preserves the relative weights of the retained bit-planes, so that higher order bit-planes still have higher weight.
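The matching criterion can be sketched in software as follows. This is an illustrative full-search model, not the hardware data flow: the function names, the list-of-rows frame layout, and the choice of $m$ as the row offset and $n$ as the column offset are assumptions made for the example. Setting `ntb=0` gives the untruncated Gray-coded measure.

```python
import random

def to_gray(value):
    """Gray code of a natural-binary pixel value."""
    return value ^ (value >> 1)

def cm_tgc(cur, ref, top, left, m, n, N=16, ntb=5):
    """Truncated Gray-coded mismatch cost of the N x N block at (top, left)
    for candidate displacement (m, n). Frames hold 8-bit pixels."""
    cost = 0
    for i in range(N):
        for j in range(N):
            g_cur = to_gray(cur[top + i][left + j]) >> ntb
            g_ref = to_gray(ref[top + i + m][left + j + n]) >> ntb
            # After the shift, plane k sits at bit (k - NTB), so the integer
            # value of the XOR is the sum of 2^(k-NTB) over mismatching planes.
            cost += g_cur ^ g_ref
    return cost

def full_search(cur, ref, top, left, N=16, s=16, ntb=5):
    """Return (cost, m, n) with the lowest cost over -s <= m, n <= s - 1."""
    rows, cols = len(ref), len(ref[0])
    best = None
    for m in range(-s, s):
        for n in range(-s, s):
            r, c = top + m, left + n
            if r < 0 or c < 0 or r + N > rows or c + N > cols:
                continue  # candidate block falls outside the reference frame
            cost = cm_tgc(cur, ref, top, left, m, n, N, ntb)
            if best is None or cost < best[0]:
                best = (cost, m, n)
    return best

# Synthetic check: the current frame is the reference displaced by (2, -3).
rng = random.Random(0)
SIZE = 48
ref = [[rng.randrange(256) for _ in range(SIZE)] for _ in range(SIZE)]
cur = [[ref[(y + 2) % SIZE][(x - 3) % SIZE] for x in range(SIZE)]
       for y in range(SIZE)]
best = full_search(cur, ref, top=16, left=16)
```

On this synthetic random texture the search recovers the displacement with zero cost. Because the truncated Gray codes are XOR-ed as integers, the weighting by bit-plane level comes for free in software; in hardware the same weighting is realized by the adder tree.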

The overall hardware architecture proposed in this paper is shown in Fig. 1. The implemented architecture is capable of performing ME at macroblock level (16×16 pixels) for a search range of $[-16, 15]$ pixels. A RAM block, of size 48-bit wide and 16 rows deep, is used to store the current macroblock. The hardware architecture can process multiple pixels at each clock cycle [3].

The ME performance of the proposed method for different $NTB$ values is provided in the experimental results section. Experimental results show that $NTB=5$ gives the best performance in terms of motion estimation accuracy taking at the same time complexity into account; and only the most significant three Gray-coded bit-planes are utilized for ME in this case. The hardware design is therefore carried out for $NTB=5$.

III. HARDWARE DESIGN

Because of the binary nature of the proposed algorithm a 1D systolic array architecture is sufficient to provide real-time processing performance. The architectures proposed for 8-bits/pixel representation based ME algorithms mostly utilize 2D systolic arrays to process video in real time. A similar data throughput to 8 bit/pixel representation based 2D systolic array architecture is achievable by utilizing only a 1D systolic array for a binary ME method, since each PE in a binary ME hardware architecture can process multiple pixels at each clock cycle [3].

Fig. 1. Hardware architecture of proposed approach

Fig. 2. PE Array

ME at macroblock level with a search range of $[-16,15]$ pixels requires a search window of size $47 \times 47$ pixels. Therefore, the theoretical minimum size of the search window memory of the proposed ME architecture is $3$ bits/pixel $\times 47 \times 47 = 6.627$k bits. This leads to a theoretical minimum total on-chip memory of 7.395k bits, including the current block memory. However, in the implementation, the total memory used for the proposed architecture is $1504 \times 48 + (16 \times 48) = 72.96$k bits. This is much higher than the theoretical minimum; it is possible to reduce this size by designing additional data scheduling hardware, but this is outside the scope of this paper. For comparative evaluation it is useful to note that the total on-chip memory amount is 24.32k bits in [17] for 1BT based BMA hardware, where a search window memory bigger than the theoretical minimum is used for the same reasons. On the other hand, the on-chip memory used in [11] is 208k bits for ME hardware with 8 bits/pixel representations.
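The memory figures quoted for the proposed architecture follow from simple arithmetic, reproduced in the sketch below. The variable names are illustrative. The 1504 × 48 organization of the implemented search window memory is taken from the text as given; the sketch does not model how it is laid out.

```python
BITS_PER_PIXEL = 3                 # three MSBs of the Gray code (NTB = 5)
N = 16                             # macroblock size in pixels
SEARCH_MIN, SEARCH_MAX = -16, 15   # search range in pixels

# A 16-pixel block displaced over [-16, 15] covers 16 + 31 = 47 pixels.
window = N + (SEARCH_MAX - SEARCH_MIN)

min_search_bits = BITS_PER_PIXEL * window * window   # search window only
cur_block_bits = BITS_PER_PIXEL * N * N              # current macroblock
min_total_bits = min_search_bits + cur_block_bits    # theoretical minimum

# Implemented memory: 1504 words of 48 bits plus the 16 x 48 current block RAM.
impl_bits = 1504 * 48 + 16 * 48
overhead = impl_bits / min_total_bits
```

The implemented memory is close to ten times the theoretical minimum, which is the gap the text attributes to the absence of additional data scheduling hardware.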

The PE array consists of 16 PEs as shown in Fig. 2. All PEs in the PE array are basically identical to each other, and the only difference is the width of the CM input/output of each PE. The width of the CM output of each PE is determined by the range of possible values the CM output can take. In other words, it is determined by the maximum possible value of the adder tree and the CM input, which are summed to obtain the CM output. The output of the adder tree is fixed and 7-bit wide. The CM output of the first PE (CM0) is therefore also 7-bit wide. The CM output of the next PE (CM1) on the other hand needs to be 8-bit wide as it is the sum of two 7-bit values, i.e., the sum of the CM value of the previous PE and the output of its own adder tree. Considering the maximum possible values at each stage, the final CM output (CM15) is obtained to be 11-bit wide. The PE architecture of the proposed hardware is shown in Fig. 3. In addition to CM inputs and outputs, there are three data inputs to each PE: one for the current macroblock (C) and two for the search window data (S1, S2). Three LUTs are used to obtain the EX-OR matching result, one for each gray-coded bit-plane, because three MSBs of gray-coded pixel values are used in the matching process. Note that the current macroblock data is not shifted through the PEs, but each PE uses the corresponding row of the current macroblock to compute the CM value of that row, so that actually a shared computation scheme is utilized. This is also seen in the data flow scheme of the proposed architecture shown in Table I. In Table I, square brackets show that the corresponding data is read from the latch instead of the current macroblock RAM. Thus, for each PE only a single memory read operation is needed for the current macroblock in the entire motion vector computation stage of a macroblock.
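The CM output widths described for the PE array can be reproduced with a short calculation. The sketch below assumes that the three retained bit-planes carry weights 4, 2 and 1 (so a single pixel contributes at most 7) and that each PE adds the result of one 16-pixel row to the running CM value; the names are illustrative.

```python
PIXELS_PER_ROW = 16
PLANES = 3                                   # Gray-coded bit-planes kept (NTB = 5)

max_per_pixel = 2 ** PLANES - 1              # weights 4 + 2 + 1
row_max = PIXELS_PER_ROW * max_per_pixel     # largest adder tree output

def bits_needed(max_value):
    """Smallest unsigned width that can hold max_value."""
    return max(1, max_value.bit_length())

adder_tree_width = bits_needed(row_max)

# CM output of PE number pe (0-based) accumulates pe + 1 row results.
cm_widths = [bits_needed(row_max * (pe + 1)) for pe in range(16)]
```

Under these assumptions the adder tree output needs 7 bits, CM0 is 7 bits wide, CM1 is 8 bits wide, and CM15 is 11 bits wide, in agreement with the widths stated above. Sizing each stage by its maximum possible value, rather than growing by one bit per stage, is what keeps the final output at 11 bits.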

In Table I, Ci represents the 48-bit wide vector located in the i-th row of the current macroblock, comprising the three MSBs of the Gray-coded values of the 16 pixels in the i-th row. Si,j represents the 48-bit wide vector located in the i-th row between bit columns j and (j+47) in the search window. For example, S0,0 denotes the three MSBs of the first 16 pixels in the 0th row, concatenated in the form \{(S0,0-2),(S0,3-5),…,(S0,45-47)\}. Each PE in the PE array shown in Fig. 2 can process 16 pixels in one clock cycle, so that 16×16 pixels can be processed in each cycle when all of the PEs in the array are utilized. As shown in Table I, 15 clock cycles are needed for the PE array to become fully functional. The hardware needs 1024 clock cycles for the computation of the motion vector for one macroblock, in addition to this 15 clock cycle offset. More detailed information about this data flow scheme, typical of 1D systolic array architectures, is available in [25].
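A rough throughput estimate follows from the cycle counts above. The sketch below is a back-of-the-envelope calculation only: it assumes that the 15-cycle offset is paid once per macroblock, that memory loading adds no further cycles, and it uses the 90 MHz clock frequency reported in the experimental results section. The function name is illustrative.

```python
SEARCH_POSITIONS = 32 * 32      # displacements in [-16, 15] x [-16, 15]
PIPELINE_FILL = 15              # cycles before the PE array is fully functional
cycles_per_mb = SEARCH_POSITIONS + PIPELINE_FILL

def frames_per_second(width, height, clock_hz, mb=16):
    """Upper-bound frame rate, ignoring any memory loading overhead."""
    macroblocks = (width // mb) * (height // mb)
    return clock_hz / (macroblocks * cycles_per_mb)

# CIF frame (352 x 288) contains 22 x 18 = 396 macroblocks.
cif_fps = frames_per_second(352, 288, 90e6)
```

Under these optimistic assumptions a CIF sequence could be processed at a little over 200 frames per second, which is consistent with the claim that a 1D systolic array is sufficient for real-time operation.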

The total number of full adders needed for the PE array of the proposed architecture is 521, where $16 \times 23 = 368$ full adders are used in the adder trees and $153$ full adders are used in the CM accumulation stage. For comparative evaluation it is useful to note that the total full adder count of the MF1BT [4] based ME hardware architecture presented in [17] is 153, as only a single bit-plane is used in the matching process (but the ME accuracy is lower). On the other hand, the hardware architecture presented in [11] requires 6368 full adders to compute the SAD of a macroblock. Therefore, the hardware complexity of the proposed approach lies between that of 1BT based architectures and 8 bits/pixel representation based architectures, but is close to that of 1BT based architectures.

IV. EXPERIMENTAL RESULTS

In the experimental setup, the motion estimation performance of the proposed truncated Gray-coded bit-plane matching based ME approach (T-GCBPM) is first evaluated using an open-loop scheme, in which the current frame of the video sequence is reconstructed from the previous frame using the motion vectors obtained by the ME approach. The similarity between the original and the estimated frames is computed in terms of Peak Signal to Noise Ratio (PSNR). Six different video sequences are utilized in the experiments to properly assess the performance of the proposed approach.
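For reference, the PSNR between an original frame and its motion-compensated estimate can be computed as in the following sketch. The standard definition for 8-bit data (peak value 255) is assumed here; the function name and the list-of-rows frame layout are illustrative.

```python
import math

def psnr(original, estimate, peak=255):
    """PSNR in dB between two equally sized frames given as lists of pixel rows."""
    squared_error = 0
    count = 0
    for row_o, row_e in zip(original, estimate):
        for o, e in zip(row_o, row_e):
            squared_error += (o - e) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak ** 2 / mse)

# Every pixel off by one gray level gives an MSE of 1.
example = psnr([[10, 20], [30, 40]], [[11, 19], [31, 39]])
```

Averaging this value over all predicted frames of a sequence gives a single figure per sequence and method.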

Average PSNR values for the test sequences are given in Table II. Here, T-BPM represents the conventional truncated bit-plane matching approach presented in [11]. In case of T-BPM and T-GCBPM, experimental results are provided for various $NTB$ cases.

Experimental results show that the proposed T-GCBPM based ME approach outperforms conventional T-BPM based ME in all $NTB$ cases. The increase in PSNR can be as high as 0.5 dB. These results show that Gray coding improves the performance of truncated bit-plane matching. Experimental results also show that the proposed T-GCBPM based ME approach with $NTB=5$ provides higher PSNR values than other low bit-depth ME approaches such as C-1BT or 2BT.

**TABLE II. AVERAGE PSNR VALUES (dB) FOR SEVERAL TEST SEQUENCES USING AN OPEN-LOOP SCHEME**

<table>
<thead>
<tr>
<th>Method</th>
<th>Football (352x240) (125 frames)</th>
<th>Flowergarden (352x240) (115 frames)</th>
<th>Mobile (352x240) (140 frames)</th>
<th>Tennis (352x240) (112 frames)</th>
<th>Coastguard (352x288) (300 frames)</th>
<th>Foreman (352x288) (300 frames)</th>
</tr>
</thead>
<tbody>
<tr>
<td>SAD, 8 bits/pixel</td>
<td>22.88</td>
<td>23.79</td>
<td>22.99</td>
<td>28.97</td>
<td>30.48</td>
<td>32.11</td>
</tr>
<tr>
<td>2BT [5]</td>
<td>22.08</td>
<td>23.43</td>
<td>22.72</td>
<td>28.89</td>
<td>29.93</td>
<td>30.71</td>
</tr>
<tr>
<td>C-1BT [6]</td>
<td>22.10</td>
<td>23.39</td>
<td>22.77</td>
<td>29.18</td>
<td>29.98</td>
<td>30.87</td>
</tr>
<tr>
<td>T-BPM [11] ($NTB=6$)</td>
<td>22.08</td>
<td>23.49</td>
<td>22.72</td>
<td>28.57</td>
<td>29.87</td>
<td>30.35</td>
</tr>
<tr>
<td>T-BPM [11] ($NTB=3$)</td>
<td>22.21</td>
<td>23.48</td>
<td>22.76</td>
<td>28.70</td>
<td>29.95</td>
<td>31.08</td>
</tr>
<tr>
<td>T-BPM [11] ($NTB=2$)</td>
<td>22.20</td>
<td>23.48</td>
<td>22.76</td>
<td>28.70</td>
<td>29.95</td>
<td>31.09</td>
</tr>
<tr>
<td>T-GCBPM ($NTB=6$)</td>
<td>23.39</td>
<td>23.61</td>
<td>22.84</td>
<td>28.98</td>
<td>29.14</td>
<td>30.64</td>
</tr>
<tr>
<td>T-GCBPM ($NTB=5$)</td>
<td>22.59</td>
<td>23.67</td>
<td>22.86</td>
<td>29.19</td>
<td>30.16</td>
<td>31.32</td>
</tr>
<tr>
<td>T-GCBPM ($NTB=4$)</td>
<td>22.58</td>
<td>23.66</td>
<td>22.87</td>
<td>29.26</td>
<td>30.27</td>
<td>31.57</td>
</tr>
<tr>
<td>T-GCBPM ($NTB=3$)</td>
<td>22.56</td>
<td>23.66</td>
<td>22.87</td>
<td>29.23</td>
<td>30.26</td>
<td>31.61</td>
</tr>
<tr>
<td>T-GCBPM ($NTB=2$)</td>
<td>22.56</td>
<td>23.66</td>
<td>22.87</td>
<td>29.23</td>
<td>30.25</td>
<td>31.61</td>
</tr>
</tbody>
</table>
This increase in PSNR can be attributed to the fact that three bit-planes are used in the matching process for T-GCBPM based ME with $NTB=5$, while only two bit-planes are used in the matching process in the case of C-1BT and 2BT based ME. The proposed T-GCBPM also has a lower binarization complexity compared to 1BT and 2BT based approaches.

The proposed hardware is coded in Verilog hardware description language and verified for a clock frequency of 90 MHz using synthesis with the Synplicity Synplify Pro synthesis tool.

The synthesized design occupies 2339 LUTs (8%) on a Xilinx XC2VP30 device. The power consumption of the proposed T-GCBPM based ME hardware architecture is about 230 mW on average, while the power consumption for T-BPM based ME is about 245 mW on average, for the Mobile sequence at a clock frequency of 66 MHz. Therefore, the proposed T-GCBPM based ME architecture results in a small reduction in power consumption compared to T-BPM based ME, in addition to improved ME accuracy. Note that the XPower and ISE Simulator (ISIM) tools from Xilinx are used for the power consumption analysis. The proposed hardware architecture is also synthesized for a 0.18 µm process to compare its performance with available 8 bits/pixel based ME hardware architectures. According to the synthesis results, the gate count for the 1-bit/pixel based hardware architecture proposed in [17] is 8k, while the total gate count for the hardware architecture proposed in this work is 23k. In [11], where the least significant 3 bits of pixels are truncated to reduce ME hardware complexity, the total number of gates for fixed block size ME is 88k. In [26], the total gate count utilized for integer motion estimation is 146k gates, which is roughly 6 times the size of the proposed hardware architecture.
Compared to 1-bit/pixel based architectures, the proposed hardware architecture requires about three times more gates, which is a direct result of using 3 bits/pixel instead of 1 bit/pixel representations. However, the proposed hardware architecture has two important advantages. First, the ME accuracy of the proposed approach is much better (the PSNR of reconstructed frames is up by nearly 1 dB). Second, the proposed approach only requires Gray coding of pixel values, which is a very low-complexity process and can be applied directly on a pixel-by-pixel basis, whereas 1BT and 2BT based approaches require a comparatively more complex transform process that introduces substantial additional hardware complexity not accounted for in the presented results. Compared to recently proposed 8 bits/pixel based ME architectures utilizing a SAD based matching criterion, the hardware complexity of the proposed approach is dramatically lower, making the proposed approach particularly favorable for consumer electronics applications that require low complexity and low power consumption.

V. CONCLUSIONS

A novel Gray-coded bit-plane matching based ME approach with bit truncation is proposed in this paper. The proposed truncated Gray-coded bit-plane matching based ME approach is shown to provide improved motion estimation accuracy compared to conventional bit-plane matching based ME. Furthermore, the presented approach also outperforms low bit-depth representation based ME methods, such as 1BT and 2BT based ME, in terms of ME accuracy, and has a lower binarization complexity. The binarization complexity is much lower because the proposed approach uses simple Gray coding, which has very low complexity and can be applied on a pixel-by-pixel basis, whereas 1BT and 2BT based approaches require much more complex transformation processes that add to the hardware complexity. An efficient hardware architecture of the proposed method is designed and verified in this paper. The proposed approach is particularly suitable for consumer electronics equipment with low processing resources and limited power capabilities.

REFERENCES

Anıl Çelebi (S’00) was born in Ordu, Turkey. He received the B.Sc., M.Sc. and Ph.D. degrees in electronics and telecommunications engineering from Kocaeli University, Kocaeli, Turkey, in 2002, 2005, and 2008, respectively. Since 2002 he has been with the Department of Electronics and Telecommunications Engineering, University of Kocaeli, Turkey, where he is currently Assistant Professor. His research interests include very large scale integration (VLSI) design and implementation for analog/mixed signal systems, image processing systems, and video coding systems.

Orhan Akbulut was born in Kütahya, Turkey. He received the B.Sc. and M.Sc. degrees in electronics and telecommunication engineering from Kocaeli University in 2005 and 2007 respectively. He is currently working towards the Ph.D. degree at the Graduate School of Natural and Applied Sciences, Kocaeli University. His major research interests are image and video coding systems.

Oğuzhan Urhan (S’02-M’06) received his B.Sc., M.Sc., and Ph.D. degrees in electronics and telecommunication engineering from the University of Kocaeli, Kocaeli, Turkey, in 2001, 2003, and 2006, respectively. Since 2001 he has been with the Department of Electronics and Telecommunications Engineering, University of Kocaeli, Turkey, where he is currently Associate Professor. He was a Visiting Professor at Chung-Ang University, Korea, from 2006 to 2007. His research interests include digital signal and image processing, in particular, image and video restoration and coding.

Sarp Ertürk (M’99) received his B.Sc. in Electrical and Electronics Engineering from Middle East Technical University, Ankara in 1995. He received his M.Sc. in Telecommunication and Information Systems and Ph.D. in Electronic Systems Engineering in 1996 and 1999 respectively from the University of Essex, U.K. From 1999 to 2001 he carried out his compulsory service at the Army Academy, Ankara. He is currently appointed as Full Professor at Kocaeli University, where he worked as Assistant Professor between 2001 and 2002, and Associate Professor between 2002 and 2007. His research interests are in the area of digital signal and image processing, video coding, remote sensing and digital communications.