Near-Capacity H.264 Multimedia Communications Using Iterative Joint Source-Channel Decoding

Nasruminallah and L. Hanzo

Abstract—In this tutorial, a unified treatment of the topic of near capacity multimedia communication systems is offered, where we focus our attention not only on source and channel coding but also on their iterative decoding and transmission schemes. There is a paucity of up-to-date surveys and review articles on the unified treatment of the topic of near capacity multimedia communication systems using iterative detection aided joint source-channel decoding employing sophisticated transmission techniques - even though there is a plethora of papers on both iterative detection and video telephony. Hence this paper aims to fill the related gap in the literature.

Index Terms—Multimedia communications, H.264 video transmission, joint source-coding and channel coding, iterative detection, near-capacity wireless communications, EXIT charts, irregular channel codes, video standards.

I. ADVANCES IN MULTIMEDIA CODING

Robust transmission of multimedia source coded streams over diverse wireless communication networks constitutes a challenging research topic [1, 2]. Recent advances in the world of telecommunication and multimedia systems resulted in the design of improved transmission techniques. However, bearing in mind the volume of information produced by high definition multimedia communication systems and the limited availability of unoccupied bandwidth at carrier frequencies, where beneficial propagation conditions prevail, the design of efficient multimedia systems at low bit-rate requires careful attention. Hence the design of improved coding techniques is important for the successful implementation of various multimedia communication systems, in order to reduce the amount of information required for flawless interactive multimedia communications [1]. An overview of advances in the field of video coding is presented in Table I.

The rest of the paper is organised as follows. A preliminary introduction about H.264 video coding is provided in Section I-A followed by the details about our input source codec parameters in Section I-B. An overview of the iterative detection is provided in Section II. Section III portrays our employed video transmission scheme using Short Block Code (SBC) based iterative source-channel decoding. SBCs are used for achieving guaranteed convergence in soft-bit assisted iterative Joint Source-Channel Decoding (JSCD), which facilitates improved iterative Unequal Source-Symbol Probability Aided (USSPA) operations. The schematic of the proposed SBC based iterative source-channel decoding arrangement is presented in Section III-A. Section II-A provides the details about the iterative source channel decoding aided receivers, followed by the introduction about EXIT charts in Section II-B. The iterative convergence analysis using SBCs is provided in Section III-B, followed by the proposed SBCs in Section III-C. The performance of the proposed system is characterised with the aid of EXIT chart analysis in Section III-D and Section III-E.

Furthermore, the performance improvements of the proposed SBCs using RSM are described in Section IV along with their EXIT chart analysis, followed by the overall performance results in Section IV-B. RSM improves the convergence behaviour of SBC coding by incorporating additional redundancy and by increasing the minimum Hamming distance \( d_{H,\text{min}} \). Additionally, SBC assisted Unequal Error Protection (UEP) video transmission using Recursive Systematic Convolutional (RSC) codes and Sphere Packing (SP) modulated Differential Space Time Spreading (DSTS) is presented in Section V, along with the concept of SP modulation and the associated performance results. SP modulation is a specific scheme, which maintains the highest possible Euclidean distance between the modulated symbols, as detailed in [3]. DSTS is a low-complexity technique that does not require channel estimation, because it relies on non-coherent detection. This low-complexity detection is particularly important in the context of Multiple-Input Multiple-Output (MIMO) systems using \( N_T \) transmit and \( N_R \) receive antennas, which would otherwise require the estimation of \((N_T\times N_R)\) MIMO channels, hence substantially increasing both the cost and complexity of the receiver. Furthermore, the pilot overhead required by the MIMO channel estimator may also be excessive. A serially concatenated three-stage scheme conceived for near-capacity iteratively detected H.264 wireless video telephony is described in Section VI along with its EXIT chart analysis and system performance results. In contrast to the two-stage system constituted by a single iterative loop, the three-stage system employs two iterative loops, which exchange extrinsic information both between the inner and the intermediate decoder, as well as between the outer decoder and the intermediate decoder. Conventional two-stage turbo-detection schemes generally suffer from a Bit Error Rate (BER) floor, and the three-stage design circumvents this deficiency by introducing an additional iterative loop into the turbo detection process. Finally, the paper closes with generic design guidelines and conclusions in Section VII and Section VIII, respectively.

A. The H.264 Video Coding Standard

Wireless systems are typically constrained by the limited availability of bandwidth and battery power. Therefore, the drive for increased compression efficiency has to be carefully balanced against the increased power consumption of high-complexity signal processing in video coding standards. Furthermore, the integration of the channel coded video system into different types of communication networks and the maintenance of an enhanced error-resilience are equally important design aspects of wireless video and multimedia applications [4, 5]. In this context the combined efforts of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) resulted in the H.264/AVC video coding standard [6]. The main aim of this standardisation activity was to design an efficient and network-friendly video codec capable of supporting a variety of applications, including both real-time interactive applications, such as video conferencing and video telephony, and non-real-time applications, such as video streaming and digital television broadcast. Considerable research efforts have been dedicated to the design of the H.264 video codec [7–10]. H.264/AVC provides the best performance in terms of its rate-distortion efficiency amongst the existing standards [11, 12]. The H.264/AVC coding standard may be considered as an attractive candidate for all wireless applications, including Multimedia Messaging Services (MMS), Packet-Switched Streaming Services (PSS) and real-time conversational applications [8].

1) H.264/AVC Video Coding Profiles and Levels: The H.264/AVC video codec was developed to address a broad range of applications at different bit rates, video resolutions, video qualities and service characteristics. However, different applications impose different requirements in terms of video quality, error resilience, compression efficiency, delay, and complexity. Therefore, in order to increase the codec’s interoperability, while limiting its complexity, the H.264/AVC standard defines various ‘Profiles’ and ‘Levels’. A ‘profile’ is defined as a subset of standard coding tools. For this reason, side-information parameters and flags are included in the bit-stream, which specify the presence or absence of the corresponding tools in the stream. All decoders that are compliant with a certain profile must support all the tools within that profile. However, there is still a high degree of freedom within the boundaries imposed by the syntax of a specific profile. For example, these variations are dependent on the values assumed by the different parameters, such as the decoded picture size, frame rate etc. For many applications it is neither economical nor practical to implement a decoder, which is capable of processing all possible syntax parameters within a given profile. For this reason, a second descriptor known as ‘Level’ is created for each profile, which specifies a set of constraints imposed on the syntax parameters within each profile. These constraints may either be syntax parameter values or they may be a combination of values, such as the picture width and height expressed in terms of the number of pixels or the frame rate. In H.264/AVC all profiles employ the same level definitions. Furthermore, if the application considered is capable of supporting more than one profile, then we have the option of supporting either the same or different levels for each profile [8].

In H.264/AVC three profiles are defined, which may be invoked for supporting a diverse range of applications. A stylized representation of the capabilities of these profiles is provided in Figure 1, and they are detailed below:

a) Baseline Profile: The simplest of the three profiles supports all H.264/AVC tools, except for B-slices, interlaced coding, weighted prediction, adaptive switching between frame/field coding, Context Adaptive Binary Arithmetic Coding (CABAC), Switching P (SP) / Switching I (SI) slices and data partitioning [13]. This profile typically targets applications with low complexity and low delay requirements [6].

b) Main Profile: This profile supports all the above-mentioned Baseline Profile tools, except for Flexible Macroblock Ordering (FMO), Arbitrary Slice Ordering (ASO) and the transmission of redundant pictures. It additionally supports B-slices, weighted prediction, interlaced coding, adaptive switching between frame/field coding and CABAC. Due to the inclusion of complex tools, such as B-slices and CABAC, this profile typically provides the best quality at the cost of an increased complexity and delay in comparison to the Baseline profile [6, 13].

c) Extended Profile: This profile contains all the coding tools of H.264, except for CABAC. The SP/SI slices and slice data partitioning are only included in this particular profile, but not in the previously mentioned simpler profiles. It is generally difficult to establish a strong relation between the profiles and their specific applications, but it is possible to say that conversational services associated with low delay requirements will typically use the Baseline profile, while streaming services using a wireless or wired transmission medium may employ the Baseline or Extended profiles [13]. There are 15 levels defined for each profile in H.264/AVC. Each level defines an upper bound for the encoded bit-stream or a lower bound for the decoder’s capabilities. These parameter specifications may include the picture size, ranging from QCIF to (4K x 2K)-pixel high-definition video, the decoder’s processing rate of say 1485 to 983,040 macroblocks per second, the affordable memory size for employment in multi-picture references, the video bit rate ranging from 64 Kbps to 240 Mbps and the motion vector range of say $-64$ to $+64$ or $-512$ to $+512$ [14].
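As a rough plausibility check of the processing-rate bounds quoted above, the macroblock rate follows from the picture size and the frame rate, since each macroblock covers 16x16 luma pixels. The sketch below assumes a (176x144)-pixel QCIF picture at 15 fps and a (4096x2048)-pixel picture at 30 fps; the helper name `macroblock_rate` and the two frame rates are illustrative assumptions, rather than entries quoted from the level tables of the standard.

```python
MB_SIZE = 16  # a macroblock covers 16x16 luma pixels


def macroblock_rate(width, height, fps):
    """Return the number of macroblocks to be processed per second."""
    mbs_per_frame = (width // MB_SIZE) * (height // MB_SIZE)
    return mbs_per_frame * fps


qcif_rate = macroblock_rate(176, 144, 15)     # 99 MBs per frame at 15 fps (assumed)
large_rate = macroblock_rate(4096, 2048, 30)  # 32768 MBs per frame at 30 fps (assumed)
print(qcif_rate, large_rate)  # prints "1485 983040"
```

Under these assumptions both ends of the quoted range of 1485 to 983,040 macroblocks per second are reproduced.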

d) FRExt Amendment for High-Quality Profiles: In addition to the above three profiles, the H.264/AVC FRExt amendment specifies three additional nested sets of profiles relying on the main profile, namely the so-called High, High 10 and High 4:2:2 profiles of Figure 1. The High profile contains coding tools for the further improvement of the coding efficiency relative to the main profile and results in a moderate increase of the compression ratio at a modest implementation and computational cost. The High 10 profile further extends the capabilities of the standard by supporting a high pixel-resolution of up to 10 bits/pel. Similarly, High 4:2:2 is used to extend the video format to 4:2:2, which is associated with a high chroma resolution. These profiles extend the capabilities of the standard in order to support enhanced-quality applications, such as High Definition (HD) consumer applications, including High Definition TV (HDTV) and high-quality computer monitors [13].

<table>
<thead>
<tr>
<th>Profile</th>
<th>Tools</th>
</tr>
</thead>
<tbody>
<tr>
<td>Baseline</td>
<td>I and P slices, Field coding, MB–AFF, Intra prediction, Weighted prediction, Multi–Reference frames, Adaptive Slice Ordering, Redundant Pictures, FMO</td>
</tr>
<tr>
<td>Main</td>
<td>B slices, Field coding, MB–AFF, In-loop deblocking, Intra prediction, Multi–Reference frames, Adaptive Slice Ordering, Redundant Pictures, FMO</td>
</tr>
<tr>
<td>Extended</td>
<td>B slices, Field coding, MB–AFF, In-loop deblocking, Intra prediction, Multi–Reference frames, Adaptive Slice Ordering, Redundant Pictures, FMO</td>
</tr>
</tbody>
</table>

Fig. 1. Scope of the H.264/AVC profiles [6].

B. Input Video Source

For the sake of constructing a practical system and achieving realistic performance improvements, instead of using a generic mathematical model for the source codec parameter set [15–17], we employed the H.264/AVC video codec as our source encoder. The 45-frame "Akiyo" video sequence [1] in (176x144)-pixel Quarter Common Intermediate Format (QCIF) was used as our test sequence and was encoded using the H.264/AVC JM 13.2 reference video codec at 15 frames-per-second (fps) at the target bit-rate of 64 Kbps. Each QCIF frame was partitioned into 9 slices and each slice was composed of 11 MBs, as shown in Figure 2. The resultant video encoded clip consisted of an intra-coded 'I' frame followed by 44 predicted or 'P' frames, corresponding to a lag of 3 seconds between the 'I' frames at a frame-rate of 15 fps. The periodic insertion of 'I' frames curtailed error propagation beyond 45 frames.
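The figures quoted above are mutually consistent, as the following short sketch shows. The variable names are illustrative, 64 Kbps is assumed to mean 64,000 bits per second, and the per-frame bit budget is only a nominal average, since the 'I' frame typically consumes more bits than the 'P' frames.

```python
width, height = 176, 144   # QCIF luma resolution in pixels
mb_size = 16               # macroblock size in pixels
fps = 15                   # frames per second
num_frames = 45            # one 'I' frame followed by 44 'P' frames
bit_rate = 64_000          # target bit-rate in bits per second (assumed 64,000)

mbs_per_row = width // mb_size          # 11 MBs, matching the 11 MBs of a slice
mb_rows = height // mb_size             # 9 MB rows, matching the 9 slices per frame
mbs_per_frame = mbs_per_row * mb_rows   # 99 MBs per frame
i_frame_period = num_frames / fps       # 3.0 seconds between consecutive 'I' frames
avg_bits_per_frame = bit_rate / fps     # nominal average of about 4267 bits per frame
```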

Additional source codec parameters were set as follows:

- Quarter-pixel motion estimation resolution was used;
- Intra frame MB update was used;
- All macroblock types were enabled;
- No multiframe prediction was used;
- No B-slices were employed;
- Universal Variable Length Coding (UVLC) type entropy coding was used;
- Error concealment was performed using the motion vector recovery algorithm of [8].

To control the effects of error propagation, we incorporated error resilience features, such as Data Partitioning (DP) and intra-frame coded MB updates of three randomly distributed MBs per frame. The insertion of 'B' pictures was avoided, because it results in an unacceptable loss of lip-synchronisation as a result of the corresponding delay incurred due to the bi-directionally predicted video coding operations [8]. Additionally, only the immediately preceding frame was used for motion search, which results in a reduced computational complexity compared to using multiple reference frames. These video coding parameters were chosen, bearing in mind that the error-resilience of the DP aided H.264/AVC stream is directly related to the number of 'P' frames inserted between two consecutive 'I' frames.

The remaining error resilient encoding techniques, such as the employment of multiple reference frames for interframe motion compensation and Flexible Macro-block Ordering (FMO) [18] were turned off, because they typically result in modest video performance improvements in low-motion head-and-shoulders video sequences, such as the "Akiyo" clip, despite their substantially increased complexity. These encoder settings result in a reduced encoder complexity and in a realistic real-time implementation.

II. ITERATIVE DETECTION

Iterative detection aided schemes [19] consist of a combination of two or more constituent encoders and interleavers. The principle of iterative detection using concatenated codes was first introduced in [20]. However, due to the lack of sophisticated hardware at that time, it failed to stimulate further research owing to its excessive computational complexity. After the discovery of turbo codes [21], which employed low-complexity component codes, the implementation of low-complexity iterative decoding aided concatenated codes became a practical reality. The innovative iterative decoding of concatenated codes inspired researchers to extend this concept to numerous communication schemes in order to achieve high-integrity transmission of information [22–38]. In [30] the Soft-Input Soft-Output (SISO) A-Posteriori Probability (APP) module corresponding to the input and output bits of the encoder was described, which exploited the benefits of iterative decoding. Similarly, iterative extrinsic information exchange was performed between the detector and channel decoder in [29], in order to mitigate the effects of intersymbol interference in digital transmission. Furthermore, in [31, 32] the authors presented the theory behind bit-interleaved coded modulation, complemented by its design guidelines and performance evaluation. As a further advance, the principle of iterative demapping was proposed in [32] for communication systems applying multi-level modulation combined with low-complexity channel coding, in order to reduce the BER. On the other hand, iterative multi-user detection and channel decoding designed for CDMA schemes was proposed in [37]. Moreover, a turbo coding scheme constituted by a serially concatenated block code and an orthogonal Space-Time Block Code (STBC) was designed for Rayleigh fading channels in [38].
In [24] a serially concatenated system composed of a cascaded outer encoder, an interleaver fed with the outer coded bits and an inner encoder was presented. It was demonstrated in [24] that the employment of a recursive inner code is important for maximising the interleaver gain and for avoiding a BER floor in case of iterative decoding. This principle has been adopted by several authors in [39–43], in order to design serially concatenated schemes incorporating a unit-rate recursive precoder as an inner code, when designing low-complexity iterative detection aided schemes suitable for power-limited systems having demanding BER requirements.

Considerable research efforts have been dedicated to the design of semi-analytical tools [39,44–51] conceived for analysing the convergence behaviour of iteratively decoded systems. The employment of so-called EXIT charts was proposed by ten Brink in [45], in order to analyse the flow of extrinsic information between the SISO constituent decoders. Furthermore, in [47] the computation of EXIT charts was further simplified by exploiting that the PDFs of the information communicated between the input and output of the constituent decoders are symmetric. A tutorial introduction to the powerful technique of EXIT charts, along with simple examples, typical applications and preliminary analytical results was presented in [48]. Finally, the concept of iterative detection using three stage concatenated systems and their convergence analysis using EXIT charts was provided in [51,53,54]. The major contributions to the field of iterative detection and their EXIT chart analysis are summarised in Table II.

A. Iterative Source-Channel Decoding Aided Receivers

The Turbo principle of exchanging extrinsic information has also been successfully applied to various receiver components, resulting in Iterative Source and Channel Decoding (ISCD). The joint optimisation of different functions, such as Joint Source and Channel Decoding (JSCD), has gained considerable attention in the past decade. ISCD employs a decoding algorithm that exploits both the explicit artificial redundancy imposed by channel encoding and the implicit natural redundancy observed in terms of the non-uniform distribution or correlation of the source encoded parameters. ISCD consists of a SISO channel decoder and an Unequal Source-Symbol Probability Aided (USSPA) decoder. The USSPA decoder is a soft-decision aided version of source decoding, which exploits the natural residual redundancy of the source to estimate the source coding parameters for the sake of improving the convergence of ISCD [64–67]. The extrinsic information gleaned from the artificial channel coding redundancy and from the natural residual source redundancy is expressed as $L_E(\text{inner})$ and $L_E(\text{outer})$, respectively, as shown in Figure 3. The details of the extrinsic LLR-value generation by the channel decoder of Figure 3 during the $n$-th iteration can be found in [68]. Similarly, joint source and channel decoding exchanging extrinsic information is capable of exploiting the natural residual source redundancy that remains in the source-coded stream during the iterative

\footnote{A recursive inner code has an infinite impulse response and hence efficiently spreads the extrinsic information between decoder components without increasing its delay.}

\footnote{In [52] the concept of extrinsic information was introduced to identify the reliability value generated by exploiting the redundant information, introduced by the constituent code.}

\footnote{The reliability of the received symbol $z$ for the transmitted symbol $y$ can be expressed in terms of the transition probabilities $p(z|y)$ or in the form of the corresponding Log-Likelihood Ratios (LLR), also often referred to as $L$-values.}

\begin{table}[h]
\centering
\begin{tabular}{|c|c|c|}
\hline
\textbf{YEAR} & \textbf{AUTHORS} & \textbf{CONTRIBUTION} \\
\hline
1983 & Forney et al. [20] & introduced concatenated codes. \\
1994 & Divsalar et al. [55] & introduced the Maximum A-Posteriori (MAP) algorithm. \\
1995 & Berrou et al. [21] & invented turbo codes and showed that iterative decoding constitutes an efficient way of improving the attainable performance. \\
1995 & Robertson et al. [56] & proposed the log-MAP algorithm that results in a performance similar to that of the MAP algorithm at a significantly lower complexity. \\
1996 & Divsalar et al. [22] & extended the turbo principle to multiple parallel concatenated codes. \\
1997 & Alamri et al. [23] & proposed iterative detection scheme, where iterations were carried out between the outer convolutional code and inner TCM decoders. \\
1999 & Benedetto et al. [51] & studied the design of multiple serially concatenated codes combined with interleavers. \\
2000 & Alamri et al. [37] & employed unit-rate inner codes for designing low-complexity iterative detection schemes suitable for bandwidth- and power-limited systems having stringent BER requirements. \\
2001 & ten Brink et al. [44] & extended the employment of EXIT charts for analysing the convergence behaviour of iteratively detected systems. \\
2002 & Ten Brink et al. [53] & extended the EXIT chart analysis to three-stage serially concatenated systems. \\
2003 & Ten Brink et al. [54] & introduced a soft demapper between the multilevel demodulator and the channel decoder in an iteratively detected coded system. \\
2004 & Ten Brink et al. [55] & combined the BICM and an iterative detection scheme. \\
2005 & Ten Brink et al. [56] & presented a novel approach to quantify the minimum amount of residual redundancy required for successful ISCD.
\end{tabular}
\caption{Iterative Detection Contributions}
\end{table}
decoding process. The reason for having residual source redundancy is the practical delay and complexity limitations of the source encoding process, which limit its ability to perfectly remove any redundancy. The detailed procedure used for determining the extrinsic information $L_E(\text{outer})$ of the outer source decoder was provided in [69–72]. Again, the natural redundancy manifests itself in terms of having an unequal probability of occurrence for the particular bit-pattern combinations. Having independent extrinsic LLR-values is essential for the success of iterative decoding in order for the components to assist each other in an iterative fashion. In ISCD the de-interleaved extrinsic output of the channel decoder serves as additional soft-input for the source decoder. Similarly, the interleaved extrinsic output of the soft-bit source decoder serves as additional input to the inner channel decoder. This way the iterative refinement of the extrinsic LLR-values results in step-wise reliability improvements. Let us now elaborate on the $L_E(\text{inner})$ and $L_E(\text{outer})$ components of Figure 3 in a little more detail.

In Figure 3, the initial information-values $L_E(\text{outer}, 0)$ are set to zero at the commencement of the iterative decoding process, since no previous extrinsic information is available. This corresponds to the logical 0 and 1 values each having a probability of 0.5. After the first iteration the updated information-values become available from the outer decoder of Figure 3, which are forwarded to the channel decoder as a priori information, in order to assist the channel decoder in achieving an iterative performance improvement. Therefore the iterative calculation of $L_E(\text{inner})$ and $L_E(\text{outer})$ results in successive improvements of the reliability information concerning the data bits $x$. During the iterative decoding process of Figure 3, further performance improvements are possible, as long as the extrinsic information terms remain mutually independent. In order to promote this independence, an interleaver is inserted in Figure 3 between the two constituent decoders, which provides a certain level of independence for a limited number of iterations. Nonetheless, due to the iterative interaction between the component decoders, the extrinsic information terms become dependent on each other after a certain number of iterations, unless an ‘infinite’ interleaver length is used, which is impractical.
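The link between a zero-valued LLR and equiprobable bits can be made explicit with a short sketch. It assumes the common convention $L = \ln[P(x=0)/P(x=1)]$; the opposite sign convention is equally valid, and the function names are purely illustrative.

```python
import math


def llr_to_prob0(llr):
    """Probability that the bit is 0, assuming L = ln(P(x=0) / P(x=1))."""
    return 1.0 / (1.0 + math.exp(-llr))


def prob0_to_llr(p0):
    """Inverse mapping from P(x=0) to the LLR (requires 0 < p0 < 1)."""
    return math.log(p0 / (1.0 - p0))


print(llr_to_prob0(0.0))  # prints 0.5: a zero LLR carries no information
```

A growing LLR magnitude moves the probability towards 0 or 1, which is the sense in which the iterations improve the reliability of the data bits.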

B. Extrinsic Information Transfer Charts

The extrinsic information exchange between the two constituent decoders is visualised by plotting the EXIT characteristics of both constituent decoders in a joint diagram, known as an EXIT chart [46, 73]. The outer decoder’s extrinsic output $I_E(\text{outer})$ scaled on the x-axis becomes the inner decoder’s a priori input $I_A(\text{inner})$. Similarly, the inner decoder’s extrinsic output $I_E(\text{inner})$ represented on the y-axis becomes the outer decoder’s a priori input $I_A(\text{outer})$, as shown in Figures 6, 7, 8 and 9. Furthermore, the axes of the outer decoder are swapped in the joint diagram for the sake of consistency with the EXIT chart concept [73].

In order to achieve an infinitesimally low BER, the extrinsic transfer characteristic curves of the inner and outer decoders should only intersect at the $(1,1)$ point of the EXIT chart at the $E_b/N_0$ value considered. If this condition is satisfied, then a so-called open convergence tunnel [46, 73] appears between the inner and outer EXIT curves. The narrower the tunnel, the more iterations are required to reach the $(1,1)$ point, which results in an increased iterative decoding complexity but the system operates closer to the attainable capacity. However, if instead of convergence to the $(1,1)$ point, the two curves intersect at a point close to $(1,1)$, then a reasonably low BER could still be achieved. The resultant EXIT tunnel may be referred to as a semi-convergent tunnel.
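The staircase-shaped exchange of information through the tunnel may be sketched as follows. The EXIT functions used here are hypothetical toy curves chosen purely for illustration, not the characteristics of the decoders considered in this paper, and the function name `decoding_trajectory` is likewise an assumption.

```python
def decoding_trajectory(inner, outer, max_iter=200, tol=1e-3):
    """Alternate between two EXIT functions, starting without a-priori information.

    inner(i_a) and outer(i_a) map a-priori to extrinsic mutual information.
    Returns (converged, iterations, last extrinsic output of the outer decoder).
    """
    i_a = 0.0
    for iteration in range(1, max_iter + 1):
        i_e_inner = inner(i_a)            # inner decoder activation
        i_e_outer = outer(i_e_inner)      # outer decoder activation
        if i_e_outer >= 1.0 - tol:
            return True, iteration, i_e_outer   # reached the (1,1) corner
        if i_e_outer - i_a < 1e-9:
            return False, iteration, i_e_outer  # stuck at an intersection
        i_a = i_e_outer
    return False, max_iter, i_a


outer = lambda i_a: i_a ** 2                 # toy outer curve (assumed)
wide = lambda i_a: 0.8 + 0.2 * i_a           # toy inner curve, wide tunnel
narrow = lambda i_a: 0.6 + 0.4 * i_a         # toy inner curve, narrow tunnel
closed = lambda i_a: 0.4 + 0.6 * i_a         # toy inner curve, closed tunnel

for name, inner in (("wide", wide), ("narrow", narrow), ("closed", closed)):
    print(name, decoding_trajectory(inner, outer))
```

For these toy curves the narrow tunnel requires more iterations than the wide one, while the closed case stalls at the intersection of the two curves instead of reaching the $(1,1)$ point.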

In order to verify the EXIT chart based convergence prediction, the actual Monte-Carlo simulation-based decoding trajectory is used, which is recorded by acquiring the mutual information at the input and output of both the inner and outer constituent decoders, as presented in Figures 6, 7, 8 and 9.
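One common way of acquiring the mutual information from simulated LLRs is the time-averaging estimate $I \approx 1 - E[\log_2(1 + e^{-\tilde{x}L})]$, where $\tilde{x} \in \{+1,-1\}$ is the transmitted bit in bipolar form. This estimator is not taken from the present paper; it assumes that the LLRs are consistent, i.e. that they reflect the true bit probabilities, and the sketch below generates test data from a hypothetical Gaussian LLR model.

```python
import math
import random


def log2_1p_exp(a):
    """Numerically stable evaluation of log2(1 + exp(a))."""
    return (max(a, 0.0) + math.log1p(math.exp(-abs(a)))) / math.log(2.0)


def mutual_information(llrs, bits):
    """Estimate I(X; L), assuming L = ln(P(x=0) / P(x=1)) and consistent LLRs."""
    total = sum(log2_1p_exp(-(1 - 2 * b) * l) for l, b in zip(llrs, bits))
    return 1.0 - total / len(bits)


def gaussian_llrs(bits, sigma, rng):
    """Consistent Gaussian LLRs with mean +/- sigma^2 / 2 and variance sigma^2."""
    return [(1 - 2 * b) * sigma ** 2 / 2.0 + sigma * rng.gauss(0.0, 1.0)
            for b in bits]


rng = random.Random(0)
bits = [rng.randint(0, 1) for _ in range(50_000)]
estimates = {s: mutual_information(gaussian_llrs(bits, s, rng), bits)
             for s in (0.0, 1.0, 2.0, 4.0)}
print(estimates)
```

The estimate is zero for all-zero LLRs and approaches one as the LLR variance grows, which is the quantity plotted on both axes of the EXIT chart.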

Below we provide a brief summary of the EXIT-chart properties [48]:

- The area under the outer code’s EXIT curve is given by the code-rate;
- The area in the open EXIT-tunnel is proportional to the $E_b/N_0$ distance from the system’s capacity. Hence having an infinitesimally small tunnel-area corresponds to near-capacity operation, provided that the $(1,1)$ point of the EXIT-chart is reached without having an intercept point between the two EXIT curves;
- The area under the EXIT curve of an inner decoder component is approximately equal to the attainable channel capacity, provided that the channel’s input is uniformly distributed;
- If there is an intercept point, a residual BER floor is expected.
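The first property can be checked numerically for a simple case. The sketch below does not model the SBCs of this paper: it assumes a rate-$1/n$ repetition code as the outer code and erasure-type a-priori information, for which the extrinsic output is $I_E = 1 - (1 - I_A)^{n-1}$. With the axes swapped as in the EXIT chart, the area under the outer curve then equals the code rate $1/n$.

```python
def repetition_exit(i_a, n):
    """EXIT function of a length-n repetition code for erasure-type a-priori input."""
    return 1.0 - (1.0 - i_a) ** (n - 1)


def area_under_swapped_curve(exit_fn, steps=10_000):
    """Area under the curve after swapping its axes: 1 - integral of I_E over I_A."""
    h = 1.0 / steps
    # Trapezoidal rule over the interval [0, 1]
    area = sum(0.5 * (exit_fn(k * h) + exit_fn((k + 1) * h)) * h
               for k in range(steps))
    return 1.0 - area


for n in (2, 3, 4):
    area = area_under_swapped_curve(lambda i_a: repetition_exit(i_a, n))
    print(n, round(area, 4), round(1.0 / n, 4))
```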

III. VIDEO TRANSMISSION USING SHORT BLOCK CODE BASED ITERATIVE SOURCE-CHANNEL DECODING

Since the early days of wireless video communications [74–76] substantial further advances have been made both in the field of proprietary and standard-based solutions [77, 78]. Furthermore, the discovery of turbo codes [52, 79] made it practical to achieve transmission close to the Shannon limit at a moderate computational complexity and delay.

In this section, we analyse the iterative performance gain achieved using a technique, which we refer to as Short Block Coding (SBC) based iterative source-channel decoding. We will demonstrate the improved error correction capability of SBC based ISCD using the following simulation examples.

A. Design Example: System Model

The schematic of our proposed videophone arrangement used as our design example for quantifying the performance of various ISCD schemes is shown in Figure 4. The compressed video source bit-stream \( x_k \) detailed in Section I-B generated using the H.264 video codec of Section I-A is mapped or encoded into the bit-string \( x'_i \). Subsequently, the output bit-string is interleaved using the bit-interleaver \( \Pi \), which is then encoded by an RSC code having a specific code rate given in Table VI(a).

Interleaving and de-interleaving constitute an important step in the iterative decoder of Figure 4. The task of the interleaver and de-interleaver is to ensure that the bits are input to the component decoders in their expected order, while retaining the statistical independence of the extrinsic LLRs.
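The role of the interleaver pair can be illustrated by a minimal sketch, which assumes a pseudo-random permutation generated from a fixed seed; the specific interleaver design of Figure 4 is not specified here, and all names below are illustrative.

```python
import random


def make_interleaver(length, seed=0):
    """Return a pseudo-random permutation and its inverse."""
    perm = list(range(length))
    random.Random(seed).shuffle(perm)
    inverse = [0] * length
    for out_pos, in_pos in enumerate(perm):
        inverse[in_pos] = out_pos
    return perm, inverse


def permute(values, order):
    """Reorder values so that output position k holds values[order[k]]."""
    return [values[i] for i in order]


perm, inv = make_interleaver(8)
llrs = [0.5, -1.2, 3.0, -0.1, 2.2, -4.0, 0.9, 1.7]   # example soft values
interleaved = permute(llrs, perm)       # order seen by the other decoder
restored = permute(interleaved, inv)    # de-interleaving restores the order
```

Applying the inverse permutation returns every soft value to its original position, so that each component decoder receives the LLRs in the order it expects.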

Since the degree of statistical independence guaranteed by an interleaver is always related to its length [80], instead of performing the ISCD operation on the various frame slices independently, we concatenated all the bits generated by each type of partition for the different Macro-Blocks (MBs) within each slice of a given frame, which results in a longer interleaver without extending the video delay and hence improves the achievable performance of iterative decoding. The resultant bit-stream is Quadrature Phase Shift Keying (QPSK) modulated and transmitted over a temporally correlated narrowband Rayleigh fading channel, associated with the normalised Doppler frequency of \( f_d = f_D T_s = 0.01 \), where \( f_D \) is the Doppler frequency and \( T_s \) is the symbol duration. At the receiver the signal is QPSK demodulated and the resultant soft-information is extracted in the form of its LLR representation \( L_M(\hat{y}_i) \). This soft-information is forwarded to the RSC inner decoder, which processes it along with the a-priori information \( L_{RSC}^{apr}(\hat{x}_i) \) fed back from the outer decoder of Figure 4 in order to generate the extrinsic LLR values \( L_{RSC}^{extr}(\hat{x}_i) \). These are subsequently deinterleaved by the soft-bit deinterleaver of Figure 4, yielding the soft-bits \( L_{SBC}^{apr}(\hat{x}_i') \) that are input to the outer decoder to compute the extrinsic LLR values \( L_{SBC}^{extr}(\hat{x}_i') \), which in turn result in \( L_{RSC}^{apr}(\hat{x}_i) \) after interleaving. In this way, extrinsic information is exchanged between the USSPA and RSC decoders of Figure 4 [64]. During iterative decoding the outer decoder exploits the input LLR values for the sake of providing improved a-priori information for the inner channel decoder of Figure 4, which in turn exploits the a-priori information fed back to it in the subsequent iteration for the sake of providing improved extrinsic LLR values for the outer decoder. Further details about iterative decoding are provided in [19, 24].

B. Iterative Convergence Analysis Using Short Block Codes

The purpose of ISCD is to utilise the constituent inner and outer decoders in order to assist each other in an iterative fashion to glean the highest possible extrinsic information \([L_{USSPA}^{extr}(x) \text{ and } L_{RSC}^{extr}(x)]\) from each other. In fact, the achievable performance of USSPA is limited by the fact that its iteration gain depends on the residual redundancy or correlation that remains in the coded bit pattern \( x \) after limited-complexity, limited-delay, lossy source encoding [1]. Although such limited-complexity, limited-delay, lossy compression fails to remove all redundancy from the source signal, the achievable performance improvements of USSPA may remain limited, because the high-compression H.264/AVC video codec succeeds in removing most of the predictable information from the source. It may be observed from the simulation results of [81] that using USSPA for the H.264/AVC coded bit-stream resulted in negligible system performance improvements beyond two decoding iterations. Hence, in order to improve the achievable ISCD performance gain, here we advocate a technique which we refer to as SBC coding. The novel philosophy of our SBC design is based on exploiting a specific property of EXIT charts [45]. As mentioned above, an iterative decoding aided receiver is capable of near-capacity operation at an infinitesimally low decoded BER, if there is an open tunnel between the EXIT curves of the inner and outer decoder components. We will demonstrate that this condition is clearly satisfied, when these two EXIT curves have a point of intersection at the \((I_A, I_E) = (1,1)\) corner of the EXIT chart. The sufficient and necessary condition for this iterative detection convergence criterion to be met in the presence of perfect a-priori information was shown by Kliewer et al. [50] to be that the legitimate codewords have a minimum Hamming distance of \(d_{H,\min} \geq 2\). Then the ISCD scheme becomes capable of achieving the highest possible source entropy denoted as \(H(X) = L_{USSPA}^{extr} = 1\ \text{bit}\), provided that the input a-priori information of the USSPA is perfect, i.e. we have \(H(X) = L_{USSPA}^{apr} = 1\ \text{bit}\). This motivates the design of the proposed SBC schemes, because it is plausible that, using our design procedure, all legitimate SBC codewords having a specific ‘mapping-rate’, defined as the reciprocal of the classic code-rate, result in a code-table satisfying the condition of \(d_{H,\min} \geq 2\). Using appropriately designed SBCs it may be guaranteed that the EXIT curve of the combined source codec and SBC block becomes capable of reaching the \((I_A, I_E) = (1, 1)\)-point of perfect convergence, regardless of the EXIT-curve shape of the stand-alone source encoder.
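The role of the \(d_{H,\min} \geq 2\) condition can be illustrated with a short sketch. It is not taken from the paper and the helper names are hypothetical. Under the assumption of perfect knowledge of all other bits of a legitimate codeword, it checks whether the remaining bit is uniquely determined, which holds exactly when no two legitimate codewords differ in a single position.

```python
from itertools import combinations


def min_hamming_distance(codewords):
    """Smallest Hamming distance between any two distinct codewords (given as ints)."""
    return min(bin(a ^ b).count("1") for a, b in combinations(codewords, 2))


def bit_determined_by_the_rest(codewords, n_bits):
    """True if flipping any single bit of any legitimate codeword never yields
    another legitimate codeword, i.e. perfect knowledge of the other
    n_bits - 1 bits pins down the remaining bit."""
    legit = set(codewords)
    return all((c ^ (1 << i)) not in legit for c in legit for i in range(n_bits))


# Code tables quoted from Table V (symbols in decimal).
rate_1 = [0, 1]          # no SBC: 1-bit symbols
sbc_2_3 = [0, 3, 5, 6]   # rate-2/3 SBC[2, 3]

for name, table, n in (("rate-1", rate_1, 1), ("SBC[2,3]", sbc_2_3, 3)):
    print(name, min_hamming_distance(table), bit_determined_by_the_rest(table, n))
```

For the rate-1 table a single bit flip always lands on another legitimate symbol, so perfect a-priori information about the other bits cannot resolve the remaining one, whereas for the SBC[2, 3] table it always can.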

Having outlined the theoretical justification for achieving perfect convergence to an infinitesimally low BER, let us now introduce the proposed SBC\([K, N]\) encoding algorithms, which map or encode each \(K\)-bit symbol of the source set \(X\) to the \(N\)-bit code words of the SBC set \(f(X)\), while maintaining a minimum Hamming distance of \(d_{H,\min} \geq 2\). According to our SBC\([K, N]\) encoding procedure, the video-stream \(x_k\) is partitioned into \(M = 2^K\)-ary, or \(K\)-bit, source symbols, each of which has a different probability of occurrence and will alternatively be termed the information word to be encoded into \(N = (K + P)\) bits, where \(P\) represents the number of redundant bits per \(K\)-bit source symbol.

**Algorithm-I:**

For \(P = 1\), the redundant bit \(r_\tau\) is generated for the \(\tau\)-th \(M\)-ary source symbol by calculating the exclusive OR (XOR) function of its \(K\) constituent bits, as follows:

\[
r_\tau = b_\tau(1) \oplus b_\tau(2) \oplus \cdots \oplus b_\tau(K),
\]

where \(b_\tau(j)\) denotes the \(j\)-th bit of the \(\tau\)-th source symbol and \(\oplus\) represents the XOR operation.

The resultant redundant bit can be incorporated in any of the \([K + 1]\) different bit positions, in order to create \([K + 1]\) different legitimate SBC-encoded words, as presented in Table IV, each having a minimum Hamming distance of \(d_{H,\min} = 2\) from all the others. The encoded symbols of the rate-\(\frac{2}{3}\), \(\frac{3}{4}\), \(\frac{4}{5}\) and \(\frac{5}{6}\) SBCs along with their corresponding minimum Hamming distance \(d_{H,\min}\) are summarised in Table V for the specific case of incorporating the redundant bit \(r_\tau\) at the end of the \(\tau\)-th \(K\)-bit source symbol.
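Algorithm-I can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation; the function names and the `parity_pos` argument are hypothetical, and symbols are represented as integers with the most significant bit first.

```python
from itertools import combinations


def sbc_encode_algorithm_1(symbol, k, parity_pos=None):
    """Map a K-bit symbol to a (K+1)-bit word by inserting the XOR of its bits.

    parity_pos is the index (0 = leftmost, K = rightmost) at which the
    redundant bit is inserted; by default it is appended at the end.
    """
    bits = [(symbol >> (k - 1 - j)) & 1 for j in range(k)]  # MSB first
    parity = 0
    for b in bits:
        parity ^= b
    pos = k if parity_pos is None else parity_pos
    coded = bits[:pos] + [parity] + bits[pos:]
    value = 0
    for b in coded:
        value = (value << 1) | b
    return value


def sbc_table_algorithm_1(k, parity_pos=None):
    """All 2^K legitimate codewords, listed in decimal."""
    return [sbc_encode_algorithm_1(s, k, parity_pos) for s in range(2 ** k)]


def min_hamming_distance(codewords):
    return min(bin(a ^ b).count("1") for a, b in combinations(codewords, 2))


print(sbc_table_algorithm_1(2))  # SBC[2, 3] with the redundant bit at the end
print(min_hamming_distance(sbc_table_algorithm_1(4)))
```

With the redundant bit appended at the end, the tables generated for \(K = 2\) and \(K = 3\) coincide with the SBC\([2, 3]\) and SBC\([3, 4]\) rows of Table V.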

**Algorithm-II:**

For \(P = (m \times K)\) associated with \(m \geq 1\), we propose the corresponding SBC\([K, N]\)-encoding procedure, which results in a gradual increase of \(d_{H,\min}\) for the coded symbols upon increasing both \(K\) and \(N\) of the SBC\([K, N]\), while the code rate is fixed. This \(K\) to \(N\)-bit encoding method consists of two steps:

1) **STEP-1:**

First, \(I = [(m - 1) \times K]\) redundant bits \(r_{\tau}(i)\), for \(i = 1, 2, \ldots, I\), are appended to the \(\tau\)-th \(K\)-bit source symbol by repeating its \(K\) source coded bits \((m - 1)\) times, as shown in Table IV.

2) **STEP-2:**

In the second step the last set of \(K\) redundant bits \(r_{\tau}(k)\), for \(k = 1, 2, \ldots, K\), is generated by calculating the XOR function of the \(K\) source bits \(b_\tau(j)\), while setting \(b_\tau(j = k)\) equal to 0, yielding:

\[
r_\tau(k) = \bigoplus_{j = 1,\, j \neq k}^{K} b_\tau(j), \quad k = 1, 2, \ldots, K,
\]

as presented in Table IV, where \(\oplus\) represents the XOR operation.

Using this method a carefully controlled redundancy is imposed by the specific rate \(r = \frac{K}{N} = \frac{1}{1 + m}\) SBC\([K, N]\) to ensure that the resultant \(N\)-bit codewords exhibit a minimum Hamming distance of \(d_{H,\min} \geq 2\) between the \(M = 2^K\) legitimate codewords. This method also results in a gradual increase of the minimum Hamming distance \(d_{H,\min}\) of the coded symbols upon increasing both \(K\) and \(N\) of the SBC\([K, N]\) considered, as shown in Table V, until the maximum achievable \(d_{H,\min}\) is reached for the specific SBC coding rate.
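The two steps of Algorithm-II may be sketched as follows. This is an illustrative reading of the description above, not the authors' code; the function names are hypothetical, and \(m = 2\) is assumed by default, which gives the rate-\(\frac{1}{3}\) SBC\([K, 3K]\) family.

```python
from itertools import combinations


def sbc_encode_algorithm_2(symbol, k, m=2):
    """Map a K-bit symbol to N = K + m*K bits.

    STEP-1: the K source bits are followed by (m - 1) repetitions of themselves.
    STEP-2: K further bits are appended, the k-th being the XOR of all source
            bits except the k-th one.
    """
    bits = [(symbol >> (k - 1 - j)) & 1 for j in range(k)]  # MSB first
    total = 0
    for b in bits:
        total ^= b
    repeated = bits * m                   # source symbol plus (m - 1) copies
    parity = [total ^ b for b in bits]    # XOR with b(k) removed from the sum
    value = 0
    for b in repeated + parity:
        value = (value << 1) | b
    return value


def sbc_table_algorithm_2(k, m=2):
    return [sbc_encode_algorithm_2(s, k, m) for s in range(2 ** k)]


def min_hamming_distance(codewords):
    return min(bin(a ^ b).count("1") for a, b in combinations(codewords, 2))


for k in range(2, 6):
    print(f"SBC[{k}, {3 * k}]: d_min =", min_hamming_distance(sbc_table_algorithm_2(k)))
```

Under this reading the sketch reproduces the SBC\([2, 6]\) and SBC\([3, 9]\) symbol sets of Table V, and the minimum Hamming distance grows from 3 to 6 as \(K\) increases from 2 to 5 at a fixed rate of \(\frac{1}{3}\).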

**C. The Proposed Short Block Codes**

Let us now demonstrate the power of ISCD and the effect of our proposed SBCs on the performance of ISCD with the aid of a design example. The SBC\([K, N]\)-encoded symbols of the rate-\(\frac{2}{3}\), \(\frac{3}{4}\), \(\frac{4}{5}\) and \(\frac{5}{6}\) coding schemes generated using Algorithm-I and of the rate-\(\frac{1}{3}\) coding schemes generated using Algorithm-II are detailed in Table V, along with their corresponding minimum Hamming distances \(d_{H,\min}\). Again, as it becomes evident from Table V, the EXIT-chart optimised SBCs ensure that the encoded \(N\)-bit symbols exhibit a minimum Hamming distance of \(d_{H,\min} \geq 2\). Additionally, only \(2^K\) out of the \(2^N\) possible \(N\)-bit symbols are legitimate in the mapped source coded bit-stream, which exhibits a non-uniform probability of occurrence for the \(N\)-bit source symbols. Figure 5 depicts the EXIT characteristics of the USSPA scheme of Figure 4 using either the rate-1\(^{1}\) or the rate-\(<1\) SBC\([K, N]\) schemes of Algorithm-I and II shown in Table V. More specifically, the EXIT curves of the SBCs having a rate below unity do indeed reach the top right corner of the EXIT chart at \((I_A, I_E) = (1, 1)\) and hence result in an infinitesimally low BER. By contrast, the rate-1 SBC, i.e. the employment of no SBC, fails to do so. In conclusion, our simulation results recorded for the system presented in Figure 4 reveal that the performance of SBC strongly depends on the presence or absence of residual source redundancy, which typically manifests itself in the form of a non-uniform probability of occurrence for the \(N\)-bit source coded symbols.

The coding parameters of the different SBC schemes used in our design example are shown in Table VI(a). We considered a concatenated rate \(R = \frac{1}{4}\) RSC encoder with constraint length \(L = 4\) and generator sequences \(g_1 = [1011]\), \(g_2 = [1101]\), \(g_3 = [1101]\) and \(g_4 = [1111]\), represented as \(G = [1, g_2/g_1, g_3/g_1, g_4/g_1]\), where ‘1’ denotes the systematic output. The output of \(g_1\) is fed back to the input, while \(g_2\), \(g_3\) and \(g_4\) denote the feed-forward outputs of the RSC encoder. Observe from the table that an overall code-rate of \(R = \frac{1}{4}\) was maintained by adjusting the puncturing rate of the concatenated RSC in order to accommodate the different SBC rates of Table VI(a).
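The rate bookkeeping behind Table VI(a) is simple enough to check numerically. The sketch below assumes only that the overall rate is the product of the outer SBC rate \(K/N\) and the (punctured) inner RSC rate; the function name is hypothetical.

```python
from fractions import Fraction

OVERALL_RATE = Fraction(1, 4)  # overall code rate quoted in Table VI(a)


def required_rsc_rate(k, n, overall=OVERALL_RATE):
    """Inner RSC rate needed so that (K/N) * R_RSC equals the overall rate."""
    return overall / Fraction(k, n)


for k, n in [(2, 3), (3, 4), (4, 5), (5, 6), (2, 6), (3, 9), (4, 12), (5, 15)]:
    print(f"SBC[{k}, {n}]: SBC rate {Fraction(k, n)}, RSC rate {required_rsc_rate(k, n)}")
```

The resulting RSC rates (3/8, 1/3, 5/16 and 3/10 for Algorithm-I, and 3/4 for every rate-\(\frac{1}{3}\) Algorithm-II code) agree with the RSC column of Table VI(a).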

**D. EXIT Chart Analysis**

At the receiver seen in Figure 4, iterative soft-bit source and channel decoding is applied by exchanging extrinsic information between the receiver blocks, which has the capability of improving the achievable subjective video quality. EXIT charts were utilised to characterise the mutual information exchange between the input and output of both the inner and outer components of an iterative decoder and hence to analyse its decoding convergence behaviour. Additionally, the actual decoding trajectories acquired while using various SBCs generated using Algorithms-I and II are presented by recording the mutual information at the input and output of both the inner and outer decoder during the bit-by-bit Monte-Carlo simulation of the iterative SBC algorithm.

\(^{1}\)For the sake of using a unified terminology, we refer to the scheme using no SBC as the rate-1 SBC.

Figures 6 and 7 present the decoding trajectories recorded both at $E_b/N_0 = 0$ dB and $-1$ dB, when employing the rate-$\frac{2}{3}$ and $\frac{5}{6}$ SBCs of Algorithm-I as the outer code along with their corresponding rate-$\frac{3}{8}$ and $\frac{3}{10}$ RSCs, respectively. Observe from the decoding trajectories that the convergence behaviour of the SBCs considered degrades upon increasing their coding rate. Furthermore, the decoding trajectories obtained by employing the rate-$\frac{1}{3}$ outer SBCs of type SBC$[2, 6]$ and SBC$[5, 15]$, which were generated using Algorithm-II, together with the rate-$\frac{3}{4}$ inner RSC detailed in Table VI(a), were recorded at $E_b/N_0 = -4$ dB and $-4.5$ dB, as portrayed in Figures 8 and 9, respectively. It may be observed from the EXIT trajectories of Figures 8 and 9 that, as expected, the convergence behaviour of the SBCs improves upon increasing $d_{H,\min}$.
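The mutual information values that make up a decoding trajectory have to be estimated from the simulated LLRs. The paper does not state which estimator was used; the sketch below assumes the widely used time-average estimate \(I \approx 1 - E\{\log_2(1 + e^{-x L})\}\), where \(x = \pm 1\) is the transmitted bit and \(L\) its LLR, and it feeds the estimator with synthetic Gaussian LLRs purely for illustration. All names are hypothetical.

```python
import math
import random


def _log2_one_plus_exp(z):
    """Numerically stable log2(1 + exp(z))."""
    return (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / math.log(2.0)


def mutual_information(llrs, bits):
    """Estimate I(bit; LLR) as 1 - mean(log2(1 + exp(-x * L))),
    with x = +1 for bit 0 and x = -1 for bit 1 (L = ln P(0)/P(1))."""
    total = 0.0
    for llr, bit in zip(llrs, bits):
        x = 1.0 if bit == 0 else -1.0
        total += _log2_one_plus_exp(-x * llr)
    return 1.0 - total / len(bits)


def gaussian_llrs(bits, sigma, rng):
    """Synthetic consistent Gaussian LLRs: mean x * sigma^2 / 2, variance sigma^2."""
    return [(1.0 if b == 0 else -1.0) * sigma ** 2 / 2.0 + sigma * rng.gauss(0.0, 1.0)
            for b in bits]


rng = random.Random(1)
bits = [rng.randint(0, 1) for _ in range(20000)]
for sigma in (1.0, 3.0, 8.0):
    info = mutual_information(gaussian_llrs(bits, sigma, rng), bits)
    print(f"sigma = {sigma}: I = {info:.3f}")
```

In a simulation, one such estimate would be taken at the input and output of each constituent decoder per iteration; the estimator is only trustworthy when the decoder outputs true LLRs.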

![EXIT characteristics of the USSPA scheme of Figure 4 using the different SBCs of Table V.](image)
### Table V

**Different SBCs with Corresponding Symbols and Minimum Hamming Distances \( d_{H,\min} \)**

<table>
<thead>
<tr>
<th>SBC Type</th>
<th>Symbols in Decimal</th>
<th>\( d_{H,\min} \)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Rate-1 SBC</td>
<td>{0, 1}</td>
<td>1</td>
</tr>
<tr>
<td>Rate-\( \frac{2}{3} \) SBC[2, 3]</td>
<td>{0, 3, 5, 6}</td>
<td>2</td>
</tr>
<tr>
<td>Rate-\( \frac{3}{4} \) SBC[3, 4]</td>
<td>{0, 3, 5, 6, 9, 10, 12, 15}</td>
<td>2</td>
</tr>
<tr>
<td>Rate-\( \frac{4}{5} \) SBC[4, 5]</td>
<td>{0, 3, 5, 6, 9, 10, 12, 15, 17, 18, 20, 23, 24, 27, 29, 30}</td>
<td>2</td>
</tr>
<tr>
<td>Rate-\( \frac{5}{6} \) SBC[5, 6]</td>
<td>{0, 3, 5, 6, 9, 10, 12, 15, 17, 18, 20, 23, 24, 27, 29, 30, 33, 34, 36, 39, 40, 43, 45, 46, 48, 51, 53, 54, 57, 58, 60, 63}</td>
<td>2</td>
</tr>
<tr>
<td>Rate-\( \frac{1}{3} \) SBC[2, 6]</td>
<td>{0, 22, 41, 63}</td>
<td>3</td>
</tr>
<tr>
<td>Rate-\( \frac{1}{3} \) SBC[3, 9]</td>
<td>{0, 78, 149, 219, 291, 365, 438, 504}</td>
<td>4</td>
</tr>
<tr>
<td>Rate-\( \frac{1}{3} \) SBC[4, 12]</td>
<td>{0, 208, 357, 519, 681, 843, 1005, 1167}</td>
<td>5</td>
</tr>
<tr>
<td>Rate-\( \frac{1}{3} \) SBC[5, 15]</td>
<td>{0, 1, 2, 4, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150}</td>
<td>6</td>
</tr>
</tbody>
</table>

### Table VI

**Code Rates and Systems Parameters Used in the Schematic of Figure 4**

(a) Code rates for Different Error Protection schemes

<table>
<thead>
<tr>
<th rowspan="2">Scheme</th>
<th rowspan="2">SBC Type</th>
<th colspan="3">Code Rate</th>
</tr>
<tr>
<th>RSC</th>
<th>SBC</th>
<th>Overall</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>Rate-1 SBC</td>
<td>1/4</td>
<td>1</td>
<td>1/4</td>
</tr>
<tr>
<td rowspan="4">Algorithm-I</td>
<td>Rate-\( \frac{2}{3} \) SBC[2, 3]</td>
<td>3/8</td>
<td>2/3</td>
<td>1/4</td>
</tr>
<tr>
<td>Rate-\( \frac{3}{4} \) SBC[3, 4]</td>
<td>1/3</td>
<td>3/4</td>
<td>1/4</td>
</tr>
<tr>
<td>Rate-\( \frac{4}{5} \) SBC[4, 5]</td>
<td>5/16</td>
<td>4/5</td>
<td>1/4</td>
</tr>
<tr>
<td>Rate-\( \frac{5}{6} \) SBC[5, 6]</td>
<td>3/10</td>
<td>5/6</td>
<td>1/4</td>
</tr>
<tr>
<td rowspan="4">Algorithm-II</td>
<td>Rate-\( \frac{1}{3} \) SBC[2, 6]</td>
<td>3/4</td>
<td>1/3</td>
<td>1/4</td>
</tr>
<tr>
<td>Rate-\( \frac{1}{3} \) SBC[3, 9]</td>
<td>3/4</td>
<td>1/3</td>
<td>1/4</td>
</tr>
<tr>
<td>Rate-\( \frac{1}{3} \) SBC[4, 12]</td>
<td>3/4</td>
<td>1/3</td>
<td>1/4</td>
</tr>
<tr>
<td>Rate-\( \frac{1}{3} \) SBC[5, 15]</td>
<td>3/4</td>
<td>1/3</td>
<td>1/4</td>
</tr>
</tbody>
</table>

(b) Systems parameters used in the schematic of Figure 4

<table>
<thead>
<tr>
<th>System Parameters</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Video Bit Rate (Kbps)</td>
<td>64</td>
</tr>
<tr>
<td>Video Frame Rate (fps)</td>
<td>15</td>
</tr>
<tr>
<td>Channel Coded Rate (Kbps)</td>
<td>256</td>
</tr>
<tr>
<td>Baud-rate (Kbps)</td>
<td>128</td>
</tr>
<tr>
<td>Channel Coding</td>
<td>RSC</td>
</tr>
<tr>
<td>Over-all Code Rate</td>
<td>1/4</td>
</tr>
<tr>
<td>Code Memory</td>
<td>4</td>
</tr>
<tr>
<td>Generator Polynomials</td>
<td>\( G_1, G_2, G_3, G_4 \)</td>
</tr>
<tr>
<td>Modulation Scheme</td>
<td>QPSK</td>
</tr>
<tr>
<td>Number of Transmitters, \( N_t \)</td>
<td>1</td>
</tr>
<tr>
<td>Number of Receivers, \( N_r \)</td>
<td>1</td>
</tr>
<tr>
<td>Channel</td>
<td>Correlated Rayleigh Fading</td>
</tr>
<tr>
<td>Normalised Doppler Frequency</td>
<td>0.01</td>
</tr>
<tr>
<td>Interleaver Length</td>
<td>\( \approx (64000/15) \)</td>
</tr>
<tr>
<td>Number of System Iterations, \( I_t \)</td>
<td>10</td>
</tr>
</tbody>
</table>
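The rate-related entries of Table VI(b) are linked by simple arithmetic, which the following sketch makes explicit. It assumes that the channel coded rate is the video bit rate divided by the overall code rate, that QPSK carries two bits per symbol, and that the interleaver spans the source bits of one video frame; the variable names are hypothetical.

```python
from fractions import Fraction

video_bit_rate = 64_000             # bit/s, from Table VI(b)
frame_rate = 15                     # frames/s, from Table VI(b)
overall_code_rate = Fraction(1, 4)  # from Table VI(b)
bits_per_qpsk_symbol = 2

channel_coded_rate = video_bit_rate / overall_code_rate       # coded bit/s
symbol_rate = channel_coded_rate / bits_per_qpsk_symbol       # symbols/s
source_bits_per_frame = Fraction(video_bit_rate, frame_rate)  # bits per video frame

print("channel coded rate:", channel_coded_rate, "bit/s")
print("symbol rate:", symbol_rate, "symbols/s")
print("source bits per frame: about", round(source_bits_per_frame))
```

These values match the 256 Kbps channel coded rate, the 128 K symbols/s baud-rate and the interleaver length of approximately 64000/15 listed in the table.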

### E. System Performance Results

In this section we present our BER and video performance results for the proposed system. The video source signal provided as input for our proposed systems was detailed in Section I-B and the key parameters were presented in Table VII. The remaining system parameters are listed in Table VI(b).

Since hand-held videophones have to have a low complexity, we limited the number of iterations between the RSC and SBC decoders to \( I_t = 10 \), when using a rate-1 SBC — i.e. no SBC. Similarly, we used \( I_t = 10 \) iterations, when applying SBCs having a rate below unity. For the sake of increasing the confidence in our results, we repeated each 45-frame experiment 160 times and averaged the results generated. A range of different SBCs generated using our proposed Algorithms-I and II are given in Table V, which are used as the outer codes of Figure 4 in order to evaluate their achievable system performance improvements. We evaluated the performance of our proposed system by keeping the same overall code rate as well as video rate for the different error protection schemes considered.

Figure 10 presents the performance of the various rate-\( \frac{2}{3} \), \( \frac{3}{4} \), \( \frac{4}{5} \) and \( \frac{5}{6} \) SBCs along with the rate-\( \frac{1}{3} \) SBC based error protection schemes of Table V in terms of the attainable BER, while their comparison with the rate-1 SBC based scheme is offered in Figure 12.

TABLE VII. Key parameters of the video source (surviving entries: "Akiyo" test sequence, QCIF format, H.264/AVC codec, JM 13.2).

Fig. 8. The EXIT chart and simulated decoding trajectories of the SBC[2, 6] scheme of Figure 4 using the parameters of Table VI, at \(E_b/N_0 = -4\) and \(-4.5\) dB.

Fig. 9. The EXIT chart and simulated decoding trajectories of the SBC[5, 15] scheme of Figure 4 using the parameters of Table VI, at \(E_b/N_0 = -4\) and \(-4.5\) dB.

Additionally, the performance trends expressed in terms of the PSNR versus \(E_b/N_0\) curves are portrayed in Figures 11 and 13. It may be observed in Figure 11 that the SBC[5, 15] scheme having \(d_{H,\min} = 6\) provides the best PSNR performance among the eight different SBC schemes of Table V across the entire \(E_b/N_0\) region considered. Furthermore, observe from Figure 11 that among the SBCs generated using Algorithm-I, the lowest-rate, i.e. rate-\(\frac{2}{3}\), outer SBC combined with the rate-\(\frac{3}{8}\) inner RSC results in the best PSNR performance, outperforming the rate-\(\frac{3}{4}\), \(\frac{4}{5}\) and \(\frac{5}{6}\) SBCs. It may also be observed in Figure 13 that using the rate-1 outer SBC and the rate-\(\frac{1}{4}\) inner RSC results in a worse PSNR performance than the outer SBCs having a less than unity rate combined with the corresponding inner RSC of Table VI(a), while maintaining the same overall code rate. Quantitatively, using the SBCs of Table V having a rate lower than 1, an additional \(E_b/N_0\) gain of up to 25 dB may be achieved over the rate-1 SBC at the PSNR degradation point of 1 dB.

Finally, the achievable subjective video qualities of the video telephone schemes utilising various types of SBCs generated using Algorithm-I and II are presented in Figures 14 and 15, respectively. In order to have a fair subjective video quality comparison, we averaged both the luminance and chrominance components of the decoded video test sequence 30 times for each type of setup. The achievable subjective video quality recorded at the channel $E_b/N_0$ value of 0.5 dB using the rate-$\frac{2}{3}$, $\frac{3}{4}$, $\frac{4}{5}$ and $\frac{5}{6}$ SBCs based on Algorithm-I may be seen in Figure 14. Observe from Figure 14 that the achievable video quality improves upon decreasing the SBC code rate. Similarly, Figure 15 presents the subjective video quality obtained at (from left to right) $E_b/N_0 = -4.1$ dB, $-3.9$ dB, $-3.0$ dB and $-2.1$ dB using rate-$\frac{1}{3}$ SBCs of the type (from top to bottom) SBC$[2, 6]$, SBC$[3, 9]$, SBC$[4, 12]$ and SBC$[5, 15]$. Observe from Figure 15 that a nearly unimpaired quality is obtained for the rate-$\frac{1}{3}$ SBCs having (from top) $d_{H,min} = 3$, $4$, $5$ and $6$ at $E_b/N_0$ values of $-2.1$ dB, $-3.0$ dB, $-3.9$ dB and $-4.1$ dB, respectively. This implies that the subjective video quality of the system improves upon increasing $d_{H,min}$ of the SBCs employed.

IV. PERFORMANCE IMPROVEMENT OF SBCs USING REDUNDANT SOURCE MAPPING

In this section, we will analyse the performance improvement of our proposed Algorithm-I SBC coding attained by introducing redundancy using our proposed Redundant Source Mapping (RSM) algorithm.

A. Redundant Source Mapping Assisted Iterative Source-Channel Decoding

The RSM was used to introduce redundancy in the source coded streams by transforming the symbols generated using Algorithm-I in a systematic way, which results in further increase in the $d_{H,min}$ of the generated symbols. Additionally, RSM also increases the EXIT chart flexibility of the outer code, which results in iterative convergence at further reduced $E_b/N_0$ values relative to the corresponding SBC coding. The proposed RSM coding algorithm is described below.

1) RSM Coding Algorithm: In order to further decrease the SBC coding rate of Algorithm-I described in Section III-B and to increase its minimum Hamming distance $d_{H,min}$, we introduce an RSM coding algorithm, in which $N$ additional bits are concatenated to the bits encoded according to Algorithm-I by repeating the same coded bits in reverse order, which results in a $K$ to $(2 \times N)$-bit mapping, where we have $N = (K + 1)$, as depicted in Table VIII.
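The RSM mapping described above can be sketched as follows. This is an illustration under the stated reading (Algorithm-I with the redundant bit appended at the end, followed by the same $N$ bits in reverse order), not the authors' code, and the function names are hypothetical.

```python
from itertools import combinations


def rsm_encode(symbol, k):
    """Map a K-bit symbol to 2*(K+1) bits: the Algorithm-I codeword followed
    by the same coded bits in reverse order."""
    bits = [(symbol >> (k - 1 - j)) & 1 for j in range(k)]  # MSB first
    parity = 0
    for b in bits:
        parity ^= b
    sbc_word = bits + [parity]       # Algorithm-I, redundant bit at the end
    coded = sbc_word + sbc_word[::-1]
    value = 0
    for b in coded:
        value = (value << 1) | b
    return value


def rsm_table(k):
    return [rsm_encode(s, k) for s in range(2 ** k)]


def min_hamming_distance(codewords):
    return min(bin(a ^ b).count("1") for a, b in combinations(codewords, 2))


for k in range(2, 6):
    print(f"RSM[{k}, {2 * (k + 1)}]: d_min =", min_hamming_distance(rsm_table(k)))
```

Because every Algorithm-I codeword pair differs in at least two positions, appending the mirrored copy doubles that to a minimum Hamming distance of 4 for every $K$ considered here.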

Let us now demonstrate the power of RSM with the aid of a design example. The various SBC symbols along with the corresponding RSM symbols generated by applying the proposed RSM$[K,N]$ encoding schemes, together with their corresponding $d_{H,min}$, are summarised in Table IX. Again, as it becomes evident from Table IX, the EXIT-chart optimised RSM also ensures that the mapped symbols exhibit $d_{H,min} \geq 2$. Additionally, only $2^K$ out of the $2^{(N \times 2)}$ possible $(N \times 2)$-bit symbols of the RSM are legitimate in the mapped source coded bit-stream, where we have $N = (K + 1)$, which exhibits a non-uniform probability of occurrence for the mapped source symbols. Figure 16 portrays the EXIT characteristics of Figure 4 using either the rate-1 RSM$^{2}$ or the rate $< 1$ SBC and RSM schemes shown in Table IX. More specifically, the EXIT curves of the rate $< 1$ SBC and RSM schemes do indeed reach the top right corner of the EXIT chart at $(I_A, I_E) = (1, 1)$ and hence result in an infinitesimally low BER. By contrast, the scheme using a rate-1 RSM, i.e. no RSM, fails to do so.

$^{2}$For the sake of using a unified terminology, we refer to the scheme using no RSM as the rate-1 RSM.

TABLE IX

\begin{table}[h]
\centering
\begin{tabular}{|c|c|c|}
\hline
Mapping Type & Symbols in Decimal & $d_{H,min}$ \\
\hline
Rate-1 RSM & \{0, 1\} & 1 \\
Rate-$\frac{2}{3}$ SBC[2,3] & \{0, 3, 5, 6\} & 2 \\
Rate-$\frac{3}{4}$ SBC[3,4] & \{0, 3, 5, 6, 9, 10, 12, 15\} & 2 \\
Rate-$\frac{4}{5}$ SBC[4,5] & \{0, 3, 5, 6, 9, 10, 12, 15, 17, 18, 20, 23, 24, 27, 29, 30\} & 2 \\
Rate-$\frac{5}{6}$ SBC[5,6] & \{0, 3, 5, 6, 9, 10, 12, 15, 17, 18, 20, 23, 24, 27, 29, 30, 33, 34, 36, 39, 40, 43, 45, 46, 48, 51, 53, 54, 57, 58, 60, 63\} & 2 \\
\hline
Rate-$\frac{1}{3}$ RSM[2,6] & \{0, 30, 45, 51\} & 4 \\
Rate-$\frac{3}{8}$ RSM[3,8] & \{0, 60, 90, 102, 153, 165, 195, 255\} & 4 \\
Rate-$\frac{2}{5}$ RSM[4,10] & \{0, 120, 180, 204, 306, 330, 390, 510, 561, 585, 645, 765, 771, 891, 951, 975\} & 4 \\
Rate-$\frac{5}{12}$ RSM[5,12] & \{0, 240, 360, 408, 612, 660, 780, 1020, 1122, 1170, 1290, 1530, 1542, 1782, 1902, 1950, 2145, 2193, 2313, 2553, 2565, 2805, 2925, 2973, 3075, 3315, 3435, 3483, 3687, 3735, 3855, 4095\} & 4 \\
\hline
\end{tabular}
\caption{Different RSM schemes with corresponding symbols and $d_{H,min}$}
\end{table}

The actual decoding trajectories of the various error protection arrangements employing the different SBC schemes of Algorithm-I along with their corresponding RSM schemes, as well as the respective constituent inner RSCs detailed in Table X(a), were recorded at $E_b/N_0 = 0$ dB and $-1$ dB as well as at $E_b/N_0 = -3.0$ dB and $-3.5$ dB, respectively, as portrayed in Figures 17 and 18. These trajectories were recorded by acquiring the mutual information at the input and output of both the inner and outer decoder during the bit-by-bit Monte-Carlo simulation of the iterative soft-bit source and channel decoding algorithm. It may be inferred from the EXIT trajectories of Figures 17 and 18 that, as expected, the convergence behaviour of the SBC Algorithm-I improves upon invoking the RSM coding, owing to incorporating additional redundancy and hence improving $d_{H,min}$.

B. System Performance Results

In this section, we present the performance results of our proposed system seen in Figure 4 and using the RSM of Table IX. For the performance evaluation of our proposed RSM, we used the “Akiyo” video sequence described in Section III-E. Moreover, due to the limited residual redundancy inherent in the source encoded bit-stream and for the sake of reducing the computational complexity imposed, we limited the number of iterations between the RSC and outer decoders to $I_t = 5$, when using a rate-1 RSM, i.e. no RSM. By contrast, we used $I_t = 10$ iterations, when applying RSM schemes having a rate below unity. For the sake of increasing the confidence in our results, again, we repeated each 45-frame experiment 160 times and averaged the generated results. Additionally, the performance of our proposed system was evaluated by keeping the same overall code rate as well as video rate for the different error protection schemes considered.

The coding parameters of the different SBC and RSM schemes used in our design example are shown in Table X(a). We considered a concatenated rate $R = \frac{1}{4}$ RSC encoder with constraint length $L = 4$ and generator sequences $g_1 = [1011]$, $g_2 = [1101]$, $g_3 = [1101]$ and $g_4 = [1111]$, represented as $G = [1, g_2/g_1, g_3/g_1, g_4/g_1]$, where ’1’ denotes the systematic output, the output of $g_1$ is fed back to the input and $g_2$, $g_3$, $g_4$ denote the feed-forward outputs of the RSC encoder. Observe from Table X(a) that an overall code-rate of $R = \frac{1}{4}$ was maintained by adjusting the puncturing rate of the concatenated RSC in order to accommodate the different RSM rates of Table IX, while keeping the overall bit-rate budget constant.

Figure 19 presents the performance of the various RSM based error protection schemes of Table IX in terms of the attainable BER along with the AWGN channel’s ‘best-case’ performance curve, while their comparison with the rate-1 RSM based schemes is offered in Figure 21. The performance trends expressed in terms of the PSNR versus $E_b/N_0$ curves are portrayed in Figures 20 and 22 along with the AWGN channel’s reference performance curves. It may be observed in Figure 20 that the RSM scheme having the lowest coding rate among the different RSM schemes of Table X(a) provides the best PSNR performance across the entire $E_b/N_0$ region considered. It may also be observed in Figure 22 that using a rate-1 outer RSM in conjunction with the rate-$\frac{1}{4}$ inner RSC results in a worse PSNR performance than the RSM schemes having a rate below unity combined with their respective inner RSCs at the same overall code rate of $\frac{1}{4}$, as mentioned in Table X(a). Quantitatively, using the RSMs of Table IX having a rate lower than 1, an additional $E_b/N_0$ gain of up to 20 dB may be achieved over the rate-1 RSM.

Fig. 14. Subjective video quality of the 45\textsuperscript{th} “Akiyo” video sequence frame using (from left) Rate-$\frac{2}{3}$, $\frac{3}{4}$, $\frac{4}{5}$ and $\frac{5}{6}$ SBCs of Algorithm-I summarised in Table V at $E_b/N_0=0.5$ dB.

Fig. 15. Subjective video quality of the 45\textsuperscript{th} “Akiyo” video sequence frame using Rate-$\frac{1}{3}$ SBCs of type (from top) SBC[2, 6], SBC[3, 9], SBC[4, 12] and SBC[5, 15] of Algorithm-II summarised in Table V at (from left) $E_b/N_0=-4.1$ dB, -3.9 dB, -3 dB and -2.1 dB.
TABLE X
CODE RATES AND SYSTEMS PARAMETERS USED IN THE SCHEMATIC OF FIGURE 4

(a) Code rates for Different Error Protection schemes

<table>
<thead>
<tr>
<th>Error Protection Scheme</th>
<th>RSC Code Rate</th>
<th>Overall Code Rate</th>
</tr>
</thead>
<tbody>
<tr>
<td>Rate-1 RSM</td>
<td>1/4</td>
<td>1/4</td>
</tr>
<tr>
<td>SBC[2,3]</td>
<td>3/8</td>
<td>1/4</td>
</tr>
<tr>
<td>SBC[3,4]</td>
<td>1/3</td>
<td>1/4</td>
</tr>
<tr>
<td>SBC[4,5]</td>
<td>5/16</td>
<td>1/4</td>
</tr>
<tr>
<td>SBC[5,6]</td>
<td>3/10</td>
<td>1/4</td>
</tr>
<tr>
<td>RSM[2,6]</td>
<td>3/4</td>
<td>1/4</td>
</tr>
<tr>
<td>RSM[3,8]</td>
<td>2/3</td>
<td>1/4</td>
</tr>
<tr>
<td>RSM[4,10]</td>
<td>5/8</td>
<td>1/4</td>
</tr>
<tr>
<td>RSM[5,12]</td>
<td>3/5</td>
<td>1/4</td>
</tr>
</tbody>
</table>

(b) Systems parameters used in the schematic of Figure 4

<table>
<thead>
<tr>
<th>System Parameters</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Video Bit Rate (Kbps)</td>
<td>64</td>
</tr>
<tr>
<td>Video Frame Rate (fps)</td>
<td>15</td>
</tr>
<tr>
<td>Channel Coded Rate (Kbps)</td>
<td>256</td>
</tr>
<tr>
<td>Baud-rate (Kbps)</td>
<td>128</td>
</tr>
<tr>
<td>Channel Coding</td>
<td>RSC</td>
</tr>
<tr>
<td>Over-all Code Rate</td>
<td>1/4</td>
</tr>
<tr>
<td>Code Memory</td>
<td>4</td>
</tr>
<tr>
<td>Generator Polynomials</td>
<td>$G_1, G_2, G_3, G_4$</td>
</tr>
<tr>
<td>Modulation Scheme</td>
<td>QPSK</td>
</tr>
<tr>
<td>Number of Transmitters, $N_t$</td>
<td>1</td>
</tr>
<tr>
<td>Number of Receivers, $N_r$</td>
<td>1</td>
</tr>
<tr>
<td>Channel</td>
<td>Correlated Rayleigh Fading</td>
</tr>
<tr>
<td>Normalised Doppler Frequency</td>
<td>0.01</td>
</tr>
<tr>
<td>Interleaver Length</td>
<td>$\approx (64000/15)$</td>
</tr>
<tr>
<td>Number of System Iterations, $I_t$</td>
<td>10</td>
</tr>
</tbody>
</table>

Finally, the subjective video quality achieved by the proposed SBC error protection schemes using Algorithm-I was recorded in Figure 23 at the channel $E_b/N_0$ value of 0.5 dB. The corresponding results achieved by RSM coding at an $E_b/N_0$ value of $-3.0$ dB are presented in Figure 24. In order to have a pertinent subjective video quality comparison, the video frames presented in Figure 23 and Figure 24 were obtained by repeating the transmission of the video sequence 30 times using the proposed system architecture of Figure 4. Observe from Figure 23 that the achievable video quality improves upon decreasing the code rate of the Algorithm-I SBCs. Additionally, it is clear from Figure 24 that the employment of the RSM scheme provides an improved video quality at a 3.5 dB lower $E_b/N_0$ value than the various Algorithm-I SBC schemes presented in Figure 23.

V. SBC ASSISTED UNEQUAL ERROR PROTECTION VIDEO USING RSC CODES AND SP-MODULATED DSTS

In this system SBCs are applied to each type of H.264/AVC source coded stream. Following RSC encoding, the SBC coded stream is transmitted using an SP modulation aided DSTS transceiver. As described in Section III-C, SBCs assist the outer decoder’s EXIT curve in reaching the \((1,1)\) point of the EXIT chart. Hence SBCs may also be referred to as EXIT Chart Optimised Short Block Codes (EOSBCs), which constitute low-complexity block codes.

A. Sphere Packing Modulation Based Orthogonal Design

Consider two complex modulated symbols \((x_1, x_2)\) to be transmitted from two transmit antennas in \(T = 2\) time slots. It was shown in [82] that the diversity product quantifying the coding advantage of an orthogonal transmit-diversity scheme is determined by the minimum Euclidean distance of the vectors \((x_1, x_2)\). Let us assume that the signal that has to be transmitted by the two antennas in two consecutive time slots consists of $L$ legitimate space-time signals $G_2(x_{l,1}, x_{l,2})$, $l = 0, 1, \cdots, L - 1$, where $L$ represents the number of SP modulated symbols. For example, when jointly transmitting a pair of QPSK symbols, we need $L=16$ SP signals. The aim of SP modulation is to design $x_{l,1}$ and $x_{l,2}$ jointly, so that they provide the best minimum Euclidean distance with respect to the remaining $(L-1)$ legitimate transmitted space-time signals in order to improve the system’s error resilience [83].
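As a minimal sketch of the quantity that SP modulation seeks to maximise, the following Python fragment represents each complex pair $(x_{l,1}, x_{l,2})$ as a four-dimensional real-valued vector and evaluates the minimum Euclidean distance of a candidate set. The 16-point set formed by all pairs of QPSK symbols is used here purely as a reference constellation, not as an SP design, and the function names are illustrative assumptions rather than being taken from [83].

```python
import itertools
import math

def to_real_vector(x1, x2):
    """Represent the complex pair (x1, x2) as a 4-D real-valued vector."""
    return (x1.real, x1.imag, x2.real, x2.imag)

def min_euclidean_distance(points):
    """Smallest pairwise Euclidean distance within a set of real vectors."""
    return min(math.dist(p, q) for p, q in itertools.combinations(points, 2))

# Reference set: all 16 pairs of unit-energy QPSK symbols (not an SP design).
qpsk = [complex(a, b) / math.sqrt(2) for a in (1, -1) for b in (1, -1)]
pairs = [to_real_vector(x1, x2) for x1 in qpsk for x2 in qpsk]

d_min = min_euclidean_distance(pairs)
print(len(pairs), round(d_min, 4))
```

Any candidate SP constellation having $L = 16$ points and the same average energy may be passed to `min_euclidean_distance` in order to compare it against this reference.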

B. Near Capacity Differential Space Time Spreading

Space-time coding constitutes an effective diversity technique of compensating for the effects of wireless channels by exploiting the independent fading of the two antennas’ signals. The goal of space-time coding is to achieve a substantial diversity and power gain relative to its single-input and single-output counterpart, which is achieved without any bandwidth expansion. There are numerous coding structures that are capable of achieving diversity, including the simple STBC proposed by Alamouti [84]. Similarly, Hochwald et al. [85] proposed the transmit diversity concept known as STS for the downlink of Wideband Code Division Multiple Access (WCDMA) systems. However, these STBC and STS techniques use coherent detection and require channel estimation at the receiver, which is acquired by transmitting known training symbols. As mentioned above, channel estimation increases both the cost and complexity of the receiver. More specifically, when the channel experiences fast fading, a commensurately increased number of training symbols has to be transmitted, which results in a high transmission overhead and wastage of transmission power. Relative to these techniques, DSTS constitutes a low-complexity MIMO-aided technique that does not require channel estimation [86]. However, this low complexity is achieved at the cost of a 3 dB performance loss in comparison to the more complex coherent receivers. The DSTS encoder is composed of two primary stages, consisting of differential encoding and space-time spreading, as shown in Figure 25. The mapped symbols of Figure 25 are first differentially encoded and then they are spread using STS.

According to the DSTS encoding algorithm, at time \( t = 0 \) the arbitrary dummy reference symbols \( v_1^0 \) and \( v_2^0 \), carrying no information, are passed to the STS encoder in order to obtain the differentially encoded symbols \( v_1^t \) and \( v_2^t \) of Figure 25, as detailed in [87]. The differentially encoded symbols \( v_1^t \) and \( v_2^t \) are then spread using the orthogonal spreading codes \( c_1 \) and \( c_2 \) into \( y_1^t \) and \( y_2^t \), as explained in [86].
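The two encoder stages may be illustrated by the following Python sketch. The exact encoding rules are specified in [86, 87] and are not reproduced in this section; the differential recursion and the twin-antenna spreading rule used below are one commonly quoted form and should be treated as assumptions, as should the function names, the choice of reference symbols and the example QPSK inputs.

```python
import math

def differential_encode(x1, x2, v1_prev, v2_prev):
    """One assumed differential-encoding step producing (v1_t, v2_t)."""
    norm = math.sqrt(abs(v1_prev) ** 2 + abs(v2_prev) ** 2)
    v1 = (x1 * v1_prev + x2 * v2_prev.conjugate()) / norm
    v2 = (x1 * v2_prev - x2 * v1_prev.conjugate()) / norm
    return v1, v2

def sts_spread(v1, v2, c1, c2):
    """Assumed twin-antenna STS: chip sequences of antenna 1 and antenna 2."""
    scale = 1 / math.sqrt(2)
    y1 = [scale * (a * v1 + b * v2.conjugate()) for a, b in zip(c1, c2)]
    y2 = [scale * (a * v2 - b * v1.conjugate()) for a, b in zip(c1, c2)]
    return y1, y2

# Two orthogonal length-8 Walsh codes (spreading factor of 8).
c1 = [1, 1, 1, 1, 1, 1, 1, 1]
c2 = [1, -1, 1, -1, 1, -1, 1, -1]

# Arbitrary dummy reference symbols at t = 0, carrying no information.
v1_ref, v2_ref = complex(1, 0), complex(1, 0)

# Illustrative pair of unit-energy QPSK symbols to be conveyed at t = 1.
x1 = complex(1, 1) / math.sqrt(2)
x2 = complex(1, -1) / math.sqrt(2)

v1, v2 = differential_encode(x1, x2, v1_ref, v2_ref)
y1, y2 = sts_spread(v1, v2, c1, c2)
```

Under these assumptions the normalisation keeps the energy of $(v_1^t, v_2^t)$ equal to that of $(x_1, x_2)$, while correlating $y_1^t$ with $c_1$ recovers a scaled version of $v_1^t$ owing to the orthogonality of the two spreading codes.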

C. System Overview

The schematic of the proposed system architecture is presented in Figure 26. We considered the H.264 coded stream of Section I-B as the input of the schematic of Figure 26. More specifically, at the transmitter side, the video sequence is compressed using the H.264/AVC video codec. Then, the output bit-stream representing a video frame consisting of \( B \) source coded bits \( x_i \), \( i = 1, 2, \ldots, B \), is de-multiplexed into three different bit-partitions, namely Stream A, Stream B and Stream C, containing the sequentially concatenated partitions of type A, B and C of all the slices per frame, respectively. The de-multiplexer’s binary output sequences \( x_a \), \( x_b \) and \( x_c \), where we have \( a = 1, 2, \ldots, b_a \), \( b = 1, 2, \ldots, b_b \), \( c = 1, 2, \ldots, b_c \) and \( B = b_a + b_b + b_c \), are then encoded to the bit-strings \( x'_a \), \( x'_b \) and \( x'_c \) using a SBC.
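The de-multiplexing step may be sketched as follows in Python. The representation of a frame as a list of slices, each holding its A, B and C partitions as bit lists, is a hypothetical data layout chosen for illustration and does not reflect the H.264 bit-stream syntax.

```python
def demultiplex(slices):
    """Concatenate the type A, B and C partitions over all slices of a frame."""
    streams = {"A": [], "B": [], "C": []}
    for one_slice in slices:
        for name in streams:
            streams[name].extend(one_slice[name])
    return streams

# Hypothetical frame consisting of two slices with short partitions.
frame = [
    {"A": [1, 0, 1], "B": [0, 0], "C": [1]},
    {"A": [0, 1], "B": [1, 1, 0], "C": [0, 0]},
]
streams = demultiplex(frame)
b_a, b_b, b_c = (len(streams[name]) for name in ("A", "B", "C"))
total_bits = b_a + b_b + b_c  # B = b_a + b_b + b_c
```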

The different H.264 coded bit-partitions have different sensitivities to channel-induced errors. Therefore, by appropriately allocating the available bit-rate budget to the different portions of the bit-stream, UEP can be provided, which may result in a performance improvement for the system relative to the conventional equal protection scheme. In this system configuration UEP is provided for the H.264 coded video stream by applying different-rate RSC codes to the three different types of H.264 bit-stream partitions.

In our system setup we use a rate-\( \frac{3}{4} \) SBC. The resultant 3-bit to 4-bit SBC coding is shown in Table IX. Subsequently, the SBC-encoded bit-strings are interleaved using the bit-interleavers \( \Pi \) of Figure 26 into the interleaved sequences \( \bar{x}_a, \bar{x}_b \) and \( \bar{x}_c \), as shown in Figure 26, and are encoded by RSC codes having different code-rates, as summarised in Table XI(a). In the iterative decoder of Figure 26, interleaving and de-interleaving are necessary for ensuring that the bits are input to the constituent decoders in their expected original order, while maintaining the statistical independence of the extrinsic LLRs. Since the degree of statistical independence guaranteed by an interleaver is always related to its length [80], concatenation of the bits generated by the MBs of a slice within a given partition results in a longer interleaver without extending the video delay and hence improves the achievable performance of iterative decoding.
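The role of the interleaver and de-interleaver pair may be illustrated by the following Python sketch, which assumes a seeded pseudo-random permutation and a length of one video frame at 64 kbps and 15 frames/s. The function names and seeds are illustrative; the same permutation applies to hard bits at the transmitter and to soft LLR values at the receiver.

```python
import random

def make_interleaver(length, seed=0):
    """Return a pseudo-random permutation of range(length)."""
    perm = list(range(length))
    random.Random(seed).shuffle(perm)
    return perm

def interleave(values, perm):
    """Output position i carries the input value found at index perm[i]."""
    return [values[p] for p in perm]

def deinterleave(values, perm):
    """Invert interleave(), restoring the original ordering."""
    out = [None] * len(values)
    for i, p in enumerate(perm):
        out[p] = values[i]
    return out

frame_bits = 64000 // 15  # approximate number of bits per video frame
perm = make_interleaver(frame_bits, seed=1)
rng = random.Random(2)
bits = [rng.randint(0, 1) for _ in range(frame_bits)]
restored = deinterleave(interleave(bits, perm), perm)
```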

At the receiver side, after DSTS decoding of the received signal, the soft information gleaned from the SP demapper of Figure 26 is de-multiplexed into three UEP video streams corresponding to the partitions A, B and C, which are passed to the respective RSC decoders in order to generate extrinsic information. The extrinsic information gleaned is then exchanged between the RSC decoders and the SBC decoder. More explicitly, the soft-information obtained after SP demodulation in the form of its LLR representation \( L_M(\bar{y}) \) is forwarded to the corresponding RSC decoder. The RSC decoder processes this input information together with the \( a\text{-priori} \) information \( L_{RSC}^{apr}(\bar{x}) \) fed back from the outer decoder in order to generate the extrinsic LLR values \( L_{RSC}^{extr}(\bar{x}) \), which are subsequently deinterleaved by the soft-bit interleaver of Figure 26, yielding \( L_{SBC}^{apr}(x') \). Then, the soft bits \( L_{SBC}^{apr}(x') \) are passed to the SBC decoder, which uses the extrinsic information generation algorithm of [64] to compute the extrinsic LLR values \( L_{SBC}^{extr}(x) \). During the iterative decoding process the RSC decoder exploits the \( a\text{-priori} \) information supplied by the outer decoder for the sake of providing improved a-priori LLR values for the outer channel decoder of Figure 26, which in turn exploits the input LLR values gleaned from the inner decoder for the sake of providing improved a-priori information for the RSC decoder in the subsequent iteration. Further details about iterative decoding are provided in [19, 24].

![Fig. 25. The DSTS system block diagram](image)

### Table XI

<table>
<thead>
<tr>
<th>System Parameters</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Video Bit Rate (Kbps)</td>
<td>64</td>
</tr>
<tr>
<td>Video Frame Rate (fps)</td>
<td>15</td>
</tr>
<tr>
<td>Channel Coded Rate (Kbps)</td>
<td>128</td>
</tr>
<tr>
<td>Baud-rate (Kbps)</td>
<td>64</td>
</tr>
<tr>
<td>Channel Coding</td>
<td>RSC</td>
</tr>
<tr>
<td>Over-all Code Rate</td>
<td>1/2</td>
</tr>
<tr>
<td>Code Memory</td>
<td>3</td>
</tr>
<tr>
<td>Spreading Code</td>
<td>Walsh Code</td>
</tr>
<tr>
<td>Spreading Factor</td>
<td>8</td>
</tr>
<tr>
<td>Modulation Scheme</td>
<td>SP (\( L = 16 \))</td>
</tr>
<tr>
<td>Generator Polynomials</td>
<td>\( (5, 7) \)</td>
</tr>
<tr>
<td>RSC 3/5, 4/5, &amp; 9/10 \( (G_1, G_2) \)</td>
<td>\( (5, 7) \)</td>
</tr>
<tr>
<td>Number of Transmitters, \( N_t \)</td>
<td>2</td>
</tr>
<tr>
<td>Number of Receivers, \( N_r \)</td>
<td>1</td>
</tr>
<tr>
<td>Channel</td>
<td>Correlated Rayleigh Fading</td>
</tr>
<tr>
<td>Normalised Doppler Frequency</td>
<td>0.01</td>
</tr>
<tr>
<td>Interleaver Length</td>
<td>\( (64000/15) \)</td>
</tr>
<tr>
<td>No. of System Iterations, \( I_t \)</td>
<td>10</td>
</tr>
<tr>
<td>SBC Rate</td>
<td>5/6</td>
</tr>
</tbody>
</table>

As an example, the actual decoding trajectory of the Equal Error Protection (EEP) scheme is recorded at $E_b/N_0 = 9$ dB and is presented in Figure 27. The trajectory steps shown in Figure 27 represent the actual extrinsic information transfer between the SBC decoder and the corresponding inner RSC decoder and hence the EXIT chart based convergence prediction is accurate at $E_b/N_0 = 9$ dB.

**D. Performance Results**

The performance of the system outlined in Figure 26 is evaluated using both the EEP and Unequal Error Protection (UEP) schemes of Table XI(a) for the transmission of the H.264 encoded video stream of Section I-B by applying different-rate RSC codes to the video partitions A, B and C, as presented in Table XI(a). For the sake of fair comparison, the resultant overall rate of all EEP and UEP schemes is fixed to 1/3. The convolutional encoder used in our simulations is specified by the constraint length of $L = 3$ and has the generator sequences $g_1 = [111]$, $g_2 = [101]$ and $g_3 = [011]$, which may also be represented as $G = [1, g_2/g_1, g_3/g_1]$, where ‘1’ denotes the systematic output, the first generator $g_1$ is fed back to the input, while $g_2$ and $g_3$ denote the feed-forward outputs of the RSC encoder. Furthermore, as shown in Table XI(a), the fixed overall coding rate of $R = \frac{1}{3}$ was maintained by adjusting the puncturing rate of the RSC code applied. For the sake of maintaining a high statistical confidence level, we repeated each experiment 160 times and recorded the average results. The BER vs $E_b/N_0$ curves obtained using the system architecture described in Section V-C for the various error protection schemes of Table XI(a) are presented in Figure 28. Furthermore, the performance of the various error protection schemes was evaluated in terms of the associated objective video quality, which is characterised in the form of the PSNR-Y vs $E_b/N_0$ curves of Figure 29. It may be observed from Figure 29 that the UEP2 scheme, which is associated with a high error protection for partition B, provides the best performance, followed by the UEP1 scheme of Table XI(a), which performs better than EEP. Explicitly, an $E_b/N_0$ gain of 3 dB is attained using UEP1 with reference to UEP2, while an $E_b/N_0$ gain of 1 dB is achieved with reference to the EEP scheme, at the PSNR degradation point of 2 dB.
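The recursive systematic structure $G = [1, g_2/g_1, g_3/g_1]$ may be sketched as follows in Python. The fragment assumes that each generator lists its taps in the order [current input, $D$, $D^2$] and it produces the unpunctured outputs only; the puncturing patterns used for adjusting the code rates of Table XI(a) are not reproduced here, and the function name is illustrative.

```python
def rsc_encode(bits, feedback=(1, 1, 1), feedforward=((1, 0, 1), (0, 1, 1))):
    """Recursive systematic convolutional encoder, G = [1, g2/g1, g3/g1].

    Returns one (systematic, parity_2, parity_3) tuple per input bit,
    starting from the all-zero state and without trellis termination.
    """
    memory = len(feedback) - 1
    state = [0] * memory  # state[0] holds the most recent register value
    out = []
    for u in bits:
        # Recursion: add the fed-back register contents to the input bit.
        a = u
        for tap, s in zip(feedback[1:], state):
            a ^= tap & s
        taps_in = [a] + state  # values seen by the feed-forward taps
        parities = tuple(
            sum(t & v for t, v in zip(g, taps_in)) % 2 for g in feedforward
        )
        out.append((u,) + parities)
        state = [a] + state[:-1]
    return out

# Impulse response: a single 1 followed by zeros.
impulse = rsc_encode([1, 0, 0, 0, 0, 0])
parity_2 = [symbol[1] for symbol in impulse]
```

Owing to the feedback through $g_1$, the parity response to a single non-zero input bit does not die out after the encoder memory has been flushed, which is the property that distinguishes the recursive code from its non-recursive counterpart.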

Moreover, the EEP scheme outperformed the EEP*\(^3\) scheme of Table XI(a), which is based on a similar type of error protection scheme, but dispenses with EOSBCs. It is observed from Figure 29 that an $E_b/N_0$ gain of about 12 dB can be achieved with reference to EEP*, which is an explicit benefit of our EOSBC.

\(^3\)For the sake of using a unified terminology, we refer to the scheme using rate-1 SBC as the EEP*.

Finally, the subjective video quality achieved by the proposed EEP, UEP1, UEP2 and EEP* schemes is presented in Figure 30 at $E_b/N_0=12$ dB for the EEP, UEP1 and UEP2 schemes and at $E_b/N_0=20$ dB for the EEP* scheme. Observe from Figure 30 that the UEP2 scheme exhibits fewer erroneous blocks than the EEP scheme, which in turn has a better quality than UEP1. Similarly, it can be observed from Figure 30 that the subjective video quality of the EEP* scheme at $E_b/N_0=20$ dB is comparable to that of UEP1 at $E_b/N_0=12$ dB.

**VI. ITERATIVELY DETECTED H.264 WIRELESS VIDEO TELEPHONY USING THREE-STAGE SYSTEM DESIGN**

In this section, we advocate a three-stage serially concatenated scheme designed for near-capacity operation. In contrast to the two-stage system design having a single iterative loop, the three-stage system employs two iterative loops: one between the inner and the intermediate decoder, whose iterations we refer to as inner iterations, and another one between the outer decoder and the intermediate decoder, whose iterations are referred to as outer iterations. A particular combination of inner iterations followed by outer iterations is referred to as one system iteration.

**A. Three-Stage System Design Example**

In order to characterise the advantage of the three-stage system design, we present a design example. Our proposed design example consists of an inner sphere packing modulator, an intermediate rate-1 precoder and an outer short block code. Therefore, our focus is on analysing the achievable performance improvements over traditional two-stage turbo-coding, if an intermediate unity-rate precoder is invoked between the inner and outer iterative components, which results in a three-stage serially concatenated system.

**B. Three-Stage System Overview**

**Fig. 28.** BER versus $E_b/N_0$ performance of the various error protection schemes of Figure 26 using the parameters of Table XI.

**Fig. 29.** PSNR-Y versus $E_b/N_0$ performance of the various error protection schemes of Figure 26 using the parameters of Table XI.

The schematic of the proposed system is shown in Figure 31, where we employ the rate-$\frac{1}{3}$ SBC code of Section IX, whose input is provided by partitioning the $k$th video frame into $N$ source coded symbols, where each symbol $v_{n,k}$ consists of $M$ source coded bits $v_{n,k}(m)$, $m = 1, \ldots, M$, although this is not shown in Figure 31. In our case $M = 2$-bit input symbols are encoded by the rate-$\frac{1}{3}$ SBCs of Figure 31, resulting in $M' = 6$-bit SBC coded symbols. The SBC encoded bits $s$ are then interleaved by a random bit interleaver $\Pi_{int}$ and then the interleaved bits $s'$ are encoded by the URC. The URC-encoded bits $r$ are interleaved by the second random bit interleaver $\Pi_{in}$ of Figure 31 into $r'$ and passed to the SP modulator. As detailed in Section V-A, the benefit of the SP modulator is that it allows us to jointly consider the space-time symbols of the DSTS scheme’s two antennas, while maximising the Euclidean distance of the resultant symbols. The SP modulator maps $B$ coded bits $b = b_0, \ldots, b_{B-1} \in \{0, 1\}$ to a SP symbol $v \in V$, so that we have $v = \mathrm{map}_{SP}(b)$, where $B = \log_2(L)$ and $L$ represents the number of legitimate SP constellation points, as detailed in [83]. More explicitly, we used $B = \log_2(16) = 4$ channel coded bits per SP symbol. The resultant set of SP symbols is transmitted with the aid of DSTS within two time slots using two transmit antennas. In this study, we consider transmission over a temporally correlated narrowband Rayleigh fading channel associated with a normalised Doppler frequency of $f_D = f_dT_s = 0.01$, where $f_d$ is the Doppler frequency and $T_s$ is the symbol duration. The extrinsic LLR computation can be found in [68, 69], and was briefly reviewed in Section II-A. The schemes considered in this section differ in the choice of the outer SBC codec. Specifically, we considered EXIT-chart optimised SBCs and an equivalent-rate arbitrary SBC-based benchmark having a minimum Hamming distance of $d_{H,min} = 3$ and 1, respectively, as given in Table XII. We refer to these two schemes as the DSTS-SP-URC-SBC and DSTS-SP-URC-SBC* arrangements.
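The distinction between the two outer codes may be illustrated by the following Python sketch, which evaluates the minimum Hamming distance of two codebooks mapping $M = 2$ input bits to $M' = 6$ coded bits. Both codebooks are hypothetical examples constructed here to exhibit $d_{H,min} = 3$ and $d_{H,min} = 1$; they are not the mappings of Table XII, and the function names are illustrative.

```python
import math
from itertools import combinations

# Hypothetical 2-bit to 6-bit codebooks, not the mappings of Table XII.
HYPOTHETICAL_DMIN3 = {
    (0, 0): (0, 0, 0, 0, 0, 0),
    (0, 1): (0, 0, 0, 1, 1, 1),
    (1, 0): (1, 1, 1, 0, 0, 0),
    (1, 1): (1, 1, 1, 1, 1, 1),
}
HYPOTHETICAL_DMIN1 = {
    (0, 0): (0, 0, 0, 0, 0, 0),
    (0, 1): (0, 0, 0, 0, 0, 1),
    (1, 0): (0, 0, 0, 0, 1, 0),
    (1, 1): (0, 0, 0, 0, 1, 1),
}

def hamming(a, b):
    """Number of positions in which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))

def min_hamming_distance(codebook):
    """Minimum Hamming distance over all pairs of codewords."""
    return min(hamming(a, b) for a, b in combinations(codebook.values(), 2))

def sbc_encode(bits, codebook, m=2):
    """Map consecutive m-bit source symbols to their SBC codewords."""
    coded = []
    for i in range(0, len(bits), m):
        coded.extend(codebook[tuple(bits[i:i + m])])
    return coded

coded = sbc_encode([0, 1, 1, 0], HYPOTHETICAL_DMIN3)
bits_per_sp_symbol = int(math.log2(16))  # B = log2(L) = 4 for L = 16
num_sp_symbols = len(coded) // bits_per_sp_symbol
```

Both codebooks have the same rate of $2/6$, hence any performance difference between them is attributable to their distance properties rather than to the amount of redundancy.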

C. Three-Stage Iterative Decoding

At the receiver, the APP SISO decoder of the URC scheme and the USSPA decoder of Figure 31 iteratively exchange extrinsic information represented in the form of LLRs, to assist each other in approaching the point of perfect convergence at (1,1) of the EXIT-chart, as shown in Figure 31. The variable \( L(\cdot) \) represents the respective bit-LLRs, where the LLRs of the corresponding decoder in our three-stage system design are differentiated by the subscript SBC for the outer decoder. By contrast, the subscript URC is used to represent our intermediate decoder, while SP corresponds to the inner decoder. Additionally, the specific type of the LLRs is indicated by the superscripts \( apr \) and \( extr \), corresponding to \( a\text{-priori} \) and extrinsic information, respectively.

1) Inner Iterations: The received complex-valued symbols corresponding to each \( B = 4 \) URC-coded bits per DSTS-SP symbol are demapped to their \( L(\cdot) \) representations. The extrinsic LLR values \( L_{SP}^{extr}(r') \) generated at the output of the SP-demapper are deinterleaved by the soft-bit interleaver \( \Pi_{in} \) of Figure 31 and are passed to the URC-decoder. The extrinsic LLR values \( L_{URC}^{extr}(r) \) of the URC-encoded bits \( r \) are interleaved using the soft-bit interleaver \( \Pi_{in} \) of Figure 31. Following interleaving the resultant extrinsic LLRs are fed back to the SP-demapper as the \( a\text{-priori} \) information \( L_{SP}^{apr}(r') \). This \( a\text{-priori} \) information is exploited by the SP demapper for the sake of providing improved extrinsic information for the URC decoder in the successive iterations.

2) Outer Iterations: Outer iterations are carried out by exchanging extrinsic information between the SBC and URC decoder of Figure 31. First, the extrinsic LLRs \( L_{URC}^{extr}(x') \) produced by the URC decoder are deinterleaved using the soft-bit interleaver \( \Pi \) and passed to the SBC decoder. The extrinsic LLRs generated by the SBC decoder are then interleaved and fed back to the URC-decoder as \( a\text{-priori} \) information. Hence the URC decoder generates two extrinsic outputs, namely \( L_{URC}^{extr}(r) \) and \( L_{URC}^{extr}(x') \), representing the data bits \( r \) and \( x' \), respectively. By contrast, the SBC decoder and the SP demapper only receive input from and provide output for the URC decoder of Figure 31.

According to the EXIT chart analysis of iterative decoding, the outer decoder results in the highest possible extrinsic information \( I_{E}(\text{outer}) = 1 \), for a given input a-priori information, as shown in Figure 33. For the EXIT chart analysis of our proposed system the intermediate URC decoder and the SBC are viewed as a single combined outer SISO module. The EXIT chart of the proposed benchmark DSTS-SP-URC-SBC* scheme is shown in Figure 32 along with the EXIT curves of the SP demapper recorded for the \( E_b/N_0 \) values of 8 to 13 dB.

As seen from Figure 32, the EXIT curve of the combined outer SISO module constituted by the DSTS-SP-URC-SBC* scheme cannot reach the (1,1) point of perfect convergence in the EXIT chart, since it intersects with the EXIT curve of the inner SP demapper, which implies that an infinitesimally low BER cannot be achieved. Additionally, the outer EXIT curve of the combined SISO module recorded for the DSTS-SP-URC-SBC scheme is shown in Figure 33 along with the EXIT curves of the SP demapper for various \( E_b/N_0 \) values.

Figure 33 shows that the joint EXIT curve of the SBC and URC decoder in the DSTS-SP-URC-SBC arrangement reaches the (1,1) point of the EXIT chart. Figures 32 and 33 also provide the decoding trajectories of the proposed system at the \( E_b/N_0 \) values considered. These trajectories were recorded by acquiring the mutual information at the input and output of both the inner SP demapper and the joint outer SISO module during the bit-by-bit Monte-Carlo simulation of the iterative decoding algorithm. Observe from the decoding trajectories of Figure 33 that for \( E_b/N_0 \) values higher than 8 dB the DSTS-SP-URC-SBC scheme becomes capable of achieving the highest possible extrinsic information of \( I_{E}(\text{outer}) = 1 \) during the iterative decoding process. However, the DSTS-SP-URC-SBC* scheme is unable to achieve this goal due to the intersection of the inner and outer decoders’ EXIT curves.
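The mutual information values forming such a trajectory may be estimated from the simulated LLRs, for example by the following Python sketch. It uses the widely employed time-averaging approximation $I \approx 1 - \frac{1}{N}\sum_n \log_2\left(1 + e^{-x_n L_n}\right)$, which assumes equiprobable bits, LLRs defined as $\ln[P(x=+1)/P(x=-1)]$ and the mapping of bit 0 to $x = +1$. The specific estimator used for Figures 32 and 33 is not stated in this section, so this choice is an assumption.

```python
import math

def mutual_information(bits, llrs):
    """Time-average estimate of I(X; L) in bits, with bit 0 mapped to +1."""
    total = 0.0
    for b, llr in zip(bits, llrs):
        x = 1.0 if b == 0 else -1.0
        z = -x * llr
        # Numerically safe evaluation of log2(1 + exp(z)).
        total += (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / math.log(2)
    return 1.0 - total / len(bits)

# Uninformative LLRs convey no information about the bits.
i_none = mutual_information([0, 1, 0, 1], [0.0, 0.0, 0.0, 0.0])

# Reliable LLRs of the correct sign approach the (1,1) point.
i_full = mutual_information([0, 1, 0, 1], [50.0, -50.0, 50.0, -50.0])
```

Evaluating this estimate at the input and at the output of each decoder after every iteration yields the staircase-shaped trajectory that is superimposed on the EXIT curves.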
E. System Performance Results

In this section, we present our overall performance results for the proposed system model. For the performance analysis of our proposed three-stage system model of Figure 31, we employed the H.264/AVC video source codec using the Akiyo video test sequence of 45 frames, as detailed in Section I-B. We consider the SP modulation scheme [83] of Section V-A associated with \( L = 16 \) sphere-packing modulated symbols, while employing Anti-Gray Mapping (AGM)\(^5\) for source bits-to-SP symbol mapping. Our system design consisted of a two-antenna-aided DSTS and a single receiver antenna arrangement. The performance of the system was evaluated, while considering various combinations of the system iterations \( I_{\text{system}} \) and of the iterations \( I_{\text{out}} \) within the outer joint SBC-URC system module. For the sake of increasing the confidence in our results, we repeated each 45-frame experiment 160 times and averaged the generated results.

The BER performance of the error protection schemes employed is shown in Figure 34. It can be observed from Figure 34 that, as expected, the DSTS-SP-URC-SBC scheme using \( I_{\text{system}} = 5 \) system iterations results in the best BER performance, when compared to \( I_{\text{system}} = 4 \) and 3 for the same error protection scheme. Additionally, it can be seen that the DSTS-SP-URC-SBC\(^*\) scheme results in the worst BER performance due to its inability to reach the (1,1) point of perfect convergence.

Furthermore, the PSNR versus \( E_b/N_0 \) curves of the proposed error protection schemes are portrayed in Figure 35. It may be observed in Figure 35 that the DSTS-SP-URC-SBC scheme employing rate-\( \frac{1}{3} \) SBCs having \( d_{\text{H},\text{min}} = 3 \) and \( I_{\text{system}} = 5 \) system iterations results in the best PSNR performance across the entire \( E_b/N_0 \) region considered. It is also observed in Figure 35 that performing iterative decoding while employing an arbitrary rate-\( \frac{1}{3} \) SBC results in the worst PSNR performance at the same overall code rate of \( \frac{1}{3} \), as given in Table XIII(a). Quantitatively, when using the DSTS-SP-URC-SBC scheme of Table XIII(a), an \( E_b/N_0 \) gain of up to 22 dB may be achieved relative to the DSTS-SP-URC-SBC\(^*\) scheme at the PSNR degradation point of 2 dB, as shown in Figure 35.
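For reference, the objective quality metric underlying these curves may be computed as in the following Python sketch, which uses the standard definition $\mathrm{PSNR} = 10\log_{10}(255^2/\mathrm{MSE})$ for 8-bit luminance samples. The representation of a frame as a list of rows and the function name are illustrative assumptions.

```python
import math

def psnr_y(reference, decoded, peak=255):
    """PSNR in dB between two equally sized luminance frames (lists of rows)."""
    squared_error = 0
    count = 0
    for ref_row, dec_row in zip(reference, decoded):
        for r, d in zip(ref_row, dec_row):
            squared_error += (r - d) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak ** 2 / mse)

# Tiny illustrative frames differing by one grey level in every pixel.
reference = [[100, 120], [140, 160]]
decoded = [[101, 119], [141, 159]]
quality_db = psnr_y(reference, decoded)
```

The PSNR degradation point of 2 dB referred to above is then the $E_b/N_0$ value at which this metric falls 2 dB below its error-free value.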

Finally, the subjective video quality of the error protection schemes employed is portrayed in Figure 36. The video frames portrayed in Figure 36 were obtained by repeated transmission of the received video sequence using the same system with \( I_{\text{system}} = 5 \) system iterations 30 times, in order to have a pertinent subjective video quality comparison. Observe from Figure 36 that an unimpaired video quality is attained by the DSTS-SP-URC-SBC scheme at an \( E_b/N_0 \) value of 10 dB. However, video impairments persist for the DSTS-SP-URC-SBC\(^*\) scheme even at the high \( E_b/N_0 \) values of 28.5 dB, 29 dB, 29.5 dB and 30 dB, as shown in Figure 36.

\(^5\)Any bit-to-symbol mapping that differs from Gray mapping is referred to as an AGM. The best AGM also has to be found by EXIT-chart optimisation.
VII. GENERIC DESIGN GUIDELINES

The system design guidelines are summarized below:

- The first step in the design of multimedia encoding and transmission systems is that of determining the specific type of application, such as low bit rate videophones, digital TV broadcast, video conferencing etc., which in turn predetermines the affordable complexity.
- Another fundamental specification is the affordable delay. For example, interactive video applications, such as videophones, are more delay-sensitive than non-interactive applications, such as digital video broadcast.
- The available bandwidth is also an important factor, which limits the number of bits/video frame and the frame rate. The former is directly proportional to the maximum tolerable interleaver length, which also affects the decoding complexity and the error correction capability.
- If the multimedia application considered allows the implementation of complex video coding features without violating the allocated delay and hardware complexity limitations, then various source coding parameters such as multiple-reference frame prediction, insertion of B-pictures, FMO and CABAC can also be incorporated.
- The SBC/RSM codes of Section III provide a high degree of design flexibility in terms of code rate and minimum Hamming distance $d_{H,min}$, which can be selected based on the affordable interleaver length, processing delay and complexity.
- If the design of the multimedia system is based on MIMO transmission, then considering near-capacity DSTS aided SP modulation scheme is a feasible implementation option for attaining a diversity gain without the need for any high-complexity MIMO channel estimation.
- If there is no strict constraint on the overall delay and complexity, while near-capacity operation is the desired design criterion, the more flexible three-stage concatenated design of Section VI may have to be invoked. This is because it facilitates decoding convergence to an infinitesimally low BER at near-capacity SNRs.
- It is important to emphasise that the EXIT-charts provide us with an insightful tool for designing codes for near-capacity operation in order to analyse the effect of the code rates or of the minimum Hamming distance on the end-user’s video performance.

Further video transceiver designs are possible with improved performance, while considering the basic design principles outlined above. The video coding community completed the design of the H.264/SVC standard, which has diverse coding functionalities, while the wireless communications community is progressing research towards the next-generation mobile radio standards. Overall, this is an exciting era for wireless video communications research, leading to new standards and designs for the wireless multimedia age [1].

VIII. CONCLUSIONS

In this tutorial we presented various system design aspects of iterative source-channel decoding for near-capacity multimedia. We proposed a family of SBCs for achieving guaranteed convergence in soft-bit assisted iterative JSCD, which facilitates improved iterative USSPA operations. The DP H.264 source coded video was used to evaluate the attainable performance of our system using the SBC-assisted iterative USSPA scheme in conjunction with RSC codes for transmission over correlated narrowband Rayleigh fading channels, while keeping the overall bit-rate budget constant.

Moreover, we proposed an RSM scheme by transforming the SBC algorithms in a systematic way, which resulted in a further increase in the $d_{H,min}$ of the generated symbols. From the EXIT curves obtained it may be observed that the convergence behaviour of SBCs improves upon incorporating RSM associated with additional redundancy and an improved $d_{H,\text{min}}$. Furthermore, we proposed an UEP H.264/AVC video transmission scheme using a combination of SBCs and RSCs. We demonstrated that UEP employing an appropriate channel coded bit-rate budget allocation to the different partitions of the H.264/AVC coded video based on their relative importance results in useful PSNR improvements. Furthermore, a three-stage system design was presented, which was constituted by the serially concatenated and iteratively decoded EOSBCs and the precoded DSTS-aided multi-dimensional SP modulation designed for near-capacity joint source and channel coding. It was demonstrated that the employment of EXIT-chart-optimised SBCs in the three-stage system setup, which deliberately imposed further artificial channel-coding redundancy on the source coded bit-stream, provided significant improvements in terms of the PSNR versus $E_b/N_0$ performance, when compared to the benchmark scheme employing equivalent-rate SBCs, which were not optimised using EXIT-charts for achieving the best possible iterative convergence behaviour.

REFERENCES


Nasruminallah received the Bachelor of Science (B.Sc.) degree in computer engineering from University of Engineering & Technology (UET), Peshawar, Pakistan, in 2004 and the Master of Science (M.Sc.) degree in computer engineering from Lahore University of Management Sciences (LUMS), Lahore, Pakistan, in 2006. In 2010 he was awarded the Doctor of Philosophy (Ph.D) degree by the University of Southampton, UK. Since October 2010, he has been working as a Postdoc at France Telecom R&D - Orange Labs, France. His research interests include low-bit-rate video coding for wireless communications, turbo coding and detection, iterative source-channel decoding and peer-to-peer video streaming.

Lajos Hanzo (http://www-mobile.ecs.soton.ac.uk) FREng, FIEEE, FIET, DSc received his degree in electronics in 1976 and his doctorate in 1983. In 2009 he was awarded the honorary doctorate “Doctor Honoris Causa” by the Technical University of Budapest. During his 35-year career in telecommunications he has held various research and academic posts in Hungary, Germany and the UK. Since 1986 he has been with the School of Electronics and Computer Science, University of Southampton, UK, where he holds the chair in telecommunications. He has co-authored 20 John Wiley/IEEE Press books on mobile radio communications totalling in excess of 10000 pages, published in excess of 1000 research entries at IEEE Xplore, acted both as TPC and General Chair of IEEE conferences, presented keynote lectures and been awarded a number of distinctions. Currently he is directing an academic research team, working on a range of research projects in the field of wireless multimedia communications sponsored by industry, the Engineering and Physical Sciences Research Council (EPSRC) UK, the European IST Programme and the Mobile Virtual Centre of Excellence (VCE), UK. He is an enthusiastic supporter of industrial and academic liaison and he offers a range of industrial courses. He is also a Governor of the IEEE VTS. Since 2008 he has been the Editor-in-Chief of the IEEE Press and since 2009 a Chaired Professor also at Tsinghua University, Beijing. For further information on research in progress and associated publications please refer to http://www-mobile.ecs.soton.ac.uk