International Journal of Digital Multimedia Broadcasting
Volume 2008, Article ID 124685, 18 pages
Research Article

Lossy Joint Source-Channel Coding Using Raptor Codes

1Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089, USA
2Department of Electrical Engineering, Princeton University, Princeton, NJ 08544, USA

Received 1 May 2008; Accepted 17 June 2008

Academic Editor: Massimiliano Laddomada

Copyright © 2008 O. Y. Bursalioglu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


The straightforward application of Shannon's separation principle may entail a significant suboptimality in practical systems with limited coding delay and complexity. This is particularly evident when the lossy source code is based on entropy-coded quantization. In fact, it is well known that entropy coding is not robust to residual channel errors. In this paper, a joint source-channel coding scheme is advocated that combines the advantages and simplicity of entropy-coded quantization with the robustness of linear codes. The idea is to combine entropy coding and channel coding into a single linear encoding stage. If the channel is symmetric, the scheme can asymptotically achieve the optimal rate-distortion limit. However, its advantages are more clearly evident under finite coding delay and complexity. The sequence of quantization indices is decomposed into bitplanes, and each bitplane is independently mapped onto a sequence of channel coded symbols. The coding rate of each bitplane is chosen according to the bitplane conditional entropy rate. The use of systematic raptor encoders is proposed, in order to obtain a continuum of coding rates with a single basic encoding algorithm. Simulations show that the proposed scheme can outperform the separated baseline scheme for finite coding length and comparable complexity and, as expected, it is much more robust to channel errors in the case of channel capacity mismatch.

1. Introduction

A stationary ergodic source can be transmitted over an information-stable channel with end-to-end average distortion 𝐷 provided that the bandwidth expansion factor 𝑏 is not lower than 𝑅(𝐷)/𝐶 channel symbols per source sample, where 𝑅(𝐷) is the source rate-distortion function and 𝐶 is the channel capacity. The bandwidth expansion factor 𝑏 is defined as the number of channel symbols per source symbol: if a block of 𝐾 source symbols is transmitted through the channel in 𝑁 channel uses, then 𝑏=𝑁/𝐾. Shannon's source-channel separation principle [1] ensures that this optimal performance can be approached by independently designing the source coding and the channel coding schemes. This provides a definite architectural advantage in practical systems, where typically (lossy) source coding is implemented at the application layer, while channel coding is designed and optimized for the physical layer.

On the other hand, this separated source-channel coding (SSCC) approach may incur substantial suboptimality, due to the nonideal behavior of finite length, finite complexity, source and channel codes. In fact, source codes designed without taking into account the presence of channel decoding errors are typically very fragile and this might impose unnecessarily restrictive constraints on the performance of channel coding. In such cases, joint source-channel coding (JSCC) may lead to performance improvement (i.e., a better (𝑏,𝐷) operating point) for the same level of complexity.

Most practical lossy source coding schemes for natural sources (e.g., images, audio, video) are based on the idea of transform coding [2]. Source blocks are projected onto a suitable basis by a linear transformation, such that the source is well described by only a small number of significant transform coefficients. Then, the coefficients are scalar-quantized, and finally the resulting sequence of quantization indices is entropy coded. The theoretical foundation of this approach relies on the universality of entropy-coded quantization, and dates back to the work of Ziv [3]. In general, the linear transform is adapted to the given class of sources (e.g., wavelet transforms for images [4]). The statistics of the quantization indices are not known a priori. However, the memory structure of the underlying discrete source is fixed and is typically described as a finite-memory tree-source (e.g., the context structure of JPEG2000 [5, 6]). Then, data compression is obtained by using an adaptive entropy coding scheme that estimates the transition probabilities of the source statistical model. For example, arithmetic coding [7] with Krichevsky-Trofimov (KT) sequential probability estimation is a common choice [8].

For the sake of simplicity, this paper treats only independent and identically distributed (i.i.d.) sources with known statistics, that is, it neither deals with the transform coding aspect, nor with the universal implementation of entropy coding. However, our results can be generalized along the lines of what is done in [5, 6]. Even in the nonuniversal case, classical lossless compression is catastrophic: a small Hamming distortion (number of bits in error) in the entropy-coded sequence is mapped into a large distortion in the reconstructed source sequence. This imposes a very strict target error probability on the channel coding stage, thus involving both complex channel coding and operating points that may be quite far from the theoretical limits. This is even more evident in applications where the coding delay is limited, thus preventing the use of very large block lengths.

It was shown in [9] that fixed-to-fixed length data compression of a discrete source with linear codes is asymptotically optimal, in the sense that compression up to the source entropy rate can be achieved. This is strongly related to transmission using the same linear code on a discrete additive noise channel where the noise has the same statistics as the discrete source. This analogy can be exploited in order to design a JSCC scheme. We wish to maintain the simplicity of the transform coding approach while improving the robustness of the scheme. The rationale behind the proposed design is the following: since linear codes achieve the entropy rate of discrete sources and the capacity of symmetric channels, we can combine the entropy coding stage and the coding stage into a single linear encoding stage. The advantage of this approach is that the design of noncatastrophic linear encoders is very well understood. Therefore, the proposed scheme can approach the optimal separation limit for large block length, while achieving better robustness to channel errors at finite decoding delay and complexity.

In [5], this JSCC approach was applied to the transmission of JPEG2000-like encoded images (in the sense that the wavelet transform, the quantization scheme and the tree-source memory structure were borrowed from JPEG2000), by using a family of progressively punctured turbo codes to map directly the redundant quantization bits into channel symbols. As stated above, here we focus on simpler i.i.d. sources with perfectly known statistics (i.e., the nonuniversal case) and investigate in greater detail the performance analysis and the comparison with the baseline SSCC approach. In this work, we use raptor codes [10] in order to map the redundant quantization bits into channel-coded symbols.

Our scheme works as follows. A source block of length 𝐾 is quantized symbol by symbol. The sequence of quantization indices, represented as binary vectors, are partitioned into bitplanes. The bitplanes are separately encoded into channel symbols by a bank of binary raptor encoders. Each bitplane is encoded at a rate that depends on its conditional entropy rate given the bitplanes previously encoded. At the decoder, the bitplanes are decoded in sequence using a multistage decoder, where in each stage we use a belief propagation (BP) iterative decoder that takes into account both the already decoded bits from previous planes, and the a priori statistics of the current bitplane as well as the received channel output.

Raptor codes are a particularly useful class of rateless codes. The advantage of using a rateless code is clear: with a single basic encoding machine we can generate a continuum of coding rates. Therefore, the scheme can adapt naturally to the entropy rate of the source and to the capacity of the channel. Although we do not pursue the universal setting in this work, we notice here that the proposed architecture allows a very fine rate matching between the (unknown a priori) source entropy and the channel capacity without resorting to a library of progressively punctured codes as is done in [5].

We express the performance of a source-channel coding scheme in terms of its peak signal-to-noise ratio (PSNR), expressed in dB and defined as
$$\mathrm{PSNR} \triangleq -10 \log_{10}(D). \tag{1}$$
In particular, we will focus on a standard Gaussian i.i.d. source $S \sim \mathcal{N}(0,1)$ and on the mean-squared distortion $D = \mathbb{E}[|S - \hat{S}|^2]$. In this case, the distortion-rate function is $D = 2^{-2R}$. At the Shannon separation limit, that is, letting $R = bC$, we have
$$\mathrm{PSNR}_{\mathrm{Shannon}} = 20 \log_{10}(2)\, bC \approx (6\,\mathrm{dB}) \times bC. \tag{2}$$
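As a quick numerical sanity check, the separation-limit PSNR of (2) can be evaluated directly. This is a minimal sketch (the function name is ours):

```python
import math

def shannon_psnr(b: float, C: float) -> float:
    """PSNR (in dB) at the Shannon separation limit for a unit-variance
    Gaussian source: D = 2^(-2bC), so PSNR = -10*log10(D) = (6.02 dB)*b*C."""
    D = 2.0 ** (-2.0 * b * C)      # Gaussian distortion-rate function at R = b*C
    return -10.0 * math.log10(D)

psnr = shannon_psnr(4.0, 0.5)      # about 12.04 dB at b = 4 channel uses/sample, C = 0.5
```

The slope of the PSNR-versus-𝑏 line is 6.02·𝐶 dB per channel use, which is the reference against which all schemes below are measured.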

Our aim is to design a family of practical schemes that operate close to the curve PSNRShannon versus 𝑏. Notice that we do not pursue here the design of embedded schemes, that is, of single coding schemes that achieve multiple (PSNR,𝑏) points. Nevertheless, the bitplane layered structure of the proposed encoder and the proposed multistage decoder lend themselves quite naturally to the design of embedded JSCC schemes. We leave this aspect for future work and comment on it further in the concluding section.

The rest of this paper is organized as follows. In Section 2, we review the limits of scalar entropy-coded quantization and define the target “operational Shannon limit” of our scheme. In Section 3, we present a comprehensive analysis of the baseline SSCC scheme, which represents our term of comparison. Section 4 presents the details of the proposed scheme, its analysis, and an algorithm for progressive incremental redundancy that optimizes the coding rate of each bitplane. Section 5 presents some additional numerical comparisons between the baseline SSCC and the JSCC schemes, and Section 6 presents some concluding remarks. Raptor codes, BP decoding, EXIT chart analysis, and some ancillary results are presented in the appendices.

2. Entropy-Coded Scalar Quantization

A source sequence of length $K$, $\mathbf{s} \in \mathbb{R}^K$, is quantized by applying componentwise the scalar quantizer $\mathcal{Q}_B : \mathbb{R} \to \mathbb{F}_2^{B+1}$, where $B$ bits are used to represent the magnitude and one bit represents the sign. Let $\mathbf{u} = \mathcal{Q}_B(\mathbf{s})$ denote the sequence of quantization indices and let $\{u_{p,k} : p = 0, \ldots, B\}$ denote the bits forming the $k$th index. The sequence $\mathbf{u}$ can be thought of as a $(B+1) \times K$ binary array, where each row is called a “bitplane.” Without loss of generality, we associate the 0th row with the sign bit and the rows from 1 to $B$ with the magnitude bits, with the convention that the first bitplane is the least significant and the $B$th bitplane is the most significant.
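The bitplane decomposition can be sketched as follows. The magnitude quantizer below (uniform step, saturating at $2^B - 1$) is a simplified stand-in for the optimized quantizers used in the paper:

```python
def quantize_bitplanes(s, B, delta):
    """Split scalar-quantized samples into bitplanes: a hypothetical uniform
    magnitude quantizer with step `delta`, saturating at 2^B - 1.
    Returns a (B+1) x K list of lists: row 0 is the sign plane, and row p
    (p = 1..B) holds bit p-1 of the magnitude, so plane 1 is the least
    significant and plane B the most significant."""
    sign = [1 if x < 0 else 0 for x in s]                      # 0th bitplane
    mag = [min(int(abs(x) / delta), 2 ** B - 1) for x in s]    # saturating magnitude
    planes = [sign]
    for p in range(1, B + 1):
        planes.append([(m >> (p - 1)) & 1 for m in mag])
    return planes

u = quantize_bitplanes([-0.7, 0.2, 1.9], B=2, delta=0.5)
# magnitudes 1, 0, 3: sign plane [1,0,0], plane 1 = [1,0,1], plane 2 = [0,0,1]
```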

As anticipated in the Introduction, we fix the quantizer and compare the performance of an SSCC approach, based on the concatenation of a conventional entropy coding stage with a conventional channel code, with the performance of a JSCC that merges the two operations into a single linear encoding map. Therefore, in the absence of residual channel errors, both schemes achieve the same minimum distortion due to the quantizer, denoted by $D_{\mathcal{Q}}(B)$. Letting $H_B(U)$ denote the entropy rate of $\mathbf{u}$, measured in bits per quantization index, we have that the point
$$b = \frac{H_B(U)}{C}, \qquad \mathrm{PSNR} = -10 \log_{10} D_{\mathcal{Q}}(B) \tag{3}$$
is the best achievable point for any scheme based on the fixed quantizer $\mathcal{Q}_B$. Following [2], we refer to this point as the “operational Shannon limit” for schemes with fixed quantizers.

We consider uniform scalar quantizers where the interval size is chosen in order to minimize the mean-squared distortion of the unit-variance Gaussian i.i.d. source, for a fixed number $2^{B+1}$ of intervals. In [3], Ziv showed that a coding scheme formed by a uniform scalar quantizer followed by entropy coding incurs a rate penalty of no more than 0.754 bits per sample with respect to the $R(D)$ limit. Thus, constraining the quantizer to be a uniform scalar quantizer should cost no more than $0.754/C$ channel symbols per source symbol.
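The distortion-minimizing step size of such a quantizer can be found numerically. The sketch below uses simple Riemann-sum integration and mid-point reconstruction; it is an illustration of the optimization, not a reproduction of the paper's exact quantizer values:

```python
import math

def gauss_pdf(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def uniform_quantizer_mse(B, delta, x_max=6.0, n=20000):
    """Mean-squared error of a symmetric uniform scalar quantizer with step
    `delta`, 2^B magnitude levels per sign and mid-point reconstruction,
    for a unit-variance Gaussian source (Riemann-sum integration)."""
    levels = 2 ** B
    dx = x_max / n
    mse = 0.0
    for i in range(n):
        x = (i + 0.5) * dx                     # integrate over x > 0, use symmetry
        m = min(int(x / delta), levels - 1)    # saturating magnitude index
        xhat = (m + 0.5) * delta               # mid-point reconstruction
        mse += (x - xhat) ** 2 * gauss_pdf(x) * dx
    return 2.0 * mse

# crude grid search for the MSE-minimizing step size at B = 2 (8 cells)
candidates = [0.40 + 0.02 * k for k in range(40)]
D_Q, step = min((uniform_quantizer_mse(2, d), d) for d in candidates)
```

The resulting distortion $D_{\mathcal{Q}}(2)$ and step size can then be plugged into the operational Shannon limit (3).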

In Figure 2 we compare the PSNR versus 𝑏 curves for the Shannon limit, the Ziv bound and the operational Shannon limit for a family of optimized uniform scalar quantizers with 𝐵=1,2,3,4,5, and 6, and for channel capacity 𝐶=0.5. All results in this paper make use of this family of quantizers.

3. Analysis of the Baseline SSCC Scheme

In this section, we study the performance of nonideal SSCC. First, we consider the performance degradation due to nonideal source and channel codes that operate at source coding rate $R_s = R(D) + \delta_s$ and channel coding rate $R_c = C - \delta_c$, respectively, where $\delta_s$ and $\delta_c$ are positive rate gaps. This analysis assumes no errors at the output of the channel decoder.

Then, we introduce the channel decoding error probability, and obtain a distortion upper bound as a function of 𝛿𝑐 and 𝛿𝑠, closely following the analysis of [11]. This analysis is based on the random coding exponent for channel codes, and essentially validates the error-free rate-gap analysis even for moderately large block length 𝐾.

Finally, we consider a very practical scheme, based on the concatenation of arithmetic entropy coding and a conventional binary raptor code. We provide a very accurate semi-analytic approximation for the achievable PSNR of this scheme and show that the achieved results follow closely the error-free rate-gap analysis by matching the parameters 𝛿𝑐 and 𝛿𝑠. We also notice that for finite block length 𝐾 the practical scheme suffers from an additional performance degradation, especially visible at high resolution (large PSNR). We quantify this additional degradation by comparing the finite-length and infinite-length error performance of raptor codes.

3.1. Rate-Gap Analysis

Consider a separated scheme that makes use of channel coding at rate $R_c = C - \delta_c$ and source coding at rate $R_s = R(D) + \delta_s$, where $\delta_c, \delta_s > 0$ are rate gaps, and where the residual bit-error rate (BER) at the output of the channel decoder is (essentially) zero. Using $D = 2^{-2(R_s - \delta_s)}$ and $b = R_s / R_c$, we obtain
$$\mathrm{PSNR} = (6\,\mathrm{dB}) \times \left[ b (C - \delta_c) - \delta_s \right]. \tag{4}$$
We notice that the slope of the straight line characterizing PSNR versus $b$ decreases with the channel coding gap $\delta_c$, while the source coding rate gap involves only a horizontal shift. As a result, an SSCC whose channel coding stage achieves negligible BER works further and further away from the Shannon limit as PSNR increases (high resolution).

3.2. SSCC with Codes Achieving Positive Error Exponent

In order to take into account channel decoding errors, we modify slightly the approach of [11] and obtain the achievable PSNR lower bound (we omit the details since the derivation follows trivially from [11]):
$$\mathrm{PSNR} \ge (6\,\mathrm{dB}) \times \left[ b (C - \delta_c) - \delta_s \right] - 10 \log_{10}\!\left( 1 + 2^{-K b E_r(C - \delta_c) + 1 + 2 b (C - \delta_c) - 2\delta_s} \right), \tag{5}$$
where $E_r(R_c)$ denotes the random coding error exponent for a given coding ensemble over the considered transmission channel. Notice that $E_r(R_c) > 0$ for all $R_c < C$, and therefore the error exponent is positive for all rate gaps $\delta_c > 0$. For values of $K, \delta_c, \delta_s, b, C$ such that $2^{-K b E_r(C - \delta_c) + 1 + 2 b (C - \delta_c) - 2\delta_s} \ll 1$, (5) essentially coincides with (4).
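A sketch of (4) and (5) for the BSC, using Gallager's standard random-coding exponent formula with uniform inputs (assumed here; rates in bits per channel use), evaluated by a simple grid search over the Gallager parameter:

```python
import math

def gallager_E0(rho, p):
    """Gallager's E0 for the BSC with crossover p and uniform inputs."""
    q = p ** (1.0 / (1.0 + rho)) + (1.0 - p) ** (1.0 / (1.0 + rho))
    return rho - (1.0 + rho) * math.log2(q)

def Er(R, p, steps=500):
    """Random coding exponent E_r(R) = max over rho in [0,1] of E0 - rho*R."""
    return max(gallager_E0(k / steps, p) - (k / steps) * R for k in range(steps + 1))

def psnr_gap_line(b, C, dc, ds):
    """Error-free rate-gap line (4): PSNR = (6.02 dB) * [b*(C - dc) - ds]."""
    return 20.0 * math.log10(2.0) * (b * (C - dc) - ds)

def psnr_bound(b, C, dc, ds, K, p):
    """Lower bound (5): (4) minus a penalty governed by the exponent
    T = -K*b*Er(C - dc) + 1 + 2*b*(C - dc) - 2*ds."""
    T = -K * b * Er(C - dc, p) + 1.0 + 2.0 * b * (C - dc) - 2.0 * ds
    return psnr_gap_line(b, C, dc, ds) - 10.0 * math.log10(1.0 + 2.0 ** T)

# the paper's setting: BSC with eps = 0.11 (C ~ 0.5), K = 10000, ds = 0.3821
```

For rate gaps above the threshold, the penalty term is astronomically small and the bound collapses onto the rate-gap line, as observed in Figure 3.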

Figure 3 compares (4) and (5) for different values of $\delta_c$, for block length $K = 10000$ (the source block length that we will use throughout this paper) and $\delta_s = 0.3821$. This value of $\delta_s$ is chosen in order to match the rate gap attained by the quantizers (see Figure 5). In these results, we considered a binary symmetric channel (BSC) with capacity $C = 0.5$ (crossover probability $\epsilon = 0.11$). The exponent $E_r(R_c)$ for the BSC can be found, for example, in [12]. For the parameters of Figure 3 we notice that (4) and (5) do not coincide when $\delta_c$ is too small (e.g., $\delta_c = 0.01$ in the figure), while they coincide for large enough $\delta_c$ (in this case, $\delta_c \ge \delta_c^* = 0.016$). For finite but large block lengths, as in this case, the threshold $\delta_c^*$ is given by the minimum value of the channel coding rate gap above which the exponent
$$T(C, \delta_c, \delta_s, K, b) \triangleq -K b E_r(C - \delta_c) + 1 + 2 b (C - \delta_c) - 2\delta_s$$
becomes negative. In Figure 4, we plot $T(C, \delta_c, \delta_s, K, b)$ and $E_r(C - \delta_c)$ versus $\delta_c$ for the parameters of Figure 3. It has been observed that $\delta_c^*$ is constant for the different values of $b$ in the range of Figure 3; thus in Figure 4 we use $b = 7.2036$.

3.3. SSCC with Arithmetic Coding and Raptor Codes

We provide an accurate approximated analysis of the performance of a practical SSCC scheme that can be regarded as our baseline scheme, since its encoding and decoding complexity is very similar to that of the JSCC scheme examined in the next section. With reference to the block diagram of Figure 1, we consider the concatenation of an optimized uniform scalar quantizer with 𝐵+1 quantization bits with an arithmetic encoder. The resulting entropy-coded bits are then channel encoded using a raptor code of suitable rate.

Figure 1: Conceptual block diagram of the conventional SSCC and the proposed JSCC schemes. The two schemes coincide but for the fact that the concatenation of entropy coding and channel coding (SSCC) is replaced by a single linear encoding block (JSCC).
Figure 2: PSNR versus 𝑏 for the Shannon limit, the Ziv bound, and the operational Shannon limit for the considered family of scalar quantizers, for 𝐵=1,…,6 and for channel capacity 𝐶=0.5.
Figure 3: The SSCC rate-gap approximation (4) compared with the PSNR bound (5) for 𝐶=0.5 and 𝛿𝑠=0.3821.
Figure 4: 𝑇(𝐶,𝛿𝑐,𝛿𝑠,𝐾,𝑏) and 𝐸𝑟(𝑅𝑐) versus 𝛿𝑐 for 𝐶=0.5, 𝐾=10000, 𝛿𝑠=0.3821, and 𝑏=7.2036.
Figure 5: Performance of the concatenation of arithmetic coding and a raptor code for infinite channel coding length, source length 𝐾=10000, and a BSC with 𝐶=0.5. The rate-gap values for the operational Shannon limit (4) are 𝛿𝑠=0.3821 and 𝛿𝑐=0; for the baseline SSCC, they are empirically found to be 𝛿𝑠=0.3821 and 𝛿𝑐=0.035.

A sufficiently large interleaver is placed between the entropy coding and the channel coding stages, such that the bit decoding errors at the input of the arithmetic decoder can be considered i.i.d. Since the arithmetic encoder has perfect knowledge of the probability distribution of the discrete source 𝐮 at the quantizer output, it can approach the source entropy rate 𝐻𝐵(𝑈) very closely even for moderate source block length 𝐾.

We approximate the performance of such a scheme by assuming that the arithmetic decoder produces random data after the first bit error at its input. Let $M = K H_B(U)$ denote the number of entropy-coded bits produced by the arithmetic encoder. These bits are channel encoded and decoded. Let $m$ denote the position of the last correctly decoded bit before the first bit error at the arithmetic decoder input. Under the assumption of i.i.d. bit errors, $m$ is a truncated geometric random variable with probability mass function
$$P(m = i) = (1 - P_b)^i P_b \tag{6}$$
for $i = 0, 1, \ldots, M-1$, and $P(m = M) = 1 - \sum_{i=0}^{M-1} (1 - P_b)^i P_b$, where $P_b$ denotes the BER at the output of the channel decoder. We approximate the number of correctly decoded quantization indices by $m / H_B(U)$ (neglecting integer effects). After the first bit error, the arithmetic decoder produces random symbols distributed as the quantization indices (i.e., according to the given discrete-source probability distribution) but essentially statistically independent of the source sequence. Therefore, the average distortion in this case is given by
$$\bar\sigma^2 = \mathbb{E}\left[ (S - \hat{S})^2 \right] = \mathbb{E}[S^2] + \mathbb{E}[\hat{S}^2] = 1 + \sigma^2_{\mathcal{Q}}, \tag{7}$$
where $\hat{S}$ denotes a random variable distributed as the quantizer reconstruction points, and $\sigma^2_{\mathcal{Q}}$ denotes its variance. On the other hand, before the first bit error the system reconstructs the correct quantization points; therefore the average distortion in this case coincides with the quantization distortion $D_{\mathcal{Q}}(B)$.

Eventually, the total average distortion of the system is approximated by
$$D \approx P(m = M)\, D_{\mathcal{Q}}(B) + \sum_{i=0}^{M-1} (1 - P_b)^i P_b \left[ D_{\mathcal{Q}}(B)\, \frac{i / H_B(U)}{K} + \bar\sigma^2\, \frac{K - i / H_B(U)}{K} \right], \tag{8}$$
where $\bar\sigma^2 = 1 + \sigma^2_{\mathcal{Q}}$ is the average distortion incurred after the first bit error. The approximate analysis requires the evaluation of the residual BER at the channel decoder output. This can be obtained by simulation of the stand-alone raptor code with given finite length, or by using any suitable approximation or semi-analytic technique, such as density evolution or EXIT chart methods [13–17]. In particular, we make use of the EXIT chart approximation, reviewed in Appendix B.
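The approximation (8) is straightforward to evaluate numerically. The sketch below assumes the quantization distortion and reconstruction-point variance are known for the chosen quantizer; the example values are hypothetical placeholders, not the paper's measured ones:

```python
import math

def approx_distortion(K, H, Pb, D_Q, sigma2_Q):
    """Approximate end-to-end distortion (8): the first m entropy-coded bits
    are decoded correctly (m truncated-geometric with parameter Pb), giving
    about m/H correctly reconstructed samples at distortion D_Q; the
    remaining samples are reconstructed from random data at distortion
    1 + sigma2_Q, per eq. (7)."""
    M = int(round(K * H))                  # number of entropy-coded bits
    bad = 1.0 + sigma2_Q                   # distortion after the first bit error
    D = (1.0 - Pb) ** M * D_Q              # P(m = M): no residual bit error at all
    for i in range(M):
        frac = min((i / H) / K, 1.0)       # fraction of correctly decoded samples
        D += (1.0 - Pb) ** i * Pb * (D_Q * frac + bad * (1.0 - frac))
    return D

def psnr(D):
    return -10.0 * math.log10(D)

# hypothetical inputs: K = 10000, H_B(U) = 2.7606 bits/sample (as for Q_2)
D = approx_distortion(10000, 2.7606, Pb=1e-7, D_Q=0.0374, sigma2_Q=0.88)
```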

In Figure 5 we report the PSNR versus 𝑏 (obtained by using (8)) for different values of 𝐵 when 𝐾=10000, a BSC with capacity 𝐶=0.5 and where the raptor code output BER is approximated via the EXIT chart method. These results assume implicitly infinite channel coding block length. In order to validate the approximated distortion analysis of (8), we run simulations of the arithmetic decoder and quantization reconstruction, fed with the quantization bits corrupted by independent bit errors at a rate equal to the raptor code output BER. As seen in Figure 5, the results of the simulated arithmetic decoder match remarkably well the approximation (8), thus showing that the arithmetic decoder indeed produces approximately random data after the first bit error.

The results in Figure 5 show that for the case of very large channel coding block length the performance of the baseline SSCC scheme is remarkably close to the operational Shannon limit and therefore the scheme is hard to beat by any scheme using the same set of quantizers. However, the picture changes when we consider a finite channel coding block length. In particular, we consider independent encoding of each source block, so that the system latency is dictated by the source block length 𝐾 and not by the channel coding block length. This corresponds to choosing the raptor code input bits block length equal to 𝑀=𝐾𝐻𝐵(𝑈). For the system parameters as before, the PSNR results in this case are shown in Figure 6. We notice that the finite channel coding block length yields an additional degradation that increases with PSNR.

Figure 6: Performance of the concatenation of arithmetic coding and a raptor code for finite channel coding length, source length 𝐾=10000, and a BSC with 𝐶=0.5.

We can explain and quantify the increasing bandwidth expansion gap Δ𝑏(𝐵) shown in Figure 6 as follows. Let 𝑅inf and 𝑅fin denote the channel coding rates needed by the raptor code, in the infinite and finite block length cases respectively, to reach a BER small enough that the effective distortion is virtually identical to the quantization distortion. For example, Figure 7 plots the PSNR corresponding to the distortion (8) as a function of 𝑃𝑏. We notice that for 𝑃𝑏=10^−7 the quantization distortion (corresponding to the maximum achievable PSNR) is essentially reached. Then, Figure 8 plots the raptor code BER for the BSC with capacity 𝐶=0.5, as a function of the reciprocal of the channel coding rate, 1/𝑅𝑐. Notice that the raptor code is a rateless code, and therefore we can generate as many coded symbols as we like. In order to generate Figure 8 we keep the channel parameter 𝜖=0.11 fixed (corresponding to 𝐶=0.5) and run encoding and decoding for smaller and smaller coding rates. The infinite block length case is obtained by using the EXIT chart approximation.

Figure 7: Output PSNR as a function of the channel decoding residual BER 𝑃𝑏.
Figure 8: Raptor code output BER for the infinite block length case (EXIT approximation) and for the finite length case, obtained by simulation, as a function of the reciprocal of the coding rate for 𝐶=0.5. The finite length is taken to be 27606 = 𝐾𝐻2(𝑈), the approximate number of bits at the output of the arithmetic encoder, since the quantized Gaussian source has entropy rate 2.7606 bits per sample under 𝒬2.

Figure 8 shows that the target BER of 10^−7 is reached at certain rates 𝑅inf and 𝑅fin for the cases of infinite and finite block length, respectively, and allows us to find the difference 1/𝑅fin − 1/𝑅inf, shown in the figure.

Finally, we can quantify the bandwidth expansion gaps shown in Figure 6 by noticing that, since $b = H_B(U) / R_c$, we have
$$\Delta b(B) = H_B(U) \left( \frac{1}{R_{\mathrm{fin}}} - \frac{1}{R_{\mathrm{inf}}} \right). \tag{9}$$
It is clear that the gap $\Delta b(B)$ increases with the quantizer resolution $B$, and therefore with PSNR. This is a further confirmation of the fact that, in practice, it becomes more and more difficult to approach the Shannon limit as the resolution increases.

4. Joint Source-Channel Coding Scheme

In this section we describe the encoder and decoder of the proposed JSCC scheme. Then, we discuss an incremental redundancy rate allocation procedure that allows the optimization of the scheme. We hasten to say that this rate allocation procedure is run off-line, and serves to design the coding scheme for given source and channel statistics. More generally, an adaptive scheme that allocates coded bits to the bitplanes on the fly, depending on the empirical entropy rate of the source and on the capacity of the channel may be envisaged in a universal JSCC setting, where the source statistics are not known a priori and are learned instead from the source sequence itself. However, we do not pursue this approach here.

Figure 9 shows the encoder block diagram. Each bitplane (row of the binary array $\mathbf{u}$ of quantization indices produced by the quantizer) is mapped into a sequence of coded symbols. Here we consider binary coding and a BSC. Letting $\mathbf{u}_p$ denote the $p$th row of $\mathbf{u}$, the corresponding block of coded symbols is given by $\mathbf{x}_p = \mathbf{u}_p \mathbf{G}_p$, where $\mathbf{G}_p$ is a suitable encoding matrix of size $K \times N_p$. Then, the encoded blocks $\mathbf{x}_0, \ldots, \mathbf{x}_B$ are transmitted in sequence over the BSC. The resulting bandwidth expansion factor is
$$b = \frac{1}{K} \sum_{p=0}^{B} N_p. \tag{10}$$
Given the source symmetry, it is clear that the sign bit is equiprobable and has entropy $H(U_0) = 1$. Furthermore, it is independent of the magnitude bits. Hence, the target nominal rate for the encoder of the sign bit is $K / N_0 = C$. As for the $p$th magnitude bit, we allocate a nominal target rate equal to $K / N_p = C / H(U_p \mid U_{p+1}, \ldots, U_B)$, where $H(U_p \mid U_{p+1}, \ldots, U_B)$ denotes the conditional entropy rate of the $p$th bitplane, conditioned on the bitplanes $p+1, p+2, \ldots, B$. It follows that the nominal bandwidth expansion is given by
$$b = \frac{K/C + \sum_{p=1}^{B} K H(U_p \mid U_{p+1}, \ldots, U_B)/C}{K} = \frac{1 + \sum_{p=1}^{B} H(U_p \mid U_{p+1}, \ldots, U_B)}{C} = \frac{H_B(U)}{C}, \tag{11}$$
which is optimal.
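The nominal allocation (10)–(11) can be sketched as follows; the conditional entropy values in the example are hypothetical, chosen only to illustrate the bookkeeping:

```python
def allocate_rates(K, C, cond_entropies):
    """Nominal per-bitplane rate allocation: the sign plane (entropy 1) gets
    N_0 = K/C coded symbols; magnitude plane p gets
    N_p = K * H(U_p | U_{p+1},...,U_B) / C.  The resulting bandwidth
    expansion is b = sum(N_p)/K = H_B(U)/C, as in eq. (11)."""
    N = [K / C]                                  # sign plane: H(U_0) = 1
    N += [K * H_p / C for H_p in cond_entropies]
    b = sum(N) / K
    return N, b

# toy example: B = 3 magnitude planes with hypothetical conditional entropies
N, b = allocate_rates(K=10000, C=0.5, cond_entropies=[0.95, 0.6, 0.2])
# H_B(U) = 1 + 0.95 + 0.6 + 0.2 = 2.75, so b = 2.75/0.5 = 5.5
```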

Figure 9: Diagram of the proposed JSCC encoder.

In order to be able to decode at these rates, we consider a multistage decoder, shown in Figure 10, that processes the bitplanes in sequence. The sign bit is independently decoded. The magnitude bits are decoded in sequence, starting from the $B$th plane. At each decoding stage $p$, the hard decisions of the already decoded planes are used by the BP decoder to compute the conditional a priori probabilities of the $p$th bitplane, as explained in Appendix A. Assuming that at each level $p$ the previous levels are correctly decoded, the rates $K / N_p = C / H(U_p \mid U_{p+1}, \ldots, U_B)$ are achievable.

Figure 10: Multistage decoder for successive bitplane decoding and source reconstruction.

In practice, since raptor codes do not achieve a sufficiently low BER when their rate is too close to the nominal rate limit, we must allocate the rates allowing for some gap. The rate allocation problem is made more complicated by the fact that in the multistage decoder the decoding of the different planes is not independent. In particular, if the $p$th plane fails with many bits in error, then it is very likely that all the planes $p-1, p-2, \ldots, 1$ will also fail, since their decoders are fed with incorrect a priori conditional probabilities. We will address the problem of rate allocation for the multistage decoder at the end of this section.

Next, let us examine in more detail how encoding of the $p$th plane is implemented with raptor codes [10]. Raptor codes can essentially be viewed as an extension of Luby Transform (LT) codes [18], since they are based on the concatenation of an outer linear code (in our case, a low-density parity-check (LDPC) code) with an inner LT code (see Appendix A for details). We use raptor codes in systematic form. In particular, let $\mathbf{S}_p$ be a $K \times K$ full-rank binary matrix given by $\mathbf{S}_p = \mathbf{G}_{\mathrm{ldpc}} \mathbf{A}_p$, where $\mathbf{A}_p$ is a submatrix of the LT code generator matrix at encoding level $p$ and $\mathbf{G}_{\mathrm{ldpc}}$ is the generator matrix of the LDPC code (see [10] for details). The encoder produces a vector of $K$ intermediate symbols, denoted by $\mathbf{u}'_p = \mathbf{u}_p \mathbf{S}_p^{-1}$. Then, the intermediate symbols are expanded by high-rate LDPC encoding into $\mathbf{u}''_p = \mathbf{u}'_p \mathbf{G}_{\mathrm{ldpc}}$. Finally, the encoded symbols $\mathbf{x}_p$ are obtained from $\mathbf{u}''_p$ by applying nonsystematic rateless encoding; that is, the symbols $x_1, x_2, \ldots, x_{N_p}$ are produced in sequence, and each $x_i$ is given as the sum of elements of $\mathbf{u}''_p$ selected at random according to the LT degree distribution $\Omega$.
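The final, nonsystematic LT stage can be sketched as follows. The degree distribution below is a toy example, not an optimized raptor profile, and the precoding and matrix-inversion steps are omitted:

```python
import random

def lt_encode(u, N, omega, seed=0):
    """Sketch of the nonsystematic LT stage: each coded symbol x_i is the
    XOR (mod-2 sum) of d symbols of the expanded vector u'', where the
    degree d is drawn from the distribution `omega`, given as a list of
    (degree, probability) pairs."""
    rng = random.Random(seed)
    degrees = [d for d, _ in omega]
    weights = [w for _, w in omega]
    x = []
    for _ in range(N):
        d = rng.choices(degrees, weights=weights)[0]
        picks = rng.sample(range(len(u)), d)       # d distinct positions of u''
        x.append(sum(u[j] for j in picks) % 2)     # XOR of the selected symbols
    return x

u2 = [1, 0, 1, 1, 0, 0, 1, 0]                      # toy expanded block u''
x = lt_encode(u2, N=16, omega=[(1, 0.05), (2, 0.5), (3, 0.3), (4, 0.15)])
```

Since the code is rateless, $N$ can be grown at will, which is what enables the fine rate matching discussed earlier.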

Notice that $\mathbf{u}_p = \mathbf{u}'_p \mathbf{S}_p$. Therefore, in the Tanner graph representing the code [19, 20], the nodes corresponding to the source symbols $\mathbf{u}_p$ have a degree distribution identical to that of a standard nonsystematic raptor code. Furthermore, although $\mathbf{S}_p$ is sparse, its inverse is sufficiently dense that the intermediate symbols $\mathbf{u}'_p$ are close to uniform i.i.d. Notice that this is essential to the scheme: in order to drive the channel with the correct input distribution we need to send the nonsystematic symbols $\mathbf{x}_p$ through the channel, and their distribution should be as close as possible to i.i.d. and equiprobable.

A key component in the systematic raptor code design consists of finding a suitable nonsingular 𝐾×𝐾 matrix 𝐒𝑝, with given column weight distribution, and such that its inverse looks as much as possible like a random binary matrix. As for the LDPC code (often referred to as the “precode” in the raptor coding literature), we used a regular code with parameters (2,100).

Let us focus now on decoding and source reconstruction. The multistage decoder of Figure 10 uses BP at each stage $p$ in order to approximately compute the symbol-by-symbol posterior marginal log-likelihood ratios (LLRs) $\{\lambda_{p,k} : k = 1, \ldots, K\}$, defined as
$$\lambda_{p,k} \triangleq \log \frac{P(u_{p,k} = 0 \mid \mathbf{y}_p, \hat{\mathbf{u}}_{p+1}, \ldots, \hat{\mathbf{u}}_B)}{P(u_{p,k} = 1 \mid \mathbf{y}_p, \hat{\mathbf{u}}_{p+1}, \ldots, \hat{\mathbf{u}}_B)}, \tag{12}$$
where $\mathbf{y}_p$ denotes the channel output corresponding to the input $\mathbf{x}_p$, and the conditioning is with respect to the already decoded bitplanes. This is obtained by feeding the hard decisions from the planes $p+1, \ldots, B$ to the BP decoder at level $p$. An iterative version of the multistage decoder in which soft messages in the form of a posteriori LLRs are exchanged instead of hard decisions was also considered, but it was observed that this does not provide any significant improvement; it was therefore not pursued further, given its much greater complexity.

The information about the already decoded bitplanes is incorporated into the BP decoder for bitplane $p$ in the following way. As explained in Appendix A, the BP algorithm is initialized with input messages at all the source and coded nodes in the Tanner graph of the code. The coded nodes (corresponding to the coded symbols $\mathbf{x}_p$) receive their input messages from the corresponding channel observations. In the case of a BSC, these are given by
$$\mu_{p,i} = (-1)^{y_{p,i}} \log \frac{1 - \epsilon}{\epsilon}, \qquad i = 1, \ldots, N_p. \tag{13}$$
The source nodes (corresponding to the source bits $\mathbf{u}_p$) are associated with the input messages
$$\nu_{p,k} = \log \frac{P(u_{p,k} = 0 \mid \hat{u}_{p+1,k}, \ldots, \hat{u}_{B,k})}{P(u_{p,k} = 1 \mid \hat{u}_{p+1,k}, \ldots, \hat{u}_{B,k})}, \tag{14}$$
where $\hat{u}_{p+1,k}, \ldots, \hat{u}_{B,k}$ are the hard decisions obtained from the previous stages.
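Computing the two kinds of input messages is straightforward. In the sketch below the conditional priors of (14) are passed in as a precomputed list rather than derived from a source model:

```python
import math

def channel_llrs(y, eps):
    """Input messages (13) at the coded nodes, for a BSC with crossover eps:
    mu_i = (-1)^{y_i} * log((1 - eps)/eps)."""
    L = math.log((1.0 - eps) / eps)
    return [L if bit == 0 else -L for bit in y]

def source_llrs(p0_list):
    """Input messages (14) at the source nodes, from the conditional priors
    P(u_{p,k} = 0 | hard decisions of planes p+1..B); here those priors are
    supplied directly as a hypothetical per-symbol list."""
    return [math.log(p0 / (1.0 - p0)) for p0 in p0_list]

mu = channel_llrs([0, 1, 1, 0], eps=0.11)   # +/- log(0.89/0.11), about +/-2.09
nu = source_llrs([0.9, 0.5, 0.2])           # positive favors 0, negative favors 1
```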

The BP decoder at each stage runs for a given desired number of iterations, and eventually outputs both hard decisions to be passed to the next stage and soft outputs in the form of the posterior LLRs given by (12). Once all bitplanes have been decoded, the source is reconstructed as follows. Consider without loss of generality the inverse quantization mapping
$$\mathcal{Q}_B^{-1}(u_k) = (-1)^{u_{0,k}}\, \frac{\Delta_B}{2} \left( \left[ \sum_{p=1}^{B} u_{p,k}\, 2^{p} \right] + 1 \right) \tag{15}$$
that yields the mid-point of each quantization interval given the set of quantization bits.

Then, we can either consider hard reconstruction, which consists of using the hard decisions $\hat{u}_{p,k}$ in (15), or soft reconstruction, which makes use of the (approximate) posterior LLRs in order to compute the minimum mean-square error (MMSE) estimate of the source samples given the channel output, that is, the conditional mean estimator $\hat{s}_k = \mathbb{E}[s_k \mid \mathbf{y}]$. Treating the decoder's estimated posterior LLRs as if they were the true posterior LLRs, we obtain
$$\hat{s}_k = \frac{\Delta_B}{2} \tanh\!\left( \frac{\lambda_{0,k}}{2} \right) \left( \left[ \sum_{p=1}^{B} \frac{2^{p}}{1 + e^{\lambda_{p,k}}} \right] + 1 \right). \tag{16}$$
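A minimal sketch of the soft reconstruction rule (16) for a single sample, using $\mathbb{E}[(-1)^{u_0}] = \tanh(\lambda_0/2)$ and $P(u_p = 1) = 1/(1 + e^{\lambda_p})$:

```python
import math

def soft_reconstruct(llrs, delta_B):
    """Soft MMSE reconstruction (16) of one sample from its per-bitplane
    posterior LLRs: llrs[0] is the sign-plane LLR, llrs[p] (p = 1..B) the
    magnitude-plane LLRs (positive LLR favors bit value 0)."""
    sign = math.tanh(llrs[0] / 2.0)                 # E[(-1)^{u_0}]
    mag = sum(2 ** p / (1.0 + math.exp(lp))         # sum of 2^p * P(u_p = 1)
              for p, lp in enumerate(llrs[1:], start=1))
    return (delta_B / 2.0) * sign * (mag + 1.0)

# with very confident LLRs this collapses to the hard mid-point rule (15):
s_hat = soft_reconstruct([50.0, -50.0, 50.0], delta_B=0.5)   # bits (0,1,0) -> +0.75
```

With uncertain LLRs the estimate shrinks toward zero, which is exactly the MMSE behavior that makes soft reconstruction preferable at high residual error rates.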

In Appendix A, we prove an interesting isomorphism between the BP decoder of the joint source-channel problem described above and a related standard channel coding problem. Let us focus on a single binary independent source sequence $\mathbf{u}$ of length $K$, with probabilities $p_k \triangleq P(u_k=1)$ for $k=1,\dots,K$. This is encoded into a binary codeword $\mathbf{x} = \mathbf{u}\mathbf{G}$ of length $N$, where $\mathbf{G}$ is a $K \times N$ raptor encoder as previously described. Let us transmit $\mathbf{x}$ through a BSC with crossover probability $\epsilon$, and let $\mathbf{y} = \mathbf{x} \oplus \mathbf{e}$ denote the corresponding output. The result holds for any binary-input symmetric-output channel, but here we focus on the BSC for simplicity of exposition. Then, the BP decoder for this problem is isomorphic to a decoder for the following related channel coding problem: consider transmission of the all-zero codeword from the systematic code with generator matrix $[\mathbf{I} \mid \mathbf{G}]$, of size $K \times (K+N)$, over a channel that for the first $K$ components operates as
$$y_k = x_k \oplus u_k, \quad k=1,\dots,K, \tag{17}$$
where $u_k$ is the $k$th source symbol, and for the remaining $N$ components operates as
$$y_{K+n} = x_{K+n} \oplus e_n, \quad n=1,\dots,N. \tag{18}$$
In other words, there exists a one-to-one mapping between the messages of the BP decoder for the first problem (joint source-channel) and the messages of the BP decoder for the second problem (channel only), for every edge of the decoder graph and every decoder iteration.

This means that the source-channel BP decoding can be analyzed, for example, by using the EXIT chart method, by considering the associated “virtual” channel, where the all-zero codeword from the associated systematic code is transmitted partly on a binary additive noise channel with noise realization identical to the source realization of the source-channel problem, and partly on the same BSC (with the same noise realization) of the source-channel problem. We use this BP isomorphism result in order to derive a simple EXIT chart analysis of the BP decoder at each stage of the multistage decoder, under the assumption that the hard decisions from previous stages are correct.
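As a small numeric check of the "virtual channel" construction (with a toy random matrix standing in for the raptor encoder $\mathbf{G}$, purely illustrative), the two channel outputs are related through the codeword $[\mathbf{u} \mid \mathbf{u}\mathbf{G}]$ of $[\mathbf{I} \mid \mathbf{G}]$:

```python
import numpy as np

rng = np.random.default_rng(1)
K, N, eps = 6, 10, 0.11

G = rng.integers(0, 2, (K, N))           # toy K x N encoder (illustrative only)
u = rng.integers(0, 2, K)                # binary source sequence
e = (rng.random(N) < eps).astype(int)    # BSC noise realization

# Joint source-channel problem: observe y = uG + e (mod 2).
y_jscc = ((u @ G) % 2) ^ e

# Channel-only problem: the all-zero codeword of [I | G] sent over the
# two-part channel (17)-(18) is observed as [u | e].
y_ch = np.concatenate([u, e])

# Adding the codeword [u | uG] to the channel-only observation recovers
# [0 | uG + e], i.e., the joint-problem observation on the parity part.
codeword = np.concatenate([u, (u @ G) % 2])
assert np.array_equal(y_ch ^ codeword,
                      np.concatenate([np.zeros(K, int), y_jscc]))
```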

4.1. Rate Allocation Algorithm

The rate allocation of each bitplane encoder is established offline by using the greedy algorithm described below. Again, we notice that we do not consider adaptive rate allocation: given the source and channel statistics, we run the greedy allocation algorithm in order to design the JSCC coding scheme.

Allocating the number of coded symbols according to the optimal limits, that is,
$$N_p = \frac{H(U_p \mid U_{p+1},\dots,U_B)\, K}{C}, \tag{19}$$
yields very bad performance even at very large block length, since it is known that raptor codes converge to very small BER only at a fixed (small) gap from capacity on general binary-input symmetric-output channels [14]. Therefore, we have to allow for some increment in the coded block lengths, normally referred to as "overhead" in the raptor coding literature. The problem is how to allocate a total overhead among the $B+1$ stages. In order to do so, we propose the following greedy overhead allocation algorithm.

We initialize the lengths $N_p^{(0)}$ according to their nominal values given by (19). At each iteration of the allocation algorithm, we allocate a given number $\Delta N$ of additional coded symbols to one of the $B+1$ codes. Let $D(N_0,\dots,N_B)$ denote the achieved average distortion of the JSCC scheme when coding lengths $N_0,\dots,N_B$ are used, and let $D^{(0)} = D(N_0^{(0)},\dots,N_B^{(0)})$. Then, for iterations $i=1,2,\dots$, do the following.

(i) For all $p=0,\dots,B$, compute
$$D_p^{(i)} = D\big(N_0^{(i-1)},\dots,N_p^{(i-1)}+\Delta N,\dots,N_B^{(i-1)}\big). \tag{20}$$
(ii) Find $\hat{p} = \arg\min_{p=0,\dots,B} D_p^{(i)}$.
(iii) Let $N_p^{(i)} \leftarrow N_p^{(i-1)}$ for all $p \neq \hat{p}$, and $N_{\hat{p}}^{(i)} \leftarrow N_{\hat{p}}^{(i-1)} + \Delta N$.
(iv) If $|D_{\hat{p}}^{(i)} - D_{\mathcal{Q}}(B)| \leq \delta$, exit. Otherwise, let $D^{(i)} \leftarrow D_{\hat{p}}^{(i)}$ and go back to step (i).

The quantity $\delta > 0$ is the tolerance within which we wish to achieve the target quantization distortion.

In essence, the above algorithm allocates at each iteration a packet of Δ𝑁-coded bits to the bitplane raptor encoder that yields the largest decrease in the overall average distortion. The distortion can be computed either by Monte Carlo simulation, or by using the EXIT chart approximation. The latter method is much faster, but cannot take into account the effect of finite block length and the error propagation between the stages of the multistage decoder.
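A compact sketch of this greedy loop follows; the distortion evaluator is passed in as a callable, since the paper obtains it either by Monte Carlo simulation or by the EXIT approximation (function and parameter names are illustrative):

```python
def greedy_allocate(N0, distortion, target, delta_N=100, tol=1e-4, max_iter=10_000):
    """Greedy overhead allocation: at each iteration, give delta_N extra
    coded symbols to the bitplane whose increment most reduces distortion.
    `N0` is the list of nominal lengths from eq. (19); `distortion` maps a
    length vector to the average distortion; `target` is the quantization
    distortion D_Q(B) and `tol` the tolerance delta of step (iv)."""
    N = list(N0)
    for _ in range(max_iter):
        trials = []
        for p in range(len(N)):          # step (i): try each bitplane
            Np = N.copy()
            Np[p] += delta_N
            trials.append(distortion(Np))
        p_hat = min(range(len(N)), key=lambda p: trials[p])  # step (ii)
        N[p_hat] += delta_N              # step (iii)
        if abs(trials[p_hat] - target) <= tol:               # step (iv)
            break
    return N
```

With a toy separable distortion such as `lambda n: 1/n[0] + 1/n[1]`, the loop adds `delta_N` symbols to whichever plane currently buys the largest distortion reduction, exactly as described above.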

In Figure 11, we report the comparison between the finite-length simulation and the infinite-length EXIT approximation for the same setting of $B$ ranging from 1 to 6, the BSC with capacity $C=0.5$, and source block length $K=10000$ used throughout the paper. As we can see, the two cases yield almost identical results. This allows us to use the infinite-length EXIT approximation to estimate (with very good approximation) a suitable rate allocation among the $B+1$ stages for the finite-length case. Finally, in Figure 12, for the case of $B=3$, we report the relative overhead $N_p^{(i)}/N_p^{(0)}$ versus $\mathrm{PSNR}^{(i)} = -10\log_{10} D^{(i)}$ produced by the greedy allocation algorithm. As one might expect, the greedy algorithm starts by increasing the overhead of the sign bitplane and then continues from the most significant to the least significant magnitude bitplanes. Eventually, each bitplane is allocated a coding length between 12% and 18% larger than the nominal length (19), in line with results reported for standard raptor coding. Furthermore, notice that this scheme tends to give larger overheads to the most significant bitplanes; that is, it implicitly implements unequal error protection across the layers, which is a very well-known design approach for multilevel coded modulation with multistage decoding [21].

Figure 11: PSNR versus $b$ comparison of the JSCC scheme for finite block length simulation and infinite block length EXIT approximation, for $C=0.5$.
Figure 12: Relative overhead $N_p^{(i)}/N_p^{(0)}$ versus $\mathrm{PSNR}^{(i)} = -10\log_{10} D^{(i)}$ produced by the greedy allocation algorithm for the case $B=3$. We notice that the bitplane coding overheads are incremented one at a time, in sequence.

5. Numerical Results

In this section, we provide both finite-length and infinite-length results. We considered source block length $K=10000$ for the finite-length results. In all the numerical results of this paper, we considered raptor codes with the LT degree distribution [14]
$$\Omega(x) = 0.008x + 0.494x^2 + 0.166x^3 + 0.073x^4 + 0.083x^5 + 0.056x^8 + 0.037x^9 + 0.056x^{19} + 0.025x^{65} + 0.003x^{66}. \tag{21}$$
As the outer code, we used a regular high-rate LDPC code with degrees $(2,100)$ and rate 0.98. The source symbols are estimated after running 100 iterations of the decoding algorithm.

We would like to stress the fact that the LT and LDPC degree distribution polynomials have been chosen without considering any optimization method and that we have averaged over the ensemble of randomly generated raptor codes with the given parameters. In practice, one would carefully design an LDPC graph with good properties for the desired length 𝐾 and degree distributions.

This section is subdivided into two parts. In the first part, we describe the results obtained by varying the bandwidth expansion factor when the capacity of the BSC is fixed to $C=0.5$, corresponding to crossover probability $\epsilon=0.11$. The aim of this part is to compare the performance of families of SSCC and JSCC codes for different values of $b$, and to see how they approach the operational Shannon limit.

In the second part we examine the behavior of a single fixed code, designed for a nominal channel crossover probability and target PSNR, when we vary the channel crossover probability. This set of results illustrates the robustness of a given coding scheme to nonideal channel conditions.

In both subsections we provide results for infinite and finite codeword length cases. The infinite case results have been generated by using the EXIT chart approximation of Appendix B.

5.1. Approaching the Operational Shannon Limit

In Figure 13, we plot the performance comparison between the proposed JSCC scheme and the SSCC scheme when infinite codeword length is considered. In this case, the SSCC scheme outperforms the proposed scheme in the sense that it reaches the quantization distortion at slightly lower values of $b$, for all $B=1,\dots,6$. The SSCC schemes show a very sharp transition ("all or nothing" behavior). In contrast, the JSCC schemes reach their quantization PSNR more gradually: as we increase the overhead, the performance gradually improves.

Figure 13: JSCC and SSCC infinite block length comparison for 𝐵=1,2,3,4,5,6 and 𝐶=0.5.

The situation changes radically when we consider finite codeword length. In Figure 14, we plot the performance of the JSCC and SSCC schemes for finite block length. In this case, the JSCC schemes outperform their SSCC counterparts. In particular, as we have already remarked, the JSCC performance is almost identical to that for infinite block length, while the SSCC suffers much more evidently from the residual BER of finite-length practical codes. This also hints that the EXIT approximated analysis yields very faithful results for the JSCC scheme, while it provides optimistic results for the SSCC scheme. This can be explained by the fact that the BER performance of infinite-length codes exhibits a very sharp "waterfall" threshold, beyond which the BER is zero, while for finite length the waterfall is smoother.

Figure 14: JSCC and SSCC finite block length comparison for 𝐵=1,2,3,4,5,6 and 𝐶=0.5.

An important advantage of the JSCC is that the PSNR value increases gradually as $b$ increases, while a sharp threshold effect can be seen in the case of SSCC. In [5], it was shown that, with natural sources such as images, PSNR values lower than the peak value were still perceptually acceptable for the JSCC scheme, while the SSCC scheme also degrades abruptly from the perceptual viewpoint.

5.2. Robustness

In the previous set of results, we have fixed the channel capacity and the (quantized) source entropy rate and we have examined families of codes operating at different (𝑏,PSNR) points. Now, we take a complementary view and fix the channel code while letting the channel capacity vary. This setting is relevant when a given code, designed for some nominal channel conditions, is used on a channel of variable quality, and therefore we are interested in the robustness of the performance with respect to the channel parameters. Also, this setting is more akin to the standard way of studying the performance of channel coding, where the BER is plotted as a function of the channel parameters (𝜖 in the case of a BSC), for a given channel code.

In order to have a fair comparison between the two schemes, the bandwidth expansion factor (i.e., the code used) has been fixed in the following way: we keep the minimum value of $b$ such that both schemes reach the quantization PSNR in the previous set of results (see Figure 14). In particular, we keep $b=4.3565$ and $b=14.8079$ for $B=1$ and $B=6$, respectively. Since the JSCC scheme needs lower values of $b$ to reach the quantization PSNR in both cases, we add some extra bits to the JSCC scheme so that both schemes operate at the same values of $b$.

We have examined the two extreme cases of low resolution (𝐵=1) and high resolution (𝐵=6). In Figures 15 and 16 we notice that in both cases the JSCC scheme outperforms the SSCC scheme in terms of PSNR. Moreover, as expected, the PSNR of the SSCC scheme degrades sharply, while the PSNR of the JSCC scheme degrades gradually as the channel crossover probability increases. For example, considering 𝐵=6, if 𝜖 increases from its nominal value 0.11 to a higher value 0.115 the JSCC scheme loses about 6 dB in PSNR, while the SSCC loses 24 dB. We interpret this sharp degradation as an effect of the catastrophic behavior of the entropy coding stage in SSCC, which is greatly mitigated by the linear coding stage in the proposed JSCC scheme.

Figure 15: Comparison of performance degradation of JSCC and SSCC as the cross-over probability of the BSC increases for 𝐾=10000 and 𝐵=1.
Figure 16: Comparison of performance degradation of JSCC and SSCC as the cross-over probability of the BSC increases for 𝐾=10000 and 𝐵=6.

6. Conclusions

Unlike most JSCC schemes presented in the literature, which are carefully targeted to specific source and channel pairs, the scheme proposed here can closely approach the rate-distortion separation limit for virtually any well-behaved source under quadratic distortion and any symmetric channel, owing to the universality of entropy-coded quantization and the optimality of linear codes for both data compression and channel coding. Furthermore, we have demonstrated that, besides operating close to the optimum, the proposed scheme is better and more robust than a separated approach, especially in the practical case of finite block length.

We wish to conclude this paper with some considerations for future work. Following [5], the JSCC scheme can be applied to any class of sources for which efficient transform coding has been designed. In particular, images, audio and video are natural and relevant candidates. The scheme takes advantage of the know-how and careful source statistical characterization developed in designing lossy coding standards, and preserves the structure of the transform coder. This makes it easy to introduce the JSCC scheme into practical applications, for example, by introducing a trans-coding stage at the physical layer, while preserving the network architecture and the source coding standards developed at the application layer.

Although we have not pursued this aspect here, the bitplane layered encoding and multistage successive decoding architectures of the proposed scheme lend themselves quite naturally to a multiresolution, or "embedded," implementation. For example, it is sufficient to use an embedded scalar quantizer in order to obtain such a scheme: bitplanes will be transmitted in sequence, and the resolution of the reconstructed source improves with each additional layer received.

A different route for future investigation involves the use of nonbinary linear codes. Also for the proposed JSCC scheme, the gap from the Shannon limit increases with the PSNR (resolution). This is due to the fact that each layer needs to be encoded with a fixed overhead, such that the overall overhead increases with the number of layers. As an alternative, we may wish to use a nonbinary raptor code operating over symbols of 𝐵+1 bits, and mapping directly the quantization indices over the channel symbols. The hope is that the overhead of such nonbinary codes does not depend (or at least depends in a sublinear way) on 𝐵. This may lead to better bandwidth expansion gaps at high resolution.


A. Raptor Codes and BP Decoding

Raptor codes [10] are a class of rateless codes designed for transmission over erasure channels with unknown capacity. They are an extension of Luby Transform codes (LT codes) [18], since they are based on the concatenation of an outer linear code (precode) with an inner LT code. To be consistent with raptor code terminology, let us define the input symbols as the symbols generated from the source symbols by the linear precode encoder, and the output symbols as the symbols generated from the input symbols by the LT encoder.

Formally, a raptor code is defined by the triplet $(K, \mathcal{C}, \Omega(x))$, where $K$ is the number of source symbols, $\mathcal{C}$ is a linear encoder $\mathcal{C}: \mathbb{F}_2^K \to \mathbb{F}_2^n$, and $\Omega(x) = \sum_{j=1}^{n} \Omega_j x^j$ is the generating function of the probability distribution $\Omega_1,\dots,\Omega_n$ on $\{1,\dots,n\}$ that generates the LT codewords.

The $(n, \Omega(x))$ LT code ensemble corresponds to the ensemble of $n \times N$ binary matrices, for all $N=1,2,\dots$, with columns randomly generated according to $\Omega(x)$, where each matrix yields an encoding mapping.

The operations to generate a generic column of an LT encoding matrix can be summarized in two steps:

(1) sample the distribution $\{\Omega_1,\dots,\Omega_n\}$ to obtain a weight $w$ between 1 and $n$;
(2) generate the column $(v_1,\dots,v_n)$ uniformly at random from all $\binom{n}{w}$ binary vectors of weight $w$ and length $n$.

As shown in [14], it is possible to adapt raptor codes for transmission over memoryless symmetric channels. The decoding is performed by using the classical belief propagation algorithm (see [14] for details).
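The two steps can be sketched as follows, using the (rounded) degree distribution of eq. (21); `lt_column` is an illustrative name:

```python
import random

# Rounded LT degree distribution Omega(x) of eq. (21): degree -> probability.
OMEGA = {1: 0.008, 2: 0.494, 3: 0.166, 4: 0.073, 5: 0.083,
         8: 0.056, 9: 0.037, 19: 0.056, 65: 0.025, 66: 0.003}

def lt_column(n, omega=OMEGA):
    """Generate one length-n column of an LT encoding matrix."""
    degrees = list(omega)
    # Step (1): sample a weight w from the degree distribution.
    w = random.choices(degrees, weights=[omega[d] for d in degrees], k=1)[0]
    # Step (2): pick a weight-w column uniformly among all C(n, w) of them.
    column = [0] * n
    for i in random.sample(range(n), w):
        column[i] = 1
    return column
```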

In this paper, we use a high-rate LDPC code as the precoder; the $n$ input nodes can then also be seen as the $n$ bitnodes of the LDPC code.

A.1. BP Decoder Isomorphism

As anticipated in Section 4, there is an interesting isomorphism between the standard channel coding problem when an all zero codeword is transmitted (we refer to this as Scheme 𝐴) and the joint source-channel coding problem as defined at each stage of the multistage decoder (we refer to this as Scheme 𝐵).

Consider the following unified scheme. Let the vector $[\mathbf{w}\ \mathbf{z}]$ be the output block when a vector $\mathbf{w}$ of length $K$ is channel coded with a systematic raptor code, where $\mathbf{z}$ has length $N$ (i.e., the raptor code rate is equal to $K/(K+N)$). Let us assume that the output block is transmitted over a hybrid channel such that the first $K$ output symbols are distorted by a noise vector $\mathbf{u}$ with $p_k = P(u_k=1)$ for $k=1,\dots,K$, and the remaining $N$ output symbols are distorted by the BSC noise vector $\mathbf{e}$ with $P(e_k=1)=\epsilon$, $k=1,\dots,N$. Then, the hybrid channel is characterized by many BSCs with crossover probabilities $p_1,\dots,p_K$ and $\epsilon$. The channel observation block is then
$$\mathbf{y} = [\mathbf{w}\oplus\mathbf{u}\ \ \mathbf{z}\oplus\mathbf{e}]. \tag{A.1}$$
Notice that when $\mathbf{w}=\mathbf{0}$ then $\mathbf{z}=\mathbf{0}$, and $\mathbf{y} = [\mathbf{u}\ \mathbf{e}]$. In this case, the unified scheme becomes Scheme A. On the other hand, when $\mathbf{w}=\mathbf{u}$, then $\mathbf{y} = [\mathbf{0}\ \ \mathbf{z}\oplus\mathbf{e}]$, and the unified scheme becomes Scheme B. Let us consider the $l$th iteration of the BP decoder. We use the following notation (see Figure 17):

Figure 17: Raptor code factor graph for the application of belief propagation.

(i) $m_{v,o}^{(l)}$ and $m_{o,v}^{(l)}$ are the messages passed from the $v$th input node to the $o$th output node and from the $o$th output node to the $v$th input node, respectively, of the LT decoder;
(ii) $m_{v,c}^{(l)}$ and $m_{c,v}^{(l)}$ are the messages passed from the $v$th input node (the so-called variable node in classical LDPC notation) to the $c$th checknode and from the $c$th checknode to the $v$th input node, respectively, of the LDPC decoder;
(iii) $\delta_{\mathrm{ldpc}}^{(l),v}$ is the message generated from the $v$th LDPC input node and passed to the corresponding input node of the LT decoder;
(iv) $\delta_{\mathrm{lt}}^{(l),v}$ is the message generated from the $v$th LT input node and passed to the corresponding input node of the LDPC decoder;
(v) $Z_o$ is the LLR of the $o$th output symbol received from the noisy channel; notice that $Z_o = (-1)^{u_o \oplus w_o}\log((1-p_o)/p_o)$ for $o=1,\dots,K$, while $Z_o = (-1)^{e_o \oplus z_o}\log((1-\epsilon)/\epsilon)$ for $o=K+1,\dots,N+K$.

Using the notation above, we can define the updating rules for the LT and the LDPC decoders separately.

For the LT decoder, at the $l$th iteration, we have
$$\tanh\frac{m_{o,v}^{(l)}}{2} = \tanh\!\left(\frac{(-1)^{u_o \oplus w_o}\log\big((1-p_o)/p_o\big)}{2}\right) \prod_{v' \neq v} \tanh\frac{m_{v',o}^{(l)}}{2}, \quad o=1,\dots,K,$$
$$\tanh\frac{m_{o,v}^{(l)}}{2} = \tanh\!\left(\frac{(-1)^{e_o \oplus z_o}\log\big((1-\epsilon)/\epsilon\big)}{2}\right) \prod_{v' \neq v} \tanh\frac{m_{v',o}^{(l)}}{2}, \quad o=K+1,\dots,N+K,$$
$$m_{v,o}^{(l+1)} = \delta_{\mathrm{ldpc}}^{(l),v} + \sum_{o' \neq o} m_{o',v}^{(l)}, \quad v=1,\dots,n, \tag{A.2}$$
where the product is taken over all input nodes adjacent to $o$ other than $v$, and the summation is taken over all output nodes adjacent to $v$ other than $o$. For $l=0$, we set $m_{v,o}^{(0)} = 0$ for $v=1,\dots,n$.

For the LDPC decoder, at the $l$th iteration, we have
$$m_{v,c}^{(l)} = \begin{cases} 0 & \text{if } l=0, \\ \delta_{\mathrm{lt}}^{(l),v} + \sum_{c' \neq c} m_{c',v}^{(l-1)} & \text{if } l \neq 0, \end{cases} \quad v=1,\dots,n, \tag{A.3}$$
$$\tanh\frac{m_{c,v}^{(l)}}{2} = \prod_{v' \neq v} \tanh\frac{m_{v',c}^{(l)}}{2}, \quad c=1,\dots,n-K. \tag{A.4}$$
The messages $\delta_{\mathrm{lt}}^{(l),v}$ and $\delta_{\mathrm{ldpc}}^{(l),v}$, passed from the LT to the LDPC decoder and from the LDPC to the LT decoder, respectively, are defined by
$$\delta_{\mathrm{lt}}^{(l),v} = \sum_{o} m_{o,v}^{(l)}, \quad v=1,\dots,n, \tag{A.5}$$
$$\delta_{\mathrm{ldpc}}^{(l),v} = \sum_{c} m_{c,v}^{(l)}, \quad v=1,\dots,n, \tag{A.6}$$
where the summations are taken over all output nodes adjacent to $v$ and over all checknodes adjacent to $v$, respectively.
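The tanh-rule and sum updates of (A.2) can be sketched per-message as follows (a minimal illustration of a single update, not the full scheduled decoder; `Z_o` denotes the channel LLR of the output node):

```python
import numpy as np

def output_node_update(Z_o, incoming):
    """Message from output node o to input node v, as in (A.2):
    tanh(m/2) = tanh(Z_o/2) * prod over the other adjacent edges."""
    t = np.tanh(Z_o / 2) * np.prod(np.tanh(np.asarray(incoming, float) / 2))
    return 2 * np.arctanh(t)

def input_node_update(delta_ldpc, incoming):
    """Message from input node v to output node o, as in (A.2):
    the LDPC message plus the sum over the other adjacent output nodes."""
    return delta_ldpc + float(np.sum(incoming))
```

Note that with no incoming messages the output-node update returns the channel LLR itself, and a single zero incoming message forces the outgoing message to zero, matching the initialization $m_{v,o}^{(0)}=0$.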

The overall factor graph (FG) of the proposed decoding algorithm is displayed in Figure 17 for the JSCC case $\mathbf{w}=\mathbf{u}$. We use Wiberg's notation (see [20]); that is, the FG is a bipartite graph with variable nodes (circles) and function nodes (boxes). A variable node is connected to a function node if the corresponding variable is an argument of the corresponding factor [20]. In our case, the variable nodes correspond to the input symbols of the LT code and to the input symbols of the LDPC code. The function nodes correspond to the output symbols of the LT code and to the checknodes of the LDPC code. To explicitly represent the messages passed between the two decoders at each stage, we split the graph into two parts connected to each other by "equality constraints." Finally, to distinguish between channel outputs received from the equivalent channel and channel outputs received from the noiseless channel, we explicitly represent the source symbols $\mathbf{u} = (u_1,\dots,u_K)$ and the output $\mathbf{y} = (y_{K+1},\dots,y_{K+N})$ of the noisy channel with input $\mathbf{z}$. Let us also denote the input block by $\mathbf{i}$.

As we can see from the updating rules described above and from the factor graph, the decoder can be modeled as two independent factor graphs that exchange information between themselves after each iteration.

Theorem 1. The magnitude of the BP messages exchanged between input and output symbols over the same Tanner graph is the same for both Schemes A and B. In particular, at BP round $l$, the relationship between the messages passed in Schemes A and B is
$${}^{B}m_{v,o}^{(l)} = (-1)^{i_v}\, {}^{A}m_{v,o}^{(l)}, \qquad {}^{B}m_{o,v}^{(l+1)} = (-1)^{i_v}\, {}^{A}m_{o,v}^{(l+1)}, \tag{A.7}$$
where ${}^{A}m$ denotes messages for Scheme A and ${}^{B}m$ denotes messages for Scheme B.

Belief propagation equations (A.2)–(A.4) can also be written in an explicit form by using a map $\gamma$ from the real numbers $(-\infty,\infty)$ to $\mathbb{F}_2 \times [0,\infty)$, defined by $\gamma(x) \triangleq (\mathrm{sgn}(x), -\ln\tanh(|x|/2))$. Clearly, $\gamma$ is bijective and there exists an inverse $\gamma^{-1}$. Moreover, $\gamma(x \boxplus y) = \gamma(x) + \gamma(y)$, where $\boxplus$ denotes the checknode (tanh-rule) combination and addition is component-wise in $\mathbb{F}_2$ and in $[0,\infty)$. Another important property is the following:
$$\gamma^{-1}\!\left(\sum_i \gamma\big((-1)^{b_i} B_i\big)\right) = \left(\prod_i (-1)^{b_i}\right) \gamma^{-1}\!\left(\sum_i \gamma(B_i)\right). \tag{A.8}$$
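The map $\gamma$ and property (A.8) can be checked numerically (a sketch; inputs must be nonzero so that $\ln\tanh(|x|/2)$ is finite):

```python
import math

def gamma(x):
    """gamma(x) = (sgn(x), -ln tanh(|x|/2)); x must be nonzero."""
    sign = 0 if x > 0 else 1            # F2 representation of the sign
    return sign, -math.log(math.tanh(abs(x) / 2))

def gamma_inv(sign, mag):
    return (-1)**sign * 2 * math.atanh(math.exp(-mag))

def boxplus(values):
    """gamma^{-1}(sum_i gamma(B_i)): the checknode (tanh-rule) combination."""
    s, m = 0, 0.0
    for v in values:
        sv, mv = gamma(v)
        s, m = s ^ sv, m + mv
    return gamma_inv(s, m)

# Property (A.8): flipping the inputs' signs by (-1)^{b_i} flips the
# output's sign by the product of the (-1)^{b_i}.
B_vals, b = [0.7, 1.3, 2.1], [1, 0, 1]
lhs = boxplus([(-1)**bi * Bi for Bi, bi in zip(B_vals, b)])
rhs = (-1)**sum(b) * boxplus(B_vals)
assert abs(lhs - rhs) < 1e-9
```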

Rewriting (A.2) and (A.4) in terms of the $\gamma$ mapping and using (A.8), we have
$$m_{o,v}^{(l)} = (-1)^{u_o \oplus w_o}\, \gamma^{-1}\!\left(\sum_{v' \neq v} \gamma\big(m_{v',o}^{(l)}\big) + \gamma(\mathcal{P}_o)\right), \quad o=1,\dots,K,$$
$$m_{o,v}^{(l)} = (-1)^{e_o \oplus z_o}\, \gamma^{-1}\!\left(\sum_{v' \neq v} \gamma\big(m_{v',o}^{(l)}\big) + \gamma(\xi)\right), \quad o=K+1,\dots,N+K,$$
$$m_{c,v}^{(l)} = \gamma^{-1}\!\left(\sum_{v' \neq v} \gamma\big(m_{v',c}^{(l)}\big)\right), \tag{A.9}$$
where $\mathcal{P}_o \triangleq \log((1-p_o)/p_o)$ and $\xi \triangleq \log((1-\epsilon)/\epsilon)$.

Similarly, we have
$$m_{o,v}^{(l)} = (-1)^{e_o \oplus z_o}\, \gamma^{-1}\!\left(\sum_{v' \neq v} \gamma\big(m_{v',o}^{(l)}\big) + \gamma(\xi)\right), \tag{A.10}$$
$$m_{c,v}^{(l)} = \gamma^{-1}\!\left(\sum_{v' \neq v} \gamma\big(m_{v',c}^{(l)}\big)\right). \tag{A.11}$$

Proof. To prove the theorem, the BP equations for each scheme are given explicitly, and then, starting with the 0th round, the relationship between the messages of the two schemes is verified. The proof follows by induction, by showing that if the rule holds for round $l$, it also holds for round $l+1$.
BP for Scheme A: in this case, we have
$${}^{A}m_{o,v}^{(l)} = (-1)^{u_o}\, \gamma^{-1}\!\left(\sum_{v' \neq v} \gamma\big({}^{A}m_{v',o}^{(l)}\big) + \gamma(\mathcal{P}_o)\right),$$
$${}^{A}m_{o,v}^{(l)} = (-1)^{e_o}\, \gamma^{-1}\!\left(\sum_{v' \neq v} \gamma\big({}^{A}m_{v',o}^{(l)}\big) + \gamma(\xi)\right),$$
$${}^{A}m_{v,o}^{(l+1)} = \sum_{o' \neq o} {}^{A}m_{o',v}^{(l)} + {}^{A}\delta_{\mathrm{ldpc}}^{(l),v}. \tag{A.12}$$
BP for Scheme B: in this case, we have
$${}^{B}m_{o,v}^{(l)} = \gamma^{-1}\!\left(\sum_{v' \neq v} \gamma\big({}^{B}m_{v',o}^{(l)}\big) + \gamma(\mathcal{P}_o)\right),$$
$${}^{B}m_{o,v}^{(l)} = (-1)^{e_o \oplus z_o}\, \gamma^{-1}\!\left(\sum_{v' \neq v} \gamma\big({}^{B}m_{v',o}^{(l)}\big) + \gamma(\xi)\right),$$
$${}^{B}m_{v,o}^{(l+1)} = \sum_{o' \neq o} {}^{B}m_{o',v}^{(l)} + {}^{B}\delta_{\mathrm{ldpc}}^{(l),v}. \tag{A.13}$$
Note that in the above equations we have provided, for both Scheme A and Scheme B, two different versions of the equations for $m_{o,v}$: one for $1 \leq o \leq K$ and one for $K+1 \leq o \leq K+N$. We call these ranges of $o$ the first block and the second block, respectively.
By applying the BP rules at round zero, we have the following relationships between Scheme A and Scheme B:
$${}^{A}m_{o,v}^{(0)} = {}^{B}m_{o,v}^{(0)} = 0, \qquad {}^{A}m_{v,o}^{(1)} = {}^{B}m_{v,o}^{(1)} = 0. \tag{A.14}$$
Then, for round zero, (A.7) is satisfied.
Now let us assume that the theorem holds for the $l$th round; that is,
$${}^{B}m_{o,v}^{(l)} = (-1)^{i_v}\, {}^{A}m_{o,v}^{(l)}, \qquad {}^{B}m_{v,o}^{(l+1)} = (-1)^{i_v}\, {}^{A}m_{v,o}^{(l+1)}. \tag{A.15}$$
Consequently, the equations for round $l+1$ can be written as follows. Letting $o$ and $\tilde{o}$ denote any output symbols from the first and the second output blocks, respectively, and letting $v$ and $\tilde{v}$ denote any adjacent input nodes, we can write
$${}^{A}m_{o,v}^{(l+1)} = (-1)^{u_o}\, \gamma^{-1}\!\left(\sum_{v' \neq v} \gamma\big({}^{A}m_{v',o}^{(l+1)}\big) + \gamma(\mathcal{P}_o)\right),$$
$${}^{A}m_{\tilde{o},\tilde{v}}^{(l+1)} = (-1)^{e_{\tilde{o}}}\, \gamma^{-1}\!\left(\sum_{v' \neq \tilde{v}} \gamma\big({}^{A}m_{v',\tilde{o}}^{(l+1)}\big) + \gamma(\xi)\right),$$
$${}^{B}m_{o,v}^{(l+1)} = \gamma^{-1}\!\left(\sum_{v' \neq v} \gamma\big({}^{B}m_{v',o}^{(l+1)}\big) + \gamma(\mathcal{P}_o)\right),$$
$${}^{B}m_{\tilde{o},\tilde{v}}^{(l+1)} = (-1)^{e_{\tilde{o}} \oplus z_{\tilde{o}}}\, \gamma^{-1}\!\left(\sum_{v' \neq \tilde{v}} \gamma\big({}^{B}m_{v',\tilde{o}}^{(l+1)}\big) + \gamma(\xi)\right). \tag{A.16}$$
Using the induction hypothesis, we can write
$${}^{B}m_{o,v}^{(l+1)} = \gamma^{-1}\!\left(\sum_{v' \neq v} \gamma\big((-1)^{i_{v'}}\, {}^{A}m_{v',o}^{(l+1)}\big) + \gamma(\mathcal{P}_o)\right),$$
$${}^{B}m_{\tilde{o},\tilde{v}}^{(l+1)} = (-1)^{e_{\tilde{o}} \oplus z_{\tilde{o}}}\, \gamma^{-1}\!\left(\sum_{v' \neq \tilde{v}} \gamma\big((-1)^{i_{v'}}\, {}^{A}m_{v',\tilde{o}}^{(l+1)}\big) + \gamma(\xi)\right). \tag{A.17}$$
In order to obtain a relationship similar to the one found before, we apply (A.8), which allows the sign terms $(-1)^{i_{v'}}$ to be pulled out of the summation. For any $o$, denote by $\mathcal{N}_o$ the set of input nodes adjacent to the output node $o$. Then
$$\prod_{v' \in \mathcal{N}_o,\, v' \neq v} (-1)^{i_{v'}} = \left(\prod_{v' \in \mathcal{N}_o} (-1)^{i_{v'}}\right) (-1)^{i_v} = (-1)^{u_o} (-1)^{i_v}, \tag{A.18}$$
since the modulo-2 sum of all the adjacent input nodes gives the value of the corresponding output node without additive noise. Similarly, for any $\tilde{o}$, define $\mathcal{N}_{\tilde{o}}$ as the set of input nodes adjacent to the output node $\tilde{o}$. Then
$$\prod_{v' \in \mathcal{N}_{\tilde{o}},\, v' \neq \tilde{v}} (-1)^{i_{v'}} = \left(\prod_{v' \in \mathcal{N}_{\tilde{o}}} (-1)^{i_{v'}}\right) (-1)^{i_{\tilde{v}}} = (-1)^{z_{\tilde{o}}} (-1)^{i_{\tilde{v}}},$$
so that
$${}^{B}m_{o,v}^{(l+1)} = (-1)^{u_o} (-1)^{i_v}\, \gamma^{-1}\!\left(\sum_{v' \neq v} \gamma\big({}^{A}m_{v',o}^{(l+1)}\big) + \gamma(\mathcal{P}_o)\right),$$
$${}^{B}m_{\tilde{o},\tilde{v}}^{(l+1)} = (-1)^{z_{\tilde{o}}} (-1)^{i_{\tilde{v}}} (-1)^{e_{\tilde{o}} \oplus z_{\tilde{o}}}\, \gamma^{-1}\!\left(\sum_{v' \neq \tilde{v}} \gamma\big({}^{A}m_{v',\tilde{o}}^{(l+1)}\big) + \gamma(\xi)\right), \tag{A.19}$$
and therefore
$${}^{B}m_{o,v}^{(l+1)} = (-1)^{i_v}\, {}^{A}m_{o,v}^{(l+1)}, \tag{A.20}$$
$${}^{B}m_{\tilde{o},\tilde{v}}^{(l+1)} = (-1)^{i_{\tilde{v}}}\, {}^{A}m_{\tilde{o},\tilde{v}}^{(l+1)}. \tag{A.21}$$
It is worth noting that, by applying the $l$th-round hypothesis to (A.5), we obtain ${}^{B}\delta_{\mathrm{lt}}^{(l),v} = (-1)^{i_v}\, {}^{A}\delta_{\mathrm{lt}}^{(l),v}$; that is, when we only consider the LDPC iterations, Scheme A and Scheme B differ only in the signs of the channel observations. It can easily be shown that, with such an input relationship between the two schemes, the messages are also closely related as follows:
$${}^{B}m_{v,c}^{(l)} = (-1)^{i_v}\, {}^{A}m_{v,c}^{(l)}, \qquad {}^{B}m_{c,v}^{(l)} = (-1)^{i_v}\, {}^{A}m_{c,v}^{(l)}, \tag{A.22}$$
for any $l$.
Then, by (A.6), we obtain
$${}^{B}\delta_{\mathrm{ldpc}}^{(l),v} = (-1)^{i_v}\, {}^{A}\delta_{\mathrm{ldpc}}^{(l),v}, \tag{A.23}$$
so that we can write
$${}^{A}m_{v,o}^{(l+2)} = \sum_{o' \neq o} {}^{A}m_{o',v}^{(l+1)} + {}^{A}\delta_{\mathrm{ldpc}}^{(l+1),v}, \qquad {}^{B}m_{v,o}^{(l+2)} = \sum_{o' \neq o} {}^{B}m_{o',v}^{(l+1)} + {}^{B}\delta_{\mathrm{ldpc}}^{(l+1),v}. \tag{A.24}$$
Applying (A.20), (A.21), and (A.23) at round $l+1$,
$${}^{B}m_{v,o}^{(l+2)} = \sum_{o' \neq o} (-1)^{i_v}\, {}^{A}m_{o',v}^{(l+1)} + (-1)^{i_v}\, {}^{A}\delta_{\mathrm{ldpc}}^{(l+1),v} = (-1)^{i_v}\!\left(\sum_{o' \neq o} {}^{A}m_{o',v}^{(l+1)} + {}^{A}\delta_{\mathrm{ldpc}}^{(l+1),v}\right) = (-1)^{i_v}\, {}^{A}m_{v,o}^{(l+2)}. \tag{A.25}$$
Equations (A.20), (A.21), and (A.25) are identical to the relations assumed in (A.15) for the $l$th round. This completes the proof by induction.

Due to Theorem 1, the BER of the pure channel coding scheme (assuming the all zero codeword) is equal to the BER of the source bits in the JSCC scheme. Based on this result, we can obtain an EXIT chart by considering the associated channel coding problem.

B. EXIT Chart Approximation

The standard analysis tool for graph-based codes under BP iterative decoding, in the limit of infinite block length, is density evolution (DE) [22, 23]. DE is typically computationally heavy and numerically not very well conditioned. A much simpler approximation of DE is the so-called EXIT chart, which corresponds to DE under the restriction that the message densities have some particular form. In particular, the EXIT chart with Gaussian approximation (GA) assumes that at every iteration the BP message distribution is Gaussian and satisfies a particular symmetry condition, which imposes that the variance is equal to twice the mean [13]. At this point, densities are uniquely identified by a single parameter, and the approximate DE tracks the evolution of this single parameter across the decoding rounds.

In particular, the EXIT chart tracks the mutual information between the message on a random edge of the graph and the binary variable node connected to that edge. By the isomorphism proved before, we know that the JSCC scheme and the "two-channel" scheme have the same performance. For the sake of completeness, in this section we apply the EXIT chart analysis to the "two-channel" case. The resulting EXIT chart applies directly to the JSCC EXIT chart for a binary source. Finally, we briefly discuss how to apply the EXIT chart method to the multistage decoder used by our JSCC scheme. The resulting EXIT chart analysis provides very accurate approximations of the actual JSCC scheme performance, also in the finite (moderately large) block length case (see Figure 11).

For the graph induced by the raptor (LT) degree distribution, we define the input nodes (also called information bitnodes), the output nodes (also called coded bitnodes), and the checknodes. For LDPC codes, we define just the bitnodes and the checknodes, since any set of bitnodes that forms an information set can be taken as the information bitnodes (see Figure 17).

There are different ways of scheduling the raptor decoder.

Practical schedule
Activate in parallel all LT checknodes, then all LDPC bitnodes (corresponding to the LT input nodes), then all LDPC checknodes, and then return to the LDPC bitnodes. This forms a complete scheduling cycle, which is repeated an arbitrarily large number of times. This is the scheduling used in our finite-length simulations.

Conceptually simple schedule
Activate the LT checknodes. Then, reset the LDPC decoder and treat the messages generated by the LT checknodes as inputs for the LDPC decoder. Perform infinite iterations of the LDPC decoder. After reaching a fixed point of the LDPC decoder, take the LLRs produced for the bitnodes by the LDPC decoder at the fixed-point equilibrium and incorporate these messages as “virtual channel observations” for the input nodes of the LT code. Then, activate all LT input nodes. This provides a complete cycle of scheduling, which is repeated an arbitrarily large number of times. Our EXIT chart equations are obtained assuming this scheduling.

EXIT charts can be seen as a multidimensional dynamic system. We are interested in studying the fixed points and the trajectories of this system. As such, an EXIT chart has state variables. Proceeding to find an EXIT recursion for the conceptually simple schedule, we will denote by 𝑥 and 𝑦 the state variables of the LT EXIT chart, and by 𝑋 and 𝑌 the corresponding state variables for the LDPC EXIT chart.

We use the following notation.

(i) $x_i$ denotes the mutual information between a message sent along an edge $(v,o)$ with "left-degree" $i$ and the symbol corresponding to the bitnode $v$, and $x$ denotes the average of $x_i$ over all edges $(v,o)$. Following the standard parlance of LDPC codes, we refer to the degree of the bitnode connected to an edge as the left degree of that edge, and to the degree of the checknode connected to an edge as the right degree of that edge.
(ii) $y_j$ denotes the mutual information between a message sent along an edge $(o,v)$ with "right-degree" $j$ and the symbol corresponding to the bitnode $v$, and $y$ denotes the average of $y_j$ over all edges $(o,v)$.
(iii) $X_i$ denotes the mutual information between a message sent along an edge $(v,c)$ with "left-degree" $i$ and the symbol corresponding to the bitnode $v$, and $X$ denotes the average of $X_i$ over all edges $(v,c)$.
(iv) $Y_j$ denotes the mutual information between a message sent along an edge $(c,v)$ with "right-degree" $j$ and the symbol corresponding to the bitnode $v$, and $Y$ denotes the average of $Y_j$ over all edges $(c,v)$.
(v) For an LDPC code, we let $\lambda(x) = \sum_i \lambda_i x^{i-1}$ and $\rho(x) = \sum_j \rho_j x^{j-1}$ denote the generating functions of the edge-centric left- and right-degree distributions, and we let
$$\Lambda(x) = \sum_i \Lambda_i x^i = \frac{\int_0^x \lambda(u)\,du}{\int_0^1 \lambda(u)\,du} \tag{B.26}$$
denote the bit-centric left-degree distribution.
(vi) For an LT code, we let $\iota(x) = \sum_i \iota_i x^{i-1}$ denote the edge-centric degree distribution of the input nodes, and $\omega(x) = \sum_j \omega_j x^{j-1}$ denote the edge-centric degree distribution of the output nodes or, equivalently, the edge-centric degree distribution of the checknodes. The node-centric degree distribution of the checknodes is given by
$$\Omega(x) = \sum_j \Omega_j x^j = \frac{\int_0^x \omega(u)\,du}{\int_0^1 \omega(u)\,du}. \tag{B.27}$$
(vii) For the concatenation of the LT code with the LDPC code, we also have the node-centric degree distribution of the LT input nodes. This is given by
$$\mathcal{I}(x) = \sum_i \mathcal{I}_i x^i = \frac{\int_0^x \iota(u)\,du}{\int_0^1 \iota(u)\,du}. \tag{B.28}$$

We consider the class of EXIT functions that make use of a Gaussian approximation of the BP messages. Imposing the symmetry condition and Gaussianity, the conditional distribution of each message in the direction $v \to c$ is Gaussian $\mathcal{N}(\mu, 2\mu)$ for some value $\mu \in \mathbb{R}_+$. Hence, letting $V$ denote the corresponding bitnode variable and $L$ the message, we have
$$I(V; L) = 1 - \mathbb{E}\big[\log_2\big(1 + e^{-L}\big)\big] \triangleq J(\mu), \tag{B.29}$$
where $L \sim \mathcal{N}(\mu, 2\mu)$.

In BP, the message on $(v,o)$ is the sum of all messages incoming to $v$ on all other edges. The sum of Gaussian random variables is also Gaussian, and its mean is the sum of the means of the incoming messages. It follows that
$$x_i = J\big((i-1)J^{-1}(y) + J^{-1}(C)\big), \tag{B.30}$$
where $C$ is the mutual information (capacity) between the bitnode variable and the corresponding LLR at the (binary-input symmetric-output) channel output. In the raptor case, the bitnodes correspond to variables that are observed through a virtual channel by the LDPC decoder. Averaging with respect to the edge-degree distribution, we have
$$x = \sum_i \iota_i\, J\big((i-1)J^{-1}(y) + J^{-1}(C)\big). \tag{B.31}$$
As far as checknodes are concerned, we use the well-known quasi-duality approximation and replace checknodes with bitnodes by changing mutual information into entropy (i.e., replacing $x$ by $1-x$). Then
$$y_j = 1 - J\big((j-1)J^{-1}(1-x) + J^{-1}(1-C)\big). \tag{B.32}$$
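The function $J(\cdot)$ of (B.29) and its inverse have no simple closed form; the paper does not specify how they are evaluated, so the following Monte Carlo sketch is an assumption (closed-form approximations of $J$ also exist in the literature):

```python
import numpy as np

rng = np.random.default_rng(0)

def J(mu, samples=100_000):
    """Mutual information of a consistent Gaussian LLR, eq. (B.29):
    J(mu) = 1 - E[log2(1 + exp(-L))], L ~ N(mu, 2*mu), by Monte Carlo."""
    if mu <= 0:
        return 0.0
    L = rng.normal(mu, np.sqrt(2 * mu), samples)
    return 1.0 - float(np.mean(np.log2(1.0 + np.exp(-L))))

def J_inv(I, lo=1e-9, hi=100.0):
    """Numerical inverse of J by bisection (J is increasing in mu)."""
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        if J(mid) < I:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

The bitnode update (B.30) then reads `J((i - 1) * J_inv(y) + J_inv(C))`.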

Let us consider now the “two-channel" scenario induced by the JSCC isomorphism. Let 𝐾 denote the number of source bits, and 𝑁 denote the number of parity bits. In the corresponding LT code, we have 𝑀=𝐾+𝑁 output nodes. The first 𝐾 output nodes are “observed” through a channel with capacity 1𝐻 (i.e., the channel corresponds to the source statistics), while the second 𝑁 output nodes are observed through the actual transmission channel, with capacity 𝐶.

This two-channel feature is taken into account by an outer expectation in the EXIT functions. Therefore, the LT EXIT chart can be written in terms of the following state equations:
$$x = \sum_k \sum_i \Lambda_k \iota_i J\left((i-1)J^{-1}(y) + J^{-1}(c_k)\right) = \sum_k \sum_i \Lambda_k \iota_i J\left((i-1)J^{-1}(y) + k\, J^{-1}(Y)\right), \quad (B.33)$$
where $\beta = K/M$ and $1-\beta = N/M$, and
$$y = 1 - \sum_j \omega_j \left[\beta\, J\left((j-1)J^{-1}(1-x) + J^{-1}(H)\right) + (1-\beta)\, J\left((j-1)J^{-1}(1-x) + J^{-1}(1-C)\right)\right], \quad (B.34)$$
where $c_k$ is the mutual information input by the LDPC graph into the LT graph via a node $v$ of degrees $(i,k)$, as explained in the following.
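The two LT state equations can be sketched numerically as follows (function names and the example degree distributions are our own illustrative choices). The $y$-update (B.34) mixes the source-observation term $J^{-1}(H)$ and the transmission-channel term $J^{-1}(1-C)$ with weights $\beta$ and $1-\beta$; the $x$-update (B.33) uses the substitution $J^{-1}(c_k) = k\,J^{-1}(Y)$.

```python
import numpy as np

def J(mu, n=2001):
    # numerical J(mu), as sketched earlier
    if mu <= 0.0:
        return 0.0
    s = np.sqrt(2.0 * mu)
    lam = np.linspace(mu - 10.0 * s, mu + 10.0 * s, n)
    pdf = np.exp(-(lam - mu) ** 2 / (4.0 * mu)) / np.sqrt(4.0 * np.pi * mu)
    val = 1.0 - np.sum(pdf * np.logaddexp(0.0, -lam) / np.log(2.0)) * (lam[1] - lam[0])
    return float(min(max(val, 0.0), 1.0))

def J_inv(I, hi=400.0):
    lo = 0.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if J(mid) < I else (lo, mid)
    return 0.5 * (lo + hi)

def lt_x_update(y, Y, Lambda, iota):
    # eq. (B.33): Lambda = {k: Lambda_k}, iota = {i: iota_i}
    return sum(Lk * ii * J((i - 1) * J_inv(y) + k * J_inv(Y))
               for k, Lk in Lambda.items() for i, ii in iota.items())

def lt_y_update(x, beta, H, C, omega):
    # eq. (B.34): omega = {j: omega_j}; source term J^-1(H), channel term J^-1(1-C)
    total = 0.0
    for j, wj in omega.items():
        t = (j - 1) * J_inv(1.0 - x)
        total += wj * (beta * J(t + J_inv(H)) + (1.0 - beta) * J(t + J_inv(1.0 - C)))
    return 1.0 - total
```

A quick check of the expected monotonicity: a higher source entropy $H$ enlarges the entropy-domain argument of $J$ in the first term, so $y$ decreases as $H$ grows.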

Equation (B.34) follows from the fact that a randomly chosen edge $(o,v)$ is connected with probability $\beta$ to a source bit (i.e., to the channel with capacity $1-H$) and with probability $1-\beta$ to a parity bit (i.e., to the channel with capacity $C$).

Consider an LDPC bitnode $v$ that coincides with an input node of the LT code. Let $k$ denote the degree of this node with respect to the LDPC graph and $i$ its degree with respect to the LT graph. For a randomly generated graph and a random choice of $v$, $k$ and $i$ are independent random variables, with joint distribution given by
$$P_{i,k} = I_i \Lambda_k. \quad (B.35)$$
The mutual information input by the LT graph into the LDPC graph via a node $v$ of degrees $(i,k)$ is given by
$$c_i = J\left(i\, J^{-1}(y)\right). \quad (B.36)$$
Therefore, the LDPC EXIT chart can be written in terms of the following state equations:
$$X = \sum_k \sum_i \lambda_k I_i J\left((k-1)J^{-1}(Y) + J^{-1}(c_i)\right) = \sum_k \sum_i \lambda_k I_i J\left((k-1)J^{-1}(Y) + i\, J^{-1}(y)\right), \qquad Y = 1 - \sum_j \rho_j J\left((j-1)J^{-1}(1-X)\right). \quad (B.37)$$
The mutual information input by the LDPC graph into the LT graph via a node $v$ of degrees $(i,k)$ is given by
$$c_k = J\left(k\, J^{-1}(Y)\right). \quad (B.38)$$
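Putting (B.33), (B.34), and (B.37) together gives the global chart. The sketch below iterates the four state variables for illustrative regular degree profiles (LT input degree 3, LT output degree 4, $(3,6)$-regular LDPC precode; again our own choices, not the optimized profiles of the paper). Since every update is monotone in the channel quality, the final $x$ should not decrease as $C$ grows.

```python
import numpy as np

def J(mu, n=2001):
    # numerical J(mu), as sketched earlier
    if mu <= 0.0:
        return 0.0
    s = np.sqrt(2.0 * mu)
    lam = np.linspace(mu - 10.0 * s, mu + 10.0 * s, n)
    pdf = np.exp(-(lam - mu) ** 2 / (4.0 * mu)) / np.sqrt(4.0 * np.pi * mu)
    val = 1.0 - np.sum(pdf * np.logaddexp(0.0, -lam) / np.log(2.0)) * (lam[1] - lam[0])
    return float(min(max(val, 0.0), 1.0))

def J_inv(I, hi=400.0):
    lo = 0.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if J(mid) < I else (lo, mid)
    return 0.5 * (lo + hi)

def run_chart(H, C, beta, iters=30):
    """Iterate the global EXIT chart (B.33), (B.34), (B.37) for toy regular
    profiles: LT input degree 3, LT output degree 4, (3,6)-regular LDPC."""
    iota, I_node = {3: 1.0}, {3: 1.0}   # edge-/node-centric LT input degrees
    omega = {4: 1.0}                    # edge-centric LT output degrees
    Lambda, lam_d, rho = {3: 1.0}, {3: 1.0}, {6: 1.0}
    x = y = X = Y = 0.0
    for _ in range(iters):
        y = 1.0 - sum(wj * (beta * J((j - 1) * J_inv(1.0 - x) + J_inv(H))
                            + (1.0 - beta) * J((j - 1) * J_inv(1.0 - x) + J_inv(1.0 - C)))
                      for j, wj in omega.items())                        # (B.34)
        x = sum(Lk * ii * J((i - 1) * J_inv(y) + k * J_inv(Y))
                for k, Lk in Lambda.items() for i, ii in iota.items())   # (B.33)
        X = sum(lk * Ii * J((k - 1) * J_inv(Y) + i * J_inv(y))
                for k, lk in lam_d.items() for i, Ii in I_node.items())  # (B.37)
        Y = 1.0 - sum(rj * J((j - 1) * J_inv(1.0 - X)) for j, rj in rho.items())
    return x, y, X, Y
```

All four state variables stay in $[0,1]$, and a better transmission channel can only improve the final state.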

Equations (B.37), (B.33), and (B.34) form the state equations of the global EXIT chart of the concatenated LT-LDPC graph, where the state variables are $x, y, X, Y$, while the parameters are $H$, $C$, $\beta$, and the degree sequences $\omega, \iota, \rho$, and $\lambda$.

Finally, in order to obtain the reconstruction distortion, we need the conditional probability density function (pdf) of the LLRs output by BP for the source bits. Under the Gaussian approximation, this LLR is Gaussian. Let $\mu_j$ denote the mean of the LLR of a source bit whose output node has degree $j$, given by
$$\mu_j = J^{-1}\left(1 - J\left(j\, J^{-1}(1-x)\right)\right) + J^{-1}(1-H). \quad (B.39)$$
Then, we approximate the average BER of the source bits as
$$P_b = \sum_j \Omega_j\, Q\left(\sqrt{\mu_j/2}\right). \quad (B.40)$$
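Equations (B.39)-(B.40) can be sketched as follows (the helper `source_ber` is our own name; the $Q$-function argument $\sqrt{\mu_j/2}$ follows from $P(\Lambda < 0) = Q(\mu/\sqrt{2\mu})$ for $\Lambda \sim \mathcal{N}(\mu, 2\mu)$):

```python
import numpy as np
from math import erfc, sqrt

def J(mu, n=2001):
    # numerical J(mu), as sketched earlier
    if mu <= 0.0:
        return 0.0
    s = np.sqrt(2.0 * mu)
    lam = np.linspace(mu - 10.0 * s, mu + 10.0 * s, n)
    pdf = np.exp(-(lam - mu) ** 2 / (4.0 * mu)) / np.sqrt(4.0 * np.pi * mu)
    val = 1.0 - np.sum(pdf * np.logaddexp(0.0, -lam) / np.log(2.0)) * (lam[1] - lam[0])
    return float(min(max(val, 0.0), 1.0))

def J_inv(I, hi=400.0):
    lo = 0.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if J(mid) < I else (lo, mid)
    return 0.5 * (lo + hi)

def Q(t):
    # Gaussian tail function Q(t) = P(N(0,1) > t)
    return 0.5 * erfc(t / sqrt(2.0))

def source_ber(x, H, Omega):
    # eqs. (B.39)-(B.40): Omega = {j: Omega_j}, node-centric output degrees
    pb = 0.0
    for j, Oj in Omega.items():
        mu_j = J_inv(1.0 - J(j * J_inv(1.0 - x))) + J_inv(1.0 - H)
        pb += Oj * Q(sqrt(mu_j / 2.0))
    return pb
```

As the input-node message reliability $x$ approaches 1, the predicted source BER vanishes.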

B.1. Multilayer EXIT Chart Analysis

For each bitplane, at every location, the entropy of the bit depends on the realization of the bits at the previous (more significant) bitplanes. We are then in the presence of a "time-varying" memoryless channel in the corresponding channel coding problem. To develop the equations for the multilayer case, we use the same idea described in the previous section, namely, an outer expectation. At the $(B-p)$th most significant level, the corresponding bit locations at the previous planes can take $2^p$ different combinations, with possibly different probabilities, denoted by $\gamma_{B-p}(m)$ for $0 \le m \le 2^p - 1$.

Let $H(x_{B-p}\,|\,m)$ denote the conditional entropy of a bit at the $(B-p)$th most significant plane given that the value of the corresponding combination of more significant bits is $m$.

At the $(B-p)$th most significant level, the channel has capacity $C$ with probability $1-\beta$, while it has capacity $1 - H(x_{B-p}\,|\,m)$, for $0 \le m \le 2^p - 1$, with probability $\beta\,\gamma_{B-p}(m)$.

Following this approach, we can modify (B.34) for the decoding of the $(B-p)$th magnitude plane. It is worth noting that when the $(B-p)$th bitplane is considered, we assume that the sign plane and the $p$ magnitude planes from $B$ down to $B-p+1$ have already been processed. Since the magnitude plane model does not depend on the sign plane, we take into account $2^p$ different realizations.

That is, we have
$$y = 1 - \sum_j \omega_j \left\{ (1-\beta)\, J\left((j-1)J^{-1}(1-x) + J^{-1}(1-C)\right) + \beta \sum_{m=0}^{2^p-1} \gamma_{B-p}(m)\, J\left((j-1)J^{-1}(1-x) + J^{-1}\left(H(x_{B-p}\,|\,m)\right)\right) \right\}, \quad p = 1, \ldots, B-1. \quad (B.41)$$
We would like to underline that the sign bitplane and the most significant bitplane do not depend on any other bitplane, so that $\gamma_B(0) = \gamma_0(0) = 1$.
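The mixture update (B.41) replaces the single source term of (B.34) with an expectation over the $2^p$ realizations of the more significant bits. A minimal sketch (our own helper name; `gammas[m]` and `H_cond[m]` stand for $\gamma_{B-p}(m)$ and $H(x_{B-p}\,|\,m)$):

```python
import numpy as np

def J(mu, n=2001):
    # numerical J(mu), as sketched earlier
    if mu <= 0.0:
        return 0.0
    s = np.sqrt(2.0 * mu)
    lam = np.linspace(mu - 10.0 * s, mu + 10.0 * s, n)
    pdf = np.exp(-(lam - mu) ** 2 / (4.0 * mu)) / np.sqrt(4.0 * np.pi * mu)
    val = 1.0 - np.sum(pdf * np.logaddexp(0.0, -lam) / np.log(2.0)) * (lam[1] - lam[0])
    return float(min(max(val, 0.0), 1.0))

def J_inv(I, hi=400.0):
    lo = 0.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if J(mid) < I else (lo, mid)
    return 0.5 * (lo + hi)

def multilayer_y_update(x, beta, C, gammas, H_cond, omega):
    # eq. (B.41): gammas[m] = gamma_{B-p}(m), H_cond[m] = H(x_{B-p} | m)
    total = 0.0
    for j, wj in omega.items():
        t = (j - 1) * J_inv(1.0 - x)
        mix = sum(g * J(t + J_inv(h)) for g, h in zip(gammas, H_cond))
        total += wj * ((1.0 - beta) * J(t + J_inv(1.0 - C)) + beta * mix)
    return 1.0 - total
```

When the mixture is degenerate (a single conditioning realization, or several realizations with identical conditional entropy), the update reduces to the single-layer form of (B.34), which gives a simple consistency check.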

Similar to (B.41), we can update (B.40) as follows:
$$P_b = \sum_j \Omega_j\, Q\left(\sqrt{\mu_j/2}\right), \quad (B.42)$$
where
$$\mu_j = J^{-1}\left(1 - J\left(j\, J^{-1}(1-x)\right)\right) + \sum_{m=0}^{2^p-1} \gamma_{B-p}(m)\, J^{-1}\left(1 - H(x_{B-p}\,|\,m)\right). \quad (B.43)$$

Note that a genie-aided scheme was assumed for the EXIT analysis, where there is no error-propagation between layers. In fact, it is not possible to take into account error propagation using the EXIT chart, since the underlying assumption is that the message exchanged at each iteration of the BP is a true LLR, that is, an LLR computed on the basis of the correct conditional probabilities. Decision errors, instead, would feed the decoder at a lower stage with “false” a priori probabilities.

As was done in the finite-length case, we use soft reconstruction also in the infinite-length analysis: the conditional-mean estimator of the reconstruction points is computed using the fact that the source-bit LLRs have a symmetric Gaussian distribution. From the EXIT chart, we can obtain the mean $\mu$ of the Gaussian approximation of the conditional pdf of any LLR in the graph. Hence, this can be used to compute the MMSE of the soft-reconstruction estimator (16).
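The soft estimate of a bit $b \in \{\pm 1\}$ with true LLR $L$ is the conditional mean $\mathbb{E}[b\,|\,L] = \tanh(L/2)$; for consistent (symmetric) LLRs, a standard identity gives $\mathrm{MMSE} = 1 - \mathbb{E}[\tanh^2(L/2)]$. The sketch below (our construction, using the symmetric $\mathcal{N}(\mu, 2\mu)$ model from the EXIT analysis) evaluates this by numerical integration:

```python
import numpy as np

def soft_bit_mmse(mu, n=4001):
    # MMSE of the soft estimate tanh(L/2) for b = +1, L ~ N(mu, 2*mu);
    # for consistent LLRs, MMSE = 1 - E[tanh^2(L/2)]
    if mu <= 0.0:
        return 1.0  # no information: the estimate is 0 and E[b^2] = 1
    s = np.sqrt(2.0 * mu)
    lam = np.linspace(mu - 10.0 * s, mu + 10.0 * s, n)
    pdf = np.exp(-(lam - mu) ** 2 / (4.0 * mu)) / np.sqrt(4.0 * np.pi * mu)
    return float(1.0 - np.sum(pdf * np.tanh(lam / 2.0) ** 2) * (lam[1] - lam[0]))
```

As the LLR mean $\mu$ obtained from the EXIT chart grows, the soft-reconstruction error decreases toward zero.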


Acknowledgments

This research was supported in part by the National Science Foundation under Grants ANI-03-38807, CNS-06-25637, and NeTS-NOSS-07-22073, and in part by the USC Annenberg Graduate Fellowship Program.

