Abstract

We propose an effective error correction technique for arithmetic coding with a forbidden symbol. By predicting the occurrence of subsequent forbidden symbols, the forbidden region is effectively expanded and, theoretically, a better error correction performance can be achieved. Moreover, a generalized stack algorithm is exploited to detect the forbidden symbol ahead of time. The proposed approach is combined with the maximum a posteriori (MAP) metric to keep the most probable decoding paths in the stack. Simulation results confirm that our scheme outperforms the existing MAP methods in terms of error correction performance, especially at a low coding rate.

1. Introduction

Traditionally, channel coding is performed after source coding to protect the compressed bit stream sent over a noisy channel. For example, an image file is first compressed, with high coding efficiency, by using the discrete cosine transform or arithmetic coding [1-3]. Then, the compressed sequence is further protected against channel noise by a Turbo code [4] or a Hamming code [5]. This traditional separate scheme lacks cooperation between the source and channel coding processes and may not result in the optimal performance. Recent studies have revealed that their joint operation offers advantages over the traditional separately operated approach [6-9]. As certain implicit redundancy still exists in the bit streams when the encoder cannot ideally decorrelate the source symbols, it can be utilized in the joint scheme to improve the overall error correcting performance. Thus, it is possible for the joint scheme to outperform the separate approach [10].

Early works on joint source-channel coding were devoted to the study of error resilience in variable length codes (VLC). In particular, most of them focused on the resynchronization ability of Huffman codes [7-9]. The corresponding hard and soft decoding schemes based on maximum likelihood (ML) or MAP metrics are well studied for binary symmetric channels (BSC) and additive white Gaussian noise (AWGN) channels. As arithmetic coding (AC) represents a source symbol using a fractional number of bits, it leads to a better compression efficiency and approaches optimal entropy coding. However, the high compression ratio makes the codeword more sensitive to channel noise and difficult to resynchronize. Therefore, there is a growing interest in improving the robustness of AC against channel noise.

In [11], a forbidden symbol introduced by a reduction of the coding interval is adopted to detect transmission errors continuously. These errors can be detected whenever the forbidden region is visited. This continuous nature of the error detection is exploited to improve the overall performance of the communication system [12]. It provides a tradeoff between the extra redundancy and the delay between the occurrence of an error and its detection. Instead of the forbidden symbol, the insertion of markers at particular positions of the input sequence plays the role of synchronization between the encoder and the decoder [13]. Markers that do not appear at the expected positions indicate transmission errors. Three strategies for the selection of the markers were studied in [13]. A better compression ratio can be achieved using an adaptive [14] or an artificial marker scheme [15]. The adaptive marker scheme selects the most frequent source symbol as the marker symbol, while the artificial marker scheme creates an artificial marker with an arbitrary probability. Making use of the error detection capability of AC, error correction is performed by sequential decoding, which successively removes the erroneous decoding paths. In [16], depth-first and breadth-first decoding algorithms were proposed with binary branching based on a null zone. Erroneous decoding paths are discarded thanks to the error detection capability of the forbidden symbol, and all the decoding paths with the lowest Hamming distance from the received sequence are preserved in a list.

In [17], a MAP criterion based on context-based AC was proposed with the insertion of synchronization markers, where both the symbol-clock and the bit-clock models were analyzed. The iterative decoding of error resilient AC concatenated with a convolutional code is adopted, and its error correcting capability is validated with the transmission of images over an AWGN channel. A novel MAP decoding approach based on the forbidden symbol was proposed in [10], with a high flexibility in adjusting the coding rate. Sequential decoding algorithms, such as the stack algorithm and the M-algorithm, are adopted, and the proposed system outperforms the separate approach based on convolutional codes in terms of error correcting capability. It is serially concatenated with channel codes, and iterative decoding is employed to further improve the overall performance [18, 19]. Chaos phenomena, which generally exist in complex systems [20, 21], are also observed during the iterative decoding procedures. Thus, chaos control techniques can be adopted to further enhance the error correction performance [22-24]. A sequential MAP estimation for the CABAC coder was proposed in [25], which employs an improved sequential decoding technique to trade off complexity against efficiency. In [26], a look-ahead technique for the AC decoder was proposed to allow quick error detection. To improve the implementation efficiency, AC can be modeled as a finite-state machine corresponding to a variable-length trellis code. The trellis code based on AC was proposed in [27, 28], where a list Viterbi decoding algorithm is applied to the corresponding trellis code and a cyclic redundancy check code is employed for detecting small Hamming-distance errors. The free distance of the corresponding AC-based VLC and its theoretical error correction performance were investigated in [29, 30]. Besides, practical implementations of this joint source-channel coding scheme targeting high coding speed were studied in [31, 32].

The error detecting capability of AC was analyzed in our previous paper [15]. Here we extend our previous work to tackle the problem of error correction in AC. An effective error correction technique utilizing the forbidden symbol is proposed, which predicts the occurrence of the subsequent forbidden symbols. With our approach, the forbidden region is theoretically expanded, and so a better error correction performance is achieved. Furthermore, a generalized stack algorithm (SA) extending branches from the best node is also studied for detecting the forbidden symbol beforehand. The MAP metric [10] is integrated with our approach to preserve the most probable decoding paths in the stack. The idea of our approach was briefly presented in [33], which mainly focuses on the forecasting of the forbidden symbols. Here, the procedures of AC with forecasted forbidden symbols are described in detail. More analyses and simulation results are provided to show that the proposed scheme outperforms the look-ahead scheme [26] and the original MAP scheme [10] in terms of error correction performance, especially at a low coding rate.

The rest of this paper is organized as follows. The background of AC is reviewed in Section 2. The proposed scheme is described in Section 3, where the estimation of the subsequent forbidden symbols and the generalized SA are introduced. Simulation results are presented in Section 4 to show the improvement of our scheme. Finally, conclusions are drawn in Section 5.

2. Background of Arithmetic Coding

Arithmetic coding is an iterative operation, which recursively assigns the coding interval to a sequence of source symbols. In general, a prior source model is required, which initializes the coding interval according to the occurrence probabilities of the source symbols. Consider the binary case in which the occurrence probabilities of "0" and "1" are 0.8 and 0.2, respectively; the coding units [0, 0.8) and [0.8, 1) are then assigned to the symbols "0" and "1," respectively. The arithmetic coding steps for encoding the source sequence "01000" are illustrated in Figure 1.

At last, the final coding unit [0.64, 0.72192) is obtained, within which any real value can be selected and exported as the compressed bits. Theoretically, arithmetic coding guarantees a compressed sequence whose length is within two bits of $-\log_2 P(\mathbf{s})$, where $P(\mathbf{s})$ is the probability of the source sequence; 4 bits (1011) suffice in this example. In the decoding process, the received codeword sequence (1011) is first put after the binary point to form 0.1011, which lies within the range [0, 1). The representation 0.1011 is then converted to the decimal value 0.6875. As it falls successively into the intervals [0, 0.8), [0.64, 0.8), [0.64, 0.768), [0.64, 0.7424), and [0.64, 0.72192), the decoder sequentially exports the symbols "0," "1," "0," "0," and "0." The decoded sequence is exactly the same as the source sequence since AC is a lossless source coding scheme.
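For readers who prefer running code, the following short Python sketch reproduces the toy example above with real-valued intervals; it is only an illustration (the probability 0.8 and the sequence "01000" come from the example, and precision issues are ignored), not the implementation used in the simulations.

# Minimal real-valued arithmetic coder for the binary example above.
# Illustration only; assumes P("0") = 0.8 and ignores precision issues.

P0 = 0.8  # probability of symbol "0"

def encode(symbols):
    low, high = 0.0, 1.0
    for s in symbols:
        width = high - low
        if s == "0":
            high = low + width * P0      # keep the lower sub-interval
        else:
            low = low + width * P0       # keep the upper sub-interval
    return low, high                     # any value in [low, high) identifies the sequence

def decode(value, n_symbols):
    low, high = 0.0, 1.0
    out = []
    for _ in range(n_symbols):
        width = high - low
        split = low + width * P0
        if value < split:                # value falls in the "0" sub-interval
            out.append("0")
            high = split
        else:                            # value falls in the "1" sub-interval
            out.append("1")
            low = split
    return "".join(out)

low, high = encode("01000")              # final interval [0.64, 0.72192)
print(low, high)
print(decode(0.6875, 5))                 # 0.6875 = 0.1011 in binary -> "01000"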

A practical problem encountered in the implementation of AC is that the interval keeps shrinking during the iterative encoding steps. Thus, a high precision would be needed to represent the very small real numbers encountered in the coding process. A solution to this problem is to use an integer representation, where the coding interval can be rescaled whenever the most significant bits in the representations of the lower and upper bounds are the same. Suppose that the binary source sequence $\mathbf{s} = (s_1, s_2, \ldots, s_N)$ with $s_i \in \{0, 1\}$, where $N$ is the number of source symbols, is encoded as a variable-length codeword $\mathbf{b} = (b_1, b_2, \ldots, b_L)$. In Algorithm 1, the pseudocode of the encoder is given, where low and high are, respectively, the lower and upper bounds of the interval for encoding the source symbol $s_i$. The vector $\mathbf{c}$ represents the cumulative probabilities of the source model, with $c_0 = 0$, $c_1 = P(0)$, and $c_2 = 1$. The initial lower and upper bounds are set to 0 and $2^R - 1$, respectively, where $R$ is the length of the register storing the values of the bounds. The number of bits not yet emitted in the interval rescaling operations is recorded by the counter follow. The values of First_quarter, Half, and Third_quarter are fixed and set to $2^{R-2}$, $2^{R-1}$, and $3 \cdot 2^{R-2}$, respectively.

Function AC_Encoder
Input:   s, c, N, R
Output:  b
Set low = 0, high = 2^R − 1, and follow = 0
For i = 1 to N
 Set range = high − low + 1
 Set high = low + ⌊range · c(s_i + 1)⌋ − 1 and low = low + ⌊range · c(s_i)⌋
 While(True)
  If high < Half
   Set low = 2 · low and high = 2 · high + 1
   Emit a bit 0 and follow bits 1 to b
   Set follow = 0
  Else If low ≥ Half
   Set low = 2 · (low − Half) and high = 2 · (high − Half) + 1
   Emit a bit 1 and follow bits 0 to b
   Set follow = 0
  Else If low ≥ First_quarter and high < Third_quarter
   Set low = 2 · (low − First_quarter) and high = 2 · (high − First_quarter) + 1
   Set follow = follow + 1
  Else
   Break;

In Algorithm 2, the pseudocode of the corresponding sequential decoder is listed, which avoids decoding delay. Once both bounds, low_d and high_d, of the decoding interval built from the compressed bits are located within the encoding interval of a particular source symbol, that symbol can be decoded. The encoding and decoding intervals are then rescaled as performed in the encoder. The lower and upper bounds of the decoding interval are also initialized to 0 and $2^R - 1$, respectively. The details of this kind of AC encoding and decoding can be found in [1, 34].

Function AC_Decoder
Input:  b_j, c, low, high, low_d, high_d
Output:  s
If b_j == 0
 Set half = ⌊(high_d − low_d + 1)/2⌋ and high_d = low_d + half − 1
Else
 Set half = ⌊(high_d − low_d + 1)/2⌋ and low_d = low_d + half
While(True)
 Set split = low + ⌊(high − low + 1) · c_1⌋
 If high_d < split
  Emit source symbol "0" to s
  Set high = split − 1
  Scale the intervals [low, high] and [low_d, high_d] as done in AC_Encoder
 Else If low_d ≥ split
  Emit source symbol "1" to s
  Set low = split
  Scale the intervals [low, high] and [low_d, high_d] as done in AC_Encoder
 Else
  Break;
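To make the interval rescaling shared by Algorithms 1 and 2 concrete, a minimal Python sketch is given below. The variable names (low, high, follow) and the register width R follow the standard integer implementation in [1, 34]; the function is an illustration, not the exact code used in the simulations.

# Sketch of the integer interval rescaling used by both the encoder and the decoder.
# R is the register width; follow counts the pending "underflow" bits.
R = 16
FULL = (1 << R) - 1          # initial value of high
HALF = 1 << (R - 1)
FIRST_QUARTER = 1 << (R - 2)
THIRD_QUARTER = 3 * FIRST_QUARTER

def rescale(low, high, follow, bits):
    """Repeatedly expand [low, high] and append the resolved bits to the list `bits`."""
    while True:
        if high < HALF:                        # interval in lower half: emit 0, then pending 1s
            bits.append(0)
            bits.extend([1] * follow)
            follow = 0
        elif low >= HALF:                      # interval in upper half: emit 1, then pending 0s
            bits.append(1)
            bits.extend([0] * follow)
            follow = 0
            low, high = low - HALF, high - HALF
        elif low >= FIRST_QUARTER and high < THIRD_QUARTER:
            follow += 1                        # interval straddles the midpoint: defer the decision
            low, high = low - FIRST_QUARTER, high - FIRST_QUARTER
        else:
            break
        low, high = 2 * low, 2 * high + 1      # double the interval
    return low, high, follow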

3. The Proposed Algorithm

Assume that the variable-length codeword $\mathbf{b}$ is transmitted over a channel with transition probability $P(y_j \mid b_j)$. The receiver obtains the demodulated sequence $\mathbf{y} = (y_1, y_2, \ldots, y_L)$, from which the recovered message $\hat{\mathbf{s}}$ is found using the generalized stack algorithm. A block diagram of this transmission system is depicted in Figure 2.

3.1. MAP Metric

In our scheme, the MAP metric [10] is employed for finding the most probable codeword $\hat{\mathbf{b}}$ among all possible sequences $\mathbf{b}$, by maximizing the a posteriori probability $P(\mathbf{b} \mid \mathbf{y})$, as expressed by
$$\hat{\mathbf{b}} = \arg\max_{\mathbf{b}} P(\mathbf{b} \mid \mathbf{y}). \quad (1)$$
The Bayesian relationship states that
$$P(\mathbf{b} \mid \mathbf{y}) = \frac{P(\mathbf{y} \mid \mathbf{b})\, P(\mathbf{b})}{P(\mathbf{y})}. \quad (2)$$
In the case of memoryless channels, it is straightforward to represent (2) in an additive form,
$$\log P(\mathbf{b} \mid \mathbf{y}) = \sum_{j=1}^{L} m(b_j), \quad (3)$$
with the branch metric
$$m(b_j) = \log P(y_j \mid b_j) + \log P(\mathbf{s}_j) - \log P(y_j) \quad (4)$$
for each bit $b_j$ of $\mathbf{b}$, where the vector $\mathbf{s}_j$ contains the source symbols decoded when the compressed bit $b_j$ is shifted into the decoder. It should be noticed that $\mathbf{s}_j$ can be empty when no source symbol is output by the decoder, in which case $P(\mathbf{s}_j) = 1$. There are three terms on the right-hand side of (4). The first term is the channel transition probability, while the second term represents the a priori probabilities of the source symbols. The first two terms can be evaluated from the channel and source models, respectively. The last term is more complicated, as it needs to sum over the possible bit values weighted by their probabilities, as follows:
$$P(y_j) = \sum_{b_j \in \{0, 1\}} P(y_j \mid b_j)\, P(b_j). \quad (5)$$
As full knowledge of the set of codewords with length $L$ is required to obtain $P(b_j)$, it is impractical to evaluate (5) exactly. However, assuming that the codeword bits have equal probabilities of being "0" and "1," this term can be approximated by
$$P(y_j) \approx \frac{1}{2}\big[P(y_j \mid b_j = 0) + P(y_j \mid b_j = 1)\big]. \quad (6)$$

When hard decoding is adopted in an AWGN channel using binary phase-shift keying (BPSK) modulation with a signal-to-noise ratio (SNR) $\gamma$, the channel transition probability is
$$P(y_j \mid b_j) = \begin{cases} 1 - p, & y_j = b_j, \\ p, & y_j \neq b_j, \end{cases} \quad (7)$$
where $p = \frac{1}{2}\operatorname{erfc}\!\left(\sqrt{\gamma}\right)$ is the crossover probability of the equivalent binary symmetric channel. By (6), $P(y_j) = 1/2$ in this case. Therefore,
$$\log P(\mathbf{b} \mid \mathbf{y}) = \sum_{j=1}^{L} \big[\log P(y_j \mid b_j) + \log P(\mathbf{s}_j) + \log 2\big]. \quad (8)$$
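A per-bit evaluation of the hard-decision metric (8) can be sketched in Python as follows; the function name and the convention of passing in the product of the a priori symbol probabilities (1 if no symbol is decoded for this bit) are illustrative choices, not part of the original formulation.

import math

def hard_bit_metric(y_bit, b_bit, p_symbols, snr):
    """Hard-decision branch metric for one codeword bit, following (8).

    y_bit, b_bit: received and hypothesised bit (0 or 1)
    p_symbols:    product of a priori probabilities of the source symbols
                  decoded when this bit is shifted in (1.0 if none)
    snr:          channel SNR (linear scale)
    """
    p = 0.5 * math.erfc(math.sqrt(snr))          # BPSK crossover probability
    p_channel = (1.0 - p) if y_bit == b_bit else p
    return math.log(p_channel) + math.log(p_symbols) + math.log(2.0)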

In the soft decoding process, each bit $b_j$ in $\mathbf{b}$ is mapped to $t_j = 2b_j - 1$ before being transmitted over the AWGN channel. The decoder receives the noisy signal $y_j = t_j + n_j$, where $n_j$ is the additive white Gaussian noise with standard deviation $\sigma$. Given the input sequence $\mathbf{b}$, the conditional probability of the received signal is
$$P(y_j \mid b_j) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(y_j - t_j)^2}{2\sigma^2}\right), \quad (9)$$
so that
$$P(\mathbf{y} \mid \mathbf{b}) = \prod_{j=1}^{L} P(y_j \mid b_j). \quad (10)$$
Making use of (6), we have
$$P(y_j) \approx \frac{1}{2\sqrt{2\pi}\,\sigma}\left[\exp\!\left(-\frac{(y_j - 1)^2}{2\sigma^2}\right) + \exp\!\left(-\frac{(y_j + 1)^2}{2\sigma^2}\right)\right]. \quad (11)$$
Thus,
$$\log P(\mathbf{b} \mid \mathbf{y}) = \sum_{j=1}^{L} \big[\log P(y_j \mid b_j) + \log P(\mathbf{s}_j) - \log P(y_j)\big], \quad (12)$$
with $P(y_j \mid b_j)$ and $P(y_j)$ given by (9) and (11), respectively.
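Similarly, the per-bit soft-decision metric in (12) can be sketched as below; the interface is again an illustrative choice, and the common Gaussian normalization factor is dropped because it cancels between (9) and (11).

import math

def soft_bit_metric(y, b_bit, p_symbols, sigma):
    """Soft-decision branch metric for one codeword bit, following (12).

    y:          received noisy sample
    b_bit:      hypothesised bit (0 or 1), mapped to t = 2*b - 1
    p_symbols:  product of a priori probabilities of the symbols decoded
                while consuming this bit (1.0 if none)
    sigma:      standard deviation of the AWGN
    """
    t = 2 * b_bit - 1
    log_p_y_given_b = -((y - t) ** 2) / (2 * sigma ** 2)   # Gaussian log-likelihood, constant dropped
    # Approximation (11) of P(y); the 1/(sqrt(2*pi)*sigma) factor cancels with (9).
    p_y = 0.5 * (math.exp(-((y - 1) ** 2) / (2 * sigma ** 2))
                 + math.exp(-((y + 1) ** 2) / (2 * sigma ** 2)))
    return log_p_y_given_b + math.log(p_symbols) - math.log(p_y)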

3.2. Forecasted Forbidden Symbols

In order to embed error detecting capability into AC, a forbidden symbol with probability of occurrence $\varepsilon$ is inserted in the source model, as shown in Figure 3. The probabilities of "0" and "1" are changed to $P_0(1-\varepsilon)$ and $P_1(1-\varepsilon)$, respectively, where $P_0$ and $P_1$ denote the original symbol probabilities. The overhead of this approach is a lower coding rate, as the available coding space for AC shrinks. This accounts for $-\log_2(1-\varepsilon)$ additional bits for each source symbol, so the expected length of the compressed sequence becomes $N[H - \log_2(1-\varepsilon)]$ when the forbidden symbol is adopted, where $H$ is the memoryless source entropy rate. As the forbidden symbol is never encoded, the decoder can be sure that some estimated bits are erroneous once it observes the forbidden symbol in the decoding process. Thus, the erroneous decoding path can be pruned. Theoretically, the number of symbols decoded before an error is detected is greater than $k$ with probability $(1-\varepsilon)^k$. Therefore, as more source symbols after the erroneous bits are decoded, the error is detected with a higher probability. Moreover, a large value of $\varepsilon$ enables a short error detection delay at the expense of compression efficiency.
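The rate/delay trade-off described above is easy to quantify; the following lines (with illustrative values only) evaluate the redundancy of $-\log_2(1-\varepsilon)$ bits per symbol and the probability $(1-\varepsilon)^k$ that an error is still undetected after k further decoded symbols.

import math

def forbidden_symbol_overhead(eps):
    """Extra code length, in bits per source symbol, caused by reserving
    a forbidden region of probability eps."""
    return -math.log2(1.0 - eps)

def prob_error_undetected_after(k, eps):
    """Probability that an error is still undetected after k further
    source symbols have been decoded."""
    return (1.0 - eps) ** k

print(forbidden_symbol_overhead(0.185))        # ~0.295 extra bits per symbol
print(prob_error_undetected_after(10, 0.185))  # ~0.13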

Thanks to the iterative nature of the AC encoding process, the forbidden symbols after the currently encoded source symbol can actually be estimated beforehand. This is useful in detecting errors at an earlier stage, so as to prune the erroneous decoding tree quickly and to increase the chance for the correct decoding tree to remain in the stack. As shown in Figure 4(a), the second forbidden symbols in the original coding regions of "0" and "1" are forecasted, the lengths of which are $P_0(1-\varepsilon)\varepsilon$ and $P_1(1-\varepsilon)\varepsilon$, respectively. Similarly, the third forbidden symbols shown in Figure 4(b) are predicted with the corresponding total lengths $P_0(1-\varepsilon)^2\varepsilon$ and $P_1(1-\varepsilon)^2\varepsilon$. Theoretically, the lengths of all the successive forecasted forbidden symbols in the two coding units can be summed up as
$$\Lambda_0(n) = P_0\varepsilon\sum_{k=1}^{n-1}(1-\varepsilon)^{k} = P_0(1-\varepsilon)\big[1-(1-\varepsilon)^{n-1}\big], \qquad \Lambda_1(n) = P_1(1-\varepsilon)\big[1-(1-\varepsilon)^{n-1}\big],$$
where $n$ is the number of the forecasted forbidden symbols. When $n$ tends to infinity, we obtain the lengths of the forecasted forbidden regions
$$\Lambda_0 = P_0(1-\varepsilon), \qquad \Lambda_1 = P_1(1-\varepsilon). \quad (13)$$
They are the theoretical limits for the total length of the successive forbidden symbols. As shown in Figure 4(c), the forecasted forbidden region is much larger than that in Figure 3, which obviously improves the error correcting capability.
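The geometric sums above can be checked numerically; the sketch below follows the scaled model just described, with illustrative values $P_0 = 0.8$ (from the example of Section 2) and $\varepsilon = 0.185$ (from Section 4), and evaluates the forecasted forbidden length inside the coding unit of "0" for several n together with its limit.

# Total forecasted forbidden length inside the coding unit of "0" after n levels,
# and its limit as n grows; P0 is the original probability of "0".
def forecasted_forbidden_length(P0, eps, n):
    return P0 * (1 - eps) * (1 - (1 - eps) ** (n - 1))

P0, eps = 0.8, 0.185
for n in (2, 3, 4, 10):
    print(n, forecasted_forbidden_length(P0, eps, n))
print("limit:", P0 * (1 - eps))   # 0.652, the full width of the "0" coding unit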

On the other hand, the look-ahead technique [26] usually employed in AC decoders can detect the forbidden symbol quickly by decoding a source symbol even when the decoding interval bounds low_d and high_d straddle the encoding interval of that source symbol and the forbidden region, by assuming that the symbol decision is error-free. An example is given in Figure 5, where the source symbol "1" is decoded in Figure 5(a) and then the forbidden symbol can be detected in Figure 5(b). Compared with the look-ahead technique in AC decoders, our forecasted forbidden regions can effectively detect the possible errors that would be found by the look-ahead technique. For example, the forbidden symbol in Figure 5(b) can also be detected with our forecasted forbidden regions without the need to decode the source symbol "1" in advance. Besides that, the forecasted forbidden regions in the middle of the source symbols "0" and "1" enable the adoption of the generalized stack algorithm introduced in Section 3.3. The look-ahead technique can still be adopted in our scheme to further enhance the overall correction performance. An example of this scenario is depicted in Figure 5(c), in which the source symbol "1" is decoded in advance. The improvement of our scheme is further validated by the simulation results reported in Section 4.

With the look-ahead technique adopted in our scheme, it is possible that low_d < low or high_d > high in the implementation of AC. The pseudocode of the modified decoder for error detection can be found in Algorithm 3.

Function AC_FS_Decoder
Input:  b_j, c, ε, low, high, low_d, high_d
Output: s
If b_j == 0
  Set half = ⌊(high_d − low_d + 1)/2⌋ and high_d = low_d + half − 1
Else
  Set half = ⌊(high_d − low_d + 1)/2⌋ and low_d = low_d + half
While(True)
  Find the two estimated forbidden regions in the encoding interval [low, high];
  If low_d and high_d are completely located in the forecasted forbidden regions
  or out of the encoding interval [low, high]
  Then delete the decoding path and break;
  Set split_0 = low + ⌊(high − low + 1) · P_0(1 − ε)⌋ and split_1 = split_0 + ⌊(high − low + 1) · ε⌋
  If high_d < split_1
   Emit source symbol "0" to s
   Set high = split_0 − 1
   Rescale the intervals [low, high] and [low_d, high_d] as done in AC_Encoder
  Else If low_d ≥ split_0 and high_d ≥ split_1
   Emit source symbol "1" to s
   Set low = split_1
   Rescale the intervals [low, high] and [low_d, high_d] as done in AC_Encoder
  Else
   Break;
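One way to realize the pruning test at the top of Algorithm 3 is sketched below in Python: the forecasted forbidden sub-intervals of the current encoding interval are enumerated level by level, and a path is pruned when its decoding interval lies entirely inside one of them or outside the encoding interval. The function names, the real-valued intervals, and the middle placement of the forbidden region are illustrative assumptions; the actual decoder works on the rescaled integer intervals.

def forecasted_forbidden_regions(low, high, p0, eps, depth):
    """List the forbidden sub-intervals of [low, high) down to `depth` levels.

    p0 is the scaled probability of "0", i.e. P0*(1 - eps); the forbidden region
    is assumed to lie between the "0" and "1" regions."""
    regions = []
    def recurse(lo, hi, level):
        width = hi - lo
        f_lo = lo + width * p0                  # forbidden region at this level
        f_hi = f_lo + width * eps
        regions.append((f_lo, f_hi))
        if level < depth:                       # forecast the next level inside "0" and "1"
            recurse(lo, f_lo, level + 1)
            recurse(f_hi, hi, level + 1)
    recurse(low, high, 1)
    return regions

def should_prune(low_d, high_d, low, high, p0, eps, depth=4):
    """Prune a path if its decoding interval [low_d, high_d) is outside the
    encoding interval or completely inside a (forecasted) forbidden region."""
    if high_d <= low or low_d >= high:
        return True
    return any(f_lo <= low_d and high_d <= f_hi
               for f_lo, f_hi in forecasted_forbidden_regions(low, high, p0, eps, depth))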

3.3. Generalized Stack Algorithm

The generalized SA is a variation of the SA, which is a metric-first search algorithm. In the original SA, all the explored decoding paths with better metrics are stored in an ordered stack of fixed size. The best decoding path, which has the maximum metric value given by (8) or (12), is stored at the top of the stack. It is extended to two branches after the current decoding node by hypothesizing the subsequent decoding bit as "0" and "1," respectively. Then the top node is removed and the two child nodes are inserted into the stack. Once the stack is full, the path with the worst metric is discarded. In the generalized SA, $M$ branches instead of 2 branches are extended from the top node; that is, the subsequent $\log_2 M$ decoding bits are hypothesized at once. As the coding region assigned to the forbidden symbol is small, it usually takes several bits to determine whether the decoder will visit the forbidden region or not. Thus, extending $M$ branches from the best node results in a faster detection and removal of erroneous decoding paths. This in turn means a higher probability of preserving the correct path in the stack. It is noted that the generalized SA is not applicable to the original MAP algorithm and the look-ahead technique, as an underflow problem may occur when $M$ branches are extended from a very small decoding interval. However, as the forbidden regions in the middle of the source symbols "0" and "1" are forecasted in our scheme, the decoding interval is guaranteed not to be smaller than the length of the intermediate forbidden region. Therefore, the underflow problem is avoided.

There are three conditions for discarding decoding paths in the generalized SA. The first condition is that the forbidden symbol is encountered. The second corresponds to the situation that the number of decoded symbols is equal to $N$ but the number of decoded bits is smaller than $L$. The third case is that the number of decoded bits is equal to $L$ but the number of decoded symbols is smaller than $N$. The generalized SA stops when the $L$ decoded bits exactly recover the $N$ source symbols or the stack is empty. A diagram illustrating the generalized SA is shown in Figure 6, with $M = 4$. Therefore, four child nodes are extended from the best nodes, which are identified in gray. The node marked with X is deleted as it visits the forbidden region.
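The search procedure can be summarized by the following Python skeleton, which keeps the stack as a heap keyed by the negated path metric. The path representation and the extend routine, which stands in for Algorithm 3 together with the metric (8) or (12), are placeholders for illustration rather than the implementation used in the simulations.

import heapq
import itertools

def generalized_stack_decode(N, L, M, stack_size, extend):
    """Skeleton of the generalized stack algorithm described above.

    N, L:       number of source symbols and codeword bits to recover
    M:          number of branches extended from the best node (M = 2**q, q bits at a time)
    stack_size: maximum number of paths kept in the stack
    extend:     caller-supplied function (path, bits) -> (new_path, metric_increment, pruned)
                standing in for the AC decoder with forecasted forbidden symbols
                (Algorithm 3) and the branch metric (8) or (12); each path dict carries
                at least the counts of decoded "bits" and "symbols" plus decoder state
    """
    q = M.bit_length() - 1                      # bits hypothesised per extension
    tie = itertools.count()                     # tie-breaker so the heap never compares paths
    root = {"bits": 0, "symbols": 0, "state": None}
    stack = [(0.0, next(tie), root)]            # min-heap on the negated metric
    while stack:
        neg_metric, _, path = heapq.heappop(stack)
        if path["bits"] == L and path["symbols"] == N:
            return path                         # best path that exactly recovers the packet
        for pattern in range(M):
            bits = [(pattern >> i) & 1 for i in range(q)]
            new_path, delta, pruned = extend(path, bits)
            nb, ns = new_path["bits"], new_path["symbols"]
            if pruned or nb > L or ns > N:
                continue                        # forbidden region visited, or overshoot
            if ns == N and nb < L:
                continue                        # all symbols decoded but bits remain
            if nb == L and ns < N:
                continue                        # all bits consumed but symbols missing
            heapq.heappush(stack, (neg_metric - delta, next(tie), new_path))
        if len(stack) > stack_size:             # keep only the best stack_size paths
            stack = heapq.nsmallest(stack_size, stack)
            heapq.heapify(stack)
    return None                                 # stack emptied without a valid path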

4. Simulations

In this section, the proposed scheme is compared with the original MAP scheme [10] and the look-ahead scheme [26]. Binary source symbols are randomly generated with the same symbol probabilities, and hence the same memoryless source entropy, as the simulation data used in [10]. Each packet consists of 2304 binary symbols. It is then encoded by arithmetic coding with the forbidden symbol to generate the variable-length compressed sequence $\mathbf{b}$. The packet length and the a priori bit probability are sent to the decoder as side information. They are protected by a highly redundant channel code to guarantee their correctness. Each packet is terminated with an EOS (end of sequence) symbol having probability $10^{-5}$, which protects the last few bits of $\mathbf{b}$. The stack size is chosen as 256 for all the algorithms. The value of $M$ is set to 8 in the generalized SA. All the simulations are run $10^5$ times over an AWGN channel with BPSK modulation. The number of forecasted forbidden symbols $n$ is selected as 4. As the original MAP scheme [10] has already been shown to outperform the traditional separate source and channel coding scheme, a comparison with the latter is not repeated here. As the placement of the forbidden symbol can affect the error correction performance [26, 29], two placements are considered in our simulations, identified as source models A and B in Figures 7(a) and 7(c). The corresponding forecasted forbidden symbols in source models A and B are illustrated in Figures 7(b) and 7(d). The performance of the compared schemes is evaluated by the packet error rate (PER).

The PERs of the proposed, the look-ahead, and the original MAP schemes with source models A and B at various channel SNRs are plotted in Figure 8. The value of $\varepsilon$ is set to 0.185, which corresponds to a coding rate of 2/3. Consider source model A, in which many forbidden symbols can be forecasted, as indicated by Figure 7(b); this contributes to the major improvement of our scheme and the look-ahead scheme over the original MAP scheme. The simulation results plotted in Figure 8 confirm that the look-ahead scheme performs much better than the original MAP scheme, while ours achieves the best results. However, the results obtained with source model B are better than those with source model A for all algorithms. These observations show that the error correcting capability of source model B is better than that of source model A. Although many forbidden symbols can be forecasted in source model A, this placement concentrates the forbidden symbols near the upper bound of the coding interval and leads to weak detection of errors occurring near the lower bound of the coding interval. In summary, our scheme performs much better than the look-ahead scheme and the original MAP scheme at all SNRs for the two source models. The best results in soft and hard decoding are obtained by using our scheme with source model B, which achieves a coding gain of around 0.5 dB for hard decoding and 0.25 dB for soft decoding when compared with the original MAP scheme. Moreover, values of $\varepsilon$ of 0.097 and 0.05 are also selected for source model B, which correspond to coding rates of 4/5 and 8/9, respectively. The PERs of our scheme, the look-ahead scheme, and the original MAP scheme are plotted in Figures 9 and 10. As indicated in these two figures, the coding gain decreases when the value of $\varepsilon$ becomes small. The graphs reveal that, for large $\varepsilon$, our scheme has a much better performance than the look-ahead and the original MAP schemes; in other words, it is especially effective at a low coding rate.

Figure 11 shows the error correction performance with source model B at various $\varepsilon$ using soft and hard decoding. In this figure, soft decoding is applied in our scheme, the look-ahead scheme, and the original MAP scheme with the channel SNR fixed at 3.5 dB. The PERs are plotted against $\varepsilon$ ranging from 0.04 to 0.16. With the increase of $\varepsilon$, the PERs of all schemes drop accordingly. This is reasonable, as a large value of $\varepsilon$ introduces more redundant bits for error detection, which help to remove the erroneous decoding paths. When $\varepsilon$ is large, the gain of our scheme over the look-ahead scheme and the original MAP scheme becomes apparent, which again indicates that our scheme performs much better at a low coding rate. Considering hard decoding in a channel with SNR 5.5 dB, the PERs of our scheme, the look-ahead scheme, and the original MAP scheme are also depicted in Figure 11 for $\varepsilon$ between 0.04 and 0.16. Results similar to those obtained using soft decoding are observed, and they further confirm the superiority of our scheme at a low coding rate.

5. Conclusions

We have proposed an effective error detection technique based on the forecasting of forbidden symbols, which widens the forbidden region by estimating the occurrence of the subsequent forbidden symbols. A generalized SA is also adopted to detect the forbidden symbol beforehand and to remove the erroneous decoding paths earlier. As a result, the chance of preserving the correct decoding path increases and the error correction performance is improved. Simulation results validate the superiority of our approach over the look-ahead and the original MAP schemes, especially at a low coding rate.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The work described in this paper was supported by a grant from CityU (Project no. 7004062), the National Natural Science Foundation of China under Project Grant nos. 61402291, 61272402, 61070214, 60873264, and 61170283, the National High-Technology Research and Development Program ("863" Program) of China under Grant 2013AA01A212, and the Ministry of Education New Century Excellent Talents Support Program under Grant NCET-12-0649. Ming Li also thanks the Science and Technology Commission of Shanghai Municipality for support under Research Grant no. 14DZ2260800.