Limited by the characteristics of underwater acoustic channels, video transmission for deep-sea detection and operation tasks faces severe challenges such as network failures and high delay. These lead to loss of video detail, color distortion, blurring, and even bit errors, which seriously degrade the decoded video quality. To address these problems in deep-sea long-distance wireless communication, this paper proposes an improved Wyner-Ziv coding scheme (UnderWater-WZ) for video transmission over acoustic channels. The scheme uses MJPEG coding to confine the range of errors, combines motion-compensated temporal interpolation with check information to generate high-quality side information, and designs an intraframe quantization matrix that reduces the visual impact of scene changes. Experimental results show that even at the highest tested packet loss rate of 20%, the scheme improves video reconstruction quality by 2.6–3.5 dB compared with previous methods, approaching the error-free level.

1. Introduction

In recent years, with the development of deep-sea exploration technology and the growing demand for marine resources, various countries have begun to build their own deep-sea space stations. Relying on visual perception in deep-sea environments, a variety of video applications can be transmitted through underwater acoustic channels, including seabed exploration, disaster prevention, mine reconnaissance, and environmental monitoring [1–4]. To meet the transmission needs of underwater video data, coding and transmission techniques have been proposed for underwater acoustic networks [5–9]. However, because of the nature of acoustic signals, an underwater acoustic communication system typically has three defects: narrow bandwidth, long propagation delay, and error-prone transmission. These limitations make it impossible to guarantee real-time video transmission or end-to-end reproduction quality, while underwater multimedia applications increasingly demand higher quality and reliability. Most underwater video coding algorithms for acoustic communication aim to improve compression efficiency. They fall into two categories: improved algorithms based on standard encoders and self-designed coding algorithms. In [10], the authors compare the coding performance of different standard encoders over underwater acoustic networks. For underwater video streaming, standard encoders have also been modified to cooperate with acoustic links. Based on H.264, Avrashi et al. [11] achieve efficient compression at a bit rate of 36 kbps by adaptively reducing the frame rate, and use motion vector interpolation at the decoder to reconstruct the missing frames and restore the normal frame rate. For the transmission of underwater forward-scan sonar video, Mirizzi et al. [12] propose a coding algorithm based on region segmentation, which applies different coding strategies to the image foreground and background and embeds the algorithm into an HEVC encoder. This method exploits the characteristics of 2-D sonar imagery to remove redundancy more effectively than standard encoders for low-bandwidth underwater acoustic transmission, but it is not designed for optical video or images with fine detail. Hoag et al. [13] propose underwater video compression techniques that use the discrete wavelet transform as the coding basis to remove the spatial correlation between pixels. Negahdaripour and Khamene [14] combine the wavelet transform with interframe motion compensation, reaching an average compression ratio of 150:1. To further eliminate the temporal redundancy between video frames, Nagrale et al. [15] use a hybrid global and local motion compensation strategy to achieve efficient underwater video compression. Building on the wavelet coding tree proposed by Li and Wang [16], Song et al. [17] adopt an improved wavelet transform with stronger signal representation ability, improving both coding efficiency and decoding quality. Dong et al. [18] propose a fast algorithm for VVC that reduces coding complexity through mode selection and early termination of prediction. For mode selection, adaptive mode pruning (AMP) removes nonpromising modes; in addition, mode-dependent termination (MDT) uses the current optimal mode to terminate unnecessary intrapredictions at the remaining depth levels. For the new QTMT partition structure and two-way coded intraprediction modes, Yang et al. [19] propose a low-complexity CTU partition decision method and a fast intramode decision method to reduce the computational burden of VVC intracoding.

In current research, most methods are not designed for the characteristics of deep-sea video transmission and cannot handle underwater videos containing complex motion. Most importantly, existing video error-resilience methods cannot control the bit error rate and improve video quality at the same time. In this paper, we propose a video error-resilience scheme for underwater transmission based on the Wyner-Ziv framework (UnderWater-WZ).

The UnderWater-WZ coding system aims to improve both encoding and decoding efficiency and video quality. It is built on the Wyner-Ziv video coding system. By optimizing the WZ coding scheme and using the MJPEG coding algorithm to reduce data correlation, it improves bit error rate performance. Higher-quality side information (SI) is generated by combining motion-compensated temporal interpolation (MCTI) with check information, and a DCT intraframe quantization matrix, together with the feedback channel, significantly improves video quality. In short, our error-resilience research aims to reduce the bit error rate and improve reconstruction quality. This paper makes the following contributions:
(1) By combining MJPEG coding with the Wyner-Ziv system, we form a high-efficiency, low-correlation video error-resilience method. When data loss occurs, the resulting coding system does not propagate errors across reconstructed images and confines the range of packet loss, which improves the robustness of the system.
(2) We propose a macroblock-based side information generation method that combines MCTI with the transmission of check information. First, for the macroblocks in the reference frame, intraframe coding is performed using the check information transmitted by the encoder, and MCTI is then used to generate side information. Finally, higher-quality side information is produced by a motion-compensated quality enhancement technique. The method can handle nonlinearly moving objects in ocean video and improves the quality of the reconstructed video.
(3) We design a DCT-based intraframe quantization matrix that adopts a new spatial frequency calculation method to weight coefficients of different frequencies. This keeps coding efficient while reducing the viewer's sensitivity to scene changes.
The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 introduces the Wyner-Ziv video coding system and the implementation details of the proposed video error-resilience scheme. Section 4 presents the experimental results, and Section 5 concludes the paper.

The needs of emerging applications such as wireless low-power video surveillance have spurred the development of low-complexity video encoding. Current research on WZ video coding focuses mainly on two aspects: improving coding efficiency and the RD performance of the system, and providing error recovery capability to improve video quality.

2.1. Improving the Coding Efficiency and Improving the RD Performance of the System

Multiple description coding [20–26] and redundant picture [27–29] techniques are effective against channel transmission errors, but both come at the cost of higher bit consumption. Error-resilience tools in a video compression algorithm consume part of the bitstream, so the sender has to transmit more data to achieve the same reconstruction quality at the decoder. Ramon et al. [30] propose a curve-based intraframe prediction method that enhances intraprediction by modeling curved texture features. It incurs a small bitstream overhead for transmitting displacement information, which is offset by gains in coding efficiency. Singhadia et al. [31] establish an empirical relationship between PSNR, bitrate, and the quantization parameter, and use an optimization algorithm to find the quantization parameter that achieves the highest PSNR. However, this method cannot encode efficiently when transmission errors occur. Marzuki et al. [32] propose a tile-level rate control algorithm for high-efficiency video coding with tile parallelization, which adaptively assigns an appropriate number of bits to each tile. However, for video sequences with nonlinearly moving objects, this rate control cannot be performed effectively, so video quality cannot be improved.

The above research targets terrestrial video transmission, whereas the bandwidth of underwater transmission is extremely limited. High compression ratios are obtained by exploiting strong dependencies in the data, so a single bit error during transmission can cause the decoding of an entire frame to fail. Therefore, for key frames, this paper combines the WZ system with MJPEG coding to form a high-efficiency, low-correlation video error-resilience method. To keep data correlation low, each frame is compressed independently with the JPEG algorithm, which keeps packet loss within a stable range and suits both frame-by-frame video processing and the special nature of marine transmission. Most studies improve RD performance by improving the way side information is generated; this paper combines MCTI with macroblock-based parity information transmission to generate higher-quality side information. Gray mapping maps small Euclidean distances to small Hamming distances, which increases the correlation between the original information and the side information and allows underwater video sequences with dramatic scene changes to be handled efficiently.

2.2. Providing Error Recovery Capability and Improving Video Quality

Current research on improving video quality focuses on deep-learning-based superresolution image reconstruction [33–39]. These methods rely on large amounts of data, whereas underwater video transmission has narrow bandwidth and high delay and is prone to transmission errors caused by external interference, so they are not suitable for processing large numbers of video sequences in this setting. For small-sample data, Luo et al. [40] propose a recursively adaptive perceptual nonlocal means preprocessing algorithm based on a just noticeable distortion model. By reducing perceptual redundancy, it avoids discontinuous artifacts in the reconstructed video and effectively improves the perceptual quality of reconstructed frames. However, when data are lost during underwater transmission, this method cannot improve video quality from the remaining data. Liu et al. [41] propose a hierarchical motion estimation and compensation network for video compression, in which frames are marked as intraframes or interframes: intraframes are compressed independently, and interframes are predicted hierarchically from adjacent frames using a bidirectional motion prediction network. However, this produces highly sparse and compressible residues, thereby reducing intra- and interframe correlation.

To address the shortcomings of the above research, this paper proposes a rate-adaptive, motion-compensated quality enhancement technique. It exploits the temporal correlation between frames and improves the quality of intracoded blocks by a weighted average of reference frame blocks and intraframe blocks.

3. Description of the Proposed Coding Solution

The UnderWater-WZ video coding system is an improvement on the Wyner-Ziv system framework. Its side information generation method builds on the motion-compensated temporal interpolation (MCTI) adopted in the WZ system. The proposed framework still encodes and decodes video bit plane by bit plane and retains the feedback channel. In addition, to overcome the inability of the WZ system to handle nonlinearly moving targets effectively, we improve the key frame coding method, the side information generation method, and the discrete cosine transform process.

3.1. Wyner-Ziv Video Coding System

In recent years, distributed video coding has been an active research direction in video coding. Compared with traditional video coding, it offers a simple encoder and high robustness to bit errors, which meets the needs of new video services such as underwater robot operation, UAV aerial photography, and wireless monitoring. The most typical distributed video coding systems are the Wyner-Ziv video coding system proposed by Stanford University [42–44] and the PRISM system proposed by the University of California, Berkeley [45]. Most research builds on Stanford's WZ video coding framework; a representative scheme is DISCOVER [46], proposed in 2007 by a research group funded by the European Union IST FP6 program, which is one of the best-performing schemes of this kind.

The transform-domain Wyner-Ziv video coding system integrates a variety of effective algorithms and techniques. Its framework is shown in Figure 1.

The Wyner-Ziv video coding system adopts a frame structure combining intracoding and intercoding. Depending on whether a discrete cosine transform (DCT) is used, it can be divided into pixel-domain and transform-domain WZ video coding systems. Because the DCT exploits spatial correlation, the transform-domain WZ system achieves better rate-distortion (RD) performance; its framework is shown in Figure 1.

At the encoder, the input video sequence is split into Wyner-Ziv (WZ) frames and key frames according to the parity (odd or even) of the frame index. Key frames are coded with traditional intracoding. Each WZ frame undergoes a block-based discrete cosine transform, and all coefficients at the same position within the blocks are grouped together to form DCT coefficient bands $X_k$, where $k$ denotes the coefficient position. Each band $X_k$ is then quantized with a $2^{M_k}$-level uniform quantizer, and the bit planes of the quantized band $q_k$ are extracted and sent to the LDPC encoder. Finally, the parity (check) information generated for each bit plane is stored in a buffer and sent when the decoder requests it through the feedback channel to assist decoding.
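As a rough illustration of the encoder steps above, the following Python sketch groups co-located DCT coefficients into bands, applies a uniform quantizer, and extracts bit planes from MSB to LSB. The 4×4 block size, the quantization step, and the function names (`dct2`, `coefficient_bands`, `quantize`, `bit_planes`) are assumptions for illustration only; the clamped non-negative quantizer suits the DC band, and the sign handling a real codec needs for AC bands is omitted.

```python
import math

N = 4  # assumed block size (DISCOVER-style 4x4); the text does not fix it


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of lists."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * s
    return out


def coefficient_bands(frame):
    """Group the co-located DCT coefficients of all blocks into N*N bands X_k."""
    h, w = len(frame), len(frame[0])
    bands = [[] for _ in range(N * N)]
    for by in range(0, h, N):
        for bx in range(0, w, N):
            block = [row[bx:bx + N] for row in frame[by:by + N]]
            coeffs = dct2(block)
            for u in range(N):
                for v in range(N):
                    bands[u * N + v].append(coeffs[u][v])
    return bands


def quantize(band, step, m_bits):
    """Uniform quantizer; indices clamped to [0, 2**m_bits - 1] (hypothetical DC-only range)."""
    top = 2 ** m_bits - 1
    return [min(top, max(0, int(c // step))) for c in band]


def bit_planes(indices, m_bits):
    """Bit plane 0 holds the MSBs of all indices, plane m_bits - 1 the LSBs."""
    return [[(q >> (m_bits - 1 - p)) & 1 for q in indices] for p in range(m_bits)]
```

For a flat 8×8 frame of value 128, every 4×4 block has a DC coefficient of 512 and zero AC coefficients, so the DC band quantized with step 64 and 4 bits gives index 8 (binary 1000) for each block.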

At the decoder, the system first uses the decoded key frames to generate side information (SI) frames through motion-compensated frame interpolation or extrapolation. Next, a Laplacian model is built from the statistics of the residual between the DCT coefficients of the WZ frame and those of the SI frame. The decoder requests check information from the encoder through the feedback channel and decodes each bit plane using it together with the side information coefficient band $Y_k$. The decoded bit planes of each band are then combined into the decoded quantized band $q'_k$, from which, with the help of the SI coefficients, the reconstructed coefficient band $X'_k$ is obtained. An inverse discrete cosine transform (IDCT) applied to the bands $X'_k$ yields the decoded WZ frame. Finally, the system reassembles the decoded frames to obtain the decoded video sequence.
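The following Python sketch illustrates two decoder-side ingredients mentioned above, under simplifying assumptions: estimating the Laplacian parameter $\alpha = \sqrt{2/\sigma^2}$ from a set of WZ-minus-SI residuals (the scheme estimates it online per coefficient band, which is not modeled here), and a simple clamping reconstruction that keeps the SI value when it falls inside the decoded quantization bin and otherwise moves it to the nearest bin boundary. The clamping rule is one common choice in transform-domain WZ codecs; the text does not state which reconstruction function UnderWater-WZ uses, and the function names are hypothetical.

```python
import math


def laplacian_alpha(residuals):
    """Estimate alpha = sqrt(2 / variance) of the WZ-minus-SI residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    var = sum((r - mean) ** 2 for r in residuals) / n
    return math.sqrt(2.0 / var)


def laplacian_pdf(r, alpha):
    """Laplacian correlation-noise density (alpha / 2) * exp(-alpha * |r|)."""
    return 0.5 * alpha * math.exp(-alpha * abs(r))


def reconstruct(y, q, step):
    """Clamp the SI coefficient y into the decoded bin [q * step, (q + 1) * step]."""
    low, high = q * step, (q + 1) * step
    return min(max(y, low), high)
```

With a decoded index of 2 and step 8 (bin 16 to 24), an SI value of 20 is kept, while SI values of 10 and 30 are clamped to 16 and 24, respectively.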

3.2. Side Information (SI) Generation Algorithm Based on MCTI

The quality of the SI largely determines the RD performance of the Wyner-Ziv system. Higher-quality SI is more strongly correlated with the original WZ frame, so the encoder needs to transmit fewer check bits for the decoder to reach the target decoding quality, and the overall RD performance of the system improves.

At present, the typical side information generation method in the Wyner-Ziv coding framework is motion-compensated temporal interpolation (MCTI). MCTI uses motion estimation to find the motion trajectory of the interpolated frame from the temporal and spatial information of the previously and subsequently decoded reference frames, smooths the motion vectors with a weighted vector median filter to enhance their continuity in space and time, and then generates the SI by motion compensation. Other methods include iterative updating of the side information and sending check information from the encoder.

3.2.1. Algorithm Framework for Generating SI Based on MCTI

MCTI used in the Wyner-Ziv coding system is based on the scheme proposed in [47]. The framework for generating side information using MCTI is shown in Figure 2.

The scheme mainly includes frame interpolation structure definition module, forward motion estimation module, bidirectional motion estimation module, spatial motion smoothing module, and bidirectional motion compensation module.

3.2.2. Frame Interpolation Algorithm

MCTI takes the decoded reference frames $X_B$ (preceding) and $X_F$ (following) at the maximum interframe distance and obtains the side information by interpolating at the midpoint of the time interval between them.

3.2.3. Forward Motion Estimation

MCTI takes the preceding decoded key frame $X_B$ as the reference frame and obtains the motion vector of each macroblock of the following key frame $X_F$ by block matching between the two key frames. The main parameters are the search window size, the search range, and the search step. The structure is shown in Figure 3, which depicts the resulting motion vectors.
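The forward motion estimation step can be sketched as full-search block matching with a sum-of-absolute-differences (SAD) criterion. The block size, search range, integer-pel step, preference for the zero vector on ties, and the function names are assumptions for this illustration; DISCOVER-style implementations also low-pass filter the key frames before matching, which is omitted here.

```python
def sad(ref, cur, ry, rx, cy, cx, n):
    """Sum of absolute differences between an n x n block of ref and one of cur."""
    return sum(abs(ref[ry + i][rx + j] - cur[cy + i][cx + j])
               for i in range(n) for j in range(n))


def forward_motion_estimation(x_b, x_f, block=8, search=4):
    """For each block of x_f, find the (dy, dx) offset into x_b with minimal SAD.

    The returned vector points from the block position in X_F to its best match
    in X_B. Candidates leaving the frame are skipped; the zero vector wins ties.
    """
    h, w = len(x_f), len(x_f[0])
    mvs = {}
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            best_mv = (0, 0)
            best_cost = sad(x_b, x_f, by, bx, by, bx, block)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    ry, rx = by + dy, bx + dx
                    if ry < 0 or rx < 0 or ry + block > h or rx + block > w:
                        continue
                    cost = sad(x_b, x_f, ry, rx, by, bx, block)
                    if cost < best_cost:
                        best_cost, best_mv = cost, (dy, dx)
            mvs[(by, bx)] = best_mv
    return mvs
```

If a bright square moves one row down and two columns right from $X_B$ to $X_F$, the block of $X_F$ containing it receives the vector (-1, -2), pointing back to where the square sits in $X_B$.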

3.2.4. Bidirectional Motion Estimation

Forward motion estimation leaves overlapping and uncovered regions in the side information frame. Bidirectional motion estimation refines the motion vectors, as shown in Figure 4. The interpolated frame is divided into nonoverlapping blocks, and all forward motion vectors passing through a block are taken as its candidate motion vectors; the candidate passing closest to the center of the interpolated block is selected as that block's motion vector. To improve the quality of the interpolated frame and keep the motion linear, each block of the interpolated frame is taken as the midpoint of its motion vector.

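Let the forward motion vector, describing the displacement of a block from $X_B$ to $X_F$, be $(v_x, v_y)$. Assuming linear motion and equal temporal distances from the interpolated frame to the reference frames $X_B$ and $X_F$, the bidirectional motion vector of the corresponding block in the interpolated frame is $(d_x, d_y) = (v_x/2,\ v_y/2)$, which points to $X_B$ with $-(d_x, d_y)$ and to $X_F$ with $+(d_x, d_y)$.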
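One way to turn forward vectors into the symmetric bidirectional vectors described above is sketched below: for each block of the interpolated frame, pick the forward vector whose trajectory crosses the interpolated frame closest to the block center, then halve it. This is a simplified, assumed version of the selection step only; the refinement search around the selected vector is omitted, and the function name and the (row, column) sign convention (sample $X_B$ at $\mathbf{p} - \mathbf{d}$ and $X_F$ at $\mathbf{p} + \mathbf{d}$) are choices made for this illustration.

```python
import math


def select_bidirectional_mvs(forward_mvs, block, height, width):
    """Pick, per interpolated block, the forward vector crossing closest to its center.

    forward_mvs maps the top-left corner (by, bx) of each X_F block to (dy, dx),
    the offset from that block to its match in X_B. The returned vector d is
    chosen so that the interpolated pixel p is predicted from X_B(p - d) and
    X_F(p + d), i.e. d = -(dy, dx) / 2.
    """
    result = {}
    for iy in range(0, height - block + 1, block):
        for ix in range(0, width - block + 1, block):
            cy, cx = iy + block / 2, ix + block / 2
            best, best_dist = None, math.inf
            for (by, bx), (dy, dx) in forward_mvs.items():
                # Point where this trajectory crosses the temporal midpoint.
                my = by + block / 2 + dy / 2
                mx = bx + block / 2 + dx / 2
                dist = math.hypot(my - cy, mx - cx)
                if dist < best_dist:
                    best_dist, best = dist, (dy, dx)
            dy, dx = best
            result[(iy, ix)] = (-dy / 2, -dx / 2)
    return result
```

Half-pel results such as (0.5, 1.0) would require subpixel interpolation during compensation.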

3.2.5. Spatial Motion Smoothing

The motion vectors obtained by bidirectional motion estimation can have low spatial coherence, which causes errors in the interpolated frame; a spatial smoothing algorithm reduces these errors. A weighted motion vector median filter (WMVMF) is used to limit the influence of erroneous vectors on the quality of the interpolated frame. WMVMF preserves the spatial coherence of the motion field by choosing among the candidate motion vectors of neighboring blocks, as shown in Figure 5. The motion vectors of the 8 macroblocks surrounding the block being filtered are used in the calculation.

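The filtered motion vector is

$$\mathrm{MV}_{\mathrm{wmf}} = \arg\min_{\mathrm{MV}_j \in \{\mathrm{MV}_0, \ldots, \mathrm{MV}_8\}} \sum_{i=0}^{8} w_i \left\lVert \mathrm{MV}_j - \mathrm{MV}_i \right\rVert,$$

where $\mathrm{MV}_0$ represents the motion vector of the block to be filtered, $\mathrm{MV}_1$ to $\mathrm{MV}_8$ represent the motion vectors of the eight macroblocks adjacent to it, and $w_i$ are the filter weights.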
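A direct Python rendering of the weighted vector median filter is shown below. The weights are passed in by the caller: in DISCOVER-style implementations they are typically derived from the block matching error of each candidate, but the weighting used by UnderWater-WZ is not specified here, so any concrete weights are assumptions.

```python
import math


def weighted_vector_median(vectors, weights):
    """Return the candidate minimizing sum_i w_i * ||candidate - v_i||.

    vectors[0] is the vector of the block being filtered and vectors[1:] are
    its eight neighbors; weights[i] is the (caller-supplied) weight of vectors[i].
    """
    best, best_cost = None, math.inf
    for cand in vectors:
        cost = sum(w * math.dist(cand, v) for v, w in zip(vectors, weights))
        if cost < best_cost:
            best, best_cost = cand, cost
    return best
```

With equal weights, an outlier such as (10, 10) surrounded by eight neighbors at (1, 0) is replaced by (1, 0); a very large weight on the center vector keeps it unchanged.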

3.2.6. Bidirectional Motion Compensation

MCTI fills the interpolated frame by bidirectional motion compensation. Assuming that the temporal distances from the interpolated frame to the preceding and following key frames are equal, both reference frames receive the same weight. Let the motion vector of an interpolated block be $(d_x, d_y)$. Motion compensation is then performed as

$$Y(x, y) = \frac{1}{2}\left[X_B(x - d_x,\ y - d_y) + X_F(x + d_x,\ y + d_y)\right],$$

where $Y(x, y)$ is the motion-compensated value at $(x, y)$, and $X_B$ and $X_F$ are the decoded preceding and following reference frames.
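The bidirectional compensation formula can be sketched in Python as follows, assuming integer-pel vectors stored as (row, column) offsets and clamping at frame borders; both are simplifications, and the half-pel interpolation that halved vectors often require is left out. The function name is hypothetical.

```python
def bidirectional_mc(x_b, x_f, mvs, block):
    """Average X_B(p - d) and X_F(p + d) for every pixel of each interpolated block.

    mvs maps the top-left corner (row, col) of each block to an integer (dy, dx).
    Coordinates falling outside the frame are clamped to the border (assumption).
    """
    h, w = len(x_b), len(x_b[0])

    def clamp_row(r):
        return min(max(r, 0), h - 1)

    def clamp_col(c):
        return min(max(c, 0), w - 1)

    y = [[0.0] * w for _ in range(h)]
    for (by, bx), (dy, dx) in mvs.items():
        for i in range(block):
            for j in range(block):
                r, c = by + i, bx + j
                back = x_b[clamp_row(r - dy)][clamp_col(c - dx)]
                fwd = x_f[clamp_row(r + dy)][clamp_col(c + dx)]
                y[r][c] = 0.5 * (back + fwd)  # equal weights for both references
    return y
```

Equal weights follow from the equal temporal distances assumed in the text; unequal distances would call for weights proportional to temporal proximity.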

3.3. Video Error-Resilience Scheme of Video Transmission under Underwater Acoustic Channel

The proposed UnderWater-WZ framework for the underwater acoustic channel is shown in Figure 6. It is based on the transform-domain Wyner-Ziv framework of Stanford University. Unlike that framework, UnderWater-WZ includes a Gray mapping module, an LDPCA codec module, a side information quality enhancement module, and an optimized DCT intraframe quantization matrix. The encoding and decoding process is as follows:
(1) Encoder side: the encoder first splits the video sequence into key frames and WZ frames according to frame-index parity. Key frames are converted from BMP format into the YUV format required for encoding and transmission and are then intracoded with the MJPEG algorithm. WZ frames undergo the same format conversion, followed by the DCT and processing with the DCT intraframe quantization matrix. For a WZ frame between two adjacent reference frames ($X_B$ and $X_F$), the DCT is applied to each macroblock and co-located coefficients are grouped to form coefficient bands $X_k$, each of which is quantized with $2^{M_k}$ levels. The quantization intervals of the DCT coefficients are then mapped to binary codewords by Gray mapping, and the bit planes are extracted. The bit planes are sent to the LDPCA encoder in order from MSB to LSB, and the resulting check information is first stored in a buffer. According to the decoder's requests, received over the feedback channel, the encoder transmits part of the check information over the packet-loss channel, which simulates the marine environment, and the data finally enter the decoder.
(2) Decoder side: since key frames are MJPEG coded and bilinear interpolation is a typical spatial error concealment algorithm, bilinear interpolation is used to conceal errors in corrupted key frames. From the decoded reference frames $X_B$ and $X_F$, MCTI generates an estimate of the original WZ frame. Combining the MCTI estimate, the intradecoded blocks, and the motion-compensated quality enhancement module significantly improves side information quality. The correlation noise between the WZ frame and the side information is modeled with a Laplacian distribution whose parameters are estimated online at the coefficient level, and the side information is converted into soft input information. The improved side information is then fed to the LDPCA decoder; after Gray demapping, bitstream correction, and interpolation-based recovery, the data enter the reconstruction module. When all DCT coefficient bands have been reconstructed, the IDCT yields the decoded WZ frame, which is merged with the decoded key frames to obtain the video sequence processed by the proposed error-resilience scheme.
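The bilinear spatial error concealment applied to damaged key frames can be approximated as below. The exact interpolation rule is not specified in the text; this sketch assumes that each lost pixel is the mean of a vertical and a horizontal linear interpolation between the nearest correctly received boundary pixels, and that the lost block does not touch the frame border. The function name is hypothetical.

```python
def conceal_block(frame, top, left, n):
    """Fill the lost n x n block at (top, left) from the four surrounding pixel lines.

    Each lost pixel is the mean of a vertical linear interpolation (between the
    rows just above and below the block) and a horizontal one (between the
    columns just left and right of it). Assumes the block is not on the border.
    """
    h, w = len(frame), len(frame[0])
    if top < 1 or left < 1 or top + n >= h or left + n >= w:
        raise ValueError("block must be surrounded by received pixels")
    out = [row[:] for row in frame]
    for i in range(n):
        for j in range(n):
            above = frame[top - 1][left + j]
            below = frame[top + n][left + j]
            left_px = frame[top + i][left - 1]
            right_px = frame[top + i][left + n]
            vert = ((n - i) * above + (i + 1) * below) / (n + 1)
            horiz = ((n - j) * left_px + (j + 1) * right_px) / (n + 1)
            out[top + i][left + j] = 0.5 * (vert + horiz)
    return out
```

On a smooth vertical gradient the concealed block matches the original exactly; on edges and textures this kind of interpolation blurs detail, which is why it only masks rather than repairs errors.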

3.4. Optimization of WZ Frame Encoding and Decoding Method

In the Wyner-Ziv system, key frames are usually compressed with the H.264/AVC coding standard. H.264/AVC removes interframe redundancy and performs both intra- and interframe compression, so it provides a better compression ratio.

In general video compression algorithms, compression relies on the correlation of the data: the higher the compression ratio, the stronger the dependency between video data. However, wireless underwater acoustic communication in the marine environment suffers from narrow bandwidth, high delay, and error-prone transmission. Strongly correlated data are therefore ill-suited to this channel, because the loss of a small number of packets can corrupt a much larger part of the stream. For example, with H.264/AVC compression over an underwater acoustic channel, a 1-bit transmission error can lead to the loss of one second of video sequence data.

The UnderWater-WZ error-resilience scheme mainly targets bit error rate control and image restoration. To control the bit error rate, we combine the Motion JPEG (MJPEG) compression algorithm with the Wyner-Ziv system and design a video error-resilience system based on low-correlation coding. MJPEG was originally developed for multimedia PC applications and is now used in video capture devices such as digital cameras, IP cameras, webcams, and nonlinear video editing systems. MJPEG treats a video as a sequence of still images and compresses each frame with the JPEG algorithm. Because it ignores interframe correlation and uses no interprediction, each frame is compressed independently, which suits frame-by-frame processing of video sequences.

Although the compression efficiency of MJPEG is limited, it keeps packet loss within a stable range and adapts to the special nature of the marine environment. Moreover, in the harsh marine environment, H.264/AVC places higher demands on hardware, whereas MJPEG has low hardware requirements.
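The difference in error propagation discussed in this section can be illustrated with a toy model. The sketch assumes an IPPP structure with an intra frame every `gop` frames for predictive coding, and treats any loss as corrupting the whole frame and every frame predicted from it; concealment and partial decoding are ignored, and the function name is hypothetical.

```python
def corrupted_frames(num_frames, lost, gop=None):
    """Indices of frames that cannot be decoded correctly after frame `lost` is lost.

    gop=None models intra-only coding (MJPEG): only the lost frame is affected.
    Otherwise an IPPP... structure with an intra frame every `gop` frames is
    assumed, and the error propagates until the next intra frame.
    """
    if gop is None:
        return [lost]
    next_intra = (lost // gop + 1) * gop
    return list(range(lost, min(next_intra, num_frames)))
```

With an assumed intra period of 15 frames at 15 frames/s, losing frame 3 corrupts frames 3 to 14, most of one second, which matches the spirit of the H.264/AVC example above; with intra-only MJPEG only frame 3 is lost.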

3.5. Improved Generation Method Based on Macroblock Side Information

MCTI is suitable only for simple linear motion of the target. In the marine environment, violent collisions and obvious lighting changes are common, so the original scheme cannot be applied directly.

Most WZ video coding frameworks use MCTI to generate side information. To adapt to video sequences of complex marine scenes, the UnderWater-WZ system improves the SI generation method of the WZ system: we combine MCTI with additional information transmitted by the encoder for the coded segments, which copes with complex environments and produces higher-quality side information.

Building on the DISCOVER codec, the framework proposed in this section includes Gray mapping, a rate-adaptive LDPCA codec, and a side information quality enhancement module.

3.5.1. Gray Mapping

Gray mapping is widely used in digital communication. It maps small Euclidean distances to small Hamming distances, which can significantly improve the bit-level correlation between the original information and the side information and makes the encoded data less susceptible to complex environments.

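WZ codecs usually describe quantization symbols with natural binary codes. For adjacent quantization indices, natural binary codewords can differ in several bits, whereas Gray codewords always differ in exactly one bit; for example, indices 3 and 4 are 011 and 100 in natural binary (Hamming distance 3) but 010 and 110 in Gray code (Hamming distance 1). Table 1 shows the correspondence between the two codes for coefficients near zero.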

The quantized DCT coefficient $q$ is written in natural binary form as

$$q = \sum_{i=1}^{M_k} b_i \, 2^{M_k - i},$$

where $M_k$ is the number of bits required for the DCT coefficient band $X_k$, and $b_1$ and $b_{M_k}$ are the most significant and least significant bits, respectively.

This binary code is then converted into a Gray code by

$$g_1 = b_1, \qquad g_i = b_i \oplus b_{i-1}, \quad i = 2, \ldots, M_k,$$

where $\oplus$ denotes the XOR operation. XOR has low complexity, so the added complexity is negligible.

At the decoder, after the bit planes of the DCT coefficient band have been decoded, the natural binary code of equation (4) is recovered by inverse Gray mapping:

$$\hat{b}_1 = \hat{g}_1, \qquad \hat{b}_i = \hat{b}_{i-1} \oplus \hat{g}_i, \quad i = 2, \ldots, M_k,$$

where $\hat{g}_1$ is the decoded most significant bit.
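The forward and inverse mappings above translate directly into code. This minimal Python sketch operates on bit lists (MSB first) for a single quantization index; the function names are illustrative, not part of the described system.

```python
def to_bits(q, m):
    """Natural binary code b_1 ... b_m of q, most significant bit first."""
    return [(q >> (m - 1 - i)) & 1 for i in range(m)]


def gray_encode(bits):
    """Gray mapping: g_1 = b_1, g_i = b_i XOR b_(i-1)."""
    return [bits[0]] + [bits[i] ^ bits[i - 1] for i in range(1, len(bits))]


def gray_decode(gray):
    """Inverse Gray mapping: b_1 = g_1, b_i = b_(i-1) XOR g_i."""
    bits = [gray[0]]
    for g in gray[1:]:
        bits.append(bits[-1] ^ g)
    return bits
```

For example, index 3 (bits 011) maps to Gray 010 and index 4 (bits 100) maps to Gray 110, and decoding restores the original bits for every 4-bit index.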

The smaller the Hamming distance between the codewords of the original WZ information and those of the corresponding side information, the lower the bit error rate (BER) between their bit streams. Replacing the natural binary code with Gray mapping can therefore significantly reduce the BER and improve the compression efficiency of distributed source coding (DSC).
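To see why Gray mapping can lower the BER between WZ and SI bit planes, the sketch below counts the bit positions in which hypothetical WZ and SI quantization indices disagree under each code. When the SI is off by one quantization step, as in the toy data used here, natural binary can disagree in several bit planes while Gray code disagrees in exactly one; for indices further apart, Gray mapping offers no such guarantee. The integer form $q \oplus (q \gg 1)$ is equivalent to the bitwise definition above.

```python
def hamming(a, b):
    """Number of differing bits between two non-negative integers."""
    return bin(a ^ b).count("1")


def gray(q):
    """Integer form of the Gray code: q XOR (q >> 1)."""
    return q ^ (q >> 1)


def bitplane_mismatches(wz, si, use_gray):
    """Total number of bit-plane positions where WZ and SI indices differ."""
    code = gray if use_gray else (lambda q: q)
    return sum(hamming(code(x), code(y)) for x, y in zip(wz, si))
```

For WZ indices [3, 4, 7, 8] and SI indices [4, 3, 8, 7], natural binary yields 14 mismatched bits, whereas Gray code yields 4, one per coefficient.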

3.5.2. Rate-Adaptive LDPCA Codec

Since a fixed-rate LDPC syndrome code performs poorly for source compression when the correlation between WZ frames and SI varies, a rate-adaptive LDPC accumulate (LDPCA) codec, composed of an LDPC syndrome encoder followed by an accumulator, is adopted; its block diagram is shown in Figure 7.

First, the syndrome bits $s_i$ output by the LDPC encoder are accumulated modulo 2, giving $a_1 = s_1$ and $a_i = a_{i-1} \oplus s_i$, and the accumulated syndrome is stored in a buffer. At the decoder's request, these check bits are sent to the decoder segment by segment. During decoding, the decoder first takes the modulo-2 difference of the received bits to recover the syndrome stream and then performs iterative LDPC decoding.
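The syndrome computation and the accumulator with its inverse can be sketched as follows. This covers only the full-syndrome case with a toy parity-check matrix (an assumption for illustration); in an actual LDPCA codec the encoder sends subsets of the accumulated syndrome incrementally and the decoder runs belief propagation on a correspondingly modified graph, which is beyond this sketch.

```python
def syndrome(h, x):
    """Syndrome s = H x (mod 2) for a binary parity-check matrix H given as rows."""
    return [sum(hij & xj for hij, xj in zip(row, x)) % 2 for row in h]


def accumulate(s):
    """Encoder accumulator: a_1 = s_1, a_i = a_(i-1) XOR s_i."""
    out, acc = [], 0
    for bit in s:
        acc ^= bit
        out.append(acc)
    return out


def differentiate(a):
    """Decoder-side inverse: s_1 = a_1, s_i = a_i XOR a_(i-1)."""
    return [a[0]] + [a[i] ^ a[i - 1] for i in range(1, len(a))]
```

Accumulating the syndrome [1, 0, 1, 1, 0] gives [1, 1, 0, 1, 1], and differentiating recovers the original syndrome.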

3.5.3. Quality Enhancement Module for Side Information

When there is severe shaking or obvious illumination change in the deep-sea environment, the quality of the side information deteriorates accordingly, and the MCTI scheme may fail. We therefore propose a motion-compensation-based quality enhancement module for the UnderWater-WZ system. Its basic idea is to improve the quality of the intracoded blocks and thereby the quality of the SI. The framework, shown in Figure 8, consists of three parts: two block estimation stages and the generation of the side information.

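(1) Block Estimation. Blocks are estimated by exploiting the temporal correlation between frames of the video sequence. For an intracoded block at position $\mathbf{p}$, let the symmetric motion vector pair obtained by bidirectional motion estimation be $(-\mathbf{v}, \mathbf{v})$. Under the classical linear motion model, the original frame $X$ is linked to the blocks along the motion trajectory in the preceding and following reference frames $X_B$ and $X_F$ by

$$X(\mathbf{p}) = X_B(\mathbf{p} - \mathbf{v}) + n_B(\mathbf{p}), \qquad X(\mathbf{p}) = X_F(\mathbf{p} + \mathbf{v}) + n_F(\mathbf{p}),$$

where $n_B$ and $n_F$ both follow a zero-mean normal distribution. The decoded reference frames $\hat{X}_B$ and $\hat{X}_F$ contain quantization noise:

$$\hat{X}_B(\mathbf{p}) = X_B(\mathbf{p}) + e_B(\mathbf{p}), \qquad \hat{X}_F(\mathbf{p}) = X_F(\mathbf{p}) + e_F(\mathbf{p}),$$

where $e_B(\mathbf{p})$ and $e_F(\mathbf{p})$ represent the quantization distortion at $\mathbf{p}$.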

For a uniform quantizer with step size $\Delta$, the quantization distortion is approximately uniformly distributed over $[-\Delta/2, \Delta/2]$, with variance $\Delta^2/12$. Accordingly, the quality of an intracoded macroblock quantized with step $\Delta_I$ is estimated by the variance of its quantization distortion:

$$\sigma_I^2 = \frac{\Delta_I^2}{12}.$$
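The $\Delta^2/12$ model can be checked numerically. The sketch below quantizes uniformly distributed samples with a midpoint-reconstruction uniform quantizer and returns the empirical mean squared error for comparison with $\Delta^2/12$. The input distribution, range, and sample count are arbitrary choices for illustration; real DCT coefficients are not uniformly distributed, which is why the model is only approximate in practice.

```python
import random


def quantization_mse(step, n=100_000, seed=0):
    """Empirical MSE of a uniform quantizer with midpoint reconstruction."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.uniform(0.0, 1000.0)
        x_hat = (int(x // step) + 0.5) * step  # center of the quantization bin
        total += (x - x_hat) ** 2
    return total / n
```

For a step of 10, the result is close to $100/12 \approx 8.33$.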

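From the vector obtained by bidirectional motion estimation and the decoded reference frames, the motion-compensated versions $X_B^{\mathrm{mc}}$ and $X_F^{\mathrm{mc}}$ of the intracoded block are obtained, as shown in formulas (13) and (14):

$$X_B^{\mathrm{mc}}(\mathbf{p}) = \hat{X}_B(\mathbf{p} - \mathbf{v}) = X(\mathbf{p}) + r_B(\mathbf{p}), \quad (13)$$

$$X_F^{\mathrm{mc}}(\mathbf{p}) = \hat{X}_F(\mathbf{p} + \mathbf{v}) = X(\mathbf{p}) + r_F(\mathbf{p}), \quad (14)$$

where $\mathbf{v}$ is the bidirectional motion vector, $X(\mathbf{p})$ is the original frame at position $\mathbf{p}$, $\hat{X}_B$ and $\hat{X}_F$ are the decoded preceding and following reference frames, and $r_B$ and $r_F$ are the residuals of the motion estimation process.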

By combining the motion compensation of the reference frames with the corresponding intraframe decoding block, the corresponding block can be obtained, as follows:

The mean square error between and blocks is smaller than that between and blocks. Therefore, when the mean error is zero, the block should satisfy the following: where is the macroblock size and is the quantization step size of the intraframe macroblock . Then, the error variance of block is as follows:

When a block satisfies condition (16), that is, when the sequence correlation is low or the intraframe decoding quality is low, motion compensation is performed and the resulting macroblock is used as the estimate. The remaining blocks proceed to the second block estimation stage.

(2) Block Estimation. For macroblocks that do not satisfy condition (16), the corresponding MCTI estimation block and the intraframe-decoded block are combined by a weighted average. The weight of the intraframe-decoded block grows with its quality: the higher its decoding quality, the larger its contribution to the estimated block. The quality of the intraframe-decoded block is evaluated first.

The variance of the matching error between the preceding and following reference frames is then calculated.

Finally, the weight is calculated as follows:

In this expression, the remaining symbol denotes the quality of the estimated block.
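Because the exact weight formula did not survive, the sketch below substitutes a standard technique: inverse-variance weighting of two independent, unbiased estimates. This is our assumption, not the paper's formula. The intraframe block's error variance is taken from the uniform quantization model (step squared over 12), and the MCTI block's error variance is estimated as one quarter of the variance of the matching error between the two motion-compensated blocks, which holds if their errors are independent with equal variance. The names `intra_weight`, `mcti_error_variance`, and `fuse_block` are hypothetical.

```python
from typing import List

Block = List[List[float]]


def variance(values: List[float]) -> float:
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def mcti_error_variance(prev_mc: Block, next_mc: Block) -> float:
    """Assumed estimate: if the two motion-compensated blocks carry independent
    errors of equal variance, their average has variance var(P - N) / 4."""
    diffs = [p - n for rp, rn in zip(prev_mc, next_mc) for p, n in zip(rp, rn)]
    return variance(diffs) / 4.0


def intra_weight(step: float, prev_mc: Block, next_mc: Block) -> float:
    """Inverse-variance weight of the intraframe-decoded block."""
    var_intra = step * step / 12.0  # uniform quantization noise model
    var_mcti = mcti_error_variance(prev_mc, next_mc)
    total = var_intra + var_mcti
    return 0.5 if total == 0.0 else var_mcti / total


def fuse_block(intra: Block, prev_mc: Block, next_mc: Block, step: float) -> Block:
    """Weighted average of the intraframe block and the MCTI block (mean of P and N)."""
    w = intra_weight(step, prev_mc, next_mc)
    return [[w * b + (1.0 - w) * (p + n) / 2.0
             for b, p, n in zip(ri, rp, rn)]
            for ri, rp, rn in zip(intra, prev_mc, next_mc)]
```

Under these assumptions the weight rises toward 1 as the intraframe quantization step shrinks, which matches the qualitative rule that a higher-quality intraframe block receives a larger weight.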

(3) Side Information Generation. To generate the side information, only the corresponding macroblock in the reference frame needs to be used as the data block of the side information frame. Generating SI in this way reduces the number of parity bits that must be transmitted to the decoder, thereby improving the rate-distortion (RD) performance. It also reduces the amount of data the LDPCA encoder must process, which benefits decoding performance.

In summary, motion-compensated quality enhancement selects a suitable candidate block from the preceding and following reference frames and combines it with the low-quality intraframe block to obtain a higher-quality macroblock, thereby generating high-quality side information.

3.6. DCT Intraframe Quantization Matrix Design

Because marine videos are affected by many interference factors such as marine life, lighting, and water flow, their scenes change frequently and abruptly. To reduce visual sensitivity to these scene changes and thus obtain better visual quality and compression, we design a DCT-based intraframe quantization matrix, following the spatial frequency calculation method of Bossen et al. [48] and the weighted quantization matrix of Shang et al. [49]. A sensitivity function weights the coefficients of different frequencies, which attenuates the perceived effect of scene changes. This function gives the sensitivity of the coefficient at each position and is computed from four constants and the spatial frequency corresponding to that position.

After weighting, the coefficients of different frequencies have equal sensitivity to scene changes, so all weighted coefficients are quantized with the same uniform step size. In this expression, each transform coefficient is weighted, the same uniform quantization step size is applied to every weighted coefficient, and the result is the quantized coefficient.

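This process is equivalent to quantizing each unweighted coefficient with its own step size, namely the uniform step size divided by the sensitivity weight at that position, because dividing the weighted coefficient by the uniform step gives the same value as dividing the original coefficient by this adjusted step. The DCT intraframe quantization matrix is therefore formed from these position-dependent step sizes.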
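A sketch of how such a matrix can be built is shown below. It is not the paper's exact design: the constants and the spatial-frequency expression were lost, so we assume the Mannos-Sakrison contrast sensitivity function H(f) = 2.6(0.0192 + 0.114f)exp(-(0.114f)^1.1), a hypothetical viewing condition of 32 pixels per degree, and weights clamped to 1 below the CSF peak so that low frequencies are not coarsely quantized. Each entry is the uniform step divided by the normalized sensitivity, which is the equivalence described in the text. The function names are illustrative.

```python
import math
from typing import List


def mannos_sakrison_csf(f: float) -> float:
    """Mannos-Sakrison contrast sensitivity, f in cycles/degree (assumed CSF)."""
    return 2.6 * (0.0192 + 0.114 * f) * math.exp(-((0.114 * f) ** 1.1))


def dct_frequency(u: int, v: int, n: int, pixels_per_degree: float) -> float:
    """Radial spatial frequency (cycles/degree) of DCT basis (u, v) in an n x n block."""
    return pixels_per_degree * math.hypot(u, v) / (2.0 * n)


def csf_quant_matrix(n: int, base_step: float,
                     pixels_per_degree: float = 32.0) -> List[List[float]]:
    # Locate the CSF peak numerically; frequencies at or below it get weight 1.
    grid = [i / 100.0 for i in range(1, 6001)]
    f_peak = max(grid, key=mannos_sakrison_csf)
    h_peak = mannos_sakrison_csf(f_peak)
    matrix = []
    for u in range(n):
        row = []
        for v in range(n):
            f = dct_frequency(u, v, n, pixels_per_degree)
            weight = 1.0 if f <= f_peak else mannos_sakrison_csf(f) / h_peak
            # Quantizing weight * C with base_step equals quantizing C with base_step / weight.
            row.append(base_step / weight)
        matrix.append(row)
    return matrix
```

With an 8 x 8 block and a base step of 16, the DC entry stays at 16 while the highest-frequency entries grow to roughly twice that, so perceptually less sensitive coefficients are quantized more coarsely.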

4. Experiments

To verify UnderWater-WZ under realistic conditions, we build a simulator that reproduces real underwater video transmission. It consists of three modules: an underwater acoustic channel simulator, a packet loss channel simulator, and a video encoder/decoder. Within this framework, video packet loss in the deep sea can be simulated at a configurable packet loss rate, taking into account the effects of network interference and bandwidth limits on video transmission.

4.1. Simulation of Marine Video Transmission Environment

(1) Video frames are transmitted over Wi-Fi. The input end encodes and sends video frames, and the output end decodes and saves them.

(2) To simulate a packet loss channel, the experiment uses UDP, which is connectionless and does not retransmit lost packets. Before sending each video frame, the input end divides it into multiple chunks, some of which are randomly discarded during Wi-Fi transmission. When the output end receives a frame with missing chunks, the frame enters the video error-resilience system designed in this paper, which restores it to form a new video sequence.

(3) To evaluate video latency and quality, one video encoder node is placed underwater and another node above water. To simulate the underwater video transmission environment, the experiment uses a bandwidth of 200 K, the 802.11n protocol, and the MCS7 rate.
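The chunk-dropping step in item (2) can be sketched as follows. This is a minimal sketch assuming independent (Bernoulli) chunk losses at a fixed packet loss rate; the chunk size and function names are hypothetical. Lost chunks are marked with `None` so that a downstream error-resilience stage could locate them.

```python
import random
from typing import List, Optional


def split_into_chunks(frame: bytes, chunk_size: int) -> List[bytes]:
    """Divide an encoded frame into fixed-size chunks (the last one may be shorter)."""
    return [frame[i:i + chunk_size] for i in range(0, len(frame), chunk_size)]


def lossy_channel(chunks: List[bytes], loss_rate: float,
                  rng: random.Random) -> List[Optional[bytes]]:
    """Drop each chunk independently with probability loss_rate; no retransmission."""
    return [None if rng.random() < loss_rate else c for c in chunks]


def received_fraction(received: List[Optional[bytes]]) -> float:
    """Share of chunks that survived the channel."""
    return sum(c is not None for c in received) / len(received)


rng = random.Random(42)
chunks = split_into_chunks(bytes(range(256)) * 400, 64)
print(received_fraction(lossy_channel(chunks, 0.2, rng)))
```

Independent losses are the simplest model; bursty losses, which are common on real acoustic links, would need a state-based model such as a two-state Markov chain instead.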

4.2. Experimental Parameter Settings

We evaluate the proposed UnderWater-WZ error-resilient coding system on the Brackish Dataset [50]. The simulation uses five video sequences from the dataset with different motion characteristics: Crab, Fish-big, Jellyfish, Fish-school, and Fish-small-shrimp. Sample frames from the dataset are shown in Figure 9.

In the experiment, the first 100 frames were tested under different bit error rate conditions. The video size is CIF (352 × 288), the Y : U : V format is 4 : 0 : 0 (only the luminance component is tested), and the frame rate is 15 fps.

In UnderWater-WZ, key frames are coded with conventional MJPEG intraframe coding, and WZ frames are coded with LDPCA. In the proposed coding scheme, the block size is , and the quantization matrix is 2. Different quantization parameters (QP) correspond to different quantization step sizes; Table 2 lists the step size used for each QP, chosen through extensive simulation experiments to give good rate-distortion performance.

The following subsections compare the experimental results in detail to verify the performance and reconstructed video quality of the proposed scheme over the wireless channel.

4.3. Evaluation Settings

To evaluate the quality of the received, reconstructed video, we use two metrics: average peak signal-to-noise ratio (PSNR) and structural similarity (SSIM).

MSE is the mean squared error over all pixels between the decoded video frame and the original frame:

$$\mathrm{MSE} = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\bigl(I(i,j) - K(i,j)\bigr)^2,$$

where $I(i,j)$ and $K(i,j)$ are the pixel values of the original and decoded images at position $(i,j)$, respectively, and $M \times N$ is the image size.

We also compute the average PSNR between the received reconstructed video and the original video. As shown in [51], under channel errors or data loss the average PSNR is calculated as

$$\mathrm{PSNR} = 10\log_{10}\frac{(2^{n}-1)^{2}}{\overline{\mathrm{MSE}}},$$

where $n$ is the number of bits used to encode pixel luminance (8 bits in the standard case) and $\overline{\mathrm{MSE}}$ is the average of the frame-by-frame MSE values.
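The frame-averaged PSNR can be computed as in the sketch below, which assumes frames are 2D lists of 8-bit luma values and uses illustrative function names. It averages the per-frame MSE values before converting to decibels, rather than averaging per-frame PSNR values; the two generally give different numbers.

```python
import math
from typing import List, Sequence

Frame = Sequence[Sequence[int]]


def frame_mse(original: Frame, decoded: Frame) -> float:
    """Mean squared error over all pixels of one M x N frame."""
    m, n = len(original), len(original[0])
    total = sum((original[i][j] - decoded[i][j]) ** 2
                for i in range(m) for j in range(n))
    return total / (m * n)


def average_psnr(originals: List[Frame], decoded: List[Frame], bits: int = 8) -> float:
    """PSNR computed from the mean of the frame-by-frame MSE values."""
    mean_mse = sum(frame_mse(o, d) for o, d in zip(originals, decoded)) / len(originals)
    if mean_mse == 0:
        return math.inf  # identical videos
    peak = (1 << bits) - 1
    return 10.0 * math.log10(peak * peak / mean_mse)
```

Averaging MSE first keeps a few badly damaged frames from being masked by many error-free frames, whose individual PSNR would be infinite.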

SSIM evaluates the similarity of two images in terms of luminance, contrast, and structure. For two images $x$ and $y$,

$$\mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)},$$

where $\mu_x$ and $\mu_y$ are the means of $x$ and $y$, $\sigma_x^2$ and $\sigma_y^2$ are their variances, and $\sigma_{xy}$ is their covariance. The stabilizing constants are $C_1 = (K_1 L)^2$ and $C_2 = (K_2 L)^2$, where $L$ is the dynamic range of the pixel values, $K_1 = 0.01$, and $K_2 = 0.03$.
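A simplified SSIM computation is sketched below. It evaluates the formula once over the whole image, flattened into a list of pixels, whereas standard SSIM is computed over local windows and then averaged, so this global version is an approximation for illustration only. The defaults use the common values K1 = 0.01 and K2 = 0.03 with an 8-bit dynamic range; the function name is hypothetical.

```python
from typing import Sequence


def ssim_global(x: Sequence[float], y: Sequence[float], dynamic_range: float = 255.0,
                k1: float = 0.01, k2: float = 0.03) -> float:
    """Single-window SSIM over two equally sized, flattened images."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Identical images give 1, a uniform brightness shift lowers only the luminance term, and structurally inverted content drives the covariance term, and hence SSIM, negative.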

4.4. DCT Intraframe Quantization Matrix

The DCT-based intraframe quantization matrix designed in this paper reduces visual sensitivity to scene changes and improves video quality by weighting coefficients of different frequencies. We test it on the Crab sequence at different quantization step sizes (QP), measure the bit rate and PSNR for each QP, and compare the results with and without the matrix. The results are shown in Table 3.

Compared with coding without the quantization matrix, the proposed matrix further reduces the average bit rate of the test video while improving PSNR. At the four quantization step sizes, the bit rate of the Crab sequence is reduced by about 1.93%, 1.89%, 2.13%, and 1.99%, respectively, with PSNR gains of 1.07 dB, 0.8 dB, 1.16 dB, and 0.69 dB. These experiments show that, without changing the structure of the coding model, a DCT intraframe quantization matrix can further reduce the average bit rate while improving video quality.

4.5. Side Information Quality Comparison

The proposed error-resilience scheme improves the side information generation method: MCTI is combined with the check information transmitted by the encoder to generate high-quality side information. By contrast, conventional video coding systems based on the Wyner-Ziv framework generate side information by MCTI alone.

This experiment uses the first 100 frames of the Crab sequence at three different resolutions. At packet loss rates of 5%, 10%, and 15%, we compare average interpolation (AI), motion-compensated time interpolation (MCTI), and MCTI combined with verification information (MCTI+VI). The resulting side information PSNR (SI PSNR) curves are shown in the figures below.

Figures 10, 11, and 12 show the quality of the side information generated in the first 100 frames under different packet loss rates for the three resolutions, respectively.

The results show that the side information obtained by MCTI and MCTI+VI is clearly better than that obtained by AI, so motion-compensated interpolation is needed to obtain high-quality side information. In addition, side information quality improves as the resolution increases, so raising the resolution of the video sequence is another way to improve side information quality. The proposed scheme combines MCTI with check information to generate better side information: over the first 100 frames, MCTI+VI achieves a maximum SI PSNR gain of about 10 dB compared with AI and about 4 dB compared with MCTI. The lower the packet loss rate, the more pronounced the advantage of MCTI+VI. Finally, increasing the resolution across the tested range raises the SI PSNR by about 3.8 dB.

The side information generation method determines the RD performance of a Wyner-Ziv video coding system: the more closely the side information generated at the decoder resembles the original frame at the encoder, the better the RD performance of the reconstructed video and the more effective the error resilience.

4.6. Coding Complexity

We evaluate the coding complexity of the UnderWater-WZ encoder by its running time on deep-sea video and compare it with the processing time of HEVC and VVC coding. Five test sequences are used, with a fixed resolution and a QP of 22. The coding complexity results are shown in Table 4.

The results show that the running time of VVC coding is nearly five times that of HEVC coding; that is, VVC is considerably more complex than HEVC, mainly because it uses recursive multi-type tree partitioning. UnderWater-WZ does not run as fast as HEVC, but its running time is significantly lower than that of VVC.

These results show that the low-correlation coding method used in UnderWater-WZ effectively reduces complexity compared with VVC without a significant loss in coding efficiency.

4.7. Video Quality

To verify the effectiveness of the proposed error-resilience scheme over the underwater acoustic channel, we set , is 12, and transmit the five video sequences (Crab, Fish-big, Jellyfish, Fish-school, and Fish-small-shrimp) through the simulated marine video communication environment under different packet loss rates (PLRs), then reconstruct the received video with error-resilience processing. Finally, we compare the results of the error concealment algorithm and of the HEVC, VVC, and UnderWater-WZ coding systems. The results are shown in Figure 13.

As the packet loss rate increases, the PSNR of video reconstructed by the error concealment algorithm drops most sharply, indicating that this method yields the worst quality. With the HEVC and VVC coding systems and the proposed UnderWater-WZ system, the reconstructed quality does not drop significantly, and in the simulated deep-sea environment UnderWater-WZ gives the best quality. Even at a packet loss rate as high as 20%, the proposed scheme improves the PSNR of the reconstructed video under bit errors to varying degrees, by up to about 2.6~3.5 dB.

Figure 14 shows the rate-distortion curves of the reconstructed videos for three video sequences at a packet loss rate of 5%, after processing by the error concealment algorithm and by the HEVC, VVC, and UnderWater-WZ coding systems. Here, error concealment uses the typical bilinear interpolation algorithm from spatial-domain error concealment. Compared with error concealment, the other three schemes significantly improve the quality of the reconstructed video, and the improvement becomes more pronounced as the bit rate increases. In the deep-sea environment, UnderWater-WZ improves the PSNR of the video sequences by up to about 2.8~3.4 dB compared with HEVC and by up to about 0.8~2.2 dB compared with VVC.

To further verify the improvement in video quality, we evaluate the SSIM of three video sequences processed by the HEVC, VVC, and UnderWater-WZ systems at different quantization step sizes, as shown in Table 5. SSIM decreases as the step size increases, and sequences processed by UnderWater-WZ achieve higher SSIM than those processed by HEVC. For the Fish-school and Fish-small-shrimp sequences, the targets are very small, so the resulting bit rate is low and the quality improvement is limited. Future work will focus on videos with small targets.

The experimental results demonstrate the effectiveness of the proposed video error-resilience scheme (UnderWater-WZ) over the underwater acoustic channel. Through a coding scheme that reduces data correlation and an improved macroblock-based side information generation method, assisted by the feedback channel, the bit error rate of video transmission in the deep-sea environment is reduced and the video quality is significantly improved. Compared with other video error-resilience schemes, UnderWater-WZ yields better reconstructed video quality, indicating that it is well suited to video transmission over underwater acoustic channels.

5. Conclusions

To address the problems of wireless video transmission in the marine environment, this paper proposes a new video error-resilience scheme, UnderWater-WZ, which improves the Wyner-Ziv coding system for high-quality multiview video transmission in the marine environment. The scheme uses MJPEG coding to control the error range, combines motion compensation time interpolation with calibration information to generate high-quality side information, and designs an intraframe quantization matrix to weaken the effect of scene changes.

The experimental results show that, in the simulated underwater video transmission environment, the proposed scheme yields higher-quality side information, and the reconstructed video quality is significantly better than that of the conventional Wyner-Ziv system, with PSNR gains of up to 2.6~3.5 dB. In future work, we will study a scheme for low-latency multiview video transmission, in which each panoramic video frame is divided into multiple tiles that are sent according to the display probability of each tile. We also plan to extend the intraframe quantization matrix designed in this paper to an interframe quantization matrix to further improve the coding efficiency of HEVC. Together, these steps aim to realize video error-resilience processing for low-latency transmission of underwater panoramic video.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.


This work was supported by the National Natural Science Foundation of China (Grant No. 62001035) and the R&D Program of Beijing Municipal Education Commission (Grant No. KM202111232018).