Abstract

With the development of the film and television industry and the rise of new media short video, postproduction places ever higher demands on image quality. Optimizing image quality in postproduction can raise the resolution and make the image more vivid and detailed. High-quality images convey the full value of film and television works and support the growth of new media short videos. This paper optimizes image quality by improving image processing technology. A convolutional neural network combined with a nonlinear activation function is used to build an improved image processing model that extracts image features efficiently. The model strengthens feature extraction and improves its accuracy, which in turn improves image resolution and detail and therefore image quality in film and television postproduction. The results show that the average PSNR of the proposed algorithm is 30.29, higher than that of the compared algorithms, indicating that the error between the processed image and the original image is small. The average SSIM of the proposed algorithm is 0.903, closer to 1 than that of the compared algorithms, indicating that the processed image is structurally more similar to the original and that the resulting image is better. The proposed algorithm achieves the best performance on both peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). The improved image processing technology effectively improves the accuracy of image feature extraction and yields higher-quality images for film, television, and new media short video.

1. Introduction

With the development of new media short videos, the requirements for image quality have steadily risen. Image quality is not determined only by shooting or production [1]; it is also affected by film and television postprocessing [2]. Film and television images can therefore be optimized and improved through postprocessing [3]. An important purpose of creating film and television works is to improve their acceptance by the audience [4]. To keep pace with the development of film and television, production must be combined with the characteristics of new media to form works that meet people’s aesthetic expectations [5]. Postproduction is a key factor affecting film quality [6]. In the era of new media, audiences have been exposed to many film and television works, and their standards for evaluating such works keep rising [7]. If a work is not outstanding after postproduction, it cannot attract the attention of the audience [8]. Works with strong postproduction are more likely to meet most people’s needs for movies and short videos and to be recognized and welcomed.

Image processing technology has developed continuously within film and television postproduction [9]. Among common image processing techniques, adjusting image sharpness and color saturation is one of the most efficient [10]. It is generally applied to black and white image processing, pseudocolor image processing, and color image restoration [11]. Other technical measures are also used to improve image quality [12]. Image enhancement technology can further optimize image processing and improve the original image quality. However, simple image processing that relies on sequential instructions cannot carry out the parallel or complex operations that these algorithms require [13], so some video processing procedures cannot be implemented. Image processing technology aims not only to optimize the image but also to improve the accuracy of information processing to a certain extent and, on the basis of speed optimization, to achieve fast image processing [14]. With the rapid progress of science and technology in recent years, image processing technology has become increasingly mature. Intelligent algorithms and neural network models are now widely used in image processing: a model is built by the algorithm, and images of high quality and high resolution are obtained. Such technology can raise the resolution of an image and produce more accurate superresolution images.

In recent years, various intelligent algorithms and neural network models have been widely used in image superresolution reconstruction [15]. These methods can be modeled using image databases or the images themselves [16]. They require designing learning strategies and training the model so that it learns the relationship between high-resolution (HR) and low-resolution (LR) images; the HR image is then generated by specific functional layers [17]. As the technology developed, the convolutional neural network (CNN) was introduced into the superresolution reconstruction problem. The CNN proposed for this purpose consists of three convolutional layers: image block extraction and representation, nonlinear mapping, and final reconstruction. Based on sparse coding and feature learning theory, it realizes an end-to-end learning process from LR to HR images [18]. Researchers have found that this network is hard to train: it is very sensitive to hyperparameter variations, and with only three convolutional layers it leaves much to be desired [19]. The deep edge guidance feedback residual network decomposes image signals into different frequency bands, reconstructs each band, and then combines them [20]. In this way, important image details are retained, which goes some way toward solving the problem that the CNN does not fully exploit the prior information of the image and loses details.

In this paper, a convolutional neural network combined with the swish activation function is used to establish an improved image processing model. The model strengthens image feature extraction, which helps improve image resolution and image detail. A convolutional neural network reaches its optimal model by training a multilayer network structure. Swish is a self-gated activation function with good practicability; it is sparse, nonmonotonic, and smooth. It reduces the risk that neurons can no longer be activated after a large gradient flows through them, and it compensates for the convolutional neural network's loss of important details. By improving the convolutional neural network with this nonlinear activation function, image features are extracted efficiently, and PSNR and SSIM are used for quantitative testing. The model is also compared with other algorithms to show that it improves the accuracy of image feature extraction, the detail of the superresolved image, and the overall image quality.

The contributions of this article are listed below:
(1) This paper optimizes image quality by improving image processing techniques, in order to enhance the quality and value of film and television and new media short videos.
(2) A convolutional neural network combined with a nonlinear activation function is used to build an improved image processing model that extracts image features efficiently.
(3) Experiments and comparisons show that the proposed algorithm strengthens image feature extraction and improves its accuracy. It improves image resolution and image detail, thus improving image quality in the postproduction process of film and television.

2. Methodology

2.1. Image Processing Technology
2.1.1. Image Compression

In image processing, image compression is one of the most widely used functions. The purpose of compression is to improve the transmission speed of image information while losing as little image data as possible and avoiding image distortion to the maximum extent. In terms of their effect, current compression methods fall into two categories: lossy compression and lossless compression. The two differ fundamentally in nature, so the final result of compression and the time it takes also vary greatly.

2.1.2. Image Denoising

Images are often affected by noise such as quantization noise or Gaussian noise. This noise does not come from the original signal; most of it is caused by inadequate quantization processing in the internal devices and software. It degrades image quality, reduces the aesthetic effect of the image, and diminishes its value. To address this, designers use median filtering or mean filtering to denoise the image. These techniques change the arrangement of pixel information in the image: the gray values in a neighborhood are arranged in order or combined, and the median value or the mean value replaces the pixel being processed, which completes the noise reduction. Figure 1 shows the noise reduction filter.
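The median filtering step described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the filter of Figure 1; the function name `median_filter`, the 3 × 3 window, and the border handling are assumptions made for the example.

```python
from statistics import median


def median_filter(image, size=3):
    """Apply a size x size median filter to a 2D grayscale image (list of rows).

    Border pixels use only the neighbours that fall inside the image
    (an assumed border policy; other policies such as padding are possible).
    """
    height, width = len(image), len(image[0])
    radius = size // 2
    output = [[0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            # Collect the gray values in the window around (row, col).
            window = [
                image[r][c]
                for r in range(max(0, row - radius), min(height, row + radius + 1))
                for c in range(max(0, col - radius), min(width, col + radius + 1))
            ]
            # The median of the window replaces the pixel being processed.
            output[row][col] = median(window)
    return output


# A flat gray patch with one bright outlier (impulse noise).
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 10, 10],
]
denoised = median_filter(noisy)
print(denoised[1][1])  # 10
```

The outlier is removed because a single extreme value never reaches the middle of the sorted window, which is why median filtering suits impulse-like noise better than mean filtering.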

2.1.3. Image Enhancement

At present, the main methods of image enhancement are image filtering, image cutting, image feature extraction, and image sharpening. The image is preprocessed with image processing software to enhance its characteristics, thereby improving the quality of the processed image. Among common image processing techniques, adjusting image sharpness and color saturation is one of the most efficient. It is generally applied to black and white image processing, pseudocolor image processing, and color image restoration.

2.2. Convolutional Neural Network Algorithm

A convolutional neural network is a network with local connections, developed on the basis of multilayer neural networks. It has been studied extensively for image enhancement and image denoising. Compared with a fully connected neural network, it has two distinguishing characteristics: local connectivity and weight sharing. Its structure is also relatively flexible. The composition of a convolutional neural network is shown in Figure 2.

2.2.1. The Input Layer

For a computer, an image is typically represented as a three-dimensional array of numbers, as shown in equation (1). For example, an image that is 250 pixels wide and 400 pixels high with three color channels consists of 250 × 400 × 3 = 300,000 numbers. Each number is an integer that gives the intensity of one channel of one pixel. These numbers range from 0 (black) to 255 (white) and are ready for processing. Image recognition is a series of operations that converts these numbers into a single label. The input layer is the input set consisting of images. Each image is the same size, and each image is labeled with one of a number of different categories. This paper refers to these data as the training set.
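As a small worked check of the arithmetic above, the following sketch builds such an array with nested Python lists and counts its entries. The three color channels and the helper names `make_image`, `count_values`, and `clamp` are assumptions chosen for illustration.

```python
# Assumed example dimensions: 250 pixels wide, 400 pixels high, 3 channels.
WIDTH, HEIGHT, CHANNELS = 250, 400, 3


def make_image(width, height, channels, fill=0):
    """Build a height x width x channels nested list filled with one value."""
    return [[[fill] * channels for _ in range(width)] for _ in range(height)]


def count_values(image):
    """Count every stored number (one per channel of each pixel)."""
    return sum(len(pixel) for row in image for pixel in row)


def clamp(value, low=0, high=255):
    """Keep an intensity inside the valid 8-bit range."""
    return max(low, min(high, value))


image = make_image(WIDTH, HEIGHT, CHANNELS)
print(count_values(image))  # 300000
```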

Convolution is a common linear filtering method in image processing. It can process the input image to achieve noise reduction, sharpening, and other filtering effects. The operation in a convolutional layer is expressed in equation (2), whose terms are the bias term of the receptive field, the size of the convolution kernel, and the input excitation of the input layer at the position covered by the kernel.

In simple terms, for each pixel, the products of its neighborhood pixels and the corresponding entries of the convolution kernel are calculated. All products are then summed, and the bias term is added to give the output value for that pixel. The convolution kernel slides over each pixel of the image in turn to output a new image. The convolution kernel is generally square. A larger kernel covers a larger neighborhood in a single step. Conversely, a smaller kernel processes the image more finely, but to reach the same coverage and the same feature extraction effect, more layers must be added to the network, which also makes the whole structure more discriminative.
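The sliding-window computation described above can be sketched in a few lines. This is an illustrative implementation, not the paper's equation (2): the function name `convolve2d` is hypothetical, only positions where the kernel fits entirely inside the image are computed, and, as is usual in CNN practice, the kernel is not flipped.

```python
def convolve2d(image, kernel, bias=0.0):
    """Slide `kernel` over `image` and return the map of weighted sums plus bias."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = bias
            # Multiply each neighbourhood pixel by the matching kernel entry and sum.
            for u in range(k_h):
                for v in range(k_w):
                    total += kernel[u][v] * image[i + u][j + v]
            row.append(total)
        output.append(row)
    return output


ones3 = [[1.0] * 3 for _ in range(3)]
image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(convolve2d(image, ones3))  # [[54.0, 63.0], [90.0, 99.0]]

# Two stacked 3 x 3 layers reduce a 5 x 5 patch to a single value,
# so together they cover the same area as one 5 x 5 kernel.
patch5 = [[1.0] * 5 for _ in range(5)]
stacked = convolve2d(convolve2d(patch5, ones3), ones3)
print(len(stacked), len(stacked[0]))  # 1 1
```

The second example illustrates the trade-off between kernel size and depth: small kernels need more layers to cover the same region.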

2.2.2. The Activation Function

Two types of activation functions are commonly used in convolutional neural networks: saturated nonlinear activation functions and unsaturated nonlinear activation functions. Each type contains two functions. The saturated nonlinear activation functions are shown in equations (3) and (4). Equation (3) is the tanh function.

As shown in Figure 3, the tanh function compresses a real-valued input into the range from -1 to 1. Large negative numbers are mapped to values near -1, while large positive numbers are mapped to values near 1. The region around zero is fairly stable. Equation (4) is the sigmoid function, which is often used as the activation function of a neural network and maps its input to a value between 0 and 1.

However, because the tanh function is antisymmetric about zero, it is not very representative of neurons in living organisms. At present, these two functions are rarely used in convolutional neural networks.

Among the unsaturated nonlinear activation functions, equation (5) is the ReLU function and equation (6) is the Softplus function.

As shown in Figure 4, the ReLU function sets all inputs less than 0 to 0, which gives it the ability to express data sparsely. Inputs greater than 0 keep their value unchanged, without nonlinear correction. The Softplus function applies a smooth nonlinear correction to the data but lacks the ability of sparse expression.
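The four functions discussed so far can be written down directly. The sketch below uses their standard textbook definitions, which are assumed to correspond to equations (3) to (6); the plain-Python function names are chosen for illustration, and the simple forms are intended for inputs of moderate magnitude.

```python
import math


def tanh(x):
    """Saturating: (e^x - e^-x) / (e^x + e^-x), output in (-1, 1)."""
    return math.tanh(x)


def sigmoid(x):
    """Saturating: 1 / (1 + e^-x), output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    """Unsaturated: max(0, x); negative inputs become exactly 0 (sparse output)."""
    return max(0.0, x)


def softplus(x):
    """Unsaturated and smooth: ln(1 + e^x); always positive, so never sparse."""
    return math.log1p(math.exp(x))


for value in (-3.0, 0.0, 3.0):
    print(value, tanh(value), sigmoid(value), relu(value), softplus(value))
```

Comparing `relu(-3.0)` with `softplus(-3.0)` shows the contrast drawn in the text: ReLU returns exactly zero for negative inputs, while Softplus returns a small positive value.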

The ReLU function is usually considered to have sparse expression ability, but it applies no nonlinear correction to data greater than 0. The Softplus function has the advantage of being smooth. The two functions are therefore often combined, and the result is denoted as the Ren-Softplus function. It is given in equation (7), and its curve is shown in Figure 5.

According to equation (7), this activation function not only has the ability of sparse expression when the data is less than 0 but also applies a nonlinear correction when the data is greater than 0. This effectively improves the expressive ability of the model.

2.2.3. Pooling Layer

After the image is processed by the convolution operation and the activation function, each pixel carries a lot of information, some of which is unnecessary. Carrying all of it into the following operations would greatly reduce the performance of the overall algorithm and the recognition accuracy of the image. The pooling layer therefore divides the feature map into regions and summarizes each region with a single number. There are two calculation methods: averaging the numbers in the region (average pooling) or taking the largest number in the region (max pooling). In a neural network, a higher value indicates a more important feature, so taking the largest number in the region keeps its most important feature. For this reason, max pooling is the method most often used in convolutional neural networks at present.
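The two pooling rules can be compared on a small feature map. This is a minimal sketch assuming non-overlapping square regions; the function name `pool2d` and the 2 × 2 region size are illustrative choices, not details given in the paper.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Summarize each non-overlapping size x size region by its max or its mean."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            region = [
                feature_map[i + di][j + dj]
                for di in range(size)
                for dj in range(size)
            ]
            if mode == "max":
                row.append(max(region))  # keep the strongest response
            else:
                row.append(sum(region) / len(region))  # average the responses
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 7, 3],
]
print(pool2d(feature_map, mode="max"))  # [[6, 2], [2, 9]]
print(pool2d(feature_map, mode="avg"))  # [[3.5, 1.0], [1.0, 6.0]]
```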

2.2.4. Fully Connected Layer

The preceding operations, namely the convolution layer, the pooling layer, and the activation function, convert the original image into a 3D feature map. The fully connected layer then maps the features in the 3D feature map, which cannot be displayed directly, into vector form, which facilitates later classification. In practice, it can be implemented as a convolution operation.

2.3. Construct Image Processing Technology Model to Optimize Image Quality
2.3.1. Model Establishment

In this paper, a 13-layer convolutional neural network is used with swish as the activation function. First, the input image is preprocessed by bicubic interpolation in the first layer to obtain the required LR image. Second, because swish is well suited to local response normalization, it is used throughout layers 2 to 11, where LR image blocks are obtained by feature extraction from the LR image. Then, the HR image blocks are obtained in layers 12 and 13 by learning the feature mapping between LR and HR image blocks. Finally, the reconstructed HR image is obtained by overlapping the HR image blocks and averaging them. The network structure is shown in Figure 6.
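The final overlap-and-average step can be illustrated on its own. The sketch below is an assumption about how such a merge is commonly done, since the paper does not give the block size or stride; the function name `merge_patches` and the `(top, left, block)` tuple layout are hypothetical.

```python
def merge_patches(patches, height, width):
    """Place blocks on a height x width canvas and average wherever they overlap.

    `patches` is a list of (top, left, block) tuples, where `block` is a 2D list.
    """
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for top, left, block in patches:
        for di, block_row in enumerate(block):
            for dj, value in enumerate(block_row):
                total[top + di][left + dj] += value
                count[top + di][left + dj] += 1
    # Pixels covered by no block are left at 0.0.
    return [
        [total[i][j] / count[i][j] if count[i][j] else 0.0 for j in range(width)]
        for i in range(height)
    ]


# Two 2 x 2 blocks that share the middle column of a 2 x 3 canvas.
block_a = [[2.0, 2.0], [2.0, 2.0]]
block_b = [[4.0, 4.0], [4.0, 4.0]]
merged = merge_patches([(0, 0, block_a), (0, 1, block_b)], height=2, width=3)
print(merged)  # [[2.0, 3.0, 4.0], [2.0, 3.0, 4.0]]
```

Averaging in the overlap smooths the seams between neighbouring blocks, which is the usual motivation for reconstructing from overlapping blocks rather than tiling them edge to edge.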

2.3.2. Feature Extraction and Convolution Operation

The activation function used in the classical neural network model is the sigmoid function, which comes from neuroscience. It simulates the firing of a neuron once the electrical signal it receives reaches a certain threshold, and it is easy to interpret. However, researchers rarely use the sigmoid function in practice, because it leads to saturation of the network state and vanishing gradients. Once the activity of a neuron saturates near 0 or 1, its gradient there vanishes, so that almost no signal passes through the neuron and the neurons that depend on it cannot be activated. Moreover, the output of the sigmoid function is not centered at zero. As a result, data that are not zero-centered are passed to the higher layers of the network, which causes oscillation during gradient descent. In recent years, replacing the sigmoid function with the ReLU function has become increasingly popular in deep learning, and ReLU is often used as the activation function in research. Computationally it is more practical: the gradient of the ReLU function is either 0 or 1, which is very simple, and no matter how many layers the network has, the gradient stays at a stable order of magnitude. The function image is shown in Figure 7.

In comparative experiments, this study found that using ReLU as the activation function achieved a much faster convergence rate than using the sigmoid function. ReLU needs only a threshold to obtain the activation value, without complicated operations. However, when a very large gradient flows through a ReLU neuron, the parameter update can leave the neuron unable to activate on any data. In view of this shortcoming of the ReLU function, this paper adopts the self-gated activation function swish as the activation function, as shown in equation (8), whose terms are the convolution kernel and its size, the number of channels in the input image, the bias vector, and the convolution operation.

Since the gradient of ReLU is 0 for negative inputs, the gradient is zeroed at such a ReLU neuron, as shown in Figure 8. In the improved activation function used in this paper, values both greater than and less than zero are handled by swish. Unlike ReLU, the swish activation function is sparse, nonmonotonic, and smooth. It reduces the risk that neurons can no longer be activated after a large gradient flows through them. For negative inputs, as the input continues to decrease, the gradient of swish approaches zero but does not become equal to 0, so neuron necrosis does not occur. In addition, swish is a self-gated activation function, which has a particular advantage: in contrast to normal gating, which requires multiple scalars as input, self-gating requires only a single scalar. This allows ReLU, which also takes a single scalar as input, to be replaced by swish without adjusting the hidden capacity or the number of parameters. The experimental results show that when the value is increased to a certain level, the results achieve the expected effect, which demonstrates the utility of the swish function.
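The difference between the two gradients for negative inputs can be checked numerically. The sketch below uses the commonly cited definition of swish, x · sigmoid(βx), which is assumed here to match the activation used in the paper; the parameter name `beta` and the helper names are illustrative.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def swish(x, beta=1.0):
    """Self-gated activation: the input x is scaled by its own sigmoid gate."""
    return x * sigmoid(beta * x)


def swish_grad(x, beta=1.0):
    """Derivative of x * sigmoid(beta * x) with respect to x."""
    gate = sigmoid(beta * x)
    return gate + beta * x * gate * (1.0 - gate)


def relu_grad(x):
    """Gradient of ReLU: exactly 0 for every negative input."""
    return 1.0 if x > 0 else 0.0


for value in (-5.0, -1.0, 0.0, 2.0):
    print(value, swish(value), swish_grad(value), relu_grad(value))
```

For a strongly negative input such as -5, the ReLU gradient is exactly zero while the swish gradient is small but nonzero, so a weight update can still move the neuron. The printed values also show the nonmonotonic shape: the output at -1 is lower than at both -5 and 0, and the gradient passes through zero only at that single turning point of the curve.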

2.3.3. Nonlinear Mapping

In the whole model, the role of layer 1 is to perform feature extraction on each LR image. Layers 2 to 11 map the extracted feature maps nonlinearly. The mathematical expression is shown in equation (9), whose terms are the convolution kernel and the bias vector. A vector of one dimension that passes through the nonlinear mapping yields a vector of another dimension; that is, the HR image block is obtained from the LR feature image block through end-to-end learning. The number of convolution kernels determines the number of high-resolution image blocks generated for reconstruction after convolution.
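As a hedged illustration of the mapping described above, one common way to write such a layer, following the notation usually used for superresolution CNNs, is given below. The symbols $Y$ (the input LR image), $F_{l}$ (the output of layer $l$), $W_{l}$, $B_{l}$, $n_{l-1}$, and $n_{l}$ are assumptions introduced for this sketch and may differ from the notation of equation (9).

```latex
% Assumed form of one nonlinear mapping layer, for layers l = 2, ..., 11.
% W_l holds n_l convolution kernels, B_l is an n_l-dimensional bias vector,
% and * denotes convolution.
F_{l}(Y) = \operatorname{swish}\bigl(W_{l} * F_{l-1}(Y) + B_{l}\bigr),
\qquad
\operatorname{swish}(z) = \frac{z}{1 + e^{-z}}
```

Under these assumptions, each $n_{l-1}$-dimensional feature vector of layer $l-1$ is mapped to an $n_{l}$-dimensional feature vector of layer $l$.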

2.3.4. Training and Optimization

Traditional neural networks are fully connected: every neuron in the input layer is connected to every neuron in the hidden layer. This results in a huge number of parameters, so training a complex network is time-consuming and sometimes infeasible. The CNN learning algorithm likewise has to train a multilayer network structure to reach the optimal model, but the CNN combines local receptive fields, weight sharing, and temporal or spatial subsampling. It thereby obtains a certain degree of invariance to displacement, scale, and deformation and avoids these difficulties. The learning algorithm computes the reconstruction error and continually corrects the connection weights and biases through forward propagation and backpropagation, updating the network parameters until they are optimized. Forward propagation is essentially the transmission of feature information and does not differ from forward propagation in ordinary neural networks. The output of the current layer is computed from the weights and bias of that layer, as shown in equation (10), where the activation function is the swish function selected in this paper. Backpropagation corrects the model parameters using the error information. During backpropagation, the error over the larger area of the previous layer must be restored from the error over the reduced area, which is called upsampling. Many loss functions are used with CNNs; the mean square error loss function and the cross-entropy loss function are common. In this paper, the mean square error function is used as the loss function, as shown in equation (11).

Equation (11) describes the training error of a sample in terms of the accurate value of the data in a batch of sample data and the predicted value given by the CNN. The experimental results show that when swish is used as the activation function, the mean square error loss function has a better prediction effect. The mean square error loss function is shown in Figure 9. The problem of minimizing the sum of squared errors is usually solved by gradient descent, which obtains the parameters of the network by moving in the direction in which the loss decreases fastest. There are two widely used gradient descent methods; one is the batch gradient descent method, whose mathematical expressions are shown in equations (12) and (13) in terms of the step size and the sample set of the given image class. This method uses all the sample data for every parameter update, so the calculation accuracy is relatively high.
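The combination of a mean square error loss with batch gradient descent can be sketched on a toy problem. The example below fits a one-variable linear model rather than a CNN, so it only illustrates the update rule; the names `mse` and `batch_gradient_descent`, the step size, and the iteration count are assumptions, and the paper's own loss may include a different constant factor.

```python
def mse(targets, predictions):
    """Mean of the squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


def batch_gradient_descent(xs, ys, step=0.1, iterations=500):
    """Fit y = w * x + b by minimizing the MSE, using ALL samples in every update."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the MSE with respect to w and b over the whole batch.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Move against the gradient, scaled by the step size.
        w -= step * grad_w
        b -= step * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
w, b = batch_gradient_descent(xs, ys)
print(round(w, 4), round(b, 4))  # 2.0 1.0
print(mse(ys, [w * x + b for x in xs]) < 1e-12)  # True
```

Because every update sums over the full sample set, each step follows the exact gradient of the loss, at the cost of touching all samples per iteration.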

3. Result Analysis and Discussion

In the experiment, the proposed algorithm is compared with the typical traditional bicubic interpolation algorithm of literature [21], the image superresolution algorithm of literature [22], and the fixed neighborhood regression algorithm based on sample learning of literature [23]. The main evaluation indexes are PSNR and SSIM. PSNR measures the error between the processed image and the original image. Both indexes were used to evaluate the processing results of the four algorithms, and the PSNR and SSIM values of each group of images were calculated. Table 1 shows the objective experiment results. The average PSNR is 30.29 for the proposed algorithm, 29.20 for literature [21], 29.61 for literature [22], and 29.70 for literature [23]. The average PSNR of the proposed algorithm is higher than that of the other algorithms, which indicates that the error between the processed image and the original image is small. The average SSIM is 0.903 for the proposed algorithm, 0.886 for literature [21], 0.894 for literature [22], and 0.892 for literature [23]. The average SSIM of the proposed algorithm is the closest to 1. Compared with the other algorithms, the structure of the image it produces is more similar to the original structure; that is, the resulting image is better.
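For readers unfamiliar with the two indexes, the following sketch shows how they are commonly computed. It uses the standard PSNR definition and a simplified SSIM evaluated over the whole image as a single window, whereas practical SSIM implementations average over local windows; the 8-bit peak value of 255, the function names, and the constants 0.01 and 0.03 are conventional choices assumed here, not settings reported in the paper.

```python
import math


def _flatten(image):
    return [float(pixel) for row in image for pixel in row]


def psnr(reference, processed, max_value=255.0):
    """Peak signal-to-noise ratio: 10 * log10(max_value^2 / MSE)."""
    ref, out = _flatten(reference), _flatten(processed)
    mse = sum((a - b) ** 2 for a, b in zip(ref, out)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def global_ssim(reference, processed, max_value=255.0):
    """Single-window SSIM computed from global means, variances, and covariance."""
    x, y = _flatten(reference), _flatten(processed)
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (0.01 * max_value) ** 2  # stabilizes the luminance term
    c2 = (0.03 * max_value) ** 2  # stabilizes the contrast/structure term
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


original = [[100, 100], [100, 100]]
degraded = [[110, 100], [100, 100]]
print(psnr(original, degraded))
print(global_ssim(original, degraded))
```

A higher PSNR means a smaller pixel-wise error, and an SSIM closer to 1 means the processed image is structurally closer to the reference, which is how the averages above are interpreted.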

To highlight the improvement of the proposed algorithm over the algorithm in literature [23], a series of training runs was carried out, and the results of the learned models were collected at various stages of training. For the same numbers of iterations, network learning models were obtained for both the literature [23] algorithm and the improved algorithm. The test images above were processed with superresolution to obtain the corresponding PSNR and SSIM values. Taking one image as an example, the network models obtained after different numbers of iterations were tested and the results recorded, which gives the trends of PSNR and SSIM shown in Figures 10 and 11, respectively. To facilitate comparison, the test results of literature [21] and literature [22] are added to the figures. The algorithms of literature [21] and literature [22] affect the test results at 10 to 100 iterations but not at higher iteration counts. The results show that, as the number of iterations increases, the PSNR of the improved algorithm on the test image rises from around 26.3 to nearly 28, which is higher than the PSNR of the other algorithms, so the error between the processed image and the original image is small. Meanwhile, the SSIM of the improved algorithm is 0.9488, which is also higher than that of the other three algorithms.

It can be seen from Figures 10 and 11 that the results obtained by the improved algorithm are better overall than those of the SRCNN algorithm and the BI algorithm in terms of both PSNR and SSIM. Compared with the SRCNN algorithm, the improved algorithm also converges to its optimal effect relatively quickly during training. As shown in Figure 9, the SRCNN method needs considerably more iterations to achieve this effect than the improved algorithm, and the improvement reaches 88%. These experimental results objectively show that the proposed algorithm has a better superresolution effect. The image processing technology can shorten the processing time of the image, maintain the original characteristics of the scene, make the film image more specific, and improve the effect of the film image. The proposed algorithm performs well in both peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). By improving the image processing technology, the accuracy of image feature extraction can be effectively improved, and the image quality of film and television or new media short video can be raised. However, the algorithm in this paper also has some shortcomings. For example, as the number of iterations continues to increase, the PSNR and SSIM of the algorithm grow only slowly, indicating that there is still considerable room to improve the image quality it achieves.

4. Conclusion

With the development of film, television, and new media short videos, people have put forward higher requirements for image quality. As multimedia technology has developed, methods of improving image processing technology to raise image quality have been widely discussed. Image processing technology not only optimizes the image but also improves the accuracy of information processing to some extent and, on the basis of speed optimization, achieves fast image processing. To extract image features effectively and improve image quality, this paper establishes an improved image processing model based on a convolutional neural network and a nonlinear activation function. The purpose of the improved model is to raise the accuracy of image feature extraction so that image edges are better preserved, edge definition is higher, and the overall look is clearer, resulting in higher-quality images for film and television or new media short videos. PSNR and SSIM are used to evaluate the algorithm quantitatively: PSNR measures the error between the processed image and the original image, while SSIM measures the similarity of the processed image’s structure to the original image’s structure. The average PSNR of the proposed algorithm is 30.29, which is higher than that of the other algorithms and indicates that the error between the processed image and the original image is small. The average SSIM of the proposed algorithm is 0.903, which is closer to 1 than that of the other algorithms. The results therefore show that the proposed algorithm obtains higher PSNR and SSIM values than the other algorithms. Furthermore, the PSNR of the improved algorithm is strongly affected by the number of iterations: as the number of iterations increases, the PSNR increases and the image quality improves.

Data Availability

The labeled datasets used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The author declares no competing interests.