Abstract

The advancement of artificial intelligence (AI) has accelerated the development of digital multimedia and brought new experiences to digital media. In this paper, we aim to use artificial intelligence methods to enhance the digital media design experience. Specifically, we propose a method for low-light image enhancement that uses a generative adversarial network (GAN) as its model framework. To better solve the problem, we design the following strategies in our proposed method. First, we preprocess the images into patches of a proper size. Second, we introduce the overall network structure of the GAN. Third, we design a multifeature extraction module with convolution kernels of different sizes to strengthen the model's feature extraction ability. Fourth, we propose a loss that combines the mean square error distance function with the adversarial error function, enabling the model to learn the distributional features of the images. Finally, we verify the effectiveness of the proposed method on two datasets, namely, PASCAL VOC and MIT-Adobe FiveK. The results show that our proposed method performs well in the evaluation.

1. Introduction

The main purpose of digital media technology is to cultivate modern art and design talents who combine technical and artistic quality for society [1–3]. Compared with the traditional digital media art major, it pays more attention to the cultivation of technical quality and can guide the public to continuously adapt to new media art creation, network multimedia production, and the production of demonstration animations in industries such as real estate. In the final analysis, the professional task of digital media technology is to convert digital signals into media signals, that is, to convert symbols that computers can understand into things that people can understand.

With the continuous improvement of information technology and artificial intelligence, people's lives have become more convenient [4, 5]. Researchers are currently developing more convenient technologies to meet people's needs. Artificial intelligence has brought a new look to people's lives, and the upsurge of AI has swept across many technical fields [6, 7]. Digital media technology can turn abstract information into vivid information, and in the context of combining digital media with artificial intelligence, the original content becomes more lively. As the development of digital media continues to accelerate, the application of AI has become more and more common. For the improvement of digital media, AI can provide great assistance to the development of enterprises and can optimize the functions and effects of digital media. In addition, artificial intelligence can strengthen the advantages of digital media and meet the basic needs of the public. Artificial intelligence has many application scenarios, which are mainly reflected as follows.

With the improvement of information technology and the rise of science, self-media has emerged in large numbers with the support of digital media, and users can obtain media platforms for publishing information [8]. For example, WeChat [9], Weibo [10], and other platforms commonly used in daily life belong to self-media. Recently, the improvement of self-media has become faster and faster, which is inseparable from the continuous improvement of AI. In addition, applying AI to self-media can optimize the platform so that its performance meets users' needs and tastes. It can be seen that artificial intelligence can play an important role in speeding up the development of self-media and can also promote the optimization of self-media platforms so that users obtain a more comfortable operating experience.

At present, with the gradual progress of computer science, games are becoming more and more popular in daily life [11]. Game character design relies primarily on digital media technology. Without artificial intelligence, designers face a very heavy workload when designing characters. After artificial intelligence is applied to character design, it not only improves the speed and quality of the designers' work but also allows characters to be optimized and adjusted in an all-around way. Compared with previous design methods, using artificial intelligence to design characters makes the character design fuller and the designed characters more vivid. In the actual design of characters, artificial intelligence can maximize the use of information and resources and assist designers in completing batch operations; hence, the efficiency of character design has been greatly improved after the application of artificial intelligence. For the designer's work, it is essential to take advantage of artificial intelligence, which can properly summarize and analyze the characters and design the overall framework. On this basis, the framework is adjusted and optimized to improve the design quality and effect. In this way, the drawbacks of traditional design are compensated, and the design work can better serve current development. Therefore, the application of AI in digital media provides technical help for character design, enhances the three-dimensional sense of the characters, and makes the designed characters fuller and more vivid.

In addition, for scene design, artificial intelligence can improve the overall design quality and give users a richer experience [12]. When designing a scene, the staff can process user feedback as soon as possible and then improve the scene in an all-around way according to that feedback. At the same time, artificial intelligence can reduce the designers' workload. Therefore, the rational application of artificial intelligence can increase users' choices, make the designed scenes more diverse and richer, and thus enhance the effect of digital media design.

In general, artificial intelligence is of great help in enhancing the digital media experience [13–15]. Images are one of the most important forms of multimedia presentation, and image enhancement can improve the color, composition, and content of an image. At present, automatic image enhancement plays a key role in various fields, such as intelligent medicine, military reconnaissance, and aerospace [16–18].

Traditional image enhancement methods include histogram equalization and Retinex-based enhancement [19, 20]. Histogram equalization requires no parameter setting; however, as a global adjustment, it fails to effectively improve local contrast, so its performance is not ideal in some cases. Retinex-based enhancement imitates the human visual system's ability to judge the real color of a scene without interference from illumination, and it mainly addresses color shift and uneven illumination. Both methods are unsupervised algorithms that need no training samples or labels, but they lack an understanding of image content, that is, they do not consider the overall semantic information of the image.
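For concreteness, the following is a minimal Python sketch of global histogram equalization with OpenCV. The file paths are placeholders, and equalizing only the luminance channel is one common choice for color images, not a detail taken from the cited works.

import cv2

img = cv2.imread("low_light.jpg")  # hypothetical input path

# Equalize the luminance channel only, to avoid color shifts.
ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
enhanced = cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)

cv2.imwrite("enhanced.jpg", enhanced)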

At present, deep learning techniques have achieved great success in image recognition, object detection, and other fields through powerful image-feature learning capabilities. Deep convolutional neural networks can extract the key information in images and understand image semantics, but this relies on high-quality image data. Images captured in low-light environments often have low brightness and contrast, lost details, and heavy noise. Such low-light images not only degrade human subjective perception but also hinder downstream visual tasks. Therefore, image enhancement algorithms that do not rely on expensive image acquisition equipment have important research significance and application value.

With the development of deep learning, a series of studies on automatic image enhancement based on deep learning methods have appeared. Yan et al. [21] built deep convolutional networks that use color and semantic context information to construct a color map of global and local features for spatial transformation, so as to generate pixel colors in a specific style. Zamir et al. [22] proposed a convolutional neural network model that retains the high-resolution spatial details of images while combining contextual information at multiple scales through a multiscale residual module. Gharbi et al. [23] proposed an image enhancement network based on the bilateral filter, which uses a neural network to extract features and a bilateral filter to store the coefficients of a linear transformation; this method reduces computation while producing higher-quality images. Bychkovsky et al. [24] provided the MIT-Adobe FiveK dataset containing five sets of expert retouching and used it to propose a supervised image enhancement model. Isola et al. [25] proposed a pixel-level image translation method based on conditional generative adversarial networks that converts images between different domains. Deep learning-based methods can effectively understand the overall semantic information of an image and thus achieve satisfactory enhancement.

In this paper, we use artificial intelligence technology to enhance the digital media experience. Specifically, we employ a generative adversarial network model to enhance images.

This paper is organized as follows. Section 2 presents our proposed method for low-light image enhancement, including image data preprocessing, model design based on generative adversarial networks, and loss function design. Section 3 describes the experimental results and analyzes the model's performance on the low-light image enhancement evaluation. Section 4 concludes the paper.

2. Proposed Method

In this paper, considering the shortcomings of traditional enhancement methods and the characteristics of existing deep learning-based algorithms, we propose a method for low-light image enhancement that uses a generative adversarial network as its model framework.

To better solve the abovementioned problems, we design the following strategies in our proposed method. First, we preprocess the images into patches of a proper size. Second, we introduce the overall network structure of generative adversarial networks [26]. Third, we design a multifeature extraction module with convolution kernels of different sizes to strengthen the model's feature extraction ability. Finally, we design a loss function that combines the mean square error distance function with the adversarial error function, enabling the model to learn the distributional features of the images.

2.1. Data Preprocessing and GAN

To allow the model to better process the input images, we first preprocess them with geometric and attribute transformations. Image space transformation refers to processing image data through a series of geometric operations, including scaling, rotation, and translation; its purpose is to correct the errors produced during image acquisition.
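For illustration, a minimal Python sketch of this kind of preprocessing follows: a random flip and rotation (geometric transforms) and a random crop to a fixed-size patch. The 256×256 patch size and the specific transforms are assumptions; the paper does not specify them.

import random
import numpy as np

def preprocess(img: np.ndarray, patch: int = 256) -> np.ndarray:
    # Assumes img is an (H, W, C) array with H >= patch and W >= patch.
    # Random horizontal flip and 90-degree rotation (geometric transforms).
    if random.random() < 0.5:
        img = np.fliplr(img)
    img = np.rot90(img, k=random.randint(0, 3))
    # Crop a random patch of the proper size.
    h, w = img.shape[:2]
    top = random.randint(0, h - patch)
    left = random.randint(0, w - patch)
    return img[top:top + patch, left:left + patch].copy()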

The generative adversarial network consists of a generator G and a discriminator D. The generator learns the latent distribution of the real data and produces generated samples. During training, the two models compete with each other so that the generator's output constantly approaches the real data and the discriminator can no longer judge whether a sample is real or fake; finally, a dynamic balance between the generator and the discriminator is reached. The discriminator is essentially a binary classifier whose function is to determine whether the input data are generated or real. The optimization of a GAN is a minimax game problem [27], and its objective function is

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))],

where \mathbb{E} represents the expected value over the corresponding distribution, p_{\mathrm{data}}(x) represents the distribution of real samples, p_z(z) represents the distribution of the random noise, G(z) represents the data generated from the random noise z input to the generator G, and D(x) and D(G(z)) represent the discriminator's responses to real data and generated data, respectively.

The training process of the GAN is as follows. The final goal of the generator G is to produce data similar to real images in order to fool the discriminator network D, while the task of D is to distinguish the generator's images from real images as accurately as possible. In this way, G and D form a dynamic game. The network D is trained to assign the correct labels to its inputs with maximum probability, i.e., to maximize log(D(x)) + log(1 − D(G(z))); the network G is trained to minimize log(1 − D(G(z))), i.e., to maximize the loss of D. During training, we fix one network and update the parameters of the other, alternating iteratively so that each side maximizes the error of the other. Eventually, G estimates the distribution of the sample data, that is, the generated samples become more realistic. The training process of the GAN is shown in Algorithm 1.

Input: raw data and noise data
Output: the trained GAN model
(1) initialize the parameters \theta_d and \theta_g of the model randomly;
(2) for each epoch i = 1, 2, …, do
(3)  for each step j = 1, 2, …, do
(4)   sample a minibatch of m noise samples {z^{(1)}, …, z^{(m)}} from the noise data;
(5)   sample a minibatch of m examples {x^{(1)}, …, x^{(m)}} from the raw data;
(6)   calculate the discriminator loss and its gradient:
      \nabla_{\theta_d} \frac{1}{m} \sum_{i=1}^{m} [\log D(x^{(i)}) + \log(1 - D(G(z^{(i)})))];
(7)   update the parameters \theta_d of the discriminator D;
(8)  end for
(9)  sample a minibatch of m noise samples {z^{(1)}, …, z^{(m)}} from the noise data;
(10) calculate the partial derivatives with respect to the generator parameters:
      \nabla_{\theta_g} \frac{1}{m} \sum_{i=1}^{m} \log(1 - D(G(z^{(i)})));
(11) update the parameters \theta_g of the generator G;
(12) end for
(13) return the GAN model with parameters \theta_d and \theta_g.
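For illustration, the following is a minimal PyTorch sketch of the alternating updates in Algorithm 1. The generator G, discriminator D, and data loader are placeholders (the discriminator is assumed to output probabilities of shape (m, 1)); this sketches the standard procedure, not the paper's exact training code. The generator step uses the common non-saturating variant, maximizing log D(G(z)) instead of minimizing log(1 − D(G(z))).

import torch
import torch.nn.functional as F

def train_gan(G, D, loader, z_dim=100, epochs=100, lr=2e-4, device="cpu"):
    # Hypothetical trainer: G maps noise (m, z_dim) to images, and D maps
    # images to real/fake probabilities of shape (m, 1).
    opt_g = torch.optim.Adam(G.parameters(), lr=lr)
    opt_d = torch.optim.Adam(D.parameters(), lr=lr)
    for _ in range(epochs):
        for x in loader:
            x = x.to(device)
            m = x.size(0)
            ones = torch.ones(m, 1, device=device)
            zeros = torch.zeros(m, 1, device=device)
            # Discriminator step: maximize log D(x) + log(1 - D(G(z))),
            # i.e., minimize the corresponding binary cross-entropy.
            z = torch.randn(m, z_dim, device=device)
            loss_d = (F.binary_cross_entropy(D(x), ones)
                      + F.binary_cross_entropy(D(G(z).detach()), zeros))
            opt_d.zero_grad()
            loss_d.backward()
            opt_d.step()
            # Generator step (non-saturating): maximize log D(G(z)).
            z = torch.randn(m, z_dim, device=device)
            loss_g = F.binary_cross_entropy(D(G(z)), ones)
            opt_g.zero_grad()
            loss_g.backward()
            opt_g.step()
    return G, D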
2.2. Model Design Based on GAN

The proposed model is based on the GAN architecture. The generator uses a deep convolutional neural network (CNN) [28]; its input is a low-light image, and its output is an enhanced image of the same size as the input. The discriminator draws on the idea of PatchGAN and adopts a fully convolutional network whose output is a tensor of patch-level discrimination scores. The specific structure is shown in Figure 1.

2.2.1. Generator

At present, the convolutional neural network is a popular model widely used in computer vision, and it is similar to the BP neural network [29]. The common point is that both adopt forward propagation and use back-propagated errors to adjust the connection weights and thresholds between nodes. The difference is that the CNN adopts partial connections instead of the full connections of the BP neural network, has deeper layers, and can learn deeper features. In addition, the CNN can directly perform feature extraction on images without manually specified feature parameters. Its structural characteristics of weight sharing and sparse connection effectively reduce the number of training parameters, combining image feature extraction with pattern recognition; by continuously optimizing the network parameters, the feature extraction task is completed.

The input layer, convolutional layers, downsampling layers, and upsampling layers are the main components of the CNN. (1) The convolutional layer sets a rectangular convolution kernel of fixed size, whose values are the weights. The input is divided into rectangles of the same size as the kernel, and each rectangle is dot-multiplied with the kernel to obtain a convolution value. One kernel represents one feature, and different image features can be extracted by setting multiple kernels. Compared with the traditional neural network, its biggest advantage lies in reducing network parameters through local perception and weight sharing, thereby improving training speed and efficiency. (2) The downsampling layer, also commonly called the pooling layer, reduces the dimensionality of the image data and the number of training parameters, thereby accelerating computation and preventing overfitting. Like the convolutional layer, it slides a rectangular block over the input, but the calculation is not a weighted sum of nodes; it is a maximum or average operation, so the downsampling layer is further divided into max pooling and average pooling. In the downsampling layer, a single value replaces a rectangular region of the feature map, applying small transformations to the trained image. (3) The upsampling layer takes the image features extracted by the convolutional layers as input and outputs larger-scale image features; the final output is a result image with the same dimensions as the input image.

The generator is the core of the GAN, and its main functions are image enhancement, denoising, and detail restoration. Considering the wide application of U-Net [30] in image segmentation, we design the generator as a CNN composed of an encoder, a decoder, and skip connections. The encoder uses max-pooling layers to reduce the size of the feature maps and obtain feature maps with larger receptive fields. The decoder uses deconvolution to recover high-resolution images from high-level semantic features. The skip connections superimpose the feature maps of corresponding encoder and decoder layers, avoiding the loss of shallow features caused by deepening the network and aggregating multilayer features to synthesize high-quality images. The generator network proposed in this paper improves on U-Net: we design multiscale convolution kernels in the encoder to extract features at different scales, which improves the feature representation capability of the network and helps to restore image details. The specific structure of the generator network is shown in Figure 2.
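For illustration, the following is a minimal PyTorch sketch of a multiscale feature-extraction block of the kind described above: parallel convolutions with 3×3, 5×5, and 7×7 kernels whose outputs are concatenated. The kernel sizes and channel split are assumptions for the sketch; the paper does not report the exact configuration.

import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        branch_ch = out_ch // 3
        # Three parallel branches with different receptive-field sizes.
        self.b3 = nn.Conv2d(in_ch, branch_ch, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(in_ch, branch_ch, kernel_size=5, padding=2)
        self.b7 = nn.Conv2d(in_ch, out_ch - 2 * branch_ch, kernel_size=7, padding=3)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Extract features at three scales and fuse them by concatenation.
        return self.act(torch.cat([self.b3(x), self.b5(x), self.b7(x)], dim=1))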

2.2.2. Discriminator

The discriminator in this paper adopts a CNN with three convolutional layers, together with batch normalization layers, ReLU activation layers, pooling layers, and a fully connected layer. We have covered convolutional layers above, so here we focus on batch normalization and the fully connected layer. The batch normalization layer normalizes the data to zero mean and unit variance so that the network converges better and faster. The last layers of a CNN are usually fully connected layers, which reduce the feature dimension and convert two-dimensional image features into one-dimensional feature vectors; the number of neurons in the last layer equals the number of classification categories. Because the fully connected layer has a large number of parameters, softmax regression is usually used to process the final result, calculated as

\mathrm{softmax}(y_i) = \frac{e^{y_i}}{\sum_j e^{y_j}},

where y is the output of the network model. The network structure of the discriminator is shown in Figure 3.
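For illustration, the following is a minimal PyTorch sketch of a discriminator with the layer types listed above (three convolutional layers with batch normalization, ReLU, and pooling, followed by a fully connected layer). The input size (64×64 RGB) and channel widths are assumptions, and the binary real/fake output is implemented with a sigmoid, the two-class special case of softmax.

import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        # Three conv blocks: convolution -> batch norm -> ReLU -> pooling.
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(True), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(True), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(True), nn.MaxPool2d(2),
        )
        # Fully connected layer maps the flattened 8x8 feature maps to one score.
        self.fc = nn.Linear(128 * 8 * 8, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.features(x).flatten(1)
        return torch.sigmoid(self.fc(h))  # probability that x is real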

2.3. Loss Function

During training, we design a new loss function based on the GAN loss [31]. The original loss functions L_G and L_D of the generator and discriminator in the GAN model are defined as follows:

L_D = -\mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] - \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))],

L_G = \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))].

To stabilize the training of the generator and make the generated images closer to the ground-truth labeled images, we use the mean squared error to limit the distance between the generated image G(x) and the target image y:

L_{\mathrm{MSE}} = \mathrm{MSE}(G(x), y),

where MSE is defined as

\mathrm{MSE}(a, b) = \frac{1}{WH} \sum_{i=1}^{W} \sum_{j=1}^{H} (a_{i,j} - b_{i,j})^2,

with W and H denoting the image width and height.

Then, the final loss of the generator is

L = \lambda_1 L_G + \lambda_2 L_{\mathrm{MSE}},

where \lambda_1 and \lambda_2 are the scaling factors that balance the two terms.
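For illustration, a minimal PyTorch sketch of such a combined generator objective follows: an adversarial term (in the common non-saturating form) plus the MSE term between the enhanced image and the target image. The weight values are placeholders; the paper does not report its scaling factors.

import torch
import torch.nn.functional as F

def generator_loss(d_fake: torch.Tensor, fake: torch.Tensor, target: torch.Tensor,
                   lam_adv: float = 1.0, lam_mse: float = 10.0) -> torch.Tensor:
    # d_fake: discriminator probabilities on the generated images.
    # Adversarial term: push D's score on generated images toward "real".
    adv = F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))
    # MSE term: keep the enhanced image close to the ground-truth image.
    mse = F.mse_loss(fake, target)
    return lam_adv * adv + lam_mse * mse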

3. Experiments and Results

3.1. Data Description and Metrics

In this section, we experimentally verify the image enhancement method based on generative adversarial networks proposed in this paper, using two public datasets. The first is a synthetic dataset based on the PASCAL VOC image dataset provided by MBLLEN [32], which generates low-light images by applying a random gamma nonlinearity to each channel of the image:

I_{\mathrm{low}} = A \times \left(\frac{I}{A}\right)^{\gamma},

where A represents the maximum pixel value of the image and the gamma value \gamma obeys the uniform distribution U(2, 3.5). The dataset contains 16,925 training images and 144 testing images. The second dataset is the MIT-Adobe FiveK dataset proposed in [24], which collects 5,000 photos taken by different photographers with SLR cameras, covering a wide range of scenes, subjects, and lighting conditions. The authors hired five experts to retouch all images with professional software, producing visually pleasing renditions. In this experiment, we use the images edited by one of the experts as the target images. To ensure a fair experiment, we randomly select 500 image pairs as test images and use the remaining 4,500 pairs as training images.
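For illustration, a minimal NumPy sketch of the gamma-based synthesis follows, assuming the reconstruction above (one gamma per channel drawn from U(2, 3.5), 8-bit images).

import numpy as np

def synthesize_low_light(img: np.ndarray, A: float = 255.0) -> np.ndarray:
    # img: (H, W, C) uint8 array. Draw one gamma per channel from U(2, 3.5).
    gammas = np.random.uniform(2.0, 3.5, size=img.shape[-1])
    # Apply the per-channel gamma nonlinearity I_low = A * (I / A) ** gamma.
    low = A * (img.astype(np.float64) / A) ** gammas
    return low.astype(np.uint8)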

The performance metrics adopted are the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM). The PSNR measures the overall enhancement effect by evaluating the pixel difference between the generated enhanced image and the normally illuminated image, in decibels (dB), and is defined as

\mathrm{PSNR} = 10 \log_{10}\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right),

where MAX represents the maximum pixel value and MSE is the mean square error. The SSIM weighs the structural differences between images and expresses the similarity of their details. It is defined as

\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)},

where \mu_x and \mu_y indicate the mean values, \sigma_x^2 and \sigma_y^2 the variances, and \sigma_{xy} the covariance of x and y, and C_1 and C_2 are constants that stabilize the division.

Both indices measure the difference in quality between the generated images and the reference images; the higher these indicators, the better the quality of the generated image.
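For illustration, both metrics can be computed with scikit-image as in the following sketch, assuming 8-bit RGB arrays (the channel_axis argument requires scikit-image 0.19 or later).

import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(enhanced: np.ndarray, reference: np.ndarray):
    # Both images are (H, W, 3) uint8 arrays; higher values are better.
    psnr = peak_signal_noise_ratio(reference, enhanced, data_range=255)
    ssim = structural_similarity(reference, enhanced, channel_axis=-1, data_range=255)
    return psnr, ssim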

3.2. Network Structure Analysis

As mentioned in the Methods section, we utilize the U-Net structure in our model. To verify the impact of this structure on model performance, we conduct comparative experiments in this section. Specifically, we compare the experimental results of the following network structures: (a) the fully connected network (FC), (b) the ResNet structure (ResNet), and (c) the U-Net structure (U-Net). Figure 4 shows the model performance under these three structures as the number of network layers changes: the yellow line represents the fully connected network, and the blue and red lines represent ResNet and U-Net, respectively. Among the three structures, U-Net achieves the best results on the SSIM index, which indicates that it performs better in feature extraction and fusion. In addition, performance does not improve monotonically with more layers; the optimal value is reached at an intermediate depth.

3.3. Comparison with Other Methods

To evaluate the performance of our proposed method, we compare it with several classical algorithms on the two datasets, including LIME [33], BIMEF [34], MBLLEN [32], U-Net, and ResNet. Table 1 shows the comparison results. From Table 1, we can see that our method achieves better results, which shows that it performs well in image enhancement.

Furthermore, we verify the effect of the loss function through ablation experiments by comparing the results with and without the MSE loss. Table 2 shows the ablation results. From Table 2, we can see that the discriminator loss provides strong guidance for the generator, and our proposed MSE loss greatly improves the quality of the generated images.

4. Conclusion

In this paper, we are committed to using artificial intelligence methods to enhance the digital media design experience. Specifically, we propose a method for low-light image enhancement that uses a generative adversarial network as its model framework. To better solve the problem, we design the following strategies in our proposed method. First, we preprocess the images into patches of a proper size. Second, we introduce the overall network structure of the GAN. Third, we design a multifeature extraction module with convolution kernels of different sizes to strengthen the model's feature extraction ability. Fourth, we propose a loss that combines the mean square error distance function with the adversarial error function, enabling the model to learn the distributional features of the images. Finally, we verify the effectiveness of the proposed method on two datasets, namely, PASCAL VOC and MIT-Adobe FiveK. The limitation of the proposed model lies in its computation and inference time: the power of deep learning models comes with a large cost of computation and deployment, and model compression and simplification will be a key research direction in the future.

Data Availability

All data used to support the findings of the study are included within this paper.

Conflicts of Interest

The author declares that there are no conflicts of interest.