Abstract

Blockchain technology is widely used in digital rights protection. Traditional digital rights protection schemes are inefficient and highly centralized, and the stored records are at risk of being modified. Owing to its own characteristics, however, a blockchain cannot store the complete original files of digital resources. This paper proposes a convolutional neural network algorithm based on the visual priority rule (CNNVP). The algorithm recognizes facial expressions in the original files of digital resources (here, short videos featuring faces), accurately extracts facial expression features, and assembles them into log files that represent the original files. The paper then proposes a short video copyright storage algorithm based on blockchain and facial expression recognition, which stores the log file in the blockchain. These methods improve the efficiency of short video copyright storage, reduce the degree of storage centralization, and eliminate the risk that copyright records are easily modified. Moreover, processing short videos with deep learning both protects the privacy of the stored certificate information and makes it feasible to store video information on the blockchain. Experiments show that the proposed algorithm is more efficient than the traditional copyright storage method, and it can provide technical support to media resource management departments.

1. Introduction

Short video is the future development direction of converged media. Its rapid growth brings a series of problems, of which copyright protection is the most serious. To ensure the healthy development of the converged media environment, legal and orderly protection of short video property rights is essential and must be treated as a strategic priority. Short video copyright management based on the “mother-child chain” technology of private chains and alliance chains relies on the distributed, weakly centralized storage of blockchain, which provides a feasible and effective approach to short video digital rights management. However, because the block storage capacity of a blockchain is low, storing short video resources directly on the chain easily causes data redundancy. Considering the practical significance of short video digital copyright, this paper proposes a short video copyright storage algorithm based on blockchain and facial expression recognition. It takes short videos as the storage object, with face videos as the main focus. This paper proposes a convolutional neural network algorithm based on visual priority (VP), called CNNVP, to recognize the facial expressions in a short video and extract their features accurately. The extracted facial expression feature group is then stored in the form of a “short video facial expression log”, which serves as the storage object and thereby achieves digital copyright storage.

This paper takes as its research core the short videos of up to 5 minutes that are widely spread on Internet new media, and uses the CNNVP algorithm to estimate facial movement, facial expression, and the posture of the eyes, mouth, and nose in each short video. The short video is the input object; the key point data, the “short video facial expression log”, is the output object, which is integrated and stored in the blockchain. Facial expression is an important tool for expressing oneself and recognizing other people’s emotions. Measurable and objective human emotional reactions include joy, anger, and sadness [1]. Facial expression recognition has long been regarded as an important research direction in affective computing. It is important and mature in application fields such as customer service [2] and driver monitoring [3], but there is little research on it in the field of short video copyright protection. The most important research point in facial expression recognition is the extraction of facial expression feature points, because the quality of feature extraction directly affects the accuracy of the subsequently trained classifier. In existing research, Hong proposed a template-based method, which uses template matching to detect pedestrian faces, marks the detected face area, and extracts it to form a series of face images [4]. Chao put forward a method based on gradient features, which obtains an illumination-insensitive reflectance face by eliminating illumination and extracts local face information completely by gradient features [5]. Xuexin proposed a method based on local texture, which starts from extracting local texture features and edge texture features and performs three-level texture feature extraction on the face image [6]. Shiming proposed a method based on augmented reality: to improve the accuracy and efficiency of the deformation algorithm, it selects interpolation control points and performs subregion interpolation based on a greedy algorithm and the distribution of facial muscle groups [7]. What these four methods have in common is that they all extract facial features manually, which is unstable and easily disturbed. In addition, research has gradually emerged on convolutional neural networks, which take the original image as the network input and avoid complex preprocessing. Hao et al. proposed a method based on a convolutional sparse self-coding (autoencoder) neural network, which has been shown to have good classification performance [8]. With the continuous development of deep learning, many networks with better performance and larger scale have appeared, such as GoogLeNet [9], ResNet [10], and DenseNet [11]. However, for the short video resources represented by TikTok, Kwai, and Toutiao (Today’s Headlines) [12], existing methods cannot detect and extract facial features and facial expressions accurately.

This paper proposes a convolutional neural network algorithm based on visual priority, designs a new network architecture, and introduces a visual priority module to increase the complexity of the network. The experimental results show that, on public data sets, the proposed algorithm outperforms other similar algorithms, has strong facial expression recognition ability, and has a lightweight overall architecture.

3. Algorithm Introduction

3.1. Visual Priority Rule

Among the signal processing mechanisms of the human brain [13], the visual priority rule is the most distinctive. Human beings instinctively obtain information by quickly browsing pictures or words, and while doing so they unconsciously “anchor” the parts that need attention; that is, they concentrate on the focal part. This mode effectively avoids redundant information while visually capturing more key information. In this paper, the visual priority rule is introduced into an existing convolutional neural network, which greatly helps the extraction of facial expression key point features [14]. First, the convolutional sparse self-coding neural network collects semantic subfeatures of different parts and hierarchical structures of the face and uses them to characterize complex objects. All the subfeatures are grouped and stored in independent level feature vectors. Second, in the convolutional sparse self-coding neural network based on the visual priority rule, the subfeatures in each group are processed in parallel. Finally, the visual priority rule module adjusts the importance of each subfeature by adjusting its weight. The structure of the convolutional sparse self-coding neural network based on the visual priority rule is shown in Figure 1.

The module works as follows. First, a feature exists at each spatial position of a feature group. Second, the original features in the feature group are average pooled to obtain a global feature. Third, the global feature is combined with the original features in the group to obtain an independent coefficient for each feature, and the coefficients are normalized. Fourth, parameters are introduced to scale the normalized values, which are then passed through the sigmoid activation function. Finally, the activated values are multiplied element-wise with the original features to obtain the enhanced feature vector.
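The steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name `visual_priority_enhance`, the parameters `gamma`, `beta`, and `eps`, and the choice of a dot product to combine the global feature with each local feature are all assumptions introduced here.

```python
import math

def visual_priority_enhance(group, gamma=1.0, beta=0.0, eps=1e-5):
    """Reweight the features of one feature group.

    group: list of feature vectors, one per spatial position.
    gamma, beta: hypothetical scale and shift parameters (learned in practice).
    """
    n, dim = len(group), len(group[0])
    # Average pooling over all spatial positions gives the global feature.
    global_feat = [sum(vec[d] for vec in group) / n for d in range(dim)]
    # One coefficient per position (assumed here: dot product with the global feature).
    coeffs = [sum(v * g for v, g in zip(vec, global_feat)) for vec in group]
    # Normalize the coefficients to zero mean and unit variance across positions.
    mean = sum(coeffs) / n
    var = sum((c - mean) ** 2 for c in coeffs) / n
    normed = [(c - mean) / math.sqrt(var + eps) for c in coeffs]
    # Scale and shift, then apply the sigmoid to get a weight in (0, 1).
    weights = [1.0 / (1.0 + math.exp(-(gamma * z + beta))) for z in normed]
    # Multiply each original feature vector by its weight.
    enhanced = [[w * v for v in vec] for w, vec in zip(weights, group)]
    return enhanced, weights
```

Positions whose features agree most with the global feature receive the largest weights, which is one way to read the "anchoring" behaviour described in this section.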

3.2. Facial Expression Recognition Mechanism of Convolutional Neural Network Based on Visual Priority Rule

First, the features of the input data are extracted. In the convolutional sparse self-coding neural network, the first layer generally extracts only lower-level features rather than high-level ones, and the complexity of the features increases with the number of convolution layers. Therefore, neural networks with multiple convolution layers can obtain more accurate features after iteration. Let core be the convolution kernel of the convolutional neural network, of a given size, bias its bias, fun the activation function, and input and output the input and output, each of a given size; then, the convolution operation formula proposed in this paper is shown in Equation (1).
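For reference, a standard single-channel convolution layer can be written with the symbols named above. The kernel size k × k, the stride of 1, and the indices i, j, m, n are assumptions introduced here for illustration and may differ from the exact form of Equation (1):

```latex
\[
\mathrm{output}(i,j) = \mathrm{fun}\!\left(\sum_{m=1}^{k}\sum_{n=1}^{k}
\mathrm{core}(m,n)\,\mathrm{input}(i+m-1,\,j+n-1) + \mathrm{bias}\right)
\]
```

Under these assumptions, each output value is the activation of a weighted sum over one k × k window of the input, shifted by the bias.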

To compress the input feature map, this paper uses the average pooling operation, which is a downsampling method. The compressed feature map is significantly smaller, and the average pooling operation itself has no overfitting problem. With the downsampling layer set accordingly, the definition of the pooling operation is shown in Equation (2).
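A minimal sketch of average pooling on a two-dimensional feature map is given below. The non-overlapping window and its default size of 2 are assumptions made for illustration; the text does not state the window size used in the paper.

```python
def average_pool(feature_map, size=2):
    """Non-overlapping average pooling over a 2-D list of numbers.

    size: assumed window edge length (hypothetical default of 2).
    """
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            # Collect the values of one size x size window and average them.
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            pooled_row.append(sum(window) / len(window))
        pooled.append(pooled_row)
    return pooled
```

With a window of 2, a 4 × 4 feature map is reduced to 2 × 2, that is, to a quarter of its original number of values.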

To increase the expressive ability and nonlinear mapping ability of the convolutional neural network, the ELU activation function is used in this paper. The saturation value of the activation function is not controlled. If count is set as a constant, the expression of the activation function is shown in Equation (3).
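For reference, the standard definition of ELU is given below. It assumes that the constant count plays the role of the usual ELU scale parameter (often written α) and uses x for the pre-activation input, a symbol introduced here:

```latex
\[
\mathrm{fun}(x) =
\begin{cases}
x, & x > 0,\\
\mathrm{count}\,\bigl(e^{x} - 1\bigr), & x \le 0.
\end{cases}
\]
```

In this form, positive inputs pass through unchanged, while negative inputs decay smoothly toward −count.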

To add identity mapping and enrich feature learning in the network, a residual identity block is added to the convolutional neural network algorithm based on the visual priority rule. To ensure the accurate extraction of subtle and key expression features, this block is also combined with the visual priority rule module. The structure of the convolutional neural network based on the visual priority rule and residual identity is shown in Figure 2, and the initial values of the operation parameters in the figure are listed in Table 1. Let input be the input value of the residual identity block, ELU() the activation function, and output the result after the convolution operation. The structure of the residual identity module is shown in Figure 3.
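The idea of a residual identity block can be sketched as follows. This is a simplified illustration on flat vectors: `transform` is a hypothetical stand-in for the convolution branch, and applying ELU after the addition follows the common residual-network pattern, which is an assumption here rather than a detail confirmed by the text.

```python
import math

def elu(x, count=1.0):
    """ELU activation; `count` is the constant scale for negative inputs."""
    return x if x > 0 else count * (math.exp(x) - 1.0)

def residual_identity_block(inp, transform, count=1.0):
    """Return ELU(transform(inp) + inp), element by element.

    transform: hypothetical callable standing in for the convolution branch;
    it must return a list of the same length as `inp`.
    """
    branch = transform(inp)
    # The identity shortcut adds the unchanged input to the branch output.
    return [elu(b + x, count) for b, x in zip(branch, inp)]
```

If the branch outputs zeros, the block reduces to ELU applied to the input, which shows how the shortcut preserves the original signal.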

4. Output Result Selection

In this paper, a short video copyright storage algorithm based on blockchain and facial expression recognition is proposed. The input of the algorithm is the images read frame by frame from the short video. After the convolutional neural network computation, the result is fed into a classifier with a cross-entropy loss function, which finally outputs the recognition values of seven expression labels, namely, fear, happiness, anger, disgust, sadness, surprise, and normal expression. An example of the calculation process is shown in Figure 4.
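The final classification step can be illustrated with a small sketch. It assumes a softmax over seven raw network scores, which is the usual companion of a cross-entropy loss; the helper names and the label order are chosen here for illustration only.

```python
import math

LABELS = ["fear", "happiness", "anger", "disgust", "sadness", "surprise", "normal"]

def softmax(scores):
    """Turn raw scores into probabilities (shifted by the maximum for stability)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_label):
    """Cross-entropy loss for a single frame with a one-hot target."""
    return -math.log(probs[LABELS.index(true_label)])

def classify(scores):
    """Return the most likely label and the recognition value of every label."""
    probs = softmax(scores)
    best = max(range(len(LABELS)), key=lambda i: probs[i])
    return LABELS[best], dict(zip(LABELS, probs))
```

The seven recognition values sum to 1, and the loss is small when the probability assigned to the true label is high.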

Through the write_json field, CNNVP writes the data of 30 key points of each face sample into a JSON file for storage. The running time of face key point recognition increases with the number of detections. Considering identification efficiency and practical application requirements, the identified key points should make the stored information relatively unique while keeping the amount of data from becoming too large. For blockchain storage, it is not necessary for the time being to collect data for the neck. Therefore, this paper uses 30 basic facial key points as the key information of each key character. Taking the simple portrait in Figure 5 as an example, the basic key points identified correspond to attributes such as lip thickness, lip width, nose thickness, earlobe thickness, earlobe width, auricle width, nose height, lower eyelid width, eye corner width, eyelash width, right eyebrow width, eyebrow spacing, right sideburns height, hair color, hair middle width, crown height, forehead color, left sideburns width, right sideburns width, left eyelid width, eyebrow width, eyebrow height, eyebrow tail height, single or double eyelid, crow’s-feet width, eyeball color, ear ornaments, nose width, philtrum depth, lip color, lower lip thickness, and chin width. The JSON file set obtained by the deep learning algorithm is shown in Figure 6, and the details of the JSON corresponding to one frame are shown in Figure 7.
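A per-frame log of this kind could be assembled as sketched below. The field names (`frame`, `expression`, `key_points`) and the helper functions are hypothetical; the actual JSON layout is the one shown in Figures 6 and 7.

```python
import json

def build_frame_record(frame_index, expression, key_points):
    """Assemble one log entry; all field names here are illustrative."""
    return {
        "frame": frame_index,
        "expression": expression,
        "key_points": key_points,  # e.g. {"lip_width": 0.42, "chin_width": 0.31}
    }

def serialize_log(records):
    """Serialize deterministically so the same log always yields the same text."""
    return json.dumps(records, ensure_ascii=False, sort_keys=True,
                      separators=(",", ":"))

def save_log(records, path):
    """Write the serialized log to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(serialize_log(records))
```

Sorting the keys and fixing the separators matters when the log is later hashed, because any change in byte layout would change the hash even if the data were identical.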

5. Data Storage and Architecture Model

A storage architecture based on blockchain features a decentralized credit model, automatic execution of smart contracts, high security and privacy, and time-series data that cannot be tampered with or forged. Therefore, it can be used not only in the financial field but also in research on key technologies for digital copyright protection. The security, traceability, and tamper resistance of blockchain are well suited to the digital asset storage process. A traditional copyright management system usually stores video files in one of two modes: (1) storing the video material file directly on the server and writing the corresponding storage path into the database, or (2) reading the video material as a binary byte stream and writing the video file into a database field. In the first mode, database resource consumption is small, but once the number of files increases, the efficiency of file processing declines exponentially. In addition, storing only the file path cannot fully guarantee the security of the data, and the video content can be modified. The second mode uses a binary stream to keep the data entirely in the database; however, frequent database read operations continuously degrade database performance.

In the blockchain storage architecture, after a file is uploaded to the server, the storage address pointer of the file is obtained. The video copyright protection system stores the hash value, the video index value, and the corresponding file pointer of the video. In this way, the video file can be completely represented in the block, and system computing resources are saved to a great extent. This paper proposes a short video copyright storage algorithm based on blockchain and facial expression recognition. To improve the usability of the system, a deep learning algorithm is used to extract the key frames, which serve as the main basis for video storage and rights confirmation. Key frames give fast access to file information without comparing the content of the original file, so the performance loss and physical resource consumption are negligible. For the copyright key data log file calculated by the CNNVP algorithm, the hash value is first extracted with the SHA256 algorithm. Ignoring hash collisions, the calculated hash value cannot be used to deduce the file content, and even a slight modification of the file content changes the calculated hash value noticeably. The hash value has a fixed length and acts as a unique identifier of the file, similar to a database primary key, forming a one-to-one correspondence with the file. Therefore, the hash value of the log file is stored in the blockchain as the unique identification of the file. If the length of a video file to be stored exceeds the specified duration (5 min), the system automatically divides it into a group of short videos, each no longer than 5 min. Each element in the group outputs its own log, and the total hash value calculated in the form of a Merkle tree is written into the block as the Merkle root value of the long video. The schematic diagram of calculating the Merkle root value of the elements in the group is shown in Figure 8.
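The hashing and Merkle root computation described above can be sketched with Python's standard `hashlib`. The helper names are introduced here for illustration, and the rule of duplicating the last hash when a level has an odd number of entries is an assumption; the paper's exact pairing rule is the one shown in Figure 8.

```python
import hashlib

SEGMENT_SECONDS = 5 * 60  # the 5 min limit stated in the text

def split_durations(total_seconds, limit=SEGMENT_SECONDS):
    """Return the durations of the segments a long video is divided into."""
    full, rest = divmod(total_seconds, limit)
    return [limit] * full + ([rest] if rest else [])

def sha256_hex(data: bytes) -> str:
    """SHA-256 digest of a log file's bytes, as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()

def merkle_root(leaf_hashes):
    """Hash pairs of hex digests upward until a single root remains."""
    if not leaf_hashes:
        raise ValueError("at least one leaf hash is required")
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumed rule: duplicate the last hash
        level = [
            sha256_hex(bytes.fromhex(level[i]) + bytes.fromhex(level[i + 1]))
            for i in range(0, len(level), 2)
        ]
    return level[0]
```

For example, a 12 min video yields segments of 300 s, 300 s, and 120 s; the three segment logs are hashed individually, and changing any one log changes the Merkle root that is written into the block.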

The copyright storage module first assigns a unique copyright storage number to the video file to be stored and stores it in the blockchain, with the number as the primary key and the file hash value as the unique identification of the file. Second, the key frames of the original video material are extracted and stored in the copyright key information database, again with the copyright storage number as the primary key. Third, the copyright data saved to the server is stored with the copyright storage number as the primary key. The original copyright material serves only as backup data and does not participate in any copyright confirmation operation.
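The three stores keyed by the copyright storage number can be pictured with a small in-memory sketch. The class name, the `CR-` number format, and the use of plain dictionaries in place of a blockchain, a database, and a server are all simplifying assumptions made for illustration.

```python
import hashlib
from itertools import count

class CopyrightStore:
    """Toy model of the storage module; dictionaries stand in for real back ends."""

    def __init__(self):
        self._numbers = count(1)
        self.chain = {}          # storage number -> hash of the log file
        self.key_info_db = {}    # storage number -> extracted key frame data
        self.server_backup = {}  # storage number -> original material (backup only)

    def register(self, video_bytes, log_bytes, key_frames):
        """Assign a storage number and file the three records under it."""
        number = f"CR-{next(self._numbers):06d}"  # hypothetical number format
        self.chain[number] = hashlib.sha256(log_bytes).hexdigest()
        self.key_info_db[number] = list(key_frames)
        self.server_backup[number] = video_bytes
        return number

    def confirm(self, number, candidate_log_bytes):
        """Confirm rights from the on-chain hash alone; the backup is never read."""
        digest = hashlib.sha256(candidate_log_bytes).hexdigest()
        return self.chain.get(number) == digest
```

Note that `confirm` touches only the on-chain hash, mirroring the statement that the original material does not take part in copyright confirmation.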

6. Structure Composition and Evaluation

This paper proposes a short video digital copyright storage architecture based on blockchain. When a suspected infringing video is detected through a hash collision, the key frames of the video digital copyright are used as the basis for judgment. That is, the architecture replaces on-chain storage of the short video works themselves with on-chain storage of key frame information; accordingly, the content stored in the block changes from the “hash value of multiple videos” to the hash values of multiple key frames in a video. In addition, the number of key frames is adjusted by setting a threshold, and different key frame selection criteria are set for different review standards, which improves the robustness and efficiency of the storage architecture. From bottom to top, the architecture consists of the material production layer, the consensus contract layer, the business layer, and the user layer. The architecture diagram is shown in Figure 9.

When the blockchain-based short video digital rights storage architecture is used to store the copyright of short video key information, the original material is first uploaded to the external client through the architecture. Second, the facial expression recognition mechanism of the convolutional neural network based on the visual priority rule extracts the key frame data, constructs a new block from the list of key data, and broadcasts it to the whole network; at the same time, the user’s personal information and short video copyright information are sent to and stored on the server. Then, the client automatically initiates an application for registration on the chain and sends it to node 1. Node 1 collects the registration application, adds the unconfirmed registration, builds the block, and broadcasts it to the whole network to request registration verification. After receiving the new block issued by node 1, nodes 2, 3, and 4 each calculate the hash value of the new block and broadcast it to the whole network to complete the preregistration. The four nodes receive the hash values of the new block broadcast by one another and check them. If the hash value calculated by a neighbor node equals the hash value the node itself calculated before broadcasting, the registration verification is deemed to have passed; otherwise, it fails. Finally, after completing the hash value verification of the new block, each node broadcasts its verification result to the other nodes. According to the Byzantine fault-tolerant algorithm, each normally working node should receive at least twice as many valid registration verification messages as attack messages. After receiving the registration verification information of the other nodes, each node stores the short video copyright registration confirmation and sends it to the client, which completes one registration process. The flow chart of the registration process is shown in Figure 10.
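The verification rule in this registration flow can be sketched as a simple quorum check. The sketch assumes the usual Byzantine fault-tolerance bound of n ≥ 3f + 1 nodes for f faulty nodes, so that a node accepts once it has 2f matching hashes from other nodes; the function names are introduced here and are not part of the described system.

```python
import hashlib

def max_faulty(n_nodes):
    """Largest number of faulty nodes tolerated when n >= 3f + 1."""
    return (n_nodes - 1) // 3

def block_hash(block_bytes):
    """Hash that every node computes independently for the new block."""
    return hashlib.sha256(block_bytes).hexdigest()

def verify_registration(own_hash, received_hashes, n_nodes=4):
    """Accept when at least 2f hashes from other nodes match the local one.

    Together with the node's own hash this gives the 2f + 1 agreement
    commonly required in PBFT-style protocols.
    """
    f = max_faulty(n_nodes)
    matching = sum(1 for h in received_hashes if h == own_hash)
    return matching >= 2 * f
```

With the four nodes described above, f = 1, so a node passes verification when at least two of the three hashes it receives equal its own.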

For the content stored in the blockchain as a nonrelational database, this paper uses deep learning under the blockchain-based decentralized storage structure and proposes a convolutional neural network facial expression recognition mechanism based on the visual priority rule. It offers a way to extract facial expression information from short video resources and to reasonably select key frames whose information is stored in the blockchain. The CNNVP mechanism proposed in this paper is effective in extracting facial expression recognition information. The key frame images and key information files are far smaller than the original video file: in the experiment, a 75 MB original short video file was selected, and the JSON file generated after key information extraction was about 1.2 MB. From the perspective of distributed storage architecture, the key information extraction strategy greatly improves the availability of the blockchain storage system.
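Using the two file sizes reported above, the reduction can be expressed as a ratio:

```latex
\[
\frac{1.2\ \text{MB}}{75\ \text{MB}} = 0.016 = 1.6\%,
\qquad
\frac{75\ \text{MB}}{1.2\ \text{MB}} = 62.5
\]
```

That is, in this experiment the extracted log occupies about 1.6% of the original file, a reduction by a factor of roughly 62.5.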

In this paper, original video files of about 0.5 MB, 10 MB, 30 MB, 50 MB, and 100 MB are selected from the material library. The traditional storage mode and the storage mode proposed in this paper are each used to store the 5 short videos 50 times. The experimental results show that the proposed storage method consumes far less time and resources than the traditional method. For the traditional method, time and resource consumption increase exponentially with the size of the video resource, whereas the proposed method is robust and stable and shows no obvious change. Therefore, for the storage of short video copyright resources, the proposed method adapts better to changes in file size. The specific experimental results and their trends are shown in the table below. The time consumption of the two storage modes is compared in Figure 11, and their memory consumption is compared in Figure 12.

This paper compares the traditional copyright storage method, the blockchain copyright storage method based on the PoW consensus mechanism, and the blockchain method based on PBFT and CNNVP proposed in this paper in terms of data storage convenience, data capacity, data atomicity (uniqueness), representativeness of stored data, data privacy and security, system operation flexibility, and data storage flexibility. The results show that the proposed method is efficient. The detailed comparison dimensions and results are shown in Figure 13.

Based on the experimental results, this paper proposes a short video copyright storage algorithm based on blockchain and facial expression recognition. Because the blockchain structure is chained, it has strong traceability, so all storage records can be traced once a rights dispute occurs. Moreover, the inherent characteristics of the blockchain and the low probability of hash collisions make the stored copyright data extremely difficult to tamper with. Its decentralized mechanism ensures that the data on the chain can be jointly managed by multiple organizations and operated on multiple platforms.

7. Conclusions

Key technologies for blockchain-based digital rights protection increase the system burden while ensuring a decentralized and trusted storage mechanism. However, deep learning can compress the description of digital assets and reduce this burden, which greatly broadens the application scenarios of blockchain in digital rights protection. This paper proposes a short video copyright storage algorithm based on blockchain and facial expression recognition, which effectively addresses the copyright storage of short videos and similar resources in a blockchain architecture. The current research has the following limitations: the computational load of the system is too large, and we have not yet proposed a better deep learning algorithm to optimize it. In future work, we will try to process long videos and break through the bottleneck of handling only short videos. We will then study further deep learning algorithms to improve the computational efficiency of our algorithm.

Data Availability

All the data used in the experiment come from free and open datasets, namely the “PubFig: Public Figures Face Database” and the “Large-scale CelebFaces Attributes (CelebA) Dataset”, which are available at “http://www.cs.columbia.edu/CAVE/databases/pubfig/” and “http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html.” Interested readers can visit the websites directly and download the data.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Authors’ Contributions

Yang Yang wrote the paper, collected the data, and designed the experiment. Dingguo Yu participated in the experiment and final proofreading.

Acknowledgments

This paper is supported by the Key R & D Project of Zhejiang Province, “Research on Key Technologies of all media publishing - Research on Key Technologies of all media press and publication under multi-screen integration environment” (Project No. 2019C03138).