AI and Edge Computing-Driven Technologies for Knowledge Defined Networking (Special Issue)
High-Performance Server-Based Live Streaming Transmission Optimization for Sports Events in Smart Cities
Smart cities allow cities to run more efficiently and have been adopted by many cities. Building a smart city generates a large amount of data, and live sports events have come to be regarded as an integral part of smart cities. As the quality of life improves, people expect a better viewing experience for sports events. To this end, this paper proposes a live streaming transmission optimization method based on a high-performance server, called HPTO, which comprises two main modules: high-performance server optimization and transmission optimization. For server optimization, this paper devises a distributed storage strategy that avoids internal disk fragments and improves the writing efficiency of sports videos. For transmission optimization, this paper devises a deep-learning-based video compression strategy that saves server storage space and accelerates the transmission of sports videos. Simulation experiments are carried out in PyCharm. The experimental results show that HPTO achieves higher storage efficiency, shorter transmission time, and a lower packet loss rate than the benchmarks, which indicates that the two proposed optimization strategies (server optimization and transmission optimization) are effective.
1. Introduction
Smart cities [1–3] apply modern network infrastructures and information technologies to improve citizens’ quality of life and make cities run more efficiently. Recently, a number of cities around the world have committed to building smart cities, including Chicago, Milton Keynes, Busan, and Shanghai. With the continuous improvement of living standards, citizens want to watch live sports with a satisfactory quality of experience. However, this behavior generates a large amount of data, which is expected to reach the zettabyte (ZB) level. In fact, the live streaming of sports events is usually highly concurrent [4, 5], which places very high requirements on the storage system and the transmission path. To guarantee that smart cities work in a sustained and steady manner while sports events are being watched, two kinds of optimization have been adopted: server optimization and transmission optimization.
Server optimization refers to improving the storage system. Current storage systems usually rely on file system management or on a raw disk design [6, 7]. A file-management-based storage system produces many internal disk fragments because the magnetic head moves frequently. Besides, the file system must maintain index and attribute information, and handling this redundant information hinders the storage of sports video data. In contrast, in a raw-disk-based storage system the application programs perform reading and writing operations directly, which greatly improves I/O efficiency. However, because the live streaming of sports events is concurrent, the data storage spaces become relatively scattered, which again produces internal disk fragments. In particular, when a malfunction happens, the storage system exhibits a high failure probability; that is to say, the raw-disk-based storage system is not highly reliable. Different from the above, this paper optimizes the server's storage system based on a distributed storage method.
Transmission optimization consists of two kinds of methods. One is the selection of the transmission path, that is, routing. However, smart cities usually depend on the backbone network, and the transmission path is usually assigned by the service provider in advance. Thus, for the live streaming of sports events, optimizing the transmission path brings no obvious improvement in transmission performance. The other is content compression; that is, the sports video is compressed to a smaller size while its attributes remain unchanged. Video compression has two advantages. On the one hand, the sports video can be transmitted more efficiently. On the other hand, storage space on the high-performance server is saved, which reduces its load so that it works smoothly. The principle of video compression is to encode the original video through an encoder [9, 10], which has two functions: (i) reading the video signals and (ii) discerning and counting the residual signals. In this paper, deep learning is used to compress the sports video.
For smart cities, this paper proposes a high-performance-server-based live streaming transmission optimization method, named HPTO. The main contributions of HPTO are summarized in the following three aspects: (i) a distributed storage strategy is devised to avoid the generation of internal disk fragments and improve the writing efficiency, that is, server optimization; (ii) a deep learning method is used to compress the sports video to save storage space and accelerate the transmission; and (iii) extensive experiments are conducted, including neural network verification, video compression verification, and live streaming optimization verification.
The rest of the paper is organized as follows. Section 2 introduces the server optimization. Section 3 presents the transmission optimization. Section 4 reports the experiment results. This paper is concluded in Section 5.
2. High-Performance Server Optimization
2.1. Storage Structure of Sports Video
To avoid the internal disk fragments that result from the concurrent and stochastic writing of live streams, this paper handles the sports video by designing a high-speed buffer structure and a disk logic storage structure. In addition, this paper uses a buffer mapping policy, which belongs to the bcache-based hybrid storage technology, to connect the high-speed buffer with the disk logic storage. To be specific, a solid state disk (SSD) performs dedicated high-speed caching for the written live streams, where the group of pictures (GOP) is used as the basic unit and a fixed buffer segment is allocated to each channel’s live stream. The whole storage structure of sports video is shown in Figure 1. The high-speed buffer structure consists of a superblock, a buffer bitmap, and buffer segments. Among them, the superblock is a field that records parameters including creation time, buffer size, buffer number, and allocation condition. In particular, the file that the superblock corresponds to is assigned “0xEF53”, which is completed when the formatting command is started. The buffer bitmap field describes the usage of the subsequent buffer segments. The remaining fields are buffer segments, which serve as the basic unit for allocating and recycling buffer space for the live streaming of sports video. In this paper, the size of a buffer segment is set to 16 MB. In particular, when the remaining space cannot hold a complete buffer segment, it is reserved. Besides, when the last GOP of a live stream has been completely written, the corresponding buffer space is recycled. The disk logic storage structure consists of a superblock, a data block bitmap, a primary index, a secondary index, and buffer segments. As in the high-speed buffer structure, the file that the superblock corresponds to is assigned “0xEF53”, which is completed when the formatting command is started.
The primary index covers several parameters, including the live stream ID, starting and ending times, bit rate type, and GOP. The secondary index covers the detailed information of each GOP. The buffer mapping strategy allows multiple hard disk drives (HDDs) to use the same SSD as the caching disk. To be specific, the “echo” statement is used to attach the “cset.uuid” of the caching disk from the high-speed buffer structure to the disk logic storage structure; at the same time, the writing mode is set to “writeback.”
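The allocation and recycling rules of the high-speed buffer can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the class name HighSpeedBuffer, its fields, and the channel identifiers are hypothetical, and the on-disk superblock layout is not modeled. Only the 16 MB segment size, the reservation of leftover space, and the recycling of a segment after the last GOP come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

SEGMENT_SIZE = 16 * 1024 * 1024  # 16 MB buffer segment, as stated in the text


@dataclass
class HighSpeedBuffer:
    """Toy model of the buffer bitmap and buffer segments (hypothetical names)."""
    capacity: int                                         # usable bytes after the superblock (assumed)
    bitmap: List[bool] = field(default_factory=list)      # True = segment in use
    owner: Dict[int, str] = field(default_factory=dict)   # segment index -> channel id

    def __post_init__(self) -> None:
        # Only complete segments are tracked; leftover space stays reserved.
        self.bitmap = [False] * (self.capacity // SEGMENT_SIZE)

    @property
    def reserved_bytes(self) -> int:
        """Space too small to hold a complete buffer segment."""
        return self.capacity % SEGMENT_SIZE

    def allocate(self, channel: str) -> Optional[int]:
        """Give the first free segment to a channel; return None if all are in use."""
        for idx, used in enumerate(self.bitmap):
            if not used:
                self.bitmap[idx] = True
                self.owner[idx] = channel
                return idx
        return None

    def recycle(self, idx: int) -> None:
        """Release a segment once the last GOP of its stream has been written."""
        self.bitmap[idx] = False
        self.owner.pop(idx, None)


# 40 MB of buffer space holds two 16 MB segments and leaves 8 MB reserved.
buf = HighSpeedBuffer(capacity=40 * 1024 * 1024)
first = buf.allocate("channel-1")
second = buf.allocate("channel-2")
third = buf.allocate("channel-3")   # no complete segment is left
buf.recycle(first)                  # channel-1 finished writing its last GOP
```

After the recycle call, segment 0 is free again and can be handed to the next channel that requests storage.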
2.2. Storage Management
A raw disk usually refers to a special character device that has not been formatted and cannot be managed by the Unix/Linux file system; thus its space management is very inflexible, and on-demand enlargement is very difficult to satisfy. Given this, this paper leverages logical volumes (LVs) to enlarge the storage of the high-performance server. The whole storage management of the high-performance server is shown in Figure 2.
To be specific, when the unallocated space of the LV group (LVG) can satisfy the enlargement requirement, space of the required size is carved out of it. Then, the “rawdevice” installed on it is bound to “/dev/raw/raw.” Otherwise, when the enlargement requirement cannot be satisfied, an additional physical disk is added, after which the same installation and binding operations are performed.
If the enlargement requirement still cannot be satisfied by the above method, the current file system scales out by adding an external high-performance server. In particular, the added high-performance server reports its status to the state manager through the heartbeat protocol.
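The three-level enlargement policy described in this subsection can be summarized as a small decision function. The sketch below is illustrative only: the function name plan_enlargement, its parameters, and the step descriptions are hypothetical, sizes are given in GB for readability, and no real LVM or raw-device commands are issued.

```python
from typing import List


def plan_enlargement(requested_gb: int, lvg_free_gb: int,
                     spare_disks_gb: List[int]) -> List[str]:
    """Return the ordered steps for growing storage by `requested_gb`.

    Policy (following the text): use free LVG space first, add physical
    disks to the LVG if that is not enough, and scale out to an external
    server only when local capacity cannot satisfy the request.
    """
    if lvg_free_gb + sum(spare_disks_gb) < requested_gb:
        # Local capacity is exhausted even with every spare disk added.
        return ["add external server and report its status via heartbeat"]

    steps: List[str] = []
    free = lvg_free_gb
    for size in spare_disks_gb:
        if free >= requested_gb:
            break
        free += size
        steps.append(f"add {size} GB physical disk to LVG")
    steps.append(f"divide {requested_gb} GB from LVG as a logical volume")
    steps.append("bind raw device to the new logical volume")
    return steps


print(plan_enlargement(10, 20, []))       # LVG alone is enough
print(plan_enlargement(30, 20, [15]))     # one extra disk is needed
print(plan_enlargement(100, 20, [15]))    # must scale out
```

Keeping the policy in a pure function makes the order of fallbacks explicit and easy to test before any real volume management is wired in.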
2.3. Writing Method of Sports Video
In this paper, sports video is written by a single thread. At first, the live streams of sports video are ordered by the time at which storage was requested. Then, they are scheduled concurrently via the buffers, and single-thread coding is used to handle these concurrent live streams. Single-thread coding stores data consecutively via hugepages, which avoids the time spent waiting for addressing under multithreaded concurrency; therefore, the transmission efficiency is improved and the storage time is decreased. In particular, each buffer stores only one channel’s live stream, which guarantees that the physical storage space is consecutive.
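The ordering and buffering rules of the writing method can be sketched as follows. This is a simplified illustration, not the paper's implementation: the names GopWrite and write_single_threaded are hypothetical, buffers are modeled as in-memory byte arrays, and hugepage-backed storage is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GopWrite:
    request_time: float   # time at which storage of this GOP was requested
    channel: str          # live stream channel the GOP belongs to
    payload: bytes        # encoded GOP data


def write_single_threaded(requests: List[GopWrite]) -> Dict[str, bytearray]:
    """Handle all write requests on one thread, in request-time order.

    Each channel owns exactly one buffer, so the GOPs of a stream are
    appended back to back and stay contiguous.
    """
    buffers: Dict[str, bytearray] = {}
    for req in sorted(requests, key=lambda r: r.request_time):
        buffers.setdefault(req.channel, bytearray()).extend(req.payload)
    return buffers


# Requests from two channels arrive interleaved and out of order.
incoming = [
    GopWrite(2.0, "a", b"A2"),
    GopWrite(1.5, "b", b"B1"),
    GopWrite(1.0, "a", b"A1"),
]
result = write_single_threaded(incoming)
```

Because a single loop performs every append, no locking or address arbitration between writer threads is needed in this model.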
3. Transmission Optimization Based on Video Compression
In smart cities, a higher video compression ratio is needed because the transmission of live streams is constrained by limited transmission rate and storage space. In particular, the live streaming of sports video involves many images; in this sense, video compression can also be regarded as image compression. Current video image compression mainly compresses the intraframe data, that is, it stores the previous video image information with less data.
This paper proposes a video segment compression method to realize the transmission optimization, where two key frames are extracted and used to store one video segment. On the decoder side, frame interpolation is used to recover the original video segment data. In particular, a three-dimensional convolutional neural network (3DCNN), a deep learning method [14, 15], is used to classify the sports video segments by analyzing the temporal and spatial information of video frame sequences; there are three kinds of video segments, that is, radical change, gradual change, and ordinary change. 3DCNN has attracted attention for video information processing because it adds the time dimension to the spatial dimensions and thereby captures the contextual information between different frames in the sports video.
To guarantee efficient and accurate classification, a sufficiently large dataset is necessary; otherwise, it is considerably difficult or even impossible to train a high-precision 3DCNN. Given this, this paper presents a dataset with 249621 sports video segments, including 135202 radical change segments, 103146 gradual change segments, and 11263 ordinary change segments. Each sports video segment consists of a sequence of 16 images, and two adjacent sports video segments overlap by 8 frames in order to prevent missed detections.
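The segmentation rule (16 frames per segment, 8 frames shared by adjacent segments) amounts to a sliding window with a stride of 8. A minimal sketch is given below; the function name segment_bounds is hypothetical, and dropping trailing frames that do not fill a complete segment is an assumption, since the text does not say how such frames are handled.

```python
from typing import List, Tuple


def segment_bounds(num_frames: int, length: int = 16,
                   overlap: int = 8) -> List[Tuple[int, int]]:
    """Return half-open [start, end) frame ranges of overlapping segments."""
    if not 0 <= overlap < length:
        raise ValueError("overlap must satisfy 0 <= overlap < length")
    stride = length - overlap  # 16 - 8 = 8 new frames per segment
    # Assumption: trailing frames that cannot fill a full segment are dropped.
    return [(start, start + length)
            for start in range(0, num_frames - length + 1, stride)]


print(segment_bounds(40))
# → [(0, 16), (8, 24), (16, 32), (24, 40)]
```

With this stride, every frame except those in the first and last 8 positions appears in two segments, so a change that straddles a segment boundary is still seen whole by one of the two.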
3.2. 3DCNN-Based Video Compression
The 3DCNN used has five convolution layers, three pooling layers, and three fully connected layers. Each convolution layer uses a rectified linear unit (ReLU) as the activation function together with the local response normalization (LRN) operation. The first two fully connected layers have 2048 neurons each, and the last fully connected layer has three neurons, each of which corresponds to one kind of sports video segment. The detailed information of the 3DCNN in this paper is shown in Table 1.
Furthermore, the head frame and tail frame of an ordinary change video segment are used to represent the whole sports video segment, and the whole length cannot exceed 32 frames.
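The key-frame idea can be illustrated with a toy encoder and decoder. The sketch below is an assumption-laden example rather than the paper's codec: the text does not specify which interpolation the decoder uses, so plain linear interpolation is swapped in here, frames are modeled as flat lists of pixel intensities, and the names compress_segment and recover_segment are hypothetical. Only the head/tail key frames and the 32-frame limit come from the text.

```python
from typing import List, Tuple

Frame = List[float]  # a frame flattened to pixel intensities (toy representation)

MAX_SEGMENT_FRAMES = 32  # upper bound on segment length stated in the text


def compress_segment(frames: List[Frame]) -> Tuple[Frame, Frame, int]:
    """Keep only the head frame, the tail frame, and the segment length."""
    if not 2 <= len(frames) <= MAX_SEGMENT_FRAMES:
        raise ValueError("segment must contain between 2 and 32 frames")
    return frames[0], frames[-1], len(frames)


def recover_segment(head: Frame, tail: Frame, length: int) -> List[Frame]:
    """Rebuild `length` frames by linear interpolation (assumed method)."""
    recovered = []
    for i in range(length):
        t = i / (length - 1)  # 0.0 at the head frame, 1.0 at the tail frame
        recovered.append([(1 - t) * h + t * q for h, q in zip(head, tail)])
    return recovered


# A slowly changing three-frame segment with two pixels per frame.
original = [[0.0, 10.0], [1.0, 12.0], [2.0, 14.0]]
head, tail, n = compress_segment(original)
restored = recover_segment(head, tail, n)
```

For a segment whose pixels change linearly, as in this example, the restored frames equal the originals; for real ordinary change segments the recovery is only approximate.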
4. Experiment Results
4.1. Experiment Method
In smart cities, the proposed HPTO is implemented on an Intel(R) Core(TM) i5-8500 CPU @ 3.00 GHz with 8.00 GB of RAM, running the Ubuntu 16.02 64-bit operating system. The programming language is Python, running on PyCharm. The verification of HPTO includes three aspects. First, the 3DCNN-based deep learning method of the transmission optimization part is verified; two CNN benchmarks [16, 17] are used for comparison in terms of recall ratio, precision ratio, and four F values. Then, the video compression method based on 3DCNN is verified; one video compression benchmark is used for comparison in terms of bit rate. Finally, the whole live streaming transmission optimization scheme, including high-performance server optimization and transmission optimization, is verified; two live streaming optimization benchmarks [19, 20] are used for comparison in terms of storage efficiency, transmission time, and packet loss rate. The five above-mentioned benchmarks are denoted by KumarB, LiuB, RaghaB, HeB, and LiB, respectively, and they are introduced as follows:
(i) Kumar et al. presented a comparative study of CNNs for real-time image classification and object recognition, showing that CNNs are capable of producing optimized video image classification and object recognition.
(ii) Liu et al. proposed a memristor-based 3DCNN to recognize and classify human behaviors in video, covering 6 main actions.
(iii) Raghavendra et al. devised different image compression techniques without any data loss.
(iv) He et al. proposed an uncoded multiuser video streaming system that exploits the diversities of video contents and channel conditions of multiple users.
(v) Li et al. presented a joint optimization method for conversational HD video service, taking into account the linkage between video coding and transmission.
Furthermore, the seven above-mentioned performance evaluation metrics are introduced as follows:
(i) The recall ratio is defined as $R = TP / (TP + FN)$, where $TP$ indicates true positives and $FN$ indicates false negatives.
(ii) The precision ratio is defined as $P = TP / (TP + FP)$, where $FP$ indicates false positives.
(iii) The $F$ value is defined as $F_\beta = (1 + \beta^2) P R / (\beta^2 P + R)$. When $\beta = 1$, the $F_1$ value is obtained.
(iv) The bit rate is defined as the transmitted number of bits per second (kbps).
(v) The storage efficiency is defined as the utilization rate of the high-performance server’s storage space.
(vi) The transmission time is defined as the time difference between the time point when the first video segment of the live stream is sent from the high-performance server side and that when the last video segment of the live stream arrives at the decoder side.
(vii) The packet loss rate is defined as the ratio of the number of lost video segments to the total number of video segments.
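Several of these metrics reduce to short formulas, which the following sketch computes. It assumes that the F value is the standard $F_\beta$ measure (the original symbols did not survive extraction), and the function names and the sample counts are hypothetical illustrations, not results from the paper.

```python
def recall(tp: int, fn: int) -> float:
    """Recall ratio: true positives over all actual positives."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Precision ratio: true positives over all predicted positives."""
    return tp / (tp + fp)


def f_beta(p: float, r: float, beta: float = 1.0) -> float:
    """Standard F-beta measure (assumed form of the paper's F value)."""
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


def packet_loss_rate(lost_segments: int, total_segments: int) -> float:
    """Ratio of lost video segments to all video segments sent."""
    return lost_segments / total_segments


def transmission_time(first_sent: float, last_arrived: float) -> float:
    """Time from sending the first segment to receiving the last one."""
    return last_arrived - first_sent


# Hypothetical counts for one class of video segment.
r = recall(tp=90, fn=10)        # 0.9
p = precision(tp=90, fp=30)     # 0.75
f1 = f_beta(p, r, beta=1.0)     # harmonic mean of p and r
```

With beta set to 1 the measure weighs precision and recall equally; larger beta values shift the weight toward recall.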
4.2. 3DCNN Verification
The experiment results on recall ratio over six simulation runs are shown in Table 2. We can observe that the proposed HPTO has the best recall ratio, followed by LiuB and KumarB. In particular, the recall ratio of HPTO reaches about 99%, which is about 2% and 4.5% higher than those of LiuB and KumarB, respectively. HPTO and LiuB have a higher recall ratio than KumarB because they use a 3DCNN structure to recognize and classify the live streams of sports video. Here, we emphasize that 3DCNN adds the time dimension to the spatial dimensions to capture the contextual frame information in the sports video, and it performs better than traditional CNN structures. As for HPTO and LiuB, the former performs deep training based on Table 1, while the latter makes no further improvement to the 3DCNN structure. Therefore, HPTO has a higher recall ratio than LiuB.
The experiment results on precision ratio over six simulation runs are shown in Table 3. We also find that the proposed HPTO has the highest precision ratio, followed by LiuB and KumarB. The reasons are similar to those given above for the recall ratio.
Based on Tables 2 and 3, the average experiment results on the four $F$ values are shown in Table 4. We can observe that, as the parameter $\beta$ of the $F$ value increases, the corresponding $F$ value becomes smaller and smaller. As a matter of fact, the evaluation based on $F_1$ has the highest reference value. In particular, a larger $F$ value means that the corresponding strategy performs better. As can be seen from Table 4, the proposed HPTO has the largest $F$ value, which means that HPTO has the best classification effect on the live streaming of sports video.
4.3. Video Compression Verification
This section considers six kinds of sports videos (NBA, CBA, German Bundesliga, Serie A, World Cup, and AOTC) and two kinds of encoding structures (H.264/AVC and HEVC). The average experiment results on bit rate are shown in Table 5, and the degrees of improvement in bit rate are shown in Table 6. We can observe that the proposed HPTO has an obvious advantage in terms of bit rate. In particular, the improvement in bit rate over the benchmark reaches 65.52% (Serie A) with the H.264/AVC encoding structure and 85.29% (World Cup) with the HEVC encoding structure. This further indicates that the proposed video compression optimization scheme is efficient.
4.4. Live Streaming Optimization Verification
The experiment results on storage efficiency over six simulation runs are shown in Table 7. We can observe that HPTO has the highest storage efficiency, followed by HeB and LiB. In particular, the storage efficiency of HPTO reaches about 86%, whereas those of HeB and LiB reach only about 79% and 72%, respectively. Different from the two benchmarks, HPTO optimizes live streaming in two respects, that is, server optimization and transmission optimization. To be specific, the server optimization improves the storage structure, the storage management method, and the writing method. In addition, 3DCNN is employed to optimize the video compression in the transmission optimization part. The 3DCNN structure and the video compression scheme have already shown good experiment results, which can be found in Section 4.2 and Section 4.3.
The experiment results on transmission time over six simulation runs are shown in Table 8. We can find that the proposed HPTO has the smallest transmission time, followed by LiB and HeB. In particular, the transmission time of HeB is more than twice that of HPTO; this is because HPTO reduces the storage time and saves more storage space to accelerate the transmission of live streaming. Different from HeB, LiB presents a joint optimization method for conversational HD video service that considers the linkage between video coding and transmission; thus it has a smaller transmission time than HeB.
The experiment results on packet loss rate over six simulation runs are shown in Table 9. We can find that the proposed HPTO has the lowest packet loss rate, followed by HeB and LiB, which means that HPTO offers the best viewing experience for sports video. In particular, the packet loss rate of HPTO is almost 0%, which indicates that server optimization and transmission optimization together can guarantee that nearly all live streams of sports video arrive at the user end.
5. Conclusions
In smart cities, the live streaming optimization of sports video is very important because it has a direct influence on the viewing quality. In this paper, a live streaming transmission optimization method based on server optimization and transmission optimization is proposed. The server optimization includes storage structure optimization, storage management optimization, and writing method optimization. The transmission optimization mainly depends on video compression based on the 3DCNN structure.
The experiments include 3DCNN verification, video compression verification, and live streaming optimization verification, evaluated with seven metrics, that is, recall ratio, precision ratio, four F values, bit rate, storage efficiency, transmission time, and packet loss rate. All experiment results show that the proposed HPTO performs better than the benchmarks in optimizing the live streaming of sports video.
In the future, we will test more metrics and more applications. In addition, we plan to build a real demonstration of the proposed live streaming transmission optimization mechanism by connecting several high-performance servers.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
References
K. Takayama, T. Fujimoto, R. Endo et al., “Neighbor selection based on transmission bandwidth on P2P live streaming service,” in Proceedings of 26th International Conference on Advanced Information Networking and Applications Workshops (WAINA’12), pp. 105–110, Fukuoka-shi, Japan, March 2012.
M. Liu, P. Hong, J. Li et al., “LM-MCM: a new layered multicast transmission protocol for live streaming,” in Proceedings of 14th IEEE International Conference on Networks, pp. 1–6, Singapore, September 2007.
R. Salunkhe, A. D. Kadam, N. Jayakumar et al., “In search of a scalable file system state-of-the-art file systems review and map view of new Scalable File system,” in Proceedings of 2016 International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT), pp. 364–371, Chennai, India, March 2016.
S. Kim, J. Han, H. Eom et al., “Improving I/O performance in distributed file systems for flash-based SSDs by access pattern reshaping,” Future Generation Computer Systems, vol. 115, pp. 365–373, 2020.
R. J. Clarke, “Image and video compression: a survey,” International Journal of Imaging Systems and Technology, vol. 10, pp. 1–13, 1999.
S. Kim, S. Lee, S. Kim et al., “Environmental effects of the technology transformation from hard-disk to solid-state drives from resource depletion and toxicity management perspectives,” Integrated Environmental Assessment and Management, vol. 15, pp. 1–7, 2019.
D. R. Bhojani, V. J. Dwivedi, and R. M. Thanki, Hybrid Video Compression Standard, Springer Briefs in Applied Sciences and Technology, Berlin, Germany, 2020.
K. K. Kumar, M. D. Kumar, C. Samsonu et al., “Role of convolutional neural networks for any real time image classification, recognition and analysis,” Materials Today: Proceedings, 2021.
J. Liu, Z. Li, Y. Tang et al., “3D Convolutional Neural Network based on memristor for video recognition,” Pattern Recognition Letters, vol. 130, pp. 116–124, 2018.
C. He, Y. Hu, Y. Chen, X. Fan, H. Li, and B. Zeng, “MUcast: linear uncoded multiuser video streaming with channel assignment and power allocation optimization,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 30, no. 4, pp. 1136–1146, 2020.