Abstract

Big data technology can efficiently extract valid corridor monitoring information embedded in unstructured data and process video surveillance data quickly, which makes it an effective means of monitoring abnormal behavior in integrated pipe corridors. This study first divides each long surveillance video into multiple segments and then extracts features from each segment with CenterNet. Inspired by the concept of the area under the curve (AUC), a multi-instance AUC (MIAUC) is built into the loss function, which encourages higher scores for anomalous segments than for normal segments. In addition, anomaly detection is formulated as a regression problem, so that the method, trained on weakly labeled data, considers both normal and anomalous behavior. To alleviate the difficulty of obtaining accurate segment-level labels, multiple instance learning (MIL) is used to learn the anomaly model and to detect segment-level anomalies during testing. The resulting system supports effective 24/7 monitoring, video storage, intrusion detection, and emergency linkage.

1. Introduction

Video surveillance has shown several clear trends in recent years: the historic shift from analogue to digital video surveillance, more open standards, the significant impact of HD video on the security sector, and the growing prominence and rapid development of intelligent analysis. These changes are driving video surveillance systems towards a more mature stage. A network video surveillance system organically combines computer, multimedia, network, and monitoring technologies and connects the monitoring system to the computer network, so that two formerly independent systems move towards integration, a breakthrough in both concept and approach. A large number of cameras are installed inside the corridor, forming a surveillance network that generates a huge amount of video data every day. Manual monitoring not only requires substantial human, material, and financial resources but is also susceptible to subjective human factors that can reduce its effectiveness. Therefore, there is an urgent need to use big data technology to efficiently extract the useful monitoring information contained in unstructured data and to process video surveillance data quickly and effectively. Doing so also makes it possible to monitor a large area over long periods of time.

Currently, artificial intelligence continues to be applied in the field of security, and intelligent supervision systems, as the core of the security foundation, have unprecedented application prospects. Intelligent video surveillance has become a research hotspot in the fields of computer vision and security surveillance. A convolutional neural network (CNN) is a feedforward neural network [1, 2] whose neuron connection pattern is inspired by the way the animal visual cortex detects optical signals. The modern CNN structure, a multilayer artificial neural network, originates with LeNet-5. For image sequences, 3-D convolution is generally used to extract both the spatial and the temporal characteristics of the data, which makes CNNs well suited to processing sequence information; 3-D convolution has achieved remarkable results in areas such as human action recognition.

Since the MIL problem involves assigning a single category label to a bag of instances, it can be solved by learning a model that predicts the label of that bag. Another challenge is to discover the key instance that determines the bag label. To address the main task of bag classification, various approaches have been proposed, such as combining instance-level classifiers, measuring the similarity between bags, and using rank-invariant aggregation operators in neural networks. More recently, combining MIL with deep learning has greatly improved accuracy; for example, deep learning-based autoencoders are used to learn models of normal behavior, and the reconstruction loss is used to detect anomalies.

In this study, this line of research is followed as it allows the application of flexible transformation classes that can be trained end-to-end by backpropagation. In contrast to existing methods, anomaly detection is formulated as a regression problem. An approach based on weakly labeled training data would consider both normal and abnormal behavior for anomaly detection. To alleviate the difficulty of obtaining accurate segment-level labels, MIL is utilized to learn the anomaly model and detect video segment-level anomalies during testing.

2. Literature Review

In the development of the integrated pipe corridor itself, Trckova et al. constructed a comprehensive analysis system of the integrated pipe corridor in terms of economic and social benefits, safety, monitoring technology, repair and maintenance, visualization and development, equipment and facilities, etc., to comprehensively analyze the important role of the integrated pipe corridor for urban development [3]. The study of Klepikov et al. concluded that a hygienic, comfortable, and safe integrated pipe corridor is the goal of modern society. In the process of construction of integrated pipe corridors, all factors such as human health, psychology, and safety should be taken into account, and on the basis of ensuring these factors, the utilization rate of integrated pipe corridors should be improved [4, 5].

In terms of risk management, Hiromitsu et al. conducted long-term research on safety and disasters in integrated corridors; based on a large amount of collected data, they analyzed and identified the hazards and risk categories in integrated corridors and proposed preventive measures and methods for different hazards and risks [6]. Namin et al. and Bhalla et al. focused their research on underground space engineering structures and proposed technical measures to reduce the risk of engineering structures by using real-time monitoring during the construction and operation of underground spaces [7, 8]. Canto-Perello and Curiel-Esparza focused on the risk assessment of personnel once they have entered the tunnel, analyzing the potential risks and arguing that easy accessibility and maintainability are key concerns that distinguish tunnels from other public facilities [9]. Shahrour et al. proposed an intelligent solution that can address the main challenges faced by integrated urban corridors; the fire risks of integrated corridors were analyzed, and risk assessments were carried out for multiple risks [10]. Canto-Perello and Curiel-Esparza also carried out a risk analysis of workers' access to integrated corridors, derived potential hazards for electrical, gas, and drainage facilities, and gave recommendations for countermeasures against these potential hazards [9]. Zhang et al. analyzed the fire risk types and characteristics and proposed measures to reduce fire risks in integrated pipe corridors [11]. Jang et al. studied gas explosions in integrated pipe corridors caused by gas leaks and unknown ignition sources [12].

In terms of safety management, Dove studied environmental and psychological issues on the basis of research on safety issues and preventive measures in integrated corridors and emphasized the importance of safety monitoring and safety management in integrated corridors. A lot of research work has been carried out on the introduction of sunlight in integrated corridors, automatic fire alarms, automatic fire extinguishing devices, and integrated corridor rescue devices and equipment, and certain results have been achieved [13]. In their research on the safety issues of integrated corridors, Curiel et al. summarized the problems involved, including economic, development, cost, environmental, and regulatory problems, and highlighted the current lack of successful integrated corridor management experience [14].

In the process of construction and operation and maintenance management of integrated pipe corridors, in addition to theoretical research, there are also systems, models, and algorithms designed and applied.

In terms of system design and application, Rogers and Hunt focused on the design and development of a ventilation system for urban road tunnels and constructed a system of standards for ventilation and dispersion of air pollutants, vehicle exhaust, and fresh air in tunnels [15]. Yoo et al. researched and developed an information technology- (IT-) based tunnel risk assessment system. The system was developed in a geographic information system (GIS) environment, using GIS and AI technology to analyze potential risks in tunnels [16].

Bhatia applied machine learning techniques to construct computational techniques for predicting risk perception, enabling more accurate mapping of participants' risk ratings, quantitative prediction of new risks, and quantification of the degree of association between risk sources and large amounts of text, which can be used to identify cognitive and affective factors [17]. Delage and Kuhn examined fuzzy risk-averse measures that satisfy weak continuity [18]. Chen et al. proposed an axiomatic framework for system risk metrics and management through the simultaneous analysis of individual subjects in the system and natural scenario outcomes [19]. In terms of risk analysis and judgement, Fang and Marie proposed a decision support system (DSS) for modeling and managing project risks and risk interactions, which supports project managers in making decisions about risk responses [20].

Model design: to overcome the shortcomings of traditional hazard identification methods, an energy transfer theory-preliminary hazard analysis-evolutionary tree model (EPE) was designed. Based on this model, an ordered list of 189 hazards in 28 categories was proposed for a series of measured units in integrated pipeline corridors, covering common hazards, specific hazards, and multiple hazards; the list was used to build a safer, more coordinated, and more efficient risk architecture for urban living [21]. Ouyang et al. designed a worst-case vulnerability assessment and mitigation model for integrated pipeline corridors, used a decomposition algorithm based on column-and-constraint generation to obtain an exact solution of the model, and validated the proposed method using an example of an integrated pipeline corridor with interdependent power and water systems in Tianjin Eco-City, China [22]. Based on Choi et al.'s results, Xia et al. introduced an entropic risk model, investigated the impact of design on risk in the construction phase of underground projects, and developed a risk assessment model [23, 24].

Algorithm application and improvement: Canto-Perello et al. argued that safety management is a key issue in the management of integrated pipe corridors, which involve many participating subjects and complex financing and ownership relationships, and proposed a method based on a combination of expert systems and color coding to analyze the potential key risk factors of integrated pipe corridors using hierarchical analysis [25, 26]. Zhou et al. designed a Bayesian network-based risk assessment model for sewer pipes in the corridor, combined Bayesian network inference with sensitivity analysis of pipeline accident scenarios, and established a risk assessment framework for sewer lines to identify serious threats to sewer pipes in the integrated corridor [27]. In terms of risk response, for the problem of natural gas pipeline leakage in integrated corridors, Tang et al. proposed a systematic framework for dynamic safety risk analysis of natural gas pipeline leakage by combining the bow-tie model (BT), Bayesian network (BN), and fuzzy set theory (FST) with monitoring data [28]. Fang et al. proposed a dynamic quantitative risk analysis method to study natural gas pipelines in integrated corridors: potential incidents of natural gas pipelines were identified through case studies and expert experience; Bayesian networks were developed to derive key influencing factors; and key challenges for natural gas pipelines in integrated corridors were predicted and analyzed [29]. Canto-Perello et al. proposed an expert system combining color coding, the Delphi method, and hierarchical analysis to analyze the criticality and threat of integrated pipelines, which is used to support the planning of urban underground facility safety policies [26].

In summary, the application of intelligent video surveillance to integrated pipe corridors has the following three theoretical implications: (1) staff entering the corridor are identified, so that their behavior can be monitored and the standardization of their operations guaranteed; (2) when an emergency occurs, managers have a clear picture of the specific situation of staff in the corridor, so that the emergency can be dealt with in a timely and effective manner and personnel in the corridor can be guided accurately; and (3) outsiders who have entered the corridor illegally are identified, to prevent them from causing damage to the corridor. It also has the following practical technical advantages: (1) 24 h all-weather monitoring: the monitoring system automatically stores the video data and continuously records the images of each monitoring area during any period of time. (2) Storage function and support for image playback: the monitoring system can store video data for a longer period of time; for the corridor, it will generally store about 30 d of video data for playback viewing. In particular, when an accident occurs in the corridor (e.g., a fire), the playback of the monitoring system can be used to analyze the cause of the accident, determine the responsibility for it, and prevent similar accidents from occurring. (3) Intrusion detection function: the front-end cameras of the monitoring system support motion detection. Outside the periods of normal inspection, there should be no moving people or objects in the corridor, so the monitoring system can compare the images collected in different time periods, find suspicious people or objects, and send an alarm to the monitoring center in time to alert the corridor operator to the abnormal condition. (4) Linkage function: the corridor contains a variety of detection devices (such as intrusion detection, fire detection, and natural gas leak detection). When these devices find abnormalities, they can be linked to the front-end cameras of the monitoring system according to a preset plan, so that the live picture is fed back to the monitoring center in real time and operators can intuitively understand the situation on site and take targeted measures. For example, when an intrusion detection device issues an alarm, it can trigger the rotation of a nearby camera so that the lens points at the location where the intrusion occurred.

3. The Proposed Algorithm

In this section, the CenterNet-based feature extraction and the corresponding MIAUC loss function are presented [29]. The methodological framework of this study is shown in Figure 1. The surveillance video is divided into a fixed number of segments, and these segments are treated as instances and placed into bags. Both positively and negatively labeled bags are used to train the personnel anomaly detection model.

3.1. CenterNet-Based Feature Extraction

As a semienclosed underground space, the integrated pipeline corridor not only accommodates many electromechanical installations but also contains various municipal utility pipes. As a result, the daily work is to inspect and maintain the pipes. When regular personnel attend to problems found during inspections, they remain in the corridor for longer. A rapid response to emergencies therefore helps to protect the pipes and pipelines effectively and to ensure the safety of personnel through timely evacuation. Conventional object detectors represent each object by an axis-aligned bounding box that tightly encloses the object. Object detection is then reduced to image classification of a large number of potential object bounding boxes. However, sliding window-based methods require enumeration of all possible object positions and sizes, which is computationally wasteful. Therefore, a simpler, more efficient alternative is needed.

The CenterNet method is simple in structure and very fast in operation. Table 1 describes the performance of CenterNet. Therefore, in this study, the features of each segment were extracted using CenterNet. By feeding the input image into a fully convolutional network, a heat map is generated. The peaks in this heat map correspond to the centers of the objects. The image features at each peak predict the height and width of the bounding box enclosing the object.

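Let $(x_1^{(k)}, y_1^{(k)}, x_2^{(k)}, y_2^{(k)})$ denote the bounding box of object $k$ with category $c_k$. The corresponding center point is then $p_k = \left(\frac{x_1^{(k)} + x_2^{(k)}}{2}, \frac{y_1^{(k)} + y_2^{(k)}}{2}\right)$. The key point estimator $\hat{Y}$ is used to predict all center points, and the object size $s_k = (x_2^{(k)} - x_1^{(k)}, y_2^{(k)} - y_1^{(k)})$ is regressed for each object $k$. To limit the computational burden, a single size prediction $\hat{S}$ is used for all object classes. Thus, the agreement between the size predicted at the center point and the true object size can be measured by the L1 loss:

$$L_{size} = \frac{1}{N} \sum_{k=1}^{N} \left| \hat{S}_{p_k} - s_k \right|.$$

The corresponding loss function is

$$L_{det} = L_k + \lambda_{size} L_{size} + \lambda_{off} L_{off},$$

in which $L_k$ is the key point loss, $L_{off}$ is the offset loss, and $\lambda_{size}$ and $\lambda_{off}$ are constants (the original CenterNet formulation sets $\lambda_{size} = 0.1$ and $\lambda_{off} = 1$). Then, a single network is used to predict the key points $\hat{Y}$, the offset $\hat{O}$, and the size $\hat{S}$. The network predicts a total of $C + 4$ outputs at each location, where $C$ is the number of object classes. All outputs share a common fully convolutional backbone network. Based on the above method, the peaks in the heat map are first extracted separately for each category. All responses with values greater than or equal to those of their 8-connected neighbors are detected, and the top 100 peaks are kept. Let $\hat{P}_c = \{(\hat{x}_i, \hat{y}_i)\}_{i=1}^{n}$ be the set of $n$ detected center points of category $c$. The position of each key point is given by its integer coordinates $(\hat{x}_i, \hat{y}_i)$. The key point value $\hat{Y}_{\hat{x}_i \hat{y}_i c}$ is used as a measure of detection confidence, and a bounding box is generated at the location

$$\left(\hat{x}_i + \delta\hat{x}_i - \hat{w}_i/2,\; \hat{y}_i + \delta\hat{y}_i - \hat{h}_i/2,\; \hat{x}_i + \delta\hat{x}_i + \hat{w}_i/2,\; \hat{y}_i + \delta\hat{y}_i + \hat{h}_i/2\right),$$

in which $(\delta\hat{x}_i, \delta\hat{y}_i) = \hat{O}_{\hat{x}_i, \hat{y}_i}$ is the offset prediction, and $(\hat{w}_i, \hat{h}_i) = \hat{S}_{\hat{x}_i, \hat{y}_i}$ is the size prediction. All outputs are generated directly from key point estimation, without IoU-based nonmaxima suppression (NMS) or other postprocessing.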
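The peak extraction and box decoding steps described above can be illustrated with a minimal pure-Python sketch. It assumes a single-class heat map stored as a nested list, per-cell offset and size predictions stored as `(dx, dy)` and `(w, h)` tuples, and a hypothetical `min_score` cutoff (not part of the description above) that discards flat zero regions; the function names are illustrative only.

```python
from typing import List, Tuple

Grid = List[List[float]]
PairGrid = List[List[Tuple[float, float]]]


def find_peaks(heatmap: Grid, top_k: int = 100,
               min_score: float = 0.0) -> List[Tuple[float, int, int]]:
    """Return up to top_k (score, x, y) cells that are >= all 8 neighbours."""
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = heatmap[y][x]
            if v <= min_score:  # assumed cutoff, see lead-in
                continue
            neighbours = [
                heatmap[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            ]
            if all(v >= n for n in neighbours):
                peaks.append((v, x, y))
    # Keep the highest-scoring peaks first.
    peaks.sort(key=lambda p: p[0], reverse=True)
    return peaks[:top_k]


def decode_boxes(heatmap: Grid, offset: PairGrid, size: PairGrid,
                 top_k: int = 100, min_score: float = 0.0):
    """Turn each peak into (x1, y1, x2, y2, confidence) without NMS."""
    boxes = []
    for score, x, y in find_peaks(heatmap, top_k, min_score):
        dx, dy = offset[y][x]   # sub-cell offset prediction
        bw, bh = size[y][x]     # width and height prediction
        cx, cy = x + dx, y + dy
        boxes.append((cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2, score))
    return boxes
```

Because every box comes directly from a local maximum of the heat map, no IoU-based suppression step is needed in this sketch; the peak value itself serves as the confidence.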

3D detection estimates the 3D bounding box of each object and requires three additional attributes for each center point: depth, 3D dimension, and orientation. A separate head is added for each of them. The depth $d$ is a single scalar for each center point. However, depth is difficult to regress directly. Therefore, $d = 1/\sigma(\hat{d}) - 1$ is used as the output transform, where $\sigma$ is the sigmoid function, and depth is computed as an additional output channel $\hat{D}$ of the key point estimator. As before, this head uses two convolutional layers separated by a ReLU. Unlike the previous heads, it applies the inverse sigmoid transform at the output layer. After the sigmoid transform, the depth estimator is trained using the L1 loss in the original depth domain. The 3D dimensions of the object are three scalars, which are regressed directly to their absolute values in meters using a separate head and an L1 loss. By default, the orientation is a single scalar, but it is also difficult to regress directly. The orientation is therefore represented as two bins with in-bin regression. Specifically, it is encoded using eight scalars, four per bin: for each bin, two scalars are used for softmax classification, and the remaining two are regressed to an angle within the bin.
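As a small worked illustration of the depth head, the sketch below assumes the CenterNet output transform $d = 1/\sigma(\hat{d}) - 1$, which algebraically equals $e^{-\hat{d}}$ and is therefore always positive. The function names are hypothetical, and the loss is a plain mean L1 over a list of values rather than a training-ready implementation.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Logistic sigmoid; adequate for moderate |z| in this illustration."""
    return 1.0 / (1.0 + math.exp(-z))


def decode_depth(raw: float) -> float:
    """Map a raw network output to a positive depth: d = 1/sigmoid(raw) - 1."""
    return 1.0 / sigmoid(raw) - 1.0


def depth_l1_loss(raw_outputs: Sequence[float],
                  true_depths: Sequence[float]) -> float:
    """Mean L1 loss measured in the original depth domain (after the transform)."""
    if len(raw_outputs) != len(true_depths) or not raw_outputs:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(decode_depth(r) - d) for r, d in zip(raw_outputs, true_depths))
    return total / len(raw_outputs)
```

A raw output of 0 decodes to a depth of 1, larger raw outputs decode to smaller depths, and strongly negative outputs decode to large depths.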

3.2. Anomaly Scoring Functions Based on Convolutional Autoencoders

Local information is particularly important in the context of anomaly detection, as anomalies are located locally in the scene. Consequently, a convolutional autoencoder (CAE) is used to learn the anomaly score function from the features extracted from each segment. The CAE was proposed by Masci et al. Its weights are shared between all positions in the input to preserve spatial locality. The anomaly score function is defined on top of the CAE: the encoder and the decoder are each modeled with their own parameters, and the anomaly scoring function has a further parameter of its own.

The architecture of the CAE is organized into encoder and decoder layers. The encoder consists of three convolutional layers and two pooling layers, and the decoder mirrors this structure in reverse. The first convolutional layer has 256 filters with a stride of 4 and generates 256 feature maps. Next comes the first pooling layer, which subsamples these 256 feature maps. All pooling layers perform subsampling by max pooling. The second and third convolutional layers have 128 and 64 filters, respectively. The decoder reconstructs the input by performing deconvolution and unpooling in reverse order. The output of the final deconvolution layer is a reconstructed version of the input. Table 2 summarizes the details of the CAE layers, including the kernel sizes and feature map resolutions.

3.3. Loss Function Based on Multi-Instance AUC

The MIL method does not require exact temporal annotation. In MIL, the exact temporal location of anomalous events in the video is unknown. Instead, only video-level labels indicating the presence or absence of anomalies throughout the video are required. Videos containing anomalies are labeled positive, while videos without any anomalies are labeled negative. A positive video is then represented as a positively labeled bag $B_a$, in which the different temporal segments form the individual instances $(x_a^1, x_a^2, \ldots, x_a^m)$, where $m$ is the number of instances in the bag. It is assumed that at least one of these instances contains an anomaly. Similarly, a negative video is represented as a negatively labeled bag $B_n$, in which the temporal segments form the negative instances $(x_n^1, x_n^2, \ldots, x_n^m)$. No instance in a negative bag contains an anomaly.

AUC is a popular performance metric in classification. Especially when the cost of misclassification is unknown or the categories are unbalanced, AUC has been successful in measuring the ability of a model to distinguish between different categories of events. AUC can be interpreted as the rate at which a randomly sampled anomalous instance has a higher anomaly score than a randomly sampled normal instance. Inspired by this concept, MIL is further applied to the AUC-based anomaly detection problem. Let $\mathcal{X}$ denote the instance space, $x_a$ and $x_n$ denote abnormal and normal video segments, $P_a$ and $P_n$ denote the probability distributions of abnormal and normal instances in $\mathcal{X}$, and $B_a$ and $B_n$ denote a positive and a negative bag. The anomaly scoring function $f: \mathcal{X} \to [0, 1]$ assigns each instance a score between 0 and 1. The true positive rate (TPR) is the rate at which the scoring function correctly classifies a random anomalous instance as abnormal:

$$TPR(t) = E_{x_a \sim P_a}\left[ I\left(f(x_a) > t\right) \right],$$

where $t$ is the threshold value, $E$ is the expected value, and $I(\lambda)$ is the indicator function of the condition $\lambda$: when $\lambda$ is true, $I(\lambda) = 1$; otherwise, $I(\lambda) = 0$. The false positive rate (FPR), or false alarm rate, is the rate at which the scoring function misclassifies a random normal instance as abnormal:

$$FPR(t) = E_{x_n \sim P_n}\left[ I\left(f(x_n) > t\right) \right].$$

The AUC is the area under the curve formed by plotting the points $(FPR(t), TPR(t))$ for all thresholds $t$. In integral form, the AUC is as follows:

$$AUC = \int_0^1 TPR \; d\,FPR = P\left(f(x_a) > f(x_n)\right).$$

Given a set $A$ of anomalous instances and a set $N$ of normal instances, the AUC is estimated as follows:

$$\widehat{AUC} = \frac{1}{|A|\,|N|} \sum_{x_a \in A} \sum_{x_n \in N} I\left(f(x_a) > f(x_n)\right).$$
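The empirical AUC can be read as a pairwise comparison: the fraction of (anomalous, normal) instance pairs in which the anomalous instance receives the higher score. The following minimal sketch shows this estimate under the assumption that instance-level scores are already available as plain lists; the function name is illustrative, and ties are counted as failures here, which is one of several possible conventions.

```python
from typing import Sequence


def auc_estimate(anomalous_scores: Sequence[float],
                 normal_scores: Sequence[float]) -> float:
    """Fraction of (anomalous, normal) pairs ranked correctly by the scores."""
    pairs = len(anomalous_scores) * len(normal_scores)
    if pairs == 0:
        raise ValueError("both score lists must be non-empty")
    wins = sum(1 for a in anomalous_scores for n in normal_scores if a > n)
    return wins / pairs
```

For example, with anomalous scores `[0.9, 0.4]` and normal scores `[0.1, 0.5]`, three of the four pairs are ranked correctly, so the estimate is 0.75. Computing this quantity requires knowing which instances are anomalous, which is exactly the segment-level annotation that MIL tries to avoid.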

However, it is not possible to use this estimate without segment-level annotation. Therefore, this study extends the concept of AUC and proposes the following multi-instance true positive rate (MITPR) and multi-instance false positive rate (MIFPR), where $f$ denotes the anomaly scoring function, $t$ the threshold, $I(\cdot)$ the indicator function, and $B_a$ and $B_n$ a positive and a negative bag. MITPR denotes the rate at which the anomaly score function classifies at least one instance of a random positively labeled bag as an anomaly:

$$MITPR(t) = E_{B_a}\left[ I\left( \max_{x \in B_a} f(x) > t \right) \right].$$

MIFPR denotes the rate at which the anomaly score function classifies at least one instance of a random negatively labeled bag as an anomaly:

$$MIFPR(t) = E_{B_n}\left[ I\left( \max_{x \in B_n} f(x) > t \right) \right].$$

The highest anomaly score in each bag is obtained by comparing the instances within the positively and negatively labeled bags. The segment with the highest anomaly score in a positive bag is most likely a true positive instance (an anomalous segment). The segment with the highest anomaly score in a negative bag is a negative instance (a normal segment) that is the most similar to an anomalous segment and may generate false alarms in actual anomaly detection. Multi-instance AUC (MIAUC) is then defined in a similar way to AUC, as the area under the curve of $MITPR(t)$ plotted as a function of $MIFPR(t)$:

$$MIAUC = \int_0^1 MITPR \; d\,MIFPR = P\left( \max_{x \in B_a} f(x) > \max_{x \in B_n} f(x) \right).$$

MIAUC is thus the probability that at least one instance in a positive bag has a higher anomaly score than all instances in a negative bag. Given that $\mathcal{B}_a$ is the set of positive bags and $\mathcal{B}_n$ is the set of negative bags, the estimate of MIAUC can be calculated as follows:

$$\widehat{MIAUC} = \frac{1}{|\mathcal{B}_a|\,|\mathcal{B}_n|} \sum_{B_a \in \mathcal{B}_a} \sum_{B_n \in \mathcal{B}_n} I\left( \max_{x \in B_a} f(x) > \max_{x \in B_n} f(x) \right).$$
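The bag-level estimate can be sketched in the same pairwise style as an ordinary AUC, with each bag summarized by its highest instance score. The sketch below assumes each bag is given as a non-empty list of instance scores in [0, 1]; the function name is illustrative only.

```python
from typing import Sequence


def miauc_estimate(positive_bags: Sequence[Sequence[float]],
                   negative_bags: Sequence[Sequence[float]]) -> float:
    """Fraction of (positive, negative) bag pairs in which the top-scoring
    positive instance outranks the top-scoring negative instance."""
    if not positive_bags or not negative_bags:
        raise ValueError("both bag collections must be non-empty")
    pos_max = [max(bag) for bag in positive_bags]  # most anomalous segment per video
    neg_max = [max(bag) for bag in negative_bags]  # hardest normal segment per video
    wins = sum(1 for p in pos_max for n in neg_max if p > n)
    return wins / (len(pos_max) * len(neg_max))
```

Only video-level labels are needed here: the function never has to know which segment inside a positive bag is the anomalous one.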

The limitation of the above loss function is that it ignores the underlying temporal structure of the anomalous video. First, in a realistic situation, anomalous events usually occur only for a short period of time. In this case, the scores of the instances in an anomalous bag should be sparse, indicating that only a few segments may contain anomalies. Second, as the video is a sequence of segments, the anomaly score should vary smoothly between adjacent video segments. Therefore, temporal smoothing is enforced by minimizing the difference in scores between temporally adjacent video segments. By combining the sparsity and smoothness constraints on the instance scores, the loss function becomes Equation (13), which adds a temporal smoothing term and a sparsity term to the MIAUC-based loss.
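To make the role of the two constraints concrete, the sketch below combines a bag-level ranking term with smoothness and sparsity penalties. It is not the exact loss used in this study: the hinge surrogate with a margin of 1, the squared differences for smoothness, the plain sum of scores for sparsity, and the weight names `smooth_weight` and `sparse_weight` are all assumptions, chosen in the style of common MIL ranking losses.

```python
from typing import Sequence


def mil_ranking_loss(positive_bag: Sequence[float],
                     negative_bag: Sequence[float],
                     smooth_weight: float,
                     sparse_weight: float,
                     margin: float = 1.0) -> float:
    """Illustrative bag-pair loss: ranking + temporal smoothness + sparsity."""
    if not positive_bag or not negative_bag:
        raise ValueError("bags must be non-empty")
    # Ranking term (assumed hinge surrogate): push the top positive score
    # above the top negative score by at least `margin`.
    ranking = max(0.0, margin - max(positive_bag) + max(negative_bag))
    # Smoothness term: scores of temporally adjacent segments should be close.
    smooth = sum((a - b) ** 2 for a, b in zip(positive_bag, positive_bag[1:]))
    # Sparsity term: only a few segments of an anomalous video should score high.
    sparse = sum(positive_bag)
    return ranking + smooth_weight * smooth + sparse_weight * sparse
```

With positive scores `[0.0, 1.0, 0.0]`, negative scores `[0.0, 0.0]`, and weights 0.5 and 0.25, the ranking term is 0, the smoothness term contributes 1.0, and the sparsity term contributes 0.25, for a total of 1.25.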

4. Experimental Framework

4.1. Experimental Data

By 2020, Beijing will have completed 150 to 200 kilometers of integrated pipeline corridors in major projects such as the Beijing Urban Vice Centre, the Winter Olympic Games, the World Park, and the new airport. The Beijing World Expo underground integrated pipeline corridor was put into trial operation on 16 April 2019, with a total length of 7.1 km, featuring six main road settings, including one main corridor and five branch corridors, arranging for heat, gas, water supply, recycled water, electricity, and telecommunications to enter the corridor. A total of 291 cameras have been installed throughout the inner and outer corridors, allowing the entire corridor to be scanned within 5 minutes.

Based on the video surveillance system in the integrated pipe corridor for the 2022 Winter Olympics, a large-scale dataset was constructed to assess the methodology. Based on interviews with employees of Beijing Infrastructure Investment Co. Ltd., abnormal behavior of people was classified into five categories that are important for the security of the integrated corridor: trespassing, personal injury, crowding, fast movement, and irregular dressing. To ensure the quality of the dataset, videos with unclear anomalies were discarded, and possible anomalous events were supplemented by manual demonstrations conducted in the integrated corridor. These measures resulted in the collection of 55 real surveillance videos in which anomalies were evident. Using the same constraints, 55 normal videos were collected. The resulting dataset was 24 hours in length and consisted of 110 real-world surveillance videos. The distribution of anomalous events is shown in Table 3. For the proposed anomaly detection method, only video-level labels were used during the training process. However, in order to evaluate its performance on the test videos, temporal annotations, i.e., the start and end frames of the anomalous events in each anomalous test video, must be used. Therefore, to obtain the exact time range of each anomaly, each video was labeled by multiple annotators, and the final temporal annotation is the average of the different annotations. The dataset is divided into two parts: a training set consisting of 45 normal and 40 anomalous videos and a test set consisting of the remaining 10 normal and 15 anomalous videos. Both the training and test sets contain a variety of anomalies. In addition, some videos contain multiple anomalous events.

4.2. Assessment Indicators

Following previous work on anomaly detection, the area under the curve (AUC) is used to assess the performance of the method. In order to obtain a good recognition algorithm, the value of the AUC should be as high as possible.

4.3. Comparison Methods

The proposed method is compared with the following five methods: SVDNet, HJE, deep GMM, CCUKL, and DS. SVDNet and HJE are fully supervised anomaly detection methods. Supervised methods that aim to model normal and abnormal behavior from labeled data are typically designed to detect specific abnormal behavior predefined during the training phase. Deep GMM is a semisupervised anomaly detection method that requires only normal video data for training. CCUKL and DS are unsupervised anomaly detection methods designed to learn normal and anomalous behavior using statistical properties extracted from unlabeled data.

SVDNet is a deep representation learning process based on singular vector decomposition (SVD) with a constrained and relaxed iterative (RRI) training scheme that iteratively integrates orthogonality constraints into CNN training.

HJE (human joint estimation) is an anomalous behavior recognition algorithm that extracts human feature points from the mapping and integrates them with a support vector machine (SVM).

Deep Gaussian mixture model (GMM) is a scalable deep generative model that is built from observed normal events and stacks multiple GMM layers on top of each other. It uses PCANet to extract appearance and motion features from 3D gradients.

CCUKL (unsupervised kernel learning with clustering constraint) is an unsupervised kernel framework for anomaly detection based on feature space and support vector data description (SVDD).

DS (dominant set clustering method) is an anomalous behavior detection method using a dominant set-based unsupervised learning framework.

4.4. Implementation Details

Each video was divided into 20 nonoverlapping segments, and each video segment was considered as an instance of a bag. Then, 30 positively labeled bags and 30 negatively labeled bags were randomly selected as a mini-batch. Using Theano (Theano Development Team, 2016), the gradients were calculated by reverse-mode automatic differentiation on the computational graph. The loss was then calculated as shown in Equation (6) and back-propagated for the entire batch. To obtain the best performance, the sparsity and smoothness constraint parameters in the MIL rank loss were set to . All settings for the comparison methods are consistent with their usual training settings.
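The bag construction and mini-batch sampling described above can be sketched as follows. This is a minimal illustration, not the study's pipeline: it assumes a video is available as a list of frames (or per-frame features), splits it into 20 contiguous segments of near-equal length, and samples bags without replacement; the function names and the optional `rng` argument are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_into_segments(frames: Sequence[T], num_segments: int = 20) -> List[Sequence[T]]:
    """Split a video into contiguous, non-overlapping segments (one bag)."""
    n = len(frames)
    if n < num_segments:
        raise ValueError("video has fewer frames than requested segments")
    # Integer boundaries spread any remainder evenly over the segments.
    bounds = [i * n // num_segments for i in range(num_segments + 1)]
    return [frames[bounds[i]:bounds[i + 1]] for i in range(num_segments)]


def sample_minibatch(positive_bags: Sequence[T], negative_bags: Sequence[T],
                     per_class: int = 30,
                     rng: Optional[random.Random] = None) -> Tuple[List[T], List[T]]:
    """Randomly pick per_class positive and per_class negative bags."""
    rng = rng or random.Random()
    return (rng.sample(list(positive_bags), per_class),
            rng.sample(list(negative_bags), per_class))
```

With 40 anomalous and 45 normal training videos, drawing 30 bags of each kind without replacement is always possible.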

5. Experimental Results

5.1. Recommended Efficiency Comparison

In this section, the quantitative evaluation of different methods on realistic datasets is first described. Then, it is analyzed how the size of the bag affects the model performance. Finally, the false alarm rate is investigated. To evaluate the performance of the proposed method, experiments were conducted on realistic datasets, and a quantitative comparison in terms of AUC is given in Table 3.

The results show that the MIAUC method achieves an AUC of 84.42, which is significantly better than existing methods and 15.9% higher than the second best method, HJE. Fully supervised and semisupervised methods always perform better than unsupervised methods, suggesting that unsupervised methods are not suitable for anomaly detection in the integrated pipe corridor. This is because the surveillance videos are too long, and the anomalies occur mainly in short periods of time. As a result, the features extracted from these untrimmed videos are not sufficiently discriminative for unsupervised methods. In contrast, the performance difference between fully supervised and semisupervised methods is not significant. However, the fully supervised methods are not sufficient to distinguish between normal and abnormal patterns. In addition to producing low reconstruction errors for the normal part of the video, they also produce low reconstruction errors for the abnormal part.

5.2. Sensitivity Analysis of the Number of Instances per Bag

The number of instances per bag is a key parameter. In anomaly detection, the smaller the number of instances per bag, the better. The reason is that, for a video of a given length, the more accurate the annotation, the more information it provides. If there is only one instance in a bag, the proposed method corresponds to a fully supervised approach. However, annotating anomalous segments in training videos is complex and time-consuming. In addition, in real-world video surveillance systems, the available videos may be limited by the environment.

The number of instances per bag was set from 5 to 50 in steps of 5. The corresponding results are given in Figure 2. As the bag size decreases, the AUC increases significantly. Above a threshold of 20, performance began to stabilize. According to a paired t-test, the AUC of 84.42 obtained with 20 instances per bag is not statistically different (at the 5% level) from the best performance (). This suggests that, to some extent, the proposed algorithm relaxes the restriction on bag size.
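The sweep and the significance check can be sketched as follows. This assumes that each bag size is evaluated over several repeated runs, so that AUC values from two settings can be paired run by run; the source does not describe how the pairs were formed. The helper name is hypothetical, and no measured values are reproduced here.

```python
import math
from statistics import mean, stdev

# Bag sizes used in the sensitivity analysis: 5, 10, ..., 50.
bag_sizes = list(range(5, 55, 5))


def paired_t_statistic(auc_a, auc_b):
    """Return (t, degrees_of_freedom) for paired AUC samples of equal length.

    Raises ZeroDivisionError if all paired differences are identical.
    """
    diffs = [a - b for a, b in zip(auc_a, auc_b)]
    n = len(diffs)
    # t = mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1
```

The absolute value of `t` is then compared with the two-sided critical value of the t distribution for the returned degrees of freedom at the 5% level. A smaller value means the two settings are not statistically different.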

5.3. False Alarm Rate Analysis

In real-world video surveillance systems, normal video forms the major part of the surveillance video data. As in Aesop’s fable “The Boy Who Cried Wolf”, if a video surveillance system consistently reports normal video as anomalous, staff will no longer trust the alarms. Therefore, a practical anomaly detection method should have a low false alarm rate on normal video. For this reason, the proposed method and the other methods were evaluated on a dataset of normal videos only. Table 4 gives the false alarm rates of the different methods at a 50% threshold. The proposed method has a much lower false alarm rate than the other methods, which shows its usefulness in practice. This validates that using both abnormal and normal videos during training helps the MIAUC model learn more normal patterns.
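A false alarm rate of this kind can be computed as in the sketch below. It assumes that the 50% threshold means an anomaly score of 0.5 on a 0 to 1 scale and that every scored unit (frame or segment) of a normal video counts once. Both are assumptions, since the exact protocol behind Table 4 is not given here, and the function name is hypothetical.

```python
def false_alarm_rate(normal_scores, threshold=0.5):
    """Fraction of scores from normal videos that reach the alarm threshold."""
    if not normal_scores:
        raise ValueError("normal_scores must not be empty")
    alarms = sum(1 for s in normal_scores if s >= threshold)
    return alarms / len(normal_scores)
```

Because all inputs come from normal videos, every alarm counted here is a false alarm.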

6. Application Solutions

6.1. Overview of Application Solutions

The application solution is selected according to the number of front-end cameras in the corridor, so that the layout of the monitoring system is reasonable, economical, and efficient.

6.1.1. Simple Scheme

When the main body of the corridor is short, the number of fire protection zones (generally around 200 m in length) is small, and the number of front-end cameras is small, data storage can be handled by a single network hard disk recorder. The front-end cameras are connected to the network switch in the on-site equipment room via Category 5 network cable or single-mode optical fiber, and the network can use a star topology. The network hard disk recorder is also connected to the network switch; it stores the video data on site and uploads it to the monitoring center via single-mode fiber. The monitoring center is equipped with video workstations and core switches, and the uploaded video images are viewed on the monitors attached to the video workstations. The front-end cameras support the “motion detection” function: when the picture appears abnormal, for example when people are active in the corridor at midnight, an alarm is sent through the network hard disk recorder to the staff on duty at the monitoring center. For fire alarms, the switch output of the flame detector is linked to the automatic fire extinguishing system in the corridor.

This solution features a simple system structure, flexible configuration, local storage of video data, and a small amount of network communication. However, the scale of the monitoring system is limited by the capacity of the network hard disk recorder and cannot be expanded, so the solution is only suitable for small monitoring systems.

6.1.2. Centralized Scheme

When the main body of the corridor is long and there are many fire protection zones, one equipment room is provided for every two fire protection zones. Each equipment room contains only a set of network switches for data transmission and no network hard disk recorder. The on-site front-end cameras are connected to the network switch in the equipment room in a star topology, and the traffic is then aggregated and networked with the other network switches to upload data to the monitoring center. The site configuration of the centralized solution is shown in Figure 3.

The network switches in the equipment rooms and the network switch of the monitoring center are interconnected through a fiber optic ring network (there is no need to run fiber one-to-one to the monitoring center in a star network) and then connected to the core switch of the monitoring center. The video data is stored centrally on the disk array of the monitoring center and displayed on its large screen.

The disk array can run Windows, Linux, Solaris, or other operating systems and supports RAID technology (RAID 0, 1, 5, 10, 50, etc.), a variety of protocols (such as video streaming protocol/NFS/CIFS/FTP/HTTP/AFP/RTSP), hot-swappable hard disks, and redundant hot-standby power supplies, giving high reliability and large storage capacity. Video is displayed on a large-screen system, generally a “splicing wall”; the screens can be DLP, plasma, LCD, etc., and the typical number of screens are , , , , etc. Video data is shown in the designated screen area as required through the large-screen controller.

The solution features a simple system structure: the front-end cameras are networked through the on-site network switches, no data is stored locally, expansion only requires additional front-end cameras and on-site network switches, and the storage equipment is located in the monitoring center, which gives high data security and easy maintenance. However, this solution generates a large amount of network communication [2], which limits the scale of the monitoring system; too large a system leads to high costs and can degrade image transmission quality (image lag, loss, etc.). It is therefore suitable for medium-sized monitoring systems.

6.1.3. Distributed Scheme

When the main body of the corridor is long and there are many fire protection zones, the site configuration of the simple scheme can be used (see Figure 4 for details) and expanded by increasing the number of network hard disk recorders. One equipment room is provided for every two fire protection zones, and each equipment room has a set of network hard disk recorders for local data storage and a set of network switches for remote data transmission. When there are many video monitoring points on site, a separate dedicated video transmission network needs to be built. The network switch in each corridor equipment room aggregates the video data, connects to the network switch of the monitoring center through the fiber optic ring network and then to the core switch of the monitoring center to upload the video data, and the live picture is finally displayed on the large screen of the monitoring center. The distributed solution network is shown in Figure 5. Video data is stored in individual network hard disk recorders on site, which spreads the risk: even if a single network hard disk recorder fails, the video data stored in other areas is not affected.

Features of the solution: it retains the advantages of the simple scheme while solving the capacity limitation of the network hard disk recorder; it is simple to expand, requiring only additional sets of field equipment, which suits the phased construction of surveillance systems; it uses a fiber optic ring network to connect each set of network switches, ensuring network reliability; and the monitoring center calls video data remotely as needed, with a small amount of network communication. Video data storage is distributed: the network hard disk recorder in each equipment room stores the video data within its control range, so damage to individual storage devices does not have a fatal impact on the whole monitoring system. This improves reliability and makes the solution suitable for medium or large monitoring systems.

6.2. Application Solution Design

The video surveillance system provides visual information about safety and operations for effective monitoring and disposal in the integrated corridor. The design of the video surveillance system based on personnel anomaly detection is therefore divided into two levels: workflow design and design of the interaction with other systems. Figure 6 shows the video surveillance system based on this design, which is located in the integrated corridor for the 2022 Winter Olympics.

The method proposed in this study is applied to a video surveillance system in an integrated pipe corridor. The video surveillance system based on personnel anomaly detection consists of three modules: video capture, anomaly detection, and surveillance display.

The video capture module consists of 291 cameras, including bullet cameras and dome (hemispherical) cameras. These cameras are placed at key locations in the integrated corridor, such as stairways, corners, and important facilities. The bullet cameras are mainly used to monitor the facility bay, as they are always focused on a fixed field of view. In contrast, the dome cameras have a wider field of view and are used to monitor the pipeline corridors. All cameras use H.264 compression encoding at 1080p, and the video data is stored for a period of no less than 15 days. In addition, the video data obtained by the surveillance cameras is sent evenly to the anomaly detection module.
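The camera count and retention period above allow a rough storage estimate. The sketch below is a back-of-the-envelope calculation, not a figure from the study: the per-camera bitrate of 4 Mbit/s is a hypothetical value for 1080p H.264 video, and real bitrates depend on the encoder settings and scene content. The function name is also hypothetical.

```python
def storage_terabytes(num_cameras, bitrate_mbps, retention_days):
    """Estimate raw video storage in decimal terabytes (1 TB = 1e12 bytes).

    bitrate_mbps is the average per-camera bitrate in megabits per second.
    """
    bytes_per_second = bitrate_mbps * 1_000_000 / 8
    seconds_per_day = 86_400
    total_bytes = num_cameras * bytes_per_second * seconds_per_day * retention_days
    return total_bytes / 1e12


# 291 cameras and 15 days are from the text; 4 Mbit/s is an assumed bitrate.
estimate_tb = storage_terabytes(291, 4.0, 15)  # about 188.6 TB
```

Under these assumptions the minimum retention requirement corresponds to roughly 189 TB before RAID overhead, which helps explain why storage capacity drives the choice between the schemes in Section 6.1.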

The anomaly detection module processes the video data received from the video capture module. According to the proposed method, this task requires two levels of video processing. At the first level, regions of interest in the scene are detected, and the corresponding features are extracted. A graphical element based on these features is then generated to describe the region of interest. The second level produces anomaly scores for the person’s behavior and determines whether the behavior is normal or not. The results are stored and presented to the monitoring display module.

The monitoring display module displays the results sent by the anomaly detection module. If an abnormal event is detected, a pop-up window appears and an alert sounds. For normal results, only live video is displayed.

Based on the above method, an intelligent link between the video surveillance system and other systems in the integrated corridor is achieved, including the fan system, lighting system, broadcasting system, telephone system, and access control system. As a result, effective measures can be taken in a timely manner against personnel anomalies to ensure the stable operation of the integrated pipe corridor and the safety of personnel. When the video surveillance system detects an emergency in the integrated corridor, the personnel anomaly is classified into one of the predefined categories, and its position is determined from the area where the camera is located. In addition, different systems are linked according to the type of personnel anomaly. Detailed information is given in Figure 7 and Table 5.

When a trespass is detected, the access control system is linked, and the access locks of the corresponding areas are locked to prevent intruders from moving deeper into the integrated corridor. In addition, the broadcasting system warns the trespassers, and the telephone system is used to dispatch nearby staff to the scene.

When physical damage is detected, the lighting system is linked: the emergency lighting and evacuation indicators in the abnormal area are switched on, and the normal lighting system is switched off. In addition, the locks on all evacuation routes in the corresponding areas are opened to ensure effective evacuation and to avoid secondary accidents. The fan system is switched on to vent harmful gases and to cool the integrated corridor.

As the three remaining anomalies (i.e., personnel congestion, rapid movement, and irregular dress) do not cause direct injury and occur relatively frequently, lighter measures are taken. The broadcasting system is used to warn site personnel, and the telephone system is used to instruct site personnel on standard practices.
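The linkage rules described in the preceding paragraphs can be expressed as a simple lookup table. The sketch below is illustrative only: the anomaly keys, system names, and action strings are hypothetical identifiers derived from the text, not from Table 5 or from any real control interface.

```python
# Each anomaly type maps to the (system, action) pairs that are triggered.
LINKAGE = {
    "trespass": [
        ("access_control", "lock access doors of the affected area"),
        ("broadcast", "warn the trespasser"),
        ("telephone", "dispatch nearby staff"),
    ],
    "physical_damage": [
        ("lighting", "emergency lighting and evacuation indicators on, normal lighting off"),
        ("access_control", "open locks on evacuation routes"),
        ("fan", "ventilate harmful gases and cool the corridor"),
    ],
}

# The three lower-risk anomalies share the same response.
for _kind in ("congestion", "rapid_movement", "irregular_dress"):
    LINKAGE[_kind] = [
        ("broadcast", "warn site personnel"),
        ("telephone", "instruct site personnel on standard practices"),
    ]


def linked_actions(anomaly_type):
    """Return the (system, action) pairs for a classified anomaly."""
    if anomaly_type not in LINKAGE:
        raise ValueError(f"unknown anomaly type: {anomaly_type}")
    return LINKAGE[anomaly_type]
```

Keeping the rules in a table rather than in branching code makes it straightforward to add a category or change a response without touching the dispatch logic.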

7. Summary and Outlook

With the construction of integrated urban corridors, the use of network video surveillance systems in them will increase, and the technical aspects will become more mature and create more intersections with other detection systems in the corridors. The increased scale of integrated pipe corridors brings with it the risk of break-ins. The use of intelligent supervisory systems to identify people entering the interior of integrated pipe corridors is an effective way to combat trespassing.

Urban integrated pipe corridors contain several detection systems, such as exit control devices for personnel access, intrusion detection alarm devices, fire detection systems for electrical compartments, and gas leak detection systems for natural gas compartments. When any of them detects an abnormality, the network video surveillance system can be linked to display the images in the monitoring center, and the operator can activate the corresponding treatment plan for each situation. The network video surveillance system and the other systems complement each other, covering all aspects from daily monitoring to key inspection and from prevention to post-intrusion handling, forming a unified whole and effectively safeguarding the safe operation of equipment and pipelines in the city’s integrated pipe corridor.

This study proposes a flexible and interpretable MIL method for anomaly detection in integrated pipe corridors. The study first divides the longer surveillance video into multiple parts and then extracts features for each part based on CenterNet. Inspired by the area under the curve (AUC) concept, MIAUC was further applied to the loss function model, which encourages higher scores for anomalous segments compared to normal segments. In addition, a new large-scale dataset was constructed based on surveillance videos in the integrated corridor. Finally, validation on a real dataset shows that the MIAUC proposed in this study outperforms other benchmark methods. This study focuses on the binary MIL problem, whereas multiclass MIL is more interesting and challenging. It is also worthwhile to consider exclusion points, i.e., instances whose bag is always negative, and the case where dependencies between instances within a bag are assumed to exist. In addition, future research will focus on the application to massive integrated corridor surveillance videos.

Data Availability

The dataset utilized in this paper is based on the video surveillance system in the integrated pipe corridor for the 2022 Winter Olympics. As the data belongs to a commercial corporation (Beijing Infrastructure Investment Co., LTD), it is not available to the public unless authorized.

Conflicts of Interest

The authors declare that they have no conflicts of interest.