Abstract

Video surveillance plays a vital role in maintaining social security; however, large uncertainty still exists in danger understanding and recognition, which can be partly attributed to intractable environment changes in the background. This article presents brain-inspired computing of the attention value of surrounding environment changes (EC) with a process-based cognition model, introducing a ratio value of EC-implications within the considered periods. Theoretical models are further established for computing the warning level of EC-implications to universal video recognition efficiency (quantified as the time cost of implication-ratio variations from to , ). Embedding the proposed models into online algorithms is suggested as a future research priority towards precision security for critical applications, and schemes for the practical implementation of such integration are also preliminarily discussed.

1. Introduction

Surveillance plays a vital role in maintaining social security and protecting a country's infrastructure facilities [1, 2]. However, considerable uncertainties remain in danger understanding and recognition, especially for engineering-critical applications [3–5]; these can be partly attributed to the implications of environmental conditions to the video recognition efficiency of surveillance systems. It has been demonstrated that suitable model parameters in online algorithms, and the difficulty of object detection tasks, can differ considerably across environments [6].

Surrounding environment changes, as particular changes in the background, are also responsible for some significant but still unresolved issues in object recognition and tracking [7]. Because backgrounds cannot be well characterized under uncontrolled environment changes, surveillance video recognition becomes more intractable [8]. Recognition of objects, accidents, and behaviors in dynamic environments is still a great challenge in video surveillance [9]. It should be carried out through object detection, motion tracking and analysis, and the understanding and recognition of other details, using robust and efficient algorithms. Environment changes are so rich and varied that an online algorithm with universal significance is needed for effective danger detection and warning under dynamic environment changes [10–17].

Numerous algorithms have been developed to tackle video recognition challenges in various environments; however, a full understanding of environmental implications to video recognition efficiency demands learning models with universal significance (ignoring uncontrolled differences in real scenarios) [18–27]. This is the essential reason why current online algorithms still encounter considerable uncertainties, even the latest ones, for example, recent models that tackle crowd segmentation for high-dimensional, large-scale anomaly detection [23, 24]. Two questions should be research priorities towards precision security in intelligent surveillance [21–27]. The first is how to evaluate and compute regulated attention to the implications of surrounding environment changes. The second is how to define the warning level of EC-implications to video recognition efficiency.

It has been widely recognized that video surveillance should consider the implications of surrounding environment changes, but there are still no models for a universal evaluation of EC-implications to video recognition efficiency [4, 12–27]. To address the unresolved issues associated with uncontrolled EC-implications, various novel optimization models have been proposed and applied in current learning systems [13–15]. The robustness and efficiency of some online algorithms in tackling specific EC-implications in specific scenarios have been validated in a series of previous studies. However, universal models for computing the attention value and warning level of EC-implications to video recognition efficiency remain unaddressed, which makes this an urgent issue for improving current surveillance systems [16, 17].

This study has three objectives. The first is to present brain-inspired computing of the warning level of the implications of surrounding environment changes to video recognition efficiency. The second is to model brain cognition processes and establish theoretical models for the precise computation of the attention value of EC. The third is to highlight the necessity of introducing the proposed models in critical applications.

2. Preliminary Formulation

A conceptual framework of precision security that integrates video surveillance with EC is shown in Figure 1. Danger detection under EC-implications is highly complex because of the diversity of features. Precision security aims to give a better understanding of EC-implications to danger detection efficiency in sensitive areas. It allows us to consider not only “who is dangerous” but also “who is in danger” and to reduce uncertainties in uncontrolled and complicated real scenarios [28–31].

Brain cognition of EC-implications can be approached through four processes: data acquisition, classification, computation, and inference. Throughout the paper, the original, classified, computed, and inferred data are denoted by EC1, EC2, EC3, and EC4, respectively. Obviously, ECi generates ECi+1. To reduce uncertainty, assume that only EC3 and EC4 contribute to dispelling the EC-implications and generate regulated attention-effective data (denoted by ). These data are determined by EC4 (with a contribution ) and a part of EC3 (with a contribution ).
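The four-stage chain above can be sketched as a small data structure. This is a minimal illustration only. The class `ECStages`, the function `regulated_attention_data`, and the linear weighting of the EC4 and EC3 contributions are illustrative assumptions, not the paper's own formula.

```python
from dataclasses import dataclass


# Hypothetical container for the four cognition stages:
# EC1 (original), EC2 (classified), EC3 (computed), EC4 (inferred).
@dataclass
class ECStages:
    ec1: float  # amount of original data
    ec2: float  # amount of classified data
    ec3: float  # amount of computed data
    ec4: float  # amount of inferred data


def regulated_attention_data(stages: ECStages, w4: float, w3: float) -> float:
    """Amount of regulated attention-effective data (illustrative only).

    Assumption: only EC4 and a fraction of EC3 contribute, combined linearly
    with contributions w4 and w3, each taken to lie in [0, 1].
    """
    if not (0.0 <= w3 <= 1.0 and 0.0 <= w4 <= 1.0):
        raise ValueError("contributions must lie in [0, 1]")
    return w4 * stages.ec4 + w3 * stages.ec3
```

For example, with EC3 = 50, EC4 = 30, w4 = 1.0, and w3 = 0.5, the sketch returns 55.0.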

Denote by amounts of newly generated effective data in the -th brain learning period, = 1, 2, 3, . Denote as the amounts of at the -th frame and let , be at the beginning and end of the -th period, respectively, = 1, 2, 3, 4, = 1, 2, 3, . Assume that the average efficiency of data exploitation is and employ a function to estimate EC1 loss. Let be the degree of importance and be the ECi contributions to , ; it is clear that . During the -th learning period (with length ), define the theoretical quantification of the attention value of EC-implications as the amounts of and define

where can be interpreted as the EC attention-time ratio in the -th learning period, .
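One plausible reading of the attention-time ratio is the attention value accumulated in a learning period divided by the length of that period. That reading is assumed here for illustration. The hypothetical helper `attention_time_ratios` computes the ratio per period. It is not the paper's exact definition, which may also involve the efficiency and loss terms introduced above.

```python
def attention_time_ratios(attention_values, period_lengths):
    """Per-period ratio r_k = A_k / T_k (hypothetical reading, for illustration).

    attention_values: attention value A_k accumulated in each learning period.
    period_lengths:   length T_k of each learning period (must be positive).
    """
    if len(attention_values) != len(period_lengths):
        raise ValueError("one attention value per learning period is required")
    ratios = []
    for a_k, t_k in zip(attention_values, period_lengths):
        if t_k <= 0:
            raise ValueError("learning-period length must be positive")
        ratios.append(a_k / t_k)
    return ratios
```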

Figure 2 displays the implications of EC for video surveillance, together with the attention value and warning level. These are based on the performance of a rapid deep learning (DL) method, YOLO, one of the most efficient algorithms for object detection, classification, and tracking [32–36].

Under regulated attention, the attention-time ratio of EC is reduced, and the EC-warning level (denoted by α) is measured by the corresponding time cost. Throughout the paper, the computation of α is formulated as the evaluation of the time cost of implication-ratio changes from to , = 1, 2, ….
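In practice, α could be read off a sampled trajectory of the implication ratio. The sketch below makes two assumptions: the ratio is sampled at increasing times, and it decreases over the window of interest. It linearly interpolates the first crossings of the start and end levels and returns the time between them. The function names `crossing_time` and `warning_level` are hypothetical.

```python
def crossing_time(times, ratios, level):
    """First time at which the sampled ratio falls to `level` (linear interpolation).

    Returns None if the level is never reached within the observed window.
    """
    if ratios[0] <= level:
        return times[0]
    for i in range(1, len(times)):
        r0, r1 = ratios[i - 1], ratios[i]
        if r1 <= level < r0:  # guarantees r0 > r1, so no division by zero
            t0, t1 = times[i - 1], times[i]
            return t0 + (r0 - level) * (t1 - t0) / (r0 - r1)
    return None


def warning_level(times, ratios, r_from, r_to):
    """Hypothetical alpha: time cost for the ratio to fall from r_from to r_to."""
    if r_to > r_from:
        raise ValueError("expected a reduction: r_to must not exceed r_from")
    t_a = crossing_time(times, ratios, r_from)
    t_b = crossing_time(times, ratios, r_to)
    if t_a is None or t_b is None:
        return None
    return t_b - t_a
```

Returning `None` rather than extrapolating keeps the estimate tied to the observed samples.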

3. Theoretical Analyses

Nonlinear functional analysis has been shown to be suitable for real-scenario analyses; in particular, multistage approaches have been widely employed in simulating disaster responses [37–42]. However, danger understanding and recognition in precision security deserve reconsideration in order to dispel EC-implications, using the determined EC-attention value and warning level for such implications. Recall that brain cognition of EC-implications can be theoretically approached through four processes; correspondingly, the formulated problem should be resolved with a four-stage approach [40–46].

3.1. Attention Value of EC

A brain-inspired approach to the attention value and warning level of EC is shown in Figure 3, where the EC-implications are manifested as an evolution of attention value and warning level. This approach is independent of EC types and hence has universal significance. Regulated attention in a brain-inspired data mining approach for understanding behaviors, accidents, and emotions can be carried out throughout the video sampling, training, and recognition processes [47, 48].

First, we have

which imply

Suppose that EC3 can fully convert to EC4; we obtain

Let , . From (2)–(4) and the preliminary formulation, we have

Therefore, the theoretical quantification of (i.e., the attention value of EC-implications) is

3.2. Determined Warning Level

It remains to determine the warning level of EC-implications. To reduce the time complexity of learning periods for EC-universal significance, the analysis can be divided into two cases: either time costs in different learning periods are independent, or the considered periods are mutually dependent.

Within a single learning period, if the EC evolution rate is fixed (denoted by ), then we have

Let ; we have
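To make the fixed-rate case concrete, assume that the implication ratio decays at a constant rate λ, that is, dr/dt = -λr, so that r(t) = r(0)·exp(-λt). This is an assumption for illustration, not necessarily the paper's equation. The time cost of a reduction from r_a to r_b is then α = ln(r_a / r_b) / λ, which the hypothetical function below evaluates.

```python
import math


def time_cost_fixed_rate(r_from, r_to, rate):
    """Time for r(t) = r0 * exp(-rate * t) to fall from r_from to r_to.

    Assumption (illustrative): first-order decay dr/dt = -rate * r with a
    constant rate > 0 within a single learning period.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if not (0 < r_to <= r_from):
        raise ValueError("expected 0 < r_to <= r_from")
    return math.log(r_from / r_to) / rate
```

Under this assumption, a larger fixed rate gives a proportionally smaller time cost for the same ratio reduction.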

Taking into account the variation of within this period, for example, let ; we have and hence
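If the rate varies within the period, one simple hypothetical choice is a linear rate λ(t) = a + b·t. Integrating dr/dt = -λ(t)·r gives ln(r_a / r_b) = a·τ + b·τ²/2, a quadratic in the time cost τ. The sketch below solves it in a rationalized form that also covers b = 0. It returns None when a negative b drives the rate to zero before the target ratio is reached. The linear rate law is our assumption, not the paper's.

```python
import math


def time_cost_linear_rate(r_from, r_to, a, b):
    """Time for the ratio to fall from r_from to r_to when rate(t) = a + b*t.

    Assumption (illustrative): dr/dt = -(a + b*t) * r, hence
    ln(r_from / r_to) = a*tau + b*tau**2 / 2.
    """
    if a <= 0:
        raise ValueError("initial rate a must be positive")
    if not (0 < r_to <= r_from):
        raise ValueError("expected 0 < r_to <= r_from")
    target = math.log(r_from / r_to)
    disc = a * a + 2.0 * b * target
    if disc < 0:
        return None  # the rate reaches zero before the target ratio is attained
    # Smallest positive root of (b/2)*tau**2 + a*tau - target = 0, written in
    # rationalized form so that it stays valid (and stable) when b == 0.
    return 2.0 * target / (a + math.sqrt(disc))
```

With b = 0 the result coincides with the fixed-rate formula ln(r_a / r_b) / a.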

For a video with learning periods, let ; we have

The solution of (11) is

Equivalently, we have

To simplify the representation of (13), define the following:

The Time-Parameters Matrices

The Original Status Matrix

The Dynamic Functions Matrix

We obtain the matrix form of (13):
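This sketch assumes that the multi-period system can be written in the linear form dx/dt = A·x with a constant matrix A. Its solution is then x(t) = exp(A·t)·x(0). The dependency-free code below computes the matrix exponential by scaling and squaring with a truncated Taylor series. In practice, a library routine such as `scipy.linalg.expm` would normally be preferred. A diagonal A corresponds to mutually independent learning periods, and off-diagonal entries to dependent ones. The function names are hypothetical.

```python
import math


def mat_mul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def mat_expm(a, terms=20):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    n = len(a)
    norm = max(sum(abs(x) for x in row) for row in a)  # infinity norm
    # Choose s so that the scaled matrix has norm at most 1/2.
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scaled = [[x / (2 ** s) for x in row] for row in a]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term now holds scaled**k / k!
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(s):  # undo the scaling: exp(A) = exp(A / 2**s) ** (2**s)
        result = mat_mul(result, result)
    return result


def evolve(rate_matrix, x0, t):
    """State x(t) = expm(t * A) @ x0 for the assumed linear system dx/dt = A x."""
    e = mat_expm([[t * x for x in row] for row in rate_matrix])
    return [sum(e[i][j] * x0[j] for j in range(len(x0))) for i in range(len(x0))]
```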

Further considering the relationship between surveillance videos, let ; then

The symmetric form of (17) is

Defining , , , we obtain

where is the correlative function of the -th video in the considered security system, .
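The effect of associating videos can be explored numerically. This sketch is our assumption, not the paper's correlative function. Each video's ratio decays at its own rate and is pulled toward the other videos' ratios through a hypothetical coupling matrix `coupling[i][j]`. Forward Euler integration records when each video first reaches a target ratio.

```python
def coupled_time_costs(rates, coupling, r0, r_to, dt=1e-3, t_max=100.0):
    """Time at which each video's ratio first reaches r_to (None if never).

    Assumed dynamics (illustrative only):
        dr_i/dt = -rates[i] * r_i + sum_j coupling[i][j] * (r_j - r_i)
    integrated with a fixed-step forward Euler scheme.
    """
    n = len(rates)
    r = list(r0)
    cost = [None] * n
    t = 0.0
    while t < t_max and any(c is None for c in cost):
        drdt = [-rates[i] * r[i]
                + sum(coupling[i][j] * (r[j] - r[i]) for j in range(n))
                for i in range(n)]
        r = [r[i] + dt * drdt[i] for i in range(n)]
        t += dt
        for i in range(n):
            if cost[i] is None and r[i] <= r_to:
                cost[i] = t
    return cost
```

With zero coupling, each video follows its own exponential decay. With positive coupling, a slowly decaying video is pulled down by faster neighbours, so its time cost shrinks.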

Finally, the EC-warning level can be computed as the time cost from to , . Regulated attention can be theoretically implemented in multidata fusion, learning, and modelling. Regions of interest (ROI) or pedestrians of interest (POI) correspond to GIS data, including time, place, and EC, through Internet of Things applications suitable for real scenarios, as seen in Figure 4. It is worth noting that a 3D stereo view generated from a 2D video sequence helps highlight EC evolution. It is therefore also useful for determining the length of learning periods.

4. Simulation and Discussion

The models proposed in the present study are learning models with universal significance (ignoring uncontrolled differences in real scenarios). They aim to establish a theoretical framework for the environmental implications to video recognition efficiency and will serve a universal evaluation of EC-implications to video recognition efficiency. Numerous algorithms have been developed to tackle video recognition challenges in various environments. However, it is still difficult to describe the time complexity of learning periods, largely because of the complexity of video recognition issues. Even for a given issue, it is not easy to determine learning periods for different EC-scenarios. Generally, the attention-time ratio of EC is reduced under regulated attention, and the EC-warning level can be measured by the corresponding time cost of reducing this ratio. We therefore formulate the parameter α as the time cost of implication-ratio changes from to , . For a detailed analysis of the time complexity, some examples of learning periods for video detection and tracking in different surveillance scenarios are presented in Figure 5. One possible way to treat the time complexity is to embed the proposed models into online algorithms in critical applications, using these newly added examples and evidence.

Because of the time complexity of learning periods, we assign EC-attention values for the simulation; Table 1 lists ten videos with the given EC-attention values. Equations (17)–(20) are employed to simulate the brain-inspired computing of the corresponding EC-warning levels.

Ignoring the association among the ten surveillance videos, from (17) and (18), the EC-warning levels from to are = 0.8868, = 0.1363, = 1.5691, and = 0.9220, respectively, . Taking into account the association among the ten surveillance videos, using (19) and (20), letting , and choosing a suitable association function (here ), the EC-warning levels from to are = 0.4096, = 0.0984, = 0.6314, and = 0.9220, respectively, .
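As a quick check of the size of the effect, the relative change between the two sets of reported warning levels can be computed directly from the values above. Values are paired by position, in the order in which they are listed. The snippet uses no data beyond those eight numbers.

```python
# EC-warning levels reported above, in the order listed.
without_association = [0.8868, 0.1363, 1.5691, 0.9220]
with_association = [0.4096, 0.0984, 0.6314, 0.9220]


def relative_change(before, after):
    """Relative change (after - before) / before for each paired value."""
    return [(a - b) / b for b, a in zip(before, after)]


changes = relative_change(without_association, with_association)
for k, c in enumerate(changes, start=1):
    print(f"reported value {k}: {c:+.1%}")
```

Taking the association into account lowers the first three warning levels by roughly 54%, 28%, and 60%, while the fourth is unchanged.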

Characterizing the EC-warning level and the implied dangers helps us learn how well potential dangers can be detected by video surveillance in changing environments. This is especially relevant to unmanned driving, where one major bottleneck is finding effective and efficient algorithms for danger detection and caution, mainly because the learning systems used lack adaptive attention [49–51]. Numerous issues remain unresolved, some of which result from poorly understood EC-implications [52–58]. The brain-inspired modelling approach to such implications in the present study depends mainly on the amount of attention data and the length of attention time, ignoring the differences in real scenarios. The proposed models therefore have universal significance for critical applications, and it is necessary to consider integrating them with online surveillance algorithms towards precision security [59–61]. Such precision security can be a great challenge, because degradation of video recognition efficiency in critical environments has been demonstrated in previous studies [6, 17, 21, 35].

For special scenarios in which EC-implications are not significant, integrating our models with online algorithms is not necessary, and computation can be largely simplified. Taking lane detection as an example, the biological principle is to detect and recognize a line, which works well even if the lanes are partly missing [62–64], as seen in Figure 6.
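The robustness to partly missing lanes can be illustrated with an ordinary least-squares line fit through detected lane-marking pixels. Gaps in the marking simply reduce the number of points; they do not bias the fit. This is a generic sketch, not the method of [62–64]. Real lane detectors commonly use a Hough transform or a robust estimator such as RANSAC. Fitting x as a function of y is a choice made here because lanes are close to vertical in a forward-facing camera image. The function name `fit_line` is hypothetical.

```python
def fit_line(points):
    """Least-squares fit x = m*y + c through (x, y) lane-marking pixels."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    syy = sum(y * y for _, y in points)
    sxy = sum(x * y for x, y in points)
    denom = n * syy - sy * sy
    if denom == 0:
        raise ValueError("points must span more than one image row")
    m = (n * sxy - sx * sy) / denom
    c = (sx - m * sy) / n
    return m, c
```

For example, points on x = 0.5y + 10 sampled only for rows 0–10 and 30–40 still recover m = 0.5 and c = 10. The fitted line therefore predicts x = 20 in the missing row y = 20.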

For complex applications, however, embedding the proposed models in current security systems becomes necessary. Examples include:

- compressive sensing for sparse tracking [18], which can be improved as locally compressive sensing within the ROI;
- the VIBE algorithm for real-time object detection from a moving camera [19];
- the Adaboost algorithm for noise detection in the ROI [20];
- optical flow for robots’ recognition of environments [21];
- SVM clustering for accident classification [22];
- deep learning algorithms for anomaly detection, crowd analysis, and hierarchical tracking within the ROI [23–27].

Object understanding and detection under dynamic environment changes are usually based on adaptive background subtraction and other object recognition methods [17, 21, 35, 65–68]. A preliminary scheme for the practical integration of the proposed models with these algorithms is presented in Figure 7. In this example, smog, as a global environmental change, has significant implications for video behavior recognition and loitering detection within a hovering period of two persons. Only half of the hovering behaviors are detected: one person is highlighted in red, while the other remains in a green rectangle throughout. This indicates the degradation of video surveillance efficiency within the considered periods under challenging real scenarios. It is worth noting that the proposed models have analytic solutions, and the time cost of each iteration is much shorter than that of any video recognition algorithm. Therefore, embedding the proposed models in current security systems for critical applications is not only necessary but also feasible, and the proposed models can work with any online algorithm without a great loss in surveillance efficiency.
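Adaptive background subtraction, mentioned above, is often implemented with a per-pixel exponential running average. The sketch below uses flattened lists of grey-level intensities, and the learning rate and threshold are illustrative values. The code does not reproduce any of the specific algorithms cited in [17, 21, 35, 65–68]. A slow global drift, such as gradually thickening smog, is absorbed by the background model, while a sudden local change is flagged as foreground. The function names are hypothetical.

```python
def update_background(background, frame, learning_rate=0.05):
    """Blend the new frame into the per-pixel running-average background."""
    if not 0.0 < learning_rate <= 1.0:
        raise ValueError("learning_rate must be in (0, 1]")
    return [(1.0 - learning_rate) * b + learning_rate * f
            for b, f in zip(background, frame)]


def foreground_mask(background, frame, threshold=25.0):
    """Mark pixels whose intensity differs from the background by more than threshold."""
    return [abs(f - b) > threshold for b, f in zip(background, frame)]
```

In a processing loop, compute the mask against the current background before calling `update_background` with the same frame. Otherwise, a newly appearing object is partly absorbed into the background before it is tested.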

5. Conclusion

Despite previous studies on algorithms for video surveillance in various environments, considerable uncertainties remain in object detection, classification, and tracking. The understanding and recognition of the implications of surrounding environment changes to surveillance efficiency are also still very limited. The brain-inspired modelling approach to such implications in the present study depends mainly on the amount of attention data and attention time, ignoring differences in real scenarios. The proposed models therefore represent biological principles of computational intelligence and have universal significance for practical integration with online algorithms. Nevertheless, a full understanding of the complexity of learning periods for different EC-scenarios is still necessary. This is also the next research priority towards a universal evaluation of the implications of surrounding environment changes to video recognition efficiency.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was financially supported by the Shenzhen Basic Research Project (JCYJ20150630114942260), the National Natural Science Foundation of China (41571299), the CAS “Light of West China” Program (XBBS-2014-16), and the “Thousand Talents” plan (Y474161).