Abstract

Threats to public security are widespread and can be disastrous. Although information fusion in video sensor networks for public security has been studied extensively, multimedia fusion in heterogeneous sensor networks and its application to public security remain a challenge and a central goal in the field of information fusion. In this study, to realize the detection, monitoring, and intelligent alarm of such hazards, we develop a graph-based real-time schema for modeling the dynamic structure of heterogeneous sensors for public security. In the proposed schema, data fusion algorithms are explored from a data-driven perspective to locate the optimal sensing ranges of sensor nodes in a network with heterogeneous targets. In addition, we propose a framework that incorporates useful contextual and temporal cues for public security alarms, explore its conceptualization, benefits, and challenges, and analyze the correlations of the target motion elements in the multimedia sensor stream. The experimental results show that the new method offers intelligent alarm capabilities that existing schemes cannot achieve.

1. Introduction

Public security is a thorny issue that lurks almost everywhere and threatens physical safety at any time. The threats stem from unpredictable causes, such as crowds turning unruly in an emergency or natural hazards striking without warning. Unfortunately, the most tragic catastrophes are often caused by humans, and determined attackers repeatedly find ways to disrupt orderly social life. In such cases, any lapse may lead to tangible and intangible losses. Accordingly, there is a critical need for early detection and alarm; such a guarantee enables an instant reaction, buys time for evacuation and deployment, and reduces the likelihood of loss.

Information fusion, born in the 1970s out of military requirements, has shown considerable potential, and multisensor data fusion (MSDF) later emerged from it [1]. Experience with MSDF shows that it supports the integration and analysis of sensor data and yields a more accurate understanding of a situation and how to react to it [2, 3], which makes disaster prevention possible. For the most part, it has been restricted to combining large volumes of correlated sensor data for particular detection tasks; topics of interest include fire detection [4, 5], surveillance of transportation [6, 7], and data management for the specialty [8, 9]. Other approaches are devoted to eliminating redundancy in sensor networks [10]. Given the complexity of public security, both maximizing the relevant multimedia sensor data and excluding redundancy must be addressed. That is, multimedia sensor data with strong spatial-temporal correlation are needed to effectively deploy a multiview, multisense monitoring system [11]. With a similar intent, a coordinated scheduling method [12] based on divisible load theory (DLT) was presented to minimize both finish time and energy consumption in a sensor network, and a topology rebuilding algorithm (TRA) [13] was proposed as a solution to data dissemination in which the workloads meet specified requirements.

Vision sensors that deliver successive video frames carry correlated semantic knowledge conveyed by series of interacting target motion elements [14, 15]. In addition, other multimedia sensors can detect associated information, such as noise from a crowd, smells that may indicate a toxic gas, and suspicious words captured by a sound sensor. Fusing these levels of environmental data therefore makes it feasible to respond to whatever happens in a region without actually being there, and an alarm is raised if any hidden danger is detected. Pursuing this goal requires technologies for immediate intelligent alarm, including the following: capturing multisense environmental data of one scene from different perspectives; mining the spatial relations and interactions underlying the multimedia sensors; and reaching a decision by fusing the multimedia sensor data.

2. Related Work

Recent information fusion research is highly active. Most studies concentrate on making the utmost use of comprehensive sensor resources through reasonable handling of multisensor information, including distinctive uses of the correlation, independence, contextual information, and modality selection among different sensors [16, 17]. These methods must be robust and capable of parallel processing. Furthermore, the complementary and redundant information from sensors that differ in modality and spatial placement calls for an optimized, consistent description of the targeted objects.

The fusion of multiple modalities is generally performed at the feature level (early fusion) or the decision level (late fusion) [16]. Feature-level fusion [18, 19] extracts features from the raw sensor data and then analyzes and processes the extracted modality information, which in our case may include visual features (color, shape, or texture), audio features (frequencies or texture), and motion features (motion trace or motion pattern). Feature-level fusion achieves impressive compression for real-time processing and also provides considerable feature information for decision analysis, but it may not handle time synchronization between multimodal features well. Decision-level fusion [20, 21] employs numerous sensors to observe one object; each sensor finishes its own local processing (e.g., pretreatment, feature extraction, and recognition), and the local decisions are then combined into a fused decision based on their correlation. This approach is advantageous in that it accommodates distinct representations for the diverse modality features, and it scales better in the fusion process than feature-level fusion, but it costs extra time to obtain the local decisions.

Many fusion approaches are in wide use across specific application backgrounds. In the information retrieval domain, data fusion is performed through rank/score functions [22, 23]. In current applications, several approaches employ temperature, infrared, and smoke sensors for environmental data detection [24, 25], video sensor networks (VSNs) serve intelligent transportation and surveillance systems [26, 27], and information fusion is widely used in robotics, image processing, and related fields [28, 29]. For instance, environmental data (temperature and humidity) can be adopted as in-field lower-level data, in which the change at each sensor node is observed against a chosen threshold, and the probability of fire is then confirmed by fusing VSN data with Dempster-Shafer evidential reasoning (D-S) [5]. However, the vision sensors there merely corroborate an assumption made in the first step; much of the associated video semantic knowledge is not exploited for timely warning and is thus wasted. In addition, a combination of Kalman filtering (KF) with D-S [30] overcomes the insufficiencies of both algorithms and yields a matching model; a camera topology estimation strategy was proposed to handle the coverage overlaps that occur in reality, but the fusion of the targets' dynamic features across successive video frames was still excluded from the theory. Moreover, an information fusion model devoted to tracking and locating materials was adopted for automated identification and location estimation [31]. The authors provided a multilevel data fusion model, based on the modified JDL model, that joins D-S theory with a weighted mean. This hybrid fusion method succeeds in fusing data from disparate sensor nodes in a noisy, dynamic environment and handles uncertainty and imprecision well.


3. Multisensor Network and Directed Graph

We begin by looking at some minimal models of public security in crowds. Regardless of how data from different sensors are organized, the underlying fusion algorithms must ultimately fuse the input data, and the aim of public security-oriented fusion applications is to deal with several data-related challenges. We therefore explore data fusion algorithms from a data-driven perspective and seek the optimal sensing ranges of sensor nodes in a network with targets. We assume that a sensor can dynamically adjust its sensing data by an arbitrary amount.

In general, a changing environment alters the working condition of each sensor and may cause many sensors to respond directly to the change; within the scope of the affected sensors, considerable correlated information about the monitored environment is captured. Meanwhile, considering the spatial relations, when one sensor detects environmental factors that vary over time, neighboring sensors also respond to the same scene. In this paper, we define the interaction between two sensors as the probability that, when one sensor is triggered, the other responds to the same initiator. A quantization method dynamically defines this interaction relationship, and a directed graph describes the resulting tendencies; this yields the maximum correlated information from diverse sensors, so that sensor data can be adaptively dispatched for optimal fusion in a dynamic environment.

3.1. Multisensor Attribute Node

The multisensor attributes mainly represent the perceived multimodal data, such as the image information captured by video sensors. A uniform description of the interaction serves the state analysis and quantization of multimodal sensor data. Besides the effects of temporal and spatial factors, we define and illustrate the related notions of the involved elements and formalize the problem below.

Definition 1 (sensor state). The state of a sensor is the modality-specific description that the sensor gives, at some point in time, of the scene within its sensing range. In this paper, the multisensor state is defined as State = {Decibel, Smokescope, Cohesion}, whose components indicate how the audio sensor, the odor sensor, and the video sensor, respectively, quantify the basic condition of the detection range. Specifically, Decibel and Smokescope quantify the real-time environment, while Cohesion measures the compactness of the motion elements in the video images.

Definition 2 (spatial dependence). The measurement of a spatial relationship aims to quantify the spatial scale between two entities. In this case, the spatial dependence is associated with the distribution of the sensors' coverages rather than the linear distance between them; physically, it refers to the overlapping area of any two sensors (Figure 1). The overlapping area directly reveals the dependence of one sensor on the other, as targets in the overlapping area can be detected by both. Thus, the proportion of the overlap within one sensor's coverage indicates the probability that the other sensor can detect the same target at the same time. The dependence can therefore be described by the proportion of the overlapping area in one sensor's geometric coverage under a unified geographic space. We regard any two sensor coverages as two nonempty sets of spatial elements and, on that basis, describe the spatial dependence degree mathematically as follows.

Let the overlapping area of sensors A and B be $S_{A \cap B}$, and let the coverages of A and B be $S_A$ and $S_B$, respectively. Pursuant to our definition of spatial dependence (SD), the SD between A and B is measured as
$$SD_{AB} = \frac{S_{A \cap B}}{S_A}, \qquad SD_{BA} = \frac{S_{A \cap B}}{S_B}. \tag{1}$$
Here $SD_{AB}$ denotes the spatial dependence of sensor A on B; likewise, $SD_{BA}$ is the spatial relationship in which B depends on A. When the overlapping area forms a large proportion of a sensor's coverage, the corresponding SD value is high, representing a strong spatial dependence between the two sensors.
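To make the computation concrete, the sketch below evaluates (1) for circular coverages like those used in the simulations of Section 5. The function names and the example centers and radii are our own illustration, not part of the original scheme.

```python
import math

def circle_overlap_area(c1, r1, c2, r2):
    """Area of intersection of two circular sensor coverages."""
    d = math.dist(c1, c2)
    if d >= r1 + r2:                      # disjoint coverages
        return 0.0
    if d <= abs(r1 - r2):                 # one coverage contains the other
        return math.pi * min(r1, r2) ** 2
    a1 = r1**2 * math.acos((d**2 + r1**2 - r2**2) / (2 * d * r1))
    a2 = r2**2 * math.acos((d**2 + r2**2 - r1**2) / (2 * d * r2))
    corr = 0.5 * math.sqrt((-d + r1 + r2) * (d + r1 - r2)
                           * (d - r1 + r2) * (d + r1 + r2))
    return a1 + a2 - corr

def spatial_dependence(c1, r1, c2, r2):
    """SD_AB of Eq. (1): share of A's coverage that B also observes."""
    return circle_overlap_area(c1, r1, c2, r2) / (math.pi * r1**2)

# Example: audio sensor (r = 4) and video sensor (r = 5), centers 6 apart.
print(spatial_dependence((0, 0), 4, (6, 0), 5))  # SD_AB ~ 0.27
print(spatial_dependence((6, 0), 5, (0, 0), 4))  # SD_BA ~ 0.18
```

Note that $SD_{AB} \ne SD_{BA}$ in general, which is why the interaction graph in Section 3.3 is directed.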

Accordingly, any multisensor attribute node can be defined as $v_i = (\mathrm{Time}, \mathrm{State}, SD)$, which indicates that at temporal point Time the sensor is at State, and $SD$ ($SD_{ij}$) refers to the spatial relationship with another sensor $v_j$.

3.2. Interaction of Multiple Sensors

According to the definition of the sensor attributes, the interaction between two sensors is mainly affected by State and SD. Therefore, the probability $P_t(B \mid A)$ that sensor B relates to sensor A at time $t$ can be computed on the basis of the real-time multisensor node attributes.

Consider the rate of State change $v$ and the spatial dependence $d$ as a two-dimensional random variable $(v, d)$. When the quantized rate of State change grows without bound, the interaction probability is limited only by the spatial term (3); similarly, when the two spatial element sets entirely match each other, the probability depends only on State (4):
$$\lim_{v \to \infty} P_t(B \mid A) = F_d(d), \tag{3}$$
$$\lim_{d \to \infty} P_t(B \mid A) = F_v(v). \tag{4}$$
Accordingly, we define the interaction probability as a two-dimensional distribution function of $(v, d)$; since $v$ and $d$ are mutually independent random parameters, the probability function factorizes as
$$P_t(B \mid A) = F(v, d) = F_v(v) \cdot F_d(d), \tag{5}$$
where $F_v$ and $F_d$ denote the marginal distribution functions of $v$ and $d$, respectively. Taking the marginals as exponential distribution functions,
$$F_v(v) = 1 - e^{-\lambda_1 v}, \qquad F_d(d) = 1 - e^{-\lambda_2 d}, \qquad \lambda_1 > 0, \ \lambda_2 > 0,$$
(5) becomes
$$P_t(B \mid A) = \bigl(1 - e^{-\lambda_1 v}\bigr)\bigl(1 - e^{-\lambda_2 d}\bigr). \tag{6}$$
Hence, at any time $t$, if the $SD$ between sensors A and B stays constant at $d$ and the rate of State change of sensor B is $v$, then the probability of sensor B relating to A is given by (6).
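A minimal sketch of (6), assuming the exponential marginals used in the reconstruction above; `lam1` and `lam2` are free rate parameters that would be calibrated to the deployment.

```python
import math

def interaction_probability(v, d, lam1=1.0, lam2=1.0):
    """P_t(B|A) from the reconstructed (6): independent exponential
    marginals over the State-change rate v and the spatial dependence d."""
    return (1.0 - math.exp(-lam1 * v)) * (1.0 - math.exp(-lam2 * d))

# Limiting behavior stated in (3) and (4):
print(interaction_probability(1e9, 0.3))  # v large: approaches F_d(0.3)
print(interaction_probability(2.0, 1e9))  # d large: approaches F_v(2.0)
```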

3.3. Directed Graph
3.3.1. Graph Evolution Rules

A graph is defined as $G = (V, E, P)$, where $V$ is a vertex set, $E$ is an edge set, and $P$ is the probabilistic space. Any two sensors $v_i$ and $v_j$ connect with probability $p_{ij}$ when sensor $v_j$ relates to sensor $v_i$, which means $e_{ij} \in E$ and $p_{ij} \in P$. As $P_t$ varies in time, the graph at time $t$ evolves by the rule $G_t = (V, E_t, P_t)$, as shown in Figure 2.

3.3.2. Graph Syntax

Following the graph evolution rule above, the multisensor network can be diagrammed as a graph. The sensor nodes act as the vertices, any two sensors that form an interaction relationship are connected by an independent directed edge, and each edge carries a probability $p_{ij}$. Since $p_{ij}$ is affected by the sensor State and also varies dynamically with time, the graph syntax based on the multisensor network and the interactions among sensor nodes can be defined as a five-tuple
$$G = (v_0, V, P, C, N(v_0)),$$
where $v_0$ is the initial vertex, that is, the node triggered by the event, and it is unique; the vertex set $V$ represents all the sensor nodes involved in the multisensor network; $P$ is the quantized interaction between two sensors and also the probability that two nodes construct an incidence relation; $C$ in the random graph indicates the clustering coefficient of the initial node, that is, the number of sensor nodes to which the initial node relates; and $N(v_0)$ denotes the clustering vertex set starting from node $v_0$.
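As an illustration, the sketch below instantiates the interaction graph at one sampling instant with networkx. The probability threshold and all identifiers are our own assumptions, not prescribed by the syntax above.

```python
import itertools
import networkx as nx

def build_interaction_graph(node_ids, prob, threshold=0.2):
    """Instantiate the directed interaction graph at one sampling instant.
    `prob(i, j)` should return p_ij at time t, e.g. via Eq. (6);
    edges below `threshold` are pruned as negligible interactions."""
    g = nx.DiGraph()
    g.add_nodes_from(node_ids)
    for i, j in itertools.permutations(node_ids, 2):
        p = prob(i, j)
        if p >= threshold:
            g.add_edge(i, j, p=p)
    return g

# For an initial (event-triggering) vertex v0, the clustering vertex set
# N(v0) is its out-neighborhood, and C is that set's size:
#   N_v0 = set(g.successors(v0)); C = len(N_v0)
```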

4. Multisensor Data Processing Framework

The framework incorporates useful contextual and temporal cues for public security alarms and constitutes an integrated platform of associated sensors and computing infrastructure capable of delivering valuable real-time information about natural hazards to public security. Each involved sensor acts as an independent local data processing unit whose output represents a set of individual decisions reported to the fusion center; the sensors contribute through self-interaction or by cooperating with others during the fusion process. The process has two parts. The first is the local fusion center, which is in charge of fusing single-modality data so as to judge changes in the surroundings of each sensor; its output can draw on other local fusion centers to confirm a result or to cooperate toward the final judgment in the second part, the global fusion center. The latter performs a global estimation based on the multimodality data and finally gives a comprehensive review of the environment.

Before the data fusion process, the information from all the sensors requires a unified expression so that decisions can be exchanged between homogeneous or heterogeneous sensors. All sensor data are therefore first quantized and then transformed into decision-ready information that captures the semantics of the various media sources. Within the public monitoring system, quantizing the correlation (Cohesion) among motion elements expresses the semantic knowledge underlying the video stream of each video sensor node, while the data of the acoustic and odor sensors are fused to detect the same scene from different aspects. Figure 3 depicts the fusion process and the multisensor data flow; each modality node has its own judgment standard.

4.1. Audio Sensor

This sensor senses the sounds occurring in the detection region; it mainly aims to capture strange noises or irregular sound changes and to capture the optimal data when these sensors are deployed in a decentralized pattern. Meanwhile, during voice capture, feature redundancy is required for robustness while data redundancy must be eliminated, to guard against data blocking and energy limitations [32].

Audio information is a time-varying analog signal, because sounds change with the circumstances. For instance, at the same spot, the sound at 10:00 a.m. differs from that at 10:00 p.m.: the first, in the rush hour, may contain the noise of a busy crowd, while the night is usually peaceful, so any isolated sound is easily caught by the sensors. Thus, to measure the sound in a region, we collect the hourly mean $\mu_h$, which denotes the mean of the ambient sound level for each hour, and compare it with the real-time value $D_t$; the unusual case can be excluded if the difference does not exceed a reasonable range $\varepsilon$. A disparity of the two values landing in the range indicates a normal situation, formalized as $|D_t - \mu_h| \le \varepsilon$; otherwise, the node sends a warning.
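A minimal sketch of this per-node check; the 5 dB tolerance and the synthetic hourly means are placeholders for values that would be learned from historical recordings.

```python
import numpy as np

def audio_alarm(d_t, hour, hourly_mean, eps=5.0):
    """Warn on an abnormal sound level: normal iff |D_t - mu_h| <= eps."""
    return abs(d_t - hourly_mean[hour]) > eps

# hourly_mean: 24 per-hour means; random stand-ins for learned values.
hourly_mean = np.random.default_rng(0).uniform(40, 70, size=24)
print(audio_alarm(85.0, 10, hourly_mean))  # True -> node sends a warning
```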

Additionally, sound also carries rich underlying semantics, so recognizing the hidden semantics in ambient sounds reveals the scenario. To this end, the processing at each sensor node includes two steps: eliminating the background noise and then performing speech recognition. Speech denoising is a mature technology with considerable academic results [33, 34] that is used in diverse realms, whereas speech recognition is employed here to analyze the associated semantics. Speech consists of phoneme streams generated by the brain and is actually a double stochastic process, while the voice signal is an observable time-varying sequence. The hidden Markov model (HMM), a statistical model built on time-series structure, simulates this process well: the speech signal is vector-quantized (VQ), the model is trained on abundant data, and speech recognition is ultimately achieved.
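To make the recognition step concrete, the following sketch scores a vector-quantized observation sequence against a discrete HMM with the scaled forward algorithm; in a keyword-spotting setup, one such model per keyword (plus a background model) would be trained, and the highest-scoring model wins. The model parameters here are random stand-ins, not trained values.

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """log P(obs | HMM) via the scaled forward algorithm.
    pi: (S,) initial state probs; A: (S, S) transition matrix;
    B: (S, V) emission probs over VQ codebook indices; obs: index sequence."""
    alpha = pi * B[:, obs[0]]
    log_l = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # one forward recursion step
        c = alpha.sum()                 # scaling factor against underflow
        log_l += np.log(c)
        alpha /= c
    return log_l

rng = np.random.default_rng(0)
S, V = 3, 16                            # states, VQ codebook size
pi = np.full(S, 1 / S)
A = rng.dirichlet(np.ones(S), size=S)   # row-stochastic transitions
B = rng.dirichlet(np.ones(V), size=S)   # row-stochastic emissions
print(forward_log_likelihood([0, 3, 7, 7, 2], pi, A, B))
```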

As we intend to capture particular words that may signal unwanted outcomes, the speech recognition process mainly serves to extract them from the background sounds. Considering the indeterminacy of everyday language, the speech semantic analysis may not be a decisive attribute of the detected event, but it acts as a supporting attribute that, together with other features, strengthens the judgment.

4.2. Odor Sensor

The odor sensors, as environmental data monitoring devices, are deployed alongside the audio sensors to detect the smoke scope and observe real-time ambient data. Since the odor within a specific confine of an environment mostly stays at one level, abundant statistics and research on environmental change allow exceptional events to be identified by analyzing the distribution of the sensed periodic output, which is assumed to obey a normal distribution. The probability density function is
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),$$
where $\mu$ is the average of the given odor values and $\sigma$ is the standard deviation; the test statistics then show the law of the probability distribution of the odor state. The absence of emergencies accounts for most of the probability mass, whereas the rare events are the ones that deserve attention. If the sensor captures an odor sample $x$ whose value drifts out of the normal region of the probability distribution, a judgment is made on whether $x$ triggers the target event.
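A sketch of the resulting per-sample test; the three-sigma cut-off is one common choice for separating the rare-event tail, not a threshold fixed by the paper.

```python
def odor_alarm(x, mu, sigma, k=3.0):
    """Flag odor sample x as a rare event when it drifts more than
    k standard deviations from the learned mean mu (for k = 3, normal
    readings fall inside the region with probability ~0.997)."""
    return abs(x - mu) > k * sigma

print(odor_alarm(x=0.91, mu=0.40, sigma=0.12))  # True -> candidate event
```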

4.3. Video Sensor

Video sensors set at different angles for the surveillance of one observation region transfer successive video frames, affording video stream data from which associated semantic knowledge can be analyzed. Such information offers immediate potential for detecting unsafe situations, and despite substantial stochastic factors, the motion elements as a whole can be ascribed to an analyzable model. In this case, Cohesion is adopted to measure the compactness of the motion elements [35]: intense activity yields a higher Cohesion, indicating active movement of the crowd.

Moving entities shown in the video monitoring system are described as video motion elements (VMEs), such as pedestrians appearing in the observation regions. A VME consists of a series of states (Appear, Disappear, Stop, and Move), spatial relationships (Measure, Direction, and Topology), visual features (Color, Texture, Shape, and Size), and behavior attributes (Location and Velocity Vector). In particular, Appear, Disappear, Stop, and Move represent the four basic states of a motion element. We observe the correlative effects among interacting objects and use this interaction as the clue to express the intrinsic links among these elements and, finally, to estimate the probability that their motion states change.

In this case, we use a vertex set to describe the motion entities and define an independent edge as the relation of any two vertices connected with probability $p_{ij}$. Hence, the interactive spatial relationships that change dynamically with time during motion can be expressed in a unified representation model for the particular region. The vertex set denotes all the objects in the region; vertices $o_i$ and $o_j$ connect with each other via $p_{ij}$. By bringing in these correlation coefficients, the relevant information of the motion elements in the observation region can be extracted from the video stream and specified in a particular description. Two objects (shown as vertices) $o_i$ and $o_j$ exhibit a dynamic topology relationship as in Figure 4, where $p_{ij}$ represents the interaction probability, whose value relates to the compactness of the two objects as governed by their distance and other behavior attributes. We define the initial value by the distance within the probability space, and the $p_{ij}$ between $o_i$ and $o_j$ varies with Time.

The strength of the links between motion elements shows the activity and disorder degree of the elements in the frames and is indicated by the Cohesion $C_t$. Let $n$ be the number of motion elements and $p_{ij}$ the interaction probability between elements $o_i$ and $o_j$; the Cohesion at time $t$ can then be expressed as
$$C_t = \frac{1}{n(n-1)} \sum_{i \ne j} p_{ij}.$$
The value of Cohesion ranges from 0 to 1.
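Under the mean-pairwise-probability reading reconstructed above, Cohesion can be computed directly from the matrix of interaction probabilities; the matrix below is illustrative.

```python
import numpy as np

def cohesion(P):
    """Cohesion of the motion elements at time t: the mean pairwise
    interaction probability, which is confined to [0, 1].
    P: (n, n) matrix of p_ij between video motion elements, zero diagonal."""
    n = P.shape[0]
    if n < 2:
        return 0.0
    return float(P.sum() / (n * (n - 1)))

P = np.array([[0.0, 0.8, 0.1],
              [0.7, 0.0, 0.2],
              [0.1, 0.3, 0.0]])
print(cohesion(P))  # ~0.37: a moderately active crowd
```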

5. Simulations

In this section, we propose a concrete implementation scheme based on the conceptual model and analyze the simulation results of hypothetical scenarios. We assume that all targets within the node sensing ranges are sampled at an equal rate, independent of their distance to the sensor nodes. The process frame diagram is shown in Figure 5.

The scenarios are based on the assumption that there is a known scene deployed with ordered multisensors, including audio, odor, and video sensors, among which the sensors detecting target events constitute a set $S = \{s_1, s_2, \ldots, s_n\}$ ($n = 13$ in our simulation). In the scenarios, we map the range of the sensor coverage onto a two-dimensional coordinate plane, so the coverage of a sensor can be regarded as a closed, bounded point set. Meanwhile, we take the sampling period of each sensor to be $\Delta t$; consequently, $\bar{v}$ can be expressed as the average rate of State change over a period $T$, calculated as
$$\bar{v} = \frac{1}{m} \sum_{k=1}^{m} \frac{\lvert \mathrm{State}_k - \mathrm{State}_{k-1} \rvert}{\Delta t}, \qquad m = \frac{T}{\Delta t}.$$
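The average rate of State change, as reconstructed above, can be computed per period as follows; the sampling interval is an assumed parameter.

```python
import numpy as np

def avg_state_change_rate(states, dt):
    """Mean |State_k - State_{k-1}| / dt over one period T of samples."""
    states = np.asarray(states, dtype=float)
    return float(np.abs(np.diff(states)).mean() / dt)

print(avg_state_change_rate([55.0, 57.5, 54.0, 60.0], dt=0.2))  # dB per s
```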

5.1. Simulation

In this simulation, we assume that the effective coverages of the audio, odor, and video sensors can be diagrammed as circles with radii of 4, 3, and 5, respectively, and deploy the sensors in the coordinate plane with the parameters listed in Table 1.

By Definition 2, we can approach the spatial dependence between any two sensors from their pixel proportions and overlapping proportions based on (1); accordingly, we obtain the spatial dependence measures of the entire sensor set as a matrix whose $(i, j)$ entry equals the value of $SD_{ij}$. The entire spatial relationship among the sensors is diagrammed in Figure 6.

In this situation, we assume that the data of the sensors are subject to normal distributions so that the simulation corresponds to the changing status of the sensors; the expectation and variance of each kind of sensor, chosen according to actual circumstances, are given in Table 2.
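The per-sensor streams can then be simulated by sampling from the corresponding normal laws; the concrete expectations and deviations below are placeholders, since Table 2 is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_stream(mu, sigma, n):
    """Draw n readings of one sensor from N(mu, sigma^2)."""
    return rng.normal(mu, sigma, size=n)

# Placeholder parameters standing in for rows of Table 2:
decibel = simulate_stream(mu=60.0, sigma=5.0, n=200)    # audio sensor
smokescope = simulate_stream(mu=0.4, sigma=0.1, n=200)  # odor sensor
```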

As the particular example in Figure 7 shows, the top of the figure pictures the simulation results, namely, the Decibel values detected by the audio sensors in a certain period of time (equal to 200 ms), while the discrete points in the bottom half show the average rates of State change in every sampling period.

Based on the measurable parameters $\bar{v}$ and $SD$, the probability of an established incidence relation between two sensors at time $t$ can be approached by (6), and Figure 8 shows this kind of incidence relation at a certain time point.

Each continuous line in Figure 8 denotes the connection probability of one sensor relating to the others (including itself). According to the graph evolution rule defined before, this mutual incidence relation can be expressed as a graph (Figure 9(a)). On the other hand, we can easily estimate the average clustering coefficient (ACC) of the network node set; at time $t$ it equals 0.1625, and the clustering coefficient of each sensor is demonstrated in Figure 9(b).
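The per-node clustering coefficients and the ACC can be read off the directed, probability-weighted graph, for instance with networkx, whose clustering routines generalize to directed weighted graphs; the tiny graph below is illustrative, not the simulated 13-node network.

```python
import networkx as nx

g = nx.DiGraph()
g.add_weighted_edges_from(
    [(1, 2, 0.6), (2, 1, 0.4), (2, 3, 0.7), (3, 1, 0.5)], weight="p"
)
cc = nx.clustering(g, weight="p")            # clustering coefficient per node
acc = nx.average_clustering(g, weight="p")   # ACC of the node set
print(cc, acc)
```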

5.2. Scenario
5.2.1. Scenario 1

In this scenario we create a target event E that occurs within the spheres of the sensors, located at (7, 17) in the mapped two-dimensional coordinate plane. Meanwhile, the effect coverage of the event varies over time: it expands and then gradually narrows down. As it invades the spheres of the sensors, it changes their state parameters; the process works as Figure 10 shows.

Hence we define the event E's radius of effect coverage $r(t)$, where $r_0$ denotes the initial effect radius and $t_e$ the effecting time while event E acts. Accordingly, the expectation of an affected sensor varies with $r(t)$, shifting from its initial expectation $\mu_0$ while the sensor lies within the effect coverage. Event E affects three of the sensors during its acting time, and Figure 11 shows the incidence relation among the 13 sensors at the sampling time points of this period. Figure 12 shows the clustering coefficient of each sensor at the sampling time, and the corresponding ACCs of the sensor node set ($V$) and of the clustering affected node set ($N$) are listed in Table 3.
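One plausible instantiation of this time-varying expectation, assuming an effect radius that grows from $r_0$ and shrinks back and a fixed additive shift while a sensor is covered; both profiles are our assumptions, since the original formula is not recoverable here.

```python
import math

def effect_radius(t, r0, t_e):
    """Effect radius of event E: expands from r0, then narrows back
    (an assumed sinusoidal profile over the acting period [0, t_e])."""
    if not 0.0 <= t <= t_e:
        return 0.0
    return r0 * (1.0 + math.sin(math.pi * t / t_e))

def expectation(t, mu0, delta, sensor_pos, event_pos, r0, t_e):
    """mu_i(t): initial expectation mu0, shifted by delta while the
    sensor lies inside the event's effect coverage."""
    covered = math.dist(sensor_pos, event_pos) <= effect_radius(t, r0, t_e)
    return mu0 + delta if covered else mu0

print(expectation(5.0, mu0=60.0, delta=15.0, sensor_pos=(9, 15),
                  event_pos=(7, 17), r0=2.0, t_e=10.0))  # 75.0 while covered
```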

5.2.2. Scenario 2

This scenario simulates the target event as an effective point that moves across the spheres of the sensors with a constant velocity; the movement is shown in Figure 13.

As with the event defined before, it acts on the expectation of the affected sensors. The effective point moves across several sensors successively, and the working status of these correlated sensors changes correspondingly while the event acts. Thus, the expectation of an affected sensor floats during the period, which can be expressed as function (14); the interval from $t_1$ to $t_2$ is when the event acts on the sensor. During this process, the incidence relation of the sensor network at the sampling time is demonstrated in Figure 14; meanwhile, at each time point in this period, we can approach the clustering coefficient of each sensor, with the results shown in Figure 15.

With the proposed method, the acquired clustering vertex set ($N$) and the whole sensor network set ($V$) can be computed with their corresponding ACCs. Table 4 demonstrates that the ACC index shows a particular enhancement compared with that of the original complete graph.

5.3. Analysis

The results shown in the figures and the recorded data demonstrate that the incidence relation index and the directed graph parameters change significantly during the event acting period. The interactive relationship among the sensors reveals diversities compared with the general case depicted in Figure 8, especially in the affected sensor node set. The ACC generally increases from 0.1625 to about 0.18, which points to the intense interaction among the sensors. Moreover, within the clustering affected node set $N$, the ACC (around 0.22) shows an even more intense interaction than in the whole sensor network node set $V$, which means the method excludes the excess redundant nodes and obtains the maximum correlated information.

The authors in [36] highlight the spatial correlation of sensor nodes and measure the correlation between a sensor and its neighboring sensor nodes to approach data aggregation. There, topology is used to diagram the distribution of sensors; from that graph, we can acquire the corresponding graph parameters and compare them with the indexes acquired by our method in scenarios 1 and 2 (shown in Figure 16). Because the real-time sensor state is excluded, their spatial correlation remains unchanged; we can therefore deduce that involving the sensor state serves an affirmative function in better analyzing the interaction of multiple sensors.

In Figure 16, two lines denote the ACC index of the whole sensor network in the general condition and under the acting event, respectively. The figure indicates that the indexes resemble each other except during the effective process of the event; a further line clearly shows that the index of the clustering affected nodes increases compared with the index of the whole network, which means the nodes we approach are closely correlated. Besides, all the indexes reflect the real-time change of the interactive relation among the sensors except the constant spatial-correlation index of [36], and they also show certain growth compared with this constant.

Thus, the incidence relation makes it feasible to approach the optimal associated sensors derived from the events. Moreover, the spatial range of the sensors enhances the dependence among the sensed data, which raises the feasibility of the final fusion process.

6. Conclusions and Future Work

We introduced a new sensor data fusion method for heterogeneous multimedia sensor networks, explored the associated frameworks and algorithms for incorporating useful contextual and temporal cues for public security alarms, and thereby analyzed the correlations among the target motion elements in the multimedia sensor stream. We investigated a multimedia intelligent processing method based on the spatial relations of heterogeneous sensors, which achieves scalable recognition. We also developed a dynamic structure for multimedia sensor representation, thereby accommodating the fast generation of multimedia streams, highly heterogeneous networks, and the complicated alarm needs of public safety.

We further propose establishing a general framework for incorporating useful contextual and temporal cues for public security alarms and thereby analyzing the correlations of the target motion elements in the multimedia sensor stream. The graph-based framework will serve as a useful tool for designing multimedia in-network processing schemes in multimedia sensor networks (MSNs). The simulation results verify the analysis of the proposed techniques.

This new multisensor data fusion scheme suggests a number of interesting topics for future research. For instance, fuzzy set theory is widely recognized as a critical tool for public security with multisensor fusion, but the present paper touches on it only lightly. In the future, we aim to advance the implemented fusion algorithms by integrating alternative combination rules. We also intend to handle public alarm events by optimizing the multimedia sensed data and setting appropriate fusion parameters, such as alarm thresholds and fusion weights. Further validation with real data is of extreme importance and will also be conducted in future work.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The work is supported by the National Natural Science Foundation of China (nos. 41201378 and 41101432), the China Postdoctoral Science Foundation (no. 2014M561212), and the Natural Science Foundation Project of Chongqing CSTC (no. 2011jjA30014).