Abstract

Data mining in real-time data streams is subject to multiple types of uncertainty, which often leads the respective classifiers to make erroneous predictions about the presence or absence of complex events. Yet recognizing complex abnormal events, even those that occur extremely rarely, offers significant support to decision-making systems. There is therefore a need for robust recognition mechanisms able to predict or recognize when an abnormal event occurs, or will occur, in a data stream. Addressing this need, this paper presents an Intuitionistic Tumbling Windows event calculus (ITWec) methodology. It is an innovative data analysis system that combines, for the first time in the literature, a set of multiple systems for Complex Abnormal Event Recognition (CAER). In the proposed system, the probability that a high-level complex abnormal event holds in each period is first calculated nonparametrically, based on the probabilities of the low-level events associated with it. Because cumulative results are sought in consecutive, nonoverlapping sections of the data stream, the method uses the clearly defined initialization and termination rules of the tumbling windows method, in which the time interval within which the blocks of a particular stream window are examined is explicitly determined. Finally, the number of maximum probable intervals in which an event is likely to occur, given a certain probability threshold, is calculated based on a parametric representation of intuitionistic fuzzy sets.

1. Introduction

A data stream is an ordered sequence of data obtained with some temporal behavior [1]. Unlike data received from static databases, data streams are continuous and unlimited, are usually received at high speeds, and are characterized by a time-varying distribution of data. A typical example of mechanisms that create continuous data flows is sensor networks, which produce continuous, unlimited, and high-speed data [2]. This data cannot be stored in its entirety, so it must be processed in real time, and rescanning is therefore not possible when an update occurs. Consequently, when discovering knowledge from sensor data streams, it is necessary to scan the data once and to use the available computing resources correctly and compactly. It is also necessary to adapt properly to the changing data distribution; otherwise, the problem of shifting concepts (concept drift) may occur [3]. In addition, the speed of the knowledge discovery process must be faster than the data arrival speed, and the results must build on the results of previous times; otherwise, data approximation methods such as sampling and load shedding must be applied, methods which reduce the accuracy of the results [4].

Accordingly, smart models for detecting events in data streams from sensor networks [5] must support real-time distributed detection and be able to use techniques such as dimensionality reduction, adaptive interaction, and exploitation of spatiotemporal correlations between data [6]. These features ensure that anomalies that occur in small percentages are not lost, that the system load remains normal, and that the efficiency of the algorithm is increased accordingly.

In a more thorough analysis, the process of detecting events and generating event flows on an existing set of multisensor streams initially involves real-time observation, at a single frequency, of multiple time-varying quantitative performance parameters of the system [7]. A sensor stream, which consists of numerical sensor values, is denoted by si, and si(t) denotes the value of stream si at time t, where t ∈ [0, +∞). Assuming that n sensor streams are synchronized to report their values periodically, the multivariate frame information at each time point t is represented by a frame vector ΔΠt = (s1(t), s2(t), …, sn(t)) ∈ ℝ^n. Each individual sensor stream forms a one-dimensional time series, while the frame vector flow represents a multivariate time series [8]. Many problems in science require the sequential detection of a change or an event in a process. In its simplest form, an attempt is made to detect a change in the mean of a sequence, where the change is either abrupt or gradual.

A data stream consists of a potentially infinite sequence of blocks of data. Stream data has two characteristics that pose a challenge in its processing: the high arrival rate and the possibility of unpredictable behavior. Event detection on sensor streams aims to identify values si(t) that constitute abrupt changes within a frame vector flow. Each frame vector is converted to a binary vector of the same length, with each value representing a possible change in the corresponding sensor stream [1, 2, 9]. Such deviations from normal behavior are called events, and the binary vectors are called event vectors. An event may be an observation that does not conform to an expected pattern in the dataset. Events can be caused by a variety of reasons, such as sensor failure or malfunction, outlier values, or significant changes that may affect system behavior.

Therefore, an event vector at time t is represented by ΔΣt = (e1(t), …, en(t)) ∈ {0, 1}^n, where ei(t) is the binary value indicating whether abnormal behavior occurred in stream si at time t (represented by a value equal to one) or whether the value si(t) lies within the expected range of values [10]. Converting a frame vector to an event vector is based on change detection algorithms, which aim to detect abnormal deviations of current values from the values obtained in previous steps.
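As an illustration of this conversion, the following sketch derives an event vector from a frame vector using a simple per-stream z-score rule; the rule, its threshold, and the function names are illustrative assumptions rather than the specific change detection algorithms the paper relies on.

```python
import numpy as np

def frame_to_event_vector(frame, history, z_threshold=3.0):
    """Convert a frame vector s(t) into a binary event vector e(t).

    A component e_i(t) is set to 1 when the current reading s_i(t)
    deviates from the recent history of stream i by more than
    `z_threshold` standard deviations (a simple, illustrative rule).
    """
    history = np.asarray(history, dtype=float)   # shape: (window, n_sensors)
    frame = np.asarray(frame, dtype=float)       # shape: (n_sensors,)
    mean = history.mean(axis=0)
    std = history.std(axis=0) + 1e-9             # avoid division by zero
    z = np.abs(frame - mean) / std
    return (z > z_threshold).astype(int)

# Usage: 4 sensors, 50 past frames, one new frame with a spike on sensor 2
rng = np.random.default_rng(0)
past = rng.normal(0.0, 1.0, size=(50, 4))
new_frame = np.array([0.1, -0.3, 8.0, 0.2])
print(frame_to_event_vector(new_frame, past))    # e.g. [0 0 1 0]
```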

Change detection algorithms can be classified into two categories: single-variable (univariate) change detection and multivariate change detection. Algorithms belonging to the univariate class consider each sensor stream separately and detect possible anomalies through sequential time series analysis. Algorithms belonging to the multivariate category use autoregressive multivariate models to represent each frame vector as a linear combination of previous behaviors. The goal of obtaining a binary value that indicates change or no change for a particular variable, that is, a sensor stream, is then reduced to a threshold check on the difference between the predicted vector and the actual vector [11–13].

Change detection methods consider the time series of the measurement values and search for time points at which the statistical properties of the measurements change abruptly. Here "abruptly" means immediately, or at least very quickly relative to the sampling period of the measurements. The monitored statistical properties are assumed to show no, or only very small, deviation during times when no change occurs. Under these conditions, even small changes can be detected with high probability [14, 15]. The chance of detection may be even greater if these changes persist for some time.

In most cases, change detection methods work without any assumption that the monitored variables follow a specific distribution; in other words, methods for detecting changes are usually nonparametric [16, 17]. Another feature of change detection methods is that changes are detected within a very short time, or even immediately. Also, information about the magnitude of a change is in most cases neither measurable nor necessary.

The design of abrupt-change detection procedures consists of two major subprocesses. The first subprocess is optional and involves preprocessing the initial data so that, when no change occurs, the resulting values of the sample set do not deviate much from a reference value in metrics such as the mean and the deviation. The reference value may be zero or some other suitable value. In this subprocess, the resulting values of the sample set deviate significantly from the reference value when a change does occur. The second subprocess involves the development of algorithms belonging to the category of statistical methods [12, 13, 18]. These algorithms must be capable of detecting abrupt changes in the sample set and the exact time at which they occurred.

An instantaneous indication of activity can lead to incorrect recognition due to the unreliability of the sensors, the inaccuracy of the recognition patterns, and several external factors that can introduce noise into the data. In surveillance and recognition processes, such misidentification of events can cause unjustified delays and slow down procedures. Therefore, there is a need for more robust recognition which, given a certain probability threshold [19], can calculate all the maximum probable intervals within which an activity is likely to occur [20, 21].

Considering this need, this paper presents the ITWec methodology. Initially, the probabilities of a high-level event at any given time are calculated nonparametrically, given the attached probabilities of the low-level event activities. Because aggregate results are required in consecutive, nonoverlapping sections of the data stream, recognition is based on the clearly defined initialization and termination rules of the tumbling windows method, in which the time interval within which multiple streams are investigated is explicitly determined. Finally, the number of maximum probable intervals within which an event is likely to occur is calculated with respect to a certain probability threshold, based on a parametric representation of intuitionistic fuzzy sets as a measure of probability [22]. It is a universal mechanism that can be used to solve a wide range of real-world problems and provides a distinct tool for critical event management.

2. Literature Review

The concept of Complex Event Recognition [23–26] has been approached with various methods by the research community. Especially with the fast spread of information across different fields of modern activity, such as social networks in the form of text data streams, researchers are investigating the extraction of valuable information about real-world events.

Skarlatidis et al. [9] in 2013 created a probabilistic logic-based system for event recognition by combining the Event Calculus with Markov Logic Networks [27]. Their approach inherited the Event Calculus’ domain-independent properties and allowed for probabilistic recognition of composite events with incomplete definitions. To avoid the combinatorial explosion induced by the expressivity of the logical formalism, they also transformed the entire knowledge base into compact Markov networks. Finally, they put their strategy to the test in a real-world human activity recognition task.

Fedoryszak et al. [2] aimed to address the challenge of event detection in social media networks by providing a real-time, modular system for identifying events. They used clustering on a big stream with millions of entities per minute to generate a dynamically updated collection of events. They put their method to the test using an evaluation dataset taken from a snapshot of the whole Twitter Firehose, and they offered metrics for assessing clustering quality. Finally, they attempted to illustrate a high-profile Twitter event to demonstrate the value of modeling the progression of events, particularly those recognized through social data streams.

Al-Dyani et al. [1] investigated event detection models using text data from a variety of social media platforms. Their research centered on domain type, detection methods, and task type. In order to provide a comprehensive assessment of current developments in the event detection field, they also addressed the most significant open issues faced by researchers in constructing such models. They examined and compared related works in the field of event detection in order to help scholars identify gaps in the literature.

Elsaleh et al. [5] proposed Internet of Things- (IoT-) Stream, a lightweight architecture for semantically annotating streams based on semantic knowledge exchange. They presented a system architecture to demonstrate the semantic model’s adoption and provide instances of system instantiation for various use cases, easing the development of IoT applications that deal with stream sensory input. The system design is built on web services, microservices, and middleware, which are all standard IoT architectures. The semantic annotations that occur in the pipeline of IoT services and sensory data analytics are part of their system approach.

Katzouris et al. [14] demonstrated an Answer Set Programming- (ASP-) based system capable of probabilistic reasoning with complicated event patterns in the form of weighted Event Calculus rules, the structure and weights of which are learned online. Their approach combines online structure and weight learning techniques with temporal reasoning under uncertainty via probabilistic logical inference. On Complex Event Recognition datasets for activity recognition, maritime surveillance, and fleet management, they compared their implementation to a Markov Logic-based one and to other state-of-the-art batch learning techniques. The results were satisfactory in terms of both efficiency and predictive performance.

From the above literature, we conclude that Complex Event Recognition is an extremely important concept that is applicable in a vast number of applications: text, video, activity recognition, maritime surveillance, or fleet management. The proposed system is an innovative data analysis system that combines for the first time in the literature a set of multiple systems for CAER.

3. Materials and Methods

The proposed ITWec methodology concerns CAER in data flows. Typically, a data stream is considered to be a sequence of elements x1, x2, …, xN, … that arrive in real time in ascending order, where N is the number of elements that have appeared so far. In the proposed methodology, event recognition refers to the temporal comparison of patterns in data derived from different types of sensors [28]. Multiple sources provide spatiotemporal data that can be used to identify different types of activity. The activities and time series involved in the data flow analysis proposed by ITWec make it imperative to determine an appropriate type of window, with the main goals of limiting the flow elements to be examined, enabling point analyses, and significantly saving system resources. The rationale for this requirement is that a window extracts from the vast data stream a potentially variable but finite number of elements, that is, those parts of the stream that will then be used in the evaluation of the analysis [29].

Additionally, as new elements arrive in the processing system, the contents of the window change dynamically in the way its type specifies. Existing prediction methods use fixed-size observation windows, which cannot produce accurate results because they are not adaptively adjusted to capture local trends in the most recent data. Such methods either train on large, fixed sliding windows using an irrelevantly large number of observations, yielding inaccurate estimations, or suffer from the degradation of estimations obtained with short windows on quickly changing trends. In this paper, we propose that the analysis for CAER be computed over tumbling windows on a set of updated blocks, so that the system can continuously provide up-to-date answers, capture the trend of the latest resource utilization, and then build an estimation model for each trend period [30].

Specifically, consider a window WE with coupling condition E applied at time τ0 ∈ T to the data stream elements S, that is, to the current contents S(τ0); then the window returns at most n elements [31, 32], for an n that may grow arbitrarily large but always remains finite. Based on the above, it is concluded that at any given time a finite subset WE(S(τi)) ⊂ S(τi) is obtained. Each window thus addresses the innumerable elements of a single data stream and practically transforms it into a temporary relation of finite size. If an analysis concerns multiple streams (e.g., a join), then a separate window is usually declared for each, even if they have similar semantics (e.g., the data of each stream in the last half hour). Logical windows require an explicit determination of the time interval within which it is investigated which blocks of the stream belong to the same window. This requirement is greatly simplified if the scope of each window is defined as a mapping scope: T → T × T from the domain of time landmarks to the domain of time intervals [33, 34]:

Essentially, for each time instance, the range function returns the time limits (edges) of the window, considering the parameters that define the type of window. To implement aggregate results in consecutive, nonoverlapping parts of the data stream, and because recognition requires well-defined window initialization and termination rules, ITWec uses tumbling windows, where there is an explicit definition of the time interval within which the streams are identified. Specifically, if τ0 ∈ T is the time of submission of the analysis, then the range of a tumbling window with width ω and step δ is defined for each τ ∈ T (with τ ≥ τ0) [35], where the values τ0, τ ∈ T are expressed in time landmarks and ω, δ ∈ ℕ in time intervals (ω, δ > 0). For the sake of simplicity, in the proposed method all time quantities are expressed as natural numbers, so that the range function is evaluated at the discrete times of T and the window multiples result from the corresponding relation [36].

An additional innovative feature that greatly simplifies the process is that the step is equal to one unit of time, so that the progress of the window is perfectly in line with the corresponding time. Thus, when the step is smaller than the width (δ < ω), the contents of two consecutive snapshots of the rolling window overlap [37].

Respectively, for contents that remain unchanged, the methodology specifies that the function is applied again at the next pulse, after δ time points, which is expressed by a retrospective formulation of the function in which the window edges change only at the time points determined by the step δ. In addition, the methodology allows for initial “missing” windows immediately after the submission of an analysis process, when the range exceeds the period of the current contents of the stream [38].

Finally, in ITWec, the range function is monotonic (since the evolution of time implies the generation of homologous intervals) and can therefore be defined even for future moments, so that all future stream elements are covered, regardless of when and whether they eventually appear. Thus, when the time step is considered arbitrary rather than unitary, the contents of the stream are returned in waves; since the jump is equal to the width of the window, after computing the range scopes (τ, ω, ω), the window blocks for each period are derived directly from those scopes [33, 37, 38].
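To make the windowing concrete, the following minimal sketch groups timestamped stream elements into consecutive, nonoverlapping (tumbling) windows of width ω starting at the submission time τ0; the function names and data layout are illustrative assumptions, not part of ITWec itself.

```python
from collections import defaultdict

def tumbling_window_index(t, t0, omega):
    """Return the index of the tumbling window that covers time t.

    Windows are consecutive, non-overlapping intervals of width `omega`
    starting at the submission time `t0` (step = width, as in ITWec).
    """
    return (t - t0) // omega

def split_into_tumbling_windows(stream, t0, omega):
    """Group (timestamp, value) pairs of a stream into tumbling windows."""
    windows = defaultdict(list)
    for t, value in stream:
        if t < t0:
            continue                    # elements before submission are ignored
        windows[tumbling_window_index(t, t0, omega)].append((t, value))
    return dict(windows)

# Usage: width omega = 3 time units, analysis submitted at t0 = 0
stream = [(0, 1.2), (1, 0.9), (2, 1.1), (3, 5.4), (4, 1.0), (6, 0.8)]
for k, block in split_into_tumbling_windows(stream, t0=0, omega=3).items():
    print(f"window {k}: {block}")
```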

Once the data flow analysis method has been established, the proposed event detection methodology distinguishes between high-level events and low-level events associated with CAER. Specifically, in ITWec the input data are low-level activities or events, while the output of recognition is a set of high-level activities or events, which are temporal combinations of the low-level data. When a rule consisting of a set of time constraints on low-level data is satisfied, a high-level activity is recognized by the recognition system. In the proposed system, the probability that a high-level event holds at each time moment is initially calculated nonparametrically, based on the probabilities of the low-level events associated with it.

However, this introduces uncertainty into the identification system, which is inherent in the precise identification of activities or events. For example, low-level activities detected by primary data processing tools typically come attached with probabilities that act as confidence estimates. A high-level activity, expressed as a binary event, is defined based on a set of low-level activities expressed as instantaneous events. Low-level activities are mutually exclusive in the sense that at any given time only one can hold, and they form the input to the recognition system. The calculation of the instantaneous probability of the predicate holdsAt(F = V, T), that is, the probability that F = V is true at time T, indicates that an event f, which may not be strictly true or false, has a probability p of occurring, and all its valuations represent independent random variables [39, 40]. A rule defined as the conjunction of k such events therefore has a probability equal to the product of the probabilities of these events.

In addition, the probability that a predicate holds is often assessed as the probability of the disjunction of such rules. Therefore, given the independence of each possible event, the probability of each event L in the proposed system is equal to [9, 14, 41, 42]

P(L) = 1 − ∏_{i=1}^{k} (1 − pi),

where pi are the probabilities of the k rules (or initiating events) that can make L hold.

The probability of an event is equal to the probability of the disjunction of its initiations before time T, provided the event has not been terminated in the intervening time. It follows that, for an evolving event, each repeated confirmation of its initiation increases the probability that it holds at the time of examination. In addition, if the event is terminated with probability p1, then its probability becomes the probability of the disjunction of the initiations multiplied by (1 − p1). Therefore, the higher the probability p1, the greater the reduction in the probability of the event, and consequently successive terminations further reduce its probability.
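The probability dynamics described above can be sketched as follows, assuming independent initiation and termination events: initiations raise the holding probability through a noisy-or combination, while a termination with probability p1 scales it by (1 − p1). This is a minimal illustration of the stated behavior, not the paper's implementation.

```python
def update_holding_probability(p_holds, p_init=0.0, p_term=0.0):
    """One time-step update of the probability that a high-level event holds.

    - An initiation with probability p_init raises the holding probability
      via a noisy-or:  p' = 1 - (1 - p_holds) * (1 - p_init).
    - A termination with probability p_term lowers it proportionally:
      p' = p_holds * (1 - p_term).
    Repeated initiations therefore accumulate, and repeated terminations
    progressively drive the probability towards zero.
    """
    p = 1.0 - (1.0 - p_holds) * (1.0 - p_init)
    return p * (1.0 - p_term)

# Usage: three consecutive initiations followed by two terminations
p = 0.0
for p_init in (0.4, 0.4, 0.4):
    p = update_holding_probability(p, p_init=p_init)
    print(f"after initiation:  {p:.3f}")
for p_term in (0.5, 0.5):
    p = update_holding_probability(p, p_term=p_term)
    print(f"after termination: {p:.3f}")
```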

Practically, in ITWec event analysis the focus is on calculating the instantaneous failure rate [39, 40]

h(t) = f(t) / (1 − F(t)) = f(t) / S(t),

where f is the probability density, F the cumulative distribution function, and S = 1 − F the survival function of the time to the event, while for a discrete random variable we have

h(tj) = P(T = tj) / P(T ≥ tj).

The instantaneous failure rate thus indicates the instantaneous probability of an event occurring at time t; in ITWec, where the random variables are discrete [43, 44], it is the probability that the event occurs exactly at tj given that it has not occurred earlier.

Respectively, the cumulative failure rate function for a section of the flow is calculated as the sum of the instantaneous rates over that section:

H(t) = Σ_{tj ≤ t} h(tj).

Given that, in the continuous case, the relationship between the instantaneous failure rate function and the cumulative function is H(t) = −ln S(t), i.e., S(t) = exp(−H(t)) [39, 40], and since the distribution of the events in the flow is unknown, ITWec estimates the cumulative failure rate nonparametrically through

Ĥ(t) = Σ_{tj ≤ t} dj / nj,

where dj is the number of events observed at time point tj and nj is the number of elements still at risk at tj, so that dj / nj is the estimated instantaneous failure rate at tj. The corresponding confidence interval is calculated from the estimated variance [45–47]

Var[Ĥ(t)] = Σ_{tj ≤ t} dj / nj².
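A minimal sketch of such a nonparametric estimate is given below, assuming the standard Nelson–Aalen form (dj observed events out of nj elements at risk) that the text suggests; the simplification of one event per listed time point is an illustrative assumption.

```python
import math

def nelson_aalen(event_times, n_at_risk_start):
    """Nonparametric estimate of the cumulative failure-rate (hazard) function.

    event_times      : sorted list of distinct time points with observed events
    n_at_risk_start  : number of stream elements initially at risk

    Returns a list of (t, H_hat, variance) triples where
        H_hat(t) = sum over t_j <= t of d_j / n_j
        var(t)   = sum over t_j <= t of d_j / n_j**2
    assuming one event per listed time point (d_j = 1) for simplicity.
    """
    H, var, out = 0.0, 0.0, []
    n = n_at_risk_start
    for t in event_times:
        if n <= 0:
            break
        H += 1.0 / n
        var += 1.0 / n ** 2
        out.append((t, H, var))
        n -= 1                      # one element leaves the risk set per event
    return out

# Usage: events observed at five time points among 20 monitored elements
for t, H, v in nelson_aalen([2, 5, 6, 11, 14], n_at_risk_start=20):
    ci = 1.96 * math.sqrt(v)        # rough 95% confidence half-width
    print(f"t={t:>2}  H={H:.3f}  ±{ci:.3f}")
```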

To calculate the number of maximum probable intervals within which an event is likely to occur, the method of setting a threshold must first be defined. The proposed model uses a Cumulative Sum Algorithm (CuSum) [48, 49], chosen because the monitored quantity exhibits a negative drift under normal conditions and a positive drift after a change. The decision function compares the increase of the cumulative sum St from its running minimum with a threshold k, so that [50, 51]

gt = St − min_{j ≤ t} Sj ≥ k.

An event describing a change is detected if the function gt exceeds the threshold value k. If the algorithm then continues in subsequent times, it restarts with gt reset to zero. The CuSum used in ITWec operates on the basis of hypothesis testing, so that its repetitive behavior follows a sequential probability ratio test, in which each decision considers as many successive past observations as necessary to accept a hypothesis. If the change hypothesis is accepted, a change detection signal is issued and the algorithm stops. The threshold value k provides a balance between the mean detection delay and the mean time between false detections. The change detection functions used to detect positive and negative deviations are defined as follows [48, 49]:

g⁺t = max(0, g⁺t−1 + xt − μ0 − ν/2),
g⁻t = max(0, g⁻t−1 − xt + μ0 − ν/2),

where xt is the monitored value, μ0 its reference (in-control) mean, and ν the minimum change magnitude (allowance) to be detected.

Typical choices relate the allowance ν and the threshold k to the standard deviation σ of the monitored signal, for example with ν on the order of σ and k a small multiple of σ [48, 50].
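The two-sided CuSum recursion described above can be sketched as follows; the reference mean μ0, allowance ν, threshold k, and the synthetic signal are illustrative assumptions.

```python
import numpy as np

def two_sided_cusum(x, mu0, nu, k):
    """Two-sided CuSum change detection on a sequence x.

    g_plus  accumulates positive deviations from the reference level mu0,
    g_minus accumulates negative deviations; an alarm is raised (and the
    statistics reset to zero) whenever either exceeds the threshold k.
    `nu` is the allowance (minimum change magnitude worth detecting).
    """
    g_plus = g_minus = 0.0
    alarms = []
    for t, value in enumerate(x):
        g_plus = max(0.0, g_plus + value - mu0 - nu / 2.0)
        g_minus = max(0.0, g_minus - value + mu0 - nu / 2.0)
        if g_plus > k or g_minus > k:
            alarms.append(t)
            g_plus = g_minus = 0.0          # restart after detection
    return alarms

# Usage: a level shift of +2 sigma is injected at t = 60
rng = np.random.default_rng(1)
signal = np.concatenate([rng.normal(0, 1, 60), rng.normal(2, 1, 40)])
sigma = 1.0
print(two_sided_cusum(signal, mu0=0.0, nu=sigma, k=5 * sigma))
```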

Accordingly, ITWec uses a maximum likelihood estimator to calculate the number of maximum probable intervals within which an event is likely to occur, which requires maximizing the likelihood function L(θ). In the case we examine, we have a k-dimensional normal distribution with both the mean vector μ and the covariance matrix Σ unknown, so the parameter becomes θ = (μ, Σ); that is, we have a vector and a matrix. Assume that we have a sample x1, …, xn of size n from this multivariate distribution, with the observations assumed independent. Then the likelihood of the sample is given by the relation [9, 15, 20]

L(μ, Σ) = ∏_{i=1}^{n} (2π)^{−k/2} |Σ|^{−1/2} exp(−½ (xi − μ)ᵀ Σ⁻¹ (xi − μ)),

and so, by calculating the logarithm, we have

ln L(μ, Σ) = −(nk/2) ln(2π) − (n/2) ln|Σ| − ½ Σ_{i=1}^{n} (xi − μ)ᵀ Σ⁻¹ (xi − μ).

Also, writing x̄ for the sample mean, we have [14, 21]

Σ_{i=1}^{n} (xi − μ)ᵀ Σ⁻¹ (xi − μ) = Σ_{i=1}^{n} (xi − x̄)ᵀ Σ⁻¹ (xi − x̄) + n (x̄ − μ)ᵀ Σ⁻¹ (x̄ − μ),

where Σ⁻¹ is a matrix of dimensions k × k. Therefore, maximizing the likelihood with respect to μ requires maximizing the quantity −n (x̄ − μ)ᵀ Σ⁻¹ (x̄ − μ), which is always nonpositive as the negative of a quadratic form, and therefore the function is maximized for μ = x̄ [13, 29, 52].
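The maximum likelihood estimates implied by this derivation (the sample mean and the empirical covariance) can be computed directly, as in the following sketch; the function name and simulated data are illustrative.

```python
import numpy as np

def gaussian_mle(samples):
    """Maximum likelihood estimates for a k-variate Gaussian.

    The log-likelihood is maximized by the sample mean (the quadratic
    term is non-positive and vanishes at mu = x_bar) and by the
    empirical covariance with the 1/n normalization.
    """
    x = np.asarray(samples, dtype=float)         # shape: (n, k)
    mu_hat = x.mean(axis=0)
    centered = x - mu_hat
    sigma_hat = centered.T @ centered / x.shape[0]
    return mu_hat, sigma_hat

# Usage: recover parameters from simulated 3-dimensional data
rng = np.random.default_rng(2)
true_mu = np.array([1.0, -2.0, 0.5])
true_cov = np.array([[1.0, 0.3, 0.0],
                     [0.3, 2.0, 0.1],
                     [0.0, 0.1, 0.5]])
data = rng.multivariate_normal(true_mu, true_cov, size=5000)
mu_hat, sigma_hat = gaussian_mle(data)
print(np.round(mu_hat, 2))
print(np.round(sigma_hat, 2))
```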

However, probability expresses randomness stemming from a lack of knowledge about the outcome of the experiment. This uncertainty is nondeterministic because the events describing the states of the sensors in the data stream are incompletely defined and therefore only partially determined. For this reason, the proposed model calculates the number of maximum probable intervals within which an event is likely to occur based on a parametric representation of intuitionistic fuzzy sets and, specifically, on the entropy of intuitionistic fuzzy events [53]. For an intuitionistic fuzzy set A with membership function μA and nonmembership function νA, a pair of operators, the necessity operator and the possibility operator, are defined respectively as [54, 55]

□A = {⟨x, μA(x), 1 − μA(x)⟩ : x ∈ X},  ◇A = {⟨x, 1 − νA(x), νA(x)⟩ : x ∈ X}.

Considering the minimum and maximum probabilities of an intuitionistic fuzzy event A with respect to a probability distribution P, they can be interpreted, respectively, as the probabilities of the fuzzy events □A and ◇A with respect to the same probability distribution P, as follows [53, 54, 56]:

P_min(A) = P(□A) = Σ_{i=1}^{n} μA(xi) p(xi),  P_max(A) = P(◇A) = Σ_{i=1}^{n} (1 − νA(xi)) p(xi).

The measures of entropy of an intuitionistic fuzzy event, which correspond to the entropies of the marginal fuzzy events □A and ◇A, are therefore the entropies of the fuzzy events with membership functions μA and 1 − νA, respectively [30, 56].

Thus, to calculate the entropy of a fuzzy event over a finite universe X with probability distribution P = {p(x1), …, p(xn)}, the minimum and maximum entropies H_min and H_max are obtained by applying the entropy of a fuzzy event to □A and ◇A, respectively [54].

However, the entropies H_min and H_max correspond to the minimum and maximum probabilities [13, 15, 21], so the proposed ITWec calculates the number of maximum probable intervals within which an event is likely to occur based on an intuitionistic fuzzy set representation, allowing data flow elements to be evaluated both for their membership and for their nonmembership in a fuzzy set [54], which gives particular realism to the way the proposed method is implemented.
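A minimal sketch of the minimum/maximum probabilities and associated entropy-style measures is given below, assuming a Zadeh-style probability of a fuzzy event applied to the necessity and possibility margins; the exact entropy definition used by ITWec may differ, so the formulas here are illustrative.

```python
import math

def min_max_probability(mu, nu, p):
    """Minimum and maximum probability of an intuitionistic fuzzy event.

    mu, nu : membership and non-membership degrees over a finite universe
    p      : probability distribution over the same universe

    P_min uses the necessity operator (membership mu),
    P_max uses the possibility operator (1 - nu).
    """
    p_min = sum(m * pi for m, pi in zip(mu, p))
    p_max = sum((1.0 - n) * pi for n, pi in zip(nu, p))
    return p_min, p_max

def fuzzy_event_entropy(grades, p):
    """Entropy-like measure of a fuzzy event weighted by its grades (illustrative)."""
    return -sum(g * pi * math.log(pi) for g, pi in zip(grades, p) if pi > 0)

# Usage: a 4-point universe with an intuitionistic fuzzy event A (mu + nu <= 1)
mu = [0.9, 0.6, 0.2, 0.0]            # membership
nu = [0.05, 0.3, 0.6, 0.95]          # non-membership
p  = [0.4, 0.3, 0.2, 0.1]
p_min, p_max = min_max_probability(mu, nu, p)
h_min = fuzzy_event_entropy(mu, p)
h_max = fuzzy_event_entropy([1.0 - n for n in nu], p)
print(f"P_min={p_min:.3f}  P_max={p_max:.3f}  H_min={h_min:.3f}  H_max={h_max:.3f}")
```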

4. Experiments

The evaluation of the proposed ITWec method was performed using three different versions of a dataset which includes 15 observation videos of a mechanical system. In each video, the intervals in which each low-level and high-level activity takes place are manually noted. Identification system input data are low-level activities attached to the corresponding time points, for example, the video frame in which the activity takes place. In addition, the dataset includes the coordinates of the cameras at each time point, as well as their orientation. Given the above input, the purpose of the system is to identify high-level activities such as anomaly detection. Figure 1 is a depiction of a random video frame of the dataset used in this paper.

The three versions of the dataset used include three different noise levels, which were generated for in-depth evaluation of the method. Specifically, in the first version of the dataset—smooth noise—a subset of low-level activities is attached to probabilities generated by a gamma distribution with a variable mean value.

The rest of the low-level activities are presented as in the original dataset with no probability attached. In the second version—intermediate noise—probabilities are added to the corresponding coordinate and orientation categories using the same gamma distribution. Finally, in the third version—loud noise—untrue low-level activities were added at random times resulting from a normal distribution.

Figure 2 is a depiction of the three levels of noise included in the dataset.

In the experiments, this data is given as input to calculate the instantaneous probabilities for each high-level activity to be examined. Next, we use ITWec to calculate reliable maximum intervals for each high-level activity. In the following analysis, the prediction accuracy of the method is calculated, after the output is filtered, and only high-level activities with a probability greater than a given threshold are maintained. We repeat the experiments 5 times for each value of the mean value of the gamma distribution in a range of [0.5, 8.0] with step 0.15. The higher the average value, the lower the probabilities attached to the input events of the set and the higher the probabilities of untrue events, indicating a higher noise level. All experiments are conducted on the Google Colab-GPU environment. A time series of events is presented in Figure 3 below.
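For illustration, the noise-injection step of the experiments might be sketched as follows; the mapping from gamma samples to confidence values (here 1/(1 + g)), the fixed scale, and the function names are purely illustrative assumptions about how gamma-distributed probabilities with a controllable mean could be attached to low-level events.

```python
import numpy as np

def attach_gamma_confidences(event_times, mean_value, rng=None):
    """Attach gamma-distributed confidence values to low-level events.

    Samples are drawn from a gamma distribution with the requested mean
    (shape = mean / scale, fixed scale), then mapped to (0, 1] so they
    can be read as probabilities. With the illustrative 1 / (1 + g)
    mapping, larger gamma means push confidences lower, mimicking a
    higher noise level.
    """
    rng = rng or np.random.default_rng()
    scale = 1.0
    g = rng.gamma(shape=mean_value / scale, scale=scale, size=len(event_times))
    confidences = 1.0 / (1.0 + g)
    return list(zip(event_times, confidences))

# Usage: sweep the gamma mean over [0.5, 8.0] with step 0.15, 5 repetitions each
for mean_value in np.arange(0.5, 8.0 + 1e-9, 0.15):
    for repetition in range(5):
        noisy_events = attach_gamma_confidences(range(100), mean_value)
```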

The probabilistic recognition of the events in the dataset used is presented diagrammatically below.

The blue diagrams represent the probability distribution of a high-level event as calculated by the proposed methodology. The horizontal bars indicate the maximum intervals as obtained by ITWec for a probability threshold of 0.7 (green line), the maximum probability interval with the highest reliability as calculated for the same threshold (red line), and the benchmark line of the activity (blue line).

Figure 4 shows some common cases from the experimental process. The bottom-left diagram of the figure shows a series of initializations, which contribute to the continuous increase of the probability of the high-level activity, after which a series of terminations leads to its gradual reduction. In the upper-left image, a strong termination of the activity dramatically reduces its probability from 0.8 to 0. In the lower and right diagrams of the figure, the presented high-level events are subject to inertia between initializations and terminations. Thus, in the absence of initialization and termination, the probability of the high-level activity remains constant over the period under consideration.

In conclusion, and based on the threshold that is dynamically calculated for each data stream (schematic representation in Figure 5), the intervals calculated by the methodology are superintervals of the intervals calculated from the probability distribution.

Also, a typical report from the probability calculation process is presented in Figure 6.

When the increase or decrease in probability is not abrupt, which occurs when there are continuous small indications that an activity has started or ended, the extra time moments included in these intervals have relatively high probability. However, if there is a sharp increase or decrease in probability, meaning a strong momentary indication of the initiation or termination of an activity, the intervals may include times at which the activity probability is small or even zero. In these cases, adding a single time moment of low probability may not drop the probability of the interval below the given probability threshold. In most cases, where the increase or decrease is not abrupt, the intervals can be approached by lowering the probability threshold. On the other hand, lowering the threshold can lead to several false positives, as in the case where the probability of a high-level activity exceeds the threshold momentarily due to noise-influenced observations.

Regarding the termination of activities and its relationship with the benchmark line, there is no specific relationship. In some cases, the benchmark line intervals end after a series of terminations, while in other cases they end with the very first termination. This observation is related to the inherent noise in the dataset, with the result that the constructed definitions for high-level activities may not fit perfectly with the benchmark line. Since the methodology is built on the dynamically calculated probability distribution, which in turn is based on the definitions of high-level events, the methodology inherits the discrepancies with the benchmark line.

In general, however, the finding is that the proposed system can calculate a single maximum interval, overcoming the effect of noise that occasionally reduces the likelihood of detecting high-level events. In cases where the system is directly affected by loud noise, creating a series of false negatives between two maximum intervals, the probability threshold could be significantly reduced, at the cost of many false positives in other cases. This finding reflects one of the main open research issues in activity recognition. Figure 7 summarizes the experimental results, showing the F1-score values for high-level event recognition under intermediate noise, which is also the most representative of real cases. In this figure, the threshold is applied and configured manually, in order to establish the exact mode of operation of the proposed model. The blue charts correspond to a probability threshold of 0.6, the yellow to 0.7, the green to 0.8, and the red to 0.9.

In contrast, Figure 8 shows the F1-score values for a representative case of high-level event recognition under intermediate noise with a threshold defined dynamically by the proposed system. The case of the yellow diagram was abruptly interrupted by a withdrawal, so although it is included in the diagram, it is considered as nonoccurring.

Finally, a graphical representation of how the minimum and maximum entropies H_min and H_max, which correspond to the minimum and maximum probabilities, are calculated by the proposed ITWec based on the intuitionistic fuzzy set representation is presented in Figure 9.

In this way, the proposed system calculates the number of maximum probable intervals within which an event is likely to occur, evaluating the data elements both for their membership and for their nonmembership in a fuzzy set, which gives particular realism to the implementation of the proposed method.

5. Discussion and Conclusions

Real-time detection and evaluation of spatiotemporal events from sensor data streams focus on event detection, correlation and causation, time prediction, system prediction, and adaptive data filtering. The speed of the knowledge discovery process must be faster than the data arrival speed; otherwise, data approximation methods such as sampling and load shedding must be applied, methods that reduce the accuracy of the results. Also, the incremental nature of the results imposes an interdependence with the results of previous times, always considering the adaptation of the method to the available memory resources and computing power. Poor video quality is a reality for too many surveillance systems. In addition, video compression algorithms reduce image quality because of their lossy approach to reducing the required bandwidth. In these cases, event recognition becomes a major problem. It is, however, possible to improve video quality without changing the compression pipeline, through postprocessing that eliminates the visual artifacts created by the compression algorithms.

Given the need for realistic and accurate event detection systems, this paper presents an innovative and highly realistic methodology that combines, for the first time, a set of multiple intelligent elements in an integrated framework. It is a CAER system in which the number of maximum probable intervals within which an event is likely to occur is calculated based on a parametric evaluation that uses intuitionistic fuzzy sets [55].

An important advantage of the method, demonstrated experimentally, is that the mean, deviation, and distribution functions are expressed as sums of independent and identically distributed random variables. The method also has the advantage of taking the history under investigation into account and can detect model failure more quickly when the forecast error is relatively small.

Dynamic threshold determination based on an advanced form of CuSum instantly integrates all the information in the sample sequence into the accumulated sums of the deviations of the sample values from the center-line value, creating realistic operating conditions under which events can be identified both for individual observations and for averages of logical subsets of the flow sample set. Respectively, the proposed window control structure, based on the tumbling windows methodology, smooths the way data flows are analyzed, providing a safe and fully functional way to analyze data arriving at fluctuating, time-varying rates, even when the stream size is unbounded and unknown from the beginning. A further key competitive advantage is that the proposed model introduces only a small run-time overhead, which the GPU minimizes by inlining some of the function calls needed in the real-time event detection methodology.

Significant improvements in the evolution of the proposed system mainly concern optimizing the implementation of the dynamic threshold, which is sensitive to withdrawals during stream analysis. Building hybrid models from other potential input sources, such as sound or activity recognition, is also a future research avenue. In addition, a significant improvement concerns investigating variational inference methodologies to approximate the posterior probability of unobserved variables, so as to apply statistical inference for these variables. Finally, it would be important to study the expansion of the system through transfer learning, and in particular whether and how it can recognize more complex events.

Data Availability

Data are available upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.