Abstract

Monitoring physical assault is critical for the prevention of juvenile delinquency and promotion of school harmony. A large portion of assault events, particularly school violence among teenagers, usually happen at indoor secluded places. Pioneering approaches employ always-on-body sensors or cameras in the limited surveillance area, which are privacy-invasive and cannot provide ubiquitous assault monitoring. In this paper, we present Wi-Dog, a noninvasive physical assault monitoring scheme that enables privacy-preserving monitoring in ubiquitous circumstances. Wi-Dog is based on widely deployed commodity Wi-Fi infrastructures. The key intuition is that Wi-Fi signals are easily distorted by human motions, and motion-induced signals could convey informative characteristics, such as intensity, regularity, and continuity. Specifically, to explicitly reveal the substantive properties of physical assault, we innovatively propose a set of signal processing methods for informative components extraction by selecting sensitive antenna pairs and subcarriers. Then a novel signal-complexity-based segmentation method is developed as a location-independent indicator to monitor targeted movement transitions. Finally, holistic analysis is employed based on domain knowledge, and we distinguish the violence process from both local and global perspective using time-frequency features. We implement Wi-Dog on commercial Wi-Fi devices and evaluate it in real indoor environments. Experimental results demonstrate the effectiveness of Wi-Dog which consistently outperforms the advanced abnormal detection methods with a higher true detection rate of 94% and a lower false alarm rate of 8%.

1. Introduction

School violence, as the leading cause of juvenile delinquency, has become an increasingly serious social issue and attracted extensive academic attention from researchers. According to a report of the National Center for Education Statistics, 28% of total 4326 examined adolescents reported bullying victimization, whose physical and mental health were severely affected [1]. To curb the prevalence of school violence, governments have introduced relevant policies to deal with it. A key enabler for effective school violence prevention is to automatically detect and alarm the instantaneous physical assault with existing available infrastructures. Wearable sensor based scheme may provide possible approaches to monitor a specific group of users, especially the guarded ones [2]. However, the always-on-body dedicated sensors (e.g., data glove [3], RFID [4], and smartphone [5]) render it not only uncomfortable to comply with but also inapplicable to general-purposed monitoring. More common solutions resort to camera-based monitoring [68]. Premounted cameras continuously collect and analyze the video frames of areas-of-interests, yet they bring underlying privacy issues and only operate in a clear line-of-sight (LOS) view.

Recent innovations in the increasingly hot area of wireless human sensing [911] inspire us to develop an upgraded assault monitoring system using Wi-Fi signals. Similar to many other activity recognition systems, the rationales of Wi-Fi-based assault detection are twofold. On the one hand, Wi-Fi signal is pervasive nowadays with densely deployed Wi-Fi infrastructures in public places, which delivers the idea of ubiquitous device-free surveillance into a practical solution. On the other hand, abrupt physical assault along with rapid movements of body parts could alter Wi-Fi signals and thus encode distinct features in received signals. Such features can be effectively captured by PHY layer Channel State Information (CSI), which is measurable on commercial Wi-Fi devices. Instead of the traditional Received Signal Strength (RSS), CSI conveys subcarrier-level channel measurements, with natural advantages in revealing the characteristics of superposed signals, and has been widely used for wireless sensing.

Emerging CSI-based technologies have promoted the revolution of action recognition domain, covering from macro-movements to micro-movements [1220]. Existing approaches, however, cannot be directly applied to assault monitoring since previous works usually rely on the repetitive patterns of actions [14, 17] or distinct profiles of single-person specialized gestures [1820]. In contrast, assault events are likely irregular and unpredictable. Compared with normal single-user activities, physical assault events are differentiated based on three criteria [5]. High-intensity. Multiple users (both attackers and victims) in the process of physical conflicts behave aggressively, inducing rapid and intensive body movements. Irregularity. In a real situation, physical assault cannot be regarded as repetitions of simple actions. Instead, the assault-induced signals generate complex and disordered fluctuations with the escalation of physical assault. Continuity. Severe physical assault always happens to specific victims, while the procedure of abuse can last for a long time. With in-depth understanding of the violent process and elaborate analysis of the event properties, it seems possible to recognize a violent event from RF signals.

In this paper, we show for the first time the feasibility of leveraging Wi-Fi signals for monitoring physical assault. We present Wi-Dog, a noninvasive physical assault monitoring scheme on commercial Wi-Fi infrastructures, to protect users from potential physical assault, just like a loyal dog does. Wi-Dog enables a privacy-preserving manner for daily assault monitoring in ubiquitous environments and advances the state-of-the-art Wi-Fi-based sensing approaches by solving three critical challenges. How to obtain abundant motion information from noisy CSI dynamics? Assault-induced fluctuations in CSI is easily distorted and blurred by background noises and irrelevant body movements. To reveal the genuine CSI waveforms, some previous works extracted the first principle component which still suffered from the noise interferences [16] or the second principle component which lost the majority of motion information [15, 18]. Instead, Wi-Dog obtains abundant and accurate CSI fluctuations by taking advantage of spatial diversity. The key intuition is that the same subcarrier of different antennas suffers from various channel distortions but the same noise sources, generating some similar variations in waveforms. Therefore, we propose a series of noise reduction steps to properly manipulate CSI dynamics from multiple subcarriers, eliminating irrelevant interferences while retaining motion cues of interests to the greatest extent. How to precisely and sensitively detect abnormal transitions during human interactions? As the drastic conflicts are location-variant and complex, conventional variance-based segmentation [21] cannot be utilized because slight interactions near the links may induce higher variation in CSI waveforms rather than assault. To address this challenge, we resort to the signal complexity of target frequency band to monitor the variation of intensity and irregularity level. The cross correlation of adjacent subcarriers is also supplemented to enhance the capability of location-independent detection. How to differentiate real assaults from assault-like actions? Existing fall-like detection works [21] fully utilized the features extracted from both amplitude and phase information in time-frequency analysis, which cannot be directly applied in our amplitude-based approach. Moreover, based on previous experimental study [5, 22], we notice that assault process can be easily mistaken as strenuous exercise (e.g., run, frog leap, and exergame) or normal human interactions (e.g., talk with body language). To make a comprehensive identification of assault process, we firstly extract novel features representing intensity level from Doppler frequency shifts and adopt a SVM classifier to classify time segments with high-intensity features, which is termed as local analysis. To reduce the false alarm rate, we further consider the irregularity as well as continuity in longer time duration by counting the occurrence frequency of new patterns and underlying slices.

We prototype Wi-Dog with commercial Wi-Fi devices and validate its performance in real environments. Experimental results show that Wi-Dog can monitor the imitated physical attacks with a high true detection rate of 0.94 (0.85), along with a low false alarm rate of 0.08 (0.11), using only a single pair of Wi-Fi transmitter-receiver in LOS (NLOS) environment. The results also show that Wi-Dog is robust to rational changes of parameters, including thresholds, individual diversity, duration time, and sampling rate. In a nutshell, our contributions are summarized as follows.(i)To the best of our knowledge, Wi-Dog is the first work to present a noninvasive physical assault monitoring system with only a pair of commercial Wi-Fi devices. We empower pervasive available Wi-Fi signals with the sensing ability and expand the boundaries of Wi-Fi to a new realm. We envision that this capability could enact as an early step toward more general emergency detection applications, including, but not limited to, terrorist threat warning, elders’ healthcare, and injury rescue.(ii)We drill down the domain of human abnormal interactions and explore the feasibility of Wi-Fi-based physical assault monitoring. The key enabler is to fully exploit the high-intensity, irregularity, and continuity of human motions reflected in CSI measurements. Hence we conduct holistic analysis and classify the violent events from both local and global perspective using elaborate-designed features.(iii)We innovatively recover the informative motion-induced signals from noisy CSI measurements and design a location-independent indicator to detect the physical assault based on signal complexity.(iv)We perform extensive experiments and validate Wi-Dog in both classroom (NLOS) and corridor environments (LOS) by imitating real physical assault with different volunteers. Experimental results show that Wi-Dog outperforms the state-of-the-art abnormal detection approaches in assault monitoring with a high true detection rate of 0.94 (0.85), along with a low false alarm rate of 0.08 (0.11) in LOS (NLOS) environment with rational changes of parameters.

2. Preliminary and Background

In this section, we present the critical intuitions of assault monitoring system and introduce the concept of Doppler Effect in CSI.

2.1. Analysis of Physical Assault

Formally, physical assault is properly defined by the World Health Organization (WHO) as the intentional use of physical power toward another person which may result in deadly injuries [23]. The definition involves intentions and outcomes of assault itself, irrespective of the essential characteristics it contains. To understand and explore the assault process, we resort to Violent-Flows Database [24], which is publicly recognized as an evaluation benchmark for crowd assault detection. All 246 videos (123 assault and 123 non-assault) are downloaded from YouTube, with an average 3.6s video clip. Observing uncontrolled assault conditions in real world, we believe that a practical assault monitoring systems should meet the following requirements. Privacy-preserving. A monitoring system should not violate the privacy of users, especially the normal users presenting in the surveillance area. Device-free. One should never expect an attacker to expose himself and wear any devices to be monitored. In contrast, an attacker would typically seek ways to hide from any monitoring system. Hence a useful system should work in passive and noncontact mode, allowing monitoring of attackers without any devices. Omnidirectional. Assault events frequently happen at blind corners that are not easy to be covered by traditional monitoring like camera-based systems. Ubiquitous monitoring systems should provide omnidirectional coverage for as large area as possible. Inspired by recent innovations on CSI-based human behaviour sensing, we believe the ubiquitously existing Wi-Fi signal provides an alternative promising way to monitor indoor assault, which fulfils all the above conditions.

2.2. Doppler Effect in CSI

The proof-of-concept assault monitoring system highly relies on the extraction of Doppler shift in CSI. Doppler shift is described as the change in wavelength of a signal for receivers, which results from relative motions of target objects, transmitters, and receivers. In the context of wireless sensing, if we consider a moving human reflecting the multipath signals as a virtual transmitter [25], then a human performing full-body motions could generate distinct Doppler shift in receivers. Therefore, the motion-induced variation of k-th path in frequency is given bywhere is the Doppler frequency shift for the k-th path, denotes the wavelength of the signal in the medium, and implies the length of the k-th propagation path. That is, the quicker the motion is performed, the faster the path length changes and the larger the Doppler shift is produced. To further depict channel properties of multiple paths, we introduce the concept of CSI [26], the superimposed channel response of each individual path, which can be written aswhere implies the phase shift caused by carrier frequency offset (CFO), is the carrier frequency, N is the total number of propagation paths, and denotes the attenuation factor for the k-th path at time t. Considering that can be expressed as the product of the speed of light c and time of flight , another expression of is . To understand the relationship between CSI and Doppler Effect, we slit CSI into static component () and dynamic component (). As human motion may change a set of path length, can be unfolded as:where denotes the set of dynamic path. To eliminate phase noises and CFO, the unwrapped instantaneous CSI power is calculated as follows.Noticing that the frequency of sinusoids can be extracted, we use Hilbert Transform to evaluate the speeds of human body parts for assault analysis. In the following sections, we would introduce the system architecture and design of Wi-Dog.

3. Overview

Wi-Dog is a device-free physical assault monitoring system using only a pair of commercial Wi-Fi devices. Figure 1 shows the architecture overview of Wi-Dog, which consists of four critical components, i.e., CSI collection, processing, segmentation, and assault recognition step. Specifically, we resort to fine-grained CSI tractable on commercial NICs to explore the underlying characteristics of assault-induced variations. The rationale is that furious physical assault along with rapid movements of body parts severely affects the propagation path. Wi-Dog continuously records and processes the raw CSI measurements based on a set of advanced processing algorithms. The precise spectrogram of motion-induced Doppler Effect can be recovered through a band pass filter, subcarrier selection steps, and time-frequency analysis. As a consequence, the noise-covered motion information comes out and takes up notable portion of signal fluctuations. In the CSI segmentation step, we advance the existing variance-based segmentation approach by proposing a novel assault indicator combining wavelet entropy and cross correlation, that is, a key step for Wi-Dog to detect abnormal transitions within a wide range and accomplish the goal of location-independent monitoring during human interactions. In the assault recognition step, we explicitly analyze the suspected assault series from both local and global aspects. For local analysis, we extract features representing high-intensity and adopt a SVM classifier to preserve suspected time slices. To enhance the efficiency and robustness of Wi-Dog, we take irregularity and continuity into consideration to further analyze the long-term fighting process. Wi-Dog triggers assault alarm only when suspected time slices have been confirmed by local-global analysis. Otherwise, no assault is detected within the surveillance area.

4. Wi-Dog Design

In this section, we give out critical observations of physical assault and further detail the methodologies of Wi-Dog by experimental studies.

4.1. CSI Processing
4.1.1. Band Pass Filtering

Raw CSI measurements contain high amplitude impulses, burst noises with high frequency, and significant static interferences with low frequency. To obtain sanitary CSI data with a target frequency shift, a three-order Butterworth band pass filter is a natural choice to remove irrelevant signal components in . Practically, considering real physical assaults are extremely intensive (according to [27], the average torso speed of Olympic boxers can reach to 3m/s, corresponding to ), we set the lower cut-off frequency to 1Hz to eliminate interference of static components, while the upper cut-off frequency is set to 140Hz to keep more high-frequency action information.

4.1.2. Antenna and Subcarrier Selection

Observation I: Embracing multiple antennas intuitively enhances the spatial granularity by harnessing frequency diversity [28]. However, taking all subcarriers into account is not panacea for the exposure of some vital information. As is shown in Figures 2(a) and 2(b), we note that the antennas with higher variances in static are likely to possess less dynamic environment responses, which indicate that background noises contribute a lot to the captured variations of CSI measurements, rather than dynamic responses. Therefore, the variance of CSI can be helpful to exclude insensitive antennas.

Observation II: Based on the intuition that motion-induced variations of CSI waveforms are correlated, the most common variations of all subcarriers can be extracted by Principle Component Analysis (PCA). However, two challenges arise as well. First, residual noises are stubborn and nonnegligible due to their internal correlation. In Figures 2(c) and 2(d), a pair of antennas with lower cross correlation of 30 subcarriers in static is likely to possess higher correlation in dynamic environment, and vice versa. The reason is that the same subcarriers of different antennas are affected by the background noises to varying degrees, leading to similar variations in waveforms, which we term as ‘relevant noise’. Second, it is a dilemma to select the first (noisy but principle information remained) [18] or the second (sanitized but with information loss) principle component [18]. To kill two birds with one stone, we meet these challenges by utilizing cross correlation to select subcarriers which are robust to relevant noises and sensitive to human movements.

Selection Strategy. We present our strategy for selection of antenna pairs and subcarriers as follows. For antenna selection, we first calculate the standard deviation and mean cross correlation C for each antenna before motion starts. Then, the Antenna Selection Indicator (ASI) is defined as . The antenna pair with the lowest ASI is selected as the robust antennas. The brief break during physical conflicts, e.g., observing response, quarrelling, or any relatively slight interactions, can be termed as static moments to calibrate the antenna selection. For subcarrier selection, to find a trade-off between sensitivity and robustness, we choose the top 20 subcarriers with the lowest cross correlation values in static and with the highest cross correlation values in dynamic environment, respectively. The final subset of subcarriers are chosen from the intersection of 40 subcarriers. Specifically, the criterion of our subcarrier selection strategy is not to consider those subcarriers with the largest variance in dynamic moments, because we are unaware beforehand of whether the variation is induced by location-variant movements or relevant background interferences. What is more, to reduce dimensions and useless interferences of CSI data, we apply PCA to extract common variations of selected subcarriers and choose the first principal component for time-frequency processing. By our advanced selection strategy, the extracted first principal component contains 95% major motion information as well as negligible noises and contributes to the increase in the proportion of reflecting power in recovered spectrogram. Figure 3 shows the PCA result of all 90 subcarriers, which still contains inferior subcarriers and nonnegligible 'relevant noise'. Clearly, our processing method provides a smoother waveform with informative fluctuations and weakly noises.

4.2. Time-Frequency Analysis

To effectively analyze the CSI waveforms in the time-frequency domain, Short-Time Fourier Transform (STFT) is adopted to generate the spectrogram which shows the energy of each frequency component with time. We use the normalized FFT magnitudes and a sliding window approach for suitable time-frequency resolution of 1.93Hz and 0.5ms. A Gaussian window with a size of 3 is further applied to smooth the spectrogram. Figure 4 shows fine-grained spectrogram with two volunteers imitating real physical collisions. In this experimental study, one volunteer who plays the role of attacker is asked to aggressively push the other volunteer who acts as a victim walking toward him, while the victim is asked to respond to the assaults as real as possible. The imitated actions repeat five times, with no interruption. Triple heuristic observations have been verified based on this experimental study. Drastic conflicts always contain abundant intensity cues (e.g., speed, acceleration, and kinetic energy) which lead to distinct variations of power in corresponding frequency bands. With the escalation of assault, the attacker would bare his teeth and claws, while the victim may fight back or flee; the distribution of energy of reflected signal is irregular due to complex and rapid movements of body parts. The physical assault could burst out anytime with a short duration, while the assault process keeps continuity until the attacker radically releases his resentment. These observations can be summarized as three critical characteristics, that is, intensity, irregularity, and continuity. Hence, we are inspired to exploit these unique characteristics of physical assault to realize passive assault monitoring.

4.3. CSI Segmentation

To identify the assault-induced variations, the key issue is to discern every transition points in CSI time series. Previous variance-based sliding window approach is widely adopted for passive motion detection [21]. We discard this approach in assault monitoring for two reasons. First, over-threshold variance is easily affected by irrelevant motion information, which may delay or mislead the real-time detection in target frequency. Second, the absolute value of variance cannot precisely reveal the intensity and complexity of interactions. Cross correlations between different subcarriers have been shown to be effective in reflecting the process of human walking. However, we argue that it is insufficient to discern fluctuation caused by micro-energy response from macro-movements in the distance.

Therefore, we resort to wavelet entropy (WE), a new method of complexity measurement for signals [29, 30], which is capable of monitoring micro-energy response and quantifying the order-disorder states of the reflected signals. Lower WE denotes the simpler components of frequency and more orderly changes of movements, and vice versa. WE possesses unique advantages as follows. On the one hand, compared with the periodic motions of human objects, the signals reflected from drastic actions of body parts are highly nonstationary and correspond to more randomness in time and frequency domain. WE could precisely reflect the intensity level of energy variations of target frequency bands. On the other hand, WE is a feasible and location-independent indicator, which is suitable for our dynamic experimental scenes. Wavelet entropy is defined as , where represents the normalized ratio of wavelet energy at the j-th scale, .

In practice, we select db6 wavelet due to its better orthogonality and compact support, which makes wavelet transform more suitable for signal oddity detection [31]. We adopt a sliding window method to calculate WE with the sliding window width of 1s, half of which is used as the step size. To further improve the robustness of detection system, we propose a light-weight Assault Detection Indicator (VDI) combining wavelet entropy WE and cross correlation C as , with a 5-point median filter to smooth the curve. The higher VDI means the higher intensity level of activities and the more frequent signal changes. We select transition points which are the local minimum values below the predefined threshold which will be further evaluated in evaluation part.

As is shown in Figure 5, we compare the performance of our proposed method with RT-Fall's segmentation step, which is capable of real-time and continuous fall detection by variance-based segmentation. The entire procedure of human interactions lasts about 50s that consist of assault part (6s~ 30s) and normal part (31s~ 48s). The upper figure shows the normalized sliding curve combining average value and variance of amplitude. We notice that the previous segmentation method cannot precisely and timely reveal the high-intensity with the escalation of physical assault, while amicable interactions near the links may generate great variations in waveforms (37s- 40s), which is unreliable for passive assault monitoring. The lower figure reveals the superiority of our advanced segmentation approach, which clearly depicts the complexity level of assault-induced signals and mitigates the location-variant interferences caused by human normal activities.

4.4. Assault Recognition

The assault recognition step aims to explicitly characterize the features of violent attacks and precisely detect the presence of violent events. Hence, we identify suspected violent events from two perspectives, i.e., the local view and the global view. The purpose of local analysis is to identify drastic actions from mild human activities, whereas global analysis is to evaluate the irregularity as well as continuity of signals in a given time interval.

4.5. Local Analysis

The most essential part of local analysis is to obtain high-intensity information. Noting that dominant power strengths of frequency bins caused by human torso reflect the major trends of human movements, we adopt the percentile method, an effective PLCR extraction method, to estimate the torso movement speed [32]. The cumulated percentage P(f, t) of energy F(f, t) at a given frequency f and time t is calculated aswhere selected frequency values f should not be singular at time t and satisfy P(f,t)≥0.75. Therefore, several features revealing high-intensity can be extracted from both the total spectrogram and dominant speed.(1) The area of the surrounded curve (ASR) denotes the area of the speed curve surrounded with horizontal axis. The rationale is that assault-like actions produce relatively enormous velocity along with high peaks of amplitudes in a short time duration. The efficiency of ASR depends on previous segmentation step.(2) The power changing rate (PCR) is proposed to reflect the increase of kinetic energy based on the observation that abnormal drastic motions typically have high Doppler energy content within a specific frequency band. We give the mathematical formula of PCR as follows:where F(f,t) represents the FFT power coefficients of a specific frequency f at time t, fu and fl is the upper bound and lower bound of the interested frequency band, and t0 refers to the time when transition point is detected.(3) Peak amplitude bandwidth (PAB) chooses the 1/2 and 1/4 peak amplitude bandwidth to reflect the divergence between peak values and valley values of FFT magnitudes. The reason is that energy of intensive assaults disperses in a wide band around the frequency of peak amplitude.(4) High-frequency duty ratio (HDR) is often used to measure the ratio of high level of the time. Considering violent actions are always along with rapid high-frequency changes, we count the number of times when FFT coefficients F(f,t) meanwhile exceed preset frequency f and a predefined threshold.

The SVM classifier is then applied to select assault-like actions, which is originally designed for binary classification. We use LibSVM toolbox [33] with Gaussian Radial Basis Function (RBF) kernel in the training process and set the cost parameter C and gamma g in kernel function to be 4 and 0.0884 through 10-fold cross-validation.

4.6. Global Analysis

Global analysis starts to be considered only when assault-like action is detected. We adopt two features to characterize the continuity and irregularity of complex physical assault.(1) Detection Confidence is calculated to reflect the continuity of human activities. As is shown in Figure 4, violent events always last for a relatively long time due to the escalation of physical conflicts, while other normal movements (e.g., lying down and sitting down), even with abrupt acceleration, seem unlikely to occur several times within a short time duration. To quantify the continuity of actions, let J be a sequence of segmented slices; the predicted assault probabilities of following segments are calculated. We give a detection confidence CT aswhere Pjdenotes the probability of violent action in the j-th segments. The formula indicates that with the increase of the number of assault-like segments within time duration T, the possibility of assault events will be higher, which could efficiently reduce the rate of false alarm. We set time duration T as 15s for real-time and robust monitoring.(2) Lempel-Ziv complexity is a feasible indicator to reflect the irregular degree between the vicinity segments. Physical assault may result in irregular patterns due to various extents of attacks, while rapid macro-movements, e.g., running and frog leaping, have repetitive profiles. In such case, Lempel-Ziv complexity [34] is appropriate to measure the dissimilarity of both sanitized amplitudes and extracted velocity, by counting the number of distinct patterns and recurrence rate in target series. The normalized Lempel-Ziv complexity C(n) can be calculated as

where c(n) denotes a complexity counter; =2 means the binary consideration here. Finally, another SVM classifier is adopted to differentiate real physical assault from similar dynamic actions, using the normalized feature to quantify the continuity and regularity. Wi-Dog would trigger the alarm only when both preset conditions meet the criteria.

5. Evaluation

In this section, we interpret the experimental strategies, overall performance and detailed impact study of Wi-Dog, respectively.

5.1. Experimental Strategy

Experimental setup. We use a ThinkPad X200 laptop with a single antenna as the sender and a Lenovo T460 laptop with three antennas as the receiver. Both laptops are slightly modified with Intel 5300 NICs and set up to inject in monitor mode on channel 165 at 5.825 GHz, with package rate set to 1000 pkts/s. In Figure 6, we evaluate Wi-Dog with 10 volunteers (5 males and 5 females) in a multipath-affected classroom surrounded with tables and chairs, and a narrow corridor with the width of 1.5m, which can be regarded as LOS environment. We place the sender and the receiver at the height of 0.8m corresponding to the height of human torso and separate them by a distance of 3m for an appropriate detection range.

Data collection. We collect abundant CSI data in almost two weeks for 5 mild interactions (i.e., shaking hands, making bows, talking with body language, hugging, and giving high-five), 5 normal human actions (i.e., lying down, sitting down, walking, running, and playing exergames) performed by one volunteer with the other one standing by, and long-time physical assaults imitated by two volunteers of each group with necessary protective equipment (e.g., pushing over the victim). Each volunteer is asked to continuously finish specific normal actions and then conduct assault attack in total 5 minutes for 50 sets, while there are no predefined constraints for physical assaults. The ground truth is acquired according to the video recordings. Due to the layout limitation, Figure 7 only shows an example of intensive action. At the beginning, the attacker and the victim both keep static status in Figure 7(a). Then the attacker walks toward the victim (Figure 7(b)) and pushes over the victim on the ground (Figure 7(c)). The system would be awaken and keeps a close watch on the subsequent actions. If the victim struggled to strike back and the conflict arose, Wi-Dog would consider these actions as physical assault and raise the alarm.

Metric. We evaluate the performance of Wi-Dg based on two metrics.(i)True Detection Rate (TDR) is the probability that the physical assault is accurately detected from similar activities, which is defined as the ratio of accurately detected assaults from all human interactions. Higher TDR represents the effectiveness of an emergency alert system.(ii)False Alarm Rate (FAR) is the proportion of the system wrong alarms when there is no violent assault happening, which can be defined as the ratio of wrongly detected assaults and accurately detected assaults. Lower FAR promises fewer misalarms and optimizes the public resource.

5.2. Overall Performance

In this part, we resort to three state-of-the-art anomaly detection methods, RT-Fall [21], WiFall-2014 [35], and WiFall-2017 [36] as the baselines. WiFall-2014 [35] was the first work for passive fall detection with COTS Wi-Fi devices, and its extended version WiFall-2017 [36] advanced the performance by considering the subcarrier sensitivity and principal component extraction. By comparison, we could understand the necessity of subcarrier selection. RT-Fall is the most similar work, which extracted features from bimodal CSI information and realized real-time segmentation. However, even though physical assaults and fall-like actions happen with similar energy variations, these fall detection methods cannot be directly applied due to the lack of assault-procedure analysis. We then add global analysis to further process extracted features. The main purpose of comparison is to reveal the superiority of our CSI processing algorithms and segmentation methods. As is shown in Figures 8(a) and 8(b), we notice that WiFall-2014, without subcarrier selection and action segmentation, cannot realize robust assault monitoring with only statistical features, which could mislead the assault alarm. The average TDR and FAR of WiFall-2014 in LOS are around 0.71 and 0.28, while in NLOS environment they are around 0.64 and 0.35. WiFall-2017 emphasized the importance of subcarrier selection and earned a 0.19 (0.13) higher TDR and 0.07 (0.10) lower FAR compared with its previous version in LOS (NLOS). Yet the lack of segmentation still limits its practical use. RT-Fall reveals its satisfactory performance in complex scenarios. The TDR and FAR of RT-Fall are 0.91 and 0.13 in LOS, 0.86 and 0.15 in NLOS. Compared with RT-Fall, Wi-Dog outperforms RT-Fall by 3% higher TDR and 5% lower FAR in LOS, similar TDR and slightly lower FAR around 0.85 and 0.11 in NLOS. We explain why amplitude-based Wi-Dog could achieve similar or even better performance than full-information-employed RT-Fall. First, Wi-Dog uses the selected cross correlation to characterize the properties of phase difference which has been proved in [14]. Second, Wi-Dog overcomes the drawback of amplitude-based detection method which is, as commonly argued, its vulnerability to NLOS environments based on the appropriate subcarrier selection. Third, Wi-Dog is robust to micro-movements near the links and sensitive to macro-movements in the distance by monitoring the complexity level of suspected actions in target frequency bands. In contrast, location-variant movements may constrain the accuracy and robustness of RT-Fall.

5.3. Parameter Study

Impact of local-global analysis. In order to validate the practicability and necessity of the detailed description of assault characteristics, we evaluate the local-global analysis by separately evaluating all three properties. Table 1 shows the specific value of independent-processed part. For single-variable analysis, both intensity (In) and irregularity (Ir) analysis are infeasible to sensitively detect or precisely alarm, while only considering continuity (Co) is meaningless. For double-variable analysis, we notice that the method of (In+Ir) could achieve obvious progress in sensitivity, while it still maintains relatively high FAR in experiments. (In+Co) sharply lowers the FAR by excluding those slices with enormous short-time energy, while leading to the omission of assault alarm. We also take (Ir+Co) into consideration; while the irregular features of short-time slices are subjected to continuity analysis, the final result is not unsatisfactory. To make overall quantification of physical assault, an integrated local-global analysis is imperative for raising the sensitivity as well as reducing the false alarms.

Impact of group diversity. To evaluate the general applicability of Wi-Dog for different users, we recruit 10 volunteers consisting of 5 males and 5 females to show the impact of group diversity. During the experiments, the participants possessing various heights, weights, and ages are required to be paired with different partners. The reason is that different participants may react with various intensive extents. As is shown in Figure 9, Wi-Dog monitors physical assault of all participants with relatively high accuracy. Among the results, the detection rates of Male-beat-Male (M→M) and Male-beat-Female (M→F) seem satisfactory. We owe it to the rapid macro-movements caused by men-characters. Conversely, we also notice that in the process conducted by female, the detection rate drastically decreases to 0.88 in LOS and 0.86 in NLOS when the victim is female, and 0.85 in LOS and 0.78 in NLOS when the victim is male, which are still practicable even in real environment. We explain that Wi-Dog is sensitive to high-intensity actions and body profiles in the monitoring area, while female participants fail to maintain consistent high-intensity attacks to overweight male victim. The loss of fighting information may induce a slighter Doppler frequency shifts that can be confused with those normal interactions.

Impact of thresholds. As precise segmentation is essential for long-time assault detection, an appropriate threshold of VDI is needed to sensitively monitor abnormal transitions. Figure 10 depicts the TDRs and True Negative Rates (TNRs) of assault activities under a rational variation of VDI thresholds with the increment value of 0.01. With the increase of thresholds, the TDR decreases from 1 to 0.2, indicating that only extremely strenuous actions over the predefined thresholds can be correctly identified, while those below-threshold violent actions would lead to a general uptrend of TNRs. We note that lower thresholds lead to higher sensitivity but unreliability in assault monitoring. For balanced overall performance, Wi-Dog achieves the best trade-off between TDR and TNR of 0.94 and 0.92 by setting the threshold of VDI as 0.275, which is a general threshold fitting the majority of assault types.

Impact of duration time. We further evaluate the performance with changing duration time T. Intuitively, longer duration time could contain more underlying assault actions, which would significantly reduce the misinformation and ensure the system accuracy. As is shown in Figure 11, we observe an ideal trend of rising TDR as well as decreasing FAR with longer T. The reason behind observations is that assault-like actions (e.g., fall and run) bursting out enormous short-time energy can be excluded by continuity analysis, which may otherwise induce severe misalarms. However, when duration time T exceeds 20s, FAR has an obvious rising trend while TDR has a slight decline. We explain that even physical collisions would generate some similar patterns and present some kind of regularity. Furthermore, longer time duration would bring about relatively higher possibility of confusion with continuous middle-intensity movements during exergames. We set T=20 as a reasonable choice by fully considering the real-time capability and accuracy.

Impact of packet rates. Based on a clean channel 165 and both laptops set up in monitor mode to avoid uncontrolled packet losses, further experiments are conducted to see the relationship between system accuracy and sampling rate. Since abundant information of drastic actions can be fully preserved by fine-grained transmission, we suppose that Wi-Dog can achieve better performance with the increase of sample rates. Figure 12 verifies our intuition, which shows the positive correlation between two variables. By horizontal comparison, we notice that all systems enhance the performance in varying degrees. The rationale is that growing packets in fixed time window would induce more informative features, e.g., wavelet entropy and variances. By vertical comparison, we observe that Wi-Dog outperforms the other two methods when the sampling rate increases from 500 pkts/s to 2000 pkts/s. In particular, Wi-Dog possesses a favourable result of 0.91 even with only 500 pkts/s, while the other three methods raise the TDR but could only approximate the value in 2000 pkts/s. Therefore, Wi-Dog could make assault monitoring available with imperfect Wi-Fi devices even with low sampling rate.

Impact of distance to LOS path. In general, amplitude-based detection severely suffers from its natural deficiencies, for example, obstacle and long-distance attenuation, which limits the effective detection range. We test the impacts of distance with two baselines in the mid-perpendicular of a single link, ranging from 1m to 5m. Figure 13(a) demonstrates the varied trend with the changing performing distance. Obviously, the detection rate of WiFall-2014 suffers from maximum long-distance attenuation due to inferior subcarriers. With efficient signal processing steps, the other three methods keep relatively high detection rate. Due to the lack of accurate segmentation, WiFall-2017 omits some useful segments and only obtains 0.74 at 5 meters; RT-Fall and Wi-Dog are both competitive to sensitively detect physical assault.

In Figure 13(b), we notice that the closer the distance between users and links is, the higher the FAR is. This is because the location of users is relevant to the sensitivity of the monitoring system. For example, some slight interactions near the link could be easily mistaken as intensive actions, so the system may raise misalarm. The reason lies in the variance-based segmentation, which is a location-dependent indicator. In contrast, Wi-Dog copes well with this critical challenge by decomposing superposed signal into different frequency band, which makes Wi-Dog practicable in wide range monitoring.

6. Discussion and Limitations

6.1. Detecting Multiple Targets in Close Contact

Given that Wi-Dog excels in monitoring physical assault mainly based on its natural properties of high-intensity, irregularity, and continuity, we intend to further promote the sensitivity of Wi-Dog by detecting multiple targets in close contact. The original motivation springs from vision-based assault method [37] by effective spatiotemporal modeling to detect crowd abnormal events, which implies tight connection between crowd density and probability of abnormal events. Unfortunately, two thorny problems arise.

First, as the latest work [38] counts crowd by using multiple wireless links, it seems unsolvable to calculate the crowd density in a single link beforehand; let alone the relative locations. TensorBeat [39] is recently proposed to monitor multiperson breath rates in the high dimensions. Wi-Run [13] borrows the idea of tensor decomposition for multiple-runner recognition. However, the calculation of rank R representing the number of people is an NP-hard problem in tensor decomposition. Second, the obstacle caused by human motions in LOS path cannot be captured due to our reflection-based theory basis, though the problem can be mitigated by optimal device placements in experiments. Nevertheless, we leave the early detection of multiperson scenario as one of our future works.

6.2. Detecting Multitype Assault

Although Wi-Dog has been validated to accurately monitor physical assault based on the rapid and continuous movements of body parts, it still cannot cover all types of violent actions in real world. For example, if victim encounters sudden deadly gunshots or any single lethal assault, the assault alarm would not be triggered. Note that assault process comprises various features, including audio cues and visual cues [40] (e.g., screaming, gunshots, explosion, and blood); just in case, multiple sensors should be put into services and provide multimodal information for major-concerned public places. We envision an enhanced version of Wi-Dog by multimodal feature processing for more complicated scenarios, in which wireless sensing technology acts as an avant-courier and decision-making center.

6.3. Embracing Deep Learning Methods

The feasibility of Wi-Dog partly relies on the elaborate-design spectrum features. Specifically, we extract the spectrum features to represent the motion intensity and further consider the irregularity and continuity so as to depict the correlation of motions, while it requires the domain knowledge and handcraft feature selection. Moreover, if two users push over along the tangent line of the Fresnel Zone [17], the reflected signals could be very weak and the spectrum features could be obscure. Therefore, TDR is relatively low (i.e., at almost 0.65). Global analysis is helpful to some extent since the directions of human movements are irregular, yet it lowers the time-efficiency. It is also a natural choice to deploy multiple pairs of trans-receivers [21, 35, 36], yet it increases the economic cost.

Recent years show the potential of deep learning embedded Wi-Fi sensing, such as Deep Neural Network (DNN), Convolutional Neural Network (CNN), and Recurrent Neural Network (RNN), which could automatically extract high-level prominent features through multiple hidden layers. We believe the Wi-Dog could be further promoted by using these advanced models. For example, intrinsic motion characteristics could be captured by CNN through convolutional operations, and the long-term relations could be established by RNN. Since there is no work focusing on deep learning-based abnormal detection, we leave it as our future work.

Wi-Dog is related to the previous abnormal detection works in three categories: wearable sensor based, camera based, and wireless sensing based.

Wearable sensor based. Wearable sensor-based systems are both widely used and commercially available owing to the rapid development of sensor technology.

For example, [4] recognized free-weight activities by attached RFID tags and further assessed the exercise quality from a local-global perspective. Reference [2] developed a fall detection system by monitoring the variation of 3-dimension acceleration data, with a specific wearable device placed on human's waist. Reference [3] utilized a multisensor integrated glove to identify abnormal behaviours of paralysis patients. Reference [5] explored the possibility of using a smartphone to detect abrupt physical attacks. However, all these methods require per-user wearing sensors, while Wi-Dog aims to achieve contact-free assault detection.

Camera based. Camera-based assault detection system uses fixed cameras to capture pictures or video frames to identify human assault. Reference [6] firstly presented an approach to analyze assault relying on the information of motion trajectory and acceleration. Reference [7] further studied aggressive fighting using extreme acceleration pattern as discriminant feature. Reference [8] designed a novel image descriptor for assault with spatial and temporal features. Even so camera-based schemes address the problem of wearing extra devices, the issues of privacy and limited monitoring scope are still up in the air.

Wireless sensing based. Wireless sensing-based schemes attract extensive attention in recent years [911]. We adopt CSI-based schemes because fine-grained CSI is available in ubiquitous Wi-Fi devices. In recent studies, CSI was utilized to track the walking path [41] as well as functional body movements [12] and recognize walking postures [14], multiperson running detection [13], and even slight movements like breath [17] and finger movements [1820]. However, existing CSI-based schemes highly depend on the observation of the repetitions with reproducible features, while violent behaviours can be random and irregular. NotiFi [42] proposed a non-parameter training scheme for abnormal activity detection by using Dirichlet process. Yet it requires the user’s continuous walking and cannot handle the presence of multiple users. RT-Fall [21] is the most similar state-of-the-art, which used the variations of calibrated CSI phase differences to detect fall and extracted the distinct power decline pattern to find transition points. However, we discard the phase information of CSI power due to the impact of carrier frequency offsets. Motivated by CARM [15] which extracted PLCRs from time-frequency analysis, Wi-Dog makes one step further in accurate feature extraction of velocity, including a selection strategy of antennas and subcarriers as well as a segmentation method. Furthermore, Wi-Dog improves the robustness through all-round local-global analysis.

8. Conclusion

Wi-Dog is a noninvasive physical assault monitoring scheme on a single link with commercial Wi-Fi devices, which consistently analyzes the local-global characteristics of CSI waveforms. A set of novel CSI processing methods are proposed to choose reliable antenna pairs and sensitive subcarriers. Moreover, a feasible Assault Detection Indicator (VDI) is developed to monitor target frequency transitions of location-variant behaviours. Finally, all-round local-global analysis is adopted to fully exploit the features of physical assault. We prototype Wi-Dog on commodity Wi-Fi devices and evaluate the overall performance in both LOS and NLOS scenarios. Experimental results further validate the accuracy and robustness of Wi-Dog, compared with the state-of-the-art abnormal detection methods. We consider Wi-Dog as an early step toward general emergency detection on wireless sensing and a significant complement for computer-vision-based abnormal detection in security-minded places, including, but not limited to, terrorist threat warning, fall detection for the elderly, and exercise quality assessment.

Data Availability

The data used to support the findings of this study are currently under embargo while the research findings are commercialized. Requests for data, 6 months after publication of this article, will be considered by the corresponding author.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work is supported by National Key Research and Development Project of China No. 2017YFC0704100 (entitled New Generation Intelligent Building Platform Techniques).