Abstract

Smartphones are ubiquitously integrated into our home and work environments, and users frequently use them as the portal to secure cloud-based services. Since smartphones can easily be stolen or co-opted, the advent of smartwatches provides an intriguing platform for legitimate user identification in applications such as online banking and many other cloud-based services. To access security-critical online services, it is highly desirable to accurately identify the legitimate user accessing such services and data, whether they come from the cloud or any other source. Such identification must be automatic and non-bypassable. For such applications, this work presents a two-fold feasibility study: (1) activity recognition and (2) gait-based legitimate user identification based on the individual activity. To achieve these goals, the first aim of this work was to propose a semicontrolled environment system that overcomes the limitations of users’ age, gender, and smartwatch wearing style. The second aim was to investigate the ambulatory activities performed by any user. Thus, this paper proposes a novel system for implicit and continuous legitimate user identification based on users’ behavioral characteristics, leveraging the sensors already built into smartwatches. Using machine learning techniques and multiple sensory data, the designed system achieves legitimate user identification with 98.68% accuracy.

1. Introduction

We are living in an era of context-aware systems, whose aim is to acquire a user’s context and reason about it in order to adapt a system’s behavior to the user’s changing situation [1]. Making the user’s context information available to such systems is a critical task, and one such piece of information is the user’s identity. Furthermore, today’s era is an era of smart devices such as smartphones (SPs), smartwatches (SWs), smart TVs, and even smart homes (SHs). Modern SWs offer substantial computing power, a range of sensors, and the ability to communicate with other smart devices, for example, SHs and SPs, via Bluetooth or Wi-Fi. SWs are a comparatively recent development; probably the first truly modern smart SW, “The Pebble,” became available in early 2013 [2].

In 2014, many other SWs were released, and almost all of them work with Android phones and run the Android Wear platform. These SWs include the Moto 360, Sony SmartWatch 3, LG G Watch, and Samsung Gear. While sales of these SWs were initially modest, the introduction of the Apple Watch in 2015 greatly increased interest in such devices. SWs are rapidly becoming as ubiquitous as SPs, and market projections indicate that nearly 400 million SWs will ship by 2020, which is 25 times the 2014 sales [2, 3].

Modern SWs are equipped with a variety of sensors, including motion sensors that are useful for monitoring device movements such as tilt, rotation, and shake. These sensors include the ambient light sensor, accelerometer, compass, gyroscope, magnetometer, and GPS. They support capabilities and applications similar to those of smartphones, such as health-care applications that require physical activity recognition (PAR). The accelerometer, linear accelerometer, magnetometer, and gyroscope are well suited to PAR and gait-based legitimate user identification on SPs [4–7]. This work will show that SWs are equally capable of performing PAR and gait-based legitimate user identification.

The proposed legitimate user identification model uses a personal (single predictive) model to identify a user within a group of users. The identification model then uses a predictive model to determine whether an unknown user is the legitimate user or an impostor. This work uses SWs and SPs to collect and store data from three sensors: the accelerometer, magnetometer, and gyroscope. The collected data were then sent to a computer for further processing. This work uses Android-based SPs and SWs because these devices are readily available at low cost.

Gait-based legitimate user identification on SWs has several advantages over SPs, notably portability and a location and orientation that remain almost constant. Both the location and orientation of an SP may vary, depending on the user’s carrying style and the activity being performed, and a change in either reduces identification effectiveness. Furthermore, some locations and positions simply do not generate appropriate signatures for legitimate user identification. The orientation and location problem is particularly pronounced for women, who frequently carry their SPs off the body, whereas a SW is worn in a fixed position, on the wrist, almost all the time. Above all, a SW can easily transmit data to other paired devices via the Internet or Bluetooth, which makes SWs well suited to user identification for secure cloud-based applications such as Internet banking or access to SHs.

In support of this discussion, we found that a South Korean telecommunications company, SK Telecommunication, has recently started developing a system that uses a SW for legitimate user identification in order to access a secure online banking application [8]. Banking applications normally use security similar to that of most other applications: accessing a bank account online requires a special randomized key from the bank, a special USB drive, or some other secure means of identification. A custom-designed user identification application would simply allow customers to tap their registered SW to access their online banking system with little effort. In addition, the online system applies strong encryption at both ends to protect the user’s personal information.

Given the limited computational resources of SWs, the proposed system is kept fairly simple: the user only needs to register the SW to use it with any digital banking portal to authorize legitimate user access. Registration presumably involves some form of verification; once registered, the user can simply tap the SW while it runs the appropriate software to be identified as the legitimate user. If the SW is lost or stolen, the user can issue a kill command to revoke the SW’s online access. Such a system could pair with other devices and identify a legitimate user on a SW in a number of ways, and our proposed solution can serve as a foundation for such secure applications.

In addition, SW-based legitimate user identification can support many other real-life applications, for example, serving as the foundation of a delegated identification system for SHs. More specifically, as a legitimate user approaches their SH, their SW transmits its sensor signal to the SH, which compares it with previously received signals and opens the door if they match. The proposed solution can also be used in such secure systems to identify a legitimate user with acceptable accuracy at low computational cost and time.

The rest of the paper is organized as follows: Section 2 describes the related work. Section 3 presents the procedure for collecting the raw signals from the six users performing five different activities and how the raw signal is transformed into a suitable format for machine learning algorithms. The results of these experiments are presented in Section 4. Section 5 discusses the immediate future extensions to the current research. Finally, Section 6 concludes the work.

2. Related Work

Wearable devices such as SWs have recently become part of our daily lives. However, limited research has been done on legitimate user identification with these wearable devices. Traditional legitimate user identification approaches rely on knowledge or possession of secret information, such as passwords, or on physiological biometrics, such as iris patterns and fingerprints. More recently, behavior-based legitimate user identification exploits distinctive user behaviors such as gestures and gait [5, 7].

Several physiological biometrics are used in legitimate user identification systems, such as iris patterns [9], fingerprints [10], and face patterns. However, such identification requires user interaction; for example, fingerprint identification requires users to place a finger on the scanner. Hence, approaches that require user compliance cannot achieve continuous and implicit identification [11], which is the ultimate goal of our proposed system.

In contrast to the solutions discussed above, behavior-based legitimate user identification assumes that people have distinct and stable patterns for certain behaviors, such as gait [5, 7, 12], handwriting patterns [13, 14], and GPS patterns [15]. Such identification exploits users’ behavioral patterns to identify the legitimate user. Some important and classical works that specifically use built-in sensors for legitimate user identification are discussed below.

Kayacik et al. [16] proposed a lightweight, temporally and spatially aware user behavior model based on hard and soft sensors. They did not quantitatively report identification performance, but they showed that attackers can be detected within 717 seconds. Buthpitiya et al. [15] proposed a GPS-based system that detects abnormal activities by analyzing the legitimate user’s location history. Trojahn and Ortmeier [14] and Shahzad et al. [13] developed methods combining handwriting and keystroke patterns to achieve legitimate user identification through the screen sensor. Zhu et al. [17] proposed a system that continuously collects data from three built-in sensors, namely the gyroscope, magnetometer, and accelerometer, to construct gesture models while the legitimate user is using the device. Nickel et al. [12] proposed an accelerometer-based behavior recognition system for legitimate user identification using a k-nearest neighbor classifier. Lee et al. [18, 19] showed empirically that using more sensors can improve legitimate user identification performance, using a support vector machine (SVM) as the final classifier. Li et al. [20] used five basic movements, namely sliding up, sliding down, sliding right, sliding left, and tapping, and their combinations as behavioral patterns for legitimate user identification.

Beyond the works discussed above, Riva et al. [21] proposed a prototype that uses voice recognition, phone placement, face recognition, and proximity to progressively identify the legitimate user. However, their objective was only to decide when to identify the legitimate user, which does not match the proposed framework. Furthermore, their scheme requires access to sensors that need user permissions, which limits its use for implicit legitimate user identification in real time. Mare et al. [22] proposed a two-fold identification model in which signals from a bracelet worn on the user’s wrist are correlated with the user’s operations on the terminal; if the two correlate over a few coarse-grained actions, the continued presence of the user is confirmed. Lee and Lee [11] proposed iAuth, a system for implicit and continuous user identification in which the end user is identified by behavioral characteristics captured with built-in sensors. Using sensor data from multiple devices and machine learning techniques, their system achieved better identification than earlier approaches, reaching 92.1% accuracy while consuming only 2% of the battery.

To the best of our knowledge, no SW gait-based legitimate user identification approach in the literature, including those discussed above, works the way the proposed one does. This study builds on the idea of identifying a legitimate user on a SW by exploiting different activity patterns. Data for the different activities are recorded with the embedded triaxial sensors, without limiting the scope to a controlled environment. The aim of this work is to propose a semicontrolled environment system that overcomes the limitations of users’ age, gender, SW wearing style (left or right hand), and regular activity style while wearing a SW. Users were asked to perform daily activities differently at different times, because the goal was to investigate the ambulatory activities performed by any user for legitimate user identification in real-time scenarios.

Thus, this work introduces a novel two-fold legitimate user identification system that first recognizes the activity and then decides whether the recognized activity was performed by the legitimate user or an impostor. Additionally, this work experiments with a single-subject cross-validation process to further validate legitimate user identification. The proposed system performs activity recognition and legitimate user identification in a semicontrolled environment.

3. System Modeling

This section describes the process for collecting the raw signals from different users performing different physical activities under study. This section also explains the process for extracting the meaningful features. Furthermore, we will explain the process of transforming the time series raw sensor signals into examples that can be handled by different classifiers from machine learning literature, for example, Decision Tree (DT), K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Naive Bayes (NB).

3.1. Data Collection

Raw signals were collected for five activities from six users (three female and three male) with a mean age of 25 years. Subjects were selected to balance gender, because different genders exhibit different patterns when performing the same activity. The activities are walking, walking upstairs, walking downstairs, running, and jogging. All subjects performed these activities twice a day for more than a month. The proposed system therefore uses raw data collected from the same users for the same activity performed on different days.

Participant enrollment in this study was approved by the laboratory head. This is a formal prerequisite because the experiments involved human subjects, although the risk of injury was negligible. The subjects were asked a few nontechnical questions about their gender, age, height, weight, handedness (left or right), and so on, which were used as characteristics in the proposed study. The subjects were then asked to fasten the SW on their wrist and place a Bluetooth-paired SP in their pocket. Both devices run a simple custom-designed application that controls the data collection process and instructs the participant to enter their name, select one of the five activities, and select one of the three sensors. Once these initial steps are complete, the SP screen is turned off and the SP is placed in the pants pocket. The SP then instructs the SW, which runs the paired data collection application, to collect the raw signal at 20 Hz. Each sensor generates a 3-dimensional signal, and a timestamp is appended to each reading. Every five minutes, the SW sends the data to the SP; after a successful transmission, the SP vibrates to notify the user that data collection is complete and the current activity can be stopped.
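To make the collection protocol concrete, the sketch below models the watch-side buffering under stated assumptions: readings arrive at 20 Hz, each carries a timestamp and three axis values, and a batch is released every five minutes (20 × 300 = 6000 readings). The `SensorSample` and `BatchBuffer` names and the millisecond timestamp are hypothetical; the actual application's data format is not described.

```python
from dataclasses import dataclass, field
from typing import List

SAMPLING_RATE_HZ = 20        # sampling rate stated in the text
BATCH_SECONDS = 5 * 60       # the SW sends data to the SP every five minutes


@dataclass
class SensorSample:
    """One timestamped 3-axis reading (hypothetical record layout)."""
    timestamp_ms: int
    x: float
    y: float
    z: float


@dataclass
class BatchBuffer:
    """Buffers readings on the watch and releases one full five-minute batch at a time."""
    rate_hz: int = SAMPLING_RATE_HZ
    batch_seconds: int = BATCH_SECONDS
    samples: List[SensorSample] = field(default_factory=list)

    @property
    def batch_size(self) -> int:
        # 20 Hz * 300 s = 6000 readings per batch
        return self.rate_hz * self.batch_seconds

    def add(self, sample: SensorSample) -> List[SensorSample]:
        """Append a reading; return a full batch once one is ready, else an empty list."""
        self.samples.append(sample)
        if len(self.samples) >= self.batch_size:
            batch = self.samples[:self.batch_size]
            self.samples = self.samples[self.batch_size:]
            return batch
        return []
```

In this sketch, each non-empty list returned by `add()` is what would be transmitted to the SP, whose acknowledgment would trigger the vibration.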

3.2. Feature Extraction

SW sensor measurements are of the form $(x, y, z)$, where $x$, $y$, and $z$ are, respectively, the components of the acceleration relative to the smartwatch. The proposed system removes the gravity component from each measurement. Raw accelerometer measurements are quite noisy: even a SW in a fixed position can report bursts of acceleration. To reduce the effect of noise, the proposed system applies a simple moving average with a window of 3 points to each component. Each smoothed component time series is then divided into windows.
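The preprocessing steps above can be sketched per axis as follows. The text does not say how gravity is removed, so this sketch assumes a common approach: estimate gravity with a first-order low-pass filter (smoothing factor `alpha`, here a hypothetical 0.8) and subtract it. The 3-point moving average follows the text; returning only fully covered points (no edge padding) is our assumption.

```python
from typing import List, Sequence


def remove_gravity(signal: Sequence[float], alpha: float = 0.8) -> List[float]:
    """Subtract a low-pass gravity estimate from one accelerometer axis.

    ASSUMPTION: the text does not say how gravity is removed; here gravity is
    tracked with a first-order low-pass filter (exponential smoothing).
    """
    if not signal:
        return []
    gravity = signal[0]                  # start the estimate at the first reading
    linear = []
    for value in signal:
        gravity = alpha * gravity + (1.0 - alpha) * value
        linear.append(value - gravity)
    return linear


def moving_average(signal: Sequence[float], width: int = 3) -> List[float]:
    """Simple moving average over `width` consecutive points; the output has
    len(signal) - width + 1 values (no padding at the edges)."""
    if not 1 <= width <= len(signal):
        raise ValueError("width must be between 1 and len(signal)")
    window_sum = sum(signal[:width])
    out = [window_sum / width]
    for i in range(width, len(signal)):
        window_sum += signal[i] - signal[i - width]   # slide the window by one point
        out.append(window_sum / width)
    return out
```

A typical call chain would be `moving_average(remove_gravity(raw_x))`, applied separately to each axis before windowing.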

There are many ways to prepare the raw sensor signals before using them for legitimate user identification. Some gait-based works use the data directly in the time domain [23–25], whereas other systems map the time series sensor data onto examples using a sliding window approach. This technique allows traditional machine learning classifiers to handle time series data. Our study also uses the sliding window approach employed in prior work [2, 5–7].

The windowing process first partitions the raw time series into non-overlapping 30-second windows. From each window, the system then generates relatively simple features (in both the time and frequency domains) for each sensor individually, using the same encoding technique as [2]. Each feature is computed from one axis of the raw signal. Since the data are sampled at 20 Hz and each window is 30 seconds long (with 25 samples per iteration), there are 600 time series values per axis per window, that is, 1800 values per window across the three axes of a sensor. In different experiments, the system varies the window size from 25 to 400 samples per window to examine how window size affects legitimate user identification.
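The windowing step can be sketched as follows, assuming that any trailing samples that do not fill a complete window are dropped (the text does not say how they are handled). At 20 Hz, a 30-second window holds 600 samples per axis, or 1800 values across the three axes.

```python
from typing import List, Sequence


def window_length(rate_hz: int, seconds: float) -> int:
    """Number of samples in a window of the given duration."""
    return int(rate_hz * seconds)


def non_overlapping_windows(signal: Sequence[float], size: int) -> List[List[float]]:
    """Split one axis into consecutive, non-overlapping windows of `size` samples.

    A trailing partial window is discarded (an assumption; the text does not say).
    """
    if size <= 0:
        raise ValueError("size must be positive")
    n_full = len(signal) // size
    return [list(signal[i * size:(i + 1) * size]) for i in range(n_full)]


SAMPLES_PER_AXIS = window_length(20, 30)   # 600 samples per axis per 30-s window
VALUES_PER_SENSOR = SAMPLES_PER_AXIS * 3   # 1800 values across x, y and z
```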

This process is applied to the data of all three sensors, and the time series values of each window are transformed into 72 features using the feature encoding. The extracted features are the average acceleration, average absolute difference, standard deviation, and average resultant acceleration, one of each per axis (4 features per axis), and the average difference between peaks (10 features per axis). The system also computes a binned distribution, that is, the fraction of readings that fall within each of 10 equal-sized bins, which yields 10 features per axis. In total, this gives 24 features per axis, or 72 features for the three axes.
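A minimal sketch of part of this encoding follows, covering the mean, average absolute difference, standard deviation, and 10-bin distribution per axis, plus the average resultant acceleration. We compute the resultant once across the three axes, which is the usual definition, although the text counts it per axis. The peak-based features are omitted because the text does not specify how their 10 values per axis are derived; the function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def axis_features(values: Sequence[float], n_bins: int = 10) -> Dict[str, object]:
    """Per-axis features for one window: mean, average absolute difference,
    standard deviation (population form) and a 10-bin relative distribution."""
    n = len(values)
    mean = sum(values) / n
    avg_abs_diff = sum(abs(v - mean) for v in values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0        # avoid division by zero on a flat signal
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)   # the maximum falls in the last bin
        counts[idx] += 1
    return {"mean": mean, "avg_abs_diff": avg_abs_diff, "std": std,
            "binned": [c / n for c in counts]}


def average_resultant(xs: Sequence[float], ys: Sequence[float], zs: Sequence[float]) -> float:
    """Mean magnitude sqrt(x^2 + y^2 + z^2) over the window."""
    return sum(math.sqrt(x * x + y * y + z * z) for x, y, z in zip(xs, ys, zs)) / len(xs)
```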

3.3. Classifiers

This work uses the classifiers available in Matlab. The literature indicates that a classifier’s performance varies with what is being predicted, since training involves learning a mapping to the label the user wants to predict [26]. For the experiments, the proposed system uses four types of classifiers, namely DT, KNN, SVM, and NB, in order to compare their performance.

3.3.1. Decision Tree (DT)

A DT partitions the input space into class regions. Its internal nodes hold decision functions, and each node branches according to the outcome of its decision. As one traverses from the root to a leaf, the classifier progressively narrows the prediction space until it reaches its final prediction at the leaf. Decision trees offer scalable and fast implementations without the need to tune many parameters [26].
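To illustrate how a single decision node is chosen, the sketch below searches every feature and candidate threshold for the split that minimizes weighted Gini impurity, as CART-style trees do; a full tree applies the same search recursively to each side. Gini impurity is our choice for illustration; the text does not say which split criterion the Matlab trees used.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def gini(labels: Sequence[str]) -> float:
    """Gini impurity: 1 - sum of squared class proportions (0 for a pure node)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def best_split(X: List[Sequence[float]], y: List[str]) -> Tuple[int, float, float]:
    """Return (feature, threshold, weighted impurity) for the best split x[f] <= t."""
    best = (-1, 0.0, float("inf"))
    n = len(y)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue                 # a split must send samples both ways
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, t, score)
    return best
```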

3.3.2. K-Nearest Neighbors (KNN)

To classify an unseen feature vector, KNN finds the k training points nearest to it under a given distance function and predicts the label held by the majority of those k points. An advantage of KNN is its robustness to noisy data, and the number of nearest neighbors is the only parameter that needs careful tuning [26]. KNN is an instance-based classifier; it is also one of the most popular classifiers for SP-based PAR and has been found to outperform decision trees in terms of both performance and computational complexity [27].
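The KNN rule just described can be sketched in a few lines, using the Euclidean distance and k = 10 from the parameter settings of this section. The tie-breaking rule (prefer the tied label whose member is nearest) is our assumption; Matlab's own tie-breaking may differ.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train: List[Tuple[Sequence[float], str]],
                query: Sequence[float], k: int = 10) -> str:
    """Majority label among the k training points nearest to `query` (Euclidean)."""
    if not train:
        raise ValueError("training set is empty")
    neighbors = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    top = max(votes.values())
    # Tie-break: among labels with the top vote count, return the one that
    # appears first in the nearest-first neighbor list.
    return next(label for _, label in neighbors if votes[label] == top)
```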

3.3.3. Support Vector Machine (SVM)

SVMs have been used to recognize a diverse set of physical activities from motion and other sensors, and the literature reports that their performance is superior to that of other classifiers [28].

3.3.4. Naive Bayes (NB)

NB is a simple and well-known classification method. It is a probabilistic classifier [30] based on Bayes’ rule, which combines a class prior with the likelihood of the observed features; the naive assumption is that the features are conditionally independent given the class. Bayes’ rule therefore relies on the statistical properties and accuracy of the data, and NB draws on both statistics and data mining [29]. All of these classification methods are suitable for real-time legitimate user identification because they can be trained and evaluated rapidly.
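As a concrete instance of the Bayes-rule classifier described above, the sketch below implements Gaussian Naive Bayes: each feature is modeled, per class, by a normal distribution, and the predicted class maximizes the log prior plus the summed log likelihoods. The Gaussian likelihood and the small variance floor are assumptions; the text does not specify which NB variant was used.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class GaussianNaiveBayes:
    """Naive Bayes with one Gaussian per (class, feature) pair."""

    def __init__(self, var_floor: float = 1e-9) -> None:
        self.var_floor = var_floor       # keeps variances strictly positive
        self.priors: Dict[str, float] = {}
        self.stats: Dict[str, List[Tuple[float, float]]] = {}

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "GaussianNaiveBayes":
        groups: Dict[str, List[Sequence[float]]] = defaultdict(list)
        for features, label in zip(X, y):
            groups[label].append(features)
        for label, rows in groups.items():
            self.priors[label] = len(rows) / len(X)
            self.stats[label] = []
            for column in zip(*rows):    # one column per feature
                mean = sum(column) / len(column)
                var = sum((v - mean) ** 2 for v in column) / len(column) + self.var_floor
                self.stats[label].append((mean, var))
        return self

    def predict(self, x: Sequence[float]) -> str:
        """Return the class maximizing log prior + sum of per-feature log likelihoods."""
        best_label, best_score = "", -math.inf
        for label, prior in self.priors.items():
            score = math.log(prior)
            for v, (mean, var) in zip(x, self.stats[label]):
                score += -0.5 * math.log(2.0 * math.pi * var) - (v - mean) ** 2 / (2.0 * var)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```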

The parameters used for the classification methods are as follows: SVM uses a quadratic kernel function; KNN uses the Euclidean distance with the number of nearest neighbors set to 10; and DT uses 85 trees. All parameters were carefully tuned and optimized prior to the final experiments. All experiments were carried out in Matlab R2014b on a machine with an Intel Core i5 processor and 8 GB of RAM.
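The quadratic SVM kernel can be written as an order-2 polynomial kernel. The sketch uses the inhomogeneous form (1 + a·b)², which is, to our understanding, how MATLAB defines its polynomial kernel; treat that correspondence as an assumption and check the `fitcsvm` documentation. The settings dictionary merely records the reported values and is not a Matlab API.

```python
from typing import Sequence


def quadratic_kernel(a: Sequence[float], b: Sequence[float]) -> float:
    """Order-2 polynomial kernel in the inhomogeneous form (1 + a.b)^2."""
    dot = sum(x * y for x, y in zip(a, b))
    return (1.0 + dot) ** 2


# Settings as reported in the text; the dictionary layout itself is hypothetical.
CLASSIFIER_SETTINGS = {
    "SVM": {"kernel": quadratic_kernel},
    "KNN": {"distance": "euclidean", "n_neighbors": 10},
    "DT": {"n_trees": 85},
}
```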

4. System Setup

The output of each classifier is a strong indicator of the system’s ability to predict the legitimate user of the SW. 10-fold cross-validation was performed to extract meaningful information for each legitimate user. In each experiment, the user’s data are split into 10 subsets; a single subset is chosen as the validation set for legitimate user identification, and the remaining 9 subsets are used as training data for each classifier individually. Every instance in the validation set is then classified using the model trained on the training subsets. This process is repeated, picking each subsequent subset as the validation set and using the remainder as the training set, for every activity, each user, and all three sensors. This leads to a total of 30 experiments for a single sensor’s data, 90 experiments for all 3 sensors, and 360 experiments for all four classifiers, whose results are weighted to produce the final decision of legitimate user or impostor.
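The evaluation protocol can be sketched as a generator of (training, validation) index splits in which each of the 10 folds serves as the validation set exactly once, followed by the experiment count. Shuffling with a fixed seed and interleaved fold assignment are assumptions, and we read the 30 experiments per sensor as 6 users × 5 activities.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, validation) index lists; each fold is the validation set once."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    order = list(range(n))
    random.Random(seed).shuffle(order)          # random assignment of samples to folds
    folds = [order[i::k] for i in range(k)]     # fold sizes differ by at most one
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


USERS, ACTIVITIES, SENSORS, CLASSIFIERS = 6, 5, 3, 4
PER_SENSOR = USERS * ACTIVITIES                 # 30
ALL_SENSORS = PER_SENSOR * SENSORS              # 90
TOTAL_EXPERIMENTS = ALL_SENSORS * CLASSIFIERS   # 360
```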

4.1. Experimental Setup

The first experimental setup compared the performance of the four classifiers on the data of the three sensors for a fixed number of samples per window. Each classifier was then tested multiple times while changing the number of samples per window: 25, 50, 75, 100, 125, 150, 175, 200, 225, 250, 275, 300, 325, 350, 375, and 400 samples. The main goal of this second experiment was to measure the effect of window size on the performance of each classifier. In both experimental setups, the training and testing data are randomly divided, and classification results are obtained using 10-fold cross-validation.
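The window-size sweep can be expressed as a loop over the 16 settings listed above. The `evaluate_fold` callback is hypothetical: it stands for training a classifier on nine folds at a given window size and returning its accuracy on the held-out fold.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

WINDOW_SIZES = list(range(25, 401, 25))   # 25, 50, ..., 400: 16 settings


def sweep(classifiers: List[str],
          evaluate_fold: Callable[[str, int, int], float],
          k: int = 10) -> Dict[Tuple[str, int], float]:
    """Mean k-fold accuracy for every (classifier, window size) pair.

    `evaluate_fold(name, window_size, fold)` is a hypothetical callback that trains
    on the other k - 1 folds and returns the accuracy on the held-out fold.
    """
    return {(name, w): mean(evaluate_fold(name, w, fold) for fold in range(k))
            for name in classifiers for w in WINDOW_SIZES}
```

With the four classifiers of this study, `sweep(["DT", "KNN", "SVM", "NB"], evaluate_fold)` would yield 64 averaged accuracies.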

4.2. Experimental Results

The SW gait-based legitimate user identification task is first to identify a user from a pool of users and then to verify that this specific user may access the device, based on a sample of the user’s activities taken from the selected sensor. This process requires training data from all users and their activities. Such experiments are fairly simple: the transformed data of each sensor are used individually for training and evaluation with 10-fold cross-validation. In the identification process, each user has their own classification model, and when a sample is provided, the model determines whether the sample belongs to the legitimate user or to an impostor. This identification experiment builds and evaluates a model for each of the 6 participants in the study, and in each case, each activity is considered independently.

We now turn to the results of the first experiment, in which a fixed number of samples per window was used. Table 1 shows the raw accuracy of legitimate user identification for the three sensors and four classifiers, per activity. Identification based on the straight walking activity, using accelerometer data and a DT classifier, performed better than identification based on the other activities, with an accuracy of 98.68%.

These results show that 400 samples per window are sufficient to identify a user most of the time, especially when the accelerometer data are used with a classifier other than NB. The accelerometer data are clearly more informative for identifying a legitimate user than the magnetometer and gyroscope data.

The gait-based legitimate user identification process explained above builds a single predictive model that first identifies a specific user from a set of users and then decides whether the identified user is legitimate or an impostor. At the lowest level, the results are based on identifying an individual user from 400 samples per window of data for different activities performed at different times. The model could be further improved by using more data and then applying a majority voting scheme to identify the legitimate user from the pool of users.

To demonstrate how this scheme works and to provide greater insight into the results, confusion matrices were generated for each user and are presented in the following tables. Due to space limitations, Table 1 shows only the overall results of legitimate user identification for the different activities performed by the legitimate user or an impostor. The results shown in the tables are based on identification models built from the data of three sensors with four different classifiers.

The columns in Tables 2 and 3 correspond to the predicted users and the rows to the actual users. Thus, the diagonal values in boldface correspond to correct identification of the legitimate user, while the off-diagonal values correspond to an impostor being identified as the legitimate user or vice versa. The results indicate that the proposed model can correctly identify the legitimate user. From these results, one can compute the accuracy for each legitimate user or the aggregated accuracy over all six users. For example, for a DT classifier using accelerometer data, the accuracy for correctly identifying legitimate user 1 is almost 97.81%, while that for legitimate user 2 is 98.26%. The overall accuracy is simply the total number of correct predictions divided by the total number of predictions.
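Using the convention of Tables 2 and 3 (rows are actual users, columns are predicted users), both quantities can be computed from a confusion matrix as sketched below. We read the per-user accuracy as the fraction of a user's samples that land on the diagonal (row-wise recall); this is one plausible reading, and the ACC reported in Tables 4 and 5 may instead be a one-vs-rest accuracy.

```python
from typing import List, Sequence


def per_user_accuracy(cm: Sequence[Sequence[int]]) -> List[float]:
    """Fraction of each actual user's samples predicted as that user (diagonal / row sum).
    Rows are actual users and columns predicted users, as in Tables 2 and 3."""
    return [row[i] / sum(row) for i, row in enumerate(cm)]


def overall_accuracy(cm: Sequence[Sequence[int]]) -> float:
    """Total correct predictions (the diagonal) divided by total predictions."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total
```

For a toy 2 × 2 matrix (not data from this study), `per_user_accuracy([[8, 2], [1, 9]])` returns `[0.8, 0.9]` and `overall_accuracy` returns 17/20 = 0.85.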

The corresponding results, interpreted with the most-predicted-user strategy, are shown in Tables 1–3 for a fixed window size for each activity. This strategy always leads to perfect results except with the NB classifier. Based on a visual inspection of the confusion matrices, and on the fact that usually no second user receives nearly as many votes as the actual user, we believe that for the population used in this experiment, perfect legitimate user identification accuracy could be obtained with fairly small samples of data.
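The most-predicted-user strategy amounts to a majority vote over the per-window predictions for a test session. The sketch below also returns the vote margin between the winner and the runner-up, which is one way to quantify the observation that no second user gets nearly as many votes; the margin is our addition for illustration.

```python
from collections import Counter
from typing import Sequence, Tuple


def most_predicted_user(window_predictions: Sequence[str]) -> Tuple[str, float]:
    """Majority vote over per-window predictions.

    Returns the winning user and the vote margin over the runner-up, as a
    fraction of all windows (0 means a tie, 1 means a unanimous vote).
    """
    if not window_predictions:
        raise ValueError("no predictions to aggregate")
    ranked = Counter(window_predictions).most_common()
    winner, top = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    return winner, (top - runner_up) / len(window_predictions)
```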

For the statistical analysis of any legitimate user identification system against a trusted external judgment, the numbers of true positives, true negatives, false positives, and false negatives are usually compared. The terms true and false refer to whether the prediction agrees with the external judgment, and the terms positive and negative refer to the classifier’s prediction. The test names in Tables 4 and 5 are abbreviated as follows: TPR = true positive rate, TNR = true negative rate, FPR = false positive rate, FNR = false negative rate, PPV = positive predictive value, NPV = negative predictive value, SEN = sensitivity, SEP = specificity, FDR = false discovery rate, FOR = false omission rate, and ACC = individual user identification accuracy.

The PPV is the proportion of positive results that are true positives, computed from the true and false positives as TP/(TP + FP). In the proposed model, it is used to assess how far positive identification results can be trusted; a higher PPV indicates that fewer positive results are false. The false omission rate (FOR) is the proportion of negative results that are actually positive, computed from the false and true negatives as FN/(FN + TN); it is also the complement of the NPV. The false discovery rate (FDR) is the proportion of positive results that are actually negative, computed from FP and TP as FP/(FP + TP), and it is the complement of the PPV. The FDR is also the quantity controlled by multiple hypothesis testing procedures that correct for multiple comparisons, where it serves as one way of conceptualizing the rate of type I errors.
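The definitions above can be checked with a small helper that computes every rate from the four counts of a one-vs-rest evaluation, treating the legitimate user as the positive class (our assumption about how Tables 4 and 5 were built). Returning NaN for an empty denominator is a design choice of this sketch.

```python
from typing import Dict


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """One-vs-rest rates for a single legitimate user (positive class = that user)."""
    def ratio(a: int, b: int) -> float:
        return a / b if b else float("nan")

    tpr = ratio(tp, tp + fn)
    tnr = ratio(tn, tn + fp)
    return {
        "TPR": tpr, "SEN": tpr,               # sensitivity is the same quantity as TPR
        "TNR": tnr, "SEP": tnr,               # specificity is the same quantity as TNR
        "FPR": ratio(fp, fp + tn),
        "FNR": ratio(fn, fn + tp),
        "PPV": ratio(tp, tp + fp),
        "NPV": ratio(tn, tn + fn),
        "FDR": ratio(fp, fp + tp),            # = 1 - PPV
        "FOR": ratio(fn, fn + tn),            # = 1 - NPV
        "ACC": ratio(tp + tn, tp + tn + fp + fn),
    }
```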

The second experimental study varies the window size in the legitimate user identification process. The window size is an important system parameter because it determines the time the system needs to perform an identification; that is, the window size directly determines the system’s identification frequency. In this experiment, the window size is varied from 25 to 400 samples per window in steps of 25 samples. For each fixed window size and each targeted user, the model is learned using 10-fold cross-validation for training, validation, and testing, and the accuracy is averaged across all the activities listed earlier. These experiments investigate the influence of the window size on the average accuracy in order to choose a proper window size. Another important parameter is the total number of samples of the 3-dimensional signal, which affects the average and overall accuracy: a larger training set provides the system with more information but also gives it more chances to be overwhelmed, which degrades the classifier’s generalization performance. According to our observations, the largest number of samples per window produces the maximum accuracy in almost all cases and for each activity, particularly when the number of samples per window is 200 or more. Accuracy decreases when the number of samples per window falls below 200; smaller windows yield a larger training set of less informative examples, and a larger training set is more likely to cause over-fitting, so that the trained model introduces more errors than expected. Detailed results for different numbers of samples per window, with 99% confidence intervals, are shown in Figures 1–3.
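If the samples in a window are consecutive readings at the 20 Hz rate stated earlier, window size translates directly into identification latency: 25 samples span 1.25 s and 400 samples span 20 s. The sketch also shows one common way to obtain a 99% confidence interval for the mean fold accuracy; the text does not state how its intervals were computed, so the normal approximation (z ≈ 2.576) is an assumption. With only 10 folds, a Student's t quantile (about 3.25 for 9 degrees of freedom) would give a wider, more conservative interval.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple

RATE_HZ = 20
Z_99 = 2.576  # two-sided 99% quantile of the standard normal distribution


def window_seconds(samples: int, rate_hz: int = RATE_HZ) -> float:
    """Duration covered by a window of `samples` consecutive readings."""
    return samples / rate_hz


def ci99(accuracies: Sequence[float]) -> Tuple[float, float]:
    """Normal-approximation 99% CI for the mean of per-fold accuracies (ASSUMPTION)."""
    m = mean(accuracies)
    half = Z_99 * stdev(accuracies) / math.sqrt(len(accuracies))
    return m - half, m + half
```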

5. Discussion

The outcomes of the activity recognition and identification experiments described above provide the overall results for the proposed identification model. Recall that the results presented in Tables 1–3 have been aggregated over all identification models (one per subject), and all identification decisions presented here are based on multiple instances, that is, 25–400 samples per window of data. The results in Figures 1–3 indicate that SW-based identification can be relatively accurate using only 400 samples per window of data from the different activities performed by each user. The identification results confirm that the accelerometer data perform slightly better than the gyroscope data, and that both the accelerometer and the gyroscope produce much better identification results than the magnetometer. In terms of classifiers, DT outperforms KNN, SVM, and NB. KNN, in turn, outperforms SVM and NB but performs less well than DT. SVM slightly underperforms KNN but performs much better than NB.

Activity recognition-based user identification models perform much better for almost all activities except running. The overall accuracy for the proposed activity recognition-based identification is almost for walking activity and accuracy for jogging and to for the rest of the activities-based user identification model. The results of the proposed two-fold activity recognition and legitimate user identification model, presented in Figures 1–3 and Tables 1–5, show its ability to efficiently recognize the individual activity and to identify the individual user based on the recognized activity.

As explained earlier, there is no prior work in the literature that closely matches the work presented in this paper. However, we found two closely related works, whose comparison results are presented in Table 6. These methods were tested under exactly the settings described in their respective works. Based on the results, one can conclude that the activity recognition-based legitimate user identification framework performs better because we used a single predictive model and an ambiguity activity recognition analysis, which significantly help the model in the identification process.

We also measured the time needed for activity recognition and user identification in the proposed system; it is less than 5 seconds in the extreme case of 400 samples per window, as shown in Figure 4. One can also observe that as the window size decreases (i.e., as the number of training instances for the model increases), the time for implicit activity recognition and user identification increases slowly at first and then sharply once the window size drops below 150 samples per window. The identification results also show that the larger the window size, the better the identification results. Therefore, the proposed system achieves acceptable performance in terms of both accuracy and computational time, which makes it efficient and applicable in real-world scenarios.
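When reproducing such timing measurements, it may help to separate the two costs that a window size implies: the time needed to collect one window (samples divided by the sampling rate) and the processing time per window. The sketch below does that under assumptions: the 20 Hz sampling rate and both function names are hypothetical, and `mean_processing_seconds` simply wraps `time.perf_counter` around whatever per-window pipeline function is passed in.

```python
import time


def window_fill_seconds(samples_per_window, sampling_rate_hz):
    """Seconds of sensor data needed before one window can be classified."""
    return samples_per_window / sampling_rate_hz


def mean_processing_seconds(process, windows):
    """Average wall-clock time that `process` (e.g. recognition + identification)
    spends on one window."""
    start = time.perf_counter()
    for window in windows:
        process(window)
    return (time.perf_counter() - start) / len(windows)


if __name__ == "__main__":
    # Hypothetical 20 Hz sampling rate; the section does not state the actual rate.
    for size in (25, 150, 400):
        print(size, "samples:", window_fill_seconds(size, 20.0), "s to fill")
    # `sum` is a placeholder for the real per-window pipeline.
    print("processing:", mean_processing_seconds(sum, [[1.0] * 400] * 100), "s/window")
```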

We have also analyzed the model's ability to defend against impostors, for example through masquerading attacks. Recall that the goal of the proposed model is to prevent an impostor from gaining access to the secure and sensitive information or services associated with the stored passwords. The obtained results show that the proposed model is secure against masquerading attacks, where "secure" means that an impostor cannot cheat the system by performing these attacks within a short time. Within several windows, the probability of an impostor escaping detection is only. Therefore, the proposed system performs well in recognizing an adversary who launches a masquerading attack and in defending against such attacks.
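The reasoning behind "within several windows" can be made explicit under a simplifying assumption that this section does not state: if each window from an impostor is accepted independently with probability p (the per-window false-accept rate), the chance of escaping detection in all n windows is p^n, which shrinks geometrically with n. The helper below is a hypothetical illustration of that assumption, not the paper's evaluation. Consecutive windows from one person are usually correlated, so p^n is likely optimistic from the defender's point of view.

```python
def escape_probability(per_window_false_accept, n_windows):
    """Probability that an impostor is accepted in every one of n consecutive windows,
    assuming the per-window decisions are independent."""
    if not 0.0 <= per_window_false_accept <= 1.0:
        raise ValueError("per-window false-accept rate must lie in [0, 1]")
    if n_windows < 1:
        raise ValueError("need at least one window")
    return per_window_false_accept ** n_windows


if __name__ == "__main__":
    # Hypothetical per-window false-accept rate, for illustration only.
    for n in (1, 3, 5):
        print(n, "windows:", escape_probability(0.05, n))
```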

6. Future Work

The goal of this research was to show that activity recognition-based legitimate user identification is an effective approach in the gait-based identification domain. As we have shown, it is possible to distinguish between individuals performing the same activity. As a first step for future work, we intend to determine how identification time can be improved across different activities. An immediate solution is to use a lightweight feature selection method, which should improve the discriminative power of the features and reduce their dimensionality at the same time. To the best of our knowledge, this idea is relatively new in SW-based activity recognition and identification, and preliminary results indicate that it is indeed a promising area of research.
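As one example of a lightweight filter method, a Fisher score ranks each feature by the spread of its class means relative to its within-class variance; keeping only the top-ranked features both sharpens discrimination and reduces dimensionality. The sketch below is illustrative only: this section does not name a specific feature selection method, so the choice of the Fisher score and the names `fisher_scores` and `select_top_k` are assumptions.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def fisher_scores(X, y):
    """Fisher score of each feature: between-class scatter / within-class scatter.
    X is a list of equal-length feature vectors, y the matching class labels."""
    by_class = defaultdict(list)
    for x, label in zip(X, y):
        by_class[label].append(x)
    scores = []
    for j in range(len(X[0])):
        overall_mean = fmean(x[j] for x in X)
        between = within = 0.0
        for rows in by_class.values():
            column = [row[j] for row in rows]
            between += len(column) * (fmean(column) - overall_mean) ** 2
            within += len(column) * pvariance(column)
        if within > 0:
            scores.append(between / within)
        else:
            # Zero within-class spread: infinite score if class means differ, 0 if constant.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def select_top_k(X, y, k):
    """Indices of the k features with the highest Fisher score."""
    scores = fisher_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
```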

Additionally, as discussed above, a key limitation of activity recognition-based user identification is the variability of the signal. Our future work will therefore study the potential of a further windowing process and feature selection, followed by simple DT, KNN, and SVM classifiers. From the current results, one can observe that the DT classifier outperformed all other classifiers, and with it the results are promising for both activity recognition and user identification. However, this study was conducted on a relatively small set of users: the experimental dataset includes 5 activities performed by 6 users per activity. On the other hand, an advantage of this dataset is that its classes are not uniformly distributed, which has led to a set of results in line with other state-of-the-art works.

In addition, future work will entail expanding the experimental dataset with more users (more than 15), more activities (more than 10), and more training runs. Our experimental dataset was collected using Android SWs running Android Wear OS; it would be useful to measure PAR-based user identification on a wide variety of SWs running different operating systems. A final goal of this line of research is to incorporate this technology into a real-time system.

7. Conclusion

Smartwatches are becoming increasingly popular, and this popularity has prompted the community to study the security implications of these small but powerful devices. It has been suggested that activity recognition and gait-based identification are possible with SWs. This study described an effective two-fold system for SW-based activity recognition and user identification. It demonstrates that gait, as measured by the sensors of a commercial-grade SW, is sufficient to identify an individual with modest accuracy. Furthermore, a simple sliding window approach is shown to be sufficient for representing the time series sensor data, and the experimental results demonstrate the advantage of combining time- and frequency-domain information. The proposed system achieves an average user identification accuracy of up to 98.68% with negligible system overhead and minimal time and power consumption. We hope that the proposed system can serve as a key technique for implicit activity recognition-based legitimate user identification in real-world scenarios.
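As an illustration of combining time- and frequency-domain information from a sliding window, the sketch below computes two time-domain features (mean, standard deviation) and two frequency-domain features (dominant frequency, spectral energy) for one sensor axis. The exact feature set used in the study is not listed in this section, so these four features, the `window_features` name, and the sampling-rate argument are assumptions.

```python
import cmath
import math
import statistics


def window_features(axis, sampling_rate_hz):
    """Time- and frequency-domain features for one axis of one window.
    Uses a direct O(N^2) DFT for clarity; an FFT library would be used in practice."""
    n = len(axis)
    mean = statistics.fmean(axis)
    centred = [v - mean for v in axis]  # remove the DC component
    # DFT magnitudes for the positive-frequency bins 1 .. n // 2
    magnitudes = [
        abs(sum(c * cmath.exp(-2j * math.pi * k * t / n) for t, c in enumerate(centred)))
        for k in range(1, n // 2 + 1)
    ]
    peak_bin = 1 + max(range(len(magnitudes)), key=magnitudes.__getitem__)
    return {
        "mean": mean,                                           # time domain
        "std": statistics.pstdev(axis),                         # time domain
        "dominant_hz": peak_bin * sampling_rate_hz / n,         # frequency domain
        "spectral_energy": sum(m * m for m in magnitudes) / n,  # frequency domain
    }
```

For example, a 1-second window sampled at 20 Hz that contains a 2 Hz oscillation yields `dominant_hz` equal to 2.0, while its mean and standard deviation describe the same window in the time domain.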

Data Availability

The experimental datasets will be provided upon reasonable request to [email protected].

Conflicts of Interest

The authors have no conflicts of interest to declare.

Authors’ Contributions

The authors Mohammed A. Alqarni, Asad Khan, Adil Khan, Sajjad Hussain Chauhdary, Manuel Mazzara, Tariq Umer, and Salvatore Distefano contributed equally to this work.