Abstract

Smart homes based on the Internet of Things have developed rapidly. To improve the safety, comfort, and convenience of residents’ lives at minimal cost, daily activity recognition aims to identify residents’ daily activities in a non-invasive manner. The performance of daily activity recognition depends heavily on the strategy used to solve activity features. However, the commonly employed solving strategy, which is based on statistical information of an individual activity only, does not support activity recognition well. To improve on this strategy, an activity feature solving strategy based on TF-IDF is proposed in this paper. The proposed strategy exploits statistical information related to both the individual activity and the whole set of activities. Two distinct datasets were employed in order to mitigate any possible coupling between dataset and sensor configuration. Finally, a number of machine learning (ML) techniques and a deep learning technique were evaluated to assess their performance for resident activity recognition.

1. Introduction

The world’s population is aging, leading to an uneven population composition. It is estimated that, by 2050, more than 20% of the population will be over 64 years old, and the number of people over the age of 80 worldwide will reach nearly 379 million, about 5.5 times the 69 million of 2000 [1, 2]. This increase in population aging is expected to lead to an increase in age-related diseases, which in turn will place an additional burden on health care [2]. As a population ages, the potential support ratio (PSR) tends to fall. The PSR is the number of people aged 15–64 per older person aged 65 or over. This ratio describes the burden placed on the working-age population (the unemployed and children are not considered in this measure) by the non-working elderly population. Between 1950 and 2009, the PSR fell from 12 to 9 potential workers per person aged 65 or over [1].

In recent years, smart homes based on the Internet of Things have developed rapidly in order to improve the safety, comfort, and convenience of residents’ lives at minimal cost. They are mainly used in intelligent video surveillance, patient monitoring systems, human-computer interaction, virtual reality, smart security, athlete-assisted training, and so on. Obviously, the foundation of the smart home is the recognition of user activity.

The main purpose of Ambient Assisted Living (AAL) is to support independent living and thereby alleviate some of the problems associated with ageing. It is widely seen as an effective approach to solving some of the problems associated with supporting an ageing population [3, 4]. With the continued development of smart home technologies, individuals such as the elderly and disabled can improve their quality of life and live independently at home.

Activity recognition (AR) is one of the important enablers of AAL. AR is a complex process and can generally be classified into two categories according to the type of sensor used for activity monitoring. The first is vision-based activity recognition. Methods in this category utilize computer vision techniques, including feature extraction, structural modeling, movement segmentation, action extraction, and movement tracking, to analyze visual observations for pattern recognition. The second is sensor-based activity recognition. Sensor data generated by sensor-based monitoring is primarily a time series of state changes and/or various parameter values, typically processed by data fusion, probabilistic or statistical analysis methods, and formal knowledge techniques for activity recognition. Sensor-based activity recognition can itself be divided into two categories. The first is activity monitoring based on wearable sensors, which receives more attention in mobile computing. The second is dense sensing, which is more suitable for applications that support smart environments.

In this paper, we focus on sensor-based activity recognition. A key step of sensor-based activity recognition is activity feature solving. However, the commonly employed solving strategy, based on statistical information of an individual activity only, does not support activity recognition well. To improve on this strategy, an activity feature solving strategy based on TF-IDF is proposed in this paper. The proposed strategy exploits statistical information related to both the individual activity and the whole set of activities.

The rest of the paper is organized as follows. Section 2 describes related work. Section 3 describes the process of activity recognition. The proposed feature solving strategy is explained in Section 4. Section 5 describes the implementation of the experiments and the evaluation of the method. Section 6 concludes the paper.

2. Related Work

At present, a number of activity recognition methods have been developed. According to the sensor type, they can be divided into video sensor based, wearable sensor based, and embedded sensor based methods. For video sensors, Khare et al. proposed a video-based activity recognition approach that integrates local binary patterns [5]. Lin et al. proposed a new network-based transmission (NTB) algorithm for human activity recognition in video [6]. However, user privacy is a major challenge: many users are reluctant to place such sensors in sensitive places such as bedrooms and bathrooms, and video sensors are also affected by factors such as day and night and the environment. For wearable sensors, Bouchard et al. used a passive RFID-based activity recognition system to detect anomalies due to cognitive impairment [7]. Yang et al. proposed a simple method for identifying human activities based on information about the objects involved in RFID usage activities [8]. In [9], an accelerometer-based convolutional network for activity recognition was proposed. However, wearable sensors are inconvenient for users to carry, most users are unwilling to wear a sensor on their body, and the quality of recognition sometimes depends on factors such as where on the body the sensor is carried. Embedded sensors solve the problems raised by video and wearable sensors: they effectively protect the user’s personal privacy, are largely unaffected by the surrounding environment, and do not require the user to carry anything [10–12].

Activity recognition in smart homes can be divided into knowledge-driven and data-driven approaches [13–15]. In the knowledge-driven approach, knowledge is elicited from domain experts. In [16], Chen et al. proposed real-time, continuous activity recognition over multi-sensor data streams in knowledge-driven smart homes. In addition, ontologies are often integrated into knowledge-driven methods. In [17], Latfi et al. present an ontology-based model of the Telehealth Smart Home (TSH) for elderly activity recognition. Salguero et al. propose that an ontology automatically generate the features of the ADL classifier for activity recognition [18]. The ontology-based approach is clear and easy to understand. The knowledge-driven approach is therefore called a top-down approach, but it deals poorly with uncertainty and temporal information.

In contrast, data-driven approaches collect data from a large number of sensor streams, organize the data to form information, then integrate and refine the related information, and use machine learning techniques to train and fit an automated decision model based on the data [19]. In [20], a framework for acquiring and developing different layers of context models in a smart environment is proposed. Tapia et al. propose a real-time algorithm to automatically identify physical activities [21]. The data-driven approach is also known as a bottom-up approach, and it has a strong ability to deal with uncertainty and temporal information. Therefore, this paper uses data-driven activity recognition.

Data-driven approaches are generally divided into generative methods and discriminative methods. Among generative methods, Patterson et al. propose several different HMM models for activity recognition [22]. In order to improve the ability of the HMM to identify complex activities, a hierarchical (multi-layer) hidden Markov model (HHMM) is proposed in [23]. Vail et al. propose a new, effective feature selection algorithm for an m-estimates-based CRF to identify the most important features for activity recognition [24]. Although these models work well with uncertain or incomplete data, they require a lot of data for learning to optimize the model. With the development of neural networks, deep learning has gradually been applied to activity recognition. Li et al. proposed a BP neural network for representing and identifying human activities from observed sensor sequences [25]. A Deep Belief Network (DBN) model is proposed for human activity recognition in [26]. Guan proposes an ensemble of deep long short-term memory (LSTM) networks for activity recognition [27]. In [28], Chen et al. use LSTM recurrent neural networks to analyze sensor readings from accelerometers and gyroscopes to identify human activity and provide position-aware methods to improve recognition accuracy.

For activity feature selection and solving, the start time and duration of an activity instance are commonly used temporal features. Individual sensors, sets of frequent sensors, and sequences of frequent sensors are commonly used space features [29]. For space features, common solving strategies compute quantities such as the frequency or density with which the space features are activated. Because the current solving strategy only takes into account statistical information of an individual activity, it does not support activity recognition well.

3. Process of Activity Recognition

As shown in Figure 1, the activity recognition process includes four stages.

In the first stage, raw sensor events are collected in the form of a stream while a daily activity is occurring. Figure 2 presents the raw sensor events of a sample of the activity “Sleep”. When a daily activity instance starts, sensors are activated in time order until the instance ends. When a sensor is activated, the date, the time, the name, and the value of the sensor are stored. For example, in Figure 2 the first activated sensor for the activity “Sleep” is “M021” with value “ON” at time “00:06:32.834414” on 2011-06-15.
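
As an illustration, the following minimal Python sketch parses one such event line. It assumes a CASAS-style line format of “date time sensor value [annotation]”; the field layout and helper name are illustrative, not prescribed by the datasets.

from datetime import datetime

def parse_event(line):
    # Split "date time sensor value [annotation]" into named fields.
    parts = line.split()
    date_str, time_str, sensor, value = parts[:4]
    timestamp = datetime.strptime(f"{date_str} {time_str}",
                                  "%Y-%m-%d %H:%M:%S.%f")
    annotation = " ".join(parts[4:]) if len(parts) > 4 else None
    return {"time": timestamp, "sensor": sensor,
            "value": value, "annotation": annotation}

event = parse_event("2011-06-15 00:06:32.834414 M021 ON Sleep begin")
print(event["sensor"], event["value"], event["annotation"])  # M021 ON Sleep begin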

In the second stage, the sensor event sequence is separated into a number of subsequences. Each subsequence corresponds to an entire activity instance.
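
A sketch of this segmentation step is given below. It assumes that each instance is delimited by “<activity> begin” / “<activity> end” annotations (a common CASAS convention) and reuses the event dictionaries from the previous sketch; real logs with interleaved activities would need a more careful treatment.

def segment(events):
    # Split an event stream into (label, events) activity instances.
    instances, current, label = [], [], None
    for e in events:
        ann = e["annotation"] or ""
        if ann.endswith("begin"):
            label = ann.rsplit(" ", 1)[0]
            current = []
        if label is not None:
            current.append(e)
        if ann.endswith("end") and label is not None:
            instances.append((label, current))
            label, current = None, []
    return instances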

In the third stage, features of the daily activity are selected and solved. Generally, features are divided into temporal features and space features. The start time and duration of an activity instance are common temporal features; sensors are common space features. Temporal features and space features are used to characterize daily activity instances. After the features are selected, they can be solved according to some strategy.

In the last stage, the activity recognition model is built. Training data is provided to train the recognition model, and the trained model is employed to assign an activity label to each test activity instance.

4. Activity Feature Selection and Solving

4.1. Activity Feature Selection

As mentioned above, our work focuses on activity feature selection and solving. The task of feature selection is to determine the feature set. In common with previous work, both temporal features and space features are involved in our work [10]. Temporal features include the start time and duration of an activity instance. Space features are divided into two categories according to the formula used for feature solving. The first category of space features is named Start-End Frequency (SEF) features; each SEF feature corresponds to a sensor. The second category of space features is named TF-IDF features; each TF-IDF feature also corresponds to a sensor.

Formally, let $S = \{s_1, s_2, \ldots, s_n\}$ be the set of sensors deployed in a smart home. The feature set is defined as $F = \{f_{st}, f_{du}\} \cup F_{SEF} \cup F_{TFIDF}$. $f_{st}$ and $f_{du}$ denote the start time and duration of an activity instance, respectively. $F_{SEF} = \{f^{SEF}_1, \ldots, f^{SEF}_n\}$ is the set of SEF features, where $f^{SEF}_k$ corresponds to sensor $s_k$. $F_{TFIDF} = \{f^{TFIDF}_1, \ldots, f^{TFIDF}_n\}$ is the set of TF-IDF features, where $f^{TFIDF}_k$ corresponds to sensor $s_k$.

4.2. Activity Feature Solving
4.2.1. Temporal Activity Feature Solving

For an activity instance, the start time and duration are extracted as the values of features $f_{st}$ and $f_{du}$. In Figure 2, the values of $f_{st}$ and $f_{du}$ for the activity instance “Sleep” are “00:06:32” and 12717 seconds, respectively.
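
Concretely, given the event list of one segmented instance (in the representation sketched above), the two temporal features can be computed as follows; the helper name is illustrative.

def temporal_features(instance_events):
    # f_st: start time of the first event; f_du: duration in seconds.
    start = instance_events[0]["time"]
    end = instance_events[-1]["time"]
    f_st = start.strftime("%H:%M:%S")
    f_du = int((end - start).total_seconds())
    return f_st, f_du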

4.2.2. SEF Activity Feature Solving

The SEF activity feature solving process is presented in Algorithm 1. For an activity instance and a sensor $s_k$ ($1 \le k \le n$), the corresponding SEF feature value is assigned 2 if both the first and the last activated sensors are $s_k$; it is assigned 1 if either the first or the last activated sensor is $s_k$; and it is assigned 0 if neither the first nor the last activated sensor is $s_k$. For the activity instance “Sleep” in Figure 2, the value of the SEF feature corresponding to sensor “M021” is assigned 2.

Input: $A = \{a_1, a_2, \ldots, a_m\}$, a set of activity instances
$F_{SEF} = \{f^{SEF}_1, \ldots, f^{SEF}_n\}$, the set of SEF features
Output: $V$, the $m \times n$ matrix of SEF feature values
1. $v_{j,k} \leftarrow 0$, where $1 \le k \le n$ and $1 \le j \le m$;
2. $j \leftarrow 1$;
3. while ($j \le m$)
4. Extract the first activated sensor $s_{first}$ of $a_j$;
5. Extract the last activated sensor $s_{last}$ of $a_j$;
6. $k \leftarrow 1$;
7. while ($k \le n$)
8. if $s_k$ is the same as $s_{first}$ then
9. $v_{j,k}$++;
10. end if
11. if $s_k$ is the same as $s_{last}$ then
12. $v_{j,k}$++;
13. end if
14. $k$++;
15. end while
16. $j$++;
17. end while
18. return $V$
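
For readability, the same procedure can be written compactly in Python; this is a sketch over the instance representation assumed in Section 3, not the authors’ implementation.

def sef_features(instances, sensors):
    # SEF value of sensor s for an instance: how many times s occurs
    # among the first and last activated sensors (0, 1, or 2).
    matrix = []
    for _, events in instances:
        first, last = events[0]["sensor"], events[-1]["sensor"]
        matrix.append([int(s == first) + int(s == last) for s in sensors])
    return matrix
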
4.2.3. TF-IDF Activity Feature Solving

(1) TF-IDF. Considering a set of terms $T = \{t_1, \ldots, t_n\}$ and a set of documents $D = \{d_1, \ldots, d_m\}$, Term Frequency-Inverse Document Frequency (TF-IDF) is a common weighting formula employed in the field of information retrieval to evaluate how important a term is to a document [30]. Formally, TF-IDF is defined as $\mathrm{tfidf}(t_k, d_j) = \mathrm{tf}(t_k, d_j) \times \mathrm{idf}(t_k)$. $\mathrm{tf}(t_k, d_j) = n_{k,j} / \sum_i n_{i,j}$, where $n_{k,j}$ is how many times the term $t_k$ appears in the document $d_j$. $\mathrm{idf}(t_k) = \log(m / |\{d_j : t_k \in d_j\}|)$.

In this paper, TF-IDF is employed to evaluate how important a sensor is to an activity instance. Considering the set of sensors $S = \{s_1, \ldots, s_n\}$ and a set of activity instances $A = \{a_1, \ldots, a_m\}$, TF-IDF is defined as $\mathrm{tfidf}(s_k, a_j) = (n_{k,j} / \sum_i n_{i,j}) \times \log(m / |\{a_j : s_k \in a_j\}|)$, where $n_{k,j}$ is how many times sensor $s_k$ is activated during activity instance $a_j$.

Ranges of different TF-IDF feature values vary considerably. To normalize TF-IDF feature values, two optimization functions are introduced into TF-IDF feature solving. The first is the sigmoid function shown in Formula (1), which maps a TF-IDF feature value into the interval $(0, 1)$:

$\sigma(x) = 1 / (1 + e^{-x})$    (1)

The second is the tanh function shown in Formula (2), which maps a TF-IDF feature value into the interval $(-1, 1)$ (and a non-negative TF-IDF value into $[0, 1)$):

$\tanh(x) = (e^{x} - e^{-x}) / (e^{x} + e^{-x})$    (2)

The TF-IDF activity feature solving process is presented in Algorithm 2.

Input: $A = \{a_1, a_2, \ldots, a_m\}$, a set of activity instances
$F_{TFIDF} = \{f^{TFIDF}_1, \ldots, f^{TFIDF}_n\}$, the set of TF-IDF features
Output: $V$, the $m \times n$ matrix of TF-IDF feature values
1. $v_{j,k} \leftarrow 0$, where $1 \le k \le n$ and $1 \le j \le m$;
2. $j \leftarrow 1$;
3. while ($j \le m$)
4. Collect the set $S_j$ of all sensors which are activated while $a_j$ is active;
5. Calculate the TF-IDF value of each sensor in $S_j$ using $A$, Formula (1), and Formula (2);
6. $k \leftarrow 1$;
7. while ($k \le n$)
8. for each $s$ in $S_j$
9. if $s$ is the same as $s_k$ then
10. $v_{j,k} \leftarrow$ the TF-IDF value of $s$;
11. end if
12. end for
13. $k$++;
14. end while
15. $j$++;
16. end while
17. return $V$
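
The following Python sketch mirrors Algorithm 2, including the optional normalizations of Formulas (1) and (2); the function and parameter names are illustrative.

import math

def tfidf_features(instances, sensors, normalize=None):
    # Document frequency: in how many instances each sensor fires.
    df = {s: sum(any(e["sensor"] == s for e in ev)
                 for _, ev in instances) for s in sensors}
    m = len(instances)
    matrix = []
    for _, events in instances:
        counts = {s: 0 for s in sensors}
        for e in events:
            if e["sensor"] in counts:
                counts[e["sensor"]] += 1
        total = sum(counts.values()) or 1
        row = []
        for s in sensors:
            tf = counts[s] / total                       # term frequency
            idf = math.log(m / df[s]) if df[s] else 0.0  # inverse document frequency
            v = tf * idf
            if normalize == "sigmoid":                   # Formula (1)
                v = 1.0 / (1.0 + math.exp(-v))
            elif normalize == "tanh":                    # Formula (2)
                v = math.tanh(v)
            row.append(v)
        matrix.append(row)
    return matrix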

5. Evaluation

5.1. Datasets

In this study, we employ two public datasets, “tulum2009” and “cairo” [31], to illustrate the applicability of the proposed approach. These datasets have been published by Washington State University [31]. Statistical information concerning the two datasets is described in Table 1. Values listed under the column “Sensors” correspond to the number of sensors involved and their corresponding categories. Similarly, values listed under the column “Activity Categories” correspond to the number of activity classes involved, while those listed under the column “Activity Instances” correspond to the number of activity instances involved. Values listed under the column “Residents” correspond to the number of residents involved. Lastly, values listed under “Measurement Time” correspond to the durations over which data were collected.

For the “tulum2009” dataset, the following identifier categories were considered.
(1) Identifiers with names starting with “M” indicate infrared motion sensors (M001–M018).
(2) Identifiers with names starting with “T” indicate temperature sensors (T001–T002).

Involved atomic activities include “Cook_Breakfast” (“C_B”), “Cook_Lunch” (“C_L”), “Enter_Home” (“E_H”), “Group_Meeting” (“G_M”), “Leave_Home” (“L_H”), “Eat_Breakfast” (“E_B”), “Snack” (“S”), “Wash_Dishes” (“W_D”), and “Watch_TV” (“W_T”). The involved atomic activities and interactive activities are presented in Table 2.

Similarly, for the “cairo” dataset, the following identifier categories were considered.
(1) Identifiers with names starting with “M” indicate infrared motion sensors (M001–M027).
(2) Identifiers with names starting with “T” indicate temperature sensors (T001–T005).

Involved activities include “Bed to toilet” (“B_T_T”), “Breakfast” (“B”), “sleep” (“S”), “wake” (“W”), “work in office” (“W_I_O”), “Dinner” (“D”), “Laundry” (“Lau”), “Leave home” (“L_H”), “Lunch” (“Lch”), “Night wandering” (“N_W”), and “take medicine” (“T_M”). The involved atomic activities and interactive activities are presented in Table 3.

5.2. Experimental Preparation

In this study, the proposed approach was compared against the frequency-based feature solving approach (FF), which is commonly employed in previous research [Liu17]. The frequency-based activity feature solving process is as follows: for an activity instance and a sensor $s_k$ ($1 \le k \le n$), the corresponding feature value is the frequency with which $s_k$ is activated. For the activity instance “Sleep” in Figure 2, the nonzero feature values are (BATV001, 1), (BATV002, 1), (BATV006, 1), (BATV010, 1), (BATV012, 1), (BATV013, 1), (BATV015, 1), (BATV019, 1), (BATV021, 1), (BATV022, 1), (BATV102, 1), (BATV105, 1), (LS013, 2), (M021, 14), (MA020, 10).
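
For comparison with the sketches above, the FF baseline reduces to simple per-instance sensor counts; again, the helper name is illustrative.

from collections import Counter

def ff_features(instances, sensors):
    # FF value of sensor s for an instance: how many times s fires.
    matrix = []
    for _, events in instances:
        counts = Counter(e["sensor"] for e in events)
        matrix.append([counts.get(s, 0) for s in sensors])
    return matrix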

These approaches were evaluated by their corresponding activity recognition performance using Support Vector Machine (SVM), Sequential Minimal Optimization (SMO), and Random Forest (RF) classifiers. The toolset employed was Weka 3.9. In addition, we experimented on the same datasets with a state-of-the-art deep learning technique, Long Short-Term Memory (LSTM), which is appropriate for time series data. The LSTM used consists of an input layer, two hidden layers, and an output layer. For the cairo dataset, the numbers of neurons in the input, two hidden, and output layers are set to 20, 40, 40, and 21, respectively. For the tulum2009 dataset, they are set to 20, 40, 40, and 37, respectively. The number of epochs is set to 1, 5, 10, and 15 in turn. 10-fold cross-validation was performed. The evaluation metrics considered are accuracy, precision, and F-measure.
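
A network matching this description could be sketched in PyTorch as follows; the layer sizes follow the text, while everything else (framework choice, batching, class name) is an assumption for illustration, not the authors’ implementation.

import torch
import torch.nn as nn

class ActivityLSTM(nn.Module):
    # Input layer of 20 features, two hidden LSTM layers of 40 units,
    # and an output layer of 21 classes (cairo) or 37 (tulum2009).
    def __init__(self, n_features=20, hidden=40, n_classes=21):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2,
                            batch_first=True)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, x):            # x: (batch, seq_len, n_features)
        _, (h, _) = self.lstm(x)     # h: (num_layers, batch, hidden)
        return self.out(h[-1])       # logits from the last layer's state

model = ActivityLSTM()
logits = model(torch.randn(8, 50, 20))  # e.g., a batch of 8 sequences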

5.3. Results
5.3.1. The Whole Results

Recognition accuracies concerning the tulum2009 dataset are depicted in Table 4. The accuracy using features TF-IDF, TF-IDF+Sigmoid, or TF-IDF+Tanh far exceeded that using features FF when employing SVM and SMO; the accuracies are almost equal when employing RF. Recognition accuracies concerning the cairo dataset are depicted in Table 5. The accuracies using features TF-IDF, TF-IDF+Sigmoid, or TF-IDF+Tanh far exceeded that using features FF when employing SVM, and slightly exceeded those using features FF when employing SMO and RF.

Recognition precisions concerning the tulum2009 dataset are depicted in Table 6. The precisions using features TF-IDF, TF-IDF+Sigmoid, or TF-IDF+Tanh exceeded, to varying degrees, that using features FF for all three classifiers. Recognition precisions concerning the cairo dataset are depicted in Table 7. The precisions using features TF-IDF, TF-IDF+Sigmoid, or TF-IDF+Tanh exceeded that using features FF for all three classifiers.

Recognition F-Measures concerning the tulum2009 dataset are depicted in Table 8. The F-Measures using features TF-IDF, TF-IDF+Sigmoid, or TF-IDF+Tanh exceeded, to varying degrees, that using features FF for all three classifiers. Recognition F-Measures concerning the cairo dataset are depicted in Table 9. The F-Measures using features TF-IDF, TF-IDF+Sigmoid, or TF-IDF+Tanh exceeded that using features FF for all three classifiers.

Recognition results using LSTM on the tulum2009 dataset are depicted in Table 10. The best accuracy (76.01%), the best precision (80.13%), and the best F-Measure (77.99%) are achieved when the number of epochs is 5. The accuracies and F-Measures using features TF-IDF, TF-IDF+Sigmoid, or TF-IDF+Tanh with SVM, SMO, and RF exceeded the best counterparts obtained with LSTM. The precisions using features TF-IDF+Sigmoid or TF-IDF+Tanh with SMO also exceeded the best precision obtained with LSTM; only the precision using feature TF-IDF with SMO was slightly below the best LSTM precision.

Recognition results using LSTM on the cairo dataset are depicted in Table 11. The best precision (66.18%) and the best F-Measure (66.82%) are achieved when the number of epochs is 10; the best accuracy (58.79%) is achieved when the number of epochs is 15. The accuracies, precisions, and F-Measures using features TF-IDF, TF-IDF+Sigmoid, or TF-IDF+Tanh with SVM, SMO, and RF exceeded the best counterparts obtained with LSTM. By these results, LSTM is not sufficiently effective for activity recognition here. The main reason is that sparse training data combined with a relatively large number of neural network nodes leads to overfitting of the training set.

5.3.2. The Results of Individual Activity

The numbers of best- and worst-recognized activities are counted for both accuracy and precision. Let $C$ be the set of activity categories. Let $FS = \{$“FF”, “TF-IDF”, “TF-IDF+Sigmoid”, “TF-IDF+Tanh”$\}$ be the set of feature solving strategies. For $c \in C$ and $fs \in FS$, $ba(c, fs)$ and $wa(c, fs)$ denote whether $c$ obtains its best and worst accuracies, respectively, using feature solving strategy $fs$; $bp(c, fs)$ and $wp(c, fs)$ denote whether $c$ obtains its best and worst precisions using $fs$. For accuracy and precision, the numbers of best-recognized activities are defined as $N_{ba}(fs) = \sum_{c \in C} ba(c, fs)$ and $N_{bp}(fs) = \sum_{c \in C} bp(c, fs)$. The numbers of worst-recognized activities are defined as $N_{wa}(fs) = \sum_{c \in C} wa(c, fs)$ and $N_{wp}(fs) = \sum_{c \in C} wp(c, fs)$.
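
These counts can be computed with a few lines of Python; here scores[fs][c] is assumed to hold a per-activity metric (accuracy or precision) for strategy fs, an illustrative data layout.

def best_worst_counts(scores):
    # For each strategy, count the activity categories for which it
    # attains the best (and worst) value of the given metric.
    strategies = list(scores)
    activities = list(next(iter(scores.values())))
    n_best = {fs: 0 for fs in strategies}
    n_worst = {fs: 0 for fs in strategies}
    for c in activities:
        vals = {fs: scores[fs][c] for fs in strategies}
        hi, lo = max(vals.values()), min(vals.values())
        for fs in strategies:
            n_best[fs] += vals[fs] == hi
            n_worst[fs] += vals[fs] == lo
    return n_best, n_worst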

$N_{ba}$, $N_{wa}$, $N_{bp}$, and $N_{wp}$ for individual activities are shown in Figures 3–6 for the two datasets. For the tulum2009 dataset, FF is worst in two of the three classifiers with respect to $N_{wa}$ and worst in all three classifiers with respect to $N_{wp}$. FF is best only with RF with respect to $N_{ba}$ and is not best with any of the three classifiers with respect to $N_{bp}$. TF-IDF, TF-IDF+Sigmoid, and TF-IDF+Tanh are close in all three classifiers with respect to $N_{wa}$ and $N_{wp}$. TF-IDF is best in all three classifiers with respect to $N_{ba}$, and TF-IDF+Sigmoid is best in two of the three classifiers with respect to $N_{bp}$.

For the cairo dataset, FF is worst in all three classifiers with respect to both $N_{wa}$ and $N_{wp}$, and is not best in any of the three classifiers with respect to either $N_{ba}$ or $N_{bp}$. TF-IDF is best in two of the three classifiers with respect to $N_{ba}$. TF-IDF and TF-IDF+Sigmoid are best in two of the three classifiers with respect to $N_{bp}$, and TF-IDF and TF-IDF+Tanh have the fewest worst-recognized activities ($N_{wa}$ and $N_{wp}$) in two of the three classifiers.

In accordance with the results obtained in this study, the following point must be noted: strategies based on TF-IDF features outperform the strategy based on the FF feature in accuracy, precision, and F-Measure, both over the whole set of activities and for individual activities.

6. Conclusion

This paper presented strategies based on TF-IDF as a means of activity feature solving for activity recognition applications. The proposed strategies were evaluated using three classifiers on two distinct datasets, and the results obtained in this study demonstrate the ability of the TF-IDF-based strategy to dramatically improve the performance of activity recognition systems.

Data Availability

The authors employed two public datasets, “tulum2009” and “cairo”, to illustrate the applicability of the proposed approach. These datasets have been published by Washington State University [31]. The URL is http://casas.wsu.edu/datasets/.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Fundamental Research Funds for the Central Universities (no. 3132018194) and the Open Project Program of Artificial Intelligence Key Laboratory of Sichuan Province (no. 2018RYJ09) and CERNET Innovation Project (no. NGII20181203).