Abstract
This paper proposes a new indoor people detection and tracking system using a millimeter-wave (mmWave) radar sensor. Firstly, a systematic approach for people detection and tracking is presented: a static clutter removal algorithm removes the static points from the mmWave radar data, two efficient clustering algorithms cluster and identify people in a scene, and a recursive Kalman filter tracking algorithm with data association tracks multiple people simultaneously. Secondly, a fast indoor people detection and tracking system is designed based on the proposed algorithms. The method is lightweight enough for scalability and portability, and it executes in real time on a Raspberry Pi 4. Finally, the proposed method is validated by comparing it with the Texas Instruments (TI) system. The proposed system’s experimental accuracy (calculated from misclassification errors) ranged from 98% for one person to 65% for five people, and its average position errors at positions 1, 2, and 3 are 0.2992 meters, 0.3271 meters, and 0.3171 meters, respectively. In comparison, the TI system had an experimental accuracy ranging from 96% for one person to 45% for five people, and its average position errors at positions 1, 2, and 3 are 0.3283 meters, 0.3116 meters, and 0.3343 meters, respectively. The proposed method’s advantage is demonstrated in terms of tracking accuracy, computation time, and scalability.
1. Introduction
Indoor detection and tracking of people is useful for energy management, health, and safety [1]. Studies show that an indoor detection and tracking system can reduce energy usage for lighting and Heating, Ventilation, and Air Conditioning (HVAC) systems by more than 30% [2]. These systems can also improve security applications by enabling emergency systems to make more well-informed decisions. They can enhance the response of emergency systems by providing real-time information on where people are, where they are going, and how densely different sites are occupied, so that the systems can decide whether people are safe. Moreover, indoor detection and tracking systems could help health care providers monitor the elderly and respond when they fall. For example, based on location information, nursing staff could make decisions that ensure their safety.
Researchers have studied various types of sensing technologies for indoor object detection measurements, such as passive infrared (PIR) [3], optical cameras [4, 5], LIDAR [6], WiFi [7], and 10 GHz to 24 GHz microwave [8]. However, all these technologies have challenges with inaccuracy, privacy, environmental robustness, and system complexity [9]. For instance, HD camera systems and other technologies such as WiFi, Bluetooth, and UWB are used for positioning [10–12]. In these studies, Machine Learning (ML) methods are employed to detect people. These methods include decision trees, hidden Markov models, and convolutional neural networks. Machine learning is a computationally intensive process and cannot readily be implemented on an embedded system.
Moreover, camera-based tracking systems typically require a clear view and the right lighting conditions, as shown in [13], where the system uses background subtraction and the Lucas and Kanade tracking algorithm to count people indoors. The system had an experimental accuracy of 97% under lab conditions, but the accuracy dropped significantly during field testing. Another critical problem with camera systems is their intrusive nature, which leads to privacy concerns. However, research is being done to discover whether lower-resolution cameras can be used to circumvent privacy issues [14].
Motivated by this, this research chose the onboard millimeter-wave (mmWave) radar sensor (IWR1642BOOST) [15] as the sensing technology [16]. mmWave is a remote wireless sensing technology that has attracted considerable attention from academia and industry due to its exceptional advantages. Compared to the existing wireless sensing technologies, this radar technology can overcome environmental occlusion problems. We aim to explore fast and robust people detection and tracking models, algorithms, and application guidance using mmWave sensors for indoor applications.
Ongoing research in object detection and tracking data processing technologies is mainly focused on vision-based methods [4]. There are currently only limited studies using mmWave radar data for indoor detection and tracking of people. In [17], Wei and Zhang set up a new high-precision passive tracking method (mTrack) and used highly directional 60 GHz millimeter-wave radios to run a discrete beam scanning mechanism to pinpoint the object’s initial location and track its trajectory. However, it is based on a signal-phase model. Hence, it is not suitable for indoor detection and tracking of people. In [18], Palacios et al. developed two indoor localization algorithms tailored to mmWave propagation characteristics based on commercial 60 GHz mmWave hardware. However, the experimental results only considered location error, and the system computation load is not mentioned. The most related work is people counting and tracking using a mmWave radar sensor by Texas Instruments (TI) [19]. The system employs density-based clustering (DBSCAN) with an extended Kalman filter (EKF), and it reported an accuracy of 51% to 99% for between 1 and 5 people. However, its accuracy is questionable since only DBSCAN [20] is used to cluster the varying-density data. Moreover, its portability and scalability are limited due to the use of the EKF to convert the polar measurements to Cartesian coordinates. The conversion is done for ease of use, yet it brings additional computation load and process noise.
This paper makes two main contributions. Firstly, we present a systematic approach to detecting and tracking people indoors using a mmWave radar sensor. Two efficient clustering algorithms are proposed to provide high accuracy and low processing time, and the recursive Kalman filter (RKF) tracking algorithm performs much better than the EKF in algorithmic complexity and computation time. Secondly, a fast indoor people detection and tracking system was designed based on our proposed algorithms. The system can operate on an embedded platform, the Raspberry Pi 4, whose computing constraints demonstrate the portability and scalability of the approach. Comparing the results to the commercially available system from TI shows that the method is faster, more accurate, and more lightweight than the TI system.
2. Methodology
2.1. Hardware Framework
The hardware system consists of the millimeter-wave radar sensor (IWR1642BOOST), a Raspberry Pi 4 (1.5 GHz, 4 GB RAM), and a monitor. The flow of hardware information is shown in Figure 1. The sensor emits a radar signal, taking a snapshot of the indoor location at a given point in time. The returned radar signal undergoes preliminary processing on the sensor, the output of which is a point cloud. This point cloud is a collection of points that represents detected people. The point cloud is then processed on the Raspberry Pi 4. The output of the processing is information on identified targets, which is then displayed on a monitor.
2.2. System Data Process Framework
The flow chart in Figure 2 depicts the systematic approach used in this paper to process and analyze the raw data (point cloud) that the mmWave sensor transmits to the Raspberry Pi. This paper mainly focuses on the clustering and referencing algorithms, the tracking algorithm, and the timing analysis of the merged approach for real-time operation and portability, and then compares it to the TI method. Firstly, the point cloud information from the mmWave sensor is parsed and then processed for static clutter removal. The points are then grouped in the clustering and referencing module, and finally the people’s points are tracked in the tracking module, from which the number of people is derived.
2.3. Static Clutter Removal
The static clutter removal model is aimed at excluding as many static points as possible from the background. It requires range information since it filters out non-range-changing (static) objects from the scene. The steps of the static clutter removal algorithm are listed as follows.
Step 1. Range processing performs a Fast Fourier Transform (FFT) on the Analog-to-Digital Converter (ADC) samples per antenna per chirp. The FFT output is a set of range bins.
Step 2. Perform static clutter removal by subtracting the estimated Direct Current (DC) component from each range bin.
Step 3. The range processing results in local scratch buffers are transferred by Enhanced Direct Memory Access (EDMA) to the radar data cube with a transpose.
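Step 2 can be illustrated with a minimal sketch. It assumes that the DC component of a range bin is estimated as the mean of that bin over all chirps of one frame, which is a common choice but is not stated in the text; the function name `remove_static_clutter` and the toy frame are hypothetical, and the range FFT of Step 1 is taken as already done.

```python
def remove_static_clutter(range_bins_per_chirp):
    """Subtract the estimated DC component from every range bin.

    range_bins_per_chirp: list of chirps, each a list of complex range-bin
    values (the range-FFT output for one antenna).
    Assumption: the DC component of a bin is its mean over the chirps.
    """
    num_chirps = len(range_bins_per_chirp)
    num_bins = len(range_bins_per_chirp[0])
    # Per-bin mean over chirps; a static reflector contributes the same
    # value to every chirp, so it ends up entirely in this mean.
    dc = [
        sum(chirp[b] for chirp in range_bins_per_chirp) / num_chirps
        for b in range(num_bins)
    ]
    return [
        [chirp[b] - dc[b] for b in range(num_bins)]
        for chirp in range_bins_per_chirp
    ]


# Toy frame with 4 chirps and 2 range bins: bin 0 holds a static reflector,
# bin 1 holds a moving target whose phase rotates from chirp to chirp.
frame = [
    [2 + 1j, 1 + 0j],
    [2 + 1j, 0 + 1j],
    [2 + 1j, -1 + 0j],
    [2 + 1j, 0 - 1j],
]
cleaned = remove_static_clutter(frame)
print([abs(row[0]) for row in cleaned])  # magnitudes left in the static bin
print([row[1] for row in cleaned])       # moving-target bin after removal
```

In this toy case the static bin is cancelled while the rotating phasor of the moving target averages to zero and is therefore left untouched, which is the behaviour the step relies on.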
2.4. Clustering and Referencing
The clustering stage is aimed at identifying the number of people in a scene, and since a single centroid is needed to track each person, a referencing process is required.
2.4.1. Clustering
Due to the mmWave sensor’s field of view, the density of the data varies with time and with distance from the sensor. For example, the closer people are to the sensor, the denser the collected points. On the other hand, as distance increases, only a few points can be obtained, especially for smaller objects. To demonstrate, Figure 3 shows the different total number of collected points (cluster density) of people located at different distances from the sensor. There are only 31 points (blue and green) for two people around 12 m away from the sensor, while another person only 3 m away from the sensor yields 87 points (blue and green). This shows that a denser cluster represents a person closer to the sensor. In contrast, a person who is further away is represented by a less dense and more variable cluster.
The black crosses stand for the density-based noise points after the clustering stage. These points are identified as noise because they form clusters that are too small to represent people. The density-based noise identification works by treating each point as a node and then calculating the distance matrix between itself and all the other nodes. The distance between two nodes is computed from the difference in displacement in the x direction and the difference in displacement in the y direction. If a node is within a distance threshold of 0.2 m of the other nodes, those nodes are extracted.
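The noise identification described above can be sketched as follows. The 0.2 m threshold comes from the text; the use of the Euclidean distance, the `min_neighbors` parameter, and the function name `split_noise` are assumptions made only for illustration.

```python
import math


def split_noise(points, threshold=0.2, min_neighbors=1):
    """Separate points with close neighbours from isolated (noise) points.

    points: list of (x, y) positions in metres.
    threshold: neighbour distance in metres (0.2 m in the text).
    min_neighbors: hypothetical parameter, number of neighbours a point
    needs within the threshold to be kept.
    """
    n = len(points)
    # Full distance matrix between every pair of nodes.
    dist = [
        [
            math.hypot(points[i][0] - points[j][0], points[i][1] - points[j][1])
            for j in range(n)
        ]
        for i in range(n)
    ]
    kept, noise = [], []
    for i in range(n):
        neighbors = sum(1 for j in range(n) if j != i and dist[i][j] <= threshold)
        (kept if neighbors >= min_neighbors else noise).append(points[i])
    return kept, noise


points = [(1.00, 2.00), (1.10, 2.05), (1.05, 1.95), (4.00, 0.50)]
kept, noise = split_noise(points)
print(kept)   # the three points that lie close together
print(noise)  # the isolated point
```

Because the full distance matrix is built, the cost grows quadratically with the number of points in a frame, which stays small for the point counts quoted above.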
2.4.2. Referencing
After clustering, all detected people are represented by clusters. A reference point on the x-y plane needs to be found to locate each cluster’s position. This reference point will later be used for tracking clusters and extracting trajectory information. The reference point can be the mean center of a cluster or an actual point at the center of the cluster (both can be called the centroid). For people clusters, either can be used as the reference point.
The two density-based clustering and referencing algorithms that we implemented are DBmeans and DBmedoids, respectively. The algorithms obtain the centroid location or the centroid point of a cluster with a low misclassification rate. The two algorithms are presented as Algorithm 1 and Algorithm 2.
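The difference between the two kinds of reference point can be shown with a short sketch. It is not a reproduction of Algorithm 1 or Algorithm 2; it only assumes that the mean-based reference is the coordinate-wise average of a cluster and that the medoid-based reference is the cluster member with the smallest total distance to the other members. The function names are hypothetical.

```python
import math


def mean_centroid(cluster):
    """Coordinate-wise mean of the cluster; need not coincide with any point."""
    n = len(cluster)
    return (sum(p[0] for p in cluster) / n, sum(p[1] for p in cluster) / n)


def medoid(cluster):
    """Cluster member with the smallest summed distance to all members."""
    return min(
        cluster,
        key=lambda p: sum(math.hypot(p[0] - q[0], p[1] - q[1]) for q in cluster),
    )


cluster = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.0), (1.0, 3.0)]
print(mean_centroid(cluster))  # a location pulled towards the outlying point
print(medoid(cluster))         # an actual measured point of the cluster
```

The mean needs one pass over the cluster, whereas the medoid needs all pairwise distances, so its cost grows quadratically with the cluster size.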


2.5. Tracking
2.5.1. Recursive Kalman Filter
The tracking stage is necessary to locate people as they move through the indoor space and to maintain accurate and reliable measurements. In this paper, a recursive estimation method with a Kalman filter (RKF) plus a motion model is applied for motion state prediction and estimation of people. Since the rate at which data was lost from the sensor is inconsistent, we decided to recursively calculate the error covariance matrix and Kalman gain in each update stage. This gives a more accurate update than a static Kalman gain.
We opted for the RKF, as it has not been researched in mmWave indoor people tracking, and we theorized that we could make it lightweight for real-time application. The RKF is well-suited for indoor people tracking when using a constant velocity (CV) model, and we also accounted for acceleration by modeling it as random noise. Moreover, it can improve accuracy by keeping the computation in polar coordinates, which avoids the EKF’s process errors caused by linearization, as illustrated in Figure 4. Figure 4 illustrates a single reflection point at a given time; real-life radar objects are represented by multiple reflection points. Each point is represented by its range, angle, and radial velocity (range rate). To employ the RKF, we keep the raw data processing from detection to tracking in the polar system and keep the visualization in Cartesian coordinates for the best view.
The system state is expressed in the polar system at each time step. The motion state model and observation model of people are built from a transition matrix and a measurement matrix, where the mmWave sensor sampling time interval was set to 50 ms, and the system noise and measurement noise are described by their respective covariance matrices.
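For orientation, the generic linear state-space form that such a motion and observation model usually takes is written out below. All symbols ($\mathbf{x}_k$, $\mathbf{z}_k$, $F$, $H$, $Q$, $R$, $T$) are conventional Kalman filter notation chosen here as an assumption, and the constant-velocity block is shown for a single generic coordinate $p$ with rate $\dot{p}$; the exact state vector and matrices used in this paper may differ.

```latex
\begin{aligned}
\mathbf{x}_k &= F\,\mathbf{x}_{k-1} + \mathbf{w}_{k-1},
  & \mathbf{w}_{k-1} &\sim \mathcal{N}(\mathbf{0},\, Q), \\
\mathbf{z}_k &= H\,\mathbf{x}_k + \mathbf{v}_k,
  & \mathbf{v}_k &\sim \mathcal{N}(\mathbf{0},\, R), \\[4pt]
\begin{bmatrix} p_k \\ \dot{p}_k \end{bmatrix}
  &= \begin{bmatrix} 1 & T \\ 0 & 1 \end{bmatrix}
     \begin{bmatrix} p_{k-1} \\ \dot{p}_{k-1} \end{bmatrix}
     + \mathbf{w}_{k-1},
  & T &= 0.05\ \text{s}.
\end{aligned}
```

Under this form, an unmodeled acceleration enters only through the process noise $\mathbf{w}$, which matches the idea of treating acceleration as random noise on top of a constant velocity model.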
An implementation flowchart of the proposed RKF algorithm is summarized in Figure 5. As shown, the update step involves recursively calculating the Kalman gain and then calculating the current data frame’s state and the error covariance matrix. Recalculating the Kalman gain and error covariance in every frame makes the estimation more robust and more flexible in practice. Moreover, if no measured data are available, the estimated values are used as the updated values. The algorithm is described as follows.
In the initialization step, the mean values of the states and the error state covariance matrix are set up at the initial time step, where the superscript “+” indicates that an estimate is a posteriori.
In the prediction step, the state and its covariance matrix at the previous time step are projected one step forward to obtain the a priori estimates at the current time step.
In the update step, the actual measurement is compared with the predicted measurement based on the a priori estimate. The difference between the measurement vector and the predicted measurement, weighted through the innovation covariance, is used to obtain an improved a posteriori estimate as in Figure 6.
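The three steps can be sketched for one coordinate as follows. This is a simplified illustration under stated assumptions, not the filter of this paper: it tracks a single coordinate (for example, range) with a two-element state of position and velocity, measures only the position, and uses a diagonal process noise `q` and a scalar measurement noise `r` whose default values are placeholders. The class name `CVKalman1D` is hypothetical. The Kalman gain and error covariance are recomputed in every update, and a missing measurement leaves the prediction in place.

```python
class CVKalman1D:
    """Constant-velocity Kalman filter for one coordinate (illustrative)."""

    def __init__(self, position, velocity=0.0, dt=0.05, q=0.01, r=0.09, p0=1.0):
        # Initialization step: state mean and error covariance.
        self.x = [position, velocity]
        self.P = [[p0, 0.0], [0.0, p0]]
        self.dt, self.q, self.r = dt, q, r

    def predict(self):
        """Prediction step: x = F x, P = F P F^T + Q with F = [[1, T], [0, 1]]."""
        T = self.dt
        (a, b), (c, d) = self.P
        self.x = [self.x[0] + T * self.x[1], self.x[1]]
        self.P = [
            [a + T * (b + c) + T * T * d + self.q, b + T * d],
            [c + T * d, d + self.q],
        ]

    def update(self, z):
        """Update step; z is the measured position or None if data was lost."""
        if z is None:
            return None  # keep the predicted values as the updated values
        (a, b), (c, d) = self.P
        s = a + self.r                 # innovation covariance (scalar here)
        k0, k1 = a / s, c / s          # Kalman gain, recomputed every frame
        y = z - self.x[0]              # innovation: measurement minus prediction
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        # P = (I - K H) P with H = [1, 0]
        self.P = [[(1 - k0) * a, (1 - k0) * b], [c - k1 * a, d - k1 * b]]
        return (k0, k1)


kf = CVKalman1D(position=2.0)
measurements = [2.05, 2.10, None, 2.20, 2.25]  # None marks a lost frame
for z in measurements:
    kf.predict()
    kf.update(z)
    print(round(kf.x[0], 3), round(kf.x[1], 3))
```

Because the model is linear in the tracked coordinate, no Jacobian is needed, which is the property that keeps this kind of filter light.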
2.5.2. Data Association
Since there could be multiple people in the scene at any time, and the Kalman filter can only track a single person at a time, we implement a lightweight data association approach alongside the recursive Kalman filter so that it can work on multiple objects. The global nearest neighbor (GNN) data association algorithm used in our system is a simplified version and is based on the centroid data after the clustering and referencing step. The simplified GNN diagram is shown in Figure 5, and the algorithm description is shown in Algorithm 3.

After GNN is processed, the associated centroids are passed through the update step of the RKF, which turns it into a multi-object tracker. Each track goes through a life cycle of events. At the maintenance step, we decide whether to change a track’s state or to delete a track that is no longer used.
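A centroid-based association step of this kind can be sketched as below. The sketch is a greedy approximation of nearest-neighbor assignment and is not Algorithm 3: it sorts all track-to-centroid distances and accepts the closest pairs first. The gating distance `gate` and the function name `associate` are assumptions introduced for illustration.

```python
import math


def associate(tracks, centroids, gate=1.0):
    """Greedily pair predicted track positions with measured centroids.

    tracks, centroids: lists of (x, y) positions in metres.
    gate: hypothetical maximum distance for a valid pairing.
    Returns (assignment, unmatched_tracks, unmatched_centroids), where
    assignment maps a track index to a centroid index.
    """
    pairs = sorted(
        (math.hypot(t[0] - c[0], t[1] - c[1]), ti, ci)
        for ti, t in enumerate(tracks)
        for ci, c in enumerate(centroids)
    )
    assignment, used_tracks, used_centroids = {}, set(), set()
    for dist, ti, ci in pairs:
        if dist > gate:
            break  # every remaining pair is even farther apart
        if ti in used_tracks or ci in used_centroids:
            continue
        assignment[ti] = ci
        used_tracks.add(ti)
        used_centroids.add(ci)
    unmatched_tracks = [i for i in range(len(tracks)) if i not in used_tracks]
    unmatched_centroids = [i for i in range(len(centroids)) if i not in used_centroids]
    return assignment, unmatched_tracks, unmatched_centroids


tracks = [(1.0, 2.0), (3.0, 4.0), (5.0, 1.0)]
centroids = [(3.1, 4.1), (1.2, 2.1), (0.0, 5.5)]
assignment, lost, new = associate(tracks, centroids)
print(assignment)  # track index -> centroid index
print(lost, new)   # tracks without a measurement, centroids without a track
```

Unmatched tracks would go through the update step with no measurement, and unmatched centroids are candidates for new tracks.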
3. Experiment and Evaluation
3.1. Experiment Setup
To evaluate our algorithms’ performance, we set up experiments at three different data collection sites around the University of Auckland Newmarket campus to model various real indoor scenarios. The mmWave sensor data was captured using the TI IWR1642BOOST. The IWR1642BOOST radar sensor includes an FMCW transceiver, operating at 76 GHz to 81 GHz (4 GHz available bandwidth) with four receive channels and two transmit channels. It outputs a data frame containing the point cloud, with information for each detected point, including range, azimuth angle, and Doppler velocity. Various settings and modes, such as different ranges, can be selected using the chirp configuration parameters. There are settings for short range (10 m), mid range (30 m), and long range (80 m), with longer ranges coming at the expense of a narrower field of view. For indoor detection and tracking, we opted for a range of 6 m to maximize the resolution and field of view.
For each of the data collection sites, the mmWave sensor was mounted on a tripod and elevated to a height of 1.8 to 2 m. The sensor is placed in the environment so that the field of view covers a range of 1 to 6 meters and an azimuth of −60° to 60°, oriented towards the direction from which people would enter the scene. Additionally, an HD camera was mounted on top of the millimeter-wave sensor to record ground truth information. Figure 7 shows the sensor setup at the different data collection sites. We use data from the three sites to evaluate the methods and algorithms described in the previous sections.
3.2. Evaluation of the Clustering and Referencing
To evaluate the two clustering and referencing algorithms, we tested and compared DBmeans against DBmedoids using a large amount of data obtained from one experiment site. Experiments were conducted simulating various indoor activities. Data were recorded simultaneously using the TI sensor and a video camera, which was used to gather ground truth data. The room was selected to make use of the full range of the sensor. A 6 m by 6 m grid was drawn on the floor to contain the experiment within the sensor range, which allowed us to control when people entered and left the site. The walking activities were selected to test the clustering capabilities of the sensor and to model real indoor scenarios.
Figure 8 shows an example frame processed with the DBmeans and DBmedoids algorithms separately. As can be seen, for DBmeans, the centroids are reference locations of each cluster, and for DBmedoids, the centroids are actual reference points of each cluster. Table 1 shows the comparison between DBmeans and DBmedoids in terms of average misclustering rate and processing time (per frame) using the same total number of data sample frames. DBmeans achieves a better average accuracy (84.75%) than DBmedoids (82.70%). Additionally, DBmeans has a much lower processing time than DBmedoids. Hence, we choose DBmeans as the clustering algorithm of our system.
The density-based clustering algorithm we designed for this task can manage variable cluster densities. Moreover, this algorithm handles noise as well as DBSCAN does.
3.3. Evaluation of the Tracking
To evaluate the RKF, we compared the tracking accuracy and the processing time of our method with those of the EKF, which TI used.
For the initialization and optimization of the RKF weighting matrices, we ran through various options and selected the best-performing combination.
For comparison, the EKF is employed to track the same objects. Figure 9 shows the filter results of the RKF and EKF using the experiment data set, and Table 2 shows the comparison of the Root Mean Squared Error (RMSE) and processing time (total frames) between the RKF and EKF.
As can be seen, both the EKF and the RKF can estimate unmeasurable system states and smooth out the process and measurement noise very well. However, in terms of algorithmic complexity and time consumption, the RKF is much more lightweight than TI’s EKF, since the RKF does not need to perform the coordinate system conversion or calculate the Jacobian matrix, both of which add considerable computational load to the system.
3.4. Evaluation of the Merged Process
To evaluate the merged process for scalability and portability, we merged all the algorithms into a tracking system called centroidsTracker (cTracker) to parse and present the raw point cloud data in real time. Proof of concept for real-time application on a portable embedded platform was demonstrated using a Raspberry Pi 4. The challenge is to design the algorithm to be lightweight enough that the processing time is less than each frame’s duration. This was done by minimizing the algorithms’ time complexity and writing the program in Python with efficient libraries, although fewer libraries are available on the Raspberry Pi. Efficient code strategies include using lightweight libraries such as NumPy, which gave us a very short run time.
Figure 10 presents some of the tracked objects alongside the camera ground truth. The black points represent the raw point data returned by the mmWave sensor device at the experimental sites, and the colored circles represent the clustered and tracked people. As can be seen, all movements, including the walking and standing movements of people, were tracked and represented.
Apart from the code successfully executing on the Raspberry Pi 4, the embedded application’s real-time performance also depends on the time complexity of the algorithms. Table 3 shows the average processing time (per frame) for data samples containing between 1 and 5 people. As expected, an increase in the number of people increases the number of points to be processed. More importantly, the run time with five people (a high processing load) is below the 50 ms constraint (the frame period of the mmWave sensor), ensuring that consecutive frames are not missed. Besides, our cTracker can track each person correctly, even when some radar measurement data is lost (see Figures 10(b) and 10(e)).
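The per-frame budget check can be sketched as follows. Only the 50 ms frame period comes from the text; the helper `timed_frame` and the stand-in processing function are hypothetical, and the actual processing pipeline would take the place of `count_points`.

```python
import time

FRAME_PERIOD_S = 0.050  # 50 ms frame period of the sensor, from the text


def timed_frame(process_frame, frame):
    """Run one frame through the pipeline and compare against the budget.

    Returns (result, elapsed_seconds, within_budget).
    """
    start = time.perf_counter()
    result = process_frame(frame)
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed < FRAME_PERIOD_S


def count_points(frame):
    """Stand-in for the real clutter removal, clustering, and tracking."""
    return len(frame)


frame = [(1.0, 2.0), (1.1, 2.1), (3.0, 0.5)]
result, elapsed, within_budget = timed_frame(count_points, frame)
print(result, f"{elapsed * 1e3:.3f} ms", within_budget)
```

Averaging `elapsed` over many frames for each number of people gives the kind of per-frame figure that Table 3 reports.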
An obvious benefit of the Kalman filter implementation is that, when no measurement data is received, the filter keeps predicting the missing people until they reappear. If a person’s data disappears for a long period, the Kalman filter slowly moves that person along at the velocity predicted by the constant velocity model. The prediction eventually estimates the person as having left the room.
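This coasting behaviour, together with the deletion of stale tracks, can be sketched as below. The sketch assumes a simple rule in which a track is dropped after a fixed number of consecutive frames without a measurement; the class `Track`, the `max_missed` parameter, and the direct overwrite of the position (standing in for the Kalman update) are all hypothetical simplifications.

```python
class Track:
    """Minimal track that coasts on a constant-velocity prediction."""

    def __init__(self, position, velocity, max_missed=20):
        self.position = position      # (x, y) in metres
        self.velocity = velocity      # (vx, vy) in metres per second
        self.max_missed = max_missed  # hypothetical deletion threshold
        self.missed = 0               # consecutive frames without data

    def step(self, measurement, dt=0.05):
        """Advance one frame; return True while the track should be kept."""
        x, y = self.position
        vx, vy = self.velocity
        # Prediction with the constant velocity model.
        self.position = (x + vx * dt, y + vy * dt)
        if measurement is None:
            self.missed += 1          # coast on the prediction
        else:
            self.position = measurement  # stand-in for the Kalman update
            self.missed = 0
        return self.missed <= self.max_missed


track = Track(position=(0.0, 0.0), velocity=(1.0, 0.0), max_missed=3)
for frame_measurement in [None, None, None, None]:
    alive = track.step(frame_measurement)
    print(track.position, alive)
```

A stricter rule could also delete a track once its predicted position leaves the monitored area, which corresponds to the person being estimated as having left the room.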
3.5. Comparison with TI System
Figure 11 compares the average misclustering rate of our system with that of the TI system for different numbers of people (12917 frames in total). As can be seen, our system’s misclustering rate is much lower than TI’s across the data sets with 1 to 5 people. However, it also shows that the misclustering rate of both systems increases as the number of people increases. As the number of people increases, a higher proportion of objects begin occluding each other, leading to a rise in errors. Additionally, missing data from the sensor is another significant reason for the increasing misclustering rate in both systems.
For the tracking accuracy comparison, three data sets were collected from a person walking at locations with known positions. We then ran those data sets through both our system and the TI system and calculated the RMSE in the x and y directions. The location coordinates relative to the sensor are shown in Table 4. Table 5 shows that our system’s average position error was 0.2992 meters at location 1, 0.3271 meters at location 2, and 0.3171 meters at location 3. In comparison, the TI system’s average position error was 0.3283 meters, 0.3116 meters, and 0.3343 meters, respectively. The TI system was more accurate only at location 2.
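The RMSE in the two directions can be computed as in the following sketch. The function name `position_rmse` and the sample values are hypothetical and only illustrate the arithmetic; they are not data from the experiments.

```python
import math


def position_rmse(estimates, truth):
    """RMSE of tracked positions against one known location, per axis.

    estimates: list of (x, y) tracker outputs in metres.
    truth: (x, y) of the known location in metres.
    Returns (rmse_x, rmse_y).
    """
    n = len(estimates)
    rmse_x = math.sqrt(sum((e[0] - truth[0]) ** 2 for e in estimates) / n)
    rmse_y = math.sqrt(sum((e[1] - truth[1]) ** 2 for e in estimates) / n)
    return rmse_x, rmse_y


# Hypothetical tracker outputs around a known location at (2.0, 4.0).
estimates = [(1.0, 4.0), (3.0, 4.0)]
print(position_rmse(estimates, (2.0, 4.0)))
```

Here both estimates are 1 m off in x and exact in y, so the result is an RMSE of 1 m in x and 0 m in y.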
4. Conclusion
In this paper, an indoor people detection and tracking system is designed based on the proposed data processing algorithms. Our methodology proceeds in the order of static clutter removal, clustering, and referencing to identify the centroids, and then tracks the centroids using a recursive Kalman filter (RKF). The experiments are set up at three different data collection sites modelling various indoor scenarios. Compared with the TI system, our system can detect and track each object more accurately. The processing pipeline cycle is under 50 ms (per frame), so the system can work in real time on an embedded platform such as a Raspberry Pi. Our future work consists of data fusion from multiple mmWave radar sensors to increase the system’s useful field of view and accuracy. Moreover, we will also use deep learning approaches for tracking and classifying various kinds of objects.
Data Availability
The data used to support the findings of this study can be freely accessed at https://github.com/hasc/OccupancyDetection/tree/master/Data. The mmWave radar sensor product is obtained from https://www.ti.com/tool/IWR1642BOOST#technicaldocuments.
Conflicts of Interest
The authors declare no conflict of interest.
Acknowledgments
The authors wish to acknowledge the technical support of the University of Shandong Ying Cai (Jinan, China). This research was funded by the Key Research and Development Project of Shandong Province, China, Grant number 2015GGX101048.