Abstract

We present a data fusion-based methodology for supporting sports training. Training sessions are planned by the coach on the basis of the data obtained and analyzed during each session. The data are usually acquired from various sensors attached to the athlete (e.g., accelerometers or gyroscopes). Data fusion is one of the techniques dedicated to processing data originating from different sources; in sports training, it provides new procedures to acquire, process, and analyze the training-related data. To verify the effectiveness of the data fusion methodology, we design a system to analyze the training sessions of a tennis player. The main functionalities of the system are the detection and classification of tennis strokes based on data gathered from a wrist-worn sensor. The detection and classification of tennis strokes can reduce the time a coach spends analyzing the trainees’ data, and recreational players may also use these functionalities for self-learning. In the proposed approach, we use Mel-Frequency Cepstral Coefficients, determined from the accelerometer data, to build the feature vector. The data are gathered from amateur and professional athletes. We tested the quality of the designed feature vector for two different classification methods, k-Nearest Neighbors and Logistic Regression, and evaluated the classifiers by applying two tests: the 10-fold cross-validation and leave-one-out techniques. Our results demonstrate that the data fusion-based approach can be used effectively to analyze an athlete’s activities during training.

1. Introduction

The need for sports achievements motivates research laboratories and sports clubs to search for more efficient training methods. New ways to enhance training methods have emerged together with the development of mobile technologies. Over the last decade, the results of research on applying mobile technologies to sports training have been encouraging. Sensors such as accelerometers, gyroscopes, or GPS receivers are the most helpful parts of various applications used by coaches and athletes. These sensors enable monitoring, for example, the type, duration, and intensity of athletes’ sports activities, which helps both trainers and trainees improve the effectiveness of sports training. On one hand, the trainers can send helpful and timely feedback to the athlete, reinforcing the link between research and coaching practice. On the other hand, monitoring the athletes’ sports activities supports better training performance of the athletes.

Sports training is a complex task that requires knowledge from fields such as anatomy, biomechanics, physiology, psychology, and didactics [1]. One of the purposes of sports training is to achieve maximal performance from an athlete or a team. A trainer and well-prepared training sessions should help the athlete or the team to reach maximal performance repeatedly before each competition. Well-prepared training sessions should also improve injury prevention or even help in the rehabilitation of an injured trainee [2].

There are four types of sports training [3]: physical, technical, tactical, and psychological. Physical training aims to improve the motor abilities of an athlete. Regarding the physical variables to be monitored, we should mention acceleration, endurance, speed, force, flexibility, and fatigue index [1, 4]. Technical training aims at acquiring and mastering sports skills. Among the technical variables are the proper execution of movements, repetition of sequential movements, correct posture during movement execution, and starting time [4, 5]. Tactical training is related to the study of different strategies in the sports discipline. Psychological training is oriented towards improving the athlete’s personality.

Typically, sports training is composed of planning, monitoring, and analysis. The planning of the training sessions takes into account the specific needs and conditions of the athlete, that is, information on the athlete’s sports performance, shape, mental preparation, skills, predispositions, abilities, and limitations. Obviously, planning the training session is a complex task, since it needs to take into account the long-term objective and depends on various external factors, such as temperature or nutrition.

The monitoring of sports training is the process of data collection during the sports session. Recent advances in wearable technologies allow the athlete to be monitored continuously and discreetly, without obstructing comfort during training sessions. Modern wearable sensors can capture motion-related parameters, useful for supporting athletes’ sports activities. The standard devices for the athletes’ motion capture are inertial sensors, that is, accelerometers and gyroscopes, as well as magnetometers [4, 6]. These sensors are gaining more and more popularity, and various applications in sports training rely on them [7–9]. Another example of sensors applied to support the athletes’ sports sessions is video cameras. Unfortunately, the mobility of systems composed of video cameras is limited. Moreover, the computational costs of analyzing the captured data are very high.

Once the data are collected, the next phase of the sports training is the data analysis. The data analysis is a crucial step in sports training and must be performed efficiently, because a massive amount of data produced by various sensors can be gathered during each training session. The problem is the efficient processing of this vast volume of data acquired from different sources at different levels of complexity. It is clear that up-to-date information on the physiological (i.e., sports performance and shape) and technical (i.e., skills) preparation of the athlete has a significant influence on the further training process and its outcomes. Thus, there is a need for developing methods to fuse such data.

Some authors [10] suggest that one of the techniques designed to process a large volume of data originating from different sources is data fusion. The data fusion technique combines the data acquired from multiple sources (a) to improve the accuracy and robustness of the outcomes, (b) to create meaningfully new information that cannot be obtained from the sources separately, and (c) to provide a complete picture of the investigated object or process. By enabling more complex analysis, data fusion helps to improve the decision-making process. We present how trainers and trainees may benefit from the fusion of data originating from different sources during training sessions.

This paper aims to demonstrate data fusion methods in the context of sports training. We present a self-developed system for the detection and classification of tennis strokes (serve, backhand, and forehand) based on accelerometer data gathered from a wrist-worn sensor. The proposed approach can reduce the time of data analysis and help coaches quickly retrieve the critical elements of an athlete’s training session. It also provides tennis players with insights for improving their stroke technique and overall performance. The proposed system can also be used by recreational players for self-learning. Applying a wrist-worn sensor eliminates the need for an expensive sensing setup.

The main contribution of the work is an algorithm for tennis stroke detection and recognition based on Mel-Frequency Cepstral Coefficients as the feature generator. We test the performance of two classifiers: k-Nearest Neighbors and Logistic Regression. To investigate the proposed approach, we also collected 1794 samples of tennis strokes from 15 amateur players and 621 samples from 8 professional players.

The paper is organized as follows. In the first part, we present an overview of data fusion techniques and algorithms together with example applications in sports training. In the second part, to provide a broader view, we present and discuss original research results obtained from applying data fusion to support the sports training of a tennis player. Finally, the results are discussed, and conclusions are given.

2. Related Work

The monitoring of sports training can be considered as a part of human activity recognition (HAR). There are two main approaches to solving HAR problems, that is, applying external or wearable sensors [11, 12]. The external sensor-based methods can be used only in predefined locations, whereas in the wearable sensor-based approaches, the devices are attached to the users’ bodies.

Camera-based systems (e.g., BTS Bioengineering, OptiTrack, Vicon) are typical examples of motion-capture systems composed of external sensors. Unfortunately, most of such systems are expensive, and they can be employed only in a laboratory setting. The alternative to camera-based tracking systems, that is, wearable sensor-based systems, has become increasingly popular. The application of wearable sensors in human motion analysis aids in overcoming the main drawback of camera-based tracking systems, that is, the lack of mobility.

A basic example of a system capable of processing motion data is composed of an accelerometer or a gyroscope. There are numerous commercial products supporting the tracking of athletes’ physical activities. For example, Adidas and Nike provide devices for running activities [13, 14]. The accelerometer can also be used in the technical training of a tennis player [9, 15–17]. Ahmadi et al. introduce an IMU sensor-based approach to skill assessment and acquisition. The results show that it is possible to apply the accelerometer data to assessing the skill level of a tennis player. However, the authors point out that this approach has some limitations related to the measurement range and the sampling frequency of the accelerometer.

Another example of a one-sensor system is presented in [7]. The authors propose an approach to recognize swimming strokes and to count their number. In the study, two different sensor placements were tested: one sensor was attached to the wrist and the second one to the upper back of the subject. Data acquired from the accelerometer are fused by a QDA (Quadratic Discriminant Analysis) classifier. The classification accuracy is reported separately for breast-stroke, freestyle, back-stroke, and turn for the data gathered from the upper back-worn sensor; the classification accuracy for the wrist-worn sensor is significantly lower.

Tracking sports activities can also be based on two, three, or more sensors. There are some commercial devices dedicated to supporting athletes’ training. For example, for tennis players, the wrist-worn wearable devices by Indiegogo [18] and Babolat [19] are offered. The first offering, Pivot, is a wireless motion-capture system designed for tennis players. The system, composed of triaxial accelerometers, triaxial gyroscopes, and triaxial magnetometers, can track footwork, body position, elbow bend, and knee bend. In turn, Babolat delivers a smart wristband to track the training of a tennis player; it is mainly designed to track tennis strokes.

There are also some papers that study the applications of data fusion to support sports training [20–23]. For example, Connaghan et al. [21] consider the problem of monitoring the technical training of tennis players. In the proposed system, the data are acquired from a single wearable IMU (Inertial Measurement Unit) sensor composed of a triaxial accelerometer, a triaxial gyroscope, and a triaxial magnetometer. The sensing device was attached to the player’s forearm. In order to fuse the measurement data, the authors used the Naive-Bayes classifier, trained on data gathered from seven players. The accuracy of detection and classification depends on the source of data: fusing data from the accelerometer and the gyroscope improves the overall classification accuracy, as does fusing data from the accelerometer and the magnetometer or from the gyroscope and the magnetometer, and fusing the data acquired from the accelerometer, the gyroscope, and the magnetometer together also improves the overall performance.

In paper [24], another example of a fusion-based approach supporting sports training is considered. The authors propose an approach to golf swing classification relying on measurement data from a triaxial accelerometer and a triaxial gyroscope. The gathered data are fused by a deep convolutional neural network (deep CNN). The results are compared with a Support Vector Machine (SVM) classifier. The reported overall accuracy is 95.04% for the deep CNN and 86.79% for the SVM.

As we presented, the measurements acquired from accelerometers, gyroscopes, and magnetometers are sources of essential data in the monitoring of athletes’ activities. The fusion of these data with a video signal can improve the overall performance of a motion-tracking system; with this functionality, the trainer can provide corrections and guidance to the trainee. It is worth stressing, however, that adding a video camera to the tracking system limits its mobility. For example, the work in [25] introduces a data fusion-based approach to combine data acquired from an IMU sensor and a video camera. To test the performance of the proposed algorithm, the authors used the system to detect and classify tennis strokes. The system was composed of video cameras arranged in a laboratory setting and an IMU sensor attached to the player’s forearm. The features used to detect and classify the tennis strokes are generated from the video images and the IMU sensor data. The authors tested two classifiers, k-Nearest Neighbors (k-NN) and Support Vector Machine (SVM). The dataset used to train the classifiers consists only of samples gathered from professional players. The results show that it is possible to obtain better detection and classification performance for data originating from multiple sources.

As presented in paper [26], inertial sensors can also be fused with other measurement units, for example, a force sensor. The authors applied the gathered data to estimate the rider’s trunk pose with the use of an Extended Kalman Filter. The performance of the proposed approach is demonstrated through indoor and outdoor riding experiments. In the study, five healthy and experienced bicycle riders (four males and one female) performed both the indoor and the outdoor tests. The results showed that the proposed algorithm outperforms methods estimating the rider’s trunk pose without the force sensor. Moreover, the proposed algorithm gives an estimation accuracy comparable with other fusion methods known from the literature.

3. Data Fusion

3.1. Motivation for Data Fusion

The data acquired through wearable sensor-based systems for sports training usually originate from different sources. Thus, building a complete and coherent picture of the athlete based on the gathered data is a challenging problem. Among the existing approaches to processing multisourced data, we propose a framework based on data fusion. Data fusion offers various methods, techniques, and architectures for building the picture of the object of interest [27].

A system relying on multiple sensors, in which measurements from each sensor are processed separately, suffers from several limitations and issues [28, 29]:
(i) Sensor deprivation: the lost ability to monitor the desired features
(ii) Limited spatial coverage: each sensor has a limited area of operation
(iii) Imprecision: the precision of the sensors’ measurements is limited
(iv) Uncertainty: it appears when a sensor cannot measure all desired features

Data fusion methods in multisensory systems introduce several advantages, such as reduction of uncertainty, improvement of measurement precision, enhancement of the signal-to-noise ratio, and compensation for missing features. A significant advantage of fusion methods is their ability to integrate independent features and prior knowledge. This is essential when noncommensurate data must be combined. Noncommensurate data originate from heterogeneous sensors [30]; for example, the data from the accelerometer and the video camera are noncommensurate.

3.2. Methods for Data Fusion

Levels of abstraction are one of the main concepts of data fusion [10]; they define the stages at which the fusion can take place. In paper [10], the following three levels of abstraction are specified: the signal level, the feature level, and the decision level.

3.2.1. Signal Fusion Algorithms

Fusion at the signal level is usually used to combine raw signals acquired from different sensors. The signals at this level are commensurate, that is, acquired from sensors directly measuring the same property. For example, the signals from the accelerometer and the gyroscope are commensurate, since both directly measure the same property, that is, the kinematic parameters of a moving object.

A typical issue at the first level of fusion is state estimation. A popular method to address this issue is the Kalman Filter (KF). The Kalman Filter is a statistical approach to fusing commensurate signals that relies on recursively made predictions and updates. One of the most common examples of the Kalman Filter application is the fusion of signals acquired from the accelerometer and the gyroscope to estimate the attitude of an object. The modifications of the KF are the Extended Kalman Filter (EKF) and the Unscented Kalman Filter (UKF); these extensions are usually applied to nonlinear problems. Another approach applied at the first level of abstraction is particle filtering (PF). The PF algorithm, based on Sequential Monte Carlo techniques, estimates the parameters of the conditional probability density function.
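To make the signal-level fusion concrete, the sketch below (Python with NumPy, not taken from any of the cited works) fuses a gyroscope rate with an accelerometer-derived tilt angle using a linear Kalman Filter; the two-element state (angle and gyroscope bias), the noise parameters, and the function name are illustrative assumptions.

```python
import numpy as np

def kalman_attitude(gyro_rate, accel_angle, dt, q_angle=1e-3, q_bias=3e-3, r_angle=0.03):
    """Minimal 1-D Kalman filter fusing a gyroscope rate (rad/s) with an
    accelerometer-derived angle (rad); state = [angle, gyroscope bias]."""
    x = np.zeros(2)                            # state estimate
    P = np.eye(2)                              # state covariance
    F = np.array([[1.0, -dt], [0.0, 1.0]])     # state transition
    B = np.array([dt, 0.0])                    # control input (gyro rate)
    H = np.array([[1.0, 0.0]])                 # we observe the angle only
    Q = np.diag([q_angle, q_bias]) * dt        # process noise
    R = np.array([[r_angle]])                  # measurement noise
    estimates = []
    for omega, z in zip(gyro_rate, accel_angle):
        # Predict: propagate the angle with the gyro rate; the bias stays constant.
        x = F @ x + B * omega
        P = F @ P @ F.T + Q
        # Update: correct the prediction with the accelerometer-derived angle.
        y = z - H @ x                          # innovation
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
        x = x + (K @ y).ravel()
        P = (np.eye(2) - K @ H) @ P
        estimates.append(x[0])
    return np.array(estimates)
```

In a practical system, the noise covariances would be tuned to the specific sensors, and the EKF or UKF would replace this linear model when the motion is strongly nonlinear.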

The fusion methods at this level allow obtaining a description of the object that is not possible with any of the existing sensors alone. In Figure 1, we show the model of data fusion at the signal level.

3.2.2. Feature Fusion Algorithms

Fusion at the second level is used when the sensors provide noncommensurate data. At this level, we start with determining the feature vector; then we perform the feature fusion. The feature vector is a high-level representation of the object. To construct the feature vector, we have to generate the feature set and select the most relevant features from the original ones. The suitable features are determined by applying feature selection or feature extraction methods.

The generated features can be grouped as (a) time domain, (b) frequency domain, and (c) time-frequency domain. In the first group, we have the features characterizing the signal (e.g., maximum or minimum of amplitude, zero crossing rate, rise time, etc.), its statistics (e.g., mean, standard deviation, cross-correlation, peak-to-valley, energy, kurtosis, entropy, skew, etc.), and the fractal features. The second group mainly consists of spectral features (e.g., spectral peaks, roll-off, centroid, flux, and energy), Fourier coefficients, power spectral density, and the energy of the signal. For the last group, we have wavelet representation (e.g., Gabor wavelet features), Wigner-Ville distribution-based analysis, and Mel-Frequency Cepstral Coefficients (MFCC) [33].
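As an illustration of this grouping, the sketch below (Python with NumPy and SciPy) computes a few time-domain and frequency-domain features for a single acceleration window; the particular selection of features, the window length, and the sampling frequency are our own example choices, not a prescribed set.

```python
import numpy as np
from scipy.stats import kurtosis, skew

def basic_features(window, fs=100.0):
    """Example time- and frequency-domain features for one 1-D acceleration window."""
    feats = {
        # time-domain signal features
        "max": np.max(window),
        "min": np.min(window),
        "zero_crossing_rate": np.mean(np.diff(np.sign(window)) != 0),
        # time-domain statistical features
        "mean": np.mean(window),
        "std": np.std(window),
        "energy": np.sum(window ** 2),
        "kurtosis": kurtosis(window),
        "skew": skew(window),
    }
    # frequency-domain features from the magnitude spectrum
    spectrum = np.abs(np.fft.rfft(window))
    freqs = np.fft.rfftfreq(len(window), d=1.0 / fs)
    power = spectrum ** 2
    feats["spectral_energy"] = np.sum(power)
    feats["spectral_centroid"] = np.sum(freqs * power) / np.sum(power)
    feats["dominant_frequency"] = freqs[np.argmax(power)]
    return feats
```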

The number of generated features might be large (e.g., the Fourier coefficients). Thus, it might be necessary to reduce it. There are two approaches to reaching the proper number of features: feature selection and feature extraction. Feature selection is applied to select the most appropriate elements from the original set, whereas feature extraction methods reduce the dimensionality of the initial feature vector. In the context of multisensory systems, feature-level data fusion is vital due to communication bandwidth and energy limitations.

Afterwards, having prepared the feature vector, the next step is the feature fusion. In general, we can divide feature fusion methods into two categories: nonparametric and parametric. The classical nonparametric methods are k-NN, SVM, Logistic Regression (LR), and Artificial Neural Networks (ANN).

The main parametric algorithms are Gaussian Mixture Models (GMM) and the k-Means method. In GMM, we perform the feature fusion based on the value of the likelihood function. To train Gaussian Mixture Models, the Expectation-Maximization (EM) method is usually used. The k-Means method is a distance-based approach to unsupervised classification of the observations.
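The sketch below (Python with scikit-learn, on random placeholder data standing in for stroke feature vectors) shows how a GMM could be fitted with the EM algorithm and queried for likelihoods and component responsibilities; the number of components and the feature dimensionality are assumptions made only for this illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.random((200, 10))          # placeholder: one feature vector per stroke

# Fit a three-component GMM with the EM algorithm.
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
gmm.fit(X)

# Fuse a new feature vector by evaluating its likelihood and responsibilities.
x_new = rng.random((1, 10))
print(gmm.score_samples(x_new))    # log-likelihood of the new sample
print(gmm.predict_proba(x_new))    # posterior probability per component
```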

In Figure 2, we present the basic model of feature-level fusion.

3.2.3. Decision Fusion Algorithms

The decision-level fusion is performed at the highest level of abstraction. It is the process of selecting one hypothesis from the set of hypotheses generated at the lower levels. Operations at this level have two main advantages: improving the decision accuracy and saving bandwidth for systems working in network settings [34]. To enhance the quality of the decision, in some cases, it is possible to apply a unique mechanism for incorporating domain-specific knowledge and information [33]. The typical decision-level fusion methods include Bayesian inference, fuzzy logic, and Dempster-Shafer theory.

We present the model of decision fusion in Figure 3.

3.3. General Architecture for Data Fusion-Based Systems

In this section, we introduce the general architecture of a data fusion-based multisensory system (Figure 4). In the proposed architecture, we can distinguish the following tiers:
(i) Data acquisition tier
(ii) Data fusion tier
(iii) Data presentation tier

The first tier is composed of the sensors (e.g., wearable sensors) attached to the human body or near the human body. Then, the sensor data are transferred (e.g., wirelessly) to the computational unit (e.g., smartphone, personal computer, or cloud).

The primary element of the proposed architecture is the second tier. All of the data fusion related computations take place herein. At this tier, the transferred data are processed at the different levels of abstraction using the previously discussed methods. The applied data fusion-based approach allows designing systems well suited to the user’s needs.

The last tier is designed for data presentation, that is, visualizing (e.g., charts and plots) and reporting (e.g., tables) results. In general, the tier allows interaction between the user and the system.

The general architecture of a data fusion-based system can be applied to problems related to industry, environmental monitoring, medicine, or sports [35]. One of the well-known problems in industrial communities is condition-based maintenance. In paper [36], the authors consider the problem of engine fault diagnosis. The approach is also applied in robotics, for example, in navigation [37], localization and mapping [38], and object recognition [39]. Data fusion can also be used in environmental monitoring. For example, in [40], environmental data are fused to detect volcanic earthquakes, and in [41], gathered data are used to monitor weather conditions. In medicine, data fusion is mainly used in the analysis of medical images for locating and identifying tumors, abnormalities, and diseases. In paper [42], data fusion is used for brain diagnosis; in [43], a data fusion-based algorithm is applied to breast cancer diagnosis; and in [44], an approach to recognizing anatomical brain objects is proposed.

In Figure 5, we present data fusion in the context of sports training. At the first stage, the sports training is planned by taking into account the current sports performance, shape, mental preparation, skills, predispositions, abilities, and limitations of the athlete. Then, the training session is monitored, and data are gathered. The gathered data are analyzed at the third stage by applying data fusion methods. The first level of data fusion is used to obtain the athlete’s parameters that cannot be measured directly. One example is the estimation of the trainee’s body orientation on the basis of data from the accelerometer and the gyroscope [26]. Another example is the estimation of the walking speed of the athlete [45, 46]: the walking speed can be estimated by fusing data from the accelerometer and the gyroscope and applying numerical integration, as sketched below. The second-level data fusion can be used to support sports training in the case of noncommensurate data. At this level, it is possible to detect and classify movements specific to a sports discipline, for example, strokes in tennis or swimming or the swing motion in golf. The last level of data fusion is the least explored in sports training. However, in paper [47], the authors showed that applying data fusion at the third level can improve the overall accuracy of the classification.
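As a rough illustration of the first-level walking-speed example mentioned above, the snippet below (Python with NumPy) integrates a gravity-compensated forward acceleration into a speed estimate; drift correction, axis alignment, and zero-velocity updates, which any practical method would require, are deliberately omitted.

```python
import numpy as np

def estimate_speed(forward_accel, dt):
    """Integrate gravity-compensated forward acceleration (m/s^2), sampled every
    dt seconds, into a crude speed estimate (m/s)."""
    return np.cumsum(forward_accel) * dt   # rectangular-rule numerical integration

# Example: a constant 0.5 m/s^2 for 2 s sampled at 100 Hz yields about 1 m/s.
speed = estimate_speed(np.full(200, 0.5), dt=0.01)
print(speed[-1])
```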

Based on the data fusion-based analysis of measurements, the adaptation of the sports training is performed (Figure 5). The adaptation of the sports training can be completed automatically or semiautomatically. In the second case, the coach is supporting the adjustment of the sports training. It can be seen that the fusion-based training process has an iterative nature.

4. Materials and Methods

4.1. Data Collection and Experimental Setup

In this paper, we apply data fusion methods to monitor the sports training. The proposed approach is validated experimentally on data collected during the sessions of tennis training.

To test the proposed approach, we develop a system to acquire and process the measurement data. The system is connected with the wrist-worn Pebble Watch motion sensor (Figure 6). The Pebble Watch can measure acceleration (with a three-axis accelerometer) and the strength of the magnetic field (with a three-axis magnetometer). The accelerometer has a bounded measurement range and a sampling frequency of 10 Hz, 25 Hz, 50 Hz, or 100 Hz.

Accelerometers measure the change of velocity over time in three-dimensional space. The measured acceleration signals are composed of gravitational and body motion components. Data gathered from accelerometers attached to the human body can be used to recognize various human activities. The type of activity recognized from the measurements depends on the sensor placement. Typical places of sensor attachment are the upper limbs (the arm and the forearm), the lower limbs (e.g., the ankle and the thigh), and the lumbar region [4].

The self-developed system has two main parts: the data acquisition subsystem and the analysis and inference subsystem. The data acquisition subsystem consists of two components: the first one transfers data from the Pebble Watch sensor, and the second one stores data on the mobile device (Figure 7). The system helps in recording data from each participant, profiling and managing the participants, and browsing the history of stored samples.

The tennis players’ activity was measured on an indoor tennis court. We acquired measurement data from each participant after a warm-up; the players were instructed to perform several serve, forehand, and backhand strokes. Each participant used the wrist-worn sensor, the Pebble Watch, to measure acceleration along three axes. The data are transferred wirelessly to the system (Figure 7).

Participants were divided into two groups: amateurs and professionals. The first group consisted of 12 male and 3 female amateur tennis players. The second group consisted of 6 male and 2 female professional tennis players. The participants were asked to perform the following tennis strokes: forehand, backhand, and serve. In Tables 1 and 2, we summarize the numbers of samples of each tennis stroke acquired from the amateur and the professional players.

The samples of the acceleration measurements for the selected tennis strokes are shown in Figure 8. In Figure 9, we illustrate the sequence of movements for serve stroke.

4.2. Methods

In this section, we present a data fusion-based approach for monitoring the training of a tennis player. The proposed algorithm detects and classifies the tennis strokes based on the data gathered during training sessions. It follows the general architecture of the system introduced in Figure 4. In Figure 10, we specify the main steps of the proposed approach: preprocessing, feature generation and extraction, and classification. In the preprocessing step, we remove unwanted high-frequency noise components; that is, the raw signals captured from the accelerometer are passed through a low-pass filter. In the presented approach, we applied the Simple Moving Average (SMA) algorithm, a fast and simple way to remove unwanted high-frequency components from the measurement signals.
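A minimal sketch of such a moving-average low-pass filter is shown below (Python with NumPy); the window length of 5 samples is an arbitrary illustrative value, not the setting used in our experiments.

```python
import numpy as np

def simple_moving_average(signal, window=5):
    """Smooth a 1-D acceleration signal with a simple moving average filter."""
    kernel = np.ones(window) / window
    # mode="same" keeps the filtered output aligned with the input length
    return np.convolve(signal, kernel, mode="same")
```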

When the preprocessing is completed, the next step in our approach is the feature extraction. We propose a method for feature extraction based on the Discrete Fourier Transform (DFT) and Mel-Frequency Cepstral Coefficients, since the designed system should avoid heavy computations. Originally, MFCC were applied in speech recognition [48], but the method has also been successfully applied in image recognition [49], emotion recognition [50], and eye movement identification [51]. The MFCC-based features have several advantages: the ability to represent the signal in a compact form, a low cost of determination, and a high classification accuracy for basic classifiers [51].

4.2.1. Mel-Frequency Cepstral Coefficients

This section presents the MFCC-based feature extraction algorithm for the acceleration signal. We denote the acceleration measurements as $a(n)$, $n = 1, 2, \ldots, N$, where $N$ corresponds to the number of samples.

The first phase of the MFCC algorithm is usually preemphasis, which compensates for the rapidly decaying spectrum of speech. Because the frequency content of the acceleration signal is lower than that of a speech signal, this step is not necessary in our approach [52].

In the next step, we transform the acceleration signal from the time domain to the frequency domain by applying the Discrete Fourier Transform:
$$X(k) = \sum_{n=0}^{K-1} a(n)\, w(n)\, e^{-j 2\pi k n / K}, \quad k = 0, 1, \ldots, K-1,$$
where $K$ is the size of the frame and $w(n)$ is the Hamming function calculated from [53]:
$$w(n) = 0.54 - 0.46 \cos\!\left(\frac{2\pi n}{K-1}\right), \quad n = 0, 1, \ldots, K-1.$$

Then, we can determine the energy spectrum from the following formula:
$$E(m) = \sum_{k=0}^{K-1} |X(k)|^2 H_m(k), \quad m = 1, 2, \ldots, M,$$
where $M$ is the number of filters and $H_m(k)$ is the triangular filter bank:
$$H_m(k) = \begin{cases} 0, & k < f(m-1), \\ \dfrac{k - f(m-1)}{f(m) - f(m-1)}, & f(m-1) \le k \le f(m), \\ \dfrac{f(m+1) - k}{f(m+1) - f(m)}, & f(m) < k \le f(m+1), \\ 0, & k > f(m+1), \end{cases}$$
where $f(m)$ are the boundary points of the filters.

Then, we determine the mapping between the real frequency scale (in Hz) and the perceived frequency scale (in Mels) by
$$f_{\mathrm{Mel}} = 2595 \log_{10}\!\left(1 + \frac{f_{\mathrm{Hz}}}{700}\right).$$

Finally, we apply a Discrete Cosine Transform (DCT) to the natural logarithm of the Mel spectrum, obtaining the Mel-Frequency Cepstral Coefficients:
$$c(i) = \sum_{m=1}^{M} \ln\!\big(E(m)\big) \cos\!\left(\frac{\pi i (m - 0.5)}{M}\right), \quad i = 1, 2, \ldots, L,$$
where $L$ is the number of cepstral coefficients.

Flow diagram for the MFCC-based feature extraction is shown in Figure 11.
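For illustration, the sketch below (Python with NumPy and SciPy) follows the steps above for a single acceleration frame; the sampling frequency, frame length, number of filters, and number of coefficients are placeholder values rather than the exact configuration of our system.

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, fs=100.0, n_filters=16, n_coeffs=12):
    """MFCC-style coefficients for one acceleration frame (see Section 4.2.1)."""
    K = len(frame)
    windowed = frame * np.hamming(K)                 # Hamming window
    power = np.abs(np.fft.rfft(windowed)) ** 2       # power spectrum
    # Triangular Mel filter bank between 0 Hz and the Nyquist frequency.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((K + 1) * mel_to_hz(mel_points) / fs).astype(int)
    fbank = np.zeros((n_filters, len(power)))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    energies = np.maximum(fbank @ power, np.finfo(float).eps)   # avoid log(0)
    return dct(np.log(energies), norm="ortho")[:n_coeffs]       # DCT of log Mel energies
```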

4.2.2. Feature Extraction

As the dimension of the MFCC-related features generated from the acceleration data becomes very high, we applied Principal Component Analysis (PCA) to reduce it (Figure 10). PCA is a linear technique that projects data onto an orthogonal lower-dimensional space so that the variance of the projected data is maximized [54]. Thus, we adopt the PCA algorithm to reduce the dimensionality of the high-dimensional feature vectors.

Suppose that we have a dataset $X = \{x_1, x_2, \ldots, x_N\}$, where $x_i \in \mathbb{R}^{D}$, $D$ is the dimension of a data point, and $N$ is the length of the measurement sequence. We determine the covariance matrix from
$$C = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^{T},$$
where
$$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i.$$

Since $C$ is real and symmetric, we can diagonalize the matrix:
$$C = U \Lambda U^{T},$$
where $U$ is a real and unitary matrix. The diagonal elements of $\Lambda$ are the eigenvalues of $C$, and the columns of $U$ are the eigenvectors of $C$. We sort the eigenvalues in descending order, $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_D$, with $u_1, u_2, \ldots, u_D$ being the corresponding eigenvectors. A data point $x_i$ is projected onto the eigenvector $u_j$ by
$$y_{ij} = u_j^{T}(x_i - \bar{x}).$$

The eigenvector $u_1$ corresponding to the largest eigenvalue is called the principal component. This vector provides the best direction of data projection. Similarly, the set of eigenvectors $u_1, u_2, \ldots, u_d$ ($d < D$) is used to transform the $D$-dimensional space into the $d$-dimensional space. More details on the PCA method can be found, for example, in [54].
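A minimal sketch of this projection is given below (Python with NumPy); the number of retained components d is left as a parameter, and the function name is our own.

```python
import numpy as np

def pca_reduce(X, d):
    """Project the rows of X (one MFCC feature vector per row) onto the d leading
    principal components, following the procedure above."""
    mean = X.mean(axis=0)
    centered = X - mean
    cov = np.cov(centered, rowvar=False)     # covariance matrix C
    eigvals, eigvecs = np.linalg.eigh(cov)   # C is real and symmetric
    order = np.argsort(eigvals)[::-1]        # sort eigenvalues in descending order
    components = eigvecs[:, order[:d]]       # d leading eigenvectors
    return centered @ components             # projected data
```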

4.2.3. Classification

In the presented approach, we applied two different classifiers to fuse the features: the k-Nearest Neighbors and Logistic Regression algorithms. In our studies, we decided to compare k-NN with LR, since the first method does not need a training process, while the second one requires training. Moreover, both methods have relatively low computational costs [54–57].

k-NN is a nonparametric, distance-based supervised classifier. This method is efficient and easy to implement. The classifier is based on the closest training examples in the feature space [54] and has the form
$$p(y = c \mid x, \mathcal{D}, k) = \frac{1}{k} \sum_{i \in N_k(x, \mathcal{D})} \mathbb{I}(y_i = c),$$
where $\mathcal{D}$ denotes the training set of examples, $k$ corresponds to the number of the nearest points, $N_k(x, \mathcal{D})$ is the set of indices of the $k$ nearest points of $x$ in the training set $\mathcal{D}$, and $\mathbb{I}(\cdot)$ is the indicator function.

LR is a generalization of the linear regression algorithm [58]. In this case, the linear combination of inputs is passed through a sigmoid function, and the Gaussian distribution is replaced by the Bernoulli distribution [58]. The general formula for Logistic Regression is as follows:
$$p(y \mid x, w) = \mathrm{Ber}\!\big(y \mid \sigma(w^{T} x)\big),$$
where $\mathrm{Ber}(\cdot)$ stands for the Bernoulli distribution, $w$ denotes the weights, and $\sigma(\cdot)$ is the sigmoid function.

We train the weights of the sigmoid activation function. To this end, we optimize the following cost function:
$$J(w) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \ln \sigma\!\big(w^{T} x^{(i)}\big) + \big(1 - y^{(i)}\big) \ln\!\big(1 - \sigma(w^{T} x^{(i)})\big) \Big],$$
where $m$ is the number of training examples, $x^{(i)}$ is the $i$th training sample, and $y^{(i)}$ is the corresponding correct label. We determine the parameters of LR by applying the Gradient Descent optimization approach. More details on the learning algorithm for Logistic Regression can be found, for example, in [58].
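For illustration, the sketch below (Python with scikit-learn) trains and evaluates both classifiers on random placeholder data standing in for the PCA-reduced MFCC features; the data shapes, the value of k, and the use of scikit-learn's built-in solver for Logistic Regression are assumptions made only for this example.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder features and stroke labels (0 = serve, 1 = forehand, 2 = backhand).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 25))
y = rng.integers(0, 3, size=300)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)        # k = 5 is an example value
knn.fit(X_train, y_train)
print("k-NN accuracy:", knn.score(X_test, y_test))

logreg = LogisticRegression(max_iter=1000)       # gradient-based optimization
logreg.fit(X_train, y_train)
print("LR accuracy:", logreg.score(X_test, y_test))
```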

5. Experimental Results and Discussion

The proposed algorithm is evaluated on data gathered during the tennis training sessions. The dataset contains the labeled three-axis acceleration measurements captured for serve, backhand, and forehand strokes using the wrist-worn sensor, the Pebble Watch (Section 4.1).

In the performance analysis of the proposed approach, we applied both the 10-fold cross-validation and the leave-one-out methods. The cross-validation-based evaluation means that the data from each participant are used in both the training and the testing stages. The leave-one-out technique aims to assess the performance of the algorithm for a new user. In our experiments, we applied the user-independent version of this approach: the data provided by the test user are evaluated against a model trained on the data acquired from the other participants.
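The sketch below (Python with scikit-learn, on placeholder data) illustrates how both evaluation schemes can be set up: standard 10-fold cross-validation and a user-independent split implemented with leave-one-group-out, where the group is the player who produced each sample; the data shapes and classifier settings are illustrative only.

```python
import numpy as np
from sklearn.model_selection import cross_val_score, StratifiedKFold, LeaveOneGroupOut
from sklearn.neighbors import KNeighborsClassifier

# Placeholder feature matrix, stroke labels, and per-sample player identifiers.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 25))
y = rng.integers(0, 3, size=300)
players = rng.integers(0, 15, size=300)   # which participant produced each sample

clf = KNeighborsClassifier(n_neighbors=5)

# 10-fold cross-validation: every player's data appears in training and testing.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
print("10-fold CV accuracy:", cross_val_score(clf, X, y, cv=cv).mean())

# User-independent evaluation: each fold tests on one player unseen during training.
logo = LeaveOneGroupOut()
print("Leave-one-subject-out accuracy:",
      cross_val_score(clf, X, y, groups=players, cv=logo).mean())
```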

To measure the performance of the proposed approach, we applied the accuracy metric, the standard metric to summarize the overall performance of a classifier. The accuracy metric is defined as follows:
$$\mathrm{accuracy} = \frac{\text{correct predictions}}{\text{total predictions}},$$
where correct predictions correspond to the number of correct predictions and total predictions to the total number of predictions.

In this study, the features are generated by the MFCC-based features generator, while PCA was applied as the method to reduce their number. We evaluate the performance of the proposed fusion-based approach for two classifiers: k-NN and Logistic Regression.

In the next subsections, we study the effects of the feature vector size, the window size, and the MFCC-based features vector on the performance of the proposed approach. We also analyze the performance of the proposed algorithm applying the cross-validation and leave-one-out techniques.

5.1. The Effect of Different Number of Features

In this section, we study the impact of a different number of features on the classification performance. To this end, we used the PCA method to extract the proper number of features from the set of MFCC-related features. The extracted features were tested on two classifiers, that is, k-NN and Logistic Regression. In Figure 12, we present the classification performance of the classifiers for different sizes of the features vector.

Figure 12 shows that the best classification accuracy of k-NN is obtained for a feature vector of 25 features, while for the Logistic Regression-based classifier the best performance is obtained for a feature vector of 50 features.

5.2. The Effect of Window Size

In this section, we discuss the role of the window size in the classification task for both the k-NN and Logistic Regression-based approaches. As shown in Figure 13, increasing the window size initially improves the classification performance of both classifiers. The optimal window sizes are similar for both methods and equal to 80 samples. Increasing the window size beyond this value decreases the classification performance.

5.3. The Effect of MFCC-Based Features

In Figure 14, we present an example of the MFCC-based features obtained from the acceleration signal of a serve stroke. This section analyzes the effect of the MFCC parameters on the classification performance. In Figure 15, we compare the classification performance of the k-NN and Logistic Regression-based methods for different frame sizes of the Discrete Fourier Transform. The optimal frame size for k-NN equals 40 samples, while for Logistic Regression the optimal frame size equals 80 samples. Increasing the frame size further decreases the classification performance of both k-NN and Logistic Regression. In our research, we applied the optimal frame size for each of the proposed classifiers.

In Figure 16, we present the results of the analysis of how the number of filters in the MFCC-based generator changes the classification performance. The optimal number of filters equals 16 for both the k-NN method and Logistic Regression.

5.4. Analysis Using Cross-Validation

In this section, we assess the predictive performance of the proposed approach. To this end, a 10-fold cross-validation procedure was used to generate the training set and the test set. The training set is used to train the k-NN and Logistic Regression algorithms. In the test step, the predicted classes were compared to the true classes in order to measure the classifiers’ performance. The performance analysis was carried out separately for two datasets. The first dataset contains the measurements gathered during the tennis training of amateur players. The second dataset is based on the data acquired from the professional players.

In Table 3, we compare the results of the performance analysis for both the k-NN and Logistic Regression methods. As shown, we obtain better results with Logistic Regression for the data acquired from amateur players during training sessions, whereas the k-NN method performs better for the dataset collected from the professional players.

5.5. Analysis Using Leave-One-Out

In this section, we analyze the ability of the proposed approach to predict the tennis strokes based on new data acquired from a new player. In order to evaluate the k-NN and Logistic Regression classifiers, we applied the leave-one-out evaluation [59]. In Table 4, we present the results obtained for the data from amateur and professional players. Similarly to the cross-validation results, the Logistic Regression-based classifier obtains better results for amateur players, whereas the k-NN method obtains better results for the professional participants.

6. Conclusions

This work proposed the data fusion-based approach to support the sports training of tennis players. The proposed solution is based on the features determined in the frequency domain. We prepared the datasets for three tennis strokes: backhand, forehand, and serve. The data was divided into two subsets: the first one contained the measurements from amateur players, and the second one comprised the data provided by professional players.

We evaluated two different classification methods: k-NN and Logistic Regression. In the 10-fold cross-validation test, the Logistic Regression-based classifier provided better results for the dataset containing the measurements gathered from amateur players and for the collection of data acquired from all participants. The k-NN algorithm achieved better results for the data provided by professional players. We obtained similar results in the leave-one-out evaluation test. Based on our studies, we recommend using the k-NN algorithm to classify the data gathered from professional players and the Logistic Regression method for the data acquired from amateur players.

The accuracy of the applied classifiers is high, particularly for the dataset containing measurements gathered from professional players; however, it is still possible to improve the presented results. One of the methods that we plan to use in future work is the Naive-Bayes classifier. Another direction of future work relates to adding the third level of data fusion: decision-level data fusion can improve the overall accuracy of the detection and classification algorithm.

The presented data fusion-based algorithm is a useful tool for both the trainer and the trainee. On one hand, it can help the trainer to analyze not only the effects of the specific training session but also the trends in a sequence of training sessions. On the other hand, it can be used by the trainee in self-training, for example, to count tennis strokes during each training session. In summary, the data fusion in sports training provides new procedures to process and to analyze the sports training data.

In future work, we will add more tennis strokes that can be detected and classified by the proposed method. We also plan new algorithms extending the system’s functionalities for supporting the analysis of tennis strokes.

Data Availability

The dataset used in the research can be downloaded from https://www.ii.pwr.edu.pl/~krzysztof.brzostowski/files/Tenis_dataset.zip.

Conflicts of Interest

The authors declare that they have no conflicts of interest.