Abstract

This work introduces Wearable deep learning (WearableDL), a unifying conceptual architecture inspired by the human nervous system that offers the convergence of deep learning (DL), Internet-of-things (IoT), and wearable technologies (WT) as follows: (1) the brain, the core of the central nervous system (CNS), represents deep learning for cloud computing and big data processing. (2) The spinal cord (a part of the CNS connected to the brain) represents the Internet-of-things for fog computing and big data flow/transfer. (3) Peripheral sensory and motor nerves (components of the peripheral nervous system (PNS)) represent wearable technologies as edge devices for big data collection. In recent times, wearable IoT devices have enabled the streaming of big data from smart wearables (e.g., smartphones, smartwatches, smart clothing, and personalized gadgets) to cloud servers. Now, the ultimate challenges are (1) how to analyze the collected wearable big data without any background information and without any labels representing the underlying activity and (2) how to recognize the spatial/temporal patterns in this unstructured big data to help end-users in the decision-making process, e.g., medical diagnosis, rehabilitation efficiency, and/or sports performance. Deep learning (DL) has recently gained popularity due to its ability to (1) scale to the big data size (scalability); (2) learn the feature engineering by itself (no manual feature extraction or hand-crafted features) in an end-to-end fashion; and (3) offer accuracy or precision in learning raw unlabeled/labeled (unsupervised/supervised) data. To understand the current state of the art, we systematically reviewed over 100 recently published scientific works on the development of DL approaches for wearable and person-centered technologies. The review supports and strengthens the proposed bioinspired architecture of WearableDL. This article eventually develops an outlook and provides insightful suggestions for WearableDL and its application in the field of big data analytics.

1. WearableDL: Conceptual Architecture

Wearable DL is a concept derived from a holistic comparison between the evolving big data system and the human nervous system (NS) in terms of architecture and functionality. Although the human NS is a biological mechanism, it essentially inspires the convergence, collaboration, and coordination of three key elements, wearable tech (WT), the Internet of things (IoT), and deep learning (DL), in the development of a big data system for actionable outcomes and informed decision making.

The article views the big data system with respect to its close resemblance to the human nervous system (NS). The NS is responsible for coordinating actions such as the transmission of signals to and from the human body, identification, perception, decision making, and information storage [1]. Similarly, the big data system (or model) is evolving and converging various domains, such as wearable sensors, edge computing, fog computing, cloud computing, and deep learning (DL), to achieve equivalent functions such as signal communication, perception, decision making, analytics, and storage. As the complexity of the big data system rises, it becomes important to understand the architectural and functional components of the NS. This could guide us in developing more refined and sophisticated versions of a big data system.

1.1. A Brief Overview of the Human Nervous System

The NS is composed of two subsystems:
(1) The central nervous system (CNS), consisting of the brain and spinal cord
(2) The peripheral nervous system (PNS), consisting of nerves with sensory and motor fibers

1.1.1. PNS

The end elements of the PNS are sensory and motor fibers, which are connected to the parts and organs of the body. The sensory fibers sense various sensations, including pressure, temperature, and pain on the body, and send them to the nerves leading to the spinal cord (a part of the CNS). The motor fibers receive commands from the CNS to actuate and activate the muscles and organs. The bundles of fibers that collectively form the nerves connected to the spinal cord relay information back and forth between the CNS and PNS.

1.1.2. CNS

The spinal cord is a part of the CNS which serves two purposes:
(1) It acts as a bidirectional relay for the signals flowing between the body and the brain. This function supports the NS in making centralized decisions.
(2) The spinal cord also coordinates the reflexes, in which decisions are made in real time to avoid delays in critical conditions. A simple example of a reflex is removing the hand from a hot object.

The ultimate top layer of the CNS is the human brain, made of approximately 100 billion neurons [1]. Each neuron connects to one or more other neurons. The brain receives signals from the spinal cord and other sensory organs such as the eyes, nose, tongue, and ears. The brain processes the incoming signals and makes decisions. It generates commands that pass through the spinal cord to the PNS. The commands activate the muscles or organs of the body. Apart from processing and decision making, the brain also stores the information that is used in short- or long-term decision-making processes.

1.2. PNS vs Wearable Tech/Wearable Edge Devices for Big Data Collection and Application (Actions)

WT is comparable to the sensory and motor fibers of the PNS because of the following:
(i) Fibers are the carriers of information, similar to the WiFi backbone in WT
(ii) WT is located on the periphery of the IoT architecture, interacting with the environment for sensing and actuation

For example, modern smartwatches come with built-in sensors such as heart rate, motion, and ambient light, and also actuators including a touch screen, audio speakers, and tactile (or haptic) feedback. Edge devices, such as smartphones, act similarly to the nerves (as part of the PNS). The smartphone receives the data from the connected smartwatch sensors and also commands the smartwatch to alert the wearer through haptic, visual, or audio feedback. This helps us collect the sensor data and send them to the upper layer, such as edge devices.

1.3. CNS-Spinal Cord vs IoT and Fog Gateways for the Big Data Flow and Local Intelligence

IoT and fog gateways are equivalent to the spinal cord in the CNS (Figure 1) as follows:
(1) Big data transfer (flow) between the PNS (sensory and motor nerves) and the brain (the central processing and intelligence unit)
(2) Local intelligence for locally responding to some stimuli, such as extreme heat and pain

As described earlier, the spinal cord plays an important role in reacting and responding to specific stimuli, such as feeling pain and reacting to it, e.g., when caused by extreme heat. It is also responsible for delivering the motor responses and reactions from our brain to the PNS (our motor actuators and muscles) for any dynamic (kinematic) movement, internal or external, i.e., inside or outside our body. IoT intelligence, as a local intelligence, functions similarly to the spinal cord. For example, smartwatch sensor data can be processed on fog gateways located in homes or hospitals, away from the centralized cloud servers. In this case, the sensor data are processed on the gateway for local decision support in time-critical applications; e.g., the sensor data streams could help detect a fall event in an elderly person living alone at home. In this way, the fog or IoT gateway provides reflex-type services to alert appropriate individuals, such as medical personnel or caretakers, to respond to the event immediately. This reduces the potential delays in time-sensitive events, as illustrated by the sketch below.
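To make the reflex analogy concrete, the following is a minimal sketch of how a fog gateway might run a lightweight fall-detection reflex locally on streamed accelerometer samples. The thresholds, window length, and function names are illustrative assumptions, not a validated clinical algorithm.

```python
import math
from collections import deque

# Hypothetical reflex-style fall detector running on a fog gateway.
# A fall typically shows a spike in acceleration magnitude followed by
# near-stillness; both thresholds below are illustrative, not clinical.
IMPACT_G = 2.5   # impact threshold in g (assumed)
STILL_G = 0.15   # post-impact stillness band around 1 g (assumed)
WINDOW = 50      # post-impact samples to inspect (~1 s at 50 Hz, assumed)

def magnitude(ax, ay, az):
    """Acceleration magnitude in g from a triaxial sample."""
    return math.sqrt(ax * ax + ay * ay + az * az)

def detect_fall(stream):
    """Yield an alert locally (the 'reflex') instead of waiting on the cloud."""
    buffer = deque(maxlen=WINDOW)
    impact_seen = False
    for ax, ay, az in stream:
        m = magnitude(ax, ay, az)
        if m > IMPACT_G:
            impact_seen, buffer = True, deque(maxlen=WINDOW)
        elif impact_seen:
            buffer.append(abs(m - 1.0) < STILL_G)
            if len(buffer) == WINDOW and all(buffer):
                yield "ALERT: possible fall -> notify caretaker"
                impact_seen = False

# Example: a synthetic stream (rest -> impact -> stillness).
stream = [(0, 0, 1.0)] * 10 + [(0, 0, 3.0)] + [(0, 0, 1.05)] * 60
print(list(detect_fall(stream)))
```

Because the decision is made at the gateway, the alert latency does not depend on the round trip to the cloud.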

1.4. CNS-Brain vs DL and Cloud Computing for Big Data Analytics

The cloud computing servers are equivalent to the physical architecture of the brain, and DL-based big data analytics resembles the function of the brain. The human brain is a centralized processor that receives incoming stimuli from the spinal cord (connected to the PNS) or other sensory organs. Upon receiving them, it perceives and decides how and when to respond to the stimuli. It also stores the information. Similarly, the cloud computing servers receive the big data from WT via fog computing. Upon receiving them, they use high-performance computers to apply DL methods (explained in the next section) that help in decision making. Very similar to the brain, the cloud computers derive when and how to respond to incoming queries. They often store the sensor data to learn patterns and create a historical database to enable informed decision making in the future.

1.5. Outline and Contributions

In this article, we endeavor to describe the benefits and challenges associated with the use of DL on wearable big data. We have conducted a thorough survey of more than 100 publications related to DL and its applications in wearable IoT. The survey allowed us to create a holistic picture combining wearable sensors, IoT, DL, and big data. This work’s key contributions are structured as follows:
(i) Section 2 provides an overview of wearable IoT in a nutshell, including the concept, the different categories of wearable IoT devices, and its future direction.
(ii) Sections 3 and 4 provide a research roadmap for DL through a thorough understanding of its past, present, and future. Here, we focus on how understanding the human brain, specifically the neocortex, links to the development of artificial intelligence (AI) and how AI is mainly divided into three areas, ML, DL, and cortical learning (CL), which are covered in these sections.
(iii) Section 5 focuses on recent applications of WearableDL in big data analytics. Over 100 recently published articles were reviewed and included in this section to correlate with the paradigms of WearableDL and its applications.
(iv) Section 6 projects the future research and application direction of WearableDL in association with wearable big data.

2. Wearable Internet-of-Things

In 1965, an observation, later regarded as Moore’s law, estimated that the number of transistors on integrated circuits (IC) doubles every two years [2]. Moore’s law played an important role in the semiconductor industry and motivated the evolution of miniaturized yet high-performance computing (HPC) chips, which revolutionized the modern world. This evolution caused an explosion in the production of electronic devices and therefore brought a limitless expansion in the use and applicability of the computing chips that today drive smart wearable devices, smartphones, personal computers, smart homes, and smart cities, along with WiFi, the Internet, and other communication devices. As a result of the aforementioned evolution, explosion, and expansion, wearable devices are booming in the market, and therefore we witness the growth of personalized big data that hold significant value for end-users, including citizens, communities, hospitals, and governments, to improve health or performance, reduce medical cost, and increase efficiency [3].

2.1. Wearable Devices Categories

Overall, wearable devices can be categorized into three main classes (Figure 2):
(1) Implantable devices: these devices are implanted inside the body for a long period of time; e.g., cardiac pacemakers and deep brain stimulators are implanted for 5–10 years to provide current to specific organs.
(2) Wearable contact devices: this is the largest category among the three. These devices are designed to stay on the body unobtrusively to collect various parameters, including heart rate, physical activity, body temperature, muscular activity, blood/tissue oxygenation, and other physiological parameters. The most common devices in this category are smartwatches, smart clothing, smart footwear, fitness trackers, HR chest belts, and ECG Holter monitors.
(3) Wearable ambient monitors: these devices are made to sense the outside environment instead of the body’s physiological state. Google Glass (smart glasses) is a simple example of this category, in which a wearable camera allows the wearer to record the surrounding scenes [4].

2.2. Wearable IoT: Convergence of Wearable Devices and Internet-of-Things (IoT)

The convergence and deployment of wearable devices, the Internet-of-things (IoT), and cloud computing together allow us to record, monitor, and store a wide range of big data from individuals, such as personalized health and wellness data, body vital parameters, physical activity, and behaviors, which are all critical data indicating the quality and trend of daily life [5]. In the past, wearable devices were stand-alone systems. However, bringing wearable devices into the framework of IoT makes it possible to stream the data from an individual to a centralized location such as cloud servers. The continuous accumulation of the wearable data becomes massive big data [6], which, in general, is a set of sequential time-series signals and logs containing biometric, behavioral, physiological, and biological information, depending on the nature of the wearable devices categorized above. One of the key objectives of collecting the wearable big data is to support remote or on-site decision making by detecting symptoms, events, and anomalies, or by producing contextual awareness [7].

2.3. Wearable Data Categories

Wearable biosensing devices can collect a large variety of physiological data continuously, all day long and in any place, for health, mental, and activity status monitoring. These multiparameter physiological sensing systems provide us with reliable and crucial measurements for supporting online decision making by detecting symptoms and producing contextual awareness [8]. An overview of the wide range of wearable data in biomedicine and health is provided in [9].

2.4. Emerging Unobtrusive Wearable Devices

Wearable sensors can be either woven or integrated into clothing, accessories, and the living environment, such that individuals’ or patients’ data can be collected in their daily life. According to an overview [10], the four emerging unobtrusive wearable technologies (WT) essential for collecting individuals’ health data are the following:
(1) Unobtrusive sensing methods
(2) Smart textile technology
(3) Flexible-stretchable-printable electronics
(4) Sensor fusion

2.5. Data Reliability

Data reliability strongly depends on the type of collected data and, specifically, on the category of the collected wearable data in general. In the wearable DL scenario, it is not the role of the wearable devices to assess data reliability. A presifting of the data, particularly in the case of structured data, can be implemented directly on the device by embedding data-sifting policies dictated by prior interaction with medical specialists, physicians, and studies. However, data reliability should be assessed at the IoT and DL levels, as discussed later.

3. Artificial Intelligence

Artificial intelligence (AI) is ultimately the ability to reconstruct human biological intelligence in modern machines. The AI domain is currently divided into three active learning-based areas of research: machine learning (ML) [11, 12], deep learning (DL) [13], and cortical learning (CL) [14] (Figure 3).

3.1. Cortical Learning

Cortical learning (CL) is inspired by our cortical structure (i.e., based on studying the neocortex) and was coined by Hawkins et al. [14, 15] from Numenta (https://numenta.com/). The cortical area is the largest part of the brain in humans and monkeys compared to other species and the main source of our intelligence [14, 16]. The CL algorithm/approach (CLA), inspired by the architecture of the human cortex [14, 17], is applied in an approach called hierarchical temporal memory (HTM) [16, 18–20]. The cortex learns the spatial and temporal patterns in sequential data, e.g., for visual perception, spoken language comprehension, manipulating objects, and navigating a complex 3D world [17].

3.2. Machine Learning

Machine learning is the mother subject for deep learning and many other statistical or probabilistic analysis approaches, but it is not necessarily related to CL, which is a neuroscience-based endeavor for AI (computational or systems neuroscience). ML mostly refers to shallow ML approaches, which are not scalable to the data size. This set of shallow artificial learning algorithms [11, 12] helps machines directly learn from the data, model the data, and generate machine intelligence. ML is highly founded on mathematics, e.g., linear algebra, calculus, statistics, probability, and stochastic optimization approaches such as evolutionary algorithms (EA) and Monte Carlo search. Some of the limitations of ML are as follows:
(1) It is very broad and often mathematically proved but not biologically inspired. This is a problem since biologically inspired algorithms, such as genetic algorithms, often prove to be extremely powerful and robust. On top of that, AI targets biological intelligence in the first place and ultimately aims to replicate/reconstruct our biological intelligence, human intelligence.
(2) It is often shallow and not scalable to the data size; i.e., as the data size or dimensionality grows exponentially (the big data problem), traditional ML approaches (e.g., SVM) cannot scale to the data size. This causes a problem called underfitting, which means there are not enough parameters in the learning approach for approximating the best-fitting function.
(3) It is also hard to apply directly to high-dimensional data. That is why we have to apply dimensionality reduction to the data first, by manually performing feature extraction or engineering (hand-crafted features), and then apply the ML approach to the data features with the reduced dimensionality.
(4) The accuracy and robustness of ML approaches on noisy data are hardly comparable to those of DL approaches, since ML approaches learn from a few examples or small training data, whereas DL approaches are capable of learning from massive datasets (big data).

3.3. Deep Learning

DL approaches differ from shallow ML algorithms in terms of scalability, i.e., depth (number of hidden layers) and width (number of cells, units, or neurons in each layer). DL (or deep ML) is a scalable ML approach capable of scaling to the data size in terms of a high number of data samples or data dimensionality. DL is applied to artificial neural networks (ANN or NN), which is why it is also known as deep neural networks (DNN) [13, 21]. DL is the ability to learn deep architectures of NN using backpropagation (BP) [22, 23]. Error backpropagation [24], proposed in 1986 for training multilayer perceptrons (MLP), is the dominant training approach for NN: the error between the predicted output and the given labels is backpropagated into the network to fine-tune the weights and minimize the loss/cost function over a nonconvex surface. DL is loosely inspired by the visual cortex [25–27]. It mimics our brain [27] in terms of learning and recognizing the spatial and temporal (spatiotemporal) patterns in the data. DNNs are basically deep hierarchical layers of perceptrons [28], as artificial neurons, for representation and regression learning [29, 30].
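As a minimal sketch of the mechanism just described, the following NumPy-only example trains a one-hidden-layer MLP with backpropagation; the architecture, learning rate, and synthetic data are assumptions for illustration.

```python
import numpy as np

# Minimal backpropagation sketch on a one-hidden-layer MLP (NumPy only).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4))             # 200 samples, 4 features
y = (X.sum(axis=1, keepdims=True) > 0) * 1.0  # toy binary labels

W1, b1 = rng.standard_normal((4, 16)) * 0.1, np.zeros(16)
W2, b2 = rng.standard_normal((16, 1)) * 0.1, np.zeros(1)
lr = 0.1

for epoch in range(500):
    # Forward pass: tanh hidden layer, sigmoid output.
    h = np.tanh(X @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
    # Backward pass: gradient of binary cross-entropy w.r.t. each weight.
    dout = (p - y) / len(X)            # dL/dz2 for sigmoid + cross-entropy
    dW2, db2 = h.T @ dout, dout.sum(axis=0)
    dh = dout @ W2.T * (1.0 - h ** 2)  # tanh derivative
    dW1, db1 = X.T @ dh, dh.sum(axis=0)
    # Gradient descent step over the (nonconvex) loss surface.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```

Deep learning applies exactly this error backpropagation, scaled up to many layers and many units per layer.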

4. Deep Learning

The research question of “How can the massive wearable big data be analyzed to produce actionable outcomes?” is difficult to answer when the wearable big data is heterogeneous, unlabeled, and unstructured. This means the wearable big data seeks unsupervised learning methods that can not only analyze the data but also identify helpful patterns leading to informed decision making. In recent years, deep learning (DL) has been established as a new area of machine learning research which aims to advance artificial intelligence [13]. A plethora of studies provide evidence that DL has achieved state-of-the-art results in various fields related to computational intelligence and big data, including computer vision and image processing [31], speech processing [32], natural language processing (NLP), and machine translation [33, 34]. Similarly, these DL advancements bring a new promise to analyze the unsupervised wearable big data in order to recognize the spatial and temporal patterns related to health, wellness, medical condition, sports performance, and safety (Table 1).

Deep learning (DL) is exponentially gaining interest in the research and development (R&D) community in academia and industry, and it is being heavily invested in by giant software and hardware companies such as Google, Nvidia, and Intel [35–37].

4.1. Deep Learning History: Receptive Fields of Neurons Inspired from Cat’s Visual Cortex

Simple cells and complex cells [38] were found in the receptive fields of single neurons in the cat’s visual cortex. The discovery of the receptive fields of neurons in the cat’s visual cortex [38] contributed enormously to the NN, AI, computer vision, and neuroscience communities (demonstrated visually with a timeline in Figure 4).

4.2. Deep Learning History: The First Conceptual Architecture—Cognitron and Neocognitron

The discovery of simple and complex cells was followed by the introduction of the cognitron and neocognitron (by Fukushima et al. in 1975 [39, 42–44]). The neocognitron (the first proposed deep NN architecture) was the inspiration behind the introduction of convolutional neural networks (CNNs) by LeCun in 1989 [45] (as shown in the past part of the timeline in Figure 4). The cognitron and neocognitron (Fukushima et al. [39, 42, 43]) were introduced as self-organizing multilayered neural networks. The cognitron and neocognitron architectures proposed by Fukushima [42, 43] are composed of simple cells and complex cells, an arrangement later echoed inside the CNN architecture.

4.3. Deep Learning History: Neural Networks

Schmidhuber’s survey [46] thoroughly reviews the history of DL in NN since the birth of ANN along with different types of learning approaches applied to DNN architectures such as unsupervised learning (UL), supervised learning (SL), and reinforcement learning (RL). It also discusses evolutionary algorithms (EA) and optimization approaches (e.g., genetic algorithms) along with the learning algorithms for minimizing mean squared error (MSE) and sum of squared error (SSE).

4.4. Deep Learning Math Foundation: Artificial Neural Networks Universal Approximation Theorem

ANN is a universal function approximator, based on the universal approximation theorem [23, 47–49]. This theorem proves that an ANN, even with a single hidden layer of finite size, can approximate any continuous function [47]. This approximation theorem [47] applies to high-dimensional as well as low-dimensional function approximation [23]. For example, consider a 2-dimensional CNN (2D-CNN) or 3-dimensional CNN (3D-CNN) for image and point-cloud classification (high-dimensional 2D or 3D data) compared to a feed-forward neural network (FFNN) for low-dimensional time-series signal classification. In this case, the CNN contains many more parameters for high-dimensional function approximation, whereas the FFNN approximates a low-dimensional function and contains far fewer parameters.
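To make the statement concrete, a common single-hidden-layer form of the theorem (a sketch following [47], with $\sigma$ a fixed nonconstant, bounded, continuous activation) reads as follows: for any continuous $f$ on a compact set $K \subset \mathbb{R}^m$ and any $\varepsilon > 0$, there exist $N$, $v_i$, $w_i$, and $b_i$ such that

$$
F(x) \;=\; \sum_{i=1}^{N} v_i \,\sigma\!\left(w_i^{\top} x + b_i\right),
\qquad
\sup_{x \in K} \left| F(x) - f(x) \right| < \varepsilon .
$$

The hidden-layer width $N$ is what grows with the complexity of $f$; architectures such as CNNs trade this width for depth and weight sharing when approximating high-dimensional functions such as images.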

4.5. Deep Learning Applications: Recent Breakthroughs and State-of-the-Art Results

DL has achieved state-of-the-art results in many fields, such as computer vision, speech processing, and machine translation, as follows:
(1) A breakthrough in 2012 for computer vision using DL: deep convolutional neural nets (CNN), descended from LeCun’s 1989 LeNet [45], proved to be enormously efficient in end-to-end image classification and analysis [31].
(2) A breakthrough in 2012 for speech processing using DL: another important breakthrough, almost in the same year as Krizhevsky’s [31], was applying DL to TIMIT, a massive dataset for speech recognition [32]. Microsoft immediately started adopting and applying this approach to its own AI assistant, Cortana for Windows 10 [32].
(3) The Google Brain project, the first large-scale DL project, in 2012: this project, on large-scale distributed deep networks led by Dean et al. [50], applied a deep belief network (DBN) to massive data from YouTube (videos of cats) using 16,000 computers in a distributed parallel configuration. This large-scale implementation of DBN, on distributed parallel computing platforms, successfully recognized cats in videos after watching millions of cat videos on YouTube without any supervision or teaching signals (an unsupervised setting).
(4) Bridging the gap between human-level translation and machine translation in NLP with Google neural machine translation: Google neural machine translation (GNMT) [33, 34] is an end-to-end DL model for automated translation which has outperformed conventional phrase-based translation systems by far. The proposed GNMT [33, 34] system requires big computational power (big compute) and massive datasets (big data) for both training and translation inference, in order to build a big model (big net).

4.6. Deep Learning Architectures

Some of the most important and famous DNN architectures are the following:
(1) Feed-forward neural network (FFNN): this is the simplest NN, also known as the multilayer perceptron (MLP), with feed-forward connections. FFNN is also referred to as the fully connected network (FCN) inside CNN architectures.
(2) Convolutional neural network (CNN): CNN is loosely inspired by the cat’s visual cortex [38]. It was initially proposed as the cognitron [42] and neocognitron [39]. The CNN architecture was first applied to digit recognition and trained using BP by LeCun in 1989 [45]; this CNN architecture is also referred to as LeNet [45].
(3) Deep belief network (DBN), deep Boltzmann machine, and restricted Boltzmann machine (RBM): DBN was initially proposed by Hinton et al. [51, 52] as a deep unsupervised learning (DUL) approach using greedily pretrained stacked-up layers of RBMs.
(4) Autoencoder (AE): AE is another DUL approach for dimensionality reduction [53] and data compression. The variational AE (VAE) [54] is a recent version of AE which improved AE precision in generating images and data as a generative model using variational Bayesian inference.
(5) Spiking neural network (SNN): this type of NN mimics the spike-based stimulation of the brain inside an ANN; i.e., it is loosely inspired by how spikes activate neurons in our biological NN (the brain).
(6) Deep Q-network (DQN): DQN [55, 56] is the first deep reinforcement learning (DRL) approach, proposed by Google DeepMind. This DL approach achieved human-level control in playing a variety of Atari games.

4.7. Deep Learning Dominant Training Approach: Backpropagation

DL is mainly related to the algorithms for learning big and deep NN architectures. BP is dominantly the learning algorithm used in DL [29, 57] and is the main power behind the scalability of DL architectures such as CNN and DBN. The term DL was coined mainly by LeCun et al. in 2015 [13]. Goodfellow et al. [21] published a book providing a thorough explanation of DL theory and approaches.

4.8. Deep Learning Categories and Subdomains

The DL approaches, regardless of their application domains, are mainly categorized into three dominant groups (the same as ML): deep unsupervised learning (DUL), deep supervised learning (DSL), and deep reinforcement learning (DRL). There are also some subcategories (subdomains) which are currently active areas of research in DL, such as transfer learning (TL), semisupervised learning, learning-by-demonstration, and imitation learning.

4.9. Deep Learning Biological and Neurological Inspiration Related to Backpropagation

BP (proposed by Rumelhart et al. [24, 29, 57]) in DL is supported neurologically by the random synaptic feedback weights of Lillicrap et al. [26]. Lillicrap et al. [26] argue that BP functions similarly to an error feedback neuron for error optimization (minimization). Yamins and DiCarlo [25] also provide another strong biological foundation for BP and the CNN architecture in DL. They demonstrate visually how goal-oriented convolutional hierarchical layers are inspired by the sensory cortex.

4.10. Deep Supervised Learning

DSL is divided into three categories [46]: FFNN and CNN, the recurrent neural network (RNN), and hybrids of the two (e.g., convolutional long short-term memory (LSTM) and convolutional RNN).

4.10.1. Feed-Forward Neural Network and Convolutional Neural Network

FFNN was also traditionally known as MLP. FFNN, the so-called FCN, often forms the last two layers inside a CNN architecture in DSL. CNN was first applied to optical character recognition (OCR), specifically digit recognition, and trained using BP by LeCun et al. in 1989 [45]. This CNN, named LeNet after LeCun et al. [45], was brought back to attention in 2012 after AlexNet, named after Krizhevsky et al. [31], cut the classification error rate almost in half in the ImageNet contest.
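A compact PyTorch sketch of a LeNet-style CNN follows; the layer sizes are illustrative assumptions rather than the exact 1989 LeNet configuration, but they show the convolution/pooling feature extractor followed by the fully connected (FCN) head mentioned above.

```python
import torch
import torch.nn as nn

# A LeNet-flavored CNN sketch for 28x28 grayscale digits (sizes assumed).
class SmallLeNet(nn.Module):
    def __init__(self, n_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, padding=2),  # 28x28 -> 28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                            # -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),            # -> 10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                            # -> 5x5
        )
        # The fully connected layers at the end are the "FCN" head.
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 84),
            nn.Tanh(),
            nn.Linear(84, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

logits = SmallLeNet()(torch.randn(8, 1, 28, 28))  # batch of 8 fake digits
```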

4.10.2. Recurrent Neural Network and Long Short-Term Memory

LSTM was proposed by Hochreiter and Schmidhuber in 1997 [58]. LSTM was successfully trained using BP through time (BPTT), unlike the vanilla RNN, which suffered heavily from the problems of vanishing and exploding gradients [59]. LSTM-based RNNs are often applied to sequential learning for temporal pattern recognition.
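The sketch below shows an LSTM-based classifier for windowed wearable sensor sequences; the input shape, hidden size, and class count are assumptions for illustration. PyTorch’s autograd performs BPTT automatically when `loss.backward()` is called.

```python
import torch
import torch.nn as nn

# Sketch: LSTM classifier for wearable time-series windows (sizes assumed).
class SequenceClassifier(nn.Module):
    def __init__(self, n_features=3, hidden=64, n_classes=6):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):           # x: (batch, time, features)
        _, (h_n, _) = self.lstm(x)  # h_n: final hidden state per layer
        return self.head(h_n[-1])   # classify from the last time step

# 16 windows of 100 samples from a triaxial accelerometer (fake data).
scores = SequenceClassifier()(torch.randn(16, 100, 3))
loss = nn.CrossEntropyLoss()(scores, torch.randint(0, 6, (16,)))
loss.backward()  # autograd unrolls the recurrence in time, i.e., BPTT
```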

4.10.3. Hybrid DL: Convolutional LSTM (ConvLSTM)

Xingjian et al. [60] proposed the convolutional LSTM (ConvLSTM) as a hybrid approach, an integrated combination of CNN [13] and RNN (LSTM [58]). In this regard, Zhang et al. [61] show reliable results using a hybrid approach for speech recognition. The residual bidirectional ConvLSTM [61] is a very deep network including bidirectional LSTM and CNN with residual connections for end-to-end speech recognition, an efficient and powerful deep hybrid model for acoustic speech recognition.

4.11. Deep Unsupervised Learning

DUL focuses on unlabeled big data, which are abundantly available nowadays on the web. Bengio et al. [62] review DUL approaches and provide new perspectives on them. Yeung et al. [63] propose an approach for learning from the big unlabeled data existing on the web (i.e., “in the wild”). Rupprecht et al. in 2016 [64] also propose a DUL framework for unlabeled big data as a new methodology of learning multiple hypotheses. Mirza et al. [65] provide a DUL architecture for the generalization of aligned features, specifically to perform TL across multiple task domains.

4.11.1. Deep Belief Network, Restricted Boltzmann Machine, and Google Brain Project

DBN is, in fact, stacked-up layers of pretrained restricted Boltzmann machines (RBM). In 2012, the Google Brain project (on “large-scale distributed deep networks”), led by Dean et al. [50], applied DBN to massive data from YouTube videos of cats using 16,000 computers in a distributed parallel configuration. This large-scale implementation of DBN, on distributed parallel computing platforms, successfully recognized cats in videos after watching millions of cat videos on YouTube without any supervision or teaching signals, i.e., entirely in an unsupervised setting (DUL). When a DBN is trained on a massive dataset (big data) as DUL, it can learn how to probabilistically reconstruct its inputs. DBN layers (layers of representation) can act as feature detectors (extractors) on inputs [32, 50, 51, 66]. After this learning step, a DBN can be further trained in a supervised way to perform classification [32, 51], i.e., TL.
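As a sketch of the greedy layer-wise building block, the NumPy snippet below performs one contrastive-divergence (CD-1) update for a single binary RBM; biases are omitted for brevity, and the layer sizes and learning rate are illustrative assumptions. A DBN stacks several such pretrained layers.

```python
import numpy as np

# One CD-1 update for a binary RBM (biases omitted; sizes/rate assumed).
rng = np.random.default_rng(0)
n_visible, n_hidden, lr = 784, 128, 0.01
W = rng.standard_normal((n_visible, n_hidden)) * 0.01

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0):
    """v0: batch of binary visible vectors, shape (batch, n_visible)."""
    ph0 = sigmoid(v0 @ W)                     # P(h=1 | v0)
    h0 = (rng.random(ph0.shape) < ph0) * 1.0  # sample hidden units
    v1 = sigmoid(h0 @ W.T)                    # mean-field reconstruction
    ph1 = sigmoid(v1 @ W)
    # Positive phase minus negative phase approximates the gradient.
    return W + lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)

W = cd1_step((rng.random((32, n_visible)) < 0.5) * 1.0)  # toy batch
```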

4.11.2. Generative Adversarial Networks (GANs)

Goodfellow et al. [67] proposed the GAN as two networks competing against each other: one is the generator and the other is the discriminator. The generator tries to produce fake input data similar to the real data in order to fool the discriminator. This adversarial training performed on GANs is grounded in game theory.
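A minimal adversarial training step might look like the following PyTorch sketch on toy 2D data (network sizes, data, and learning rates are assumptions): the discriminator is updated to separate real from generated samples, and then the generator is updated to fool it.

```python
import torch
import torch.nn as nn

# Minimal GAN training step on toy 2D data (all sizes assumed).
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(64, 2) + 3.0   # stand-in for real samples
z = torch.randn(64, 8)            # noise fed to the generator

# Discriminator step: label real as 1, generated ("fake") as 0.
fake = G(z).detach()              # detach: no gradient into G here
loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: try to fool D into labeling fakes as real.
loss_g = bce(D(G(z)), torch.ones(64, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```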

4.11.3. Autoencoders

AE is an unsupervised DL architecture for DUL applied to denoising, dimensionality reduction [53], data compression, and image or data generation (generative models). The VAE [54] is an improved AE serving as a generative model based on variational Bayesian inference. An AE can also be trained and then transferred to a DSL architecture (TL) for classification and regression purposes [68].
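The following PyTorch sketch shows the AE idea on unlabeled signal windows: an encoder compresses each 64-dimensional window into an 8-dimensional code, and a decoder reconstructs it, trained purely on reconstruction error (all sizes are illustrative assumptions).

```python
import torch
import torch.nn as nn

# Autoencoder sketch for unsupervised dimensionality reduction.
class AE(nn.Module):
    def __init__(self, n_in=64, n_code=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_in, 32), nn.ReLU(),
                                     nn.Linear(32, n_code))
        self.decoder = nn.Sequential(nn.Linear(n_code, 32), nn.ReLU(),
                                     nn.Linear(32, n_in))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model, loss_fn = AE(), nn.MSELoss()
x = torch.randn(128, 64)        # a batch of unlabeled windows (fake data)
loss = loss_fn(model(x), x)     # reconstruction error, no labels needed
loss.backward()                 # trained with plain backpropagation
```

The learned encoder can then be reused as a feature extractor for a downstream supervised task, i.e., the TL route to classification mentioned above [68].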

4.12. Deep Learning (DL) and Reinforcement Learning (RL) Started Getting Published in Nature: Quick Review of the Recent Years Progress

This part briefly walks through how DL and RL were combined at an incredible speed since 2015, solely from the perspective of Nature publications:
(1) In 2015, deep learning (DL) models found their way into Nature publications by producing incredible results in AI [13].
(2) In the same year, neuroscientists found a very interesting relationship between goal-driven DL models and the sensory cortex in our brain [25]. This was a huge leap toward the biological inspiration of deep reinforcement learning models.
(3) In that year, one DL-based AI agent achieved super-human-level performance in many Atari games [55]. This approach, the so-called deep Q-network (DQN), demonstrated at least human-level control (performance) in playing many Atari games [56]. This was the birth of deep reinforcement learning (DRL), the combination of RL (initially proposed by Sutton in 1984 [69]) and DL [13].
(4) In 2016 (one year later), another DL-based AI agent, AlphaGo, dominated the game of Go by only watching previously human-played Go games [70]. AlphaGo (Silver et al. from Google DeepMind in 2016 [70]) made a considerable impact on the DRL community by dominating the game of Go (an ancient Chinese chess-like game) using two cooperative deep networks [70]: a deep policy network and a deep value network (DQN [55]). The policy network basically recommends the next possible moves (actions), and the value network (i.e., Q-network) evaluates the moves intuitively based on previous experiences. Eventually, the value network picks the most valuable move, i.e., the one with the maximum expected reward. The cooperative networks in AlphaGo [70] work with each other, in contrast with the competing networks of a GAN.
(5) Recently, AlphaGo Zero [71] learned the game of Go from scratch, solely by playing itself in a trial-and-error fashion, and eventually even beat the previous AlphaGo.
(6) Finally, an important implementation of grid-like cells in mice [72] loosely demonstrates how navigation is performed using these grid-like cells and how they can be represented in artificial agents. Grid cells were discovered in 2005 by Wills et al. [73] and a team of scientists in Norway [74], who were awarded the 2014 Nobel Prize for their discoveries of cells that constitute a local/global positioning system in the brain.

DRL has opened a new frontier in AI, so-called artificial general intelligence (AGI), which is growing exponentially and succeeding in demonstrating human-level and even super-human-level performance, not only in playing Atari games but also in robotics [75] and other domains performing complex tasks, such as imitation learning or learning by demonstration:
(1) Combining inverse RL and GANs: since the GAN is a generative model that maximizes a reward function to fool a discriminator network, it is related to RL in terms of learning how to maximize the reward function. In RL, learning the reward function for an observed action is coined inverse reinforcement learning (IRL). Finn et al. [76] show that IRL [77–79] is equivalent to GAN [67] by highlighting the mathematical connection between them.
(2) Generative adversarial imitation learning (GAIL): this is the combination of GANs and imitation learning [80]. This model was also introduced earlier, in 2016, by Baram et al. [81] as model-based adversarial imitation learning. These models generally aim to model human behaviors and motives using IRL [82], which mainly targets the lack of a reward function for a variety of complex tasks, or the difficulty of defining one, rather than learning a reward function for them.

Another recent development in DRL is applying RNNs, specifically LSTM [58], to learn temporal dependencies, since RL tasks are all sequential; these are known as generative RNNs. In this regard, Schmidhuber’s team is one of the main forerunners, introducing world models [83]. A beautiful combination of GAIL and generative RNNs was proposed very recently by Zhu et al. [84] in order to apply these RNN-based GAIL models to diverse visuomotor skills, specifically robotic manipulation across both simulated and real domains. In this direction, generative query networks (GQN) [85] have shown very promising results in terms of an agent predicting how the environment would look after taking a specific action. GQN is another great combination of RNN, GAN, and imitation learning.

5. WearableDL: Literature Review

Feature extraction is the key to understanding and modeling the repetitive patterns in the collected physiological and behavioral data. Traditionally, hand-crafted features were extracted for classification or regression purposes based on expert knowledge, a labor-intensive and time-consuming process. Moreover, the manual feature extraction process does not scale well when the wearable data grow rapidly in size, temporally (number of samples in time) or spatially (number of dimensions). That is why our article aims to explore DL approaches, since they are capable of scaling to the data size. In this section, we review the literature related to DL approaches for analyzing different types of wearable data, as briefly mapped in Table 2.

5.1. Embedding DL in Mobile, Wearable, and IoT Devices

Lane et al. [86] present a study on embedded DL in wearables, smartphones, and IoT devices in order to build knowledge of the performance characteristics, resource requirements, and execution bottlenecks of DL models. Regarding DL for mobile, wearable, and embedded sensory applications, DL requires a significant amount of device (and processor) resources. The limited availability of memory, computation, and energy on mobile and embedded platforms is a serious problem for powerful DL approaches.

5.1.1. SparseSep: Large-Scale-Embedded DL in Smartphones

SparseSep [87] leverages the sparsification of fully connected layers and the separation of convolutional kernels to reduce the resource requirements of DL algorithms. SparseSep [87] allows large-scale DNNs (with fully connected and convolutional layers) to run efficiently on mobile and embedded hardware with minimal impact on inference accuracy.

5.1.2. DeepX and Demo: Embedded DL Execution Accelerator

DeepX [89] is a software accelerator for efficient embedded DL execution. DeepX significantly lowers the wearable resources (e.g., memory, computation, and energy) required by DL, a severe bottleneck to mobile (smartphone) adoption. DeepX [89] enables large-scale DL models to execute efficiently on mobile (smartphone) processors instead of relying on the existing cloud-based offloading. Demo [92] and DeepX [89–91] are good case studies of adapted, embedded, low-power DL software for mobile devices and smartphones, specialized for wearable and behavioral big data analytics.

5.1.3. Embedded DL for Wearable Multimodal Sensor Data Fusion and Integration

Radu et al. [6] used smartphones and smartwatches for human (user) activity recognition (HAR). Data integration and fusion from the smartphone and smartwatch using DL is the focus of this work [6]. DL, specifically the RBM, is proposed in [6] for the integration (or fusion) of sensor data from multiple sensors (different modalities). Bhattacharya and Lane [7] performed smartwatch-centric HAR using DL, specifically the RBM. Behavior and context recognition tasks related to smartwatches (such as transportation mode, physical activities, and indoor/outdoor detection) using DL (RBM) are the focus of [7]. Although DL-based (RBM) human activity recognition outperforms the alternatives, DL resource consumption is unacceptably high for constrained WT devices like smartwatches. Therefore, a complementary study is conducted in Bhattacharya and Lane [7] on the overhead of DL (RBM models) on smartwatches.

5.1.4. DeepEye: Embedded DL in Wearables with Built-in Camera for Wearable Image Analytics

Wearables with a built-in camera provide us with opportunities to record our daily activities from different perspectives and angles, potentially serving as a visual log of our daily lives. DeepEye [93] is a matchbox-sized wearable camera capable of running multiple cloud-scale embedded DL models on the device for near-real-time image analysis without offloading them to the cloud. DeepEye [93] is powered by a commodity wearable processor to address the bottleneck of executing multiple DL models (CNN) on limited wearable resources, specifically limited runtime memory. Chen et al. [95] propose embedding a deep CNN into iOS smartphones by maximizing data reusability to address the high bandwidth burden in DL, specifically in the convolution layers of CNN. The effective data reuse makes it possible to parallelize all the computing threads without data-loading latency. Chen et al. [95] enhance the capability of DL on local iOS mobile (smartphone) devices.

5.1.5. DeepEar: Embedded DL in Smartphones for Audio Signal Analytics

Regarding mobile audio sensing and analysis, DL has radically changed related audio modeling domains like speech recognition [146]. DeepEar [94] is a framework for mobile audio sensing using DL, trained in an unsupervised setting on a large-scale unlabeled dataset (big audio data) from 168 place visits. With 2.3 M parameters, DeepEar [94] is more robust to background noise than conventional approaches on wearables, specifically smartphones (mobile devices).

5.2. Embedded DL in Mobile Sensing Framework

Lane et al. [97] present a survey on a mobile sensing architecture composed of sensing, learning, and distribution. This survey [97] reviews the existing mobile phone sensing algorithms, applications, and systems related to the architectural framework for mobile phone sensing research. Harari et al. [96] discuss the potentials and limits of smartphones in collecting wearable biometric and physiological data for behavioral science, since smartphones help us enormously in collecting continuous behavioral data in our daily lives without attracting any attention. The collected continuous behavioral data include social interactions, daily activities (physical activity), and mobility patterns. Harari et al. [96] look at practical guidelines for facilitating the use of smartphones as a behavioral observation tool in psychological science. Lane and Georgiev [98] provide low-power embedded DL using a smartphone system-on-chip (SoC). This work highlights the critical need for further exploration of DL in mobile sensing towards robust and efficient wearable sensor data inference. DeepSense [99] is a DL framework that addresses the noisy mobile sensor data and feature engineering problems in mobile sensing. DeepSense [99] integrates CNN and RNN to extract temporal and spatial patterns in mobile sensor data dynamics for car tracking, HAR, and user identification.

5.3. DL for Time-Series Data Analytics

In many real-world applications (e.g., speech recognition or sleep stage classification), data are collected over the course of time. This time-series data contains temporal patterns related to different classes of behaviors (behavior prediction). Hand-crafted features are expensive to extract since they require expert knowledge of the field. That is why DUL offers powerful feature learning for time-series data analysis and forecasting (prediction). Since wearable data are often collected as time-series signals, DL plays an important role in learning and recognizing (inferring) the temporal patterns in this data. In this respect, LSTM [58] dominates other DL approaches. A review of the recent developments in DUL for time-series data is given by Längkvist et al. [101] and Gombao [102]. Although DL has shown promising performance in modeling static data (e.g., computer vision and image classification [31]), applying it to time-series data has not yet been well studied and explored (it is understudied). Längkvist et al. [101] and Gombao [102] describe current challenges, projects, and works that either applied DL to time-series data analysis or modified DL to account for the current challenges in time-series data.
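In practice, the first step before feeding such signals to a CNN or LSTM is usually sliding-window segmentation, as in the short sketch below (the window width and stride are illustrative assumptions):

```python
import numpy as np

# Sliding-window segmentation: the usual preprocessing step before
# feeding wearable time-series into a CNN/LSTM (parameters assumed).
def sliding_windows(signal: np.ndarray, width: int = 128, stride: int = 64):
    """signal: (time, channels) -> (n_windows, width, channels)."""
    starts = range(0, len(signal) - width + 1, stride)
    return np.stack([signal[s:s + width] for s in starts])

accel = np.random.randn(10_000, 3)  # 10k triaxial accelerometer samples
windows = sliding_windows(accel)    # shape (155, 128, 3) in this case
```

Each window then becomes one training example for temporal pattern recognition, labeled or unlabeled depending on the setting.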

5.4. DL for Mobile Big Data Analytics

The availability of smartphones and IoT gadgets led to the recent mobile big data (MBD) era. Collecting MBD is profitable only if there are learning methods for analytics that can recognize the hidden spatial and temporal patterns in the collected MBD. Alsheikh et al. [104] propose DL in MBD analytics as a scalable learning framework over Apache Spark. Mobile crowdsensing is an efficient MBD collection approach combining crowd intelligence, smartphones, wearables, and IoT devices (gadgets). Regarding MBD analytics, Alsheikh et al. [105] focus on the accuracy and privacy aspects of mobile and people-centric crowdsensing as a true MBD collection approach for service providers. DeepSpace, by Ouyang et al. [103], is a DL approach for MBD analytics applied to predicting human trajectories by understanding mobility patterns. DeepSpace [103] is composed of two models: coarse and fine prediction models.

5.5. DL for Mobile Wireless Sensor Network Data Analytics

Marjovi et al. [106] explain how to collect data using a mobile wireless sensor network (WSN) on public transportation vehicles and analyze it using DL (AE) for temporal pattern recognition.

5.6. DL for EEG Data Analytics

Stober et al. [107–110, 113] apply DL approaches for the classification and recognition of EEG recordings of rhythm perception. They specifically applied stacked AEs and CNNs to the collected EEG data to distinguish rhythms at the group and individual participant levels. Given the EEG data, Stober et al. [107–110, 113] use DL for the detection and classification of EEG signals in terms of rhythm types and genres. Wulsin et al. [111] also model EEG waveform data (brain time-series signals) for anomaly measurement, detection, and recognition (classification) using DL approaches, specifically the DBN. Narejo et al. [112] classify EEG data (brain time-series signals) for eye states using DUL, specifically DBN and AE. DL for compressed sensing in brain-computer interfaces (BCI) is demonstrated by Ma et al. [114] for extracting motion-onset visual evoked potential (mVEP) BCI features. Ma et al. [114] combine DL with compressed sensing to analyze discriminative mVEP features and improve mVEP BCI performance, demonstrating the effectiveness of DL for extracting the mVEP feature in compressed-sensing BCI systems.

5.7. DL for Physiological Data Analytics

Wang and Shang [115] modeled physiological data (time-series biometric signals) using DL, specifically the DBN. The DBN, as a DUL approach, can automatically extract features from raw physiological data of multiple channels. Using the pretrained DBN, Wang and Shang [115] built multiple classifiers to predict the levels of arousal, valence, and liking based on the learned features. Based on the experimental results, a DBN applied to raw physiological data effectively learns relevant features and emotional patterns and predicts emotions.

5.8. DL for Big Data Analytics

Big data analytics and DL are two highly focused areas in data science. Big data is the result of collecting massive amounts of data with useful information in different domains such as national intelligence, cybersecurity, fraud detection, marketing, and medical informatics [147]. DL can extract high-level abstractions as data representation layers through a hierarchical learning process. A key benefit of DL is analysis through learning from massive amounts of unsupervised data. This key benefit makes DL an extremely valuable tool for big data analytics, since the available raw data are largely unlabeled, unannotated, and uncategorized. Najafabadi et al. [116, 117] explore how DL is utilized for big data analytics: extracting complex patterns from massive volumes of data, semantic indexing, data tagging, fast information retrieval, and simplifying discriminative tasks. Najafabadi et al. [116, 117] also investigate DL in terms of analyzing streaming data, high-dimensional data, scalability of models, and distributed computing.

5.9. DL for Mobile Gait Analytics

Hannink et al. [121–123] estimate mobile stride length in human gait using DL, specifically deep CNNs. Spatial gait pattern recognition and mobile gait analysis are performed in [121–123] to address motor impairment in neurological disease. A deep CNN is used for stride length estimation, mapping stride-specific inertial sensor data to the resulting stride length.

5.10. Embedded DL for Inertial Data Analytics

In Ravi et al.’s studies [124–127], DL is applied to inertial sensor data analysis for real-time human activity recognition and classification.

5.11. DL for Electronic Healthcare Records Data Analytics

dos Santos and Carvalho [128] discuss DL applications in health-care management and diagnostics, as most of the studies suggest DL for clinical diagnosis due to its accurate pattern recognition of disease in electronic medical records (EMR). Based on dos Santos and Carvalho [128], DL assists in medical decisions, the accuracy of diagnoses, and medical treatment recommendations. DL for clinical data analysis is discussed in Miotto et al. [129–131]. DeepPatient [129] is an application of DL for massive patient electronic health-care record (EHR) data analytics and prediction. Miotto et al. [129, 131] clearly demonstrate the transition from ML approaches [130] to DL due to the fact that DL outperformed ML on patients’ massive EHR datasets. Choi et al. [132–134, 136] review DL approaches and applications for EHR in population health research.

5.12. DL for Electronic Medical Records Data Analytics

An electronic medical record (EMR) is a digital version of the paper chart containing the patient’s medical history. Personalized predictive medicine requires modeling the long-term temporal patterns of patient illness and care processes.

5.12.1. DeepCare: Personalized Medicine Recommender System

DeepCare [139] analyzes and recognizes the long-term temporal patterns in patients’ EMRs. Health-care observations, recorded in EMRs, are episodic and irregular in time. EMRs are collected via health-care observations, the patient’s disease, and personal care history. DeepCare [139] reads EMRs, predicts future medical outcomes, and recommends proper medications. DeepCare models patient health state trajectories with explicit memory of illness. Built on LSTM [58], DeepCare introduces time parameterizations to handle irregular timing by moderating the forgetting and consolidation of illness memory.

5.12.2. Deepr: Deep Record for EMR Data Analytics

Nguyen et al. [137] propose DeepR (deep record) for analyzing massive EMRs in medicine. DeepR [137] is a predictive system for analyzing EMRs and detecting predictive regular clinical motifs from irregular episodic records. DeepR is an end-to-end DL system that extracts features from EMRs, automatically predicts future risk, and transforms a record into a sequence of discrete elements separated by coded time gaps and hospital transfers.

5.12.3. Deep Reinforcement Learning for Clinical EMR Data Analysis in Medication Dosing

Nemati et al. [138] optimize medication dosing from suboptimal clinical examples using a DRL approach. A clinician-in-the-loop sequential decision-making framework [138] is proposed to derive an individualized dosing policy for each patient’s evolving clinical phenotype, using the publicly available MIMIC II intensive care unit database. The DRL agent learns an optimal heparin dosing policy from sample dosing trials and their associated outcomes in large EMRs. The proposed DRL system [138] demonstrates that a sequential modeling approach, learned from retrospective data, could potentially be used at the bedside to derive individualized patient dosing policies.

5.13. DL for ECG Data Analytics

Wearables have enormous potential to provide low-risk and low-cost long-term monitoring of electrocardiography (ECG), but these signals suffer from significant movement-related noise. Shashikumar et al. [140] present DL-based atrial fibrillation (AF) detection over a sequence of short windows with significant movement artifact. Pulsatile photoplethysmographic (PPG) data and triaxial accelerometry were captured using a multichannel wrist-worn device, and a single-channel electrocardiogram (ECG) was recorded simultaneously (for rhythm verification only). A DL approach was developed on these data to classify AF from wrist-worn PPG signals: a continuous wavelet transform was applied to the PPG data, and a CNN was trained on the derived spectrograms to detect AF.
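The front end of such a pipeline can be sketched as follows; for simplicity, this example substitutes an STFT spectrogram (via `scipy.signal.spectrogram`) for the continuous wavelet transform used in [140], and the sampling rate and window length are assumptions. The resulting time-frequency image is what a CNN classifier like the one above would consume.

```python
import numpy as np
from scipy.signal import spectrogram

# Sketch of a time-frequency front end for PPG-based AF detection.
# An STFT spectrogram stands in here for the paper's wavelet transform.
fs = 64                             # assumed PPG sampling rate (Hz)
ppg = np.random.randn(fs * 30)      # stand-in for a 30 s PPG window
f, t, Sxx = spectrogram(ppg, fs=fs, nperseg=256, noverlap=192)
image = np.log1p(Sxx)[None, None]   # (1, 1, freq, time) tensor for a CNN
```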

5.14. DL for Cybersecurity Data Analytics

DeepSpying [100] is a mobile-sensing framework combining data collection with DL-based data analytics in the information security (i.e., cybersecurity) domain to protect individual privacy. DeepSpying [100] pioneers WT-based data collection and DL-based data analysis for patients’ information security and privacy protection.

5.15. DL for Smartglass and Smartglove Data Analytics

Advani et al. [4] build a multitask AI visual-assistance system for helping visually impaired people in grocery shopping, using a smart glass, a smart glove, and shopping carts to provide auditory and tactile feedback. This AI system [4] is part of the Visual Cortex on Silicon project, aimed at developing interfaces, algorithms, and hardware platforms to assist the visually impaired with a focus on grocery shopping.

5.16. DL for Wearable 3D Point Cloud Data Processing and Analytics

Poggi et al. [141, 142] recognize crosswalks (i.e., crosswalk recognition) on the route using DL for point-cloud processing (i.e., 3D data learning) with a wearable mobility aid suitable for visually impaired people. Poggi and Mattoccia [142] present a wearable mobility aid for visually impaired individuals using embedded 3D vision and a DL-based approach. Poggi et al. [141] rely on an RGBD camera and an FPGA embedded in wearable eyeglasses for effective point-cloud data processing with a compact and lightweight embedded computer, which also provides feedback to the user through a haptic interface as well as audio messages. Crosswalk recognition, a crucial requirement in the effective design of a mobility aid, is addressed in [141] by a system that detects and categorizes crosswalks by leveraging point-cloud processing and DL techniques. Ji et al. [143] process 3D data using CNNs for HAR. They develop a novel 3D-CNN model for action recognition of both spatial and temporal patterns, using 3D convolutions to capture the motion information encoded in multiple adjacent frames. They also apply the developed models to HAR in the real-world environment of airport surveillance videos.

5.17. DL for Multimodal Physiological Data Analytics

Du et al. [144, 145] discuss the effects of DL on mood prediction. In these works [144, 145], a combination of auditory, text, and physiological signals is utilized to predict the mood (happy or sad) of 31 narrations from subjects engaged in personal story-telling. They extracted 386 audio and 222 physiological features (using the Samsung Simband wearable) from the data. A subset of 4 audio, 1 text, and 5 physiological features was identified using sequential forward selection (SFS) for inclusion in the DNN. These features included subject movement, cardiovascular activity, energy in speech, probability of voicing, and linguistic sentiment (i.e., negative or positive).

6. WearableDL: Future Insights

We have presented a biologically inspired architecture, “WearableDL”, for wearable big data analytics that resembles the human NS. We also briefly reviewed the current frontiers in AI, specifically DL approaches and architectures. We carefully selected more than 100 recently published research articles related to the WearableDL architecture, with a focus on DL, IoT, and WT (Section 5). Although WearableDL meets obstacles and challenges, we believe that it could be practically and potentially useful, especially when the wearable data are massive (volume), heterogeneous (variety), and sampled at different frequencies (velocity). In this section, we intend to provide our future view of “WearableDL” challenges and its potential application to wearable big data analytics.

6.1. Health Insurance Decision Making

DL brings great promise and could increase the value of wearable big data by making them actionable; e.g., health insurance companies thrive on data to minimize costs. Therefore, it becomes extremely important for them to learn more about their customers and their lifestyles. They are also interested in knowing how often their customers perform physical activities such as walking, jogging, or other exercises. The health insurance industry wants to track whether their customers have smoking or drinking habits. Due to the promise of accuracy, precision, and efficiency, the application of DL to personalized wearable data can play a major role in estimating insurance policy costs and also in giving rebates if customers cultivate healthy habits [148]. The trend of data-driven health insurance policies has already been considered in many regions, including North America, Europe, and Asia.

6.2. High Performance in Sports and Athletics

Another area that will be impacted by DL is the billion-dollar sports industry. The performance of athletes is not only a moment of pride for their country or state team but also an economic asset, and therefore athletes strive to outperform. Today, they use WT in their training to improve their performance inch by inch [149]. Such precision in their performance also demands that WT offer fine-grained quality in the measurement of the body’s kinematic motion, such as agility and balance, and physiological parameters such as heart rate, oxygenation, and muscular strength. Various DL methods can be applied to analyze highly sampled wearable big data and extract actionable information to improve sports performance. DL could also help detect sports injuries during the game or in training, so effective decisions can be made in time.

6.3. Supporting Elderly Population

Population aging across the globe is a well-known phenomenon. By 2030, 20% or more of the population will be 65+ years of age [150]. This indicates that we will need to seek technological solutions to support senior citizens, who are more prone to disorders, severe health conditions, and injuries due to decaying physical and mental capabilities. In the last decade, WT have specifically been targeted at providing health-care services and comfortable assisted living. However, it is not enough to just collect the data from WT. It is equally important to personalize the WT to the specific condition experienced by an elderly individual. DL could fill this gap by learning the daily patterns in the wearable big data and by offering decision makers the relation between historic and current data. In this way, DL could enable the prediction of underlying health conditions which are often not detected by WT alone.

6.4. Challenges

Although DL comes with several promises for wearable big data, it also needs to overcome a number of barriers and obstacles for its widespread adoption.

6.4.1. Unlabeled Wearable Big Data

This is a common and important problem when it comes to analyzing wearable big data, since these data are often collected in a completely unlabeled or unannotated fashion. That is why UL is becoming an important scope for applying DL to the collected big data. As reviewed above, this scope is known as DUL, and it is still an active area of research, specifically when it comes to wearable big data, which are time-series and sequential. Sequence learning, using LSTM, RNN, and ConvLSTM, is one of the attractive ways of approaching this problem.

6.4.2. Computational Bottlenecks, Demand, and Complexity

Currently, deep models face the burden of computational demand to achieve exceptional performance on large datasets. These models are currently aimed to run on cloud servers. However, fog computers, which require lightweight algorithms, will demand new types of DL models that learn from small datasets. As also mentioned in Section 5 and Table 2, embedding DL into mobile, wearable, and IoT devices has two important bottlenecks: memory bandwidth for matrices and computational power for matrix multiplication operations in parallel or distributed settings.

6.4.3. Data Reliability

In many situations, data collected by wearable devices can be affected by noise and error due to nonideal collection settings, particularly for structured and complex data. In this regard, wearable devices can be designed to perform a presifting and prefiltering of the data. Furthermore, DL can be applied to identify and isolate the corrupted data in the decision-making process. DL can generalize the data in an extraordinary way, and that is how it can isolate corrupted/noisy data and identify the distinctive, repetitive, and robust spatiotemporal patterns in such data.

Abbreviations

AGI:Artificial general intelligence
AI:Artificial intelligence
ANN:Artificial neural net
Backprop:Backpropagation
BCI:Brain-computer interface
BP:Backpropagation
BPTT:Backpropagation through time
CL:Cortical learning
CLA:Cortical learning algorithm
ConvLSTM:Convolutional LSTM
CNN:Convolutional neural net or ConvNet
CNS:Central nervous system
DBN:Deep belief network
DBS:Deep brain stimulation
DL:Deep learning
DRL:Deep reinforcement learning
DSL:Deep supervised learning
DNN:Deep neural network
DQN:Deep Q-network
DUL:Deep unsupervised learning
EA:Evolutionary algorithm
ECG:Electrocardiography
EEG:Electroencephalography
EM:Expectation maximization
EMG:Electromyography
EMR:Electronic medical record
FCN:Fully connected network
FFNN:Feed-forward neural network
FLANN:Fast library for approximate nearest neighbor
FPGA:Field programmable gate arrays
GAN:Generative adversarial nets
GNMT:Google neural machine translation
GPU:Graphical processing unit
GQN:Generative query network
GRU:Gated recurrent units
HR:Heart rate
HPC:High-performance computing
HTM:Hierarchical temporal memory
HAR:Human activity recognition
IoT:Internet-of-Things
IRL:Inverse reinforcement learning
IC:Integrated circuits
ICA:Independent component analysis
KNN:K-nearest neighbors
LDA:Linear discriminant analysis
LSTM:Long short-term memory
ML:Machine learning
MLP:Multilayer perceptron
MBD:Mobile big data
MSE:Mean-squared error
NS:Nervous system
NN:Neural net
NLP:Natural language processing
PNS:Peripheral nervous system
PCA:Principal component analysis
RBM:Restricted Boltzmann machine
RL:Reinforcement learning
RNN:Recurrent neural nets
STDP:Spike-time-dependent plasticity
SNN:Spike neural net
SL:Supervised learning
SSE:Sum of squared error
SVM:Support vector machine
TL:Transfer learning
UL:Unsupervised learning
VAE:Variational autoencoder
WIoT:Wearable Internet-of-things
WT:Wearable tech.
WearableDL:Wearable deep learning.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This material is based upon work supported by the National Science Foundation (NSF) under grant numbers 1652538 and 1565962. Research reported in this publication was also supported by the National Institute of Mental Health/National Institutes of Health (NIH) under award no. 1R01MH108641-01A1.