Mobile Information Systems
Volume 2018, Article ID 8125126, 20 pages
Review Article

WearableDL: Wearable Internet-of-Things and Deep Learning for Big Data Analytics—Concept, Literature, and Future

1Wearable Biosensing Lab, Department of Electrical, Computer, and Biomedical Engineering, University of Rhode Island, Kingston, Rhode Island, USA
2Intelligent Control & Robotics Lab, Department of Electrical, Computer, and Biomedical Engineering, University of Rhode Island, Kingston, Rhode Island, USA

Correspondence should be addressed to Aras R. Dargazany; ude.iru@radsara

Received 23 July 2018; Accepted 2 October 2018; Published 14 November 2018

Guest Editor: Giuseppe De Pietro

Copyright © 2018 Aras R. Dargazany et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


This work introduces Wearable deep learning (WearableDL), a unifying conceptual architecture inspired by the human nervous system, offering the convergence of deep learning (DL), Internet-of-things (IoT), and wearable technologies (WT) as follows: (1) the brain, the core of the central nervous system (CNS), represents deep learning for cloud computing and big data processing. (2) The spinal cord (a part of the CNS connected to the brain) represents Internet-of-things for fog computing and big data flow/transfer. (3) Peripheral sensory and motor nerves (components of the peripheral nervous system (PNS)) represent wearable technologies as edge devices for big data collection. In recent times, wearable IoT devices have enabled the streaming of big data from smart wearables (e.g., smartphones, smartwatches, smart clothing, and personalized gadgets) to cloud servers. Now, the ultimate challenges are (1) how to analyze the collected wearable big data without any background information and without any labels representing the underlying activity; and (2) how to recognize the spatial/temporal patterns in this unstructured big data to help end-users in the decision-making process, e.g., medical diagnosis, rehabilitation efficiency, and/or sports performance. Deep learning (DL) has recently gained popularity due to its ability to (1) scale to the big data size (scalability); (2) learn feature engineering by itself (no manual feature extraction or hand-crafted features) in an end-to-end fashion; and (3) offer accuracy and precision in learning from raw unlabeled/labeled (unsupervised/supervised) data. In order to understand the current state of the art, we systematically reviewed over 100 similar and recently published scientific works on the development of DL approaches for wearable and person-centered technologies. The review supports and strengthens the proposed bioinspired architecture of WearableDL.
This article eventually develops an outlook and provides insightful suggestions for WearableDL and its application in the field of big data analytics.

1. WearableDL: Conceptual Architecture

WearableDL is a concept derived from a holistic comparison between the evolving big data system and the human nervous system (NS) in terms of architecture and functionality. Although the human NS is a biological mechanism, it essentially inspires the convergence, collaboration, and coordination of three key elements, wearable tech (WT), Internet of things (IoT), and deep learning (DL), in the development of a big data system for actionable outcomes and informed decision making.

The article views the big data system in light of its close resemblance to the human nervous system (NS). The NS is responsible for coordinating actions such as the transmission of signals to and from the human body, identification, perception, decision making, and information storage [1]. Similarly, the big data system (or model) is evolving and converging various domains such as wearable sensors, edge computing, fog computing, cloud computing, and deep learning (DL) to achieve equivalent functions such as signal communication, perception, decision making, analytics, and storage. As the complexity of the big data system rises, it becomes important to understand the architectural and functional components of the NS. This could guide us in developing a more sophisticated version of a big data system.

1.1. A Brief Overview of the Human Nervous System

The NS is composed of two subsystems:
(1) Central nervous system (CNS), consisting of the brain and spinal cord
(2) Peripheral nervous system (PNS), consisting of nerves with sensory and motor fibers

1.1.1. PNS

The end elements of the PNS are sensory and motor fibers, which are connected to the parts and organs of the body. The sensory fibers sense various sensations, including pressure, temperature, and pain on the body, and send them to the nerves leading to the spinal cord (a part of the CNS). The motor fibers receive commands from the CNS to actuate and activate the muscles and organs. The bundles of fibers that collectively form the nerves connected to the spinal cord relay information back and forth between the CNS and PNS.

1.1.2. CNS

The spinal cord is a part of the CNS which serves two purposes:
(1) It acts as a bidirectional relay for the signals flowing between the body and the brain. This function supports the NS in making centralized decisions.
(2) It coordinates the reflexes, in which decisions are made in real time to avoid delays in critical conditions. A simple example of a reflex is removing the hand from a hot object.

The ultimate top layer of the CNS is the human brain, made of approximately 100 billion neurons [1]. Each neuron connects to one or more other neurons. The brain receives signals from the spinal cord and other sensory organs such as the eyes, nose, tongue, and ears. The brain processes the incoming signals and makes decisions. It generates commands that pass through the spinal cord to the PNS. The commands activate the muscles or organs of the body. Apart from processing and decision making, the brain also stores information that is used in short- or long-term decision making.

1.2. PNS vs Wearable Tech/Wearable Edge Devices for Big Data Collection and Application (Actions)

WT is comparable to the sensory and motor fibers of the PNS because of the following:
(i) Fibers are carriers of information, similar to the WiFi backbone in WT
(ii) WT is located on the periphery of the IoT architecture, interacting with the environment for sensing and actuation

For example, modern smartwatches come with built-in sensors such as heart rate, motion, and ambient light, as well as actuators including the touch screen, audio speakers, and tactile (or haptic) feedback. Edge devices, such as smartphones, act similarly to the nerves (as part of the PNS). The smartphone receives data from the connected smartwatch sensors and also commands the smartwatch to alert the wearer through haptic, visual, or audio feedback. This helps us collect the sensor data and send them to the upper layers of the architecture.

1.3. CNS-Spinal Cord vs IoT and Fog Gateways for the Big Data Flow and Local Intelligence

IoT and fog gateways are equivalent to the spinal cord in the CNS (Figure 1) as follows:
(1) Big data transfer (flow) between the PNS (sensory and motor nerves) and the brain (the central processing and intelligence unit)
(2) Local intelligence for locally responding to some stimuli such as extreme heat and pain

Figure 1: The WearableDL conceptual architecture: the human nervous system as the main biological model and inspiration (right) vs the human-made computing model as the actual architecture (left).

As described earlier, the spinal cord plays an important role in reacting and responding to specific stimuli, such as feeling pain and reacting to it, e.g., when caused by extreme heat. It is also responsible for delivering motor responses and reactions from the brain to the PNS (our motor actuators and muscles) for any dynamic (kinematic) movement, internal or external, i.e., inside or outside our body. IoT intelligence, as a local intelligence, functions similarly to the spinal cord. For example, smartwatch sensor data can be processed on fog gateways located in homes or hospitals, away from the centralized cloud servers. In this case, the sensor data are processed on the gateway for local decision support in time-critical applications; e.g., the sensor data streams could help detect a fall event in an elderly person living alone at home. In this way, the fog or IoT gateway provides reflex-type services to alert appropriate individuals, such as medical personnel or caretakers, to respond to the event immediately. This reduces potential delays in time-sensitive events.
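As a concrete illustration of such a reflex-type service, the sketch below flags a fall from a window of 3-axis accelerometer samples on the gateway itself. The thresholds, function names, and sample format are hypothetical illustrations, not taken from any cited work; a deployed detector would use validated models and parameters.

```python
import math

# Hypothetical reflex-type fall detector for a fog gateway.
# Thresholds are illustrative only, not clinically validated.
FREE_FALL_G = 0.4   # near-weightlessness during the fall (in g)
IMPACT_G = 2.5      # sharp spike on impact (in g)

def magnitude(sample):
    """Euclidean norm of a 3-axis accelerometer sample (ax, ay, az) in g."""
    ax, ay, az = sample
    return math.sqrt(ax * ax + ay * ay + az * az)

def detect_fall(window):
    """Flag a fall if a free-fall dip is followed by an impact spike
    within the same window of accelerometer samples."""
    mags = [magnitude(s) for s in window]
    for i, m in enumerate(mags):
        if m < FREE_FALL_G:
            # look for the impact spike after the free-fall dip
            if any(m2 > IMPACT_G for m2 in mags[i + 1:]):
                return True
    return False

# A quiet window (~1 g throughout) vs. a window with a dip then a spike.
quiet = [(0.0, 0.0, 1.0)] * 10
fall = quiet[:4] + [(0.1, 0.1, 0.2), (0.5, 0.4, 2.9)] + quiet[:4]
```

Running `detect_fall` on the gateway, rather than in the cloud, is what removes the round-trip delay in this reflex analogy.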

1.4. CNS-Brain vs DL and Cloud Computing for Big Data Analytics

The cloud computing servers are equivalent to the physical architecture of the brain, and DL-based big data analytics resembles the function of the brain. The human brain is a centralized processor that receives incoming stimuli from the spinal cord (connected to the PNS) or other sensory organs. Upon receiving them, it perceives and decides how and when to respond to the stimuli. It also stores the information. Similarly, the cloud computing servers receive the big data from WT via fog computing. Upon receiving them, they use high-performance computers to apply DL methods (explained in the next section) that help in decision making. Very similar to the brain, the cloud computers decide when and how to respond to incoming queries. They often store the sensor data to learn patterns and create a historical database that enables informed decision making in the future.

1.5. Outline and Contributions

In this article, we endeavor to describe the benefits and challenges associated with the use of DL on wearable big data. We have conducted a thorough survey of more than 100 publications related to DL and its applications in wearable IoT. The survey allowed us to create a holistic picture combining wearable sensors, IoT, DL, and big data. This work’s key contributions are structured as follows:
(i) Section 2 provides an overview of wearable IoT, including the concept, the different categories of wearable IoT devices, and its future direction in a nutshell.
(ii) Section 3 provides a research roadmap for DL through an understanding of its past, present, and future. Here, we focus on how understanding the human brain, specifically the neocortex, links to the development of artificial intelligence (AI), and how AI is mainly divided into three areas covered in this section: ML, DL, and cortical learning (CL).
(iii) Section 4 emphasizes recent applications of WearableDL in big data analytics. Over 100 recently published works were reviewed and included in this section to correlate with the paradigms of WearableDL and its applications.
(iv) Section 5 projects future research and application directions for WearableDL in association with wearable big data.

2. Wearable Internet-of-Things

In 1965, an observation, later regarded as Moore's law, estimated that the number of transistors on integrated circuits (IC) doubles every two years [2]. Moore's law prediction played an important role in the semiconductor industry and motivated the evolution of miniaturized yet high-performance computing (HPC) chips which revolutionized the modern world. This evolution caused an explosion in the production of electronic devices and therefore brought a limitless expansion in the use and applicability of the computing chips that today drive smart wearable devices, smartphones, personal computers, smart homes, and smart cities along with WiFi, Internet, and other communication devices. As a result of the aforementioned evolution, explosion, and expansion, the wearable devices are booming in the market, and therefore, we witness the growth of personalized big data that hold a significant value to the end-users including citizens, communities, hospitals, and governments to improve health or performance, reduce medical cost, and increase efficiency [3].

2.1. Wearable Devices Categories

Overall, the wearable devices can be categorized into three main classes (Figure 2):
(1) Implantable devices: these devices are implanted inside the body for a long period of time; e.g., cardiac pacemakers and deep brain stimulators are implanted for 5–10 years to provide current to specific organs.
(2) Wearable contact devices: this is the largest category among the three. These devices are designed to stay on the body unobtrusively and collect various parameters, including heart rate, physical activity, body temperature, muscular activity, blood/tissue oxygenation, and other physiological parameters. The most common devices in this category are smartwatches, smart clothing, smart footwear, fitness trackers, HR chest belts, and ECG Holter monitors.
(3) Wearable ambient monitors: these devices sense the outside environment instead of the body’s physiological state. Google Glass (smart glasses) is a simple example of this category, in which a wearable camera allows the wearer to record the surrounding scenes [4].

Figure 2: Different types of wearable IoT-based devices for wearable big data collection.
2.2. Wearable IoT: Convergence of Wearable Devices and Internet-of-Things (IoT)

The convergence and deployment of wearable devices, Internet-of-things (IoT), and cloud computing together allow us to record, monitor, and store a wide range of big data from individuals, such as personalized health and wellness data, body vital parameters, physical activity, and behaviors, which are all critical data indicating the quality and the trend of daily life [5]. In the past, wearable devices were stand-alone systems. However, bringing wearable devices into the framework of IoT makes it possible to stream the data from an individual to a centralized location such as cloud servers. The continuous accumulation of wearable data forms massive big data [6]: in general, a set of sequential time-series signals and logs containing biometric, behavioral, physiological, and biological information, depending on the nature of the wearable devices categorized above. One of the key objectives of collecting the wearable big data is to support remote or on-site decision making by detecting symptoms, events, and anomalies, or by producing contextual awareness [7].

2.3. Wearable Data Categories

Wearable biosensing devices can collect a large variety of physiological data continuously, all day long and in any place, for health, mental, and activity status monitoring. These multiparameter physiological sensing systems provide reliable and crucial measurements for supporting online decision making by detecting symptoms and producing contextual awareness [8]. A wide range of wearable data in biomedicine and health is surveyed in this overview [9].

2.4. Emerging Unobtrusive Wearable Devices

Wearable sensors can be either woven or integrated into clothing, accessories, and the living environment, such that individuals’ or patients’ data can be collected in their daily life. According to an overview [10], the four emerging unobtrusive wearable technologies (WT) essential for collecting individuals’ health data are the following:
(1) Unobtrusive sensing methods
(2) Smart textile technology
(3) Flexible-stretchable-printable electronics
(4) Sensor fusion

2.5. Data Reliability

Data reliability strongly depends on the type of collected data and, specifically, on the category of the collected wearable data in general. In the WearableDL scenario, it is not the role of the wearable devices to assess data reliability. A presifting of the data, particularly in the case of structured data, can be implemented directly on the device by embedding data-sifting policies dictated by prior interaction with medical specialists, physicians, and studies. However, data reliability should be assessed at the IoT and DL levels, as discussed later.

3. Artificial Intelligence

Artificial intelligence (AI) is ultimately the ability to reconstruct human biological intelligence in modern machines. The AI domain is currently divided into three active learning-based areas of research: machine learning (ML) [11, 12], deep learning (DL) [13], and cortical learning (CL) [14] (Figure 3).

Figure 3: Comparing AI and brain along with machine learning vs deep learning vs cortical learning.
3.1. Cortical Learning

Cortical learning (CL) is inspired by our cortical structure (i.e., based on studying the neocortex) and was coined by Hawkins et al. [14, 15]. The cortical area is the largest part of the brain in humans and monkeys compared to other species and the main source of our intelligence [14, 16]. The CL algorithm/approach (CLA), inspired by the architecture of the human cortex [14, 17], is applied in an approach called hierarchical temporal memory (HTM) [16, 18–20]. The cortex learns the spatial and temporal patterns in sequential data, e.g., for visual perception, spoken language comprehension, manipulating objects, and navigating a complex 3D world [17].

3.2. Machine Learning

Machine learning is the mother subject of deep learning and many other statistical or probabilistic analysis approaches, but it is not necessarily related to CL, which is a neuroscience-based endeavor toward AI (computational or systems neuroscience). ML mostly refers to shallow approaches that are not scalable to the data size. This set of shallow artificial learning algorithms [11, 12] helps machines directly learn from the data, model the data, and generate machine intelligence. ML is strongly founded on mathematics, e.g., linear algebra, calculus, statistics, probability, and stochastic optimization approaches such as evolutionary algorithms (EA) and Monte Carlo search. Some of the limitations of ML are as follows:
(1) It is very broad and often mathematically proven but not biologically inspired. This is a problem since biologically inspired algorithms, such as genetic algorithms, often prove to be extremely powerful and robust. On top of that, AI targets biological intelligence in the first place and ultimately aims to replicate/reconstruct our biological, human intelligence.
(2) It is often shallow and not scalable to the data size; i.e., as the data size or dimensionality grows exponentially (the big data problem), traditional ML approaches (e.g., SVM) cannot scale up to the data size. This causes a problem called under-fitting, which means there are not enough parameters in the learning approach to approximate the best-fitting function.
(3) It is also hard to apply directly to high-dimensional data. That is why we first have to apply dimension reduction to the data, by manually doing feature extraction or engineering (hand-crafted features), and then apply the ML approach to the data features with the reduced dimensionality.
(4) The accuracy and robustness of ML approaches on noisy data are hardly comparable to DL approaches, since ML approaches learn from few examples or small training data, whereas DL approaches are capable of learning from massive datasets (big data).
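The under-fitting limitation above can be seen in a toy example: a model with too few parameters structurally cannot match the data-generating function, no matter how it is trained. The target function and polynomial degrees below are illustrative choices, not tied to any cited work.

```python
import numpy as np

# A small illustration of under-fitting: a model with too few parameters
# (a straight line) cannot approximate data generated by a quadratic,
# while a model with enough parameters fits it closely.
x = np.linspace(-1, 1, 50)
y = 3 * x**2 + 0.5 * x          # noiseless quadratic target

def fit_residual(degree):
    """Worst-case absolute error of the best polynomial fit of a given degree."""
    coeffs = np.polyfit(x, y, degree)
    return float(np.abs(np.polyval(coeffs, x) - y).max())

underfit = fit_residual(1)   # linear model: structurally too small
good = fit_residual(2)       # quadratic model: matches the target family
```

No amount of extra training data fixes the degree-1 model; only adding capacity does, which is the gap DL fills for big data.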

3.3. Deep Learning

DL approaches differ from shallow ML algorithms in terms of scalability, i.e., depth (number of hidden layers) and width (number of cells, units, or neurons in each layer). DL (or deep ML) is a scalable ML approach capable of scaling to the data size in terms of a high number of data samples or data dimensionality. DL is applied to artificial neural networks (ANN or NN), which is why it is also known as deep neural networks (DNN) [13, 21]. DL is the ability to learn deep NN architectures using backpropagation (BP) [22, 23]. Error backpropagation [24], proposed in 1986 for training multilayer perceptrons (MLP), is the dominant training approach for NN: the error between the predicted output and the given labels is propagated back into the network to fine-tune the weights, minimizing a loss/cost function over a nonconvex surface. DL is loosely inspired by the visual cortex [25–27]. It mimics our brain [27] in terms of learning and recognizing the spatial and temporal (spatiotemporal) patterns in the data. DNNs are basically deep hierarchical layers of perceptrons [28], as artificial neurons, for representation and regression learning [29, 30].
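As a minimal sketch of the training loop just described, the NumPy snippet below backpropagates the output error of a one-hidden-layer network to fine-tune its weights on the classic XOR problem. The layer sizes, learning rate, and iteration count are illustrative choices, not values from the cited works.

```python
import numpy as np

# Minimal sketch of error backpropagation in a one-hidden-layer MLP,
# trained on the classic XOR problem with a squared-error loss.
rng = np.random.default_rng(0)

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])      # XOR: not linearly separable

W1 = rng.normal(0.0, 1.0, (2, 16)); b1 = np.zeros(16)
W2 = rng.normal(0.0, 1.0, (16, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for _ in range(10000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass: propagate the output error toward the input
    d_out = (out - y) * out * (1.0 - out)   # error at the output layer
    d_h = (d_out @ W2.T) * h * (1.0 - h)    # error pushed back to hidden layer
    # gradient-descent weight updates (fine-tuning the weights)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

h = sigmoid(X @ W1 + b1)
loss = float(((sigmoid(h @ W2 + b2) - y) ** 2).mean())
```

The same two-phase loop (forward pass, then backward error propagation) scales up to the deep CNN and DBN architectures discussed later.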

4. Deep Learning

The research question of "How can the massive wearable big data be analyzed to produce actionable outcomes?" is difficult to answer when the wearable big data are heterogeneous, unlabeled, and unstructured. This means the wearable big data call for unsupervised learning methods that can not only analyze the data but also identify helpful patterns leading to informed decision making. In recent years, deep learning (DL) has been established as a new area of machine learning research which aims to advance artificial intelligence [13]. A plethora of studies provide evidence that DL has achieved state-of-the-art results in various fields related to computational intelligence and big data, including computer vision and image processing [31], speech processing [32], natural language processing (NLP), and machine translation [33, 34]. Similarly, these DL advancements bring a new promise to analyze the unsupervised wearable big data in order to recognize the spatial and temporal patterns related to health, wellness, medical conditions, sports performance, and safety (Table 1).

Table 1: The AI domain and DL review table.

Deep learning (DL) is gaining interest exponentially in the research and development (R&D) community, in both academia and industry, as it is being heavily invested in by giant software and hardware companies such as Google, Nvidia, and Intel [35–37].

4.1. Deep Learning History: Receptive Fields of Neurons Inspired from Cat’s Visual Cortex

Simple cells and complex cells [38] were found in the receptive fields of single neurons in the cat’s visual cortex. The discovery of these receptive fields [38] contributed enormously to the NN, AI, computer vision, and neuroscience communities (demonstrated visually with a timeline in Figure 4).

Figure 4: The simplified research roadmap for DL: (past) how it was inspired by visual cortex research, (present) how it is related to wearable IoT and big data analytics, and (future) how it is connected to cortical learning. The related works mentioned/included in this figure are the following: visual cortex by Hubel and Wiesel [38]; neocognitron by Fukushima and Miyake [39]; Backprop by Rumelhart et al. [24]; CNN (deep learning) by LeCun et al. [13]; Capsule networks by Sabour et al. [40]; STDP-backprop by Bartunov et al. [41]; HTM (cortical learning) by Hawkins et al. [16].
4.2. Deep Learning History: The First Conceptual Architecture—Cognitron and Neocognitron

The discovery of simple and complex cells was followed by the introduction of the cognitron and neocognitron (by Fukushima et al. in 1975 [39, 42–44]). The neocognitron (the first proposed deep NN architecture) was the inspiration behind the introduction of convolutional neural networks (CNNs) by LeCun in 1989 [45] (as shown in the past part of the timeline in Figure 4). The cognitron and neocognitron (Fukushima et al. [39, 42, 43]) were introduced as self-organizing multilayered neural networks. The cognitron and neocognitron architectures proposed by Fukushima [42, 43] are composed of simple cells and complex cells, an organization later echoed inside the CNN architecture.

4.3. Deep Learning History: Neural Networks

Schmidhuber’s survey [46] thoroughly reviews the history of DL in NN since the birth of ANN along with different types of learning approaches applied to DNN architectures such as unsupervised learning (UL), supervised learning (SL), and reinforcement learning (RL). It also discusses evolutionary algorithms (EA) and optimization approaches (e.g., genetic algorithms) along with the learning algorithms for minimizing mean squared error (MSE) and sum of squared error (SSE).

4.4. Deep Learning Math Foundation: Artificial Neural Networks Universal Approximation Theorem

ANN is a universal function approximator based on the universal approximation theorem [23, 47–49]. This theorem proves that an ANN, even with a single hidden layer of finite size, can approximate any continuous function [47]. The approximation theorem [47] applies to high-dimensional as well as low-dimensional function approximation [23]. For example, a 2-dimensional CNN (2D-CNN) or 3-dimensional CNN (3D-CNN) for image or point-cloud classification (high-dimensional 2D or 3D data) can be compared with a feed-forward neural network (FFNN) for low-dimensional time-series signal classification: the CNN contains many more parameters for high-dimensional function approximation, whereas the FFNN approximates a low-dimensional function with far fewer parameters.
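The flavor of the theorem can be illustrated numerically (this is an illustration, not a proof): a single hidden layer of tanh units approximates a smooth target better as the layer gets wider. The random-feature construction, widths, and target function below are illustrative choices.

```python
import numpy as np

# Illustration of the universal approximation idea: a single hidden layer
# of tanh units, with fixed random hidden weights and output weights fit
# by least squares, approximates sin(x) better as the layer gets wider.
rng = np.random.default_rng(1)
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
target = np.sin(x).ravel()

def fit_error(width):
    """Worst-case approximation error of a width-`width` hidden layer."""
    W = rng.normal(0.0, 2.0, (1, width))     # fixed random hidden weights
    b = rng.uniform(-np.pi, np.pi, width)    # random hidden biases
    H = np.tanh(x @ W + b)                   # hidden-layer features
    coef, *_ = np.linalg.lstsq(H, target, rcond=None)
    return float(np.abs(H @ coef - target).max())

err_narrow, err_wide = fit_error(4), fit_error(256)
```

The theorem guarantees only that a finite width suffices for any tolerance; it says nothing about how to train such a network, which is where BP comes in.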

4.5. Deep Learning Applications: Recent Breakthroughs and State-of-the-Art Results

DL has achieved state-of-the-art results in many fields, such as computer vision, speech processing, and machine translation, as follows:
(1) A breakthrough in 2012 for computer vision using DL: deep convolutional neural nets (CNN), going back to LeNet by LeCun in 1989 [45], proved to be enormously efficient in end-to-end image classification and analysis [31].
(2) A breakthrough in 2012 for speech processing using DL: another important breakthrough, almost in the same year as Krizhevsky [31], was applying DL to TIMIT, a massive dataset for speech recognition [32]. Microsoft immediately started adopting and applying this approach to its own AI assistant, Cortana for Windows 10 [32].
(3) Google Brain project, the first large-scale DL project, in 2012: this project, on large-scale distributed deep networks led by Dean et al. [50], applied a deep belief network (DBN) to massive data from Youtube (videos of cats) using 16,000 computers in a distributed parallel configuration. This large-scale implementation of DBN, on distributed parallel computing platforms, successfully recognized cats in videos after watching millions of cat videos on Youtube without any supervision or teaching signals (unsupervised setting).
(4) Bridging the gap between human-level translation and machine translation in NLP with Google neural machine translation: Google neural machine translation (GNMT) [33, 34] is an end-to-end DL model for automated translation which has outperformed the conventional phrase-based translation systems by far. The proposed GNMT system [33, 34] requires big computational power (big compute) and massive datasets (big data) for both training and translation inference to build a big model (big net).

4.6. Deep Learning Architectures

Some of the important and famous DNN architectures are the following:
(1) Feed-forward neural network (FFNN): this is the simplest NN, also known as the multilayer perceptron (MLP), with feed-forward connections. FFNN is also referred to as the fully connected network (FCN) inside CNN architectures.
(2) Convolutional neural network (CNN): CNN is loosely inspired by the cat’s visual cortex [38]. It was initially proposed as the cognitron [42] and neocognitron [39]. The CNN architecture was initially applied to digit recognition and trained using BP by LeCun in 1989 [45]; it is also referred to as LeNet [45].
(3) Deep belief network (DBN), deep Boltzmann machines, and restricted Boltzmann machines (RBM): DBN was initially proposed and trained by Hinton et al. [51, 52] as a deep unsupervised learning (DUL) approach using greedily pretrained stacked layers of RBM.
(4) Autoencoder (AE): AE is another DUL approach for dimensionality reduction [53] and data compression. The variational AE (VAE) [54] is a recent version of the AE which, as a generative model using Bayesian distributions, improved the AE’s precision in generating images and data.
(5) Spiking neural network (SNN): this type of NN mimics the spiking stimulation of the brain inside the ANN, i.e., it is loosely inspired by how spikes activate neurons in our biological NN (brain).
(6) Deep Q-networks (DQN): DQN [55, 56] is the first deep reinforcement learning (DRL) approach, proposed by Google DeepMind. This DL approach achieved human-level control in playing a variety of Atari games.
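The core operation distinguishing the CNN from the FFNN in the list above can be sketched in a few lines: a small kernel slides over the input and responds to local patterns. The edge-detector kernel and image below are illustrative; frameworks actually implement cross-correlation (as here) and still call it convolution.

```python
import numpy as np

# Minimal sketch of the 2D convolution (cross-correlation, as in most DL
# frameworks) that forms the core building block of a CNN layer.
def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation of a single-channel image."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # weighted sum over the local patch under the kernel
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge detector responds where pixel values change left to right.
img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 0, 1, 1]], dtype=float)
edge = np.array([[1.0, -1.0]])
resp = conv2d(img, edge)
```

In a trained CNN the kernel values are not hand-crafted as here but learned by BP, which is exactly the "no manual feature engineering" property highlighted earlier.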

4.7. Deep Learning Dominant Training Approach: Backpropagation

DL is mainly concerned with algorithms for learning big and deep NN architectures. BP is dominantly the learning algorithm used in DL [29, 57] and is the main power behind the scalability of DL architectures such as CNN and DBN. The term DL was popularized mainly by LeCun et al. in 2015 [13]. Goodfellow et al. [21] published a book providing a thorough explanation of DL theory and approaches.

4.8. Deep Learning Categories and Subdomains

The DL approaches, regardless of their application domains, are mainly categorized into three dominant groups (the same as ML): deep unsupervised learning (DUL), deep supervised learning (DSL), and deep reinforcement learning (DRL). There are also some subcategories (subdomains) which are currently active areas of research in DL, such as transfer learning (TL), semisupervised learning, learning-by-demonstration, and imitation learning.

4.9. Deep Learning Biological and Neurological Inspiration Related to Backpropagation

BP (proposed by Rumelhart et al. [24, 29, 57]) in DL is supported neurologically by the random synaptic feedback weights of Lillicrap et al. [26]. Lillicrap et al. [26] argue that BP functions similarly to an error feedback neuron for error optimization (minimization). Yamins and DiCarlo [25] provide another strong biological foundation for BP and the CNN architecture in DL. They demonstrate visually how goal-oriented convolutional hierarchical layers are inspired by the sensory cortex.

4.10. Deep Supervised Learning

DSL is divided into three categories [46]: FFNN & CNN, recurrent neural network (RNN), and the hybrid one (combination of both: convolutional long short-term memory (LSTM) and convolutional RNN).

4.10.1. Feed-Forward Neural Network and Convolutional Neural Network

FFNN was also traditionally known as MLP. FFNN, the so-called FCN, often forms the last two layers inside a CNN architecture in DSL. CNN was first applied to optical character recognition (OCR), specifically digit recognition, and trained using BP by LeCun et al. in 1989 [45]. This CNN, named LeNet after LeCun et al. [45], was brought back to attention in 2012 when AlexNet, named after Krizhevsky et al. [31], almost halved the classification-error rate in the ImageNet contest.

4.10.2. Recurrent Neural Network and Long Short-Term Memory

LSTM was proposed by Hochreiter and Schmidhuber in 1997 [58]. LSTM was successfully trained using BP through time (BPTT), unlike the vanilla RNN, which suffers heavily from the problems of vanishing and exploding gradients [59]. LSTM-based RNN is often applied to sequential learning for temporal pattern recognition.
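The vanishing/exploding-gradient problem can be sketched numerically: in BPTT the backpropagated gradient is multiplied by the recurrent weight matrix once per time step, so its norm shrinks or grows geometrically with sequence length. The matrix size, scales, and sequence length below are illustrative; a scaled orthogonal matrix is used so the norm change per step is exact.

```python
import numpy as np

# Why vanilla RNNs struggle with BPTT: the gradient is multiplied by the
# recurrent weight matrix once per time step, so its norm vanishes or
# explodes geometrically with sequence length.
rng = np.random.default_rng(0)

def gradient_norm_after(T, scale):
    """Norm of a unit gradient pushed back through T steps of a linear
    recurrence whose weight matrix is a scaled orthogonal matrix."""
    Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))
    W = scale * Q                         # spectral norm is exactly `scale`
    g = np.ones(16) / 4.0                 # unit-norm initial gradient
    for _ in range(T):
        g = W.T @ g                       # one BPTT step
    return float(np.linalg.norm(g))

vanish = gradient_norm_after(50, 0.9)     # spectral norm < 1: ~0.9**50
explode = gradient_norm_after(50, 1.1)    # spectral norm > 1: ~1.1**50
```

LSTM's gating, with its additive cell-state path, is precisely what keeps this product from collapsing or blowing up over long sequences.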

4.10.3. Hybrid DL: Convolutional LSTM (ConvLSTM)

Xingjian et al. [60] proposed the convolutional LSTM (ConvLSTM) as a hybrid approach, a combination or integrated version of CNN [13] and RNN (LSTM [58]). In this regard, Zhang et al. [61] show reliable results using a hybrid approach for speech recognition. Their residual bidirectional ConvLSTM [61] is a very deep network, including bidirectional LSTM and CNN with residual connections, for end-to-end speech recognition; it is an efficient and powerful deep hybrid model for acoustic speech recognition.

4.11. Deep Unsupervised Learning

DUL focuses on unlabeled big data, which are abundantly available nowadays on the web. Bengio et al. [62] review DUL approaches and provide new perspectives on them. Yeung et al. [63] propose an approach for learning from the big unlabeled data existing on the web (i.e., "in the wild"). Rupprecht et al. in 2016 [64] also propose a DUL framework for unlabeled big data as a new methodology of learning multiple hypotheses. Mirza et al. [65] provide a DUL architecture for the generalization of aligned features, specifically to perform TL across multiple task domains.

4.11.1. Deep Belief Network, Restricted Boltzmann Machine, and Google Brain Project

DBN is, in fact, a stack of pretrained restricted Boltzmann machine (RBM) layers. In 2012, the Google Brain project ("large scale distributed deep networks"), led by Dean et al. [50], applied DBN to massive data from YouTube cat videos using 16,000 computers in a distributed parallel configuration. This large-scale implementation of DBN successfully recognized cats after watching millions of cat videos on YouTube without any supervision or teaching signals, i.e., entirely in an unsupervised setting (DUL). When a DBN is trained on a massive dataset (big data) as DUL, it learns how to probabilistically reconstruct its inputs, and its layers of representation can act as feature detectors (extractors) on the inputs [32, 50, 51, 66]. After this learning step, a DBN can be further trained in a supervised way to perform classification [32, 51] for TL.
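As an illustration of the RBM building block, here is a minimal numpy sketch of one RBM trained with one-step contrastive divergence (CD-1) on a toy binary dataset. The sizes, learning rate, and patterns are our own assumptions; a real DBN would stack several such pretrained layers:

```python
import numpy as np

rng = np.random.default_rng(1)
sig = lambda x: 1 / (1 + np.exp(-x))

# Toy binary data: two repeating 6-bit patterns (a stand-in for real data).
data = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 50, dtype=float)

nv, nh, lr = 6, 4, 0.1                   # visible units, hidden units, step size
W = rng.normal(scale=0.01, size=(nv, nh))
a, bh = np.zeros(nv), np.zeros(nh)       # visible and hidden biases

def recon_error(v):
    h = sig(v @ W + bh)
    return np.mean((v - sig(h @ W.T + a)) ** 2)

err0 = recon_error(data)
for _ in range(200):                     # contrastive divergence (CD-1)
    v0 = data
    ph0 = sig(v0 @ W + bh)               # hidden probabilities given data
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    pv1 = sig(h0 @ W.T + a)              # one Gibbs step back to visibles
    ph1 = sig(pv1 @ W + bh)
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
    a += lr * (v0 - pv1).mean(axis=0)
    bh += lr * (ph0 - ph1).mean(axis=0)
err1 = recon_error(data)                 # reconstruction error should drop
```

After training, the hidden probabilities sig(v @ W + bh) serve as the learned features that the next RBM layer (or a supervised classifier) would consume.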

4.11.2. Generative Adversarial Networks (GANs)

Goodfellow et al. [67] proposed GAN as two networks competing against each other: a generator and a discriminator. The generator tries to produce fake input data similar to the real data in order to fool the discriminator. This adversarial training of GANs is grounded in game theory.
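The adversarial game can be sketched in a few lines. In the toy numpy example below, the "generator" is a single shift parameter theta applied to Gaussian noise and the "discriminator" is a logistic unit, both updated with hand-derived gradients of the GAN objective; this is our own minimal illustration, not the architecture of [67]:

```python
import numpy as np

rng = np.random.default_rng(0)
sig = lambda x: 1 / (1 + np.exp(-x))

# Real data: N(4, 1). Generator g(z) = z + theta shifts noise toward the real
# distribution; discriminator D(x) = sigmoid(w*x + b) tries to tell them apart.
theta, w, b, lr, n = 0.0, 0.1, 0.0, 0.05, 64
for _ in range(2000):
    x = rng.normal(4.0, 1.0, n)          # real samples
    g = rng.normal(0.0, 1.0, n) + theta  # fake samples
    # Discriminator ascent on log D(x) + log(1 - D(g))
    dx, dg = sig(w * x + b), sig(w * g + b)
    w += lr * np.mean((1 - dx) * x - dg * g)
    b += lr * np.mean((1 - dx) - dg)
    # Generator ascent on log D(g): shift theta to fool the discriminator
    dg = sig(w * g + b)
    theta += lr * np.mean((1 - dg) * w)
```

Over training, theta is pushed toward the real mean (4), at which point the discriminator can no longer separate real from fake; this chasing dynamic is the game-theoretic equilibrium the GAN objective encodes.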

4.11.3. Autoencoders

AE is an unsupervised DL architecture for DUL, applied to denoising, dimensionality reduction [53], data compression, and image or data generation (generative models). VAE [54] improves on AE as a generative model using variational Bayesian inference over latent distributions. A trained AE can also be transferred to a DSL architecture (TL) for classification and regression purposes [68].
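A linear autoencoder with a one-unit bottleneck already exhibits the dimensionality-reduction behavior. The numpy sketch below (our own toy data and sizes) trains encoder and decoder weights by gradient descent on the reconstruction MSE:

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 3-D data that really lives near a 1-D subspace, plus small noise.
z = rng.normal(size=(200, 1))
X = z @ np.array([[1.0, 0.5, -0.3]]) + 0.05 * rng.normal(size=(200, 3))

W1 = rng.normal(scale=0.1, size=(3, 1))   # encoder: 3 -> 1 (bottleneck)
W2 = rng.normal(scale=0.1, size=(1, 3))   # decoder: 1 -> 3

def loss(X, W1, W2):
    return np.mean((X - (X @ W1) @ W2) ** 2)

lr, l0 = 0.05, loss(X, W1, W2)
for _ in range(500):                      # plain gradient descent on MSE
    E = (X @ W1) @ W2 - X                 # reconstruction error, shape (200, 3)
    gW2 = (X @ W1).T @ E / len(X)
    gW1 = X.T @ (E @ W2.T) / len(X)
    W1 -= lr * 2 * gW1
    W2 -= lr * 2 * gW2
l1 = loss(X, W1, W2)                      # should be far below l0
```

The one-unit code X @ W1 is the compressed representation; deep nonlinear AEs generalize this by stacking layers and nonlinearities around the bottleneck.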

4.12. Deep Learning (DL) and Reinforcement Learning (RL) Started Getting Published in Nature: A Quick Review of Recent Years' Progress

This part briefly walks through how DL and RL were combined at an incredible speed since 2015, solely from the perspective of Nature publications:

(1) In 2015, deep learning (DL) models found their way into Nature publications by producing incredible results in AI [13].

(2) In the same year, neuroscientists found a very interesting relationship between goal-driven DL models and the sensory cortex in the brain [25]. This was a huge leap toward the biological inspiration of deep reinforcement learning models.

(3) In that year, one DL-based AI agent achieved superhuman performance in many Atari games [55]. This approach, so-called Deep Q-Networks (DQN), demonstrated at least human-level control (performance) in playing many Atari games [56]. This was the birth of deep reinforcement learning (DRL), the combination of RL (initially proposed by Sutton in 1984 [69]) and DL [13].

(4) In 2016 (one year later), another DL-based AI agent, AlphaGo, dominated the game of Go by watching previously human-played Go games [70]. AlphaGo (Silver et al. from Google DeepMind in 2016 [70]) made a considerable impact on the DRL community by dominating Go (an ancient Chinese chess-like game) using two cooperative deep networks [70]: a deep policy network and a deep value network (DQN [55]). The policy network recommends the next possible moves (actions), and the value network (i.e., Q-network) evaluates the moves intuitively based on previous experience; the Q-network then picks the move with the maximum expected reward. Unlike the adversarial networks in a GAN, the two networks in AlphaGo [70] cooperate with each other.

(5) Recently, AlphaGo Zero [71] learned the game of Go from scratch, purely by playing in a trial-and-error fashion, and eventually even beat the previous AlphaGo.

(6) Finally, an important implementation of grid-like cells (originally discovered in rodents) [72] loosely demonstrates how navigation is performed using these cells and how they can be represented in artificial agents. Grid cells were discovered in 2005 by Wills et al. [73] and a team of scientists in Norway [74], who were awarded the 2014 Nobel Prize for their discoveries of cells that constitute a local/global positioning system in the brain.
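The Q-learning update at the heart of DQN [55, 56] predates deep networks and can be run in tabular form on a toy problem. The sketch below (a hypothetical 5-state chain with a reward at the right end, our own illustration rather than the Atari setup) applies the standard bootstrapped update with an epsilon-greedy behavior policy:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5                                    # chain states 0..4; state 4 is terminal
Q = np.zeros((N, 2))                     # actions: 0 = left, 1 = right
alpha, gamma, eps = 0.5, 0.9, 0.5        # generous exploration for the toy chain

for _ in range(300):                     # episodes
    s = 0
    for _ in range(200):                 # cap episode length
        a = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(Q[s]))
        s2 = max(s - 1, 0) if a == 0 else s + 1
        r = 1.0 if s2 == N - 1 else 0.0
        # Bootstrapped Q-learning update: target = r + gamma * max_a' Q(s', a')
        Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
        s = s2
        if s == N - 1:
            break

policy = np.argmax(Q, axis=1)[:-1]       # greedy policy for non-terminal states
```

DQN replaces the table Q with a deep network and the per-step update with minibatch gradient steps on the same bootstrapped target; the learned greedy policy here moves right toward the reward.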

DRL has opened a new frontier in AI, so-called artificial general intelligence (AGI), which is growing exponentially and demonstrating human-level and even superhuman performance not only in playing Atari games but also in robotics [75] and other domains performing complex tasks such as imitation learning or learning by demonstration:

(1) Combining inverse RL and GANs: since a GAN is a generative model that maximizes a reward function to fool a discriminator network, it is related to RL in terms of learning how to maximize a reward function. In RL, learning the reward function for an observed action is coined inverse reinforcement learning (IRL). Finn et al. [76] show that IRL [77–79] is equivalent to GAN [67] by highlighting the mathematical connection between them.

(2) Generative adversarial imitation learning (GAIL): the combination of GANs and imitation learning [80]. This model was also introduced earlier in 2016 by Baram et al. [81] as model-based adversarial imitation learning. These models generally aim to model human behaviors and motives using IRL [82], predominantly targeting the lack of a reward function for a variety of complex tasks (or the difficulty of defining one), rather than learning a reward function for them.

Another recent development in DRL is applying RNNs, specifically LSTM [58], to learn temporal dependencies, since RL tasks are all sequential; such models are known as generative RNNs. In this regard, Schmidhuber's team is one of the main forerunners, introducing world models [83]. A beautiful combination of GAIL and generative RNNs was proposed very recently by Zhu et al. [84], applying RNN-based GAIL models to diverse visuomotor skills, specifically robotic manipulation across both simulated and real domains. In this direction, generative query networks (GQN) [85] have shown very promising results in terms of an agent predicting what the environment would look like after taking a specific action. GQN is another great combination of RNN, GAN, and imitation learning.

5. WearableDL: Literature Review

Feature extraction is the key to understanding and modeling the repetitive patterns in collected physiological and behavioral data. Traditionally, hand-crafted features were extracted based on expert knowledge for classification or regression purposes, which was labor-intensive and time-consuming. Moreover, the manual feature extraction process does not scale well when the wearable data grow rapidly in size, temporally (number of samples in time) or spatially (number of dimensions). That is why this article explores DL approaches, since they are capable of scaling to the data size. In this section, we review the literature related to DL approaches for analyzing different types of wearable data, as briefly mapped in Table 2.

Table 2: The reviewed literature table.
5.1. Embedding DL in Mobile, Wearable, and IoT Devices

Lane et al. [86] present a study on embedded DL in wearables, smartphones, and IoT devices in order to build knowledge of the performance characteristics, resource requirements, and execution bottlenecks of DL models. For mobile, wearable, and embedded sensory applications, DL requires a significant amount of device (and processor) resources; the limited availability of memory, computation, and energy on mobile and embedded platforms is a serious problem for powerful DL approaches.

5.1.1. SparseSep: Large-Scale-Embedded DL in Smartphones

SparseSep [87] leverages sparsification of fully connected layers and separation of convolutional kernels to reduce the resource requirements of DL algorithms. It allows large-scale DNNs (with fully connected and convolutional layers) to run and execute efficiently on mobile and embedded hardware with minimal impact on inference accuracy.
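The effect of sparsifying a fully connected layer can be illustrated with simple magnitude pruning. The numpy sketch below is our own toy layer, not SparseSep's actual algorithm: it zeroes the 90% smallest-magnitude weights, and because the weight magnitudes are deliberately skewed (as they tend to be in trained networks), the layer output barely changes:

```python
import numpy as np

rng = np.random.default_rng(0)
# A "trained-like" dense layer: mostly tiny weights plus a minority of large ones.
W = 0.01 * rng.normal(size=(256, 512))
mask = rng.random(W.shape) < 0.1
W[mask] += rng.normal(size=mask.sum())

x = rng.normal(size=512)                 # one toy input activation vector

# Magnitude pruning: zero out the 90% smallest-magnitude weights.
thresh = np.quantile(np.abs(W), 0.9)
W_sparse = np.where(np.abs(W) >= thresh, W, 0.0)

y_dense, y_sparse = W @ x, W_sparse @ x
kept = np.count_nonzero(W_sparse) / W.size          # ~10% of parameters remain
rel_err = np.linalg.norm(y_dense - y_sparse) / np.linalg.norm(y_dense)
```

A sparse layer with ~10% of the weights needs proportionally less memory bandwidth, which is exactly the embedded bottleneck SparseSep targets; in practice the sparse matrix would also be stored in a compressed format (e.g., CSR) to realize the savings.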

5.1.2. DeepX and Demo: Embedded DL Execution Accelerator

DeepX [89] is a software accelerator for efficient embedded DL execution. It significantly lowers the device resources (e.g., memory, computation, and energy) required by DL, a severe bottleneck to mobile (smartphone) adoption. DeepX [89] executes large-scale DL models efficiently on mobile (smartphone) processors instead of relying on the existing cloud-based offloading. Demo [92] and DeepX [89–91] are good case studies of adapted, embedded, low-power DL software for mobile devices and smartphones, specialized for wearable and behavioral big data analytics.

5.1.3. Embedded DL for Wearable Multimodal Sensor Data Fusion and Integration

Radu et al. [6] used a smartphone and a smartwatch for human activity recognition (HAR). The focus of that work [6] is the integration and fusion, using DL (specifically RBM), of sensor data from multiple sensors (different modalities). Bhattacharya and Lane [7] performed smartwatch-centric HAR using DL (again RBM), covering behavior and context recognition tasks such as transportation mode, physical activities, and indoor/outdoor detection. Although DL-based (RBM) human activity recognition outperforms the alternatives, DL resource consumption is unacceptably high for constrained WT devices like smartwatches; therefore, a complementary study on the overhead of DL (RBM) models on smartwatches is also conducted in [7].

5.1.4. DeepEye: Embedded DL in Wearables with Built-in Camera for Wearable Image Analytics

Wearables with a built-in camera provide the opportunity to record our daily activities from different perspectives and angles, which is potentially useful as a continuous visual record of our daily lives. DeepEye [93] is a matchbox-sized wearable camera capable of running multiple cloud-scale embedded DL models on the device for near-real-time image analysis without offloading to the cloud. DeepEye [93] is powered by a commodity wearable processor and addresses the bottleneck of executing multiple DL (CNN) models on limited wearable resources, specifically limited runtime memory. Chen et al. [95] embed a deep CNN into iOS smartphones, maximizing data reusability to approach the high bandwidth burden in DL, specifically in the convolutional layers of CNN. The effective data reuse makes it possible to parallelize all the computing threads without data-loading latency, enhancing the capability of DL on local iOS mobile (smartphone) devices.

5.1.5. DeepEar: Embedded DL in Smartphones for Audio Signal Analytics

Regarding mobile audio sensing and analysis, DL has radically changed related audio-modeling domains like speech recognition [146]. DeepEar [94] is a framework for mobile audio sensing using DL, trained in an unsupervised setting on a large-scale unlabeled dataset (big audio data) from 168 place visits. With 2.3 M parameters, DeepEar [94] is more robust to background noise than conventional approaches on wearables, specifically smartphones (mobile devices).

5.2. Embedded DL in Mobile Sensing Framework

Lane et al. [97] survey mobile sensing architectures composed of sensing, learning, and distribution, reviewing existing mobile phone sensing algorithms, applications, and systems within an architectural framework for mobile phone sensing research. Harari et al. [96] discuss the potential and limits of smartphones in collecting biometric and physiological data for behavioral science, since smartphones help us enormously in collecting continuous behavioral data in daily life without attracting attention; the collected data include social interactions, daily (physical) activities, and mobility patterns. Harari et al. [96] also provide practical guidelines for facilitating the use of smartphones as a behavioral observation tool in psychological science. Lane and Georgiev [98] provide low-power embedded DL using a smartphone System-on-Chip (SoC), highlighting the critical need for further exploration of DL in mobile sensing towards robust and efficient wearable sensor data inference. DeepSense [99] is a DL framework addressing noisy mobile sensor data and feature engineering problems in mobile sensing; it integrates CNN and RNN to extract temporal and spatial patterns in mobile sensor data dynamics for car tracking, HAR, and user identification.

5.3. DL for Time-Series Data Analytics

In many real-world applications (e.g., speech recognition or sleep stage classification), data are collected over the course of time. Such time-series data contain temporal patterns related to different classes of behavior (behavior prediction). Hand-crafted features are expensive to extract since they require expert knowledge of the field; that is why DUL offers powerful feature learning for time-series data analysis and forecasting (prediction). Since wearable data are often collected as time-series signals, DL plays an important role in learning and recognizing (inferring) the temporal patterns in these data, and in this respect, LSTM [58] dominates other DL approaches. A review of recent developments in DUL for time-series data is given by Längkvist et al. [101] and Gombao [102]. Although DL has shown promising performance in modeling static data (e.g., computer vision and image classification [31]), applying it to time-series data is not yet well studied. Längkvist et al. [101] and Gombao [102] present current challenges, projects, and works that either apply DL to time-series data analysis or modify DL to account for the challenges in time-series data.

5.4. DL for Mobile Big Data Analytics

The availability of smartphones and IoT gadgets led to the recent mobile big data (MBD) era. Collecting MBD is profitable only if there are learning methods for analytics that recognize the hidden spatial and temporal patterns in the collected MBD. Alsheikh et al. [104] propose DL in MBD analytics as a scalable learning framework over Apache Spark. Mobile crowdsensing is an efficient MBD collection approach combining crowd intelligence, smartphones, wearables, and IoT devices (gadgets). Regarding MBD analytics, Alsheikh et al. [105] focus on the accuracy and privacy aspects of mobile and people-centric crowdsensing as a true MBD collection approach for service providers. DeepSpace (Ouyang et al. [103]) is a DL approach for MBD analytics applied to predicting human trajectories by understanding mobility patterns; it is composed of two models, a coarse prediction model and a fine prediction model.

5.5. DL for Mobile Wireless Sensor Network Data Analytics

Marjovi et al. [106] explain how to collect data using a mobile wireless sensor network (WSN) on public transportation vehicles and how to analyze them using DL (AE) for temporal pattern recognition.

5.6. DL for EEG Data Analytics

Stober et al. [107–110, 113] apply DL approaches to classifying and recognizing EEG recordings of rhythm perception, specifically applying stacked AE and CNN to the collected EEG data to distinguish rhythms at the group and individual-participant levels. Given the EEG data, Stober et al. [107–110, 113] use DL for detection and classification of EEG signals in terms of types and genres. Wulsin et al. [111] also model EEG waveform data (brain time-series signals) for anomaly measurement, detection, and recognition (classification) using DL approaches, specifically DBN. Narejo et al. [112] classify EEG data (brain time-series signals) for eye states using DUL, specifically DBN and AE. DL for compressed sensing in brain-computer interfaces (BCI) is demonstrated by Ma et al. [114] for extracting motion-onset visual evoked potential (mVEP) BCI features; they combine DL with compressed sensing to analyze discriminative mVEP features and improve mVEP BCI performance, demonstrating the effectiveness of DL for mVEP feature extraction in compressed-sensing BCI systems.

5.7. DL for Physiological Data Analytics

Wang and Shang [115] model physiological data (time-series biometric signals) using DL, specifically DBN. DBN, as a DUL approach, can automatically extract features from raw multichannel physiological data. Using the pretrained DBN, Wang and Shang [115] built multiple classifiers to predict the levels of arousal, valence, and liking based on the learned features. Based on the experimental results, DBN applied to raw physiological data effectively learns relevant features and emotional patterns and predicts emotions.

5.8. DL for Big Data Analytics

Big data analytics and DL are two highly focused areas of data science. Big data result from collecting massive amounts of data with useful information in different domains such as national intelligence, cybersecurity, fraud detection, marketing, and medical informatics [147]. DL can extract high-level abstractions as layers of data representation through a hierarchical learning process. A key benefit of DL is analysis through learning from massive amounts of unsupervised data, which makes DL an extremely valuable tool for big data analytics since the available raw data are largely unlabeled, unannotated, and uncategorized. Najafabadi et al. [116, 117] explore how DL is utilized for big data analytics: extracting complex patterns from massive volumes of data, semantic indexing, data tagging, fast information retrieval, and simplifying discriminative tasks. They also investigate DL in terms of analyzing streaming data, high-dimensional data, model scalability, and distributed computing.

5.9. DL for Mobile Gait Analytics

Hannink et al. [121–123] estimate mobile stride length in human gait using DL, specifically deep CNN. Spatial gait pattern recognition and mobile gait analysis are performed in [121–123] to address motor impairment in neurological disease; deep CNN is used for stride length estimation, mapping stride-specific inertial sensor data to the resulting stride length.

5.10. Embedded DL for Inertial Data Analytics

In Ravi et al.'s studies [124–127], DL is applied to inertial sensor data analysis for real-time human activity recognition and classification.

5.11. DL for Electronic Healthcare Records Data Analytics

dos Santos and Carvalho [128] discuss DL applications in health-care management and diagnostics; most of the studies they cover suggest DL for clinical diagnosis due to its accurate recognition of disease patterns in electronic medical records (EMR). Based on [128], DL assists in medical decisions, diagnostic accuracy, and medical treatment recommendations. DL for clinical data analysis is discussed by Miotto et al. [129–131]. DeepPatient [129] is an application of DL to massive patient electronic health-care record (EHR) data analytics and prediction. Miotto et al. [129, 131] clearly demonstrate the transition from ML approaches [130] to DL, since DL outperformed ML on patients' massive EHR datasets. Choi et al. [132–134, 136] review DL approaches and applications for EHR in population health research.

5.12. DL for Electronic Medical Records Data Analytics

An electronic medical record (EMR) is a digital version of the paper chart containing the patient's medical history. Personalized predictive medicine requires modeling the long-term temporal patterns of patient illness and care processes.

5.12.1. DeepCare: Personalized Medicine Recommender System

DeepCare [139] analyzes and recognizes long-term temporal patterns in patients' EMRs. Health-care observations, recorded in EMRs, are episodic and irregular in time; EMRs are collected via health-care observations, the patient's disease history, and personal care history. DeepCare [139] reads EMRs, predicts future medical outcomes, and recommends proper medications, modeling patient health-state trajectories with an explicit memory of illness. Built on LSTM [58], DeepCare introduces time parameterizations to handle irregular timing by moderating the forgetting and consolidation of the illness memory.

5.12.2. Deepr: Deep Record for EMR Data Analytics

Nguyen et al. [137] propose Deepr (deep record) for analyzing massive EMRs in medicine. Deepr [137] is a predictive system that detects predictive regular clinical motifs in irregular episodic records. It is an end-to-end DL system that transforms a record into a sequence of discrete elements separated by coded time gaps and hospital transfers, extracts features from the EMRs, and automatically predicts future risk.

5.12.3. Deep Reinforcement Learning for Clinical EMR Data Analysis in Medication Dosing

Nemati et al. [138] optimize medication dosing from suboptimal clinical examples using a DRL approach. A clinician-in-the-loop sequential decision-making framework [138] is proposed to derive an individualized dosing policy for each patient's evolving clinical phenotype, using the publicly available MIMIC II intensive care unit database; the DRL agent learns an optimal heparin dosing policy from sample dosing trials and their associated outcomes in large EMRs. The proposed DRL system [138] demonstrates that a sequential modeling approach, learned from retrospective data, could potentially be used at the bedside to derive individualized patient dosing policies.

5.13. DL for ECG Data Analytics

Wearables have enormous potential to provide low-risk and low-cost long-term monitoring of electrocardiography (ECG), but these signals suffer from significant movement-related noise. Shashikumar et al. [140] present DL-based atrial fibrillation (AF) detection in a sequence of short windows with significant movement artifact. Pulsatile photoplethysmographic (PPG) data and triaxial accelerometry were captured using a multichannel wrist-worn device, and a single-channel electrocardiogram (ECG) was recorded simultaneously (for rhythm verification only). A DL approach was developed on these data to classify AF from wrist-worn PPG signals: a continuous wavelet transform was applied to the PPG data, and a CNN was trained on the derived spectrograms to detect AF.
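The CWT front end of such a pipeline can be illustrated on a synthetic signal. The numpy sketch below (a toy two-tone signal, not real PPG data) computes a Morlet-wavelet scalogram, the kind of time-frequency image that [140] feeds to a CNN; the dominant scale tracks the dominant frequency in each half of the signal:

```python
import numpy as np

fs = 100.0                               # sampling rate in Hz (toy signal)
t = np.arange(0, 10, 1 / fs)
x = np.where(t < 5, np.sin(2 * np.pi * 5 * t), np.sin(2 * np.pi * 20 * t))

def morlet(scale, w0=6.0):
    """Complex Morlet wavelet sampled over +-4 scales, unit L2 norm."""
    tt = np.arange(-4 * scale, 4 * scale + 1)
    psi = np.exp(1j * w0 * tt / scale) * np.exp(-(tt / scale) ** 2 / 2)
    return psi / np.linalg.norm(psi)

scales = np.arange(2, 40)                # larger scale <=> lower frequency
power = np.array([np.abs(np.convolve(x, morlet(s), mode="same")) ** 2
                  for s in scales])      # scalogram: (n_scales, n_samples)

# Dominant scale in each half (edges avoided); expect low freq -> large scale.
s_lo = scales[np.argmax(power[:, 100:400].mean(axis=1))]   # 5 Hz half
s_hi = scales[np.argmax(power[:, 600:900].mean(axis=1))]   # 20 Hz half
```

With center frequency roughly fs * w0 / (2 * pi * scale), the 5 Hz half peaks near scale 19 and the 20 Hz half near scale 5; a CNN then learns AF-versus-normal patterns from images like `power`.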

5.14. DL for Cybersecurity Data Analytics

DeepSpying [100] combines a mobile-sensing framework for data collection with DL for data analytics in the information security (i.e., cybersecurity) domain to protect individual privacy. DeepSpying [100] pioneers WT-based data collection and DL-based data analysis for patients' information security and privacy protection.

5.15. DL for Smartglass and Smartglove Data Analytics

Advani et al. [4] build a multitask AI visual-assistance system for helping visually impaired people with grocery shopping, using smart glasses, smart gloves, and shopping carts that provide auditory and tactile feedback. This AI system [4] is part of the Visual Cortex on Silicon project, aimed at developing interfaces, algorithms, and hardware platforms to assist the visually impaired, with a focus on grocery shopping.

5.16. DL for Wearable 3D Point Cloud Data Processing and Analytics

Poggi et al. [141, 142] recognize crosswalks along a route using DL for point-cloud processing (i.e., 3D data learning) with a wearable mobility aid for visually impaired people. Poggi and Mattoccia [142] present a wearable mobility aid for visually impaired individuals using embedded 3D vision and a DL-based approach. Poggi et al. [141] rely on an RGBD camera and FPGA embedded in wearable eyeglasses for effective point-cloud data processing on a compact and lightweight embedded computer, which also provides feedback to the user through a haptic interface as well as audio messages. Crosswalk recognition for visually impaired users is a crucial requirement in the effective design of a mobility aid, and Poggi et al. [141] propose a system to detect and categorize crosswalks by leveraging point-cloud processing and DL techniques. Ji et al. [143] process 3D data using CNN for HAR: they develop a novel 3D CNN model for action recognition that captures both spatial and temporal patterns, using 3D convolutions over the motion information encoded in multiple adjacent frames, and apply the model to HAR in the real-world environment of airport surveillance videos.

5.17. DL for Multimodal Physiological Data Analytics

Du et al. [144, 145] discuss the effects of DL in mortality prediction. In these works [144, 145], a combination of audio, text, and physiological signals is utilized to predict the mood (happy or sad) of 31 narrations from subjects engaged in personal storytelling. They extracted 386 audio and 222 physiological features (using the Samsung Simband wearable) from the data. A subset of 4 audio, 1 text, and 5 physiological features was identified using sequential forward selection (SFS) for inclusion in a DNN. These features included subject movement, cardiovascular activity, energy in speech, probability of voicing, and linguistic sentiment (i.e., negative or positive).
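Sequential forward selection itself is a simple greedy loop. The numpy sketch below uses synthetic features and an R²-based score (our own assumptions, not the criterion used in [144, 145]); it adds one feature at a time, keeping whichever most improves a least-squares fit:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))            # 6 candidate features
y = 2 * X[:, 0] - 3 * X[:, 3] + 0.1 * rng.normal(size=300)  # only 0 and 3 matter

def r2(cols):
    """Score a feature subset by the R^2 of a least-squares fit (toy criterion)."""
    A = X[:, cols]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1 - resid.var() / y.var()

selected = []
for _ in range(2):                       # greedily add the best feature, twice
    rest = [j for j in range(6) if j not in selected]
    best = max(rest, key=lambda j: r2(selected + [j]))
    selected.append(best)
```

On this synthetic data the loop recovers the two informative features; in [144, 145] the same greedy scheme scored candidate audio/text/physiological features for inclusion in the DNN.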

6. WearableDL: Future Insights

We have presented a biologically inspired architecture, "WearableDL", for wearable big data analytics that resembles the human NS. We also briefly reviewed the current frontiers in AI, specifically DL approaches and architectures, and carefully selected more than 100 recently published research articles related to the WearableDL architecture, with a focus on DL, IoT, and WT (Section 4). Although WearableDL faces obstacles and challenges, we believe that it could be practically and potentially useful, especially when the wearable data are massive (volume), heterogeneous (variety), and sampled at different frequencies (velocity). In this section, we provide our view of the future challenges of "WearableDL" and its potential applications to wearable big data analytics.

6.1. Health Insurance Decision Making

DL holds great promise and could increase the value of wearable big data by making them actionable. For example, health insurance companies thrive on data to minimize cost, so it is extremely important for them to learn more about their customers and their lifestyles. They are interested in knowing, for instance, how often their customers perform physical activity such as walking, jogging, or other exercise, and whether their customers have smoking or drinking habits. Due to its promise of accuracy, precision, and efficiency, applying DL to personalized wearable data can play a major role in estimating insurance policy costs and in giving rebates when customers cultivate healthy habits [148]. The trend of data-driven health insurance policies has already been considered in many countries across North America, Europe, and Asia.

6.2. High Performance in Sports and Athletics

Another area that will be impacted by DL is the billion-dollar sports industry. The performance of athletes is not only a moment of pride for their country or team but also an economic asset, and therefore athletes strive to outperform. Today, they use WT in their training to improve their performance inch by inch [149]. Such precision demands WT that offer fine-grained measurement of the body's kinematic motion, such as agility and balance, and physiological parameters, such as heart rate, oxygenation, and muscular strength. Various DL methods can be applied to analyze highly sampled wearable big data and extract actionable information to improve sports performance. DL could also help detect sports injuries during the game or in training, so that effective decisions are made in time.

6.3. Supporting Elderly Population

The aging of populations across the globe is a well-known phenomenon: by 2030, 20% or more of the population will be 65+ years of age [150]. This indicates that we will need technological solutions to support senior citizens, who are more prone to disorders, severe health conditions, and injuries due to declining physical and mental capabilities. In the last decade, WT have specifically been targeted at providing health-care services and comfortable assisted living. However, it is not enough to just collect data from WT; it is equally important to personalize the WT to the specific condition experienced by an elderly individual. DL could fill this gap by learning the daily patterns in the wearable big data and by offering decision makers the relation between historic and current data. In this way, DL could lead to the prediction of underlying health conditions that are often not detected by WT alone.

6.4. Challenges

Although DL comes with several promises for wearable big data, it also needs to overcome a number of barriers and obstacles for its widespread adoption.

6.4.1. Unlabeled Wearable Big Data

This is a very common and important problem in analyzing wearable big data, since these data are often collected in a completely unlabeled and unannotated fashion. That is why UL is becoming an important scope for applying DL to the collected big data. As reviewed above, this scope is known as DUL, and it is still an active area of research, specifically for wearable big data, which are time-series and sequential. Sequence learning with LSTM, RNN, and ConvLSTM is one of the attractive ways of approaching this problem.

6.4.2. Computational Bottlenecks, Demand, and Complexity

Currently, deep models face the burden of computational demand to achieve exceptional performance on large datasets, and they are mainly aimed at running on cloud servers. However, fog computers, which require lightweight algorithms, will demand new types of DL models that learn from small datasets. As also mentioned in Section 4 and Table 2, embedding DL into mobile, wearable, and IoT devices has two important bottlenecks: memory bandwidth for matrices and computational power for parallel or distributed matrix multiplication.

6.4.3. Data Reliability

In many situations, data collected by wearable devices can be affected by noise and error due to nonideal collection settings, particularly for structured and complex data. In this regard, wearable devices can be designed to perform presifting and prefiltering of the data, and DL can then be applied to identify and isolate the corrupted data in the decision-making process. DL generalizes over the data in an extraordinary way, which is how it can isolate corrupted/noisy data and identify the distinctive, repetitive, and robust spatiotemporal patterns in such data.

Abbreviations

AGI:Artificial general intelligence
AI:Artificial intelligence
ANN:Artificial neural net
BCI:Brain-computer interface
BPTT:Backpropagation through time
CL:Cortical learning
CLA:Cortical learning algorithm
ConvLSTM:Convolutional LSTM
CNN:Convolutional neural net or ConvNet
CNS:Central nervous system
DBN:Deep belief network
DBS:Deep brain stimulation
DL:Deep learning
DRL:Deep reinforcement learning
DSL:Deep supervised learning
DNN:Deep neural network
DQN:Deep Q-network
DUL:Deep unsupervised learning
EA:Evolutionary algorithm
EM:Expectation maximization
EMR:Electronic medical record
FCN:Fully connected network
FFNN:Feed-forward neural network
FLANN:Fast library for approximate nearest neighbor
FPGA:Field programmable gate arrays
GAN:Generative adversarial nets
GNMT:Google neural machine translation
GPU:Graphics processing unit
GQN:Generative query network
GRU:Gated recurrent units
HR:Heart rate
HPC:High-performance computing
HTM:Hierarchical temporal memory
HAR:Human activity recognition
IRL:Inverse reinforcement learning
IC:Integrated circuit
ICA:Independent component analysis
KNN:K-nearest neighbors
LDA:Linear discriminant analysis
LSTM:Long short-term memory
ML:Machine learning
MLP:Multilayer perceptron
MBD:Mobile big data
MSE:Mean-squared error
NS:Nervous system
NN:Neural net
NLP:Natural language processing
PNS:Peripheral nervous system
PCA:Principal component analysis
RBM:Restricted Boltzmann machine
RL:Reinforcement learning
RNN:Recurrent neural nets
STDP:Spike-timing-dependent plasticity
SNN:Spiking neural net
SL:Supervised learning
SSE:Sum of squared error
SVM:Support vector machine
TL:Transfer learning
UL:Unsupervised learning
VAE:Variational autoencoder
WIoT:Wearable Internet-of-things
WT:Wearable technologies
WearableDL:Wearable deep learning.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments
This material is based upon work supported by the National Science Foundation (NSF) under grant numbers 1652538 and 1565962. Research reported in this publication was also supported by the National Institute of Mental Health/National Institutes of Health (NIH) under award no. 1R01MH108641-01A1.

References
  1. E. R. Kandel, J. Schwartz, and T. M. Jessell, Principles of Neural Science, McGraw-Hill Education, New York, NY, USA, 4th edition, 2000.
  2. G. E. Moore, “Cramming more components onto integrated circuits,” Proceedings of the IEEE, vol. 86, no. 1, pp. 82–85, 1998.
  3. “Key investments opportunities for data-driven healthcare.”
  4. S. Advani, P. Zientara, N. Shukla et al., “A multitask grocery assist system for the visually impaired: smart glasses, gloves, and shopping carts provide auditory and tactile feedback,” IEEE Consumer Electronics Magazine, vol. 6, no. 1, pp. 73–81, 2017.
  5. S. Hiremath, G. Yang, and K. Mankodiya, “Wearable internet of things: concept, architectural components and promises for person-centered healthcare,” in Proceedings of 2014 EAI 4th International Conference on Wireless Mobile Communication and Healthcare (Mobihealth), pp. 304–307, Milan, Italy, November 2014.
  6. V. Radu, N. D. Lane, S. Bhattacharya, C. Mascolo, M. K. Marina, and F. Kawsar, “Towards multimodal deep learning for activity recognition on mobile devices,” in Proceedings of ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct, pp. 185–188, Heidelberg, Germany, 2016.
  7. S. Bhattacharya and N. D. Lane, “From smart to deep: robust activity recognition on smartwatches using deep learning,” in Proceedings of 2016 IEEE International Conference on Pervasive Computing and Communication Workshops (PerCom Workshops), pp. 1–6, Sydney, Australia, March 2016.
  8. A. Pantelopoulos and N. G. Bourbakis, “A survey on wearable sensor-based systems for health monitoring and prognosis,” IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 40, no. 1, pp. 1–12, 2010.
  9. J. Andreu-Perez, C. C. Poon, R. D. Merrifield, S. T. Wong, and G.-Z. Yang, “Big data for health,” IEEE Journal of Biomedical and Health Informatics, vol. 19, no. 4, pp. 1193–1208, 2015.
  10. Y.-L. Zheng, X.-R. Ding, C. C. Y. Poon et al., “Unobtrusive sensing and wearable devices for health informatics,” IEEE Transactions on Biomedical Engineering, vol. 61, no. 5, pp. 1538–1554, 2014.
  11. A. L. Samuel, “Some studies in machine learning using the game of checkers,” IBM Journal of Research and Development, vol. 3, no. 3, pp. 210–229, 1959.
  12. J. L. Solé, “Book review: pattern recognition and machine learning, Cristopher M. Bishop, information science and statistics, Springer 2006, 738 pages,” SORT-Statistics and Operations Research Transactions, vol. 31, no. 2, 2007.
  13. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
  14. J. Hawkins and S. Ahmad, “Why neurons have thousands of synapses, a theory of sequence memory in neocortex,” Frontiers in Neural Circuits, vol. 10, 2016.
  15. S. Ahmad and J. Hawkins, “How do neurons operate on sparse distributed representations? a mathematical theory of sparsity, neurons and active dendrites,” arXiv preprint arXiv:1601.00720, 2016.
  16. J. Hawkins, S. Ahmad, and D. Dubinsky, “Hierarchical temporal memory including HTM cortical learning algorithms,” Technical report, Numenta, Inc., Redwood City, CA, USA, 2011.
  17. J. Hawkins, S. Ahmad, and Y. Cui, “Why does the neocortex have layers and columns, a theory of learning the 3d structure of the world,” bioRxiv, p. 162263, 2017.
  18. S. Ahmad and J. Hawkins, “Properties of sparse distributed representations and their application to hierarchical temporal memory,” arXiv preprint arXiv:1503.07469, 2015.
  19. S. Billaudelle and S. Ahmad, “Porting HTM models to the heidelberg neuromorphic computing platform,” arXiv preprint arXiv:1505.02142, 2015.
  20. Y. Cui, S. Ahmad, and J. Hawkins, “The HTM spatial pooler: a neocortical algorithm for online sparse distributed coding,” bioRxiv, p. 085035, 2016.
  21. I. Goodfellow, Y. Bengio, and A. Courville, “Deep learning,” 2016.
  22. Y. Bengio, “Learning deep architectures for ai,” Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009.
  23. P. Andras, “High-dimensional function approximation with neural networks for large volumes of data,” IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 2, pp. 500–508, 2017.
  24. D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, no. 6088, pp. 533–536, 1986.
  25. D. L. Yamins and J. J. DiCarlo, “Using goal-driven deep learning models to understand sensory cortex,” Nature Neuroscience, vol. 19, no. 3, pp. 356–365, 2016.
  26. T. P. Lillicrap, D. Cownden, D. B. Tweed, and C. J. Akerman, “Random synaptic feedback weights support error backpropagation for deep learning,” Nature Communications, vol. 7, p. 13276, 2016.
  27. B. A. Olshausen and D. J. Field, “Emergence of simple-cell receptive field properties by learning a sparse code for natural images,” Nature, vol. 381, no. 6583, p. 607, 1996.
  28. Y. Freund and R. E. Schapire, “Large margin classification using the perceptron algorithm,” Machine Learning, vol. 37, no. 3, pp. 277–296, 1999.
  29. D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning internal representations by error-propagation,” in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1, pp. 318–362, MIT Press, Cambridge, MA, USA, 1986.
  30. R. Collobert, “Deep learning for efficient discriminative parsing,” in Proceedings of AISTATS, vol. 15, pp. 224–232, Ft. Lauderdale, FL, USA, April 2011.
  31. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Proceedings of Advances in Neural Information Processing Systems, pp. 1097–1105, Lake Tahoe, NV, USA, December 2012.
  32. G. Hinton, L. Deng, D. Yu et al., “Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82–97, 2012.
  33. Y. Wu, M. Schuster, Z. Chen et al., “Google’s neural machine translation system: bridging the gap between human and machine translation,” arXiv preprint arXiv:1609.08144, 2016.
  34. M. Johnson, M. Schuster, Q. V. Le et al., “Google’s multilingual neural machine translation system: enabling zero-shot translation,” arXiv preprint arXiv:1611.04558, 2016.
  35. “How Google detects cancer using deep learning.”
  36. “DL advances in medicine.”
  37. “DL analytics for healthcare: UCSF, Intel join forces to develop deep learning analytics for health care.”
  38. D. H. Hubel and T. N. Wiesel, “Receptive fields of single neurones in the cat’s striate cortex,” Journal of Physiology, vol. 148, no. 3, pp. 574–591, 1959.
  39. K. Fukushima and S. Miyake, “Neocognitron: a self-organizing neural network model for a mechanism of visual pattern recognition,” in Competition and Cooperation in Neural Nets, pp. 267–285, Springer, Berlin, Germany, 1982.
  40. S. Sabour, N. Frosst, and G. E. Hinton, “Dynamic routing between capsules,” in Proceedings of Advances in Neural Information Processing Systems, pp. 3856–3866, Long Beach, CA, USA, December 2017.
  41. S. Bartunov, A. Santoro, B. A. Richards, G. E. Hinton, and T. Lillicrap, “Assessing the scalability of biologically-motivated deep learning algorithms and architectures,” arXiv preprint arXiv:1807.04587, 2018.
  42. K. Fukushima, “Cognitron: a self-organizing multilayered neural network,” Biological Cybernetics, vol. 20, no. 3-4, pp. 121–136, 1975.
  43. K. Fukushima, “Neocognitron: a hierarchical neural network capable of visual pattern recognition,” Neural Networks, vol. 1, no. 2, pp. 119–130, 1988.
  44. K. Fukushima, “Artificial vision by multi-layered neural networks: neocognitron and its advances,” Neural Networks, vol. 37, pp. 103–119, 2013.
  45. Y. LeCun, B. Boser, J. S. Denker et al., “Backpropagation applied to handwritten zip code recognition,” Neural Computation, vol. 1, no. 4, pp. 541–551, 1989.
  46. J. Schmidhuber, “Deep learning in neural networks: an overview,” Neural Networks, vol. 61, pp. 85–117, 2015.
  47. B. C. Csáji, “Approximation with artificial neural networks,” M.Sc. thesis, Faculty of Sciences, Eötvös Loránd University, Hungary, 2001.
  48. G. Cybenko, “Approximation by superpositions of a sigmoidal function,” Mathematics of Control, Signals, and Systems (MCSS), vol. 2, no. 4, pp. 303–314, 1989.
  49. K. Hornik, “Approximation capabilities of multilayer feedforward networks,” Neural Networks, vol. 4, no. 2, pp. 251–257, 1991.
  50. J. Dean, G. Corrado, R. Monga et al., “Large scale distributed deep networks,” in Proceedings of Advances in Neural Information Processing Systems, pp. 1223–1231, Lake Tahoe, NV, USA, December 2012.
  51. G. E. Hinton, S. Osindero, and Y.-W. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006. View at Publisher · View at Google Scholar · View at Scopus
  52. G. E. Hinton, “Deep belief networks,” Scholarpedia, vol. 4, no. 5, p. 5947, 2009.
  53. G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, vol. 313, no. 5786, pp. 504–507, 2006.
  54. D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” arXiv preprint arXiv:1312.6114, 2013.
  55. V. Mnih, K. Kavukcuoglu, D. Silver et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015.
  56. V. Mnih, K. Kavukcuoglu, D. Silver et al., “Playing Atari with deep reinforcement learning,” arXiv preprint arXiv:1312.5602, 2013.
  57. D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning internal representations by error propagation,” Technical report, University of California, San Diego, La Jolla, CA, USA, 1985.
  58. S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
  59. Y. Bengio, P. Simard, and P. Frasconi, “Learning long-term dependencies with gradient descent is difficult,” IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 157–166, 1994.
  60. S. Xingjian, Z. Chen, H. Wang, D.-Y. Yeung, W.-k. Wong, and W.-c. Woo, “Convolutional LSTM network: a machine learning approach for precipitation nowcasting,” in Proceedings of Advances in Neural Information Processing Systems, pp. 802–810, Montreal, Canada, December 2015.
  61. Y. Zhang, W. Chan, and N. Jaitly, “Very deep convolutional networks for end-to-end speech recognition,” arXiv preprint arXiv:1610.03022, 2016.
  62. Y. Bengio, A. C. Courville, and P. Vincent, “Unsupervised feature learning and deep learning: a review and new perspectives,” arXiv preprint arXiv:1206.5538, 2012.
  63. S. Yeung, V. Ramanathan, O. Russakovsky, L. Shen, G. Mori, and L. Fei-Fei, “Learning to learn from noisy web videos,” arXiv preprint arXiv:1706.02884, 2017.
  64. C. Rupprecht, I. Laina, M. Baust, F. Tombari, G. D. Hager, and N. Navab, “Learning in an uncertain world: representing ambiguity through multiple hypotheses,” arXiv preprint arXiv:1612.00197, 2016.
  65. M. Mirza, A. Courville, and Y. Bengio, “Generalizable features from unsupervised learning,” arXiv preprint arXiv:1612.03809, 2016.
  66. Y. Bengio, P. Lamblin, D. Popovici, H. Larochelle et al., “Greedy layer-wise training of deep networks,” in Proceedings of Advances in Neural Information Processing Systems, vol. 19, p. 153, Vancouver, British Columbia, Canada, December 2007.
  67. I. Goodfellow, J. Pouget-Abadie, M. Mirza et al., “Generative adversarial nets,” in Proceedings of Advances in Neural Information Processing Systems, pp. 2672–2680, Montreal, Canada, December 2014.
  68. C. Finn, X. Y. Tan, Y. Duan, T. Darrell, S. Levine, and P. Abbeel, “Deep spatial autoencoders for visuomotor learning,” in Proceedings of International Conference on Robotics and Automation (ICRA), Stockholm, Sweden, May 2016.
  69. R. S. Sutton, “Temporal credit assignment in reinforcement learning,” Doctoral dissertation, University of Massachusetts Amherst, Amherst, MA, USA, 1984.
  70. D. Silver, A. Huang, C. J. Maddison et al., “Mastering the game of go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, 2016.
  71. D. Silver, J. Schrittwieser, K. Simonyan et al., “Mastering the game of go without human knowledge,” Nature, vol. 550, no. 7676, p. 354, 2017.
  72. A. Banino, C. Barry, B. Uria et al., “Vector-based navigation using grid-like representations in artificial agents,” Nature, vol. 557, no. 7705, p. 429, 2018.
  73. T. J. Wills, F. Cacucci, N. Burgess, and J. O’Keefe, “Development of the hippocampal cognitive map in preweanling rats,” Science, vol. 328, no. 5985, pp. 1573–1576, 2010.
  74. M. Fyhn, T. Hafting, M. P. Witter, E. I. Moser, and M.-B. Moser, “Grid cells in mice,” Hippocampus, vol. 18, no. 12, pp. 1230–1238, 2008.
  75. D. Kalashnikov, A. Irpan, P. Pastor et al., “Qt-opt: scalable deep reinforcement learning for vision-based robotic manipulation,” arXiv preprint arXiv:1806.10293, 2018.
  76. C. Finn, P. Christiano, P. Abbeel, and S. Levine, “A connection between generative adversarial networks, inverse reinforcement learning, and energy-based models,” arXiv preprint arXiv:1611.03852, 2016.
  77. A. Y. Ng and S. J. Russell, “Algorithms for inverse reinforcement learning,” in Proceedings of ICML, pp. 663–670, Stanford, CA, USA, June-July 2000.
  78. P. Abbeel and A. Y. Ng, “Apprenticeship learning via inverse reinforcement learning,” in Proceedings of Twenty-First International Conference on Machine Learning, Banff, Alberta, Canada, July 2004.
  79. P. Abbeel and A. Y. Ng, “Inverse reinforcement learning,” in Encyclopedia of Machine Learning, pp. 554–558, Springer, Berlin, Germany, 2011.
  80. J. Ho and S. Ermon, “Generative adversarial imitation learning,” in Proceedings of Advances in Neural Information Processing Systems, pp. 4565–4573, Barcelona, Spain, December 2016.
  81. N. Baram, O. Anschel, and S. Mannor, “Model-based adversarial imitation learning,” arXiv preprint arXiv:1612.02179, 2016.
  82. B. Wang, T. Sun, and S. X. Zheng, “Beyond winning and losing: modeling human motivations and behaviors using inverse reinforcement learning,” arXiv preprint arXiv:1807.00366, 2018.
  83. D. Ha and J. Schmidhuber, “Recurrent world models facilitate policy evolution,” arXiv preprint arXiv:1809.01999, 2018.
  84. Y. Zhu, Z. Wang, J. Merel et al., “Reinforcement and imitation learning for diverse visuomotor skills,” arXiv preprint arXiv:1802.09564, 2018.
  85. S. A. Eslami, D. J. Rezende, F. Besse et al., “Neural scene representation and rendering,” Science, vol. 360, no. 6394, pp. 1204–1210, 2018.
  86. N. D. Lane, S. Bhattacharya, P. Georgiev, C. Forlivesi, and F. Kawsar, “An early resource characterization of deep learning on wearables, smartphones and internet-of-things devices,” in Proceedings of 2015 International Workshop on Internet of Things towards Applications, pp. 7–12, Seoul, South Korea, 2015.
  87. N. Lane and S. Bhattacharya, “Sparsifying deep learning layers for constrained resource inference on wearables,” in Proceedings of 14th ACM Conference on Embedded Network Sensor Systems, pp. 176–189, Stanford, CA, USA, November 2016.
  88. S. Bhattacharya and N. D. Lane, “Sparsification and separation of deep learning layers for constrained resource inference on wearables,” in Proceedings of ACM Conference on Embedded Networked Sensor Systems (SenSys) 2016, Stanford, CA, USA, November 2016.
  89. N. D. Lane, S. Bhattacharya, P. Georgiev et al., “A software accelerator for low-power deep learning inference on mobile devices,” in Proceedings of 15th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN), pp. 1–12, Vienna, Austria, April 2016.
  90. N. D. Lane, S. Bhattacharya, P. Georgiev et al., “Demonstration abstract: accelerating embedded deep learning using DeepX,” in Proceedings of 2016 15th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN), pp. 1-2, Vienna, Austria, 2016.
  91. N. D. Lane, S. Bhattacharya, P. Georgiev et al., “Accelerating embedded deep learning using DeepX: demonstration abstract,” in Proceedings of 15th International Conference on Information Processing in Sensor Networks, p. 61, Vienna, Austria, April 2016.
  92. N. D. Lane, S. Bhattacharya, P. Georgiev, C. Forlivesi, and F. Kawsar, “Demo: accelerated deep learning inference for embedded and wearable devices using DeepX,” in Proceedings of 14th Annual International Conference on Mobile Systems, Applications, and Services Companion, p. 109, Singapore, June 2016.
  93. A. Mathur, N. D. Lane, S. Bhattacharya, A. Boran, C. Forlivesi, and F. Kawsar, Deepeye: Resource Efficient Local Execution of Multiple Deep Vision Models Using Wearable Commodity Hardware, ACM, New York, NY, USA, 2017.
  94. N. D. Lane, P. Georgiev, and L. Qendro, “Deepear: robust smartphone audio sensing in unconstrained acoustic environments using deep learning,” in Proceedings of 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing, pp. 283–294, Osaka, Japan, September 2015.
  95. C.-F. Chen, G. G. Lee, V. Sritapan, and C.-Y. Lin, “Deep convolutional neural network on iOS mobile devices,” in Proceedings of 2016 IEEE International Workshop on Signal Processing Systems (SiPS), pp. 130–135, Dallas, TX, USA, 2016.
  96. G. M. Harari, N. D. Lane, R. Wang, B. S. Crosier, A. T. Campbell, and S. D. Gosling, “Using smartphones to collect behavioral data in psychological science: opportunities, practical considerations, and challenges,” Perspectives on Psychological Science, vol. 11, no. 6, pp. 838–854, 2016.
  97. N. D. Lane, E. Miluzzo, H. Lu, D. Peebles, T. Choudhury, and A. T. Campbell, “A survey of mobile phone sensing,” IEEE Communications Magazine, vol. 48, no. 9, pp. 140–150, 2010.
  98. N. D. Lane and P. Georgiev, “Can deep learning revolutionize mobile sensing?” in Proceedings of 16th International Workshop on Mobile Computing Systems and Applications, pp. 117–122, Santa Fe, NM, USA, February 2015.
  99. S. Yao, S. Hu, Y. Zhao, A. Zhang, and T. Abdelzaher, “Deepsense: a unified deep learning framework for time-series mobile sensing data processing,” arXiv preprint arXiv:1611.01942, 2016.
  100. T. Beltramelli and S. Risi, “Deep-spying: spying using smartwatch and deep learning,” arXiv preprint arXiv:1512.05616, 2015.
  101. M. Längkvist, L. Karlsson, and A. Loutfi, “A review of unsupervised feature learning and deep learning for time-series modeling,” Pattern Recognition Letters, vol. 42, pp. 11–24, 2014.
  102. J. C. B. Gamboa, “Deep learning for time-series analysis,” arXiv preprint arXiv:1701.01887, 2017.
  103. X. Ouyang, C. Zhang, P. Zhou, and H. Jiang, “Deepspace: an online deep learning framework for mobile big data to understand human mobility patterns,” arXiv preprint arXiv:1610.07009, 2016.
  104. M. A. Alsheikh, D. Niyato, S. Lin, H.-P. Tan, and Z. Han, “Mobile big data analytics using deep learning and Apache spark,” IEEE Network, vol. 30, no. 3, pp. 22–29, 2016.
  105. M. A. Alsheikh, Y. Jiao, D. Niyato, P. Wang, D. Leong, and Z. Han, “The accuracy-privacy tradeoff of mobile crowdsensing,” arXiv preprint arXiv:1702.04565, 2017.
  106. A. Marjovi, A. Arfire, and A. Martinoli, “Extending urban air quality maps beyond the coverage of a mobile sensor network: data sources, methods, and performance evaluation,” in Proceedings of International Conference on Embedded Wireless Systems and Networks, Madrid, Spain, February 2017.
  107. S. Stober, D. J. Cameron, and J. A. Grahn, “Classifying EEG recordings of rhythm perception,” in Proceedings of ISMIR, pp. 649–654, Taipei, Taiwan, 2014.
  108. S. Stober, D. J. Cameron, and J. A. Grahn, “Using convolutional neural networks to recognize rhythm stimuli from electroencephalography recordings,” in Proceedings of Advances in Neural Information Processing Systems, pp. 1449–1457, Montreal, Canada, December 2014.
  109. S. Stober, A. Sternin, A. M. Owen, and J. A. Grahn, “Deep feature learning for EEG recordings,” arXiv preprint arXiv:1511.04306, 2015.
  110. A. Sternin, S. Stober, J. Grahn, and A. Owen, “Tempo estimation from the EEG signal during perception and imagination of music,” in Proceedings of International Workshop on Brain-Computer Music Interfacing/International Symposium on Computer Music Multidisciplinary Research (BCMI/CMMR), Plymouth, UK, June 2015.
  111. D. Wulsin, J. Gupta, R. Mani, J. Blanco, and B. Litt, “Modeling electroencephalography waveforms with semi-supervised deep belief nets: fast classification and anomaly measurement,” Journal of Neural Engineering, vol. 8, no. 3, p. 036015, 2011.
  112. S. Narejo, E. Pasero, and F. Kulsoom, “EEG based eye state classification using deep belief network and stacked autoencoder,” International Journal of Electrical and Computer Engineering (IJECE), vol. 6, no. 6, pp. 3131–3141, 2016.
  113. S. Stober, “Learning discriminative features from electroencephalography recordings by encoding similarity constraints,” in Proceedings of Bernstein Conference, London, UK, September 2016.
  114. T. Ma, H. Li, H. Yang et al., “The extraction of motion-onset VEP BCI features based on deep learning and compressed sensing,” Journal of Neuroscience Methods, vol. 275, pp. 80–92, 2017.
  115. D. Wang and Y. Shang, “Modeling physiological data with deep belief networks,” International Journal of Information and Education Technology (IJIET), vol. 3, no. 5, p. 505, 2013.
  116. M. M. Najafabadi, F. Villanustre, T. M. Khoshgoftaar, N. Seliya, R. Wald, and E. Muharemagic, “Deep learning applications and challenges in big data analytics,” Journal of Big Data, vol. 2, no. 1, p. 1, 2015.
  117. M. M. Najafabadi, F. Villanustre, T. M. Khoshgoftaar, N. Seliya, R. Wald, and E. Muharemagic, “Deep learning techniques in big data analytics,” in Big Data Technologies and Applications, pp. 133–156, Springer International Publishing, Berlin, Germany, 2016.
  118. D. Xie, L. Zhang, and L. Bai, “Deep learning in visual computing and signal processing,” Applied Computational Intelligence and Soft Computing, vol. 2017, 2017.
  119. H. Wen, “Vinet: visual-inertial odometry as a sequence-to-sequence learning problem,” in Proceedings of AAAI, Phoenix, AZ, USA, February 2016.
  120. R. Clark, S. Wang, H. Wen, A. Markham, and N. Trigoni, “Vinet: visual-inertial odometry as a sequence-to-sequence learning problem,” arXiv preprint arXiv:1701.08376, 2017.
  121. J. Hannink, T. Kautz, C. F. Pasluosta et al., “Stride length estimation with deep learning,” arXiv preprint arXiv:1609.03321, 2016.
  122. J. Hannink, T. Kautz, C. Pasluosta, K.-G. Gassmann, J. Klucken, and B. Eskofier, “Sensor-based gait parameter extraction with deep convolutional neural networks,” IEEE Journal of Biomedical and Health Informatics, vol. 21, no. 1, pp. 85–93, 2017.
  123. J. Hannink, T. Kautz, C. Pasluosta et al., “Mobile stride length estimation with deep convolutional neural networks,” IEEE Journal of Biomedical and Health Informatics, vol. 22, no. 2, pp. 354–362, 2018.
  124. D. Ravi, C. Wong, F. Deligianni et al., “Deep learning for health informatics,” IEEE Journal of Biomedical and Health Informatics, vol. 21, no. 1, pp. 4–21, 2017.
  125. D. Ravi, C. Wong, B. Lo, and G.-Z. Yang, “Deep learning for human activity recognition: a resource efficient implementation on low-power devices,” in Proceedings of 2016 IEEE 13th International Conference on Wearable and Implantable Body Sensor Networks (BSN), pp. 71–76, San Francisco, CA, USA, June 2016.
  126. D. Ravi, C. Wong, B. Lo, and G.-Z. Yang, “A deep learning approach to on-node sensor data analytics for mobile or wearable devices,” IEEE Journal of Biomedical and Health Informatics, vol. 21, no. 1, pp. 56–64, 2017.
  127. D. Ravi, C. Wong, F. Deligianni et al., “Special section on deep learning for biomedical and health informatics,” Journal of Biomedical and Health Informatics, vol. 21, no. 1, pp. 4–21, 2017.
  128. A. B. V. dos Santos and D. R. Carvalho, “Deep learning for healthcare management and diagnosis,” Iberoamerican Journal of Applied Computing, vol. 5, no. 2, 2016.
  129. R. Miotto, L. Li, B. A. Kidd, and J. T. Dudley, “Deep patient: an unsupervised representation to predict the future of patients from the electronic health records,” Scientific Reports, vol. 6, no. 1, 2016.
  130. R. Miotto and C. Weng, “Unsupervised mining of frequent tags for clinical eligibility text indexing,” Journal of Biomedical Informatics, vol. 46, no. 6, pp. 1145–1151, 2013.
  131. R. Miotto, L. Li, and J. T. Dudley, “Deep learning to predict patient future diseases from the electronic health records,” in Proceedings of European Conference on Information Retrieval, pp. 768–774, Padua, Italy, March 2016.
  132. E. Choi, M. T. Bahadori, L. Song, W. F. Stewart, and J. Sun, “Gram: graph-based attention model for healthcare representation learning,” arXiv preprint arXiv:1611.07012, 2016.
  133. E. Choi, A. Schuetz, W. F. Stewart, and J. Sun, “Using recurrent neural network models for early detection of heart failure onset,” Journal of the American Medical Informatics Association, vol. 24, no. 2, pp. 361–370, 2017.
  134. E. Choi, M. T. Bahadori, E. Searles et al., “Multi-layer representation learning for medical concepts,” in Proceedings of 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1495–1504, San Francisco, CA, USA, August 2016.
  135. E. Choi, A. Schuetz, W. F. Stewart, and J. Sun, “Medical concept representation learning from electronic health records and its application on heart failure prediction,” arXiv preprint arXiv:1602.03686, 2016.
  136. E. Choi, M. T. Bahadori, and J. Sun, “Doctor AI: predicting clinical events via recurrent neural networks,” arXiv preprint arXiv:1511.05942, 2015.
  137. P. Nguyen, T. Tran, N. Wickramasinghe, and S. Venkatesh, “Deepr: a convolutional net for medical records,” IEEE Journal of Biomedical and Health Informatics, vol. 21, no. 1, pp. 22–30, 2016.
  138. S. Nemati, M. M. Ghassemi, and G. D. Clifford, “Optimal medication dosing from suboptimal clinical examples: a deep reinforcement learning approach,” in Proceedings of 2016 IEEE 38th Annual International Conference of the Engineering in Medicine and Biology Society (EMBC), pp. 2978–2981, Orlando, FL, USA, August 2016.
  139. T. Pham, T. Tran, D. Phung, and S. Venkatesh, “Deepcare: a deep dynamic memory model for predictive medicine,” in Proceedings of Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 30–41, Auckland, New Zealand, April 2016.
  140. S. P. Shashikumar, A. J. Shah, Q. Li, G. D. Clifford, and S. Nemati, “A deep learning approach to monitoring and detecting atrial fibrillation using wearable technology,” in Proceedings of 2017 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI), pp. 141–144, Orlando, FL, USA, February 2017.
  141. M. Poggi, L. Nanni, and S. Mattoccia, “Crosswalk recognition through point-cloud processing and deep-learning suited to a wearable mobility aid for the visually impaired,” in Proceedings of International Conference on Image Analysis and Processing, pp. 282–289, Genoa, Italy, September 2015.
  142. M. Poggi and S. Mattoccia, “A wearable mobility aid for the visually impaired based on embedded 3d vision and deep learning,” in Proceedings of 2016 IEEE Symposium on Computers and Communication (ISCC), pp. 208–213, Messina, Italy, June 2016.
  143. S. Ji, W. Xu, M. Yang, and K. Yu, “3D convolutional neural networks for human action recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 221–231, 2013.
  144. H. Du, M. M. Ghassemi, and M. Feng, “The effects of deep network topology on mortality prediction,” in Proceedings of 2016 IEEE 38th Annual International Conference of Engineering in Medicine and Biology Society (EMBC), pp. 2602–2605, Orlando, FL, USA, 2016.
  145. T. Alhanai and M. M. Ghassemi, Predicting Latent Narrative Mood Using Audio and Physiologic Data, MIT, Cambridge, MA, USA, 2017.
  146. A.-r. Mohamed, G. Dahl, and G. Hinton, “Acoustic modeling using deep belief networks,” IEEE Transactions on Audio, Speech, and Language Processing, no. 99, p. 1, 2010.
  147. B. Furht and F. Villanustre, Big Data Technologies and Applications, Springer, Berlin, Germany, 2016.
  148. N. Sultan, “Reflective thoughts on the potential and challenges of wearable technology for healthcare provision and medical education,” International Journal of Information Management, vol. 35, no. 5, pp. 521–526, 2015.
  149. S. L. Halson, J. M. Peake, and J. P. Sullivan, “Wearable technology for athletes: information overload and pseudoscience?” International Journal of Sports Physiology and Performance, vol. 11, no. 6, pp. 705-706, 2016.
  150. J. M. Ortman, V. A. Velkoff, H. Hogan et al., An Aging Nation: the Older Population in the United States, United States Census Bureau, Economics and Statistics Administration, US Department of Commerce, Suitland, MD, USA, 2014.