Advances in Architectures, Big Data, and Machine Learning Techniques for Complex Internet of Things Systems
Review Article | Open Access
Review of the Complexity of Managing Big Data of the Internet of Things
Abstract
There is a growing awareness that the complexity of managing Big Data is one of the main challenges in the developing field of the Internet of Things (IoT). Complexity arises from several aspects of the Big Data life cycle, such as gathering data, storing them on cloud servers, cleaning and integrating the data (a process involving the latest advances in ontologies and in languages such as Extensible Markup Language (XML) and Resource Description Framework (RDF)), and applying machine learning methods to carry out classifications, predictions, and visualizations. This review presents the state of the art of all the aforementioned aspects of Big Data in the context of the Internet of Things. The most novel technologies in machine learning, deep learning, and data mining on Big Data are discussed as well. Finally, we point the reader to the state-of-the-art literature for further in-depth studies, and we present the major trends for the future.
1. Introduction
The fast-developing and expanding area known as the Internet of Things (IoT) [1–3] extends the Internet beyond standard devices such as computers, smartphones, and tablets to include other connected physical devices and objects. This allows a variety of devices and sensors to be monitored and controlled, and to interact and communicate via the Internet, which opens up abundant opportunities for new and revolutionary types of services and applications. As a result, we are now witnessing a technological revolution in which millions of people are connecting and generating tremendous amounts of data through the increasing use of a wide variety of devices. These include smart devices and any type of wearable that is connected to the Internet, powering novel connected applications and solutions. The cost of technology has decreased sharply, making it possible for everybody to access the Internet and to gather data and an abundance of real-time information.
One immediate consequence of this revolutionary emergence of novel technological opportunities is the urgent need to develop and adapt other related areas in order to further enable the development of the IoT field. Thus, new terms have started to emerge, such as Big Data [4, 5], cloud computing, and data science. Data science has been defined as a “concept to unify statistics, data analysis, machine learning and their related methods” to “understand and analyse actual phenomena” with data [7, 8], and there is now a strong demand for professional data scientists in a multitude of sectors [9–12].
This article aims to provide a review of IoT-related surveys in order to highlight the opportunities and the challenges, as well as the state-of-the-art technologies, related to Big Data. There is a particular focus on how to address the problems that arise in managing the ensuing increased complexity. Since this is such a complex area, we have divided the Big Data procedure into several stages, each treated in a separate section, to establish the most important points in each stage while directing the reader to the most relevant papers related to it. In contrast to approaches that offer more general visions, our contribution explicitly indicates the advantages of every stage in the knowledge discovery procedure. The advantage of this proposal is that the challenges and opportunities of every particular phase can be understood and analysed.
The remainder of this article is structured as follows: first, the next section discusses a set of general approaches to handling the complexity of managing Big Data in the context of the IoT, as well as the future trends in the development of these approaches; then a section follows that discusses the procedure for discovering knowledge in data gathered from a large number of diverse devices in the context of the IoT; finally, we provide a conclusion that summarises the article and points out major future trends.
2. The Internet of Things and Complexity Handling: Architectures for Big Data
The Internet of Things (IoT) paradigm has brought a great revolution to our society [13–15]. It allows us to obtain information about the physical environment around us, and from these data valuable knowledge can be inferred about how the world works. This knowledge enables the deployment of new real-world applications and makes it easier to take smart decisions that improve the quality of life of citizens. There are many examples of how this technology is used. The smart city concept is a representative use case, for which many applications have been developed [16–19].
An important source of complexity within the IoT paradigm comes from the great amount of data collected. In most cases, the data also need to be processed in order to be converted into useful knowledge.
In view of the recent proposals on how to handle the complexity of Big Data, there are three general approaches to carry out the ensuing very intensive data processing: (A) local processing; (B) edge computing; and (C) cloud computing. Figure 1 shows a schematic overview of these approaches, and Table 1 summarises a representative set of ways and aspects of handling the complexity arising from the IoT. Table 1 also provides references to corresponding papers, categorised under the headings of the three general approaches mentioned above. In the following subsections, brief descriptions of each of these approaches are presented, and finally their main future trends are introduced.
2.1. Local Processing
This approach basically consists of processing the data where they are collected. In this way, no raw data need to be communicated to remote servers. Instead, only the useful and relevant information is centralised to make smart decisions [20, 21]. In addition, deploying the first-level intelligence closer to the sensors increases the overall energy efficiency and significantly reduces the communication needs of many IoT applications.
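As a minimal sketch of this idea, the following Python fragment reduces a window of raw readings to a small summary on the device, so that only the summary would have to be transmitted. The function name `summarise_window`, the field names, and the alert threshold are illustrative assumptions and are not taken from any of the cited systems.

```python
from statistics import mean


def summarise_window(readings, alert_threshold):
    """Reduce a window of raw sensor readings to a compact summary.

    Only this summary (not the raw samples) would be sent upstream.
    `alert_threshold` is a hypothetical, application-specific limit.
    """
    if not readings:
        raise ValueError("window must contain at least one reading")
    return {
        "count": len(readings),
        "mean": mean(readings),
        "min": min(readings),
        "max": max(readings),
        "alert": max(readings) > alert_threshold,
    }


# Six raw temperature samples collapse into one five-field message.
window = [21.0, 21.5, 22.0, 21.5, 35.0, 22.0]
summary = summarise_window(window, alert_threshold=30.0)
```

The saving grows with the window length: the summary has a fixed size regardless of how many raw samples it covers.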
This approach develops the concept of the ‘smart sensor,’ which was initially defined as a ‘smart transducer.’ A smart sensor is a sensor with computing and communication capabilities that allow it to perform computations on the acquired data, make decisions, store information for further use, and perform two-way communications. Smart sensors are becoming integral parts of intelligent systems, and they are indispensable enablers of the IoT paradigm and of the corresponding development of advanced applications. A typical example of such sensors is the ‘smart wearable.’ This device can acquire several biosignals, process them, show elaborated information to the user, and send the relevant information to, for example, external platforms for medical supervision [23–25]. Other important applications come from the logistics and industrial fields. Indeed, the new computation and communication capabilities of the IoT paradigm allow for the implementation of intelligent manufacturing systems, giving rise to the next generation of industry, the so-called ‘Industry 4.0.’
In these environments, network virtualization plays a significant role in providing flexibility and better manageability to the Internet. It is a way of reducing the complexity of the infrastructure, since network resources can be managed as logical services rather than as physical resources. This feature makes it possible to implement smart scheduling methods for network usage and for the routing of dataflows from IoT applications.
In order to carry out this resource management properly, network performance must be monitored effectively and efficiently. However, this remains a challenge for network operators, since the active monitoring techniques used to acquire performance data dynamically can introduce overheads in the network. In general, existing methods are hard to use in practice, and further research is needed in this area. Nevertheless, a promising idea to address this challenge consists in reducing the amount of direct measurement by implementing intelligent measurement schemes that infer the missing values from partial direct monitoring data.
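One family of such inference techniques is low-rank matrix completion. The sketch below is a generic illustration using iterative truncated SVD and is not the method of any cited paper. It assumes that the performance matrix (here a hypothetical matrix of path delays) is close to low rank; the function name, the rank, and the data are all assumptions made for the example.

```python
import numpy as np


def complete_low_rank(observed, mask, rank=1, iterations=500):
    """Fill unmeasured entries of a matrix assumed to be (near) low rank.

    observed: 2-D array; values where `mask` is False are ignored.
    mask:     boolean array, True where a direct measurement exists.
    """
    estimate = np.where(mask, observed, 0.0)
    for _ in range(iterations):
        # Best rank-`rank` approximation of the current filled-in matrix.
        u, s, vt = np.linalg.svd(estimate, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
        # Keep the direct measurements, use the approximation elsewhere.
        estimate = np.where(mask, observed, low_rank)
    return estimate


# Hypothetical 4 x 5 delay matrix with exact rank-1 structure.
truth = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0, 5.0])
mask = np.ones_like(truth, dtype=bool)
for row, col in [(0, 1), (1, 3), (2, 0), (3, 2)]:
    mask[row, col] = False  # these entries were not measured directly
recovered = complete_low_rank(truth * mask, mask)
```

In this toy setting only 16 of the 20 entries are measured and the remaining four are inferred; real measurement matrices are noisy and only approximately low rank, so the recovery would be approximate.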
2.2. Edge Computing
Edge computing is a novel paradigm which has attracted great interest recently. It consists of the deployment of storage and computing capabilities at the ‘Edge’ of the Internet, which can be defined as the portion of the network between sensors or data sources and cloud data centres. The edge computing paradigm aims at deploying computing, storage, and network resources in this portion. The physical proximity of the computing platforms to the place where the data are acquired makes it easier to achieve low end-to-end latency, high bandwidth, and low jitter for services.
There are several ways to implement edge computing, which have in turn led to different approaches such as Fog Computing, Mobile Edge Computing (MEC), and Cloudlet Deployment. Fog Computing consists in using network devices such as routers, switches, and gateways as Fog Nodes to provide storage and computing resources. In addition, network virtualization has significantly contributed to developing this paradigm by treating the fog devices as virtual network nodes. This trend increases the deployment flexibility of Fog Computing services and their integration with mobile devices and ‘things.’ MEC is a novel paradigm based on deploying cloud computing capabilities in the base stations of the telecom operators. Finally, Cloudlet Deployment follows the same concept as Cloud Computing, but without the inconveniences of the Wide Area Network (WAN): the servers, known as cloudlets, are installed within the local networks to which the data sources are connected.
Some applications of edge computing, such as Virtual Reality and gaming applications, cannot tolerate high or unpredictable latency. Remote cloud servers cannot guarantee the low and predictable latency that these applications require.
2.3. Cloud Computing
The Cloud Computing paradigm is one of the most disruptive technological innovations of the last few years. It makes a flexible amount of computing resources available to anyone under pay-per-use schemes, the so-called ‘as-a-service’ model. Currently, more and more software and hardware solutions are being redesigned for this cloud paradigm.
The cloud computing model favours the development of large data centres where the resources are optimised through virtualization and efficient management systems. This technology gives IoT applications the possibility to work in different environments in a very agile way using the same infrastructure. In this way, combining the cloud computing paradigm with the IoT forms a new type of distributed system able to provide IoT-as-a-Service (IoTaaS). This concept allows for the integration of powerful computing resources with different types of devices such as sensors, actuators, and other embedded devices to deliver advanced services and applications based on the gathered data. Particular instances of this idea are the Sensing and Actuation Cloud, where the connected IoT devices are mainly sensors and actuators, and Cloud Cyber Physical Systems (CPS), composed of sensors or sensor networks.
There are a great variety of successful examples of this trend in many areas, where the data are analysed in the cloud through Big Data and data mining methods to infer valuable knowledge from them and deliver rich and smart services to the stakeholders. For example, the smart city concept, mentioned above, is in part made possible by a centralised cloud-based data analysis and service provision [41, 46, 47].
In addition, the three approaches (local processing, edge computing, and cloud computing) can be combined, taking into account aspects such as power consumption, communication networks, and the availability of computing platforms. Dynamic solutions can then switch to whichever approach is most favourable at a given moment in order to handle the complexity and meet the operating constraints.
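A dynamic selection of this kind can be sketched as a simple rule over the three approaches. The function `choose_tier`, the latency figures, and the `needs_global_data` flag below are hypothetical and chosen only for illustration; a real system would also weigh power consumption and platform availability.

```python
def choose_tier(latency_budget_ms, needs_global_data, tiers=None):
    """Pick a processing tier whose round-trip latency fits the budget.

    Tiers are ordered from closest to the data source to most centralised.
    """
    if tiers is None:
        # Hypothetical round-trip latencies in milliseconds.
        tiers = [("local", 1.0), ("edge", 10.0), ("cloud", 100.0)]
    feasible = [name for name, latency in tiers if latency <= latency_budget_ms]
    if not feasible:
        raise ValueError("no tier satisfies the latency budget")
    if needs_global_data:
        # Tasks that combine many sources prefer the most centralised tier.
        return feasible[-1]
    # Otherwise stay as close to the data source as possible.
    return feasible[0]


tier_for_control_loop = choose_tier(5.0, needs_global_data=False)
tier_for_city_analytics = choose_tier(500.0, needs_global_data=True)
```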
2.4. Future Trends
Regarding the future trends of these three general approaches to the intensive processing of IoT-related Big Data, there are developments on several fronts. The following is a summary of the most relevant ones.
When it comes to local processing, the efforts are directed towards the continuous improvement of smart sensor devices. We can distinguish several research lines here. One is the effort to increase the performance of the devices while simultaneously reducing their power consumption. Another is the integration of multiple sensing modalities on the same chip. Still another is the improvement of the methods employed for extracting useful information from the raw data.
Edge computing has a promising future, since it decentralises the computing power along the network and produces clear benefits when it comes to response time and reliability. The research lines in this field aim at reaching a smooth engagement with the IoT ecosystem, mainly by reducing the management complexity of dispersed edge resources and developing mechanisms to maintain the security perimeter for the data and applications.
The cloud computing paradigm has triggered a strong growth of computing services around the world. For this reason, there is intensive ongoing research on expanding cloud services and solutions to new fields of application. These efforts seek to simplify business and make services easier for stakeholders to use. In this regard, the new 5G standard will facilitate access to services and applications in the cloud, improving the Quality-of-Experience.
3. Knowledge Discovery Procedure
In Figure 2, a classical procedure of discovering knowledge from the data gathered from a large number of diverse devices is depicted. In this figure, we get an overview of all the stages involved in such a process. There are many challenges involved in these stages that will be described next.
3.1. IoT Data Gathering
The gathering of data for IoT architectures involves collection from different sources such as social networks, the web, various devices, software applications, humans, and, not least, various kinds of sensors. In addition to physical sensors, there are also virtual sensors, which are created by combining and fusing data from different physical sensors in the cloud. When it comes to gathering data from sensors, not only are the raw sensor data collected and stored, but they are also often linked to, for example, relevant contextual information, which increases the value of the data. All these different sources engender large amounts of various types of data, which of course also increases the requirements for storage capacity. The increasingly affordable storage resources that have recently become available mitigate this problem to some extent, though.
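The notions of a virtual sensor and of readings linked to contextual information can be illustrated with a small sketch. The `Reading` class, the `virtual_sensor` function, and the use of a plain mean as the fusion rule are assumptions made for this example; actual fusion methods are usually more elaborate.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Reading:
    sensor_id: str
    value: float
    # Contextual information linked to the raw value (e.g. location, unit).
    context: dict = field(default_factory=dict)


def virtual_sensor(sensor_id, readings):
    """Fuse several physical readings into one virtual reading (simple mean)."""
    if not readings:
        raise ValueError("at least one physical reading is required")
    fused_context = {"sources": [r.sensor_id for r in readings]}
    # Keep only the context entries on which all sources agree.
    for key, value in readings[0].context.items():
        if all(r.context.get(key) == value for r in readings):
            fused_context[key] = value
    fused_value = mean([r.value for r in readings])
    return Reading(sensor_id, fused_value, fused_context)


t1 = Reading("t1", 20.0, {"room": "lab", "unit": "C"})
t2 = Reading("t2", 22.0, {"room": "lab", "unit": "C"})
room_temperature = virtual_sensor("room_temp", [t1, t2])
```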
Sensor networks are central to realising the IoT. In order to handle large amounts of polymorphous, heterogeneous sensor data on a large scale, Very Large-Scale Sensor Networks are employed using Cloud Computing. Some of the main challenges regarding Very Large-Scale Sensor Networks are handling the sensor resources and the computational resources, and storing and processing the sensor data.
Table 2 provides references to papers focused on the gathering of data in the context of the IoT.
3.2. Data Cleaning and Integration
A consequence of the way information is gathered through various sources and devices within IoT is that the information varies broadly in structure and type. This leads to a need for integration, which can be defined as a set of techniques used to combine data from disparate sources into meaningful and valuable information.
Integration is one of the most challenging issues of Big Data, and it is associated with one of the most difficult Vs of Big Data, namely, the variety of data. Table 3 shows a summary of papers that focus on the problem of variety of information in Big Data.
Moreover, given the current context in which companies are organized, it is not enough to work with internal, local, and private databases. In most cases, there is also a need for the World Wide Web where many diverse databases and other data sources must interact and interoperate. This circumstance leads us to concepts such as heterogeneity and uncertainty.
Table 4 summarizes papers that deal with integration by means of a diversity of techniques and methods like XML, ontological constructs from knowledge representation, uncertainty, and data provenance.
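As a hedged illustration of what integration means in practice, the sketch below maps two hypothetical sources, one delivering XML and one delivering JSON with different field names and units, onto a single common record format. The element names, field names, and sample values are invented for the example.

```python
import json
import xml.etree.ElementTree as ET

# Two hypothetical sources describing the same kind of measurement.
XML_SOURCE = """<readings>
  <reading sensor="t1" unit="C">21.5</reading>
  <reading sensor="t2" unit="F">70.7</reading>
</readings>"""

JSON_SOURCE = '[{"id": "t3", "temp_c": 19.0}]'


def to_celsius(value, unit):
    """Normalise a temperature to degrees Celsius."""
    if unit == "C":
        return value
    if unit == "F":
        return (value - 32.0) * 5.0 / 9.0
    raise ValueError(f"unknown unit: {unit}")


def integrate(xml_text, json_text):
    """Map both sources onto one schema: {'sensor': str, 'temp_c': float}."""
    records = []
    for node in ET.fromstring(xml_text).iter("reading"):
        celsius = to_celsius(float(node.text), node.get("unit"))
        records.append({"sensor": node.get("sensor"), "temp_c": round(celsius, 2)})
    for item in json.loads(json_text):
        records.append({"sensor": item["id"], "temp_c": float(item["temp_c"])})
    return records


records = integrate(XML_SOURCE, JSON_SOURCE)
```

Ontology-based approaches generalise this hand-written mapping by describing the correspondence between schemas declaratively.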
3.3. Data Mining and Machine Learning
As more devices, sensors, etc. generate large amounts of data within the IoT, the question arises whether there are possibilities of finding hidden information in that data.
Data mining is a process that detects interesting knowledge in information repositories. This process is partly based on methods derived from modern machine learning algorithms adapted to Big Data, which extract hidden information from, e.g., databases, data warehouses, data streams, time series, sequences, text, the web, and the large amount of valuable data generated by the IoT. Data mining aims at creating efficient predictive and descriptive models of large amounts of data that also generalize to new data. It includes methods such as clustering, classification, time series analysis, association rule mining, and outlier analysis. The precise choice among the diverse data mining and machine learning techniques often depends on the taxonomy of the dataset.
Clustering is a form of unsupervised learning that uses the structure available in the data to group them according to various kinds of similarity measures. Some examples of clustering methods are hierarchical clustering and partitioning algorithms, e.g., K-Means.
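A minimal sketch of K-Means (Lloyd's algorithm) follows, using Euclidean distance as the similarity measure. To keep the example deterministic, the initial centroids are passed in explicitly, which is a simplification; practical implementations normally choose them randomly or with a seeding heuristic. The data points are invented.

```python
import numpy as np


def k_means(points, initial_centroids, max_iterations=100):
    """Partition points into clusters around iteratively refined centroids."""
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(initial_centroids, dtype=float).copy()
    labels = np.zeros(len(points), dtype=int)
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: each centroid moves to the mean of its members.
        updated = centroids.copy()
        for j in range(len(centroids)):
            members = points[labels == j]
            if len(members) > 0:
                updated[j] = members.mean(axis=0)
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids


data = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.1], [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]]
labels, centroids = k_means(data, initial_centroids=[data[0], data[3]])
```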
Classification is the process of finding models/functions describing classes that allow the prediction of class membership for new data. Some examples of classification methods are the K-Nearest Neighbour algorithm, Artificial Neural Networks, Decision Trees, Support Vector Machines, Bayesian Methods, and Rule-Based Methods.
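The K-Nearest Neighbour algorithm is the simplest of these to sketch: a new instance receives the majority class among the k closest training instances. The function name and the toy data below are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the class of `query` by majority vote among its k nearest neighbours."""
    neighbours = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


train_points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_labels = ["low", "low", "low", "high", "high", "high"]
prediction_a = knn_predict(train_points, train_labels, (2, 2))
prediction_b = knn_predict(train_points, train_labels, (7.5, 8))
```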
In time series analysis, meaningful properties are extracted from data over time. In association rule mining, association rules are detected based on attribute-value conditions that occur frequently in the dataset.
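How frequently a set of attribute-value conditions occurs is usually quantified by its support, and the strength of a rule by its confidence. The sketch below computes both for a rule of the form antecedent → consequent; the smart-home records are invented for the example.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every condition in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        raise ValueError("antecedent never occurs in the data")
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical records of attribute-value conditions.
records = [
    {"motion=on", "light=on"},
    {"motion=on", "light=on", "heater=on"},
    {"motion=off", "light=off"},
    {"motion=on", "light=off"},
]
rule_support = support(records, {"motion=on", "light=on"})
rule_confidence = confidence(records, {"motion=on"}, {"light=on"})
```

Here the rule "motion=on → light=on" holds in two of the four records (support 0.5) and in two of the three records where motion is on (confidence about 0.67).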
Outlier analysis detects patterns that differ significantly from the main part of the data. The methods used are based on properties such as the density distribution or the distances between the instances in the data.
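A distance-based variant can be sketched as follows: each instance is scored by the distance to its k-th nearest neighbour, and instances whose score is much larger than a typical score are flagged. The function name, the choice of the (upper) median as the reference, and the factor of 3 are assumptions for this example rather than a standard setting.

```python
import math


def knn_distance_outliers(points, k=2, factor=3.0):
    """Flag points whose k-th nearest-neighbour distance is unusually large."""
    scores = []
    for i, p in enumerate(points):
        distances = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(distances[k - 1])
    # Upper median of the scores serves as the "typical" distance.
    reference = sorted(scores)[len(scores) // 2]
    return [score > factor * reference for score in scores]


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
flags = knn_distance_outliers(points)
```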
Table 5 provides a summary of, and references to, papers focusing on machine learning and data mining in the context of Big Data.
3.4. Deep Learning
In recent years, deep learning has become an important technology for solving a wide range of machine learning tasks. There are applications in natural language processing, signal processing, and video analysis that achieve significantly better results than the previous state-of-the-art baselines. Deep learning is also a very useful tool for processing large volumes of data. Because of its high efficiency in processing data obtained from complex sensing environments at different spatial and temporal resolutions, deep learning is a suitable tool for analysing real-world IoT data. According to Gartner’s Top 10 Strategic Technology Trends for 2017 (https://www.gartner.com/smarterwithgartner/gartners-top-10-technology-trends-2017/), deep learning and the IoT will form one of the most strategic two-way technological relationships: the IoT side produces large volumes of data that require the advanced analytics offered by the deep learning side. A wide range of deep learning architectures finds application in processing the data from IoT environments: convolutional networks for image analysis, recurrent networks for signal processing, autoencoders for denoising, and feed-forward networks for classification and regression. Figure 3 represents a general architecture of deep learning.
Usually, the data are processed in dedicated frameworks such as TensorFlow (https://www.tensorflow.org/), Theano (http://deeplearning.net/software/theano/), Caffe (http://caffe.berkeleyvision.org/), H2O (https://www.h2o.ai/), and Torch (http://torch.ch/). Often GPUs or clusters of GPU servers are used for the processing [78, 79].
These frameworks offer different execution models: they can run standalone or use high-performance computing based on, e.g., Hadoop or a Spark cluster, which reduces the computation time. The frameworks have been widely compared, and reviews can be found online (https://dzone.com/articles/8-best-deep-learning-frameworks) (https://www.exastax.com/deep-learning/a-comparison-of-deep-learning-frameworks/). It should be noted that these frameworks implement a processing model in which the data are transferred to a server that performs the analysis, and the response is returned in a final stage. This model is subject to latency that may not be acceptable in some applications with requirements for high reliability, for example, autonomous cars. Thus, if efficiency constraints require real-time data processing, a particular implementation of the algorithm is deployed on a local node. In its basic setting, this solution does not allow the use of information from other sources. An example of on-node processing is the use of spectral-domain preprocessing on the node before the data are passed on to the deep learning framework for Human Activity Recognition.
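Spectral-domain preprocessing on the node can be sketched generically as follows. This is not the pipeline of the cited work; it only illustrates how a raw signal window might be compressed into a few spectral features before being handed to a learning framework. The function name, the number of bands, and the synthetic 2 Hz signal sampled at 50 Hz are assumptions.

```python
import numpy as np


def spectral_features(signal, sample_rate_hz, n_bands=4):
    """Compress a raw window into band energies and a dominant frequency."""
    signal = np.asarray(signal, dtype=float)
    # Power spectrum of the mean-removed window.
    spectrum = np.abs(np.fft.rfft(signal - signal.mean())) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate_hz)
    # Sum the power inside n_bands contiguous frequency bands.
    energies = np.array([band.sum() for band in np.array_split(spectrum, n_bands)])
    dominant_hz = freqs[spectrum.argmax()]
    return energies, dominant_hz


# Synthetic two-second accelerometer-like window: a 2 Hz oscillation at 50 Hz.
t = np.arange(100) / 50.0
window = np.sin(2 * np.pi * 2.0 * t)
energies, dominant_hz = spectral_features(window, sample_rate_hz=50.0)
```

The 100 raw samples are reduced to four band energies plus one frequency, which lowers both the transmission cost and the input size of the downstream model.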
For the IoT, deep analytics are performed on large data collections and are usually based on creating more descriptive features of the processed objects. For example, in temporal data processing for indoor location prediction, a Semisupervised Deep Extreme Learning Machine algorithm has been proposed that improves the localisation performance. Wireless positioning has also been improved with the use of a Stacked Denoising Autoencoder, which creates reliable features from a large set of noisy samples. The prediction of home electricity power consumption has been analysed with a deep learning system that automatically extracts features from the captured data and optimises the electricity supply of the smart grid.
In Edge Computing, performing the analytics on a deep learning cluster has efficiently reduced resource consumption. Convolutional neural networks with automatically created features appear to be a very good solution for privacy preservation. Deep learning also finds many applications in the security domain; e.g., it allows the construction of a model-based architecture for cyber-attack detection in fog-to-things computing for the IoT.
Video analysis integrated in IoT networks is strongly supported by neural networks; e.g., deep learning-based visual food recognition allows for the construction of a system employing an edge computing-based service for accurate dietary assessment. RTFace, a mechanism for denaturing video streams, is based on a Deep Neural Network for face detection. It selectively blurs faces and enables privacy management for live video analytics.
3.5. Classification, Prediction, and Visualization
This section discusses the final stage in the chain of the “Procedure for Knowledge Discovery,” in which the final knowledge is extracted from the raw data.
When employing machine learning methods for classification and prediction, it is important to use methods with a good ability to generalize. The reason for this is that, once any of the aforementioned techniques has been trained on the original data, we want it to make good classifications and predictions on novel data and not only on the data used for training.
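The ability to generalize is usually estimated by holding out part of the data during training and measuring performance on that held-out part. The sketch below uses a deterministic every-third-sample split and a 1-nearest-neighbour classifier purely for illustration; in practice the split is normally randomised, and the data and function names here are invented.

```python
import math


def holdout_split(samples, test_every=3):
    """Reserve every `test_every`-th sample for testing (deterministic split)."""
    test = samples[::test_every]
    train = [s for i, s in enumerate(samples) if i % test_every != 0]
    return train, test


def nearest_neighbour(train, query):
    """Return the label of the training sample closest to `query`."""
    return min(train, key=lambda sample: math.dist(sample[0], query))[1]


def accuracy(train, evaluation):
    """Share of evaluation samples whose predicted label is correct."""
    hits = sum(nearest_neighbour(train, x) == y for x, y in evaluation)
    return hits / len(evaluation)


samples = [
    ((0.0,), "a"), ((0.2,), "a"), ((0.4,), "a"),
    ((5.0,), "b"), ((5.2,), "b"), ((5.4,), "b"),
]
train, test = holdout_split(samples)
train_accuracy = accuracy(train, train)  # optimistic: the model has seen these
test_accuracy = accuracy(train, test)    # estimate of performance on novel data
```

The accuracy on the training samples is trivially perfect for this classifier, which is why only the held-out figure says anything about generalization.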
After machine learning methods have been applied, it is crucial to know how to interpret their outputs and to understand what these mean and how they improve the knowledge in each application area. To that end, visualization methods are employed. Such methods are widely used in Big Data scenarios, as they are very helpful for all types of graphical interpretation when the Volume, Variety, or Velocity of the data is complex. In Table 6, we present a summary of, and references to, papers that deal with visualization.
4. Conclusions
As indicated by the journal articles and the conference papers we have reviewed in this article, the complexity of Big Data is an urgent topic and the awareness of this is growing. Consequently, there is a lot of research carried out on this, and we will in all likelihood find more and more progress in this field during the next few years.
Additionally, a key issue that we want to emphasize in this study is that aspects of Big Data transcend the academic area and are therefore also reflected in industry. One observation is that more than 50% of 560 enterprises think that Big Data will help them increase their operational efficiency, among other benefits. This indicates that there are a lot of opportunities for Big Data. However, it is also clear that there are many challenges in every phase of the knowledge discovery procedure that need to be addressed in order to achieve continued and successful progress within the field of Big Data.
As shown in Figure 1, there are three general approaches to carrying out intensive data processing in IoT architectures: (a) local processing, (b) edge computing, and (c) cloud computing. Each of these approaches has been described in more detail above.
We also explained the knowledge discovery procedure by dividing it into several stages as shown in Figure 2. These steps are IoT Data Gathering, Data Cleaning, Integration, Machine Learning, Data Mining, Classification, Prediction, and Visualization.
We have also discussed that many research papers focus on the variety of information, because variety, in conjunction with integration, is one of the most challenging issues when it comes to the IoT. It is also regarded as one of the most difficult Vs of Big Data.
The trend for the future seems to be that more investigations will be carried out in such areas as (a) techniques for data integration, again the V of Variety; (b) more efficient machine learning techniques on big data, such as Deep Learning and frameworks such as Apache’s Hadoop and Spark, that will probably have a crucial importance; and (c) the visualization of the data, with, e.g., dashboards, and more efficient techniques for the visualization of indicators.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The authors acknowledge the support from the research center Internet of Things and People (IOTAP) at Malmö University in Sweden. This work was also supported by the Spanish Research Agency (AEI) and the European Regional Development Fund (ERDF) under project CloudDriver4Industry TIN2017-89266-R.
References
- H. Sundmaeker, P. Guillemin, P. Friess, and S. Woelfflé, Vision and Challenges for Realising the Internet of Things, 2010.
- A. Zaslavsky, C. Perera, and D. Georgakopoulos, “Sensing as a Service and Big Data,” in Proceedings of the International Conference on Advances in Cloud Computing (ACC), pp. 21–29, 2012.
- L. Atzori, A. Iera, and G. Morabito, “The internet of things: a survey,” Computer Networks, vol. 54, no. 15, pp. 2787–2805, 2010.
- M. Chen, S. Mao, and Y. Liu, “Big data: a survey,” Mobile Networks and Applications, vol. 19, no. 2, pp. 171–209, 2014.
- S. Sagiroglu and D. Sinanc, “Big data: a review,” in Proceedings of the International Conference on Collaboration Technologies and Systems (CTS '13), pp. 42–47, May 2013.
- M. Armbrust, A. Fox, R. Griffith et al., “A view of cloud computing,” Communications of the ACM, vol. 53, no. 4, pp. 50–58, 2010.
- C. Hayashi, What is Data Science? Fundamental Concepts and a Heuristic Example, 1998.
- V. Dhar, “Data science and prediction,” Communications of the ACM, vol. 56, no. 12, pp. 64–73, 2013.
- L. Vangelova, “Data scientist,” Scientific Teaching, vol. 79, no. 6, pp. 66-67, 2012.
- J. Hardin, R. Hoerl, N. J. Horton et al., “Data science in statistics curricula: preparing students to “think with data”,” The American Statistician, vol. 69, no. 4, pp. 343–353, 2015.
- T. Hey, S. Tansley, and K. Tolle, The Fourth Paradigm: Data-Intensive Scientific Discovery, 2009.
- P. Bevington and D. Robinson, Data Reduction and Error Analysis for the Physical Sciences, 1993.
- I. C. L. Ng and S. Y. L. Wakenshaw, “The Internet-of-Things: Review and research directions,” International Journal of Research in Marketing, vol. 34, no. 1, pp. 3–21, 2017.
- T. Saarikko, U. H. Westergren, and T. Blomquist, “The Internet of Things: Are you ready for what’s coming?” Business Horizons, vol. 60, no. 5, pp. 667–676, 2017.
- D. Gil, A. Ferrández, H. Mora-Mora, and J. Peral, “Internet of things: a review of surveys based on context aware intelligent services,” Sensors, vol. 16, no. 7, article 1069, 2016.
- R. Pérez-delHoyo, C. García-Mayor, H. Mora, V. Gilart-Iglesias, and M. D. Andújar-Montoya, “Improving urban accessibility: A methodology for urban dynamics analysis in smart, sustainable and inclusive cities,” International Journal of Sustainable Development and Planning, vol. 12, no. 3, pp. 357–367, 2017.
- Z. Lv, X. Li, W. Wang, B. Zhang, J. Hu, and S. Feng, “Government affairs service platform for smart city,” Future Generation Computer Systems, vol. 81, pp. 443–451, 2018.
- J. Macke, R. M. Casagrande, J. A. R. Sarate, and K. A. Silva, “Smart city and quality of life: Citizens’ perception in a Brazilian case study,” Journal of Cleaner Production, vol. 182, pp. 717–726, 2018.
- H. March, “The Smart City and other ICT-led techno-imaginaries: Any room for dialogue with Degrowth?” Journal of Cleaner Production, vol. 197, pp. 1694–1703, 2018.
- H. Mora, M. Signes-Pont, D. Gil, and M. Johnsson, “Collaborative Working Architecture for IoT-Based Applications,” Sensors, vol. 18, no. 6, p. 1676, 2018.
- W. Lee and A. Sharma, “Smart sensing for IoT applications,” in Proceedings of the 13th IEEE International Conference on Solid-State and Integrated Circuit Technology, ICSICT 2016, pp. 362–364, October 2016.
- Institute of Electrical and Electronics Engineers, IEEE Std 1451.0™ 2007, IEEE Standard for a Smart Transducer Interface for Sensors and Actuators Common Functions, Communication Protocols, and Transducer Electronic Data Sheet (TEDS) Formats, 2007.
- T. Islam, S. C. Mukhopadhyay, and N. K. Suryadevara, “Smart Sensors and Internet of Things: A Postgraduate Paper,” IEEE Sensors Journal, vol. 17, no. 3, pp. 577–584, 2017.
- J. Mendes Jr., M. Vieira, M. Pires, and S. Stevan Jr., “Sensor Fusion and Smart Sensor in Sports and Biomedical Applications,” Sensors, vol. 16, no. 10, p. 1569, 2016.
- H. Mora, D. Gil, R. M. Terol, J. Azorín, and J. Szymanski, “An IoT-Based Computational Framework for Healthcare Monitoring in Mobile Environments,” Sensors, vol. 17, no. 10, p. 2302, 2017.
- M. Masoudinejad, A. K. R. Venkatapathy, J. Emmerich, and A. Riesner, Smart Sensing Devices for Logistics Application, Springer, Cham, Switzerland, 2017.
- C. Chen, M. Lin, and X. Guo, “High-level modeling and synthesis of smart sensor networks for Industrial Internet of Things,” Computers & Electrical Engineering, vol. 61, pp. 48–66, 2017.
- R. Y. Zhong, X. Xu, E. Klotz, and S. T. Newman, “Intelligent Manufacturing in the context of industry 4.0: a review,” Engineering, vol. 3, no. 5, pp. 616–630, 2017.
- Y. Su, X. Meng, Q. Kang, and X. Han, “Dynamic Virtual Network Reconfiguration Method for Hybrid Multiple Failures Based on Weighted Relative Entropy,” Entropy, vol. 20, no. 9, p. 711, 2018.
- C.-W. Tseng, F.-H. Tseng, Y.-T. Yang, C.-C. Liu, and L.-D. Chou, “Task Scheduling for Edge Computing with Agile VNFs On-Demand Service Model toward 5G and Beyond,” Wireless Communications and Mobile Computing, vol. 2018, Article ID 7802797, 13 pages, 2018.
- A. Yassine, H. Rahimi, and S. Shirmohammadi, “Software defined network traffic measurement: Current trends and challenges,” IEEE Instrumentation & Measurement Magazine, vol. 18, no. 2, pp. 42–50, 2015.
- H. Tahaei, R. Salleh, S. Khan, R. Izard, K.-K. R. Choo, and N. B. Anuar, “A multi-objective software defined network traffic measurement,” Measurement, vol. 95, pp. 317–327, 2017.
- X. Wang, C. Xu, G. Zhao, K. Xie, and S. Yu, “Efficient Performance Monitoring for Ubiquitous Virtual Networks Based on Matrix Completion,” IEEE Access, vol. 6, pp. 14524–14536, 2018.
- W. Shi, J. Cao, Q. Zhang, Y. Li, and L. Xu, “Edge computing: vision and challenges,” IEEE Internet of Things Journal, vol. 3, no. 5, pp. 637–646, 2016.
- M. Satyanarayanan, “The emergence of edge computing,” Computer, vol. 50, no. 1, pp. 30–39, 2017.
- Fog Computing and the Internet of Things: Extend the Cloud to Where the Things Are, 2015.
- P. Hu, S. Dhelim, H. Ning, and T. Qiu, “Survey on fog computing: architecture, key technologies, applications and open issues,” Journal of Network and Computer Applications, vol. 98, pp. 27–42, 2017.
- E. Ahmed and M. H. Rehmani, “Mobile Edge Computing: Opportunities, solutions, and challenges,” Future Generation Computer Systems, vol. 70, pp. 59–63, 2017.
- M. Satyanarayanan, V. Bahl, R. Caceres, and N. Davies, “The Case for VM-based Cloudlets in Mobile Computing,” IEEE Pervasive Computing, vol. 8, no. 4, pp. 14–23, 2009.
- J. Pan and J. McElhannon, “Future edge cloud and edge computing for internet of things applications,” IEEE Internet of Things Journal, vol. 5, no. 1, pp. 439–449, 2018.
- L. J. M. Nieuwenhuis, M. L. Ehrenhard, and L. Prause, “The shift to Cloud Computing: The impact of disruptive technology on the enterprise software business ecosystem,” Technological Forecasting & Social Change, vol. 129, pp. 308–313, 2018.
- A. Celesti, D. Mulfari, M. Fazio, M. Villari, and A. Puliafito, “Exploring Container Virtualization in IoT Clouds,” in Proceedings of the 2nd IEEE International Conference on Smart Computing, SMARTCOMP 2016, pp. 1–6, May 2016.
- M. Giacobbe, R. Di Pietro, A. Longo Minnolo, and A. Puliafito, “Evaluating Information Quality in Delivering IoT-as-a-Service,” in Proceedings of the 2018 IEEE International Conference on Smart Computing (SMARTCOMP), pp. 405–410, June 2018.
- S. Satpathy, B. Sahoo, and A. K. Turuk, “Sensing and Actuation as a Service Delivery Model in Cloud Edge centric Internet of Things,” Future Generation Computer Systems, vol. 86, pp. 281–296, 2018.
- R. Lovas, A. Farkas, A. C. Marosi et al., “Orchestrated Platform for Cyber-Physical Systems,” Complexity, vol. 2018, Article ID 8281079, 16 pages, 2018.
- H. Mora, V. Gilart-Iglesias, R. Pérez-del Hoyo, and M. Andújar-Montoya, “A Comprehensive System for Monitoring Urban Accessibility in Smart Cities,” Sensors, vol. 17, no. 8, p. 1834, 2017.
- A. M. Osman, “A novel big data analytics framework for smart cities,” Future Generation Computer Systems, vol. 91, pp. 620–633, 2019.
- J. Santos, B. Volckaert, T. Wauters, and F. de Turck, “Fog Computing: Enabling the Management and Orchestration of Smart City Applications in 5G Networks,” Entropy, vol. 20, no. 1, p. 4, 2017.
- F. Mora-Gimeno, H. Mora-Mora, D. Marcos-Jorquera, and B. Volckaert, “A Secure Multi-Tier Mobile Edge Computing Model for Data Processing Offloading Based on Degree of Trust,” Sensors, vol. 18, no. 10, p. 3211, 2018.
- A. Monteriù, M. Prist, E. Frontoni et al., “A Smart Sensing Architecture for Domestic Monitoring: Methodological Approach and Experimental Validation,” Sensors, vol. 18, no. 7, p. 2310, 2018.
- M. Afrin, M. Razzaque, I. Anjum, M. Hassan, and A. Alamri, “Tradeoff between User Quality-Of-Experience and Service Provider Profit in 5G Cloud Radio Access Network,” Sustainability, vol. 9, no. 11, p. 2127, 2017.
- M. Yuriyama and T. Kushida, “Sensor-cloud infrastructure—physical sensor management with virtualized sensors on cloud computing,” in Proceedings of the 13th International Conference on Network-Based Information Systems (NBiS '10), pp. 1–8, September 2010.
- J. J. Calbimonte, H. Jeung, O. Corcho, and K. Aberer, “Semantic Sensor Data Search in a Large-Scale Federated Sensor Network,” Semantic Sensor Networks, pp. 14–29, 2011.
- J. Liu, J. Chen, L. Peng, X. Cao, R. Lian, and P. Wang, “An open, flexible and multilevel data storing and processing platform for very large scale sensor network,” in Proceedings of the 2012 14th International Conference on Advanced Communication Technology (ICACT), 2012.
- J. Gubbi, R. Buyya, S. Marusic, and M. Palaniswami, “Internet of Things (IoT): a vision, architectural elements, and future directions,” Future Generation Computer Systems, vol. 29, no. 7, pp. 1645–1660, 2013.
- M. Hassanalieragh, A. Page, T. Soyata et al., “Health monitoring and management using internet-of-things (IoT) sensing with cloud-based processing: opportunities and challenges,” in Proceedings of the IEEE International Conference on Services Computing, SCC 2015, pp. 285–292, IEEE, July 2015.
- S. Li, L. D. Xu, and X. Wang, “Compressed sensing signal and data acquisition in wireless sensor networks and internet of things,” IEEE Transactions on Industrial Informatics, vol. 9, no. 4, pp. 2177–2186, 2013.
- H. Mora-Mora, V. Gilart-Iglesias, D. Gil, and A. Sirvent-Llamas, “A computational architecture based on RFID sensors for traceability in smart cities,” Sensors, vol. 15, no. 6, pp. 13591–13626, 2015.
- F. Liu, Y. Liu, D. Jin, X. Jia, and T. Wang, “Research on Workshop-Based Positioning Technology Based on Internet of Things in Big Data Background,” Complexity, vol. 2018, Article ID 875460, 11 pages, 2018.
- C. L. P. Chen and C. Y. Zhang, “Data-intensive applications, challenges, techniques and technologies: A survey on Big Data,” Information Sciences, vol. 275, pp. 314–347, 2014.
- I. A. T. Hashem, I. Yaqoob, N. B. Anuar, S. Mokhtar, A. Gani, and S. Ullah Khan, “The rise of ‘big data’ on cloud computing: review and open research issues,” Information Systems, vol. 47, pp. 98–115, 2015.
- M. M. Najafabadi, F. Villanustre, T. M. Khoshgoftaar, N. Seliya, R. Wald, and E. Muharemagic, “Deep learning applications and challenges in big data analytics,” Journal of Big Data, vol. 2, no. 1, pp. 1–21, 2015.
- S. del Río, V. López, J. M. Benítez, and F. Herrera, “On the use of MapReduce for imbalanced big data using Random Forest,” Information Sciences, vol. 285, pp. 112–137, 2014.
- A. Halevy, A. Doan, and Z. Ives, Principles of Data Integration, Elsevier, 2012.
- A. Y. Halevy, “Answering queries using views: A survey,” The VLDB Journal, vol. 10, no. 4, pp. 270–294, 2001.
- R. Pottinger and A. Halevy, “MiniCon: A scalable algorithm for answering queries using views,” The VLDB Journal, vol. 10, no. 2–3, pp. 182–198, 2001.
- M. Brundage, XQuery: The XML Query Language, 2004.
- R. Goldman, J. McHugh, and J. Widom, “From semistructured data to XML: Migrating the Lore data model and query language,” Markup Languages: Theory and Practice, vol. 2, no. 2, pp. 153–163, 1999.
- V. Josifovski, M. Fontoura, and A. Barta, “Querying XML streams,” The VLDB Journal, vol. 14, no. 2, pp. 197–210, 2005.
- N. F. Noy, “Semantic integration: a survey of ontology-based approaches,” ACM SIGMOD Record, vol. 33, no. 4, pp. 65–70, 2004.
- A. Doan, J. Madhavan, P. Domingos, and A. Halevy, “Learning to map between ontologies on the semantic web,” in Proceedings of the 11th International Conference on World Wide Web (WWW '02), pp. 662–673, ACM, May 2002.
- T. J. Green, “Containment of Conjunctive Queries on Annotated Relations,” Theory of Computing Systems, vol. 49, no. 2, pp. 429–459, 2011.
- B. Glavic and G. Alonso, “Perm: Processing provenance and data on the same data model through query rewriting,” in Proceedings of the 25th IEEE International Conference on Data Engineering, ICDE 2009, pp. 174–185, China, April 2009.
- H. Gonzalez, A. Halevy, C. S. Jensen et al., “Google fusion tables: web-centered data management and collaboration,” in Proceedings of the 1st ACM symposium, p. 175, June 2010.
- X. L. Dong, L. Berti-Equille, Y. Hu, and D. Srivastava, “Global detection of complex copying relationships between sources,” Proceedings of the VLDB Endowment, vol. 3, no. 1–2, pp. 1358–1369, 2010.
- A. Doan, R. Ramakrishnan, and A. Y. Halevy, “Crowdsourcing systems on the world-wide web,” Communications of the ACM, vol. 54, no. 4, pp. 86–96, 2011.
- A. Maté, H. Llorens, E. De Gregorio et al., “A novel multidimensional approach to integrate big data in business intelligence,” Journal of Database Management, vol. 26, no. 2, pp. 14–31, 2015.
- A. Mukhopadhyay, U. Maulik, S. Bandyopadhyay, and C. A. C. Coello, “A survey of multiobjective evolutionary algorithms for data mining: part I,” IEEE Transactions on Evolutionary Computation, vol. 18, no. 1, pp. 4–19, 2014.
- F. Chen, P. Deng, J. Wan, D. Zhang, A. V. Vasilakos, and X. Rong, “Data mining for the internet of things: Literature review and challenges,” International Journal of Distributed Sensor Networks, vol. 2015, 2015.
- X. Wu, X. Zhu, G.-Q. Wu, and W. Ding, “Data mining with big data,” IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, pp. 97–107, 2014.
- J. Dean, Big Data, Data Mining, and Machine Learning: Value Creation for Business Leaders and Practitioners, John Wiley & Sons, 2014.
- W. Fan and A. Bifet, “Mining big data: current status, and forecast to the future,” ACM SIGKDD Explorations Newsletter, vol. 14, no. 2, pp. 1–5, 2012.
- Y. Guo, Z. Yang, S. Feng, and J. Hu, “Complex Power System Status Monitoring and Evaluation Using Big Data Platform and Machine Learning Algorithms: A Review and a Case Study,” Complexity, vol. 2018, Article ID 8496187, 21 pages, 2018.
- X. Liu, Y. Zhou, and X. Chen, “Mining Outlier Data in Mobile Internet-Based Large Real-Time Databases,” Complexity, vol. 2018, Article ID 9702304, 12 pages, 2018.
- Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
- T. Young, D. Hazarika, S. Poria, and E. Cambria, “Recent Trends in Deep Learning Based Natural Language Processing,” IEEE Computational Intelligence Magazine, vol. 13, no. 3, pp. 55–75, 2018.
- D. Yu and L. Deng, “Deep learning and its applications to signal and information processing,” IEEE Signal Processing Magazine, vol. 28, no. 1, pp. 145–154, 2011.
- Y. Bengio, “Learning deep architectures for AI,” Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–27, 2009.
- Y. Tian, K. Pei, S. Jana, and B. Ray, “Deeptest: Automated testing of deep-neural-network-driven autonomous cars,” in Proceedings of the 40th International Conference on Software Engineering, pp. 303–314, May 2018.
- D. Ravi, C. Wong, B. Lo, and G.-Z. Yang, “A Deep Learning Approach to on-Node Sensor Data Analytics for Mobile or Wearable Devices,” IEEE Journal of Biomedical and Health Informatics, vol. 21, no. 1, pp. 56–64, 2017.
- Y. Gu, Y. Chen, J. Liu, and X. Jiang, “Semi-supervised deep extreme learning machine for Wi-Fi based localization,” Neurocomputing, vol. 166, pp. 282–293, 2015.
- W. Zhang, K. Liu, W. Zhang, Y. Zhang, and J. Gu, “Deep Neural Networks for wireless localization in indoor and outdoor environments,” Neurocomputing, vol. 194, pp. 279–287, 2016.
- L. Li, K. Ota, and M. Dong, “When Weather Matters: IoT-Based Electrical Load Forecasting for Smart Grid,” IEEE Communications Magazine, vol. 55, no. 10, pp. 46–51, 2017.
- H. Li, K. Ota, and M. Dong, “Learning IoT in edge: deep learning for the internet of things with edge computing,” IEEE Network, vol. 32, no. 1, pp. 96–101, 2018.
- Y. Huang, X. Ma, X. Fan, J. Liu, and W. Gong, “When deep learning meets edge computing,” in Proceedings of the 2017 IEEE 25th International Conference on Network Protocols (ICNP), pp. 1–2, October 2017.
- S. Sharma, K. Chen, and A. Sheth, “Towards practical privacy-preserving analytics for IoT and cloud based healthcare systems,” IEEE Internet Computing, vol. 22, pp. 42–51, 2018.
- A. Abeshu and N. Chilamkurti, “Deep Learning: The Frontier for Distributed Attack Detection in Fog-To-Things Computing,” IEEE Communications Magazine, vol. 56, no. 2, pp. 169–175, 2018.
- C. Liu, Y. Cao, Y. Luo et al., “A New Deep Learning-Based Food Recognition System for Dietary Assessment on An Edge Computing Service Infrastructure,” IEEE Transactions on Services Computing, vol. 11, no. 2, pp. 249–261, 2018.
- J. Wang, B. Amos, A. Das, P. Pillai, N. Sadeh, and M. Satyanarayanan, “A scalable and privacy-aware IoT service for live video analytics,” in Proceedings of the 8th ACM Multimedia Systems Conference, MMSys 2017, pp. 38–49, June 2017.
- A. Gandomi and M. Haider, “Beyond the hype: big data concepts, methods, and analytics,” International Journal of Information Management, vol. 35, no. 2, pp. 137–144, 2015.
- D. Parmenter, Key Performance Indicators: Developing, Implementing, and Using Winning KPIs, 2015.
- P. Simon, The visual organization: data visualization, Big Data, and the quest for better decisions, 2014.
- L. Wang, G. Wang, and C. A. Alexander, “Big data and visualization: methods, challenges and technology progress,” Digital Technologies, vol. 1, no. 1, pp. 33–38, 2015.
- D. Keim, H. Qu, and K. Ma, “Big-Data Visualization,” IEEE Computer Graphics and Applications, vol. 33, no. 4, pp. 20–21, 2013.
Copyright © 2019 David Gil et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.