Abstract

The internet of things (IoT) is used as an umbrella term for the evolution of the internet into the physical realm through pervasive, distributed devices with embedded identification, sensing, and actuation capabilities. Emerging intelligent technologies enable the IoT to transmit information between physical and autonomous digital entities and to provide improved services, leading towards a new era of communication. Large numbers of heterogeneous hardware devices, e.g., radio frequency identification (RFID) tags and sensors, together with various network protocols, are used to support object identification and network communication. The data generated by these digital objects is termed “big data” and occupies a high-dimensional space with noisy, irrelevant, and redundant features. Applying mining techniques directly to such a high-dimensional attribute space can increase cost and complexity. Data analytic mechanisms are embedded into the IoT to enable intelligent decision-making. These notions raise new challenges for the IoT from a data and algorithm perspective. This study identifies the problem in IoT networks and proposes a novel cuckoo search-based outdoor data management technique. Feature extraction is used to extract useful information from raw, high-dimensional data. After implementing cuckoo search-based feature extraction, a few test benchmarks are introduced to evaluate the performance of mutated cuckoo search algorithms. The resulting low-dimensional data improves classification accuracy while reducing complexity and cost.

1. Introduction

The next generation of the internet and computing will move beyond the conventional approach of the internet reaching end users, promoting instead a model of interconnected “smart” objects. It will not replace the internet but will extend it as an infrastructure, integrating physical objects with processing and transmission technologies to deliver an immense range of services and applications in a more reliable, fast, and accessible way. This revolution leads towards ubiquitous computing, in which every object is embedded with microprocessors and communicates efficiently [1]. It will make physical objects “smart” and let them integrate with worldwide cyber-physical frameworks. This trend will open the way for new opportunities and innovations in information and communication technologies (ICT), offering new services and applications by connecting physical and virtual objects. It is emerging as a trend in which most surrounding objects will be networked in various forms. This is a shift from conventional internet approaches towards an internet that connects physical objects interacting with each other and with humans [2].

The scope of this research work is limited to applying the cuckoo search algorithm to dimensionality reduction problems in data mining for internet of things applications. The work is more closely related to feature extraction than to feature selection. Specifically, it transforms existing features using cuckoo search algorithms for data mining. Of indoor and outdoor data services, our focus is on outdoor data services. Outdoor data can be of many types, e.g., text, images, hypertext, audio, and video, but the proposed model considers only textual or numerical representations of our data [3].

The internet of things enables us to connect with anyone, anytime, and anywhere. Technological advances are building societies in which everything will be connected. Things can be uniquely identified and operate in smart environments, using intelligent interfaces to connect and communicate within the physical world. Interconnected smart objects are heterogeneous and equipped with smart devices (sensors and actuators). The data sensed or captured by these objects is huge in volume and can be termed “big data” [4]. Researchers have proposed knowledge discovery and data mining techniques (classification, clustering, and pattern analysis) for the internet of things (IoT) to provide a suitable environment and quality services to people. The extensive volume of data (big data) produced by smart devices, with high dimensionality and noisy, irrelevant, and redundant attributes, generates a huge search space. Applying mining techniques to such rough and fuzzy data can reduce performance and increase the cost and computation of mining algorithms. Therefore, preprocessing techniques are required to map the original datasets onto a new, reduced attribute subset that represents the original space with high accuracy [5].

In the current era, the scope of real-time networks and applications is not limited to social and enterprise activities. They are emerging as an extensive discipline that provides advanced and competitive environments for diverse activities, including health, home, and business processes. Data analytic techniques are crucial for maintaining network robustness and the availability of efficient services. Data purification is mandatory to reduce the computational complexity of preprocessing and mining models. Existing techniques are complex and thus involve large computations [6]. These facts reveal a research gap, and the proposed research work is formulated around the following objectives:
(i) Outlining unaddressed challenges in big data analysis for the internet of things, together with a performance analysis and the limitations of existing techniques
(ii) Signifying the importance of preliminary processing paradigms for self-organized networks to reduce complexity and enhance the performance of mining techniques
(iii) Presenting a framework, based on a nonlinear metaheuristic approach (the cuckoo search algorithm), to mitigate the curse of dimensionality for the internet of things
(iv) Investigating the performance of the proposed framework

The conducted research work identifies challenges for outdoor data management in the internet of things and uses the cuckoo search algorithm for feature extraction, i.e., to extract useful information from raw, high-dimensional data [7]. The resulting low-dimensional data improves classification accuracy while reducing complexity and cost [8]. The presented work fundamentally aims to propose a suitable preprocessing framework for internet of things outdoor data services [9]. Moreover, the proposed technique addresses only single-objective optimization; it can be extended to multiobjective nonlinear optimization [10]. The implementation in this document covers only the basic concept of the proposed technique, specifically for internet of things outdoor data services [11]. The dataset used in this research is based on the output of virtual hardware devices and may contain some ambiguous attributes as well [12].

1.1. Dimensionality Reduction

In an IoT network, the data collected by various sensing entities (e.g., sensors and RFIDs) includes redundant, irrelevant, and noisy information. The performance and effectiveness of IoT services are improved by applying mining techniques to this data. Data arrives at a relentless pace, and its high dimensionality increases the complexity of mining algorithms and classifiers [13]. The data dimension refers to the number of aspects or features used to describe an input dataset:

$$X = \{f_1, f_2, \ldots, f_n\} \tag{1}$$

Here, $n$ is the dimension, and $f_1, \ldots, f_n$ represent the features describing the dataset $X$.

As already mentioned, high-dimensional datasets can degrade the accuracy of mining techniques. Preprocessing strategies are therefore needed to transform high-dimensional data into lower dimensions (see Figure 1). Data with low dimensionality improves classification accuracy while reducing complexity and cost. Mining data for the IoT differs from conventional data mining approaches [14].

In the IoT, traditional data mining algorithms run into problems and therefore need to be revised to handle scalability and big data issues [15]. Classification poses many challenges in the data mining and machine learning research areas [16]. From the IoT point of view, the aim is to classify each object in the big data according to the characteristics described by its features [17]. Appropriate data representations are needed to describe physical-world data [18], and it is difficult to distinguish the useful features. Noisy, redundant, irrelevant, and distorted data can reduce performance, diminish classification accuracy, and increase the cost and computation of mining algorithms [19]. Once these factors are eliminated, data mining techniques can work more efficiently [20]. According to the work proposed in [21], reducing data dimensions (dimensionality reduction) can handle this problem by selecting only relevant and meaningful features.

If $X$ is a dataset with dimension $n$, such that $X = \{f_1, f_2, \ldots, f_n\}$, then dimensionality reduction yields a dataset $X'$ with fewer dimensions $m$,

$$X' = \{f'_1, f'_2, \ldots, f'_m\}, \qquad m < n, \tag{2}$$

in such a way that $X'$ represents the dataset $X$ with a smaller feature subset of dimension $m$.

A reduced dataset can improve the performance and speed of mining algorithms, which leads towards optimal classification results and better network performance. Dimensionality reduction is a complicated challenge because the search space is large: with $n$ attributes there are $2^n$ candidate subsets, so the space grows exponentially with the number of attributes. An attribute may be relevant in one scenario and redundant in another, depending on the properties specified by its dimensions. Exhaustive search guarantees the optimal subset but becomes impossible when the search space is unbounded, so efficient search techniques are indispensable. A wide range of search techniques has been proposed (e.g., sequential forward/backward selection) to select informative attributes and reduce data dimensions, so that the data is represented in the form most appropriate for classification and other mining strategies. Despite these approaches, attribute reduction still faces challenges from the data and algorithm perspectives, such as getting trapped in local optima and excessive cost and computational complexity [22].
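To make the sequential forward selection mentioned above concrete, the sketch below greedily grows a subset one feature at a time, evaluating roughly n·m candidate subsets instead of all of them. It is a minimal illustration, not the method proposed in this work; `toy_criterion` and its relevance scores are hypothetical stand-ins for a real evaluation measure such as classifier accuracy.

```python
from typing import Callable, FrozenSet, List, Sequence


def sequential_forward_selection(
    n_features: int,
    m: int,
    criterion: Callable[[FrozenSet[int]], float],
) -> List[int]:
    """Greedily grow a subset of m feature indices, one feature per step."""
    selected: List[int] = []
    remaining = set(range(n_features))
    while len(selected) < m and remaining:
        # Add the candidate whose inclusion gives the highest criterion value.
        best = max(remaining, key=lambda j: criterion(frozenset(selected + [j])))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical relevance scores; features 0 and 1 are assumed to be redundant.
RELEVANCE: Sequence[float] = [0.9, 0.85, 0.2, 0.6, 0.1]


def toy_criterion(subset: FrozenSet[int]) -> float:
    score = sum(RELEVANCE[j] for j in subset)
    if {0, 1} <= subset:
        score -= 0.8  # penalty for selecting the redundant pair together
    return score


print(sequential_forward_selection(5, 3, toy_criterion))  # [0, 3, 2]
```

Because the greedy search never revisits a feature once it has been added, it can stop at a locally good subset, which is one of the local-optimum issues noted above.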

1.2. Techniques for Dimensionality Reduction

Reducing the number of dimensions in an extensive search space is a challenging and demanding task for current networks and computing technologies. Figure 2 describes an abstract process for knowledge discovery and dimensionality reduction. Feature selection (FS) and feature extraction (FE) are the two approaches to dimensionality reduction and are discussed in the following sections [23].

Dimensionality reduction (a branch of statistics and machine learning) is emerging as a new solution domain for big data issues, mapping an original feature space onto a new space. This process minimizes the number of random variables (attributes) under consideration and transforms data from high dimensions to low dimensions. The mapping can be done either by choosing a subset of the original feature space (feature selection) or by forming a new space using a transformation function (feature extraction). Feature selection (or feature subset selection) from available datasets is considered comparatively more efficient at representing the original data [24].
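The difference between the two mappings can be shown on a toy matrix. In this sketch (plain Python, with hypothetical sensor columns and weights), selection keeps some original columns unchanged, while extraction builds new features as a transformation of all columns; the linear map used here is just one possible transformation function, assumed for brevity.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def select_features(X: Matrix, keep: Sequence[int]) -> Matrix:
    """Feature selection: keep a subset of the original columns unchanged."""
    return [[row[j] for j in keep] for row in X]


def extract_features(X: Matrix, W: Matrix) -> Matrix:
    """Feature extraction: map each n-dim row x to m new features (W times x)."""
    return [[sum(w_j * x_j for w_j, x_j in zip(w, row)) for w in W] for row in X]


# Three readings of four hypothetical attributes (n = 4):
# temperature A, temperature B, humidity, door-open flag.
X = [[20.0, 21.0, 55.0, 1.0],
     [22.0, 23.0, 60.0, 0.0],
     [19.0, 19.0, 50.0, 1.0]]

print(select_features(X, keep=[0, 2]))
# [[20.0, 55.0], [22.0, 60.0], [19.0, 50.0]]

# Hypothetical transform (m = 2): average the two temperatures, pass humidity through.
W = [[0.5, 0.5, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
print(extract_features(X, W))
# [[20.5, 55.0], [22.5, 60.0], [19.0, 50.0]]
```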

1.2.1. Feature Selection

Preprocessing techniques are indispensable for data mining to reduce the complexity, processing, storage, and cost of classifiers. In the IoT, input datasets on data collection layers are represented by high-dimensional variables, which raise the processing complexity of mining algorithms. Feature selection addresses the problem of selecting a feature subset from the available candidate features representing the originally measured datasets [25]. Feature selection and extraction are also used extensively in image processing and computer vision research [26].

Related technical work applies a local manifold representation using an affinity matrix to dimensionality reduction in hyperspectral imagery [27]. In hyperspectral dimensionality reduction for remote sensing data, machine learning models are also used extensively to label graphs while learning features [28].

Let the input to a mining or training classifier be a set of $N$ instances $X = \{x_1, x_2, \ldots, x_N\}$, each described by the original feature set $F = \{f_1, f_2, \ldots, f_n\}$. Each instance is a tuple with $n$ dimensions, as given below:

$$x_i = (x_{i1}, x_{i2}, \ldots, x_{in}), \qquad x_{ij} \in D_j, \tag{3}$$

where $D_j$ is the domain of the feature $f_j$, and the cardinality of $F$ is $|F| = n$. Let $S$ represent the selected dimensions in $F$:

$$S = \{f_{s_1}, f_{s_2}, \ldots, f_{s_m}\} \subseteq F. \tag{4}$$

Equations (3) and (4) define a new subset $S$ of $F$ that is smaller than the original set and belongs to the same search space, that is, $S \subseteq F$, where the number of features in $S$ is $m < n$. The process of feature selection is shown in Figure 3 and consists of two phases: the first phase selects a subset from the original space, and the second phase evaluates the newly generated subset. Let $\mathbf{b} \in \{0, 1\}^n$ be the representation vector of a candidate subset ($b_j = 1$ if $f_j \in S$) and $J(\cdot)$ be the selection criterion for the optimal subset $S^*$. Formally, feature selection maps the high-dimensional space to a low-dimensional one by finding a subset $S^* \subseteq F$ where

$$J(S^*) = \max_{S \subseteq F,\ |S| = m} J(S). \tag{5}$$

According to equation (5), a higher value of $J$ signifies an improved feature subset. The selected subset serves as the best input to the classifiers and raises the accuracy rate. Feature selection only chooses a subset from the large set of variables representing the dataset; it does not apply any transformation or mapping to extract new information from the existing high-dimensional data.
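For small $n$, equation (5) can be solved by brute force. The sketch below enumerates every subset of size $m$, keeps the one with the highest $J$, and returns it together with its binary representation vector. The criterion `J` is a hypothetical relevance-minus-redundancy score chosen only for illustration.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple


def best_subset(
    n: int, m: int, J: Callable[[Tuple[int, ...]], float]
) -> Tuple[List[int], List[int]]:
    """Return S* = argmax_{|S| = m} J(S) and its binary vector b (b[j] = 1 iff j in S*)."""
    s_star = max(combinations(range(n), m), key=J)
    b = [1 if j in s_star else 0 for j in range(n)]
    return list(s_star), b


# Hypothetical criterion: relevance of each feature, minus a penalty when the
# redundant pair (0, 1) is selected together.
RELEVANCE: Sequence[float] = [0.7, 0.65, 0.3, 0.5]


def J(S: Tuple[int, ...]) -> float:
    penalty = 0.6 if 0 in S and 1 in S else 0.0
    return sum(RELEVANCE[j] for j in S) - penalty


print(best_subset(4, 2, J))  # ([0, 3], [1, 0, 0, 1])
```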

1.2.2. Feature Extraction

Feature extraction, an essential data preprocessing technique, adds value to mining techniques as a performance enhancer for IoT networks by transforming existing high-dimensional datasets, i.e., by providing a high-accuracy data representation model for the original feature space [29].

The presence of inappropriate and redundant facts in the original datasets demands a prerequisite process (feature extraction). The feature extraction process involves two research issues: the search technique and the evaluation measure. The search space includes the complete feature set and its subsets, and feature extraction transforms the existing features to find the optimal solution set [30].

In equation (7), $n$ represents the dimension, and $m$ denotes the current feature subset size. Various evaluation techniques have emerged for optimal subset selection. Searching for an optimal feature subset is a nondeterministic polynomial-time hard (NP-hard) problem. Traditional search algorithms cannot efficiently handle a high-dimensional search space. Evolutionary computing (EC) algorithms are prominent for their global search capability [31]. Dimensionality reduction serves two basic purposes, reducing the feature space and improving the accuracy of mining strategies, as demonstrated in Figure 4.
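A quick count shows why exhaustive search is impractical. With $n$ dimensions there are $\binom{n}{m}$ subsets of exactly $m$ features and $2^n - 1$ non-empty subsets overall; the snippet below (plain Python, illustrative values of $n$) prints both counts.

```python
from math import comb
from typing import Tuple


def search_space_sizes(n: int, m: int) -> Tuple[int, int]:
    """Return (#subsets with exactly m features, #non-empty subsets of any size)."""
    return comb(n, m), 2 ** n - 1


for n in (10, 50, 100):
    fixed, total = search_space_sizes(n, n // 2)
    print(f"n={n:>3}: C(n, n/2) = {fixed:.3e}, 2^n - 1 = {total:.3e}")
```

Even at $n = 100$, enumerating every subset is far beyond reach, which motivates the evolutionary and metaheuristic searches discussed next.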

1.3. Heuristic and Metaheuristic Search Methods

The internet of things requires an abstract data representation with relatively few features, which is fundamental to data analysis and decision-making tasks. In a live network like the IoT, optimization tools are needed to find an optimal solution driven by dynamically optimal parameters [32]. Dynamically tuned algorithms are applicable when optimization is multiobjective, e.g., maximizing a search optimum in parallel with another network optimization goal. In such a dynamic network environment, where data streams are fluid, repetitive, and continuous, optimization becomes a dynamic functional requirement; hence, passive, problem-specific algorithms do not offer sustainable search optimization for uninterrupted service delivery. Heuristic techniques are problem-dependent and implicitly deliver an approximate solution for a particular situation without a guaranteed accuracy level. The combinatory and low-rank representations based on the heuristic search algorithm in [33] provide evidence in favour of heuristic techniques, but this applies only when the objective function is not dependent on, or followed by, another global minimum or maximum function.

The studies presented in [34, 35] propose optimization algorithms that obtain an optimal solution and establish the quality and efficiency of that solution with supporting proofs.

Metaheuristics are problem-independent techniques that can be applied to a broad range of problems. By contrast, choosing a random element as the pivot in Quicksort is an example of a heuristic. A metaheuristic knows nothing about the problem to which it will be applied; it treats functions as black boxes. In general, these algorithms fall into two representative categories: (i) deterministic and (ii) stochastic algorithms. In contrast to stochastic algorithms, deterministic techniques contain no random variables: the initial variables fully control and determine the output, so the same input always produces the same result. In stochastic algorithms, the outcome can differ even for the same input parameters, depending on the random operations triggered. Stochastic algorithms can be further classified into two types: evolutionary and metaheuristic algorithms. As given in [36], metaheuristics are naturally and biologically inspired algorithms applied to various global optimization and real-time problems. Examples include ant colony optimization [37], particle swarm optimization (PSO), and state transition algorithms (STA) [38, 39]. Despite their broad optimization capability, these algorithms involve some degree of randomness that can reduce their global search ability, so they easily fall into local optima. To address this, a mutated cuckoo search algorithm is proposed, which treats the solution space as a global function.

1.4. Cuckoo Search Optimization

Cuckoo search (CS) is a metaheuristic optimization technique, proposed in 2009 [40], inspired by successful characteristics of the cuckoo's biological behavior, notably its breeding strategy. The algorithm rests on two notions: randomization (random walk) and stochastic search. It begins by exploring the local search space for local optima. However, it is not bounded by local optima; instead, it expands as the problem becomes global and offers a global optimal solution [41]. Although the originally proposed algorithm is tuned to have relatively few parameters and targets local optimization, the revised and enhanced CS algorithms in [32] show that it can be tailored into a dynamic, global optimization algorithm, extending its performance boundary by adjusting the step size and parameter values.

1.4.1. Cuckoo’s Living Behavior

The cuckoo is an obligate brood parasite that depends on host birds to reproduce and raise its offspring [42]. Cuckoo search (CS) is inspired by the cuckoo's behavioral traits, e.g., foraging (search for food) [43]. The cuckoo lays its egg in the nest of a host bird, where the egg hatches and the chick seeks the host's attention to get food [44]. The cuckoo's egg also mimics some external attributes of the host's eggs. The algorithm is based on two approaches: exploration and exploitation. CS uses Lévy flights to generate a new solution. A cuckoo may also throw out the host bird's eggs to raise the hatching probability of its own egg [45].

1.4.2. Algorithm Constraints

(a) Each cuckoo lays one egg at a time and dumps it in a randomly chosen nest.
(b) The nests with the best eggs (high-quality solutions) carry over to the next generation.
(c) The number of host nests is fixed, and the host bird can discover an alien egg with a certain probability. If this happens, the host bird either discards the egg or abandons the nest and builds a new one.

1.4.3. Algorithm Formulation

A cuckoo selects a nest owned by some host bird and dumps its egg into it. The selection of nests depends on a random walk, which is modeled on the foraging and flight behavior of a cuckoo. Each egg laid by a cuckoo represents a new solution $v'$:

$$v' = \phi(v; \theta),$$

where $\phi$ is a nonlinear cuckoo search mapping that takes an existing IoT $n$-dimensional vector $v$ with parameters $\theta$ to a relatively new vector $v'$. Several observations suggest that the natural flight pattern of a cuckoo closely resembles the characteristics of a Lévy distribution. These random walks are not isotropic, i.e., they vary in direction and magnitude.

1.4.4. Random Walk

The formulation of the CS algorithm is a balanced combination of local and global random walks. Hence, it not only optimizes outdoor IoT data but can also converge to local optima when required. A random walk is a sequence of successive random steps:

$$v_i^{t+1} = v_i^{t} + \alpha \oplus \text{Lévy}(\lambda),$$

where $\alpha > 0$ is the step size, $\text{Lévy}(\lambda)$ indicates Lévy flights, and $\oplus$ denotes entrywise multiplication for each new step, which is then added to the previous candidate solution. In each new iteration, a solution is generated through a Lévy flight, with the search steps drawn from the heavy-tailed Lévy distribution. Compared with the normal distribution, a heavy-tailed distribution is not exponentially bounded, and most of the values generated meet the criteria of the fitness value (objective function) [46]. A random walk of 10,000 steps, each taken to choose a better position than the previous one, is shown in Figure 5. Lévy flight is preferable when the search space is exponentially unbounded and continuously expands in any dimension and size. Because of its Lévy flights, cuckoo search is well suited to handling network data for global optimization [47].
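The contrast between heavy-tailed and Gaussian steps can be reproduced with a one-dimensional walk of 10,000 steps, as in Figure 5. The sketch below draws Lévy-like steps with Mantegna's algorithm (a common way to sample them in cuckoo search implementations, assumed here with exponent β = 1.5) and compares the largest single jump with that of a Gaussian walk. The seed and exponent are illustrative choices, not values from this work.

```python
import math
import random
from typing import List


def mantegna_step(beta: float, rng: random.Random) -> float:
    """One Levy-like step length drawn with Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma_u = (num / den) ** (1 / beta)
    u, v = rng.gauss(0.0, sigma_u), rng.gauss(0.0, 1.0)
    return u / max(abs(v), 1e-12) ** (1 / beta)  # guard against v == 0


def walk(n_steps: int, levy: bool, beta: float = 1.5, seed: int = 1) -> List[float]:
    """1-D random walk starting at 0: x_{t+1} = x_t + step_t."""
    rng = random.Random(seed)
    path = [0.0]
    for _ in range(n_steps):
        step = mantegna_step(beta, rng) if levy else rng.gauss(0.0, 1.0)
        path.append(path[-1] + step)
    return path


def largest_jump(path: List[float]) -> float:
    return max(abs(b - a) for a, b in zip(path, path[1:]))


levy_path = walk(10_000, levy=True)
gauss_path = walk(10_000, levy=False)
print(f"largest Levy jump:     {largest_jump(levy_path):.1f}")
print(f"largest Gaussian jump: {largest_jump(gauss_path):.1f}")
```

The Gaussian walk's largest step stays within a few units, while the Lévy walk occasionally makes very long jumps; those rare long jumps are what let the search leave a local region.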

In information and communication technology, fast, self-organizing algorithms are indispensable when data is huge in volume and various data analytic activities are initiated to improve network performance. Metaheuristic algorithms are global optimization techniques designed to address current global optimization problems. Among metaheuristic algorithms such as harmony search and the bat algorithm, cuckoo search is a recently developed algorithm that fits future smart IoT networks and their continuous outdoor data well, providing valuable services through improved machine learning techniques [48]. Figure 5 shows a random walk graph with 10,000 steps starting from 0, where the x-axis represents the time lapse $t$ and the y-axis shows the position of movement.

1.4.5. Parameter Tuning

The cuckoo search strategy (CSS), in its original form, is suited to local search optimization, but its algorithmic constraints can be tuned to broaden its application to global search optimization problems. To achieve the best performance, several parameters of the cuckoo search algorithm need to be tuned, namely the nest size, the elitism probability (probabilistic selection of the fittest candidates), and the number of iterations [49]. Like many nature-inspired algorithms, the algorithm starts with random parameters. At each iteration step, the parameters are tuned with a varying step size. The choice of step size is critical to whether the algorithm converges or diverges. Depending on the application, the step size can be increased or decreased to meet convergence-speed or performance requirements.
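To tie the rules in Section 1.4.2 to the parameters listed above, the sketch below is a minimal cuckoo search in plain Python that minimizes a continuous function inside a box. It follows the commonly used formulation of the original 2009 algorithm (Lévy-flight eggs sampled with Mantegna's algorithm, a local random walk for discovered eggs, greedy replacement that never loses the best nest), not the mutated variant proposed in this work. Here `pa` is treated as the discovery probability, and all default values are illustrative assumptions.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def mantegna_step(beta: float, rng: random.Random) -> float:
    """One Levy-like step length drawn with Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma_u = (num / den) ** (1 / beta)
    u, v = rng.gauss(0.0, sigma_u), rng.gauss(0.0, 1.0)
    return u / max(abs(v), 1e-12) ** (1 / beta)


def cuckoo_search(
    f: Callable[[Vector], float],
    lower: Sequence[float],
    upper: Sequence[float],
    n_nests: int = 15,      # nest (population) size
    pa: float = 0.25,       # probability that a host discovers an alien egg
    alpha: float = 0.01,    # step-size scaling factor
    beta: float = 1.5,      # Levy exponent
    iterations: int = 200,  # number of generations
    seed: int = 0,
) -> Tuple[Vector, float]:
    """Minimise f inside the box [lower, upper]; return (best vector, best value)."""
    rng = random.Random(seed)

    def clip(x: Vector) -> Vector:
        return [min(max(xk, lo), hi) for xk, lo, hi in zip(x, lower, upper)]

    nests = [[rng.uniform(lo, hi) for lo, hi in zip(lower, upper)] for _ in range(n_nests)]
    fitness = [f(x) for x in nests]
    best = min(range(n_nests), key=fitness.__getitem__)

    for _ in range(iterations):
        # Rule (a): each cuckoo lays one egg via a Levy flight from its own nest
        # and dumps it into a randomly chosen nest, replacing that egg if better.
        for i in range(n_nests):
            egg = clip([xk + alpha * mantegna_step(beta, rng) * (xk - bk)
                        for xk, bk in zip(nests[i], nests[best])])
            j = rng.randrange(n_nests)
            f_egg = f(egg)
            if f_egg < fitness[j]:
                nests[j], fitness[j] = egg, f_egg
        # Rule (c): each dimension is discovered with probability pa; discovered
        # parts of a nest are rebuilt by a local random walk between two nests.
        for i in range(n_nests):
            j, k = rng.randrange(n_nests), rng.randrange(n_nests)
            r = rng.random()
            new = clip([xk + r * (xj - xl) if rng.random() < pa else xk
                        for xk, xj, xl in zip(nests[i], nests[j], nests[k])])
            f_new = f(new)
            if f_new < fitness[i]:
                nests[i], fitness[i] = new, f_new
        # Rule (b): greedy replacement never discards the best egg,
        # so we only need to re-locate the best nest.
        best = min(range(n_nests), key=fitness.__getitem__)

    return nests[best], fitness[best]


if __name__ == "__main__":
    def sphere(x: Vector) -> float:
        return sum(xk * xk for xk in x)

    best_x, best_f = cuckoo_search(sphere, [-5.0, -5.0], [5.0, 5.0])
    print(best_x, best_f)
```

Larger `n_nests` or `iterations` trade run time for search quality, and `alpha` scales the Lévy step size discussed in this section.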

1.4.6. Efficiency

Cuckoo search is a nature-inspired metaheuristic algorithm and is now widely used for optimization. Its inherent randomness gives it several advantages over conventional algorithms. Metaheuristic algorithms are very diverse, including genetic algorithms, simulated annealing, differential evolution, ant and bee algorithms, the bat algorithm, particle swarm optimization, harmony search, the firefly algorithm, and cuckoo search [50]. These algorithms are nature-inspired and work without any central computing paradigm. Most of the parameters are tuned through neighbouring nodes interacting with each other, and this interconnection among peers keeps their complexity from growing exponentially.

1.4.7. Limitations

The performance of the cuckoo search algorithm is compromised when the problem is discrete and multiobjective, although it performs well on continuous optimization problems. The algorithm therefore has limited scope for some real-time problems, and further study is needed to overcome its limitations. Beyond continuous problems, much development has taken place in terms of step size, parameter adjustment, coupling with other algorithms, and other factors used to improve performance. Meanwhile, the algorithm still has problems with adaptability and with reaching the best possible search results, and its ability to solve complex problems is inadequate for real-world applications. Future research should focus on new methods and strategies for functions with strong coupling between variables [51].

1.5. Motivation

In wireless sensor networks and IoT-based systems, data is generated in enormous volumes, which makes meaningful analysis cumbersome. The data has to be cleaned in one of the phases before analysis results are generated from it. Sometimes this huge volume of data is cleaned during the nonoperational time of data processing, and the existing data is updated with the redundant data removed. Alternatively, data can be cleaned at runtime before any analytical processing [52]. Both approaches have their own pros and cons. When dimensions are reduced on a sound scientific and mathematical basis, the data is preserved in a smaller volume. Reducing data volume through fewer dimensions remains an active research topic across diverse implementations of IoT-based systems [53].

The subtle difference between selection and reduction is that feature selection picks the features required during runtime execution, whereas reduction drops undesired and unimportant features during the data-cleansing phase. In many real-time applications, searching a huge amount of data in a high-dimensional search space is not practically feasible. Unimportant features cause interference through redundancy, irrelevancy, and triviality in the data search space. Many evolutionary algorithms lack the inherent ability to accurately select features for deletion and attribute reduction. The results reported in [54] show that cuckoo search outperforms many existing algorithms applied to dimensionality reduction.

2. Proposed Technique and Implementation

In any IoT network, data from the physical world is highly nonlinear, as its environment changes dynamically with local or global activity. Therefore, to manage the data, it must be purified through preprocessing before mining and decision-making, for accurate classification and reduced cost. Objects can leave or join the network from time to time, which requires restarting the mining algorithms to deal with immediate and abrupt changes. To reduce the huge search space, select a reduced attribute subset, and minimize the cost of mining algorithms, feature extraction (FE) can be an essential strategy for enhancing the performance of classifiers and other mining techniques. Moreover, novel algorithms are needed to overcome the shortcomings of traditional approaches [55].

2.1. IoT Vectors and Dimensions

The exhaustive search space consisting of enormous numbers of dimensions is reduced through feature extraction before any further data analytics tasks involving data mining techniques. The resulting reduced vectors increase prediction accuracy with minimal complexity and cost.

2.1.1. Problem Description

Data gathered by sensors is stored in a database as rows and columns. Each row represents a distinct vector (data record) $v_i$, and each column corresponds to a dimension $d_j$. A schema for IoT databases including dimensions and vectors is given in Table 1.

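Suppose all distinct objects in the IoT are represented in a vector set $V$, where each vector $v_i$ stores the data of a single object with several dimensions (attributes) $d_1, d_2, \ldots, d_n$. The number of vectors in $V$ is then equal to the number of tuples $N$ in the database, so that $|V| = N$. The set of vectors can be written as a column matrix:

$$V = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_N \end{bmatrix},$$

where each vector can be represented along with its dimensions as a single row matrix:

$$v_i = \begin{bmatrix} d_{i1} & d_{i2} & \cdots & d_{in} \end{bmatrix}.$$

If we replace each vector with its dimensions, we get an $N \times n$ matrix:

$$V = \begin{bmatrix} d_{11} & d_{12} & \cdots & d_{1n} \\ d_{21} & d_{22} & \cdots & d_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ d_{N1} & d_{N2} & \cdots & d_{Nn} \end{bmatrix}.$$

With respect to a vector $v_i$ and its dimensions, a transformation $\phi$ maps the existing features of the original $n$-dimensional space into a reduced, transformed vector $v'_i$ in an $m$-dimensional space such that

$$v'_i = \phi(v_i), \qquad m < n.$$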
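As a minimal sketch of this layout, the snippet below builds the $N \times n$ matrix $V$ from sensor records (one row per vector, one column per dimension) and applies a transformation to obtain $m < n$ columns. The dimension names and the mapping `phi` are hypothetical; in this work, the transformation comes from the cuckoo search-based procedure rather than a fixed formula.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def to_matrix(records: Sequence[Dict[str, float]], dims: Sequence[str]) -> Matrix:
    """Rows are vectors v_i (database tuples); columns are dimensions d_j."""
    return [[rec[d] for d in dims] for rec in records]


def phi(v: List[float]) -> List[float]:
    """Hypothetical transformation from 4 to 2 dimensions, for illustration only."""
    temperature, humidity, light, noise = v
    return [temperature + humidity, light / noise]


dims = ["temperature", "humidity", "light", "noise"]  # hypothetical d_1..d_4
records = [
    {"temperature": 21.5, "humidity": 40.0, "light": 300.0, "noise": 30.0},
    {"temperature": 22.0, "humidity": 42.0, "light": 280.0, "noise": 40.0},
    {"temperature": 19.0, "humidity": 55.0, "light": 120.0, "noise": 20.0},
]
V = to_matrix(records, dims)        # N x n = 3 x 4
V_reduced = [phi(v) for v in V]     # N x m = 3 x 2
print(len(V), len(V[0]), len(V_reduced[0]))  # 3 4 2
print(V_reduced)  # [[61.5, 10.0], [64.0, 7.0], [74.0, 6.0]]
```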

Before we formulate the algorithm for feature extraction, a brief description for used terms is given in Table 2.

2.2. Feature Extraction

The component $V$ is the original vector set of dimension $n$, and $v_i$ is a distinct untransformed vector of dimension $n$. We denote by $v'_i$ the transformed vector of dimension $m$, such that $m < n$. The task of feature extraction is divided into the following steps:
(i) Feature construction (FC) or feature transformation (FT)
(ii) Feature subset selection (search technique)
(iii) Result efficiency

Based on these tasks, the algorithm for feature extraction can be formulated as a general procedure for meaningful and efficient feature extraction. An algorithm for extracting high-quality features from the original space is given below.

Input: initial vector set V with original features d_1, ..., d_n
For (i = 1 to N) do
   Construct new features for each vector v_i (transformation)
   Update each vector v_i and add it to the new vector set V'
End For
Select a subset of the constructed features from V' through Cuckoo Search
Generate the optimized vector set V* with reduced dimensions m < n
End Procedure_1
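The three steps of Procedure_1 can be sketched as a small pipeline in which the construction and selection steps are passed in as functions. In this illustration, the construction step appends one hypothetical sum feature, and the selection step keeps the $m$ highest-variance columns; that variance criterion is only a placeholder for the cuckoo search selection used in this work.

```python
from statistics import pvariance
from typing import Callable, List

Matrix = List[List[float]]


def feature_extraction(
    V: Matrix,
    construct: Callable[[List[float]], List[float]],
    select: Callable[[Matrix], List[int]],
) -> Matrix:
    """Procedure_1 skeleton: construct features, select a subset, build V*."""
    V_prime = [construct(v) for v in V]              # step (i): transformation
    keep = select(V_prime)                           # step (ii): subset selection
    return [[v[j] for j in keep] for v in V_prime]   # reduced vector set V*


def add_pair_sum(v: List[float]) -> List[float]:
    """Hypothetical construction: append d_1 + d_2 as a new feature."""
    return v + [v[0] + v[1]]


def top_variance(m: int) -> Callable[[Matrix], List[int]]:
    """Placeholder selector: keep the m columns with the largest variance."""
    def select(V: Matrix) -> List[int]:
        n = len(V[0])
        ranked = sorted(range(n), key=lambda j: pvariance([v[j] for v in V]), reverse=True)
        return sorted(ranked[:m])
    return select


V = [[1, 2, 3], [2, 5, 7], [4, 3, 4]]
print(feature_extraction(V, add_pair_sum, top_variance(2)))  # [[3, 3], [7, 7], [4, 7]]
```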
2.3. Feature Construction

Feature extraction is a technique for transforming a vast range of features or dimensions into a reduced set of features for various data analytics tasks. Transforming the original features requires parameters and a technique to construct new features from the existing ones. These techniques vary depending on the type of attributes to be constructed. In the following sections, algorithms are presented for nominal and numeric attributes. They reduce the dimensionality of the data to enhance search optimization for machine learning tasks, e.g., classification and data mining, and to target suitable data for further processing in IoT networks. A vector with comprehensive, well-constructed attributes helps achieve high prediction accuracy. Constructing nominal and numeric attributes requires different operators and operations to identify hidden information that can benefit data analytics for decision-making [56].

2.3.1. Algorithm for Numeric Attributes

The IoT database includes various dimensions, and each dimension can be represented as a numeric value or as a combination of characters and strings. The set of operators used to construct features changes accordingly. The algorithm given below specifies the steps to construct new features for numeric attributes based on the arithmetic operators (+, /, -). Figure 6 gives the tree representation for feature construction with various mathematical operations performed on dimensions $d_1, \ldots, d_n$. Here, the selection of operators depends on the problem and the desired outcome [57].

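IF (attributes ∈ nominal attributes) Then
  Go to Procedure_3 for Nominal Attributes
Else Start Procedure_2 for Numeric Attributes
Input: vector set V, a set of numeric attributes
d_j: single feature in each vector
V = ⟨v_1, ..., v_N⟩: sequence of N tuples
F': set of newly constructed features
O = {+, -, /}: set of arithmetic operators
For each d_j ∈ v do
   v' ← v' ∪ {d_j}; v ← v \ {d_j}   (Prevent duplication of a feature)
  For each pair d_p, d_q ∈ v'
    For each op ∈ O
      For each v_i ∈ V
        f ← d_p op d_q
        F'_i ← F'_i ∪ {f}
End Procedure_2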

The algorithm starts with the initial input vectors $V$, where each distinct vector is a collection of numeric values. The first loop selects the dimensions of an untransformed vector and copies each feature to a new vector $v'$. After the selected attributes are constructed, the original attributes are discarded to avoid duplicating similar attributes. The two inner loops choose attributes from the new vector and select operators from the list of arithmetic operators. The last loop selects each vector one by one and adds all constructed features from $F'$ to each vector $v_i$.
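A minimal sketch of the numeric construction step, assuming each vector is given as a mapping from dimension names to values: every pair of attributes is combined with each operator in O = {+, -, /}, and only the constructed features are returned, mirroring the step that discards the originals. The feature-naming scheme and the skipping of division by zero are assumptions of this sketch.

```python
import operator
from itertools import combinations
from typing import Callable, Dict

# Arithmetic operator set O from the procedure above.
OPS: Dict[str, Callable[[float, float], float]] = {
    "+": operator.add,
    "-": operator.sub,
    "/": operator.truediv,
}


def construct_numeric(vector: Dict[str, float]) -> Dict[str, float]:
    """Combine every pair of numeric attributes with every operator in OPS.

    Only the constructed features are returned; the original attributes are
    not copied, mirroring the step that discards them to avoid duplication.
    """
    constructed: Dict[str, float] = {}
    for (name_a, a), (name_b, b) in combinations(vector.items(), 2):
        for symbol, op in OPS.items():
            if symbol == "/" and b == 0:
                continue  # skip undefined ratios
            constructed[f"({name_a}{symbol}{name_b})"] = op(a, b)
    return constructed


features = construct_numeric({"d1": 6.0, "d2": 3.0, "d3": 0.0})
print(features)
# {'(d1+d2)': 9.0, '(d1-d2)': 3.0, '(d1/d2)': 2.0, '(d1+d3)': 6.0,
#  '(d1-d3)': 6.0, '(d2+d3)': 3.0, '(d2-d3)': 3.0}
```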

2.3.2. Algorithm for Nominal Attributes

If the attributes are not numeric, new features are generated by concatenating strings or characters instead of applying arithmetic operations. The algorithm for constructing nominal attributes is described below.

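IF (attributes are numeric) Then
  Go to Procedure_1 for Numeric Attributes
Else Start Procedure_2 for Nominal Attributes
Input: vectors, each a set of nominal attributes
  a single feature in each vector
  a sequence of n tuples
  the set of newly constructed features
  concatenate(): concatenates features
For each dimension of the original vector do
  Copy the feature to the new vector and discard the original (prevent duplication of a feature)
  For each dimension in the new vector
    For each vector in the sequence of n tuples
      Construct a new feature by concatenating the selected pair of features
      Add the constructed feature to the set of newly constructed features
End Procedure_2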

Nominal attributes are concatenated to construct new features by forming pairs among all fields in a dataset. The first loop copies values from the original vector to a new vector.

The last two loops select each dimension from the new array and visit each vector sequentially to construct features for all vectors. Figure 7 gives the Heaviside function and the dimension scaling factor α at time t.
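For the nominal case (Procedure_2), a comparable minimal sketch replaces the arithmetic operators with string concatenation. It assumes dict-based vectors with string values and a `_` separator between the joined values; both the separator and the naming scheme of the new keys are hypothetical choices, since the text only specifies concatenation of feature pairs.

```python
from itertools import combinations


def construct_nominal_features(vectors, sep="_"):
    """Construct pairwise concatenated features for nominal (string) vectors.

    `vectors` is a list of dicts (attribute name -> string) sharing the same
    attribute names. The separator `sep` is an assumption; the text only
    specifies concatenation.
    """
    names = list(vectors[0])
    new_vectors = [{} for _ in vectors]          # originals are not carried over
    for a, b in combinations(names, 2):          # each pair of dimensions
        for vec, new_vec in zip(vectors, new_vectors):  # each vector in turn
            new_vec[f"{a}+{b}"] = f"{vec[a]}{sep}{vec[b]}"
    return new_vectors


if __name__ == "__main__":
    print(construct_nominal_features([{"zone": "north", "sensor": "temp", "unit": "C"}]))
```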

The local random walk used to reach a local optimum is isotropic and can be written as follows:

$$x_i^{t+1} = x_i^{t} + \alpha\, s \otimes H(p_a - \epsilon) \otimes \left(x_j^{t} - x_k^{t}\right),$$

where $x_j^{t}$ and $x_k^{t}$ are two distinct vectors of dimension $d$ at time $t$, $s$ is the step size, $H$ is the Heaviside function, $p_a$ is the discarding probability, and $\alpha$ is a scaling factor for the step size that controls the search space in IoT.

Here, $\epsilon$ is drawn from a uniform distribution, and each step of the Lévy flight is drawn from a heavy-tailed distribution. Because the Lévy distribution is heavy-tailed, it increases the probability that each dimension of a vector is selected.
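The two random walks can be made concrete with a short NumPy sketch. It assumes the standard cuckoo search forms: a local walk x_i + α·s·H(p_a − ε)·(x_j − x_k) applied element-wise, and a global Lévy-flight step x_i + α·L. The Lévy steps are drawn with Mantegna's algorithm; the exponent `beta = 1.5`, the default `alpha` values, and the function names are illustrative assumptions, not values from the paper.

```python
import math

import numpy as np


def levy_steps(rng, shape, beta=1.5):
    """Heavy-tailed Levy-flight steps drawn with Mantegna's algorithm."""
    sigma_u = (
        math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
        / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
    ) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, shape)    # numerator ~ N(0, sigma_u^2)
    v = rng.normal(0.0, 1.0, shape)        # denominator ~ N(0, 1)
    return u / np.abs(v) ** (1 / beta)


def levy_walk(nests, rng, alpha=0.01, beta=1.5):
    """Global move: x_i^{t+1} = x_i^t + alpha * L, with L a Levy step."""
    return nests + alpha * levy_steps(rng, nests.shape, beta)


def local_walk(nests, rng, alpha=1.0, pa=0.25):
    """Local move: x_i + alpha * s * H(pa - eps) * (x_j - x_k), element-wise."""
    n = nests.shape[0]
    x_j = nests[rng.permutation(n)]         # randomly paired vectors x_j
    x_k = nests[rng.permutation(n)]         # and x_k
    s = rng.random(nests.shape)             # step size s ~ U(0, 1)
    eps = rng.random(nests.shape)           # epsilon ~ U(0, 1)
    h = np.heaviside(pa - eps, 0.0)         # Heaviside H(pa - eps), H(0) = 0
    return nests + alpha * s * h * (x_j - x_k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    nests = rng.random((5, 4))              # 5 vectors with d = 4 dimensions
    print(local_walk(nests, rng))
    print(levy_walk(nests, rng))
```

With `pa = 0` no component passes the Heaviside switch, so the local walk leaves every vector unchanged; larger `pa` lets more components move.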

2.4. Mutated Cuckoo Search-Based Feature Extraction (CSFE) Algorithm

The improved version of CS is used to select from both the original and the newly constructed dimensions while reducing the overall number of dimensions. The algorithm first constructs new features from the original input vectors and then selects enhanced attributes for each vector, as given in Table 3 [58].

Maximum optimization is achieved when, in each iteration, dimensions that lie far apart are chosen to identify how compatible they are with each other within one vector. This task is based on dimension selection (DS), performed according to a rule inspired by the cuckoo's strategy of laying eggs in the nests of host birds. The DS for each new transformed vector can be calculated as

=.

The input parameters of the algorithm are the dataset environment, the number of datasets in the environment, the discarding probability, the scaling factor, the number of dimensions, and the number of iterations. The output is a globally optimized feature subset.

The auxiliary parameters are a fitness vector over the dimensions, the global fitness, and the local fitness.

The algorithm for cuckoo search-based feature extraction is given below; its objective function is chosen according to the problem of outdoor IoT data described above. The algorithm generates a feature subset in each iteration and repeats this procedure until an optimized cost or performance is achieved.

IF (attributes are numeric) Then
  Go to Procedure_1 for Numeric Attributes
Else Start Procedure_2 for Nominal Attributes
Start Procedure_3 for Optimal Subset Selection
Initialize the global fitness
Set the local fitness to max_fitness
Initiate a population of vectors in the constructed feature space
For each vector
  Calculate fitness for the current vector
  For each dimension in the current vector
    While (t ≠ n) at time instance t, do
      Find a new vector through Lévy flight for the current dimension
      Compute fitness for the new vector
      If (fitness of the new vector > max_fitness) Then
        max_fitness = fitness of the new vector; discard the worst vector
      Else keep max_fitness; discard the worst vector
      Set the best fitted vector as a new reduced vector in the reduced space
    If (fitness of the reduced space > global fitness) Then
      Set the reduced space as a new solution with reduced dimensions
    Else build new vectors to get the required fitness
End Procedure_CSFE

The mutated cuckoo search-based feature extraction (CSFE) algorithm includes three procedures: procedure_1 constructs numeric attributes, procedure_2 constructs nominal attributes, and procedure_3 applies a global selection strategy to the constructed features to find the features that best describe each vector and thereby improve prediction accuracy. Procedure_3 starts with two fitness functions, one for global optimization and one that evaluates each vector for local optimization, and takes the existing vectors as its input. The first loop calculates the fitness of the original vectors sequentially before selection through CS. The next loops select each dimension at time t until t equals the number of tuples in the original vector space, choose dimensions randomly through Lévy flight, and generate a new vector containing the optimal dimensions. Next, the fitness of the new vector is calculated. If the fitness of the reduced vector is higher than that of the existing one, the existing vector is replaced and the worst vector is abandoned; all best fitted vectors are then assembled into the reduced space, whose fitness is compared with the existing one and replaced if necessary. Finally, the last IF statement evaluates the efficiency of the newly built vector space to decide whether it should be discarded or kept for further processing.
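Procedure_3 can be approximated by a binary cuckoo search over feature masks. The sketch below is a hedged illustration, not the authors' CSFE code: it assumes each nest is a real-valued position whose positive components mark selected features, greedy replacement after each Lévy-flight and local-walk move, and a caller-supplied fitness function. All names, defaults (`n_nests`, `n_iter`, `pa`, `alpha`), and the toy fitness in the usage block are hypothetical.

```python
import math

import numpy as np


def levy_steps(rng, shape, beta=1.5):
    # Mantegna's algorithm for heavy-tailed Levy-flight steps
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    return rng.normal(0.0, sigma, shape) / np.abs(rng.normal(0.0, 1.0, shape)) ** (1 / beta)


def cs_select_features(fitness, n_features, n_nests=15, n_iter=100,
                       pa=0.25, alpha=1.0, seed=0):
    """Binary cuckoo search for a feature subset that maximizes `fitness`.

    Each nest is a real-valued position; feature d is kept when position[d] > 0.
    `fitness` maps a boolean mask of length n_features to a float.
    Returns (best_mask, best_fitness).
    """
    rng = np.random.default_rng(seed)
    nests = rng.uniform(-1.0, 1.0, (n_nests, n_features))
    scores = np.array([fitness(x > 0) for x in nests])

    def keep_better(candidates):
        # Greedy replacement: a nest moves only if its candidate is fitter
        cand_scores = np.array([fitness(x > 0) for x in candidates])
        better = cand_scores > scores
        nests[better] = candidates[better]
        scores[better] = cand_scores[better]

    for _ in range(n_iter):
        best = nests[scores.argmax()].copy()
        # Global search: Levy flight, scaled by each nest's distance to the best
        keep_better(nests + alpha * levy_steps(rng, nests.shape) * (nests - best))
        # Local search: a fraction pa of components takes a random-walk step
        x_j = nests[rng.permutation(n_nests)]
        x_k = nests[rng.permutation(n_nests)]
        h = np.heaviside(pa - rng.random(nests.shape), 0.0)
        keep_better(nests + rng.random(nests.shape) * h * (x_j - x_k))

    i = scores.argmax()
    return nests[i] > 0, float(scores[i])


if __name__ == "__main__":
    informative = {0, 1, 2}  # hypothetical indices of useful features

    def toy_fitness(mask):
        picked = np.flatnonzero(mask)
        return sum(1.0 for d in picked if d in informative) - 0.1 * len(picked)

    mask, score = cs_select_features(toy_fitness, n_features=8)
    print(mask.astype(int), score)
```

Thresholding the position at zero keeps the update rules continuous while the fitness only ever sees a discrete subset; a sigmoid transfer with a random threshold is a common alternative.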

The input to the algorithm is the original feature set, and features are constructed according to the identified type of the input features (numeric or nominal). After construction, the updated space is passed to cuckoo search, which selects appropriate features through random walks and Lévy flights. The output is a new, enhanced feature subset for each vector.

2.5. Dataset Generation

The internet of things has evolved into a preeminent source of valuable services for consumers, business analysts, and industries in their daily professional and personal lives, where things can connect themselves to the internet and serve without delay. Despite this rapid evolution, familiarity with and expertise in IoT networks are growing only gradually, and only a few highly recognizable industries provide valued IoT services to their consumers. In the literature, most work on IoT only demonstrates fundamental concepts and architectural aspects. In the current era, when technological advancement reaches well beyond the internet, real-time networks face challenges in accommodating continuous and abrupt volumes of collected data for mining and machine learning tasks aimed at smart, intelligent networks that sustain themselves without human intervention.

The internet of things maintains data collected by the entities that are part of it. These entities are computing devices (e.g., scanners and thermostats) that can communicate over the internet to share their information and services. These smart objects are distinct, and the EPC of each object can be stored as a primary key in the database to maintain its record. Organizations advancing IoT are unwilling to share their private data publicly for reasons of security and confidentiality, which is why datasets relevant to IoT networks are largely unavailable in the literature and on the internet. Applying the proposed solution (CSFE) for dimensionality reduction to outdoor IoT data therefore requires an appropriate and relevant dataset, so that the suggested technique, implemented in MATLAB, can produce meaningful results in reducing the overall search space and feature space.

IoTify is a web-based simulation platform for developing IoT applications with virtual hardware devices, e.g., sensors. It provides a virtual lab and enables virtual IoT devices to be created and built in JavaScript. The IoTify database is generated in JavaScript Object Notation (JSON), with the .json extension. Table 4 shows the dataset used to obtain the results and to analyze the proposed algorithm (CSFE). In the scenario it represents, IoT-based devices extract specific, required facts from a patient's blood and share the generated report with the doctor when an alarming situation arises. Figure 8 outlines a flowchart for the mutated cuckoo search-based feature extraction, including the procedures for nominal and numeric attributes and the generation of the final reduced feature subset.

The original dimensions of outdoor IoT data are reduced through feature extraction, which is introduced here as two subtasks: feature construction, and selection among the newly constructed features. Selection from the extensive new search space is done with the cuckoo search-based optimization technique. An IoT-based dataset is used to evaluate the results of the suggested techniques. The technique is intended to reduce the exhaustive search space and generate a new, organized search space that improves the accuracy of machine learning tasks and mining classifiers.

2.6. Algorithm Result Analysis and Visualization

The proposed mutated cuckoo search-based feature extraction algorithm is implemented in MATLAB, and the results are visualized with graphs, plots, distributions, and statistical summaries (mean, minimum, and maximum). Figure 9 shows plots of the newly generated space, including the rows and the reduced dimensions. The points are widely scattered, which indicates that dimensions are chosen from across the extensive search space. A subplot shows the number of iterations for random generations, and residuals are calculated for each dimension to check its weight for selection. Dimension selection is performed through Lévy flight with the step size scaling factor, and the steps are drawn randomly from the Lévy distribution.

Selection through CSFE explores the extensive search space. The search space grows exponentially as new objects enter the IoT, so the algorithm needs to adapt itself to immediate changes; because IoT data is generated continuously, a global optimization solution is needed to enhance network efficiency.

Figure 10 shows the heavy-tailed distribution produced by CSFE for the input dataset of IoT-based patients' CBC results. The highest peak represents the global optimization solution for the IoT data, as the tail increases exponentially and consistently provides the best fitness. The best selection is estimated through DS and a coherence estimation factor. The area under the curve is not exponentially bounded, which means that as the data becomes extensive, more fitness values lie close to the best optimum. In Figure 11, the bar sizes show that most of the fitness values were globally best.

The term “Internet of Things” represents an innovation that relies both on the network that results from integrating smart objects with developed internet technologies and on the variety of supporting devices, equipment, and machines needed to ensure this technological evolution. Applications and services are developed to take advantage of these technologies for new business trends and daily-life conveniences. IoT is an infrastructure of networked smart objects and integrated networks that supplements internet services by making all kinds of services available to anyone, anytime, anywhere. These technologies produce immense amounts of data, which becomes critical when analytics and machine learning techniques are applied to make them intelligent, with self-organizing capabilities.

2.7. Performance Comparison and Evaluation

Many benchmarks are given in the literature to test the efficiency, modality, or validity of a new optimization algorithm. After the CSFE algorithm is implemented, the results are analyzed with global optimization test functions.

2.7.1. Rastrigin’s Function

Rastrigin's function is a nonlinear optimization function introduced by Rastrigin in two dimensions and later generalized by Mühlenbein et al. For an $n$-dimensional space, the function is

$$f(\mathbf{x}) = A\,n + \sum_{i=1}^{n} \left[x_i^{2} - A\cos(2\pi x_i)\right],$$

where $A = 10$ and each $x_i \in [-5.12, 5.12]$. Figure 12 shows the results for Rastrigin's function in the $n$-dimensional space with individual dimensions $x_i$. Compared with the cuckoo search-based feature extraction, the scattered plots show that the dimension space is still extensive and that each selected dimension is similar to the previously selected one, which does not represent the whole data well.
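Rastrigin's function in its standard form (A = 10) can be evaluated directly; this short sketch is illustrative and assumes NumPy input handling.

```python
import numpy as np


def rastrigin(x, A=10.0):
    """Rastrigin's function f(x) = A*n + sum(x_i^2 - A*cos(2*pi*x_i)).

    Usually evaluated on x_i in [-5.12, 5.12]; the global minimum is f(0) = 0.
    """
    x = np.asarray(x, dtype=float)
    return float(A * x.size + np.sum(x**2 - A * np.cos(2 * np.pi * x)))


if __name__ == "__main__":
    print(rastrigin([0.0, 0.0, 0.0]), rastrigin([0.5, 0.5]))
```

The cosine term creates a regular lattice of local minima around the single global minimum at the origin, which is what makes the function a test of an optimizer's ability to escape local optima.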

2.7.2. McCormick's Function

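McCormick's function is a benchmark for testing optimization algorithms. It is defined as

$$f(x, y) = \sin(x + y) + (x - y)^{2} - 1.5\,x + 2.5\,y + 1,$$

where $x$ and $y$ are the vector number and the dimension number, respectively. The search range of the function is $-1.5 \le x \le 4$ and $-3 \le y \le 4$.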
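As an illustration, McCormick's function and a coarse grid search over its search range can be written in a few lines. The grid search is only a sanity check under an assumed step of 0.05; it is not one of the compared optimizers.

```python
import math

# Search ranges from the text: -1.5 <= x <= 4 and -3 <= y <= 4.
X_RANGE = (-1.5, 4.0)
Y_RANGE = (-3.0, 4.0)


def mccormick(x, y):
    """McCormick's function f(x, y) = sin(x + y) + (x - y)^2 - 1.5x + 2.5y + 1."""
    return math.sin(x + y) + (x - y) ** 2 - 1.5 * x + 2.5 * y + 1.0


def grid_minimum(step=0.05):
    """Coarse grid search over the search ranges; returns (f_min, x, y)."""
    best = (math.inf, None, None)
    nx = int(round((X_RANGE[1] - X_RANGE[0]) / step))
    ny = int(round((Y_RANGE[1] - Y_RANGE[0]) / step))
    for i in range(nx + 1):
        x = X_RANGE[0] + i * step
        for j in range(ny + 1):
            y = Y_RANGE[0] + j * step
            value = mccormick(x, y)
            if value < best[0]:
                best = (value, x, y)
    return best


if __name__ == "__main__":
    print(grid_minimum())
```

The known global minimum is approximately -1.9133 at (-0.54719, -1.54719), so the grid result should land close to that point.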

2.7.3. Cross-in-Tray Function

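The cross-in-tray function is a continuous, multimodal test function originally defined on a two-dimensional space and extended later. It takes the following form:

$$f(x, y) = -0.0001\left(\left|\sin x\,\sin y\,\exp\!\left(\left|100 - \frac{\sqrt{x^{2} + y^{2}}}{\pi}\right|\right)\right| + 1\right)^{0.1},$$

where $(x, y)$ is a point in the two-dimensional space, and the domain of the cross-in-tray function is $-10 \le x, y \le 10$.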
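The standard two-dimensional cross-in-tray function translates directly into Python; this sketch uses only the standard library.

```python
import math


def cross_in_tray(x, y):
    """Cross-in-tray function, usually evaluated on -10 <= x, y <= 10."""
    radial = abs(100.0 - math.sqrt(x * x + y * y) / math.pi)
    return -0.0001 * (abs(math.sin(x) * math.sin(y) * math.exp(radial)) + 1.0) ** 0.1


if __name__ == "__main__":
    print(cross_in_tray(1.34941, 1.34941))
```

The function has four symmetric global minima of about -2.06261 at (±1.34941, ±1.34941), so a good optimizer may converge to any one of them.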

2.7.4. Rosenbrock Function

The Rosenbrock function is a nonlinear benchmark to test the performance of optimization problems, introduced by Howard H. Rosenbrock in 1960. It is also termed as Rosenbrock’s Valley or Rosenbrock’s banana function.

The mathematical definition of the Rosenbrock function is given below:

$$f(x, y) = (1 - x)^{2} + 100\,(y - x^{2})^{2}$$

subjected to

Range for the Rosenbrock function is and. Figure 13 shows the fitness performance for these functions, and the local maximum and global minimum of the Rosenbrock function are shown in Figure 14.
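The unconstrained Rosenbrock objective can be sketched as below, using its usual n-dimensional generalization, which reduces to (1 - x)^2 + 100(y - x^2)^2 for two variables. Any constraints used in the study are not modeled; the parameter names `a` and `b` are conventional, not from the text.

```python
import numpy as np


def rosenbrock(x, a=1.0, b=100.0):
    """Rosenbrock function sum(b*(x[i+1] - x[i]^2)^2 + (a - x[i])^2).

    For two variables (x, y) this is (1 - x)^2 + 100*(y - x^2)^2, with its
    global minimum 0 at (1, 1). Constraints are not modeled here.
    """
    x = np.asarray(x, dtype=float)
    return float(np.sum(b * (x[1:] - x[:-1] ** 2) ** 2 + (a - x[:-1]) ** 2))


if __name__ == "__main__":
    print(rosenbrock([1.0, 1.0]), rosenbrock([0.0, 0.0]))
```

The minimum sits at the bottom of a long, curved, flat valley, which is why the function is useful for measuring how slowly an optimizer converges once it has found the valley.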

2.7.5. Easom Function

Easom is a multimodal, nonscalable test function for finding the global minimum of a search space. It is defined by the following equation:

$$f(x, y) = -\cos x\,\cos y\,\exp\!\left(-\left[(x - \pi)^{2} + (y - \pi)^{2}\right]\right)$$

The search domain for the Easom function is $-100 \le x, y \le 100$. These test functions are used to compare CSFE with the particle swarm optimization (PSO) and harmony search (HS) optimization algorithms.
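The standard Easom function can be written as follows; this is an illustrative sketch with only the standard library.

```python
import math


def easom(x, y):
    """Easom function, usually evaluated on -100 <= x, y <= 100; min -1 at (pi, pi)."""
    return -math.cos(x) * math.cos(y) * math.exp(-((x - math.pi) ** 2 + (y - math.pi) ** 2))


if __name__ == "__main__":
    print(easom(math.pi, math.pi), easom(0.0, 0.0))
```

Away from (π, π) the exponential factor makes the surface almost perfectly flat, so an optimizer receives nearly no gradient information and must rely on exploration to find the narrow well.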

Figure 15 displays the fitness curves for both local and global optimization. Over the generations, global fitness is achieved at an early stage, which indicates that the running time of the cuckoo search-based feature extraction is low. Figure 16 provides a graph of the best cost value obtained through harmony search (HS), and Figure 17 shows the minimum cost for the firefly algorithm (FFA). Compared with HS, PSO, and FFA, CSFE reaches the minimum cost value in the fewest iterations and with less elapsed time.

Table 5 provides an overview of the assessment of CSFE against PSO and HS. Performance is measured by the minimum cost and elapsed time of each algorithm over the maximum number of generations used to calculate the best fitness value (cost). Here, CSFE is compared with a few global optimization techniques. Particle swarm optimization, introduced by Kennedy and Eberhart in 1995, provides the best mutation results, but it is not suitable for complex tasks because its complex structure and mutation make it slow.

Compared with CSFE, the normal distribution from which the PSO fitness values are generated has a narrow peak and a narrow, exponentially bounded tail. This means that the local optimization points available through PSO are relatively rare. All of these global optimization techniques can provide the best possible solution for continuous data generated by the internet of things for a given problem of interest; however, the algorithms other than CSFE took more time to run and needed more iterations to reach the minimum cost.

Another optimization technique is the firefly algorithm (FFA), introduced by Xin-She Yang in 2008 and inspired by the flashing behavior of fireflies. The random numbers for FFA are drawn from a uniform distribution with constant probability; because of this constant probability, the technique is not appropriate for continuous and multiobjective optimization problems. In Table 6, PSO and CSFE are compared after 1000 runs on the abovementioned test functions. Figure 18 shows the normal distribution for particle swarm optimization, where the maximum height of the tail shows that the best values are limited to this small area.

Following PSO and in light of its limitations, harmony search (HS) was developed by Geem et al. in 2001, inspired by the improvisation process of musicians. In Table 7, the mutated CSFE is compared with HS. After CSFE and harmony search were implemented in MATLAB, the algorithms were compared on the results of approximately 1000 iterations.

The comparison shows that CSFE gives higher accuracy and global fitness for the test functions mentioned above. The performance of both algorithms is reported for each test function as a pair of mean and standard deviation, and the accuracy rate is given as a percentage. CSFE achieves a higher success rate than PSO and HS owing to its randomization and exploration, and it can converge to a global maximum state when required.
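Reporting each test function as a mean, a standard deviation, and a success percentage can be sketched as below. The success criterion (final cost within a tolerance of the known optimum) and the default tolerance are assumptions for illustration; the text does not state how a successful run is defined. The sample standard deviation is used.

```python
import statistics


def summarize_runs(final_costs, optimum, tol=1e-4):
    """Return (mean, sample standard deviation, success rate in %) over runs.

    A run counts as a success when its final cost is within `tol` of the
    known optimum; the tolerance is an assumption, not a value from the text.
    """
    mean = statistics.mean(final_costs)
    std = statistics.stdev(final_costs) if len(final_costs) > 1 else 0.0
    hits = sum(abs(c - optimum) <= tol for c in final_costs)
    return mean, std, 100.0 * hits / len(final_costs)


if __name__ == "__main__":
    # Hypothetical final costs from four runs on a function whose optimum is 0
    print(summarize_runs([0.0, 0.0, 0.5, 0.0], optimum=0.0))
```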

The literature offers many benchmark functions for evaluating the performance of a new optimization technique. In this section, a few of these test functions have been used to assess the proposed cuckoo search-based feature extraction technique. CSFE is evaluated individually, and its performance is compared with a few global optimization techniques. Finally, the accuracy rate delivered by CSFE is more stable and consistent than that of PSO and HS for global optimization.

2.8. Contribution

The major contribution of this paper is an analysis of different aspects of dimensionality reduction and a discussion of evolutionary approaches, with a special focus on cuckoo search algorithms. We have given a detailed discussion of existing dimensionality reduction techniques, outlining feature selection and extraction, together with a comparison of heuristic and metaheuristic search methods. Cuckoo search is an optimization technique widely used for resource allocation in operations research; here, we have used it to find the attributes in data that can be dropped without affecting the meaning and coherence of the information in the database. The cuckoo's living behavior is analyzed with reference to our own problem formulation. The algorithm also has some inherent features that limit its effectiveness for dimensionality reduction in IoT; these are discussed in this paper as well. An algorithm for the problem of dimensionality reduction in the internet of things is formulated to further investigate the performance of the cuckoo search optimization algorithm.

We have transformed our problem into IoT vectors with distinct dimensions and explained the feature selection phenomenon. An algorithm is used for feature construction with numeric and nominal attributes. We have specifically introduced a mutated cuckoo search-based feature extraction algorithm to work with the generated dataset, analyzed the results of applying the cuckoo search algorithm to dimensionality reduction, and compared its performance. The cuckoo search optimization algorithm proved effective for feature selection and dimensionality reduction and can be used in similar future applications.

3. Recommendation and Future Work

In information and communication technology, a number of innovative trends have emerged to provide humans, businesses, and industries with improved and efficient services. These next-generation technologies bring new challenges and complexities. Heterogeneous objects, wireless and sensor networks, addressing schemes, and visualization together form the multiplex structure of IoT. Data storage and analytics are among the most important elements of the network, as they must deal with unpredictable amounts of raw data collected by smart objects. The cost, time, and energy required grow in direct proportion to the rapidly increasing amount of data. In the coming era, scientists are introducing “Green Computing Devices and Networks” with reduced cost, least time, and minimum energy resources.

3.1. Energy Proficient Algorithm for Green Internet of Things (GIoTs)

The fundamental aim of IoT is to empower a smart world without greenhouse effects. To interact with real-world objects, these networks are equipped with numerous sensors, protocols, and communication technologies that require large amounts of energy, considerable cost, and complexity. Efficient algorithms are required for IoT services and applications to reduce existing greenhouse effects or to build new services with minimum energy consumption. The proposed technique can be used to achieve high accuracy in machine learning tasks at minimum cost without complex computation.

Cuckoo search-based techniques are suitable for building future GIoTs with high accuracy and lower complexity. Cuckoo search is a global optimization technique and provides global maximum solutions for real-time systems (IoT) that generate continuous, high volumes of data with massive dimensionality. The reason for this recommendation is that CS does not involve heavy mathematical computation, which lowers the overall complexity of the system.

3.2. Future Internet of Things for Patient Monitoring

In medical scenarios, patients are monitored manually, e.g., through their history, current disease, and daily health reports. Individual files with distinct patient numbers are maintained, including health parameters such as heart rate, blood pressure, temperature, and blood samples. These records are assessed by the concerned doctor for further treatment. Instead of these manual procedures, a smart health monitoring IoT device can perform these actions without extensive human intervention.

The architecture in Figure 19 shows an abstract layout of a complete health care system (CHCS) based on an intelligent IoT device that monitors patients' health. The patient records managed by smart IoT devices are available to the concerned doctors and to users as well. One situation that such a patient-monitoring IoT device can handle is a patient's complete blood count (CBC) report: the device generates a continuous blood report and triggers an action whenever an alarming situation arises.

Despite the tremendous efforts made in ICT, emerging and evolutionary trends must be executed without negative environmental effects, handling the increasing amount of data with less energy and computation. Essential measures are required to minimize negative technological effects on health and society. It is concluded that mutated cuckoo search-based feature extraction can be an advantageous approach for the current internet of things as well as for the future green internet of things. Moreover, it can be used to enhance the performance of future IoT devices for uninterrupted patient monitoring. In brief, cuckoo search optimization is a metaheuristic approach applicable to situations where the system is in a local state or will grow into a global phenomenon in the future.

4. Conclusion

Although IoT networks have emerged and perform demanding tasks competitively, their performance must be enhanced to remain stable under future challenges. This becomes possible when networks are not only fast, with preeminent servers, but also smart enough to cope with unpredictable circumstances. Introducing efficient global optimization algorithms can help achieve this target. In this research, a metaheuristic global optimization algorithm is established to reduce the dimensions of outdoor IoT data. The cuckoo search-based feature extraction is a mutated algorithm that adapts itself to the unpredictable amount of data and produces a new, enhanced feature space. The newly generated feature space and the proposed algorithm help improve the accuracy of classifiers and mining algorithms. They also make the approach computationally feasible, flexible, and efficient in reaching convergence. This mutated algorithm can be further generalized to indoor IoT activities, as the near future will require. The scenario of training the IoT network for future challenges and increasing its sphere of knowledge is also discussed. In the evaluation, CSFE performed all activities at minimum cost and in less time. The algorithm can be further improved for multiobjective optimization problems, and it can provide strong support for building an IoT-based smart world with no negative impacts and minimum resources.

Data Availability

The dataset is generated through IoTify, a web-based simulation platform for developing IoT applications with virtual hardware devices, e.g., sensors. It serves as a virtual lab and enables virtual IoT devices to be created in JavaScript. The IoTify database is generated in JavaScript Object Notation (JSON), with the .json extension.

Conflicts of Interest

The authors declare that they have no conflicts of interest.