Abstract

The growth of big data, particularly in the healthcare Internet of Things (IoT) and biomedical domains, helps patients by enabling early disease identification through the analysis of medical data. Hence, nanotechnology-based IoT biosensors play a significant role in the medical field. Problem. However, data consistency decreases when missing values occur in medical data from nanotechnology-based IoT biosensors. Furthermore, each region has its own special features, which further lowers the prediction accuracy. To address the challenge of handling medical data structures with incomplete data, the proposed model first reconstructs lost or partial data. Methods. An adaptive architecture is proposed to enhance the computing capabilities needed to predict the disease automatically. The medical databases are managed in unpredictable environments. This optimized diagnostic paradigm yields a fuzzy decision tree algorithm whose rules are optimized genetically. This work uses a normalized classifier, namely the fuzzy-based decision tree (FDT) algorithm, to classify the data collected via nanotechnology-based IoT biosensors, which helps to identify nondeterministic instances in unstructured datasets relating to medical diagnosis. The FDT algorithm is further enhanced by using genetic algorithms for effective classification of instances. Finally, the proposed system uses two larger datasets to verify the predictive precision. A modified decision classification rule, based on the fitness function value, is used to describe the fuzzy decision tree algorithm. Both structured and unstructured databases are configured for processing. Results and Conclusions. The evaluation of test patterns helps to track the efficiency of FDT with optimized rules during the training and testing stages.
The proposed method is validated against nanotechnology-based IoT biosensor data in terms of accuracy, sensitivity, specificity, and F-measure. The simulation results show that the proposed method achieves higher accuracy than the other methods. With and without feature selection, the model also shows improved sensitivity, specificity, and F-measure compared with the existing methods.

1. Introduction

As medical knowledge grows, electronic health records (EHRs) grow dramatically as well. COVID-19 prediction has become a major topic in big data analytics as data increase in size. Classification algorithms have been developed to improve the accuracy of medical diagnosis [1]. In big data analysis, classification methods categorize datasets according to the diagnostic application using machine learning algorithms.

Efficient techniques are employed in big data analytics to find insights, correlations, and hidden patterns in input data collected from nanotechnology-based IoT biosensors. Data analytics provides improved decision-making, cost reductions, and the development of new products that meet customer requirements. Hence, it addresses the challenges of various applications, such as health care, plants, and bioinformatics, with wide advantages [2]. These problems are addressed via machine learning strategies that include rule-based methods and decision-making. Most classification algorithms take only structured data into account. When unstructured data are processed, structured and unstructured information is generally combined [3, 4] to reduce the disease-prediction risk. Combining the information lowers the cost of processing and reduces redundant information.

Artificial Intelligence (AI) is a disruptive technology used in a wide variety of applications, ranging from the automobile industry to the healthcare industry. AI has also been used to track the spread of the virus, to identify high-risk patients and areas, and to identify antiviral drugs for controlling the pandemic in real time. AI predicts the risk of mortality by analyzing patient datasets. This application of AI can help in population screening, notification, medical help, and suggestions on infection control. It further assists in treatment, planning, and the prediction of disease spread and patient outcomes as an evidence-based tool.

AI is a powerful method in fighting the pandemic, and the pandemic has in turn accelerated the use of AI in healthcare analytics. With predefined datasets, AI can predict and track the spread of infection in a timely manner across various regions. The challenges include forecasting the pandemic when unbiased historical data for training the AI are scarce. They also include panic among humans and the statistical differences from earlier pandemics (Spanish flu, H1N1 influenza, and AIDS). The lack of proper datasets and big data is considered problematic in tracking the spread of infection.

Therefore, it is important to classify unstructured data separately using classification algorithms. The risk in disease prediction is reduced in this manner by relying on qualified classifiers. A structured data treatment method for unstructured medical image data is proposed in [5]. In integrated structure systems [6], medical text documents are structured using a Bayesian classifier that extracts the attributes. In addition, k-means groups the data and ensures optimum data classification. The search method [7, 8] is used to classify the connections through which unstructured medical data are organized. This technique produces more accurate results than other techniques based on the SVM [9–13]. Some of the methods are listed in Table 1.

The above-mentioned methodologies cannot classify medical datasets through a rule-based system. Datasets from nanotechnology-based IoT biosensors [14–16] have been processed using a rule-based method to minimize redundant data. The rule-based framework, with its rule base, removes redundant data from unstructured data. A rule set is needed in order to improve the classification accuracy.

In this paper, we propose a fuzzy decision tree (FDT) method for classification. Its novelty lies in using a genetic algorithm (GA) to improve rule-based decision-making on large unstructured datasets from nanotechnology-based IoT biosensors.

The main contributions of the paper are as follows:
(i) The work uses a normalized classifier, namely the fuzzy-based decision tree (FDT) algorithm, to identify the nondetermined instances relating to the medical diagnosis due to the unstructured nature of the datasets from nanotechnology-based IoT biosensors
(ii) The genetic algorithm (GA) is used to improve the FDT algorithm's classification rule collection
(iii) The evaluation of test patterns helps to track the efficiency of FDT with optimized rules during the training and testing stages
(iv) Finally, the proposed system uses two larger datasets to verify the likelihood of predictive precision.

The rest of the paper is organized as follows. Section 2 provides the basic concept. Section 3 discusses the FDT used to design the predictive model. Section 4 describes rule optimization using the GA. Section 5 validates the entire work. Section 6 concludes the paper with possible directions for future work.

2. Basic Concept

This section provides the basics of the hesitant fuzzy algorithm (HFA) that eliminates the hesitations associated with fuzzy set assignment and membership degree to process the data from nanotechnology-based IoT biosensors. The following provides the HFA preliminaries:

A hesitant fuzzy set $E$ on a reference set $X$ is generally represented in terms of a function $h$, where $h$ produces a subset of $[0, 1]$ when it is applied over $X$:

$$E = \{\langle x, h_E(x)\rangle \mid x \in X\},$$

where the membership degree of $x \in X$ is defined as the set of different values $h_E(x) \subseteq [0, 1]$. Hence, for simplicity, $h_E(x)$ is generally referred to as a hesitant fuzzy element.

3. Fuzzy Decision Tree (FDT)

In this section, we provide FDT details and how the genetic algorithm provides the optimized rule for FDT as illustrated in Figure 1.

3.1. Data Balancing

Datasets collected from nanotechnology-based IoT biosensors are often unbalanced, and they are usually balanced by either undersampling or oversampling. The former reduces the number of high-dimensional samples and thereby fails to take useful information into account. The latter can oversample the small class. Here, k-means first groups the samples of each class into several clusters. The clusters are then marked or numbered as pseudoclasses, which form the classes of the balanced dataset collected from nanotechnology-based IoT biosensors.

3.2. Construction of FDT

The FDT permits an instance to fall into multiple branches with different membership degrees. The node conditions of a branch are specified using fuzzy rules. The instances are weighted by different membership degrees as they fall into different nodes. If the data are incomplete or noisy, letting an instance fall into several branches is beneficial. The FDT is slower to use than an ordinary decision tree, but its classification is better.

The FDT consists of the construction of a tree and its nodes for optimal decision-making. It is a fuzzy logic algorithm that uses linguistic terms to describe the attributes of the medical training data. The information gain is used to evaluate the attributes at each node. The algorithm also uses a fuzzy dataset that includes membership, input, and target attributes. The instance set of a child node includes all instances of the parent node, with the branch attribute removed. In all cases, the main distinction between parent and child lies in the fuzzy membership.

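Consider a preprocessed input dataset $D$ collected from nanotechnology-based IoT biosensors with an attribute $A$ that is described by fuzzy terms $F_j$. The membership degree of an instance $x$ in a child node $N_j$ is given by

$$\mu_{N_j}(x) = T\big(\mu_{N}(x),\, \mu_{F_j}(x)\big),$$

where $\mu_{N}(x)$ is defined as the membership degree of $x$ in the parent node $N$, $\mu_{F_j}(x)$ is defined as the membership degree of $x$ with a fuzzy term $F_j$, $T$ is a t-norm such as the product, and $N_j$ is defined as the child node.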
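As a minimal sketch of how a membership degree propagates from a parent node to its child nodes, the following Python snippet combines the degree with which an instance reached the parent and its degree in each fuzzy term. The function name `child_membership`, the choice between the product and the minimum as combining operator, and the numeric degrees are illustrative assumptions and are not taken from the paper.

```python
def child_membership(parent_degree, term_degree, t_norm="product"):
    """Degree of an instance in a child node.

    parent_degree: degree with which the instance reached the parent node.
    term_degree:   degree of the instance in the fuzzy term of the branch.
    t_norm:        combining operator (assumption: "product" or "min").
    """
    if t_norm == "product":
        return parent_degree * term_degree
    if t_norm == "min":
        return min(parent_degree, term_degree)
    raise ValueError("unknown t-norm: " + t_norm)


# Hypothetical instance that reached the parent node with degree 0.8 and
# belongs to the terms "low" and "high" with degrees 0.25 and 0.75.
term_degrees = {"low": 0.25, "high": 0.75}
children = {term: child_membership(0.8, d) for term, d in term_degrees.items()}
```

With the product, the child degrees add up to the parent degree whenever the term degrees add up to one, so no membership mass is lost in the split.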

As shown in Figure 2, the algorithm selects as the branch attribute the one with the maximum fuzzy information gain, which is computed from the fuzzy entropy.

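The information gain of an attribute $A$ at a node $N$ is given by

$$IG(A; N) = E(N) - \sum_{j} \frac{|N_j|}{|N|}\, E(N_j),$$

where $E(N)$ is defined as the entropy function of $N$, $E(N_j)$ is defined as the entropy of the child node $N_j$, and $|N_j|$ is defined as the number of instances in the child node $N_j$, which is given as follows:

$$|N_j| = \sum_{x \in D} \mu_{N_j}(x),$$

with $\mu_{N_j}(x)$ the membership degree of instance $x$ in $N_j$.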
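The fuzzy entropy and information gain described above can be sketched in Python as follows. The sketch assumes that the size of a node is the sum of the membership degrees of its instances and that each child is weighted by its share of the parent size; the function names and the toy data are hypothetical.

```python
import math


def fuzzy_entropy(memberships, labels):
    """Entropy of a node whose class shares are weighted by membership degree."""
    total = sum(memberships)
    if total == 0:
        return 0.0
    mass = {}
    for mu, label in zip(memberships, labels):
        mass[label] = mass.get(label, 0.0) + mu
    entropy = 0.0
    for m in mass.values():
        if m > 0:
            p = m / total
            entropy -= p * math.log2(p)
    return entropy


def fuzzy_information_gain(parent_mu, children_mu, labels):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    parent_size = sum(parent_mu)
    if parent_size == 0:
        return 0.0
    gain = fuzzy_entropy(parent_mu, labels)
    for child_mu in children_mu:
        gain -= (sum(child_mu) / parent_size) * fuzzy_entropy(child_mu, labels)
    return gain


labels = ["pos", "pos", "neg", "neg"]
parent = [1.0, 1.0, 1.0, 1.0]
# A split that separates the classes perfectly, and one that does not help.
sharp_gain = fuzzy_information_gain(
    parent, [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]], labels)
vague_gain = fuzzy_information_gain(
    parent, [[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5]], labels)
```

In this toy case the sharp split removes all uncertainty, whereas the vague split, in which every instance belongs equally to both children, removes none.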

Inputs: membership function, training data, threshold value.
  Set the membership degree to unity.
  Generate the root node using the fuzzy set.
  For each node (N):
    If the end criterion is reached, then
      Mark N as a leaf.
      Label N with the class to which its records belong.
    Else
      Estimate the information gain of each attribute.
      Select the attribute with the maximum information gain.
      Find the child nodes.
    End.
  End

The above algorithm illustrates the FDT implementation process. The implementation of the FDT uses a stopping criterion, for which the standardised maximum method is used.
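The construction steps listed above can be sketched as a short recursive procedure in Python. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the product combines parent and term degrees, a node becomes a leaf when its dominant class reaches a `threshold` share of the node size (a stand-in for the stopping criterion), and all names and the toy temperature data are hypothetical.

```python
import math


def entropy(mu, labels):
    """Fuzzy entropy of a node: class shares are weighted by membership."""
    total = sum(mu)
    if total == 0:
        return 0.0
    mass = {}
    for m, y in zip(mu, labels):
        mass[y] = mass.get(y, 0.0) + m
    return -sum((v / total) * math.log2(v / total) for v in mass.values() if v > 0)


def split(mu, rows, attr, terms):
    """Membership vector of every child: parent degree times term degree."""
    return {name: [m * f(r[attr]) for m, r in zip(mu, rows)]
            for name, f in terms.items()}


def gain(mu, rows, labels, attr, terms):
    """Information gain of splitting the node on attribute `attr`."""
    total = sum(mu)
    children = split(mu, rows, attr, terms).values()
    return entropy(mu, labels) - sum(
        (sum(c) / total) * entropy(c, labels) for c in children)


def build(rows, labels, fuzzy_terms, mu=None, threshold=0.9, used=()):
    if mu is None:
        mu = [1.0] * len(rows)            # membership is unity at the root
    total = sum(mu)
    mass = {}
    for m, y in zip(mu, labels):
        mass[y] = mass.get(y, 0.0) + m
    majority = max(mass, key=mass.get)
    candidates = [a for a in fuzzy_terms if a not in used]
    # End criterion (assumed): empty node, dominant class, or no attribute left.
    if total == 0 or mass[majority] / total >= threshold or not candidates:
        return {"leaf": True, "label": majority, "weight": total}
    best = max(candidates, key=lambda a: gain(mu, rows, labels, a, fuzzy_terms[a]))
    children = split(mu, rows, best, fuzzy_terms[best])
    return {"leaf": False, "attribute": best,
            "children": {name: build(rows, labels, fuzzy_terms, child_mu,
                                     threshold, used + (best,))
                         for name, child_mu in children.items()}}


def normal(v):
    """Hypothetical fuzzy term: body temperature is normal."""
    return max(0.0, min(1.0, (38.5 - v) / 2.0))


def high(v):
    """Hypothetical fuzzy term: body temperature is high."""
    return 1.0 - normal(v)


rows = [{"temp": 36.0}, {"temp": 36.5}, {"temp": 39.0}, {"temp": 39.5}]
labels = ["neg", "neg", "pos", "pos"]
tree = build(rows, labels, {"temp": {"normal": normal, "high": high}})
```

Each attribute is used at most once along a path, which mirrors the removal of the branch attribute from the child nodes.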

The FDT builds the decision tree using a discretization procedure in which a fuzzy partition is specified for each attribute $A$.

Finally, the information gain over an attribute $A$ on a dataset $D$ is given by

$$Gain(A, D) = \max\{Gain_{M}(A, D),\; Gain_{U}(A, D)\}. \qquad (7)$$

Here, Equation (7) is computed using two entropy values: (i) merging and (ii) uniform discretization.

$Gain(A, D)$ is defined as the predicted drop of entropy due to an attribute $A$.

$Gain_{M}(A, D)$ is defined as the predicted drop in entropy (discretization merge) due to an attribute $A$.

$Gain_{U}(A, D)$ is defined as the predicted drop in entropy due to an attribute $A$ (uniform frequency).

The number of intervals generated by the two methods is kept the same, so that the comparison is controlled. At each node, the fuzzy discretization method with the higher information gain is selected.

3.3. FDT Inference Engine

Each path of the decision tree that ends in a leaf is considered a rule. The conditions along the path are combined to form the rule, and the leaf gives its classification. The rule set is regarded as consistent when each leaf yields a single classification. A rule is therefore significant when it is supported by consistent training information and a set of appropriate attributes. However, when the fuzzy set is inconsistent, the nonnull membership function results from its fuzzy value over a single fuzzy set.

The FDT is then transformed into a set of fuzzy rules, as shown in Figure 2, with one fuzzy rule for every leaf. The approximate reasoning operates with four components: (i) firing strength, (ii) compatibility degree, (iii) certainty degree, and (iv) overall output.

3.4. Membership Function Generation

In this paper, two discretization methods are used to obtain the cut points on which triangular membership functions are built. The SD membership function transforms the leftmost and rightmost functions into trapezoidal ones, where the left and right median values are the same. Finally, the cut points from both discretization methods are used to generate and build the membership functions.
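As an illustration of building membership functions from cut points, the following Python sketch places one triangular function on each cut point and turns the two outermost functions into shoulders that stay at one beyond the first and last cut point. The function names, the shoulder construction, and the example cut points are assumptions made for this sketch; the cut points are assumed to be distinct.

```python
def triangular(x, left, peak, right):
    """Degree of x in a triangle with feet `left`, `right` and apex `peak`."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def partition(cut_points):
    """One membership function per cut point; the outer two are shoulders."""
    pts = sorted(cut_points)
    funcs = []
    for i, p in enumerate(pts):
        left = pts[i - 1] if i > 0 else None
        right = pts[i + 1] if i < len(pts) - 1 else None

        def mu(x, left=left, p=p, right=right):
            if left is None and x <= p:
                return 1.0                      # left shoulder
            if right is None and x >= p:
                return 1.0                      # right shoulder
            lo = left if left is not None else p
            hi = right if right is not None else p
            return triangular(x, lo, p, hi)

        funcs.append(mu)
    return funcs


# Hypothetical cut points for a body-temperature attribute.
low, medium, high = partition([36.0, 37.5, 39.0])
degrees = [low(37.0), medium(37.0), high(37.0)]
```

Because neighbouring triangles share their feet and apexes, the degrees of any value add up to one, which keeps the partition easy to interpret.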

4. Rule Optimization Using GA

Fuzzy decision-making alone cannot change the membership rules in order to obtain maximal results. As a result, the genetic algorithm is used as the primary means of optimizing the rules output by the FDT.

In this article, the FDT is first used to generate classification rules. The GA builds the fitness function on the basis of classification quality and precision. A rule is retained as optimized if its fitness value is greater, and vice versa. The fitness function is modified so that fitness can be optimized with the crossover and mutation operators. With this change, the rules are simplified.

4.1. Coding for the Rule

The GA uses binary coding, in which a rule is represented as a fixed-length bit string. The encoding length is determined by the attribute values, which affects the GA bit codes. If an attribute has $k$ values, it is distributed over $k$ bits, each of which has a value. The resulting chromosomes are long but easy to manipulate in the GA.

The key disadvantage of the FDT is its lack of support for mixed discrete and numerical attributes. A number of secondary steps must therefore be taken in conjunction with the binary coding. Each chromosome encodes a rule for the classification of instances. As the problem can be resolved by a subset of the chromosomes, their consistency determines the rule set. If a rule recognises a new sample, the GA selects it as the best rule; if it does not, the GA moves on to the next rule. If none of the rules recognises the new sample, the GA classifies this instance with a default class. The rules are ordered by their genetic priority, and each chromosome competes with the other chromosomes.

The proposed model provides the details of the genes in terms of fixed length and chromosomes:
(i) Weight. Boolean variable of an attribute
(ii) Value. Attribute value: continuous or discrete
(iii) Operator. Genes conjunction: continuous or discrete
(iv) Gain ratio. Information gain of the attribute.

The rules are thus obtained from these four characteristics, with a chromosome of fixed or variable length.
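The four-part gene described above can be sketched as a small Python data structure. The operator set, the class names `Gene` and `Chromosome`, the `label` field, and the example rule are assumptions made for illustration; the paper does not specify them.

```python
from dataclasses import dataclass
from typing import List, Union

# Assumed comparison operators for discrete ("==", "!=") and
# continuous (">=", "<") attributes.
OPERATORS = {
    "==": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
    ">=": lambda a, b: a >= b,
    "<": lambda a, b: a < b,
}


@dataclass
class Gene:
    weight: bool                  # True if the attribute takes part in the rule
    operator: str                 # key of OPERATORS
    value: Union[float, str]      # continuous or discrete attribute value
    gain_ratio: float             # information gain of the attribute


@dataclass
class Chromosome:
    genes: List[Gene]             # one gene per attribute, in a fixed order
    label: str                    # class predicted when the rule fires

    def matches(self, instance):
        """True if every active gene accepts the corresponding attribute value."""
        return all(OPERATORS[g.operator](v, g.value)
                   for g, v in zip(self.genes, instance) if g.weight)


# Hypothetical rule: IF temperature >= 38.0 THEN positive (cough is ignored).
rule = Chromosome(
    genes=[Gene(True, ">=", 38.0, 0.42), Gene(False, "==", "yes", 0.05)],
    label="positive",
)
fires = rule.matches([38.6, "no"])
skips = rule.matches([37.0, "yes"])
```

Keeping one gene per attribute gives a fixed-length chromosome, while the Boolean weight lets the effective rule length vary.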

4.2. FDT Optimization Using GA

The optimization process is applied to the rules created by the FDT over a subset, with the aim of attaining a higher classification rate with few rules. The aim is, therefore, to improve the accuracy of the FDT by optimizing the fitness function. The fitness function in the GA tests the rule consistency. The fitness function is divided into four classes based on the rules established:
(i) Class_A predicts true value as true and false value as true
(ii) Class_B predicts true value as true and false value as false
(iii) Class_C predicts true value as false and false value as true
(iv) Class_D predicts true value as false and false value as false.

Therefore, the accuracy defines the fitness function:

$$f_{acc} = \frac{T}{T + F},$$

where $T$ is defined as the number of true data samples and $F$ is defined as the number of false data samples.

The precision is capable of producing the correct results for classification:

The rule is larger if the value of the support is higher in the dataset, and the fitness is estimated from $N$ and $n$, where $N$ is defined as the total number of attributes and $n$ is defined as the total number of attributes in a rule.

The rule is easy to understand if the individual fitness is high. Finally, the overall fitness is calculated by the maximum function as well, where $w_1$, $w_2$, $w_3$, and $w_4$ are defined as the variable weights lying in $[0, 1]$.
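One plausible way to combine the individual fitness terms with four variable weights is a weighted sum, sketched below in Python. The weighted-sum form, the function names, and all numeric values are assumptions made for illustration only; the exact combination used by the authors is not recoverable from the text.

```python
def accuracy(true_count, false_count):
    """Share of correctly classified samples among all samples of a rule."""
    total = true_count + false_count
    return true_count / total if total else 0.0


def weighted_fitness(components, weights):
    """Weighted sum of fitness components (an assumed combination).

    components: individual fitness values, for example accuracy, precision,
                support, and simplicity of a rule.
    weights:    one weight per component, each in the interval [0, 1].
    """
    if len(components) != len(weights):
        raise ValueError("one weight per component is required")
    if any(w < 0.0 or w > 1.0 for w in weights):
        raise ValueError("weights must lie in [0, 1]")
    return sum(w * c for w, c in zip(weights, components))


# Hypothetical rule: 90 samples classified correctly and 10 incorrectly.
acc = accuracy(90, 10)
fitness = weighted_fitness([acc, 0.8, 0.5, 0.75], [0.4, 0.3, 0.2, 0.1])
```

Choosing weights that add up to one keeps the overall fitness on the same scale as its components, which makes rules easier to compare.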

4.2.1. Crossover and Mutation Operations

The sample dataset collected from nanotechnology-based IoT biosensors is encoded using the coding rules, and crossover and mutation then create new successful individuals. The search space is thereby reduced significantly, and the processing speed is increased.

This paper uses a 2-point crossover and generates a random number in the interval $[0, 1]$. The parents are selected randomly for the crossover operation if the crossover probability exceeds the random number. In the same way, a random number is generated in the interval $[0, 1]$, and a mutation is performed if the mutation probability exceeds the random number.
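A two-point crossover gated by a crossover probability can be sketched in Python as follows. The function names, the list-of-bits chromosome representation, and the seed are assumptions chosen for this illustration.

```python
import random


def two_point_crossover(parent_a, parent_b, rng):
    """Swap the segment between two random cut points of equal-length parents."""
    a, b = list(parent_a), list(parent_b)
    if len(a) != len(b):
        raise ValueError("parents must have the same length")
    if len(a) < 3:
        return a, b                              # too short for two cut points
    i, j = sorted(rng.sample(range(1, len(a)), 2))
    child_a = a[:i] + b[i:j] + a[j:]
    child_b = b[:i] + a[i:j] + b[j:]
    return child_a, child_b


def maybe_crossover(parent_a, parent_b, p_crossover, rng):
    """Apply the crossover only if its probability exceeds a random number."""
    if p_crossover > rng.random():
        return two_point_crossover(parent_a, parent_b, rng)
    return list(parent_a), list(parent_b)


rng = random.Random(0)                           # fixed seed for repeatability
zeros, ones = [0] * 8, [1] * 8
child_a, child_b = two_point_crossover(zeros, ones, rng)
kept_a, kept_b = maybe_crossover(zeros, ones, 0.0, rng)
```

Both cut points are drawn from the interior of the chromosome, so the first and last genes of each child always come from the same parent.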

The genes consist of four components that must be effectively designed. Three of them are subject to mutation and the fourth is the gain ratio, as shown in Figures 3–5:
(i) Operator mutation. If the operator of the original gene is changed, the operator mutates into its opposite, and vice versa
(ii) Weight mutation. Set the weight of the new gene to zero if it was one, and vice versa. The attribute of the gene no longer occurs in the rule if the weight changes from one to zero
(iii) Value mutation. In the case of a discrete attribute, a new value replaces the original gene value. Likewise, in the case of continuous attributes, a decimal value is generated at random and added to or subtracted from the gene value.
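The three mutation operations listed above can be sketched in Python as follows. The operator pairs, the dictionary representation of a gene, the step size of the continuous perturbation, and the function names are hypothetical choices made for this sketch.

```python
import random

# Assumed pairs of opposite operators.
OPPOSITE = {"==": "!=", "!=": "==", ">=": "<", "<": ">="}


def mutate_operator(gene):
    """Operator mutation: replace the operator with its opposite."""
    return {**gene, "operator": OPPOSITE[gene["operator"]]}


def mutate_weight(gene):
    """Weight mutation: switch the attribute on or off in the rule."""
    return {**gene, "weight": not gene["weight"]}


def mutate_value(gene, domain, rng, step=0.1):
    """Value mutation for discrete (string) or continuous (numeric) values."""
    if isinstance(gene["value"], str):
        choices = [v for v in domain if v != gene["value"]]
        new_value = rng.choice(choices) if choices else gene["value"]
    else:
        # Add or subtract a small random decimal value.
        new_value = gene["value"] + rng.choice((-1, 1)) * rng.uniform(0.0, step)
    return {**gene, "value": new_value}


def mutate(gene, domain, p_mutation, rng):
    """Mutate only if the mutation probability exceeds a random number."""
    if p_mutation <= rng.random():
        return dict(gene)
    kind = rng.choice(("operator", "weight", "value"))
    if kind == "operator":
        return mutate_operator(gene)
    if kind == "weight":
        return mutate_weight(gene)
    return mutate_value(gene, domain, rng)


rng = random.Random(1)
fever = {"weight": True, "operator": ">=", "value": 38.0}
cough = {"weight": True, "operator": "==", "value": "yes"}
flipped = mutate_operator(fever)
dropped = mutate_weight(fever)
nudged = mutate_value(fever, None, rng)
swapped = mutate_value(cough, ["yes", "no"], rng)
unchanged = mutate(fever, None, 0.0, rng)
```

Each operation returns a new gene and leaves the original untouched, so the parent chromosome stays available for later selection.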

5. Experimental Results and Discussion

This section presents the validation of the FDT on data from nanotechnology-based IoT biosensors, where the sensors are supplied with data from the CORD-19 dataset (available at https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge). The performance of the proposed method is evaluated in terms of accuracy, sensitivity, specificity, and F-measure. The performance of other classifiers, such as SVM and Bayes, is measured as well. Parameters are chosen based on the fuzzy model setup. That is, while learning, the algorithm optimizes these coefficients (according to a given optimization strategy) and returns an array of parameters that reduces the error.
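For reference, the four performance metrics can be computed from the counts of a binary confusion matrix, as in the following Python sketch. The function name and the example counts are hypothetical and do not correspond to results reported in this paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, sensitivity, specificity, and F-measure from confusion counts.

    tp, fp, fn, tn: true positives, false positives, false negatives,
                    and true negatives.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    sensitivity = ratio(tp, tp + fn)            # recall on the positive class
    specificity = ratio(tn, tn + fp)            # recall on the negative class
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical counts for 100 test records.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
```

The F-measure is the harmonic mean of precision and sensitivity, so it is high only when both are high.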

The proposed model is compared with an Artificial Immune Recognition System based on support vector machine, SVM-genetic algorithm, SVM-Fuzzy, latent Dirichlet allocation, Bayesian classifier, dictionary-based linguistic rules decision models, and natural language processing, as shown in Tables 2–7.

The results show that the FDT (Table 8) is more accurate than conventional methods. The principal reason for the improvement is that the genetic algorithm optimizes the rules for structuring the large dataset collected from nanotechnology-based IoT biosensors. The proposed method manages incomplete data in both the training and testing phases. The FDT classifier effectively handles the missing data during the preprocessing operation, which produces better results and outperforms conventional classifiers. This method also reduces the disparity between classification decisions and their respective decision rules. A new record is effectively classified on the basis of HDFT. In prediction, the feature selection method makes a significant difference in comparison with the standard methods. As the information gain is selected according to the entropy values of the corresponding data, the results improve and cases are diagnosed correctly.

6. Conclusion

In this paper, we propose a new classification technique that incorporates a genetic algorithm into the FDT for rule optimization in order to improve risk prediction for COVID-19. The proposed solution is more likely to support diagnosis by physicians than traditional prediction algorithms. Moreover, the proposed work can be strengthened by using other metaheuristic methods to refine the rule set of the decision tree.

Future methods can rely on confirming real-time data through the polymerase chain reaction of a viral agent. AI-based ML/DL/RIL methods can be applied to polymerase chain reaction data in the search for antiviral medicine. Studies can be developed on the collection of datasets that balance public health and data privacy in AI interactions. Protecting data privacy with blockchain technology can enable secure transactions of healthcare data, with embedded AI for data analytics to predict the future spread of infection.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

None of the authors have any conflicts of interest.