Abstract

Several studies aimed at improving healthcare management have shown that the importance of healthcare has grown in recent years. In the healthcare industry, effective decision-making requires multicriteria group decision-making, and big data analytics can simultaneously support disease detection and healthcare delivery. Only a few previous studies on large-scale group decision-making (LSGDM) in big data-driven healthcare Industry 4.0 have focused on this topic. The goal of this work is to improve healthcare management decision-making by developing a new MapReduce-based large-scale group decision-making model (MR-LSDGM) for the healthcare Industry 4.0 context. Clustering decision-makers (DMs), modelling DM preferences, and classification are the three stages of the MR-LSDGM technique. The DMs are subdivided using a novel biogeography-based optimization (BBO) technique combined with fuzzy C-means (FCM). The subgroup preferences are then modelled using the two-tuple fuzzy linguistic representation (2TFLR) technique. The final classification method includes a feature extractor based on long short-term memory (LSTM) and a classifier based on an optimally tuned extreme learning machine (ELM). MapReduce is used as the data management platform to handle massive amounts of data. A thorough set of experimental analyses is carried out, and the results are assessed using a variety of metrics.

1. Introduction

Recent technologies, such as big data, the internet of things (IoT), and wearables, have a significant impact on society, healthcare organisations, and our daily lives. Big data plays an important role in obtaining the necessary information during the decision-making process. Big data is defined as a complex and massive volume of data derived from various sources and clinical data sets that provide critical information for patient treatment [1]. Furthermore, big data has the potential to improve healthcare operations through data-driven decision-making in the ambiguous environment of Industry 4.0. Big data analytics provides significant benefits for evaluating and assimilating massive amounts of complex healthcare data. The medical system keeps track of the world's most pressing social and economic issues in order to find innovative solutions through technology and science. The Industry 4.0 model was first proposed in 2011 and initially referred to the production or manufacturing process. When integrated, medical services and Industry 4.0 are complementary methodologies. Furthermore, with the rise of big data and the widespread use of electronic healthcare records in healthcare organisations, pursuing solutions to population-level medical problems with traditional, manual approaches is no longer viable. Using big data for better decision-making, on the other hand, poses some challenges for healthcare.

In recent years, group decision-making has received a lot of attention in various areas of healthcare organisations [2]. The more severe the challenge, the harder it is to absorb the loss caused by a decision-making error. As a result, many civil organisations and government departments, as well as managers and experts in various fields, are involved in decision-making. Based on the preferences of decision-makers, LSGDM selects suitable alternatives from a set of possible alternatives [3]. When the number of decision-makers (DMs) increases, the standard group decision-making problem transforms into the LSGDM problem. LSGDM problem-solving methods typically consist of four phases: (i) standardising the original individual decision matrices, (ii) clustering the standardised individual decision matrices, (iii) aggregating the cluster decision matrices, and (iv) selecting the best alternative.

Based on the conventional decision-making method, the current study innovates by incorporating four factors: distinct decision-makers' preference data expression, the attribute weight determination method, the large-scale group clustering method, and the large-group preference data aggregation method [4]. Large-scale group preference data aggregation is one of the most common fields of study. Although the number of studies addressing big data is steadily increasing, applications of large-scale group decision-making procedures in the context of big data and medical Industry 4.0 remain rare. This work creates a new MapReduce-based LSGDM model (MR-LSDGM) for the Industry 4.0 environment to improve decision-making in healthcare management. With the rapid advancement of information technology, as exemplified by the Internet, decision support systems will evolve toward socialisation in the era of big data, because DMs from various areas can be invited to collaborate on difficult issues on a single network platform. Simultaneously, online voting on a particular item can be conducted and analysed automatically. Additionally, multitemporal and group events, such as those on e-commerce websites and search engines, can be analysed to provide critical data support for qualitative decision-making. It appears worthwhile to develop large-scale group support tools to aid decision-making, given that the big data era draws on a variety of data sources, including social media, mobile devices, and websites. Several researchers have developed software implementations of LSGDM, including WTALGDM for LSGDM on energy network dispatch optimization, MENTOR for visualizing opinion evolution, and a multiagent system model for assisting with the consensus-reaching process (CRP). Other application fields, such as healthcare and engineering, also require such tools. A large number of simulations are run to demonstrate the improved results of the MR-LSDGM technique, and the experimental findings are examined using several metrics.

2. Literature Review

Li and Wei [5] created an LSGDM model for medical management decisions. The hesitant fuzzy linguistic term set (HFLTS) is used to describe the decision data, and a clustering technique based on the ideal point is presented for clustering the DMs into several subgroups. The DM preferences are then combined with the PDEHFLTS model to retain the decision data. A subgroup weight method is proposed for calculating the ranking weight based on the subgroup size and the proposed hesitant entropy of PDEHFLTS. For large-scale GDM problems, Li et al. [6] used fuzzy cluster analysis to integrate heterogeneous data. Fuzzy cluster analysis is used to divide large groups into smaller ones, and F-statistics are used to determine the required number of clusters. The original data are retained according to the degree of similarity. A consensus-building process is then used among these smaller groups to reach a common understanding. When some groups could not agree, a feedback mechanism was devised to update the smaller GDM matrices, and the TOPSIS model was used to select the best option.

Song and Yuan [7] proposed a new GDM method based on arithmetic programming and employing IMGFLPR. A consensus procedure based on IMGFLPR is developed, while dynamic adaptation of the expert weights is considered. Finally, problems of emergency plan selection are solved using the presented method, which demonstrates the effectiveness of the GDM outcome. Wan et al. [8], inspired by multiplayer game concepts, proposed a two-step optimization algorithm that first maximises individual fulfilment and then minimises group conflict. The proposed approach effectively saves decision-making time while ensuring the quality of LSGDM.

Hsu et al. [9] identified eight potential developments for providing a proper approach to the medical industry. The modified Z-DEMATEL method is used to build the mutual importance relationships and prioritise these trends. By extending the classic fuzzy number and representing the confidence of the assessment environment under uncertainty, the Z-number technique improves the consistency of expert evaluation. Liu et al. [10] developed an LGDA approach for managing dependency in HRA based on interval two-tuple linguistic variables and a cluster analysis model. In addition, an expanded Muirhead mean operator was developed to determine the degree of dependence between the activities of consecutive operators. Finally, an empirical medical dependency analysis is used to demonstrate the applicability and effectiveness of the presented LGDA model [11].

Du and Shan [12] proposed a dynamic intelligent integration suggestion approach for product ideas. They began by creating product concept evaluation criterion systems that include both output and input criteria, followed by a step for static data combination and data extraction. The basic probability assignment function is then used as a data extraction approach to accurately reflect and effectively capture the validity of experts' evaluations. Dursun et al. [13] proposed a fuzzy multicriteria GDM architecture based on the fuzzy integral and fuzzy measure principle to evaluate healthcare waste (HCW) treatment alternatives for Istanbul. For the GDM problem, an expert consensus is required for the calculation to be carried out correctly; OWA operators are used in this work to aggregate the DM opinions.

Pan et al. [14] concentrated on using dynamic programming to solve the large-scale GDM problem in which the data take the form of linguistic variables. Because linguistic variables cannot be computed directly, the interval type-2 fuzzy set is used for encoding. Distinct similarity and distance models are then constructed concurrently to determine the relationships between the interval type-2 fuzzy sets. Later, a dynamic programming approach based on clustering models was presented for clustering the DMs from an overall perspective. Gao et al. [15] created a novel paradigm for selecting an appropriate physician using an index system that balances two-dimensional calculation results. The researchers created questionnaires and conducted field research to bring the proposed technique closer to the actual situation in China, and then determined the best outcome for the medical services provided by doctors [16].

3. The Proposed MR-LSDGM Technique

The MapReduce tool is used in this study to create a new MR-LSDGM approach for the healthcare industry. The MR-LSDGM approach includes BBO-FCM-based DM clustering, 2TFLR-based preference modelling, and LSTM-OELM-based classification procedures. The proposed MR-LSDGM model is depicted in its entirety in Figure 1. The MR-LSDGM approach uses the MapReduce tool to manage massive amounts of data in the healthcare industry. The sections that follow provide a more detailed explanation of these processes [17].

3.1. MapReduce

The primary goal of the Map procedure is to compute the geometric distance between the cluster centres and the sampling point data. The data are read from the Hadoop Distributed File System (HDFS) in the stated (key, value) pair input format and used as the Map function input, where "key" denotes the ID number of a sampling point and "value" denotes the sampling point data itself. The minimum-distance rule is then used to determine the principal cluster centre of each sample, Euclidean distances to the remaining cluster centres are computed from the sampling point data, and the membership degrees (MD) are integrated [18].

The primary goal of the Reduce function, on the other hand, is to combine the outputs of the many Map functions. It first obtains the (key, value) pairs from the Map functions, where "key" represents a cluster centre and "value" represents the sampling point data assigned to that cluster centre. The data samples routed to each cluster centre are then merged, and a new cluster centre is evaluated. Finally, it is determined whether the geometric distance between the new and the corresponding previous cluster centre falls below a predetermined threshold, or whether the maximum number of iterations has been exceeded.
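As an illustration of how the Map and Reduce steps described above could be organised, the following sketch expresses them as plain Python callables rather than a specific Hadoop API; the function names, the (key, value) layout, and the emission of one (cluster, sample) pair per cluster are assumptions made for clarity.

```python
import numpy as np

def fcm_map(record, centres, m=2.0):
    """Map step: compute fuzzy membership degrees of one sample to every centre
    and emit one (cluster_id, (point, membership)) pair per cluster."""
    point_id, x = record                      # key = sample ID, value = feature vector
    d = np.linalg.norm(centres - x, axis=1)   # Euclidean distance to every centre
    d = np.maximum(d, 1e-12)                  # avoid division by zero
    u = 1.0 / np.sum((d[:, None] / d[None, :]) ** (2.0 / (m - 1.0)), axis=1)
    return [(j, (x, u[j])) for j in range(len(centres))]

def fcm_reduce(cluster_id, values, m=2.0):
    """Reduce step: recompute one cluster centre as the fuzzy weighted centre of
    gravity of the samples emitted to it."""
    num, den = 0.0, 0.0
    for x, u_j in values:
        w = u_j ** m
        num = num + w * x
        den = den + w
    return cluster_id, num / den
```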

Despite outperforming traditional hard clustering algorithms in terms of clustering effect, fuzzy clustering algorithms have a few drawbacks. The clustering result is extremely sensitive to the initial clustering centres. Because the algorithm uses gradual iteration, the objective function decreases continuously during the iterations; as a result, when the c clustering centres are initially chosen arbitrarily from the sample data set and the geometric distances between them are small, the final clustering result can fall into a local optimum rather than the global one. To avoid situations in which the geometric distances between the arbitrarily chosen cluster centres are too small, the maximin (minimum-maximum) distance method was used to determine the initial cluster centres in this study.
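The maximin-distance initialisation described above could be realised along the following lines (a minimal NumPy sketch; the function name and the random choice of the first centre are illustrative assumptions):

```python
import numpy as np

def maximin_init(X, c, rng=None):
    """Pick c well-separated initial cluster centres: start from a random sample,
    then repeatedly add the sample whose minimum distance to the already chosen
    centres is largest (the maximin rule)."""
    rng = np.random.default_rng(rng)
    centres = [X[rng.integers(len(X))]]
    for _ in range(c - 1):
        d = np.min(
            np.linalg.norm(X[:, None, :] - np.asarray(centres)[None, :, :], axis=2),
            axis=1,
        )
        centres.append(X[np.argmax(d)])
    return np.asarray(centres)
```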

3.2. Design of BBO-FCM Technique

In the beginning, the BBO-FCM approach is used to divide the DMs into subgroups. Every feature vector belongs to each of the FCM clusters with a membership coefficient in [0, 1]. Finally, the algorithm labels every data point (feature vector) with the cluster for which its membership coefficient is maximal. The cluster centres and the fuzzy membership matrix are calculated by minimising the following objective function:

$$J_m(U, V) = \sum_{i=1}^{N} \sum_{j=1}^{c} u_{ij}^{m}\, \lVert x_i - v_j \rVert^{2}, \qquad (1)$$

where $N$ represents the number of data points, $c$ indicates the number of clusters, $u_{ij}$ signifies the fuzzy membership of point $x_i$ in cluster $j$, $v_j$ denotes the centre of cluster $j$, $\lVert x_i - v_j \rVert$ is the distance between the cluster centre and the data point, and $m > 1$ is a fuzzy weighting exponent that determines the amount of fuzziness of the resulting partition. In most cases, $m = 2$ is chosen (although this value of $m$ does not generate optimal solutions for every problem).

Because of the constraint imposed on (1), all points must completely allocate their membership among the clusters [19], that is, $\sum_{j=1}^{c} u_{ij} = 1$ for every $i$. The cluster centre (centroid) is defined as the fuzzy weighted centre of gravity of the data:

$$v_j = \frac{\sum_{i=1}^{N} u_{ij}^{m}\, x_i}{\sum_{i=1}^{N} u_{ij}^{m}}. \qquad (2)$$

Because $u_{ij}^{m}$ influences the calculation of the cluster centres, data points with higher memberships have a greater influence on the prototype position than data points with lower memberships. Fuzzy clustering (sometimes referred to as soft clustering or soft k-means) allows each individual data point to be assigned to more than one cluster. In the fuzzy C-means approach, the distance is determined by

$$d_{ij}^{2} = \lVert x_i - v_j \rVert^{2}. \qquad (3)$$

The cluster centre represents the prototype of its cluster, and the components $u_{ij}$ of the membership matrix denote the degree to which data point $x_i$ is related to that prototype. Minimising the objective function (1) yields the following membership update:

$$u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( d_{ij} / d_{ik} \right)^{2/(m-1)}}. \qquad (4)$$

Equation (4) is applied iteratively, since the distances depend on the cluster centres, which in turn depend on the memberships $u_{ij}$. The FCM procedure is as follows: (i) choose the number of clusters $c$, the fuzziness exponent $m$, and an initial partition matrix; (ii) compute the cluster centres by (2); (iii) compute the new partition matrix by (4); (iv) compare the new and previous partition matrices, and when the change in the membership degrees, measured by a proper norm, is smaller than the given threshold, end the process; otherwise, return to step (ii).
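Steps (i)-(iv) can be collected into a compact single-machine reference implementation, shown below as a sketch; NumPy is assumed, and the tolerance and iteration cap are illustrative defaults rather than values prescribed by the paper.

```python
import numpy as np

def fcm(X, c, m=2.0, tol=1e-5, max_iter=100, init_centres=None, rng=None):
    """Fuzzy C-means: alternate between the centre update (2) and the
    membership update (4) until the membership matrix stops changing."""
    rng = np.random.default_rng(rng)
    n = len(X)
    V = X[rng.choice(n, c, replace=False)] if init_centres is None else init_centres
    U = np.zeros((n, c))
    for _ in range(max_iter):
        d = np.maximum(np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2), 1e-12)
        U_new = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0)), axis=2)
        V = (U_new ** m).T @ X / np.sum(U_new ** m, axis=0)[:, None]
        if np.max(np.abs(U_new - U)) < tol:     # step (iv): change in membership degrees
            U = U_new
            break
        U = U_new
    return V, U
```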

The BBO algorithm is used to define the optimal initial cluster centres of the FCM technique. BBO is a population-based optimisation technique that simulates the distribution and balance of species across different habitats; according to previous research, it produces better results than other population-based techniques [20]. The concept of biogeography-based optimization was first proposed by Simon, who devised and tested the method using the scientific understanding of the migration and dispersion of species from one habitat to another. From one iteration to the next, a collection of candidate solutions (habitats) is retained, and all habitats send and receive inhabitants; the exchanges between habitats are determined by their immigration and emigration rates, which are probabilistically modified. An arbitrary number of habitats are occasionally mutated during each iteration. Every solution parameter is referred to as a suitability index variable (SIV), and every habitat has a habitat suitability index (HSI), which plays the same role as the fitness function in other population-based algorithms; the SIVs are the independent factors used to determine the HSI of a habitat. In this work, the BBO algorithm selects the optimal initial cluster centres that are then supplied to the FCM technique.

The mathematical model of immigration and emigration is expressed as follows:

$$\lambda_k = I\left(1 - \frac{k}{n}\right), \qquad \mu_k = E\,\frac{k}{n},$$

where $I$ refers to the maximal rate of immigration, $E$ defines the maximal rate of emigration, $n$ implies the maximal number of inhabitants a habitat can hold, and $k$ represents the inhabitant count of habitat $k$.

The mutation applied to the habitats, which improves the exploration of BBO, is defined as follows:

$$m(k) = m_{\max}\left(1 - \frac{P_k}{P_{\max}}\right),$$

where $m_{\max}$ represents the highest mutation value determined by the user, $P_{\max}$ denotes the largest habitat probability over all habitats, and $P_k$ refers to the probability of habitat $k$.

At this point, BBO sets up an ecosystem of habitats, calculates all of the equivalent HSIs, and applies the function that switches the ecosystem from one optimisation cycle to the next. The algorithm can thus be described by a six-tuple $(n, \mathrm{SIV}, \lambda, \mu, \Omega, M)$, where $n$ implies the number of habitats, $\mathrm{SIV}$ refers to the number of suitability index variables, $\lambda$ represents the rate of immigration, $\mu$ demonstrates the rate of emigration, $\Omega$ refers to the migration operator, and $M$ indicates the mutation operator.
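A condensed sketch of how BBO could be used to select the initial FCM cluster centres is given below. It uses the FCM objective (1) as the habitat suitability measure and rank-based immigration/emigration rates; the population size, iteration count, and mutation probability are illustrative assumptions.

```python
import numpy as np

def fcm_objective(X, centres, m=2.0):
    """FCM objective J_m for a given set of centres (lower is better)."""
    d = np.maximum(np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2), 1e-12)
    U = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0)), axis=2)
    return np.sum((U ** m) * d ** 2)

def bbo_init_centres(X, c, pop=20, iters=50, p_mut=0.02, rng=None):
    """BBO search over candidate centre sets: habitats with better HSI emigrate
    their SIVs to habitats with worse HSI; a few SIVs are mutated each iteration."""
    rng = np.random.default_rng(rng)
    dims = X.shape[1]
    lo, hi = X.min(axis=0), X.max(axis=0)
    H = np.stack([X[rng.choice(len(X), c, replace=False)] for _ in range(pop)])
    for _ in range(iters):
        cost = np.array([fcm_objective(X, h) for h in H])
        order = np.argsort(cost)                      # rank habitats by HSI (best first)
        H = H[order]
        rank = np.arange(pop)
        lam = rank / (pop - 1)                        # immigration grows with rank (worse HSI)
        mu = 1.0 - lam                                # emigration shrinks with rank
        for k in range(pop):
            for j in range(c):
                if rng.random() < lam[k]:             # immigrate this SIV from a good habitat
                    src = rng.choice(pop, p=mu / mu.sum())
                    H[k, j] = H[src, j]
                if rng.random() < p_mut:              # occasional mutation of one SIV
                    H[k, j] = lo + rng.random(dims) * (hi - lo)
    best = np.argmin([fcm_objective(X, h) for h in H])
    return H[best]
```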

3.3. Modelling Preferences of DMs Using the 2TFLR Technique

Once the DMs have been clustered, their perspectives can be defined and fused using the 2TFLR technique to retain as much decision information as possible. Decision-making in healthcare takes place under dynamic conditions and ambiguous information, and most decision-makers prefer linguistic variables or fuzzy values over hard numbers. In the two-tuple linguistic representation method, the data measured in a linguistic hierarchy term set can be unified without data loss.

Definition 1. Let $S = \{s_0, s_1, \ldots, s_g\}$ represent a linguistic term set and $\beta \in [0, g]$ denote the outcome of aggregating the indices of a group of labels measured in $S$. A linguistic two-tuple is a pair $(s_i, \alpha)$ with $s_i \in S$ and $\alpha \in [-0.5, 0.5)$, where $s_i$ characterises the linguistic label of the data and $\alpha$ is the numerical value that expresses the translation from the original result $\beta$ to the nearest index label $i$ in $S$, namely, the symbolic translation.
The following functions convert between numerical values and linguistic two-tuples. A numerical value can be converted into a linguistic two-tuple using [8]

$$\Delta(\beta) = (s_i, \alpha), \quad \text{with } i = \operatorname{round}(\beta) \text{ and } \alpha = \beta - i.$$

Conversely, a linguistic two-tuple can be converted back to a real value in $[0, g]$ by

$$\Delta^{-1}(s_i, \alpha) = i + \alpha = \beta.$$

To further unify the dimension, the linguistic term can be mapped onto $[0, 1]$ by dividing $\Delta^{-1}(s_i, \alpha)$ by $g$.
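A small sketch of the two-tuple conversions above, assuming a term set indexed 0, ..., g; the function names are illustrative:

```python
def to_two_tuple(beta):
    """Delta: convert a numerical aggregation result beta in [0, g]
    into a linguistic two-tuple (s_i, alpha) with alpha in [-0.5, 0.5)."""
    i = int(round(beta))
    return i, beta - i

def from_two_tuple(i, alpha):
    """Delta^{-1}: convert a two-tuple (s_i, alpha) back to its numerical value in [0, g]."""
    return i + alpha

def normalise(i, alpha, g):
    """Map a linguistic two-tuple onto [0, 1] to unify dimensions."""
    return (i + alpha) / g

# example: aggregated index 3.4 on a seven-term set (g = 6)
print(to_two_tuple(3.4))        # (3, 0.4) -> (s_3, 0.4)
print(normalise(3, 0.4, 6))     # ~0.567
```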

3.4. Automated Disease Classification Model

Finally, the disease classification process is divided into three stages: feature extraction using LSTM, classification using ELM, and parameter tuning using the tree growth algorithm (TGA). As previously stated, convolution models can operate on a single image and transform it from input pixels to a matrix/vector representation. Current pretrained CNN models are used for feature extraction. The main idea is that the CNN need not be trained from scratch; training signals may instead be provided by backpropagating the errors from the LSTM-DL classifier through the CNN across multiple input images. Convolutional neural networks (CNNs) are used because of their improved transferability. Knowledge of this technology will benefit not only researchers who use CNNs for radiology and medical imaging tasks but also clinical radiologists, since deep learning may influence their practice in the near future. Following CNN training, medical professionals or computer-aided detection (CADe) systems can specify the target lesions in medical pictures during the deployment phase. Figure 2 depicts a general LSTM cell. The LSTM cell contains various gates and parameters that control the behaviour of each memory cell, and every cell state is governed by the activation functions of its gates. The input value is fed into the input gate ($i$), forget gate ($f$), candidate activation vector ($\tilde{c}$), and output gate ($o$):

$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i),$$
$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f),$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o),$$
$$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c),$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t,$$
$$h_t = o_t \odot \tanh(c_t),$$

where $W$, $U$, and $b$ denote the input weights, hidden (recurrent) weights, and bias weights of the corresponding gates [21].
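The gate equations can be written out directly. The sketch below performs a single LSTM cell step in NumPy purely for illustration, with the parameter dictionary p holding the input, recurrent, and bias weights of each gate (the names are assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step: input gate i, forget gate f, output gate o,
    candidate cell state c_tilde, then the cell and hidden state updates."""
    i = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])
    f = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])
    o = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])
    c_tilde = np.tanh(p["W_c"] @ x_t + p["U_c"] @ h_prev + p["b_c"])
    c_t = f * c_prev + i * c_tilde          # cell state update
    h_t = o * np.tanh(c_t)                  # hidden state (the extracted feature)
    return h_t, c_t
```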

Each time step includes a single CNN stage followed by the LSTM model: the CNN can be applied to every input image and its output passed, one step at a time, as the LSTM input. This is achieved by wrapping the CNN layers in a time-distributed manner, so that the same layers are reused at every time step. To determine the presence of disease, the extracted features are fed into the ELM classifier. Assume training data $\{(x_j, t_j)\}_{j=1}^{N}$; the output function of an SLFN with $L$ hidden neurons can be determined by

$$f_L(x) = \sum_{i=1}^{L} \beta_i\, g(w_i, b_i, x) = h(x)\beta,$$

where $\beta$ denotes the output weight matrix, $f_L(x_j)$ represents the network output corresponding to training sample $x_j$, $g(\cdot)$ indicates a nonlinear piecewise continuous activation function, and $(w_i, b_i)$ are the parameters of hidden node $i$. Training the network amounts to discovering appropriate network parameters that minimise the error function $\lVert H\beta - T \rVert$, where

$H$ denotes the hidden-layer output matrix of the SLFN with $L$ hidden neurons, $\beta$ the output weight matrix, and $T$ the target output.

ELM uses arbitrary hidden node parameters and a tuning-free training approach for feedforward neural networks instead of iteratively updating the network parameters as in traditional gradient descent algorithms. ELM is flexible because it can employ almost any hidden activation function, as demonstrated by the universal approximation theorem: almost any nonlinear piecewise continuous function and its linear combinations perform well in the ELM algorithm [22]. The extreme learning machine is a fast-converging training method for single hidden layer feedforward neural networks (SLFNs); it avoids the need for many iterations to update the hidden layer weights. Compared to other classical learning algorithms in applications with increasing noise, ELM appears to perform better in regression and classification tests. With a single hidden layer of neurons and random feature mapping, an ELM model learns faster than comparable models, and its low computational complexity has attracted substantial scholarly interest for high-dimensional and large data sets.
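A minimal single-machine ELM sketch is given below: hidden-layer parameters are drawn at random and only the output weights are solved for by regularised least squares. The sigmoid activation and the ridge term are illustrative choices, not the paper's prescribed settings.

```python
import numpy as np

class ELM:
    """Single-hidden-layer feedforward network trained the ELM way:
    hidden parameters are drawn at random and only beta is solved for."""
    def __init__(self, n_hidden=100, reg=1e-3, rng=None):
        self.L, self.reg, self.rng = n_hidden, reg, np.random.default_rng(rng)

    def _hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))   # H = g(XW + b)

    def fit(self, X, T):
        self.W = self.rng.standard_normal((X.shape[1], self.L))
        self.b = self.rng.standard_normal(self.L)
        H = self._hidden(X)
        # beta = (H^T H + reg I)^{-1} H^T T  (regularised least squares)
        self.beta = np.linalg.solve(H.T @ H + self.reg * np.eye(self.L), H.T @ T)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta
```

For classification, T would be the one-hot target matrix and the predicted class would be the argmax of the network output.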

The TGA is used to optimise the ELM model's parameters, resulting in improved overall classification performance. The TGA approach is inspired by the competition between trees in a forest, where a tree's attention is divided between food and sunlight. Exploration and exploitation are the two major stages of the approach. During the exploration stage, the tree moves toward the sunlight, allowing it to investigate new locations. In the exploitation stage, the tree has received enough light and therefore moves its roots towards better nutrients, that is, towards the local/global optimum. The forest's tree population is classified into four groups.

The first group of trees has found a light source and now competes for food. To compete for light, each tree in the second group moves towards the two optimal neighbours that are closest to it. In the third group, the worst trees are replaced by newly planted ones. Finally, an optimal tree is used to create a new plant [23]. Initially, this approach arbitrarily creates the early population of trees (solutions) within the lower and upper bounds, and the fitness value of every solution is calculated. The early population is produced as follows:

$$T_i = L + \text{rand} \cdot (U - L),$$

where $T_i$ is the $i$th solution of the population, rand represents a random value drawn from the uniform distribution on $[0, 1]$, and $L$ and $U$ indicate the lower and upper bounds of the variables, respectively.

Next, the population is sorted by fitness value, and the current optimal solution at each iteration is established; the global optimal solution is denoted $T^{\text{best}}$. The best solutions are allocated to the initial group $N_1$, and each solution in this group carries out a local search as follows:

$$T_i^{\text{new}} = \frac{T_i}{\theta} + r\, T_i,$$

where $T_i^{\text{new}}$ represents the new solution, $T_i$ is the $i$th solution in the current iteration, $\theta$ represents the rate of power reduction, and $r$ specifies a random number in $[0, 1]$.

A greedy selection is then applied: when the new solution has a better fitness value than the current solution, it replaces it; otherwise, the current solution is kept for the next generation.

The second-best solutions are allocated to subpopulation $N_2$. Every solution in this group is shifted towards its two nearest solutions (from the first and second subpopulations) at distinct angles. The Euclidean distance is used to measure the distance between two solutions:

$$d_i = \left( \sum_{j=1}^{n} \left( T_{N_2, j} - T_{i, j} \right)^2 \right)^{1/2},$$

where $d_i$ denotes the distance between the current tree $T_{N_2}$ and the $i$th solution $T_i$ in the population.

The poorest solutions in the population form the third subpopulation, $N_3$; each of them is replaced by a newly generated random solution. Its size is

$$N_3 = N - (N_1 + N_2),$$

where $N$ represents the population size and $N_1$ and $N_2$ are the sizes of the first and second subpopulations, respectively. Next, the intermediate population is determined by combining the groups $N_1$, $N_2$, and $N_3$.

The last group, $N_4$, contains arbitrary new solutions. Using mask operators, each of these solutions adapts part of an optimal solution from the combined population, and the adapted solutions are merged with it. The fitness values are used to sort the resulting population, and the best $N$ solutions are chosen for the next iteration. The procedure is repeated until the stopping criterion is met, and the best solution found is returned.
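A loose sketch of the four-group TGA scheme described above is given below; the group sizes, theta, lambda, and alpha are assumed control parameters, and the mask operator is implemented as a simple random crossover with the best tree.

```python
import numpy as np

def tga(fitness, dim, lb, ub, n=30, n1=6, n2=10, n4=8,
        theta=0.8, lamb=0.5, alpha=0.5, iters=100, rng=None):
    """Loose TGA sketch: best trees do a local search, the next group moves toward
    its two nearest neighbours, the worst trees are replanted at random, and extra
    random trees are mixed in toward the best tree via a random mask."""
    rng = np.random.default_rng(rng)
    pop = lb + rng.random((n, dim)) * (ub - lb)
    for _ in range(iters):
        fit = np.array([fitness(t) for t in pop])
        pop = pop[np.argsort(fit)]                     # ascending: best first
        # group 1: local search T/theta + r*T with greedy selection
        for i in range(n1):
            cand = np.clip(pop[i] / theta + rng.random() * pop[i], lb, ub)
            if fitness(cand) < fitness(pop[i]):
                pop[i] = cand
        # group 2: move toward the two nearest trees of the first two groups
        for i in range(n1, n1 + n2):
            d = np.linalg.norm(pop[:n1 + n2] - pop[i], axis=1)
            d[i] = np.inf
            j, k = np.argsort(d)[:2]
            y = lamb * pop[j] + (1.0 - lamb) * pop[k]
            pop[i] = np.clip(pop[i] + alpha * y, lb, ub)
        # group 3: replant the worst trees at random
        pop[n1 + n2:] = lb + rng.random((n - n1 - n2, dim)) * (ub - lb)
        # group 4: new random trees adapted toward the best tree by a random mask
        new = lb + rng.random((n4, dim)) * (ub - lb)
        mask = rng.random((n4, dim)) < 0.5
        new[mask] = np.broadcast_to(pop[0], (n4, dim))[mask]
        pop = np.vstack([pop, new])
        pop = pop[np.argsort([fitness(t) for t in pop])][:n]   # keep the best n
    return pop[0]
```

To tune the ELM, fitness(t) would decode the candidate vector t into the ELM hyperparameters (for example, the number of hidden neurons and the regularisation strength) and return a validation error.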

4. Performance Validation

The performance of the MR-LSDGM approach is investigated in this section using the benchmark activity recognition data set from the UCI repository [24]. The data set contains information on 30 people, each with 561 attributes. The data set contains 496 instances from the Walk class, 471 instances from the Up class, 420 instances from the Down class, 491 instances from the Sitting class, 532 instances from the Standing class, and 537 instances from the Lying class.

After five repetitions, the MR-LSDGM approach produced a collection of five confusion matrices, as shown in Figure 3. The figure shows that the MR-LSDGM method yielded the best possible result in each execution run [25]. For example, the MR-LSDGM technique classified 493 instances as Walk, 464 instances as Up, 415 instances as Down, 447 instances as Sit, 506 instances as Stand, and 537 instances as Lay under run-1. Similarly, the MR-LSDGM approach classified 495 instances as Walk, 466 instances as Up, 416 instances as Down, 451 instances as Sit, 510 instances as Stand, and 537 instances as Lay in run-2. Likewise, the MR-LSDGM method classified 495 instances as Walk, 464 instances as Up, 416 instances as Down, 450 instances as Sit, 508 instances as Stand, and 536 instances as Lay under run-4. Furthermore, under run-5, the MR-LSDGM algorithm classified 495 instances as Walk, 464 instances as Up, 415 instances as Down, 453 instances as Sit, 506 instances as Stand, and 534 instances as Lay [26].

The classification result analysis of the MR-LSDGM technique under varying execution runs is reported in Table 1 and Figure 4. The MR-LSDGM technique has resulted in superior performance across all runs, as shown in Table 1. For example, the MR-LSDGM technique achieved maximum performance with run-1, with an average sensitivity of 0.971, specificity of 0.994, precision of 0.972, accuracy of 0.990, and F-score of 0.972. The MR-LSDGM method also performed optimally in run-2, with an average sensitivity of 0.976, specificity of 0.995, precision of 0.977, accuracy of 0.992, and F-score of 0.976. Furthermore, with run-3, the MR-LSDGM method achieved an average sensitivity of 0.973, specificity of 0.994, precision of 0.973, accuracy of 0.991, and F-score of 0.973. With run-5, the MR-LSDGM approach improved efficiency, achieving an average sensitivity of 0.973, specificity of 0.995, precision of 0.974, accuracy of 0.991, and F-score of 0.973.
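For reference, the per-class metrics reported above can be computed from a confusion matrix as follows (a sketch assuming rows are true classes and columns are predicted classes):

```python
import numpy as np

def per_class_metrics(cm):
    """Sensitivity, specificity, precision, accuracy, and F-score for each class
    of a multi-class confusion matrix (rows = true, columns = predicted)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp
    fp = cm.sum(axis=0) - tp
    tn = cm.sum() - tp - fn - fp
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    prec = tp / (tp + fp)
    acc = (tp + tn) / cm.sum()
    f1 = 2 * prec * sens / (prec + sens)
    return sens, spec, prec, acc, f1
```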

Figure 5 depicts the ROC analysis of the MR-LSDGM method on the applied data set under various runs [27]. According to the results, the MR-LSDGM approach had the highest ROC value in every run. For example, in run-1, the MR-LSDGM technique achieved an increased ROC of 99.9888. In line with run-2, the MR-LSDGM method has a better ROC of 99.7676. The MR-LSDGM methodology then achieved a maximum ROC of 99.9874 in run-3. Concurrently, the MR-LSDGM technique achieved a superior ROC of 99.9721 in run-4. Finally, under run-5, the MR-LSDGM method achieved a maximum ROC of 99.9416 [28].

An extended comparison analysis is provided in Table 2 [25] to demonstrate the improved performance of the MR-LSDGM technique. With accuracies of 0.9375 and 0.9531, respectively, the CNN-2016 and CC-2018 approaches produced ineffective results [29]. At the same time, the CNN-LSTM and lightweight CNN approaches improved the accuracy to 0.9627 and 0.958, respectively. Furthermore, the CNN-BiLSTM and CNN-SF approaches attained acceptable accuracy values of 0.9705 and 0.9763, respectively. In contrast, the proposed MR-LSDGM approach achieved an effective accuracy of 0.991 [30].

As evidenced by the tables and statistics above, the MR-LSDGM technique is clearly more effective than the other procedures.

4.1. Discussion

The healthcare IoT data sets and performance criteria for the proposed MR-LSDGM strategy are briefly outlined in this section [31]. The complete approach was developed using the MATLAB 2021a tool on a Core i3-3110M processor running Windows 8 with 2 GB RAM, and it was tested on 8 healthcare IoT data sets (Table 1) [32]. Over 30 separate runs, the new BBO-FCM approach was compared to existing algorithms such as CNN-2016, CNN-2018, CNN-SF, CNN-LSTM, lightweight CNN, and CNN-BiLSTM in terms of intracluster distance, purity index, standard deviation, root mean square error, accuracy, and F-measure [33].

5. Conclusion

The MapReduce tool is used in this study to create a new MR-LSDGM approach for the healthcare sector. The MR-LSDGM approach includes BBO-FCM-based DM clustering, 2TFLR-based preference modelling, and LSTM-OELM-based classification procedures. To manage big data in the healthcare sector, the MR-LSDGM technique employs the MapReduce tool. Furthermore, the design of the BBO algorithm for determining the initial cluster centres of the FCM technique, as well as the parameter optimization of ELM using the TGA technique, contributes to improved overall classification results. A large number of simulations are run to demonstrate the improved outcomes of the MR-LSDGM technique, and the experimental results are examined using several metrics. According to the simulation results, the MR-LSDGM methodology outperformed the other methods. In the future, the model presented here could be used in telemedicine applications to help patients in remote areas.

Data Availability

All data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they do not have any conflicts of interest.