Fuzzy Boost Classifier of Decision Experts for Multicriteria Group Decision-Making
Experts play a vital role in multicriteria decision-making because they provide the source decision opinions. In existing group decision-making activities, experts are usually selected manually, which relies on personal subjective experience. There is an urgent demand for automatic expert selection, which can also help determine expert weights for the subsequent decision calculation. In this paper, an expert classification method is proposed to solve this problem. First, the CatBoost classification algorithm is improved by integrating the 2-tuple linguistic model, which can effectively extract sample features. Second, a framework for expert classification is designed; it combines expert resume collection, expert classification, and database updating. Third, a decision-making case of expert selection is analyzed. The experimental results indicate that the proposed classifier performs better than classic methods. The proposed classification method for decision experts can support the automatic and intelligent operation of decision-making activities.
1. Introduction
Decision-making is a vital activity in many kinds of management work. It is a complex process of thinking and operation. Rational decision-making has become an important part of modern management theory, and it often involves multicriteria group decision-making. Decision-making methods originate from human thinking processes, so their essential basis is the information and knowledge of experts. Here, an expert means an administrator, decision-maker, or scholar in the related field. The professional competence and adaptability of experts greatly affect decision-making. How to select appropriate decision experts has therefore become an open issue.
Moreover, computer and information technology have given rise to automatic decision support systems (DSSs) [3, 4]. In automatic decision-making programs such as DSSs, expert information, such as professional title, educational background, professional background, and research field, is collected and stored. This makes it possible to select decision experts automatically and rationally.
In practice, however, experts are usually selected manually, relying on the subjective judgment of administrators. Such selection is not objective enough, so the results are often unpersuasive, and the manual approach is unreliable and inefficient in actual operation. Moreover, in the Internet era, there is a growing demand to automatically analyze expert information and select appropriate experts for decision-making. Experts should be analyzed in terms of their research level and practical experience, and then classified and selected according to their professional grades and the decision-making situation.
The core issue can be abstracted as the classification and evaluation of experts. Typical machine learning classification algorithms include naive Bayes, k-nearest neighbor (KNN), decision tree (DT), support vector machine (SVM), and logistic regression [6–9]. The field has since developed ensemble learning algorithms such as random forest, AdaBoost, XGBoost, and CatBoost [10, 11]. Meanwhile, other machine learning methods have been studied for feature extraction [12–14]. The input of a classifier may be a real number, linguistic text, or a grade variable. Fuzzy inputs are usually converted with one-hot encoding. This simple conversion may lose detailed and hidden information in categorical features [16–18], so the information carried by the original data cannot be expressed accurately and completely.
Given these demands, experts should be analyzed automatically with machine learning, which requires solving the problems of information coding and feature recognition. This paper proposes an expert classification method from the perspective of objectivity and intelligent computing. The main work is briefly presented as follows.
(i) The 2-tuple linguistic model is introduced into the ensemble classifier CatBoost to fully express fuzzy features and classify the experts.
(ii) A general framework is designed around the fuzzy classification model. It can be used to select decision experts and update the expert library.
(iii) The feature extraction and classification abilities are tested against one-hot encoding and traditional classifiers. The results show that CatBoost integrated with the 2-tuple linguistic model can effectively recognize experts and help evaluate and select decision experts automatically.
This paper is organized as follows. Section 2 reviews work related to expert classification and decision-making. Section 3 presents the proposed expert classification method. Section 4 presents a case study of expert classification for decision-making on water environment pollution. The method and results are discussed in Section 5, and the paper is concluded in Section 6.
2. Related Work
2.1. Group Decision-Making and Decision Support System
The core idea of rational decision-making is to evaluate alternatives from multiple aspects and by multiple persons. Multicriteria group decision-making realizes this idea, and it is introduced first in this section.
Decision-making is the process of judging a decision problem or finding a solution to it. In general, a satisfactory solution should be the one most acceptable to all administrators. Therefore, group decision methods have been proposed to improve on traditional individual methods. Group decision-making mainly uses the core theory of multicriteria and multiattribute decision-making [21, 22], which involves integrating the opinions of multiple experts and calculating the weights of the attributes of alternative schemes. In most situations, decision opinions are given as linguistic text by a group of decision experts. To aggregate linguistic information, Xu proposed a group decision-making method based on linguistic preference relations. Qian et al. redefined some basic operations of the generalized hesitant fuzzy set and presented a group decision-making method based on the improved hesitant fuzzy set. Chen extended the technique for order preference by similarity to an ideal solution (TOPSIS) to the fuzzy decision environment. Decision-making methods have also been extended to the neutrosophic set environment [26–28] to improve the expressiveness of information. Other types of group decision-making have been explored, such as methods based on granular computing [29–31], which address decision-making with multiple dimensions.
With the development of information technology, various decision-making methods can be implemented in DSSs. Since Gorry and Morton first put forward the concept of decision-making supported by software systems, DSSs have been continuously studied and developed. DSSs are now widely used, for example, clinical DSSs in hospitals, environmental DSSs in government departments, and DSSs for urban water supply management. A DSS mainly consists of three parts: data access and operation, data-based models, and case matching. With the development of computer and information technology, DSSs have evolved into various forms [34, 35]. DSSs thus provide an automatic platform for running decision-making methods.
It can be seen that multicriteria group decision-making is built on decision experts, who must be selected to express decision opinions. Existing studies on group decision-making generally focus on integrating experts' decision opinions. However, the experts themselves are not analyzed sufficiently, and the differences in level among experts are not fully considered. In addition, the expert library is an important module of a DSS, yet most systems collect and evaluate experts manually. Meanwhile, these systems pay more attention to the case library and other knowledge bases than to the expert library. It can be concluded that the selection and evaluation of decision experts directly affect the reliability of decision results. Moreover, modern intelligent DSSs urgently need automatic analysis and classification of experts. Therefore, it is necessary to explore automatic methods of expert classification.
2.2. Classification Methods in Machine Learning
There are many kinds of classification algorithms in machine learning. The classifiers are widely used in data mining, text recognition, and other fields.
As mentioned in the Introduction, the classical algorithms include naive Bayes, KNN, DT, SVM, and logistic regression [6–9], and the ensemble learning algorithms include random forest, AdaBoost, XGBoost, and CatBoost [10, 11, 20]. However, most of these classifiers do not provide a built-in mechanism for processing categorical input features, so sample data are usually converted into numerical data by one-hot encoding before the models are trained. Converting categorical features with one-hot encoding may lose hidden information, which affects classification accuracy.
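To illustrate what can be lost, the following minimal Python sketch one-hot encodes a hypothetical three-level grade attribute (the grade names are assumptions for illustration, not the paper's data) and measures the distances between the resulting vectors.

```python
import math

# Hypothetical ordered grades for one categorical attribute (e.g., professional title).
GRADES = ["junior", "intermediate", "senior"]

def one_hot(value, categories):
    """Return the one-hot vector of `value` over `categories`."""
    return [1 if value == c else 0 for c in categories]

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

vectors = {g: one_hot(g, GRADES) for g in GRADES}
d_near = euclidean(vectors["junior"], vectors["intermediate"])
d_far = euclidean(vectors["junior"], vectors["senior"])
# Adjacent and non-adjacent grades are equally far apart (sqrt(2)),
# so the ordering junior < intermediate < senior is not represented.
print(vectors["senior"], round(d_near, 4), round(d_far, 4))
# prints: [0, 0, 1] 1.4142 1.4142
```

Because every pair of distinct grades ends up at the same distance, a classifier receiving these vectors has no direct signal that "senior" is closer to "intermediate" than to "junior"; an index-based grade representation keeps that order.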
CatBoost is characterized by high classification accuracy and fast training, and it is a recent classifier with strong learning ability. Moreover, CatBoost provides a mechanism for converting categorical input features into numerical values. However, this conversion method is not suitable for small expert sets, especially those with few input features. For expert classification, ensemble methods represented by CatBoost can provide an effective learning technique, but their input feature conversion needs to be redesigned.
For the feature extraction of input data, fuzzy analysis methods can provide the basic technique. The classical theories include the fuzzy set and the rough set [38, 39], in which the degree of information uncertainty is expressed with probability and membership functions. Fuzzy analysis methods have been developed to meet the demand for language comprehension. Grade variables in text can be transformed and calculated with the vague set and the 2-tuple linguistic model [19, 41]. Related studies show that the 2-tuple linguistic model expresses rating grade information flexibly and effectively, with a lower computational load than complex natural language processing techniques.
Unlike previous studies, we propose an intelligent method for creating the decision expert library. Our contributions are as follows:
(1) We focus on the automatic classification and evaluation of decision experts. An improved fuzzy CatBoost classifier integrated with the 2-tuple linguistic model is proposed, which is expected to extract expert information more soundly and deeply.
(2) A new flow for expert classification and selection is designed based on the improved classifier, which can support the application of group decision-making in DSS software.
3. Classification Method of Decision Expert
3.1. Problem Description
Existing research indicates that multicriteria group decision methods depend on the selection and aggregation of experts, and the selection step can be framed as a classification problem over decision experts. Experts should be selected by considering as many of their characteristics as possible, such as research field, academic level, and practical experience. The representation of expert information should also be simple and suitable for computer processing, and information collection and analysis should be implemented in software. Considering these demands, the problem is abstracted as follows.
The fundamental task of expert classification is importing expert information, on the basis of which the experts are classified into different categories and selected. First, the resumes of experts are collected, and the information is processed uniformly. To meet the requirements of machine learning, the input data of experts should be objective. This paper uses structured data to represent the attributes of experts. Suppose there are $n$ experts, and the expert set is represented as $E = \{e_1, e_2, \ldots, e_n\}$. Each expert is described with $m$ attributes, and the attribute set of expert $e_k$ is represented as $X_k = \{x_k^1, x_k^2, \ldots, x_k^m\}$, where $k$ is the serial number of the expert. Each attribute variable $x_k^i$ takes a form that depends on the information it carries: some attribute variables are numerical, while others are text. Text attribute variables must be converted so that the classifier receives a unified input. Moreover, each element of the expert set corresponds to a category label. The category represents the overall evaluation of the expert's level in terms of academic level, professional degree in the field, and quantified achievements. The category label is represented by $y_k$, where $y_k \in \{1, 2, \ldots, c\}$, and the number of categories $c$ can be determined according to the concrete decision-making situation. The main task of this paper is to build a classifier that models the relation between the attribute set $X_k$ and the category label $y_k$. The data processing and classification method are introduced in Section 3.2.
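As a concrete picture of this structured representation, the sketch below stores one expert $e_k$ with its attribute set $X_k$ and optional label $y_k$, and separates numeric attributes from text attributes that still need conversion. The class name `Expert` and the attribute names are hypothetical illustrations, not the paper's schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple, Union

@dataclass
class Expert:
    """Illustrative record for one expert e_k in the expert set E."""
    k: int                                   # serial number of the expert
    attributes: Dict[str, Union[float, str]] = field(default_factory=dict)  # X_k
    label: Optional[int] = None              # y_k in {1, ..., c}; None if not yet labeled

    def split_attributes(self) -> Tuple[Dict[str, float], Dict[str, str]]:
        """Separate numeric attributes from text attributes that still need conversion."""
        numeric = {name: v for name, v in self.attributes.items() if not isinstance(v, str)}
        text = {name: v for name, v in self.attributes.items() if isinstance(v, str)}
        return numeric, text

# Hypothetical attribute names; the real 11 attributes are listed in Table 1.
e1 = Expert(1, {"title": "professor", "papers": 40, "projects": 6}, label=1)
numeric, text = e1.split_attributes()
print(numeric, text)
```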
3.2. CatBoost Algorithm Integrated with 2-Tuple Linguistic
Among advanced ensemble classification methods, CatBoost has a built-in mechanism for processing categorical input features, which replaces the original categorical inputs with numerical codes. The main procedure of CatBoost is as follows.
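The attribute variable $x_k^i$ represents the $i$-th dimensional input feature of the $k$-th training sample, and the category label of the $k$-th sample is $y_k$. CatBoost replaces a categorical value $x_k^i$ with an estimate $\hat{x}_k^i$ of the conditional expected value of the label given that value; namely, $\hat{x}_k^i \approx \mathbb{E}\left(y \mid x^i = x_k^i\right)$. Take $D = \{(X_k, y_k)\}_{k=1}^{n}$ as the training set, and let $\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_n)$ be a random permutation of the sample indices. For the sample at position $p$ of the permutation, $\hat{x}_{\sigma_p}^{i}$ is calculated as

$$\hat{x}_{\sigma_p}^{i} = \frac{\sum_{j=1}^{p-1} \left[x_{\sigma_j}^{i} = x_{\sigma_p}^{i}\right] y_{\sigma_j} + aP}{\sum_{j=1}^{p-1} \left[x_{\sigma_j}^{i} = x_{\sigma_p}^{i}\right] + a}, \tag{1}$$

where $P$ is the smoothing (prior) parameter, usually set to the mean value of the target data set, and $a > 0$ is the weight of $P$. The bracket $[\cdot]$ denotes the indicator function, whose value is 1 if the enclosed variables are equal and 0 otherwise. CatBoost uses different permutations in different steps of gradient boosting.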
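The ordered target statistic described above can be sketched in pure Python as follows. This is an illustration of the formula, not CatBoost's internal code: the function name `ordered_target_statistic` is hypothetical, and it treats the label as a single numeric target, a simplification that a library's multiclass handling need not follow. Each value is encoded using only the samples that precede it in the permutation, with the prior $P$ and weight $a$ smoothing rare categories.

```python
import random

def ordered_target_statistic(values, targets, a=1.0, prior=None, sigma=None, seed=0):
    """Encode one categorical feature with the ordered target statistic.

    values  -- categorical values x_k^i for k = 0..n-1
    targets -- numeric labels y_k
    a       -- weight of the prior (a > 0)
    prior   -- P; defaults to the mean of the targets
    sigma   -- permutation of sample indices; random if not given
    """
    n = len(values)
    if prior is None:
        prior = sum(targets) / n
    if sigma is None:
        sigma = list(range(n))
        random.Random(seed).shuffle(sigma)
    counts, sums = {}, {}  # statistics of the samples already visited in sigma
    encoded = [0.0] * n
    for k in sigma:
        v = values[k]
        # Only samples sigma_1..sigma_{p-1} (visited earlier) contribute.
        encoded[k] = (sums.get(v, 0.0) + a * prior) / (counts.get(v, 0) + a)
        counts[v] = counts.get(v, 0) + 1
        sums[v] = sums.get(v, 0.0) + targets[k]
    return encoded

print(ordered_target_statistic(["A", "A", "B", "A"], [1, 0, 1, 1], sigma=[0, 1, 2, 3]))
# prints: [0.75, 0.875, 0.75, 0.5833333333333334]
```

The first occurrence of each category receives exactly the prior $P$ (here 0.75), which hints at why this encoding carries little information when a category appears only a few times, as in small expert sets. Passing different `seed` values mimics the use of different permutations.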
The classical CatBoost algorithm above has been applied to various classification problems. However, its built-in feature-processing tool does not work well on small samples, as shown in the case analysis of Section 4.2. Moreover, some inputs are text carrying fuzzy information. Therefore, the 2-tuple linguistic model is introduced as the analysis tool for fuzzy text variables.
A 2-tuple linguistic variable expresses both the grade of an object and its degree of membership to that grade. It takes the form $(s_i, \alpha)$, where $s_i$ is the element with index $i$ in the predefined grade set $S = \{s_0, s_1, \ldots, s_g\}$, and $\alpha$ is the symbolic translation value, which indicates the deviation between the real evaluation degree and $s_i$:

$$\alpha \in [-0.5, 0.5). \tag{2}$$
Common input attributes can be transformed into 2-tuple linguistic variables in different ways depending on their format. If an input attribute is expressed with a grade from the defined grade set, namely, $s_i \in S$, the 2-tuple linguistic variable can be obtained directly with the function $\theta$:

$$\theta(s_i) = (s_i, 0). \tag{3}$$
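If the input attribute is expressed as a real number $\beta \in [0, g]$, where the grade evaluation set $S = \{s_0, s_1, \ldots, s_g\}$ has $g + 1$ elements, $\beta$ is transformed into a 2-tuple linguistic variable by the function $\Delta$, defined as

$$\Delta: [0, g] \rightarrow S \times [-0.5, 0.5), \quad \Delta(\beta) = (s_i, \alpha), \quad i = \operatorname{round}(\beta), \quad \alpha = \beta - i, \tag{4}$$

where $\operatorname{round}(\cdot)$ is the rounding operator.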
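Conversely, a 2-tuple linguistic variable can be converted back to $\beta$ by the inverse function $\Delta^{-1}$:

$$\Delta^{-1}: S \times [-0.5, 0.5) \rightarrow [0, g], \quad \Delta^{-1}(s_i, \alpha) = i + \alpha = \beta. \tag{5}$$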
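The three conversions described above (a grade becomes $(s_i, 0)$, a real number becomes its nearest grade plus a deviation, and the inverse recovers the real number) can be sketched in Python as below. The function names `theta`, `delta`, and `delta_inv` and the five-level grade set are hypothetical; grades are represented by their index $i$. Rounding is done half up so that $\alpha$ always stays in $[-0.5, 0.5)$.

```python
import math

def theta(i):
    """Grade s_i already in S -> 2-tuple (s_i, 0); s_i is represented by index i."""
    return (i, 0.0)

def delta(beta, g):
    """Map beta in [0, g] to (s_i, alpha) with alpha in [-0.5, 0.5), as in formula (4)."""
    if not 0 <= beta <= g:
        raise ValueError("beta must lie in [0, g]")
    # Round half up; Python's round() rounds half to even and could yield alpha = 0.5.
    i = math.floor(beta + 0.5)
    return (i, beta - i)

def delta_inv(two_tuple):
    """Recover beta = i + alpha from (s_i, alpha), as in formula (5)."""
    i, alpha = two_tuple
    return i + alpha

GRADES = ["very low", "low", "medium", "high", "very high"]  # hypothetical S, g = 4
i, alpha = delta(2.7, g=4)
print(GRADES[i], round(alpha, 2), round(delta_inv((i, alpha)), 2))
# prints: high -0.3 2.7
```

The round trip `delta_inv(delta(beta, g)) == beta` is what lets the 2-tuple carry a grade without discarding the fine-grained deviation, unlike a plain rounding to the nearest label.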
In the proposed classifier, CatBoost's built-in input feature processing tool is replaced with the 2-tuple linguistic model. The resulting CatBoost algorithm integrated with the 2-tuple linguistic model, abbreviated as 2L-CatBoost, is proposed as the expert classifier.
In this method, the inputs are preprocessed according to their formats. Inputs expressed as grades from a grade set in text format are transformed into 2-tuple linguistic variables following formulas (2) and (3), and numerical inputs are transformed following formulas (4) and (5). In addition, the membership degree $\alpha$ can be obtained by calculating semantic similarity. The processed inputs are then imported into CatBoost, and the CatBoost algorithm is shown in Algorithm 1.
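A minimal preprocessing sketch follows, under several assumptions not stated in the paper: raw numeric attributes are first min-max scaled onto $[0, g]$, each 2-tuple is passed to the classifier as two numeric columns (grade index $i$ and deviation $\alpha$), and the title grade set and attribute names are hypothetical.

```python
import math

# Hypothetical ordered grade set for the text attribute "title" (index = grade).
TITLE_GRADES = ["assistant", "lecturer", "associate professor", "professor"]

def delta(beta, g):
    """beta in [0, g] -> (i, alpha), rounding half up so alpha is in [-0.5, 0.5)."""
    i = math.floor(beta + 0.5)
    return i, beta - i

def scale(x, lo, hi, g):
    """Assumed min-max scaling of a raw numeric value onto [0, g]."""
    return 0.0 if hi == lo else g * (x - lo) / (hi - lo)

def preprocess(records, numeric_keys, g=4):
    """Encode each record as a flat numeric row [i, alpha, i, alpha, ...]."""
    bounds = {k: (min(r[k] for r in records), max(r[k] for r in records))
              for k in numeric_keys}
    rows = []
    for r in records:
        # Text grade from TITLE_GRADES -> (s_i, 0), stored as [i, 0.0].
        row = [float(TITLE_GRADES.index(r["title"])), 0.0]
        for k in numeric_keys:
            lo, hi = bounds[k]
            i, alpha = delta(scale(r[k], lo, hi, g), g)
            row += [float(i), alpha]
        rows.append(row)
    return rows

experts = [
    {"title": "professor", "papers": 40},
    {"title": "lecturer", "papers": 10},
    {"title": "associate professor", "papers": 22},
]
for row in preprocess(experts, ["papers"]):
    print(row)
```

The resulting rows form a purely numeric feature matrix, so they could be passed to a gradient-boosting classifier such as `catboost.CatBoostClassifier` through its `fit` method without declaring any categorical features, which is the sense in which the built-in conversion is bypassed.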
3.3. Automatic Flow of Expert Classification and Selection
Experts should be analyzed and selected reliably for multicriteria group decision-making, and in modern DSSs in particular, this work should be carried out by software. Therefore, an automatic flow of expert classification and selection is designed, which also covers the collection and update of expert information. The experts are classified by the 2L-CatBoost algorithm. The classification results serve as the main reference for expert selection, and the selected experts provide decision opinions for multicriteria decision-making activities. The flow of expert classification and selection is shown in Figure 1.
As shown in Figure 1, the process of expert classification and selection consists of two parts: training and application. In the figure, the training part is shown with green blocks, the application part with blue blocks, and the shared part with dark orange blocks. The concrete flow can be summarized as follows:
(1) Expert information is collected with a web crawler or manually. The information should meet the needs as fully as possible; that is, it should reflect the academic level and experience of experts in a certain field.
(2) Expert information is stored as structured data, which can be used to train the classifier model and allows users to consult expert information. Therefore, if the data source contains semistructured data, they are converted into structured data with the 2-tuple linguistic model.
(3) The structured expert data are labeled with expert categories; concretely, each expert is assigned a grade, and the labeled expert data are stored in the expert database.
(4) The data stored in the expert database are used as training samples, and the 2L-CatBoost model is trained on them. At this point, the basic framework of expert classification is complete.
(5) The expert database and the classifier are improved and evolve during application. When data on new experts are imported, the experts are classified automatically with the trained 2L-CatBoost classifier. Meanwhile, the new experts can supplement the training samples to retrain and improve the existing classifier.
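Steps (3) to (5) can be sketched as a small training and application loop. In this hedged illustration, `ExpertLibrary` and `NearestCentroid` are hypothetical names, and a simple nearest-centroid classifier stands in for 2L-CatBoost only so that the sketch runs without third-party packages; any classifier with a `fit` method and a single-sample predictor could be plugged in through `model_factory`.

```python
from statistics import mean

class NearestCentroid:
    """Hypothetical stand-in for 2L-CatBoost: predicts the class with the nearest mean."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [mean(col) for col in zip(*rows)]
        return self

    def predict_one(self, x):
        def sq_dist(label):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[label]))
        return min(self.centroids, key=sq_dist)


class ExpertLibrary:
    """Hypothetical expert database following steps (3)-(5) of the flow."""

    def __init__(self, model_factory):
        self.model_factory = model_factory
        self.X, self.y = [], []
        self.model = None

    def add_labeled(self, features, label):
        """Step (3): store a labeled, structured expert record."""
        self.X.append(features)
        self.y.append(label)

    def train(self):
        """Step (4): train the classifier on all stored records."""
        self.model = self.model_factory().fit(self.X, self.y)

    def import_expert(self, features, retrain=True):
        """Step (5): classify a new expert, store it, and optionally retrain."""
        label = self.model.predict_one(features)
        # Assumption: the predicted label is reused as-is; in practice it
        # would presumably be reviewed before supplementing the training set.
        self.add_labeled(features, label)
        if retrain:
            self.train()
        return label


library = ExpertLibrary(NearestCentroid)
library.add_labeled([4.0, 4.0], "I")
library.add_labeled([3.8, 3.9], "I")
library.add_labeled([0.5, 0.2], "V")
library.train()
print(library.import_expert([3.5, 3.7]))  # prints: I
```

Retraining after every import keeps the sketch simple; a real DSS might batch new experts and retrain periodically instead.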
The flow above is designed for the automatic building and updating of the expert library in a DSS. In group decision-making, the desired experts can be selected by the system according to the classification results, and the subsequent decision-making activities can be executed by other modules of the DSS. As the DSS operates, the data in the expert library will grow, which helps the system and the library improve and evolve through the proposed classification learning method. This can give the DSS a strong capability for automatic decision-making that follows the mode of human thinking.
4. Case Study
4.1. Data and Experiment Setting
The proposed classification method for decision expert selection is verified with a decision-making case. In previous studies [42, 43], we analyzed decision-making for water pollution governance. For algal bloom pollution in urban lakes, it is necessary to monitor and predict its trends with parameter estimation methods [44–52], such as recursive algorithms [18, 50, 53–58]. When an algal bloom breaks out or is about to break out, rational decision-making should be carried out for emergency management. To promote scientific and efficient governance of algal blooms, the first task is to select and invite experts through the DSS. The other decision-making activities can then run based on the experts' decision opinions.
In selecting experts, administrators usually give priority to those with a high academic level and rich practical experience in the field. The classification and evaluation of experts are tested in this experiment.
The basic resumes of the experts were collected in previous studies from affiliated websites and the China National Knowledge Infrastructure (CNKI), and the collected expert information is stored in a structured datasheet. A total of 56 experts and scholars related to algal bloom management were collected. Each expert is represented with 11 attribute variables that reflect academic level and professional experience. The names and meanings of the attributes are shown in Table 1. The professional title is a categorical feature, and the other attributes are numerical. The basic data set of experts is shown in Table 2.
The experts in Table 2 are divided into five categories: the 56 samples are tagged with the category labels I, II, III, IV, and V, where level I denotes the highest professional level and level V the lowest. The distribution of the expert categories is shown in Figure 2. To meet the minimum sample size required for classifier training, the data set is expanded with Monte Carlo simulation to 50 samples in each category. The resulting 250 expanded samples are used as the training set, and the original 56 samples are used as the test set.
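The paper does not specify the sampling distribution used in the Monte Carlo expansion, so the sketch below assumes one simple scheme: for each category, repeatedly pick a random original sample of that category and add zero-mean Gaussian noise to every feature until the category holds 50 synthetic samples. The function name, noise level, and sample values are hypothetical.

```python
import random

def monte_carlo_expand(samples, labels, per_class=50, noise=0.1, seed=42):
    """Generate `per_class` synthetic samples for every category.

    Assumed scheme: pick a random original sample of the category and add
    zero-mean Gaussian noise (standard deviation `noise`) to each feature.
    """
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    X_new, y_new = [], []
    for y, members in sorted(by_class.items()):
        for _ in range(per_class):
            base = rng.choice(members)
            X_new.append([v + rng.gauss(0.0, noise) for v in base])
            y_new.append(y)
    return X_new, y_new

# Hypothetical two-feature samples (e.g., 2-tuple-encoded attributes) for five categories.
originals = [[4.0, 3.8], [3.6, 4.0], [3.0, 2.9], [2.1, 2.0], [1.2, 0.9], [0.3, 0.2]]
levels = ["I", "I", "II", "III", "IV", "V"]
X_train, y_train = monte_carlo_expand(originals, levels)
print(len(X_train), y_train.count("III"))  # prints: 250 50
```

Sampling the same number per category also balances the training set, which matters when some levels have only a handful of original experts.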
The classifier is trained and tested in a preset environment; the hardware and software settings are shown in Table 3. 2L-CatBoost is trained with the expanded samples, and the training parameters are shown in Table 4.
The original 56 expert samples are imported into the trained 2L-CatBoost classifier. The classification results are shown in Table 5, where each entry gives the number of experts of the row class that were classified into the column class. The confusion matrix of the classification result is shown in Figure 3.
From the results in Table 5, only 3 of the 56 test samples are misclassified, giving an overall classification accuracy of 94.64%. Among the misclassified samples, 2 experts in class III are classified into class II, and 1 expert in class IV is classified into class III. These errors can also be seen in Figure 3.
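The reported accuracy follows directly from the counts above, as the number of correctly classified test samples divided by the total number of test samples:

$$\text{Accuracy} = \frac{56 - 3}{56} = \frac{53}{56} \approx 0.9464 = 94.64\%.$$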
The proposed 2L-CatBoost mainly improves the classifier by processing the inputs with the 2-tuple linguistic model. The effect of this preprocessing is therefore compared with two traditional approaches: one-hot encoding and CatBoost's built-in feature conversion. The samples of the 56 experts are preprocessed with the three methods and then imported into the same CatBoost classifier. The classification accuracy of the three methods is shown in Figure 4.
Among the preprocessing methods, the proposed method achieves the best result, and CatBoost's built-in feature conversion performs worst, even worse than one-hot encoding. This indicates that the built-in mechanism of CatBoost cannot effectively handle small samples.
Moreover, classical classifiers are used for comparison, including SVM, DT, AdaBoost, random forest, naive Bayes, KNN, logistic regression, and XGBoost, with SVM tested using different kernel functions. The classification accuracy of these methods is shown in Figure 5. In these comparison experiments, two input preprocessing approaches are adopted, one-hot encoding and the 2-tuple linguistic model, which are represented with different colors in Figure 5. The results show that 2L-CatBoost achieves the highest classification accuracy.
5. Discussion
As the core information source of multicriteria decision-making, experts should be evaluated and selected rationally and automatically. In this paper, a machine learning classifier is introduced to solve this problem: an improved fuzzy classifier, 2L-CatBoost, is proposed, and experiments are designed to test its performance. The proposed classifier is discussed below based on the results.
The proposed 2L-CatBoost classifier integrates CatBoost and the 2-tuple linguistic model and can thus take advantage of both. CatBoost was shown to be effective for expert classification in this case, and the proposed classifier obtained better results than the other classifiers, including KNN, XGBoost, SVM, and naive Bayes. In addition, 2L-CatBoost trains quickly: in the experiment, the optimal model was obtained after 7 iterations. This advantage makes rapid updating and evolution of the expert library in the DSS possible.
The other main contribution is the processing of fuzzy inputs. 2L-CatBoost introduces the 2-tuple linguistic model for converting the input fuzzy information into features, which increases the accuracy of the CatBoost classifier. Compared with classical one-hot encoding, the 2-tuple linguistic model improves the classification performance of logistic regression, naive Bayes, and CatBoost. The results also show that not every classifier benefits from using the 2-tuple linguistic model for categorical features with small samples. CatBoost has been shown to be effective with large samples, but with its built-in feature processing it performs poorly when the data size is limited. In this case, the proposed 2L-CatBoost still performs well with the help of the 2-tuple linguistic model.
In summary, the machine learning based intelligent classification method for decision experts proposed in this paper can effectively process expert information. Once the expert information database is constructed, expert classification can be completed automatically. The process is objective, which helps promote the standardization and efficiency of the decision-making process. The selection and classification of experts can support subsequent group decision-making with multiple approaches [60, 61].
The proposed method still has some shortcomings that need to be addressed in follow-up work. First, it has not yet been validated on other data sets, and it is difficult for any method to adapt to all data situations, so its transferability should be explored in the future. Second, if the input data contain semistructured data, they must be converted into structured data manually, which increases the workload of users and reduces efficiency. Future work will explore automatic ways to convert unstructured and semistructured data.
6. Conclusions
This paper studies the issue of decision sources in multicriteria group decision-making. An automatic classification method for experts is expected to provide important support for expert selection. To this end, an improved fuzzy CatBoost classifier integrated with the 2-tuple linguistic model is proposed, which can handle fuzzy input features effectively for accurate classification. A general flow for creating and updating the decision expert database is also designed. The experimental results indicate that the proposed 2L-CatBoost is suitable for expert classification with small samples and fuzzy input features. In the future, more intelligent techniques, including automatic text collection and analysis, data prediction, and natural language understanding, can be introduced and studied to improve the handling of fuzzy information in multicriteria decision-making, making multicriteria group decision-making more efficient and intelligent. The proposed method can also be combined with other estimation algorithms [62–65] to study multicriteria decision-making problems.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This research was funded by National Social Science Fund of China (no. 19BGL184), Beijing Excellent Talent Training Support Project for Young Top-Notch Team (no. 2018000026833TD01), the National Natural Science Foundation of China (nos. 61903008 and 61673002), and the Research Foundation for Youth Scholars of Beijing Technology and Business University (no. QNJJ2020-26).
References
W. Che, Encyclopedia of Psychological Counseling, Zhejiang Science and Technology Press, Hangzhou, China, 2001.
L. Breiman and J. Friedman, Classification and Regression Trees, Thomson Wadsworth, Belmont, CA, USA, 1984.
T. Chen and C. Guestrin, “XGBoost: a scalable tree boosting system,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 2016.
F. Ding, L. Lv, J. Pan, X. Wan, and X.-B. Jin, “Two-stage gradient-based iterative estimation methods for controlled autoregressive systems using the measurement data,” International Journal of Control, Automation and Systems, vol. 18, no. 4, pp. 886–896, 2020.
L. Prokhorenkova, G. Gusev, A. Vorobev et al., “CatBoost: unbiased boosting with categorical features,” in Proceedings of the 32nd Conference on Neural Information Processing Systems, Montreal, Canada, December 2018.
G. A. Gorry and M. S. Morton, “A framework for management information systems,” Sloan Management Review, vol. 30, no. 3, pp. 49–61, 1989.
G. Wei, “2-tuple intuitionistic fuzzy linguistic aggregation operators in multiple attribute decision making,” Iranian Journal of Fuzzy Systems, vol. 16, no. 4, pp. 159–174, 2019.