Abstract

Information is a major asset of any organization. As technology advances, the amount of available information and knowledge keeps growing in volume, velocity, and variety, and extracting meaningful insights from it has become a pressing need. Visualization is a key tool and has become one of the most significant means of interpreting, extracting, and communicating information. The current study applies a rough set approach to data modelling and user knowledge in order to extract meaningful insights. The experimental setup uses different rough set algorithms: K-nearest neighbours (KNN), decision rules (DR), decomposition tree (DT), and local transfer function classifier (LTF-C). The approach is assessed by its accuracy for data modelling and user knowledge, and it is validated on a dataset available in the UCI web repository. The results show an accuracy of 96% for KNN, 87% for decision rules, 91% for decision trees, 85.04% for the cross validation architecture, and 94.3% for the local transfer function classifier. The classification algorithms are evaluated using several performance metrics: F-score, precision, accuracy, recall, specificity, and misclassification rate. On all of these metrics the KNN classifier outperformed the other algorithms, which indicates its applicability to the proposed problem.

1. Introduction

Information and user knowledge keep increasing over time, driven by rapid advances in technology. Users need essential information in their daily life, which requires the support of advanced tools such as Hadoop, Tableau, Informatica PowerCenter, and so on. Data and knowledge exist in diverse forms, both structured and unstructured. Structured data are mostly easy to understand and manage, while extracting meaningful insights from unstructured data has become a challenging task. According to the report of IDC [1], about 1.8 ZB of data had been created by late 2011. Globally, approximately 1.2 ZB (1 ZB = 10^21 bytes) of electronic data are generated per year by diverse sources [2]. By 2020, 40 ZB of data are expected [3]. Human beings are always interested in capturing knowledge in an easy and effective way, and translating data and knowledge into graphs or maps makes them easier for users to understand.

Visual context plays an obvious role: patterns are identified in large volumes of data and conveyed through graphics and visualizations. Conclusions are drawn by collecting, modelling, and processing the data and finally plotting the derived results. Data, knowledge, and information are used in visualization from interrelated perspectives. The aim of visualization is to gain meaningful insights from the data [4]. Using visualization techniques, users can interact with the data and analyze the data and knowledge. Data visualization makes it possible to communicate effectively and easily, both for conveying a message and for technical drawing for scientific purposes.

To support data modelling and user knowledge, the contribution of this research is a rough set approach to user knowledge and data modelling for extracting insights. Different rough set algorithms, namely KNN, decision rules, decomposition tree, and LTF-C, were used for the experimental setup. The dataset used for the experimental setup of the proposed method is available in the UCI web repository [5]. KNN has been suggested for different problems such as text recognition [6].

The paper is organized as follows. Section 2 presents work from the literature related to user knowledge, data modelling, and visualization. Section 3 describes the research method and modelling of the proposed study, with details of the visualization of the dataset. Section 4 gives the results and discussion. The paper is concluded in Section 5.

2. Related Work

Researchers have used different approaches, tools, and techniques to analyze user knowledge, data modelling, and visualization. Table 1 gives brief descriptions of the existing approaches available in the literature.

3. Rough Set Approach toward Data Modelling and User Knowledge for Extracting Insights

Machine learning algorithms play an important role in different areas of research [16, 25–30]. In this paper, a rough set approach is used for data modelling and user knowledge to extract meaningful insights. The rough set approach works well in situations of uncertainty by computing lower and upper approximations of a set. The resulting model consists of "IF-THEN" rules. Rough set theory was presented by Pawlak in 1982 [31]. A rough set has a specific boundary region between its lower and upper approximations. Lowering the degree of precision in the data makes the data pattern clearer. Rough sets and boundaries can be mathematically presented as follows [32]:
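In the standard formulation of rough set theory, which is assumed here and may differ in notation from [32], the lower approximation, the upper approximation, and the boundary region of a target set X are written as below. The symbols U (the universe of objects), B (a subset of attributes), and [x]_B (the indiscernibility class of x with respect to B) are notation introduced for this sketch.

```latex
\[
\underline{B}X = \{\, x \in U : [x]_B \subseteq X \,\}, \qquad
\overline{B}X = \{\, x \in U : [x]_B \cap X \neq \emptyset \,\}, \qquad
BN_B(X) = \overline{B}X \setminus \underline{B}X
\]
```

Under these definitions, X is rough with respect to B whenever the boundary region BN_B(X) is nonempty.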

This formulation distinguishes two possibilities: an element certainly belonging to the set (lower approximation) and an element possibly belonging to the set (upper approximation). Figure 1 shows the concept of rough set.
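The two possibilities can be illustrated with a minimal Python sketch. It is not the RSES implementation used in the study; the toy table, the attribute names `study_time` and `exam`, and the target set `high_knowledge` are hypothetical and serve only to show how the approximations are computed from indiscernibility classes.

```python
from collections import defaultdict

def indiscernibility_classes(table, attributes):
    """Group object ids that share the same values on `attributes`."""
    classes = defaultdict(set)
    for obj_id, row in table.items():
        key = tuple(row[a] for a in attributes)
        classes[key].add(obj_id)
    return list(classes.values())

def approximations(table, attributes, target):
    """Return (lower, upper, boundary) approximations of the set `target`."""
    lower, upper = set(), set()
    for block in indiscernibility_classes(table, attributes):
        if block <= target:   # class lies entirely inside the target set
            lower |= block
        if block & target:    # class overlaps the target set
            upper |= block
    return lower, upper, upper - lower

# Hypothetical toy table (assumption, not the UCI dataset).
toy_table = {
    "u1": {"study_time": "high", "exam": "high"},
    "u2": {"study_time": "high", "exam": "high"},
    "u3": {"study_time": "low", "exam": "high"},
    "u4": {"study_time": "low", "exam": "low"},
}
# Objects assumed to have a high knowledge level.
high_knowledge = {"u1", "u3"}

lower, upper, boundary = approximations(
    toy_table, ["study_time", "exam"], high_knowledge
)
```

Here `u1` and `u2` are indiscernible but only `u1` is in the target set, so both fall into the boundary region, while `u3` is the only object that certainly belongs to the set.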

Figure 2 represents the workflow of the rough set theory application. The main parts in the workflow are explained in this section.

The workflow shown in Figure 2 has been implemented using RSES [33]. Rough set and fuzzy rough set theories are based on some preliminary parts [34]. The rough set approach was selected for the proposed research because it works very well in situations of uncertainty and vagueness. The following main parts were considered for the experimental setup:
(i) Decision/information table
(ii) Indiscernibility, reduct, and core
(iii) Cut and discretization
(iv) Rules generation
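Some of the parts listed above (decision table, indiscernibility, reduct, core, and rule generation) can be sketched in a few lines of Python. This is a brute-force illustration under stated assumptions, not the algorithms implemented in RSES: the toy decision table and its attribute names are hypothetical, reducts are found by exhaustive search over attribute subsets, and discretization is omitted because the toy attributes are already symbolic.

```python
from itertools import combinations

def partition(rows, attrs):
    """Indiscernibility classes: row indices grouped by their values on attrs."""
    blocks = {}
    for i, row in enumerate(rows):
        blocks.setdefault(tuple(row[a] for a in attrs), []).append(i)
    return list(blocks.values())

def positive_region(rows, attrs, decision):
    """Indices of rows whose indiscernibility class has a single decision value."""
    pos = set()
    for block in partition(rows, attrs):
        if len({rows[i][decision] for i in block}) == 1:
            pos.update(block)
    return pos

def reducts(rows, conditions, decision):
    """All minimal attribute subsets that preserve the full positive region."""
    full = positive_region(rows, conditions, decision)
    found = []
    for size in range(1, len(conditions) + 1):
        for subset in combinations(conditions, size):
            if any(set(r) <= set(subset) for r in found):
                continue  # a smaller reduct is already contained in this subset
            if positive_region(rows, subset, decision) == full:
                found.append(subset)
    return found

def decision_rules(rows, attrs, decision):
    """One IF-THEN rule per indiscernibility class with a unique decision."""
    rules = []
    for block in partition(rows, attrs):
        outcomes = {rows[i][decision] for i in block}
        if len(outcomes) == 1:
            cond = " AND ".join(f"{a}={rows[block[0]][a]}" for a in attrs)
            rules.append(f"IF {cond} THEN {decision}={outcomes.pop()}")
    return rules

# Hypothetical decision table: three condition attributes, one decision.
rows = [
    {"study": "low", "exam": "low", "repeat": "yes", "level": "low"},
    {"study": "low", "exam": "high", "repeat": "yes", "level": "high"},
    {"study": "high", "exam": "low", "repeat": "no", "level": "low"},
    {"study": "high", "exam": "high", "repeat": "no", "level": "high"},
]
conditions = ["study", "exam", "repeat"]

found = reducts(rows, conditions, "level")
core = set.intersection(*(set(r) for r in found))  # attributes in every reduct
rules = decision_rules(rows, found[0], "level")
```

In this toy table the decision depends only on `exam`, so the single reduct and the core both consist of that attribute, and two rules are generated from it. Exhaustive search is exponential in the number of attributes and is shown only for clarity.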

3.1. Classification Measurements

Several measures are used to evaluate classification performance. The formulation of these measures is given below:
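As a hedged illustration, the measures named in this paper (accuracy, precision, recall, specificity, F-score, and misclassification rate) can be computed from the four counts of a binary confusion matrix using their standard definitions. The function name `classification_metrics` and the example counts are assumptions made for this sketch; the counts are not results from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary classification measures from confusion-matrix counts.

    tp/fp/fn/tn are the numbers of true positives, false positives,
    false negatives, and true negatives.
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0       # also called sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_score": f_score,
        "misclassification_rate": (fp + fn) / total,
    }

# Hypothetical counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
```

For a multiclass problem such as several knowledge levels, one common choice is to compute these measures per class in a one-versus-rest fashion; whether the study did so is not stated here.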

4. Results and Discussion

Different algorithms of rough set were applied for the experimental setup of the proposed research. These algorithms include KNN, decision rule, decomposition tree, and LTF-C. Figure 3 shows the knowledge level of the user along with the number of decision instances.

Figure 4 shows the algorithms along with the number of rules for the given decision instances.

Different performance metrics, namely specificity, accuracy, precision, F-score, recall, and misclassification rate, are used to check the validity of the proposed model under the different classification algorithms. These algorithms include KNN, the cross validation/k-fold mechanism, decision rules, decomposition trees, and the local transfer function classifier. The results are discussed in detail below.
(i) KNN-based results: the results of the KNN classifier for four different keywords are depicted in Figure 5. The figure shows that KNN provides prominent results in terms of accuracy, misclassification rate, and the other selected performance metrics. These high-performance results show the applicability of the proposed algorithm.
(ii) Cross validation method: the results of the cross validation architecture for four different keywords are depicted in Figure 6. The figure shows that cross validation provides good results, but its performance is not as good as that of the KNN-based model. It produces comparatively larger misclassification rates than the KNN-based model and smaller percentage values for the other performance measures. These lower accuracy values and higher misclassification rates show the limited suitability of the cross validation mechanism in the proposed field.
(iii) Decision rules: the results of the decision rules-based classification architecture are depicted in Figure 7. Compared to both the KNN and cross validation models, its accuracy is lower and its misclassification rate is higher. This low performance reflects the limited suitability of the decision rule-based architecture in the proposed field.
(iv) Decomposition tree: the results of the decomposition tree-based classification architecture are depicted in Figure 8. Compared to the KNN, cross validation, and decision rules-based models, its accuracy is lower and its misclassification rate is higher. This low performance reflects the limited suitability of the decomposition tree-based architecture in the proposed field.
(v) LTF-C-based results: the results of the LTF-C-based classification architecture are depicted in Figure 9. For some keywords it produces the optimum results, but for some instances it produces high misclassification rates. For two objects, it produces misclassification rates greater than 60% and 17%, respectively, which can lead to vague results. In a recognition task, vague results are never acceptable, and this reflects the nonapplicability of the LTF-C-based architecture in the proposed model.
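To make the KNN and k-fold items above concrete, the following is a minimal pure-Python sketch of a k-nearest-neighbour classifier evaluated with k-fold cross validation. It is not the RSES implementation: the Euclidean metric, unweighted majority voting, the interleaved fold assignment, and the two-feature toy data are all assumptions made for illustration, and using cross validation as the evaluation loop around KNN is one possible way of combining the two.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

def k_fold_accuracy(X, y, folds=5, k=3):
    """Average accuracy of KNN over `folds` interleaved train/test splits."""
    n = len(X)
    correct = 0
    for f in range(folds):
        test_idx = set(range(f, n, folds))  # every folds-th sample, offset f
        train_X = [X[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        for i in test_idx:
            if knn_predict(train_X, train_y, X[i], k) == y[i]:
                correct += 1
    return correct / n

# Hypothetical, well-separated toy data (not the UCI dataset).
X = [
    (0.10, 0.20), (0.20, 0.10), (0.15, 0.25), (0.30, 0.20), (0.25, 0.30),
    (0.80, 0.90), (0.90, 0.80), (0.85, 0.75), (0.70, 0.90), (0.95, 0.95),
]
y = ["low"] * 5 + ["high"] * 5

accuracy = k_fold_accuracy(X, y, folds=5, k=3)
prediction = knn_predict(X, y, (0.2, 0.2), k=3)
```

On real data the choice of k, the distance metric, and feature scaling all affect the reported accuracy, so figures from a sketch like this are not comparable with those in Figures 5 and 6.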

5. Conclusion

As technology advances, the level of user knowledge increases day by day, and information grows in volume, velocity, and variety. Extracting meaningful insights from such information and knowledge has become a pressing need. Visualization is a key tool and has become one of the most significant means of interpreting, extracting, and communicating information. The current study applies the rough set approach to data modelling and user knowledge in order to extract meaningful insights. The experimental setup uses different rough set algorithms: KNN, decision rules, decomposition tree, and LTF-C. The proposed method is validated on a dataset available in the UCI web repository. Among the algorithms used, KNN shows the best accuracy. The results show an accuracy of 96% for KNN, 87% for decision rules, 91% for decision trees, 85.04% for the cross validation architecture, and 94.3% for LTF-C. The classification algorithms are evaluated using several performance metrics: F-score, precision, accuracy, recall, specificity, and misclassification rate.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Science and Technology Project of State Grid Xizang Electric Power Co., Ltd. (SGXZJY00JHJS2000007), Influence of Energy Storage Technology Application on Power Grid, and the Science and Technology Project of State Grid Xizang Electric Power Co., Ltd. (SGXZJY00JHJS2000008), Research Technology Service of Multi Energy Complementary Demonstration Application.