Abstract

Program accreditation is important for determining whether a program or institution meets quality standards. It helps employers evaluate programs and the qualifications of their graduates, and it helps programs achieve their strategic goals and continuous improvement plans. Preparing for accreditation requires extensive effort. One of the required documents is the program’s self-study report (SSR), which includes the PEO-SO map, a map that allocates the program educational objectives (PEOs) to student learning outcomes (SOs). This map influences program structure design, performance monitoring, assessment, and continuous improvement. Professionals in each academic engineering program have designed their PEO-SO maps according to their own experience, and an incorrectly designed map either omits SOs altogether or assigns them to the wrong PEOs. The objective of this work is to use a hybrid data mining approach to design the correct PEO-SO map. The proposed hybrid approach uses three data mining techniques: classification to find the similarities between PEOs, crisp association rules to find the crisp rules of the PEO-SO map, and rough set association rules to find the rough (approximate) association rules of the PEO-SO map. The work collected 200 SSRs of engineering programs accredited by ABET-EAC. The paper presents the different phases of the work, namely, data collection and preprocessing, construction of the three data mining models (classification, crisp association rules, and rough set association rules), and analysis of the results and comparison with related work. The results were validated by fifty specialists from the academic engineering field, and their recommendations are also presented. The comparison with related work showed that the proposed approach discovers correct PEO-SO maps with higher rule confidence and fewer rules.

1. Introduction

In designing academic programs, more emphasis is placed on improving students’ knowledge and skills; in undergraduate programs, this is accomplished through a series of courses in the subject area. General courses provide core foundational knowledge, and each course provides a range of activities that develop students’ skills in the knowledge, cognitive, communication, leadership, teamwork, presentation, technical writing, and psychomotor domains. The life cycle of a program includes various phases, such as design and specification, implementation, and continuous updating, as shown by the authors in [13]. PEOs are specified by various stakeholders and components and are defined as a set of skills related to the knowledge, skills, and attitudes that learners are likely to demonstrate. PEOs are then mapped to the expected student learning outcomes in each course. Figure 1 illustrates the hierarchical structure of academic program design, in which the design of the PEO-SO map is a core phase. In terms of student performance, PEOs are assessed 4–5 years after graduation, based on the academic model presented by the authors in [4]. The ABET-EAC (Accreditation Board for Engineering and Technology, Engineering Accreditation Commission) had accredited 380 computer engineering programs by 2019. The ABET-EAC criteria for engineering programs relate to the knowledge, skills, and behaviors that students acquire during the program. The direct and indirect PEO-SO assessments presented by the authors in [5] are used to ensure that program objectives are achieved.

ABET's guide to accreditation policies and procedures describes the PEOs in Criterion 2 and the student learning outcomes in Criterion 3; Figure 2 (Appendix A) shows the ABET SOs in their old and new updated versions. One of the most common problems in engineering program design is the incorrect mapping of PEOs to SOs. It affects program design, implementation, assessment, and accreditation processes; it also increases the burden of the accreditation process and decreases the quality of the program. Therefore, a robust design of the PEO-SO map is critical for avoiding these problems and minimizing the effort required to prepare the accreditation documentation. Conversely, a proper PEO-SO map design improves SO selection and allocation, improves student performance, enhances graduate skills, increases the satisfaction of the professional community and society, and supports creativity, professional and ethical conduct, the achievement of program goals, economic considerations, and local and international competitiveness.

The work in this paper introduces a hybrid of three data mining models to discover the correct PEO-SO map and thus avoid incorrect mappings of PEOs to their SOs. The paper uses the rough set association rule mining technique to remove ambiguity in associating PEOs with SOs and to minimize the number of SOs associated with each PEO. Only the rules that map PEOs to SOs with certainty are retained.

This in turn minimizes the effort required to design academic engineering programs and prepare them for accreditation. In this work, a dataset of 200 SSRs of academic engineering programs was used to develop and validate the proposed model. The remainder of this paper is organized as follows: Section 2 summarizes related work that applies data mining techniques in education and higher education. Section 3 presents the proposed approach. Section 4 presents data acquisition, preprocessing, and representation; the experimental design; and the results obtained, with discussion and analysis. Finally, Section 5 concludes the paper and outlines future work.

2. Related Work

Many researchers have used artificial intelligence and machine learning algorithms, as well as statistical theories and techniques, to discover key patterns in educational datasets. Their goals are to support academic program design and assessment, accreditation and reaccreditation processes, and decision-making, and to improve program performance in higher education institutions. Data mining techniques help clean noisy and inconsistent data, enabling sound analysis of large datasets and the discovery of hidden knowledge in data captured by various information systems. Researchers have developed data mining models that summarize, classify, and cluster data, derive association rules, and perform other tasks on various datasets in educational data mining (EDM). The data mining techniques in [6] detect the correspondence between the learning objects of course content and the program level. In [7], two different datasets of academic courses are analyzed using graphical, statistical, and quantitative techniques to select an appropriate ensemble learner from a combination of six candidate learning algorithms. In [8], studies from 2000 to 2017 on computer-supported learning analytics (CSLA), computer-supported predictive analytics (CSPA), computer-supported behavior analysis (CSBA), and computer-supported visualization analysis (CSVA) are reviewed. Predicting student performance using educational data mining was presented in [9], where the base classifiers were random tree, J48, KNN, and naïve Bayes. The use of decision trees and hierarchical linear models with data from the Spanish PISA 2015 to study high- and low-effectiveness schools is presented in [10]. In [11], an assessment framework for capstone courses is presented that evaluates students’ performance and project quality by assessing student learning outcomes.
In [12], a methodology for collecting information and measuring student learning outcomes is presented to assist in completing the ABET SSR during accreditation preparation. In [13], the Apriori technique for identifying association rules was used to create the PEO-SO map, describing the relationship between the program’s educational objectives and student learning outcomes as association rules. The problem with this approach is that the number of rules is very high and the rules are not specific. In [14], a tool for computational information analysis was introduced that covers rough sets, incomplete information, data mining, granular computing, association rule extraction, and new mathematical frameworks; the tool is named RNIA (rough nondeterministic information analysis). In [15], an assessment and evaluation strategy for the ABET student outcomes (SOs) of computer science and computer information systems programs is presented, in which assessment is performed through direct and indirect methods. The quality of education through accreditation, teaching, and learning in nursing programs is discussed in [16]. The work in [17] develops a framework and work phases for collecting and documenting the assessment of ABET curriculum requirements, together with a tool called ABETAS that automates this framework and helps institutions prepare for ABET accreditation by reducing the burden of assessing student outcomes. The relationship between teaching quality and accreditation is presented in [18], which illustrates how accreditation helps programs maintain, improve, or achieve educational quality. In [19], rough set theory is used to adapt the association rule model to discover customer preferences, and the results are analyzed.
A modification mechanism for rough set attributes and association rules was presented and applied to e-commerce platforms to categorize rough recommendations. An investigation of mining class association rules using the rough set approach is presented in [20]. In [21], an algorithm for finding the finest class rules is presented; it adapts the Apriori association rule algorithm using rough set theory to compute the support and confidence of the elementary sets of lower approximation concepts. In [22], rough set theory is used in association rule discovery to simplify traditional association rule mining and avoid redundant rules. In [23], a collection methodology is presented for mapping PEOs to SOs derived from the SSRs of 32 engineering programs accredited by ABET, which reduces the effort and time required. An association rule mining algorithm based on the properties of rough set theory is reported to improve on the Apriori algorithm for mining association rules from a decision table. Assessment methods for the ABET-CAC accreditation criteria for undergraduate computer science programs are presented in [24], where the authors adopted a set of student outcomes for the computer science program that meets the ABET program outcomes and PEOs. In [25], rough set theory and its properties are used to discover information more simply than with the standard Apriori association rule mining method; this reduces the number of attributes in the dataset and yields a simpler data mining model.

The works reviewed above focus on the accreditation process and its requirements. They make valuable contributions in different areas related to the design of PEOs and SOs, but they do not address the size of the rule sets that govern the relationship between PEOs and SOs in PEO-SO maps. Where PEO-SO relationships were discovered, the resulting rules were numerous, highly ambiguous, and of low confidence, which leads to many errors in program design and accreditation processes.

The idea of the approach proposed in this paper is to discover the correct PEO-SO map, one that yields a minimal set of accurate rules for the relationship between PEOs and SOs. The proposed approach uses three different data mining techniques:
(i) The decision tree (J48) classifier, a popular classifier that represents data as a tree whose paths read like rules. Its goal is to discover the similarities and dissimilarities between PEO categories. It reveals confusion among PEOs, which helps eliminate this confusion during the PEO-SO map design phase, so its results can serve as a guide for PEO-SO map design.
(ii) The Apriori association rule algorithm, which is used to determine the crisp rules describing the relationship between PEOs and SOs.
(iii) The rough set adaptation of the Apriori association rule algorithm, which is used to determine rough set association rules. These consist of lower bound rules (describing the safe region, i.e., high conf.% rules) and upper bound rules (describing the uncertain region, i.e., low and high conf.% rules). The goal is to select the lower bound rules, avoid redundancy, eliminate ambiguity between the rules of different PEOs, and simplify association rule mining so that it yields correct rule sets governing the relationship between PEOs and SOs.

The resulting PEO-SO maps are evaluated by 50 independent academic professionals to obtain their assessment and feedback. Figure 3 illustrates the complete structure of the proposed approach and shows the different phases of the work.

3. The Proposed Approach

This section presents the theoretical basis of the machine learning algorithms utilized in this paper to develop the proposed hybrid approach. We start with the bagged J48 decision tree algorithm for machine learning, then the Apriori algorithm for association rules, and finally the rough set Apriori algorithm for rough association rules (upper and lower bounds).

3.1. Bagged J48 Machine Learning Algorithm

Classification is the process of recognizing, understanding, and grouping objects into predefined classes using training datasets. Machine learning software uses different types of algorithms to assign new, unseen instances to the correct categories. A decision tree is a classification algorithm that follows a divide-and-conquer strategy and consists of decision nodes and leaf nodes. Each decision node specifies a test on one of the attributes, and each leaf node represents a class value [26]. The classification error is the percentage of misclassified cases [27]. In practice, training datasets are usually large, which leads to more branches and layers in the generated decision tree, and classification accuracy tends to decrease significantly as the number of class categories grows. Various decision tree generation algorithms exist, such as ID3, J48, FT, BF Tree, and LMT. Performance is evaluated using the F-measure. Using machine learning algorithms automates the proposed work and increases the accuracy of the results [28, 29]. J48 is the Weka implementation of the C4.5 algorithm, which was proposed by Quinlan in 1993. In this work, the J48 classifier serves two purposes: first, to validate the data collection and representation process, and second, to discover how the classifier confuses the PEO categories in the SSRs of different academic programs. J48 was chosen for its higher accuracy. Ensemble techniques can further increase accuracy by combining the decisions of several classifiers into a single classifier. Two ensemble learning approaches, bagging and boosting, were applied to five traditional classifiers, and the bagged J48 classifier was adopted.
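Bagging can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: to stay short, the base learner is a one-level decision stump on binary features standing in for J48 (C4.5), and the names `fit_stump` and `fit_bagged`, the 25 estimators, and the toy data are all hypothetical. Each stump is trained on a bootstrap sample drawn with replacement, and the stumps' predictions are combined by majority vote.

```python
import random
from collections import Counter


def majority(labels, default=None):
    """Most frequent label; `default` is returned for an empty list."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(rows, labels):
    """Fit a one-level tree on binary (0/1) features.

    Hypothetical stand-in for J48: it picks the single feature whose 0/1 split
    misclassifies the fewest training samples (the earliest feature wins ties).
    """
    overall = majority(labels)
    best = None
    for f in range(len(rows[0])):
        branches = {0: [], 1: []}
        for x, y in zip(rows, labels):
            branches[x[f]].append(y)
        # Each branch predicts its majority label (or the overall majority if empty).
        leaf = {v: majority(ys, overall) for v, ys in branches.items()}
        errors = sum(y != leaf[x[f]] for x, y in zip(rows, labels))
        if best is None or errors < best[0]:
            best = (errors, f, leaf)
    _, feature, leaf = best
    return lambda x: leaf[x[feature]]


def fit_bagged(rows, labels, n_estimators=25, seed=0):
    """Bagging: train each stump on a bootstrap sample, predict by majority vote."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        models.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return lambda x: majority([m(x) for m in models])


# Toy data (hypothetical): SO feature 0 determines the PEO class, feature 1 is noise.
rows = [[1, 0], [1, 1], [0, 0], [0, 1]] * 5
labels = ["T", "T", "P", "P"] * 5
predict = fit_bagged(rows, labels)
print(predict([1, 0]), predict([0, 1]))
```

Replacing the stump with a full C4.5-style tree would keep the bagging loop unchanged; only `fit_stump` would differ.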

3.2. Apriori Association Rule

The association rules are formally described as introduced in [22, 30]. Let Z = {Z1, Z2, …, Zm} be the set of attributes, and let T ⊆ Z be an instance in the dataset S, where each instance is identified by a TID. For an itemset W ⊆ Z, the support s(W) is the percentage of instances in S that contain W:

$$s(W) = \frac{|\{T \in S : W \subseteq T\}|}{|S|} \times 100\%.$$

If X ⊆ Z, Y ⊆ Z, and X ∩ Y = ∅, the implication X ⟶ Y is an association rule. Its support is s% = s(X ∪ Y), the percentage of instances that contain both X and Y, and its confidence c% is the percentage of instances containing X that also contain Y:

$$c(X \longrightarrow Y) = \frac{s(X \cup Y)}{s(X)} \times 100\%.$$

Given the thresholds minsup and minconf, a rule with s% ≥ minsup and c% ≥ minconf is called a strong association rule. Improved versions of the Apriori algorithm are presented in [31, 32]. To reduce bias, the lift measure is defined in [33, 34] as lift(X ⟶ Y) = c(X ⟶ Y)/s(Y). Table 1 (see Appendix B) gives an explanatory example of the calculation of support % and conf. % using the Apriori algorithm.
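The support, confidence, and lift measures can be checked with a short Python sketch that follows the definitions above, using exact fractions instead of percentages. The function names and the four toy instances are hypothetical (they are not taken from the SSR dataset); each instance is treated as the set of items it contains, i.e., its PEO label and its assigned SOs.

```python
from fractions import Fraction


def support(itemset, instances):
    """s(W): fraction of instances that contain every item of W."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in instances if itemset <= t)
    return Fraction(hits, len(instances))


def confidence(x, y, instances):
    """c(X -> Y) = s(X ∪ Y) / s(X); undefined when X never occurs."""
    s_x = support(x, instances)
    if s_x == 0:
        raise ValueError("antecedent X does not occur in the dataset")
    return support(set(x) | set(y), instances) / s_x


def lift(x, y, instances):
    """lift(X -> Y) = c(X -> Y) / s(Y); values above 1 indicate positive association."""
    return confidence(x, y, instances) / support(y, instances)


# Hypothetical instances: a PEO label plus the SOs assigned to it.
instances = [
    {"P", "a", "c"},
    {"P", "a"},
    {"P", "c"},
    {"S", "a", "c"},
]
print(support({"P", "a"}, instances))       # 1/2
print(confidence({"P"}, {"a"}, instances))  # 2/3
print(lift({"P"}, {"a"}, instances))        # 8/9
```

Here P ⟶ a has confidence 2/3 but lift 8/9 < 1, because SO a is common across all instances; this is the kind of bias the lift measure is meant to expose.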

3.3. Rough Set Theory

Rough set theory provides a formal, quantitative framework for analyzing vague, uncertain, and imperfect information [35, 36]. The universe (all instances in a dataset) is partitioned into classes of indiscernible objects, called basic (elementary) sets. This indiscernibility reflects the granularity of the available information [37, 38]. Pawlak’s rough set model provides a basis for formal reasoning, data analysis, and autonomous decision-making [39]. Rough set theory classifies uncertain information expressed in terms of empirical data. A set of indiscernible objects is called an elementary set, which is a fundamental atom of knowledge. Any union of elementary sets is called a crisp set, while all other sets are rough sets. Each rough set has boundary-line elements that cannot be classified with certainty as belonging either to the set or to its complement [40, 41]. The approximation properties of rough set theory are described by three regions: the upper approximation (BU), the lower approximation (BL), and the boundary region (BU − BL), as shown in Figure 4.
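The three regions can be computed directly from the standard definitions: BL is the union of the elementary sets contained in the target set, and BU is the union of the elementary sets that intersect it. The Python sketch below is a minimal illustration under that textbook definition; the function names and the five toy objects are hypothetical, not drawn from the paper's dataset.

```python
from collections import defaultdict


def elementary_sets(objects, attributes):
    """Partition object ids into classes that are indiscernible on `attributes`."""
    classes = defaultdict(set)
    for obj_id, values in objects.items():
        key = tuple(values[a] for a in attributes)
        classes[key].add(obj_id)
    return list(classes.values())


def approximations(partition, target):
    """Return (BL, BU, BU - BL) of `target` with respect to `partition`."""
    lower, upper = set(), set()
    for block in partition:
        if block <= target:  # block lies entirely inside the target set
            lower |= block
        if block & target:   # block overlaps the target set
            upper |= block
    return lower, upper, upper - lower


# Hypothetical binary SO values for five PEO-SO instances.
objects = {
    1: {"a": 1, "b": 1},
    2: {"a": 1, "b": 1},
    3: {"a": 1, "b": 0},
    4: {"a": 1, "b": 0},
    5: {"a": 0, "b": 0},
}
target = {1, 2, 3}  # e.g., the instances labeled with one PEO category
bl, bu, boundary = approximations(elementary_sets(objects, ["a", "b"]), target)
print(sorted(bl), sorted(bu), sorted(boundary))
```

Objects 3 and 4 are indiscernible on SOs a and b, but only object 3 belongs to the target, so both fall into the boundary region.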

3.4. Rough Set Association Rules (RSARs)

The algorithm for generating rough set association rules is denoted by R_Apriori and was presented in [18]. It modifies the Apriori algorithm, which generates frequent itemsets and derives association rules from them. It consists of three phases: the first phase computes the support of each one-element itemset, the second phase computes the support of each two-element itemset, and the third phase computes the support of each three-element itemset. The steps of the RSAR algorithm are described with an explanatory example in Table 2 (see Appendix C), which uses instances from the collected SSR dataset of PEO-SO maps.
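The three phases correspond to the level-wise support counting shared by Apriori and its rough set variant. The Python sketch below illustrates only that counting (one-, two-, and three-element itemsets, with Apriori pruning of candidates); it does not model the rough set modification of R_Apriori, and the function name, the absolute `min_count` threshold, and the toy instances are hypothetical.

```python
from itertools import combinations


def levelwise_supports(instances, min_count, max_size=3):
    """Count supports of frequent itemsets of size 1..max_size, level by level."""
    instances = [frozenset(t) for t in instances]
    current = [frozenset([i]) for i in sorted({i for t in instances for i in t})]
    frequent = {}
    for size in range(1, max_size + 1):
        # Phase `size`: count how many instances contain each candidate itemset.
        counts = {c: sum(1 for t in instances if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Next candidates: unions of two frequent itemsets that grow by one item and
        # whose every subset of the current size is frequent (Apriori pruning).
        nxt = set()
        for x, y in combinations(level, 2):
            union = x | y
            if len(union) == size + 1 and all(
                frozenset(s) in level for s in combinations(union, size)
            ):
                nxt.add(union)
        current = list(nxt)
        if not current:
            break
    return frequent


# Hypothetical instances: each is the set of SOs assigned in one PEO-SO entry.
instances = [{"a", "c", "e"}, {"a", "c"}, {"a", "e"}, {"c", "e"}, {"a", "c", "e"}]
result = levelwise_supports(instances, min_count=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```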

4. Datasets and Experimental Result Analysis

4.1. Dataset Description and Preprocessing

The raw datasets used in this work were collected manually using the Google search engine and the websites of accredited academic programs. Because the work aims to discover robust and correct PEO-SO maps from 200 SSRs of accredited engineering programs, we focus only on the map section of the SSR documents. Each academic engineering program should illustrate the mapping between its PEOs and a set of SOs (the 11 outcomes a to k, see Figure 2, Appendix A). This is one of the requirements of ABET-EAC, presented in subsection B of Section 3 (the outcome subsection) of the SSR. Therefore, the dataset was extracted from these subsections of all the collected SSRs. We then created a table of 200 PEO-SO maps extracted from the 200 SSRs. Each PEO-SO map consists of a predefined number of entries, which were first collected as symbols and words and then converted into numerical entries. With about eight entries per map on average, this gives approximately 200 × 8 = 1600 entries, each represented by the 11 student learning outcomes; incomplete entries and entries with missing values were omitted. The data are represented in several steps. First, we encode the set of PEOs with a set of symbols similar to those in [13]; Table 3 shows the symbolic form of the common set of PEOs for all engineering programs. PEOs were presented in text form, and in many cases a PEO was stated confusingly (not specific and/or merging two or more PEOs); e.g., “excel in industrial or graduate work in computer engineering and related fields” represents two PEOs: graduate studies and career development. To solve this problem, an essential word processing step was performed: a software program similar to the Word2Vec model was implemented to convert each PEO into 11-dimensional word vectors (each word represents a single PEO) using the collected SSR dataset described above. Table 4 shows an example of the output.
The third step is the conversion, normalization, and presentation of the PEO-SO map datasets, which is essential for the subsequent data mining steps and strongly affects the output of the data mining models. Tables 5–8 show examples of these preprocessing steps, as described in the following remarks:
(i) Table 5 shows an example of the PEO-SO map for academic program A. PEO1 is divided into two PEO categories, career development and graduate studies, which are symbolically represented as C_D and G_S, respectively. The last column lists the student outcomes assigned to these two PEO categories, where “x” means that the SO is not assigned to the PEO and “√” means that it is assigned.
(ii) Table 6 shows another example of the assignment of PEOs and SOs, for academic program B. It illustrates how the representation of PEOs and the assignment of SOs differ from one program to another.
(iii) Table 7 illustrates the conversion of the PEO-SO maps shown in Tables 5 and 6 into binary form suitable for input to the data mining models in the next steps.
(iv) Table 8 illustrates the final binary representation of the mapping of PEOs and SOs. Each row contains 13 features: the pattern ID, the PEO class, and the 11 student outcomes a–k.
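The final binary representation (pattern ID, PEO class, and 11 SO bits) can be sketched in Python as follows. The function name `encode_map`, the use of consecutive pattern IDs, and the example assignments are hypothetical; the sketch also assumes that merged PEOs have already been split into separate categories, as in the C_D and G_S example of program A.

```python
SO_LABELS = "abcdefghijk"  # the 11 old ABET-EAC student outcomes a–k


def encode_map(first_pattern_id, peo_map):
    """Turn {PEO category: assigned SO letters} into 13-feature binary rows.

    Each row is [pattern ID, PEO class, a, b, ..., k], with 1 meaning the SO is
    assigned ("√") and 0 meaning it is not ("x").
    """
    rows = []
    for offset, (category, assigned) in enumerate(peo_map.items()):
        unknown = set(assigned) - set(SO_LABELS)
        if unknown:
            raise ValueError(f"unknown SO labels: {sorted(unknown)}")
        bits = [1 if so in assigned else 0 for so in SO_LABELS]
        rows.append([first_pattern_id + offset, category, *bits])
    return rows


# Hypothetical map for one program: a merged PEO1 already split into C_D and G_S.
program_a = {"C_D": {"a", "k"}, "G_S": {"b"}}
for row in encode_map(1, program_a):
    print(row)
```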

4.2. Result Analysis of Applying the Bagged J48 Machine Learning Algorithm

This section presents the interpretation and analysis of the results obtained by applying the bagged J48 classifier to the dataset. The results are presented in Table 9, Table 10, and Table 11 (see Appendix C). Table 9 gives the confusion matrix of the classifier outputs, which reveals where PEOs are likely to be misassigned to SOs when PEO-SO maps are designed for the different PEO categories:
(1) Category C_D has 100 instances: 84 are classified correctly and 16 are misclassified as L_L_L, which means that specialists may confuse these two categories when assigning PEOs to SOs.
(2) Category E has 86 instances; six of them are classified correctly as L_L_L and two of them are not.
(3) Category P has 84 instances: 70 are classified correctly and 14 are not.
(4) Category C has 44 instances: 24 are classified correctly, 8 are misclassified as E, and 12 are misclassified as T, which means that specialists may confuse these three categories when assigning PEOs to SOs.
(5) Category L_L_L has 204 instances: 196 are classified correctly and 8 are misclassified as T, implying that specialists may have made mistakes in assigning PEOs to SOs for the L_L_L and T categories.
(6) Category G_S has 40 instances: 24 are classified correctly, 8 are misclassified as L_L_L, 2 as S, and 2 as T, which means that specialists may confuse these four categories when assigning PEOs to SOs.
(7) Category S has 112 instances: 100 are classified correctly, 8 are misclassified as L_L_L, and 4 are misclassified as P, which means that specialists may confuse these three categories when assigning PEOs to SOs.
(8) Category T_C has 40 instances: 24 are classified correctly, 8 are misclassified as S, 4 as P, and 4 as T, which means that specialists may confuse these four categories when assigning PEOs to SOs.
(9) Category T has 132 instances: 120 are classified correctly, 8 are misclassified as L_L_L, and 4 are misclassified as P, which means that specialists may confuse these three categories when assigning PEOs to SOs.
(10) Category K_C has 12 instances, all of which are misclassified, which means that this class should be removed from the proposed 11 PEO categories.
(11) Category L has 120 instances: 56 are classified correctly, 32 are misclassified as T, and 8 each are misclassified as P, C, L_L_L, and G_S, which means that specialists could confuse these six categories when assigning PEOs to SOs.
(12) Table 10 (see Appendix D) summarizes the results of the bagged J48 decision tree classifier, which achieved the best performance on the dataset, and Table 11 (see Appendix D) shows the detailed accuracy for each PEO class, i.e., the percentage of correct PEO-SO mappings per category. The K_C category performs worst in both the number of instances and the accuracy percentage, so it is eliminated, as it may be implicitly included in another PEO category.

This classification model is the first step in analyzing the assignment of PEOs to SOs. It helps academic specialists avoid incorrectly assigning PEOs to the corresponding SOs when designing an academic program.
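The per-category figures above can be recomputed from the confusion-matrix rows. The Python sketch below uses the counts reported for categories C, L_L_L, and T; the function names are hypothetical. It computes only the true positive rate (recall) per class, because precision and the F-measure also require the column totals of the full confusion matrix.

```python
from fractions import Fraction

# Confusion-matrix rows for three categories, taken from the counts reported in
# Section 4.2 (actual class -> {predicted class: number of instances}).
CONFUSION = {
    "C": {"C": 24, "E": 8, "T": 12},
    "L_L_L": {"L_L_L": 196, "T": 8},
    "T": {"T": 120, "L_L_L": 8, "P": 4},
}


def true_positive_rates(confusion):
    """Per-class share of instances classified correctly (recall)."""
    return {
        actual: Fraction(row.get(actual, 0), sum(row.values()))
        for actual, row in confusion.items()
    }


def largest_confusions(confusion):
    """(count, actual, predicted) for each misclassification, largest first."""
    pairs = [
        (n, actual, predicted)
        for actual, row in confusion.items()
        for predicted, n in row.items()
        if predicted != actual
    ]
    return sorted(pairs, reverse=True)


for cls, rate in true_positive_rates(CONFUSION).items():
    print(cls, f"{float(rate):.3f}")
print(largest_confusions(CONFUSION)[0])
```

Ranking the off-diagonal counts in this way points directly at the category pairs (here C misclassified as T) that deserve the most attention when designing a PEO-SO map.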

4.3. Result Analysis of Applying the Apriori Rough Set Association Rule Mining Algorithm
4.3.1. Upper Bound Apriori Rough Set Association Rule Mining Algorithm

This section interprets and analyzes the upper bound rules obtained by applying the RSAR algorithm to the dataset. Table 12 shows the resulting upper bound rough set association rules that govern the relationship between PEOs and SOs. To interpret these rules, consider the first row, P, of the PEO-SO map.

If an SO has only “1” entries for a PEO, the SO is assigned to that PEO; if it has only “0” entries, it is not assigned. The upper bound rule for P covers the gray (boundary) area, in which some SOs may or may not be assigned, and has an average confidence of 0.75. The upper bound rules in Table 12 can be interpreted as follows: E, S, and C have the highest confidence levels, which means that these PEOs are well defined in the proposed PEO-SO map, while G_S and T_C have the lowest confidence levels, which means that the mapping of these two PEOs is unclear. Each PEO is described by rules over all the student outcomes, and each upper bound rule must be stated twice, because some of its student outcomes take the value 1 in one version and 0 in the other. Tables 13 and 14 list the upper bound rules together with the SOs that may or may not be present in these lower-confidence PEO-SO rules.

4.3.2. Lower Bound Apriori Rough Set Association Rule Mining Algorithm

This section interprets and analyzes the lower bound Apriori rough set association rules (safe region) obtained by applying the RSAR algorithm to the dataset. Table 15 lists the lower bound RSAR rule sets that govern the relationship between PEOs and the 11 old SOs. To interpret these rules, consider the first row, P, of the PEO-SO maps.

The lower bound (safe region) rule for P contains only SOs with definite entries of “1” or “0”: SOs a, c, e, f, i, j, and k are assigned to P, while SOs b, d, g, and h are not. Therefore, the lower bound rule for P can be expressed as a single rule rather than two, with an average confidence of 0.79, which is higher than the 0.75 average confidence of the upper bound rule.

The obtained rules can be summarized as follows: the lower bound RSAR rules are more concise and robust, since each rule is stated once rather than twice as in the upper bound RSAR. Figure 5 compares the confidence percentages of the upper bound and lower bound RSAR rules. It shows that the average confidence of the lower bound rules is higher than that of the upper bound rules for all PEO domains, indicating that the lower bound rules are more definite and robust, including relative to the Apriori association rules.

4.4. Result Mapping to the New Updated ABET-EAC SOs

In this section, we map the results obtained with the proposed approach to the new updated ABET-EAC SOs listed in Figure 2 (see Appendix A). A software program similar to the Word2Vec model was used to map the 11 old SOs (a–k) to the seven new updated SOs, as shown in Table 16.

Each of the seven new SOs is assigned one or more of the 11 old SOs; e.g., new outcome “#1” is assigned both old outcomes “a” and “e,” while “#5” is assigned only old outcome “d.” Table 17 illustrates the resulting assignment from the old to the new updated ABET-EAC SOs. The rules in Table 17, which govern the PEOs with the seven new SOs, are interpreted as follows.

For example, consider the first line, P, of the PEO-SO maps, which is a lower bound (safe region) rule. It contains only SOs with definite “1” or “0” entries, i.e., each SO either exists or does not exist in the rule: SOs #1, #2, #3, #4, and #5 exist in rule P, while SOs #5 and #6 do not (see Tables 18 and 19).
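Translating a rule from the old to the new SOs can be sketched as follows. This Python sketch is hypothetical: it contains only the two assignments stated in the text (new SO #1 from old SOs a and e, and #5 from d), the remaining entries of Table 16 would have to be added, and it assumes that a new SO is present whenever any of its mapped old SOs is present, a rule that the text does not state.

```python
# Partial old-to-new SO mapping: only the two assignments stated in the text.
# The remaining new SOs would be filled in from Table 16.
OLD_TO_NEW = {
    "#1": {"a", "e"},
    "#5": {"d"},
}


def to_new_sos(old_assigned, mapping=OLD_TO_NEW):
    """Mark a new SO as present (1) if ANY of its mapped old SOs is assigned.

    The "any" rule is an assumption made for illustration.
    """
    old = set(old_assigned)
    return {new: int(bool(olds & old)) for new, olds in mapping.items()}


# Lower bound rule for P over the old SOs (Section 4.3.2): a, c, e, f, i, j, k assigned.
print(to_new_sos({"a", "c", "e", "f", "i", "j", "k"}))
```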

4.5. Questionnaires, Analysis, and Feedback about the Obtained Results

This section is reserved for the validation and evaluation of the obtained results by 50 independent professionals and experts (lecturers and professors) from different academic engineering disciplines. A survey was distributed among these experts, and their feedback was collected for further analysis. The survey was designed to determine the level of satisfaction with the PEO-SO maps obtained by the proposed approach. The survey and the results of its analysis are presented in Table 20 and Figure 6 (see Appendix E); it includes nine questions, each with five satisfaction levels: strongly agree, agree, neutral, disagree, and strongly disagree. The average column is calculated as the average of three levels: strongly agree, agree, and neutral. The survey showed 97% agreement with the results obtained by the proposed approach.

4.6. Comparison with the Related Work Presented

In this section, we compare the proposed approach with the work presented in [13], which uses the Apriori association rule mining technique to discover the rules describing the relationship between PEOs and SOs, whereas our approach uses the rough set Apriori association rule mining technique. In addition, our approach uses the bagged J48 classifier to detect and resolve confusion in the association of different PEOs and SOs, and it uses text processing software similar to the Word2Vec model to preprocess merged PEOs automatically when building the PEO-SO maps, whereas this was done manually in [13]. Figure 5 graphically compares the confidence percentages of our lower bound and upper bound rules with those of the Apriori association rules in [13]. It shows that our approach achieves a higher conf. % for both the upper bound and lower bound rules. This indicates that our approach produces a PEO-SO map with a higher conf. %, fewer and shorter rules, and greater robustness: the work in [13] describes the PEO-SO map with 22 rules, whereas ours uses only 11 lower bound rules (a 50% reduction in the number of rules). Finally, our results were evaluated by 50 specialists through a questionnaire survey, which was not done in [13].

5. Conclusion and Future Extensions

This paper presents a framework for a hybrid approach that supports the design and accreditation of academic engineering programs by using three different data mining techniques to determine the best design for the PEO-SO map. The approach can reduce both the effort required to design a PEO-SO map and the time required to prepare for accreditation. It succeeded in producing a correctly assigned PEO-SO map with a minimal set of short rules. The paper presented the development phases of the proposed approach, including dataset collection and preprocessing, data mining model construction, modeling, analysis, interpretation, and evaluation of the obtained results. The questionnaire feedback supported the validity of the proposed approach, with 97% agreement on the resulting design and on its ability to reduce the effort required to meet accreditation requirements. The comparison with related work showed that the proposed approach produces rules with a higher confidence percentage while using half as many rules. The main limitation of this work is the choice of the confidence threshold that separates lower-bound from upper-bound rules. A first extension of this work will be the use of natural language processing (NLP) techniques for the preprocessing phase of PEOs. The second extension will be the generalization of the proposed approach to all academic courses. The third extension will be the development of an interactive web-based system for the proposed approach. Finally, additional datasets will be collected and preprocessed for use as a benchmark dataset.

Appendix

A. List of the 11 Old ABET SOs (a to k) and the 7 Updated ABET SOs (1 to 7)

The list of ABET SOs is given in the following figure.

B. Examples of Support and Confidence Measure Calculations

Examples of the support and confidence calculations are as follows.
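As a minimal illustration of the standard Apriori definitions behind these calculations, the Python sketch below computes support(X ⇒ Y) as the fraction of records containing X ∪ Y, and confidence(X ⇒ Y) as support(X ∪ Y) / support(X). The records, SO labels, and PEO labels are hypothetical and are not taken from the collected SSR dataset; the helper names `support` and `confidence` are illustrative.

```python
from typing import FrozenSet, List

# A record is the set of items (hypothetical SO labels plus a PEO class)
# that appear together in one PEO-SO map row.
Record = FrozenSet[str]


def support(records: List[Record], itemset: Record) -> float:
    """Fraction of records that contain every item in `itemset`."""
    if not records:
        return 0.0
    hits = sum(1 for r in records if itemset <= r)
    return hits / len(records)


def confidence(records: List[Record], antecedent: Record, consequent: Record) -> float:
    """support(antecedent | consequent) / support(antecedent); 0 if the antecedent never occurs."""
    base = support(records, antecedent)
    if base == 0:
        return 0.0
    return support(records, antecedent | consequent) / base


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    records = [
        frozenset({"a", "c", "PEO1"}),
        frozenset({"a", "c", "PEO1"}),
        frozenset({"a", "e", "PEO2"}),
        frozenset({"c", "k", "PEO1"}),
    ]
    x, y = frozenset({"a"}), frozenset({"PEO1"})
    print(f"support = {support(records, x | y):.2f}")       # support = 0.50
    print(f"confidence = {confidence(records, x, y):.2f}")  # confidence = 0.67
```

Here the rule {a} ⇒ {PEO1} holds in 2 of 4 records (support 50%), and {a} occurs in 3 records, so its confidence is 2/3.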

C. An Explanatory Example of Rough Set Association Rules (RSAR) for the PEO-SO Map

This section gives an explanatory example of the steps for modifying the Apriori association rules with rough set theory, following Table 1, which compares how the association rule and rough set association rule algorithms compute the support and confidence percentages. The example uses a sample of the collected and preprocessed PEO-SO map dataset. An attribute value is described as ɛ = (a, α), where a ∈ A and α ∈ V_a (the set of values of a), and [ɛ] denotes the corresponding elementary set of samples in the universe U (the set of all samples). The support s of each attribute value is the number of samples in its elementary set:
(i) [ɛ]1: [{a}] = {1, 2, 7, 11} and s([ɛ]1) = 4.
(ii) [ɛ]2: [{b}] = {1, 2, 7, 12, 13} and s([ɛ]2) = 5.
(iii) [ɛ]3: [{c}] = {1, 2, 3, 4, 7, 10, 11, 13, 14, 15} and s([ɛ]3) = 10.
(iv) [ɛ]4: [{d}] = {1, 6, 7, 10, 11, 12, 15} and s([ɛ]4) = 7.
(v) [ɛ]5: [{e}] = {1, 2, 3, 9, 13} and s([ɛ]5) = 5.
(vi) [ɛ]6: [{f}] = {1, 2, 4, 6, 8, 9, 11, 14} and s([ɛ]6) = 8.
(vii) [ɛ]7: [{g}] = {1, 6, 8, 9, 10, 11, 12, 13, 15} and s([ɛ]7) = 9.
(viii) [ɛ]8: [{h}] = {1, 5, 7, 9, 13} and s([ɛ]8) = 5.
(ix) [ɛ]9: [{i}] = {1, 2, 3, 5, 8, 9, 11, 13} and s([ɛ]9) = 8.
(x) [ɛ]10: [{j}] = {1, 3, 5, 6, 7, 8, 9, 11, 13} and s([ɛ]10) = 9.
(xi) [ɛ]11: [{k}] = {1, 3, 5, 11, 13} and s([ɛ]11) = 5.

For example, to combine six attribute values (c, d, f, i, j, and k), the elementary set of B = {c, d, f, i, j, k} is obtained in two steps:

First step:
(i) [ɛ]3: [{c}] = {1, 2, 3, 4, 7, 10, 11, 13, 14, 15} and s([ɛ]3) = 10.
(ii) [ɛ]4: [{d}] = {1, 6, 7, 10, 11, 12, 15} and s([ɛ]4) = 7.
(iii) [ɛ]6: [{f}] = {1, 2, 4, 6, 8, 9, 11, 14} and s([ɛ]6) = 8.
(iv) [ɛ]9: [{i}] = {1, 2, 3, 5, 8, 9, 11, 13} and s([ɛ]9) = 8.
(v) [ɛ]10: [{j}] = {1, 3, 5, 6, 7, 8, 9, 11, 13} and s([ɛ]10) = 9.
(vi) [ɛ]11: [{k}] = {1, 3, 5, 11, 13} and s([ɛ]11) = 5.

Second step:

[ɛ]3 ∩ [ɛ]4 ∩ [ɛ]6 ∩ [ɛ]9 ∩ [ɛ]10 ∩ [ɛ]11 = [{c, d, f, i, j, k}] = {1, 11}, so s([ɛ]3 ∩ [ɛ]4 ∩ [ɛ]6 ∩ [ɛ]9 ∩ [ɛ]10 ∩ [ɛ]11) = 2.
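The two steps above amount to intersecting elementary sets and taking the cardinality of the result. The following Python sketch reproduces them with the sample sets listed in this appendix; the dictionary `ELEMENTARY` and the helpers `elementary_set` and `support` are illustrative names, not the paper's implementation.

```python
from typing import Dict, FrozenSet, Sequence

# Elementary sets [ɛ] copied from the example: the sample IDs (of 15 samples)
# in which each attribute value occurs.
ELEMENTARY: Dict[str, FrozenSet[int]] = {
    "a": frozenset({1, 2, 7, 11}),
    "b": frozenset({1, 2, 7, 12, 13}),
    "c": frozenset({1, 2, 3, 4, 7, 10, 11, 13, 14, 15}),
    "d": frozenset({1, 6, 7, 10, 11, 12, 15}),
    "e": frozenset({1, 2, 3, 9, 13}),
    "f": frozenset({1, 2, 4, 6, 8, 9, 11, 14}),
    "g": frozenset({1, 6, 8, 9, 10, 11, 12, 13, 15}),
    "h": frozenset({1, 5, 7, 9, 13}),
    "i": frozenset({1, 2, 3, 5, 8, 9, 11, 13}),
    "j": frozenset({1, 3, 5, 6, 7, 8, 9, 11, 13}),
    "k": frozenset({1, 3, 5, 11, 13}),
}


def elementary_set(attrs: Sequence[str]) -> FrozenSet[int]:
    """Elementary set of a combination B: the samples that have every attribute value in B."""
    if not attrs:
        raise ValueError("B must contain at least one attribute value")
    # frozenset.intersection accepts any number of sets after the first one.
    return frozenset.intersection(*(ELEMENTARY[a] for a in attrs))


def support(attrs: Sequence[str]) -> int:
    """Support s: the number of samples in the elementary set of B."""
    return len(elementary_set(attrs))


if __name__ == "__main__":
    # Single-value supports, matching s([ɛ]1) ... s([ɛ]11).
    for name in sorted(ELEMENTARY):
        print(name, support([name]))
    b = ["c", "d", "f", "i", "j", "k"]
    print(sorted(elementary_set(b)), support(b))  # [1, 11] 2
```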

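The support s% and confidence c% of class association rules are computed directly, whereas the rough set support S and confidence C of a rule i ⇒ y are calculated by the following two equations:

$$S(i \Rightarrow y) = \frac{B_L}{|\Omega|} \times 100\%$$

$$C(i \Rightarrow y) = \frac{B_L}{s(i)} \times 100\%$$

Here, B_L is the lower bound approximation in rough set theory, that is, the number of samples in which the condition itemset i occurs together with the decision y across the dataset table; s(i) is the support of the condition itemset i; and |Ω| is the total number of samples in the dataset table.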

Then, the rough set association rule is RSAR = {c, d, f, i, j, k ⇒ C_D}. The elementary set B' in the condition of the rule contains two samples and is defined as follows:
(i) condSet = [ɛ]c = [{c, d, f, i, j, k}] = {1, 11}, and s([ɛ]c) = 2.
(ii) decSet = [ɛ]d = [{C_D}] = {1, 11}, and s([ɛ]d) = 2.
(iii) s(condSet ∩ decSet) = s([ɛ]c ∩ [ɛ]d) = s({1, 11}) = 2.
(iv) |Ω| = 15.

Then, the support of the RSAR is S% = s([ɛ]c ∩ [ɛ]d)/|Ω| = 2/15 ≈ 13%, and its confidence is C% = s([ɛ]c ∩ [ɛ]d)/s([ɛ]c) = 2/2 = 100%.
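The S% and C% computation above can be sketched in Python, taking the condition and decision elementary sets as inputs. The helper names `rsar_measures` and `is_certain` are hypothetical. The `is_certain` check is an assumption: it applies the textbook rough-set criterion (a condition class contained in the decision class lies in that class's lower approximation and therefore yields a rule with 100% confidence), because the exact confidence threshold the paper uses to separate lower-bound from upper-bound rules is not stated in this example.

```python
from typing import FrozenSet, Tuple


def rsar_measures(cond_set: FrozenSet[int], dec_set: FrozenSet[int],
                  universe_size: int) -> Tuple[float, float]:
    """Return (S, C) for the rule cond => dec as fractions in [0, 1].

    S = |cond ∩ dec| / |Ω| and C = |cond ∩ dec| / |cond|, as in the worked example.
    """
    if universe_size <= 0 or not cond_set:
        raise ValueError("need |Ω| > 0 and a non-empty condition set")
    overlap = len(cond_set & dec_set)  # samples where condition and decision co-occur
    return overlap / universe_size, overlap / len(cond_set)


def is_certain(cond_set: FrozenSet[int], dec_set: FrozenSet[int]) -> bool:
    """Assumed textbook criterion: cond ⊆ dec, i.e. cond lies in the lower approximation of dec."""
    return cond_set <= dec_set


if __name__ == "__main__":
    cond = frozenset({1, 11})  # [ɛ]c = [{c, d, f, i, j, k}]
    dec = frozenset({1, 11})   # [ɛ]d = [{C_D}]
    s, c = rsar_measures(cond, dec, universe_size=15)
    print(f"S = {s:.1%}")  # S = 13.3%
    print(f"C = {c:.0%}")  # C = 100%
    print(is_certain(cond, dec))  # True
```

A condition set that only partly overlaps the decision class (for example {1, 11, 13} against {1, 11}) would give C = 2/3 and fail `is_certain`, which is the kind of rule that would fall into the upper-bound group under this assumed criterion.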

D. Detailed Accuracy by PEO Category

The performance summary and detailed accuracy are given as follows.

E. Questionnaire Results for the Proposed PEO-SO Map

The questionnaire results are as follows.

Data Availability

The dataset is not publicly available because it was collected manually and preprocessed specifically for further processing in this work.

Conflicts of Interest

The author declares no conflicts of interest.