Abstract

Role-based access control (RBAC) is widely adopted in network security management, and role mining technology has been extensively used to automatically generate user roles from datasets in a bottom-up way. However, almost all role mining methods discover user roles from existing user-permission assignments and therefore neglect the dependency relationships between user permissions. To extend the ability of role mining technology, this paper proposes a novel role mining framework based on multi-domain information. The framework estimates the similarity between different permissions based on the fundamental information in the physical, network, and digital domains and attaches interdependent permissions to the same role. Three simulated network scenarios with different multi-domain configurations are used to validate the effectiveness of our method. The experimental results show that the method not only captures the interdependent relationships between permissions but also detects user roles and permissions more reasonably.

1. Introduction

Access control is a fundamental concern in network security management. Role-based access control (RBAC) has become the dominant model in both commercial and research fields [1, 2]. The key task in RBAC is to determine proper roles that capture business needs, which is known as role engineering. There are two main kinds of approaches to finding user roles: top-down and bottom-up. Top-down approaches perform a deep analysis of business processes and identify user roles manually [3], while bottom-up approaches discover user roles from existing datasets automatically. The latter are also known as role mining, as they usually resort to data mining techniques [4, 5].

Existing role mining approaches mainly discover a proper user-role assignment relation and a proper role-permission assignment relation from an existing user-permission assignment relation. In the process, user-permission assignments are considered to be independent. However, in a typical service authorization process, users are authorized by multiple policy control points, including gate machines, firewalls, and identity authentication systems. Those systems are usually configured separately and may grant users more permissions than they deserve. For example, users who are authorized to enter a certain space may have the opportunity to use terminals belonging to other users in the same space; users can connect remotely to a server behind the firewall to bypass the access control lists and access unauthorized services; and users can use their assigned passwords to crack similar passwords for other unauthorized services. In short, if these interdependent relationships are not taken into consideration, users with certain roles may obtain extra permissions, introducing security vulnerabilities into network systems.

To address the above-mentioned issues, this paper proposes a novel role mining framework named RMMDI from the perspective of network security management. Instead of mining user roles from user-permission assignments, the framework discovers user roles from the fundamental information in multiple domains, including the physical domain, network domain, and digital domain. The framework aims to output a flat RBAC state that divides user permissions into several disjoint subsets. The user permissions in one subset tend to be interdependent, while the permissions in different subsets tend to be independent. If each permission subset is assigned to one user role, a user who is assigned some roles is unlikely to obtain extra permissions that belong to other roles. As such, potential security risks involved in the user-permission assignment process can be avoided.

The rest of this paper is organized as follows. Section 2 briefly reviews related work. Section 3 presents the proposed framework in detail. Section 4 describes the experimental setup and results, and Section 5 presents a comprehensive discussion. Finally, Section 6 provides concluding remarks.

2.1. Role Mining

RBAC has become a dominant model for access control in network security. Instead of assigning permissions to users directly, RBAC introduces the concept of roles to make the access control system more compact and comprehensive [6]. A role is defined as a collection of permissions. The key task in RBAC is to generate proper roles. In this process, the bottom-up approach, known as role mining, receives much more attention than the top-down approach, as the latter is time-consuming and labor-intensive [3].

Kuhlmann et al. first proposed the concept of role mining for finding roles from user-permission assignment data [7]. Traditional role mining approaches are mainly divided into two classes based on their output [5, 8, 9]. The first class is to output a prioritized list of candidate roles, each of which is assigned a priority value. A larger priority value means the role is more important or useful. Complete Miner (CM) and Fast Miner (FM) are two typical algorithms of the first class, which identify overlapping clusters by analyzing the subset enumeration in an unsupervised way [10]. The second class is to output a complete RBAC state under a certain cost. There are also a lot of classic algorithms in the class, for example, OFFIS Role mining tool with Cluster Analysis (ORCA) [11], Hierarchical Miner (HM) [12], Graph Optimization (GO) [13], HP Role Minimization (HPr) [14], and HP Edge Minimization (HPe) [14].

Besides those traditional role mining algorithms, many important approaches have emerged in recent years. For example, Frank et al. proposed a probabilistic approach that improves the role mining process by taking business information into account. The approach utilized the similarity between user-permission relations to detect exceptional and wrong assignments [15]. In addition, entropy-based methods were used in this approach to analyze the impact of business knowledge on role mining [16]. Alessandro et al. presented an approach that allows role engineers to leverage business information. In the role mining process, the access data is first divided into smaller subsets from a business perspective, and traditional methods are then used to discover roles with business meanings [5]. Iran et al. proposed a method based on formal concept lattices to discover roles with semantic meanings [12] as well as a method based on logistic PCA (Principal Component Analysis) to eliminate data noise [17]. Du X and Change X proposed two algorithms based on artificial intelligence, i.e., the genetic algorithm and the ant colony optimization algorithm [18]. Dong et al. proposed both fast exact and heuristic methods based on biclique network cover to minimize the number of roles or edges [19].

With regard to goodness measure, several metrics have been proposed in the literature, including minimizing the number of roles [10, 20], minimizing the number of edges [13, 14, 19], minimizing the number of user-role assignment and permission-role assignment relations [13], minimizing both the number of roles and edges [21], and minimizing the administrative cost [22]. These optimization goals can be uniformly represented by the Weighted Structural Complexity (WSC) [8, 9].

Although there are a lot of effective role mining approaches, most of them neglect the relationships between user permissions. From the perspective of network security management, user permissions are not independent. A user or potential attacker may get extra permissions from the preassigned permissions, which may introduce fatal risks to network security. Hence, in the framework RMMDI, we model the interrelationships between user permissions from multi-domain configuration information and get more reasonable user roles, mitigating the vulnerabilities and strengthening network security.

2.2. Multi-Domain Information Modeling

Traditional network security analysis mainly concentrates on the network domain and pays little attention to other domains. However, as research on insider threats has deepened, an increasing number of studies have shown that attackers attack the network not only by digital means but also through the physical and social domains.

Existing methods for jointly modeling multi-domain network information mainly define the multi-domain information formally and then make inferences based on logical rules to judge whether the system can reach an unsafe state. Probst et al. proposed a formal model for describing scenarios that span the physical and digital domains [23, 24]. Kotenko et al. proposed a model for describing attacks that use social engineering and physical access based on the preconditions and postconditions of atomic actions [25]. Scott et al. built a security model that adds a spatial relationship between the elements in the ambient calculus [26]. Dimkov presented a security model named the Portunes graph, which abstracts the environment of an organization into a stratified graph covering information in the physical, digital, and social domains [27]. Kammuller and Probst combined formal modeling and analysis of organizational infrastructures with a sociological explanation to provide a framework for insider threat analysis [28].

In this paper, we take the possible interaction effects among multi-domain permissions into consideration. These interactions are the basis for finding similar permissions and for mining roles from multi-domain information.

2.3. Multi-View Community Detection

Community structure is a universal property of many complex networks, which means that network nodes can be divided into small groups [26]. Traditional community detection methods utilize the information of only a single network. Several multi-view community detection methods have since been proposed, which utilize more information and achieve better performance.

Nonnegative Matrix Factorization (NMF) [29] is a classic clustering method, and several multi-view community detection methods based on NMF are proposed. Akata et al. proposed a method to jointly factorize multiple data matrices through a shared coefficient matrix [30]. Liu et al. proposed MultiNMF that regularizes the coefficient matrices learned from different views towards a common consensus for clustering [31]. He et al. extended NMF for multiview clustering by jointly factorizing the multiple matrices through coregularization [32]. Pei et al. proposed a nonnegative matrix tri-factorization (NMTF) based clustering framework with three types of graph regularization [33]. Li et al. proposed a framework based on regularized joint nonnegative matrix factorization (RJNMF) to utilize link and content information jointly to enhance the community detection accuracy [34].

In the framework RMMDI, we use the Pairwise Coregularized NMF clustering algorithm proposed in [32] to merge the information from two service networks (views). Experiments show that it can get more information than from a single view and make the role mining results more reasonable.

3. Role Mining Framework Based on Multi-Domain Information

In this paper, we propose a role mining framework based on multi-domain information, named RMMDI. The framework aims to divide the possible user permissions into several disjoint subsets and assign each subset to a user role. Users are then assigned one or more roles according to the permissions they deserve. The structure of RMMDI is shown in Figure 1. The framework consists of three modules: basic information acquisition, relationship network construction, and community detection and role definition.

The basic information acquisition module obtains the necessary basic information from the target network, including multi-domain entity information and relationship information. The relationship network construction module constructs eight networks based on the obtained basic information, including the intermediate networks and ultimate networks. The community detection and role definition module detects permission communities on the ultimate networks by a multi-view community detection method and defines possible user roles.

3.1. Basic Information Acquisition

The basic information acquisition module collects basic network information, including the entities and entity relationships in the physical domain, network domain, and digital domain, which are the foundation of relationship network construction.

3.1.1. Entity

There are five kinds of entities involved in the framework, i.e., space, object, service, info, and user.

Entity space represents a specific physical space, such as a city, campus, building, or room, and belongs to the physical domain. Entity object also belongs to the physical domain and represents a network device, such as a router, switch, or terminal. Entity service belongs to the network domain and represents a network service, such as HTTP (Hypertext Transfer Protocol), FTP (File Transfer Protocol), or Email. Entity info belongs to the digital domain and represents information, such as a password, data, or a digital file. Entity user represents a network user. The entities of each kind are collected in their own set: one set each for the space, object, service, info, and user entities.

3.1.2. Relationships

There are seven kinds of relationships involved in the framework, i.e., spatial similarity relationships, device containment relationships, service access relationships, local management relationships, remote management relationships, service domination relationships, and info domination relationships.

Spatial similarity relationships are described by the matrix , where and . is determined by

where is the number of users who can move from space to space and is the threshold value, ranging from 0 to 2.
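As a rough illustration of the thresholding described above, the following Python sketch turns a matrix of user-movement counts into a binary spatial similarity matrix. The exact rule is an assumption: here two distinct spaces are treated as similar when the number of users who can move between them reaches the threshold, and every space is treated as similar to itself. The names `spatial_similarity`, `move_counts`, and `theta`, as well as the sample counts, are hypothetical.

```python
def spatial_similarity(move_counts, theta):
    """Binarize user-movement counts into a spatial similarity matrix.

    move_counts[i][j] is the number of users who can move from space i to
    space j. The thresholding rule below is an assumed reading of the text.
    """
    n = len(move_counts)
    sim = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Assumption: a space is always similar to itself; two distinct
            # spaces are similar when enough users can move between them.
            if i == j or move_counts[i][j] >= theta:
                sim[i][j] = 1
    return sim


# Hypothetical movement counts for three spaces.
counts = [
    [0, 3, 0],
    [3, 0, 1],
    [0, 1, 0],
]
similarity = spatial_similarity(counts, theta=2)
```

With these counts, only the first two spaces end up similar to each other, because the single user who can move between the second and third spaces does not reach the threshold.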

Device containment relationships are described by the matrix , where , , and . is determined by the following.

Service access relationships are described by the matrix , where , , and . is determined by the following.

Local management relationships are described by the matrix , where , , and . is determined by the following.

Remote management relationships are described by the matrix , where , , and . is determined by the following.

Service domination relationships are described by the matrix , where , , and . is determined by the following.

Info domination relationships are described by the matrix , where and . is determined by

where symbol indicates that information is dominated by information . It means there is a service , whose password is and from which the users can get information .
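The info domination rule can be sketched in a few lines of Python. The sketch assumes the reading given above: an info entity is dominated by a password when some service is protected by that password and exposes the info entity to its users. The matrix orientation (rows are dominated entities, columns are dominating ones), the function name `info_domination`, and the two input mappings are assumptions made for illustration; the entity names only borrow the naming pattern used in the experiments.

```python
def info_domination(infos, service_password, service_exposes):
    """Return D where D[a][b] = 1 if info a is dominated by info b.

    service_password maps a service to the password that protects it;
    service_exposes maps a service to the info entities obtainable from it.
    """
    index = {name: k for k, name in enumerate(infos)}
    dom = [[0] * len(infos) for _ in infos]
    for service, password in service_password.items():
        for exposed in service_exposes.get(service, ()):
            # Knowing `password` opens `service`, which reveals `exposed`.
            dom[index[exposed]][index[password]] = 1
    return dom


# Illustrative entities: the management service of WServer exposes the
# database password through configuration files.
infos = ["WS_M_P", "DS_D_P", "FS_F_P"]
service_password = {"WS_M": "WS_M_P", "DS_D": "DS_D_P", "FS_F": "FS_F_P"}
service_exposes = {"WS_M": ["DS_D_P"]}
D = info_domination(infos, service_password, service_exposes)
```

Here the only nonzero entry records that DS_D_P is dominated by WS_M_P.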

3.2. Relationship Network Construction

The relationship network construction module constructs basic relationship networks from the obtained basic information. As shown in Figure 2, eight networks are constructed in total. The ultimate goal of this module is to form the Device View Service Network (DVSN), Information View Service Network (IVSN), and Multiview Service Network (MVSN). These three ultimate networks are used in community detection and role definition. Besides the three ultimate networks, five other networks are involved in user-role mining: the Local Management View Device Network (LMVDN), Remote Management View Device Network (RMVDN), Local Information View Device Network (LIVDN), Remote Information View Device Network (RIVDN), and Multiview Device Network (MVDN). These five intermediate networks are the foundation for constructing the ultimate networks. The meanings of the intermediate networks and ultimate networks are described as follows.

3.2.1. Intermediate Networks

The five intermediate networks are described as undirected weighted graphs, whose adjacency matrices are constructed from the seven basic relationship matrices.

LMVDN. The LMVDN represents the similarity between devices from a spatial (local management) perspective, which means that devices located in similar spaces are more similar than others. The network is represented by the adjacency matrix , whose values represent the similarity of two devices. The matrix is determined by the following.

RMVDN. The RMVDN represents the similarity between devices from a remote management perspective, which means the devices that can be managed by similar management services are more similar than others. The network is represented by the adjacency matrix , whose values represent the similarity of two devices. The matrix is determined by the following.

LIVDN. The LIVDN represents the similarity between devices from the perspective of local management service password, which means the devices with similar local management service password are more similar than others. The network is represented by the adjacency matrix , whose values represent the similarity of two devices. The matrix is determined by the following.

RIVDN. The RIVDN represents the similarity between devices from the perspective of remote management service password, which means the devices with similar remote management service password are more similar than others. The network is represented by the adjacency matrix , whose values represent the similarity of two devices. The matrix is determined by the following.

MVDN. The MVDN represents the similarity between devices from multiple perspectives, which merges the relationships from the local management perspective and the remote management perspective. The network is represented by the adjacency matrix , whose values represent the similarity of two devices. The matrix is determined by

where the symbol means dot product of two matrices.
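The general pattern behind the intermediate networks can be sketched as follows. This is only an illustration under two assumptions that are not confirmed by the text: that a device-level similarity is obtained by projecting an entity-level similarity through a device incidence matrix (incidence, similarity, incidence transposed), and that the "dot product" used for merging views is an elementwise product. All function names and sample matrices are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def project_similarity(incidence, similarity):
    """Assumed projection: incidence @ similarity @ incidence^T."""
    return matmul(matmul(incidence, similarity), transpose(incidence))


def elementwise_product(a, b):
    """Assumed merge of two views: entry-by-entry product."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Three devices in two spaces: devices 0 and 1 share a space.
contains = [[1, 0], [1, 0], [0, 1]]
space_sim = [[1, 0], [0, 1]]
local_view = project_similarity(contains, space_sim)

# Hypothetical remote-management view in which devices 0 and 2 are similar.
remote_view = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
merged = elementwise_product(local_view, remote_view)
```

Under the elementwise reading, two devices stay similar in the merged view only if they are similar in both views, which is why `merged` reduces to the identity in this example.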

3.2.2. Ultimate Networks

The three ultimate networks are also described as undirected weighted graphs, whose adjacency matrices are constructed from the seven basic relationship matrices and five intermediate networks.

DVSN. The DVSN represents the similarity of service permissions from a device perspective, which means the services accessed by similar devices are more similar than others. The network is represented by the adjacency matrix , whose values represent the similarity of two services. The matrix is determined by

where is the function of filtering edges from the original graph, whose parameter is the original graph and is the ratio of edges to be reserved.

As the matrix is a fully connected matrix in which the edges with small weight have negative impacts on community results, we use a function to filter the low weight edges from the network. In function , for any node in the graph , we only reserve the top edges with the largest weight. If two edges have the same weight, we reserve the edge between the node and the neighbor node with a higher degree.
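A minimal sketch of the edge filtering step is given below. Several details are assumptions rather than the authors' definition: the number of edges kept per node is taken as the ceiling of the ratio times the number of other nodes, ties are broken by the weighted degree of the neighbor (in a fully connected graph the plain degree is the same for every node), and an edge survives if either endpoint selects it so that the result stays symmetric. The function name `filter_edges` and the sample graph are hypothetical.

```python
import math


def filter_edges(adj, ratio):
    """Keep, for every node, only its heaviest incident edges."""
    n = len(adj)
    keep = max(1, math.ceil(ratio * (n - 1)))
    # Weighted degree of each node, used only to break ties between edges.
    strength = [sum(row[j] for j in range(n) if j != i) for i, row in enumerate(adj)]
    filtered = [[0.0] * n for _ in range(n)]
    for i in range(n):
        neighbors = [j for j in range(n) if j != i and adj[i][j] > 0]
        # Sort by edge weight first, then by the neighbor's weighted degree.
        neighbors.sort(key=lambda j: (adj[i][j], strength[j]), reverse=True)
        for j in neighbors[:keep]:
            # Assumption: keep the edge if either endpoint selects it.
            filtered[i][j] = adj[i][j]
            filtered[j][i] = adj[i][j]
    return filtered


# Hypothetical fully connected similarity graph on four services.
adj = [
    [0.0, 0.9, 0.2, 0.1],
    [0.9, 0.0, 0.3, 0.2],
    [0.2, 0.3, 0.0, 0.8],
    [0.1, 0.2, 0.8, 0.0],
]
filtered = filter_edges(adj, ratio=0.3)
```

With a ratio of 0.3 each node keeps a single edge, so only the two strong edges (0.9 and 0.8) remain and the weak cross links are dropped.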

IVSN. The IVSN represents the similarity of service permissions from an information perspective, which means the services with similar passwords are more similar than others. The network is represented by the adjacency matrix , whose values represent the similarity of two services. The matrix is determined by

where is the same edges filtering function in formula (14).

MVSN. The MVSN represents the similarity of service permissions from multiple perspectives, which merges the similarity relationships from the device perspective and the information perspective. The network is represented by the adjacency matrix , whose values represent the similarity of two services. The matrix is determined by the following.

3.3. Community Discovery and User-Role Definition

After building the ultimate networks, the service permissions are partitioned into communities by a multi-view clustering algorithm. A role is then defined for each community, so that every network service permission is naturally assigned to exactly one role. The number of communities can be determined by criteria such as modularity maximization.
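One common way to compare candidate partitions, and hence candidate numbers of roles, is Newman's modularity. The sketch below computes it for a weighted undirected graph and picks the best of several candidate partitions. It is offered as one possible selection criterion, not as the procedure used in the experiments; the function name `modularity` and the toy graph are hypothetical.

```python
def modularity(adj, labels):
    """Newman modularity of a partition of a weighted undirected graph."""
    n = len(adj)
    total = sum(sum(row) for row in adj)  # equals 2m for an undirected graph
    if total == 0:
        return 0.0
    degree = [sum(row) for row in adj]
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # Observed weight minus the weight expected by chance.
                q += adj[i][j] - degree[i] * degree[j] / total
    return q / total


# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1

candidates = {
    1: [0, 0, 0, 0, 0, 0],
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 1, 1, 2, 2],
}
best_k = max(candidates, key=lambda k: modularity(adj, candidates[k]))
```

In this toy graph the two-community partition scores 5/14, the single community scores 0, and the three-community split is worse than both, so two roles would be selected.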

In multiview service community discovery, we use the Pairwise Coregularized NMF clustering algorithm (PCoNMF) proposed in [32], which is based on regularized joint NMF. The objective function of service community discovery is formulated as follows.

The hypothesis behind PCoNMF is to regularize the coefficient matrices of the different views to a common consensus, which is then used for clustering. PCoNMF also adopts alternating optimization to minimize the objective function. The optimization works as follows: fix the value of and while minimizing over and ; then fix the value of and while minimizing over and . We repeat the two steps until the iteration threshold is achieved.

According to [32], the update rules are as follows.

Hence, the permission community detection algorithm is shown as Algorithm 1.

Input: nonnegative matrices , . number of communities , parameters
Output: Service Community Division .
  Initialize , , ,
  While the objective function has not converged and the number of iterations is less than the threshold do
    Update according to Formula (18)
    Update according to Formula (19)
    Update according to Formula (20)
    Update according to Formula (21)
  end while
  Divide nodes to communities division according to the coefficient matrix
  return  
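The alternating scheme of Algorithm 1 can be illustrated with a small, dependency-free Python sketch. It is not the set of update rules from [32]: it assumes a generic objective with a weighted reconstruction error for each of the two views plus a penalty on the squared difference between the two coefficient matrices, and it uses standard multiplicative updates derived for that assumed objective. Communities are read from the column-wise argmax of the summed coefficient matrices, which is also an assumption. All names (`coregularized_nmf`, `lams`, `gamma`) and the toy views are hypothetical.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def sq_dist(a, b):
    """Squared Frobenius distance between two equally shaped matrices."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def objective(views, ws, hs, lams, gamma):
    """Assumed objective: weighted fit of both views plus coregularization."""
    fit = sum(lam * sq_dist(x, matmul(w, h))
              for x, w, h, lam in zip(views, ws, hs, lams))
    return fit + gamma * sq_dist(hs[0], hs[1])


def coregularized_nmf(views, k, lams=(1.0, 1.0), gamma=1.0,
                      max_iter=200, tol=1e-8, seed=0):
    """Two-view NMF with a pairwise penalty between coefficient matrices."""
    rng = random.Random(seed)
    n, eps = len(views[0]), 1e-12
    ws = [[[rng.random() for _ in range(k)] for _ in range(n)] for _ in views]
    hs = [[[rng.random() for _ in range(n)] for _ in range(k)] for _ in views]
    history = [objective(views, ws, hs, lams, gamma)]
    for _ in range(max_iter):
        for v in (0, 1):
            x, w, h, other, lam = views[v], ws[v], hs[v], hs[1 - v], lams[v]
            # Basis update: W <- W * (X H^T) / (W H H^T)
            num = matmul(x, transpose(h))
            den = matmul(w, matmul(h, transpose(h)))
            w = [[w[i][c] * num[i][c] / (den[i][c] + eps) for c in range(k)]
                 for i in range(n)]
            # Coefficient update, pulled toward the other view's coefficients.
            wt = transpose(w)
            num = matmul(wt, x)
            den = matmul(matmul(wt, w), h)
            h = [[h[c][j] * (lam * num[c][j] + gamma * other[c][j])
                  / (lam * den[c][j] + gamma * h[c][j] + eps)
                  for j in range(n)] for c in range(k)]
            ws[v], hs[v] = w, h
        history.append(objective(views, ws, hs, lams, gamma))
        if abs(history[-2] - history[-1]) < tol:
            break
    # Assign each node to the community with the largest summed coefficient.
    labels = [max(range(k), key=lambda c: hs[0][c][j] + hs[1][c][j])
              for j in range(n)]
    return labels, history


# Two toy views of six services that share the same two-block structure.
clean = [[1.0 if (i < 3) == (j < 3) else 0.0 for j in range(6)] for i in range(6)]
noisy = [[0.9 if (i < 3) == (j < 3) else 0.1 for j in range(6)] for i in range(6)]
labels, history = coregularized_nmf([clean, noisy], k=2)
```

Each entry of `labels` names the community, and therefore the candidate role, of one service; `history` records the objective value after every sweep so that convergence can be inspected.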

4. Experiments and Results

In this section, we evaluate our role mining method on the multi-domain information of a simulated network, which is a simplified version of the internal network of Corporation M.

4.1. Experiment Environment

We built a simulation network for the experiments, including a router, a firewall, an Intrusion Prevention System (IPS), 3 switches (Switch1, Switch2, and Switch3), 6 servers (WServer, DServer, FServer, GServer, OServer, and IServer), 3 gate machines (GM1, GM2, and GM3), and 13 terminals (T1, T2, T3, …, T13). We used a HUAWEI S7706 as the core router, three HUAWEI S5700 devices as switches, a TOPSEC NGFW 4000-UF as the firewall, a TOPSEC IDP 3000 as the IPS, and computers from Dell and HP as the servers and terminals. The router enabled Layer 3 routing, and the firewall was configured with bidirectional access control lists. All the servers and terminals were installed with different versions of Windows, including Windows 2003 Server, Windows XP, and Windows 7. We deployed an entrance guard system consisting of 3 gate machines and a server (GServer). The gate machines used face recognition technology to determine whether a person could pass. An office automation system was deployed on OServer, and its database was deployed on DServer. We also deployed two websites and an FTP service using IIS (Internet Information Services) on WServer, IServer, and FServer. The websites likewise depended on the database deployed on DServer. The physical link relationships among devices are shown in Figure 3.

All the devices were distributed across 12 rooms in 3 buildings. Building 1 housed 9 devices: terminals T1, T2, and T3 in room 1-1; T4 and T5 in room 1-4; T6 and T7 in room 1-5; Switch1 in room 1-2; and GM1 in the hall of building 1 (room 1-3). Building 2 housed 8 devices: terminals T8 and T9 in room 2-1; T10 and T11 in room 2-4; T12 and T13 in room 2-5; Switch2 in room 2-2; and GM2 in the hall of building 2 (room 2-3). Building 3 housed 11 devices: the router, firewall, IPS, Switch3, and all 6 servers in room 3-1, and GM3 in the hall of building 3 (room 3-2).

There were 34 services in the network, including 28 management services and 6 business services. The management services were used for device management, while the business services were used for corporate business. Each device was managed by one management service. The router and switches enabled the SSH service. The servers and terminals enabled the Remote Desktop Service. In addition, the gate machines enabled web-based management interfaces. The website deployed on WServer provided a web service on port 80 named WS_W, which was used to publish public information. FServer provided an FTP service on port 21 named FS_F, which was used by Network Administrators to share information. GServer provided a data transmission service on port 8080 named GS_T, which was used to synchronize data between the gate machines and GServer. OServer provided a web service on port 80 named OS_W, which was used for document circulation among all users. IServer provided a web service on port 80 named IS_W, which was used by Server Administrators to share information. DServer provided a database service on port 1433 named DS_D, which provided underlying support for WS_W, OS_W, and IS_W.

There were 33 passwords in the analysis. Each service, except for WS_W and OS_W, had a password. In addition, a special entry was added to represent the empty password. All the information involved is shown in Table 1.

Thirteen users, named User1 to User13, were involved in the analysis; they used terminals T1 to T13 and knew passwords T1_M_P to T13_M_P, respectively. Using a top-down approach, the network security administrators had derived 5 user roles from the business information: Ordinary User, Server Administrator, Database Administrator, Network Administrator, and Security Administrator. The role-permission assignments are listed in Table 2.

4.2. Baseline Methods

To demonstrate the effectiveness of our method, we compare our approach with two groups of baselines. The first group comprises 5 clustering methods: 2 single-view methods and 3 multi-view methods. The second group comprises 4 traditional role mining methods: ORCA (OFFIS Role mining tool with Cluster Analysis), CM (Complete Miner), HPr (HP Role Minimization), and HPe (HP Edge Minimization).

4.2.1. Clustering Methods

SP (Spectral Clustering). SP [35] is a classical single view clustering algorithm, which makes use of the eigenvalues of the data similarity matrix to perform dimensionality reduction before clustering in fewer dimensions. The similarity matrix is provided as an input, consisting of a quantitative assessment of the relative similarity of each pair of points in the dataset.

SymNMF. SymNMF [36] is a clustering algorithm based on NMF that takes a nonnegative symmetric matrix as input. The matrix contains the pairwise similarity values of a similarity graph and is approximated by the product of a single lower-rank matrix and its transpose, instead of the product of two different lower-rank matrices.

PCoSpec (Pairwise Coregularized Spectral clustering) and CCoSpec (Center-wise Coregularized Spectral clustering). Two coregularization schemes are adopted in the spectral clustering framework [37]. PCoSpec uses pairwise coregularization to enforce the eigenvectors of each pair of views to be similar, while CCoSpec uses centroid-based coregularization to enforce the eigenvectors of every view to be similar to a common center.

CCoNMF (Cluster-wise Coregularized NMF clustering). CCoNMF extends NMF for multiview clustering by jointly factorizing the multiple matrices through cluster-wise coregularization [32], which enforces the cluster similarity matrices to be similar.

RMSC (Robust Multiview Spectral Clustering). RMSC [38] is a multiview spectral clustering method based on Markov chain, which explicitly handles the possible noise in the transition probability matrices associated with different views.

4.2.2. Role Mining Baseline Methods

ORCA. ORCA [11] is the first role mining algorithm; it uses hierarchical clustering to discover user roles. The algorithm first defines each permission as an initial cluster and then merges the clusters step by step to form a role hierarchy.

Complete Miner (CM). CM [10] is another classic role mining algorithm proposed in 2006. It starts by creating an initial set of roles for the distinct user-permission sets, then computes all possible intersection sets of the initial roles, and outputs a list of candidate roles.

HP Role Minimization and HP Edge Minimization. HP Role Minimization (HPr) and HP Edge Minimization (HPe) [14] are role mining algorithms based on minimum biclique cover. HPr tries to find a minimal set of roles that covers the user-permission assignment relation, while HPe uses a heuristic method to minimize the number of edges in the RBAC system.
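To make the candidate generation step described for CM more concrete, the following Python sketch builds the initial roles from the distinct user-permission sets and then closes them under non-empty intersection. It illustrates only that enumeration step as summarized above; the prioritization of candidate roles in [10] is not modeled. The function name `candidate_roles` and the sample assignments are hypothetical.

```python
def candidate_roles(user_permissions):
    """Enumerate candidate roles as intersections of users' permission sets."""
    # Initial roles: one per distinct, non-empty user permission set.
    initial = {frozenset(perms) for perms in user_permissions.values() if perms}
    roles = set(initial)
    frontier = set(initial)
    while frontier:
        new = set()
        for current in frontier:
            for base in initial:
                shared = current & base
                if shared and shared not in roles:
                    new.add(shared)
        roles |= new
        # Only newly found sets can produce further intersections.
        frontier = new
    return roles


# Hypothetical user-permission assignments.
upa = {
    "u1": {"p1", "p2", "p3"},
    "u2": {"p2", "p3", "p4"},
    "u3": {"p3", "p5"},
}
roles = candidate_roles(upa)
```

For these three users the result contains the three original permission sets plus the shared subsets {p2, p3} and {p3}. Note that every candidate is derived purely from the assignment data, which is why dependencies between permissions remain invisible to this family of methods.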

4.3. Experiments Setup
4.3.1. Scenarios Construction

To validate our framework and method, we built 3 scenarios named Scenario1 (S1), Scenario2 (S2), and Scenario3 (S3) based on the basic experimental environment shown in Figure 3 and assigned each user one or more user roles, as shown in Table 3. Each user was assigned only 1 user role in S1 and 2 user roles in S2. In S3, users working in the same room could be assigned different user roles, which may introduce more vulnerabilities to network security.

For each scenario, we first configured the gate machines and the firewall according to Tables 2 and 3. Spatial access control lists were added on the gate machines to give users physical access to the devices they managed or used, while network access control lists were added on the firewall to give the terminals network access to the target services.

It should be noted that there were potential conflicts among the multi-domain configurations at the semantic level. Take User4 in S1 as an example. User4 was a Server Administrator and should not access service DS_M, and the firewall forbade T4 from accessing service DS_D directly. However, T4 was permitted to access service WS_M, and there was no firewall between WServer and DServer. Thus, User4 could use T4 to log in to WServer remotely and then access service DS_D (the password DS_D_P can be obtained from the configuration files on WServer). This is a typical semantic conflict between network access control lists. Similarly, as User4 was able to access DS_D physically by entering room 3-1, there was another conflict between the network access control list and the spatial access control list. Such conflicts may result in extra permissions for users.
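The User4 example amounts to a reachability question: a permission that is not granted directly may still be reachable through a chain of granted ones. A minimal Python sketch of this check is shown below. The graph encoding (a terminal points to the services it may access, a management service points to the host it controls, and a host points to the services it can reach) is an assumption chosen for illustration, as are the function name `reachable_targets` and the edge list, which only restates the example above.

```python
from collections import deque


def reachable_targets(start, access):
    """Breadth-first search over 'can access / can pivot to' edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in access.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


# Assumed encoding of the User4 example in S1.
access = {
    "T4": ["WS_M"],        # the firewall permits T4 to reach WS_M
    "WS_M": ["WServer"],   # managing WServer gives a foothold on that host
    "WServer": ["DS_D"],   # no firewall between WServer and DServer
}

direct = set(access["T4"])
indirect = reachable_targets("T4", access) - direct
```

Here `indirect` contains DS_D even though the firewall rule for T4 never lists it, which is exactly the kind of extra permission that independent user-permission assignments fail to reveal.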

Then, we extracted basic information from the network and established the necessary relationship matrices. Note that there were 16 space entities, 28 object entities, 34 service entities, 32 information entities, and 13 user entities in all the 3 scenarios. The relationship matrices , , , and were the same in all three scenarios, where and . The matrices and varied from scenario to scenario, depending on the configurations of gate machines or firewall. For each space pair , we counted the users who can move from to under each scenario and set the parameter when constructing the matrix . The results showed that in S1, in S2, and in S3. We used the scanner NMAP to get the accessibility relationships between devices and services, establishing the matrix . The results showed that in S1, in S2, and in S3.

Finally, we detected the user roles with RMMDI and compared the results with the two groups of baseline methods. On the one hand, we ran the role mining baseline methods on the user-permission assignment (UPA) matrices constructed from the firewall configurations and compared the results with RMMDI. On the other hand, we studied the best parameters for each clustering method and then compared the effectiveness of RMMDI with the clustering baseline methods. Accuracy and normalized mutual information (NMI) [31–33, 36] were adopted to evaluate the community detection effectiveness under different parameters. Both values range from 0 to 1, and a higher value means better effectiveness. After determining the best parameters for each clustering method, we calculated its accuracy and NMI against the ground truth. In the experiments, we constructed two ground truths manually. In one ground truth, we divided 21 service permissions into 5 roles according to Table 2. In the other, we merged the roles “Database Administrator” and “Server Administrator” and classified the 21 service permissions into 4 roles. Each community detection result was compared with both ground truths, and the metrics were calculated separately.
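For readers who want to reproduce the two metrics, the sketch below shows one common way to compute them in plain Python. Two choices are assumptions, since the text does not fix them: clustering accuracy is computed under the best one-to-one mapping between predicted and true labels (found by brute force, which is fine for 4 or 5 roles), and NMI is normalized by the geometric mean of the two entropies. The function names and the sample label vectors are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def clustering_accuracy(truth, pred):
    """Accuracy under the best one-to-one mapping of predicted labels."""
    t_labels, p_labels = sorted(set(truth)), sorted(set(pred))
    size = max(len(t_labels), len(p_labels))
    # Pad with None so surplus predicted clusters can stay unmatched.
    targets = t_labels + [None] * (size - len(t_labels))
    best = 0
    for perm in permutations(targets, len(p_labels)):
        mapping = dict(zip(p_labels, perm))
        hits = sum(mapping[p] == t for t, p in zip(truth, pred))
        best = max(best, hits)
    return best / len(truth)


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def nmi(truth, pred):
    """Mutual information normalized by the geometric mean of entropies."""
    n = len(truth)
    joint = Counter(zip(truth, pred))
    ct, cp = Counter(truth), Counter(pred)
    mi = sum(c / n * math.log(c * n / (ct[t] * cp[p]))
             for (t, p), c in joint.items())
    denom = math.sqrt(entropy(truth) * entropy(pred))
    # Undefined when either partition has a single cluster; report 0.0 then.
    return mi / denom if denom > 0 else 0.0


truth = [0, 0, 1, 1, 2, 2]
relabeled = [1, 1, 0, 0, 2, 2]   # same partition, different label names
one_error = [0, 0, 0, 1, 2, 2]   # one service placed in the wrong role
```

Both metrics ignore the names of the labels, so `relabeled` scores 1.0 on each, while `one_error` reaches an accuracy of 5/6.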

4.4. Result
4.4.1. Role Mining Results

Firstly, we performed the baseline role mining methods based on the firewall configurations. Because the firewall only enforced the network access control lists, its configuration reflects only the accessibility between devices and services. Since each terminal was assigned to a user, we could derive the 3 different UPA matrices from it. As we wanted to find disjoint service subnets, we used as the optimization objective. In order to save space, only the permission divisions are shown in Table 4. We found that the permission divisions were almost the same as those in Table 2, which meant that the role mining methods could not find any errors in the roles produced by the top-down approach.
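The derivation of a UPA matrix from firewall rules and terminal ownership can be sketched as follows. This is an illustration under assumptions, not the procedure used in the experiments: the function name `build_upa`, the `terminal_owner` mapping, and the representation of the access control list as `(terminal, service)` pairs are all hypothetical.

```python
def build_upa(terminal_owner, allowed, users, services):
    """Build a binary user-permission matrix.

    terminal_owner: dict mapping a terminal name to the user it is assigned to
    allowed:        iterable of (terminal, service) pairs permitted by the ACL
    users, services: ordered lists that fix the row and column order
    """
    row = {u: i for i, u in enumerate(users)}
    col = {s: j for j, s in enumerate(services)}
    upa = [[0] * len(services) for _ in users]
    for terminal, service in allowed:
        owner = terminal_owner.get(terminal)
        # Ignore rules for unassigned terminals or unknown users/services.
        if owner in row and service in col:
            upa[row[owner]][col[service]] = 1
    return upa
```

A matrix built this way records only what the firewall permits, so permissions that a user can reach indirectly, for example through another server, do not appear in it.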

Then, we also performed the RMMDI on all scenarios with the role number , 100 times for each scenario. The majority of the results were different from the result shown in Table 2. The most frequent result (222 out of 300 runs) is listed in Table 5. Compared with Table 2, we counted the number of inconsistent classification results for each service. In total, the 18 services were classified inconsistently 572 times. The 2 services with the most inconsistent classifications were DS_D (282 times) and GS_T (258 times).
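Because community labels are arbitrary, identifying the most frequent result across repeated runs requires comparing partitions rather than raw labels. A minimal sketch of one way to do this is given below; the names `canonical` and `most_frequent_partition` are hypothetical, and each run is assumed to be a dictionary mapping a service name to a community label.

```python
from collections import Counter


def canonical(assignment):
    """Turn a {service: label} mapping into a label-independent partition."""
    groups = {}
    for service, label in assignment.items():
        groups.setdefault(label, set()).add(service)
    return frozenset(frozenset(g) for g in groups.values())


def most_frequent_partition(runs):
    """Return the partition that occurs most often and its frequency."""
    counts = Counter(canonical(r) for r in runs)
    partition, freq = counts.most_common(1)[0]
    return partition, freq
```

Two runs that group the services identically but number the communities differently are counted as the same result.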

Finally, we changed the role number and performed the experiments 100 times under each scenario. The most common results are shown in Table 6.

4.4.2. Parameter Study Results

We studied the parameters used in RMMDI as well as in the baseline clustering methods. For each parameter, we ran experiments over a range of values to find the optimal setting. The experiments were conducted under S2 with role number .

We first studied the parameters used in baseline methods, including in PCoSpec, and in CCoSpec, and in RMSC. With each parameter, we set and studied 30 different values between 0.005 and 100. When , , , and , the methods had relatively better effectiveness on the dataset.

Then, we studied the parameters in PCoNMF and CCoNMF. There are 3 parameters: , , and . and represent the weights of view and , while is the regularization parameter. We studied 27 different ratios of to between 10 and 0.02, as well as 10 different values of between 0.1 and 10. The experiments were conducted 50 times for each pair. The results are shown in Figures 4 and 5. We found that the parameters and had little impact on both the accuracy and NMI when and , so we used , , and in the study of and the clustering experiments shown in Section 4.4.3.

Finally, we studied the parameter used in RMMDI. We performed an experiment with a series of from 0.05 to 1 with step size 0.05 and observed their impacts on the clustering effectiveness (shown in Figure 6). The other parameters were set as mentioned in the previous paragraph.

We found that the curves showed a downward trend on the whole, and the accuracy and NMI reached their highest values when was around 0.3, which was used for the experiments shown in Section 4.4.3.

4.4.3. Clustering Results

We also conducted experiments to compare the effectiveness of RMMDI with the clustering baseline methods. We ran all algorithms 200 times on each scenario and compared the results with the ground truth shown in Table 4. All the other parameters were set to the optimal values mentioned in Section 4.4.2. The results are listed in Tables 7-9, in which the results of SP and SymNMF were the best result among 3 different inputs, , , and . Most of those values were from the input .

5. Discussion

We propose a novel role mining framework, which mines user roles from multiple domain information rather than from the preassigned user-permission assignment matrix.

The experiments show that the framework is suitable for role mining. For the three scenarios used in the experiment, different users are assigned to different user roles. One user may be assigned one or more user roles, and one user role may be assigned to several users. From the results listed in Tables 7 and 8, we find that the accuracy of the proposed framework is greater than 93.5% in all three scenarios, while the NMI is greater than 87.0%. This means that the framework can successfully detect user roles from the multiple domain configuration information.

More importantly, the results also demonstrate that the framework can find interdependent relationships between permissions, avoiding potential errors. From the experimental results in Section 4.4.1, we find that RMMDI tends to merge the user roles “Database Administrator” and “Server Administrator”. Analyzing the users' potential permissions, we find that all Server Administrators can access service DS_D, as they can both reach the service from other servers and get its password from configuration files in WServer. Therefore, it may be more appropriate to merge the user roles “Database Administrator” and “Server Administrator”. This trend cannot be found by traditional methods.

The results also show that the performance of different clustering methods varies within the framework. As shown in Tables 7 and 8, the accuracy and NMI of PCoNMF are always the best in all three scenarios, 30% better than the worst method. This means that a reasonable clustering method improves the effectiveness of the framework significantly. Compared with the single-view clustering methods, PCoNMF improves the accuracy and NMI by more than 1%, which means that a reasonable use of information from multiple views captures more structure information than a single view. Both PCoSpec and RMSC have a lower accuracy or NMI, which suggests that the multiview methods based on spectral clustering may not be suitable for the datasets.

There are 4 parameters involved in the RMMDI in total and it is important to select proper values for the parameters. The first two parameters and are the weights of views and . From the results in Figure 4, we find that the view has more structure information than in the experiments and it is reasonable to set a larger and a smaller . The third parameter is the regularization parameter that indicates the degree of community proximity between the two perspectives. A too low will not establish the connection between two views. Nevertheless, as the view has more structure information than , a too big will reduce the accuracy of the algorithm. Therefore, a moderate parameter will make the algorithm perform better. The last parameter is used in the function , which reserves the top edges with the largest weight. From the results shown in Figure 6, we find that it is vital to reserve an appropriate proportion of links. A big will reserve more low-weight links, and the presence of low-weight links will reduce the effectiveness of the framework. However, a low value of will lose a lot of useful structure information, which will also reduce the effectiveness of the algorithm.
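The edge-reservation step controlled by the last parameter can be sketched as follows. This is a minimal illustration under assumptions, not the paper's implementation: the function name `keep_top_edges` is hypothetical, the similarity graph is assumed to be a symmetric weight matrix, and the parameter is assumed to be the proportion of weighted edges to keep.

```python
import math


def keep_top_edges(weights, proportion):
    """Keep only the given proportion of edges with the largest weights.

    weights:    symmetric n x n matrix (list of lists); 0 means no edge
    proportion: fraction of existing edges to keep, between 0 and 1
    """
    n = len(weights)
    # Collect each undirected edge once from the upper triangle.
    edges = [
        (weights[i][j], i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if weights[i][j] > 0
    ]
    edges.sort(key=lambda e: e[0], reverse=True)
    k = math.ceil(proportion * len(edges))
    pruned = [[0.0] * n for _ in range(n)]
    for w, i, j in edges[:k]:
        pruned[i][j] = pruned[j][i] = w
    return pruned
```

With a proportion close to 1, nearly all low-weight links survive and blur the community structure; with a proportion close to 0, most links are discarded, including informative ones, which matches the trade-off described above.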

6. Conclusion

In this paper, a novel framework for role mining based on multi-domain information, named RMMDI, is proposed. The key idea of the framework is to mine user roles from multiple domain information rather than from existing user-permission assignment matrices. In the framework, information from the physical, network, and digital domains is used to find the relationships between user permissions, and multi-view community detection methods are used to integrate information from different domains. Experiments on 3 simulated network scenarios demonstrate that RMMDI can capture the interdependent relationships between permissions and perform user-role mining more effectively and reasonably.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is supported by the grants from the National Key R&D Program of China (Project No. 2017YFB0802800).