Complexity

Research Article

Discovering Organizational Hierarchy through a Corporate Ranking Algorithm: The Enron Case

Algorithm 1: The CorpRank algorithm.

	Input: a set of corporate emails with $n$ individual accounts.
(1)	Build an undirected graph $G = (V, E)$, where $V$ is the set of vertices that represents the email accounts, $E$ is the set of edges, and $e_{ij}$ is the edge between vertices $v_i$ and $v_j$ that have exchanged at least $\tau$ emails. The value of the edge $e_{ij}$ is the number of emails exchanged between $v_i$ and $v_j$.
(2)	Find all maximal cliques (complete subgraphs) using a recursive algorithm such as the Bron-Kerbosch Algorithm 457 [62].
(3)	Calculate the adjacency matrix $A$ and the geodesic distance matrix $D$ (the matrix of all shortest paths between every pair of vertices) for $G$. $a_{ij}$ and $d_{ij}$ are the elements of $A$ and $D$, respectively. The mean of all the distances is $\bar{d}$.
(4)	The following features are calculated for each vertex $v_i \in V$:
Number of emails (e-mail): total number of emails sent and received.
Average response time (AvgTime): average amount of time elapsed between every email sent from $v_i$ to any other account $v_j$ and the next email received by $v_i$ from account $v_j$.
Number of responses (NResponse): sum of all the responses to emails sent by $v_i$ to any other account $v_j$.
Number of cliques (Clique): number of cliques that contain $v_i$.
Raw clique score (RCS): $2^{n_c - 1}$, where $n_c$ is the number of users in the clique.
Weighted clique score (WCS): raw clique score weighted by the “importance” of $v_i$ according to its average response time $t_i$.
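Degree centrality (Degree): $\deg(v_i) = \sum_{j} a_{ij}$, where $a_{ij}$ is an element of the adjacency matrix.
Betweenness centrality (Betweenness): $B(v_i) = \sum_{k} \sum_{j} g_{kij} / g_{kj}$, where $g_{kij}$ is the number of geodesic paths between vertices $k$ and $j$ that include vertex $i$, and $g_{kj}$ is the number of geodesic paths between $k$ and $j$ [63].
Clustering coefficient (CC): $CC_i = \dfrac{2\,|\{e_{jk}\}|}{\deg(v_i)\,(\deg(v_i) - 1)}$, where $v_j, v_k \in M_i$ and $e_{jk} \in E$. Each vertex $v_i$ has a neighborhood $M_i$ defined by its immediately connected neighbors: $M_i = \{v_j : e_{ij} \in E\}$.
Average distance (AvgDistance): mean of the shortest path length from a specific vertex $v_i$ to all vertices in the graph $G$: $\bar{d}_i = \frac{1}{n} \sum_{j} d_{ij}$, where $d_{ij}$ is an element of the geodesic distance matrix, and $n$ is the number of vertices in $G$.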
“Hubs-and-authorities” importance (Hubs): calculated with a recursive algorithm as proposed by Kleinberg [64]. A “hub” is a vertex that points to many authorities, and an “authority” is a vertex that is pointed to by many hubs.
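(5)	Each feature $x$ is mapped to a [0, 100] scale and weighted with the following formula: $w_x \cdot C_{x,i} = w_x \cdot 100 \cdot \dfrac{x_i - \inf x}{\sup x - \inf x}$, where $x_i$ is the value of the feature $x$ for $v_i$, and $w_x$ is the weight for the feature $x$; the supremum and infimum are computed across all $v_i \in V$.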
(6)	Run a principal component analysis (PCA) on all features and select the principal components that explain at least 80% of the variance of the dataset. The weight $w_x$ for each feature $x$ is its normalized contribution to the variation of the selected principal components as follows: $w_x = \dfrac{\sum_{p} c_{xp} \lambda_p}{\sum_{x} \sum_{p} c_{xp} \lambda_p}$, where $c_{xp}$ is the contribution of the feature $x$ to explaining the variation of principal component $p$, and $\lambda_p$ is the eigenvalue of the principal component $p$.
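(7)	The CorpRank score, a ranking score between 0 and 100, is obtained for $v_i$ as a weighted sum of the indicators: $\mathrm{CorpRank}_i = \dfrac{\sum_{x} w_x \cdot C_{x,i}}{\sum_{x} w_x}$, where $C_{x,i}$ is the value of feature $x$ for $v_i$ on the [0, 100] scale and $w_x$ is the weight of feature $x$.
Output: CorpRank score for each account $v_i$.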
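
Steps (1) and (2), together with the two clique-based counts from step (4), can be illustrated with a minimal Python sketch. It is not the authors' implementation. The input format (a list of sender/recipient pairs), the function names, and the `min_emails` parameter are hypothetical. The recursion is the basic Bron-Kerbosch scheme without pivoting. The exponential form of the raw clique score, and its summation over every clique that contains an account, are assumptions made for illustration.

```python
from collections import Counter


def build_graph(messages, min_emails=1):
    """Build an undirected graph from (sender, recipient) pairs.

    Hypothetical input format. An edge is kept only if the two accounts
    exchanged at least `min_emails` messages, counting both directions.
    Returns (adjacency, weights): neighbor sets and per-edge email counts.
    """
    counts = Counter()
    for sender, recipient in messages:
        if sender != recipient:
            counts[frozenset((sender, recipient))] += 1
    adjacency, weights = {}, {}
    for pair, count in counts.items():
        if count >= min_emails:
            a, b = tuple(pair)
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
            weights[pair] = count
    return adjacency, weights


def maximal_cliques(adjacency):
    """Enumerate all maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(frozenset(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adjacency[v], excluded & adjacency[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adjacency), set())
    return cliques


def clique_features(cliques):
    """Per-account clique count and raw clique score.

    Assumption: the raw score adds 2 ** (clique size - 1) for every
    clique that contains the account.
    """
    n_cliques, raw_score = Counter(), Counter()
    for clique in cliques:
        for v in clique:
            n_cliques[v] += 1
            raw_score[v] += 2 ** (len(clique) - 1)
    return n_cliques, raw_score
```

Accounts that have no edge above the threshold do not appear in the graph at all, and two-account cliques (single edges) are counted like any other clique. Both are simplifications of this sketch.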
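
Three of the structural features in step (4), namely degree, clustering coefficient, and average distance, can be computed from the graph with breadth-first search. The sketch below assumes the graph is given as a dictionary of neighbor sets (a hypothetical representation) and that it is connected. It measures distances in hops and averages them over all vertices, including the zero distance from a vertex to itself. Betweenness and the hubs-and-authorities score are left out for brevity.

```python
from collections import deque
from itertools import combinations


def geodesic_distances(adjacency, source):
    """Breadth-first search: hop counts from `source` to each reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adjacency[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def structural_features(adjacency):
    """Degree, clustering coefficient, and average distance per vertex."""
    n = len(adjacency)
    features = {}
    for v, neighbors in adjacency.items():
        degree = len(neighbors)
        # Number of edges that link two neighbors of v.
        links = sum(1 for a, b in combinations(neighbors, 2) if b in adjacency[a])
        possible = degree * (degree - 1) / 2
        clustering = links / possible if possible else 0.0
        # Assumption: connected graph, so every vertex is reachable from v.
        # Unreachable vertices would be silently ignored here.
        dist = geodesic_distances(adjacency, v)
        avg_distance = sum(dist.values()) / n
        features[v] = {
            "Degree": degree,
            "CC": clustering,
            "AvgDistance": avg_distance,
        }
    return features
```

For a triangle a, b, c with a pendant vertex d attached to c, vertex c has degree 3 and a clustering coefficient of 1/3, because only one of the three possible links among its neighbors exists.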
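
Steps (5) and (7) amount to min-max scaling followed by a weighted combination. The sketch below takes the feature weights as given instead of deriving them from PCA as step (6) does. The function names and the dictionary layout are hypothetical, and dividing by the total weight is an assumption that keeps the result between 0 and 100 even when the weights do not sum to one.

```python
def scale_feature(values):
    """Min-max scale a {account: value} mapping to the [0, 100] range."""
    low, high = min(values.values()), max(values.values())
    if high == low:
        # Assumption: a constant feature carries no ranking information.
        return {account: 0.0 for account in values}
    return {
        account: 100.0 * (value - low) / (high - low)
        for account, value in values.items()
    }


def weighted_scores(features, weights):
    """Combine scaled features into one score per account.

    features: {feature name: {account: raw value}}; every feature is
              assumed to cover the same set of accounts.
    weights:  {feature name: non-negative weight}, supplied by the caller.
    """
    scaled = {name: scale_feature(values) for name, values in features.items()}
    total_weight = sum(weights[name] for name in scaled)
    accounts = next(iter(scaled.values())).keys()
    return {
        account: sum(weights[name] * scaled[name][account] for name in scaled)
        / total_weight
        for account in accounts
    }


# Illustrative values only; these are not results from the Enron data.
example_features = {
    "Degree": {"a": 2, "b": 2, "c": 3, "d": 1},
    "Clique": {"a": 1, "b": 1, "c": 2, "d": 1},
}
example_weights = {"Degree": 0.6, "Clique": 0.4}
example_scores = weighted_scores(example_features, example_weights)
```

In this toy example account c has the highest value on both features and account d the lowest, so they receive the maximum and minimum scores respectively.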