Abstract

Leaderboards and other game elements are present in many online environments, not just in videogames. When such environments have relatively few users, implementing those leaderboards is not usually a problem; however, that is no longer the case when they have tens of thousands of users or more. For those situations we propose a method that is easy and cheap to implement. It relies on two data structures, a Self-Balanced Order-Statistic Tree (SBOST) and a hash table, to perform leaderboard calculations quickly and cheaply. More specifically, our proposal has $O(\log n)$ time complexity, whereas other approaches also based on in-memory data structures like linked lists have $O(n)$, and others based on Hard Disk Drive operations like a relational database have $O(n \log n)$. This improvement over the other approaches is corroborated with experimental results for several scenarios, also presented in this paper.

1. Introduction

The online games industry is in constant growth. According to NewZoo [1], in China and the US alone there were more than 1 billion online gamers in 2016, who together represented almost $48 billion of the $99.6 billion in worldwide revenues, up 8.5% compared to 2015. Just as an example, according to SteamCharts [2], Dota 2, a Multiplayer Online Battle Arena (MOBA) game for the Steam platform, had a peak of over a million players in January 2017. That number seems large, but it pales in comparison with League of Legends (LoL), another MOBA game for Microsoft Windows and Mac OS, which in January 2014 had up to 67 million monthly players, with 27 million playing daily [3].

Games themselves are not the only force to consider: other online environments are now adopting game-based features to improve users’ experiences. A clear example is gamification, that is, the use of game elements within nongame contexts, which has been applied in a great diversity of areas [4, 5] including education [6, 7], health [8], and marketing [9].

One particular element of game-based environments, and therefore of gamified environments, is leaderboards, a common mechanism to rank players according to their relative success. Leaderboards measure players against a particular criterion, usually the underlying score, and are thus indicators of progress that relate the player’s performance to the performance of others looking for intrinsic motivation [10–14].

Most games and gamified environments use leaderboards and, in order to implement them, as well as for solving persistence, a common solution is a database approach. A database, or more precisely a Database Management System (DBMS), generally stores data on the Hard Disk Drive (HDD) using certain data structures, typically $n$-ary trees of several types, which allow for fast insertion, deletion, and search, but not so much for ordering (referring specifically to the ORDER BY clause). When the number of players is low, that is, up to hundreds or a few thousand, this does not represent a problem for online environments. However, when they have tens or even hundreds of thousands of players, the response times become a major issue. As an anecdotal reference, that was exactly what motivated this research in the first place: we were working on a gamified learning environment called TICademia (https://ticademia.com) and, when the number of users was only about a thousand, the leaderboard based on a relational database approach worked just fine. However, when that number rose to several tens of thousands, many response time problems started to appear. In rush hours, with almost 25000 active students in the course “pre-calculus,” the response time for the leaderboard page was longer than a minute. It might not sound like too much for most users, but it certainly is for the more “enthusiastic.” Of course, such response time depends on the technology used in the online environment but, in any case, it would never escape the intrinsic time of the ORDER BY task.

A very similar situation happened to Applibot, one of the major social app providers in Japan. With popular games like Legend of the Cryptids and Gang Road, they were able to scale smoothly and handle the massively growing traffic, but they had trouble maintaining up-to-date player rankings, at least with their initial database approach [15].

An alternative, not necessarily to replace but to complement the database approach, is to store particular information in faster memory such as Random-Access Memory (RAM) and use efficient data structures to manipulate it. That is exactly what we propose: to use an order-statistic tree jointly with a hash table, both in RAM, to obtain considerably lower response times.

The rest of the paper is structured as follows. In Section 2, we present some related works. In Section 3, we describe our proposal and then in Section 4 we show and discuss some experimental results. Finally, in Section 5, we present the conclusions of this research.

2. Related Works

When we searched the scientific literature for how other researchers addressed the leaderboard implementation problem, we found three main obstacles. First, we did not find anything on games themselves, neither commercial nor of other kinds. That does not mean they do not deal with the problem. Our educated guess is that they are not interested in disclosing it, at least in that context. Second, what we did find were some works on gamified learning environments that use leaderboards, but most of them focused on motivational or learning outcomes, not on implementation details. Third, just a few of those works present validation scenarios with real users, and in all cases with a reduced number of them, dozens or hundreds at most.

In the case of ALEPS, for example, a gamified learning environment for physics problem solving, the leaderboard shows the top students based on the results of various game elements such as the score, levels, experience, and number of badges. Even if they did not explicitly state using a database approach for the leaderboard calculation, they did state using SQL Server as DBMS for manipulating all user data [16].

The same happened with a gamified learning environment for solving computer programming assignments. Here, they implemented two leaderboards, one that shows the overall score and one that shows the score for the current week. In this case, they mention that the system incorporates a leaderboard calculation service in the application layer. Again, they did not explicitly state the database approach, but they did report the use of Hibernate for data storage, an Object-Relational Mapping (ORM) library that supports multiple relational database systems, such as HSQLDB and MySQL [17].

Another example is a gamified online course for multimedia content production, implemented in the Moodle Learning Management System. In this case, they use a leaderboard to display all enrolled students sorted in descending order by level and then by experience points (XP). Because they used Moodle instead of creating their own system, they do not present any implementation details. However, looking into Moodle’s documentation, it turns out that it uses XMLDB, a library in the abstraction layer that lets Moodle interact with and access the database, which may be managed by several DBMSs like Postgres, MySQL, MSSQL, and Oracle [18].

Outside the scientific literature, it is possible to find some interesting works. In [15], for example, they rightly point out that maintaining a real-time leaderboard is not an easy task because (a) the game environment may have hundreds of thousands of players; (b) whenever a player fights enemies or performs other activities, their score changes; and (c) you want to show the latest ranking for the player, usually on a web page. They even contrast the simplicity of implementing a relational database approach with its poor performance as the scale grows. As a solution, they use an algorithm with a time complexity similar to that of the one we propose in the next section, $O(\log n)$. We tried to use this work as a reference in Section 4, but there were two main difficulties. Conceptually, they only use the player score for the ranking function so, unlike in our proposal, multiple players may have the same rank. Technically, its throughput is limited to about 300 updates/second due to its cloud architecture.

There are also other commercial approaches with similar time complexity, like Amazon ElastiCache for Redis, but they work only in the cloud and are not necessarily cheap.

3. Method Proposed

In order to handle a leaderboard, we first assume that players have at least two attributes, one related to their identifiers and another related to their scores. The identifier, which we will call id from now on, may be numeric or alphanumeric but must in any case be unique. The score may come from a single datum or from multiple data but must in any case be comparable. In particular, we assume that it is a single numeric value and that the higher, the better. These two attributes should be stored together in RAM in two different data structures: a Self-Balanced Order-Statistic Tree (SBOST) and a hash table [19–22]. In both cases, the two attributes are part of an object that we will call player. The same two attributes, in addition to all other relevant information like name, alias, avatar, and so on, should be stored separately in a database.

An SBOST is a particular kind of binary search tree. A binary search tree stores nodes, in this case of player type, and each node is linked to at most two subtrees, commonly denoted left and right. A node with no subtrees is called a leaf, and the unique top of the whole tree is called the root. Besides these features, a binary search tree must fulfill a condition: each node must be greater than all nodes in its left subtree, and not greater than any in the right subtree. There must be, of course, a way to determine which of two nodes is greater than the other. In our case, this comparison is made according to Algorithm 1.

Algorithm 1: Comparison of two players.
Compare (Player a, Player b)
(1) if a.score > b.score
(2)     return a
(3) else if a.score < b.score
(4)     return b
(5) else if a.id > b.id
(6)     return a
(7) else
(8)     return b
end compare

Notice that this algorithm returns the player with the higher score and, in case of a tie, it uses the higher id as the tie-breaking criterion. The reason for doing so is that we assume that a higher id means that the player started the game later, or at least checked in later, and so had less of an advantage. Of course, this criterion is completely subjective and may be altered according to the designer’s needs just by adjusting Algorithm 1. For instance, it might use extra information about the player, like time spent on the game or time since the last login. It might also use more information about the score in case it refers to multiple data instead of a single datum.
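As an illustration, the comparison just described could be expressed in Java roughly as follows. This is only a sketch: the class names Player and PlayerOrder, the use of int for both attributes, and the comparator-based formulation are assumptions made for the example, not part of the original implementation.

```java
import java.util.Comparator;

/** Hypothetical player record: only the two attributes the text assumes. */
final class Player {
    final int id;
    final int score;

    Player(int id, int score) {
        this.id = id;
        this.score = score;
    }
}

final class PlayerOrder {
    /** Orders players by score, breaking ties by id (the higher id is "greater"). */
    static final Comparator<Player> BY_SCORE_THEN_ID =
            Comparator.comparingInt((Player p) -> p.score)
                      .thenComparingInt(p -> p.id);

    /** Mirrors Algorithm 1: returns the greater of the two players. */
    static Player compare(Player a, Player b) {
        return BY_SCORE_THEN_ID.compare(a, b) > 0 ? a : b;
    }
}
```

Changing the tie-breaking rule then amounts to replacing the second key of the comparator, which is the kind of adjustment the paragraph above refers to.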

To be balanced, a binary search tree must fulfill another condition: the difference between the heights of the left and right subtrees of any node must be at most 1. The height of a subtree is the maximum number of jumps between the root of the subtree and its deepest leaf. Being balanced has a critical repercussion: considering the binary layout of the tree, as well as the relation of each node with its two subtrees, all the basic operations (insertion, deletion, and search) can be achieved in $O(\log n)$ time complexity, where $n$ is the number of elements stored.

For self-balancing there are several alternatives, two of the most popular being the Adelson-Velsky and Landis (AVL) tree and the Red-Black Tree. In the latter, formerly known as the symmetric binary B-tree [23], each node has an extra bit which is often interpreted as the color (red or black) of the node, which explains its name. Whatever the particular algorithms used, these color bits ensure that the tree remains balanced during insertions and deletions without affecting the complexity.

Finally, being an order-statistic tree means that it supports two additional operations beyond the three mentioned above: selection and ranking. The first one refers to finding the $i$th smallest element stored in the tree, whereas the second one refers to the opposite, finding the rank, or position in a linear order, of a given element in the tree. For our proposal, we are only interested in the ranking operation. Nevertheless, both can also be performed in $O(\log n)$ when a self-balancing tree is used. To do so, however, all nodes must store one additional attribute, which is the size of the subtree starting at that node, that is, the number of nodes below and including it. All operations that modify the tree (insertion and deletion, knowing that update may be implemented as a composition of the two) must take this attribute into account and preserve the following relation for every node $x$, without altering the time complexity:

$$x.\mathit{size} = x.\mathit{left}.\mathit{size} + x.\mathit{right}.\mathit{size} + 1$$

An example of an order-statistic tree, using the comparison between nodes according to Algorithm 1, is presented in Figure 1.
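To make the role of the size attribute more concrete, the following Java sketch shows how it might be kept consistent during a left rotation, the kind of local restructuring that self-balancing trees rely on. The names SizedNode and SizeMaintenance are hypothetical, a single int key stands in for the player, and color handling and parent links are deliberately left out.

```java
/** Hypothetical tree node carrying the extra size attribute. */
final class SizedNode {
    final int key;
    int size = 1;              // a new node is a subtree of one element
    SizedNode left, right;

    SizedNode(int key) {
        this.key = key;
    }
}

final class SizeMaintenance {
    /** Size of a possibly empty subtree; an empty subtree counts as 0. */
    static int size(SizedNode n) {
        return n == null ? 0 : n.size;
    }

    /** Re-establishes size = left size + right size + 1 for one node. */
    static void update(SizedNode x) {
        x.size = size(x.left) + size(x.right) + 1;
    }

    /** Left rotation around x; returns the new root of the rotated subtree. */
    static SizedNode rotateLeft(SizedNode x) {
        SizedNode y = x.right;
        x.right = y.left;
        y.left = x;
        update(x);   // x is now the lower node, so fix it first
        update(y);   // then the new subtree root
        return y;
    }
}
```

Only the two nodes involved in the rotation need their sizes recomputed, so the extra bookkeeping is constant work per rotation.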

Considering the previous structure, we can compute the ranking operation as presented in Algorithm 2. Notice that, besides the left and right attributes, each node contains a reference to its parent. All nodes have a parent except for the root. Here, we assume that search() returns the node of the player we are looking for and that the size of a null (empty) subtree is 0.

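Algorithm 2: Ranking of a player.
ranking (Integer score, Integer id)
(1) Let x be a node
(2) x = search(score, id)
(3) if x = null
(4)     return −1
(5) end if
(6) r = x.right.size + 1
(7) y = x
(8) while y ≠ root:
(9)     if y = y.parent.left:
(10)        r = r + y.parent.right.size + 1
(11)    end if
(12)    y = y.parent
(13) end while
(14) return r
end ranking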

To demonstrate that Algorithm 2 works correctly, we may think of node x’s rank as the number of nodes preceding x in a backward in-order tree walk (greatest node first), plus 1 for x itself. Then, Algorithm 2 maintains the following loop invariant.

At the start of each iteration of the while loop of lines (8)–(13), r is the rank of x in the subtree rooted at node y.

We use this loop invariant to show that the algorithm works correctly as follows [19].

Initialization. Prior to the first iteration, line (6) sets r to be the rank of player x within the subtree rooted at x. Setting y = x in line (7) makes the invariant true the first time the test in line (8) executes.

Maintenance. At the end of each iteration of the while loop, we set y = y.parent. Thus we must show that if r is the rank of x in the subtree rooted at y at the start of the loop body, then r is the rank of x in the subtree rooted at y.parent at the end of the loop body. In each iteration of the while loop, we consider the subtree rooted at y.parent. We have already counted the number of nodes in the subtree rooted at node y that precede x in the walk, and so we must add the nodes in the subtree rooted at y’s sibling that precede x, plus 1 for y.parent if it, too, precedes x. If y is a right child, then neither y.parent nor any node in y.parent’s left subtree precedes x, and so we leave r alone. Otherwise, y is a left child and all the nodes in y.parent’s right subtree precede x, as does y.parent itself. Thus, in line (10), we add y.parent.right.size + 1 to the current value of r.

Termination. The loop terminates when y = root, so that the subtree rooted at y is the entire tree. Thus, the value of r is the rank of x in the entire tree.

If the tree is self-balanced, we have already discussed that insertion, deletion, and search operations are performed in $O(\log n)$. Notice that, in the previous algorithm, once the node is found it goes up to the root one level at a time, which means that the running time is proportional to the tree height. That is why the ranking operation also runs in $O(\log n)$.
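The bottom-up ranking procedure can be sketched in Java as shown next. The class RankTree and its method names are hypothetical, and the tree here is a plain binary search tree with no rebalancing, so the logarithmic bound only holds when the tree happens to be balanced; a Red-Black scheme would additionally have to update sizes and parent links in its rotations. Ids are assumed to be unique.

```java
/** Minimal order-statistic tree sketch: sizes and parent links, no rebalancing. */
final class RankTree {
    static final class Node {
        final int score, id;
        int size = 1;
        Node left, right, parent;
        Node(int score, int id) { this.score = score; this.id = id; }
    }

    private Node root;

    private static int size(Node n) { return n == null ? 0 : n.size; }

    /** Negative if (score, id) orders below n, positive if above, 0 if equal. */
    private static int cmp(int score, int id, Node n) {
        return score != n.score ? Integer.compare(score, n.score)
                                : Integer.compare(id, n.id);
    }

    /** Plain BST insertion that bumps the size of every node on the path. */
    void insert(int score, int id) {
        Node fresh = new Node(score, id);
        if (root == null) { root = fresh; return; }
        Node cur = root;
        while (true) {
            cur.size++;
            if (cmp(score, id, cur) < 0) {
                if (cur.left == null) { cur.left = fresh; break; }
                cur = cur.left;
            } else {
                if (cur.right == null) { cur.right = fresh; break; }
                cur = cur.right;
            }
        }
        fresh.parent = cur;
    }

    private Node search(int score, int id) {
        Node cur = root;
        while (cur != null) {
            int c = cmp(score, id, cur);
            if (c == 0) return cur;
            cur = c < 0 ? cur.left : cur.right;
        }
        return null;
    }

    /** Position from the top (1 = best), or -1 if the player is absent. */
    int ranking(int score, int id) {
        Node x = search(score, id);
        if (x == null) return -1;
        int r = size(x.right) + 1;               // greater nodes below x, plus x
        for (Node y = x; y != root; y = y.parent) {
            if (y == y.parent.left) {            // parent and its right subtree rank above x
                r += size(y.parent.right) + 1;
            }
        }
        return r;
    }
}
```

The loop visits one node per level between the player and the root, which is where the dependence on tree height comes from.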

This ranking operation requires both attributes, score and id. In a typical situation, however, we would have only the id of the player, and that is where the hash table enters the scene. A hash table is a data structure that maps keys to values using a hash function and is able to do so in $O(1)$ amortized time. In this case, the key is the id and the value is the corresponding score.

If these two data structures are stored in the RAM of the environment server and contain the relevant information of all players, a composite task such as “increment the score of the player with id 1002 by 50 points and determine how many positions he/she gained and the corresponding final position in the leaderboard” can be carried out with the five steps of Algorithm 3.

Steps 1 and 5 have $O(1)$ time complexity, whereas steps 2, 3, and 4 have $O(\log n)$; therefore, the entire operation requires $O(\log n)$. Of course, in a real game environment, some of these data should also be stored in the database, which has a cost of its own. We are not discarding that task; rather, we perform the most “expensive” operations, that is, the ordering and therefore the ranking, far more efficiently.

In order to clarify all the algorithms presented so far, consider the following hypothetical situation. There are six players with scores and ids as shown in Figure 1. Then, as described in Algorithm 3, the player with id 1002 increments its score by 50 points. Before the increase, and considering the comparison criterion described in Algorithm 1, the corresponding leaderboard would stand as presented in Table 1.

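Algorithm 3: Score update with position change.
Step 1. Search for the score of the player id in the hash table
Step 2. Use ranking to obtain the current position
Step 3. Update the score for player id in both structures, tree and hash table
Step 4. Use ranking to obtain the new position
Step 5. Return the number of positions gained (current position minus new position) and the new position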
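The five steps above might be put together in Java along the following lines. This is a hedged sketch rather than the implementation evaluated later: LeaderboardSketch, addPlayer, and addScore are hypothetical names, the tree is again an unbalanced stand-in for the Red-Black Tree, the update is done as a deletion followed by an insertion, and the rank is computed top-down instead of bottom-up (which yields the same position). It also assumes the id passed to addScore already exists.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of the update flow: hash table for id -> score, sized BST for ranks. */
final class LeaderboardSketch {
    private static final class Node {
        int score, id, size = 1;
        Node left, right;
        Node(int score, int id) { this.score = score; this.id = id; }
    }

    private final Map<Integer, Integer> scoreById = new HashMap<>();
    private Node root;

    private static int size(Node n) { return n == null ? 0 : n.size; }

    private static int cmp(int score, int id, Node n) {
        return score != n.score ? Integer.compare(score, n.score)
                                : Integer.compare(id, n.id);
    }

    private static Node insert(Node n, int score, int id) {
        if (n == null) return new Node(score, id);
        if (cmp(score, id, n) < 0) n.left = insert(n.left, score, id);
        else n.right = insert(n.right, score, id);
        n.size = size(n.left) + size(n.right) + 1;
        return n;
    }

    private static Node delete(Node n, int score, int id) {
        if (n == null) return null;
        int c = cmp(score, id, n);
        if (c < 0) n.left = delete(n.left, score, id);
        else if (c > 0) n.right = delete(n.right, score, id);
        else {
            if (n.left == null) return n.right;
            if (n.right == null) return n.left;
            Node next = n.right;                      // in-order successor
            while (next.left != null) next = next.left;
            n.score = next.score;
            n.id = next.id;
            n.right = delete(n.right, next.score, next.id);
        }
        n.size = size(n.left) + size(n.right) + 1;
        return n;
    }

    /** Position counted from the top (1 = best); -1 if the player is absent. */
    private int ranking(int score, int id) {
        int r = 0;
        for (Node n = root; n != null; ) {
            int c = cmp(score, id, n);
            if (c == 0) return r + size(n.right) + 1;
            if (c < 0) { r += size(n.right) + 1; n = n.left; }
            else n = n.right;
        }
        return -1;
    }

    void addPlayer(int id, int score) {
        scoreById.put(id, score);
        root = insert(root, score, id);
    }

    /** Returns {positions gained, new position}; assumes the id is registered. */
    int[] addScore(int id, int delta) {
        int oldScore = scoreById.get(id);                  // Step 1
        int oldPosition = ranking(oldScore, id);           // Step 2
        int newScore = oldScore + delta;                   // Step 3
        root = insert(delete(root, oldScore, id), newScore, id);
        scoreById.put(id, newScore);
        int newPosition = ranking(newScore, id);           // Step 4
        return new int[] {oldPosition - newPosition, newPosition}; // Step 5
    }
}
```

With six made-up players scoring 600, 500, 340, 300, 250, and 200 (not necessarily the values of Figure 1), where player 1002 holds the 300, calling addScore(1002, 50) yields one position gained and a new position of 3.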

Once the increment has been applied, the corresponding order-statistic tree stands as shown in Figure 2, whereas the corresponding leaderboard changes as presented in Table 2.

Notice that player 1002 moved up one position in the leaderboard: according to Algorithm 2, it goes from position 4 to position 3. An important clarification is in order, however. The ranking operation returns the current position of a player given its id and score (the score may be obtained from the id with the hash table), but there is no need to run it $n$ times to get the whole leaderboard. Instead, taking advantage of the binary search structure of the tree, it is possible to just traverse the tree in backward “in-order” as in Algorithm 4.

Algorithm 4: Backward in-order traversal.
Step 1. Set the root as the current node
Step 2. Check if the current node is null; if not, proceed to step 3, otherwise return
Step 3. Traverse the right subtree by recursively calling the in-order function
Step 4. Display the information of the current node
Step 5. Traverse the left subtree by recursively calling the in-order function
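A recursive Java version of this backward in-order traversal could look as follows. The class ReverseInOrder and its Node type are hypothetical, and "displaying" a node is represented by appending an "id:score" string to a list so that the result can be inspected.

```java
import java.util.ArrayList;
import java.util.List;

final class ReverseInOrder {
    /** Hypothetical node type; only the fields the traversal needs. */
    static final class Node {
        final int score, id;
        Node left, right;
        Node(int score, int id) { this.score = score; this.id = id; }
    }

    /** Collects "id:score" entries from best to worst. */
    static void traverse(Node current, List<String> out) {
        if (current == null) return;                 // Step 2: nothing to visit
        traverse(current.right, out);                // Step 3: greater players first
        out.add(current.id + ":" + current.score);   // Step 4: "display" the node
        traverse(current.left, out);                 // Step 5: then the lesser ones
    }

    static List<String> leaderboard(Node root) {
        List<String> out = new ArrayList<>();
        traverse(root, out);                         // Step 1: start at the root
        return out;
    }
}
```

Each node is visited exactly once, so producing the full leaderboard this way takes time linear in the number of players rather than one ranking call per player.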

4. Experimental Results

To validate the proposed method, we compared it against two other approaches. The first of those approaches, as we stated earlier, corresponds to the typical solution used in most online game-based environments, which is a database. More specifically, we use a relational database, assuming that all relevant information, that is, each player’s score and id, is stored in a single table, so no additional operations like joins are required. As the DBMS we used PostgreSQL 9.6. As the second approach, we used a linked list data structure running in RAM. Like the proposed method, it does not require HDD operations. However, it differs in several aspects. From the technical point of view, its implementation is a lot easier. In fact, just a few lines of Java code are needed, considering that a native class LinkedList is available, including methods for insertion, deletion, update, and search. From the algorithmic point of view, the time complexities of the required operations are entirely different. The native insertion is $O(1)$ if made at the beginning or end of the structure. However, with a few modifications, it can be done in order in $O(n)$. When doing so, the ranking operation can also take place in $O(n)$, as can deletion and search.
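For reference, an ordered linked-list baseline of the kind described above might be sketched as follows. This is an illustrative assumption about how such a baseline could look, not the code used in the experiments: the class LinkedListBoard and its methods are hypothetical, and each entry is stored as a two-element int array holding the score and the id.

```java
import java.util.LinkedList;
import java.util.ListIterator;

/** Sketch of the linked-list baseline: entries kept sorted from best to worst. */
final class LinkedListBoard {
    private final LinkedList<int[]> entries = new LinkedList<>(); // {score, id}

    /** True if (score, id) ranks above e, using the tie-break of Algorithm 1. */
    private static boolean ranksAbove(int score, int id, int[] e) {
        return score != e[0] ? score > e[0] : id > e[1];
    }

    /** Ordered insertion: walks the list until the right slot is found. */
    void insert(int score, int id) {
        ListIterator<int[]> it = entries.listIterator();
        while (it.hasNext()) {
            if (ranksAbove(score, id, it.next())) {
                it.previous();   // step back so the new entry goes before it
                break;
            }
        }
        it.add(new int[] {score, id});
    }

    /** Position from the top (1 = best) by linear scan, or -1 if absent. */
    int ranking(int id) {
        int position = 1;
        for (int[] e : entries) {
            if (e[1] == id) return position;
            position++;
        }
        return -1;
    }

    /** Removes the player's entry and re-inserts it with the new score. */
    void update(int id, int newScore) {
        entries.removeIf(e -> e[1] == id);
        insert(newScore, id);
    }
}
```

Every operation here walks the list from its head, which is the source of the linear cost per query mentioned in the text.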

We also implemented the proposed method in Java. For the hash table we used native classes, but we implemented the SBOST from scratch. More specifically, we used a Red-Black Tree scheme to achieve self-balancing of the binary search tree and performed the SBOST operations according to the algorithms described in the previous section.

To compare the three approaches, we arranged a scenario in which there are $n$ players, each one with a unique id and an initial score. After that, there are up to $n$ queries. As in the example presented in the previous section, a query refers to the increment of an individual player’s score, expecting as a result the corresponding new position in the leaderboard, as well as the number of positions gained. With the aim of determining the scalability of each alternative, we used random values for the queries, with $n$ taking the values 1,000, 2,000, 5,000, 10,000, 20,000, 50,000, and 100,000. For a more robust statistical comparison, we ran each case at least ten times and present the corresponding mean. All runs were made under the same conditions and on the same equipment: Java SE 1.8 with Eclipse 4.5.0 on an Intel Core i7-4710HQ at 2.50 GHz, 8 GB RAM, and 64-bit Windows 8.1.

The results obtained are presented in Table 3. Even with the smallest input size, there is a considerable difference between the “in-memory” approaches and the HDD-based one, that is, between the data structures running in RAM and the database solution. The difference between the SBOST and the linked list is not too large at the beginning but, as the input size grows, it becomes bigger and bigger. With $n = 100{,}000$, the SBOST solution is almost 1,000 : 1 faster than the linked list and 15,000 : 1 faster than the database.

To visualize these results, and particularly their relation with the input size, Figure 3 presents them as a chart. For the SBOST solution, this chart bears out that the running time exhibits $O(n \log n)$ behavior, whereas the linked list exhibits $O(n^2)$ behavior and the database $O(n^2 \log n)$ behavior. Even though the shapes in the three cases seem similar, the scales show how different they are. In fact, when running multiple linear regressions on the outcomes of the three solutions, the results presented in Table 4 are obtained, whereas the dots in Figure 3 represent the predicted values. In other words, there is empirical evidence for the theoretical time complexity of the three approaches evaluated, including that of our proposal.

Using those models to extrapolate running times to an input size of $n = 1$ million (nothing too outlandish considering the game-related numbers presented at the very beginning of the introduction), the SBOST would take nearly 10 seconds, whereas the linked list solution would take approximately 28 hours and the database 19 days.
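As a rough consistency check, the extrapolated times can be converted to seconds and divided by the SBOST time. The figures below use only the rounded values quoted in the text, so the resulting ratios are indicative orders of magnitude rather than exact speedups:

```latex
\begin{align*}
28\ \text{h} &= 28 \times 3600\ \text{s} = 100{,}800\ \text{s},
  & \frac{100{,}800\ \text{s}}{10\ \text{s}} &\approx 1.0 \times 10^{4},\\
19\ \text{days} &= 19 \times 86{,}400\ \text{s} = 1{,}641{,}600\ \text{s},
  & \frac{1{,}641{,}600\ \text{s}}{10\ \text{s}} &\approx 1.6 \times 10^{5}.
\end{align*}
```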

5. Concluding Remarks

Leaderboards are a common element of both online games and game-based environments. Even though there is a lot of evidence of their psychological effects on users, this research did not focus on that aspect. Instead, given their importance, we proposed an efficient way to implement them.

Our method has three main features. First, it runs “in-memory,” so it exploits fast data access, unlike slower HDD solutions. Although this might sound like a disadvantage as well, because the reduced space may limit how many users you can have, it is not so problematic considering that in its basic form only the user score and its identifier are needed. For instance, if a four-byte unsigned integer is used for both attributes, a scenario with 100,000 players would require 800,000 bytes, which is less than 1 MB. Second, it uses specific data structures, so no sorting at all is actually needed to obtain players’ positions. More specifically, it uses an SBOST jointly with a hash table, which allows all important operations to be performed in $O(\log n)$ time complexity. The SBOST was implemented from a Red-Black Tree, but other alternatives for Self-Balanced Binary Search Trees could be adopted as well. Third, the comparison criterion, which ultimately defines the rank of a player, may be easily modified to suit the designer’s needs. For instance, it could incorporate more information about the player, rather than just a single score and an identifier.

From the algorithmic point of view, the proposal surpasses typical solutions such as the ones based on databases, as well as other simpler “in-memory” alternatives such as ordered linked lists. As presented in the experimental results section, we achieved speedups in all the scenarios we tested. In fact, the more difficult the scenario, the higher the speedup. For example, the speedup with an input size of $n = 100{,}000$ was nearly 1,000 : 1 and 15,000 : 1 compared to the other two approaches presented. With the forecast coming from a multiple linear regression with $n = 1$ million (actually running such a scenario would be impractical), the corresponding speedups would be nearly as large as 10,000 : 1 and 160,000 : 1. This finding is very relevant in massive environments where tens or even hundreds of thousands of users are common.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.