#### Abstract

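I present a new algorithm for computing binomial coefficients modulo $2^N$. The proposed method has an $O(N^3 \cdot \mathrm{Mult}(N) + N^4)$ preprocessing time, after which a binomial coefficient $C(P, Q)$ with $0 \le Q \le P \le 2^N - 1$ can be computed modulo $2^N$ in $O(N^2 \cdot \log N \cdot \mathrm{Mult}(N))$ time. $\mathrm{Mult}(N)$ denotes the time complexity of multiplying two $N$-bit numbers, which can range from $O(N^2)$ to $O(N \cdot \log N \cdot \log\log N)$ or better. Thus, the overall time complexity for evaluating $M$ binomial coefficients $C(P, Q)$ modulo $2^N$ with $0 \le Q \le P \le 2^N - 1$ is $O((N^3 + M \cdot N^2 \cdot \log N) \cdot \mathrm{Mult}(N) + N^4)$. After preprocessing, we can actually compute binomial coefficients modulo any $2^R$ with $R \le N$. For larger values of $P$ and $Q$, variations of Lucas’ theorem must be used first in order to reduce the computation to the evaluation of multiple binomial coefficients $C(P', Q')$ (or restricted types of factorials $P'!$) modulo $2^N$ with $0 \le Q' \le P' \le 2^N - 1$.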

#### 1. Introduction

In this paper I present a novel efficient algorithm for computing binomial coefficients modulo $2^N$. The definition of the binomial coefficient is the usual one:

$$C(P, Q) = \frac{P!}{Q!\,(P-Q)!}, \quad 0 \le Q \le P. \tag{1}$$

We will mainly consider the case where $0 \le Q \le P \le 2^N - 1$, and after fully handling this case, we will discuss how to compute $C(P, Q)$ modulo $2^N$ for $P \ge 2^N$.

The presented algorithm consists of a preprocessing stage which takes $O(N^3 \cdot \mathrm{Mult}(N) + N^4)$ time, where $\mathrm{Mult}(N)$ is the time complexity for multiplying two $N$-bit numbers. $\mathrm{Mult}(N)$ can range from $O(N^2)$ (the naive multiplication algorithm) to $O(N \cdot \log N \cdot \log\log N)$ or slightly better [1, 2].

After the preprocessing stage, a binomial coefficient $C(P, Q)$ (with $0 \le Q \le P \le 2^N - 1$) can be computed modulo $2^N$ in $O(N^2 \cdot \log N \cdot \mathrm{Mult}(N))$ time. Of course, the presented algorithm can evaluate a binomial coefficient modulo any $2^R$ with $R \le N$ (after computing it modulo $2^N$, we only need to keep the $R$ least significant bits out of the $N$ computed bits).

The rest of this paper is structured as follows. In Sections 2–6 I present the steps of the preprocessing stage. In Sections 7–9 I present the algorithm which computes the binomial coefficients modulo , considering that the preprocessing stage was completed. In Section 10 I discuss related work, and in Section 11 I conclude and discuss future work. I will summarize below the contents of each section describing the preprocessing steps and the algorithm itself.

The preprocessing stage consists of 5 steps. The first step consists of computing all the *small* binomial coefficients $C(P, Q)$. The term *small* refers to the values of $P$ and $Q$. A binomial coefficient $C(P, Q)$ is defined to be *small* if $0 \le Q \le P \le N$. This step is presented in Section 2. There are $O(N^2)$ *small* binomial coefficients, and we can compute all of them with only $O(N^2)$ additions of pairs of $N$-bit numbers.

The second preprocessing step (presented in Section 3) consists of computing a set of *large* binomial coefficients. All the binomial coefficients of the form , with , are defined to be *large*. There are such binomial coefficients and all of them can be computed in time overall. This step requires the largest amount of memory (among all the preprocessing steps) in order to store the large binomial coefficients.

In Section 4 I present the third step of the preprocessing stage, which consists of computing power sums of consecutive numbers (where the number of such numbers is a power of 2). There are power sums which can all be computed in time.

The fourth preprocessing step (presented in Section 5) consists of computing the sums of the products of the elements of all the subsets of a given size of a set consisting of the first positive integer numbers. In order to achieve this goal, inclusion-exclusion-based equations from the theory of elementary symmetric functions and polynomials are used. There are only values being computed, but it takes time to compute them. This step is the performance bottleneck step in the preprocessing stage.

Finally, in Section 6 I present the last step of the preprocessing stage, computing the sums of the products of the elements of all the odd-element subsets of a given size of a set consisting of the first positive integer numbers. Like in the previous case there are values which need to be computed during this step.

In Section 7 I will show how to efficiently find the largest odd divisor (modulo ) of (for ). In Section 8 I will present the actual algorithm for computing binomial coefficients modulo (for ), and in Section 9 I will discuss extensions to the case .

Note that by precomputing all the mentioned values, we are capable of achieving a running time of for computing a single binomial coefficient (with ). When computing binomial coefficients we achieve a running time of . Since computing a binomial coefficient requires the values from the preprocessing stage, the time complexity would increase significantly if we had to compute all those values for each binomial coefficient (instead of computing them only once and then reusing them for each binomial coefficient).

#### 2. Computing Small Binomial Coefficients: for

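The first step of the preprocessing algorithm consists of computing the binomial coefficients $C(i, j)$ for *small* values of $i$ and $j$. This can be achieved easily with $O(N^2)$ $N$-bit additions ($O(N^3)$ time overall, as an addition takes $O(N)$ time). We have

$$C(i, 0) = C(i, i) = 1. \tag{2}$$

For $0 < j < i$, we use the well-known formula:

$$C(i, j) \equiv C(i-1, j-1) + C(i-1, j) \pmod{2^N}. \tag{3}$$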
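Assuming the well-known formula referred to here is Pascal's rule, this step can be sketched as follows. The function name `small_binomials` and the parameters `n_bits` (the exponent of the power-of-two modulus) and `limit` (the largest row index treated as small) are illustrative choices, not names from the paper.

```python
def small_binomials(n_bits, limit):
    """Return a table c with c[i][j] = C(i, j) mod 2**n_bits
    for 0 <= j <= i <= limit; entries with j > i stay 0."""
    mask = (1 << n_bits) - 1  # reducing modulo 2**n_bits is a bit mask
    c = [[0] * (limit + 1) for _ in range(limit + 1)]
    for i in range(limit + 1):
        c[i][0] = 1  # base case C(i, 0) = 1
        for j in range(1, i + 1):
            # Pascal's rule; one addition per entry, no multiplications.
            # For j == i the term c[i-1][j] is 0, which yields C(i, i) = 1.
            c[i][j] = (c[i - 1][j - 1] + c[i - 1][j]) & mask
    return c


table = small_binomials(4, 10)
print(table[10][5])  # C(10, 5) = 252, reduced modulo 16
```

Each table entry costs a single addition of two reduced values, which is where the additions-only cost of this step comes from.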

#### 3. Computing Large Binomial Coefficients: for

The second step of the preprocessing algorithm consists of computing *large* binomial coefficients modulo for ; we denote these values by . Obviously, if or , then . Otherwise, let

For , from the definition of the binomial coefficients, we have that

Since all the computations are performed modulo , we need to perform multiplications by . But only has a multiplicative inverse (modulo ) if it is odd. Thus, we will have to compute two different values:(i) = the largest odd divisor of ;(ii) = the exponent of 2 in the prime factor decomposition of .

We start with and . For , we will compute the following:(i) = the largest odd divisor of ,(ii) = the exponent of 2 in the prime factor decomposition of ,(iii) = the largest odd divisor of ,(iv) = the exponent of 2 in the prime factor decomposition of .

Finding and can be performed in time by examining the bits of . With these values computed we will have(i);(ii).

We will assume that the multiplicative inverses of all the odd numbers from 1 to $N$ were precomputed. The inverse of an odd number $x$ (modulo $2^N$, for $N \ge 3$) is equal to

$$x^{-1} \equiv x^{2^{N-2} - 1} \pmod{2^N}. \tag{6}$$

For , the inverse of an odd number is the odd number itself. Equation (6) provides a way of computing the inverse of an odd number using -bit multiplications. Note that more efficient alternatives exist; for example, in [3], an algorithm with -bit additions, left bit-shifts, and right bit-shifts for computing the inverse is presented (which would take only time overall instead of ). However, the algorithm obtained from (6) will never be the bottleneck in any step of the presented algorithm, so there will be no problem using it.
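The two primitives used in this section, splitting a number into its odd part and its exponent of 2, and inverting an odd number modulo a power of two, can be sketched as below. The inverse uses the standard fact that every odd $x$ satisfies $x^{2^{n-2}} \equiv 1$ modulo $2^n$ for $n \ge 3$; whether this is exactly the form of (6) is an assumption. The function names are hypothetical.

```python
def split_power_of_two(v):
    """Return (odd, exp) such that v == odd * 2**exp, for v > 0."""
    exp = (v & -v).bit_length() - 1  # index of the lowest set bit
    return v >> exp, exp


def inverse_mod_pow2(x, n_bits):
    """Inverse of an odd x modulo 2**n_bits.

    Assumption: relies on x**(2**(n_bits - 2)) == 1 (mod 2**n_bits)
    for odd x and n_bits >= 3, so the inverse is x**(2**(n_bits - 2) - 1).
    """
    if x % 2 == 0:
        raise ValueError("only odd numbers are invertible modulo 2**n_bits")
    mod = 1 << n_bits
    if n_bits <= 2:
        return x % mod  # modulo 2 or 4 every odd number is its own inverse
    # Square-and-multiply uses O(n_bits) modular multiplications.
    return pow(x, (1 << (n_bits - 2)) - 1, mod)


print(split_power_of_two(40))                # 40 = 5 * 2**3
print(7 * inverse_mod_pow2(7, 10) % 1024)    # product with the inverse
```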

After computing the values , and we have

We will assume that we previously precomputed all the numbers for . This way we have a method for computing all the values .
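The bookkeeping described above (an odd part kept modulo the power of two, plus an exact exponent of 2) can be illustrated on a generic row of binomial coefficients, using $C(n, k) = C(n, k-1) \cdot (n-k+1)/k$. This is a sketch under assumptions: `binomial_row_mod_pow2`, `n`, `k_max`, and `n_bits` are hypothetical names, the indexing of the paper's large binomial coefficients is not reproduced, and Python's built-in `pow(x, -1, mod)` (Python 3.8+) stands in for the precomputed inverses.

```python
def split_power_of_two(v):
    """Return (odd, exp) such that v == odd * 2**exp, for v > 0."""
    exp = (v & -v).bit_length() - 1
    return v >> exp, exp


def binomial_row_mod_pow2(n, k_max, n_bits):
    """Return [C(n, k) mod 2**n_bits for k = 0..k_max], with k_max <= n."""
    mod = 1 << n_bits
    odd, exp = 1, 0  # C(n, 0) = 1 = odd * 2**exp
    row = [1 % mod]
    for k in range(1, k_max + 1):
        num_odd, num_exp = split_power_of_two(n - k + 1)
        den_odd, den_exp = split_power_of_two(k)
        # Only the odd part of k is inverted; its power of 2 is
        # subtracted from the running exponent instead.
        odd = odd * num_odd % mod * pow(den_odd, -1, mod) % mod
        exp += num_exp - den_exp
        row.append((odd << exp) % mod if exp < n_bits else 0)
    return row


print(binomial_row_mod_pow2(20, 4, 8))
```

The running exponent never becomes negative because it always equals the exact exponent of 2 in $C(n, k)$, which is an integer.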

Let us perform a time complexity analysis. For each of the tuples , we spend time. Precomputing the inverses of *small* numbers takes time (it takes time to apply (6)). Precomputing the powers of two takes only time (we simply shift one bit to the left to obtain ).

A more efficient method is to apply the previously defined algorithm only for . Then, for and , we have

With this method, we spend time only for tuples (the tuples ). For the remaining tuples, we perform a simple addition (which takes time). Thus, the overall time complexity is .

#### 4. Computing Power Sums of Consecutive Numbers

In this section we will efficiently compute power sums of consecutive numbers starting from 1 and ending at a power of two. We define where and .

We will start with the easy cases: (independent of the value of ) and .

For and , we can write

Let us consider the terms (). Using Newton’s binomial theorem, we can write this term as

Thus, can be written as

Note that all the terms with are zero modulo (because is a multiple of ). Thus, we only need to evaluate such terms:

Assuming that we previously computed all the values , then this part takes time (because, for each value of , the algorithm performs at most multiplications for all the values of ).
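A minimal sketch of this doubling step, assuming the power sums are indexed as $S(i, j) = \sum_{x=1}^{2^i} x^j$ and that the upper half of the range is rewritten as $x + 2^{i-1}$ and expanded with the binomial theorem. The names `power_sums`, `max_i`, and `max_j` are illustrative, and `math.comb` stands in for the precomputed small binomial coefficients.

```python
from math import comb


def power_sums(n_bits, max_i, max_j):
    """Return s with s[i][j] = sum(x**j for x in 1..2**i) mod 2**n_bits."""
    mod = 1 << n_bits
    s = [[0] * (max_j + 1) for _ in range(max_i + 1)]
    for j in range(max_j + 1):
        s[0][j] = 1 % mod  # only x = 1 contributes, independent of j
    for i in range(1, max_i + 1):
        s[i][0] = (1 << i) % mod  # 2**i terms, each equal to 1
        half_log = i - 1          # the upper half is x + 2**half_log
        for j in range(1, max_j + 1):
            total = s[i - 1][j]   # lower half: x = 1 .. 2**(i-1)
            # Upper half: sum over k of C(j, k) * 2**(half_log*(j-k)) * S(i-1, k).
            for k in range(j, -1, -1):
                shift = half_log * (j - k)
                if shift >= n_bits:
                    break  # this and all remaining terms vanish modulo 2**n_bits
                total += (comb(j, k) * s[i - 1][k]) << shift
            s[i][j] = total % mod
    return s


s = power_sums(8, 3, 2)
print(s[3][2])  # 1**2 + 2**2 + ... + 8**2, reduced modulo 256
```

Iterating `k` downward makes the shift grow monotonically, so the loop can stop at the first term that is a multiple of the modulus.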

#### 5. Computing Sums of Subset Products

In this section we will compute the values = the sum of all the products of the elements of the -element subsets of the numbers (modulo ) for and . To be more precise,

We have and (i.e., the sum of all the numbers ).

For , we will use formulas based on the inclusion-exclusion principle derived from the theory of power sum and elementary symmetric polynomials [4]:

The problem that we face now is that exists only when is odd. This implies that we will not be able to compute all the values (for a fixed value of ) with the same precision. Let us define = the maximum value such that the last bits of are correct (i.e., we were able to compute modulo but not modulo for , unless we perform computations modulo for ). Obviously, .

Let us consider the case . We will have : the precision either stays the same or decreases as increases. At first we evaluate the following sum:

This sum is correctly computed modulo . Then we compute the following:(i) = the largest odd divisor of ;(ii) = the exponent of 2 in the prime factor decomposition of .

We set . We then multiply by , obtaining . Finally, we can compute as divided by . It should be mentioned that is computed as the multiplicative inverse of modulo instead of modulo .

We should now prove that for . This is obviously true for . In general, is equal to minus the exponent of 2 in . Since the exponent of 2 in is equal to it is obvious that this value cannot exceed . Thus, . This proof is very important. Although we cannot compute all the values with the same precision, we will see that this precision will be sufficient in order to obtain exact and correct results when computing binomial coefficients modulo . This is because in all the equations where is involved, it will be multiplied by .
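The precision tracking described above can be sketched with Newton's identities, $k \cdot e_k = \sum_{i=1}^{k} (-1)^{i-1} e_{k-i}\, p_i$, where $e_k$ is the sum of the products of the $k$-element subsets and $p_i$ is the $i$-th power sum. This is an illustration under assumptions: the names are hypothetical, the set is $\{1, \ldots, m\}$ for an arbitrary `m`, and the power sums are computed by brute force rather than taken from Section 4.

```python
def split_power_of_two(v):
    """Return (odd, exp) such that v == odd * 2**exp, for v > 0."""
    exp = (v & -v).bit_length() - 1
    return v >> exp, exp


def subset_product_sums(m, k_max, n_bits):
    """Return (e, prec): e[k] is the sum of the products of all k-element
    subsets of {1..m}, known only modulo 2**prec[k]."""
    # Power sums p[i] = 1**i + ... + m**i (brute force, for illustration).
    p = [0] + [sum(x ** i for x in range(1, m + 1)) for i in range(1, k_max + 1)]
    e, prec = [1], [n_bits]
    for k in range(1, k_max + 1):
        mod = 1 << prec[k - 1]
        # Right-hand side of Newton's identity, valid modulo 2**prec[k-1].
        s = sum((1 if i % 2 else -1) * e[k - i] * p[i] for i in range(1, k + 1)) % mod
        odd, t = split_power_of_two(k)
        new_prec = prec[k - 1] - t  # dividing by 2**t costs t bits of precision
        if new_prec <= 0:
            raise ValueError("n_bits is too small for this k_max")
        new_mod = 1 << new_prec
        # s is a multiple of 2**t; shift it out, then invert the odd part of k.
        e.append((s >> t) * pow(odd, -1, new_mod) % new_mod)
        prec.append(new_prec)
    return e, prec


e, prec = subset_product_sums(8, 6, 16)
print(prec)  # precision drops by the exponent of 2 in k at each step
```

In this sketch `prec[k]` equals `n_bits` minus the exponent of 2 in $k!$, which is the quantity the argument above bounds.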

Let us perform the time complexity analysis now. For each pair , we need to perform multiplication of -bit numbers. It would seem that we also need to compute for each pair (which would require another multiplications of -bit numbers for raising to the appropriate power). Although this would not affect the theoretical time complexity, we should notice that depends only on the numbers and does not depend on at all (we only need to have ). Thus, we only need to compute once for the first value of for which we will compute (and then we will cache and use it later when it is needed). Overall, the time complexity is dominated by the multiplications which need to be performed, obtaining an time complexity.

This step of the preprocessing stage is the bottleneck, having the largest time complexity (it is the step which dominates the computations of the preprocessing stage).

#### 6. Computing Sums of Odd-Element Subset Products

In this section we will compute the values = the sum of all the products of the elements of the odd -element subsets of the numbers , where (for and ). To be more precise,

This time we are interested in subsets with large sizes, that is, sizes ranging from to (thus having ).

For , we are interested in computing the product

By using Newton’s binomial theorem, we can rewrite (19) as

We notice that, for , each term of the sum is zero modulo . Moreover, notice that is multiplied by . Since the precision of is larger than , the result of this multiplication is exact modulo . Thus, by iterating through all the values of from 0 to , we have an algorithm which requires multiplications of -bit numbers for computing .

For , we will need to use a different approach. Let us choose a subset of odd numbers from 1 to of size : . The product of all the elements in the subset is equal to (we wrote for ). By using Newton’s binomial theorem, we can rewrite this product as where we denoted by the sum of products of the elements of the subsets of size of the set . In order to compute , we will need to sum (21) over all the odd-element subsets (or, equivalently, over all the subsets of the set ). Again, we should notice that all the terms of the sum from (21) are zero modulo for .

Let us consider a subset of the set (with ). This subset is part of subsets (where each element is chosen from the set ). With this observation we can rewrite (21) as

Since and we already computed these values as , we can finally rewrite (22) as

We can now use (23) directly in order to obtain an algorithm performing multiplications of -bit numbers for computing . Of course, as before, all computations are performed modulo . The time complexity in this case is simple to analyze: there are multiplications performed, so the complexity is .

This part can be improved because, as we will see in Section 7, for a given value of , we will only need the values such that . Thus, overall, we actually need to compute the values for only pairs instead of such pairs. In this case the time complexity drops down to .

#### 7. Computing the Largest Odd Divisor of Modulo

In this section we will compute the largest odd divisor of (modulo ). We will denote this number by . Let us consider the binary decomposition of ; that is, where (we will assume that , and thus, ).

We will first evaluate = the product of all the odd numbers less than or equal to . For each bit of , we will compute a value . will be equal to the product of all the odd numbers from the interval . will be equal to the product of all the odd numbers from the interval . Thus, we will have

is easy to compute. If , then ; otherwise it is equal to .

Let us consider now the case . Let be equal to . If , then . Otherwise we can write as

By using Newton’s binomial theorem, we can rewrite (26) as

We must notice that we only need to consider values of up to , because for , the corresponding term of the sum is zero modulo . We should also notice that is multiplied by . Since its precision is larger than , the multiplication of and is exact modulo . Then we iterate with in descending order. For the initial value of , we compute ; for the other values, we only multiply by in order to obtain as . We now have an algorithm performing multiplications of -bit numbers for computing . Overall, we have an algorithm performing multiplications of -bit numbers for computing ( multiplications for each of the bits of ). However, we can do better by rewriting (26) in a different way:

Note that is even. In fact, it is a multiple of , meaning that it is at least a multiple of . For , each term of the sum will be zero modulo . Thus, we only need to consider at most terms (from to ). We will start with when , and then for each subsequent value of , we will multiply by in order to have at each iteration.

Overall we only need to consider terms for computing ; for each such term, an -bit multiplication needs to be performed.

Now that we have a method of efficiently computing , we can use it in order to write as

is equal to 1. Since is of the order of magnitude of and can be evaluated with -bit multiplications, computing will require -bit multiplications, obtaining an time complexity.
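The halving decomposition used here rests on the identity that the odd part of $P!$ equals the product of the odd numbers up to $P$ times the odd part of $\lfloor P/2 \rfloor!$. The sketch below illustrates only that recursion: the product of odd numbers is computed by a plain loop, which is a deliberate stand-in for the fast evaluation developed in this section, and all names are hypothetical.

```python
def odd_product(p, mod):
    """Product of all odd numbers <= p, modulo mod (naive loop)."""
    result = 1 % mod
    for x in range(1, p + 1, 2):
        result = result * x % mod
    return result


def odd_part_of_factorial(p, n_bits):
    """Largest odd divisor of p!, reduced modulo 2**n_bits."""
    mod = 1 << n_bits
    result = 1 % mod
    while p > 1:
        # p! = (odd numbers <= p) * 2**(p // 2) * (p // 2)!
        result = result * odd_product(p, mod) % mod
        p //= 2
    return result


print(odd_part_of_factorial(10, 16))  # 10! = 2**8 * 14175
```

The loop runs once per bit of `p`, which matches the number of factors in the decomposition.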

#### 8. Computing the Binomial Coefficient

Let us consider the binomial coefficient with . In order to evaluate it modulo we will first need to compute , , and . Then, we will need to find the largest exponent such that is a divisor of , for , and . We can use (17) for this. Then, the answer will be

As discussed earlier, the multiplicative inverse of an odd number modulo can be computed by using (6), using -bit multiplications. Computing where can be performed in time ( bit shifts to the right and -bit number additions). Thus, the time complexity for computing the binomial coefficient modulo is dominated by the computation of , , and .
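Putting the pieces together, a binomial coefficient modulo a power of two can be assembled from the odd parts of the three factorials and the exponent of 2. The sketch below assumes that the exponent of 2 in a factorial is obtained by repeated right shifts and additions (Legendre's formula), and it uses a naive odd-part routine and Python's `pow(x, -1, mod)` (Python 3.8+) in place of the paper's preprocessed tables. All function names are illustrative.

```python
def exponent_of_two_in_factorial(p):
    """Exponent of 2 in p!, as the sum of p >> 1, p >> 2, ..."""
    total = 0
    while p:
        p >>= 1
        total += p
    return total


def odd_part_of_factorial(p, mod):
    """Largest odd divisor of p!, modulo mod (naive stand-in)."""
    result = 1 % mod
    while p > 1:
        for x in range(1, p + 1, 2):
            result = result * x % mod
        p //= 2
    return result


def binomial_mod_pow2(p, q, n_bits):
    """C(p, q) modulo 2**n_bits."""
    if q < 0 or q > p:
        return 0
    mod = 1 << n_bits
    exp = (exponent_of_two_in_factorial(p)
           - exponent_of_two_in_factorial(q)
           - exponent_of_two_in_factorial(p - q))
    if exp >= n_bits:
        return 0  # the coefficient is a multiple of 2**n_bits
    denominator = odd_part_of_factorial(q, mod) * odd_part_of_factorial(p - q, mod) % mod
    odd = odd_part_of_factorial(p, mod) * pow(denominator, -1, mod) % mod
    return (odd << exp) % mod


print(binomial_mod_pow2(20, 10, 8))  # C(20, 10) = 184756, reduced modulo 256
```

Only odd numbers are ever inverted, so the modular inverse always exists; the power of 2 is reattached at the end with a shift.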

#### 9. Extensions for Computing Modulo for

In order to compute a binomial coefficient modulo for (and ), we need to make use of some variations of Lucas’ theorem [5]. If and , then, according to [5], we have

Thus, in order to compute modulo , we will need to evaluate binomial coefficients , where . After the preprocessing stage, evaluating modulo will take only time (the factor appears when because we need to perform divisions, each of which takes time because they can be implemented by shifting bits to the right, in order to obtain the binomial coefficients or factorials which are needed in order to compute modulo ).

Other methods for reducing the computation of modulo (for ) to the computation of multiple factorials (of a restricted type) with were presented in [6].

#### 10. Related Work

The problem of computing binomial coefficients modulo various numbers has been widely studied in the scientific literature. In [7] properties of binomial coefficients modulo prime numbers are discussed, including Lucas’ theorem, which provides a simple way of computing modulo , where is a prime number. The computation of is reduced to the computation of multiple binomial coefficients modulo , with . In [8] the authors studied periodicity properties of the binomial coefficients modulo both prime and composite numbers. Congruence properties of binomial coefficients modulo prime powers were presented in [9], and congruence properties of products of binomial coefficients modulo composite numbers were studied in [10]. The general method of computing binomial coefficients modulo a composite number is to evaluate them modulo the (maximal) prime powers which are divisors of and then use the Chinese Remainder Theorem [11] in order to obtain the result modulo (as observed in [10]), but in [10] a more direct study of these values is performed (i.e., the Chinese Remainder Theorem is not used). Properties of the residues of binomial coefficients and their products modulo prime powers were studied in [12].

An algorithm for computing binomial coefficients modulo prime powers (for any prime) was presented in [6]. The algorithm takes time for computing modulo , where is a prime number (this time complexity was stated in [6], but a sufficiently detailed complexity analysis was not provided). When and , this reduces to . If we consider , our algorithm takes time for preprocessing and in order to compute in the general case. These time complexities are slightly worse than the ones obtained in [6]. However, when considering [1, 2], we obtain an time complexity for the preprocessing stage and an time complexity for actually computing the binomial coefficient modulo . In this case our time complexities are slightly better than the ones presented in [6] (when ). However, it is not clear which time complexity for the multiplication of two -bit numbers was considered in [6].

Reference [13] uses sums of binomial coefficients modulo 2 when obtaining results related to the Garsia entropy. Binomial coefficients also occur in other equations regarding information theory formulas (e.g., [14, 15]). Binomial coefficients also have applications in many other areas (e.g., statistics [16], binomial distribution [17], and Chebyshev polynomials [18]).

#### 11. Conclusions and Future Work

In this paper I presented a novel efficient algorithm for computing binomial coefficients modulo . The algorithm consists of a preprocessing stage, after which any number of binomial coefficients can be computed modulo (or modulo any number with ).

The time complexity of the presented algorithm is comparable with that of state-of-the-art algorithms for computing binomial coefficients modulo prime powers [6]. In fact, the time complexity of the algorithm presented in this paper is slightly better than that of the algorithm presented in [6], but since the authors of [6] do not provide a sufficiently detailed analysis of its time complexity, it is not clear whether the algorithm from [6] can be improved further. In any case, because the algorithm presented in this paper consists of a preprocessing stage, a slightly worse preprocessing time complexity (in some cases) could be balanced by computing multiple binomial coefficients (when the preprocessing stage, which is the bottleneck, is run only once).

When computing a small number of binomial coefficients (e.g., just one), the bottleneck of the algorithm (in the preprocessing stage) consists of the computation of sums of products of elements of subsets having sizes 0 to (described in Section 5). That step requires multiplications of -bit numbers, while all the other steps need fewer multiplications (and one of the steps requires additions of -bit numbers). If the values defined in Section 5 could be computed faster, the algorithm presented in this paper could be considerably improved. In future work, I intend to study the problem of computing the values in a more efficient manner.