Abstract

Whenever ranking data are collected, such as in elections, surveys, and database searches, it is frequently the case that partial rankings are available instead of, or sometimes in addition to, full rankings. Statistical methods for partial rankings have been discussed in the literature. However, there has been relatively little published on their Fourier analysis, perhaps because the abstract nature of the transforms involved impedes insight. This paper provides, as its novel contribution, an analysis of the Fourier transform for partial rankings, with particular attention to the first three ranks, while emphasizing the basic signal processing properties of transform magnitude and phase. It shows that the transform and its magnitude satisfy a projection invariance and analyzes the reconstruction of data from either magnitude or phase alone. The analysis is motivated by appealing to corresponding properties of the familiar DFT and by application to two real-world data sets.

1. Introduction

Ranking data, which arise in scenarios such as elections or database searches, describe how many times a given ordering of objects is chosen. When ranking data are collected, partial ranking data are frequently obtained in addition to, or perhaps instead of, full rankings. A partial or incomplete ranking specifies only the ordering of the top $k$ out of $n$ possibilities and usually indicates that the ranker is either unable to order the remaining $n-k$ items or indifferent to their order. Full ranking data are a special case of partial ranking data. A classic approach is to treat full ranking data for $n$ items as a function $f$ on the symmetric group $S_n$; for each permutation $\sigma \in S_n$, the value $f(\sigma)$ is the number of times the ordering represented by that permutation is chosen [1]. For example, if 3 items are ranked and $\sigma$ represents the order (2, 1, 3), then $f(\sigma)$ is the number of times the survey respondents chose to rank item 2 first, item 1 second, and item 3 last. As discussed in more detail below, partial ranking data also form functions on $S_n$ that are piecewise constant over cosets of the subgroup $S_{n-k}$ fixing the first $k$ items.

The analysis of ranking data, including both full and partial rankings, is well established. Statistical methods exist both for data in the “time domain” (using signal processing terminology), which in this case is the permutation group $S_n$, and in the “frequency domain” that is obtained through Fourier analysis on the group. Recent papers by Lebanon and Mao [2] and Hall and Miller [3] explore, respectively, the nonparametric modeling and bootstrap analysis of partial ranking data in the time domain. Time domain analysis does not allow such interesting possibilities as using band-limited or “smooth” approximations to the data, or analyzing the strength of various components. Diaconis [1, 4] and Diaconis and Sturmfels [5] use the Fourier transform on $S_n$ to analyze frequency components of both full and partial ranking data. Those papers, while addressing the fundamentals of Fourier analysis in terms of invariant subspaces, do not consider the signal processing aspects treated here. Other papers using the Fourier transform on $S_n$ include Huang et al. [6] for inference on permutations of identities in tracking, and Kondor and Borgwardt [7] to provide labeling-invariant matching of graphs. Kakarala [8] shows that the Fourier transform on $S_n$ may be interpreted in terms of signal processing concepts such as magnitude and phase, but the work is limited to full rankings. In this paper, we take a similar approach to analyze the properties of the Fourier transform on $S_n$ for partial rankings, with particular emphasis on the role of phase in forming the top three ranks, $k = 1, 2, 3$.

Underlying our approach is the intuition that, in any frequency-domain approach, whether on the symmetric group $S_n$ or on the more familiar discrete domain $\mathbb{Z}_N$, the Fourier transform values may be separated into magnitudes, which indicate component strengths, and phases, which indicate relative component locations. Such a separation is basic to a signal processing approach, and is well understood in the ordinary discrete Fourier transform (DFT) on $\mathbb{Z}_N$, and also in two dimensions in the case of images. A familiar demonstration of the importance of phase is to combine the magnitude spectrum of image $A$ with the phase spectrum of image $B$ and observe that, after inverse transform, the result appears very similar to $B$ [9]; in other words, phase is more important than magnitude to our perception of image structure. Therefore, it seems appropriate to ask the following question: what is the role of phase in forming partial ranking data?

The problem of analyzing phase on $S_n$ is not as straightforward as with the DFT on $\mathbb{Z}_N$, because the Fourier transform on $S_n$ has matrix-valued coefficients, not scalars as with the DFT, making even such elementary concepts as “frequency” nonobvious. Though various papers describe the transform in detail [6], and code for computing a fast Fourier transform (FFT) on $S_n$ has been published by Kondor [10], the level of abstraction required to understand the transform is high. Therefore, this paper makes a concerted effort to reason from the familiar DFT to explore the relevant concepts on $S_n$. It shows that the coefficients of the Fourier transform for top-$k$ choice partial ranking data are invariant under projections that are determined by the subgroup $S_{n-k}$. The projection approach provides a relatively simple explanation of the roles of magnitude and phase for partial ranking. The explanation is tested on two real-world data sets.

It should be noted that the concept of partially measured ranking data has interpretations other than the one explored in this paper, which is top $k$ out of $n$ choices data. For example, an “incomplete” ranking specifies a preference among a subset of the choices, not which is most preferred. Among choices $A$, $B$, and $C$, an incomplete ranking might simply say that $A$ is preferred to $B$, but nothing about $A$ versus $C$, or $B$ versus $C$; mathematically, this may be modelled as a partial order on the choices [2]. Diaconis [4] describes other kinds of incomplete rankings: “committee selection,” where one chooses the top $k$ out of $n$ choices but does not rank among the $k$ choices; and “most and least desirable,” where one chooses the most important and least important attributes among $n$ choices but does not specify the order of the middle elements. What is common mathematically to the previous types of data is that they are constant on cosets of a suitably chosen subgroup $H$ of $S_n$. The mathematical results of this paper concerning magnitude and phase apply to every coset space $S_n/H$. However, the results provided below on approximation by linear phase or unit-magnitude functions are limited to top $k$-choice data, whose domain is $S_n/S_{n-k}$. Though mathematically a special case of partially measured rankings, top $k$-choice data appear in sufficiently many scenarios to be worth analysis on their own.

2. Background Material

Fourier analysis on the symmetric group $S_n$ is normally described in abstract terms involving group representation theory, which makes the subject difficult to understand for nonspecialists. As mentioned in the Introduction, we use analogy to the better known DFT on $\mathbb{Z}_N$. The DFT is defined for data $f$ on $\mathbb{Z}_N$ by the familiar pair of equations for transform and inverse:

$$F(m) = \sum_{x=0}^{N-1} f(x)\, e^{-i 2\pi m x/N}, \qquad f(x) = \frac{1}{N}\sum_{m=0}^{N-1} F(m)\, e^{i 2\pi m x/N}. \tag{2.1}$$

Each complex-valued DFT coefficient $F(m)$ is expressed in terms of magnitude and phase by writing $F(m) = |F(m)|\, e^{i\theta(m)}$, where the absolute value $|F(m)|$ determines the magnitude, and the angle $\theta(m)$ measures the starting value at $x = 0$ in the period of the constituent sinusoid $e^{i 2\pi m x/N}$. The translation property of the DFT shows that the transform of the circularly shifted function $f(x - t)$ has coefficients $e^{-i 2\pi m t/N} F(m)$, so that the magnitude does not change but the phase changes linearly, that is, $\theta(m) \to \theta(m) - 2\pi m t/N$. Hence, phase is closely connected with location.
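The translation property can be checked numerically. The following is a minimal sketch, assuming NumPy's sign convention for the forward DFT (a negative exponent); the length `N`, the shift `t`, and the random data are hypothetical values chosen only for illustration.

```python
import numpy as np

N = 8                         # hypothetical signal length
t = 3                         # hypothetical circular shift
rng = np.random.default_rng(0)
f = rng.random(N)             # hypothetical real-valued data on Z_N

F = np.fft.fft(f)
# np.roll(f, t)[x] equals f[(x - t) % N], i.e. the circularly shifted data
F_shift = np.fft.fft(np.roll(f, t))

m = np.arange(N)
# Shifting leaves every magnitude unchanged ...
magnitude_unchanged = np.allclose(np.abs(F_shift), np.abs(F))
# ... and multiplies each coefficient by a phase factor that is linear in m
linear_phase = np.allclose(F_shift, np.exp(-2j * np.pi * m * t / N) * F)
```

Both flags evaluate to true, which is the sense in which phase, not magnitude, carries location.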

Suppose now that the data $f$ has the additional symmetry of having a subperiod, that is, $f(x + P) = f(x)$ where $P$ divides $N$. Then, it is well known that the DFT coefficients $F(m)$ are zero unless $m$ is a multiple of $N/P$. For example, if $P = 4$, then, of the $N$ possible DFT coefficients, only four can be nonzero: $F(0)$, $F(N/4)$, $F(N/2)$, and $F(3N/4)$. It is helpful to see the previous example in a different way to better understand the discussion of the symmetric group below. Suppose that we define $g$ as the data within one period, that is, $g(x) = f(x)$ for $0 \le x < P$, and $g(x) = 0$ otherwise. Let $p$ denote the periodic pulse train of Kronecker delta functions, normalized so that its DFT takes only the values 0 and 1:

$$p(x) = \frac{P}{N} \sum_{j=0}^{N/P-1} \delta(x - jP). \tag{2.2}$$

Then, $f = (N/P)\,(g \circledast p)$, where $\circledast$ denotes circular convolution over $N$ points. We have, therefore, by the convolution property of the DFT that $F(m) = (N/P)\, G(m)\, \Pi(m)$, where $G$ and $\Pi$ are the respective DFTs on $N$ points of $g$ and $p$. It is easy to see that $\Pi(m) = 1$ for $m = 0, N/P, 2N/P, \ldots$ but $\Pi(m) = 0$ otherwise. We might consider the function $\Pi$ a projection of the DFT coefficients; the term projection is appropriate because $\Pi$ takes values of either 0 or 1, and therefore $\Pi(m)^2 = \Pi(m)$ for all $m$. With the projection so defined, we have that $F(m) = F(m)\,\Pi(m)$, which shows that the data are invariant to the projection and therefore lie in its image. The projection approach helps considerably below in formulating the transform for partial rankings on the symmetric group.
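The effect of a subperiod on the DFT can be seen in a few lines. This is an illustrative sketch only: the sizes `N = 12` and `P = 4` and the values in `one_period` are hypothetical, and `Pi` is simply the 0/1 indicator of the multiples of `N/P`.

```python
import numpy as np

N, P = 12, 4                               # hypothetical sizes; P divides N
one_period = np.array([3.0, 1.0, 4.0, 1.5])
f = np.tile(one_period, N // P)            # data with subperiod P

F = np.fft.fft(f)
# Indices at which the DFT is numerically nonzero
support = np.flatnonzero(np.abs(F) > 1e-9)

# 0/1-valued "projection" in the DFT domain: 1 at multiples of N/P, else 0
Pi = (np.arange(N) % (N // P) == 0).astype(float)
invariant = np.allclose(F * Pi, F)         # coefficients unchanged by Pi
idempotent = np.allclose(Pi * Pi, Pi)      # Pi behaves as a projection
```

Here `support` contains only the indices 0, 3, 6, and 9, the multiples of `N/P = 3`.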

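The symmetric group $S_n$ is the collection of all possible permutations of the set $\{1, 2, \ldots, n\}$. If $\sigma$ and $\tau$ represent two permutations in $S_n$, then the product $\sigma\tau$ denotes $\tau$ applied first followed by $\sigma$. For example, if $\sigma = [2, 3, 1, 4]$ and $\tau = [1, 2, 4, 3]$, which indicates that $\sigma(1) = 2$, $\sigma(2) = 3$, $\sigma(3) = 1$, $\sigma(4) = 4$, and similarly for $\tau$, then $\sigma\tau = [2, 3, 4, 1]$. With that product, $S_n$ forms a group, with identity denoted $e$ and the inverse $\sigma^{-1}$ being the unique permutation that exactly undoes the action of $\sigma$, that is, $\sigma\sigma^{-1} = \sigma^{-1}\sigma = e$. For example, the inverse of $[2, 3, 1, 4]$ is $[3, 1, 2, 4]$.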
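The product convention (rightmost permutation applied first) is easy to get backwards, so a small sketch may help. The helper names `compose` and `inverse` and the two example permutations are hypothetical choices made for illustration; permutations are stored in 1-based one-line notation.

```python
def compose(sigma, tau):
    """Product sigma*tau: tau is applied first, then sigma."""
    return tuple(sigma[tau[i] - 1] for i in range(len(tau)))

def inverse(sigma):
    """Permutation that undoes sigma."""
    inv = [0] * len(sigma)
    for position, value in enumerate(sigma, start=1):
        inv[value - 1] = position
    return tuple(inv)

sigma = (2, 3, 1, 4)          # sigma(1)=2, sigma(2)=3, sigma(3)=1, sigma(4)=4
tau = (1, 2, 4, 3)            # tau swaps 3 and 4
product = compose(sigma, tau)
identity = tuple(range(1, len(sigma) + 1))
undone = compose(sigma, inverse(sigma))
```

With these inputs, `product` is `(2, 3, 4, 1)`, `inverse(sigma)` is `(3, 1, 2, 4)`, and `undone` equals the identity.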

Data consisting of full rankings form functions on $S_n$ in the manner described in the Introduction. The same domain also serves for partial ranking data. If we have data where only the first $k$ of the $n$ items are ranked, then, for each $\sigma \in S_n$, let us define the value of $f(\sigma)$ to be the number of times the first $k$ elements of $\sigma$ are chosen. The definition leads to piecewise constant functions on $S_n$. An example illustrates the approach. Suppose $n = 3$ items are to be ranked in an election given to 600 voters, but the respondents give only their top choices as follows: item 1 gets 100 votes, item 2 gets 200 votes, and item 3 gets 300 votes. Then, we construct $f$ on $S_3$ by extending the votes to all permutations based on the first item, so that $f([1, 2, 3]) = f([1, 3, 2]) = 100$, and similarly for the other 4 choices of $\sigma$. If we were to view the previous construction in group-theoretic terms, the function $f$ is constant on left cosets of the subgroup $S_{n-1}$ fixing the first element, that is, $f(\sigma\tau) = f(\sigma)$ for all $\tau \in S_{n-1}$, where by definition $\tau(1) = 1$, and $\sigma(1) = i$ for the item $i$ being chosen. Though the constant vote given to each coset is mathematically convenient, it does not capture certain effects that may be interesting; for example, if I choose oranges as my favorite fruit, I may be more likely to choose apples than durians as my next favorite, even if I am not required to state my next favorite. Nevertheless, due to its convenience, we use the constant-on-cosets approach in the remainder of this paper.
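The construction for the 600-voter example can be sketched as follows. The vote counts are those given in the text; the dictionary layout and the helper `compose` are assumptions made for illustration, with `sigma[0]` standing for the item ranked first.

```python
from itertools import permutations

first_choice_votes = {1: 100, 2: 200, 3: 300}   # counts from the example

# Every permutation inherits the vote count of its first-ranked item
f = {sigma: first_choice_votes[sigma[0]] for sigma in permutations((1, 2, 3))}

def compose(sigma, tau):
    """Product sigma*tau: tau is applied first, then sigma."""
    return tuple(sigma[tau[i] - 1] for i in range(len(tau)))

# The only non-identity permutation of {1, 2, 3} that fixes the first position
tau = (1, 3, 2)
# Right multiplication by tau reorders only the lower ranks, so f is unchanged
constant_on_cosets = all(f[compose(sigma, tau)] == f[sigma] for sigma in f)
total_votes = sum(first_choice_votes.values())
```

Each coset contains two permutations here, so the six values of `f` are 100, 100, 200, 200, 300, 300.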

A detailed example helps to illustrate the model. In the famous American Psychological Association (APA) election data [1], which is available online (http://www.stat.ucla.edu/data/hand-daly-lunn-mcconway-ostrowski/ELECTION.DAT), 5,738 voters provided full rankings of each of $n = 5$ candidates for president. The full rankings form a function on $S_5$ and are shown plotted in Figure 1(a) against the elements of the group arranged in lexicographic order. In the same election, many voters chose not to submit full rankings but provided instead partial rankings. Specifically, 5,141 voters submitted only their top choice, 2,462 voters submitted only their first and second choices in order, and 2,108 voters submitted only their top three choices in order. Consequently, there were a total of 9,711 voters giving only partial rankings, more than the 5,738 that gave full rankings. After forming piecewise constant functions as described above, the partial ranking data are displayed in Figures 1(b) to 1(d).

An advantage of placing both full and partial rankings on the same domain is that we may apply the same Fourier transform in both cases. The Fourier transform on $S_n$, which is formally obtained from the theory of group representations, has important differences from the DFT. We review some basic facts from the literature [4]. First, the Fourier coefficients on $S_n$ are matrix valued, unlike the scalar values of the DFT. Second, they are indexed by arithmetic partitions of $n$ with nonincreasing elements, which are roughly analogous to the frequency index of the DFT. For example, for $n = 5$, the seven such partitions are $(5)$, $(4,1)$, $(3,2)$, $(3,1,1)$, $(2,2,1)$, $(2,1,1,1)$, and $(1,1,1,1,1)$. For every partition $\lambda$ of $n$, the Fourier basis elements belonging to it are collected into a square matrix denoted $\rho_\lambda(\sigma)$, whose dimensions $d_\lambda \times d_\lambda$ are calculated using standard formulas [4]. For $n = 5$, the seven partitions described previously have square basis matrices with respective dimensions 1, 4, 5, 6, 5, 4, 1, giving a total of 120 basis functions on $S_5$, where the number 120 is obtained by summing squares of dimensions. The basis may be constructed using real-valued functions, using the Young orthogonal representation (YOR). The Fourier transform and its inverse are, respectively, written

$$F(\lambda) = \sum_{\sigma \in S_n} f(\sigma)\, \rho_\lambda(\sigma), \qquad f(\sigma) = \frac{1}{n!} \sum_{\lambda} d_\lambda \operatorname{tr}\!\left[ F(\lambda)\, \rho_\lambda(\sigma)^\top \right]. \tag{2.3}$$

The symbol $\lambda$ on the right-hand sum indicates a sum over all partitions for which $\rho_\lambda$ is defined. Algorithms for constructing the matrices $\rho_\lambda$ are given in Huang et al. [6, Algs 3,4] and are used in obtaining the experimental results of this paper. In particular, we have $\rho_{(n)}(\sigma) = 1$, so that $F((n))$ is a scalar containing the “d.c.” value of the signal, and $\rho_{(1,\ldots,1)}(\sigma)$ is also scalar, alternating between +1 and −1 in a manner similar to the Nyquist frequency in the DFT.
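The partition list and the dimensions quoted for $n = 5$ can be reproduced with a short sketch. It assumes that the "standard formulas" include the hook length formula, which is one common way to compute these dimensions; the function names are hypothetical.

```python
from math import factorial

def partitions(n, largest=None):
    """Yield the partitions of n as nonincreasing tuples."""
    largest = n if largest is None else largest
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def dimension(shape):
    """Dimension of the irreducible representation, via the hook length formula."""
    hooks = 1
    for i, row in enumerate(shape):
        for j in range(row):
            arm = row - j - 1                               # cells to the right
            leg = sum(1 for r in shape[i + 1:] if r > j)    # cells below
            hooks *= arm + leg + 1
    return factorial(sum(shape)) // hooks

shapes = list(partitions(5))
dims = [dimension(s) for s in shapes]
basis_size = sum(d * d for d in dims)   # number of Fourier basis functions
```

For $n = 5$ this gives the dimensions 1, 4, 5, 6, 5, 4, 1 and `basis_size` equal to 120, matching the counts in the text.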

Two important properties of the Fourier transform are relevant to this paper. First, the Fourier basis matrices that are obtained from the YOR are orthogonal, $\rho_\lambda(\sigma)^\top \rho_\lambda(\sigma) = I$, which mimics the exponential unitarity in the DFT. Second, under a left translation of the data on $S_n$ obtained by $f(\sigma) \to f(\tau^{-1}\sigma)$, the coefficients undergo the transformation $F(\lambda) \to \rho_\lambda(\tau)\, F(\lambda)$, and, under a right translation $f(\sigma) \to f(\sigma\tau)$, the coefficients transform as $F(\lambda) \to F(\lambda)\, \rho_\lambda(\tau)^\top$. Those two properties suggest an interpretation of the matrix-valued Fourier coefficients in terms of magnitude and phase [8]. The Fourier coefficient may be written using the matrix polar decomposition as $F(\lambda) = U(\lambda)\, M(\lambda)$, where $M(\lambda)$, representing magnitude, is the positive semidefinite matrix obtained as the square root of $F(\lambda)^\top F(\lambda)$, and $U(\lambda)$ is an orthogonal matrix representing phase. A standard result in matrix theory [11, page 190] shows that the magnitude is unique, though the phase need not be unless $F(\lambda)$ is nonsingular. Under left translation by $\tau$, the magnitude remains invariant while the phase changes by $U(\lambda) \to \rho_\lambda(\tau)\, U(\lambda)$, which is analogous to the phase shift $e^{-i 2\pi m t/N}$ for the DFT. Note that both magnitude and phase may be computed using the singular value decomposition (SVD), $F = W \Sigma V^\top$, by setting $M = V \Sigma V^\top$ and $U = W V^\top$. Below, we use the polar decomposition into magnitude and phase and analyze its properties for partial ranking data.
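The SVD recipe for magnitude and phase, and the invariance of magnitude under left multiplication by an orthogonal matrix, can be sketched numerically. The matrix `F` below is a random stand-in for a Fourier coefficient and `R` is a random orthogonal matrix standing in for a representation matrix; both are assumptions for illustration, not data from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
F = rng.standard_normal((4, 4))        # hypothetical coefficient matrix

def polar(A):
    """Return (phase, magnitude) with A = phase @ magnitude, computed via the SVD."""
    W, s, Vt = np.linalg.svd(A)        # A = W @ diag(s) @ Vt
    magnitude = Vt.T @ np.diag(s) @ Vt # positive semidefinite square root of A.T @ A
    phase = W @ Vt                     # orthogonal factor
    return phase, magnitude

U, M = polar(F)

# Left multiplication by an orthogonal matrix mimics a left translation of the data
R = np.linalg.qr(rng.standard_normal((4, 4)))[0]
U2, M2 = polar(R @ F)

reconstructs = np.allclose(U @ M, F)
phase_is_orthogonal = np.allclose(U.T @ U, np.eye(4))
magnitude_invariant = np.allclose(M2, M)
phase_shifted = np.allclose(U2, R @ U)   # holds because this F is nonsingular
```

The last check relies on `F` being nonsingular, in which case the phase is unique; for singular coefficients only the magnitude comparison is guaranteed.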

3. Fourier Analysis of Partial Rankings

In the previous section, we saw that translational symmetry of data on $\mathbb{Z}_N$ results in a projection invariance for the DFT coefficients. Inspired by that result and noting that our method of placing partial ranking data on $S_n$ results in a kind of translational symmetry, we look for the relevant projection characteristics of the Fourier coefficient matrices on $S_n$. Finding the projection characteristics provides a significant reduction in computational complexity and also shows the role of phase for partial ranking data, as discussed below. For that purpose, define for each subgroup $H$ of $S_n$ and each $\lambda$ the matrix

$$P^H_\lambda = \frac{1}{|H|} \sum_{h \in H} \rho_\lambda(h). \tag{3.1}$$

Then, it is known [12, page 111] that $(P^H_\lambda)^2 = P^H_\lambda$ and $(P^H_\lambda)^\top = P^H_\lambda$, so that $P^H_\lambda$ is an orthogonal projection. The main result of this paper is now stated.

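Theorem 3.1. Let $f$ denote a function on $S_n$ that is piecewise constant with respect to a subgroup $H$, that is, $f(\sigma h) = f(\sigma)$ for every $\sigma \in S_n$ and every $h$ in the subgroup $H$ with $|H|$ elements. Then, each Fourier coefficient of $f$ is invariant under the corresponding projection: $F(\lambda) = F(\lambda)\, P^H_\lambda$, and that is true of its magnitude as well: $M(\lambda) = M(\lambda)\, P^H_\lambda$.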

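Proof. The projection invariance of $F(\lambda)$ follows from the translation property of the Fourier transform, from which $f(\sigma h) = f(\sigma)$ results in $F(\lambda) = F(\lambda)\, \rho_\lambda(h)^\top$, which, when averaged over all elements of $H$, results in $F(\lambda) = F(\lambda)\, P^H_\lambda$. (This fact has been shown in the literature; see [12] and Kondor [13, Section 5].) To prove that $M(\lambda)$ is invariant, note that $P^H_\lambda$ being a projection means that there exists an orthogonal matrix $Q$ such that $P^H_\lambda = Q I_r Q^\top$, where $I_r$ agrees with the identity matrix in its first $r$ diagonal entries and is zero elsewhere. Then, $M(\lambda)$ is the unique positive semidefinite square root of $F(\lambda)^\top F(\lambda)$. Since $F(\lambda) = F(\lambda)\, P^H_\lambda$ implies that $F^\top F = P^H_\lambda F^\top F P^H_\lambda$, we have $Q^\top F^\top F Q = I_r (Q^\top F^\top F Q) I_r$, so that $Q^\top F^\top F Q$ is zero outside the upper left $r \times r$ subblock. Consequently, the same is true of $Q^\top M(\lambda) Q$, and, therefore, $M(\lambda) = M(\lambda)\, P^H_\lambda$.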
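The statement of Theorem 3.1 can be checked numerically in a small case. The sketch below makes several assumptions that are not from the paper: it uses $n = 4$, hypothetical first-choice vote counts, and, instead of the YOR, the permutation matrices restricted to the subspace orthogonal to the all-ones vector, which gives an orthogonal representation equivalent to the one indexed by $(n-1, 1)$. Permutations are 0-based here.

```python
import numpy as np
from itertools import permutations

n = 4
votes = np.array([10.0, 40.0, 25.0, 5.0])     # hypothetical first-choice counts

def perm_matrix(sigma):
    # Column j has a 1 in row sigma[j], so perm_matrix(s) @ perm_matrix(t)
    # is the matrix of "t applied first, then s".
    P = np.zeros((n, n))
    for j, s in enumerate(sigma):
        P[s, j] = 1.0
    return P

# Orthonormal basis B of the subspace orthogonal to the all-ones vector
Q, _ = np.linalg.qr(np.column_stack([np.ones(n), np.eye(n)[:, : n - 1]]))
B = Q[:, 1:]

def rho(sigma):
    """(n-1)-dimensional orthogonal representation (stand-in for the YOR)."""
    return B.T @ perm_matrix(sigma) @ B

group = list(permutations(range(n)))
f = {s: votes[s[0]] for s in group}           # constant on cosets: depends on s[0] only
F = sum(f[s] * rho(s) for s in group)         # Fourier coefficient of f

H = [s for s in group if s[0] == 0]           # subgroup fixing the first position
P_H = sum(rho(h) for h in H) / len(H)         # representation averaged over H

_, sing, Vt = np.linalg.svd(F)
M = Vt.T @ np.diag(sing) @ Vt                 # magnitude: square root of F.T @ F

rank_of_projection = np.linalg.matrix_rank(P_H)
```

In this setting `P_H` is a symmetric idempotent matrix of rank 1, and both `F @ P_H` and `M @ P_H` agree with `F` and `M` to numerical precision.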

The theorem may be applied to partial ranking data consisting of $k$ out of $n$ elements ranked by using the subgroup $S_{n-k}$ that fixes the first $k$ elements and varies the remaining $n-k$ ones. Table 1 shows the ranks of the projections for the first three values of $k$. The reader may note that Diaconis [1] provides essentially the same numbers as in Table 1, though not obtained through projections. For $k = 1$, only two frequencies are involved, each with rank 1. The dimension of the representation $\rho_{(n-1,1)}$ is $n-1$, and consequently the projected coefficient has only $n-1$ degrees of freedom. Therefore, the $n$ degrees of freedom for first-choice-only data ($k = 1$) are divided between the one-dimensional “d.c.” value obtained for frequency $(n)$ and the $n-1$ degrees of freedom for $(n-1,1)$.

The theorem and table are illustrated with examples in the next section.

We examine the roles that magnitude and phase play in partial ranking data by appealing to the more familiar DFT for intuition. If $F$ is the DFT of real-valued data $f$, with magnitude-phase decomposition $F(m) = |F(m)|\, e^{i\theta(m)}$, then the inverse DFT of the magnitude alone is the zero-phase signal

$$z(x) = \frac{1}{N} \sum_{m=0}^{N-1} |F(m)|\, e^{i 2\pi m x/N}. \tag{3.2}$$

The zero-phase signal has certain properties: its peak value occurs at the origin since $|z(x)| \le z(0)$; it is symmetric with respect to sign inversion, since $z(-x) = z(x)$. We may shift the peak of $z$ from $x = 0$ to any desired location $x = t$ by applying the linear phase shift $e^{-i 2\pi m t/N}$. The resulting linear phase signal is

$$z_t(x) = \frac{1}{N} \sum_{m=0}^{N-1} |F(m)|\, e^{i 2\pi m (x - t)/N}. \tag{3.3}$$

The properties of the linear phase signal are now as follows: its peak value occurs at $x = t$; it is symmetric about $t$ since $z_t(t + x) = z_t(t - x)$. In other words, we see that, in the absence of phase, the basic components add directly to peak at the starting point, and shifting the starting point to any given location produces a linear phase version of the signal. Analogous to the zero-phase signal, we may define the unit-magnitude signal by applying the inverse DFT to only the phase:

$$u(x) = \frac{1}{N} \sum_{m=0}^{N-1} e^{i\theta(m)}\, e^{i 2\pi m x/N}. \tag{3.4}$$

For the DFT, magnitude and phase each contain half the degrees of freedom of the original signal, and therefore both are equally important to exact numerical reconstruction. The concepts discussed also apply to the symmetric group, as we now show.
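The three DFT signals described above can be produced directly from the magnitude and phase of an FFT. This is a minimal sketch on hypothetical random data; `N`, the shift `t`, and the variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N, t = 16, 5                      # hypothetical length and peak location
f = rng.random(N)                 # hypothetical real-valued data

F = np.fft.fft(f)
magnitude, phase = np.abs(F), np.angle(F)
m = np.arange(N)

# Inverse DFT of the magnitude alone: peak at 0, symmetric under x -> -x
zero_phase = np.fft.ifft(magnitude).real
# Adding a phase that is linear in m moves the peak to x = t
linear_phase = np.fft.ifft(magnitude * np.exp(-2j * np.pi * m * t / N)).real
# Inverse DFT of the phase alone, with every magnitude set to one
unit_magnitude = np.fft.ifft(np.exp(1j * phase)).real

peak_at_origin = int(np.argmax(zero_phase)) == 0
inversion_symmetric = np.allclose(zero_phase, zero_phase[(-m) % N])
peak_moved = int(np.argmax(linear_phase)) == t
is_shifted_copy = np.allclose(linear_phase, np.roll(zero_phase, t))
```

The linear-phase signal is exactly the zero-phase signal circularly shifted by `t`, which is why it is symmetric about its new peak.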

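Using the inverse transform (2.3), we define the zero-phase signal on $S_n$ corresponding to the data $f$ as

$$z(\sigma) = \frac{1}{n!} \sum_{\lambda} d_\lambda \operatorname{tr}\!\left[ M(\lambda)\, \rho_\lambda(\sigma)^\top \right]. \tag{3.5}$$

Noting that $\rho_\lambda(e) = I$ for the identity permutation $e$, we see that the positive semidefiniteness of $M(\lambda)$ implies that $\operatorname{tr}[M(\lambda)] \ge \operatorname{tr}[M(\lambda)\, Q]$ for every orthogonal matrix $Q$, as easily seen by using the eigendecomposition $M(\lambda) = V \Lambda V^\top$ and applying the circular invariance of trace. Consequently, $z(e) \ge z(\sigma)$ for all $\sigma$. Furthermore, there is inversion symmetry $z(\sigma^{-1}) = z(\sigma)$ due to the trace property $\operatorname{tr}[A] = \operatorname{tr}[A^\top]$. The properties of a zero-phase signal are formally similar to those of an “autocorrelation,” which we define on $S_n$ as follows:

$$a_f(\sigma) = \sum_{\tau \in S_n} f(\tau)\, f(\tau\sigma). \tag{3.6}$$

The connection between zero-phase signals and autocorrelations is made clear in a theorem stated below.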

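Reasoning as above, we see that we may shift the peak of the zero-phase signal to any given permutation $\tau$ by the linear phase transformation $M(\lambda) \to \rho_\lambda(\tau)\, M(\lambda)$, resulting in the linear-phase signal

$$z_\tau(\sigma) = \frac{1}{n!} \sum_{\lambda} d_\lambda \operatorname{tr}\!\left[ \rho_\lambda(\tau)\, M(\lambda)\, \rho_\lambda(\sigma)^\top \right]. \tag{3.7}$$

Properties of the linear-phase signal are established in the following theorem, the proof of which is given in an earlier paper [8].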

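Theorem 3.2. For every real-valued function $f$ on $S_n$ with Fourier transform $F$, we have the following. (a) The transform is symmetric with respect to matrix transpose if and only if $f$ is symmetric with respect to inversion:

$$F(\lambda) = F(\lambda)^\top \text{ for all } \lambda \iff f(\sigma^{-1}) = f(\sigma) \text{ for all } \sigma. \tag{3.8}$$

(b) $F(\lambda)$ is positive semidefinite for all $\lambda$ if and only if there exists a function $g$ such that $f$ is the autocorrelation of $g$, that is, $f = a_g$ using the notation of (3.6). (c) Symmetric functions are precisely those with linear-phase transforms: there exists $\tau$ such that $F(\lambda) = \rho_\lambda(\tau)\, S(\lambda)$ with $S(\lambda) = S(\lambda)^\top$ if and only if $f(\tau\sigma^{-1}\tau) = f(\sigma)$ for all $\sigma$.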

The theorem shows that each linear-phase signal is inversion symmetric about its peak location $\tau$, that is, $z_\tau(\tau\sigma^{-1}\tau) = z_\tau(\sigma)$. As above, we may define the unit-magnitude signal by using only the phase in the polar decomposition in the inverse Fourier transform on $S_n$ as follows:

$$u(\sigma) = \frac{1}{n!} \sum_{\lambda} d_\lambda \operatorname{tr}\!\left[ U(\lambda)\, \rho_\lambda(\sigma)^\top \right]. \tag{3.9}$$

Noting that the polar decomposition of a $d \times d$ matrix places $d(d+1)/2$ degrees of freedom in the positive definite matrix $M$ and $d(d-1)/2$ in the orthogonal matrix $U$, we see that magnitude is slightly more important (by $d$ degrees of freedom per coefficient) to numerically reconstructing full ranking data. However, the situation is much different when partial rank data are involved. By examining Table 1 and using Theorem 3.1, we show that the unit-magnitude signal is nearly complete in the case of first rank data.

Theorem 3.3. If $f$ is top choice only data ($k = 1$) on $S_n$, then there exist constants $a$ and $b$ such that $f = a\,u + b$, where $u$ is the unit-magnitude signal (3.9).

The proof follows after noting that, by Theorem 3.1 and Table 1, the magnitude is a scalar, so that and .

4. Examples

Consider the group used for the APA data shown in Figure 1. For the top two choice data (), the ranks of the projections in Table 1 show that the degrees of freedom are allocated as follows: 1 in the d.c. term ; in the term ; the remaining 11 degrees of freedom allocated as 5 and 6, respectively, in each of the Fourier coefficients for and . By choosing a basis in which , we obtain the following for the nonzero entries of the Fourier coefficient and its magnitude (rounded to integers): Each matrix is actually , and the zero entries are not shown.

To illustrate the properties of phase for partial ranking data on , we reconstruct each of the partial rank signals in Figure 1 using only zero and linear phase and show the results in Figure 2. In Figure 2(d), we see a strikingly good fit between the partial rank data with two preferences and its linear phase approximation: numerically, we have , where is the norm. This suggests that the phase structure of the two-preference data is relatively simple, and the inversion symmetry property indicates that voters are equally content with transposing the order of the two top preferences moving away from the peak. The result is made more interesting by noting that, of the 20 degrees of freedom in top-two preference data, only 6 are constrained by the magnitude spectrum given by the matrices; hence, adding only the linear phase term necessary to shift the peak should not be sufficient to reconstruct 92% of the signal, but it is.

The different levels of fit between the partial rank data and its linear-phase approximations may be understood also by considering the degrees of freedom involved. On the domain $S_5$ for the APA data, first preference data has 5 degrees of freedom. From Table 1, we see that there are two frequencies involved, both with rank 1. As discussed above, the magnitude at each of those frequencies is determined by a scalar. Consequently, the magnitude spectrum constrains 2 out of the 5 degrees of freedom. The case $k = 2$ is discussed above, and, for $k = 3$, we have that 24 out of the 60 degrees of freedom are constrained by magnitudes. However, as $n$ increases, the degrees of freedom for the magnitude spectrum do not increase, because the ranks of the projection matrices are independent of $n$. For example, for $n = 50$, the magnitude spectrum for top three choices data ($k = 3$) constrains only 24 out of the 117,600 degrees of freedom. Consequently, for three choices data with large $n$, the phase spectrum by far exceeds the magnitude component in constraining data.
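The degrees-of-freedom counts can be reproduced by a short calculation. The sketch rests on assumptions that are not taken from Table 1 itself: the projection ranks in `RANKS` are the standard multiplicities (Kostka numbers) for top-$k$ data, chosen because they reproduce the totals quoted in the text; a coefficient of dimension $d$ with projection rank $r$ is counted as $r\,d$ free entries, and its magnitude as $r(r+1)/2$. Each frequency is keyed by the parts of its partition below the first row, and shapes that are not valid partitions for small $n$ are skipped.

```python
from math import factorial, perm

# Assumed projection ranks for top-k data, keyed by the partition's lower rows
RANKS = {
    1: {(): 1, (1,): 1},
    2: {(): 1, (1,): 2, (2,): 1, (1, 1): 1},
    3: {(): 1, (1,): 3, (2,): 3, (1, 1): 3, (3,): 1, (2, 1): 2, (1, 1, 1): 1},
}

def dimension(shape):
    """Dimension of the irreducible representation, via the hook length formula."""
    hooks = 1
    for i, row in enumerate(shape):
        for j in range(row):
            leg = sum(1 for r in shape[i + 1:] if r > j)
            hooks *= (row - j - 1) + leg + 1
    return factorial(sum(shape)) // hooks

def counts(n, k):
    """Return (total degrees of freedom, degrees of freedom in the magnitudes)."""
    total = magnitude = 0
    for tail, rank in RANKS[k].items():
        first = n - sum(tail)
        if tail and first < tail[0]:
            continue                       # not a valid partition for this n
        total += rank * dimension((first,) + tail)
        magnitude += rank * (rank + 1) // 2
    return total, magnitude

first_choice = counts(5, 1)
top_two = counts(5, 2)
top_three_large = counts(50, 3)
```

Under these assumptions the results are (5, 2) for first-choice data on $S_5$, (20, 6) for top-two data on $S_5$, and (117600, 24) for top-three data with $n = 50$, and each total equals `perm(n, k)`, the number of distinct top-$k$ orderings.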

To illustrate the role of phase for top-choice data for large $n$, we examine the 2009 college rankings by US News and World Report, which are available online (http://supportingadvancement.com/potpourri/us_news_and_world_report/us_news_rankings.htm). In these data, American universities are ranked on 17 numerical categories, including acceptance rate, percentage of classes with fewer than 20 students, and alumni giving rate. We consider each category as a voter giving a vote to only the university having the top category value. In the event of ties, which happen in only one category (the percentage of need met for full-time students, where 23 universities met 100% of the need), all of the universities having the top value were given a vote. Figure 3 shows that the data are poorly fit by the zero-phase signal, as expected, but that the shape of the data is well fit up to a scale factor by the unit-magnitude signal, as expected from Theorem 3.3.

5. Discussion

We have seen in the previous section that the fit between partial ranking data and its linear phase approximation can be surprisingly good, especially in the case of the APA data for $k = 2$. The quality of linear phase fit is not limited to partial rank data. Full ranking data, which are discussed in [8], may also show a good linear phase approximation. Consider the German survey data, which consist of full rankings of four items by 2,262 voters [14]. Figure 4(a) shows that the data are well reconstructed by a linear-phase approximation; in fact, the linear-phase approximation reproduces 93% of the original signal by the same norm-based measure. Similarly, Figure 4(b) shows that the full ranking data for the APA election are well approximated by their linear-phase version. However, with full ranking data, the magnitude spectrum dominates: on $S_4$, as with the German survey data, we obtain 17 out of the 24 degrees of freedom in the full-ranking data from the magnitude spectrum, while, on $S_5$, we obtain 73 of the 120, or about 60.8%, of the degrees of freedom from the magnitude spectrum of full rankings. Therefore, with full ranking data, we should not be as surprised by the quality of fit by linear-phase approximation as we might be with partial rank data.

It is reasonable to wonder what we gain by approximating data that we already have in exact form. Diaconis [1] states a general principle in analyzing data: “if you've found some structure, take it out, and look at what's left.” The results in this and the previous section show cases where linear-phase structure exists in full rank and, more surprisingly, given the degrees of freedom argument, in partial-rank data. The high level of fit in the cases we have analyzed suggests that, once we remove the linear phase structure, there is little left. It would be interesting to apply linear phase approximation to a larger variety of data sets to see whether such symmetry is common. Also, a potential application of the linear-phase formulation is that it provides a way of reasoning about ranking data with reduced complexity, where phase is essentially eliminated except for a single component. It would be interesting to apply the linear phase approximation as a simplifying means to compare graphs up to relabeling of data [7].

5.1. Complexity

One of the limitations of ranking data is that the size of the domain increases as , making it impractical to capture a complete set of fully ranked data for much larger than 10. Furthermore, the complexity of the group theoretic FFT for is , as shown in Maslen [15, Theorem  1.1]. This is very difficult to compute for . However, partial ranking data and their spectral analysis allow data for much larger to be analyzed. For example, the number of data points for the top 3 out of choices is , which remains tractable for up to 100. Maslen [15] showed that the group-theoretic FFT on when adapted for has complexity; in comparison, the ordinary FFT on for can be completed in 3 seconds on a 2.6 GHz quad-core Xeon processor. Therefore, we see that processing only partial rank data allows a capability of roughly an order-of-magnitude increase in over fully-ranked data. If we restrict to only top choice data (), then there is a linear-time algorithm for computing the Fourier transform [16].
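The difference in domain size between full and partial rankings is simple to quantify. The sketch below only counts data points; it assumes that a top-$k$ ranking of $n$ items has as many possible outcomes as there are ordered selections of $k$ items, and the function names are hypothetical.

```python
from math import factorial, perm

def full_ranking_points(n):
    """Number of possible full rankings of n items."""
    return factorial(n)

def top_k_points(n, k):
    """Number of possible ordered top-k choices out of n items."""
    return perm(n, k)             # n * (n-1) * ... * (n-k+1)

full_10 = full_ranking_points(10)
top3_50 = top_k_points(50, 3)
top3_100 = top_k_points(100, 3)
```

Full rankings of 10 items already require 3,628,800 data points, whereas top-three data need 117,600 points for 50 items and 970,200 points for 100 items.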

Knowing the complexity of the transform helps to determine the complexity of either the zero-phase (3.5) or the unit-magnitude (3.9) approximations. Each of those approximations requires the following three steps: computing the forward transform, separating each coefficient matrix into magnitude and phase components, and computing the inverse transform. The inverse transform has the same complexity as the forward transform. The magnitude-phase separation requires performing an SVD of each matrix coefficient, followed by two matrix multiplications for the magnitude, or one for the phase. The cost of each SVD is , where is the size of each representation . Unfortunately, there are no simple, closed-form, expressions for . However, when using partial rank data, the number of coefficients involved is relatively small due to the projection property. From Table 1, we see that there are only coefficient matrices for top-three choices data (), the largest of which has rank . Note that the ranks listed in the Table are independent of . We may use reduced SVDs for these coefficients, resulting in efficient calculation of the magnitude-phase separation due to their low ranks. Consequently, for large , the cost of either the zero-phase or the unit-magnitude approximation is dominated by the cost of the forward and inverse transforms, which are each for top-three choice data.

5.2. Approximation and Compression

It is reasonable to wonder whether we may obtain signal compression by approximating partial rank data by either (3.5) or (3.9). Clearly, for large $n$ and small $k$, the zero-phase approximation (3.5) is poor because magnitude constrains only a small number of degrees of freedom, as described in the previous section. Conversely, the phase spectrum constrains much of the data; as discussed previously, phase constrains all but 24 of the 117,600 degrees of freedom for $n = 50$ and $k = 3$, meaning that replacing the signal with (3.9) is really a very minor compression. To summarize, using (3.5) to replace the signal is too much compression, while using (3.9) is too little.

The error in approximating data with either its zero-phase version (3.5) or unit-magnitude (3.9) may be determined as follows. Considering the inverse transform on is determined by Fourier coefficients , we see that the error in zero-phase approximation is governed by , where the norm means the sum of squared entries. Consequently, due to the submultiplicative property of the matrix norm, we estimate a relative error at each of Here, is the dimension of , and we used the identity . A similar calculation for unit-magnitude approximation shows that the error at each is These are weak upper bounds, and it would be desirable to improve on them in future work.

6. Summary

This paper analyzes the properties of the Fourier spectrum for partial ranking data and shows that the transform coefficients satisfy a projection invariance. The coefficients may be converted to magnitude and phase components, with the magnitude also showing projection invariance. We show that first rank data is essentially determined by its phase spectrum and that, as $n$ increases, the phase dominates magnitude in forming partial rank data.

Acknowledgment

The author thanks the anonymous reviewers for their comments, which greatly improved the paper.