#### Abstract

Understanding the mechanisms underlying the formation of cultural traits is an open challenge. This is intimately connected to cultural dynamics, which has been the focus of a variety of quantitative models. Recent studies have emphasized the importance of connecting those models to empirically accessible snapshots of cultural dynamics. In particular, it has been suggested that empirical cultural states, which differ systematically from randomized counterparts, exhibit properties that are universally present. Hence, a question about the mechanism responsible for the observed patterns naturally arises. This study proposes a stochastic structural model for generating cultural states that retain those robust empirical properties. One ingredient of the model assumes that every individual’s set of traits is partly dictated by one of several universal “rationalities,” informally postulated by several social science theories. The second, new ingredient assumes that, apart from a dominant rationality, each individual also has a certain exposure to the other rationalities. It is shown that both ingredients are required for reproducing the empirical regularities. This suggests that the effects of cultural dynamics in the real world can be described as an interplay of multiple, mixing rationalities, providing indirect evidence for the class of social science theories postulating such a mixing.

#### 1. Introduction

A solid theoretical understanding of how preferences form is currently lacking. There is little doubt that preferences, opinions, values, and beliefs, which are generically referred to as “cultural traits,” are dynamical entities, and that interpersonal social influence plays an important role in driving their dynamics, among other factors. Moreover, a complete theoretical understanding should account for the fact that the dynamics of traits takes place in parallel along multiple dimensions, namely, that opinions and preferences can develop in relation to multiple topics or aspects of life. Along these lines, various dynamical models have been developed and studied [1], such as the Axelrod model [2], which is very representative for studies of multidimensional dynamics, commonly referred to as “cultural dynamics,” in contrast to studies of unidimensional dynamics, commonly referred to as “opinion dynamics.” Various studies of cultural dynamics extending the Axelrod model can be found in the literature [3–11]. Recent studies [12–14] have shown that models of cultural dynamics are sensitive to the initial conditions, namely, to how the initial vectors of agents’ traits are chosen: initial cultural states constructed from empirical data show systematic deviations from their shuffled and random counterparts. In fact, [14] argues that such deviations point towards universal structural properties inherent in empirical cultural states. More insights about the formation of cultural traits should be achievable by studying these states, since they can be regarded as partial snapshots of cultural dynamics in the real world.

The universal properties mentioned above are expressed in terms of the effects that an empirical cultural state has on social influence models that use it as their initial condition. Here, a “cultural state” is a set of cultural vectors (SCV), where each cultural vector encodes the sequence of cultural traits associated with one agent in the model. On one hand, an Axelrod-type model [2] of (multidimensional) cultural dynamics is used to evaluate the propensity of the cultural state to long-term cultural diversity (LTCD). On the other hand, a Cont-Bouchaud-type model [15] of (one-dimensional) opinion dynamics is used to evaluate the propensity of the cultural state to short-term collective behavior (STCB). Both measures are functions of a common parameter that controls the range of social influence in cultural space, which allows an LTCD-STCB correspondence to be drawn for a given cultural state. It turns out that an empirical cultural state generally induces an LTCD-STCB curve that is close to the second diagonal, while exhibiting, for a given STCB value, higher LTCD values than a trait-shuffled cultural state, which in turn exhibits higher LTCD values than a randomly generated counterpart [12, 14]. These results seem universal [14], namely, independent of the dataset used for constructing the cultural vectors composing the empirical cultural state, suggesting that real-world cultural dynamics are governed by universal laws. Moreover, as argued in [14], this type of analysis suggests that interagent social influence, the essential ingredient of cultural dynamics models, is insufficient for explaining the observed structure. Although it is meaningful to incorporate additional ingredients into social influence models in an attempt to give rise to empirical-like structure in a dynamical setting, this study does not aim for that. Instead, it aims at providing an effective, phenomenological, static description of the observed structure, which should provide additional insights before developing a more fundamental, dynamical description.

The purpose of this study is to develop a structural stochastic model that would generate realistic cultural states, while incorporating plausible ingredients from social science. Specifically, these states should retain the universal properties inherent to empirical cultural states that are observed in [14]. In fact, [13] has already investigated various ways of generating sets of cultural vectors in random, but nonuniform, ways. A method that appeared particularly promising relied on the notion of “cultural prototypes”: a few underlying, abstract sequences of logically compatible, self-enforcing cultural traits, which govern the way the generated vectors are distributed in cultural space. According to the method, each cultural vector is partly a copy of one of the prototypes and partly random. The implicit claim is that each cultural prototype is induced by one of a few (3 to 5) fundamental and universal “principles of social life,” or “rationalities,” that would strongly affect any process of trait formation in any social system. Such entities are postulated, under different names and in slightly different numbers, by several theoretical frameworks in social science [16–20]. The exact number of such entities depends on the exact theory that is considered, as different theories are built on somewhat different arguments and pieces of evidence. It is important that the number is larger than 1 but not too large, while independent of system size. From a natural science perspective, such ideas are attractive, since they exhibit a certain reductionist tendency of trying to understand the observed sociocultural variability in terms of combinations of a few, elementary, and universal building blocks. Various parallels and similarities between these theories are discussed in the literature [21–23]. For the purpose of the current study, all these theories are equivalent. 
Still, for creating an instructive and compact context, the discussion is restricted to one of them, namely, to Plural Rationality Theory, chosen for reasons discussed in Section 5.

Plural Rationality Theory (PRT), also referred to as “(Grid-Group) Cultural Theory” [16], is a qualitative description of sociocultural structure and dynamics as an interplay between a small number of irreducible “ways of life,” or “rationalities.” These ways of life are understood as abstract, “elementary building blocks” of societies and are supposedly recognizable regardless of the geographical context, of the historical context or of the scale of the system that is studied. It is believed that the ways of life go along with different perceptions of risk [24, 25] and, interestingly, that they always coexist, although either of them is often dominant for a given period of time, for a given (part of the) system that one studies. (It may be useful to think of the ways of life as being the elements of a complete, orthogonal basis of some abstract vector space; one may then associate a vector in this space with a certain part of a certain sociocultural system, at a given moment in time; it is not clear to what extent such vectors would be related to the cultural vectors used in this study; this is only a semiformal analogy that is not exploited further here, nor in any other study so far, to the extent that the authors are aware of.) Such ideas appear compatible with recent empirical findings concerning the existence of a small number of behavioral phenotypes in dyadic games [26]. In PRT, each way of life is understood as a self-enforcing combination of a “pattern of (social) relations” and a “cultural bias.” On one hand, a pattern of relations is often understood as a tendency of organizing the social ties between people in a certain way, thus a connectivity pattern in the social graph. On the other hand, a cultural bias is a combination of preferences, opinions, values, and beliefs that are compatible with each other and with the associated pattern of relations. 
By comparison to the definitions in [14], one can easily realize that a cultural bias can be thought of as a point or a region in “cultural space” that is representative for the respective “way of life.” A cultural bias is formally represented here by the notion of “cultural prototype,” previously used in [13].

This notion is at the core of two stochastic, structural models of culture that are defined and studied here. The first model, called “Prototype Generation” (PG), postulates that each cultural vector is partly a copy of one of the prototypes and partly random. This generation method is similar to the “Prototype Evolution” method of [13], though with small technical differences. The second model, called “Mixed Prototype Generation” (MPG), postulates that each cultural vector is an asymmetric mixture (or combination) of all the prototypes. From the perspective of PRT, this “mixing” is a formal realization of the idea that every person combines the ways of life in a unique way, such that preferences and opinions related to different aspects of life (cultural traits of different cultural features, or variables) are due to the “influence” of different cultural biases, though at any given moment in time one cultural bias usually dominates. In the literature concerned with PRT and the other, similar theories, this mixing aspect often goes under the name of “the multiple self” and was not implemented in [13]. The importance of mixing for correctly interpreting (and testing) PRT has already been stressed in [25], while the general importance of multiple selves for social science has also been extensively discussed [27]. Moreover, research on preferences in economic contexts also suggests that the multiple self is important [28–30]. On the other hand, research in cross-cultural psychology appears to be divided: some studies seem to ignore the multiple self [31], while others seem to acknowledge it [32, 33]. This study provides further insights on this matter, by directly comparing the PG and MPG models with each other and with empirical data.

Section 2 explains the models in detail, while Section 3 describes how the free parameters are tuned, so as to reproduce some lower-order properties of one empirical cultural state. Cultural states generated with the two models are then evaluated, in Section 4, by means of the LTCD-STCB analysis of [12, 14]. It is shown that cultural states generated by PG are structurally dissimilar to the empirical ones, as they do not exhibit the universal LTCD-STCB behavior, after tuning the free parameters to empirical data in terms of simpler, but meaningful, quantities. On the other hand, cultural states generated with MPG are structurally similar to the empirical ones, as they reproduce the universal LTCD-STCB behavior, after applying an analogous tuning procedure. This suggests that the mixing, multiple-self ingredient is crucial for describing the effects of preference formation in terms of cultural prototypes and that MPG should be regarded as the successful model. Section 5 further discusses the results, their limitations, and extensions of this work and questions that are worth investigating in the future. The manuscript is concluded in Section 6.

#### 2. Model Description

This section describes the two stochastic models of culture: the Prototype Generation (PG) model and the Mixed Prototype Generation (MPG) model, which are used below for generating sets of cultural vectors (SCVs) that can be quantitatively studied with the LTCD-STCB tool, previously applied to empirical SCVs in [12, 14]. Both models rely on the concept of cultural prototype introduced above.

An SCV can be visualized as a table of cultural traits, where the columns correspond to cultural vectors (or sequences) and the rows correspond to cultural features (or variables). If the SCV is constructed from empirical data, the columns correspond to real people that are sampled by a social survey, while the rows correspond to questions that are asked in the social survey. This is illustrated by Figure 1, which is explained in detail below. Consistently with [14], a “cultural space” is the set of all possible cultural vectors (or combinations of traits) allowed by the given set of cultural features: one combination of traits is one point in this discrete space. For the purpose of this work, the general set-up is restricted to cultural spaces defined in terms of features that are exclusively nominal. In this setting, distances between points in the cultural space are given by (5) of Section 3. Disregarding ordinal features makes the modeling paradigm compatible with the (arguably strong) assumption that one prototype corresponds to one point in cultural space, meaning that a prototype picks up one and only one trait of any given feature. Other limitations of this assumption are extensively discussed in Section 5, together with possible ways of relaxing it, for the purpose of generalizing the current modeling paradigm in future work.

The two models are schematically illustrated in Figure 1. The figure first shows a sketch of an empirical SCV, where the rows correspond to cultural features, the columns correspond to cultural vectors, and the letters correspond to cultural traits: each row shows the traits that the agents express (or formulate) with respect to the corresponding feature. Then, it shows a set of 3 cultural prototypes (their number could have been different), in 3 different colors, all of them spanning all features (or questions) relevant for the empirical set of vectors. Finally, it illustrates a typical set of vectors generated using the PG method, followed by one generated using the MPG method. The colors distinguish between the prototypes and indicate how the traits are copied from the prototypes to the cultural vectors, while black denotes traits generated in an explicitly random way (uniform distribution, independently of the prototypes).

There are several things worth noting in relation to Figure 1. First, the possibility that two or more prototypes pick the same trait for a certain feature is allowed by the current modeling paradigm (note that any of the traits that can be copied from one of the prototypes can also be generated via explicit randomness). This is essential for controlling the average prototype-prototype distance, as will become apparent below. Second, a PG vector is partly copied from one prototype and partly generated in an explicitly random way, while an MPG vector is a mixture of copies from all the prototypes, with one dominating prototype and with few traits generated in an explicitly random way. Third, both models make use of another type of randomness, in addition to the explicitly random trait generation and to the randomness involved in generating the prototypes. This randomness has to do with assigning every trait of every vector to a “prototype of origin,” once the random generation fraction and the influence fractions of the prototypes are specified. In the case of MPG, it is mainly this trait-assignment randomness that allows for the generation of a multitude of distinct cultural vectors from a small set of fixed prototypes, in the presence of little explicitly random trait generation.

The procedure for generating the cultural prototypes is the same for both the PG and the MPG models. One needs to specify the number of prototypes $k$, as well as the value of another parameter $\beta$, which controls the expected cultural distance between the prototypes. This parameter governs the expected number of overlaps (or coincidences) between prototypes in terms of how they are distributed over the traits of a specific feature. At the upper extreme of $\beta$, all prototypes pick the same trait for every feature, yielding the smallest possible separation between the prototypes in cultural space (which coincides with the minimum of 0 allowed by the cultural distance definition in (5)). At the lower extreme of $\beta$, the prototypes are distributed as uniformly as possible over the traits of every feature, yielding the largest possible separation between the prototypes in cultural space (which only coincides with the maximum of 1 allowed by (5) if the number of traits is larger than or equal to the number of prototypes for every feature). This is achieved by a formulation in terms of the set of integer partitions describing the possible ways of distributing the $k$ prototypes over the $q$ traits of a certain feature. The parameter $\beta$ actually controls the probability distribution over this set, via the “compactness” of the integer partitions it contains. Appendix A.2 precisely describes how these probabilities are assigned and how the set is computationally generated in the first place, for any combination of $k$ and $q$. Once the prototypes are chosen, everything else is conditional on them, for both models.
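The set of integer partitions mentioned above can be enumerated with a short recursive routine. The following Python sketch is only an illustration: the function name `partitions` and its arguments (`k` for the number of prototypes, `q` for the number of traits of a feature) are assumptions introduced here, and the probabilities that Appendix A.2 assigns to the partitions are not reproduced.

```python
def partitions(k, q, largest=None):
    """Yield the integer partitions of k into at most q parts.

    Each partition is a non-increasing tuple: entry m is the number of
    prototypes sharing the m-th occupied trait of a feature with q traits.
    Hypothetical helper, not taken from the paper's appendix.
    """
    if largest is None:
        largest = k
    if k == 0:
        yield ()  # nothing left to distribute
        return
    if q == 0:
        return    # prototypes left over but no traits available
    for first in range(min(k, largest), 0, -1):
        for rest in partitions(k - first, q - 1, first):
            yield (first,) + rest


for p in partitions(3, 3):
    print(p)
```

For three prototypes and a feature with three traits, the routine yields `(3,)`, `(2, 1)` and `(1, 1, 1)`. The first places all prototypes on one trait (smallest separation), while the last places each prototype on a different trait (largest separation).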

According to the *Prototype Generation (PG)* model, each cultural vector is a partial realization of one of the prototypes. Each of the $N$ cultural vectors is generated by copying a random sequence of traits from one of the $k$ prototypes, while generating the other traits in a uniformly random way; the prototype is chosen randomly for every vector. Then, a subset of the $F$ features of length $\mathrm{round}(\alpha F)$ is randomly and independently selected for each vector and the traits of these features are copied from the prototype to the vector. Here, “round” returns the integer that is closest to its argument, while $\alpha$ is a third model parameter, in addition to the number of prototypes $k$ and the prototype-distance parameter $\beta$ (which are already needed for the purpose of specifying the prototypes, in the manner described above). The parameter $\alpha$ specifies the fraction of traits that are directly copied from the prototype, thus controlling the expected distance between a vector and its prototype. The traits for the remaining features are generated randomly and independently, according to uniform feature-level probability distributions (the explicit random generation mentioned above). Thus, $\alpha$ also controls the amount of explicitly random generation of traits. The PG method effectively specifies that there are $k$ “classes” of cultural vectors and those of a certain class are located at a certain, $\alpha$-controlled average distance from the associated cultural prototype. This is similar to the “Prototype Evolution” method of [13], although there are small differences in how exactly the vectors are generated in the two cases. Moreover, the method of [13] did not allow for controlling the expected cultural distance between the prototypes.
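The PG generation step can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the names `pg_vector`, `prototypes`, `ranges` and `alpha` are hypothetical, traits are encoded as integers, the prototype is assumed to be chosen uniformly, and ties in “round” follow Python's built-in convention, which the text does not specify.

```python
import random


def pg_vector(prototypes, ranges, alpha, rng):
    """Generate one cultural vector in the spirit of the PG model.

    prototypes: list of prototypes, each a list of integer traits
    ranges:     number of possible traits for each feature
    alpha:      fraction of traits copied from the chosen prototype (assumed in [0, 1])
    rng:        a random.Random instance
    """
    n_features = len(ranges)
    prototype = rng.choice(prototypes)       # assumed uniform choice of prototype
    n_copied = round(alpha * n_features)     # Python rounds exact halves to even
    copied = set(rng.sample(range(n_features), n_copied))
    return [
        prototype[f] if f in copied else rng.randrange(ranges[f])  # copy, else uniform trait
        for f in range(n_features)
    ]


rng = random.Random(0)
prototypes = [[0, 1, 2, 0], [1, 1, 0, 2]]
ranges = [2, 3, 3, 3]
print(pg_vector(prototypes, ranges, 0.5, rng))
```

Note that a randomly generated trait may coincide with the prototype's trait, so the realized distance between a vector and its prototype can be smaller than the random fraction suggests.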

According to the *Mixed Prototype Generation (MPG)* model, each cultural vector is a combination of all $k$ prototypes, though an unbalanced combination, meaning that the numbers of traits copied from the different prototypes are deliberately unequal. The extent of this discrepancy is explicitly controlled via the third model parameter, which, as for PG, is called $\alpha$. Although the exact definition and usage of the $\alpha$ parameter differ between MPG and PG, its role is quite similar. Specifically, also in the context of MPG, $\alpha$ (indirectly) controls the fraction of traits copied from the dominating prototype to the vector: more traits are copied from the dominating prototype if the discrepancy between the prototypes is higher. In addition to traits copied from the prototypes, there are traits that are generated in an explicitly random way, but in a small number. For each generated vector, this number is by construction not higher than the number of traits copied from the lowest-contributing prototype. Consequently, if there are $k$ prototypes and $F$ features, the number of traits generated via explicit randomness does not exceed $F/(k+1)$. Thus, $1/(k+1)$ is an upper bound for the fraction of explicit randomness in an entire set of cultural vectors generated with MPG. It is also important to note that, as for PG, this fraction is controlled by $\alpha$ and that the upper bound is reached when $\alpha$ is in the limit of minimal imbalance. The limited usage of explicitly random trait generation by MPG means that cultural vectors are more strongly constrained by the prototypes, compared to PG. Still, MPG allows for generating a large variety of possible cultural vectors, since the prototypes can mix in many different ways.

The MPG model needs a procedure for specifying, for each generated vector, the values of the $k$ numbers of traits that are to be copied from the $k$ prototypes, along with the number associated with explicitly random generation. These $k+1$ positive integers should add up to the number of features $F$ and have their discrepancy controlled by the $\alpha$ parameter. Moreover, there is no reason to believe that the sequence of numbers associated with one $\alpha$ value should be the same across all generated vectors, so randomness should be involved in choosing these numbers. Therefore, the model needs a probabilistic way of drawing $k+1$ random, positive integers $f_1, \ldots, f_{k+1}$ satisfying $\sum_{a} f_a = F$, such that their expected discrepancy is controlled via a single parameter $\alpha$. The procedure chosen for this purpose is described below.

This procedure heavily relies on isometrically mapping the discrete set of integers running up to the number of features $F$ onto the unit interval $[0, 1]$ of the real axis. For each generated vector, the latter interval is split into $k+1$ parts, where $k$ is the number of prototypes, by performing $k$ “cuts” in randomly chosen points. In this manner, a sequence of preliminary weights $w_1, \ldots, w_{k+1}$, subject to $\sum_{a} w_a = 1$, is numerically obtained. These weights are obviously independent of the parameter $\alpha$ and have a fixed expected discrepancy. An $\alpha$-dependent transformation (explained below) is applied to the preliminary weights, thus providing a sequence of $\alpha$-dependent weights $\tilde{w}_1, \ldots, \tilde{w}_{k+1}$ satisfying $\sum_{a} \tilde{w}_a = 1$, with expected discrepancy controlled by $\alpha$. Finally, the sequence of $\alpha$-dependent weights is converted back into the desired sequence of integers. This final operation is nontrivial: it requires a self-consistent, joint rounding procedure, which is generally difficult to choose, since one cannot generally ensure that independently rounded values add up to $F$ (a nontrivial problem of weight discretization). Here, a simple, pragmatic choice is made: converting the $k$ lowest weights into the closest, lower integer, while converting the highest weight into the integer needed for satisfying the summation constraint. This ensures that the highest weight, which should correspond to the dominating prototype, is converted into the highest integer.

The only aspect of MPG remaining to be explained is how the $\alpha$-dependent weights $\tilde{w}_a$ are obtained from the preliminary weights $w_a$. This is done by raising the latter to a common power $\gamma$ and then normalizing:

$$\tilde{w}_a = \frac{w_a^{\gamma}}{\sum_{b} w_b^{\gamma}}, \tag{1}$$

where the common power $\gamma$ controls the average discrepancy between these weights and is obtained from $\alpha$ through a mapping based on the tangent function, which is a convenient choice of a smooth, continuous function with the appropriate domain and range. Thus, an $\alpha$ value that maps to $\gamma > 1$ implies a higher discrepancy of the $\tilde{w}_a$ than that of the $w_a$, while an $\alpha$ value that maps to $\gamma < 1$ implies a lower discrepancy of the $\tilde{w}_a$ than that of the $w_a$.
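The weight-drawing procedure of the last paragraphs can be sketched in Python. This is an illustration under explicit assumptions rather than the authors' code: the names `mpg_counts`, `k`, `F` and `alpha` are hypothetical, and the tangent mapping is taken here to be `gamma = tan(pi * alpha / 2)` with `alpha` in `[0, 1)`, which is only one form compatible with the description above.

```python
import math
import random


def mpg_counts(k, F, alpha, rng):
    """Return k+1 integers summing to F, sorted in decreasing order.

    The first k entries are read as the numbers of traits copied from the
    prototypes (the first being the dominating one); the last, smallest
    entry is read as the number of explicitly random traits.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("this sketch assumes 0 <= alpha < 1")
    # k random cuts split the unit interval into k+1 preliminary weights.
    cuts = sorted(rng.random() for _ in range(k))
    edges = [0.0] + cuts + [1.0]
    w = [b - a for a, b in zip(edges, edges[1:])]
    # ASSUMED tangent mapping from alpha to the common power gamma.
    gamma = math.tan(math.pi * alpha / 2.0)
    # Dividing by the largest weight avoids underflow and cancels on normalization.
    w_max = max(w)
    powered = [(x / w_max) ** gamma for x in w]
    total = sum(powered)
    w_alpha = sorted((x / total for x in powered), reverse=True)
    # Round the k lowest weights down; the highest absorbs the remainder.
    counts = [math.floor(x * F) for x in w_alpha]
    counts[0] = F - sum(counts[1:])
    return counts


print(mpg_counts(3, 100, 0.5, random.Random(1)))
```

Under the assumed mapping, `alpha = 0` gives `gamma = 0` and hence equal weights, which is the limit of minimal imbalance in which the random fraction reaches its upper bound. Small weights may round down to zero in this sketch.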

Before describing the fitting and the outcomes of the PG and MPG models, it is worth summarizing a few important aspects. Both models rely on the notion of cultural prototypes, which is currently formalized in a simplistic manner that is only sensible for cultural spaces defined exclusively in terms of nominal features. The procedure for generating the prototypes is the same for both models and relies on two parameters, $k$ and $\beta$, which specify, respectively, the number of prototypes and the expected distance between them. The differences between PG and MPG consist in how the cultural vectors are generated conditionally on the prototypes: for PG, every vector is in part a copy from one of the prototypes and in part explicitly random; for MPG, every vector is an imbalanced mixture of all prototypes and explicitly random to a much lower extent, which is how the “multiple-self” ingredient is implemented. Nonetheless, in both cases, there is a third model parameter, $\alpha$, which governs, in different ways, the lengths of the randomly selected subsets of features whose traits are copied from the prototypes. In both cases, $\alpha$ effectively controls the expected distance between a vector and its (dominating) prototype, as well as the fraction of explicit randomness.

#### 3. Model Fitting

Before applying the LTCD-STCB analysis to SCVs generated with either the PG or MPG models, it is useful to constrain some of the free model parameters. This is done in terms of statistical quantities simpler than the LTCD and the STCB measures, which can be evaluated on both empirical SCVs and model SCVs. On the empirical side, the quantities are averaged over several empirical SCVs, each constructed by randomly selecting $N$ cultural vectors from the 13000 available ones in the Eurobarometer dataset [34], with restriction to the nominal features (referred to below as the nominal part of the Eurobarometer dataset). The empirical data is formatted according to the procedure explained in [14]. On the model side, these quantities are averaged over many SCVs, of the same size $N$, which are realizable in the cultural space of the nominal part of the Eurobarometer dataset, for the given combination of parameters; the prototypes are independently generated upon creating every model SCV.

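The two simple quantities in terms of which the models are tuned to empirical data are the average and the standard deviation of the intervector distances in the SCV, which are here denoted by “AIVD” and “SIVD,” respectively:

$$\mathrm{AIVD} = \frac{2}{N(N-1)} \sum_{i<j} d_{ij}, \tag{3}$$

$$\mathrm{SIVD} = \sqrt{\frac{2}{N(N-1)} \sum_{i<j} d_{ij}^{2} - \mathrm{AIVD}^{2}}, \tag{4}$$

where $N$ is the number of cultural vectors and $d_{ij}$ is the cultural distance between vectors $i$ and $j$, as defined and used in [12–14]. The notation $\sum_{i<j}$ denotes that the respective summation is carried out over all distinct pairs $(i, j)$. In the case of a fully nominal cultural space, with which this study is dealing, $d_{ij}$ reduces to the Hamming distance between the two sequences of symbols encoding cultural vectors $i$ and $j$:

$$d_{ij} = \frac{1}{F} \sum_{f=1}^{F} \left[ 1 - \delta\!\left(x_i^{f}, x_j^{f}\right) \right] = \frac{1}{F} \sum_{f=1}^{F} d_{ij}^{f}, \tag{5}$$

with $d_{ij}$ taking values within the $[0, 1]$ interval. Here, $f$ iterates over the $F$ nominal features, $x_i^{f}$ and $x_j^{f}$ are the traits of vectors $i$ and $j$ with respect to feature $f$, and $\delta$ stands for the Kronecker delta function. The second equality shows that the cultural distance can be expressed as an average over feature-level contributions $d_{ij}^{f}$, which becomes useful below. Previous work has shown that an empirical SCV is characterized by a lower AIVD than its random counterpart and a higher SIVD than both its random and shuffled counterparts [12, 13]. The AIVD and SIVD quantities, which incorporate pairwise distance information, are conceptually different from what is often used in the context of cultural dynamics and of the Axelrod model, namely, the size of the largest connected component, which can be regarded as an overall measure of similarity. Instead, the latter is somewhat similar to the STCB quantity explained and used in Section 4.

It is instructive to see that the expressions of AIVD and SIVD can be rewritten in the following way:

$$\mathrm{AIVD} = \frac{1}{F} \sum_{f=1}^{F} \left[ \frac{2}{N(N-1)} \sum_{i<j} d_{ij}^{f} \right], \tag{6}$$

$$\mathrm{SIVD} = \sqrt{\frac{1}{F^{2}} \sum_{f=1}^{F} \sum_{g=1}^{F} \left[ \left\langle d^{f} d^{g} \right\rangle - \left\langle d^{f} \right\rangle \left\langle d^{g} \right\rangle \right]}, \tag{7}$$

by using the feature-level cultural distance $d_{ij}^{f}$ introduced via (5), with $\langle \cdot \rangle$ denoting the average over all distinct pairs $(i, j)$; the transition from (4) to (7) was suggested by the SI of [12].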
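The two tuning quantities are straightforward to evaluate on a small SCV. The sketch below is a plain-Python illustration with hypothetical function names (`hamming`, `aivd_sivd`); it assumes a fully nominal cultural space and takes the standard deviation in its population form (mean of squares minus squared mean over distinct pairs), which is an assumption about the exact normalization.

```python
from itertools import combinations
from math import sqrt


def hamming(u, v):
    """Normalized Hamming distance between two equally long trait sequences."""
    return sum(a != b for a, b in zip(u, v)) / len(u)


def aivd_sivd(vectors):
    """Average and standard deviation of distances over all distinct pairs."""
    distances = [hamming(u, v) for u, v in combinations(vectors, 2)]
    mean = sum(distances) / len(distances)
    variance = sum(d * d for d in distances) / len(distances) - mean ** 2
    return mean, sqrt(max(variance, 0.0))  # guard against tiny negative round-off


scv = [[0, 0], [0, 1], [1, 1]]  # three vectors, two nominal features
print(aivd_sivd(scv))
```

In this toy example the three pairwise distances are 0.5, 1.0 and 0.5, so the average is 2/3.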

Note that the AIVD can be understood as an average over feature-level AIVD contributions, which are represented by the expression within the summation over features in (6). It can be checked that the (nominal) feature-level AIVD contribution is a measure of how uniformly the vectors are distributed over the possible traits of that feature. This is more obvious when expressing the expected value of the AIVD contribution in terms of probabilities associated with the traits, which is shown in (8) below. Thus, for an empirical SCV containing only nominal features, the AIVD is a measure of average uniformity of the empirical frequency distributions associated with the features. Consequently, the AIVD is also a measure of how subjective the questions/topics associated with the features are on average: when the frequencies of possible answers are more similar to each other, there is less justification to talk about a “better,” a “more correct,” or a “more agreed upon” answer, so the question is inherently more subjective.

Also note that, in (7), the quantity inside the average over pairs of features is the covariance between the two features of the pair, defined in terms of the feature-level cultural distances. Given that this quantity is averaged over all possible pairs of features and that the square root is a monotonic function, the SIVD encodes information about the pairwise correlations between features, though in a somewhat indirect way.

For both models, the choice made here is that of

(i) tuning the prototype-distance parameter $\beta$ in terms of the AIVD quantity (see (3), (6)), for any combination of values of the number of prototypes $k$ and of the third parameter $\alpha$;

(ii) tuning the $\alpha$ parameter in terms of the SIVD quantity (see (4), (7)), for any value of $k$, based on the previous fitting of $\beta$ in terms of AIVD;

(iii) simply repeating the tuning (and the LTCD-STCB analysis in Section 4) for several values of $k$.

This implies that, for every value of the number of prototypes $k$, the tuning (or fitting) is done at two levels: the $\beta$-AIVD level and the $\alpha$-SIVD level, the former being nested into the latter, where $\beta$ is the prototype-distance parameter and $\alpha$ is the third model parameter. In practice, the fitting is carried out automatically, using a nested, 2-level algorithm that relies on a modified bisection-type method for each level. The algorithm is precisely described in the supplementary materials. In order to work, this approach relies on the assumption that there is a unique solution to the fitting problem, for every value of $k$. This uniqueness is demonstrated via Figures 2 and 3, which are also used for providing a general intuition of how the fitting works and of how the AIVD and SIVD quantities depend on $k$, $\alpha$, and $\beta$, for the two models.
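Each level of such a fit amounts to one-dimensional root finding. The sketch below shows a plain bisection step in Python; it is not the modified method of the supplementary materials, and the name `bisect_decreasing`, its arguments and the monotonicity assumption are all introduced here for illustration only.

```python
def bisect_decreasing(f, target, lo, hi, tol=1e-6, max_iter=100):
    """Find x in [lo, hi] with f(x) close to target, for a non-increasing f.

    Plain bisection: assumes f(lo) >= target >= f(hi). Hypothetical helper.
    """
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if f(mid) > target:
            lo = mid   # model value still too high: move towards larger x
        else:
            hi = mid   # model value at or below target: move towards smaller x
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Toy stand-in for a model curve that decreases with its parameter.
print(bisect_decreasing(lambda x: 1.0 - x, target=0.3, lo=0.0, hi=1.0))
```

In a nested use, the inner call would solve for the first-level parameter against the empirical AIVD at fixed values of the other parameters, and an outer search would then adjust the second-level parameter against the empirical SIVD.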

Before entering this description, it is worth mentioning that the computer time for the fitting algorithm is greatly reduced by being able to evaluate the average (model) AIVD quantity analytically, in a manner that properly accounts for all SCVs that can be generated for any combination of the number of prototypes $k$, the prototype-distance parameter $\beta$, and the third parameter $\alpha$. While the calculation is described in detail in Appendix B, a schematic understanding can already be provided here. The essential ingredient of the calculation is a simple, exact formula, (8), for the expected AIVD contribution of one feature of range $q$, which assumes that the probabilities of its $q$ traits are all known; see Appendix B for the proof. For a discrete probability distribution, (8) is a measure of uniformity very similar to the Shannon entropy. Conditional on a specific choice of the prototypes, this set of probabilities (thus the feature-level probability distribution) is fully determined by the integer partition describing how the prototypes are distributed over the traits and by the fraction of traits that are randomly generated, the latter being controlled by $\alpha$. In this context, (8) already assumes that an averaging is performed over SCVs generated from the same set of prototypes. One still needs to perform an average of this expression over integer partitions ((B.2) of Appendix B), according to the probability distribution controlled by $\beta$ ((A.3) and (A.4) of Appendix A.1), followed by another average over all features ((B.1) of Appendix B), since different features will in general have different ranges $q$. On superficial inspection, using a similar approach for analytically computing the SIVD quantity appears very complicated, if at all possible. Numerical calculations are instead employed for computing the (model) SIVD.
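The feature-level calculation can be illustrated with a short Python sketch. It rests on two assumptions that are not taken from Appendix B: that the expected feature-level contribution equals the probability that two independently generated vectors carry different traits (one minus the sum of squared trait probabilities), and that, in a PG-like setting with a uniformly chosen prototype, each trait probability mixes the prototype occupancy with a uniform random component. The names `trait_probabilities` and `expected_feature_aivd` are hypothetical.

```python
def trait_probabilities(partition, q, random_fraction):
    """Trait probabilities of one feature with q traits.

    partition:       numbers of prototypes sitting on each occupied trait
    random_fraction: fraction of traits generated uniformly at random
    ASSUMPTION: prototypes are chosen uniformly, as in a PG-like setting.
    """
    k = sum(partition)
    occupancy = list(partition) + [0] * (q - len(partition))
    return [
        (1.0 - random_fraction) * n / k + random_fraction / q
        for n in occupancy
    ]


def expected_feature_aivd(probabilities):
    """Probability that two independently drawn traits differ (assumed form)."""
    return 1.0 - sum(p * p for p in probabilities)


p = trait_probabilities((2, 1), q=3, random_fraction=0.0)
print(p, expected_feature_aivd(p))
```

With this form, a perfectly uniform distribution over `q` traits gives `1 - 1/q`, the largest possible value, while a distribution concentrated on one trait gives 0, which matches the reading of the AIVD as a measure of uniformity.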

Figure 2 deals with the first-level fitting. It shows the dependence of the analytically computed AIVD quantity (see above) on the parameter, for several values, for several values, and for both the PG and MPG models. Moreover, it shows the empirical AIVD uncertainty range (an uncertainty range, as defined in the supplementary materials, is the interval spanned by one standard mean error on each side of the mean) via the horizontal bands in the six panels. Thus, a solution of the first-level fitting is indicated by an intersection between a model curve of a given combination of and and the horizontal band. Note that, for either of the two models and for any combination of and , if a solution exists, this solution is actually unique. In order to understand the behavior implicit in Figure 2, which is explained below, one should keep in mind that AIVD measures the average uniformity of the feature-level probability distributions.

First, it is worth focusing on the AIVD dependence on the and parameters. Note, on one hand, that, for a given combination of and , the AIVD generally decreases with or at least remains constant. This is due to the fact that the AIVD decreases with decreasing distance between prototypes, thus with increasing . For PG, this decrease is stronger for higher values, since for low values the uniformity is high anyway, because of the large fraction of randomly generated traits. For MPG, this -dependence of the decrease is not that strong, since the fraction of randomly generated traits cannot exceed . On the other hand, for a given combination of and , the AIVD generally decreases with increasing . This is due to the fact that the AIVD decreases with decreasing fraction of randomly generated traits, thus with increasing .

Second, it is worth focusing on the AIVD dependence on the number of prototypes . For PG, for a given , a larger number of prototypes implies a higher AIVD, since traits copied from prototypes are more uniformly distributed, but this has a significant effect only for large values, again because the uniformity is already in place for small values. For MPG, the corresponding behavior is more subtle. While, for large values, the AIVD still increases with increasing at a given (for the same reason as for PG), the AIVD() curves corresponding to small approach the AIVD() curve corresponding to large with increasing , rather than remaining in place (which is the case for PG). This is related to the fact that the upper bound on the fraction of randomly generated traits decreases with increasing , thus decreasing the role of in controlling the AIVD via the uniform component of the feature-level probability distributions.

Figure 3 deals with the second-level fitting. Everything shown in this figure relies on already being tuned (at the first level) such that the empirical AIVD is matched; as is apparent from Figure 2, the tuned value depends on and on . Figure 3 shows the dependence of the numerically computed SIVD quantity (with uncertainty ranges) on the parameter, for several values and for both the PG and MPG models. Moreover, it shows the empirical SIVD uncertainty range via the horizontal bands in the two panels. Thus, a solution of the second-level fitting is indicated by an intersection between a model curve of a given and the horizontal band. Note, again, that, for either of the models and any of the values, if a solution exists, this solution is actually unique. The exact technical procedure employed for producing any of the model points in Figure 3 is described at the end of the supplementary information, followed by the explanation of the final choice of values for the and parameters, for use in the analysis of Section 4.

Note that the SIVD increases with for both models and for all values, suggesting that the extent of feature-feature correlation increases with decreasing distance between vectors dominated by the same prototype. For PG, all SIVD() curves meet for some , at which point they also end. No points are plotted for lower because cannot be tuned in terms of AIVD, which can be understood from Figure 2 when noticing the AIVD() curves of low that do not cross the empirical line. For MPG, the SIVD() curve of ends at a value of , before crossing the empirical line, meaning that the MPG model cannot be entirely tuned when only 2 prototypes are used. No points are plotted for higher because cannot be tuned in terms of AIVD, which can be understood from Figure 2, by noticing the AIVD() curves of and high that do not cross the empirical line. This is due to certain limitations of the current modeling paradigm, which are further discussed in Section 5.

#### 4. Model Outcomes

Here, the most important results of this work are presented. The focus is on the LTCD-STCB analysis, applied to sets of cultural vectors generated with the PG and MPG models. The aim is to assess how well the two models reproduce the universal empirical patterns described in [14]. Figure 4 illustrates the results obtained with the two models, whereas Figure 5 summarizes, for comparison purposes, the empirical results, focusing on the nominal part of the Eurobarometer dataset ()—formatted according to the procedure explained in [14].

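**Figure 4**: six panels, (a)–(f), showing the LTCD-STCB plots for the PG model (panels (a), (c), and (e)) and the MPG model (panels (b), (d), and (f)).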

Before describing the results, it is worth recalling the main ingredients of the LTCD-STCB analysis. This is essentially a two-dimensional plot showing the correspondence between the LTCD quantity and the STCB quantity, both of them being evaluated on empirical, on shuffled, and on random SCVs. Drawing the LTCD-STCB correspondence is made possible by the fact that, for each of the three scenarios, both quantities depend on the bounded-confidence threshold , which controls the maximal cultural distance over which social influence can act. On one hand, the LTCD quantity is a measure of cultural diversity after a long-term process of cultural dynamics driven by -bounded social influence, starting from an initial cultural state specified by the respective SCV. Essentially, it counts the number of distinct points in cultural space (commonly referred to as “cultural domains”) towards which the agents converge in the final state of a minimalist, bounded-confidence Axelrod model. On the other hand, the STCB quantity is a measure of collective behavior (or social coordination) after a short-term process of opinion dynamics driven by -bounded social influence. Essentially, it is the standard deviation of the aggregate opinion distribution of the agent population, resulting from a minimalist Cont-Bouchaud-type model applied to the (cultural) graph obtained by drawing a link for each pair of agents separated by a cultural distance smaller than . Mathematically, the two quantities, as functions of the bounded-confidence threshold , are captured by the following two expressions: where is the number of cultural domains in the final state of the Axelrod-type model, is the number of agents (and cultural vectors), and is the size of the th connected component in the -determined cultural graph. The average in the LTCD formula is taken over multiple simulations of the Axelrod-type model. The STCB quantity is calculated analytically, once the cultural connected components are found, based on the assumption of independent opinion-agreement within each connected component. An essential difference between the two quantities, reflected in the long-term/short-term distinction, consists of an idealized separation between two time-scales, in terms of the role played by the SCV specified as input: cultural vectors, together with the distances between them, are assumed to be dynamical by the LTCD definition and static by the STCB definition, such that one deals with dynamics of vectors and with dynamics on vectors in the two cases, respectively. The interested reader is referred to [12, 14] for more details and remarks about the LTCD-STCB analysis.
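The analytic STCB evaluation described above can be sketched as follows. The sketch rests on assumptions that are not taken from the paper: cultural distance is the fraction of features on which two vectors differ, each connected component independently adopts one of two opposite opinions with equal probability, and the aggregate opinion is normalized by the number of agents. Under these assumptions the standard deviation is the square root of the sum of squared relative component sizes. The names `omega`, `component_sizes`, and `stcb` are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import List, Sequence, Tuple


def cultural_distance(u: Sequence[int], v: Sequence[int]) -> float:
    """Fraction of (nominal) features on which two vectors differ (assumed metric)."""
    return sum(x != y for x, y in zip(u, v)) / len(u)


def component_sizes(vectors: Sequence[Tuple[int, ...]], omega: float) -> List[int]:
    """Sizes of connected components of the graph linking every pair of
    vectors separated by a distance strictly smaller than omega."""
    parent = list(range(len(vectors)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(vectors)), 2):
        if cultural_distance(vectors[i], vectors[j]) < omega:
            parent[find(i)] = find(j)

    sizes = {}
    for i in range(len(vectors)):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    return sorted(sizes.values(), reverse=True)


def stcb(sizes: Sequence[int]) -> float:
    """Standard deviation of the normalized aggregate opinion, assuming
    independent, equiprobable +1/-1 agreement within each component."""
    n = sum(sizes)
    return sqrt(sum((s / n) ** 2 for s in sizes))


toy_scv = [(0, 0, 0, 0), (0, 0, 0, 1), (1, 1, 1, 1), (1, 1, 1, 0), (2, 2, 2, 2)]
sizes = component_sizes(toy_scv, omega=0.3)
coordination = stcb(sizes)
```

In this toy set, two pairs of vectors differ on a single feature and therefore form two components of size 2, while the last vector remains isolated.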

For both the PG and the MPG models, the and parameters are tuned in the manner described in Section 3 for every value of the number of prototypes , while the latter is simply iterated over. In Figure 4, the LTCD-STCB plot is shown for the values , , and , for the PG (Figures 4(a), 4(c), and 4(e)) and the MPG (Figures 4(b), 4(d), and 4(f)) models. The value is omitted since the and parameters could not be both tuned for MPG with two prototypes. All SCVs are generated using the cultural space of , whose empirical SCVs also served for providing the AIVD and SIVD values in terms of which the tuning was conducted (Section 3).

When looking at Figure 4, one should ask whether the universal, empirical patterns are reproduced by any of the six illustrated model scenarios. Qualitatively, the patterns are defined first in terms of a higher compatibility between LTCD and STCB in the model-generated SCV than in the shuffled SCV and a higher compatibility in the shuffled SCV than in the random one and second in terms of the model-generated LTCD-STCB curve being close to the second diagonal. These empirical features are visible in Figure 5. It is clear that PG does not satisfy these criteria for any value of . Indeed, the model-generated curve is far below the second diagonal for most of the relevant interval and often below the shuffled curve. MPG, however, appears to satisfy all these criteria for all values, although for it is not obvious that the shuffled curve is indeed above the random one, due to the lack of points in the lower-left corner. This has to do with the effective discreteness of the bounded-confidence threshold spectrum, due to the finite number of nominal features available—in other words, it is meaningless to split the axis into intervals that are smaller than the nearest-neighbor spacing of the cultural space lattice. For a direct comparison with analogous empirical curves, one should use Figure 5, which shows the results of the LTCD-STCB analysis applied to data. However, it is only meaningful to compare the qualitative nature of the empirical and the model curves, rather than the exact values, since, as discussed in Section 5, neither model has a maximum-likelihood nature, due to a certain simplicity in the way prototypes are formalized and chosen here. Still, MPG apparently does generate SCVs that are structurally similar to the empirical ones. Thus, the notion of cultural prototypes, even if implemented in a simplistic way, can be used to reproduce the important, universal properties of empirical cultural states, as long as mixing of prototypes is in place.

#### 5. Discussion

The purpose of this study was to develop a way of generating cultural states that reproduce the apparently universal properties of the empirical ones, namely, those described by [14]. This naturally calls for input from social science, in particular from social science theories that are intended to describe universal aspects of culture and society. There is an entire “class” of social science theories that appear relevant for this purpose, originating from either psychology or cultural anthropology [16–20], some of them being explicit attempts at unifying social science. All of them make use of cultural prototypes, though in somewhat different ways, under different names and numbers. Moreover, they had all been overlooked by previous studies of cultural dynamics, on which [14] largely builds: [13] was the first study that connected quantitative studies of cultural dynamics with these theories, via the generic, formal notion of cultural prototypes. For creating an instructive and compact context, this work focused on one of these theories, namely, on Plural Rationality Theory (PRT).

There are several aspects justifying the focus on Plural Rationality Theory. First, its informal notion of cultural bias matches very well the more formal notion of cultural prototype, in the manner used in [13] and here. Second, it is more appealing from a natural science perspective than the others, in particular from a physics and complex systems perspective. This is largely due to various concepts that are qualitatively (and sometimes just implicitly) invoked by PRT, such as the following: energy landscapes, symmetry breaking, graph/network theory, dynamical systems, crossovers (possibly phase transitions), self-organization, and fractals. Third, it explicitly claims to provide some insight into how preferences form: preferences are formed in the process of building social relations, while different patterns of relations (and types of institutional settings) go along with different conglomerates of preferences (the cultural biases). Finally, this dualism between patterns of relations on one hand and cultural biases on the other hand comes along with distinguishing between a “social plane” and a “cultural plane” of interacting human systems, while acknowledging the dynamical nature of both, as well as the strong coupling and interdependency between the two. Thus, PRT seems to resonate well on one hand with research on social network structure and dynamics and on the other hand with research on cultural structure and dynamics.

Up to now, little work has been done to explore either of these two connections. While [13] and the present work are the first steps in exploring the latter connection, some steps have also been taken in exploring the former connection [35, 36]. Note, however, that [13] refers to several theories similar to PRT, without explicitly mentioning PRT, that [36] focuses on a social theory similar to PRT, while still discussing a connection with PRT, and that [35] works with an earlier, more rudimentary version of PRT, which gave less importance to the notions of “way of life,” “rationality,” and “cultural bias.” Although the coupling between social dynamics and cultural dynamics is recognized and studied by quantitative complex systems research (e.g., [9, 37]), this has been carried out in isolation from PRT.

In loose terms, each rationality of PRT has, as a “projection” on the cultural plane, one distinct cultural bias. These cultural biases correspond to the cultural prototypes used in this study. In agreement with [13], a cultural prototype is a combination of cultural traits, thus one point in cultural space—the limitations of this assumption are extensively discussed below. Relying on these notions, two stochastic, structural models of culture are developed and studied here: Prototype Generation (PG) and Mixed Prototype Generation (MPG). It is important that, regardless of which model is used, once the prototypes and the remaining free parameters (parameter , for either PG or MPG) are specified, one implicitly defines a cultural space distribution (CSD): a probability mass function taking the cultural space as a support, as defined in [14]. Generating a set of cultural vectors is then equivalent to selecting points at random according to this distribution. Thus, the resulting cultural states are generated in a nonuniformly random way, with nonuniformities depending on the prototypes and on other model specifications.

For this study, the usage of both stochastic models is restricted to cultural spaces constructed only from sets of nominal features. This is due to the assumption that every prototype picks one and only one trait in any feature, which from a PRT perspective means that, upon answering a question under the influence of one cultural bias, a respondent can only provide one specific answer. In reality, even a specific cultural bias would generally point towards several answers, though with different probabilities, so it would be more realistic to say that every prototype corresponds to one probability distribution defined over that feature. Not allowing for this freedom makes this modeling paradigm incompatible with ordinal features, whose associated traits are by construction sorted along an axis, in which case it is not reasonable to assume that a prototype points to one trait of a feature with full probability and to its nearest neighbors with zero probability. Nonetheless, the paradigm is reasonably compatible with nominal features, in which case the distance between any two traits of one feature is assumed to be the same anyway.

The current study belongs to a preliminary, simplistic paradigm which makes use of what one may call “sharp prototypes.” A more realistic paradigm, which would account for the probabilistic nature of the cultural biases, would make use of what one may call “diffuse prototypes.” Using sharp prototypes comes at the cost of not having enough flexibility to reproduce the empirical, feature-level frequency distributions, with either of the two models, since every prototype corresponds to a probability distribution entirely peaked on one trait. Instead, using diffuse prototypes would allow this by enforcing, for every feature, that the empirical distribution is a linear combination of the prototype distributions. Nonetheless, as shown in Section 3, both models are still able to reproduce the empirical average uniformity of the feature-level frequency distributions, namely, the AIVD quantity. This is partly due to both models making some use of uniformly random trait generation, independently of the prototypes. This translates to a flat noise component in the probability distribution of every feature, which in a sense compensates for the rigid peaks of the sharp prototypes. When also considering the results of Section 4, the usage of sharp prototypes restricted to nominal variables appears to be enough as a proof of concept. This justifies further research towards the more sophisticated paradigm relying on diffuse prototypes. Although this is left for future studies, it is worth contemplating, in order to better understand the purpose, greater context, and limitations of the current paradigm.

Working with diffuse prototypes should go hand in hand with a method of inferring them from data. One can imagine doing this by applying a sensible clustering method on the empirical set of cultural vectors, followed by a sensible method of constructing one diffuse cultural prototype from every cluster, as a probabilistic entity that is representative of that cluster. The main advantage of this approach is that once the prototypes are constructed and provided as input to a sensible stochastic model, the artificial SCVs generated with this model would be close-to-representative of the same distribution in cultural space as the empirical SCV on which the method is applied in the first place. This means that the model would have a maximum-likelihood flavor and could be used for generating synthetic data, which would also reproduce the feature-level frequency distributions.

By contrast, the approximation of sharp prototypes used here is too strong to be employed together with a method of inferring them from data. Instead, sharp prototypes are assigned to randomly chosen positions in the given cultural space. On one hand, the fact that the prototypes are randomly chosen makes any model symmetric up to any permutation of the traits of any feature, as long as all features are nominal, which is the case here. This symmetry is broken by an empirical SCV and also by an artificial SCV generated from a specific choice of the prototypes. On the other hand, the fact that the prototypes are sharp does not allow for the exact frequency distribution of a specific feature to be reproduced, not even up to a permutation of the traits. Still, after parameter tuning, one should expect a good model to provide a cultural space distribution whose rough “shape” is compatible with the empirical data, though the “orientation” and the structural details implied, for instance, by the feature-level distributions would not be compatible. This should be reflected in roughly reproducing the universal LTCD-STCB patterns emphasized in [14]: on the one hand, the formulation of the LTCD and STCB observables is also symmetric up to permuting the traits of any feature and thus independent of the “orientation”; on the other hand, the empirical, feature-level frequency distributions should heavily depend on the specific dataset, thus being of little relevance for the universal patterns.

There are various aspects that make the random generation of prototypes sensible for the purpose of the present work. First, results are evaluated for various values of the number of prototypes , which is considered a free parameter for both the PG and MPG models. Second, the expected prototype-prototype distance is controlled via parameter . Third, for every choice of parameters, the prototypes are independently drawn for each realized cultural state in the set used for computing the model AIVD and SIVD quantities for fitting purposes. These aspects compensate somewhat for not inferring the prototypes from empirical data.

In order to give an example of how the sharp prototypes approximation can be pushed beyond its limits, it is worth recalling that fitting the MPG model is not possible for prototypes, as pointed out at the end of Section 3: the parameter can be successfully tuned in terms of the AIVD only for small values, which do not allow for the subsequent fitting of the parameter in terms of the SIVD. This is related to there being at least traits associated with every nominal feature selected from the Eurobarometer dataset, while there are only two, prototype-induced peaks in the model probability distribution of every feature, on top of the uniform component. Since the integrated probability of the uniform component cannot exceed by construction, all the distributions are bound to be relatively nonuniform, such that the empirical average uniformity is only attained for small- (few coincidences between the prototype-induced peaks) and small- (large uniform component) combinations. This does not hold for the PG model, as in this case the integrated probability of the uniform component can attain any value between 0 and 1. Nonetheless, if , the fitting of the MPG leads to generated cultural states that reproduce much better the universal empirical patterns than PG. This justifies considering MPG the successful model, while emphasizing the importance of the mixing ingredient, which validates the multiple-self assumption.

When thinking in terms of the feature-level probability distributions, it might seem that the MPG and PG models are not that different from each other. As mentioned above, for both models, if there are prototypes, the probability distribution of a certain feature would consist of peaks of equal probability contents and of a uniform component associated with the explicitly random trait generation. Although the probability content of the uniform component of MPG is bounded from above, that of PG is not bounded in any way, so one might think that MPG is just a particular realization of PG. However, this reasoning is misleading, as it focuses on partial information encoded in the feature-level probability distributions, disregarding the rest of the information encoded in the complete cultural space distribution. With PG, a cultural vector whose trait, with respect to a certain feature, is generated under the probability peak of a certain prototype will have its trait generated, with respect to another feature, under the well-determined probability peak of the same prototype or under the uniform component. By contrast, with MPG, a cultural vector, whose trait, with respect to a certain feature, is generated under the probability peak of a certain prototype, will have its trait generated, with respect to another feature, under the probability peak of any prototype—though with a higher likelihood under the peak of the dominating prototype—or under the uniform component. Thus, for the same choice of the prototypes and the same extent of explicitly random generation of traits (and consequently the same AIVD), PG implies a different level of cross-feature correlation and a different shape of the cultural space distribution than MPG. This conceptually explains the impact of the mixing ingredient.

Although this study does not attempt to provide a complete mathematical theory of trait dynamics and formation, one can argue that the MPG model qualifies as a good effective, static description of (generic snapshots of) trait dynamics (“effective description of” stands for “description of the effects of,” for “approximate description,” or for “phenomenological description,” as used in the physics literature, rather than for “successful” or “efficacious”). This static description is inspired by Plural Rationality Theory which, though originating in cultural anthropology, does seem to integrate notions of both psychology and a (complex) systems-based understanding of society. Although it is formulated in a qualitative, informal way, Plural Rationality Theory and related research should be of use for developing a complete formal theory of trait dynamics, at least as a source of guidance and inspiration.

#### 6. Summary and Conclusions

This study was dedicated to developing and testing a stochastic model for generating cultural states that would be structurally similar to the empirical ones. The aim was to reproduce the universal, empirical properties pointed out in [14], while relying on some social science hypothesis. Following up on previous work, the idea of cultural prototypes was used for this purpose. The study first tested the hypothesis that each cultural vector is a partial realization of one prototype and random for the rest, which is what was previously assumed. This turned out to be insufficient for reproducing the empirical patterns. Instead, one has to assume that each cultural vector is a combination, or mixture, of all prototypes, though still dominated by one of them, which is what the MPG model encodes. This additional, mixing ingredient is actually suggested by the same social science theories that inspired the prototypes idea in the first place. In this specific, social science context, this aspect is often referred to as “the multiple self.” These results provide indirect evidence for social science theories like PRT, which postulate, in one way or another, some notion of cultural prototypes, along with some associated notion of mixing.

Still, there is a certain rigidity in the way prototypes are currently formalized (Section 5), related to the assumption that every prototype corresponds to one and only one value of every cultural variable, instead of corresponding to a probability distribution over the variable. This makes the cultural space distribution induced by the successful MPG model generally incompatible with the cultural space frequency distribution with respect to which it is fitted. As it stands, MPG is far from being a maximum-likelihood type of model and thus cannot be used to generate synthetic data. Nonetheless, this is arguably achievable once diffuse prototypes are used instead of sharp ones, while being inferred from the data rather than randomly chosen. In this sense, this work can be seen as an important step towards a realistic, maximum-likelihood model of empirical cultural states and towards generating synthetic sets of cultural vectors. Moreover, MPG can be considered an effective description of the outcome of trait dynamics, since the generated cultural states seem to reproduce the generic structure of the empirical ones. The LTCD-STCB analysis, used for validating this effective theory, could also be used for validating a more fundamental, dynamical theory of culture. It appears likely that Plural Rationality Theory has more to say for aiding the development of such a theory.

#### Appendix

#### A. Controlling the Generation of Prototypes

This section describes the calculation of probabilities attached to sets of cultural prototypes employed by the PG and MPG models defined in Section 2. These probabilities are collectively controlled via a parameter (), which effectively dictates the expectation value of the average prototype-prototype cultural distance for one set of prototypes. The assignment of traits to prototypes is conducted independently for every feature, so the discussion is reduced to assigning probabilities to prototype-to-trait mappings at the level of a single feature. Furthermore, since generating the prototypes neglects empirical occurrence frequencies of specific traits, the problem is symmetric with respect to permutations of the traits, so the discussion is further reduced to assigning probabilities to “topologies” of prototype-to-trait mappings at the level of a single feature. Mathematically, such a topology is an “integer partition.” Integer partitions turn out to be the mathematical objects to which elementary probabilities are to be assigned. Appendix A.1 explains the procedure for assigning the probabilities to integer partitions, while Appendix A.2 explains the procedure for generating the integer partitions.

##### A.1. Integer Partition Probabilities

Let be the set of all integer partitions of elements, where an integer partition of elements is an ordered sequence of integers, also called “parts,” that add up to . Let the ordered sequence be one generic element of this set, where counts the number of nonzero parts. This notation implies that the parts are sorted in descending order and that they add up to . For instance, is an integer partition of 8 elements with 4 parts. For the purpose of this work, an element of the integer partition corresponds to one prototype. For a specific choice of the prototypes and a specific feature, an integer partition is a representation of how the prototypes are distributed over the traits of this feature, up to a permutation of these traits. Thus, when the fraction of traits that are randomly generated vanishes, the probabilities of the traits are just the normalized part sizes; in the example above, the ordered sequence of probabilities associated with the traits would be . Random trait generation then simply introduces a uniform noise component to the feature probability distribution, whose contribution increases with the fraction of traits that are randomly generated. Thus, the integer partition is in any case a proxy for the feature probability distribution, regardless of which stochastic model is used.
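The mapping from an integer partition to a feature-level probability distribution can be sketched as below. The partition `(3, 2, 2, 1)` is an illustrative choice, not necessarily the example of the original text, and the names `trait_probabilities` and `random_fraction` are hypothetical. The sketch assumes that the prototype-induced part of the distribution and the uniform part are combined linearly, with weights given by the fraction of randomly generated traits.

```python
from typing import List, Sequence


def trait_probabilities(partition: Sequence[int], feature_range: int,
                        random_fraction: float) -> List[float]:
    """Trait probabilities of one feature, given how the prototypes are
    distributed over its traits (the partition) and the fraction of
    traits that are generated uniformly at random (assumed linear mixing)."""
    if len(partition) > feature_range:
        raise ValueError("the partition has more parts than the feature has traits")
    if not 0.0 <= random_fraction <= 1.0:
        raise ValueError("random_fraction must lie in [0, 1]")
    n_prototypes = sum(partition)
    # Traits not chosen by any prototype correspond to parts of size zero.
    padded = list(partition) + [0] * (feature_range - len(partition))
    return [(1.0 - random_fraction) * part / n_prototypes
            + random_fraction / feature_range
            for part in padded]


sharp = trait_probabilities((3, 2, 2, 1), feature_range=5, random_fraction=0.0)
noisy = trait_probabilities((3, 2, 2, 1), feature_range=5, random_fraction=1.0)
```

With no random generation the probabilities are the normalized part sizes, and with fully random generation the distribution becomes flat, matching the two limits described in the paragraph above.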

Let be the “compactness” of integer partition , defined by which counts the number of pairs of elements belonging to the same part. For instance, the compactness of integer partition is = . The compactness thus counts the prototype-prototype coincidences for one feature. In light of the above paragraph, a small compactness implies a high uniformity for the feature probability distribution and thus a high value of the associated (feature-level) AIVD contribution.
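Since the compactness counts the pairs of elements that share a part, it can be computed by summing, over the parts, the number of unordered pairs within each part. A minimal sketch follows; the function name and the example partitions are illustrative and are not taken from the original text.

```python
from math import comb
from typing import Sequence


def compactness(partition: Sequence[int]) -> int:
    """Number of unordered pairs of elements (prototypes) that fall in
    the same part, i.e. the sum over parts of part * (part - 1) / 2."""
    return sum(comb(part, 2) for part in partition)


spread_out = compactness((1, 1, 1, 1))   # no two prototypes coincide
intermediate = compactness((3, 2, 2, 1)) # 3 + 1 + 1 + 0 coinciding pairs
collapsed = compactness((8,))            # all prototypes on one trait
```

The two extreme cases bracket the possible values for a fixed number of elements: fully separated prototypes give zero, and fully coinciding prototypes give the total number of pairs.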

Let be the set of integer partitions of elements of at most parts (which implies that ). This definition is needed for working with features with range . Furthermore, let and be the minimal and maximal compactness values attainable by the elements of . These notions are needed for normalizing generic compactness values. They formally read where the “.” (dot) notation stands for “with the property that.”

At this point, it is possible to define a nonnormalized probability mass function, parametrized by , over the discrete set of integer partitions , whose shape depends on . High values correspond to integer partitions of high compactness values being favored over those of low compactness values, while low values correspond to integer partitions of low compactness values being favored over those of high compactness values. For simplicity, the function is chosen to be monotonic when reexpressed in terms of compactness. A simple choice for such a function, denoted here by , is given by where the inner fraction linearly maps the compactness from interval to interval , while the argument of the function linearly maps from interval to interval , from where it is further mapped to by the function. In this manner, the function is increasing with for (implying a relatively low expectation value of average prototype-prototype separation), the function is decreasing with for (implying a relatively high expectation value of average prototype-prototype separation), and the function is a constant of for . The actual probability associated with integer partition can then be obtained via the normalization: with the sum in the denominator being taken over all integer partitions in .

##### A.2. Integer Partition Generation

Let be the set of all integer partitions of any size, together with a “null” element and a “unity” element , which are meaningful in relation to the operation defined below and are needed for keeping some of the following definitions compact and self-consistent.

Let the integer partition “merging” , acting on two integer partitions of and elements, with and parts, respectively, be defined in the following way: producing another integer partition of elements and parts, such that the sequence of parts in the resulting partition is a sorted merging of the two original sequences of parts. For instance, . Moreover, any integer partition satisfies and .

Let the integer partition “multimerging” , where is the set of all subsets of , be defined by where are all integer partitions. The operation produces a set of integer partitions of elements from an initial set of integer partitions of the same size and another integer partition , by merging with each element in the initial set via the operation.
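The two operations can be illustrated with a small sketch. It assumes that parts are kept in decreasing order, that the "null" element absorbs any partition it is merged with, and that the "unity" element leaves any partition unchanged; the names `NULL`, `UNITY`, `merge`, and `multimerge` are illustrative and not taken from the paper.

```python
from typing import Optional, Set, Tuple

Partition = Optional[Tuple[int, ...]]

NULL = None   # assumed absorbing element: merging with it yields NULL
UNITY = ()    # assumed identity element: merging with it changes nothing


def merge(a: Partition, b: Partition) -> Partition:
    """Sorted merging of the two sequences of parts."""
    if a is NULL or b is NULL:
        return NULL
    return tuple(sorted(a + b, reverse=True))


def multimerge(partition_set: Set[Partition], lam: Partition) -> Set[Partition]:
    """Merge lam with each element of an initial set of partitions."""
    return {merge(p, lam) for p in partition_set}


print(merge((3, 1), (2, 2)))                  # (3, 2, 2, 1)
print(multimerge({(2, 1), (3,)}, (2,)))       # contains (2, 2, 1) and (3, 2)
```

Merging partitions of sizes 4 and 4 in the first call gives a partition of 8 elements with 4 parts, so both the element counts and the part counts add up.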

Relying on the notions above, the following recursive definition of function encodes the procedure for generating the set of integer partitions of elements, of at most parts, with maximal part value : The definition is inspired by [38]. The order of the four cases matters, in the sense that a case is considered only if none of the conditions of the preceding cases holds. The last line returns the set that results from the union “” of all sets of integer partitions of type , where spans the indicated interval. This general formulation, which also takes the maximal part value as an argument, is required for a compact recursive definition. Of actual interest for this work, however, is the set of integer partitions of elements and maximal part value , , given by where the last part of the expression removes the null and/or the unity element, which might be present in the set of integer partitions as leftovers from the computation. The working of the sip function is shown explicitly here for the set of integer partitions of 4 elements of at most 3 parts, given by , where yielding , which is the expected result.
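A recursive generator in the same spirit can be sketched as follows. This is not the four-case definition of the paper: it is a simplified variant in which the empty tuple plays the role of the "unity" element and an empty set stands in for the "null" outcome, so that no cleanup step is needed. The function name `sip` is reused from the text, while the argument names are assumptions.

```python
from typing import Set, Tuple

Partition = Tuple[int, ...]


def sip(n: int, max_parts: int, max_value: int) -> Set[Partition]:
    """Partitions of n with at most max_parts parts, each part <= max_value."""
    if n == 0:
        # Nothing left to distribute: only the empty partition remains.
        return {()}
    if max_parts == 0 or max_value == 0:
        # Elements remain but no further part is allowed: no valid partition.
        return set()
    result: Set[Partition] = set()
    # Choose the largest part, then partition the remainder using parts
    # that are no larger, which keeps every tuple sorted in decreasing order.
    for first in range(1, min(n, max_value) + 1):
        for rest in sip(n - first, max_parts - 1, first):
            result.add((first,) + rest)
    return result


# Partitions of 4 elements into at most 3 parts.
print(sorted(sip(4, 3, 4), reverse=True))
# [(4,), (3, 1), (2, 2), (2, 1, 1)]
```

The partition (1, 1, 1, 1) is absent from the output because it has 4 parts, which exceeds the allowed maximum of 3.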

#### B. Analytic Calculations of Model Average Intervector Distance

This section explains the analytic calculation of the expectation value of the average intervector distance (AIVD) for sets of cultural vectors generated using either the PG or the MPG model. The first part of this section gives only the essential formulas: (B.1) and (B.2) are common to the two models, whereas the difference between the models becomes apparent when comparing (B.3) with (B.4). The second part gives the proof of equation (8), which is the basis for (B.2).

The expectation value of the AIVD, as a function of the three model parameters , , and , is given by the average over the feature-level expectation values: where the sum goes over all possible value ranges and is the number of features with range , with being implicitly satisfied, where is the number of features. Note that the feature-level contribution also depends on . In turn, this contribution is given by which is essentially a weighted average of (8) over the set of integer partitions , where the weights are the integer partition probabilities . These are calculated in the manner described in Appendix A.1, while the integer partitions themselves are generated in the manner described in Appendix A.2. The set of ’s of (8) depends on the integer partition in the manner illustrated between the braces of (B.2), where the first term accounts for the traits that are covered by the (nonzero) elements of the integer partition, namely, those under the peak(s) of one (or more) prototype and under the flat noise component, while the second term accounts for the remaining traits, namely, those that are only under the flat noise component. The dependence on whether the PG or the MPG model is used is captured by , which is the average fraction of traits directly copied from prototypes, given by for PG, where the “round” function accounts for the fact that only integer numbers of traits can be copied, and for MPG, where iterates over all values of , which is a large sequence of lowest MPG discrete weights (see Section 2) that are numerically generated during a previous step, for each combination of values used. is the number of elements in this sequence of discrete weights. For this study, elements were generated for every combination, which allows for a very precise numerical calculation of in the case of MPG.

The consistency between the analytical AIVD calculation explained above and the numerical calculation is illustrated here via Figure 6. The expected AIVD value is shown as a function of the parameter, for 5 values of the parameter and 3 values of the parameter, for both the PG and MPG models. The analytical values are shown by the lines, while the numerical ones are shown by the dots, which have small, almost indiscernible error bars attached. For the numerical case, 50 sets of cultural vectors are generated for each combination of parameters. Note that the numerical profiles follow closely the analytical ones, with small deviations that are consistent with the expected fluctuations of the mean.

[Figure 6, panels (a) to (f): expected AIVD for the PG and MPG models, with analytical values shown as lines and numerical values shown as dots.]

It is now worth presenting a proof of (8), on which (B.2) is based. Consider a feature with traits and a set of a priori probabilities attached to them. Then, the entry of each cultural vector generated with respect to this feature is an independent, random choice from the traits, according to the probability mass function . Thus, the expected AIVD contribution from cultural vectors is given by where denotes the probability that the independent random variables fill the traits with the frequency distribution , given the associated probability distribution , where . This is conventionally called the multinomial distribution. In the above derivation, stands for the summation over all elements of the multinomial except that which has a certain number of entries for the th trait, which can be further manipulated: This shows that is just a term of the binomial distribution. By inserting the final expression of (B.6) in the final expression of (B.5), one gets which concludes the proof of (8), after using the well-known expressions for the first and second moments and of the binomial distribution. Note that the dependence on is cancelled out during the derivation.
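The role of the binomial moments in this proof can be checked numerically with a short sketch. It rests on two assumptions that are not restated in this appendix: the distance between two cultural vectors on a single feature is taken to be 0 if their traits coincide and 1 otherwise, and (8) is assumed to take the closed form of one minus the sum of squared trait probabilities. The function names are illustrative.

```python
from math import comb
from typing import Sequence


def expected_same_trait_pairs(n_vectors: int, p: float) -> float:
    """E[n (n - 1)] for n ~ Binomial(n_vectors, p), summed term by term."""
    return sum(
        n * (n - 1) * comb(n_vectors, n) * p**n * (1 - p) ** (n_vectors - n)
        for n in range(n_vectors + 1)
    )


def expected_aivd_exact(n_vectors: int, probs: Sequence[float]) -> float:
    """Expected feature-level AIVD from the binomial marginals of each trait.

    Ordered pairs of vectors sharing trait k number n_k (n_k - 1); dividing
    by the n_vectors (n_vectors - 1) ordered pairs gives the coincidence rate.
    """
    pairs = n_vectors * (n_vectors - 1)
    coincidences = sum(expected_same_trait_pairs(n_vectors, p) for p in probs)
    return 1.0 - coincidences / pairs


def expected_aivd_closed(probs: Sequence[float]) -> float:
    """Assumed closed form: one minus the sum of squared probabilities."""
    return 1.0 - sum(p * p for p in probs)


probs = [0.5, 0.3, 0.2]
for n_vectors in (2, 5, 40):
    print(n_vectors, expected_aivd_exact(n_vectors, probs),
          expected_aivd_closed(probs))
```

Under these assumptions, the exact sum agrees with the closed form for every number of cultural vectors, in line with the remark that the dependence on the number of vectors cancels out.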

Another, arguably shorter, proof can be formulated with the aid of indicator functions of the type , which gives 1 if cultural vector has trait and gives 0 otherwise. One can express the feature-level AIVD of one generic set of cultural vectors in terms of indicator functions and write the expected feature-level AIVD as an average of this expression. The part of (8) then appears from an averaging of the product, where and are two arbitrary cultural vectors.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

The authors are grateful to Maroussia Favre for her thoughtful comments on previous versions of this manuscript. Alexandru-Ionuț Băbeanu also acknowledges discussions with Ulf Dieckmann, Gerard ’t Hooft, Petter Säterskog, Frank Takes, Leandros Talman, Michael Thompson, Marco Verweij and Jorinde v.d. Vis. Diego Garlaschelli acknowledges financial support from the Dutch Econophysics Foundation (Stichting Econophysics, Leiden, Netherlands). This work was also supported by the Netherlands Organization for Scientific Research (NWO/OCW).

#### Supplementary Materials

The algorithm used for fitting the PG and MPG models to empirical data is described. *(Supplementary Materials)*