#### Abstract

Strongly nonlinear stochastic processes can be found in many applications in physics and the life sciences. In particular, in physics, strongly nonlinear stochastic processes play an important role in understanding nonlinear Markov diffusion processes and have frequently been used to describe order-disorder phase transitions of equilibrium and nonequilibrium systems. However, diffusion processes represent only one class of strongly nonlinear stochastic processes out of four fundamental classes of time-discrete and time-continuous processes evolving on discrete and continuous state spaces. Moreover, strongly nonlinear stochastic processes appear both as Markov and non-Markovian processes. In this paper the full spectrum of strongly nonlinear stochastic processes is presented. Not only are processes presented that are defined by nonlinear diffusion and nonlinear Fokker-Planck equations but also processes are discussed that are defined by nonlinear Markov chains, nonlinear master equations, and strongly nonlinear stochastic iterative maps. Markovian as well as non-Markovian processes are considered. Applications range from classical fields of physics such as astrophysics, accelerator physics, order-disorder phase transitions of liquids, material physics of porous media, quantum mechanical descriptions, and synchronization phenomena in equilibrium and nonequilibrium systems to problems in mathematics, engineering sciences, biology, psychology, social sciences, finance, and economics.

#### 1. Introduction

##### 1.1. McKean and a New Class of Stochastic Processes

In two seminal papers, McKean introduced a new type of stochastic processes [1, 2]. The stochastic processes studied by McKean satisfy drift-diffusion equations for probability densities that are nonlinear with respect to their probability densities. While nonlinear drift-diffusion equations have frequently been used to describe how concentration fields evolve in time [3], the models considered by McKean describe stochastic processes. A stochastic process not only describes how expectation values (such as the mean value) evolve in time, but also, provides a complete description of all possible correlations between two or more than two time points [4–6]. Consequently, the study of stochastic processes goes beyond the study of concentration fields or the evolution of single-time point expectation values such as the mean. The observation by McKean that nonlinear drift-diffusion equations can describe stochastic processes opened a new avenue of research.

##### 1.2. From McKean’s Stochastic Processes to Strongly Nonlinear Stochastic Processes

Due to the key role that the nonlinearity has for the processes considered by McKean and due to the fact that the processes satisfy the Markov property, processes of this type have been called “nonlinear Markov processes” (see, e.g., [7]). Moreover, in this context the term “nonlinear Fokker-Planck equations” has also been introduced because the drift-diffusion equations of nonlinear Markov processes exhibit formally the mathematical structure of Fokker-Planck equations. However, the phrases “nonlinear Markov processes” and “nonlinear Fokker-Planck equations” are misleading because they refer both to processes that are subjected to nonlinear forces and to processes that are described by drift-diffusion equations that are nonlinear with respect to their probability densities. In search for a more specific label, two alternative terms have been proposed in the literature. Wehner and Wolfer [8] suggested the term “truly nonlinear Fokker-Planck equations,” whereas Frank [9] coined the term “strongly nonlinear Fokker-Planck equations.” Only recently, it has become clear that the class of stochastic processes addressed by McKean is not limited to be described by drift-diffusion equations of the Fokker-Planck type. Therefore, the research field inspired by the work of McKean is about nonlinear stochastic processes in general. To give processes of this kind a more specific name, we may adopt the suggestions by Wehner and Wolfer or Frank and say that there is a novel class of processes called “truly nonlinear stochastic processes” or “strongly nonlinear stochastic processes.” In what follows, we will use the term “strongly” that should be considered as a synonym to “truly” just to avoid repeating the phrase “truly or strongly” over and over again.

For the purpose of this paper, the following unifying definition of a strongly nonlinear process will be used:

A strongly nonlinear stochastic process is a stochastic process for which the evolution of realizations depends on at least one expectation value of the process or on the appropriately defined probability density of that process.

Note that for strongly nonlinear processes evolving on discrete state spaces the realizations are required to depend on at least one expectation value or on the occupation probability rather than the probability density. However, as we will demonstrate in Section 2, for our purposes the occupation probability of a discrete-state process can be derived from an appropriately defined probability density. Therefore, the definition as given above holds both for processes evolving on continuous and discrete state spaces.

In this review of strongly nonlinear stochastic processes a distinction will be made between four classes of processes. These classes arise naturally if it is taken into consideration that in many disciplines time is not necessarily considered to be a continuous variable. Rather, time might be considered as a discrete variable such that processes evolve in discrete time steps. For example, a process describing the evolution of a stock price on a day-to-day basis is typically modelled by means of a time-discrete process [10, 11]. In short, time can be a discrete or continuous variable. Likewise, the state space of a process might be discrete or continuous. The four possible combinations of time and space characteristics give rise to four classes of stochastic processes. We will address these classes in separate sections, see Table 1.

Before we proceed in Sections 3, 4, 5, and 6 with the systematic presentation outlined in Table 1, we will present in Section 2 benchmark examples of processes for each of the four classes. For the examples discussed in Section 2 a unifying mathematical definition irrespective of class membership will be presented in the final subsection of Section 2. Mean field theory and the mathematical literature will be addressed in Sections 7 and 8. As it turns out, all examples reviewed in Sections 2 to 6 exhibit the Markov property. Therefore, Section 9 is devoted to the review of some more recent studies on non-Markovian strongly nonlinear stochastic processes. In Section 10 some key issues of strongly nonlinear stochastic processes will be highlighted.

#### 2. Benchmark Examples of Strongly Nonlinear Processes and on a General Stochastic Evolution Equation for a Broad Class of Strongly Nonlinear Markov Processes

##### 2.1. Some Notations

At the present stage, it is useful to introduce some notations in order to give some benchmark examples for each of the four classes indicated in Table 1.

Time-discrete processes evolve on discrete time steps . In contrast, time-continuous processes are described by a continuous time variable defined on some interval. For our purposes this interval will go from zero to infinity, meaning that we will consider initial value problems starting at an initial time and extending in time infinitely. Note that the symbol will be used to denote both discrete-valued and continuous time variables.

Processes evolving in discrete, finite state spaces exhibit a number of states labeled by . Let denote the state variable of a state discrete process, which implies that assumes an integer between 1 and . Moreover, let denote the th realization of the process. A fundamental (although not complete) stochastic description of processes evolving on finite, discrete state spaces is given by the occupation probabilities . The occupation probability defines the probability that the state is occupied. It can be computed from a number of realizations in the limiting case of like The function is the Kronecker function, which equals 1 for and equals 0 otherwise. The probabilities defined by (1) satisfy the normalization condition

Processes evolving on finite dimensional, continuous state spaces can be described by a state vector with coefficients . The elements or coordinates are defined on certain domains of definition . For example, a domain of definition may correspond to the real line. An important measure characterizing stochastic processes evolving on continuous state spaces is the probability density defined as the ensemble average over the Dirac delta function . Note that the same symbol will be used for the Kronecker function and the Dirac delta function in order to stress the analogies between the classes listed in Table 1. Mathematically speaking, the probability density can be defined by In (3) the variable is the state vector with coordinates , whereas the elements are the elements of the th vector-valued realization of the process. The probability density satisfies the normalization condition where are the domains of definitions mentioned previously.

Let us return to processes defined on a discrete state space. In addition, we put and assume that denotes the real half-line from 0 to infinity. Formally, the probability density assumes the form where are the occupation probabilities of the states defined in (1). In turn, the occupation probabilities can be calculated from like where is a positive number smaller than 1 (e.g., ). In view of (6) the occupation probabilities may be considered as functions of an appropriately defined probability density , as stated in Section 1. With these notations at hand, some benchmark examples of strongly nonlinear stochastic processes can be given.

##### 2.2. Strongly Nonlinear Markov Processes Described by Nonlinear Markov Chains

In general, the occupation probabilities of time-discrete stochastic processes evolving on discrete state spaces constitute a time-dependent vector with coefficients . For the important special case of nonlinear Markov processes described by nonlinear Markov chains, the occupation probabilities satisfy [12] Here, describes the transition probability matrix of the process. The matrix is an times matrix with coefficients . The coefficients describe the conditional probabilities that transitions from state to state occur given that the process is in the state . The conditional probabilities may depend explicitly on time such that . A nonlinear Markov process is characterized by the property that at least one conditional probability depends at least on one expectation value computed from the process at time step or on one occupation probability . In this case we have as indicated in (7). A complete description of the stochastic process can be derived from (7) as shown in Frank [12]. In view of the definition of a strongly nonlinear stochastic process given in Section 1.2 that is based on realizations of stochastic processes, it is worth while to consider an alternative definition of the process. To this end, the stochastic map can be defined that maps a state to a state like The mapping depends on a random variable that is uniformly distributed on the interval at any time step . Across time, is statistically independent. The stochastic map describes the jumps from to satisfying the transition probabilities . In order to derive an explicit definition of , we first note that the conditional probabilities are by definition real numbers between 0 and 1 that add up to unity. Next, it us useful to consider the cumulative conditional probabilities defined for and by for . For we put . For the cumulative measures describe the probabilities that transitions from state to one of the states occur at time given that the process has an occupation probability vector at time . The cumulative measures represent a set of monotonically increasing points on the interval such that the interval between two adjacent points and equals . Consequently, the point spectrum in combination with the random variable can be used to construct realizations of the process under consideration. If holds and if falls into the interval , then we put . If we do so, then transitions from to happen with the probability if the process is in state at and distributed like at —as required. A concise mathematical formulation of this idea can be obtained with the help of the indicator function defined by Then, the map can be defined as follows: As indicated in (11), the mapping depends on the probability vector because the indicator function depends on via the conditional probabilities and . Therefore, each realization of the process defined by (8)–(11) depends on the occupation probabilities of the process. The occupation probabilities in turn can be computed from the realizations; see (1). Therefore, the mathematical model given by (1), and (8)–(11) provides a closed (and complete) description for the strongly nonlinear stochastic process described by the nonlinear Markov chain (7).

##### 2.3. Strongly Nonlinear Markov Processes Described by Stochastic Iterative Maps

The realization of a time-discrete stochastic process on an -dimensional, continuous state space is described by a time-dependent state vector with time-dependent components . From (3) it follows that at any time step , the vector is distributed according to the time-dependent probability density: Strongly nonlinear Markov processes defined by stochastic iterative maps with nonparametric white noise satisfy equations of the general form A simplified version of (13) was introduced earlier in Frank [13]. In (13) the vector-valued variable is called the drift function and describes the deterministic evolution of the process. Second, the term in (13) describes the vector-valued fluctuating force of the stochastic process. The random vector with coefficients is an -dimensional normally distributed white noise process. That is, each coefficient is normally distributed at any time step and statistically independent in time. The random coefficients are statistically independent among each other. The variable is an times matrix with coefficients . The coefficients with for a fixed index weight the contributions that the random components have on the evolution of the process in the direction (or along the coordinate) . In particular, for the random coefficient does not occur in the evolution equation of at time . Therefore, may be considered as weight matrix. Importantly, the coefficients measure the strength of the fluctuating force of the stochastic process described by (13). The reason for this is that the random vector is normalized at any time step . Most importantly, in the case of a strongly nonlinear Markov process, the drift function or the fluctuating force or both depend on at least one expectation value or on the probability density as indicated in (13). The operator maps the probability density to a real number or a set of real numbers. This mapping may depend on the value of . For example, the probability density may be evaluated at the value of the process like In this context, it should be noted that the formal dependencies and include the dependencies on expectation values (such as the mean) as special cases. For example, where is the operator symbol for an ensemble average such that denotes the mean of the random vector . Note that (12) and (13) provide a closed description of a stochastic process. Moreover, from the structure of (13) it follows immediately that processes defined by (12) and (13) satisfy the definition of strongly nonlinear stochastic processes presented in Section 1.2. An illustrative example of a stochastic iterative map (13) is the autoregressive model of order 1 subjected to a so-called mean field force proportional to the mean of the process. For , the model reads [13] with , where the parameters and are scalars. We will return to model (16) in Section 4.

##### 2.4. Strongly Nonlinear Markov Processes Described by Nonlinear Master Equations

For time-continuous stochastic processes evolving on discrete state spaces the occupation probabilities are functions of time , where is a continuous variable. Likewise, the probability vector becomes a time-dependent variable with coefficients just as in Section 2.2 but with being a continuous rather than a discrete variable. Time-continuous, strongly nonlinear Markov processes on discrete state spaces can be described by means of nonlinear master equations of the form (see, e.g., [14]) for and with for all states . Here, the variables are transition rates that describe the rate of transitions (number of transitions per unit time) from states to states . We will use the convention for all states . Note that as opposed to the conditional probabilities occurring in a Markov chain, the rates of the master equation are not normalized. Transition rates may depend explicitly on time such that . For strongly nonlinear stochastic processes transition rates do not only depend on the states and and on time but also depend on an expectation value or the occupation probabilities . Note that (17) is the integral form of the master equation. If the integral is evaluated for a small time interval , then the integral version exhibits the structure of a Markov chain (7) with which can be written in a more concise form like Note again that we will use the convention that the diagonal elements of the transition rates equal zero, which implies that for the diagonal elements only the second term in (19) matters as stated in (18). For the sake of completeness, note that the differential version of the master equation (17) reads Just as in Section 2.2, in order to describe the four classes of stochastic processes listed in Table 1 on an equal footing, we consider an alternative description of the process defined by (20). Accordingly, we introduce the flow function that maps initial states defined on the integer set to states : The variable is a random variable that is uniformly distributed on the interval at every time point and is statistically independently distributed across time. In view of the fact that the time-discrete version of the master equation exhibits the structure of a Markov chain, trajectories defined by the flow function can be regarded as time-continuous counterparts to the trajectories of Markov chains as defined by (1), and (8)–(11). This analogy can be exploited to construct iteratively the flow function in the limiting case like with The iterative map involves the indicator function and the cumulative transition probabilities Equations (1), (18), and (22)–(25) provide a closed approximate description of the trajectories of interest. By definition, the description becomes exact in the limiting case of infinitely small time intervals . Note that the conditional (transition) probabilities defined by (18) are in the interval provided that is chosen sufficiently small. In the limiting case this is always the case assuming that the rates are bounded from above. Note also that the probabilities defined by (18) are normalized to 1. Furthermore, from (18), and (22)–(25) it follows that holds for . That is, the iterative map (22) describes transitions from states to states in the small time interval . Most importantly, according to (22), the evolution of realizations depends on the occupation probability vector at every time point . Therefore, processes defined by (1), (18), and (22)–(25) belong to the class of strongly nonlinear stochastic processes (see the definition in Section 1.2).

##### 2.5. Strongly Nonlinear Markov Diffusion Processes Defined by Strongly Nonlinear Langevin Equations and Strongly Nonlinear Fokker-Planck Equations

Realizations of time-continuous stochastic processes evolving on -dimensional, continuous state spaces are described by time-dependent state vectors with time-dependent components . At every time point , the vector is distributed according to the probability density defined by (12), where the variable is the state vector with coordinates . Strongly nonlinear Markov diffusion processes have realizations (or trajectories) defined in the limiting case of infinitely small time intervals by strongly nonlinear stochastic difference equations [9] of the form Carrying out the limiting procedure, (26) can be written in a more concise way in terms of a so-called Ito-Langevin equation: Just as in the case of the stochastic iterative maps discussed in Section 2.3, (26) and (27) exhibit two different components. First, the vector-valued function occurring in (26) and (27) is the drift function introduced in Section 2.3 that accounts for the deterministic evolution of a process. Second, the expressions and , respectively, describe a vector-valued fluctuating force. The variable is the -dimensional random variable introduced in Section 2.3. The function in the Ito-Langevin equation (27) is the counterpart to the random vector in (26) and corresponds to a so-called vector-valued Langevin force [4, 6]. More precisely, each component is a Langevin force. We will consider Langevin forces normalized with respect to a magnitude of 2, like where are realizations of the th component. The variable is the times weight matrix introduced in Section 2.3.

The stochastic process defined by the Ito-Langevin equation (27) can equivalently be expressed by means of a strongly nonlinear Fokker-Planck equation: The evolution equation for is nonlinear for . In contrast, the corresponding evolution equation for the conditional probability density with is linear with respect to and reads For details, see Frank [9]. A complete description of the stochastic process can be obtained either from the realizations defined by (26) and (27) or from the evolution equations (29) and (30) for and . The expression occurring in (29) and (30) is the diffusion coefficient of the process.

A fundamental example of a strongly nonlinear Markov diffusion process is a generalized Ornstein-Uhlenbeck process involving a mean field force proportional to the mean of the process [9, 15, 16]. For , the stochastic difference equation reads with . Typically, the case , is considered. For (32) the strongly nonlinear Langevin equation reads Finally, the corresponding strongly nonlinear Fokker-Planck equation reads The generalized Ornstein-Uhlenbeck process in its form as stochastic difference equation (32) corresponds to a special case of the generalized autoregressive model of order 1 defined by (16). Model (34) has been studied in the context of the Shimizu-Yamada model. We will return to this issue later.

##### 2.6. On a General Stochastic Evolution Equation for Strongly Nonlinear Markov Processes

In the previous sections we considered some benchmark examples of strongly nonlinear stochastic processes belonging to the four classes of processes defined in Table 1. All processes considered in Sections 2.2 to 2.5 satisfy the Markov property. A more general, formal description of strongly nonlinear Markov processes that holds for all four classes reads The additional variable defines the characteristic jumping interval of the process under consideration and can be used to distinguish between time-discrete and time-continuous processes. For we are dealing with time-discrete stochastic processes. In the case of processes defined on discrete state spaces is related to the occupation probability by means of (5) and (6) and (35) reduces to the special case of (8) of the Markov chain model. For processes defined on continuous state spaces and the general form (35) becomes (13) in the case of stochastic processes driven by nonparametric white noise. In the limiting case we are dealing with time-continuous stochastic processes. In this case, the function possesses the property for . For processes that evolve on discrete state spaces and satisfy a nonlinear master equation it follows that (35) becomes the iterative map (22) provided that the argument in is expressed in terms of the occupation probability ; see (6). In contrast, for processes evolving on continuous state spaces and satisfying strongly nonlinear Ito-Langevin equations, the general form (35) reduces to the stochastic difference equation (26) when we consider the limiting case .

#### 3. Time-Discrete Strongly Nonlinear Stochastic Processes on Discrete State Spaces

##### 3.1. The Strongly Nonlinear Two-State Markov Chain Model

A fundamental strongly nonlinear stochastic process is the two-state nonlinear Markov chain process [17]. For the sake of convenience, the states and will be labeled with “” and “.” Transitions from to , to , to , and to occur with conditional probabilities , , , and , respectively. Accordingly, the nonlinear Markov chain equation (7) reads Taking the normalization condition into consideration, (36) can equivalently be expressed by which is a closed nonlinear equation in . Model (37) involves two functions only: the conditional probabilities and which can be chosen independently from each other and in the case of a nonlinear Markov process depend on as indicated in (37). The remaining conditional probabilities can be calculated from and . Model (37) has been examined in physics, social sciences, and psychology as will be discussed next.

##### 3.2. Chaos in Probability: Time-Discrete Case

Following Frank [18], let us put . Then, (37) becomes Although (38) describes a stochastic process, formally (38) exhibits the structure of a nonlinear deterministic map subjected to the constraint that the function must be in the interval at any time step and for any occupation probability . Iterative maps in general are known to exhibit chaos [19]. Therefore, (38) suggests that time-discrete strongly nonlinear Markov processes exhibit chaos as well. Such strongly nonlinear chaotic processes exhibit chaos in the evolution of the occupation probabilities and the conditional probabilities. For example, in Frank [18] the logistic function [19, 20] was considered for which (38) reads For in the function is in the interval such that (39) maps the occupation probability from the interval to the interval as required. For parameters [20] the map exhibits chaotic solutions. That is, evolves in a chaotic fashion. However, as stated previously, not only the occupation probability exhibits chaos, but the conditional probabilities exhibit a chaotic behavior as well which was demonstrated explicitly by numerical simulations [18]. Chaos in strongly nonlinear time-discrete Markov processes is not limited to the two-state models. In fact, the two-state model can be generalized to an -state model. In this case, (38) becomes [18] for . The occupation probability is computed from the normalization condition. Consequently, depends only on the occupation probabilities . For example, the two-dimensional chaotic Baker map [20] was described by means of (40) using [18].

##### 3.3. Social Sciences and Group Decision Making: Time-Discrete Case

Group decision making has frequently been described by nonlinear stochastic processes [21–24]. The reason for this is that under certain circumstances individuals are likely to be affected by the opinion that is presented by a group as a whole rather than by the opinion of individuals. This so-called group opinion in turn is produced by the opinions and decisions of the individual group members. Consequently, the decision making process of an individual is affected by the sample average obtained from the opinions of the others. If the groups can be considered as being relatively large, the sample average may be replaced by the ensemble average. The decision making behavior of an individual is then considered as a realization of a stochastic process that is affected by the ensemble average of the process. In the context of voting behavior, the two-state model introduced in Section 3.1 was applied [17]. The members of the US House of Representatives had the choice to vote for or against passing the so-called bailout bill (US House of Representatives, USA, 2008). The bill failed to pass in the September 29 vote. In a second vote, on October 3, the bill finally passed. That is, a critical number of “No” voters changed their opinions between September 29 and October 3. The states and were labeled by “” and “” representing “Yes” and “No” voters. According to model (37), the conditional probabilities and describing a change in the opinion from “No” to “Yes” and a consistent “Yes” opinion were considered. Using the notation of (37), the voting model proposed in Frank [18] reads The data available in the literature suggested that . For the the function was used to describe the impact of the group opinion on the voting behavior. This relationship assumes that the pressure to change from a “No” voter to a “Yes” voter was high immediately subsequent to the September 29 votes, when the probability at = Sept 29 was not high enough to make the bill pass. In this context, it should be mentioned that the failure of the bill to pass on September 29 triggered a dramatic breakdown of the US stock market. That is, the members of the House of Representatives who had voted with “No” were not only subjected to political pressure but also experienced essential pressure from the economy. For and the stationary occupation probability is given by such that (41) eventually reads The stationary value was estimated from the data. A detailed model-based analysis of the data showed that the members of the two main parties, the Democratic and Republican members, were differently affected by the group pressure.

##### 3.4. Psychology and Human Memory

A pioneering experiment in human memory research is the free recall experiment in which participants are asked to recall items of a particular category [25, 26]. For example, they are asked to recall animal names. Typically, participants are instructed to recall as many items as possible and to do so as fast as possible and without repetition. The total number of recalled items is by definition a monotonically increasing function of time. Strikingly, free recall involves clusters in which item subcategories are recalled more or less as a whole. For example, when recalling animal names a participant may recall a number of insect names in succession. From a theoretical point of view, at any given time point during the free recall process the ability to recall items in clusters depends on the size of the pool of items available for recalling. In other words, if there is a large pool of items that have not yet been recalled, then there is a high chance that a whole item subcategory can be recalled. In line with this argument, a stochastic model for free recall dynamics was developed in which the recall process depends on the size of the pool of items available for recall [27]. More precisely, time was parceled in discrete time intervals in which recall attempts are executed. The aim was to develop a model for the probability that a single item has been or has not yet been recalled at a given time step . To this end, the two-state model of Section 3.1 was applied. Accordingly, the states and correspond to being recalled or being not yet been recalled and are labeled by “” and “,” respectively. The two relevant conditional probabilities are and . By the design of the experiment, we have . Moreover, as argued previously, if the pool of items that have not yet recalled is large, that is, if the probability that an item has not yet been recalled is high, then the conditional probability should be high as well. Likewise, if the probability that an item has been recalled is high, then should be relatively low. Therefore, it was suggested to put with . This function implies that the occupation probability converges to the stationary fixed at which . The full model can be obtained by substituting and into (37) and reads Note that from a mathematical point of view, this model is a special case of model (42) for voting behavior (hint: put in (42)). Since the model describes the single item recall probability, the model must be scaled to the number of items recalled by any given participant. To this end, a parameter describing the maximal number of items for recall was introduced. The unknown parameters and were estimated for free recall data collected in a study by Rhodes and Turvey [28], and the sample mean values (SD = 0.005) and (SD = 138) were found for a sample of eight participants [27]. Note that (43) predicts that the fixed point will never be reached in a finite number of time steps. Consequently, the model predicts that the total number of recalled items at the end of the experiment is smaller than the number of available items for recall. This prediction was confirmed by the data. Moreover, the model-based analysis revealed a statistically significant positive correlation between and for the participant sample studied in Frank and Rhodes [27].

##### 3.5. Biology and Evolutionary Population Dynamics

A primary concern of evolutionary population dynamics is the understanding of the origin of life. Among the many models that have been proposed in this context, the studies by Crutchfield and Görnerup [29] and Görnerup and Crutchfield [30] are of interest for our review. In these studies a number of macromolecules with different functional properties were considered. These macromolecules compete with each other for survival during evolutionary time scales. However, not only competition but also cooperative behavior is taken into account. Roughly speaking, it is assumed that in the early stages of the development of life, macromolecules of different types existed and could change their types due to interactions with other macromolecules of the same or different types. Eventually, only a subset of functionally different types survived the selection process. Let denote the functionally different types of macromolecules and the occupation probability of the th type. Then, at every discrete reaction time step the macromolecules can change their types such that transitions from type to type occur. This so-called metamodel can be formulated in terms of a nonlinear Markov chain model [31]. Accordingly, the conditional (transition) probability describes the probability of a transition from type to type given that a macromolecule is of type at time step . As suggested by the studies of Crutchfield and Görnerup [29] and Görnerup and Crutchfield [30] interactions between two different types of macromolecules can be modeled by assuming that the evolution of occupation probabilities is affected by terms of the form . More precisely, in general, a macromolecule of type may interact with a macromolecule of type and turn into a macromolecule of type . It can be shown that from the metamodel it follows that assumes the form [31] In (44) the parameters are weight coefficients that describe the relevance of interactions between macromolecules of different types. Note that the conditional probabilities satisfy the normalization condition that the sum over all probabilities with index equals 1. This normalization can be guaranteed by requiring that the sum over all weight coefficients with index equals 1. Substituting (44) into (7), the metamodel for evolutionary population dynamics can be cast into a nonlinear Markov chain model and reads It has been shown that the nonlinear Markov chain model (45) is a useful tool for investigating evolutionary population dynamics. First of all, trajectories that describe the evolution of individual macromolecules were computed numerically using (1), and (8)–(11); see Section 2.2. That is, the model allows for generating temporal sequences of type changes for individual macromolecules. Second, it was shown that the degree to which a type is attractive can be quantified. In particular, for evolutionary selection process that become stationary, the attractors of the surviving macromolecule types can be analyzed and distinctions between weakly and strongly attractive types can be made (e.g., see Figure 3 in [31]).

#### 4. Time-Discrete Strongly Nonlinear Stochastic Processes on Continuous State Spaces

##### 4.1. The Generalized Autoregressive Model of Order 1 as a Strongly Nonlinear Markov Process

The generalized autoregressive model of order 1 defined by (16) was studied in Frank [13] as time-discrete version of the time-continuous strongly nonlinear Shimizu-Yamada model; see Section 6. Let us discuss this generalized autoregressive model in more detail. To make this discussion more self-consistent, (16) is presented here once again as (46). Since the random variable exhibits a normal distribution at any time step , the conditional probability density can be calculated that describes the distribution of at time given the value of at the previous time step and the probability density of at time step . Assuming that the variance of equals 2, the solution reads As indicated in (47) the conditional probability depends only on the first moments (i.e., the mean value) of the probability density . By means of the conditional probability density can be computed from like From (46) it follows that the mean value evolves like For the mean value is an exponentially decaying function. For the mean value alternate between negative and positive values but decays in amplitude monotonically. In summary, for the mean value converges to zero in the limiting case . A detailed calculation shows that the second moment evolves like For and the first moment converges to zero which implies that the second moment converges to a stationary value given by such that the variance of the stationary processes is determined by . Let us substitute the variable (which is a normally distributed random variable with variance ) into (40). Then, for and the stationary variance of is given by Not surprisingly, (51) is just the expression for the variance of the ordinary stationary autoregressive processes of order 1. Moreover, for normally distributed initial values , the probability density satisfies a normal distribution for all times . This follows from (47) and (48) or alternatively from the linearity of (46). Consequently, for and the probability density converges in this case to a normal distribution with zero mean value and variance given by (51). In the stationary case, the correlation function for has exactly the same form as the correlation function of the ordinary autoregressive process of order 1 because the mean value equals zero and drops out of (46): In conclusion, for and the generalized autoregressive process (46) behaves in the stationary case exactly as the ordinary autoregressive process (i.e., the process with ). The two processes however exhibit different transient characteristic properties.

##### 4.2. The Generalized Deterministic Autoregressive Process of Order 1 and the Sensitivity of Strongly Nonlinear Processes to Initial Conditions

In the deterministic case, we put in (13) such that Let us consider the case and assume that the drift function depends only linearly on the state . If, in addition, does not explicitly depend on time and is a map (functional or linear operator) that maps to a real number, we arrive at the model In (54) we simplified the notation by introducing the operator that maps to a real number (e.g., calculates expectation values of and then uses them as arguments in a function). Equation (54) may be considered as a generalized deterministic autoregressive model of order 1. Model (54) was studied in detail in Frank [32]. A striking feature of the model is that the stationary probability density if it exists depends crucially on the initial probability density . For sake of readability, let us put . Then, it can be shown that if it exists is a rescaled function of like [32] with the scaling parameter In other words, the strongly nonlinear process (54) might exhibit infinitely many stationary probability densities. This was demonstrated more explicitly for a specific form of which will be discussed next.

##### 4.3. Economics and the Emergence of Stationary Volatilities as a Group Dynamics Process

In economics the standard deviation or the variance of a price of an asset is a measure for how risky the assist is. That is, the variability measures reflect the beliefs of traders how risky it is to hold the asset. Zero variability reflects a risk-free asset. In this context, the variability of a price is also called the price volatility. Volatility is observed by traders and informs individual traders how risky the market as a whole considers a particular asset. On the other hand, the market is composed of individual traders. An autoregressive model of the form (54) can be used to model such a circular relationship. To this end, we assume that the operator calculates the variance of . In this case, the evolution of is determined by the variance of , on the one hand. On the other hand, the variance by definition is calculated from the realization of . Let us consider the following special case of (54) (for details see [32]): with . It can be shown that for the process exhibits stable but not asymptotically stationary probability densities. The shape of these probability densities depends on the initial distributions as discussed in Section 4.2. However, the variance of all stable stationary distributions equals . This is intuitively clear, when we realize that for , (57) becomes the identity . Model (57) also exhibits unstable stationary probability densities. One of these unstable solutions is the Dirac delta function . It can be shown that any process converges to a stationary process with variance provided that the initial probability density of the process has a variance smaller than a certain critical value. For example, if the Dirac delta function is perturbed slightly such that the variance becomes finite but is still close to zero, then the process will not converge back to the Dirac delta function. Rather, the process will converge to a stationary process with variance equal to .

We distinguished previously between stable and asymptotically stable probability densities. Stable but not asymptotically stable stationary solutions have also been called marginally stable stationary solutions. Recently, research has been focused on the importance of marginally stabile stationary solutions for biochemical and biological systems [33–35]. In the context of strongly nonlinear processes, stable but not asymptotically stable probability densities or alternatively marginally stable probability densities refer to processes that exhibit a family of stationary probability densities with two properties [9]. First, by an infinitely small perturbation a particular stationary probability density can be transformed into another stationary probability density of the family of probability densities. Second, the family of probability densities as a whole acts as an attractor. That is, for any perturbation that does not transform a member of the family into another member of the family, the perturbed stationary probability density eventually converges to one of the members of the family of marginally stable probability densities. A simple example can be given for model (57). As mentioned previously, for the marginally stable probability densities with , (57) reads . A shift of the coordinate system is consistent with the identity and does not affect the variance. Therefore, any stationary probability density with can be shifted along the -axis by an infinitely small amount to give rise to another stationary probability density. Clearly, this kind of perturbation will not decay.

In closing the discussion on model (54), let us return to the interpretation of the model as a model for volatility dynamics. The model predicts that the emergence of a stationary volatility in the market is not guaranteed but depends on market parameters (here the inequality must be satisfied) and the initial variance. Furthermore, the model predicts that the price itself is not affected whether or not a stable, stationary volatility measure emerges.

##### 4.4. Phase Synchronization of Inhomogeneous Ensemble of Globally Coupled Oscillatory Units

Phase synchronization of coupled oscillatory subunits has frequently been studied by means of time-continuous strongly nonlinear stochastic model, see Section 6. However, also time-discrete models have been considered. Phase synchronization is a special case of synchronization when the amplitude dynamics is neglected. Accordingly, each oscillatory unit is described by a single real number representing a phase . Let us envision each oscillatory unit as an oscillator evolving along a closed orbit in an appropriately defined state space. Then the phase is the coordinate along that closed orbit. The phase is a periodic variable. The oscillators are de-synchronized if has a uniform distribution on the domain of definition. If exhibits a nonuniform distribution (in particular, a unimodal distribution), then the oscillators are partially synchronized. If for all oscillators, then there is complete synchronization.

Let us consider an all-to-all coupling between the oscillators. In this case, the evolution of the phase of an individual oscillator depends on the evolution of all other oscillator phases. If the set of oscillators is relatively large, the limit of infinitely many oscillators may be considered as a good approximation and the sample of oscillators may be replaced by a statistical ensemble. In this case, the phase dynamics of an individual oscillator depends on the expectation value of the statistical ensemble of oscillators. Moreover, any individual oscillator represents a realization of the ensemble. In summary, a closed description of the phase dynamics in terms of a strongly nonlinear stochastic process can be obtained.

In the deterministic case, we put in (13) such that (13) becomes (53) mentioned in Section 4.2. Without loss of generality, the period can be put equal to unity such that is defined in the interval , and and indicate the same location on the closed orbit. In line with seminal works by Kuramoto [36], Daido [37] proposed an explicit form of (53) that can be written for individual realizations like The variable is the order parameter of the oscillator system. For desynchronized oscillators (i.e., for a uniformly distributed ) we have . If differs from zero, the oscillators are partially synchronized. In (58) the parameter describes the coupling strength between the oscillators. More precisely, it is the coupling between individual oscillators and the mean field force defined by that acts on an individual oscillator and is generated by all oscillators. In (58) the parameter denotes the phase velocity for the uncoupled oscillator. As indicated, is taken from a distribution and differs among the oscillators. Therefore, model (58) describes an inhomogeneous ensemble of globally coupled oscillatory units.

Model (58) is a useful model for studying the emergence of phase synchronization from the perspective of nonequilibrium phase transitions. At a critical value of the coupling parameter , a nonequilibrium phase transition occurs and the probability density bifurcates from a uniform distribution to a nonuniform distribution indicating that the ensemble makes a transition from a desynchronized state to a partially synchronized state [37]. For Lorentzian distributed phase velocities , an analytical expression of the critical coupling parameter can be found [37]. Similar aspects of time-discrete models for synchronizations have been studied in Daido [38, 39] and by Teramae and Kuramoto [40].

Following more recent work [32], the conditional probability density of the phase synchronization model (58) in terms of the so-called Frobenius-Perron operator involving the Dirac delta function can be determined and reads The modulus-1 operator may be eliminated by casting (59) into the form The conditional probability density holds for a particular unit with phase velocity . Let describe the probability density of phase velocities. Then, can at least formally be computed from and via the integral equation Accordingly, stationary solutions satisfy The uniform distribution satisfies (62) for any coupling parameter and consequently is a stationary probability density. However, for oscillator systems with Lorentzian velocity distributions , the uniform distribution is an unstable stationary probability density if the coupling parameter exceeds the critical value which can be shown by numerical simulations [37].

#### 5. Time-Continuous Strongly Nonlinear Stochastic Processes on Discrete State Spaces

Strongly nonlinear stochastic processes evolving on discrete state spaces but exhibiting a continuous time variable have been studied in the context of nonlinear master equations as introduced in Section 2.4; that is, in the context of Markov processes. Frequently, nonlinear master equations have been studied as a tool to derive nonlinear Fokker-Planck equations (see, e.g., [14, 41–44]). However, nonlinear master equations have also been studied in their own merit (see, e.g., [45]).

##### 5.1. Nonlinear Master Equations as a Tool for Identifying the Generic Forms of Nonlinear Fokker-Planck Equations

As mentioned in Section 2.4, transition rates of master equations may depend on occupation probabilities. Curado and Nobre [14] considered only transition rates between neighboring states . In addition, the explicit time dependency of rates was neglected. In this case, (20) reads Note that there are only two types of transition rates. First, there are transition rates that describe the rate of transitions from a level to the next higher level (“upward rates”). Second, there are the transition rates of transitions from a level to the next lower level (“down rates”). Using and , we can re-write (63) as Furthermore, Curado and Nobre [14] assumed that the states represent positions on a one-dimensional grid with spacing . They assumed that the transition rates depend on the spacing but also on the occupation probabilities . More precisely, the model involves the following up- and downrates: In the limiting case of an infinitely small grid (i.e., for ), the master equation (64) with (65) behaves like the nonlinear Fokker-Planck equation of the form (29). The diffusion term of that Fokker-Planck equation is proportional to the probability density . This model, in turn, is a standard model for diffusion in porous media [46, 47]; see Section 6.

##### 5.2. Strongly Discrete-Valued Nonlinear Stochastic Processes Maximizing Generalized Entropy Measures

A key property of stochastic processes described by ordinary master equations such as (20) with rates independent of the occupation probabilities is that the processes approach stationarity in the long time limit [48]. More precisely, processes evolve such that the Kullback distance measure [49] is a monotonically decreasing function of time and approaches a stationary value. The stationarity of the Kullback distance measure indicates that the process under consideration has become stationary. The Kullback distance measure is defined on the basis of the Boltzmann-Gibbs-Shannon entropy, which is a fundamental entropy measure not only for equilibrium systems [50] but also for nonequilibrium systems [51]. It was suggested to exploit this link between the Boltzmann-Gibbs-Shannon entropy, the Kullback distance measure, and the ordinary master equation to define nonlinear master equations based on generalized entropy and generalized Kullback distance measures [9, 45]. Accordingly, entropy measures of the form are considered that involve a kernel function . Furthermore, it is assumed that the kernel satisfies certain properties. The second derivative of the kernel with respect to is negative for any argument in the interval which implies that the entropy is concave [52]. In addition, the first derivative of the kernel with respect to in the limiting case goes to plus infinity (i.e., exhibits a singularity). This property is important for the master equation that will be defined in the following and guarantees that occupation probabilities solving the master equation do not become negative. The following master equation has been proposed [45]: The nonlinearity in the master equation (67) is conceptually different from the nonlinearity in the master equation (20). While in (20) it is assumed that the transition rates depend on occupation probabilities, in (67) the transition rates are independent of the occupation probabilities. However, transitions are not proportional to the occupation probabilities as in (20). Rather, they are proportional to the terms , whose explicit form depends on the entropy kernel . In view of this property, transitions are entropy driven. It can be shown that stochastic processes satisfying (67) converge to occupation probabilities that maximize the entropy measures . Therefore, we may say that the transitions of processes defined by (67) are proportional to an entropy drive that maximizes the entropy under consideration. Note that for the special case of the Boltzmann-Gibbs-Shannon entropy the function reduces to the identity , which implies that (67) includes the ordinary, linear master equation as special case.

In closing our considerations on the nonlinear master equation (67), it is useful to note that formally (67) can be cast into the form of the nonlinear master equation (20) irrespective of any conceptual differences. If we put for , then model (67) assumes the form of (20). Note that in the limiting case , we have because of the aforementioned assumption that holds for . This, in turn, implies that for can be defined as the product of and the limiting case “0/0,” where the ratio “0/0” has to evaluated for any given entropy kernel . The relation (68) is useful because it can be used to compute realizations of the nonlinear master equation (67) from (1), (18), and (22)–(25).

##### 5.3. Physiological, Partitioned (Discrete-Valued) Signals and Strongly Nonlinear Stochastic Processes Exhibiting Stationary Cut-Off Distributions

It has been argued that any probability density with tails that extend to infinity fails to capture the boundedness of physiological signals [53]. In other words, while probability densities with tails that extend to infinity such as the normal distribution are crucially important for developing theory and for modeling and are frequently good approximations, they imply that signals can assume arbitrarily large values in magnitude. This is not plausible. Physiological signals such as muscle force produced by human and animals, limb velocities, limb accelerations, blood flow velocities, heart rates, electroencephalographic recorded brain signals, pulse rates of neurons, and dendritic currents are bounded and cannot exceed certain maximum values. Taking an information theoretical point of view [51], stochastic processes describing physiological signals may maximize information or entropy measures. However, they cannot maximize the Boltzmann-Gibbs-Shannon measure under ordinary constraints because this measure yields normal distributions or other distributions with tails that extend to infinity. In contrast, physiological signals may maximize generalized information and entropy measures that exhibit the so-called cut-off distributions as maximum entropy distributions. This argument was developed for stochastic processes defined on continuous state spaces [53] but holds for processes evolving on discrete-state spaces as well. In particular, signals from sensory systems are known to be partitioned in a certain way. Differences between two magnitudes can only be detected if they exceed certain thresholds. For this reason, modeling of sensory physiological signals and physiological signals in general may assume that the state space of consideration is of discrete nature.

In Frank [45] the coordination of rhythmic movements of two limbs has been addressed exploiting the master equation model (67). It has been assumed that the physiological relevant variable, namely, the relative phase between the limbs, is appropriately described on a partitioned (i.e., discrete-value) state space. Furthermore, it has been assumed that the coordination dynamics corresponds to a stochastic process maximizing an entropy measure that exhibits a maximum entropy cut-off distribution. Out of all possible entropy measures, the entropy measure, nowadays called Tsallis entropy, has been used for that purpose [54–56]. The model can be considered as a modified version of the seminal coordination model by Haken et al. [57]. The benefits of the nonlinear master equation model (67) relative to the original Haken-Kelso-Bunz model are twofold. First, model (67) offers a consistent approach that addresses physiological data within the framework of cut-off distributions. Second, the model predicts that coordination is bistable in a distributional sense. That is, under certain conditions the model exhibits two stationary distributions . This feature is in line with experimental observations. For human coordination models with continuous state spaces similar attempts have been made to deal with bistability in a distributional sense [58, 59].

#### 6. Time-Continuous Strongly Nonlinear Stochastic Processes on Continuous State Spaces

There is a relatively large literature on time-continuous strongly nonlinear stochastic processes defined on continuous state spaces. Most of these studies are concerned with the Markov diffusion processes. Reviews can be found in Frank [9, 60]. Strongly nonlinear phase synchronization models based on the Kuramoto model [36] are reviewed in Acebrón et al. [61]. In this section, some of applications in physics and the life sciences will be highlighted without going into much detail.

##### 6.1. Chaos in Probability: Time-Continuous Case

In Section 3.2 strongly nonlinear time-discrete stochastic processes were considered that exhibit occupation probabilities that evolve in a chaotic fashion. As an example, the logistic map model defined by (39) was discussed. A key feature of this model and of similar models is that the chaotic behavior arises from the coupling of the dynamics of the realizations to the occupation probabilities. In particular, without this coupling equation (39) reads with fixed parameters taken from the interval and clearly does not exhibit a chaotic solution. Chaos is due to the fact that transition probability is replaced by a transition probability that depends on the process occupation probabilities like . That is, probability measures of strongly nonlinear processes can exhibit chaotic behavior due to the very nature of strongly nonlinear processes, which is the influence of a probability measure on realizations from which the probability measure is calculated. Such a circular causality structure may be realized in many-body systems composed of interacting subsystems. In such systems, chaos may emerge due to the coupling of the single subsystem dynamics to the relative frequency with which states are occupied by subsystems. Chaos emerges even if the isolated subsystem dynamics is not chaotic [18]. Consequently, chaos may be considered as a self-organization phenomenon. Recall that self-organizing phenomena in general are emerging properties of many-body systems that do not exist on the level of the subsystems [62]. In our context, the many-body system as a whole behaves qualitatively differently from the individual, isolated parts. The catchphrases “the whole is different from the sum of its parts” or “more is different” apply [63].

This feature of strongly nonlinear stochastic processes has also been demonstrated for time-continuous models involving continuous state spaces [64–67]. Subsystems described by ordinary stochastic differential equations that do not exhibit chaotic solutions have been coupled together and the limiting case of a many-body system composed of infinitely many subsystems has been considered. The limiting case models were described by strongly nonlinear stochastic processes in the sense defined in Section 1.2. In particular, in these studies the dynamics of realization was coupled to certain expectation values of the respective processes. It was found that the expectation values under appropriate conditions exhibit a chaotic dynamics.

##### 6.2. Plasma Physics and the Landau Form of Strongly Nonlinear Fokker-Planck Equations

Plasma physics is concerned with the evolution of many-particle systems composed of charged particles. The particles can interact with each other in various ways. Two interaction forms are particle-particle collisions and Coulomb interaction. The Landau form of the Fokker-Planck equation accounts for these two interactions and describes the evolution of the particle density in the three-dimensional velocity space. That is, let denote the velocity vector of a particle. Let denote the particle mass density at time . Assuming that particles are neither created nor destroyed, the total mass is invariant over time. The particle probability density can be introduced. According to the Landau form, the probability density satisfies a strongly nonlinear Fokker-Planck equation of the form (29). More precisely, the Landau form reads [68–74] Stationary solutions of (69) have been studied, for example, by Soler et al. [75]. In addition, several studies have been concerned with the derivation of transient solutions using different (semi)numerical approaches [74, 76–79].

##### 6.3. Astrophysics: Stellar Cut-Off Mass Distributions Defined by Strongly Nonlinear Stochastic Models

Astrophysical mass distributions may exhibit relative sharp boundaries. More precisely, particle distributions in the stellar space may be spatially bounded due to gravity. The gravitational forces are produced by the mass distributions themselves. That is, the dynamics of an individual particle of a mass distribution is affected by the particle distribution as a whole. In turn, the distribution is composed of its individual particles. In view of this circular causality, it does not come as a surprise that strongly nonlinear stochastic models have been investigated that describe cut-off distributions of stellar particles [80–84]. The models typically assume the form of the strongly nonlinear Fokker-Planck equation (29).

Two more comments may be appropriate. First, in Section 5.3 we addressed cut-off distributions in the context of physiological signals. However, the motivation in Section 5.3 for considering cut-off distributions was quite different from the argument used in astrophysical applications. Second, in addition to the aforementioned studies on strongly nonlinear processes describing astrophysical cut-off mass distribution, strongly nonlinear stochastic processes satisfying the Landau form (69) have also been used for studying astrophysical problems [85, 86].

##### 6.4. Accelerator Physics: Shortening of Bunch-Particle Distribution of Particle Beams

Particles in accelerator beams can travel in localized cluster of particles. In a longitudinal beam, the so-called bunch-particle distribution is a mass density that describes the distribution of particles with respect to the center of the cluster. Let denote relative position of a particle with respect to the bunch center. Typically, the particle dynamics also depends on the particle energy. That is, beam particle dynamics involves a second coordinate which is a rescaled energy measure (rather than a velocity or momentum variable). Dividing the mass density with the total mass, the bunch-particle distribution becomes a probability density of the two-dimensional vector . Particles in accelerator beams are charged particles that are accelerated by means of electromagnetic forces. Although particles may interact with each other by means of Coulomb forces, the crucial force that determines the shape and length of a cluster is the so-called wake-field [87, 88]. The wake-field is an electromagnetic field generated by the whole bunch of particles that travels ahead of the bunch but is reflected at the accelerator walls and then acts back on the particles of the bunch. The wake-field can cause a shortening of the particle bunch. That is, under the impact of the wake-field the cluster is smaller in size than it would be theoretically in the absence of the wake-field. The understanding of bunch shortening is an important step towards an understanding of beam instabilities that should be avoided.

The dynamics of bunch particles is typically described by strongly nonlinear Markov diffusion processes. The reason for this is that the dynamics of individual particles is affected by the wake-field which can be calculated from an appropriate expectation value of the bunch particle probability density . As a result, in various studies it has been proposed that satisfies a strongly nonlinear Fokker-Planck equation of the form (29) (see, e.g., [89–98]). A useful model, in particular for the purpose of theory development, is the Haissinski model [95, 96, 99–104] for which analytical solutions of the reduced density can be derived. In particular, according to the Haissinski model, the dynamics of single particles satisfies a strongly nonlinear Langevin equation (27), which reads [100] where describes a phenomenological damping constant. For the Haissinski model the wake-field function assumes the form of a Dirac delta function: . The parameter measures the strength of the impact of the wake-field. For the width of the reduced probability density becomes smaller when increasing the magnitude of the impact of the wake-field. In doing so, model (70) provides a dynamic account for the squeezing effect of wake-fields on particle clusters traveling in accelerator beams.

##### 6.5. Physics of Liquid Crystal Phase Transitions from the Perspective of Strongly Nonlinear Stochastic Processes

Liquid crystals typically consist of rod-shape macromolecules that interact with each other by means of weak Van-der-Waals forces. Due to this interaction, molecules can align themselves such that the molecules point with their primary axes more or less in the same direction. The physical properties of a crystal with aligned molecules are qualitatively different from the properties that the crystal exhibits when the macromolecules are isotropically (randomly) distributed. Therefore, liquid crystals exhibit two phases: an ordered phase with molecules that are aligned at least to some degree, which is called the nematic phase, and a disorder phase with molecules pointing in random directions, which is called the isotrope phase. A phase transition from the isotropic to the nematic phase occurs when the temperature is decreased below a critical temperature [105–108]. The degree of orientational order can be captured by the Maier-Saupe order parameter [102, 105–111]. For the isotropic phase the order parameter equals zero. In the nematic phase, the Maier-Saupe order parameter is larger than zero.

Stochastic models describing the emergence of orientational order (i.e., isotropic-nematic phase transitions of liquid crystals) exploit the notion that the order parameter itself expresses the magnitude of an overall force that acts on individual molecules and tends to align them with the mean orientation of the liquid crystal macromolecules. The model is based conceptually on the notions of circular causality and self-organization: individual molecules are affected by the order parameter and at the same time constitute the order parameter. Hess [112] as well as Doi and Edwards [113] have proposed a strongly nonlinear stochastic process describing the rotational Brownian motion of the molecules under the impact of the (self-generated) order parameter. The Hess-Doi-Edwards model exhibits the structure of a strongly nonlinear Fokker-Planck equation (29) and can be cast into the form of a strongly nonlinear Langevin equation (27) (see, e.g., [114]). Exploiting the concept of strongly nonlinear stochastic processes, the martingale for the Hess-Doi-Edwards model can be defined [60].

Transient solutions of the Hess-Doi-Edwards model may be computed from approximate coupled moment equations [115]. Using Lyapunov’s direct method [62, 116], it can be shown that the ordered state exhibits an asymptotically stable probability density [114]. Importantly, since stochastic models provide a dynamic account, it is possible to study response functions of liquid crystals. Transient responses describing the relaxation to equilibrium [117, 118] as well as correlation functions and mean square displacement functions in the stationary case [114] can be determined at least semi-numerically.

Beyond the application of the concept of strongly nonlinear stochastic processes to the Hess-Doi-Edwards Fokker-Planck equation, in general, strongly nonlinear stochastic models have turned out to be useful models in liquid polymer physics (see, e.g., [119–121]).

##### 6.6. Classical Descriptions of Quantum Systems by means of Strongly Nonlinear Stochastic Processes

The relaxation towards equilibrium of quantum mechanical Fermi and Bose systems can be modeled using a classical approach by means generalized Boltzmann equations that exhibit appropriately modified collision integrals [122–125]. Applying a diffusion approximation to these collision integrals, such quantum mechanical Boltzmann equations reduce to Fokker-Planck equations that are nonlinear with respect to their probability densities [122–124]. In addition to this approach via the Boltzmann equation, alternative derivations of classical quantum mechanical Fokker-Planck equations that are nonlinear with respect to probability densities have been proposed [126–133]. A key property of these models is that they exhibit stationary probability densities that correspond to Fermi or Bose distributions. Interpreting these models in terms of strongly nonlinear Fokker-Planck equations of form (29) allows us not only to consider the evolution of probability densities but also to examine predictions of semiclassical stochastic processes for quantum systems. For example, solutions of trajectories computed from the corresponding strongly nonlinear Langevin equations of form (27) have been studied [126, 127]. In particular, the relaxation of the free electron gas towards its stationary Fermi energy distribution can be determined by computing numerically individual electron trajectories in the relevant energy space [9, 126]. Moreover, martingales for quantum processes within this classical Langevin approach can be defined [60].

Finally, note that classical stochastic descriptions of Fermi and Bose systems do not necessarily involve strongly nonlinear stochastic processes. It has been shown that ordinary Fokker-Planck equations with appropriately defined drift functions can also exhibit Fermi and Bose distributions as stationary probability densities (see, e.g., [134]).

##### 6.7. Material Physics of Porous Media

Gas particles and fluids diffusing through porous media satisfy a nonlinear diffusion equation, frequently called the porous media equation. In one spatial dimension, the porous medium equation for the particle mass density reads [3, 9, 46, 47, 135] with . Dividing the mass density by the total mass , (71) becomes an evolution equation for a probability density normalized to unity with , and assumes the form of the nonlinear Fokker-Planck equation (29). The connection between the porous medium equation and nonlinear master equations has been addressed in Section 5.1 (see the aforementioned). Equations (71) and (72) exhibit exact time-dependent solutions, frequently called Barenblatt-Pattle solutions [136, 137].

The parameter captures the nonlinearity of the equation. For (linear case) the porous medium equation reduces to the ordinary diffusion equations and the variance of the diffusion process (or the second moment of assuming a zero first moment) increases linearly with time . In contrast, for and (nonlinear case) the variance increases as a nonlinear function of like with as can be shown by various methods [9, 138–140].

Interpreting (72) in terms of a strongly nonlinear Fokker-Planck equation (29), trajectories of realizations can be computed from the corresponding strongly nonlinear Langevin equation [138]. For a generalized version of (72) involving a drift force such a Langevin equation approach has been exploited to derive analytical expressions of certain autocorrelation functions [141].

The porous medium equation in its form as a nonlinear Fokker-Planck equation (72) plays an important role in the theory development of nonextensive thermostatistics. As it turns out, a particular generalization of (72), the so-called Plastino-Plastino model, exhibits stationary solutions that maximize the nonextensive entropy proposed by Tsallis. We will return to this issue later.

##### 6.8. Material Physics and the Liouville Dynamics of Many-Body Systems with Interacting Subsystems

The particle distribution of a many-body system in the overdamped, deterministic case satisfies a Liouville equation. For many-body systems with interacting subsystems the Liouville equation involves an integral term describing the interaction force on a single subsystem. This interaction integral may be evaluated using a diffusion approximation. In doing so, the Liouville equation assumes the form of the nonlinear Fokker-Planck equation (29) when considering the probability density rather than the subsystem density. The nonlinearity occurs in the diffusion term. This approach has been developed in a series of studies by Andrade et al. [142] and Ribeiro et al. [143, 144]. For example, the distribution functions of interacting particles in dusty plasmas, interacting vortices, and interacting polymer chains in complex fluids have been studied within this framework.

Note that as mentioned previously the dynamics of the subsystems is deterministic. Nevertheless, the effective model for the subsystem probability density involves a diffusion term. That is, according to the effective, approximative model in terms of a nonlinear Fokker-Planck equation, due to the interaction between the subsystems the many-body system as a whole generates a fluctuating force that acts on the individual subsystems of the many-body system.

##### 6.9. Nonextensive Physical Systems and the Plastino-Plastino Model

Tsallis (Tsallis, [54, 55]; see also Abe and Okamoto [56]) introduced in physics a generalized form of the Boltzmann-Gibbs-Shannon entropy, nowadays frequently called the Tsallis entropy. The Tsallis entropy exhibits a parameter . For the entropy reduces to the Boltzmann-Gibbs-Shannon entropy capturing properties of extensive systems. For the Tsallis entropy describes nonextensive systems. A. R Plastino and A. Plastino [145] proposed a nonlinear Fokker-Planck equation of the form (29) that may be written like The model describes the probability density of a particle living in a one-dimensional continuous state space given by the whole real line. The term involving a force function in (73) is linear with respect to the probability density . The diffusion term of the model involves the parameter , which using the notation suggested by (73) corresponds to the parameter of the Tsallis entropy (see Frank [9]). Note that in the original model a slightly different notation was proposed [145]. A key feature of the Plastino-Plastino model (73) is that stationary probability densities of (73) maximize the Tsallis entropy. For , that is, in the special case in which the Tsallis entropy reduces to the Boltzmann-Gibbs-Shannon entropy, the model is linear with respect to the probability density . For , that is, for the general case that the Tsallis entropy describes a nonextensive system, the diffusion term is nonlinear with respect to . In view of these observations, the Plastino-Plastino model establishes a connection between nonextensivity and nonlinearity.

The Plastino-Plastino model (73) has been examined in various studies. In particular, it has frequently been pointed out that (73) may be considered as a generalized version of the porous medium model (72) for gas and fluid flows that are subjected to an additional driving force captured by the force function .

Exact analytical solutions for the time-dependent probability density are available in the case of a linear drift force [140, 145, 146]. Trajectories describing the stochastic dynamics of individual particles can be computed when interpreting (73) as a strongly nonlinear Fokker-Planck equation by means of the corresponding strongly nonlinear Langevin equation [138, 141, 147]. In this case, second order statistics measures such as autocorrelation functions can be determined [141] and martingales can be found [60]. Exact time-dependent solutions have been derived for generalized versions of model (73) that account for source terms [148–150]. The Kramers escape rate has been determined for processes described by the Plastino-Plastino model involving an appropriately defined drift force [151]. The multivariate generalization of (73) with [139, 152, 153] and without [129, 154] time-dependent coefficients has been considered. The Plastino-Plastino model (73) with explicit state-dependent diffusion coefficients has been studied [153, 155]. The model has been generalized for the Sharma-Mittal entropy that involves two parameters and includes the Tsallis entropy as a special case [156]. Interestingly, H-theorems have been found that show that transient solutions of (73) converge to stationary probabilities densities provided that is chosen appropriately [122, 157–164].

##### 6.10. Oscillatory Coupled Nonequilibrium Systems: Canonical-Dissipative Approach

Coupled oscillators with short-range interactions can be studied within the framework of strongly nonlinear stochastic processes [165]. In this study isolated oscillators were modeled using the canonical-dissipative approach [166–168] that allows to exploit the concepts of Hamiltonian and Newtonian dynamics, on the one hand, and to address nonequilibrium systems such as self-excited limit cycle oscillators, on the other hand.

##### 6.11. Phase Synchronization and Kuramoto Model: Time-Continuous Case

Synchronization is a phenomenon studied in various disciplines ranging from physics to the social sciences [169]. In the context of strongly nonlinear stochastic processes, we are primarily concerned with the synchronization of a large ensemble of oscillatory subunits. Synchronization can be studied both in the phase space of the oscillators (e.g., the space spanned by position and momentum variables) and in reduced spaces. An example for the latter case was discussed already in Section 4.4: phase synchronization. In fact, we will focus in this section on the phenomenon of phase synchronization.

A benchmark model that describes the emergence of phase synchronization in a population of phase oscillators is the Kuramoto model [36]. Accordingly, each oscillator is described by a phase that evolves on a continuous time scale and represents a periodic variable. Each oscillator is coupled to all remaining oscillators and the limiting case of a group of infinitely many oscillators is considered. The oscillators as a whole generate an order parameter, which is a complex-valued variable. The complex-valued order parameter has an amplitude, the so-called cluster amplitude , and an angle, the so-called cluster phase . Both cluster phase and cluster amplitude are measures defined in circular statistics (see, e.g., [36, 170, 171]). Mathematically speaking, for an oscillator phase defined on the interval the complex order parameter is defined by the expectation value where is the imaginary unit (i.e., the square root of −1) and the cluster phase is chosen such that in any case. Note that both and are time-dependent variables.

The order parameter induces a force that acts on the individual phase oscillators. That is, the oscillators interact with each other indirectly via the self-generated order parameter. The population of phase oscillators can be regarded as a statistical ensemble. Accordingly, the order parameter can be computed as an expectation value from the probability density , that is, the probability density of the phase of an arbitrarily chosen phase oscillator. Since the order parameter acts back on the realizations (individual phase oscillators), the model satisfies the definition of a strongly nonlinear stochastic process provided in Section 1.2. The strongly nonlinear Langevin equation of the Kuramoto model reads and assumes the form (27). In (75), the parameter measures the strength of the force induced by the order parameter .

The uniform probability density describing a desynchronized oscillator population is a stationary solution for any parameter because for the cluster amplitude equals zero. However, only for the uniform distribution is asymptotically stable. For the uniform distribution is unstable meaning that for any initial condition that slightly deviates from the uniform probability density the strongly nonlinear stochastic process will not converge to a stationary process with uniform probability density. Rather, for the Kuramoto model exhibits stable, stationary unimodal distributions [9, 36]. These unimodal probability densities indicate that the oscillator population is partially synchronized. Moreover, the order parameter amplitude is larger than zero for these unimodal stationary probability densities. The unimodal probability densities are stable but not asymptotically stable (see, e.g., [9]). A shift of such a stationary probability density along the -axis transforms a member of the family of stable stationary probability densities into another member of that family. This feature becomes intuitively clear when we realize that the form of (75) is invariant against transformation and , where is an arbitrarily small real number. We may use the term “marginally” stable to characterize the stability of these stationary probability densities (see also Section 4.3).

The Kuramoto model has been reviewed in Strogatz [172] and Acebrón et al. [61]. In particular, there exists an H-theorem of the Kuramoto model showing that transient probability densities converge to stationary ones [173]. This H-theorem is related to a free energy approach to the Kuramoto model (Frank [174]; see also Frank [9]).

A key question in the context of the Kuramoto model concerns the bifurcation from the desynchronized state to the synchronized state for oscillator populations with inhomogeneous eigenfrequencies. In this case, (75) may be written for realizations of the process like where is the phase velocity of the th realization of the process. If we restrict our considerations to symmetric distributions of velocities and to symmetric initial probability densities, then the symmetry property remains conserved during the evolution of the process and the cluster phase can assume only one of the two values or . Moreover, the order parameter becomes a real-valued variable. It can be shown that in this case (76) reduces to Comparing (77) with the Daido model (58) discussed in Section 4.4, it becomes at this stage clear that the Daido model (58) can be considered as a time-discrete counterpart of the time-continuous, fully symmetric Kuramoto model (77) with distributed eigenfrequencies. Note that the Daido model exhibits phases defined on the interval , whereas in the context of the Kuramoto model typically phases defined on the interval are considered.

As mentioned previously, reviews for the Kuramoto model are available in the literature and the reader may consult these works for more details ([61, 172]; see also Tass [175] and Section 7.5 in [9]).

##### 6.12. Biology: Population Dispersion and Population Dynamics

The spatial distribution of insects forming swarms may be described in terms of cut-off distributions. For example, the insect density has been proposed [176], where the coordinate x measures the location of an insect with respect to the center of the swarm and . The operator corresponds to the identify for positive arguments and equals zero for negative arguments (i.e., for and otherwise). Normalizing by the total mass , a stochastic model for the probability density needs to explain the emergence of a stationary insect cut-off probability density. In fact, to this end, a model similar to the Plastino-Plastino model (73) with an appropriate drift term was proposed [176, 177]: with . The stationary probability is obtained for . However, exponents different from have also been proposed [178, 179]. Moreover, descriptions of population dynamics in terms of nonlinear Fokker-Planck equations alternative to (78) have been considered in the literature as well ([81, 168]; see also the Shigesada-Teramoto model in Okubo and Levin, [176], Section 5.6).

##### 6.13. Social Sciences and Group Decision Making: Time-Continuous Case

Group decision making has already been addressed in Sections 3.3 and 4.3 in the context of time-discrete models. In a series of studies, Boster and coworkers developed the so-called linear discrepancy model of group decision making and compared predictions with experimental data [180–182]. In particular, the model accounts for a key phenomenon in the social sciences: the risky shift. A risky shift occurs when people arrive at riskier decisions when discussing a given topic in a group than when making their own decisions individually. According to the linear discrepancy model due to discussions the opinion in favor or against a particular issue shifts by an amount that is proportional to the opinion that an individual holds and the opinion that is presented in the group by another individual. The linear discrepancy model predicts that a risky shift can be observed when the initial distribution (i.e., the prediscussion distribution) of opinions is skewed. Accordingly, during the discussion session, the opinion distribution tends to become more symmetric, which is accompanied with a shift of the mean value. Some of the mathematical properties of the linear discrepancy model have been studied in more detail within the framework of a strongly nonlinear stochastic process [183]. Interestingly, the stationary, symmetric probability densities describing the distribution of opinions among the group members are only stable and not asymptotically stable. In particular, a shift along the opinion axis transforms one member of the family of stable, stationary probability density into another member. In this regard, the group decision making model exhibits the same feature as the Kuramoto model.

##### 6.14. Finance and Option Pricing

A stochastic option pricing model has been developed on the basis of the Plastino-Plastino model (73). In particular, the option pricing model exhibits a diffusion term similar to the one of the Plastino-Plastino model [184–186]. A particular benefit of the model is that it features non-Gaussian distributions. This is due to the nonlinearity of the diffusion term (or the nonextensivity of the Tsallis entropy measure involved in the definition of the process). In fact, non-Gaussian distributions frequently fit financial data better than Gaussian distributions.

##### 6.15. Economics and Modeling the Dynamics of Target Leverage Ratios

The total capital (firm value) of a company is typically composed of borrowed money and equity. The leverage of a company or the leverage ratio is the ratio of borrowed money relative to equity. A company needs to decide what leverage ratio the company should have. The decision is not necessarily made once and for all. Rather, the desired leverage ratio (i.e., the target leverage ratio) may be updated dynamically depending on the internal and external circumstances. For this reason, the leverage ratios of companies frequently correspond to dynamic variables subjected to fluctuations.

Lo and Hui [187] proposed a strongly nonlinear stochastic model for the dynamics of the leverage ratio. The model is based on a model developed earlier by Hui et al. [188] that involved an explicitly time-dependent function. In the strongly nonlinear stochastic model, this function is eliminated, and in this sense the strongly nonlinear stochastic model can be regarded as being closed. In order to achieve this closure, Lo and Hui [187] assumed that the leverage dynamics of a company is affected by the leverage ratios of similar companies. Within the framework of strongly nonlinear stochastic processes this implies that the leverage ratio dynamics of companies depends on an expectation value of the leverage ratio process. Formally, the model by Lo and Hui is closely related to the Shimizu-Yamada model that will be discussed later.

##### 6.16. Engineering Sciences: Generators for Noise Sources

In engineering sciences and related disciplines a common problem is to simulate numerically signals of noise sources. These signals can then be used to test the response of engineered systems to random signals emerging from noise sources. A characteristic feature of noise sources is that they are stationary. In view of this property, strongly nonlinear stochastic processes can be defined which for any initial probability density are stationary [147, 189, 190]. For example, the strongly nonlinear Langevin equation with , corresponds to a nonlinear Fokker-Planck equation of the form (29) whose right-hand side is by definition equal to zero [9, 147]. Consequently, the Langevin equation defines for any initial probability density a stochastic process with a stationary probability density . The parameters , , can be chosen in order to select a desired correlation time of the process.

##### 6.17. Further Benchmark Models

###### 6.17.1. The Shimizu-Yamada Model

The Shimizu-Yamada model is motivated by research on muscle contraction (Shimizu [191], and Shimizu and Yamada [192]; see also Section 5.5.4 in [9]). The Shimizu-Yamada model corresponds to the generalized Ornstein-Uhlenbeck process that is defined by (33) and involves a mean-field force proportional to the mean value of the process. The Shimizu-Yamada model is highly relevant for the theory development of strongly nonlinear Markov diffusion processes because it can be solved exactly. That is, exact transient solutions can be found and exact solutions for the conditional probability density are available. The conditional probability density, in turn, can be used to calculate all correlation functions of interest both in the transient and in the stationary cases. For details, see Frank [9, 193]. Since the model exhibits exact solutions, it has also been used to test the accuracy of numerical algorithms for solving nonlinear Markov diffusion processes [16].

###### 6.17.2. The Desai-Zwanzig Model

The Desai-Zwanzig model involves a mean-field force proportional to the mean value of the process just as the Shimizu-Yamada model. However, the individual, isolated subsystems are assumed to perform an overdamped motion in a double-well potential [194, 195]. The strongly nonlinear Fokker-Planck equation of the Desai-Zwanzig model reads