Abstract

In a previous work (S. Fiori, 2006), we proposed a random number generator based on a tunable non-linear neural system, whose learning rule is designed on the basis of a cardinal equation from statistics and whose implementation is based on look-up tables (LUTs). The aim of the present manuscript is to improve the above-mentioned random number generation method by changing the learning principle, while retaining the efficient LUT-based implementation. The new method proposed here proves easier to implement and relaxes some previous limitations.

1. Introduction

Random numbers are currently used for a variety of purposes such as cryptographic key generation, games, some classes of scientific experiments, as well as “Monte Carlo” methods in physics and computer science [16]. Standard programming environments are endowed with basic pseudorandom signal generators, such as the uniform and the Gaussian ones, while the needed distributions are usually more involved than the uniform/Gaussian ones. A simple example of application is password generation: a random password generator is a piece of software that takes input from a random or pseudorandom number generator and automatically generates a password. An example of application where involved probability distributions are needed is independent component analysis (ICA, [7]) testing: as the behavior of an ICA algorithm might depend on the statistical distribution of the sources, ICA-algorithm testing tools might require random sequence generators capable of producing random numbers distributed according to involved probability laws.

The principal methods known in the literature to obtain a batch of samples endowed with an arbitrary distribution from a batch of samples having a simple distribution are the “transformation method” and the “rejection method” [8]. In the present paper, we focus on the transformation method, which may be conveniently implemented through a tunable neural system, because the availability of a random number source and of a tunable nonlinear system, along with a proper learning procedure, allows obtaining a wide class of pseudorandom signal generators.

A well-known effect of a nonlinear neural system is to warp the statistical distribution of its input. In particular, we assume that the system under consideration has a nonlinear adaptive structure described by the transference $y = f(x)$, where $x \in \mathcal{X}$ denotes the system input random signal, having probability density function $p_x(x)$, and $y \in \mathcal{Y}$ denotes the output signal, having probability density function $p_y(y)$, as shown in Figure 1. Under the hypothesis that the neural system transference is strictly monotonic, namely $f'(x) > 0$ for all $x \in \mathcal{X}$, the relationship between the input distribution, the output distribution, and the system transfer function is known to be [9]
$$p_y(y) = \frac{p_x(x)}{f'(x)}\bigg|_{x = f^{-1}(y)}, \quad x \in \mathcal{X}, \qquad (1)$$
where $f^{-1}(\cdot)$ denotes the inverse of the function $f(\cdot)$. Usually, (1) is interpreted as an analysis formula, which allows computing the output distribution when the input distribution and the system transference function are known. However, the cardinal equation (1) may also be interpreted as a formula that allows designing the nonlinear system when the distribution $p_x(\cdot)$ is known and it is desired that the system responds according to a desired distribution $p_y(\cdot)$. In fact, (1) may be rewritten as the differential equation
$$f'(x) = \frac{p_x(x)}{p_y(f(x))}, \quad x \in \mathcal{X}. \qquad (2)$$
In general, such a design operation is rather difficult, because it involves solving the nonlinear differential equation (2) in the unknown $f(\cdot)$, provided that a consistent boundary condition is specified.
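As a quick numerical illustration (not part of the original derivation), the following Python sketch checks relation (1) for a hypothetical monotonic transference $f(x) = \tanh(x)$ applied to standard Gaussian input; the histogram of the transformed samples should match $p_x(x)/f'(x)$ evaluated at $x = f^{-1}(y)$. All variable names are illustrative.

    # Numerical check of the cardinal equation (1) for a hypothetical
    # monotonic transference f(x) = tanh(x) and standard Gaussian input.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(200_000)          # input samples drawn from p_x
    y = np.tanh(x)                            # output samples y = f(x)

    # Theoretical output density from (1): p_y(y) = p_x(x) / f'(x) at x = f^{-1}(y)
    yy = np.linspace(-0.99, 0.99, 200)
    xx = np.arctanh(yy)                       # x = f^{-1}(y)
    p_x = np.exp(-xx**2 / 2) / np.sqrt(2 * np.pi)
    df = 1.0 - np.tanh(xx)**2                 # f'(x) = 1 - tanh^2(x)
    p_y_theory = p_x / df

    # Histogram estimate of p_y from the transformed samples
    hist, edges = np.histogram(y, bins=50, range=(-0.99, 0.99), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    # The deviation should be small (binning and statistical error only)
    print(np.max(np.abs(hist - np.interp(centers, yy, p_y_theory))))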

In the recent contribution [10], we presented a pseudorandom samples generator based on a nonlinear monotonic neural system, whose transference function is denoted by $f(\cdot)$, tuned on the basis of the differential equation (2). The cardinal design equation (2) was proposed to be solved via a (relaxation-type) fixed-point algorithm. The key advantages of the method proposed in [10] are as follows. (a) In order to obtain a fully tunable neural transference function, a look-up-table representation was chosen. It guarantees high flexibility in the shape of the neural transference as well as ease of representation and handling of the involved quantities. (b) The fixed-point learning algorithm converges faster than other possible methods, such as gradient-based ones: unlike these methods, the fixed-point learning algorithm does not require the computation of derivatives of the involved functions.

The resulting random-number generation method should thus be read as a two-stage procedure. The first stage consists in solving the cardinal differential equation (2) in the unknown function $f(\cdot)$, given the distributions $p_x(\cdot)$ and $p_y(\cdot)$ as data. The second stage consists in generating input random samples drawn from the distribution $p_x(\cdot)$ and then letting such random samples pass through the learnt nonlinear neural system by computing the output values $y = f(x)$. The random samples $y$ are guaranteed to be distributed according to the probability density function $p_y(\cdot)$.

However, we recognized that the method presented in [10] also suffers from some drawbacks, namely the following. (a) For numerical convergence purposes, each step of the fixed-point-type tuning algorithm needed to be followed by some normalization steps. Namely, from (2), it is easily seen that when the function $p_y(f(x))$ approaches 0, the computation of $f'(x)$ becomes ill-conditioned; therefore, the quantity $p_y(f(x))$ was replaced by $p_y(f(x)) + \gamma$, with $\gamma > 0$ being a small constant to be properly sized. Also, in order to refine learning, after each iteration step the solution $f(x)$ needed to be normalized either by affine scaling, in order to control the range of the variable $y$, or by linear scaling, in order to match the true value of an output-distribution moment of preselected order. This, in turn, requires computing in advance the (closed-form) moments of interest of the output distribution. (b) In spite of affine scaling, it was not easy to control the range of the output value $y$, as affine scaling does not guarantee convergence in every case of interest; therefore, it could not be employed in every case. (c) The developed procedure was customized to generate output distributions that are either symmetric (namely, $p_y(-y) = p_y(y)$) or completely skewed to the right (namely, $p_y(y) = 0$ for all $y < 0$) only. Asymmetric or general-shape distributions were not considered.

In the present paper, we consider the problem of extending the previous method to the generation of asymmetric distributions by removing the constraint of symmetry or skewedness to the right. Also, we propose a way to avoid the probability-density-function regularization and the normalization steps recalled above. The solution of choice implies a change in the viewpoint on the cardinal equation (1): instead of converting formula (1) into the differential equation (2), we convert it into a new differential equation, hereafter referred to as the dual cardinal equation, which will prove easier to solve and more flexible to use in practice, while retaining the previous numerical representation and its advantages. Thus, we retain the effective numerical representation of the involved quantities already introduced in the works [10, 11], based on the “look-up table” (LUT) implementation of neural activation functions, as well as an efficient numerical algorithm to solve the dual cardinal equation. LUTs were proven to provide an efficient way of representing and handling the variables appearing within the devised random number generation algorithm. A prominent advantage of the procedure is the lack of hard computational requirements except for LUT handling, which consists of sorting/searching on lists of numbers and of a few simple algebraic operations on numbers.

The effectiveness of the proposed approach will be evaluated through numerical experiments. In particular, the designed experiments follow a logical progression, beginning with a basic assessment of the proposed method applied to a bi-Gaussian distribution, followed by comparably more difficult distributions, namely a generalized Gaussian distribution and an asymmetric Gamma distribution.

The existing method presented in [3] is worth discussing. It concerns a neural-networks-type algorithm to generate random vectors with arbitrary marginal distributions and correlation matrix, based on the NORTA method. The “normal-to-anything” (NORTA) method (see, e.g., [12]) is one of the most efficient methods for random vector generation. In [3], a technique was presented to generate the correlation matrix of normal random vectors based on an artificial neural network approach. The NORTA algorithm works in the following way to generate random samples with a prescribed probability density function. First, generate zero-mean, unit-variance normal random samples $x_i$, $i \in \{1, \ldots, Q\}$. Then, generate the desired random samples as $y_i = P_y^{-1}(\Phi(x_i))$, where $\Phi(\cdot)$ denotes the cumulative distribution function of a standard normal random variable and $P_y(\cdot)$ denotes the desired cumulative distribution function, with $P_y^{-1}(u) = \inf\{z \mid P_y(z) \geq u\}$, $u \in [0, 1]$. It appears, thus, as a transformation method.
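For illustration only, the single-component transformation step of NORTA may be sketched as follows in Python, assuming SciPy is available; the Gamma target distribution and its parameters are arbitrary choices for the example, not taken from [3].

    # Sketch of the NORTA-style transformation y_i = P_y^{-1}(Phi(x_i)) for one
    # component; the Gamma target distribution is an arbitrary example.
    import numpy as np
    from scipy.stats import norm, gamma

    rng = np.random.default_rng(0)
    x = rng.standard_normal(10_000)      # zero-mean, unit-variance normal samples
    u = norm.cdf(x)                      # Phi(x_i), uniformly distributed in (0, 1)
    y = gamma.ppf(u, a=2.0, scale=1.5)   # P_y^{-1}(u): inverse of the target CDF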

Most of the methods of random vector generation known from the literature impose constraints on the size of the random vectors, and many of them are applicable only to bivariate distributions whose components are equidistributed. Conversely, within the NORTA framework, the marginal probability distributions of the vector components as well as their correlation matrix may be specified. Obtaining the prescribed correlation matrix of the generated random vectors requires solving an involved nonlinear system of equations, which is the most serious problem in this kind of approach. Paper [3] makes use of a multilayer perceptron neural network to estimate the correlation matrices of normal random vectors, thus allowing one to overcome the analytically involved equations of the NORTA algorithm. While the method proposed here is more general than NORTA in the sense that it works with any kind of available generator (not only Gaussian), it is less general in the sense that it does not allow one to generate multivariate random variables with prescribed joint statistics.

2. Dual Cardinal Equation and Its Numerical Solution

The present section formalizes the learning problem at hand and illustrates a fixed-point-based numerical algorithm to solve the dual cardinal equation.

2.1. Dual cardinal equation and neural system

The key point of the new method consists in learning the inverse function $f^{-1}(\cdot)$ instead of the function $f(\cdot)$ itself. As will be clarified in the next sections, this choice simplifies the learning problem while adding only a slight computational burden to the usage of the learnt neural system as a generative model.

We denote by $x = g(y) \stackrel{\text{def}}{=} f^{-1}(y)$ the inverse of the actual neural transfer function and refer to the new neural system, having $g(\cdot)$ as transfer function, as the “dual neural system” (shown in Figure 1). The purpose here is to learn a dual neural system that warps $p_y(\cdot)$ into $p_x(\cdot)$ under the constraint $g'(y) > 0$ for all $y \in \mathcal{Y}$. We denote the interval of interest for the generated random variable as $\mathcal{Y} = [\underline{y}, \overline{y}]$. With this hypothesis on the nonlinear dual neural transfer function, the cardinal equation (1) may be rewritten as
$$g'(y) = \frac{p_y(y)}{p_x(g(y))}, \quad g(\underline{y}) = 0, \quad y \in \mathcal{Y}, \qquad (3)$$
which will hereafter be referred to as the “dual cardinal equation.” It is worth noting that the boundary condition $g(\underline{y}) = 0$ is completely arbitrary. While there are no theoretical reasons to set the boundary condition in any specific way, the above choice is motivated by the observation that it simplifies the fixed-point adapting algorithm with respect to the previous version proposed in [10].

In general, a closed-form solution to (3) may not be available; thus, we should resort to an iterative learning algorithm to search for a solution. Formally, this means designing an algorithm that generates a succession of functions $g_n(y)$, $n \in \mathbb{N}$, whose limit coincides with the solution of (3). A way to generate such a succession is to employ the fixed-point algorithm
$$g_{n+1}(y) = \int_{\underline{y}}^{y} \frac{p_y(t)}{p_x(g_n(t))}\, dt, \quad n \geq 0, \ y \in \mathcal{Y}. \qquad (4)$$
As a figure of convergence of the learning process, we consider the weighted difference of the function $g(\cdot)$ between two successive iterations, namely,
$$\Delta g_n \stackrel{\text{def}}{=} \int_{\mathcal{Y}} \big|g_n(y) - g_{n-1}(y)\big|\, p_y(y)\, dy, \quad n \geq 1. \qquad (5)$$
As initial guess, we assume $g_0(y) = 0$ for all $y \in \mathcal{Y}$.
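As a simple sanity check (not drawn from [10]), suppose the available input density is uniform, $p_x(x) = c > 0$, over an interval large enough to contain the range of the iterates. Then the denominator in (4) is constant and the very first iterate $g_1(y) = \frac{1}{c}\int_{\underline{y}}^{y} p_y(t)\, dt = \frac{1}{c}\,[P_y(y) - P_y(\underline{y})]$ is already a fixed point of (4), where $P_y(\cdot)$ denotes the cumulative distribution function associated with $p_y(\cdot)$. The learnt inverse transference is thus, up to scaling, the target cumulative distribution function, and inverting it reproduces the classical inverse-cumulative-distribution sampling rule, which confirms that the dual formulation behaves as expected in this elementary case.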

After learning an inverse function $g(\cdot)$, the numerical procedure should calculate the actual nonlinear function $f(\cdot)$ by numerical inversion. As will be clarified in the next section, within the framework proposed here such an operation involves very little computational effort.

2.2. Numerical implementation of the learning procedure

From an implementation viewpoint, the algorithm (4) needs to be discretized in order to obtain a version suitable to be implemented on a computer.

We choose to represent the function $g_n(y)$ by a numerical vector: in practice, we suppose the interval of interest $\mathcal{Y} = [\underline{y}, \overline{y}]$ to be partitioned into $N \gg 1$ discrete bins. This gives rise to the vector-type representation $\mathbf{y} \in \mathbb{R}^{N+1}$ of the support of the output-sequence probability density function, where $\mathbf{y}$ contains $N+1$ values regularly spaced in $\mathcal{Y}$ with spacing width denoted as $\Delta y$. Then $g_n(y)$ may be represented by a numerical vector $\mathbf{g}_n \in \mathbb{R}^{N+1}$, and the neural input-output transference is now represented by the discrete relationship $(\mathbf{g}, \mathbf{y}) \in \mathbb{R}^{N+1} \times \mathbb{R}^{N+1}$, namely, a numerical look-up table. The entries of a vector $\mathbf{g}_n$ may be denoted by an extra subscript, that is, by $g_{n,k}$, with $k \in \{0, 1, \ldots, N\}$. The spacing $\Delta y$ relates to the integer $N$ and may be defined as $\Delta y \stackrel{\text{def}}{=} (\overline{y} - \underline{y})/N$.

In order to translate the learning rule (4) into a version suitable for numerical representation, we should consider the inherent limitations of numerical integration of differential equations. The following notes are worth taking into account. (a) Output support selection: the ultimate purpose of the random number generation method under construction is to generate random samples with the desired probability distribution within a range of interest, namely, with values within an interval that is deemed suitable for the purposes that random sample generation is launched for. Therefore, the output range $\mathcal{Y} = [\underline{y}, \overline{y}]$ is to be freely selected according to the needs the random samples are to be generated for. The above-mentioned vector $\mathbf{y}$ then has entries $y_k$ computed as $y_k = \underline{y} + k\Delta y$, with $k \in \{0, 1, \ldots, N\}$. (b) Input support selection: in order to prevent the denominator of the quantity $g_{n+1}(y)$ in (4) from becoming too close to zero, a sensible choice is to carefully select the support $\mathcal{X}$. As in this paper we consider the input probability density function to be either (symmetric) Gaussian or uniform, we set $\mathcal{X} = [-R_x, R_x]$, with $R_x > 0$. The value of the constant $R_x$ is to be selected in such a way that $p_x(\pm R_x) \gg 0$. It is worth recalling that the support of the input distribution may be arbitrarily selected, as it does not affect the support of the output distribution. (c) Iterative range scaling: after each learning step, an affine normalization operation is performed, which rescales the entries of the putative solution $\mathbf{g}_n$ so that $g_{n,0} = -R_x$ and $g_{n,N} = R_x$. A short numerical sketch of points (a) and (b) is given below.
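The following short Python sketch illustrates points (a) and (b) above; the variable names and the specific numerical values are illustrative, not taken from the original implementation.

    # Output grid construction (point (a)) and input-support check (point (b)).
    import numpy as np

    y_lo, y_hi, N = -2.5, 4.0, 1000          # output range of interest and bin count
    dy = (y_hi - y_lo) / N                   # spacing Delta_y
    y = y_lo + dy * np.arange(N + 1)         # y_k = y_lo + k * Delta_y, k = 0..N

    R_x = 1.0                                # input support [-R_x, R_x]
    p_x = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # standard Gaussian input
    assert p_x(R_x) > 1e-3                   # p_x(+/-R_x) should stay well away from zero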

In order to describe the numerical learning algorithm, the following operators are defined for a generic look-up table $(\mathbf{h}, \mathbf{y}) \in \mathbb{R}^{N+1} \times \mathbb{R}^{N+1}$:
$$\mathrm{cumsum}(\mathbf{h})_0 \stackrel{\text{def}}{=} 0, \quad \mathrm{cumsum}(\mathbf{h})_k \stackrel{\text{def}}{=} \sum_{i=0}^{k-1} h_i\, \Delta y, \quad \mathrm{ascale}\{\mathbf{h}; a, b\}_k \stackrel{\text{def}}{=} a + \frac{(h_k - h_{\min})(b - a)}{h_{\max} - h_{\min}}, \qquad (6)$$
where the subscript $k$ denotes the $k$th entry of the vectors $\mathrm{cumsum}(\mathbf{h})$ and $\mathrm{ascale}\{\mathbf{h}; a, b\}$, and $h_{\min}$ and $h_{\max}$ denote the minimum and maximum entries of $\mathbf{h}$. The behavior of the “cumsum” operator is illustrated in Figure 2, which also provides a visual representation of look-up tables. In practice, the considered numerical version of the learning rule (4) writes
$$(\mathbf{A0})\ \mathbf{g}_0 = \mathbf{0}, \qquad (\mathbf{A1})\ \mathbf{g}_{n+1} = \frac{\mathbf{p}_y}{p_x(\mathbf{g}_n)},\ n \geq 0, \qquad (\mathbf{A2})\ \mathbf{g}_{n+1} = \mathrm{cumsum}(\mathbf{g}_{n+1}), \qquad (\mathbf{A3})\ \mathbf{g}_{n+1} = \mathrm{ascale}\{\mathbf{g}_{n+1}; -R_x, R_x\}, \qquad (7)$$
where the symbol $=$ denotes vector value assignment and $\mathbf{p}_y$ denotes the vector of $N+1$ entries containing the values of $p_y(\cdot)$ corresponding to the values in $\mathbf{y}$; its entries may be denoted by $p_{y,k}$, with $k \in \{0, 1, \ldots, N\}$.
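A minimal Python sketch of the learning steps (A0)-(A3) follows, assuming the grid y, the callable input density p_x, the target density values p_y, and the constant R_x have been prepared as in the previous sketch; the function name and its interface are illustrative assumptions.

    # Fixed-point learning of the dual LUT, following steps (A0)-(A3).
    import numpy as np

    def learn_dual_lut(y, p_y, p_x, R_x, n_iter=10):
        dy = y[1] - y[0]
        g = np.zeros_like(y)                       # (A0): initial guess g_0 = 0
        for _ in range(n_iter):
            h = p_y / p_x(g)                       # (A1): entrywise ratio p_y / p_x(g_n)
            # (A2): cumsum operator of (6), with cumsum_0 = 0
            g_new = np.concatenate(([0.0], np.cumsum(h[:-1]) * dy))
            # (A3): ascale operator of (6), rescaling the range to [-R_x, R_x]
            g_new = -R_x + (g_new - g_new.min()) * (2 * R_x) / (g_new.max() - g_new.min())
            g = g_new
        return g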

In terms of look-up-table entries, the learning relaxation index $\Delta g_n$ of definition (5) may be approximated as
$$\Delta g_n \approx \sum_{k=0}^{N} \big|g_{n,k} - g_{n-1,k}\big|\, p_{y,k}\, \Delta y, \quad n \geq 1. \qquad (8)$$
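In the same spirit, the approximation (8) can be evaluated on the look-up tables with a one-line helper (illustrative sketch; the argument names are assumptions), for instance to decide when to stop the iteration of the previous sketch.

    # LUT approximation (8) of the relaxation index Delta_g_n.
    import numpy as np

    def relaxation_index(g_new, g_old, p_y, dy):
        return np.sum(np.abs(g_new - g_old) * p_y) * dy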

2.3. Use of the neural system as generative model

When a suitable dual neural system described by the transference $g(\cdot)$ has been learnt, it may be effectively used to generate random samples drawn from the desired statistical distribution. The number of available input samples (which coincides with the number of output samples to be generated) is hereafter denoted by $Q$. The difficulty here is that the input samples $x$ are known, while the output samples $y$ are supposed to be computed as $y = f(x)$. However, unlike in [10], the function $f(\cdot)$ is not known in the present setting, as only its inverse $g(\cdot)$ has been learnt. Nevertheless, the inversion of the function $g(\cdot)$ is not required in order to employ the dual neural system as a generative model, provided an appropriate usage of the look-up table representing $g(\cdot)$ is made. First, it is necessary to produce a realization $\{x_i\}$, $i \in \{1, \ldots, Q\}$, drawn from the available-generator distribution $p_x(\cdot)$ (having, e.g., zero-mean Gaussian or uniform probability density function) and ranging in $\mathcal{X}$. Regarding the generation of the input samples, as they are produced by an available generator whose range is wider than $\mathcal{X}$, some generated input samples will necessarily be discarded. The amount of discarded input samples may be quantified. Let us denote by $P_x(\cdot)$ the cumulative distribution function of the input, namely,
$$P_x(x) \stackrel{\text{def}}{=} \int_{-\infty}^{x} p_x(t)\, dt. \qquad (9)$$
For a zero-mean symmetric input density, the ratio $\rho$ of the number of discarded samples to the total number of generated samples is given by
$$\rho(R_x) \stackrel{\text{def}}{=} \frac{\#\,\text{discarded samples}}{\#\,\text{generated samples}} = 2\big[1 - P_x(R_x)\big]. \qquad (10)$$
The parameter $R_x$ may thus be selected in order to adjust the value of $\rho(R_x)$ to design needs. Then, it is necessary to address the proper values in the learnt look-up table $(\mathbf{g}, \mathbf{y}) \in \mathbb{R}^{N+1} \times \mathbb{R}^{N+1}$ corresponding to the values of $\{x_i\}$, $i \in \{1, \ldots, Q\}$, by finding pointers $r_i \in \{0, 1, \ldots, N\}$ such that $g_{r_i} \approx x_i$. This means searching, in the whole look-up table, for the value of $g(\cdot)$ closest to the sample $x_i$. Such an operation should be performed in an efficient way. Finally, the desired set $\{y_i\}$ of output samples, approximately distributed according to the probability density function $p_y(\cdot)$, may be obtained by setting $y_i = y_{r_i}$, $i \in \{1, \ldots, Q\}$, where $y_{r_i}$ denotes the $r_i$th entry of the look-up table $(\mathbf{g}, \mathbf{y})$. (Commented MATLAB code is available on request.)
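A possible Python realization of the generation phase just described is sketched below; the rejection of out-of-range inputs and the nearest-value LUT search follow the text (using a binary search, since the learnt g is monotonically increasing), while the function name and its interface are illustrative assumptions.

    # Use of the learnt dual look-up table (g, y) as a generative model.
    import numpy as np

    def generate_samples(g, y, R_x, n_raw, seed=0):
        rng = np.random.default_rng(seed)
        x = rng.standard_normal(n_raw)        # raw samples from the available Gaussian generator
        x = x[np.abs(x) <= R_x]               # discard samples falling outside [-R_x, R_x]
        # g is monotonically increasing, so the nearest entry is found by binary search:
        r = np.clip(np.searchsorted(g, x), 1, len(g) - 1)
        r -= ((x - g[r - 1]) < (g[r] - x)).astype(int)   # pick the nearer of the two neighbours
        return y[r]                           # y_i = y_{r_i}

With $R_x = 1$ and a standard Gaussian input, roughly 68% of the raw samples survive the discarding step, consistent with the value $\rho(R_x) \approx 0.3173$ reported in Section 3.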

3. Computer-based Numerical Experiments

In the following experiments, we consider generating random univariate samples with a prescribed density function within prescribed ranges of interest, supposing that a prototype Gaussian random number generator is available. The prototype Gaussian distribution has zero mean and unit variance. The parameter $R_x$ was set to 1 in all the experiments, which corresponds to a ratio $\rho(R_x) \approx 0.3173$ and thus allows retaining about 70% of the generated input samples. The experiments were run on a 1.86 GHz, 512 MB-RAM platform.

3.1. Experiments on a “bi-Gaussian” distribution

The first case of generation of a random variable concerns a “bi-Gaussian” distribution defined by
$$G_2(y) = \frac{1}{2}\left[\frac{1}{\sqrt{2\pi}\,\sigma_1}\exp\!\left(-\frac{(y-\mu_1)^2}{2\sigma_1^2}\right) + \frac{1}{\sqrt{2\pi}\,\sigma_2}\exp\!\left(-\frac{(y-\mu_2)^2}{2\sigma_2^2}\right)\right] \qquad (11)$$
that may assume fairly asymmetric shapes.

The numerical results presented below pertain to the values $\sigma_1 = 0.3$, $\mu_1 = 0.5$, $\sigma_2 = 1$, and $\mu_2 = 0.8$. The interval of interest for the output variable is set to $\mathcal{Y} = [-2.5, 4]$. The total number of generated output samples amounts to $Q = 68219$. The number of points in which the function $g(\cdot)$ is computed is $N = 1000$. The results obtained by running the learning algorithm (7) are shown in Figure 3. The values of the index $\Delta g_n$ show that the fixed-point algorithm may be stopped after 5 iterations. In Figure 3, the histogram estimates (with 50 bins) of the generated Gaussian data and of the bi-Gaussian output, obtained with the learnt dual system, may be observed as well.
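For reference, the bi-Gaussian target density (11) with the above parameters may be specified as follows; the commented usage lines reuse the illustrative learn_dual_lut and generate_samples helpers sketched in Section 2 and are assumptions, not the original implementation.

    # Bi-Gaussian target density (11) with the parameters used in this experiment.
    import numpy as np

    def bi_gaussian(y, mu1=0.5, s1=0.3, mu2=0.8, s2=1.0):
        n1 = np.exp(-(y - mu1) ** 2 / (2 * s1 ** 2)) / (np.sqrt(2 * np.pi) * s1)
        n2 = np.exp(-(y - mu2) ** 2 / (2 * s2 ** 2)) / (np.sqrt(2 * np.pi) * s2)
        return 0.5 * (n1 + n2)

    # Illustrative end-to-end use (helpers sketched in Section 2):
    # y_grid = np.linspace(-2.5, 4.0, 1001)
    # g = learn_dual_lut(y_grid, bi_gaussian(y_grid), p_x, R_x=1.0, n_iter=5)
    # samples = generate_samples(g, y_grid, R_x=1.0, n_raw=100_000)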

Cumulative results on repeated independent trials are illustrated as well. The number of iterations of the algorithm (7) was set to 10, while the other data stayed the same as in the previous single-run experiment. The number $N$ of points in which the function $g(\cdot)$ was computed ranged from 200 to 2000 with step 200, in order to obtain some information about the sensitivity of the algorithm to the selection of the number of points in the domain $\mathcal{Y}$ and about the influence of the number $N$ on the computational complexity of the algorithm. In particular, the sensitivity of the algorithm with respect to the number $N$ was measured via a discrepancy index DSC computed as follows. (a) The histogram-based estimate of the probability density function of the generated samples is computed on a number of bins equal to 50. The discrete values of such an estimate are denoted by $\hat{p}_{y,b}$, $b \in \{1, 2, \ldots, 50\}$. (b) The true values of the probability density function $p_y(\cdot)$ are computed at the histogram's bin centers. The discrete values of such a probability density function are denoted by $p_{y,b}$, $b \in \{1, 2, \ldots, 50\}$. (c) The weighted-square-difference-type discrepancy index is computed by the expression
$$\mathrm{DSC} \stackrel{\text{def}}{=} \sum_{b=1}^{50} \frac{(\hat{p}_{y,b} - p_{y,b})^2}{p_{y,b}}.$$
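The discrepancy index just described may be computed, for instance, as in the following sketch; the function name and arguments are illustrative.

    # Weighted-square-difference discrepancy index DSC between the histogram
    # estimate of the generated samples and the true target density.
    import numpy as np

    def dsc_index(samples, p_y_true, y_lo, y_hi, n_bins=50):
        hist, edges = np.histogram(samples, bins=n_bins, range=(y_lo, y_hi), density=True)
        centers = 0.5 * (edges[:-1] + edges[1:])     # histogram bin centers
        p_true = p_y_true(centers)                   # true density at the bin centers
        return np.sum((hist - p_true) ** 2 / p_true)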

The average number of generated samples varies between about 68250 and 68290. The obtained results are summarized in Tables and . The tables show the average run-time required for learning (expressed in seconds), the average run-time required to generate the samples (i.e., the use of the learnt system as a generative model), and the average DSC index value. As is readily appreciated, the computational complexity of the learning phase depends on the number of points used to approximate the nonlinear transference $g(\cdot)$, as expected, while the computational complexity of the generation phase depends only slightly on $N$. The sensitivity of the method measured by the discrepancy index DSC is high for low values of the parameter $N$, while it becomes quite low for values of $N$ larger than 1000.

3.2. Experiments on a generalized Gaussian distribution

The second example of random samples generation concerns a generalized Gaussian distribution [13], defined in (12), where $\sinh(\cdot)$ denotes the hyperbolic sine function and $\mathrm{sech}(\cdot)$ denotes the reciprocal of the hyperbolic cosine function (namely, the hyperbolic secant function). The present generalized Gaussian distribution (GGD) differs from the standard GGD model encountered in the literature (see, e.g., [14]). It belongs to the general exponential family of distributions of the type $p_y(y) \propto \exp(-\kappa^2(y))$, with $\kappa(\cdot)$ satisfying appropriate compatibility conditions. The distribution (12), as well as the GGD in [14] (with $\sigma = 1$), belongs to the above exponential family.

The numerical results presented below pertain to the values $\mu = 0.8$ and $\lambda = 0.5$. The interval of interest for the output variable is set to $\mathcal{Y} = [-3, 4]$. The total number of generated output samples amounts to $Q = 68335$. The number of points in which the function $g(\cdot)$ is computed is $N = 1000$. The results obtained by running the learning algorithm (7) are shown in Figure 4. The values of the index $\Delta g_n$ show that the fixed-point algorithm may be safely stopped after 5 iterations again. In Figure 4, the histogram estimates (with 50 bins) of the generated Gaussian data and of the generalized Gaussian output may be observed as well.

Cumulative results are illustrated as well. The number of iterations of the algorithm (7) was set to 10, while the other data stayed the same as in the previous single-run experiment. The number $N$ of points ranged from 200 to 1000 with step 200. The average number of generated samples varies between about 68250 and 68290. The obtained results are summarized in Table .

3.3. Experiments on a Gamma distribution

The third example is repeated from [10]: we considered the generation of a (symmetric) Gamma distribution. This choice is motivated by the observation that the random number generation algorithm in [10] gave its most inaccurate result when tested on the Gamma distribution.

The numerical results presented below pertain to the values $\alpha = 0.8$ and $\beta = 4$. The interval of interest for the output variable is set to $\mathcal{Y} = [-2, 2]$. The total number of generated output samples amounts to $Q = 68355$. The number of points in which the function $g(\cdot)$ is computed is $N = 1500$. The results obtained by running the learning algorithm (7) are shown in Figure 5. The values of the index $\Delta g_n$ show that the fixed-point algorithm may be safely stopped after 5 iterations again. Figure 5 shows the histogram estimates (with 50 bins) of the generated Gaussian data and of the Gamma-distributed output.

Cumulative results were obtained by setting the number of iterations of the algorithm (7) to 20, while the other data stayed the same as in the previous single-run experiment. The number of points ranged from 1000 to 1800 with step 200. The average number of generated samples varies between about 68230 and 68280. The obtained results are summarized in Table .

4. Conclusion

The aim of the present manuscript was to present a novel random number generation technique based on dual neural system learning. We elaborated on our recent work [10] in order to obtain a new learning algorithm free of the need to choose parameters and normalization criteria. The main idea is to shift the learning paradigm from the viewpoint of cardinal equation solving to that of dual cardinal equation solving, which proves easier to exploit in practice.

The presented numerical results confirmed the agreement between the desired and the obtained distributions of the generated variates. The analysis of the computational burden, in terms of running times, shows that the proposed algorithm is not computationally demanding.