Abstract

In a previous work (S. Fiori, 2006), we proposed a random number generator based on a tunable non-linear neural system, whose learning rule is designed on the basis of a cardinal equation from statistics and whose implementation is based on look-up tables (LUTs). The aim of the present manuscript is to improve the above-mentioned random number generation method by changing the learning principle, while retaining the efficient LUT-based implementation. The new method proposed here proves easier to implement and relaxes some previous limitations.

1. Introduction

Random numbers are currently used for a variety of purposes such as cryptographic key generation, games, some classes of scientific experiments, as well as “Monte Carlo” methods in physics and computer science [16]. Standard programming environments are endowed with basic pseudorandom signal generators, such as the uniform and the Gaussian ones, while the needed distributions are usually more involved than the uniform/Gaussian ones. A simple example of application is password generation: a random password generator is a piece of software that takes input from a random or pseudorandom number generator and automatically generates a password. An example of application where involved probability distributions are needed is independent component analysis (ICA, [7]) testing: as the behavior of an ICA algorithm might depend on the statistical distribution of the sources, ICA-algorithm testing tools might require random sequence generators capable of producing random numbers distributed according to involved probability laws.

The principal methods known in the literature to obtain a batch of samples endowed with an arbitrary distribution from a batch of samples having a simple distribution are the “transformation method” and the “rejection method” [8]. In the present paper, we focus on the transformation method, which may be conveniently implemented through a tunable neural system, because the availability of a random number source and of a tunable nonlinear system, along with a proper learning procedure, allows obtaining a wide class of pseudorandom signal generators.

A well-known effect of a nonlinear neural system is to warp the statistical distribution of its input. In particular, we assume that the system under consideration has a nonlinear adaptive structure described by the transference $y = f(x)$, where $x \in \mathcal{X}$ denotes the system input random signal, having probability density function $p_x(x)$, and $y \in \mathcal{Y}$ denotes the output signal, having probability density function $p_y(y)$, as shown in Figure 1. Under the hypothesis that the neural system transference is strictly monotonic, namely $f'(x) > 0$ for all $x \in \mathcal{X}$, the relationship between the input distribution, the output distribution, and the system transfer function is known to be [9]
$$p_y(y) = \frac{p_x(x)}{f'(x)}\bigg|_{x = f^{-1}(y)}, \quad x \in \mathcal{X}, \qquad (1)$$
where $f^{-1}(\cdot)$ denotes the inverse of the function $f(\cdot)$. Usually, (1) is interpreted as an analysis formula, which allows computing the output distribution when the input distribution and the system transference function are known. However, the cardinal equation (1) may also be interpreted as a formula that allows designing the nonlinear system when the distribution $p_x(\cdot)$ is known and it is desired that the system responds according to a desired distribution $p_y(\cdot)$. In fact, (1) may be rewritten as the differential equation
$$f'(x) = \frac{p_x(x)}{p_y(f(x))}, \quad x \in \mathcal{X}. \qquad (2)$$
In general, such a design operation is rather difficult, because it involves solving the nonlinear differential equation (2) in the unknown $f(\cdot)$, provided that a consistent boundary condition is specified.
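As a quick numerical illustration (not part of the original derivation), the following Python sketch checks relation (1) for a hypothetical monotonic transference $f(x) = \tanh(x)$ applied to standard Gaussian input; the histogram of the transformed samples should match $p_x(x)/f'(x)$ evaluated at $x = f^{-1}(y)$. All variable names are illustrative.

    # Numerical check of the cardinal equation (1) for a hypothetical
    # monotonic transference f(x) = tanh(x) and standard Gaussian input.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(200_000)          # input samples drawn from p_x
    y = np.tanh(x)                            # output samples y = f(x)

    # Theoretical output density from (1): p_y(y) = p_x(x) / f'(x) at x = f^{-1}(y)
    yy = np.linspace(-0.99, 0.99, 200)
    xx = np.arctanh(yy)                       # x = f^{-1}(y)
    p_x = np.exp(-xx**2 / 2) / np.sqrt(2 * np.pi)
    df = 1.0 - np.tanh(xx)**2                 # f'(x) = 1 - tanh^2(x)
    p_y_theory = p_x / df

    # Histogram estimate of p_y from the transformed samples
    hist, edges = np.histogram(y, bins=50, range=(-0.99, 0.99), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    # The deviation should be small (binning and statistical error only)
    print(np.max(np.abs(hist - np.interp(centers, yy, p_y_theory))))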

In the recent contribution [10], we presented a pseudorandom samples generator based on a nonlinear monotonic neural system, whose transference function is denoted by $f(\cdot)$, tuned on the basis of the differential equation (2). The cardinal design equation (2) was proposed to be solved via a (relaxation-type) fixed-point algorithm. The key advantages of the method proposed in [10] are as follows. (a) In order to obtain a fully tunable neural transference function, a look-up-table representation was chosen. It guarantees high flexibility in the shape of the neural transference as well as ease of representation and handling of the involved quantities. (b) The fixed-point learning algorithm converges faster than other possible methods, such as gradient-based ones: unlike these methods, the fixed-point learning algorithm does not require the computation of derivatives of the involved functions.

The resulting random-number generation method should thus be read as a two-stage procedure. The first stage consists in solving the cardinal differential equation (2) in the unknown function $f(\cdot)$, given the distributions $p_x(\cdot)$ and $p_y(\cdot)$ as data. The second stage consists in generating input random samples drawn from the distribution $p_x(\cdot)$ and then letting such random samples pass through the learnt nonlinear neural system by computing the output values $y = f(x)$. The random samples $y$ are guaranteed to be distributed according to the probability density function $p_y(\cdot)$.

However, we recognized that the method presented in [10] also suffers from some drawbacks, namely the following. (a) For numerical convergence purposes, each step of the fixed-point-type tuning algorithm needed to be followed by some normalization steps. Namely, from (2), it is easily seen that when the function $p_y(f(x))$ approaches 0, the computation of $f'(x)$ becomes ill-conditioned; therefore, the quantity $p_y(f(x))$ was replaced by $p_y(f(x)) + \gamma$, with $\gamma > 0$ being a small constant to be properly sized. Also, in order to refine learning, after each iteration step the solution $f(x)$ needed to be normalized either by affine scaling, in order to control the range of the variable $y$, or by linear scaling, in order to match the true value of an output-distribution moment of preselected order. This, in turn, requires computing in advance the (closed-form) moments of interest of the output distribution. (b) In spite of affine scaling, it was not easy to control the range of the output value $y$, as affine scaling does not guarantee convergence in every case of interest; therefore, it could not be employed in every case. (c) The developed procedure was customized to generate output distributions that are either symmetric (namely, $p_y(-y) = p_y(y)$) or completely skewed to the right (namely, $p_y(y) = 0$ for all $y < 0$) only. Asymmetric or general-shape distributions were not considered.

In the present paper, we consider the problem of extending the previous method to the generation of asymmetric distributions by removing the constraint of symmetry or skewedness to the right. Also, we propose a way to avoid the probability-density-function regularization and the normalization steps recalled above. The solution of choice implies a change in the viewpoint on the cardinal equation (1): instead of converting formula (1) into the differential equation (2), we convert it into a new differential equation, hereafter referred to as the dual cardinal equation, which will prove easier to solve and more flexible to use in practice, while retaining the previous numerical representation and its advantages. Thus, we retain the effective numerical representation of the involved quantities already introduced in the works [10, 11], based on the “look-up table” (LUT) implementation of neural activation functions, as well as an efficient numerical algorithm to solve the dual cardinal equation. LUTs were proven to provide an efficient way of representing and handling the variables appearing within the devised random number generation algorithm. A prominent advantage of the procedure is the lack of hard computational requirements except for LUT handling, which consists of sorting/searching on lists of numbers and of a few simple algebraic operations on numbers.

The effectiveness of the proposed approach will be evaluated through numerical experiments. In particular, the designed experiments follow a logical progression, beginning with a basic assessment of the proposed method applied to a bi-Gaussian distribution, followed by comparably more difficult distributions, namely a generalized Gaussian distribution and an asymmetric Gamma distribution.

The existing method presented in [3] is worth discussing. It concerns a neural-networks-type algorithm to generate random vectors with arbitrary marginal distributions and correlation matrix, based on the NORTA method. The “normal-to-anything” (NORTA) method (see, e.g., [12]) is one of the most efficient methods for random vector generation. In [3], a technique was presented to generate the correlation matrix of normal random vectors based on an artificial neural network approach. The NORTA algorithm works in the following way to generate random samples with a prescribed probability density function. First, generate zero-mean, unit-variance normal random samples $x_i$, $i \in \{1, \ldots, Q\}$. Then, generate the desired random samples as $y_i = P_y^{-1}(\Phi(x_i))$, where $\Phi(\cdot)$ denotes the cumulative distribution function of a standard normal random variable and $P_y(\cdot)$ denotes the desired cumulative distribution function, with $P_y^{-1}(u) = \inf\{z \mid P_y(z) \geq u\}$, $u \in [0, 1]$. It appears, thus, as a transformation method.
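For illustration only, the single-component transformation step of NORTA may be sketched as follows in Python, assuming SciPy is available; the Gamma target distribution and its parameters are arbitrary choices for the example, not taken from [3].

    # Sketch of the NORTA-style transformation y_i = P_y^{-1}(Phi(x_i)) for one
    # component; the Gamma target distribution is an arbitrary example.
    import numpy as np
    from scipy.stats import norm, gamma

    rng = np.random.default_rng(0)
    x = rng.standard_normal(10_000)      # zero-mean, unit-variance normal samples
    u = norm.cdf(x)                      # Phi(x_i), uniformly distributed in (0, 1)
    y = gamma.ppf(u, a=2.0, scale=1.5)   # P_y^{-1}(u): inverse of the target CDF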

Most of the methods of random vector generation known from the literature impose constraints on the size of the random vectors, and many of them are applicable only to bivariate distributions whose components are equidistributed. Conversely, within the NORTA framework, the marginal probability distributions of the vector components as well as their correlation matrix may be specified. Obtaining the prescribed correlation matrix of the generated random vectors requires solving an involved nonlinear system of equations, which is the most serious problem in this kind of approach. Paper [3] makes use of a multilayer perceptron neural network to estimate the correlation matrices of normal random vectors, thus allowing one to overcome the analytically involved equations of the NORTA algorithm. While the method proposed here is more general than NORTA in the sense that it works with any kind of available generator (not only Gaussian), it is less general in the sense that it does not allow one to generate multivariate random variables with prescribed joint statistics.

2. Dual Cardinal Equation and Its Numerical Solution

The present section formalizes the learning problem at hand and illustrates a fixed-point-based numerical algorithm to solve the dual cardinal equation.

2.1. Dual cardinal equation and neural system

The key point of the new method consists in learning the inverse function $f^{-1}(\cdot)$ instead of the function $f(\cdot)$ itself. As will be clarified in the next sections, this choice simplifies the learning problem while adding only a slight computational burden to the usage of the learnt neural system as a generative model.

We denote by $x = g(y) \stackrel{\text{def}}{=} f^{-1}(y)$ the inverse of the actual neural transfer function and refer to the new neural system, having $g(\cdot)$ as transfer function, as the “dual neural system” (shown in Figure 1). The purpose here is to learn a dual neural system that warps $p_y(\cdot)$ into $p_x(\cdot)$ under the constraint $g'(y) > 0$ for all $y \in \mathcal{Y}$. We denote the interval of interest for the generated random variable as $\mathcal{Y} = [\underline{y}, \overline{y}]$. With this hypothesis on the nonlinear dual neural transfer function, the cardinal equation (1) may be rewritten as
$$g'(y) = \frac{p_y(y)}{p_x(g(y))}, \quad g(\underline{y}) = 0, \quad y \in \mathcal{Y}, \qquad (3)$$
which will hereafter be referred to as the “dual cardinal equation.” It is worth noting that the boundary condition $g(\underline{y}) = 0$ is completely arbitrary. While there are no theoretical reasons to set the boundary condition in any specific way, the above choice is motivated by the observation that it simplifies the fixed-point adapting algorithm with respect to the previous version proposed in [10].

In general, a closed-form solution to (3) may not be available; thus, we should resort to an iterative learning algorithm to search for a solution. Formally, this means designing an algorithm that generates a succession of functions $g_n(y)$, $n \in \mathbb{N}$, whose limit coincides with the solution of (3). A way to generate such a succession is to employ the fixed-point algorithm
$$g_{n+1}(y) = \int_{\underline{y}}^{y} \frac{p_y(t)}{p_x(g_n(t))}\, dt, \quad n \geq 0, \ y \in \mathcal{Y}. \qquad (4)$$
As a figure of convergence of the learning process, we consider the weighted difference of the function $g(\cdot)$ between two successive iterations, namely,
$$\Delta g_n \stackrel{\text{def}}{=} \int_{\mathcal{Y}} \big|g_n(y) - g_{n-1}(y)\big|\, p_y(y)\, dy, \quad n \geq 1. \qquad (5)$$
As initial guess, we assume $g_0(y) = 0$ for all $y \in \mathcal{Y}$.
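As a simple sanity check (not drawn from [10]), suppose the available input density is uniform, $p_x(x) = c > 0$, over an interval large enough to contain the range of the iterates. Then the denominator in (4) is constant and the very first iterate $g_1(y) = \frac{1}{c}\int_{\underline{y}}^{y} p_y(t)\, dt = \frac{1}{c}\,[P_y(y) - P_y(\underline{y})]$ is already a fixed point of (4), where $P_y(\cdot)$ denotes the cumulative distribution function associated with $p_y(\cdot)$. The learnt inverse transference is thus, up to scaling, the target cumulative distribution function, and inverting it reproduces the classical inverse-cumulative-distribution sampling rule, which confirms that the dual formulation behaves as expected in this elementary case.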

After learning an inverse function $g(\cdot)$, the numerical procedure should calculate the actual nonlinear function $f(\cdot)$ by numerical inversion. As will be clarified in the next section, within the framework proposed here such an operation involves very little computational effort.

2.2. Numerical implementation of the learning procedure

From an implementation viewpoint, the algorithm (4) needs to be discretized in order to obtain a version suitable to be implemented on a computer.

We choose to represent the function $g_n(y)$ by a numerical vector: in practice, we suppose the interval of interest $\mathcal{Y} = [\underline{y}, \overline{y}]$ to be partitioned into $N \gg 1$ discrete bins. This gives rise to the vector-type representation $\mathbf{y} \in \mathbb{R}^{N+1}$ of the support of the output-sequence probability density function, where $\mathbf{y}$ contains $N+1$ values regularly spaced in $\mathcal{Y}$ with spacing width denoted as $\Delta y$. Then $g_n(y)$ may be represented by a numerical vector $\mathbf{g}_n \in \mathbb{R}^{N+1}$, and the neural input-output transference is now represented by the discrete relationship $(\mathbf{g}, \mathbf{y}) \in \mathbb{R}^{N+1} \times \mathbb{R}^{N+1}$, namely, a numerical look-up table. The entries of a vector $\mathbf{g}_n$ may be denoted by an extra subscript, that is, by $g_{n,k}$, with $k \in \{0, 1, \ldots, N\}$. The spacing $\Delta y$ relates to the integer $N$ and may be defined as $\Delta y \stackrel{\text{def}}{=} (\overline{y} - \underline{y})/N$.

In order to translate the learning rule (4) into a version suitable for numerical representation, we should consider the inherent limitations of numerical integration of differential equations. The following notes are worth taking into account. (a) Output support selection: the ultimate purpose of the random number generation method under construction is to generate random samples with the desired probability distribution within a range of interest, namely, with values within an interval that is deemed suitable for the purposes that random sample generation is launched for. Therefore, the output range $\mathcal{Y} = [\underline{y}, \overline{y}]$ is to be freely selected according to the needs the random samples are to be generated for. The above-mentioned vector $\mathbf{y}$ then has entries $y_k$ computed as $y_k = \underline{y} + k\Delta y$, with $k \in \{0, 1, \ldots, N\}$. (b) Input support selection: in order to prevent the denominator of the quantity $g_{n+1}(y)$ in (4) from becoming too close to zero, a sensible choice is to carefully select the support $\mathcal{X}$. As in this paper we consider the input probability density function to be either (symmetric) Gaussian or uniform, we set $\mathcal{X} = [-R_x, R_x]$, with $R_x > 0$. The value of the constant $R_x$ is to be selected in such a way that $p_x(\pm R_x) \gg 0$. It is worth recalling that the support of the input distribution may be arbitrarily selected, as it does not affect the support of the output distribution. (c) Iterative range scaling: after each learning step, an affine normalization operation is performed, which rescales the entries of the putative solution $\mathbf{g}_n$ so that $g_{n,0} = -R_x$ and $g_{n,N} = R_x$. A short numerical sketch of points (a) and (b) is given below.
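The following short Python sketch illustrates points (a) and (b) above; the variable names and the specific numerical values are illustrative, not taken from the original implementation.

    # Output grid construction (point (a)) and input-support check (point (b)).
    import numpy as np

    y_lo, y_hi, N = -2.5, 4.0, 1000          # output range of interest and bin count
    dy = (y_hi - y_lo) / N                   # spacing Delta_y
    y = y_lo + dy * np.arange(N + 1)         # y_k = y_lo + k * Delta_y, k = 0..N

    R_x = 1.0                                # input support [-R_x, R_x]
    p_x = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # standard Gaussian input
    assert p_x(R_x) > 1e-3                   # p_x(+/-R_x) should stay well away from zero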

In order to describe the numerical learning algorithm, the following operators are defined for a generic look-up table $(\mathbf{h}, \mathbf{y}) \in \mathbb{R}^{N+1} \times \mathbb{R}^{N+1}$:
$$\mathrm{cumsum}(\mathbf{h})_0 \stackrel{\text{def}}{=} 0, \quad \mathrm{cumsum}(\mathbf{h})_k \stackrel{\text{def}}{=} \sum_{i=0}^{k-1} h_i\, \Delta y, \quad \mathrm{ascale}\{\mathbf{h}; a, b\}_k \stackrel{\text{def}}{=} a + \frac{(h_k - h_{\min})(b - a)}{h_{\max} - h_{\min}}, \qquad (6)$$
where the subscript $k$ denotes the $k$th entry of the vectors $\mathrm{cumsum}(\mathbf{h})$ and $\mathrm{ascale}\{\mathbf{h}; a, b\}$, and $h_{\min}$ and $h_{\max}$ denote the minimum and maximum entries of $\mathbf{h}$. The behavior of the “cumsum” operator is illustrated in Figure 2, which also provides a visual representation of look-up tables. In practice, the considered numerical version of the learning rule (4) writes
$$(\mathbf{A0})\ \mathbf{g}_0 = \mathbf{0}, \qquad (\mathbf{A1})\ \mathbf{g}_{n+1} = \frac{\mathbf{p}_y}{p_x(\mathbf{g}_n)},\ n \geq 0, \qquad (\mathbf{A2})\ \mathbf{g}_{n+1} = \mathrm{cumsum}(\mathbf{g}_{n+1}), \qquad (\mathbf{A3})\ \mathbf{g}_{n+1} = \mathrm{ascale}\{\mathbf{g}_{n+1}; -R_x, R_x\}, \qquad (7)$$
where the symbol $=$ denotes vector value assignment and $\mathbf{p}_y$ denotes the vector of $N+1$ entries containing the values of $p_y(\cdot)$ corresponding to the values in $\mathbf{y}$; its entries may be denoted by $p_{y,k}$, with $k \in \{0, 1, \ldots, N\}$.
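A minimal Python sketch of the learning steps (A0)-(A3) follows, assuming the grid y, the callable input density p_x, the target density values p_y, and the constant R_x have been prepared as in the previous sketch; the function name and its interface are illustrative assumptions.

    # Fixed-point learning of the dual LUT, following steps (A0)-(A3).
    import numpy as np

    def learn_dual_lut(y, p_y, p_x, R_x, n_iter=10):
        dy = y[1] - y[0]
        g = np.zeros_like(y)                       # (A0): initial guess g_0 = 0
        for _ in range(n_iter):
            h = p_y / p_x(g)                       # (A1): entrywise ratio p_y / p_x(g_n)
            # (A2): cumsum operator of (6), with cumsum_0 = 0
            g_new = np.concatenate(([0.0], np.cumsum(h[:-1]) * dy))
            # (A3): ascale operator of (6), rescaling the range to [-R_x, R_x]
            g_new = -R_x + (g_new - g_new.min()) * (2 * R_x) / (g_new.max() - g_new.min())
            g = g_new
        return g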

In terms of look-up-table entries, the learning relaxation index $\Delta g_n$ of definition (5) may be approximated as
$$\Delta g_n \approx \sum_{k=0}^{N} \big|g_{n,k} - g_{n-1,k}\big|\, p_{y,k}\, \Delta y, \quad n \geq 1. \qquad (8)$$
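In the same spirit, the approximation (8) can be evaluated on the look-up tables with a one-line helper (illustrative sketch; the argument names are assumptions), for instance to decide when to stop the iteration of the previous sketch.

    # LUT approximation (8) of the relaxation index Delta_g_n.
    import numpy as np

    def relaxation_index(g_new, g_old, p_y, dy):
        return np.sum(np.abs(g_new - g_old) * p_y) * dy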

2.3. Use of the neural system as generative model

When a suitable dual neural system described by the transference $g(\cdot)$ has been learnt, it may be effectively used to generate random samples drawn from the desired statistical distribution. The number of available input samples (which coincides with the number of output samples to be generated) is hereafter denoted by $Q$. The difficulty here is that the input samples $x$ are known, while the output samples $y$ are supposed to be computed as $y = f(x)$. However, unlike in [10], the function $f(\cdot)$ is not known in the present setting, as only its inverse $g(\cdot)$ has been learnt. Nevertheless, the inversion of the function $g(\cdot)$ is not required in order to employ the dual neural system as a generative model, provided an appropriate usage of the look-up table representing $g(\cdot)$ is made. First, it is necessary to produce a realization $\{x_i\}$, $i \in \{1, \ldots, Q\}$, drawn from the available-generator distribution $p_x(\cdot)$ (having, e.g., zero-mean Gaussian or uniform probability density function) and ranging in $\mathcal{X}$. Regarding the generation of the input samples, as they are produced by an available generator whose range is wider than $\mathcal{X}$, some generated input samples will necessarily be discarded. The amount of discarded input samples may be quantified. Let us denote by $P_x(\cdot)$ the cumulative distribution function of the input, namely,
$$P_x(x) \stackrel{\text{def}}{=} \int_{-\infty}^{x} p_x(t)\, dt. \qquad (9)$$
For a zero-mean symmetric input density, the ratio $\rho$ of the number of discarded samples to the total number of generated samples is given by
$$\rho(R_x) \stackrel{\text{def}}{=} \frac{\#\,\text{discarded samples}}{\#\,\text{generated samples}} = 2\big[1 - P_x(R_x)\big]. \qquad (10)$$
The parameter $R_x$ may thus be selected in order to adjust the value of $\rho(R_x)$ to design needs. Then, it is necessary to address the proper values in the learnt look-up table $(\mathbf{g}, \mathbf{y}) \in \mathbb{R}^{N+1} \times \mathbb{R}^{N+1}$ corresponding to the values of $\{x_i\}$, $i \in \{1, \ldots, Q\}$, by finding pointers $r_i \in \{0, 1, \ldots, N\}$ such that $g_{r_i} \approx x_i$. This means searching, in the whole look-up table, for the value of $g(\cdot)$ closest to the sample $x_i$. Such an operation should be performed in an efficient way. Finally, the desired set $\{y_i\}$ of output samples, approximately distributed according to the probability density function $p_y(\cdot)$, may be obtained by setting $y_i = y_{r_i}$, $i \in \{1, \ldots, Q\}$, where $y_{r_i}$ denotes the $r_i$th entry of the look-up table $(\mathbf{g}, \mathbf{y})$. (Commented MATLAB code is available on request.)
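A possible Python realization of the generation phase just described is sketched below; the rejection of out-of-range inputs and the nearest-value LUT search follow the text (using a binary search, since the learnt g is monotonically increasing), while the function name and its interface are illustrative assumptions.

    # Use of the learnt dual look-up table (g, y) as a generative model.
    import numpy as np

    def generate_samples(g, y, R_x, n_raw, seed=0):
        rng = np.random.default_rng(seed)
        x = rng.standard_normal(n_raw)        # raw samples from the available Gaussian generator
        x = x[np.abs(x) <= R_x]               # discard samples falling outside [-R_x, R_x]
        # g is monotonically increasing, so the nearest entry is found by binary search:
        r = np.clip(np.searchsorted(g, x), 1, len(g) - 1)
        r -= ((x - g[r - 1]) < (g[r] - x)).astype(int)   # pick the nearer of the two neighbours
        return y[r]                           # y_i = y_{r_i}

With $R_x = 1$ and a standard Gaussian input, roughly 68% of the raw samples survive the discarding step, consistent with the value $\rho(R_x) \approx 0.3173$ reported in Section 3.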

3. Computer-based Numerical Experiments

In the following experiments, we consider generating random univariate samples with a prescribed density function within prescribed ranges of interest, supposing that a prototype Gaussian random number generator is available. The prototype Gaussian distribution has zero mean and unit variance. The parameter $R_x$ was set to 1 in all the experiments, which corresponds to a ratio $\rho(R_x) \approx 0.3173$ and thus allows retaining about 70% of the generated input samples. The experiments were run on a 1.86 GHz, 512 MB-RAM platform.

3.1. Experiments on a “bi-Gaussian” distribution

The first case of generation of a random variable concerns a “bi-Gaussian” distribution defined by
$$G_2(y) = \frac{1}{2}\left[\frac{1}{\sqrt{2\pi}\,\sigma_1}\exp\!\left(-\frac{(y-\mu_1)^2}{2\sigma_1^2}\right) + \frac{1}{\sqrt{2\pi}\,\sigma_2}\exp\!\left(-\frac{(y-\mu_2)^2}{2\sigma_2^2}\right)\right] \qquad (11)$$
that may assume fairly asymmetric shapes.

The numerical results presented below pertain to the values $\sigma_1 = 0.3$, $\mu_1 = 0.5$, $\sigma_2 = 1$, and $\mu_2 = 0.8$. The interval of interest for the output variable is set to $\mathcal{Y} = [-2.5, 4]$. The total number of generated output samples amounts to $Q = 68219$. The number of points in which the function $g(\cdot)$ is computed is $N = 1000$. The results obtained by running the learning algorithm (7) are shown in Figure 3. The values of the index $\Delta g_n$ show that the fixed-point algorithm may be stopped after 5 iterations. In Figure 3, the histogram estimates (with 50 bins) of the generated Gaussian data and of the bi-Gaussian output, obtained with the learnt dual system, may be observed as well.
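For reference, the bi-Gaussian target density (11) with the above parameters may be specified as follows; the commented usage lines reuse the illustrative learn_dual_lut and generate_samples helpers sketched in Section 2 and are assumptions, not the original implementation.

    # Bi-Gaussian target density (11) with the parameters used in this experiment.
    import numpy as np

    def bi_gaussian(y, mu1=0.5, s1=0.3, mu2=0.8, s2=1.0):
        n1 = np.exp(-(y - mu1) ** 2 / (2 * s1 ** 2)) / (np.sqrt(2 * np.pi) * s1)
        n2 = np.exp(-(y - mu2) ** 2 / (2 * s2 ** 2)) / (np.sqrt(2 * np.pi) * s2)
        return 0.5 * (n1 + n2)

    # Illustrative end-to-end use (helpers sketched in Section 2):
    # y_grid = np.linspace(-2.5, 4.0, 1001)
    # g = learn_dual_lut(y_grid, bi_gaussian(y_grid), p_x, R_x=1.0, n_iter=5)
    # samples = generate_samples(g, y_grid, R_x=1.0, n_raw=100_000)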

Cumulative results on repeated independent trials are illustrated as well. The number of iterations of the algorithm (7) was set to 10, while the other data stayed the same as in the previous single-run experiment. The number $N$ of points in which the function $g(\cdot)$ was computed ranged from 200 to 2000 with step 200, in order to obtain some information about the sensitivity of the algorithm to the selection of the number of points in the domain $\mathcal{Y}$ and about the influence of the number $N$ on the computational complexity of the algorithm. In particular, the sensitivity of the algorithm with respect to the number $N$ was measured via a discrepancy index DSC computed as follows. (a) The histogram-based estimate of the probability density function of the generated samples is computed on a number of bins equal to 50. The discrete values of such an estimate are denoted by $\hat{p}_{y,b}$, $b \in \{1, 2, \ldots, 50\}$. (b) The true values of the probability density function $p_y(\cdot)$ are computed at the histogram's bin centers. The discrete values of such a probability density function are denoted by $p_{y,b}$, $b \in \{1, 2, \ldots, 50\}$. (c) The weighted-square-difference-type discrepancy index is computed by the expression
$$\mathrm{DSC} \stackrel{\text{def}}{=} \sum_{b=1}^{50} \frac{(\hat{p}_{y,b} - p_{y,b})^2}{p_{y,b}}.$$
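The discrepancy index just described may be computed, for instance, as in the following sketch; the function name and arguments are illustrative.

    # Weighted-square-difference discrepancy index DSC between the histogram
    # estimate of the generated samples and the true target density.
    import numpy as np

    def dsc_index(samples, p_y_true, y_lo, y_hi, n_bins=50):
        hist, edges = np.histogram(samples, bins=n_bins, range=(y_lo, y_hi), density=True)
        centers = 0.5 * (edges[:-1] + edges[1:])     # histogram bin centers
        p_true = p_y_true(centers)                   # true density at the bin centers
        return np.sum((hist - p_true) ** 2 / p_true)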

The average number of generated samples varies between about 68250 and 68290. The obtained results are summarized in Tables and . The tables show the average run-time required for learning (expressed in seconds), the average run-time required to generate the samples (i.e., the use of the learnt system as a generative model), and the average DSC index value. As is readily appreciated, the computational complexity of the learning phase depends on the number of points used to approximate the nonlinear transference $g(\cdot)$, as expected, while the computational complexity of the generation phase depends only slightly on $N$. The sensitivity of the method measured by the discrepancy index DSC is high for low values of the parameter $N$, while it becomes quite low for values of $N$ larger than 1000.

3.2. Experiments on a generalized Gaussian distribution

The second example of random samples generation concerns a generalized Gaussian distribution [13], defined in (12), where $\sinh(\cdot)$ denotes the hyperbolic sine function and $\mathrm{sech}(\cdot)$ denotes the reciprocal of the hyperbolic cosine function (namely, the hyperbolic secant function). The present generalized Gaussian distribution (GGD) differs from the standard GGD model encountered in the literature (see, e.g., [14]). It belongs to the general exponential family of distributions of the type $p_y(y) \propto \exp(-\kappa^2(y))$, with $\kappa(\cdot)$ satisfying appropriate compatibility conditions. The distribution (12), as well as the GGD in [14] (with $\sigma = 1$), belongs to the above exponential family.

The numerical results presented below pertain to the values $\mu = 0.8$ and $\lambda = 0.5$. The interval of interest for the output variable is set to $\mathcal{Y} = [-3, 4]$. The total number of generated output samples amounts to $Q = 68335$. The number of points in which the function $g(\cdot)$ is computed is $N = 1000$. The results obtained by running the learning algorithm (7) are shown in Figure 4. The values of the index $\Delta g_n$ show that the fixed-point algorithm may be safely stopped after 5 iterations again. In Figure 4, the histogram estimates (with 50 bins) of the generated Gaussian data and of the generalized Gaussian output may be observed as well.

Cumulative results are illustrated as well. The number of iterations of the algorithm (7) was set to 10, while the other data stayed the same as in the previous single-run experiment. The number $N$ of points ranged from 200 to 1000 with step 200. The average number of generated samples varies between about 68250 and 68290. The obtained results are summarized in Table .

3.3. Experiments on a Gamma distribution

The third example is repeated from [10]: we considered the generation of a (symmetric) Gamma distribution. This choice is motivated by the observation that the random number generation algorithm in [10] gave its most inaccurate result when tested on the Gamma distribution.

The numerical results presented below pertain to the values $\alpha = 0.8$ and $\beta = 4$. The interval of interest for the output variable is set to $\mathcal{Y} = [-2, 2]$. The total number of generated output samples amounts to $Q = 68355$. The number of points in which the function $g(\cdot)$ is computed is $N = 1500$. The results obtained by running the learning algorithm (7) are shown in Figure 5. The values of the index $\Delta g_n$ show that the fixed-point algorithm may be safely stopped after 5 iterations again. Figure 5 shows the histogram estimates (with 50 bins) of the generated Gaussian data and of the Gamma-distributed output.

Cumulative results were obtained by setting the number of iterations of the algorithm (7) to 20, while the other data stayed the same as in the previous single-run experiment. The number of points ranged from 1000 to 1800 with step 200. The average number of generated samples varies between about 68230 and 68280. The obtained results are summarized in Table .

4. Conclusion

The aim of the present manuscript was to present a novel random number generation technique based on dual neural system learning. We elaborated on our recent work [10] in order to obtain a new learning algorithm free of the need to choose parameters and normalization criteria. The main idea is to shift the learning paradigm from the viewpoint of cardinal equation solving to that of dual cardinal equation solving, which proves easier to exploit in practice.

The presented numerical results confirmed the agreement between the desired and the obtained distributions of the generated variates. The analysis of the computational burden, in terms of running times, shows that the proposed algorithm is not computationally demanding.