The Role of Nonpolynomiality in Uniform Approximation by RBF Networks of Hankel Translates
Given and , let the space (respectively, ) consist of all those continuous functions on (respectively, ) such that the limit exists and is finite; is endowed with the uniform norm Assume defines an absolutely regular Hankel-transformable distribution. Then, the linear span of dilates and Hankel translates of is dense in for all if, and only if, , where
Dedicated to Professor Fernando Pérez González on the occasion of his retirement
1. Introduction and Motivation
The radial basis function (RBF) method is nowadays one of the primary tools for interpolating multidimensional scattered data. Its simple form and ability to accurately approximate an underlying function have made the method increasingly popular in several different types of applications, some of which include cartography, medical imaging, the numerical solution of partial differential equations, and neural networks (see, e.g.,  and references therein).
Radial basis function neural networks (RBFNNs) as such were introduced in the 1980s by Broomhead and Lowe  and soon applied to problems of supervised learning such as regression, classification, and time series prediction [3, 4]. This type of network falls within the general class of nonlinear, single hidden layer feedforward neural networks. Given , the family of RBFNNs consists of all those functions of the form where
(i) is the number of kernel nodes in the hidden layer
(ii) is the vector of weights from the th kernel node to the output nodes
(iii) is an input vector
(iv) is a radially symmetric kernel function of a unit in the hidden layer
(v) and are the centroid and smoothing factor (or width) of the th kernel node , respectively
(vi) is the so-called activation function, which characterizes the kernel shape, often a Gaussian
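As an illustration of the network just described, the following minimal sketch evaluates an RBFNN in plain Python. The displayed formula is not reproduced here, so the sketch assumes the common form in which each node contributes its weight vector times the activation evaluated at the distance from the input to the centroid, divided by the smoothing factor. The names `rbfnn` and `gaussian`, and the choice of exp(-r^2) as the Gaussian, are illustrative assumptions rather than the notation of the paper.

```python
import math

def gaussian(r):
    # Assumed activation: a Gaussian of the scaled radius r.
    return math.exp(-r * r)

def rbfnn(x, centroids, widths, weights, activation=gaussian):
    """Evaluate an RBF network at the input vector x.

    centroids: one centroid (sequence of floats) per kernel node
    widths:    one positive smoothing factor per kernel node
    weights:   one weight vector (to the output nodes) per kernel node
    """
    n_outputs = len(weights[0])
    output = [0.0] * n_outputs
    for z, sigma, w in zip(centroids, widths, weights):
        # Radially symmetric kernel: depends only on |x - z| / sigma.
        r = math.dist(x, z) / sigma
        a = activation(r)
        for j in range(n_outputs):
            output[j] += w[j] * a
    return output

# Usage: two kernel nodes in the plane, two output nodes.
centroids = [(0.0, 0.0), (1.0, 1.0)]
widths = [1.0, 0.5]
weights = [(1.0, 0.0), (0.0, 2.0)]
value = rbfnn((0.0, 0.0), centroids, widths, weights)
```

Passing the same value for every entry of `widths` gives the class of networks with a common smoothing factor; letting the entries differ gives the class with varying smoothing factors.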
The smoothing factors may be the same in all kernel nodes of an RBFNN or may vary across them. Park and Sandberg [5, 6] proved that under mild conditions on the kernel (or the activation function ) both classes of RBFNNs (with either the same or varying smoothing factors across nodes) have the universal approximation property, meaning that they are dense in suitable spaces of continuous or integrable functions. Chen and Chen  considered RBFNNs with a continuous activation function in the hidden layer defining a tempered distribution in and proved that the necessary and sufficient condition for such networks to uniformly approximate every continuous function on compacta is that is not an even polynomial. Nonpolynomiality is straightforwardly seen to be a necessary condition for these approximations and has been found necessary and sufficient for other types of networks to possess the universal approximation property as well, cf. [8–12]. In this paper we aim to extend the result in  to RBFNNs of Hankel translates. The precise meaning of this extension will be clarified in due course.
1.2. The Hankel Transformation and the Hankel Translation
Let . The Hankel integral transformation is usually defined by where and denotes the Bessel function of the first kind and order .
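For readers who wish to experiment numerically, the sketch below evaluates a Hankel transform in plain Python. Since the displayed definition is not reproduced here, we assume Zemanian's form, in which the transform integrates the function against the kernel sqrt(xt) J_mu(xt) over the positive half-line; this choice, the function names, and the truncation and step parameters are assumptions made for illustration only. The Bessel function is computed from its standard power series, which is adequate for the moderate arguments used here.

```python
import math

def zemanian_kernel(mu, z, terms=60):
    """Return sqrt(z) * J_mu(z) from the power series of J_mu (mu >= -1/2, z >= 0)."""
    series = sum(
        (-1) ** k * (z / 2) ** (2 * k) / (math.factorial(k) * math.gamma(k + mu + 1))
        for k in range(terms)
    )
    # sqrt(z) * (z/2)**mu, written so that z = 0 is handled for mu = -1/2.
    return z ** (mu + 0.5) / 2 ** mu * series

def hankel_transform(phi, mu, x, upper=10.0, steps=2000):
    """Trapezoidal approximation of the integral of phi(t) sqrt(xt) J_mu(xt) over [0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * phi(t) * zemanian_kernel(mu, x * t)
    return total * h

def phi0(t):
    # For mu = 0 this function is known to be self-reciprocal under the assumed transform.
    return math.sqrt(t) * math.exp(-t * t / 2)

# Usage: compare the transform of phi0 at a point with phi0 itself.
x = 1.2
gap = abs(hankel_transform(phi0, 0.0, x) - phi0(x))
```

The truncation at `upper=10.0` is harmless only because `phi0` decays like a Gaussian; slowly decaying functions would require a different quadrature.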
Aiming to obtain a distributional extension of , Zemanian introduced new spaces of test and generalized functions. The space [13, 14] consists of all those smooth, complex-valued functions such that When topologized by the family of norms , becomes a Fréchet space where is an automorphism provided that . Then the generalized Hankel transformation , defined by transposition on the dual of , is an automorphism of when this latter space is endowed with either its weak* or its strong topology. For and , Zemanian  also introduced the space of all those smooth functions such that and Endowed with the topology generated by the family of seminorms , becomes a Fréchet space. The strict inductive limit of the family satisfies , with continuous embedding. Since is dense in , it turns out that can be regarded as a subspace of , the dual of .
The study of the Hankel #-convolution in spaces of generalized functions was initiated by Sousa Pinto , only on compactly supported distributions and for . In a series of papers [17–19], Betancor and the author systematically investigated the generalized #-convolution in wider spaces of distributions, allowing . In this context, the Hankel convolution of is defined as the function where the Hankel translate of is given by Here, for , is the so-called Delsarte kernel. Note that , , is symmetric in , and where . Therefore, for any we have The formula and the exchange formula hold pointwise. The Hankel translation is defined on by transposition. The Hankel convolution of and is defined [19, Definition 3.1] by The formulas and hold in the sense of equality in (cf. [19, Proposition 3.5]).
The space of all those smooth functions with the property that to every there corresponds satisfying was characterized as the space of multipliers of and [20, Theorems 2.3 and 2.9].
The generalized Hankel transformation establishes an isomorphism between and the subspace of consisting of the Hankel convolution operators on and , which is a homeomorphism under the natural topologies of and [19, Propositions 4.2 and 5.2]. The distribution given by satisfies and , cf. [19, Proposition 4.7] and [21, Proposition 3].
For the operational rules of the Hankel transformation and further properties of the Hankel translation and Hankel convolution that will be required, in particular those involving the Bessel differential operator the reader is mainly referred to [14, 17, 19]. Here we will highlight the following [14, Equation 5.5(8)]:
If and (a.e. ) is an integrable radial function, then its -dimensional Fourier transform is also radial and becomes a 1-dimensional Hankel transform of order [22, Theorem IV.3.3]: Actually, since it turns out that, on radial univariate—even—functions, the Fourier transformation, which reduces to a Fourier-cosine transformation, coincides with the Hankel transform of order ; similarly, the Hankel translation and Hankel convolution of order can be seen to coincide (modulo a multiplicative constant) with the usual translation and convolution on (cf. [23, Example 3.2]). Thus for the Hankel translation and the Hankel convolution provide strict generalizations of the usual translation and convolution operators, inasmuch as arbitrary orders are allowed.
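The reduction to the classical case can be checked numerically. The sketch below assumes that the order in question is -1/2, for which sqrt(z) J_{-1/2}(z) = sqrt(2/pi) cos z, so that the transform becomes the Fourier-cosine transform. It also assumes that, up to the multiplicative constant mentioned above (which the sketch deliberately omits), the translate of an even function is the symmetrized shift (phi(x+y) + phi(|x-y|))/2. All function names and quadrature parameters are illustrative.

```python
import math

def cosine_kernel(z):
    # sqrt(z) * J_{-1/2}(z) in closed form.
    return math.sqrt(2 / math.pi) * math.cos(z)

def fourier_cosine(phi, x, upper=12.0, steps=6000):
    """Trapezoidal approximation of the integral of phi(t) * cosine_kernel(x t) over [0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * phi(t) * cosine_kernel(x * t)
    return total * h

def sym_translate(phi, y):
    """Symmetrized shift of an even function phi, without the normalizing constant."""
    return lambda x: 0.5 * (phi(x + y) + phi(abs(x - y)))

def gauss(t):
    return math.exp(-t * t / 2)

# The Gaussian is reproduced by the cosine transform.
x = 1.3
transform_gap = abs(fourier_cosine(gauss, x) - gauss(x))

# Product formula: translating cos(s .) by y multiplies it by cos(s y).
s, y, x0 = 2.0, 0.7, 1.1
wave = lambda u: math.cos(s * u)
product_gap = abs(sym_translate(wave, y)(x0) - math.cos(s * x0) * math.cos(s * y))
```

The second check is the elementary identity cos a cos b = (cos(a+b) + cos(a-b))/2, which is the order -1/2 instance of the product formula for Hankel translates.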
1.3. RBFNNs of Hankel Translates
Motivated by the fact that the Hankel transformation is best adapted to deal with radial functions, Arteaga and the author [24–27] have proved that the Hankel transformation and the Hankel convolution are suitable tools for the description and analysis of an RBF interpolation scheme by functions of the form where , is a complex function defined on (the so-called basis function), is a Müntz monomial, denotes the Hankel translation operator of order , and are complex coefficients.
In analogy to the standard case (1), we set the family of RBFNNs of Hankel translates of order to consist of all those functions which can be represented as where is the number of kernel nodes in the hidden layer, for , , is the weight from the th kernel node to the output node, and are, respectively, the centroid and the smoothing factor of the th kernel node. Further, is a kernel function of a unit in the hidden layer which, in this case, coincides with the activation function and, as above, denotes the Hankel translation operator, while is a dilation operator. Note that, for and , (23) becomes (1).
An investigation into the universal approximation capabilities of a closely related class of RBFNNs defined on the nonnegative real axis has been carried out in several papers by Arteaga and the author [28–30]. It should be remarked that the results in the present paper can be derived neither from [24–27], where only the interpolation problem is addressed, nor from [28–30], where RBFNNs are constructed using the Bessel-Kingman hypergroup translation (or Delsarte translation) instead of the Hankel one, and where the universal approximation property, which is studied mainly in spaces of integrable functions, requires in turn integrability of the basis function.
In the sequel we assume and consider the following spaces:
(i) Given , will denote the linear space of all those continuous functions on such that the limit exists and is finite. When endowed with the norm becomes a Banach space. In fact, the map is an isometry from onto , the space of all continuous functions on with the uniform norm.
(ii) The linear space consists of all those continuous functions on such that the limit (24) exists and is finite. Endowed with the topology generated by the family of seminorms , where becomes a Fréchet space. Note that sequential convergence in is equivalent to convergence in for all .
(iii) The space consists of all those smooth functions on such that the limits exist and are finite. Endowed with the topology generated by the family of seminorms , where becomes a Fréchet space.
Our aim here is to find necessary and sufficient conditions on the basis function for the family of RBFNNs to have the universal approximation property. More precisely, the above-mentioned result in  is extended to the Hankel setting in the following way. Given , a necessary and sufficient condition for to be dense in is nonmembership in the class of Müntz polynomials generated by . This is the content of Theorem 9 in Section 3. In Section 2 we introduce the concept and give a characterization of zero-supported -distributions (Theorem 5), which is used in the proof of Theorem 9 and might be interesting in its own right.
2. Zero-Supported Hankel Distributions
Definition 1. Suppose . We say that the support of is , in symbols , if for all with , equivalently, if for all such that , for some .
Proposition 2. Assume and satisfies , for some . Then .
Proof. Let . Since , necessarily Therefore, This yields the desired conclusion.
Recall that if, and only if, the restrictions of to every are continuous. By (4), this means that to each there corresponds and such that
Definition 3. If, in (33), one will do for all (not necessarily with the same ), then the smallest such is called the order of . Otherwise, is said to have infinite order.
Remark 4. Note that every with has finite order. Indeed, fix with , and choose such that . By Proposition 2, . Now (33) yields and satisfying On the other hand, the Leibniz formula gives such that Thus as asserted.
Theorem 5. Assume , , and has order . Then there are constants such that Here is the functional defined by with (cf. (8)). Conversely, every distribution of this form has for its support, unless .
Proof. It is clear that . This establishes the converse.
To prove the nontrivial half of the theorem, consider a that satisfies Our objective is to prove that . Since given , there exists such that implies . The mean value theorem yields such that If , , an induction process then shows that, for some , Choose such that for some , and define Fix and . By the Leibniz formula, On the other hand [14, Equation 5.2(6)], for suitable , with . Consequently, Since has order , there is a constant such that And since , from Proposition 2 we infer The arbitrariness of shows that . Hence vanishes on the intersection of the null spaces of the functionals , and the desired representation follows from [31, Lemma 3.9].
3. Nonpolynomiality of the Activation Function
We begin by establishing two auxiliary results.
Lemma 7. Let .
(i) The dilation operator , defined by , is continuous.
(ii) The translation operator , defined by is continuous.
(iii) If , then .
Proof. Given , it is apparent that ; indeed, the function is clearly continuous, and the limit exists and is finite. Similarly, therefore, for any , Equation (53) proves continuity of and establishes (i).
To prove (ii), first of all we pick and show that is well defined. In fact, using (8) we may write Next, we want to see that is continuous on . To this end, fix . Since given there exists such that and imply Therefore, for we obtain Since the function is continuous at , given there exists such that and imply . Moreover, if and with , then . Again by (8), for we thus have