Journal of Applied Mathematics
Volume 2012 (2012), Article ID 636078, 8 pages
Research Article

Estimation of Approximating Rate for Neural Network in $L^p_w$ Spaces

1School of Mathematics & Statistics, Southwest University, Chongqing 400715, China
2Department of Mechanical Engineering, Taipei Chengshih University of Science and Technology, No.2 Xue-Yuan Rd., Beitou, Taipei 112, Taiwan

Received 13 February 2012; Accepted 27 March 2012

Academic Editor: Juan Manuel Peña

Copyright © 2012 Jian-Jun Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


A class of Sobolev-type multivariate functions is approximated by a feedforward network with one hidden layer of sigmoidal units and a linear output. By adopting a set of orthogonal polynomial basis functions, and under certain assumptions on the activation functions of the neural network, an upper bound on the degree of approximation is obtained for this class of Sobolev functions. The results are helpful in understanding the approximation capability and topology construction of sigmoidal neural networks.

1. Introduction

Artificial neural networks have been extensively applied in various fields of science and engineering. This is mainly because feedforward neural networks (FNNs) have the universal approximation capability [1–13]. A typical universal approximation assertion states that, for any given continuous function defined on a compact set $K$ of $\mathbb{R}^d$, there exists a three-layer FNN that approximates the function arbitrarily well. A three-layer FNN with one hidden layer, $d$ inputs, and one output can be expressed mathematically as

$$\mathcal{N}(x)=\sum_{i=1}^{m} c_i\,\sigma\Bigl(\sum_{j=1}^{d} w_{ij}x_j+\theta_i\Bigr),\quad x\in\mathbb{R}^d,\ d\ge 1, \tag{1.1}$$

where $1\le i\le m$, $\theta_i\in\mathbb{R}$ are the thresholds, $w_i=(w_{i1},w_{i2},\dots,w_{id})^T\in\mathbb{R}^d$ are the connection weights of neuron $i$ in the hidden layer with the input neurons, $c_i\in\mathbb{R}$ is the connection strength of neuron $i$ with the output neuron, and $\sigma$ is the activation function used in the network. The activation function is normally taken to be of sigmoid type; that is, it satisfies $\sigma(t)\to 1$ as $t\to+\infty$ and $\sigma(t)\to 0$ as $t\to-\infty$. Equation (1.1) can be further expressed in vector form as

$$\mathcal{N}(x)=\sum_{i=1}^{m} c_i\,\sigma\bigl(w_i\cdot x+\theta_i\bigr),\quad x\in\mathbb{R}^d. \tag{1.2}$$
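As a minimal illustration of the vector form (1.2), the following sketch evaluates a one-hidden-layer network in plain Python. The logistic function is only one possible sigmoid, and the names `sigmoid`, `fnn`, `W`, `THETA`, and `C` as well as the numerical values are hypothetical choices made here, not taken from the paper.

```python
import math

def sigmoid(t):
    """Logistic function: tends to 1 as t -> +inf and to 0 as t -> -inf."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)  # avoids overflow for very negative t
    return e / (1.0 + e)

def fnn(x, weights, thresholds, coeffs, sigma=sigmoid):
    """Evaluate N(x) = sum_i c_i * sigma(w_i . x + theta_i), as in (1.2)."""
    total = 0.0
    for w_i, theta_i, c_i in zip(weights, thresholds, coeffs):
        pre_activation = sum(w * xj for w, xj in zip(w_i, x)) + theta_i
        total += c_i * sigma(pre_activation)
    return total

# Hypothetical network with d = 2 inputs and m = 3 hidden units.
W = [[1.0, -2.0], [0.5, 0.5], [-1.0, 3.0]]   # rows are the weight vectors w_i
THETA = [0.0, -1.0, 2.0]                     # thresholds theta_i
C = [0.3, -0.7, 1.2]                         # output coefficients c_i
value_at_origin = fnn([0.0, 0.0], W, THETA, C)
```

At the origin the inner products vanish, so the output reduces to $\sum_i c_i\sigma(\theta_i)$, which is a convenient sanity check.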

Universal approximation capabilities for a broad range of neural network topologies have been established by researchers such as Cybenko [1], Ito [5], and T. P. Chen and H. Chen [6]. Their work concentrated on the question of denseness. From the point of view of applications, however, we are concerned with the degree of approximation by neural networks.

For any approximation problem, establishing performance bounds is an inevitable but very difficult issue. As is well known, feedforward neural networks (FNNs) are capable of approximating general classes of functions, including continuous and integrable ones. Recently, several researchers have derived approximation error bounds for various function classes (see, e.g., [7–13]) approximated by neural networks. While many open issues remain concerning the degree of approximation, this paper focuses on the approximation of functions defined over $[-1,1]^d$ by FNNs. In [10], the authors used basic tools from the theory of weighted polynomial approximation of functions (with weight function $\omega(x)=\exp(-Q(x))$); under certain assumptions on the smoothness of the functions being approximated and on the activation functions in the neural network, they presented upper bounds on the degree of approximation achieved over the domain $\mathbb{R}^d$.

In this paper, using the Chebyshev orthogonal series from approximation theory and moduli of continuity, we obtain upper bounds on the degree of approximation in $[-1,1]^d$. By taking advantage of the properties of the Chebyshev polynomials and the methods of [10], we obtain the desired results, which can be easily extended to the space $\mathbb{R}^d$.

2. Multivariate Chebyshev Polynomial Approximation

Before presenting the main results, we first introduce some basic results on Chebyshev polynomials from approximation theory. For convenience, we introduce a weighted norm of a function $f$ [14] given by

$$\|f\|_{p,\omega}=\Bigl(\int_{[-1,1]^d}\omega(x)\,|f(x)|^p\,dx\Bigr)^{1/p}, \tag{2.1}$$

where $1\le p<\infty$, $\omega(x)=\prod_{i=1}^{d}\omega(x_i)$ is the multivariate weight function, $\omega(x_i)=(1-x_i^2)^{-1/2}$, $x=(x_1,x_2,\dots,x_d)\in\mathbb{R}^d$, $m=(m_1,m_2,\dots,m_d)\in\mathbb{N}^d$, and $dx=dx_1\,dx_2\cdots dx_d$. We denote the class of functions for which $\|f\|_{p,\omega}$ is finite by $L_{p,\omega}$.
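The weighted norm (2.1) can be approximated numerically. The sketch below uses a tensor-product Gauss-Chebyshev rule, which is an assumption made here for illustration (the paper does not discuss numerical integration); the function names `chebyshev_nodes` and `weighted_norm` are hypothetical.

```python
import math
from itertools import product

def chebyshev_nodes(n):
    """Nodes x_k = cos((2k - 1) * pi / (2n)) of the n-point Gauss-Chebyshev rule."""
    return [math.cos((2 * k - 1) * math.pi / (2 * n)) for k in range(1, n + 1)]

def weighted_norm(f, d, p=2.0, n=64):
    """Approximate ||f||_{p,omega} on [-1,1]^d, omega(x) = prod_i (1 - x_i^2)^(-1/2).

    The rule replaces the integral of omega(x) * g(x) by (pi/n)^d times the sum
    of g over the node grid, so the singular weight is never evaluated.
    The cost grows like n^d, so this is only practical for small d.
    """
    nodes = chebyshev_nodes(n)
    total = sum(abs(f(x)) ** p for x in product(nodes, repeat=d))
    return ((math.pi / n) ** d * total) ** (1.0 / p)

# Two cases with known exact values (p = 2):
norm_of_one = weighted_norm(lambda x: 1.0, d=2, p=2.0, n=16)   # exact: pi
norm_of_x = weighted_norm(lambda x: x[0], d=1, p=2.0, n=16)    # exact: sqrt(pi / 2)
```

Because the $n$-point rule is exact for polynomial integrands of degree up to $2n-1$, both sample values agree with the exact norms up to rounding.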

For functions $f:\mathbb{R}^d\to\mathbb{R}$, the class of functions we wish to approximate in this work is defined as follows:

$$\Psi^{r,d}_{p,\omega}=\Bigl\{f:\ \bigl\|f^{(\lambda)}\bigr\|_{p,\omega}\le M,\ |\lambda|\le r\Bigr\}, \tag{2.2}$$

where $\lambda=(\lambda_1,\lambda_2,\dots,\lambda_d)$, $|\lambda|=\lambda_1+\lambda_2+\cdots+\lambda_d$, $f^{(\lambda)}=\partial^{|\lambda|}f/\partial x_1^{\lambda_1}\cdots\partial x_d^{\lambda_d}$, $r$ is a natural number, and $M<\infty$.

2.1. A Chebyshev Polynomial Approximation of Multivariate Functions

As is well known, the Chebyshev polynomial of a single real variable is a very important polynomial in approximation theory. Using the above notation, we introduce the multivariate Chebyshev polynomials: $T_0(x)=1/\sqrt{\pi}$, $T_n(x)=\prod_{i=1}^{d}T_{n_i}(x_i)$, $T_{n_i}(x_i)=\sqrt{2/\pi}\,\cos(n_i\arccos x_i)$. Evidently, for any $m,l\in\mathbb{N}^d$, we have

$$\int_{[-1,1]^d}T_m(x)\,T_l(x)\,\omega(x)\,dx=\begin{cases}1, & m=l,\\ 0, & m\ne l.\end{cases} \tag{2.3}$$
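The orthonormality relation (2.3) can be checked numerically. The following sketch assumes a Gauss-Chebyshev quadrature rule (not part of the paper) and uses the hypothetical helper names `T`, `inner`, and `inner_multi`; the multivariate inner product factorizes into univariate ones because both the polynomials and the weight are products.

```python
import math

def T(n, x):
    """Normalized univariate Chebyshev polynomial as defined in the text."""
    if n == 0:
        return 1.0 / math.sqrt(math.pi)
    return math.sqrt(2.0 / math.pi) * math.cos(n * math.acos(x))

def inner(m, l, nodes=64):
    """Gauss-Chebyshev approximation of the integral of T_m * T_l * omega over [-1, 1]."""
    xs = [math.cos((2 * k - 1) * math.pi / (2 * nodes)) for k in range(1, nodes + 1)]
    return (math.pi / nodes) * sum(T(m, x) * T(l, x) for x in xs)

def inner_multi(m, l, nodes=64):
    """Inner product of multivariate T_m and T_l as a product of 1-D inner products."""
    result = 1.0
    for m_i, l_i in zip(m, l):
        result *= inner(m_i, l_i, nodes)
    return result

same_index = inner_multi((1, 2), (1, 2))        # close to 1
different_index = inner_multi((1, 2), (1, 3))   # close to 0
```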

For $f\in L_{p,\omega}$ and $m\in\mathbb{N}^d$, let $\hat f(m)=\int_{[-1,1]^d}f(x)\,T_m(x)\,\omega(x)\,dx$; then we have the orthogonal expansion $f(x)\sim\sum_{m=0}^{\infty}\hat f(m)\,T_m(x)$, $x\in[-1,1]^d$.
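To make the expansion concrete, the sketch below computes the coefficients $\hat f(m)$ and a truncated series in one dimension. The Gauss-Chebyshev quadrature, the truncation degree, and the names `coefficient` and `partial_sum` are assumptions introduced here for illustration only.

```python
import math

NODES = 64
XS = [math.cos((2 * k - 1) * math.pi / (2 * NODES)) for k in range(1, NODES + 1)]

def T(n, x):
    """Normalized univariate Chebyshev polynomial as defined in the text."""
    if n == 0:
        return 1.0 / math.sqrt(math.pi)
    return math.sqrt(2.0 / math.pi) * math.cos(n * math.acos(x))

def coefficient(f, m):
    """Approximate f_hat(m), the integral of f * T_m * omega over [-1, 1]."""
    return (math.pi / NODES) * sum(f(x) * T(m, x) for x in XS)

def partial_sum(f, degree):
    """Return x -> sum_{m=0}^{degree} f_hat(m) * T_m(x)."""
    coeffs = [coefficient(f, m) for m in range(degree + 1)]
    return lambda x: sum(c * T(m, x) for m, c in enumerate(coeffs))

# Example: a degree-8 truncation of the expansion of exp on [-1, 1].
approx_exp = partial_sum(math.exp, 8)
error_at_point = abs(approx_exp(0.3) - math.exp(0.3))
```

For a smooth function such as the exponential, the coefficients decay very quickly, so a low-degree truncation is already accurate; this is the one-dimensional phenomenon that Theorem 2.1 quantifies for the class (2.2).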

For the one-dimensional degree of approximation of a function $g$ by polynomials of degree $m$, we write

$$E_m\bigl(g,P_m,L_{p,\omega}\bigr)=\inf_{P\in P_m}\|g-P\|_{p,\omega}, \tag{2.4}$$

where $P_m$ stands for the class of degree-$m$ algebraic polynomials. From [15], we have a simple relationship which will be used in the following. Let $g$ be differentiable; then we have

$$E\bigl(g,P_m,L_{p,\omega}\bigr)\le M_1 m^{-1}E\bigl(g',P_m,L_{p,\omega}\bigr),\qquad E\bigl(g,P_m,L_{p,\omega}\bigr)\le\|g\|_{p,\omega}. \tag{2.5}$$

Let $S_n(f,t)=\sum_{k=1}^{n-1}\hat f(k)\,T_k(t)$. The de la Vallée Poussin operator is then defined by

$$V_n(f,t)=\frac{1}{n+1}\sum_{m=(n+3)/2}^{n+1}S_m(f,t). \tag{2.6}$$

Furthermore, we can simplify $V_n(f,t)$ as follows:

$$V_n(f,t)=\sum_{k=1}^{n}\xi_k\,\hat f(k)\,T_k(t), \tag{2.7}$$

where

$$\xi_k=\begin{cases}\dfrac{m-1}{2(m+1)}, & \text{if } 0\le k\le\dfrac{m+3}{2},\\[2mm] \dfrac{m-k}{m+1}, & \text{if } \dfrac{m+3}{2}\le k\le m.\end{cases} \tag{2.8}$$

A basic result concerning the de la Vallée Poussin operator $V_m(f,t)$ is

$$E_{2m}\bigl(f,P_{2m},L_{p,\omega}\bigr)\le\|f-V_m f\|_{p,\omega}\le E_m\bigl(f,P_m,L_{p,\omega}\bigr). \tag{2.9}$$

Now we consider a class of multivariate polynomials defined as follows:

$$P_m=\Bigl\{P:\ P(x)=\sum_{0\le|i|\le|m|}b_{i_1,i_2,\dots,i_d}\,x_1^{i_1}\cdots x_d^{i_d},\ b_{i_1,i_2,\dots,i_d}\in\mathbb{R},\ \forall\, i_1,\dots,i_d\Bigr\}. \tag{2.10}$$

Hence, we have the following theorem.

Theorem 2.1. For $1\le p\le\infty$, let $f\in\Psi^{r,d}_{p,\omega}$. Then for any $m=(m_1,m_2,\dots,m_d)$, $m_i\le m$, we have

$$\inf_{P\in P_m}\|f-P\|_{p,\omega}\le C\,m^{-r}. \tag{2.11}$$

Proof. We consider the Chebyshev orthogonal polynomials $T_m(x)$ and obtain the following equality from (2.7):

$$V_{i,m_i}(f)=\sum_{s=1}^{m_i}\xi_s\,\hat f_{s,i}\,T_s(x_i), \tag{2.12}$$

where $\hat f_{s,i}=\int_{[-1,1]}f(x)\,T_s(x_i)\,\omega(x_i)\,dx_i$. Hence, we define the following operator:

$$V(f)=V_{1,m_1}V_{2,m_2}\cdots V_{d,m_d}f=\sum_{s_1=1}^{m_1}\cdots\sum_{s_d=1}^{m_d}\xi_{s_1}\cdots\xi_{s_d}\,\hat f_{s_1,\dots,s_d}\,T_{s_1}(x_1)\cdots T_{s_d}(x_d), \tag{2.13}$$

where $\hat f_{s_1,\dots,s_d}=\int_{[-1,1]^d}\bigl(\prod_{i=1}^{d}\omega(x_i)\,T_{s_i}(x_i)\bigr)f(x)\,dx$. Then we have

$$\begin{aligned}\|f-V(f)\|_{p,\omega}&=\bigl\|f-V_{1,m_1}(f)+V_{1,m_1}(f)-V_{1,m_1}V_{2,m_2}(f)+V_{1,m_1}V_{2,m_2}(f)-\cdots-V(f)\bigr\|_{p,\omega}\\ &\le\sum_{i=1}^{d}\bigl\|V_0\cdots V_{i-1,m_{i-1}}f-V_0\cdots V_{i,m_i}f\bigr\|_{p,\omega},\end{aligned} \tag{2.14}$$

where $V_0$ is the identity operator. Let $g=V_0\cdots V_{i-1,m_{i-1}}f$; then $V_{i,m_i}g=V_0\cdots V_{i,m_i}f$ and $g^{r_i}(x)=V_0\cdots V_{i-1,m_{i-1}}D^{r_i}f(x)$. We view $V_{i,m_i}g$ as a one-dimensional function of $x_i$.

Using (2.4), (2.5), and (2.6), we have

$$\begin{aligned}\|g-V_{i,m_i}g\|_{p,\omega}&\le C_1E_{m_i}\bigl(g,P_{m_i},L_{p,\omega}\bigr)\\ &\le C_1M_1^{r_i}\Bigl(\frac{1}{m_i}\Bigr)\cdots\Bigl(\frac{1}{m_i-r_i+1}\Bigr)E_{m_i-r_i}\bigl(g^{r_i},P_{m_i-r_i},L_{p,\omega}\bigr)\\ &\le C_1M_1^{r_i}\Bigl(\frac{1}{m_i}\Bigr)\cdots\Bigl(\frac{1}{m_i-r_i+1}\Bigr)\|g^{r_i}\|_{p,\omega}\\ &=C_{r_i}\Bigl(\frac{1}{m_i}\Bigr)\cdots\Bigl(\frac{1}{m_i-r_i+1}\Bigr)\bigl\|V_0\cdots V_{i-1,m_{i-1}}D^{r_i}f\bigr\|_{p,\omega}.\end{aligned} \tag{2.15}$$

Letting $r_i=r$, $m_i=m$, $i=1,\dots,d$, if $m>r(r-1)$, we get from (2.15), (2.13), (2.14), and the inequality $\prod_{i=1}^{n}(1+a_i)\ge 1+\sum_{i=1}^{n}a_i$ $(a_i\ge-1)$ that

$$\begin{aligned}\|f-V(f)\|_{p,\omega}&\le C_r\sum_{i=1}^{d}\Bigl(\frac{1}{m}\Bigr)\cdots\Bigl(\frac{1}{m-r+1}\Bigr)\|D^rf\|_{p,\omega}\le C_r\,dM\Bigl(\frac{1}{m}\Bigr)\cdots\Bigl(\frac{1}{m-r+1}\Bigr)\\ &=C_r\,dM\,m^{-r}\Bigl(1-\frac{1}{m}\Bigr)^{-1}\Bigl(1-\frac{2}{m}\Bigr)^{-1}\cdots\Bigl(1-\frac{r-1}{m}\Bigr)^{-1}\\ &\le C_r\,dM\,m^{-r}\Bigl(1-\frac{r(r-1)}{2m}\Bigr)^{-1}\le 2dC_rM\,m^{-r}.\end{aligned} \tag{2.16}$$

To obtain a bound valid for all $m$, note that for $m\le r(r-1)$ we always have the trivial bound $\|f-V(f)\|_{p,\omega}\le M_2$, since $\|f\|_{p,\omega}\le M_2$. Letting $C=\max\{2dC_rM,\ 2^{-1}M_2(r(r-1))^r\}$, we obtain an inequality of the desired type for every $m$.
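The last two steps of (2.16) rest on an elementary estimate: for $m>r(r-1)$, the product $\prod_{k=1}^{r-1}(1-k/m)$ is at least $1-r(r-1)/(2m)>1/2$, so $(1/m)\cdots(1/(m-r+1))\le 2m^{-r}$. The following sketch checks this with exact rational arithmetic; the function names `falling_reciprocal` and `check_bound` are hypothetical and the check is only an illustration, not part of the proof.

```python
from fractions import Fraction

def falling_reciprocal(m, r):
    """Return (1/m) * (1/(m-1)) * ... * (1/(m-r+1)) as an exact rational number."""
    value = Fraction(1)
    for k in range(r):
        value /= (m - k)
    return value

def check_bound(m, r):
    """Check the chain used in (2.16) for integers m > r(r-1), r >= 1."""
    if m <= r * (r - 1):
        raise ValueError("the estimate is only claimed for m > r(r-1)")
    product = Fraction(1)
    for k in range(1, r):
        product *= 1 - Fraction(k, m)              # factors (1 - k/m)
    lower = 1 - Fraction(r * (r - 1), 2 * m)       # 1 + sum of a_k with a_k = -k/m
    product_ok = product >= lower > Fraction(1, 2)
    final_ok = falling_reciprocal(m, r) <= 2 * Fraction(1, m ** r)
    return product_ok and final_ok

all_ok = all(
    check_bound(m, r)
    for r in range(1, 6)
    for m in range(r * (r - 1) + 1, r * (r - 1) + 21)
)
```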

This theorem reveals two things: (i) for any multivariate function $f\in\Psi^{r,d}_{p,\omega}$, there is a polynomial $P\in P_m$ that approximates $f$ arbitrarily well in $L^p_\omega$ as $m$ grows; (ii) quantitatively, the approximation accuracy of a polynomial $P\in P_m$ can attain the order $O(m^{-r})$, where $m$ bounds the degree of the multivariate polynomial in each variable and $r$ is the smoothness of the function to be approximated.

3. Approximation by Feedforward Neural Networks

We consider the approximation of functions by feedforward neural networks with ridge functions. We define the approximating function class composed of single hidden layer feedforward neural networks with $n$ hidden units. The class of functions is

$$F_n=\Bigl\{f:\ f(x)=\sum_{k=1}^{n}d_k\,\phi\bigl(a_k\cdot x+b_k\bigr);\ a_k\in\mathbb{R}^d,\ b_k,d_k\in\mathbb{R},\ k=1,2,\dots,n\Bigr\}, \tag{3.1}$$

where $\phi(x)$ satisfies the following assumptions.
(1) There is a constant $C_\phi$ such that $|\phi^{(k)}(x)|\ge C_\phi>0$, $k=0,1,\dots$
(2) For each finite $k$, there is a finite constant $l_k$ such that $|\phi^{(k)}(x)|\le l_k$.

We define the distance from $F$ to $G$ as

$$\operatorname{dist}\bigl(F,G,L_{p,\omega}\bigr)=\sup_{f\in F}\inf_{g\in G}\|f-g\|_{p,\omega}, \tag{3.2}$$

where $F$ and $G$ are two sets in $L^p_\omega$. We have the following results.

Theorem 3.1. Let conditions (1) and (2) hold for the activation function $\phi(x)$. Then for every $0<L<\infty$, $m=(m_1,m_2,\dots,m_d)\in\mathbb{N}^d_+$, $m_i\le m$, $\epsilon>0$, and $n>(m+1)^d$, we have

$$\operatorname{dist}\bigl(BP_m(L),F_n,L_{p,\omega}\bigr)\le\epsilon, \tag{3.3}$$

where

$$BP_m(L)=\Bigl\{P:\ P(x)=\sum_{0\le s\le m}a_s x^s;\ \max_{0\le s\le m}|a_s|\le L\Bigr\}. \tag{3.4}$$

Proof. First, we consider the partial derivative

$$\phi^{(s)}(w\cdot x+b)=\frac{\partial^{|s|}}{\partial w_1^{s_1}\cdots\partial w_d^{s_d}}\bigl(\phi(w\cdot x+b)\bigr)=\mathbf{x}^{\mathbf{s}}\,\phi^{(|s|)}(w\cdot x+b), \tag{3.5}$$

where $|s|=s_1+\cdots+s_d$ and $\mathbf{x}^{\mathbf{s}}=\prod_{i=1}^{d}x_i^{s_i}$. Thus, at $w=0$, $\phi^{(s)}(b)=\mathbf{x}^{\mathbf{s}}\phi^{(|s|)}(b)$.
For any fixed $b$ and $|x|<\infty$ (here $|x|=\sum_{i=1}^{d}|x_i|$), we consider a finite difference of order $s$:

$$\begin{aligned}\triangle^{s}_{h,x}\phi(b)&=\sum_{0\le l\le s}(-1)^{|l|}C^{l}_{s}\,\phi(hl\cdot\mathbf{x}+b)\\ &=\mathbf{x}^{s}\int_0^h\cdots\int_0^h\phi^{(|s|)}\bigl[(a_1+\cdots+a_{s_1})x_1+\cdots+(a_{|s|-s_d+1}+\cdots+a_{|s|})x_d+b\bigr]\,da_1\cdots da_{|s|}\\ &\doteq\mathbf{x}^{s}A^{s}_h\phi^{(|s|)}(\mathbf{x}),\end{aligned} \tag{3.6}$$

where $C^{l}_{s}=\prod_{i=1}^{d}C^{l_i}_{s_i}$ and $\triangle^{s}_{h,x}\phi(b)\in F_n$ with $n=\prod_{i=1}^{d}(1+s_i)$. So

$$\begin{aligned}\bigl|\phi^{(s)}(b)-h^{-|s|}\triangle^{s}_{h,x}\phi(b)\bigr|&=\bigl|\mathbf{x}^{s}\bigl(\phi^{(|s|)}(b)-h^{-|s|}A^{s}_h\phi^{(|s|)}(\mathbf{x})\bigr)\bigr|\\ &=\bigl|\mathbf{x}^{s}\bigl(\phi^{(|s|)}(b)-\phi^{(|s|)}(b+\eta)\bigr)\bigr|\le C_s\,\omega\bigl(\phi^{(|s|)},h\bigr),\end{aligned} \tag{3.7}$$

where we derive (3.7) by using (3.6), the mean value theorem for integrals (i.e., there is an $\eta\in[0,h|s\cdot\mathbf{x}|]$ such that $A^{s}_h\phi^{(|s|)}(\mathbf{x})=h^{|s|}\phi^{(|s|)}(b+\eta)$), and the modulus of continuity $\omega(g,h)=\sup_{|t|\le h}|g(x+t)-g(x)|$.
From the definition of $\operatorname{dist}(F,G,L_{p,\omega})$ and (3.7), we have

$$\begin{aligned}\operatorname{dist}\bigl(BP_m(L),F_n,L_{p,\omega}\bigr)^p&\le\Bigl\|\sum_{0\le s\le m}a_s x^s-\sum_{0\le s\le m}a_s\frac{\triangle^{s}_{h,x}\phi(b)}{h^{|s|}\phi^{(|s|)}(b)}\Bigr\|^p_{p,\omega}\\ &\le(m+1)^d\max_{0\le s\le m}\Bigl\{|a_s|\,\Bigl\|x^s-\frac{\triangle^{s}_{h,x}\phi(b)}{h^{|s|}\phi^{(|s|)}(b)}\Bigr\|^p_{p,\omega}\Bigr\}\\ &\le(m+1)^dL\max_{0\le s\le m}\bigl\{\phi^{(|s|)}(b)\bigr\}^{-p}\,\omega\bigl(\phi^{(|s|)},h\bigr)\\ &\le(m+1)^dL\,C_\phi^{-p}\,\omega\bigl(\phi^{(|s|)},h\bigr)<\epsilon.\end{aligned} \tag{3.8}$$

In the last step, $\omega(\phi^{(|s|)},h)$ can be made arbitrarily small by letting $h\to 0$.
Using Theorems 2.1 and 3.1, we can easily establish our final result.
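The mechanism behind Theorem 3.1 is that a scaled finite difference of the activation function reproduces a monomial. The one-dimensional sketch below illustrates this under several assumptions made here: the logistic function as $\phi$, the centre $b=1$ (chosen so that the first two derivatives do not vanish), and the standard forward-difference sign convention $(-1)^{s-l}$, which may differ from the convention in (3.6) by an overall sign. The names `phi`, `phi_derivative`, and `monomial_by_network` are hypothetical.

```python
import math

def phi(t):
    """Logistic activation function (an assumed choice of sigmoid)."""
    return 1.0 / (1.0 + math.exp(-t))

def phi_derivative(k, b):
    """k-th derivative of the logistic function at b, implemented for k = 0, 1, 2."""
    p = phi(b)
    if k == 0:
        return p
    if k == 1:
        return p * (1.0 - p)
    if k == 2:
        return p * (1.0 - p) * (1.0 - 2.0 * p)
    raise ValueError("only k = 0, 1, 2 are implemented in this sketch")

def monomial_by_network(x, s, h=1e-3, b=1.0):
    """Approximate x**s by a linear combination of s + 1 sigmoidal units.

    Uses the forward difference sum_l (-1)^(s-l) C(s,l) phi(h*l*x + b),
    divided by h^s * phi^(s)(b); each term is one hidden unit with
    weight h*l, threshold b, and output coefficient given by the rest.
    """
    difference = sum(
        (-1) ** (s - l) * math.comb(s, l) * phi(h * l * x + b)
        for l in range(s + 1)
    )
    return difference / (h ** s * phi_derivative(s, b))

linear_error = abs(monomial_by_network(0.7, 1) - 0.7)
quadratic_error = abs(monomial_by_network(0.7, 2) - 0.7 ** 2)
```

The error shrinks roughly in proportion to $h$, which mirrors the role of the modulus of continuity $\omega(\phi^{(|s|)},h)$ in (3.7) and (3.8); in floating-point arithmetic $h$ cannot be taken arbitrarily small because of cancellation in the difference.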

Theorem 3.2. For $1\le p\le\infty$, we have

$$\operatorname{dist}\bigl(\Psi^{r,d}_{p,\omega},F_n,L_{p,\omega}\bigr)\le C\,n^{-r/d}. \tag{3.9}$$
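One possible way to see how the rate $n^{-r/d}$ arises from Theorems 2.1 and 3.1 is sketched below. The polynomial $P$, the network $N$, the coefficient bound $L$, and the constant $3^r$ are assumptions introduced here for illustration; the paper does not spell out these details.

```latex
% Sketch only: assumes the near-best polynomial P of Theorem 2.1
% belongs to BP_m(L) for some finite L, so that Theorem 3.1 applies.
\begin{align*}
\|f - N\|_{p,\omega}
  &\le \|f - P\|_{p,\omega} + \|P - N\|_{p,\omega}
   \le C m^{-r} + \epsilon ,
   \qquad N \in F_n,\ n > (m+1)^d . \\
\intertext{Choosing $m$ as the largest integer with $(m+1)^d < n$ gives
$(m+2)^d \ge n$, so that for $n \ge 3^d$}
m &\ge n^{1/d} - 2 \ge \tfrac{1}{3}\, n^{1/d},
   \qquad\text{hence}\qquad
   m^{-r} \le 3^{r}\, n^{-r/d} .
\end{align*}
```

Since $\epsilon>0$ is arbitrary, this would give a bound of the form $C'n^{-r/d}$ with $C'=3^rC$ for $n\ge 3^d$; smaller values of $n$ can be absorbed into the constant.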

This theorem reveals two things: (i) for any multivariate function $f\in\Psi^{r,d}_{p,\omega}$, there is a single hidden layer feedforward neural network $N\in F_n$ with $n$ hidden units that approximates $f$ arbitrarily well in $L^p_\omega$ as $n$ grows. That is, feedforward neural networks can be used as universal approximators of functions in $\Psi^{r,d}_{p,\omega}$; (ii) quantitatively, the approximation accuracy of a network of the form (3.1) can attain the order $O(n^{-r/d})$, where $d$ is the dimension of the input space and $r$ is the smoothness of the function to be approximated.
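To get a feel for the rate $O(n^{-r/d})$, the sketch below computes how many hidden units the bound (3.9) would require for a target accuracy. The constant $C$ is unknown in general and is set to 1 here purely as an assumption; the function names `error_bound` and `hidden_units_needed` are hypothetical.

```python
import math

def error_bound(n, r, d, C=1.0):
    """Right-hand side of (3.9): C * n^(-r/d)."""
    return C * n ** (-r / d)

def hidden_units_needed(eps, r, d, C=1.0):
    """Smallest integer n >= 1 with C * n^(-r/d) <= eps."""
    n = max(1, math.ceil((C / eps) ** (d / r)))
    # Adjust in both directions to guard against floating-point rounding.
    while error_bound(n, r, d, C) > eps:
        n += 1
    while n > 1 and error_bound(n - 1, r, d, C) <= eps:
        n -= 1
    return n

units_low_dim = hidden_units_needed(0.15, r=2, d=2)    # 7
units_high_dim = hidden_units_needed(0.15, r=2, d=4)   # 45
```

For fixed smoothness $r$, the required number of units grows exponentially with the input dimension $d$, while higher smoothness reduces it.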

4. Conclusion

In this work, the approximation order of feedforward neural networks of the form (3.1) has been studied. In terms of the smoothness of the target function, an upper bound on the approximation precision and speed of such networks is developed. Our research reveals that the approximation precision and speed of the neural networks depend not only on the number of hidden neurons used, but also on the smoothness of the functions to be approximated. The results obtained are helpful in understanding the approximation capability and topology construction of sigmoidal neural networks.


This research is supported by Natural Science Foundation of China (no. 11001227), Natural Science Foundation Project of CQ CSTC (no. CSTC,2009BB2306), and the Fundamental Research Funds for the Central Universities (no. XDJK2010B005).


  1. G. Cybenko, “Approximation by superpositions of a sigmoidal function,” Mathematics of Control, Signals, and Systems, vol. 2, no. 4, pp. 303–314, 1989.
  2. J. J. Wang, Z. B. Xu, and W. J. Xu, Approximation Bounds by Neural Networks in L(w, p), vol. 3173 of Lecture Notes in Computer Science, Springer, Berlin, Germany, 2004.
  3. J. J. Wang, B. Chen, and C. Yang, “Approximation of algebraic and trigonometric polynomials by feedforward neural networks,” Neural Computing and Applications, vol. 21, no. 1, pp. 73–80, 2011.
  4. J. J. Wang and Z. B. Xu, “New study of neural networks: the essential order of approximation,” Neural Networks, vol. 23, pp. 618–624, 2010.
  5. Y. Ito, “Approximation of continuous functions on R^d by linear combination of shifted rotations of sigmoid function with and without scaling,” Neural Networks, vol. 5, no. 1, pp. 105–115, 1992.
  6. T. P. Chen and H. Chen, “Approximation capability to functions of several variables, nonlinear functions, and operators by radial function neural networks,” IEEE Transactions on Neural Networks, vol. 6, pp. 904–910, 1995.
  7. A. R. Barron, “Universal approximation bounds for superpositions of a sigmoidal function,” IEEE Transactions on Information Theory, vol. 39, no. 3, pp. 930–945, 1993.
  8. M. Leshno, V. Ya. Lin, A. Pinkus, and S. Schocken, “Multilayer feedforward networks with a non-polynomial activation function and approximate any function,” Neural Networks, vol. 6, no. 6, pp. 861–867, 1993.
  9. H. N. Mhaskar, “Neural networks for optimal approximation for smooth and analytic functions,” Neural Computation, vol. 8, pp. 164–177, 1996.
  10. V. Maiorov and R. S. Meir, “Approximation bounds for smooth functions in C(R^d) by neural and mixture networks,” IEEE Transactions on Neural Networks, vol. 3, pp. 969–978, 1998.
  11. M. Burger and A. Neubauer, “Error bounds for approximation with neural networks,” Journal of Approximation Theory, vol. 112, no. 2, pp. 235–250, 2001.
  12. V. Kurkova and M. Sanguineti, “Comparison of worst case errors in linear and neural network approximation,” IEEE Transactions on Information Theory, vol. 48, no. 1, pp. 264–275, 2002.
  13. J. L. Wang, B. H. Sheng, and S. P. Zhou, “On approximation by non-periodic neural and translation networks in $L^p_w$ spaces,” ACTA Mathematica Sinica, vol. 46, pp. 65–74, 2003 (Chinese).
  14. J. J. Wang, C. Yang, and J. Jing, “Approximation order for multivariate Durrmeyer operators with Jacobi weights,” Abstract and Applied Analysis, vol. 2011, Article ID 970659, 12 pages, 2011.
  15. A. F. Timan, Theory of Approximation of Functions of a Real Variable, Macmillan, New York, NY, USA, 1963.