Abstract

In the present study, we investigate the universality of neural networks, which concerns the density of the set of two-layer neural networks in function spaces. Many existing works handle convergence over compact sets. In the present paper, we consider global convergence by introducing a suitable norm, so that our approximation results are global rather than merely uniform over each compact set.

1. Introduction

A neural network is a function that models the neuron system of a biological brain and is defined as an alternating composition of affine maps and nonlinear maps. The nonlinear maps in a neural network are called activation functions. Neural networks have played a central role in the field of machine learning over the last decade, with a vast number of real-world applications; we refer to [1] and [2], for example.

We focus on a two-layer feed-forward neural network with ReLU (rectified linear unit) activation, which is a function of the form
$$x \in \mathbb{R} \mapsto \sum_{j=1}^{N} a_j\,\mathrm{ReLU}(w_j x + b_j)$$
for some $N \in \mathbb{N}$ and $a_j, w_j, b_j \in \mathbb{R}$, $j = 1, \dots, N$. Here, the function $\mathrm{ReLU} \colon \mathbb{R} \to \mathbb{R}$ is called the rectified linear unit and is defined by
$$\mathrm{ReLU}(t) = \max\{t, 0\}, \qquad t \in \mathbb{R}.$$
The ReLU is one of the most popular activation functions for feed-forward neural networks in practical machine learning tasks for real-world problems.
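As a concrete illustration of the above definition, the following minimal NumPy sketch evaluates a two-layer ReLU network; the particular parameter values are arbitrary and for illustration only.

import numpy as np

def relu(t):
    # Rectified linear unit: max(t, 0), applied elementwise.
    return np.maximum(t, 0.0)

def two_layer_network(x, a, w, b):
    # Evaluates sum_j a_j * ReLU(w_j * x + b_j) at the points x.
    x = np.asarray(x, dtype=float)
    return sum(aj * relu(wj * x + bj) for aj, wj, bj in zip(a, w, b))

# A three-term example with arbitrary parameters.
x = np.linspace(-5.0, 5.0, 11)
print(two_layer_network(x, a=[0.5, -1.0, 2.0], w=[1.0, -2.0, 0.5], b=[0.0, 1.0, -3.0]))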

We consider the space of two-layer feed-forward neural networks, defined as the linear space
$$\mathcal{F} = \operatorname{span}\{\mathrm{ReLU}(w\,\cdot + b) : w, b \in \mathbb{R}\} = \left\{\, x \mapsto \sum_{j=1}^{N} a_j\,\mathrm{ReLU}(w_j x + b_j) : N \in \mathbb{N},\ a_j, w_j, b_j \in \mathbb{R} \right\}.$$
Then, it is natural to ask whether $\mathcal{F}$ spans a dense subspace of a given function space (a topological linear space); this property is called the universality of $\mathcal{F}$. Historically, the density of $\mathcal{F}$ in the space $C(\mathbb{R})$ of continuous functions on $\mathbb{R}$ has been investigated by several authors [3-5], since it guarantees the existence of a feed-forward neural network that approximates an unknown continuous function well. Here, the topology of $C(\mathbb{R})$ is generated by the seminorms $p_K(f) = \sup_{x \in K} |f(x)|$, where $K$ ranges over all compact sets in $\mathbb{R}$. Thus, the approximation property of two-layer feed-forward neural networks makes sense only on a local domain.
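To make the local nature of this topology concrete, here is a small sketch (assuming NumPy; the compact set $K = [-2, 2]$ and the test functions are illustrative choices) that evaluates the seminorm $p_K$ on a grid: two functions that agree on $K$ are indistinguishable for $p_K$, even though they differ far away from $K$.

import numpy as np

def seminorm_K(f, k_min, k_max, num=10001):
    # p_K(f) = sup_{x in K} |f(x)|, approximated on a fine grid over K = [k_min, k_max].
    x = np.linspace(k_min, k_max, num)
    return np.max(np.abs(f(x)))

f = lambda x: np.sin(x)
g = lambda x: np.sin(x) + np.maximum(x - 10.0, 0.0)   # differs from f only for x > 10

print(seminorm_K(lambda x: g(x) - f(x), -2.0, 2.0))    # 0.0 on K = [-2, 2]
print(seminorm_K(lambda x: g(x) - f(x), -2.0, 20.0))   # about 10.0 on the larger interval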

In this study, we prove an approximation property of $\mathcal{F}$ in a global sense. More precisely, we prove that the space $\mathcal{F}$ is dense in the Banach subspace $X$ of $C(\mathbb{R})$ defined as
$$X = \left\{ f \in C(\mathbb{R}) : \lim_{x \to +\infty} \frac{f(x)}{1 + |x|} \ \text{and} \ \lim_{x \to -\infty} \frac{f(x)}{1 + |x|} \ \text{exist in } \mathbb{R} \right\},$$
equipped with the norm
$$\|f\|_X = \sup_{x \in \mathbb{R}} \frac{|f(x)|}{1 + |x|}.$$
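As a numerical sketch of this norm (the weight $1 + |x|$ is the one in the definition above; the sample function $f(x) = \sqrt{1 + x^2}$ is an illustrative choice), the following code estimates $\|f\|_X$ on a large grid and checks that $f(x)/(1 + |x|)$ approaches finite limits at $\pm\infty$, which is the membership condition for $X$.

import numpy as np

def weighted(f, x):
    # f(x) / (1 + |x|), the quantity whose supremum defines the X-norm.
    return f(x) / (1.0 + np.abs(x))

def x_norm(f, x_max=1.0e4, num=200001):
    # Grid approximation of ||f||_X = sup_x |f(x)| / (1 + |x|).
    x = np.linspace(-x_max, x_max, num)
    return np.max(np.abs(weighted(f, x)))

f = lambda x: np.sqrt(1.0 + x ** 2)   # grows linearly at both ends, so f belongs to X

print(x_norm(f))                            # 1.0, attained at x = 0
print(weighted(f, np.array([1e6, -1e6])))   # both close to 1: the limits at +/- infinity exist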
Note that any element in $X$, divided by $1 + |\cdot|$, extends to a continuous function over $[-\infty, \infty]$. Our main result in this paper is as follows:

Theorem 1. The linear subspace $\mathcal{F}$ is dense in $X$.

Our main result claims that any function in $X$ is close to a linear function both at $+\infty$ and at $-\infty$; near the origin, $\mathcal{F}$ approximates any continuous function.

Before we conclude this section, we offer some words on existing results. See [6] for an approximation result over the real line. Other attempts have been made to understand neural networks by means of the Radon transform [7] or by considering other topologies [5, 8].

2. Proof of the Main Theorem

Definition 2. We define a linear operator $T \colon X \to C([-\infty, \infty])$ by $Tf(x) = \dfrac{f(x)}{1 + |x|}$ for $f \in X$ and $x \in \mathbb{R}$.

Lemma 3. The operator $T$ is an isomorphism from $X$ onto $C([-\infty, \infty])$.

A tacit understanding here is that we extend $Tf$, which is initially defined over $\mathbb{R}$, continuously to $[-\infty, \infty]$.

Thus, any continuous linear functional on $X$ is realized by a Borel measure over $[-\infty, \infty]$.
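The following sketch mirrors the operator $T$ numerically (the formula $Tf = f/(1 + |\cdot|)$ is the one from Definition 2; the sample function and evaluation points are illustrative): it applies $T$, inverts it by multiplying the weight back, and reads off the boundary values of $Tf$ at $\pm\infty$ from large $|x|$.

import numpy as np

def T(f):
    # The operator of Definition 2: (Tf)(x) = f(x) / (1 + |x|).
    return lambda x: f(x) / (1.0 + np.abs(x))

def T_inv(g):
    # The inverse operator: multiply back by the weight 1 + |x|.
    return lambda x: g(x) * (1.0 + np.abs(x))

f = lambda x: np.sqrt(1.0 + x ** 2) + np.maximum(x, 0.0)   # an illustrative element of X

x = np.linspace(-1.0e4, 1.0e4, 200001)
g = T(f)

print(np.max(np.abs(g(x))))                 # sup-norm of Tf on the grid, approximately ||f||_X
print(np.max(np.abs(T_inv(g)(x) - f(x))))   # 0.0 up to rounding: T_inv undoes T
print(g(np.array([1e8, -1e8])))             # about 2 and 1, the boundary values of Tf at +/- infinity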

Our theorem recaptures the case where the underlying domain is bounded. Indeed, if the domain is contained in $[-R, R]$ for some $R > 0$, then we have
$$\sup_{x \in [-R, R]} |f(x)| \le (1 + R)\,\|f\|_X,$$
so that convergence in $X$ implies uniform convergence on $[-R, R]$; this recovers the results by Cybenko [3] and Funahashi [4].

Now we start the proof of Theorem 1. As Cybenko did in [3], take any Borel measure $\mu$ over $[-\infty, \infty]$ such that the functional $f \mapsto \int_{[-\infty, \infty]} Tf \, d\mu$ annihilates $\mathcal{F}$. We will show that $\mu = 0$. Once this is proved, we conclude from the Riesz representation theorem (via Lemma 3) that the only continuous linear functional that vanishes on $\mathcal{F}$ is the zero functional. Using the Hahn-Banach theorem, we then see that $\mathcal{F}$ is dense in $X$.

Remark that, whenever $a < m < b$ satisfy $a + b = 2m$, the hat function
$$h_{a, m, b} = \mathrm{ReLU}(\cdot - a) - 2\,\mathrm{ReLU}(\cdot - m) + \mathrm{ReLU}(\cdot - b)$$
belongs to $\mathcal{F}$, is continuous and piecewise linear, and is supported in $[a, b]$; linear combinations of such hat functions on a sufficiently fine uniform grid uniformly approximate any compactly supported continuous function. Moreover, $\|f\|_X \le \sup_{x \in \mathbb{R}} |f(x)|$ for every $f \in X$. Thus, any element in $C_{\mathrm{c}}(\mathbb{R})$, the space of compactly supported continuous functions on $\mathbb{R}$, can be approximated by a function in $\mathcal{F}$ in the $\|\cdot\|_X$-norm. Since $\mu$ annihilates $\mathcal{F}$, it follows that $\mu$ is not supported on $\mathbb{R}$; or equivalently, $\mu$ is supported on $\{-\infty, +\infty\}$. It remains to show that $\mu(\{+\infty\}) = \mu(\{-\infty\}) = 0$.
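A quick numerical check of the hat-function observation used in this step (the identity itself is elementary; the breakpoints $a = -1$, $m = 0$, $b = 1$ are illustrative): the combination $\mathrm{ReLU}(x - a) - 2\,\mathrm{ReLU}(x - m) + \mathrm{ReLU}(x - b)$ with $a + b = 2m$ vanishes outside $[a, b]$, so it is a compactly supported element of $\mathcal{F}$.

import numpy as np

def relu(t):
    return np.maximum(t, 0.0)

def hat(x, a, m, b):
    # h_{a,m,b}(x) = ReLU(x - a) - 2 ReLU(x - m) + ReLU(x - b), with a < m < b and a + b = 2 m.
    return relu(x - a) - 2.0 * relu(x - m) + relu(x - b)

a, m, b = -1.0, 0.0, 1.0
x = np.linspace(-10.0, 10.0, 2001)
h = hat(x, a, m, b)

print(np.max(np.abs(h[(x <= a) | (x >= b)])))   # 0.0: the function vanishes outside [a, b]
print(h[np.argmin(np.abs(x - m))])              # peak value m - a = 1.0, attained at x = m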

To this end, consider $g_+ = \mathrm{ReLU} \in \mathcal{F}$. Remark that
$$\lim_{x \to +\infty} \frac{\mathrm{ReLU}(x)}{1 + |x|} = 1, \qquad \lim_{x \to -\infty} \frac{\mathrm{ReLU}(x)}{1 + |x|} = 0,$$
so that $Tg_+$ takes the value $1$ at $+\infty$ and $0$ at $-\infty$. Since $\mu$ is supported on $\{-\infty, +\infty\}$ and annihilates $g_+$, testing the condition on $g_+$ yields $\mu(\{+\infty\}) = 0$. Likewise, if we test the condition on $g_- = \mathrm{ReLU}(-\,\cdot) \in \mathcal{F}$, we obtain $\mu(\{-\infty\}) = 0$. Hence $\mu = 0$.

Thus, we conclude that $\mathcal{F}$ is dense in $X$.
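Finally, a numerical illustration of the two limits used in the last step of the proof ($g_+ = \mathrm{ReLU}$ and $g_-(x) = \mathrm{ReLU}(-x)$ are the functions named above; the evaluation points are arbitrary large values): $Tg_+$ tends to $1$ at $+\infty$ and vanishes on the negative axis, and symmetrically for $Tg_-$.

import numpy as np

relu = lambda t: np.maximum(t, 0.0)
g_plus = lambda x: relu(x)       # g_+(x) = ReLU(x)
g_minus = lambda x: relu(-x)     # g_-(x) = ReLU(-x)
T = lambda f: (lambda x: f(x) / (1.0 + np.abs(x)))

x_large = np.array([1e2, 1e4, 1e6, 1e8])

print(T(g_plus)(x_large))     # tends to 1 as x -> +infinity
print(T(g_plus)(-x_large))    # identically 0 on the negative axis
print(T(g_minus)(-x_large))   # tends to 1 as x -> -infinity
print(T(g_minus)(x_large))    # identically 0 on the positive axis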

Remark 4. The set consisting of the hat functions $h_{a, m, b}$ together with $g_+$ and $g_-$ spans a dense subspace in $X$, where $g_+$ and $g_-$ are the functions given in the above proof.

Data Availability

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Authors’ Contributions

The four authors contributed equally to this paper. All of them read the whole manuscript and approved the content of the paper.

Acknowledgments

This work was supported by a JST CREST Grant (number JPMJCR1913, Japan). This work was also supported by the RIKEN Junior Research Associate Program. The second author was supported by a Grant-in-Aid for Young Scientists Research (no. 19K14581), Japan Society for the Promotion of Science. The fourth author was supported by a Grant-in-Aid for Scientific Research (C) (19K03546), Japan Society for the Promotion of Science.