Complexity

Volume 2018, Article ID 1947250, 8 pages

https://doi.org/10.1155/2018/1947250

## The Spiral Discovery Network as an Automated General-Purpose Optimization Tool

Department of Informatics, Széchenyi István University, Győr, Hungary

Correspondence should be addressed to Adam B. Csapo; csapo.adam@sze.hu

Received 29 September 2017; Accepted 22 January 2018; Published 12 March 2018

Academic Editor: Kevin Wong

Copyright © 2018 Adam B. Csapo. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

The Spiral Discovery Method (SDM) was originally proposed as a cognitive artifact for dealing with black-box models that are dependent on multiple inputs with nonlinear and/or multiplicative interaction effects. Besides directly helping to identify functional patterns in such systems, SDM also simplifies their control through its characteristic spiral structure. In this paper, a neural network-based formulation of SDM is proposed together with a set of automatic update rules that makes it suitable for both semiautomated and automated forms of optimization. The behavior of the generalized SDM model, referred to as the Spiral Discovery Network (SDN), and its applicability to nondifferentiable nonconvex optimization problems are elucidated through simulation. Based on the simulation, the case is made that its applicability would be worth investigating in all areas where the default approach of gradient-based backpropagation is used today.

#### 1. Introduction

The question of how to gain an understanding of the operation of a system arises naturally in a wide range of application areas. However, this question is not always easy to answer, in part because different use cases favor different approaches. While a set of closed formulae might be useful when it comes to predicting exactly how the system will operate under specific conditions, they may be difficult to formulate when the conditions themselves and/or their effects are hard to characterize. In such cases, black-box identification and heuristic modelling approaches are often used.

The neural network presented in this paper, referred to as the Spiral Discovery Network (SDN), is a generalized version of the Spiral Discovery Method (SDM), which is a semiautomated cognitive artifact [1, 2]. SDM originally served the purpose of helping users to discover systematic relationships between multiple inputs to a system and the system’s output behavior, even when the inputs have nonlinear effects and multiplicative cross-effects on the output. The goal in generalizing the SDM model is to extend its applicability to automated settings in which neural networks (or other parametric black-box models) tune their behavior based on a set of functional constraints, such as requirements on the structure of their output or other external error feedback signals.

Through the formulation proposed in this paper, it turns out that SDM is applicable whenever a data-driven approach is available to the identification of a system and whenever the effects of various changes in its inputs can be evaluated in a reasonable amount of time. When the evaluations are performed by humans, SDM shows motivations and characteristics similar to those of the paradigm of interactive evolutionary computation [3, 4]; however, it shows differences in terms of the logic through which it helps to discover parametric spaces. Its extended version, SDN, is also more generally applicable by allowing for automated evaluations. As discussed in the conclusions of the paper, SDN is noteworthy in that it does not rely on gradient information, a feature that can be seen to reduce the complexity of the required computations, as well as being potentially helpful in cases where the performance of gradient-based solutions is far from optimal (for a detailed discussion on such cases, the reader is referred to [5]).

The paper is structured as follows. Section 2 provides a short overview of the literature on nonconvex optimization in order to position the relevance of this work with respect to earlier results. Section 3 then briefly reviews the background of the original Spiral Discovery Method (SDM). Section 4 introduces the tensor-algebra-based numerical structures behind the original SDM formulation. In Section 5, the neural network-based Spiral Discovery Network (SDN) is introduced. A simulation example is provided in Section 6 in order to demonstrate the viability of the model in handling nonconvex and nondifferentiable optimization problems. Finally, Section 7 concludes the paper.

#### 2. Historical Overview

Nonconvex optimization is a broad field of mathematics that finds many applications in engineering tasks where the goal is to find sufficiently good solutions on high-dimensional parametric manifolds. One of the most relevant examples today is finding useful architectures for (deep) neural networks or other kinds of graphical models, as well as finding the right set of parameters with which to operate them. The common approach in solving such problems is to iteratively refine a candidate solution in a way that incrementally improves upon it in terms of a globally defined loss function: this is known as gradient descent [6].

The general idea of gradient descent can be highly successful on parametric landscapes that are associated with a clearly defined cost function and contain no more than a small number of local minima in terms of that function. However, as soon as the value of a cost function becomes difficult to interpret or the cost function becomes so intractable that it is computationally difficult to determine its gradients and/or it produces an intractably large number of local minima, the naive solution of gradient-based iterative optimization often starts to break down.

The problem of dealing with local minima can be addressed to some degree by finding good trade-offs between exploration and exploitation, that is, by modifying the gradient descent approach slightly to counteract situations where the optimization process might slow down or stop. This approach is reflected in a host of existing solutions. One fruitful idea was to experiment with the scaling factor of the gradient, for example, by making it adaptive to changes in sign via the concept of “momentum” [7–9] or by making it specific to the different dimensions in the parameter space [10, 11]. Other ideas include normalizing inputs across layers and batches (specifically in training neural network models) [12] or simply adding noise to the gradients [13].
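As a rough illustration of the momentum idea mentioned above, the following minimal sketch applies gradient descent with an optional momentum term to a one-dimensional loss. The function names, the quadratic test loss, and the step-size settings are assumptions chosen for illustration; they are not taken from the cited works.

```python
from typing import Callable


def gradient_descent(
    grad: Callable[[float], float],
    x0: float,
    learning_rate: float = 0.1,
    momentum: float = 0.0,
    steps: int = 500,
) -> float:
    """Minimize a 1-D function given its gradient, with optional momentum."""
    x = x0
    velocity = 0.0
    for _ in range(steps):
        # The velocity accumulates past gradients; momentum=0 gives plain descent.
        velocity = momentum * velocity - learning_rate * grad(x)
        x += velocity
    return x


def quadratic_grad(x: float) -> float:
    """Gradient of the illustrative (assumed) loss f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)


plain = gradient_descent(quadratic_grad, x0=0.0)
heavy_ball = gradient_descent(quadratic_grad, x0=0.0, momentum=0.9)
```

On this convex toy loss both variants approach the minimizer at 3. On nonconvex losses, the outcome depends on the starting point and the chosen settings, which is the difficulty the paragraph above describes.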

The above solutions notwithstanding, the general idea of modifying a candidate solution in the direction of the negative gradient of a loss function has largely remained unchallenged. Only recently have remarks by G. Hinton and other highly regarded researchers been widely publicized, suggesting that gradient descent, at least in its backpropagation-based form, may not prove to be the ultimate solution for training neural networks (see, e.g., the article entitled “Why We Should Be Deeply Suspicious of BackPropagation” by C. E. Perez on https://medium.com/intuitionmachine/the-deeply-suspicious-nature-of-backpropagation-9bed5e2b085e).

In this paper, the earlier idea of the Spiral Discovery Method is extended to the domain of automatic training in neural networks through a neural architecture. Instead of relying on gradients to update its search location, the method follows a hierarchical hyperspiral structure within the parametric space, thus gaining insight into search directions that may be fruitful.

#### 3. Original Problem Formulation Behind SDM

In this section, we consider a generic formulation of the class of problems to which the original Spiral Discovery Method (SDM) can be applied. To this end, we will make use of the following concepts and notations:

(i) A vector of *generation parameters*
(ii) A perceptually accessible *output*
(iii) A *system transfer function*, which evaluates generation parameter vectors to produce perceptually accessible outputs
(iv) An *evaluation function*, which associates perceptually accessible outputs with a real number referred to as the *perceptual value* of a given output
(v) A set referred to as the *data set*, which contains tuples of generation parameter vectors and perceptual values
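The concepts listed above can be sketched in code as follows. This is only a toy illustration: the sine-based transfer function, the RMS-based evaluation function, and the parameter ranges are hypothetical stand-ins, and in the original setting the evaluation would typically be performed by a human rather than computed.

```python
import math
import random
from typing import List, Tuple

Parameters = List[float]                   # vector of generation parameters
DataSet = List[Tuple[Parameters, float]]   # (parameters, perceptual value) tuples


def transfer(params: Parameters) -> List[float]:
    """Hypothetical system transfer function: parameters -> output signal."""
    amplitude, frequency = params
    return [amplitude * math.sin(frequency * t / 10.0) for t in range(50)]


def evaluate(output: List[float]) -> float:
    """Hypothetical evaluation function: output -> perceptual value (RMS here)."""
    return math.sqrt(sum(v * v for v in output) / len(output))


def build_data_set(n_samples: int, seed: int = 0) -> DataSet:
    """Sample parameter vectors and pair each with its perceptual value."""
    rng = random.Random(seed)
    data: DataSet = []
    for _ in range(n_samples):
        params = [rng.uniform(0.0, 1.0), rng.uniform(0.5, 5.0)]
        data.append((params, evaluate(transfer(params))))
    return data


data_set = build_data_set(5)
```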

In the original problem formulation, the goal is to find a set of generation parameter vectors that are suitable for the generation of a controlled set of outputs, controlled, that is, from the perspective of the perceptually driven evaluation function. Most often, the problem presents itself in such a form that a user is given a target perceptual value, and the goal is to find a generation parameter vector suitable for generating an output that yields this target as its perceptual value. In general, solving this problem amounts to more than just inverting the system transfer function (if such an inversion were even possible to begin with), as the relationship between the system output and its perceptual value, which is usually much too complex to be formulated analytically, must also be taken into account.
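To make the inverse nature of the problem concrete, the sketch below looks up the data-set entry whose perceptual value lies closest to a given target. This is a deliberately naive baseline under assumed names and toy data, not the SDM procedure itself; it only shows what "finding a generation parameter vector for a perceptual value" means when a data set is already available.

```python
from typing import List, Tuple

Parameters = List[float]
DataSet = List[Tuple[Parameters, float]]


def closest_parameters(
    data_set: DataSet, target_value: float
) -> Tuple[Parameters, float]:
    """Return the (parameters, perceptual value) tuple nearest to the target."""
    if not data_set:
        raise ValueError("data set is empty")
    return min(data_set, key=lambda entry: abs(entry[1] - target_value))


# Toy data set (assumed values): parameter vectors with perceptual values.
data_set = [([0.1, 2.0], 0.2), ([0.5, 1.0], 0.6), ([0.9, 3.0], 0.9)]
params, value = closest_parameters(data_set, target_value=0.65)
```

A lookup of this kind can only return parameter vectors that have already been evaluated, which is why a method for systematically exploring the parameter space is needed.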

Application areas in which the above formulation is of interest include the following:

(i) Tuning a set of parameters to a uni- or multimodal synthesis algorithm for perceptual continuity: for example, in a virtual reality with object-to-sound and object-to-vibration mappings, given a set of parameters used to generate audio signals and vibration patterns for spherical and block-like objects, the goal might be to find an appropriate set of generation parameters for certain kinds of polyhedra, conceptually situated “somewhere between” spheres and blocks.
(ii) Controlling inputs to complex black-box models based on derived quantifications of success: for example, inputs to a multispeaker system or a distributed heating system in a large auditorium might be fine-tuned in order to accommodate extrinsic requirements of comfort and cost-effectiveness.

The overall characteristic of the problem formulation is that it encompasses problems where a set of parameters can be used to control a model, usually a black-box model, whose functionality can best be evaluated indirectly through effects that are not well understood, for example, perceptual effects, qualitative measures such as comfort, or aggregated measures such as cost-effectiveness.

It is clear that such a formulation can easily be generalized to cases where the evaluation is performed not by humans but by any kind of automatic process extrinsic to the system. Such processes might still involve a weaker link to human perception or, more generally, to qualitative cognitive measures but would nevertheless be directly or indirectly measurable and interpretable.

#### 4. Tensor Algebraic Formulation of the Spiral Discovery Method

The original formulation of SDM is in a tensor algebraic form, shown in Figure 1. It is based on the discretization of a hypothetical function that maps vectors of perceptual values to generation parameter vectors. In most cases, this function cannot be expressed analytically and might even differ depending on various circumstances, such as the user performing the evaluation. At the same time, a discretized form of the function can often be sampled through experiments (this idea is inspired by the Tensor Product model [14–16]). The discretization is stored in a tensor such that all dimensions, save for the last one, correspond to discrete gradations along perceptual scales (e.g., “roughness,” “softness,” “degree of comfort,” or “cost-effectiveness”), while the last dimension stores the generation parameter vectors corresponding to the perceptual configurations.
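The layout of such a tensor can be sketched with nested lists, one axis per perceptual scale and a final axis holding the generation parameter vector. The scale names, gradation values, and the `params_for` function below are hypothetical placeholders; in practice each entry would be obtained through experiments rather than computed from a formula.

```python
from typing import Callable, List, Sequence, Tuple


def build_tensor(
    scales: Sequence[Sequence[float]],
    params_for: Callable[[Tuple[float, ...]], List[float]],
):
    """Nested lists: one axis per perceptual scale, last axis = parameter vector."""

    def build(prefix: Tuple[float, ...], remaining: List[Sequence[float]]):
        if not remaining:
            # Innermost level: the generation parameter vector itself.
            return list(params_for(prefix))
        return [build(prefix + (g,), remaining[1:]) for g in remaining[0]]

    return build((), list(scales))


roughness_levels = [0.0, 0.5, 1.0]   # assumed gradations on one perceptual scale
softness_levels = [0.0, 1.0]         # assumed gradations on a second scale


def params_for(config: Tuple[float, ...]) -> List[float]:
    """Hypothetical stand-in for experimentally sampled parameter vectors."""
    roughness, softness = config
    return [2.0 * roughness, softness + 1.0, roughness * softness]


tensor = build_tensor([roughness_levels, softness_levels], params_for)
vector = tensor[2][1]  # parameters for roughness level 1.0, softness level 1.0
```

Here `tensor` has three gradations along the first axis, two along the second, and a three-element parameter vector along the last, mirroring the structure described above.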