Computational Intelligence and Neuroscience

Volume 2019, Article ID 8212867, 14 pages

https://doi.org/10.1155/2019/8212867

## A Novel Hardware Systolic Architecture of a Self-Organizing Map Neural Network

^{1}University of Sousse, Higher Institute of Applied Sciences and Technology of Sousse, Sousse, Tunisia

^{2}University of Monastir, LR12ES06-Laboratory of Technology and Medical Imaging, Monastir, Tunisia

Correspondence should be addressed to Khaled Ben Khalifa; khaled.benkhalifa@issatso.rnu.tn

Received 23 November 2018; Revised 1 February 2019; Accepted 5 March 2019; Published 1 April 2019

Academic Editor: Cornelio Yáñez-Márquez

Copyright © 2019 Khaled Ben Khalifa et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

In this article, we propose a new modular architecture for a self-organizing map (SOM) neural network. The proposed approach, called systolic-SOM (SSOM), is based on a generic model inspired by systolic movement and formed by two levels of nested parallelism, of neurons and of connections. This solution provides a distributed set of independent computations between the processing units, called neuroprocessors (NPs), which define the SSOM architecture. The NP modules have an innovative architecture compared to those proposed in the literature: each NP performs three different tasks without requiring additional external modules. To validate our approach, we evaluate the performance of several SOM network architectures after their implementation on an FPGA. This architecture achieves a performance almost twice as fast as that reported in the recent literature.

#### 1. Introduction

Various hardware implementations of self-organizing map (SOM) neural networks have been presented in the literature. They may be divided into two main categories. First, analog implementations on dedicated integrated circuits have been designed [1–5]. These platforms are technically limited by their low precision, and their performance depends greatly on the technology used. Second, digital implementations on ASIC circuits (neuroprocessors) have also been designed [6–10]; this is now the most widely used category of VLSI for neuromimetic algorithms.

Indeed, all these integration approaches have been widely used because they offer higher accuracy, better repeatability, lower noise sensitivity, better testability, and greater flexibility and compatibility with the other types of neuroprocessors (NPs) constituting the neural network. However, the configuration of these systems is too complex for non-specialist users, and it does not offer reconfigurability to the users.

The above shortcomings of both types of implementation devices may be avoided thanks to reprogrammable circuits, such as field-programmable gate arrays (FPGAs). These circuits offer high performance, high speed, and low cost, especially for prototyping applications and high-capacity programmable logic solutions, and they also enable low-power designs. The availability of on-chip resources allows the designer to devise a parallel SOM architecture, and configurable hardware appears well adapted to obtaining efficient and flexible neural network implementations. Several SOM implementations on FPGAs have been proposed [11–18]. Porrmann et al. [11] implemented, on a Virtex FPGA, a reconfigurable SIMD architecture of an SOM network built from processing elements (PEs). The computation was performed fully in parallel between PEs of the same type and the input vector. In addition to the PE modules, the architecture comprised external control blocks and memory to store the weights of the neurons forming the SOM network. In [12], Tamukoh and Sekine put forward a dynamic SOM hardware architecture that used flexible PEs, reconfigurable according to the number of neurons and the size of the input vector. Exploiting the dynamic reconfiguration of FPGA circuits provided more flexibility, but to the detriment of performance. Ramirez-Agundis et al. [13] proposed a massively parallel hardware solution for various neuron counts on the SOM output map (16, 32, 64, 128, and 256 neurons) and evaluated their architectures on a video coding application. Their solution was structured around neural computation modules and a comparator whose resolution depended on the topology of the SOM network to be integrated.
In [14], the authors presented an SOM-network implementation on an FPGA with a new asynchronous and parallel neighbourhood approach, based on the triangular neighbourhood function, used to calculate the distance between the winner neuron and its neighbouring neurons. The proposed architecture was not very efficient because its complex neighbourhood-control module consumed a large number of clock cycles, which slowed down the overall performance of the SOM network. In [15], Kurdthongmee put forward an approach to accelerate the learning phase of an SOM hardware architecture (called K-SOM) by evaluating the mean square error (MSE) after image color quantization. The author used a single 16 × 16 map to evaluate this approach on images varying from 32 × 32 to 512 × 512 pixels. The approach was validated on a Xilinx Virtex-2 FPGA, providing real-time performance for image sizes up to 640 × 480 pixels. The K-SOM was more efficient than other approaches in terms of video frame rate and MSE: about 50% faster in frame rate and 25% lower in MSE. In [16], the same author put forward a method to locate the winning neuron of an SOM network in one clock cycle, based on a memory of neuron indices addressable by distance values. This approach reached a maximal operating frequency of 47 MHz and 22 frames per second (fps). For image compression, the authors in [17] integrated a fully parallel SOM on an FPGA circuit using a shared comparator to exploit the parallelism between the different neuroprocessors. In [18], the authors proposed a scalable and adaptable SOM hardware architecture that permits dynamically modifying the SOM network topology simply by reconfiguring each neuron.
This scalability was obtained by separating the computational layer, which contains the neurons, from the communication layer; data exchange between neurons is provided by routing modules based on the Network-on-Chip (NoC) technique. Despite its modularity and flexibility, this architecture performed poorly in terms of both hardware resources and execution time.

Most of these approaches depend on the SOM-architecture configuration, such as the number of input vector elements, output layer size, time constraints, and memory requirements. Almost all these parameters are specified during the design phase of the SOM.

Also, the internal architecture of the computation units (NPs) presented in the literature requires additional external modules for (i) locating the neuron of the SOM output layer closest to the input vector *X* (called the “winning neuron”), using a shared comparator that consumes many blocks and whose complexity increases quadratically with the number of neurons, (ii) adapting the weights of the winning neuron and its neighbours, and (iii) globally controlling the SOM network. As a result, these approaches are impractical for integrating large networks.

To overcome these limits, we propose a new neuroprocessor architecture performing the three tasks specific to SOM network operation: (i) calculating the Euclidean distance, (ii) extracting the minimal distance, and (iii) updating the weights of the winning neuron as well as of its neighbours. This solution reduces the time and the number of connections between the various SOM modules by eliminating the shared comparator and replacing it with a local comparator in each neuroprocessor.

In order to make our architecture more flexible and efficient in terms of clock cycles, we adopt a systolic architecture, a concept in which a single, extensively pipelined data path traverses all neural PEs. This approach reduces the number of communication paths in the network and, at the same time, the number of cycles required to classify a data vector, allowing a very high clock frequency. For example, Ienne et al. [19] implemented two architectures of the SOM algorithm using two-dimensional systolic approaches, validated on the MANTRA I platform; the achieved performance was about 13.9 MCUPS in the learning phase. Another systolic implementation of a 1D-SOM on an FPGA was proposed in [20], reaching 3208 MCUPS.

Consequently, this solution will make our architectures generic and flexible during the design phase, as it allows checking implementation constraints (embedded memory size, arithmetic operator resolution, and power consumption) and rapidly adapting to parameters related to the topology of the integrated SOM neural network. The main contributions of this work are as follows:

(i) Implementing a new architecture with systolic interconnections, based on the use of configurable neuroprocessors, each of which provides neural calculation and local comparison

(ii) Proposing a new local neighbourhood function for each neuroprocessor, based on the shift principle, while taking into account the neuron position with respect to the winning neuron and the number of epochs used during learning

(iii) Proposing a pipelined scheduling mode for searching the minimum distance and the identifier of the winning neuron in a systolic way
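The shift principle of contribution (ii) can be illustrated with a small sketch. The exact shift schedule used by the SSOM neuroprocessors is not specified here, so the mapping from grid distance and epoch number to a shift amount is an assumption; the point is that scaling the weight correction by a power of two requires only a shifter and an adder, not a multiplier.

```python
# Hypothetical sketch of a shift-based neighbourhood update.
# Instead of multiplying the correction (x - w) by a fractional
# coefficient, it is right-shifted by s bits, i.e. scaled by 2**-s,
# where s grows with the grid distance to the winner and the epoch.

def shift_update(w, x, dist_to_winner, epoch):
    """Update one integer weight component using only shifts and adds."""
    s = dist_to_winner + epoch          # assumed shift schedule
    correction = (x - w) >> s           # multiply by 2**-s, no multiplier
    return w + correction

# Example: the winner itself (distance 0) at epoch 1 moves halfway
# toward the input component x.
w_new = shift_update(64, 128, 0, 1)     # (128 - 64) >> 1 = 32, so 96
```

The coupling strength thus decays geometrically with distance and epoch, which mimics a shrinking neighbourhood function while staying hardware-friendly.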

This article is organized as follows. In Section 2, we present the SOM Kohonen model with emphasis on its algorithmic aspect. In Section 3, we highlight the problems solved by our approach. In Section 4, we detail the proposed parallel architecture and the formalism adopted to estimate the execution time. Section 5 presents the internal architecture of the nodes for the SOM model implementation. The obtained results are provided in Section 6. In Section 7, we present a color quantization and image compression application to validate the SSOM architecture.

#### 2. The Self-Organizing Map (SOM)

SOMs are artificial neural networks characterized by their unsupervised learning, as defined in [21]. Technically, these neural models perform a “vector quantization” of the data space by discretizing it into zones, each represented by a significant point called the reference vector or codebook vector.

Architecturally, SOMs are made up of a grid (usually one- or two-dimensional). At each node of this grid, there is a “neuron”. Each neuron is associated with a reference vector responsible for an area in the data space (also called the input space).

In an SOM, the reference vectors provide a discrete representation of the input space. They are positioned in such a way that they preserve the topological shape of the input space of dimension Dim_{x}. By keeping neighbourhood relationships in the grid of *P* ∗ *Q* neurons, where *P* and *Q* are, respectively, the numbers of columns and rows, they allow easy indexation via coordinates in the grid. This is useful in various areas, such as texture classification, interpolation between data, visualization of multidimensional data, etc.

The SOM learning algorithm is competitive and runs in two steps: selecting the winning neuron and then updating the weights of the winning neuron and its neighbours.

For each input vector *X*(*t*) selected at time *t*, the Euclidean distance to all weight vectors *W*_{p,q}(*t*), where (*p*, *q*) are the coordinates of each neuron, is computed as follows:

$$d_{p,q}(t) = \left\lVert X(t) - W_{p,q}(t) \right\rVert = \sqrt{\sum_{i=1}^{\mathrm{Dim}_x} \left( x_i(t) - w_{p,q,i}(t) \right)^2}.$$

Subsequently, the neuron with the smallest distance, which is called the winner neuron *c*, is determined as follows:

$$c = \arg\min_{(p,q)} d_{p,q}(t).$$

After selecting the winner node *c*, the weight vectors associated to this node and its neighbours, located in a defined area around the node *c*, are adjusted in such a way that their profile is close to the input data. This adjustment of weights, which characterizes the unsupervised learning of the model, can be described by

$$W_{p,q}(t+1) = W_{p,q}(t) + h_{c,(p,q)}(t)\left[ X(t) - W_{p,q}(t) \right],$$

where *h*_{c,(p,q)}(*t*) is the neighbourhood function whose value represents the strength of the coupling between two nodes during the learning process. This function depends on the position of the neuron of coordinates (*p*, *q*) with respect to the winner’s one and on the epoch number representing the number of learning iterations.
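As a minimal software sketch of the two learning steps above (distance computation, winner selection, and neighbourhood update), the following assumes a Gaussian neighbourhood function and a fixed learning rate; `som_step`, `lr`, and `sigma` are illustrative names, not the paper's hardware parameters.

```python
import math

def som_step(weights, x, lr=0.5, sigma=1.0):
    """One SOM learning step. weights maps (p, q) -> weight vector."""
    # 1) Euclidean distance from the input x to every weight vector
    dist = {pq: math.dist(w, x) for pq, w in weights.items()}
    # 2) the winner c is the neuron with the smallest distance
    c = min(dist, key=dist.get)
    # 3) update the winner and its neighbours; the coupling strength
    #    decays with the grid distance to c (Gaussian neighbourhood)
    for (p, q), w in weights.items():
        g = math.exp(-((p - c[0])**2 + (q - c[1])**2) / (2 * sigma**2))
        weights[(p, q)] = [wi + lr * g * (xi - wi) for wi, xi in zip(w, x)]
    return c
```

Each call moves the winner (for which the neighbourhood term equals 1) a fraction `lr` of the way toward the input, and its neighbours by progressively smaller fractions.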

#### 3. SOM Based on Systolic Architecture

The proposed architecture is formed of two parts. The first part concerns the computation of the Euclidean distances between the input vector and all the neurons forming the SOM output layer. It is worth noting that all distances are computed in parallel for all neurons.

The second part concerns the extraction of the minimum distance as well as the identifier corresponding to the winning neuron. In this part, we will adopt a systolic formalism based on the pipeline transmission of distances and identifiers between the neighbouring neurons.

This architecture, called systolic-SOM (SSOM), is formed by a set of identical nodes placed in a two-dimensional space. The pattern of data exchange between nodes supports the entire neural algorithm in both its decision and learning phases. In the decision phase, the intermediate comparison results already processed in each node are propagated in parallel to all the directly neighbouring nodes in order to extract the identifier of the global winner node (Figure 1). In the following section, we present the formalism adopted for the implementation of the various generic SOM architectures. This formalism consists in representing our neural network as a data-flow graph composed of nodes and arcs.
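The systolic winner search can be illustrated in software as follows. Each node holds a (distance, identifier) pair and, at every cycle, keeps the minimum of its own pair and those of its direct grid neighbours; after enough cycles the global winner has propagated to every node. The grid shape, the lexicographic tie-breaking on (distance, identifier), and the `P + Q - 2` cycle count are modelling assumptions, not the exact SSOM timing.

```python
def systolic_min(dist):
    """dist: P x Q list of distances. Simulates the parallel, cycle-by-
    cycle propagation of the minimum (distance, id) pair between
    directly neighbouring nodes; returns the final state of the grid."""
    P, Q = len(dist), len(dist[0])
    state = [[(dist[p][q], (p, q)) for q in range(Q)] for p in range(P)]
    for _ in range(P + Q - 2):              # worst-case propagation time
        nxt = [[None] * Q for _ in range(P)]
        for p in range(P):
            for q in range(Q):
                cand = [state[p][q]]
                # gather the pairs held by the direct (4-connected) neighbours
                for dp, dq in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    if 0 <= p + dp < P and 0 <= q + dq < Q:
                        cand.append(state[p + dp][q + dq])
                nxt[p][q] = min(cand)       # local comparator per node
        state = nxt
    return state                            # every cell holds the winner
```

Because each node only compares against its direct neighbours, no shared comparator is needed, at the cost of a fixed number of pipeline cycles proportional to the grid diameter.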