Abstract

An improved self-organizing map (SOM), the parameterless-growing-SOM (PL-G-SOM), is proposed in this paper. To overcome problems of the traditional SOM (Kohonen, 1982), various structure-growing SOMs and parameter-adjusting SOMs have been invented, but usually separately. Here, we combine the idea of growing SOMs (Bauer and Villmann, 1997; Dittenbach et al., 2000) and a parameterless SOM (Berglund and Sitte, 2006) into a novel SOM named PL-G-SOM to realize additional learning, optimal neighborhood preservation, and automatic tuning of parameters. The improved SOM is applied to construct a voice instruction learning system for partner robots adopting a simple reinforcement learning algorithm. The user's voice instructions are first classified by the PL-G-SOM; the robot then chooses an expected action according to a stochastic policy. The policy is adjusted by the reward/punishment given by the user of the robot. A feeling map is also designed to express the learning degrees of voice instructions. Learning and additional learning experiments using instructions in multiple languages, including Japanese, English, Chinese, and Malaysian, confirmed the effectiveness of our proposed system.

1. Introduction

Kohonen’s self-organizing map (SOM) is a kind of neural network which orderly maps high-dimensional inputs onto a regular low-dimensional grid by unsupervised learning schemes [1–4]. Because of its simple algorithm and powerful performance, the SOM has been developed and applied widely in the fields of pattern recognition, signal processing, intelligent control, and so on [5–15]. On the website of the SOM library [6], more than 7,000 papers concerned with this technique are collected.

Generally, the SOM algorithm maps an n-dimensional feature vector x = (x_1, x_2, \dots, x_n) in the input space to a unit i in a low-dimensional output space through connection weights w_i = (w_{i1}, w_{i2}, \dots, w_{in}) by a simple winner-takes-all rule using the Euclidean distance; that is, a high-dimensional input is assigned to the most suitable unit c with position r_c, the best-match unit (BMU), on the output map:

c = \arg\min_i \| x(t) - w_i(t) \|.  (1)

For all inputs and initial connections with random values, a competitive learning rule ensures that input data with similar features stay close on the visualized topological output map:

w_i(t+1) = w_i(t) + \alpha(t) h_{ci}(t) [x(t) - w_i(t)],  (2)

where \alpha(t) is the learning rate and h_{ci}(t) is a neighborhood function

h_{ci}(t) = \exp\left( - \| r_i - r_c \|^2 / (2\sigma(t)^2) \right).  (3)

Here, r_i and r_c denote the positions on the output map of an arbitrary unit i and the BMU, respectively, and \sigma(t) is the neighborhood size, in the original SOM a constant set in advance. Obviously, 0 < \alpha(t) < 1, 0 < h_{ci}(t) \le 1, and \sigma(t) > 0.
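As a concrete illustration of (1)-(3), the following minimal Python sketch performs one BMU search and one competitive-learning update; the function names and the toy 5 × 5 map with 20-dimensional inputs are our own illustrative choices, not code from the paper.

```python
import numpy as np

def som_step(weights, positions, x, alpha, sigma):
    """One competitive-learning step of the basic SOM, following (1)-(3).

    weights   : (units, n) connection weights w_i
    positions : (units, 2) grid coordinates r_i of each unit
    x         : (n,) input feature vector
    alpha     : learning rate, 0 < alpha < 1
    sigma     : neighborhood radius, sigma > 0
    """
    # (1) best-match unit: smallest Euclidean distance to the input
    c = np.argmin(np.linalg.norm(weights - x, axis=1))
    # (3) Gaussian neighborhood around the BMU position r_c
    grid_dist2 = np.sum((positions - positions[c]) ** 2, axis=1)
    h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
    # (2) move every unit toward the input, weighted by the neighborhood
    weights += alpha * h[:, None] * (x - weights)
    return c

# toy usage: a 5 x 5 map trained on random 20-dimensional inputs
rng = np.random.default_rng(0)
N, M, n = 5, 5, 20
weights = rng.random((N * M, n))
positions = np.array([[i, j] for i in range(N) for j in range(M)], dtype=float)
for t in range(100):
    som_step(weights, positions, rng.random(n), alpha=0.5, sigma=2.0)
```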

In fact, the size of the output space N × M in the original SOM is fixed in advance, and parameters such as the learning rate and the scale of the neighborhood are often determined empirically. These constraints cause two kinds of problems in technical applications [6–14]: (i) the fixed size of the output map prevents additional learning, because when new feature data are presented, BMUs are difficult to find on the trained output map; (ii) annealing schemes for tuning the learning rate and the neighborhood size are necessary to improve the operation rate of the output map; however, realizing the annealing usually increases the computational load.

Variations of SOM with growing structures have been proposed to solve the first problem [7–10]. The basic idea of these SOMs is to start the output feature map with a small size, for example, 2 units, and then insert rows/columns into the map during training, either where the most visited BMU exists [7, 10] or where the deviation of the distance between the units of the input layer and the output map is largest [8, 9]. We proposed another method to solve the lack of units by using a memory layer to store matured units of the feature map during the training process and to release the matured units to be initialized; that is, the units become available for reuse [12, 13]. When a feature data set is input to the learning system, the process searches for the corresponding BMU on the memory layer first, so the feature map produced by the SOM becomes just an intermediate map; we therefore called it the transient SOM (T-SOM).

To solve the second problem, there have also been various approaches, such as reducing the learning rate (\alpha(t) in (2)) and the neighborhood size (\sigma(t) in (3)) linearly, that is, multiplying by attenuation coefficients, calculating the neighborhood size in the input space, or using Kalman filters to find the BMU on the output space [6]. Berglund and Sitte recently proposed a low-cost parameterless SOM algorithm (PLSOM) which uses only the fitting error between the input and the map to decide the annealing scheme [11].

In this paper, we combine the idea of the growing SOM algorithm and the method of PLSOM to construct a novel SOM named parameterless-growing-SOM (PL-G-SOM) to tackle both problems of the SOM described above. This new PL-G-SOM grows its structure to adapt to the input data and anneals its parameters automatically to realize sensitive clustering on the output space. We also adopt PL-G-SOM into a voice instruction learning system, where it serves as an automatic classifier of input features, just as T-SOM was applied to a hand image instruction learning system [12, 13] and a voice instruction learning system [14].

The rest of this paper is organized as follows. Section 2 presents the details of PL-G-SOM. Section 3 shows a voice instruction learning system using PL-G-SOM. In Section 4, instruction learning experiments with 4 languages are reported to confirm the learning and additional learning abilities of the proposed system. Section 5 concludes the paper.

2. A New SOM: PL-G-SOM

2.1. Growing of Output Map

To construct a growing SOM which is more sensitive to larger categories of input data than a SOM whose size is fixed in advance, different criteria have been proposed. Fritzke chose to insert a new row/column adjacent to the most often visited BMU in his Growing Grid [7]. The reason for this criterion of map enlargement is that the earlier map may be considered a coarse one, and likely BMUs need their resolution raised to deal with changes of the input. Meanwhile, Bauer and Villmann suggested adding units in the direction, or even in a new dimension, of the largest error between the input data and the output map in their GSOM [8, 9]. However, the process of enlarging the output map is similar in Growing Grid and GSOM, and it is shown in Figure 1. In fact, when a new row/column needs to be inserted next to a BMU c, for example, a unit r in the middle of c and f, the weights of the connections between the input and the new unit take the average values of those of c and f:

w_r = (w_c + w_f) / 2,  (4)

and so do those of r's neighbors along the inserted row/column:

w_{r_k} = (w_{c_k} + w_{f_k}) / 2,  k = 1, 2, \dots, N \text{ or } M.  (5)

Unit f is the unit with the largest Euclidean distance from the BMU c among the neighbors of c, and after this process the map size changes to (N+1) × M or N × (M+1).
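The insertion step of (4)-(5) can be sketched in Python as follows; the boundary handling and the choice between row and column insertion are simplified assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

def grow_map(weights, bmu):
    """Insert a new row/column next to the BMU, as in Figure 1 and (4)-(5).

    weights : (N, M, n) map of connection weights
    bmu     : (i, j) grid position of the BMU c
    Returns the enlarged (N+1, M, n) or (N, M+1, n) weight array.
    """
    i, j = bmu
    N, M, _ = weights.shape
    # neighbors of c in the four grid directions (clipped at the borders)
    neighbors = [(min(i + 1, N - 1), j), (max(i - 1, 0), j),
                 (i, min(j + 1, M - 1)), (i, max(j - 1, 0))]
    # unit f: the neighbor with the largest Euclidean distance from the BMU
    f = max(neighbors, key=lambda p: np.linalg.norm(weights[p] - weights[bmu]))
    if f[0] != i:                 # f lies in another row -> insert a new row
        k = max(i, f[0])          # new row goes between rows i and f[0]
        new_row = 0.5 * (weights[k - 1] + weights[k])   # (4)-(5): averages
        return np.insert(weights, k, new_row, axis=0)
    else:                         # f lies in another column -> insert a column
        k = max(j, f[1])
        new_col = 0.5 * (weights[:, k - 1] + weights[:, k])
        return np.insert(weights, k, new_col, axis=1)

# usage: a 5 x 5 map grows to 6 x 5 or 5 x 6 around BMU (2, 2)
w = np.random.default_rng(1).random((5, 5, 20))
print(grow_map(w, (2, 2)).shape)
```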

We use the same growing process here; however, a new criterion for choosing the BMU around which to grow is proposed in connection with a reinforcement learning algorithm when the SOM is adopted into a human-machine interaction learning system. The details are given in Section 3.

2.2. Annealing of Parameters

To decide the learning rate and the size of the neighborhood function, we adopt the method of PLSOM proposed by Berglund and Sitte [11]. Both the learning rate and the neighborhood size are calculated from the distance between the input and the BMU:

\epsilon(t) = \| x(t) - w_c(t) \|^2 / \rho(t),  \quad \rho(t) = \max\left( \| x(t) - w_c(t) \|^2, \rho(t-1) \right),  (6)

\sigma(t) = (\sigma_{max} - \sigma_{min}) \epsilon(t) + \sigma_{min},  (7)

where \sigma_{max} and \sigma_{min} are positive parameters; for example, their values may be the size of the map and 1.0, respectively.

The competitive learning rule for the connections between input and output units, that is, (2), is then changed to an online learning algorithm in which the scaling variable \epsilon(t) replaces the annealed learning rate:

w_i(t+1) = w_i(t) + \epsilon(t) h_{ci}(\epsilon(t)) [x(t) - w_i(t)],  \quad h_{ci}(\epsilon(t)) = \exp\left( - \| r_i - r_c \|^2 / \sigma(t)^2 \right).
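Read together, (6), (7), and this online rule amount to the following per-step update, sketched in Python under the assumption that \rho(t) is kept as the running maximum of the squared input-to-BMU error (our reading of PLSOM [11], not the paper's exact code); sigma_max would typically be set to the size of the map and sigma_min to about 1.0.

```python
import numpy as np

def plsom_step(weights, positions, x, state, sigma_min=1.0, sigma_max=5.0):
    """One parameterless (PLSOM-style) update step.

    `state` carries the running normalizer rho(t-1) between calls.
    """
    c = np.argmin(np.linalg.norm(weights - x, axis=1))
    err = np.sum((x - weights[c]) ** 2)
    state["rho"] = max(err, state.get("rho", err))        # rho(t) = max(err, rho(t-1))
    eps = err / state["rho"] if state["rho"] > 0 else 0.0  # (6) scaling variable
    sigma = (sigma_max - sigma_min) * eps + sigma_min      # (7) neighborhood size
    grid_dist2 = np.sum((positions - positions[c]) ** 2, axis=1)
    h = np.exp(-grid_dist2 / (sigma ** 2))
    # eps replaces the hand-tuned learning rate alpha(t) of the basic SOM
    weights += eps * h[:, None] * (x - weights)
    return c, eps
```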

3. A Voice Instruction Learning System Using PL-G-SOM

A voice instruction learning system is supposed to be an internal model of an autonomous robot which at first performs any of its available actions when an external voice signal is presented, and which learns to output the requested actions using the reward or punishment from the instructor. The system thus gives the robot both learning and additional learning abilities. For example, a robot with the voice instruction learning system is able to “understand” a human's instructions in different languages, and a pet robot like “AIBO” [16] can easily get used to a new owner.

3.1. The Structure

To realize the human-machine interaction, an internal model of the autonomous robot is constructed as shown in Figure 2. The structure is similar to a learning system using the transient SOM (T-SOM) proposed in our previous work [12–14]. In [12, 13], a hand image instruction learning system with 5 layers, including Input Layer, Feature Map, Action Map, Feeling Map, and Memory Layer, is composed with the SOM algorithm and reinforcement learning rules. Instructions to the robot are presented by various shapes of the human hand, and the robot categorizes them, that is, image signals in an 80-dimensional space, with the SOM; the instructions are then labeled with a series of autonomous actions according to a stochastic policy. The instructor observes the action of the robot and provides a reward/punishment for the action, so the action policy of the robot can be modified to cooperate with the hand image instructions. For online learning and additional learning, T-SOM adopted a memory layer which stores “matured” BMUs, and input features are matched with units on the memory layer before executing the SOM on the feature map. We also adopted an annealing plan to decide the neighborhood size and the learning rate in T-SOM, and a voice instruction learning system using the improved T-SOM, named PL-T-SOM, was developed in [14]. However, a problem of T-SOM is that its memory layer stores only the values of matured units, without the topology of the feature map. Even if the memory layer could remember the topology of the feature map trained online, a new topology could not be established on it. For this reason, we propose a new voice instruction learning system using the PL-G-SOM given in Section 2 instead of T-SOM.

In Figure 2, the Feature Map is the basic growing SOM, and the sizes of the Action Map and Feeling Map grow with the Feature Map too. In fact, instructions given as voice data are first transformed into feature vectors of the input space (layer), the PL-G-SOM algorithm is then executed on the Feature Map, and the rules of growing given by (4) and (5) (Figure 1) are also applied to enlarge the Action Map and Feeling Map. The Action Map is composed of units which correspond to the units on the Feature Map; that is, each unit on the Action Map represents one kind of feature of the input data. The units on the Action Map are labeled by the reinforcement learning algorithm given in Section 3.2 so as to bind each feature to adaptive actions of the robot. The Feeling Map has the same distribution of units as the Action Map. The action number that comes from the Action Map is furnished with a feeling value which expresses the degree to which the action has been mastered by the robot. The details of the Feeling Map are described in Section 3.3.

3.2. Reinforcement Learning Algorithm

The value of a unit on the Action Map is given by a value function of state and action, that is, (8), where Q(s_i, a_j) is the value of the selected action a_j when the robot is in state s_i, and Q(s_i, a_j) = random numbers initially:

Q(s_i, a_j) \leftarrow Q(s_i, a_j) + r,  (8)

where r is the empirical value of the reward (r > 0) or punishment (r < 0) given by the instructor, for example, a positive constant when the robot acted correctly according to its policy function and a negative constant in the opposite case.
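A hedged sketch of the Q-value table and its reward update follows; the plain additive update and the 25-state, 4-action sizes are illustrative assumptions, not values prescribed by the paper.

```python
import numpy as np

# Q-value table over states (units on the Action Map) and available actions,
# initialized with random numbers as in (8); the sizes here are illustrative.
rng = np.random.default_rng(2)
n_states, n_actions = 25, 4
Q = rng.random((n_states, n_actions))

def update_q(Q, state, action, reward):
    """Add the instructor's reward/punishment to the chosen (state, action) value.

    `reward` is a positive constant for a correct action, negative otherwise.
    """
    Q[state, action] += reward
    return Q
```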

Now suppose that there are n units on the Action Map; that is, n states s_1, s_2, \dots, s_n exist in the environment of the Markov decision process (MDP), and each unit has m actions a_1, a_2, \dots, a_m available for selection. Then a reinforcement learning (RL) algorithm [17] can be used to label the classes of the states, which are the units on the Action Map yielded by the Feature Map. According to (8), a Q-value table can be established as shown in Table 1.

For each state s_i, that is, a presented voice instruction, the robot intends to select a valuable action according to a stochastic action policy given by the Gibbs distribution (Boltzmann distribution):

P(a_j \mid s_i) = \exp\left( Q(s_i, a_j) / T \right) \Big/ \sum_{k=1}^{m} \exp\left( Q(s_i, a_k) / T \right).  (9)

Here, T is a positive parameter named the temperature [17]; a higher T causes an active exploration of actions (each action is selected with a similar probability), and a lower T gives a greedy selection of the action with the higher value.
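A small Python sketch of the Gibbs (Boltzmann) selection in (9) is shown below; subtracting the maximum before exponentiating is a standard numerical-stability trick and is not part of the paper's formulation.

```python
import numpy as np

def select_action(Q, state, temperature, rng=None):
    """Pick an action by the Gibbs (Boltzmann) policy over Q(state, .).

    A high temperature flattens the probabilities (exploration); a low one
    concentrates them on the highest-valued action (greedy exploitation).
    """
    rng = rng or np.random.default_rng()
    prefs = Q[state] / temperature
    prefs = prefs - prefs.max()               # numerical stability only
    probs = np.exp(prefs) / np.exp(prefs).sum()
    action = rng.choice(len(probs), p=probs)
    return action, probs
```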

We propose to use P(a_j | s_i) as the criterion for growing the size of the Feature Map, Action Map, and Feeling Map. In fact, when the robot chooses an action with a high P(a_j | s_i) but the instructor judges that it is wrong, a new row/column is inserted near the unit of s_i, that is, the BMU c. The growing process is described in Section 2.1.

3.3. Feeling Map

To express the degree to which a voice instruction has been learned by the robot, a Feeling Map which has the same number of units as the Action Map is designed (Figure 2). The distance from the input pattern to the BMU of the Feature Map and the reward from the instructor are used to calculate feeling values, which are normalized in [-1.0, 1.0], where a high positive value means happiness, 0.0 is the initial value of each unit, and negative values express sadness. The learning algorithm, which was also used in [12–14], updates the feeling value f_i of unit i on the Feeling Map (zero initially) using C, the number of consecutive rewards or punishments, and d_i, the Euclidean distance (squared error) between the unit on the Feature Map corresponding to i and the input data, together with constant coefficients.
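Since the exact update formula of the Feeling Map is not reproduced above, the following Python sketch only illustrates the stated ingredients under an assumed exponential-decay form; the coefficient names eta and gamma are hypothetical.

```python
import numpy as np

def update_feeling(f, unit, reward_sign, streak, squared_error, eta=0.1, gamma=1.0):
    """Assumed-form Feeling Map update (not the paper's exact formula).

    Consecutive rewards/punishments (streak) push the value up or down,
    a small input-to-BMU error strengthens the change, and the result
    stays clipped to [-1.0, 1.0].
    """
    delta = eta * reward_sign * streak * np.exp(-gamma * squared_error)
    f[unit] = np.clip(f[unit] + delta, -1.0, 1.0)
    return f
```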

4. Experiments

4.1. Descriptions

Learning and additional learning experiments were performed using the system with PL-G-SOM proposed in Section 3 and the system with T-SOM in [12–14].

Four kinds of voice instructions were used in the experiments: sit down, lie down, stand up, and walk. Instructions in Japanese were used to train the system first. Additional learning using voice instructions in other languages was executed after the training in Japanese. Three other languages, English, Chinese, and Malaysian, were used to confirm the additional learning ability of the system. The voices were recorded in a normal room by 3 males who pronounced each instruction 3 times. So there were 3 samples of each instruction for each language, that is, 4 actions in 4 languages with 48 samples in total.

The sound waves were preprocessed by normalization and noise elimination and windowed into 20 intervals to yield 20-dimensional feature vectors of the input space. Figure 4 shows an example of the instruction “sit down” pronounced in Japanese (“Osuwari”), English (“Sit”), Chinese (“Zuoxia”), and Malaysian (“Duduk”). The parameters used in the experiments are shown in Table 2.
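For illustration, a possible preprocessing step is sketched below in Python; the per-window statistic (root-mean-square energy) is our assumption, as the paper only states that the normalized waves are windowed into 20 intervals to give the input features.

```python
import numpy as np

def wave_to_features(wave, n_windows=20):
    """Turn a (noise-reduced) sound wave into a 20-dimensional feature vector
    by splitting it into n_windows intervals and taking the RMS of each."""
    wave = np.asarray(wave, dtype=float)
    wave = wave / (np.max(np.abs(wave)) + 1e-12)       # amplitude normalization
    segments = np.array_split(wave, n_windows)
    return np.array([np.sqrt(np.mean(s ** 2)) for s in segments])
```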

4.2. Results and Analyses

Both T-SOM and PL-G-SOM achieved 100% recognition rates for the 4 actions in different languages after learning and additional learning. However, PL-G-SOM showed faster and better convergence than T-SOM in the Euclidean distance (SE: squared error) between the input and the BMUs (Figure 5). This means that the classification of the input patterns was executed more efficiently by PL-G-SOM. Furthermore, the feeling values, which express the instruction recognition degree, showed even more clearly that correct robot actions corresponding to the voice instructions were acquired more quickly and stably (Figure 6). Figure 7 shows the internal states of the Feature Map (left) and Action Map (right) changing during training. The curves in each unit on the Feature Map in Figure 7 express the values of the connection weights w_i. The numbers with different colors on the Action Map express the different actions which were classified (labeled) by the reinforcement learning process described in Section 3.

Figures 7(a) and 7(b) show the initial states of T-SOM and PL-G-SOM, where random numbers were used. Figures 7(c) and 7(d) are the results of learning using Japanese instructions. Compared with T-SOM, PL-G-SOM was more effective in the topology formation of actions; that is, the action numbers on the Action Map clustered more clearly. After additional learning, that is, using English, Chinese, and Malaysian instructions 300 times each, the size of the Feature Map and Action Map of PL-G-SOM grew from 25 (5 × 5) units to 165 units (Figure 8).

The scaling variable \epsilon(t) used in PL-G-SOM ((6)-(7)) changed with the training, and from Figure 9 one can confirm that \epsilon(t) decreased gradually during the initial learning; however, when a new language was input, the scaling variable suddenly became larger and then repeated its annealing scheme. Figure 10 shows the increase of the number of units on the Memory Layer of T-SOM and the increase of the number of units of PL-G-SOM. Both grew with additional learning; the number of units on the Memory Layer of T-SOM stopped at 33, while 140 units were inserted into each layer of PL-G-SOM. To confirm the robustness of the two learning systems, we also tested noisy samples.

Table 3 shows the recognition rates of the different actions with 10%, 20%, and 30% noise added to the 48 voice samples (i.e., N% of the data in the 20 dimensions were replaced by random numbers). The average rates of successful actions using T-SOM and PL-G-SOM were 48.0% and 86.7%, respectively, over 10 executions. Table 4 shows the recognition rates of the different languages with the respective noisy samples.
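The noise test can be mimicked with the following Python sketch; replacing the selected dimensions with uniform [0, 1] draws is an assumption, since the replacement range used in the paper is not reproduced above.

```python
import numpy as np

def add_noise(sample, noise_ratio, rng):
    """Replace a fraction of the feature dimensions with random numbers,
    mimicking the N% noise test on the 20-dimensional voice features."""
    noisy = sample.copy()
    n_replace = int(round(noise_ratio * sample.size))
    idx = rng.choice(sample.size, size=n_replace, replace=False)
    noisy[idx] = rng.random(n_replace)
    return noisy

# usage: 10% noise on one 20-dimensional sample
rng = np.random.default_rng(3)
noisy_sample = add_noise(rng.random(20), 0.10, rng)
```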

Figure 11 shows the comparison of the recognition rates of T-SOM and PL-G-SOM when 10% noise existed in all 48 instruction samples.

The results using the PL-G-SOM proposed here show advantages over those of the conventional learning system in all cases. In fact, we also investigated the use of frequency features for the recognition of the different instructions; however, similar results were observed in those experiments.

5. Conclusion

PL-G-SOM, a novel self-organizing map, was proposed using a reinforcement learning algorithm and annealing schemes of parameters. Online learning and additional learning are available with PL-G-SOM, and it was adopted into a voice instruction learning system of an autonomous robot instead of the conventional T-SOM. Experimental results showed the advantages of the new learning system in convergence speed and noise robustness.

Acknowledgment

This work was supported by Grants-in-Aid for Scientific Research (JSPS nos. 20500207, 20500277).