Abstract

This research investigates whether a computer and an alternative input device, in the form of sensor gloves, can be used in the process of teaching children sign language. The presented work is important because no current literature investigates how sensor gloves can be used to assist children in learning sign language. The research presented in this paper has been conducted by assembling hardware into sensor gloves, and by designing software capable of (i) filtering out sensor noise, (ii) detecting intentionally posed signs, and (iii) correctly evaluating signals in signs posed by different children. Findings show that the devised technology can form the basis of a tool that teaches children sign language, and that there is potential for further research in this area.

1. Introduction

Communication involves the exchange of information, and this can only occur effectively if all participants use a common language [1]. Deaf people need an efficient nonauditory means of expressing and interpreting information in order to communicate, and sign language has proven effective in communicating across a broad spectrum of requirements, from everyday needs to sophisticated concepts. Australian sign language (Auslan) is the native sign language used in Australia, where this research has been conducted, but the work is equally applicable to other signed languages. It is important that intuitive and efficient tools for teaching sign language are available, both to ensure that hearing impaired people are able to develop extensive social networks with deaf and hearing people, and to ensure that deaf people are able to obtain the best possible education and services within the community.

This research investigates whether a computer, and an alternative input device in the form of sensor gloves, can be used in the process of teaching children Auslan. Each sign consists of a number of parts: hand shape, place of articulation, orientation, path of movement, and nonsign components including facial expression [1]. For this research we focus on the hand shape component as one important aspect of a sign. We wish to use a computer because computers can act as an ideal medium for conveying details of sign language such as hand shapes, locations, and hand movements. In addition, the learner can work at their own pace, at a place and time that is convenient to them. The learner can target the vocabulary that is relevant to their circumstances, and multimedia can provide supplementary information to enhance the learning experience. The computer can also be used to effectively assess the learner’s receptive vocabulary. However, traditional computing systems are unable to provide feedback about the accuracy of the learner’s expressive signs [2, 3].

A central difference between current research in this area and our work is that current research tends to investigate how sensor gloves can be used for sign language interpretation [4], while we focus on how sensor gloves can be used as a teaching aid. Another central difference is that current research mostly focuses on adults, while we focus on children. We have this focus because, to date, no research has been completed on using sensor gloves to teach sign language to children, despite the possibilities this technology offers for teaching sign language.

This paper presents initial research into the viability of using data gloves, in combination with a computer and software, to provide feedback to children on the accuracy of their expressive signs. The aspects covered in this paper are a description of the gloves and hardware, how to identify intentionally posed hand shapes, and an initial investigation into the viability of evaluating signals from two children with different hand sizes using one set of data gloves. This paper only briefly discusses how the results could be incorporated into a learning system.

We have decided to incorporate sensor gloves into our system design, as this technology has been used in a variety of application areas that demand accurate tracking and interpretation of sign language. An example is the AcceleGlove technology [5]. In that work, a computer and sensor gloves are used to manipulate a virtual hand, icons on a virtual desktop, and a virtual keyboard through the use of 26 different signs. The authors also show that computers and sensor gloves can be used to translate sign language into speech or text.

This paper is organized as follows. In Section 2, we provide a brief overview of existing sensor glove technologies. In Section 3, we describe how the sensor glove technology used in this research was devised. In Section 4, we describe a set of experiments that investigate whether the devised technology (i) can identify intentionally posed signs, and (ii) is robust enough to correctly evaluate signs posed by more than one child, and we provide an analysis of the results. In the final section of the paper, we conclude by outlining the potential for further research in this area.

2. Sensor Glove Technologies in Literature

In this section we define what sensor gloves are, and describe some of the existing glove technologies. The hardware components of the gloves are discussed first. We then describe some of the processing techniques that are used to analyze and interpret the data signals generated by the gloves.

2.1. Sensor Glove Hardware Described in Literature

Sensor gloves are hand-worn devices with inbuilt sensors that capture information about the movements and positioning of the user’s hands. Some of the most widely known sensor glove technologies are the (i) DataEntryGlove [6], (ii) DataGlove [7], (iii) CyberGlove [8], and (iv) AcceleGlove [9].

The DataEntryGlove was presented by Gary Grimes from Bell Telephone Laboratories in 1983, and was the first widely published sensor glove [6, 10, 11]. The DataEntryGlove was originally devised as an alternative to the keyboard, and made it possible to generate 96 printable ASCII characters from 80 different finger positions. The glove was made out of cloth and had flex sensors along the fingers, tactile sensors on the fingertips, and inertial sensors positioned on the knuckle side of the hands. The distribution of the sensors was specified with the aim of recognizing the Single Hand Manual Alphabet for the American Deaf [10]. The DataEntryGlove was researched but was never commercially developed.

Thomas Zimmermann developed the DataGlove in 1987. This glove consisted of a lightweight fabric glove equipped with optical sensors on each finger, and magnetic sensors on the back of the glove [7, 11]. The optical sensors were constructed from optical cables with a small light at one end and a photodiode at the other. When the fingers were bent, the light was reduced in strength before it reached the photodiode, so the bending of the fingers could be determined by measuring how much light the photodiode detected. The magnetic sensor measured the rotation of the hand in relation to a fixed reference point [7, 10]. The DataGlove was commercialized by VPL Research and could be purchased at a reasonable price, which led to widespread use of this glove.

The CyberGlove was developed at Stanford University in 1988 and was specifically designed for the Talking Glove Project, which focused on translating American sign language into spoken English [8, 10, 11]. This glove was made up of a cloth glove with the fingertip and palm areas removed. This made it possible for users to easily grasp objects, and for deaf-blind users to conduct manual finger spelling while wearing the gloves [8]. The gloves were equipped with a total of 22 flex sensors, which were made of thin foil mounted onto plastic modules. These sensors were sewn into pockets running over each joint, and could measure the flexing of fingers and wrists. The maximum flex that could be detected by a sensor was regulated by adjusting the thickness and elasticity of the plastic modules. The plastic modules were selected in such a way that they maximized the output signal while minimizing fatigue of the sensors [8]. Informal experiments have shown that this glove performs in a smooth and stable way, and that it is accurate enough to capture complex and detailed finger and hand gestures [10]. However, according to Sturman, one must calibrate the sensors to each user in order to accurately capture gestures from different hand sizes and hand shapes. The CyberGlove is commercially available from VR logic [12].

The AcceleGlove uses accelerometers and potentiometers to capture finger and hand poses. The accelerometers are placed on the fingers, the wrist, and the upper arm, and are used to provide orientation and acceleration information. The potentiometers are located on the elbow and the shoulder, and provide information about the hand’s absolute position with respect to the body [13]. The AcceleGlove also incorporates a wrist button, which allows the user to easily activate and deactivate the glove. To activate the glove the user simply presses the wrist button, and to deactivate it, the user presses the button a second time. This process is repeated for each sentence, in order to assist the system in interpreting the signals correctly.

Before we move on to the next section, it is important to note that the literature points out that signals from sensor gloves have to be converted from an analogue to a digital format before they can be interpreted by a computer [8]. This conversion can be performed with an analogue-to-digital converter.

2.2. Sensor Glove Software Described in Literature

When the reviewed literature discusses issues related to the software components of glove technologies, the main focus is on how to classify signals. This focus is held because the classification process is central in determining whether signs can be correctly identified. The requirements of the target application area determine which method is best suited for classifying the signals (e.g., sometimes it is sufficient to use a classification method that only takes into account the shapes of the hands, while at other times it is necessary to use a classification method that analyzes hand shapes, hand locations, and hand movements). One must also determine whether one wants to classify static or articulated signs. If one wishes to classify static signs, then it might be necessary to use a classification method that can filter out “transitional signs,” which are not intentionally posed by the user, but rather arise as the fingers and hands move from one pose to another. If the target application area requires classification of articulated signs, then one might have to use a classification method that takes into account (i) initial hand shapes, (ii) hand orientations, (iii) hand positions, (iv) hand motions, and (v) end hand shapes [14].

Some methods that have been used to successfully classify signals from sensor gloves are (i) neural networks (NNs), (ii) hidden Markov models (HMMs), and (iii) template matching. When using NNs or HMMs, one must first construct a network with sufficient nodes and links to capture gestures at an abstraction level that satisfies the requirements of the application area. Then one must train the network by iteratively processing representative samples of the type of data to be classified. The drawback of using NNs and HMMs is that significant time and effort are required to design and train the networks. It is also hard to search for errors, and to explain the outcome of the classification process [15]. Template matching based classification methods, on the other hand, can be devised relatively quickly, make it easy to search for errors, and make it easy to explain the outcome of the classification process [15]. One form of template matching that has been successfully used for classifying data from sensor gloves is referred to as “conditional template matching.” Conditional template matching compares incoming data signals with a prestored library of patterns. This is done by evaluating one component of the signal after another, until the signal has been compared to all the patterns in the library, or a condition is met. Conditional template matching has been shown to provide an accuracy of 95% on 175 signs, which is better than the results provided by HMMs and NNs [5]. We will therefore use a form of conditional template matching in this research.
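
To make the idea concrete, the following is a minimal sketch of conditional template matching, assuming each sign is stored as a fixed-length sequence of expected sensor values. The function names, data layout, and tolerance value are hypothetical illustrations, not details taken from [5].

```python
# A minimal sketch of conditional template matching: compare the incoming
# signal to each prestored pattern one component at a time, abandoning a
# pattern as soon as one component fails the matching condition.
from typing import Dict, Optional, Sequence

def classify(signal: Sequence[int],
             library: Dict[str, Sequence[int]],
             tolerance: int = 30) -> Optional[str]:
    for name, pattern in library.items():
        for observed, expected in zip(signal, pattern):
            if abs(observed - expected) > tolerance:
                break  # condition met: this pattern cannot match, move on
        else:
            return name  # every component matched within the tolerance
    return None  # the signal matched no pattern in the library
```

In our setting, the library would simply map each of the six model hand-shape labels introduced later (fist, cup, and so on) to its expected sensor values.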

3. System Design

In this section we will describe the hardware and software components that have been devised throughout this research. We start off by describing the hardware components. Then we continue by describing software components that have been devised to (i) filter out sensor noise, (ii) detect intentionally posed signs, and (iii) evaluate signals in signs posed by different children.

3.1. Hardware

A number of issues had to be considered when we were assembling the hardware for the sensor gloves. Some of these issues are the following.
(i) What size do the gloves need to be to fit the hands of different children?
(ii) How can the gloves be made robust enough to withstand the wear and tear that results from several children putting them on and taking them off?
(iii) What type and number of sensors should be selected to successfully extract Auslan signs?
(iv) How should the sensors be attached to the gloves?
(v) How can the analogue signals from the sensor gloves be converted into a digital format that can be readily interpreted by a computer?

To ensure that the gloves would have a size suitable for a child, we used a pair of children’s gloves as a base for the sensor gloves. The selected gloves were made of a robust, stretchy lycra material. We selected this material to ensure that the sensor gloves would be robust enough to withstand wear and tear, and would have enough stretch to fit the hands of different children.

Ten flex sensors and ten tactile sensors were incorporated into the gloves. These sensors were selected because they make it possible to detect finger flexion and the touch of fingertips, which is sufficient to register a number of different Auslan signs. A pair of the selected flex and tactile sensors is shown in Figure 1. To detect finger flexion and the touch of fingertips correctly, the sensors had to be appropriately distributed across the gloves. This was achieved by mounting pockets onto the gloves on both the palm and the knuckle side of each finger. The size of these pockets was specified so that they would fit the sensors and keep them in place. When the pockets had been mounted onto the gloves, the flex sensors were slid into the pockets located on the knuckle side of the fingers, while the tactile sensors were slid into the pockets located on the palm side. How the sensors were distributed across the gloves is illustrated in Figure 2. Finally, a set of tubes was sewn onto the wrist area of the gloves, and the wires from the sensors were pulled through these tubes to keep them out of the way when the sensor gloves were in use.

The wires from the sensors were then plugged into an I-cube X converter so that the sensor signals could be converted from an analogue to a digital format. The I-cube X is a system that enables a large variety of sensors to be connected to the I-cube X box, a digitizer that converts the signals from the sensors into digital messages [16]. Data from the system can then be accessed via a computer and fed into programs written in a variety of languages; a third-party plug-in for Adobe Director was used to develop the software for the children’s data gloves. To access the output from the system, the MIDI cables from the I-cube X converter were plugged into a uno MIDI-to-USB converter before the setup was linked to a laptop. The sensor gloves, the I-cube X converter, and the uno MIDI-to-USB converter are shown in Figure 3.

3.2. Software

The main issues that had to be considered when we were designing the software for the sensor gloves were the following.
(i) How can the data be simplified to support fast processing?
(ii) How can intentionally posed signs be identified?
(iii) How can signals in signs posed by different children be correctly evaluated?

We will provide a quick overview of the raw data from the sensors before we go on to describe the architecture of the software, as this will make it easier to understand why we have employed the particular methods. Data packets containing raw data are transmitted from the I-cube X box to the computer every four milliseconds. Each data packet includes information about when the data was captured, the wire or channel the data is transmitted through, and the signal strength. Signals from the tactile sensors have a strength that ranges from zero to 118, where zero corresponds to no pressure and 118 corresponds to hard pressure. Signals from the flex sensors range from zero to 115, where zero corresponds to no flex and 115 corresponds to maximum flex.
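
For illustration, one raw reading can be represented as follows; the field names are hypothetical, while the timing and value ranges are those given above.

```python
from dataclasses import dataclass

# A minimal sketch of one raw sensor reading; the field names are
# hypothetical, the timing and value ranges are from the text.
@dataclass
class RawPacket:
    timestamp_ms: int  # when the sample was captured (a packet every 4 ms)
    channel: int       # the wire/channel the reading was transmitted on
    strength: int      # 0-118 for tactile sensors, 0-115 for flex sensors
```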

To support fast processing we devised a low-pass filter. This filter removes data signals if the input from all the sensors has a strength below a set threshold. The threshold value was set to 45, a value just above the maximum random signal fluctuation observed in the sensor system when the sensors are not stimulated. Signals that are not removed by the filter are scaled so that the minimum value of all signals is zero, the maximum value of signals from the tactile sensors is 73, and the maximum value of signals from the flex sensors is 70. These signals are then passed on to a classification module. The classification module first labels each data packet according to the finger and sensor the particular stimulus was detected from. This is done by using a mapping model, which relates wires or channels to particular label names. The mapping model is shown in Table 1.
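
The filtering, scaling, and labeling steps might look as follows in code. This is a sketch under assumptions: the channel-to-label mapping is a hypothetical stand-in for Table 1, and a simple linear rescale is assumed for the scaling step.

```python
from typing import Dict, Optional

THRESHOLD = 45  # just above the maximum observed noise fluctuation

# Hypothetical stand-in for the channel-to-label mapping model in Table 1.
CHANNEL_LABELS = {0: "thumb_flex", 1: "thumb_tactile", 2: "index_flex"}  # etc.

def filter_scale_label(readings: Dict[int, int]) -> Optional[Dict[str, int]]:
    """Drop the packet if every sensor is below the noise threshold,
    otherwise rescale each signal and label it by finger and sensor."""
    if all(value < THRESHOLD for value in readings.values()):
        return None  # treated as sensor noise and removed by the filter
    labelled = {}
    for channel, raw in readings.items():
        label = CHANNEL_LABELS[channel]
        if label.endswith("tactile"):
            scaled = round(raw * 73 / 118)  # tactile range 0-118 -> 0-73
        else:
            scaled = round(raw * 70 / 115)  # flex range 0-115 -> 0-70
        labelled[label] = scaled
    return labelled
```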

When the data packets have been labeled, they are analyzed to discriminate between intentionally and unintentionally posed signs. This analysis is conducted by evaluating the strength of the sensor signals over pulses that last for two seconds. These pulses have two main phases. We call the first of these phases (which lasts for one second) a “registration phase.” In this phase, signals from all the sensors are registered by the software. The second phase (which also lasts for one second) is referred to as a “constant phase.” In this phase, the signals from the sensors may only fluctuate 30 units above or below the signals detected in the first phase for the input to be recognized as part of an intentionally posed sign. If the sensor signals detected throughout the “constant phase” are stable enough to satisfy this criterion, they are grouped into an intentionally posed sign and processed further. If some sensor signals fail to satisfy this criterion, all the detected signals are discarded at the end of the pulse. This process is illustrated in Figure 4. The thick lines illustrate the “registration phases,” the thin lines with an X illustrate the “constant phases,” and the black dots illustrate the end of each pulse, which is when signals can be grouped into intentionally posed signs. Time is shown in seconds along the horizontal axis.
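
A minimal sketch of this two-phase check follows, assuming one scaled, labeled reading per sensor for each phase; the data layout and names are hypothetical.

```python
from typing import Dict, Optional

MAX_FLUCTUATION = 30  # allowed units above or below the registered signal

def end_of_pulse(registered: Dict[str, int],
                 constant: Dict[str, int]) -> Optional[Dict[str, int]]:
    """At the end of a two-second pulse, return the registered signals as
    an intentionally posed sign if every sensor stayed within +/-30 units
    during the constant phase; otherwise discard the whole pulse."""
    for sensor, value in registered.items():
        if abs(constant[sensor] - value) > MAX_FLUCTUATION:
            return None  # unstable: all detected signals are discarded
    return registered  # stable: grouped into an intentionally posed sign
```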

When an intentionally posed sign is detected, it is compared to a library of prestored model signs. This is done to classify the sign, and to determine whether it is correctly posed. An intentionally posed sign is regarded as correctly posed if it satisfies two criteria (a sketch of this check is given below). To satisfy these criteria, the sensor signals in the intentionally posed sign must
(i) be the same as the sensor signals in a model sign, where the model signals are expected to be zero;
(ii) deviate by less than 30 units from the sensor signals in a model sign, where the model signals are expected to be higher than zero. (We allow for these deviations to make the software robust towards slight variations in accents.)
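
The following is a minimal sketch of the two criteria, assuming both the sign and the model are stored as dictionaries mapping sensor labels to scaled values; the names are hypothetical.

```python
from typing import Dict

ACCENT_TOLERANCE = 30  # allowed deviation where the model value is nonzero

def is_correctly_posed(sign: Dict[str, int], model: Dict[str, int]) -> bool:
    for sensor, expected in model.items():
        observed = sign.get(sensor, 0)
        if expected == 0:
            if observed != 0:
                return False  # criterion (i): must be zero where the model is zero
        elif abs(observed - expected) >= ACCENT_TOLERANCE:
            return False  # criterion (ii): must deviate by less than 30 units
    return True
```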

The parameters were specified through an empirical trial-and-error process. Six different model hand shapes have been generated. These model signs have been labeled (i) fist, (ii) thumb up, (iii) little finger up, (iv) pointer up, (v) ok, and (vi) cup. How these model hand shapes are expressed is shown in Figure 5. The sensor signals associated with each of the model signs, and the actual sensor signals generated as two children posed these signs, are shown in Table 2.

Model signs and their associated sensor signals are displayed in rows with bold bounding boxes. The sensor signals that were generated when the children posed the signs are displayed in the cells below each model sign. The first of the two values in the cells with sensor signals generated by children was detected in the “registration phase” of a pulse; the second value was detected in the “constant phase.” Gray cells contain sensor signals that have been successfully matched with corresponding signals in a model sign. Preliminary analysis shows that the technology has the potential to recognize aspects of intentionally posed signs. When one studies the table further, one also finds that the flex and tactile sensors on the ring finger generated the data that was most consistent with the signals in the model signs.

When one studies Table 2, one finds that sensor signals from one or both of the children were correctly matched with a corresponding signal in a model sign in 36 of 60 possible instances. In 15 of these instances, the signals from both children were correctly matched.

We will provide a more thorough analysis of the data presented in Table 2 in the following section.

4. Experiments on the Glove Technology

In this section we will describe two experiments, which were conducted to investigate whether the devised technology (i) has the potential to identify intentionally posed signs, and (ii) is robust enough to correctly evaluate signs posed by more than one child. We will also explain how the data in Table 2 was collected and used in these experiments. At the end of this section we will describe the results from the two experiments.

4.1. Experiment Design

To properly test the devised technology, we asked three different children (one 7-year-old and two 5-year-olds) to pose the six model signs illustrated in Figure 5. This was intended to enable us to explore how children of different ages interact with the technology, and how the technology responds to similar and different hand shapes and hand sizes.

However, before we go on to describe the results, we have to point out that one of the three participants (one of the children aged 5 years) did not want to interact with the technology in any way. When conducting ethical experiments it is important to give participants the right to refuse to participate at any stage during the experiment, a right that this child exercised [17]. We were therefore only able to obtain data from two participants (child 1 was 7 years old and child 2 was 5 years old), and only two datasets are therefore analyzed for each experiment.

At the start of the first experiment we asked the children to put a sensor glove onto the right hand. We then asked them to pose the signs in Figure 5 in a sequential manner, and to hold each sign for a minimum of two seconds, so that we could obtain the data signals detected by the computer at the start and the end of a pulse. The signal information from the start and the end of the pulses was then stored and analyzed. The aim of this analysis was to investigate whether the signals are similar enough to detect intentionally posed signs with the pulse concept described in Section 3.2. We regard it as possible to identify whether a sign is intentionally posed if the summed signal values (from sensors on all fingers) captured at the start and the end of a pulse deviate by less than 600 units. We used 600 units as an upper limit because (i) the software has been programmed to regard signs as intentionally posed if the signals detected from each sensor at the end of a pulse are less than 30 units smaller or greater than the signal detected at the start of the pulse (these specifications equate to an allowed deviation of 60 units per sensor, and thus a summed deviation of 600 units for the ten sensors on a hand), and (ii) we want to test whether these parameters enable us to identify intentionally posed signs.
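
A minimal sketch of this summed-deviation check is given below; the names are hypothetical, and the 600-unit limit follows from the arithmetic above (10 sensors times a 60-unit window each).

```python
from typing import Dict

def summed_deviation(start: Dict[str, int], end: Dict[str, int]) -> int:
    """Sum the absolute per-sensor differences between pulse start and end."""
    return sum(abs(end[sensor] - start[sensor]) for sensor in start)

# A sign counts as intentionally posed if the summed deviation over all
# ten sensors stays below 600 units.
def is_intentional(start: Dict[str, int], end: Dict[str, int]) -> bool:
    return summed_deviation(start, end) < 600
```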

In the second experiment, we compared the signals captured from different children as they intentionally posed the six hand shapes in Figure 5. This was done to investigate whether signals in signs intentionally posed by different children are similar enough to be correctly evaluated with the devised technology. We regard it as possible to correctly evaluate signals in signs posed by different children if the signals
(i) deviate by less than 30 units below or above the model values, when the model values are greater than zero;
(ii) are not different from the model values, when the model values are zero.

We use these boundaries because a study of the model signs shows that it is possible to discriminate between the six model hand shapes in Figure 5 when these constraints are employed. Results from the experiments are presented in the following section.

4.2. Analysis and Results from the Experiments

Results from the experiments described in the last section are presented below. We start off by describing the data that was generated in the experiment that investigated whether it is possible to identify intentionally posed signs.

4.2.1. Consistency at Start and End of Pulses

To investigate whether it is possible to identify intentionally posed signs using the pulse concept described in Section 3.2, we compared the data signals captured at the start and the end of pulses as the children posed the six hand shapes presented in Figure 5. The data that was generated is shown in Figures 6 and 7. In Figure 6, which shows the data for participant one, one can observe that the difference between the summed signal values captured at the start and the end of the pulses is far less than 600 units in all cases. In addition, the difference is less than 50 units in four of the six cases, while the greatest difference is 105 units (when the cup sign is posed). The mean difference is 37.16 units. The difference between the signals detected by the computer at the start and the end of pulses as participant two posed the six signs is shown in Figure 7. One can observe that the summed deviations are less than 600 units in all situations in this case as well. One can further observe that the maximum deviation is 43 units. The mean deviation is 21.83 units.

4.2.2. Consistency Across Participants

To investigate whether the devised technology is robust enough to correctly evaluate signs from more than one child, we compared the signals registered as the children posed the six hand shapes shown in Figure 5. This allowed us to determine whether signals in signs intentionally posed by different children are similar enough to be correctly evaluated with the defined constraints. The number of signal pairs that did, and did not, satisfy the defined constraints when the allowed deviation from a model sign was zero is shown in Figure 8. One can observe that seven of 12 signal pairs satisfied the constraints. One can also observe that five of 12 signal pairs were too different for both signals to be evaluated correctly when the current system specifications are used.

The number of signal pairs that did, and did not, satisfy the constraints when the allowed deviation from a model sign was 30 units below or above the specifications of the model sign is shown in Figure 9. One can observe that 45 of 48 signal pairs satisfied the constraints, and that 3 of 48 signal pairs did not satisfy the constraints in this case.

Results from this experiment show that a total of 52 of the 60 signal pairs were similar enough to be correctly evaluated. We therefore regard it as possible to correctly evaluate signals in signs posed by different children. However, the results also show that eight of the 60 signal pairs were so different that it is impossible to correctly evaluate at least one of the signals in each pair using the current system specifications.

An analysis of why the differences between these eight signal pairs are so great indicates that the differences arose because the children were unable to wear the gloves in exactly the same way. The children had considerable difficulty putting the glove onto the individual fingers and pulling it up to the correct position. The other reason for the differences is that the children were of different ages, and therefore had quite different hand sizes. This turned out to be a problem because the sensors ended up being distributed differently on the hands of the different children.

5. Conclusions

This paper has described how to construct a set of sensor gloves that could potentially be used as a component in a system providing computer-based feedback to children learning Auslan. Experiments showed that the devised technology can (i) identify intentionally posed signs, and (ii) correctly evaluate signals in signs posed by different children. It is therefore worth pursuing this research further and extending it to address other aspects of sign language, including movement, hand orientation, and the location where the sign is made relative to the body. Furthermore, more work must be conducted before the technology can be used to teach children Auslan in an accurate and efficient way. Some of the issues that should be addressed include (i) how to redesign the gloves to reduce the discrepancy between signals registered from different children, and (ii) how to devise a function that provides intuitive feedback to guide children in reducing the discrepancy between posed signs and model signs. A learning system can only be developed if the feedback given is timely and accurate for a wide range of learners.