Abstract

The aim of this paper is to evaluate user satisfaction, measured with the System Usability Scale (SUS) questionnaire, with an Augmented Reality (AR) application for productive vocabulary learning that uses speech recognition. There is still a lack of research on user satisfaction with AR-based apps that use speech recognition for vocabulary learning in early education. The first objective of this paper is to develop an AR application for children that uses speech recognition to enhance productive vocabulary learning by integrating visual script (orthography) and audio (phonology). The second objective is to evaluate parents’ and teachers’ satisfaction with this approach to productive vocabulary learning, which combines AR technology and speech recognition, through mixed-method testing (questionnaire and interview). To achieve this, an interview session was conducted with experts, and the SUS questionnaire was given to the teachers and parents of the students to evaluate user satisfaction. The results show that the research hypotheses of this study were supported. The teachers and students were satisfied with the application based on the SUS score (SUS score > 68). The overall SUS score of 80.3 is above average, which shows that they were satisfied with the application as a whole. In addition, the usability and learnability means from the questionnaire indicate that the users found the application usable and easy to use.

1. Introduction

Developing an application that combines AR technology and speech recognition is expected to improve the productive vocabulary method, as it reinforces the connection between visual script (orthography) and audio (phonology) when reading words [1]. Such an AR-based application can improve vocabulary learning among preschool children because their pronunciation can be checked and reinforced.

Through the productive vocabulary method, learners can examine whether the words they have pronounced are correct. Pronunciation checks can be implemented digitally when speech recognition components are incorporated into applications based on the productive vocabulary method; this covers the audio part of reading words. AR technology can then connect the visual script (the written words) with the audio, because AR can overlay digital content on top of the script based on the audio received through speech recognition. However, research on user satisfaction with AR-based apps that use speech recognition is still lacking, especially for productive vocabulary learning.

The main objective of this study is to develop an application for children that uses speech recognition to enhance productive vocabulary learning by integrating visual script (orthography) and audio (phonology), helping to address pronunciation problems. The second objective is to evaluate parents’ and teachers’ satisfaction with this approach, which combines AR technology and speech recognition, through mixed-method testing (questionnaire and interview).

This study was conducted based on the following research hypotheses:
(i) Teachers will be satisfied with the prototype application that has been developed.
(ii) The application will increase students’ interest in learning vocabulary.
(iii) Users will find the application very usable.
(iv) Users will find the application easy to use.

Today’s technological advancement has led to various reforms in teaching and learning systems in the classroom. Children’s tendency to use smartphones has motivated researchers to study the advantages of smartphone use among children. AR technology in teaching and learning environments has drawn the interest of the community because it can capture children’s attention in learning environments [2]. Several recent AR-related studies have been conducted in the educational field to examine its effects and benefits for students (Table 1). AR applications are believed to appeal to students by replacing conventional, static learning environments with a more edutainment-oriented environment [3].

Technology plays an important role in everyday life, and AR has become one of the most prominent emerging technologies, gaining attention across society [14]. AR’s ultimate goal is to let users manage and access information wherever and whenever they need it by combining the interactive real world and a computer-generated (virtual) world in a coherent space. According to [15, 16], AR has three main features: it combines virtual and real-world elements, it is interactive in real time, and it registers (aligns) real and virtual objects in 3D. Many AR researchers refer to Azuma’s definition because it provides a good, clear description of AR technology. AR is a new technology with the potential to be used in today’s education [17, 18], and it has become more widespread and practical [19]. The paper [20] examines primary and secondary school students to determine whether AR can improve the learning experience. The researchers found that “AR-based education media could be a valuable and attractive additional material to the education in the classroom and overcome some of the limitations of text-based methods, allowing students to absorb the material according to their preferred learning style” [19]. AR can also be used to enhance traditional, static learning content in the classroom, which is likely to appeal to children through additional media content such as audio, video, graphics, and 3D objects. In addition, diverse application tools with AR features have been developed for mobile devices [21].

The productive vocabulary method is the process of recalling words in order to produce them with correct pronunciation [1]. This practice reinforces the relationship between the visual script (orthography) and sound (phonology) subsystems. It helps learners realize whether their knowledge of how to read words is still lacking, or reinforces existing knowledge [22]. Kumar et al. [1] incorporated speech recognition into a game based on the productive vocabulary method. However, there are still few software programs that apply speech recognition to productive vocabulary methods involving artificial intelligence and AR. AR technology is not confined to research: it is also used at school level in Malaysia, where the Form 1 Science textbook uses AR technology [23]. Digital information is displayed on the mobile phone screen when the camera is pointed at a particular textbook page.

Some researchers have taken the opportunity to diversify the learning system by incorporating technology; for example, [24] determined the degree of motivation of master’s students toward materials designed with AR technology to teach English vocabulary at the primary level. The use of AR technology has a positive impact on student motivation in vocabulary teaching in the classroom [25]. That study developed a handheld AR system with a specific use case, vocabulary learning. The assessment showed that an AR application with good system usability had been developed successfully. The preliminary assessment also showed that AR can help students remember the words learned, attract students’ attention, and provide satisfaction to students.

One study developed and compared two AR learning systems for third-graders learning English vocabulary [26]: one based on a collective game-based (CGB) approach and the other based on a sequential mission game (SMG). CGB requires students to collect all seven checkpoints with no fixed starting point, whereas SMG requires students to complete seven levels in sequence. Research on AR vocabulary apps has also been conducted by [27], whose results show that field-dependent (FD) students benefited from mobile AR instruction in terms of learning outcomes. Another study developed a miniature-based AR game as a vocabulary learning platform that uses external and pseudo-physical contexts to support task-based and embedded learning [28]. The miniature game is believed to provide pleasure, motivation, and confidence, as well as autonomous and pure learning, to students and teachers.

The combination of AR technology and speech recognition developed in this study is expected to help improve vocabulary learning and to provide a more conducive learning environment than static materials by combining multimedia elements such as video, audio, 3D objects, and graphics. According to [29], when education and AR technology are brought together, students can gain experience and learn while having fun, which can maximize the effectiveness of learning. Speech recognition is effective in improving pronunciation: [30] used two computer-based programs that incorporate speech recognition and found that students in Taiwan chose this software to help them improve their English pronunciation. AR-based literacy software has also been developed, such as the Augmented Reality English Learning System [31], the Augmented Reality English Vocabulary Learning System [32], and the Handheld English Language Learning Organization [33].

A system combining AR technology and audio-visual speech recognition (AVSR) has been developed to help people with hearing impairments communicate [34]. According to [35], more research is needed on combined multimodal strategies and on how to combine gesture and speech input so that users’ intent can be conveyed to AR applications. The authors of [36] suggested that the multimodal interface could be extended to other AR application domains to determine whether the benefits observed in the virtual environment carry over to different fields. Based on these issues, this study combines AR multimodal input, involving flash card selection and speech, in an AR application environment to strengthen the relationship between two major components of vocabulary knowledge, namely, the visual (orthography) and the audio (phonology) [1]. The aim is to assess how effective AR multimodal input applications are in assisting vocabulary learning and addressing pronunciation problems.

3. Material and Method

3.1. Research Design

For this study, mixed methods were used to evaluate teachers’ satisfaction with the application as a tool to facilitate teaching and learning sessions in the classroom. Teachers’ satisfaction was first evaluated qualitatively through interviews, and students’ satisfaction was then measured quantitatively with the SUS questionnaire [37], completed by their teachers or parents.

3.2. Sample

The sample involved two female teachers (n = 2) for the qualitative method (interviews) and 30 female and male students (n = 30) for the quantitative method (SUS questionnaire). The teachers were from local primary schools in Malaysia, and primary school students were also involved in the data collection process. The students came in small groups accompanied by their teacher. The students and teachers were from different ethnic backgrounds, including Malay, Indian, and Chinese. The SUS questionnaire was used, and the data were analyzed based on the SUS score to determine whether the teachers and students were satisfied with this speech-based AR application.

3.3. ARealSpeech Application

The application developed in this study combines AR technology and speech recognition. The main interface is shown in Figure 1. It contains a microphone button and a text box that displays the sentence pronounced by the user. Pressing the microphone button produces a “beep” sound, and the text “say something” appears in the text box to prompt users to speak so that the application can detect their sentence. A “smiley” or “sad” icon then appears at the end of the text box to tell users whether the sentence they pronounced is correct (Figure 2). With the prototype, children are required to pronounce the words and sentences on the flash card (Figure 3). Each sentence pronounced by the children is tracked by the speech recognition technology implemented in the ARealSpeech application, which lets teachers find out whether the children pronounced each term correctly.
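The paper does not specify how the recognized text is judged correct. A minimal sketch, assuming the speech recognizer returns a plain transcript and that a sentence counts as correct when it matches the flash card sentence after lowercasing and removing punctuation, could decide which icon to show as follows. The functions `normalize` and `feedback_icon` are hypothetical and not part of ARealSpeech.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace (assumed rule)."""
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)  # keep letters, digits, and apostrophes
    return " ".join(text.split())


def feedback_icon(recognized: str, target: str) -> str:
    """Return 'smiley' if the transcript matches the flash card sentence, else 'sad'."""
    return "smiley" if normalize(recognized) == normalize(target) else "sad"


print(feedback_icon("The cat is on the mat.", "the cat is on the mat"))  # smiley
print(feedback_icon("The hat is on the mat", "the cat is on the mat"))   # sad
```

Exact matching is the simplest possible rule; a real system might tolerate minor recognition errors, which would require a different comparison.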

In addition, the AR component of this application displays 3D objects on the surface of the flash card, so children can relate the visual script to the sound. This is expected to increase children’s interest, because the 3D objects have simple animations that can attract them. The main purpose of developing this application and conducting this study was to examine the effectiveness of speech recognition in assisting children’s pronunciation and to provide an edutainment learning environment through AR technology.

In the design phase, the researchers designed flash cards suitable for children. Based on these flash card designs, the application was developed using the Unity3D Editor and Vuforia. In the Unity3D Editor, the developers added engaging digital content such as animation, graphics, 3D objects, and audio. The development phase also involved integrating speech recognition into the application through the Unity3D Editor, using the Speech Recognizer API as a plugin to bring speech recognition into the AR-based environment.

3.4. System Framework and Function

In this study, the application was developed based on AR and speech recognition technology components. The AR system architecture for this application is shown in Figure 4; it describes the flow of the application. First, the system receives visual and speech input from the user for processing. For the visual tracking function, marker detection is applied: the system detects and decodes an image tracked by the phone’s camera to retrieve AR content, and a background process estimates the 3D position (pose) of the marker. Once the marker is identified, the system renders the AR content, which is composed into a virtual scene. For the speech tracking function, the system first detects speech and then recognizes it using an API. For this application, the researchers used the Speech Recognizer API, which was implemented in the application and is suitable for AR development. The system extracts audio features and converts the speech to text. The fusion module then combines both inputs, and finally the AR Scene Manager generates the output.
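The flow in Figure 4 can be sketched in a few lines. This is an illustrative sketch only: the class names, the `CARD_CONTENT` table, and the exact-match rule inside the hypothetical `fuse` function are assumptions standing in for the Vuforia marker tracker, the Speech Recognizer API, and the AR Scene Manager, whose real interfaces are not described here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class MarkerResult:
    """Visual tracking output (hypothetical): decoded flash card and estimated position."""
    card_id: str
    position: Tuple[float, float, float]


@dataclass
class SpeechResult:
    """Speech tracking output (hypothetical): the speech-to-text transcript."""
    text: str


@dataclass
class SceneOutput:
    """What the scene manager would render: a 3D model and a feedback icon."""
    model: Optional[str]
    feedback: Optional[str]


# Hypothetical content table: flash card id -> (3D model name, target sentence).
CARD_CONTENT = {
    "card_cat": ("cat_model", "the cat is on the mat"),
}


def fuse(marker: Optional[MarkerResult], speech: Optional[SpeechResult]) -> SceneOutput:
    """Fusion module sketch: combine the visual and speech inputs into one scene update."""
    if marker is None or marker.card_id not in CARD_CONTENT:
        return SceneOutput(model=None, feedback=None)   # no known card is tracked
    model, target = CARD_CONTENT[marker.card_id]
    if speech is None:
        return SceneOutput(model=model, feedback=None)  # show the 3D object, await speech
    spoken = " ".join(speech.text.lower().split())      # case and whitespace normalization only
    return SceneOutput(model=model, feedback="smiley" if spoken == target else "sad")


card = MarkerResult(card_id="card_cat", position=(0.0, 0.0, 0.3))
print(fuse(card, None))
print(fuse(card, SpeechResult(text="The cat is on the mat")))
```

Keeping the visual and speech results as separate inputs to one fusion step mirrors the two parallel branches in the architecture, so either branch can be updated without changing the other.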

3.5. Research Process

The research process was carried out in several stages. First, the authors explained to the students how to use the application. The students were then given the opportunity to use the application themselves (Figure 5) for about 20 minutes. Because the study involved early education students, who could not yet understand the questionnaire used by the researchers, the questionnaires were given to their teachers or parents to complete based on their observation of the children using the app. Second, the researcher interviewed two teachers who are experts in early childhood education and AR. The interviews were intended to make it easier to obtain the requirements for the application.

3.6. Data Collection Tools

In this study, three data collection tools were used: interview, observation, and questionnaire. First, an interview was conducted with expert users, namely early childhood education teachers with experience in AR technology. The experts were asked semistructured interview questions related to AR and speech recognition, based on their skills and experience in using AR. The full list of questions can be found in Table 2. The interview was conducted to analyze the importance of, and requirements for, the AR application to be developed, and it took place after the first prototype of the application was demonstrated to the experts. The interview session went smoothly, and the requirements for the application were successfully obtained. After the prototype had been improved based on the results of the experts’ interview, testing with the students was performed. The testing was based on observation of the students using the application, and all sessions were video-recorded while the students used the ARealSpeech application. Afterwards, the teachers were required to answer the SUS questionnaire [37] based on their observation of the children using the application. The researchers asked the teachers or parents to answer the questionnaire because the children did not yet understand the structure of the survey questionnaire. Detailed information regarding the survey questions is included in Table 3.

3.7. Data Analysis

Teachers’ and students’ opinions were analyzed descriptively based on the SUS questionnaire (frequency, SUS score, mean, median, mode, and standard deviation). The children’s behavior was observed first, and the teachers then answered the SUS questionnaire [37] based on their observation. Each observation was also noted and recorded by the researchers. In addition, the interview data were analyzed by the researchers using the discourse analytic method [38].
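The descriptive statistics listed above can be computed with Python’s standard `statistics` and `collections` modules. The scores below are made-up illustrative values, not the study’s data, and the helper `describe` is hypothetical; `statistics.stdev` gives the sample standard deviation, which we assume is the intended measure.

```python
import statistics
from collections import Counter


def describe(scores):
    """Descriptive statistics for a list of per-respondent SUS scores."""
    return {
        "frequency": Counter(scores),            # how often each score occurs
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),         # most common score
        "sd": statistics.stdev(scores),          # sample standard deviation (n - 1)
    }


# Illustrative scores only; not the study's data.
example_scores = [85.0, 77.5, 85.0, 62.5, 90.0]
summary = describe(example_scores)
print(summary["mean"], summary["median"], summary["mode"], round(summary["sd"], 2))
```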

4. Results

4.1. Expert’s Interview

Interviews were conducted with two experts in early childhood education and the use of AR technology. The experts indicated that the application is a good complementary teaching material for teachers. It is also good for children because its digital content can increase their interest during the learning process, and students can interact with that content, such as the 3D models. The application can also make learning more meaningful, because students can rely on the visual content to enhance their memory. Because the application links the visual and the audio, it will be easier for students to understand and remember every word they pronounce. Furthermore, primary school teachers use many approaches to make learning effective, for example, using video so students can see and imagine each learning object. The experts also indicated that the use of speech recognition in this application requires students to pronounce the sentences or words correctly, which is very good for self-learning.

In addition, the experts stated that existing learning systems are lacking because almost all of them involve static learning materials and do not engage the senses. One way to enhance children’s learning is to make maximum use of the senses, and because children are highly exposed to gadgets, the experts believe that this application is very interesting for vocabulary learning and will be very helpful for students. In the experts’ opinion, the best content is the animation, because moving objects of this kind are very difficult for teachers to show to students in class. In conclusion, the experts’ opinions of the developed application are positive, and they consider that it can help the teaching and learning system in the classroom.

4.2. SUS Score

We calculated the SUS score (Table 4) and its factors from the respondents’ responses in the experiment. According to [39], a SUS score above 68 (SUS score > 68) is considered good, that is, above average. The overall SUS score for the experiment is 80.3, which is above average and indicates that the application has good usability characteristics. Most participants recorded above-average SUS scores (SUS > 68). Only three participants scored below average, and analysis of their responses indicates that this was because they felt they would need technical assistance and would need to learn more to use the application. Nevertheless, the overall SUS score shows that most respondents were satisfied with the application. The participants thought that they would use the application frequently in the future, and they agreed that its various functions were well integrated.
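The section reports SUS scores without restating how they are computed. Assuming the standard SUS scoring rule (each odd, positively worded item contributes the response minus 1; each even, negatively worded item contributes 5 minus the response; the sum is multiplied by 2.5 to give a score from 0 to 100), a per-respondent score and the comparison against the 68 benchmark could be sketched as below. The function name and the responses are hypothetical.

```python
def sus_score(responses):
    """Standard SUS score (0 to 100) for one respondent's ten answers on a 1-5 scale."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    total = 0
    for item, r in enumerate(responses, start=1):
        if not 1 <= r <= 5:
            raise ValueError(f"item {item}: response {r} is outside 1-5")
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5


# Hypothetical respondents, not the study's data.
respondents = [
    [5, 1, 5, 1, 5, 1, 5, 1, 5, 1],  # 100.0
    [4, 2, 4, 2, 4, 2, 4, 2, 4, 2],  # 75.0
    [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],  # 50.0
]
scores = [sus_score(r) for r in respondents]
overall = sum(scores) / len(scores)                # mean SUS score across respondents
below_average = sum(1 for s in scores if s < 68)   # 68 is the benchmark cited from [39]
print(scores, overall, below_average)
```

With this rule, a respondent who answers neutrally (3) to every item scores 50, which is why the benchmark of 68 sits well above the midpoint of the scale.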

4.3. Usability and Learnability Factor

The researchers also reviewed the usability and learnability characteristics of the application based on the SUS questionnaire (Table 5). The learnability factor, which comprises items 4 and 10 of the SUS questionnaire, has a mean value of 2.17. Because these items are negatively worded, a lower mean indicates better learnability: the users tended to disagree that they would need assistance to use the application and that they would need to learn many things before using it. The usability factor, which comprises items 1, 2, 3, 5, 6, 7, 8, and 9, has a mean value of 4.17. Based on this mean, the users agreed with and responded positively to statements about the application such as “I thought this application was easy to use,” “I found the various functions in this application were well integrated,” and “I would imagine that most people would learn to use this application very quickly.”
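A minimal sketch of a factor mean, assuming it is a plain average of the raw 1-5 responses over the listed items and all respondents. The paper does not say whether the negatively worded items 2, 6, and 8 were reverse-coded before the usability mean was computed, so the example only computes the learnability mean (items 4 and 10), where lower is better. The helper name and the data are hypothetical.

```python
LEARNABILITY_ITEMS = (4, 10)
# Listed for reference; reverse coding of items 2, 6, and 8 is not specified in the paper.
USABILITY_ITEMS = (1, 2, 3, 5, 6, 7, 8, 9)


def item_group_mean(respondents, items):
    """Mean raw response (1-5) over the given 1-based SUS item numbers and all respondents."""
    values = [resp[i - 1] for resp in respondents for i in items]
    return sum(values) / len(values)


# Hypothetical responses from two respondents (items 1..10).
respondents = [
    [5, 1, 5, 2, 5, 1, 5, 1, 5, 3],
    [4, 2, 4, 2, 4, 2, 4, 2, 4, 2],
]
learnability = item_group_mean(respondents, LEARNABILITY_ITEMS)  # lower is better
print(learnability)
```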

4.4. Feedback on Positive and Negative SUS Questionnaire

The SUS items were also classified into positive and negative statements (Table 6). The mean response to the positive, odd-numbered items (1, 3, 5, 7, and 9) is 4.33, and the mean response to the negative, even-numbered items (2, 4, 6, 8, and 10) is 1.92. The mean of 4.33 for the positive items indicates that users agreed with, and held a positive opinion of, the following statements about the application: “I think that I would like to use this application frequently,” “I thought this application was easy to use,” “I found the various functions in this application were well integrated,” “I would imagine that most people would learn to use this application very quickly,” and “I felt very confident in using this application.”

The mean of 1.92 for the negative items indicates that users disagreed with the following statements: “I found this application unnecessarily complex,” “I think that I would need assistance to be able to use this application,” “I thought there was too much inconsistency in this application,” “I found this application very cumbersome/awkward to use,” and “I needed to learn a lot of things before I could get going with this application.”
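The split into positive (odd-numbered) and negative (even-numbered) items can be sketched the same way, assuming raw 1-5 responses are averaged without reverse coding, so that a high positive mean and a low negative mean both indicate a favorable view. The function name and the single example respondent are hypothetical.

```python
def positive_negative_means(respondents):
    """Mean raw response for odd-numbered (positive) and even-numbered (negative) SUS items."""
    positive, negative = [], []
    for resp in respondents:
        for item, r in enumerate(resp, start=1):
            (positive if item % 2 == 1 else negative).append(r)
    return sum(positive) / len(positive), sum(negative) / len(negative)


# One hypothetical respondent (items 1..10).
pos_mean, neg_mean = positive_negative_means([[5, 1, 4, 2, 5, 1, 4, 2, 5, 1]])
print(pos_mean, neg_mean)
```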

Overall, the users’ opinion of this application is very positive.

5. Discussion

In this study, teachers’ satisfaction with, and students’ behavior toward, the ARealSpeech application were determined. Mixed methods were used to evaluate teachers’ satisfaction in facilitating teaching and learning sessions in the classroom: satisfaction was first evaluated qualitatively through interviews and then quantitatively with the SUS questionnaire. The experts’ interviews revealed a need to enhance children’s learning through maximum use of the senses, which this study addresses by developing ARealSpeech, an application that combines AR technology with speech recognition. The overall SUS score for the experiment was above average (80.3), suggesting that the application has good usability characteristics. The high usability mean and the low learnability mean (lower is better, since the learnability items are negatively worded) suggest that users find the application easy to learn and easy to use.

Based on the literature, AR attracts children’s attention in learning environments [2], and the observations made during this study are consistent with that claim. Productive vocabulary training, in which language learning software uses speech recognition to check students’ pronunciation, can generate stronger literacy benefits [1]. The developed application is therefore expected to benefit vocabulary learning through speech recognition. ARealSpeech was evaluated with an analysis approach similar to that of [19] and shows positive results in terms of usability and learnability as assessed by the teachers.

6. Conclusion and Future Work

In this work, user satisfaction was analyzed based on the SUS questionnaire. The results reveal that the teachers and children showed increased interest in using the application and were very satisfied with it. The speech recognition function can help improve students’ pronunciation, because they can keep repeating the words until the application gives positive feedback. This work is expected to help students improve their vocabulary knowledge through an approach that combines AR and speech recognition. With respect to the research hypotheses, the teachers were satisfied with the application based on the SUS score (SUS score > 68), with only three respondents recording below-average SUS scores, and the overall SUS score of 80.3 shows that they were satisfied with the application. Based on observation, the students’ interest in using the application also increased. In addition, the usability mean from the questionnaire shows that users found the application very usable, and the learnability results show that it is easy to learn and use. Although the results are encouraging, a larger sample size for interviews and usability testing would considerably strengthen the findings. Beyond satisfaction, effectiveness and efficiency should also be measured with a detailed task design to assess usability more thoroughly. Inferential statistics should be used to quantify the behavioral aspects and factors affecting the usability of the application.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors acknowledge the GUP-2015-008 Grant for funding this research.