Abstract

Augmented reality (AR) has been proposed to be an efficient tool for learning in construction. However, few researchers have quantitatively assessed the efficiency of AR from the cognitive perspective in the context of construction education. Based on the cognitive theory of multimedia learning (CTML), we evaluated the predesigned AR-based learning tool using eye-tracking data. In this study, we tracked, compared, and summarized learners’ visual behaviors in text-graph- (TG-) based, AR-based, and physical model- (PM-) based learning environments. Compared to the TG-based material, we find that both AR-based and PM-based materials foster extraneous processing and thus further promote generative processing, resulting in better learning performance. The results show that there are no significant differences between AR-based and PM-based learning environments, elucidating the advantages of AR. This study lays a foundation for problem-based learning, which is worthy of further investigation.

1. Introduction

Currently, with information technology playing an increasingly important role in various fields, people are paying increasing attention to its potential in education [1]. The construction industry is a complex environment in which engineers must deal with integrated information, and construction education has long been challenging. Traditional teaching or training is not effective enough to bridge the gap between academia and practice [2]. However, information technology enables new education strategies to assist learning, one of which, the application of augmented reality (AR), has gained much attention in recent years [3]. AR is a technology that can enhance and augment reality by generating virtual objects in real environments [4]. Such coexistence of virtual and real objects helps learners visualize complex spatial relationships and abstract concepts [5].

The application of AR technology in education has been developing for more than 20 years, and AR has been applied to many fields such as astronomy, chemistry, biology, mathematics, and geometry [6]. When the effectiveness of the AR learning environment is assessed, it is usually compared with text-graph- (TG-) based learning tools. In the construction industry, apprenticeship programs are common on-site training methods in which risk is unavoidable [7], whereas AR is an education measure with no health or safety risks [8]. Many researchers have proposed AR-based frameworks to bring remote job sites indoors [9], transform learning processes [10], or enhance the comprehension of complex dynamic and spatial-temporal constraints [11]. The use of AR technology can be an efficient way to assist learning, but there is still little quantitative evidence about the effects of AR [3]. Many researchers have evaluated the effects of AR on learning outcomes while ignoring the potential causes of those effects during the learning process.

Eye tracking is a measurement of eye movement, which can reveal aspects of learners’ learning processes [12]. Because of the use of eye-tracking software for recording and producing data, studies on learners’ cognitive processes have entered a new phase [13].

TG-based and physical model- (PM-) based materials are common tools for construction learning and training. The authors of the present study conducted an experiment on construction class learning to (1) evaluate learning outcomes while comparing TG-based, AR-based, and PM-based environments and (2) investigate the underlying causes of the effects of the learning method from a cognitive perspective and the potential effects of AR by utilizing eye-movement data.

2. Literature Review

2.1. Does AR Facilitate or Inhibit Learning Efficiency?

When it was first introduced, multimedia learning theory suggested that appealing design features can help increase cognitive engagement and retain learner attention [14]. Further investigation showed that the visual detail in a multimedia resource can result in effective learning and instructional multimedia design [15]. According to Mayer [16], following cognitive load theory, which is the basis for instructional design principles [17], the cognitive theory of multimedia learning (CTML) distinguishes between three kinds of processing demands that arise during learning: (1) extraneous processing, which is caused by the manner in which the material is presented and increases the chances that attention will be split among various sources of information; poor instruction may increase this processing and thus inhibit transfer learning; (2) essential processing, which is done to focus on the presented material and is caused by the complexity of the material; and (3) generative processing, which is done to comprehend the material and is caused by the learner’s efforts in the learning process, such as selecting, organizing, and integrating. As asserted in previous studies, both extraneous and germane cognitive loads can be manipulated, whereas intrinsic cognitive load cannot [17]. However, according to Mayer, extraneous, generative, and essential processing can all be managed [18]. Furthermore, unnecessary and greater loads that stem from the design of instruction may impose extraneous cognitive load [19]. Ineffectively searching for information may increase extraneous cognitive load and disturb essential processing. Therefore, the reasonable reduction of redundant information is an important way to reduce cognitive load and further enhance cognitive learning. The measures include reducing extraneous processing, for example by highlighting crucial materials with colors; managing essential processing, for example by decomposing learning materials into several parts; and fostering generative processing [16].

AR is a useful technology with which to improve learning, as explained by the CTML [20]. It allows visual information to be registered to the real world [21]. The visual information, which serves as the instructional material in this paper, can be designed following the CTML. Although the materials can be designed and displayed using 3D model design software, AR technology differs in that it provides immersive environments; for example, an immersive language learning framework motivated by the CTML has been developed with AR [22]. Many scholars contend that different learning tools lead to different learning outcomes, as shown in Table 1. However, few researchers have paid attention to the design of the AR models, which are the instructional material in this case. A confounding question arises: Does AR facilitate or inhibit learning efficiency by highlighting partial but critical information?

2.2. Manipulation of Extraneous Information with Various Learning Materials

AR has been shown to be a more efficient way of learning in various studies, as shown in Table 1. Nonetheless, the evaluations of AR against conventional learning environments were basically limited to learning outcomes and to questionnaires that examined students’ subjective motivations and satisfaction [23, 24]. Because the major function of AR rests in highlighting critical information and labeling extra information as a reference for learning purposes, AR can be perceived as a measure that manipulates extraneous information processing, potentially enhancing the generative process of learning. From this perspective, previous researchers did not answer why and how AR fosters learning in construction. In the educational domain, AR appears to be a smart technology with which to create attractive and motivating content, and it improves the time spent on acquired learning [25]. Moreover, an experiment revealed higher learning achievement and lower cognitive load when a mobile AR application was used [26]. For construction education, applying AR can create a realistic learning environment without health and safety risks and enhance students’ comprehensive understanding of construction equipment and operational safety [8, 10, 27]. As shown in the “control group” column of Table 1, the advantages of AR described above were mainly concluded from comparisons with conventional learning types, especially TG-based ones. However, these comparisons ignore the contrast with real PM-based learning materials. Besides, some of the TG-based learning material in the experiments of Table 1 is colored, which adds extraneous information, but in this paper, the TG-based material is designed according to the Chinese Drawing Collection for National Building Standard Design, which is not highlighted with color. The PM-based learning material is modeled as well.

2.3. Eye Tracking for Cognitive Processing Measures

Although it has been proposed that AR design features lead to better learning outcomes, there is little substantive evidence showing how this occurs in cognitive processing. Fortunately, the AR material is designed based on the CTML, and many researchers have studied how to measure the associated cognitive activity. Eye tracking, combined with measures of learning performance, provides information about the focus of cognitive activity [31]. Consequently, to identify how learners behave in AR-based and other conventional learning environments, the use of an eye-tracking device is an effective way to provide cognitive processing measures.

Eye-tracking techniques can be used to record eye movements, and measures such as fixation count, total fixation time, and average fixation duration show how people behave while they are engaged in cognitive processing [32, 33]. However, the use and interpretation of eye-tracking measures differ and depend on the research questions. A summary of relevant studies in which eye tracking was used to obtain eye-movement measures in multimedia learning and cognition is listed in Table 2. Fixation duration and fixation count are the most prevalently used eye-tracking measures [34]. Generally, for the learning process, both longer fixation durations and lower fixation rates indicate higher cognitive load, and higher fixation counts mean less efficient information processing. Moreover, a long average fixation duration means that deeper information processing is induced by the complexity of the background information [32, 35, 36]. Besides, the attentional guidance hypothesis proposes that participants pay more attention to salient elements than to other elements, which leads to longer fixation times [37].

In summary, three eye-movement measures, including total fixation time, fixation count, and average fixation duration are utilized in this study to demonstrate how learners behaved during the entire formal experimental process for the following reasons: (1) The higher the values of fixation count and fixation time, the more the cognitive load in extraneous processing and the more the distributions in essential processing. (2) The longer the average fixation duration, the deeper the comprehension of the learning material, the more the complex information generated by various information sources, and the more the focus on essential processing. The relationship between the eye-movement metrics and the CTML cognitive processing is shown in Figure 1.

3. Research Questions and Methodology

The literature review shows that many related studies explain the effects of AR by comparing AR-based and TG-based materials (Table 1). These studies demonstrate the effectiveness of AR. However, they do not reveal the gap with PM-based education, which is also a common teaching method in construction education. The differences in effectiveness between AR and PM need to be examined to leverage the application of AR. Therefore, it is necessary to compare AR-based materials with both TG-based and PM-based materials to provide convincing evidence with which to explore the effects of AR. In addition, although AR has been shown to have a positive effect on learning outcomes, there is a lack of research on the exploration and evaluation of AR in the cognitive process. Consequently, we aim to test the following hypotheses:

(1) Compared to TG- and PM-based materials, AR-based materials promote learning outcomes.

(2) Compared to TG-based and PM-based materials, AR-based materials that are designed using the CTML can lower learners’ cognitive loads and foster deep information processing, which means that the AR-based group will have lower fixation counts and fixation times but higher average fixation durations than the TG- and PM-based groups.

To achieve the results, an experiment that involved learning and testing was developed. There were three groups of people who were exposed to three different learning environments: TG-based, AR-based, and PM-based learning environments. Each participant was separately given the same questions. The questions were answered by referring to the learning material provided in the TG-based, AR-based, or PM-based learning environments.

Figure 2 shows the experimental flow. Before the test, learning content and corresponding test questions were prepared. We randomly divided participants into the three groups (AR, TG, and PM). In the cognitive testing process, we recorded the participants’ answers and answer times as their learning outcomes to comparatively analyze the three groups. During the whole testing process, participants’ eye movements were recorded using an eye tracker (SMI iView XTM HED at 50 Hz). The fixation time and fixation count data were obtained using Begaze (iView software). We defined one area of interest (AOI) for each question, and total fixation time, fixation count, and average fixation duration values for each AOI were recorded and calculated.

3.1. Participants

A total of 40 senior undergraduate students majoring in construction management at Chongqing University were invited to participate. Because sample sizes in eye-tracking studies range from fewer than ten for qualitative studies to 30 for quantitative studies [49], a sample of 40 is large enough for a quantitative eye-tracking study.

Chongqing University is one of the top 10 research universities in the field of construction management in China. In this study, we used two approaches to invite participants: (1) students of one class were assigned to participate in the study as their final project; (2) an invitation flyer was posted in the laboratory of Chongqing University to invite volunteers to participate in the experiment. Finally, we selected 23 students from the class and 17 volunteers who were attracted by the flyer. To minimize the differences between individuals, we chose participants with the same major (construction management), the same grade (fourth year), and similar ages (21 to 23 years). There were 22 males and 18 females among the participants, and they had all taken the same courses in college. The students had received 32 credit hours of reinforcement arrangement courses in the third year of college, but they all lacked practical experience in construction, meaning that they had not received any on-site training or had any injury experience in construction. Based on their academic and practical backgrounds, we assumed that these students had similar intrinsic learning abilities. The vision of all participants was either normal or corrected-to-normal.

3.2. Learning and Test Materials

Learning materials were about the detailing of longitudinal bars at the tops of antiseismic corner columns from one Chinese Drawing Collection for National Building Standard Design, 11G101-1 (drawing rules and standard detailing drawings of an ichnographic representing method for construction drawings of RC structure). According to our previous research and interviews with experts with engineering and construction majors in Chongqing University, this is quite an important and basic section of professional knowledge for construction workers. Meanwhile, it is difficult to understand for students who do not have any practical experience. Therefore, we designed three forms of instructional materials based on this content with the guidance of a teacher in the field of construction techniques.

For the TG-based learning environment, the learning material was abstracted from 11G101-1 (Figure 3) and shown on a computer screen for learners.

Figure 4 shows the design of the AR model. The key steel bars are highlighted and distinguished by color according to their binding methods. The other bars are rendered in gray to make them less salient. Thus, according to multimedia theory, this design could attract attention and help learners reduce extraneous processing. Besides, the key information can be easily selected to manage essential processing, and learners should gain a more comprehensive understanding of the learning content through more effective generative processing. If one adopts the CTML, it can be supposed that AR-based learning environments may be more attractive than others, helping learners pay attention to key information.

The AR-based learning environment consisted of a computer with ARToolkit software, a camera, and a paper label. As shown in Figure 4, before the experiment, a virtual model based on the learning content was made with two software programs: Revit Structure and 3D Max. Then, ARToolkit was used to connect the model to a paper label. In the learning process, with a plug-in for ARToolkit that was developed in our previous research, the paper label was placed in front of the camera, and the AR model then appeared on the label. The users could observe the model from different angles by rotating the label. Figure 5 shows the workflow of the AR-based learning environment, and the final practical AR-based environment is shown in Figure 6.

As for the PM-based learning environment, a solid model was made with mini-steel bars based on the actual situation on construction site, as shown in Figure 7.

Correspondingly, a test was designed to evaluate learning outcomes within the three different environments. The test consisted of six questions in total: three true or false questions and three short-answer questions (Table 3). During the testing process, both the learning material and the test material were given on the same screen. The learning material was on the left and the test material was on the right, with one question on each page. As shown in Figure 8, a cross-sectional drawing was given in the test material, and the configuration of each numbered longitudinal bar was arranged using one of the various ways shown in the learning materials. Learners could reference the learning materials based on the questions, and they were asked to figure out the arrangement of each bar and their spatial relationships to give the correct answers. For each question, there was one corresponding AOI in the learning material that showed the most important information that learners needed to notice and process.

When answering true or false questions, learners were asked to make a judgment about a description associated with the spatial configuration and then answer with “yes” or “no.” For the short-answer questions, on the basis of each question, learners were required to give the correct number of the 12.

3.3. Experimental Procedure

Every participant was randomly assigned to one of three groups. Each participant was provided training materials in TG-based, AR-based, or PM-based form. Referring to these training materials, the participants sequentially answered predesigned questions. Details about the experimental procedure are listed as follows.

3.3.1. Preexperiment Calibration

Participants were told about the purpose of the experiment. Then, they were asked to identify their dominant eye using the facilitator’s instrument so that they could be fitted with the eye tracker (SMI iView XTM HED, with a sampling rate of 200 Hz) and the proper eyeglass. Participants were seated approximately 50 cm from the screen on which the learning materials were displayed. A five-point calibration screen was used to assess the calibration for each participant before each cognitive process. If the calibration error exceeded 1° in the x or y direction, then the calibration was repeated.
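For reference, the 1° criterion can be converted into an on-screen distance. The following worked conversion is only a sketch: it assumes the standard visual-angle relation and the approximate 50 cm viewing distance stated above, and the symbols θ (visual angle), s (on-screen extent), and d (viewing distance) are introduced here for illustration.

```latex
\theta = 2\arctan\!\left(\frac{s}{2d}\right)
\quad\Longrightarrow\quad
s = 2d\tan\!\left(\frac{\theta}{2}\right)
  = 2 \times 50\,\mathrm{cm} \times \tan(0.5^{\circ})
  \approx 0.87\,\mathrm{cm}
```

Under these assumptions, a 1° deviation corresponds to slightly less than 1 cm on the screen.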

3.3.2. Formal Experiment

Every participant was given two minutes to familiarize themselves with the learning content. Six questions were then sequentially demonstrated on the screen (Figure 9). After the participant answered, the research facilitator immediately switched slides to the next question and recorded the participant’s answer. No auxiliary verbal instructions were provided during the entire formal experiment in any group.

During the whole process, participants in the AR and PM groups could ask the research facilitator to rotate the paper label or model according to their own requirements if they wanted to observe from different angles. They were not given opportunities to change their answers.

3.4. Data Analysis

Every participant’s answers and completion times for every single question were recorded by the facilitator, and learners’ eye movements were recorded by the eye tracker (SMI iView XTM HED) and the associated software (Begaze), which was also used to define the AOIs. The total fixation time and fixation count of each AOI could then be calculated and exported.
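To make the three AOI measures concrete, the following minimal Python sketch computes them from a list of fixations. It is not the Begaze implementation: the `Fixation` and `AOI` classes, the rectangular AOI shape, and the sample values are hypothetical, and it assumes that average fixation duration is total fixation time divided by fixation count within the AOI.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    x: float            # horizontal gaze position (screen pixels, assumed)
    y: float            # vertical gaze position (screen pixels, assumed)
    duration_ms: float  # fixation duration in milliseconds


@dataclass
class AOI:
    # Hypothetical rectangular area of interest in screen coordinates.
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, fix: Fixation) -> bool:
        return self.left <= fix.x <= self.right and self.top <= fix.y <= self.bottom


def aoi_metrics(fixations, aoi):
    """Return fixation count, total fixation time, and average fixation duration for one AOI."""
    inside = [f for f in fixations if aoi.contains(f)]
    count = len(inside)
    total = sum(f.duration_ms for f in inside)
    # Assumption: average fixation duration = total fixation time / fixation count.
    average = total / count if count else 0.0
    return {
        "fixation_count": count,
        "total_fixation_time_ms": total,
        "average_fixation_duration_ms": average,
    }


# Hypothetical data: two fixations fall inside the AOI, one falls outside.
fixations = [
    Fixation(120, 80, 200.0),
    Fixation(130, 90, 300.0),
    Fixation(600, 400, 250.0),
]
aoi = AOI(left=100, top=50, right=200, bottom=150)
metrics = aoi_metrics(fixations, aoi)
```

Keeping the three measures in one function makes explicit that the average is derived from the other two, so a group can show a higher average fixation duration even when its fixation count and total fixation time are lower.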

Table 4 gives a brief definition of each measure. All data were imported into Excel and SPSS for statistical analysis. To identify whether there were statistically significant differences among the three groups, ANOVA was used to conduct group comparisons. If statistically significant results existed, then Bonferroni multiple comparisons were conducted between each pair of groups to identify where the significant differences lay.
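The two-step analysis can be sketched in plain Python as follows. This is an illustration rather than the SPSS procedure used in the study: the function names and the sample scores are hypothetical, the one-way ANOVA returns only the F statistic and its degrees of freedom (the p value would come from the F distribution, which is not computed here), and the Bonferroni step simply multiplies each pairwise p value by the number of comparisons.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over lists of scores."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of scores around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def bonferroni_adjust(p_values):
    """Multiply each p value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical scores for the three groups (not data from the study).
tg = [1.0, 2.0, 3.0]
ar = [2.0, 3.0, 4.0]
pm = [3.0, 4.0, 5.0]
f_stat, df_b, df_w = one_way_anova_f([tg, ar, pm])

# Hypothetical unadjusted p values for TG vs AR, TG vs PM, and AR vs PM.
adjusted = bonferroni_adjust([0.01, 0.02, 0.5])
```

With three groups there are three pairwise comparisons, so each unadjusted p value is multiplied by three before it is compared with the significance level.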

4. Results

A total of 40 students participated in this study. However, because the eye-tracking data were missing for six participants, we finally had 34 subjects for analysis: 11 in the TG group, 11 in the AR group, and 12 in the PM group. Thus, 204 (34 ∗ 6 = 204) data points for each index were recorded or calculated. Before the statistical analysis was conducted, all data were checked with SPSS to identify outliers. Five completion time data points, eight fixation time data points, six fixation count data points, and three average fixation duration data points were identified as outliers and excluded from the following statistical analysis.
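The data-point count and an outlier screen can be sketched as follows. The outlier rule applied in SPSS is not stated in the text, so this sketch assumes Tukey's 1.5 × IQR fences; the function name, the quartile method, and the sample completion times are hypothetical.

```python
from statistics import quantiles


def split_outliers(values, k=1.5):
    """Split values into (kept, outliers) using Tukey's fences (assumed rule)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return kept, outliers


# Group sizes and question count as reported in the text.
n_participants = 11 + 11 + 12
n_questions = 6
n_data_points = n_participants * n_questions

# Hypothetical completion times in seconds (not data from the study).
completion_times_s = [1, 2, 3, 4, 5, 100]
kept, outliers = split_outliers(completion_times_s)
```

A different rule, such as a z-score cutoff, would flag a different set of points, so the exclusion counts reported above depend on the rule actually used.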

4.1. Learning Outcomes

As seen in Table 5, generally, the mean scores of the PM group were the highest, with minimum average completion times for both question forms. A significant difference of scores in short-answer questions () was found among three groups, and multiple comparisons (Table 6) showed that the AR group and the PM group scored significantly higher than the TG group on the short-answer questions. No significant differences in scores among the three groups were found in the true or false questions. There were no significant completion time differences among the three groups for either form of question.

People in the AR and PM groups performed better than those in the TG group. The increase in scores was much more significant for the short-answer questions. Contradictory to the first hypothesis, our findings showed that people in the PM group exhibited the same degree of learning performance as those in the AR group.

4.2. Eye-Tracking Measures

The eye-movement data were analyzed using ANOVA to explore learners’ cognitive processes with regard to key information in AOIs.

Tables 7 and 8 show that, for the true or false questions, people in the TG group spent significantly more fixation time on the AOIs than those in the PM group, and there were no significant differences in the other pairwise comparisons. The fixation count results show that, for the true or false questions, people in the TG group fixated on the AOIs significantly more frequently than those in the other two groups. However, the result was different for the short-answer questions: multiple comparisons showed that there were no significant differences between any two groups.

The average fixation duration results showed significant differences among the three groups for both question forms. Multiple comparisons determined that for the true or false questions, people in the AR group showed a significantly higher average fixation duration than those in the TG group. For the short-answer questions, people in both the AR and PM groups showed a significantly higher average fixation duration than those in the TG group.

The results of all eye-movement measures showed that the AR-based learning material did not reduce learners’ fixation counts or fixation times in all conditions. Moreover, no significant difference between the AR-based and PM-based learning materials was identified. People in the TG group spent significantly more fixation time on the true or false questions than those in the PM and AR groups, so the second hypothesis could not be fully supported.

However, the results demonstrate that the effects of AR and PM teaching were different for the two question forms.

Although people in the TG group had similar scores on the true or false questions to people in the other two groups (Table 5), they had significantly longer fixation times and higher fixation counts. Long fixation times indicate that difficulty was faced in extracting information or that the object is more engaging in some way. Moreover, a high fixation count on an AOI indicates inefficiency in identifying relevant information [34, 36, 50]. For the same learning outcomes, the results demonstrated that, compared to the TG-based environment, both the AR-based and PM-based environments reduced learners’ cognitive loads and improved their searching efficiency in the learning and test processes.

For the short-answer questions, people in the TG-based group exhibited the same level of fixation time and fixation count as those in the other two groups. However, it should be noted that on the short-answer questions, participants in the AR and PM groups scored significantly higher than those in the TG group. Consequently, both AR-based and PM-based teaching considerably improved learners’ answering accuracy, but the eye-tracking data alone cannot determine which environment is associated with lower cognitive load and higher searching efficiency.

Unlike the two indicators of fixation time and fixation count, the average fixation duration results showed that for both question types, the AR-based group had the highest level while the TG-based group had the lowest (Table 6). A long average fixation duration is thought to be an indication of deep processing [32]. When related information is easy to target and integrate, learners can likely engage in the deep processing of key information required for meaningful learning [37, 51, 52]. This result indicates that the AR-based learning environment helped learners more easily find and focus on key information for each question, which then led to a deep understanding of the content.

5. Discussion

The main purpose of the study is to understand how AR-based teaching impacts college students’ learning outcomes and learning processes compared to TG-based and PM-based teaching about construction. The results showed that the AR-based environment led to better learning outcomes than the TG-based environment, but not better than the PM-based environment. However, the differences in the eye-tracking data were not consistent throughout the whole process.

5.1. Effect of Question Form

Participants in the TG group scored significantly lower on the short-answer questions than those in the AR and PM groups. People in the three groups had similar scores for the true or false questions. In this study, to answer the true or false questions, learners just had to say “yes” or “no.” However, they had to give precise and comprehensive numbers of steel bars in the short-answer questions, which required more exact information processing. This result suggests that for some limited tasks, learners in TG-based learning or training environments can achieve ideal performance, despite the higher cognitive load and lower efficiency compared to AR-based and PM-based environments. Moreover, TG-based teaching has the advantages of low cost and easy implementation. Therefore, for some learning tasks and practical work, TG-based education is the most economical option.

5.2. Effect of Cognitive Load and Emotion

Another reason why the participants in the TG-based group scored significantly worse on the second question form is related to cognitive load and motivation. As a positive emotion in cognitive processing, interest is closely related to motivation and attention, and those with interest show greater persistence on subsequent tasks. Cognitive load may affect emotional state and further hamper effective visual search [53–55].

Before they started to learn, all learners in the three groups were thought to have positive emotions and motivations. Their performances at the beginning were based on the same emotion. In this study, the sequence of the test was three true or false questions followed by three short-answer questions. The TG-based group scored at the same level as the other two groups with significantly more fixations in the first three questions. We supposed that learners in the TG-based group experienced excessive cognitive load at the beginning, which further had a negative impact on their motivation, so they were not motivated enough to pay adequate attention to information processing. Thus, it led to the increasingly worse learning outcomes on the final three questions.

5.3. Effect of AR

Compared to the PM-based learning environment, the AR-based learning environment did not show a competitive advantage in learning performance or a significant difference in eye-movement data, with the exception of average fixation duration. Although the longer average fixation duration indicated that learners in the AR-based group more easily found and focused on key information and then had a better understanding of the learning content than others, this did not translate into superior learning outcomes. After the experiment, a few students were invited to experience all three learning tools. They generally thought that, compared to the traditional TG-based learning method, both AR and PM were obviously helpful for understanding the learning material. However, they did not indicate that there were significant differences between the effects of AR and PM. Their subjective impressions agree with our experimental results and further indicate that the features and advantages of AR were not sufficiently utilized.

In practical application, AR is superior in flexibility and convenience. In contrast to PM-based education, users can build AR-based learning or training environments with no limit on time, and the displayed objects can be repeatedly modified and reused. Thus, AR has great potential and prospects. However, efficiently utilizing the features of AR to help learners or trainees achieve improved performance is not only the key to maximizing its value but also the most persuasive reason for its application, which calls for further studies. It is worth exploring for which tasks AR is the most suitable environment and whether other methods need to be combined with AR to improve teaching and training efficiency.

6. Conclusion

In this study, we applied TG-based, AR-based, and PM-based learning environments for construction learning. We compared learners’ learning outcomes and utilized eye tracking to explore the cognitive processes of the three groups.

For learning outcomes, our research suggests that the effects of learning environments differ across forms of tasks. A three-dimensional display should have the advantage of showing objects more comprehensively and intuitively than other displays, but our study showed that, in terms of outcome, conventional TG-based training can achieve the same level as AR-based and PM-based training in some specific tasks, such as answering true or false questions. In practical application, the content and demands of learning and training differ across majors and posts, and AR and PM are not equally effective in all cases. One should be careful and selective in applying and popularizing the new method.

Eye-tracking data provided quantitative evidence about the cognitive process. Both AR-based and PM-based environments helped learners reduce their cognitive loads compared to those in the TG-based group. However, lower cognitive loads did not always translate into significantly higher test scores or quicker completion times compared to other groups. Similarly, eye-tracking data showed that AR has the potential to help learners focus on key information and reach a deeper understanding, but learners in the AR-based group did not show better learning performance than those in the PM-based group. This result suggests that, to achieve improved outcomes, AR may need to be combined with other materials, such as 2D drawings and text, or that more reasonable adjustments may be needed when modeling. To explore how to take full advantage of AR or other similar technology in practical application, additional research needs to be developed and integrated to provide an in-depth understanding of learners’ mental models and cognitive processes.

In summary, this study illustrates the effects of TG-based, AR-based, and PM-based environments on construction learning outcomes and learners’ cognitive processes. However, it remains limited by the use of a single learning material and a few independent test questions. Future researchers should apply AR to systematized tasks and perform comprehensive tests to evaluate its effects.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors would like to extend their appreciation to the Fundamental Research Funds for the Central Universities of China (no. 106112016CDJSK03XK06) and the Natural Science Foundation of China (no. 51578317) for vital support.