Abstract

Since the emergence of deep learning-based chatbots for knowledge services, numerous research and development projects have been conducted across various industries. High demand for chatbots has rapidly expanded the global market; however, the limited functional scalability of open-domain chatbots hinders their industrial application. Moreover, as most chatbot frameworks are built for English, chatbots customized for other languages must be created. To address this problem, this paper proposes KoRASA, a pipeline-optimization method that uses a deep learning-based open-source chatbot framework to understand the Korean language. KoRASA is a closed-domain chatbot applicable across a wide range of industries in Korea. Its operation consists of four stages: tokenization, featurization, intent classification, and entity extraction. The algorithms for intent classification and entity extraction were optimized, and the accuracy and F1-score of KoRASA were measured on datasets built from tasks common to most industrial fields. The accuracy and F1-score were 98.2% and 98.4% for intent classification and 97.4% and 94.7% for entity extraction, respectively, surpassing existing models. Accordingly, KoRASA can be applied to various industries, including mobile services based on closed-domain chatbots using Korean, robotic process automation (RPA), edge computing, and Internet of Energy (IoE) services.

1. Introduction

Chatbots are communication tools that can be used to achieve a goal through automated dialog without human intervention [1]. Like personal assistants, chatbots have been used in various applications, such as making restaurant reservations, assisting with shopping, and conducting web searches. In recent years, many businesses around the world have strengthened their competitiveness by using chatbots, and, as a result, many industries in Korea have also introduced chatbots into their services [2]. The global chatbot market is growing at a rapid pace, as the benefits chatbots provide have produced high demand. Accordingly, large companies such as Facebook, Google, and Naver have started providing chatbot services as part of their professional services [3]. For example, in April 2016, eight years after Steve Jobs introduced the App Store, Mark Zuckerberg of Facebook announced the launch of the Facebook Messenger platform, which includes chatbots. The number of users on that platform has since exceeded one billion, more than the number of iPhone users at the launch of the App Store.

However, open-domain chatbots are not effective in every industry because of their limited functional scalability. Moreover, as most chatbot frameworks are built for English, chatbots customized for other languages must be created. To this end, this paper proposes KoRASA, a pipeline-optimization method using a deep learning-based open-source chatbot framework to understand the Korean language. KoRASA, a closed-domain chatbot, consists of four stages: (1) a tokenization step, which separates the corpus into tokens, i.e., language elements that cannot be further divided grammatically; (2) a featurization step, which extracts the features of each token; (3) an intent classification step, which classifies the intent of each question posed to the chatbot; and (4) an entity extraction step, which extracts the relevant entity from the question sentence. In the present study, an optimization experiment was performed at each stage with respect to Korean language understanding. Based on datasets reflecting common tasks in industry, the accuracy and F1-score of our proposed method were 98.2% and 98.4% for intent classification and 97.4% and 94.7% for entity extraction, respectively. Accordingly, KoRASA can be used not only for developing a closed-domain chatbot for the Korean language but also for various industrial purposes, such as mobile chatbot services, edge computing services, and robotic process automation (RPA) services.

The remainder of this paper is organized as follows. In Section 2, related work is discussed. The method of our proposed work is explained in Section 3 and we provide experimental results in Section 4. Finally, we conclude the paper with future research directions in Section 5.

2. Related Work

2.1. Chatbot Framework

As numerous companies have started introducing and utilizing chatbots in recent years, various research and development projects have been conducted to improve the functionality of chatbots. Adamopoulou et al. [4] introduced the history, technologies, and applications of chatbots, and Cahn [5] proposed the architecture and design of chatbots and the process of chatbot development, thereby presenting guidelines for chatbot researchers and developers to follow. Figure 1 shows the architecture of a typical chatbot. When a user inputs a message, the intent is classified, and an entity is extracted by the natural language understanding (NLU) component. A tracker maintains a conversation state, while the intent and entity (which have been classified and extracted by NLU) are put into slots to await further processing. A dialog management system then designates the next action based on previous actions, action results, and the contents of the slots. The message generator then provides the user with a corresponding response. If it is necessary to extract data from a database or URL, an application programming interface (API) is used for communication, transmission, and reception.
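As a concrete illustration, the message flow of Figure 1 can be summarized in a few lines of Python. The component objects (nlu, tracker, policy, generator) are hypothetical names standing in for the architectural blocks described above, not the API of any specific framework:

```python
# A minimal sketch of the typical chatbot architecture in Figure 1.
# The component objects are illustrative placeholders, not a real API.

def handle_message(user_message, nlu, tracker, policy, generator):
    # 1. NLU: classify the intent and extract entities from the raw text.
    intent, entities = nlu.parse(user_message)

    # 2. Tracker: place the NLU results into slots and update the state.
    tracker.update_slots(intent=intent, entities=entities)

    # 3. Dialog management: choose the next action from previous actions,
    #    action results, and the contents of the slots.
    action = policy.next_action(tracker.state())

    # 4. Message generator: render the chosen action as a response.
    #    If external data is needed, the action may call an API here.
    return generator.render(action, tracker.state())
```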

Daniel et al. [6] proposed Xatkit, a framework for developing multimodal low-code chatbots. Xatkit comes with a runtime engine that automatically deploys the chatbot application and manages the defined conversation logic over the platforms of choice. Ait-Mlouk et al. [7] developed a chatbot based on knowledge graphs, enabling chatbots to learn from diverse data types. Furthermore, Park et al. [8] developed a framework for building dialog systems in a smart home environment. The framework ontologically represents the knowledge required by the task-oriented dialog system and can build a dialog system by editing the dialog knowledge. Reshmi and Balakrishnan [9] proposed an interface framework for customer service chatbots. More specifically, they proposed a method of integrating big data as a knowledge base for chatbots, enabling the generation of dynamic responses to user queries and improving the analytical capability of chatbots with data from a distributed environment. By using chatbots in this manner, developers can create services tailored to specific fields and increase company productivity. However, because the chatbots proposed in these studies are based on frameworks designed for specific purposes, their development and use across a broad range of industries are limited.

2.2. Natural Language Understanding (NLU)

The key technology used in chatbots is NLU, which involves transforming natural language expressions into a machine-readable format. Figure 2 illustrates the architecture of NLU. The preprocessing stage of NLU is divided into two substages. The first substage is tokenization, in which a corpus is divided into tokens, which are the grammatically indivisible units of a language. The second substage is featurization, wherein the features of each token are extracted. After the preprocessing stage, intent is classified to ensure that the user’s request message is sufficiently understood, and an entity is extracted to provide a suitable response. Through this process, a user-friendly chatbot system can be developed.
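As a toy illustration of this generic flow (not the paper's pipeline), scikit-learn can stand in for the deep learning components: tokenization and featurization produce count vectors, on which an intent classifier is trained:

```python
# A toy illustration of NLU preprocessing (tokenization + featurization)
# feeding an intent classifier, per the generic architecture of Figure 2.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

train_texts   = ["hello there", "goodbye for now", "show me the lunch menu"]
train_intents = ["greet", "goodbye", "food_menu"]

# Tokenization + featurization: count vectors over word tokens.
featurizer = CountVectorizer()
X = featurizer.fit_transform(train_texts)

# Intent classification on the extracted features.
clf = LogisticRegression().fit(X, train_intents)
print(clf.predict(featurizer.transform(["hello again"])))  # likely ['greet']
```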

There are two types of NLU [10]: rule-based and deep-learning-based. Rule-based NLU depends on features extracted by humans, which means that it takes an extended period to manually extract features from sample user input. In addition, the accuracy of rule-based NLUs decreases as document volume increases. Deep-learning-based NLU automatically extracts and learns features from the data. Compared with the existing rule-based NLUs, it enables wider contextual processing and can also be used to build multimodal models by combining it with models that learn other types of data, such as images and voices. Accordingly, many recent studies have actively employed deep-learning-based NLU.

For example, Radford and Narasimhan [11] studied improving natural language understanding through generative pretraining. Liu et al. [12] designed and developed multitask deep neural networks for natural language understanding. Jiao et al. [13] developed a method for natural language understanding using BERT, the popular transformer-based machine learning technique. In NLU research, studies on understanding natural language using transformers are becoming increasingly popular [14].

Among the different deep-learning-based NLU services, LUIS, Watson Conversation, API.ai, and RASA have been widely used in recent years [15–18]. Braun et al. [19] evaluated these NLU services for conversational question-answering systems (see Table 1). Such NLU services share the same basic concept: users can train intent classification and entity extraction on example data. In the performance evaluation of chatbot NLU services, RASA and LUIS achieved an F-score of 0.94, whereas Watson Conversation and API.ai only achieved scores of 0.74 and 0.68, respectively. In particular, RASA, an open-source chatbot framework, enables users to easily develop machine-learning-based dialog management and natural language understanding functions. Therefore, in the present study, we adopted RASA as a deep learning-based open-source chatbot framework [18]. However, RASA is optimized for English; thus, to develop a chatbot for use in Korean industries, the framework must be optimized through performance experiments examining the different stages in the pipeline, to enable it to understand the Korean language.
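For orientation, RASA assembles its NLU components in a declarative pipeline. A baseline similar to the one RASA recommends might look as follows, written here as the Python-dict equivalent of RASA's YAML configuration; the component names are real RASA 2.x components, but the exact baseline used is our assumption:

```python
# Sketch of a baseline Rasa NLU pipeline (Python-dict equivalent of a
# config.yml). Component names are real Rasa 2.x components; the exact
# baseline configuration is an assumption.
BASELINE_PIPELINE = [
    {"name": "WhitespaceTokenizer"},           # split text on whitespace
    {"name": "CountVectorsFeaturizer"},        # bag-of-words features
    {"name": "DIETClassifier", "epochs": 300}, # joint intent + entity model
]
```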

Recently, deep learning has been actively applied in various industrial fields as well as in NLU. O'Shea and Hoydis [20] developed a fundamentally new way of viewing communications system design as an end-to-end reconstruction task and demonstrated the application of convolutional neural networks to raw IQ samples for modulation classification. Aceto et al. [21, 22] proposed a general framework for deep learning-based mobile and encrypted traffic classification based on a rigorous definition of its milestones, overcoming the design limitations of previous works by envisioning the simultaneous use of multimodal and multitask techniques. Hwang et al. [23] proposed Gen2Vec, a deep learning-based search engine framework for operating power plants; Gen2Vec can serve as the core engine of a knowledge system for power plant operators and can be used in search and chatbot services for power plant operation programs and new employee training. Wang et al. [24] proposed a deep learning-based ensemble approach for probabilistic wind power forecasting, which effectively extracted the nonlinear and stochastic characteristics of each wind power frequency component while the ensemble canceled out diverse errors. Wang et al. [25] proposed a method of detecting potential adverse drug reactions (ADRs) using a deep neural network model, which can not only discover the potential ADRs of existing drugs but also predict the possible ADRs of new drugs. Roman et al. [26] proposed a machine learning pipeline for battery state-of-health (SOH) estimation, providing insights into the design of scalable data-driven models for battery SOH estimation and emphasizing the value of confidence bounds around the prediction.

3. KoRASA: Pipeline Optimization for Open-Source Korean Natural Language Understanding Based on Deep Learning

3.1. Pipeline Architecture of KoRASA

KoRASA is an optimization method that uses a deep learning-based open-source chatbot framework to understand the Korean language. KoRASA is developed on the RASA framework [18] and follows a typical NLU process of tokenization, featurization, intent classification, and entity extraction, as shown in Figure 2. Figure 3 illustrates the pipeline architecture of KoRASA as proposed in this study. The tokenizer uses the Korean version of Mecab, and a count vector featurizer is adopted [27, 28]. The Dual Intent and Entity Transformer (DIET) is then used for the intent classification and entity extraction of KoRASA [29]. DIET is capable of fast learning using a common baseline model. However, as KoRASA is intended for industrial applications, the focus is not on the learning speed of the model but on its intent classification and entity extraction performance. Accordingly, the architecture in this study was optimized by tuning its parameters. Moreover, to maximize entity extraction performance, a mapping function was added using the Entity Synonym Mapper, and the Transformer Embedding Dialogue (TED) policy was applied [18, 30].
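Putting these choices together, the KoRASA pipeline of Figure 3 could be sketched as follows. MecabTokenizer is an assumed name for a custom Korean Mecab component (RASA does not ship one); the remaining names are standard RASA 2.x components:

```python
# Sketch of the KoRASA pipeline per Figure 3. "MecabTokenizer" stands for
# a custom Korean Mecab tokenizer component (an assumed name); the other
# entries are standard Rasa 2.x components.
KORASA_PIPELINE = [
    {"name": "MecabTokenizer"},         # Korean version of Mecab [27, 28]
    {"name": "CountVectorsFeaturizer"}, # count vector featurizer
    {"name": "DIETClassifier"},         # DIET, tuned as DIET-Opt (Section 3.2)
    {"name": "EntitySynonymMapper"},    # maps synonymous entity values
]
KORASA_POLICIES = [
    {"name": "TEDPolicy"},              # Transformer Embedding Dialogue policy
]
```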

3.2. Optimized Structure of DIET Model (DIET-Opt)

Figure 4 shows the optimization architecture of the DIET model of KoRASA (DIET-Opt). We refer to the Rasa blog for parameter optimization in KoRASA [31]. Tokens are extracted by the Korean tokenizer and are then input into the transformer layer through feed-forward layers. The sequence $a$ output from the transformer layer is passed to the conditional random field (CRF) algorithm to predict the sequence $y_{entity}$ of entity labels. As shown in equation (1), the entity loss $L_E$ is calculated as the negative log-likelihood under the CRF:

$$L_E = L_{CRF}(a, y_{entity}). \quad (1)$$

The CLS token output $a_{CLS}$, which is the sentence encoding of the input sentence, and the intent labels $y_{intent}$ are input into the respective embedding layers ($h_{CLS} = E(a_{CLS})$ and $h_{intent} = E(y_{intent})$, where $E$ denotes an embedding layer). As expressed in equation (2), the dot-product loss used here not only maximizes the similarity $S_I^+ = h_{CLS}^\top h_{intent}^+$ with the target label $y_{intent}^+$ but also minimizes the similarities $S_I^- = h_{CLS}^\top h_{intent}^-$ with negative samples $y_{intent}^-$ [27]:

$$L_I = -\left\langle S_I^+ - \log\left(e^{S_I^+} + \sum_{\Omega_I^-} e^{S_I^-}\right)\right\rangle, \quad (2)$$

where $\Omega_I^-$ denotes the set of negative samples and $\langle \cdot \rangle$ the average over all examples.

The input tokens also include masked tokens. In each sequence, 15% of the input tokens are randomly selected: 70% of the selected tokens are replaced by the special mask token __MASK__, 10% are replaced by random tokens, and the remaining 20% are kept as the original input. The values output from the transformer layer, after passing through a feed-forward layer, are then input into two embedding layers: one for the masked positions and another for the unmasked actual tokens. As shown in equation (3), the transformer output $a_{MASK}$ for each selected token $x$ is trained through a dot-product loss similar to the intent loss, with $h_{MASK} = E(a_{MASK})$, $h_{token} = E(x)$, and $S_M^+ = h_{MASK}^\top h_{token}^+$:

$$L_M = -\left\langle S_M^+ - \log\left(e^{S_M^+} + \sum_{\Omega_M^-} e^{S_M^-}\right)\right\rangle. \quad (3)$$

Finally, the model is trained to minimize the total loss $L_{total}$, which can be expressed by

$$L_{total} = L_I + L_E + L_M. \quad (4)$$
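To make the dot-product losses of equations (2) and (3) concrete, the following is a schematic PyTorch sketch under our own naming and shape assumptions (h_cls: embedded sentence representation; h_pos: embedded target label; h_neg: a batch of embedded negative samples). It is a sketch of the loss form, not the KoRASA implementation:

```python
# Schematic rendering of the contrastive dot-product loss in equations
# (2) and (3). Names and shapes are our assumptions, not KoRASA source.
import torch

def dot_product_loss(h_cls, h_pos, h_neg):
    s_pos = (h_cls * h_pos).sum(dim=-1)               # S+: (batch,)
    s_neg = torch.einsum("bd,bnd->bn", h_cls, h_neg)  # S-: (batch, n_neg)
    # L = -< S+ - log(e^{S+} + sum e^{S-}) >
    denom = torch.logsumexp(
        torch.cat([s_pos.unsqueeze(1), s_neg], dim=1), dim=1
    )
    return -(s_pos - denom).mean()

# Equation (4): the model minimizes the sum of the three losses, e.g.
# l_total = l_intent + l_entity_crf + l_mask
```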

Although the overall architecture of KoRASA is similar to the baseline model of DIET (DIET-Base), various experiments were performed to improve its intent classification and entity extraction performance. An experiment was conducted to determine the number of epochs yielding the best performance; instead of the traditionally used 300 epochs, 500 epochs were adopted. The number of transformer layers was also increased from 2 to 4, while the weight sparsity was decreased from 0.8 to 0.7. The embedding dimension was increased from 20 to 30, and the hidden layer size was increased from 256 to 512, thus optimizing the parameters (see Table 2).
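Expressed as RASA 2.x DIETClassifier options, the DIET-Opt settings of Table 2 would read as follows. The parameter names are real RASA options whose defaults match the stated baselines; whether the authors set them exactly this way is an assumption:

```python
# DIET-Opt hyperparameters from Table 2, expressed as Rasa DIETClassifier
# options (Rasa 2.x parameter names; exact config keys are an assumption).
DIET_OPT = {
    "name": "DIETClassifier",
    "epochs": 500,                            # up from the usual 300
    "number_of_transformer_layers": 4,        # up from 2
    "weight_sparsity": 0.7,                   # down from 0.8
    "embedding_dimension": 30,                # up from 20
    "hidden_layers_sizes": {"text": [512]},   # hidden layer up from 256
}
```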

4. Experiments and Results

4.1. Dataset

For the experiments performed in this study, datasets were built based on common tasks applicable to most industries. For the intent classification experiment, a dataset of 487 samples was generated covering the following six intents: greeting (Greet), closing (Goodbye), menu (Food_Menu), department contact (Dept_Contact), division of work (Pers_Work), and calculator (Calc) (see Table 3).

For the entity extraction experiment, a dataset of 3,354 samples was generated covering the following seven entity types: date (Date), department (Dept), work (Work), name (Name), time (Time), number (Num), and absence of entity (No_Entity) (see Table 4).
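For illustration, training samples of this kind can be written in RASA's NLU training data format, with entities annotated inline in square brackets. The following hypothetical Korean examples (embedded here as a Python string) mimic, but are not, the authors' dataset:

```python
# Hypothetical training examples in Rasa's NLU YAML format, embedded as a
# Python string. The utterances are illustrative, not the actual dataset.
NLU_EXAMPLES = """
nlu:
- intent: Greet                  # "Hello"
  examples: |
    - 안녕하세요
- intent: Food_Menu              # "Tell me today's lunch menu"
  examples: |
    - 오늘 점심 메뉴 알려줘
- intent: Dept_Contact           # entity Dept ("HR team") annotated inline
  examples: |
    - [인사팀](Dept) 연락처 알려줘
"""
```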

4.2. Preprocessing

As shown in Figure 2, the preprocessing of the NLU can be divided into tokenization and featurization stages. For tokenization, the performance of intent classification and entity extraction was measured using three tokenizers: the WhiteSpace Tokenizer, the ConveRT Tokenizer, and the Korean version of Mecab [18, 27, 32]. The count vector featurizer (CV), fallback classifier, and DIET-Base, the three baseline models recommended by RASA, were used as the algorithms for featurization, intent classification, and entity extraction, respectively [18]. The tokenizers were compared with respect to intent classification and entity extraction performance. Intent classification performance was similar across the three tokenizers. With regard to entity extraction, however, the WhiteSpace Tokenizer and the ConveRT Tokenizer achieved an accuracy of 0.073 and an F1-score of 0.050, whereas the Mecab tokenizer performed significantly better, with an accuracy of 0.989 and an F1-score of 0.959 (see Table 5).
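For reference, Korean morphological tokenization with Mecab can be sketched via the konlpy package (assuming mecab-ko and its dictionary are installed), in contrast to naive whitespace splitting:

```python
# Sketch of Korean tokenization with the Mecab morphological analyzer via
# konlpy, contrasted with whitespace splitting. Requires mecab-ko and its
# dictionary to be installed; the example output is indicative only.
from konlpy.tag import Mecab

sentence = "오늘 점심 메뉴 알려줘"   # "Tell me today's lunch menu"

print(sentence.split())         # whitespace: ['오늘', '점심', '메뉴', '알려줘']
mecab = Mecab()
print(mecab.morphs(sentence))   # morphemes, e.g. ['오늘', '점심', '메뉴', '알려', '줘']
```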

For featurization, a comparative experiment was conducted with the following three featurizers: the Regex Featurizer, the Lexical Syntactic Featurizer (LS), and the Count Vector Featurizer (CV), with the Korean version of the Mecab tokenizer (Mecab (Ko)) applied [18]. Combinations of these featurizers were tested, and the top three performance results were charted. The entity extraction F1-score of the Count Vector Featurizer was 0.905, at least 0.3% higher than those of the combined Regex + LS + CV Featurizer and the Regex Featurizer (see Table 6).

4.3. Intent Classification and Entity Extraction

A comparative experiment was performed to identify the optimal number of epochs for constructing the DIET-Opt model of KoRASA (see Figure 5). The accuracies and F1-scores for intent classification and entity extraction were analyzed when the model was trained with 100 to 900 epochs. In the experiment, when the epoch number was set to 500, the accuracy and F1-score of intent classification were found to be 0.982 and 0.984, respectively, and those of entity extraction were found to be 0.974 and 0.947, respectively.

The intent classifier of KoRASA was also compared with other intent classifiers in terms of intent classification and entity extraction performance. To this end, an experiment was conducted with the keyword classifier, fallback classifier, and DIET-Base classifier models (see Table 7) [18, 29]. The performance evaluation revealed that the F1-score of KoRASA was 19.8%, 0.3%, and 0.4% higher in intent classification, and 3.9%, 4.2%, and 3.7% higher in entity extraction, than those of the keyword classifier, fallback classifier, and DIET-Base classifier, respectively.

An example of the results of intent classification on the experimental data is shown in the confusion matrix in Table 8. Across the six intent types, the greeting (Greet) and closing (Goodbye) datasets were found to be composed of similar words, making it difficult to distinguish between these two intents. However, the intents of some of the other datasets, such as menu (Food_Menu), department contact (Dept_Contact), and calculator (Calc), were classified perfectly.
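The reported metrics and confusion matrices can be computed from model predictions in the usual way, for example with scikit-learn; the labels below are illustrative, not the actual experimental data:

```python
# Sketch of computing accuracy, F1-score, and a confusion matrix from
# predictions; the labels are illustrative, not the actual data.
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

y_true = ["Greet", "Goodbye", "Food_Menu", "Calc", "Greet"]
y_pred = ["Greet", "Greet",   "Food_Menu", "Calc", "Greet"]

print(accuracy_score(y_true, y_pred))                # 0.8
print(f1_score(y_true, y_pred, average="weighted"))  # weighted F1
print(confusion_matrix(y_true, y_pred))              # cf. Tables 8 and 10
```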

In addition, the entity extractor of KoRASA was compared with other entity extractors. An experiment was conducted to compare the performance of KoRASA with CRF, a combination of DIET-Base and CRF (DIET-Base + CRF), and DIET-Base (see Table 9) [29, 33]. The experimental results revealed that the F1-score of KoRASA was 1.5%, 1.2%, and 0.4% higher in intent classification and 4.2%, 4.3%, and 3.7% higher in entity extraction than those of CRF, DIET-Base + CRF, and DIET-Base, respectively.

Exemplary results from the experimental evaluation of entity extraction are shown in the confusion matrix in Table 10. The results across the seven entity types show that our model achieved high accuracy for numerical datasets such as date (Date) and time (Time), as well as for proper nouns such as department (Dept) and work (Work). In contrast, accuracy on the name (Name) dataset was lower than on the other datasets.

To understand the optimization performance of each stage, we compared the experimental results of (1) the baseline model of RASA; (2) the model with optimized preprocessing, i.e., the tokenization and featurization stages (KoRASA (PP)); and (3) the model with optimized preprocessing, intent classification, and entity extraction stages, i.e., the model applying DIET-Opt (KoRASA (PP&IC&EE)) (see Figure 6). Compared with the baseline model of RASA, the KoRASA (PP) model demonstrated comparable intent classification performance, while its entity extraction performance increased by 88.6% (accuracy) and 85.5% (F1-score). Finally, the KoRASA (PP&IC&EE) model likewise showed similar intent classification results, while its entity extraction performance increased by 90.1% (accuracy) and 89.7% (F1-score).

5. Conclusion

This paper proposed KoRASA, a pipeline-optimization method for a deep learning-based open-source chatbot framework designed to understand the Korean language. KoRASA is a closed-domain chatbot applicable to most industries in Korea. It consists of four stages (tokenization, featurization, intent classification, and entity extraction) and is optimized to understand the Korean language. The tokenization stage uses the Korean version of the Mecab tokenizer, and the featurization stage adopts the Count Vector Featurizer. For the intent classification and entity extraction stages, this study developed the DIET-Opt model by tuning and optimizing the parameters of the DIET-Base model. According to our experimental results, our model demonstrated an accuracy of 98.2% and an F1-score of 98.4% for intent classification. These scores were 19.8%, 0.3%, and 0.4% higher than those of the keyword classifier, fallback classifier, and DIET-Base classifier, respectively, which are popular existing intent classification algorithms. Regarding entity extraction, our model achieved an accuracy of 97.4% and an F1-score of 94.7%; the DIET-Opt model thus performed 1.5%, 1.2%, and 0.4% better than the CRF, DIET-Base + CRF, and DIET-Base models, respectively, which are popular existing entity extraction algorithms. When the results of intent classification and entity extraction were charted as confusion matrices, the improved performance of our proposed method was confirmed across the different datasets, except for the greeting (Greet), closing (Goodbye), and employee name (Name) datasets, which were found to be composed of similar words. This confirms the necessity of constructing an accurate dataset for effective chatbot training. The results of our experiments demonstrate that KoRASA is useful not only for developing a closed-domain chatbot for the Korean language but also for various industrial areas such as mobile chatbot services, RPA interworking services, edge computing, Internet of Energy (IoE), and smart grids. In future work, we will apply KoRASA to an RPA chatbot and optimize its tokenization method using Korean word embeddings. In addition, we will construct an open dataset and develop an automatic optimization function that adapts to additional datasets.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this study.

Acknowledgments

This work was funded by the Korea Electric Power Corporation (KEPCO).