Natural language processing (NLP) is a critical part of digital transformation. NLP enables user-friendly interactions between machines and humans by allowing computers to understand human languages. The intelligent chatbot is an essential application of NLP: it understands users’ utterances and responds in understandable sentences, simulating human-to-human conversations for problem solving or Q&A in specific applications. This research studies emerging technologies for NLP-enabled intelligent chatbot development using a systematic patent analytic approach. Several text-mining techniques are applied, including document term frequency analysis for key terminology extraction, clustering for identifying the subdomains, and Latent Dirichlet Allocation for finding the key topics of the patent set. This research uses the Derwent Innovation database as the main source for retrieving global intelligent chatbot patents.

1. Introduction

Despite the global impact of COVID-19, almost 80% of global artificial intelligence (AI) projects have maintained or even increased their R&D investment since the beginning of the pandemic. AI-based systems are now widely adopted for decision making, which has a profound impact on individuals and society. These so-called intelligent systems are mostly driven by machine learning (ML) or deep learning (DL) algorithms whose models are trained and tested on big data [1]. As an important application of AI technologies, smart chatbots (also called intelligent chatbots) have helped answer a large number of questions related to the pandemic [2]. Market statistics illustrate the trend in intelligent chatbot development. As reported by Business Insider, the chatbot market is expected to grow from US$2.6 billion in 2021 to US$9.4 billion in 2024, with a compound annual growth rate (CAGR) of nearly 30% [3]. The same study shows that more than 50% of customers, across various business sectors, expect businesses to be open 24/7. Chatbots, or virtual agents, enable organizations to handle simple questions and requests that would otherwise go to call centers, help desks, and service agents, while passing more complex issues to human staff, thereby controlling human resource costs. Chatbots can save up to 30% of customer support costs by shortening response times and answering up to 80% of routine questions [4].

The applications of intelligent chatbots have increased rapidly in recent years. Much research delves into the details of AI and DL algorithms for chatbot solutions and applications in pursuit of high efficiency and intelligence. Even though chatbot development seems to be booming, a thorough review of the chatbot development life cycle and its key technologies is still greatly needed. Furthermore, with the popularity of the Internet and social platforms, a digitally transformed environment for the use of smart chatbots (as human-machine communication interfaces) has become widespread. More and more applications offer “life” services through voice-interactive assistants; that is, smart chatbots, which hold regular conversations and provide online services interactively with users, are becoming a trend [5]. With such a comprehensive review, a technology pioneer or market leader can identify innovative technologies or applications to maintain its lead. A company that wants to follow the trend of digitization and enter chatbot applications can use the review to learn the state-of-the-art technologies and applications, so that it can commit resources to the most valuable developments and find the right breakthrough. To bridge this research gap, this research uses an intelligent ontology extraction and patent-mining methodology to comprehensively review chatbot-related patents and their innovative technologies and applications.

A chatbot is a computer program that allows computers to mimic human communication and conversation. At first, chatbots could only answer standard questions for which the questions and answers were known and stored in the system. With technological advances, computers can gradually answer free-form questions like a human and pass a Turing test, which brings them closer to human intelligence [6]. With the rapid development of AI in recent years, the intelligent chatbot has entered a new era and has been widely applied in many industries. For example, the voice-based customer query interfaces of large shopping malls, bank chatbots for monthly account balance queries, and even the well-known Siri reflect how chatbot technology is gradually entering people’s daily lives through intelligent interfaces. NLP is becoming the norm for obtaining information, allowing companies to easily extract key information from text documents, thereby enhancing operational efficiency or improving service levels. NLP also has many applications in other fields. Taking the medical industry as an example, NLP technology detects signs of cognitive impairment by analyzing the conversations of elderly people and patients with Alzheimer’s disease [7]. In the banking industry, optical character recognition (OCR) and NLP technologies are used to automatically capture key document text and perform document content reviews to speed up the lending process [8]. In catering services, NLP is used to analyze customers’ comments and emotions to improve services or perform precision marketing [9].

An NLP-enabled chatbot is a complex system. Starting from the utterance entered by the front-end user, the natural language understanding (NLU) module of the chatbot infers the user’s intent from the user’s natural language expression. Next, the dialogue management module finds content that can answer the user’s request. In this process, different types of databases may be accessed to find answers. Finally, the natural language generation (NLG) module converts the collected content into a human-readable expression as the response to the user [10]. An NLP-enabled chatbot is also a smart system that integrates many AI technologies. Chatbot technology that uses AI to imitate human conversations has begun to mature and provides accurate solutions or answers to complex questions. Because natural language-enabled chatbots can map spoken or written inputs to intents, they have become popular in many applications, such as in the manufacturing and service industries. Before chatbots, employees who wanted to obtain data from the company’s information system needed to log into the system, select the corresponding function, search through complex file folders for the corresponding file, and only then access the needed information. With a chatbot, a single verbal request can complete the task. Among enterprise-level applications, there are few voice-enabled chatbots, but the demand for such functions is increasing. In addition, once basic service functions are satisfied, soft functions are essential to the success of chatbots. Chatbots that incorporate features such as tone, emotion, and personality are desirable. Furthermore, smart chatbots that tolerate human errors or fuzzy requests and still generate accurate answers are very attractive [11].
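The three-stage flow described above (NLU, dialogue management, NLG) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the intent names, keyword sets, knowledge base, and response templates are hypothetical, and keyword overlap merely stands in for the trained NLU models that real chatbots would use.

```python
import re

# Hypothetical intent lexicon: keyword sets stand in for a trained NLU model.
INTENT_KEYWORDS = {
    "check_balance": {"balance", "account"},
    "opening_hours": {"open", "hours", "close"},
}

# Hypothetical knowledge base consulted by the dialogue manager.
KNOWLEDGE_BASE = {
    "check_balance": {"balance": "1,250.00 USD"},
    "opening_hours": {"hours": "09:00 to 17:00"},
}

# Hypothetical response templates used by the NLG step.
TEMPLATES = {
    "check_balance": "Your current balance is {balance}.",
    "opening_hours": "We are open from {hours}.",
}


def understand(utterance):
    """NLU: pick the intent whose keywords overlap most with the utterance."""
    tokens = set(re.findall(r"[a-z]+", utterance.lower()))
    scores = {intent: len(tokens & kws) for intent, kws in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def manage_dialogue(intent):
    """Dialogue management: look up content that can answer the intent."""
    return KNOWLEDGE_BASE.get(intent)


def generate(intent, content):
    """NLG: turn the retrieved content into a human-readable sentence."""
    if intent is None or content is None:
        return "Sorry, I did not understand the request."
    return TEMPLATES[intent].format(**content)


def respond(utterance):
    """Run the full NLU -> dialogue management -> NLG pipeline."""
    intent = understand(utterance)
    return generate(intent, manage_dialogue(intent))
```

Calling `respond("What is my account balance?")` routes the utterance through all three stages; an utterance with no matching keywords falls through to the fallback sentence.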

NLP technology is an important branch of AI. It studies the use of computational methods, such as machine learning (ML), to intelligently process natural language. Basic NLP technology is developed mainly around seven levels of language: phonology (pronunciation patterns of a language), morphology (how letters form words and how word forms change), lexicon (the relationships between words), syntax (how words form sentences), semantics (the meaning of language expressions), pragmatics (semantic interpretation in different contexts), and discourse (how sentences are combined into paragraphs).

As AI drives the transformation of the digital economy, companies should also pay more attention to intellectual property (IP) innovation and management. The latest trends in chatbot development can therefore be expected to emerge from collective patent information. Through the patent layout (or landscape), important technology development trends can be evaluated, the development directions of major international manufacturers can be identified, and international technology benchmarks can serve as a reference for subsequent R&D investment decisions [12].

According to statistics from the World Intellectual Property Organization (WIPO), more than 80% of emerging technologies with commercial value are patented, which shows that patent databases contain comprehensive domain knowledge. A patent database serves not only to support prior art searches but also to provide a wealth of information for future R&D. For example, when key patents are found, the technology development trends can be extrapolated, the technical contents of domain patents can be analyzed, and the core countries, assignees, and inventors of the key technologies can be identified. By making good use of such patent information, companies can develop various business and management strategies [13].

In order to understand the latest emerging technologies of chatbots, this study takes “natural language-enabled chatbots” as the domain for relevant patent technology exploration. Thus, the overall chatbot technological development trends can be discovered and future research directions can be suggested.

Before natural language-enabled chatbots are investigated, a well-constructed knowledge ontology is needed. The global patent management landscape map and the technology function matrix are then presented, followed by a discussion of the analytical results that shows the technology trends we found and verified against the matching literature. In this study, text-mining tools such as clustering and topic modeling are used. Saura [14] summarized 11 types of data science (DS) analysis methods in digital marketing, which lends good support to the patent-mining analysis method used in this study.

2. Literature Review

2.1. Patent Review Workflow

In the past, patent reviews were usually conducted by experts. However, with the increasing number of patents and the development of information technology [15], most patent reviews are now performed with text-mining technology. Even with the assistance of text mining, a review without a systematic workflow is likely to drift away from the subject of the patent analysis. Abbas et al. [16] present an overview of the research workflow and tools for patent analysis. They divide the patent review workflow into three parts: preprocessing, processing, and postprocessing. Preprocessing covers retrieving patents and transforming them into structured data. Processing covers the extraction of structures, including key term extraction and specific statistical data. Postprocessing covers the patent analysis approaches, which are classified into two categories: text-mining-based approaches and visualization approaches [17]. Kim and Bae [18] present a method for forecasting emerging health care technology by patent analysis. They define a patent review workflow with four stages: domain patent acquisition, technology clustering, technology defining, and evaluating patent clusters. They also mention that technology clustering results may vary depending on the analyst. To avoid a lack of objectivity, they rely on the Cooperative Patent Classification (CPC) for forecasting emerging technology. The nonpatent literature is also an ample source for analyzing emerging technologies. Thilakaratne et al. [19] present a literature-based research workflow. They define the article retrieval process in detail, including the retrieval rules and the selection standards, to avoid missing any related articles. In the article retrieval process, the main research purposes, the key words, and the search strategies are the three parameters that determine the patent database. 
After constructing the patent database, they use systematic criteria to determine whether each piece of literature is relevant. The literature is filtered in three stages: the first stage analyzes the title and abstract, the second stage analyzes the introduction and conclusion, and the last stage is a complete reading with a quality checklist. After that, visualization techniques are used to present their findings. In summary, the entire patent review workflow consists of three main parts: patent search for determining the database, patent analysis for extracting key information, and result display for presenting the results in an easily understandable way.

Govindarajan et al. [20] proposed a systematic research flow for industrial immersive technology. The flow starts by defining the domain and confirming the scope of the research; after a review of the main domain technologies, keyword identification and ontology generation are carried out. The method cross-references a large number of technical articles and essential patents, ensuring high coverage of technical information in the specific field. Finally, text-mining technology, the Latent Dirichlet Allocation (LDA) topic modeling method, and the technology function matrix (TFM) complete the research flow structure.

2.2. Patent Database

In a knowledge-based economy, the economic status of a country depends on the production, distribution, and use of knowledge and information. The latest economic growth trends in various countries depend mainly on individuals’ innovative technological knowledge, which is an important reason why intellectual property has attracted attention. Information related to intellectual assets, such as technical insight and legal status, cannot be obtained from any literature source other than the patent database, which underlines the importance of the patent database [21]. Krejcar et al. [22] compared several common large-scale patent databases, including AcclaimIP, Symphony Innovate, Inteum, IPzen, FoundationIP, Thomson IP Manager, and Derwent Innovation (DI), and pointed out the power of DI. The DI database draws on the scientific literature, global patent data, and commercial data, so its users can make more confident IP decisions. Powerful analysis functions and simple workflow tools make DI the preferred solution.

The Derwent World Patents Index (DWPI) and the smart search function are two major features of DI. DWPI records are produced by experts who read the entire official patent disclosure and then translate it, rewrite the key abstract, correct the content, and normalize the patent holder names; the result is considered to be the essence of the patent content. The DWPI rewritten items include novelty, use, advantage, technical focus, detailed description, drawing description, activity, and mechanism. Every DI operation simultaneously searches the official patent publications and the DWPI value-added patent database to obtain more complete results, which is a unique feature of DI. Smart search analyzes the input string semantically, automatically expands the keywords, and then goes through multiple calculation steps, including weighting of classification codes and weighting of citations, to find patents related to the input technical description. Grammar matters little here, because smart search removes conjunctions, prepositions, and similar words from the description and retains only the technical keywords. Therefore, whether the words in the technical description are accurate, and whether the description is mixed with too many unnecessary technical conditions, influence the search results more than the grammar does. If the keywords retained by smart search are not as expected, or the first results do not meet the requirements, the user can adjust the query manually, for example by adding new keywords in the search pane or removing possible noise, and let smart search recalculate the results. After several adjustments, the smart search results move closer to the demand. Smart search is an iterative process whose purpose is to quickly find potential targets; to retrieve all related patents without omission, a general patent search technique is more suitable [23].

2.3. Ontology Construction

An ontology map for a specific domain connects the relevant subjects and key terms and provides a domain knowledge-rich structure that can serve as the basis for analyzing technologies in depth. Weng et al. [24] presented a lexicon-based ontology construction method, which uses term frequency and a weighting factor to identify the relationships between key terms. If a term has significant weight, it is imported into the lexical database. The critical words for constructing the ontology are selected from the lexical database. Trappey et al. [25] proposed an information extraction approach and a knowledge-based ontology construction method for smart retailing technology mining, in which unsupervised ML methods, including clustering and Latent Dirichlet Allocation (LDA), are applied to construct a complete ontology through continuous refinement. Tsatsou et al. [26] proposed an automatic ontology construction method, which uses the term frequency-inverse document frequency (TF-IDF) technique to determine key terms that may become branches or nodes of the ontology. Subhashini and Akilandeswari [27] stated that constructing an ontology requires six key steps: determining the scope of the ontology, capturing related data, encoding the useful data into a machine-usable form, integrating the results, evaluating the results, and documenting the ontology. In summary, constructing an ontology can mainly be divided into three parts: data source, determining the relationships between terms, and effectiveness evaluation.
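As an illustration of how TF-IDF weighting surfaces candidate key terms, the following minimal sketch computes one common TF-IDF variant (relative term frequency multiplied by the natural logarithm of the inverse document frequency). The three sample documents are invented for illustration, and the exact weighting scheme used in the cited studies may differ.

```python
import math
import re
from collections import Counter

# Hypothetical mini-corpus standing in for patent abstracts.
DOCS = [
    "chatbot answers user intent with intent recognition",
    "chatbot stores dialogue state in a database",
    "speech recognition converts speech to text for the chatbot",
]


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf(docs):
    """Return one {term: weight} dict per document (docs must be non-empty)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights


def top_terms(weights, k=3):
    """Top-k terms per document, ties broken alphabetically."""
    return [sorted(w, key=lambda t: (-w[t], t))[:k] for w in weights]


WEIGHTS = tfidf(DOCS)
```

A term that occurs in every document, such as "chatbot" here, receives a weight of zero, which is why domain-wide vocabulary rarely ends up as an ontology node under this scheme.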

2.4. Patent Mining

Patent documents contain important research results. However, they are lengthy and rich in technical terms, so their analysis requires a lot of manpower, and there is an urgent need for automatic tools that assist patent engineers or decision makers in patent analysis. This explains the importance of patent mining. Patent-mining technology includes text segmentation, abstract extraction, feature selection, term association, cluster generation, topic identification, and information mapping [28]. LDA topic modeling is used extensively not only in ontology construction but also in patent mining.

In a patent analysis of drones, LDA revealed the three most active technology development themes: communication technology, power supply, and navigation systems [29]. Based on LDA, Korobkin et al. [30] proposed a new patent-mining method, which includes statistical and semantic analysis of patent documents, machine translation of patent applications, and calculation of the semantic similarity between patents and applications. For term association, Hu et al. [31] used a skip-gram-based model to extract key terms from patents and compared the proposed approach with the TF-IDF method. For cluster generation, k-means remains a powerful choice. Shanie et al. [32] used the k-means method to cluster patent documents related to green tea, determining the number of clusters adaptively from the silhouette score. Recently, ML methods for patent analysis have also begun to appear. Li et al. [33] proposed DeepPatent, which combines a convolutional neural network (CNN) model with a word embedding model to classify patents. Lee and Hsiang [34] fine-tuned a bidirectional encoder representations from transformers (BERT) model to classify patents and compared it with DeepPatent; the results show that the precision of the fine-tuned model is 9% higher. Jun [35] proposed a method for technology integration and analysis using boosting (an ML algorithm that can be used to reduce bias in supervised learning) and ensemble learning. This method uses regression trees, random forests, extreme gradient boosting, and ensemble models. After analyzing the integrated patent data, it can be extended to technology integration and analysis across more than three technical fields.

2.5. Technology Function Matrix (TFM)

To further focus on the patent development context of a specific technical field and find a technical minefield or a technical blue ocean zone, it is necessary to analyze the technical location and function of each patent through a more detailed TFM, and further explore in-depth strategies, such as technological innovation or avoiding development conflicts [36]. In the patent analysis of cyber-physical systems (CPSs) and Industry 4.0, Trappey et al. [37] adopted domain ontology and International Patent Classification (IPC) as the basis of TFM. However, IPC and Cooperative Patent Classification (CPC) are general classifications. When exploring technology in a specific field, a large number of patent documents may have the same or similar classification codes, which makes the identification of technical classifications insufficient, and finally manual interpretation by professionals is still required. According to a survey of examiners at the European Patent Office (EPO), 84.7% of examiners believe that CPC is very important for patent searches. Although 70% of examiners believe that AI and ML technologies can provide valuable support in the future, about 45% of examiners still believe that patent searches fundamentally rely on human efforts. And 52% of examiners do not think that a fully automated patent search can be done before 2035 [38].

In industrial practice, researchers typically read the collected patents one by one and classify them by technical field and effect according to their professional judgment. This manual classification consumes a lot of time, and it is difficult to obtain a comprehensive review by interpreting a large number of patent documents. Many recent studies have tried to find a more efficient way to construct a TFM. Yang and Ren [39] proposed a semiautomatic TFM construction method that extracts technical words and uses computer-aided algorithms to reduce labor costs and time. Ki and Kim [40] proposed a programmatic automation method based on NLP technology to quickly construct an Information Relation Matrix (IRM), which describes the relationships among the technical information in a patent and is similar to a TFM. Trappey et al. [41] used patent text and data mining technology to create an ontology-based TFM for the patent analysis of additive manufacturing in the dental industry. The abundant literature shows that ontology, text mining, NLP, topic modeling, and TFM can be regarded as the main procedures for patent analysis today.
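To make the TFM idea concrete, the sketch below counts patents per (technology, function) pair and lists the empty cells, which correspond to the sparsely patented areas a TFM is meant to expose. The patent records, tags, and helper names (`build_tfm`, `sparse_cells`) are hypothetical; in the studies cited above, the tags would come from manual reading or from text-mining output rather than from a hand-written list.

```python
from collections import defaultdict

# Hypothetical hand-tagged patents; IDs and tags are invented for illustration.
PATENTS = [
    {"id": "P1", "technologies": ["speech recognition"],
     "functions": ["customer service"]},
    {"id": "P2", "technologies": ["intent recognition", "dialogue management"],
     "functions": ["customer service"]},
    {"id": "P3", "technologies": ["intent recognition"],
     "functions": ["health care"]},
]


def build_tfm(patents):
    """Count patents for every (technology, function) pair they are tagged with."""
    counts = defaultdict(int)
    for patent in patents:
        for tech in patent["technologies"]:
            for func in patent["functions"]:
                counts[(tech, func)] += 1
    return dict(counts)


def sparse_cells(tfm, technologies, functions):
    """Cells with no patents: candidate open areas in the matrix."""
    return [(t, f) for t in technologies for f in functions
            if tfm.get((t, f), 0) == 0]


TFM = build_tfm(PATENTS)
TECHNOLOGIES = sorted({t for p in PATENTS for t in p["technologies"]})
FUNCTIONS = sorted({f for p in PATENTS for f in p["functions"]})
GAPS = sparse_cells(TFM, TECHNOLOGIES, FUNCTIONS)
```

Cells with high counts point to crowded areas, while the entries in `GAPS` mark combinations with no patents in this toy data set.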

2.6. Comparison

Table 1 compares the 17 related studies, especially those on patent-mining techniques, in terms of their research purposes and their tasks for preprocessing, processing, and postprocessing [16]. The second column of Table 1 lists the tasks in each part, and the third column lists the more specific methods used. Each subsequent column corresponds to one article and marks the parts, tasks, and methods it uses.

In the preprocessing part, most articles mention the use of natural language processing for text preprocessing, and the corresponding algorithms, tools, and toolkits are quite mature. Although some articles do not specifically mention this part, it is reasonable to assume that such a mature step was implemented. The processing part includes two main tasks: key term extraction and the patent management map. The TF-IDF method is widely used for key term extraction and can almost be regarded as a standard configuration. Skip-gram is an important method for studying contextual relationships, and it is often used in research that adopts contextual relationships as the vectorization method. The patent management map, or patent map analysis, is a widely used statistics-based data analysis method that relies on a database and business intelligence tools to visualize patent portfolios. The patent management map only involves data sorting and presentation, which does not conform to the current general definition of text mining. Therefore, it is hardly mentioned in recent text-mining research on patent analysis; only the patent classification code is referenced as a benchmark to verify whether the results of the text-mining-based approach are valid and consistent. The postprocessing part contains two approaches: the text-mining-based approach and the visualization approach. The main methods of the former are clustering, topic modeling, and classification; the latter is mostly based on node-relation graphs. Although TFM is less common, it is still a good visualization tool for exploring emerging technologies.

The main purposes of these studies are classification, ontology construction, and finding emerging technologies. Classification is very basic, and the patent data themselves already carry classification codes, such as IPC or CPC. Researchers who use classification methods in the postprocessing part aim explicitly at classification. Ontology construction aims to clarify the technical details and scope of a specific field, and clustering and topic modeling methods serve this goal well. Both classification and ontology construction only obtain and analyze existing data, whereas exploring emerging technologies requires finding rules or discovering changes in trends from the data.

The framework proposed in this study completely includes the three parts of preprocessing, processing, and postprocessing. In addition, this research also performed patent management map analysis and compared the results with text-mining to explore emerging technologies and verify the ideas and conclusions put forward in this research.

3. Patent-Based Ontology Construction

Figure 1 illustrates the ontology construction process, including four levels and two aspects.

The four levels are patent retrieval, patent clustering and target domain selection, topic modeling, and keyword generation. The two aspects are research process and ontology construction. At level 1, key terms about natural language-enabled chatbots are identified, and the smart search on DI is used for patent retrieval. The 50 most relevant patents are then skimmed to check whether they match the subject of this study. If not, the search query is adjusted and the retrieval is repeated until the records are well in line with the subject. At level 2, the DWPI title, DWPI abstract, and independent claims are used for k-means clustering, and the silhouette score is used to evaluate the appropriate number of clusters. After clustering, normalized TF-IDF (NTF-IDF) is used to identify the key words and key phrases. Again, the key words are checked against the subject. If they do not match, the process returns to level 1 and the search query is adjusted. The process is repeated until ideal target domains are found. At level 3, topics for each domain are found in two different ways. The LDA model is used in the NLP, model, and system domains, while manual induction is used in the applied scenarios domain. To discover deeper topics or concepts at this level, the patent search conditions are reset for each domain before the LDA method is applied. After each execution, the results are examined to determine whether the subject of each domain is clearly identified. If not, the patent search conditions are adjusted again. The topics of each domain are determined through this iterative process. Finally, by sorting out the key words and key phrases from level 2 and level 3, the construction of level 4 can be completed.
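The LDA step at level 3 can be illustrated with a compact collapsed Gibbs sampler. This is a teaching sketch rather than the implementation used in this study: the tokenized documents, the hyperparameters `alpha` and `beta`, the iteration count, and the function name `lda_gibbs` are all assumptions, and a production analysis would normally rely on an established topic-modeling library.

```python
import random
from collections import Counter

# Hypothetical tokenized documents standing in for patent abstracts.
DOCS = [
    "speech recognition voice audio speech".split(),
    "voice audio signal speech recognition".split(),
    "dialogue intent response dialogue manager".split(),
    "intent response manager dialogue state".split(),
]


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one word distribution per topic."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    v = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]          # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(n_topics)]   # count of word w in topic k
    topic_total = [0] * n_topics                        # tokens assigned to topic k
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample the topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta) / (topic_total[k] + v * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return [
        {w: (topic_word[k][w] + beta) / (topic_total[k] + v * beta) for w in vocab}
        for k in range(n_topics)
    ]


TOPICS = lda_gibbs(DOCS, n_topics=2)
for k, dist_k in enumerate(TOPICS):
    print(k, sorted(dist_k, key=dist_k.get, reverse=True)[:3])
```

Inspecting the highest-probability words of each topic, as the final loop does, mirrors the manual check described above: if the top words do not identify a clear subject, the input documents (here, the search conditions) are adjusted and the model is run again.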

3.1. Patent Retrieval

Smart search on DI provides a semantic search tool, which offers a quick path to capturing related patents from simple search terms. The underlying algorithm replicates the strategies used by expert searchers to provide a manageable result set that matches the user’s intent. With smart search, it is not necessary to list all potentially related terms before searching. The records discovered are related to the technology described by the input terms, even though they may not contain those exact terms. Smart search automatically sorts the result set by relevance score to show the content that best matches the search terms.

To obtain a well-constructed ontology, the main goal is to cover as wide a range of technologies in the field as possible, rather than to focus on specific technologies, which would cause some emerging technologies to be missed. Smart search has the advantage of intelligence, but its limit of 1,000 records corresponds to about 450 to 550 DWPI families on average, which is not many relative to the number of patents related to NLP chatbots. The results of the patent search are used as the data source for the clustering task at level 2. To use more patents for clustering, traditional patent search was also tried, which searches patents directly from the user’s original list of terms. Although traditional search finds more patents, emerging technologies or applications that are not widely discussed, or that have gone undetected, will not be found. After several rounds of trials, this study selected the 508 DWPI families found by smart search as the result of level 1; the search query is shown in Table 2.

3.2. Patent Clustering and Target Domain Selection

At level 2, the patents obtained from the previous level are clustered, and target domains are discovered from the results. The process begins by extracting the words in the patent documents and vectorizing them with NTF-IDF, so that numeric vectors are obtained to which k-means clustering can be applied. After that, the top words and top n-gram phrases of each cluster are counted, and the target domains are selected from them.

3.2.1. Patent Columns for Clustering

This study chooses the DWPI title, DWPI abstract, and independent claims as the source attributes for clustering. Patent documents may come from different countries, be written in different languages, and cover a large number of attributes. Patents exist to protect the inventor’s intellectual property or to serve as part of an enterprise’s knowledge layout. Unlike academic articles, patents are not written to be easily understood, and some information may even be deliberately obscured in the title, which is not conducive to patent mining. The DWPI title and DWPI abstract, provided by the DI database, address these problems. DI employs professional editors with scientific and engineering backgrounds to read all patents one by one and rewrite the title and abstract in easy-to-understand text, producing the DWPI title and DWPI abstract, respectively. The editors remove the legal jargon, use American spelling, and deliberately select representative drawings instead of simply choosing the ones on the front page. In addition, many studies have shown that the value of patents is greatly affected by the number of independent claims, so the independent claims are also included as a source attribute for clustering.

3.2.2. Clustering

After the patent documents are retrieved and vectorized, k-means is performed to reveal how the documents cluster in the vector space. The appropriate number of clusters is obtained by calculating the silhouette score, which rewards large distances between clusters and small distances within clusters. In this study, the 508 patents are grouped into 13 clusters, and the top 10 words and 2-gram phrases in each cluster are extracted through NTF-IDF (see Table 3).
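The selection of the cluster number by silhouette score can be sketched as follows. The two-dimensional points are a hypothetical stand-in for NTF-IDF document vectors, the deterministic farthest-point initialization is a simplification chosen so that the example is reproducible, and the function names are illustrative. The silhouette of a point is computed as (b - a) / max(a, b), where a is the mean distance to the other members of its own cluster and b is the smallest mean distance to any other cluster.

```python
import math

# Hypothetical 2-D vectors forming three well-separated groups.
POINTS = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (0, 10), (0, 11), (1, 10), (1, 11),
]


def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean silhouette score; assumes at least two clusters are present."""
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # convention for single-member clusters
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def best_k(points, candidates):
    """Return the candidate k with the highest mean silhouette score."""
    return max(candidates, key=lambda k: silhouette(points, kmeans(points, k)))
```

On this toy data, `best_k(POINTS, range(2, 6))` selects three clusters, since splitting or merging the three groups lowers the mean silhouette.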

3.2.3. Domain Selection

The top 10 words and top 10 2-gram phrases of the 13 clusters, 260 terms in total, are examined individually for their technical details and classified into 13 subdomains, which are then combined to form 4 domains: NLP, model, system, and applied scenarios (see Table 4). The subdomains display related topics and assist topic selection when topic modeling is performed at level 3. The “NLP” domain is distributed in clusters 3, 4, 5, 6, 8, 9, 11, 12, and 13; the “model” domain is concentrated in cluster 6; the “system” domain has clusters 7, 9, and 10; and the “applied scenarios” domain is in clusters 1 and 2. Some clusters are related to multiple domains at the same time. Since the purpose of the cluster analysis is to find target domains, it is not essential that each cluster be assigned to exactly one domain.

This research takes the natural language-enabled chatbot as its subject. Many NLP-related words, such as “NLP,” “natural language,” and “processing,” appear frequently in every cluster and therefore do not help to identify a domain. In addition, many chatbot-related words, such as “processor,” “request,” “input,” and “module,” are very general, which also increases the difficulty of domain exploration. These terms are skipped during domain selection. One step in preprocessing the patent documents before clustering is vectorization. Although the skipped terms in Table 4 could have been set as stop words at the preprocessing stage, they are retained to avoid breaking up phrases that contain them. Take “recognition” as an example. “Recognition” is part of “intent recognition,” “named entity recognition,” “speech recognition,” and “image recognition.” If “recognition” were set as a stop word, none of these phrases would be found. The cost of retaining it is that “recognition” appears repeatedly in every cluster and, on its own, does not discriminate between domains.
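The effect of removing such a word before phrase extraction can be illustrated with a short sketch. The sentence and the helper function `bigrams` are hypothetical and serve only to show why the phrases disappear.

```python
def bigrams(tokens):
    """Return the list of 2-gram phrases formed by adjacent tokens."""
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]

# Hypothetical text, for illustration only.
text = "intent recognition and speech recognition for named entity recognition"
tokens = text.split()

# Phrases extracted when "recognition" is kept in the text.
kept = bigrams(tokens)

# Phrases extracted when "recognition" is removed as a stop word first.
stop_words = {"recognition"}
filtered = bigrams([t for t in tokens if t not in stop_words])
```

With the word kept, `kept` contains "intent recognition," "speech recognition," and "entity recognition"; once the word is removed, `filtered` contains none of them and instead produces spurious pairs such as "speech for."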

The “NLP” domain contains the subdomains of cognition, named entity recognition (NER), linguistics (which includes syntax, semantics, and morphology), natural language understanding (NLU), response, and speech recognition. Nine clusters, namely clusters 3, 4, 5, 6, 8, 9, 11, 12, and 13, are distributed in the NLP domain. Cluster 3 involves two subdomains, cognition and response. For the cognition subdomain, the representative patent US9361884B2 (assignee: Nuance Communications Inc.) proposed a human-machine dialogue system that incorporates an NLU engine and a dialogue manager to provide an NLP application that identifies and resolves anaphora. For the response subdomain, patent US10417266B2 (assignee: Apple Inc.) proposed systems and processes for operating an intelligent automated assistant to provide a set of predicted responses. Clusters 4 and 5 focus on linguistics. Patent US20200327284A1 (assignee: ServiceNow Inc.) in cluster 4 proposed an agent automation system whose processor is configured to assign a respective word vector to each node, encoding the semantic meaning of the word or phrase that the node represents. The system generates an annotated utterance tree by using a combination of rule-based and ML-based components; the tree represents the syntactic structure of the utterance, and its nodes include word vectors that represent semantic meanings. The annotated utterance tree is used as a basis for intent or entity extraction. Patent EP3111338A1 in cluster 5 also used automated text annotation for the construction of NLU grammars. Patent US10789426B2 in cluster 5 described a device for processing natural language text with a context-specific linguistic model. Patent US10304444B2 (assignee: Amazon Tech Inc.) applies NLU to the music field; it uses a hierarchical organization of intents and entity types, together with trained models associated with those hierarchies, so that commands and entity types can be determined for incoming text queries without necessarily determining a domain for the incoming text. Although cluster 6 is mainly concentrated in the “model” domain, it also contains many terms related to “named entity.” A representative patent, US10755046B1 (assignee: Narrative Science), describes an NLP system for conversational inferencing with a four-step parsing process.

Cluster 8 focuses on speech recognition. Patents US10446147B1 and US20200118564A1 (assignee: Amazon Tech Inc.) describe a speech recognition system that provides a contextual voice user interface. Patents US9245525B2, US9741347B2, and US10049676B2 describe an interactive response system that mixes HSR subsystems with ASR subsystems to improve the overall capability of user interfaces.

Cluster 9 concerns NLU, and patent US9761225B2 (assignee: Nuance Communications Inc.) is representative. US9761225B2 proposes a method for identifying and resolving anaphora in a multimodal conversational dialogue application for smartphones, in which multiple NLU interpretation selection models may be generated. These may include a generic model and one or more specialized NLU interpretation selection models, each of which may be specific to a particular set of NLU interpretation types. A semantic reranking mechanism is applied in this method. Cluster 11 also concerns NLU capability but focuses more on follow-up actions, which are more closely related to the “system” domain. Cluster 12 focuses on knowledge extraction in NLU. The representative patent is US10762113B2, which uses conversational knowledge graphs in virtual assistants to process natural language input; natural language queries from users are received at the virtual assistant’s NLU system. Cluster 13 also belongs to the cognition subdomain. Patents US9965461B2, US9594745B2, US9569425B2, and US20140249801A1 in cluster 13 (assignee: The Software Shop Inc.) describe a method for improving the efficiency of syntactic and semantic analysis.

The “model” domain, concentrated in cluster 6, has no subdomain, and its number of key words is relatively low. A possible reason is that neural networks are mainly mathematical algorithms and computers are only the carriers of mathematical operations, so that the algorithms alone are not regarded as a technical contribution. In this case, the field in which technologies and functions are closely integrated has become an important basis for judging technicality. If AI is only used to analyze business data and no technical problem is solved, the application is likely to be regarded as lacking a technical idea, and it is difficult to overcome the grounds for rejection through reapplication or amendment [42]. Algorithm-related patents must therefore be combined with hardware-related terms as the carrier of the algorithm. This also explains why cluster 9 contains a large amount of “nontransitory computer readable device” vocabulary. The representative patents in cluster 6 are US10748526B2, US10747958B2, and US10733375B2.

The “system” domain contains the user interface, medium, and communication or channel subdomains and is distributed across four clusters: clusters 7, 9, 10, and 11.

As for “applied scenarios,” concentrated in clusters 1 and 2, terms such as “virtual assistant,” “medical,” and “billing” are found. In cluster 1, three patents assigned to Google LLC are representative of virtual assistants in the “personal” subdomain. Patent US20200320136A1 proposes a method for using distributed state machines for human-to-computer dialogues with automated assistants to protect private data. Patent US20200050788A1 describes a system for assembling responses from remote automated assistants. Patent KR2020131299A proposes a method for generating Internet of things-based notifications by the automated assistant client of a client device. In cluster 2, three patents assigned to Nuance Communications Inc. are representative of medical billing and coding in the “medical” subdomain. Medical billing and coding are two closely related aspects of the modern health care industry. Both practices are involved in the immensely important reimbursement cycle, which ensures that health care providers are paid for the services they perform [43]. Patent US20170323060A1 describes a system with a graphical user interface (GUI) and an NLU engine to automatically derive one or more engine-suggested medical billing codes. Patent US10319004B2 proposes techniques to deal with overlapping codes derived by the NLU engine, and patent US10754925B2 proposes a method for training the NLU engine that involves providing training data in the form of free-form text, corrections, and a finalized sequence of medical billing codes.

Four domains were found from the clustering results. It is particularly important to emphasize that a natural language-enabled chatbot is composed mostly of three of them: NLP, model, and system. Since most of the related patents contain these three parts at the same time, it is difficult, and not especially meaningful, to assign each patent to exactly one domain.

3.3. Topic Modeling

Following the ontology construction process (see Figure 1), the search query, the corresponding results, and the topics found in each domain are listed in Table 5. For the NLP and system domains, DI smart search is applied, while the CTB (claim/title/abstract) strategy is applied for the model and applied scenarios domains. Table 6 lists the keywords of each topic.

3.4. Applied Scenario Topic Modeling

This research aims to find the application fields of NLP chatbots, but many of the retrieved patents describe natural speech-related technologies or the system framework of conversation management, which are not discussed in this section. This research mainly divides the application scenarios into engineering applications and e-commerce applications. The patent search results show that natural language-enabled chatbots are widely used in the field of e-commerce, while applications on the engineering side are difficult to find. In total, 44 patents are reviewed manually and classified into a topic or scenario. These patents are listed by applied scenario in Table 7.

Some patents under the e-commerce topic are as follows. Patent US20170323060A1 describes a system for facilitating automated natural language understanding of a patient’s medical documentation, which has a processor that presents a set of medical billing codes for user review in a graphical user interface (GUI) before the coding of the encounter is finalized. Patent KR2020000621A describes a conversation system for grasping user attention during various situations in a vehicle by using a mobile device. The system has a storage unit for storing situation information collected from a vehicle. A dialogue management module obtains a factor value of the action factor used to perform an action corresponding to a dangerous situation when an input processor obtains an action corresponding to the starting situation from the storage unit. An input processor generates a dialogue to perform the action corresponding to the dangerous situation by using the factor value of the acquired action factor while obtaining the action corresponding to the dangerous situation, and generates a conversation message. A result processor generates a conversation response corresponding to a delivered starter message. Patent US10223934B2 proposes a method for monitoring and analyzing the language environment, vocalization, and development of a key child, which provides metrics associated with the key child’s language environment and development in a relatively quick and cost-effective manner. The proposed method is used to promote improvement of the language environment and the key child’s language development and to track the development of the child’s language skills. The key child’s language environment and language development are monitored without placing artificial limitations on the key child’s activities or requiring a third-party observer.

Some patents under the engineering topic are as follows. Patent JP06792132B2 defines an information-processing apparatus that is used in a manipulator control system and an NLP system and can be applied with high versatility. The information-processing apparatus has processing module groups, each of which is equipped with several processing modules with specific processing capabilities. These processing modules have a neural network with a hierarchical structure. The information is processed by sending and receiving the information signals of the processing modules across several interhierarchical structures. Patent CN111267097A proposes a natural language-based assisted programming method for industrial robots, which involves parsing language instructions, matching the parsing result, and combining the coordinate output to generate the final robot auxiliary code. The multiattention mechanism model adopted by the method improves recognition accuracy and solves the problem that current methods cannot accurately recognize objects in an industrial environment. The modular programming solution reduces programming complexity for engineers and effectively improves development efficiency. Patent US10843080B2 describes a system for facilitating automated program synthesis from natural language. The system lets a user rely on the familiar grammatical requirements for forming a proper sentence in a native language, as opposed to memorizing the rules or required constructs of a potentially complicated programming language. The system employs fuzzy grammar matching to reduce complexity, at a slight cost in accuracy. The system also allows the user or developer to express an idea in a different manner to better reflect the original intent. Patent DE102018212503A1 defines communication and control systems, which have control devices for operating a machine based on a software communication chatbot, for filling beverages in bottling plants. The chatbot recognizes voice input and text input by an operator and outputs or displays information about the operating state of the machine. The systems realize production conversion of energy in an automatic manner and order completion in a rapid manner and improve media efficiency and scheduling efficiency. Patent WO2020181365A1 proposes an apparatus for 360-degree assistance for a quality control system (QCS) scanner with mixed reality (MR) and ML technology. The apparatus has an optical sensor, a display, and a processor that receives diagnostic information from a server related to a field device in an industrial process control and automation system. The processor identifies an issue of the field device based on the diagnostic information, detects, using the optical sensor, the field device corresponding to the identified issue, and guides the user, using the display, to the location and the scanner portion of the field device that is related to the issue. The processor provides, using the display, the steps or actions necessary to resolve the issue and connects the user, through a cloud server, to modules for installation, commissioning, AMC, and training for a QCS according to the selected person.

3.5. Ontology

In this section, the ontology map of NLP chatbot is drawn based on the previous outputs. A four-level ontology includes subject, domains, topics, and key phrases in a top-to-bottom sequence. Under the subject of NLP chatbot, the domains are NLP, model, system, and applied scenarios. The third level has the topics under each domain. For NLP domain, there are speech recognition, linguistics, conversation, and knowledge. For domain of model, topics are feature, graph, voice device, question answering, classification, and automatic service. For domain of system, the topics are infrastructure, dialogue management, and user interface. For applied scenarios, e-commerce and engineering are the two main topics. The fourth level has the key phrases under each topic. It is noticed that some key terms are shared by multiple topics. The ontology map of NLP chatbot is shown in Figure 2.
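The upper three levels of this ontology can be written down as a nested data structure, as in the sketch below. The domain and topic names are taken from the paragraph above; the variable names are hypothetical, and the fourth level (key phrases) is omitted because the phrases are given only in the figure.

```python
# Subject -> domains -> topics, as listed in the text.
# The fourth level (key phrases) is not reproduced here.
ontology = {
    "NLP chatbot": {
        "NLP": ["speech recognition", "linguistics", "conversation", "knowledge"],
        "model": [
            "feature", "graph", "voice device",
            "question answering", "classification", "automatic service",
        ],
        "system": ["infrastructure", "dialogue management", "user interface"],
        "applied scenarios": ["e-commerce", "engineering"],
    }
}

domains = ontology["NLP chatbot"]
n_topics = sum(len(topics) for topics in domains.values())
```

Counting the entries gives 4 domains and 15 topics in total.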

4. Patent Macro Trend Analysis

Related patents are retrieved by entering keywords related to NLP and chatbots in the DI database, and a patent management map analysis is conducted (see Table 8). From 2011 to 2020, a total of 21,834 individual records, or 12,840 DWPI families, were published. A patent family refers to the collection of patents applied for in different patent offices for the same invention. DWPI has a stricter definition: each patent in a DWPI patent family must have exactly the same priority as the other patents in the family. The analysis in this section is mainly based on DWPI families. Hereafter, the term “patents” refers to “DWPI families” unless otherwise specified.

Since 2017, 10,480 patents have been published, accounting for 82% of the 12,840 patents of the past decade. Furthermore, the 8,099 patents published since 2019 account for 62%. In terms of the annual growth rate of the number of patents, the rate was a high 44% in 2014 but fell to 6% in 2015, the lowest of the past decade. Starting in 2016, however, the annual growth rate increased sharply until it peaked at 105% in 2019 and then fell back to 66% in 2020. Whether the slowdown in 2020 is related to the impact of COVID-19 cannot be determined, but it may signal that the technology related to natural language-enabled chatbots has gradually matured.
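The two quantities used in this paragraph, the share of the total and the annual growth rate, can be computed as in the sketch below. The function names are hypothetical. The share uses the counts reported above, whereas the yearly counts passed to the growth-rate function are made up for illustration, because per-year counts are not given in the text.

```python
def share(part, total):
    """Percentage of the total accounted for by one part."""
    return 100 * part / total

def annual_growth_rate(previous, current):
    """Year-over-year growth in percent."""
    return 100 * (current - previous) / previous

# Counts reported in the text.
total_families = 12_840
since_2017 = 10_480
share_since_2017 = round(share(since_2017, total_families))  # 82

# Hypothetical yearly counts, chosen only to illustrate the formula.
example_rate = annual_growth_rate(previous=200, current=410)  # 105.0
```

A growth rate above 100% therefore means that the number of patents more than doubled relative to the previous year.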

However, a single slowdown cannot support any conclusion without further data or evidence. The IPC is a standard taxonomy developed and administered by WIPO for classifying patents and patent applications; it covers all areas of technology and is currently used by industrial property offices around the world. An analysis of the annual number of patents by IPC shows that all of the IPC classifications had already appeared by 2018. In other words, among the 8,099 patents of 2019 and 2020, which account for 62% of the patents of the past decade, no new technology category has emerged.

The top six 4-character IPCs, each with more than 1,000 patents, are G06F (electric digital data processing), G06N (computer systems based on specific computational models), G06Q (data processing systems or methods), G10L (speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding), H04L (transmission of digital information), and G06K (recognition of data), with 8,870, 3,144, 2,413, 2,176, 1,364, and 1,258 patents, respectively (see Figure 3). Note that the total proportion can exceed 100%; that is, the sum of these numbers can be greater than 12,840, because a patent can be classified under multiple IPC codes.
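The reason why the per-IPC counts can sum to more than the number of patents is illustrated by the following sketch, in which three hypothetical patents carry one or more 4-character IPC codes each.

```python
from collections import Counter

# Hypothetical patents, each tagged with one or more 4-character IPC codes.
patents = {
    "P1": {"G06F", "G10L"},
    "P2": {"G06F", "G06N"},
    "P3": {"G06F"},
}

# Each patent is counted once for every IPC code it carries.
ipc_counts = Counter(code for codes in patents.values() for code in codes)
total_assignments = sum(ipc_counts.values())
```

Here three patents yield five IPC assignments, so the per-IPC shares add up to more than 100%.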

G06F accounts for 8,870 of the 12,840 patents. Therefore, the complete IPC classifications within G06F were further explored. Among the top 10 IPCs listed (see Figure 4), 2,295 patents are classified in G06F 17/27 (automatic analysis, parsing, orthographic correction, etc.). The second largest class is G06F 17/30 (information retrieval and database structure). It is worth noting that the 3rd and 4th classifications (G06N 3/04 and G06N 3/08) represent the interconnection topology architecture and the learning method, respectively. G06F focuses on data processing procedures, while G06N emphasizes system structure. Together, the G06F and G06N classifications represent the key technologies for implementing the main modules of complex natural language-enabled chatbot systems. In addition, G10L 15/22, ranked 9th, covers procedures used in speech recognition for human-machine dialogue.

In addition to statistics on the number of patents, the fluctuations in the number in recent years are also worthy of attention. Based on the annual growth rate of all patents, when the growth rate of an IPC is higher than average, it represents greater momentum; conversely, when the growth rate of an IPC is lower than average, it may imply that the technology has entered the mature stage early. The four 4-character IPCs with the largest number were selected for this analysis (see Figure 5).

G06F accounts for an overwhelming 69% of all patents, but its annual growth rate lags behind the average annual growth rate. In 2014, the total number of patents related to natural language-enabled chatbots rose sharply by 44.37%, while the growth rate of G06F in that year was 41.40%, slightly lower than the average. Since 2016, during the period of rapid growth in the number of patents, the growth rate of G06F has not been outstanding. Even when the average growth rate peaked at 104.49% in 2019, G06F was 14.92% below the average. By contrast, the annual growth rate of G06N is striking. In 2014, it was 43.86% higher than the average, and from 2016 to 2020, it was 73.74%, 26.14%, 89.49%, 52.84%, and 74.29% higher than the average, respectively. The growth rates of G06Q and G10L fluctuate around the average and have not yet shown a clear trend.
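The comparison against the average can be expressed as a simple gap between two growth rates, as sketched below. The function name `momentum` is hypothetical; the two rates are the 2014 figures quoted above.

```python
def momentum(ipc_growth, average_growth):
    """Compare the growth rate of one IPC with the average of all patents.

    Returns a label and the gap in percentage points.
    """
    gap = ipc_growth - average_growth
    if gap > 0:
        label = "above average"
    elif gap < 0:
        label = "below average"
    else:
        label = "at average"
    return label, gap

# 2014 figures from the text: G06F grew 41.40%, all patents grew 44.37%.
label, gap = momentum(ipc_growth=41.40, average_growth=44.37)
```

For 2014, G06F falls roughly 3 percentage points below the average, which is read as weaker momentum than that of the field as a whole.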

In general, the average annual growth rate began to slow after peaking in 2019 following a period of rapid growth, and no new IPC appeared after 2018; both observations indicate that the development of natural language-enabled chatbots has entered a mature stage. It is worth noting, however, that the patents classified under G06N are still growing rapidly.

Assignee analysis helps to identify the main players in the market, all of which turn out to be technology giants. IBM, ranked first, has 1,358 patents, more than the second- to tenth-ranked assignees combined. The well-known technology giants Apple Inc. and Facebook Inc. are ranked 16th and 17th, respectively. Although they are not in the top 10, they are also listed in the table because of their influence (see Table 9).

IBM’s patents began to grow rapidly in 2016, when they were concentrated in the two categories G06F 17/30 and G06F 17/27, showing that IBM focused on information retrieval and grammar analysis in NLP. In 2019, the number of patents of Microsoft, Amazon, Accenture, and Univ Kunming Science and Tech began to grow significantly. In addition to G06F 17/27, Amazon and Microsoft use speech recognition technology based on natural language models in human-machine dialogue, which is mainly reflected in the two IPCs G10L 15/18 and G10L 15/22. In 2020, the number of patents of Google, Samsung, and Baidu increased rapidly at the same time. In addition to the two speech recognition categories that were prominent in 2019, G10L 15/18 and G10L 15/22, both Google and Samsung have more patents in G06F 3/16, which focuses on the conversion between speech and digital information. On the other hand, Google and Baidu applied for many patents in G06N 3/08, which covers computer systems based on learning methods. In addition, Baidu also has a large number of patents in G06F 40/30 for semantic analysis. Google and Baidu are both Internet service companies that started as search engines, and Google and Samsung are also close partners in the Android camp. The rapidly increasing number of patents assigned to these three companies, which are quite close to the end user, might imply that this technology field has reached the stage of maturity and mass application. The IPC distribution of Apple Inc.’s patents in 2019 and 2020 shows that its patents are highly concentrated in the speech recognition-related classes G10L 15/18, G10L 15/22, and G06F 3/16, which is similar to Google. Google and Apple both began to file a large number of patents in the fields of speech recognition and speech-to-digital information conversion in 2019. Clues can also be seen in their products. The Google Nest Mini launched in November 2019 and the Apple HomePod launched in August 2019 show the development path from smart speaker to smart home. With the maturity of natural language technology and IoT, the use of natural language to control the objects of daily life will gradually replace the previous methods of operating through buttons or limited system interfaces. While other companies focus on deepening NLP-related technologies or developing speech recognition applications, Facebook Inc. has paid more attention to electric communication techniques, including H04L 12/58 and H04L 29/08. The two IPC codes represent message switching systems and transmission control procedures in network communication, respectively.

5. Technology Function Matrix

A Technology Function Matrix (TFM), which maps the number of patents onto the correspondence between technologies and functions, is a critical approach for patent data analytics. The domains of NLP, model, and system, introduced in Section 3.2.3, are used to form the TFM. The construction process of the TFM is as follows. Technology and function terms are defined from the well-constructed ontology described earlier, and patents are collected by the search query set according to the ontology. Next, each patent is visited in turn to check whether it matches each technology and function. In this way, a TFM is constructed.
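The counting step can be sketched as follows. This is an illustration only: the technology and function names are a small subset of those used later, and the three patents with their assigned labels are hypothetical.

```python
technologies = ["speech recognition", "transformer"]
functions = ["information extraction", "automated control"]

# Hypothetical patents with the technologies and functions they match.
patents = [
    {"tech": {"speech recognition"},
     "func": {"information extraction", "automated control"}},
    {"tech": {"transformer"},
     "func": {"information extraction"}},
    {"tech": {"speech recognition", "transformer"},
     "func": {"information extraction"}},
]

# One cell per (technology, function) pair, initialized to zero.
tfm = {(t, f): 0 for t in technologies for f in functions}

# Visit each patent and increment every cell it matches.
for patent in patents:
    for t in patent["tech"]:
        for f in patent["func"]:
            tfm[(t, f)] += 1
```

A patent that matches several technologies or functions contributes to several cells, so the cell values need not sum to the number of patents.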

This research uses the TF-IDF-based automatic TFM construction method. After the technologies and functions are defined, an unstructured text description that best represents each technology or function must be prepared. These text descriptions are transformed into a set of vectors through unsupervised learning, and each vector acts as a proxy for its technology or function. Then, specific fields are selected from each patent and converted into a vector, which is compared with each technology and function vector by similarity; a threshold determines whether the patent is classified under that technology or function. The text description of each technology or function is therefore very important. Sections 5.1 and 5.2 explain the technologies and functions selected in this study, respectively, followed by the TFM result in Section 5.3. After that, the domain of applied scenarios is added to form a three-dimensional matrix, called the A-TFM, which is introduced in Section 5.4.
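The similarity-and-threshold step can be sketched as follows. For brevity, the sketch uses cosine similarity over raw bag-of-words counts, which is a simplification of TF-IDF weighting; the descriptions, the patent text, and the threshold value of 0.3 are all hypothetical.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words count vector (a simplification of TF-IDF weighting)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical text descriptions standing in for each technology.
descriptions = {
    "speech recognition": "speech recognition converts spoken audio into text",
    "cloud computing": "cloud computing delivers computing services over the internet",
}

# Hypothetical patent field (for example, an abstract).
patent_text = "a device converts spoken audio commands into text using speech recognition"

THRESHOLD = 0.3  # hypothetical cut-off, chosen only for illustration

patent_vector = vectorize(patent_text)
scores = {
    name: cosine(patent_vector, vectorize(description))
    for name, description in descriptions.items()
}
assigned = [name for name, score in scores.items() if score >= THRESHOLD]
```

In this toy case the patent shares most of its words with the speech recognition description and none with the cloud computing description, so only the former passes the threshold. The choice of threshold controls how strictly patents are assigned and would need to be tuned on real data.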

5.1. Definition of Technology

Thirteen TFM technologies, listed in Table 10, are defined according to the domains of NLP, model, and system. The text descriptions used for the similarity comparison with the patent text are extracted from Wikipedia. Speech recognition, NER, NLU, and NLG are technologies in the NLP domain. Feature engineering, RNN, CNN, and transformer belong to the model domain. Speech-generating device, cloud computing, voice activity detection, human-computer interaction (HCI), and immersive technologies belong to the system domain.

5.2. Definition of Function

Nine TFM functions, namely information extraction, dialogue management, context prediction, recommendation system, algorithm efficiency, automated control, communication, user experience, and virtual assistant, are listed in Table 11. The text descriptions used for the similarity comparison with the patent text are extracted from Wikipedia and other web resources.

5.3. TFM Result

To find the emerging trends of natural language-enabled chatbots, the patents of year 2020 are used as the source for the TFM. The 13 × 9 TFM result is obtained through the automated process described earlier (see Table 12). The transformer is a DL language model, developed in 2017, that is widely used to process natural language tasks. The combination of transformer technology and the prediction function has the highest number of patents, which suggests that the transformer is a mature technology that is widely applied to context prediction. In terms of technologies (rows), transformer and speech-generating device are the main technologies of the current market and have a positive impact on almost all functions. In terms of functions (columns), the automated control function is more widely used than the others. For instance, speech recognition and speech-generating devices are used to increase the pipeline of the control system. In addition, the NLP domain technologies mostly relate to information extraction, dialogue management, and prediction; for example, improvements in NLU and NLG can enhance the system’s ability to identify users’ intent. Last, the system domain technologies mostly concentrate on communication, user experience, and virtual assistant. For instance, adopting immersive technologies can enhance user experience, and the development of cloud computing enables portable devices to handle complex tasks. Therefore, many virtual assistants, such as intelligent driving assistants, have been developed to make people’s lives more convenient. Next, the interactions between technologies and functions and their related patents are explored to find emerging technologies or applications.

5.3.1. Speech Recognition (T01)

The function to which speech recognition is most often applied is information extraction (F1). The accuracy of speech recognition is the key to determining whether it can be applied commercially, and good information extraction ability is a necessary condition. Although speech recognition technology has gradually matured, there are still a large number of patents in this field that aim for better recognition and information extraction capabilities.

Google LLC’s patent US10431206B2 uses a hierarchical recurrent neural network (HRNN) structure to handle the task of multiaccent speech recognition. Patent CN110033766A proposes a complex architecture of multiple deep neural networks, including a single-layer one-way RNN model, a binary bidirectional RNN model, a binary bidirectional LSTM (BiLSTM) model, and other network structures, in pursuit of faster speed and lower energy consumption. Patent EP3497630B1 uses a CNN architecture, which allows better signal propagation and long-range dependency learning, thus improving output quality.

In addition, speech recognition and the automated control function (F6) are combined to form the application of speech-driven automated control. When speech data are received from the client, the speech recognition and NLU models stored in the cloud are used to interact with other devices in the cloud space, such as unmanned aerial vehicles (UAVs), robots, and augmented reality (AR) and virtual reality (VR) devices, through AI modules and 5G network technology.

5.3.2. NER (T02)

Preprocessing is very important for improving the accuracy of NER. Patent CN110990525A proposes a sentiment-based information extraction method that achieves good performance in the field of financial sentiment information extraction through preprocessing and feature extraction modules. Data labeling and feature engineering are the two main steps in preprocessing. Patent CN111783466A proposes a named entity recognition method for the field of Chinese medical records, in which a two-layer conditional random field (CRF) classification determines the final output label, thus improving the accuracy of NER and reducing the time consumed by training. Similar research can be found in the literature. In view of the insufficient representation of the potential features of Chinese characters, Han et al. [44] use a BiLSTM network to learn the internal stroke and radical semantic information of Chinese characters and combine it with the BiLSTM-CRF model to construct an adaptive multifeature fusion embedded CNER model. In addition, patent WO2020167558A1 proposes a dynamically trained model for named entity recognition over unstructured data, which defines entity labels for a specific domain knowledge ontology and uses these entity labels to identify the relationship between unstructured documents and domain knowledge. Patent CN111737969B proposes a resume analysis method based on a DL model, which combines NLP, OCR, and named entity recognition technology. This method first performs feature modeling on the resume. After the model training is completed, the key information is classified and the category mapping model is set, so that the parser can read the resume as a human would, improving the overall analysis result.

5.3.3. Transformer Model (T08)

The transformer model is widely used to improve the accuracy of the information extraction function (F1). Patent CN110941698A proposes a method based on the bidirectional encoder representations of a BERT-CNN model, which generates word vectors with rich contextual semantic information, thereby effectively supporting service similarity calculation and achieving accurate retrieval of the target service.

As for the dialogue management function (F2), patent CN111274362A proposes a dialogue generation method based on the transformer architecture, which obtains a vectorized representation of words and generates a reply based on a comprehensive semantic vector and a copy mechanism, in order to address NLG for dialogues grounded in background domain knowledge. Patent US20200372341A1 proposes a pipelined natural language question answering system based on the BERT model, which receives the input text of a natural language question and provides an answer that takes the context into account.

The transformer model is used in the context function (F3) to improve the accuracy of NLP. Patent CN110737764A proposes a method for generating personalized dialogue content based on a multiround dialogue model. The transformer model effectively learns the sequence relationships in natural language dialogue and can predict the generated content so as to reduce the probability of generic replies and increase the diversity of dialogue content. Patent CN111708882A proposes a method for completing missing Chinese text information based on a transformer encoder. The method starts by manually preprocessing Chinese text documents, dividing the text into a large corpus of short sentences, and converting it into BERT vectors at the smallest unit. Since the purpose is to find the missing words and sentences in an article, the training method randomly generates noise that hides words in the complete article to simulate omissions. In turn, to fill in the missing words, the model must have text generation capabilities. Through repeated deletion and generation of information, the accuracy of Chinese natural language processing tasks is further improved.
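The noise-based training setup described for CN111708882A can be illustrated with a minimal sketch. It is not the patented procedure: the `[MASK]` placeholder, the masking rate, and the function name `mask_tokens` are assumptions chosen for illustration, and English tokens stand in for Chinese characters.

```python
import random

MASK = "[MASK]"  # assumed placeholder token; the patent does not name one


def mask_tokens(tokens, mask_rate=0.15, seed=None):
    """Randomly hide tokens and remember the originals as training targets.

    Returns (masked_tokens, targets), where targets maps each hidden
    position to the token a model would have to regenerate.
    """
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append(MASK)   # simulate an omission in the text
            targets[i] = tok      # ground truth for the fill-in task
        else:
            masked.append(tok)
    return masked, targets


sentence = "the model must fill in the missing words".split()
masked, targets = mask_tokens(sentence, mask_rate=0.3, seed=0)
```

Each `(masked, targets)` pair is one training example: the model reads the text with omissions and is scored on how well it regenerates the hidden tokens.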

5.3.4. Speech-Generating Device (T09)

The speech-generating device is highly related to the three functions of information extraction (F1), dialogue management (F2), and context (F3), with 1,190, 1,141, and 1,123 patents, respectively. The speech recognition technologies of T09 and T01 are also highly related, but the classification of T09 in the “system” domain means that the description of this technology focuses more on the hardware or system framework, so that for T09 the boundaries between F1, F2, and F3 are blurred. These large numbers of patents show that, with the maturity of Internet technology and mobile devices, earlier information retrieval systems have begun to be replaced by chatbots. While NLP technology was still immature, rule-based chatbots had little influence. As NLP and speech recognition technologies matured, however, speech-generating devices developed rapidly and were combined with chatbot applications, and task-oriented retrieval systems began to be replaced by speech query systems. Patent CN110111766A claims a multifield multitask system, which solves the problem of multidomain multitask switching in the dialogue system. This complex multitask dialogue system integrates a speech recognition module, a domain confidence state tracking module, a dialogue management module, an NLG module, and a speech synthesis module so that semantic-level information can be shared across domains. Patent JP2020098308A proposes a voice inquiry system for information provision, in which each of the chatbot servers and the smart speaker operation server uses a DL model to accept a spoken question, infer the answer, and output it as speech.

The next step after the speech query system is speech-driven remote control, an idea supported by the 1,006 patents related to the automated control function. Patent US10748529B1 (assignee: Apple Inc.) proposes a voice-based digital assistant for home automation of voice-activated controllable devices, such as a TV, speaker, or camera. Applications of speech-driven automated control are not uncommon, but they focus on devices that pose no safety hazards, such as home devices. This also means that speech-driven automated control is still at an auxiliary stage and cannot replace existing functions. It is nonetheless expected that people will eventually want many functions that require physical contact to be replaced by voice control, and the first obstacle to overcome is noise. Since the sound source is not restricted, the device may receive unexpected sounds and trigger actions at any time. Therefore, a gateway may be required to avoid unexpected actions caused by noise. Patent US20140214414A1 proposes a communication system for automatic speech recognition applications, which can transmit commands through a wireless network to modify the gateway’s noise reduction processing state.

5.3.5. HCI (T12)

When it comes to smart homes, in addition to speech control, there are further automatic control methods based on HCI. Patent CN110932953A proposes a smart home control method and device, which can receive the user’s control command for a target home, log in to the target home in the target network, perform the control intelligently, and return the result message. This solution realizes uniform control of multiple homes across different manufacturers and different communication protocols.

It is observed from TFM that HCI technology is widely used to improve user experience (F8), and there are 909 patents located in the interaction. Most people use chatbots to meet their needs, such as information retrieval or specific operational tasks. It is most important to be able to meet the needs of users in fewer conversations. Many patents also aim to reduce dialogue and improve dialogue efficiency, such as CN112015879A, CN110990594A, CN111488433A, and CN110827831A.

5.3.6. Immersive Technologies (T13) with Virtual Assistant (F9)

In addition to the HCI methods of contact and voice, the use of gaze tracking to help virtual assistants more accurately grasp the text or dialogue paragraph the user is paying attention to is an emerging application.

5.4. A-TFM and TFM with Applied Scenarios

As mentioned in Section 3.4, the applied scenario is also a valuable factor in analyzing patents. Therefore, this research uses the applied scenarios as a third dimension to construct a three-dimensional matrix. As shown in Figure 6, the size of each node represents the number of patents. The X-axis represents the 13 technologies, the Y-axis the 9 functions, and the Z-axis the 7 applied scenarios. This three-dimensional matrix is built from 50 patents randomly sampled from the patent set of the above TFM. “Personal” and “e-commerce” are the main applied scenarios in the current market. “Medical,” “engineering,” and “driver assistant” are applied scenarios still under development. Few patents related to “education” and “society” chatbots are found.
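The construction of such a three-dimensional matrix can be sketched as a simple counting step. The records below are hypothetical labels invented for illustration, not patents from the sample, and the function names are assumptions; only the technology codes, function codes, and scenario names follow the text.

```python
from collections import Counter

# Hypothetical (technology, function, scenario) labels, one tuple per patent.
patents = [
    ("T09", "F6", "personal"),
    ("T09", "F6", "personal"),
    ("T08", "F2", "e-commerce"),
    ("T02", "F1", "medical"),
]


def build_atfm(records):
    """Count patents in each (technology, function, scenario) cell."""
    return Counter(records)


def scenario_totals(atfm):
    """Collapse the matrix along the technology and function axes."""
    totals = Counter()
    for (_tech, _func, scenario), count in atfm.items():
        totals[scenario] += count
    return totals


atfm = build_atfm(patents)
totals = scenario_totals(atfm)
```

The count stored in each cell corresponds to the node size in the figure, and empty cells (which a `Counter` reports as zero) mark combinations with no patents in the sample.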

6. Discussion

Nine topics, including medical data, smart cities, IoT, data privacy, sustainable strategies, CRM, personalization, social media listening, and ML models, are identified as latent topics for future research based on data-driven strategies [14]. This research investigates the application of chatbots through a comprehensive patent-mining process, and its findings are consistent with the above results, which supports the effectiveness of the proposed analysis.

6.1. Knowledge Graph

AI has made huge progress, and algorithms are rapidly improving and managing massive amounts of data; however, AI is still not a knowledge-driven technology. The knowledge behind a natural language-enabled chatbot is very important for dialogue with humans. The early development of chatbots was mostly confined to a single domain. In recent years, more research has been directed towards open domain [45–49] and multidomain [50–52] chatbots. Single-domain chatbots are limited to accomplishing specific tasks, while multidomain or open domain chatbots can better meet the needs of smart assistants and can further provide companionship or social applications. With the development of 5G and cloud applications combined with social media, many platforms, such as Telegram, Cortana, Slack, WeChat, Facebook Messenger, Google Assistant, and Siri, make it easy to build chatbots [53], so that the technology bottleneck is shifting from the construction of simple single-domain chatbot systems to the complex integration of multidomain knowledge bases. A correlation between these two phenomena is hypothesized.

With the rapid development of the Semantic Web, a large amount of structured data has become available in the form of knowledge bases on the web. Making these data accessible and useful to end users is one of the main goals of chatbots based on linked data [54]. KG is considered a new AI technology trend, which originated from the basic principles of the Semantic Web and the construction of knowledge bases [55]. Novel KG-based frameworks are used in many chatbot applications. They use SPARQL, the query language of the Resource Description Framework (RDF), to quickly integrate existing knowledge bases.

Related patents in recent years have also focused on how the knowledge framework can improve NLU capabilities and on integrating the KG into the chatbot knowledge base. Patent US10733375B2 (assignee: Apple Inc.) provides a system and process for operating intelligent automated assistants. The process is based on a knowledge framework and can improve the validity of NLU: it analyzes the mapping of domain attributes and words from the natural language input, matches the analysis results to data in the knowledge base, and determines the output response according to a ranking mechanism. Patent EP3362972A1 proposes a system for authoring visual representations of text-based natural language documents. A user interface containing a document area is provided, which makes it possible to interactively generate visual representation information that accurately depicts the underlying source text. The system processes the document to determine relational phrase information indicating that a portion of the text includes a relationship to at least one of a subject, verb, or object in the sentence that includes that portion, and generates a node graph from at least one of the parse trees, the entity information, or the relational phrase information. The system also generates another visual representation that links the nodes and the relations. Patent WO2020160264A1 proposes a method of identifying relevant data sets using trained models related to topics of interest, together with systems and related methods used to organize, represent, find, discover, and access data from one or more sources. The embodiment represents information and data in the form of a data structure called a “feature graph.” The feature graph includes nodes and edges, where edges “connect” a node to one or more other nodes. The nodes in the feature graph can represent variables, that is, measured objects, features, or factors. An edge in the feature graph may represent a measure of the statistical association between a node and one or more other nodes, retrieved from one or more sources. The data sets that represent or support the statistically associated variables are “linked to” the “feature graph.” Patent US10762113B2 (assignee: Cisco) proposes the use of conversational knowledge graphs in virtual assistants to process natural language input. After receiving the user’s natural language query, the method retrieves contextual information from the conversational knowledge graph according to the intent, calls the back-end service accordingly, and obtains the response after the service is performed. Finally, the response is translated into natural language and provided to the user. Similar studies appear in the literature. Zhong et al. [56] designed a cognitive information representation model based on the knowledge graph, which combines the perception information and semantic description information of the industrial robot ontology to form a structured cognitive knowledge graph that supports logical reasoning and includes a perception layer and a cognitive layer. The automatic representation of robot perception information enhances the versatility, systematicness, and intuitiveness of robot cognitive information representation and can effectively improve the cognitive reasoning ability and knowledge retrieval efficiency of robots in the industrial Internet environment.
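The feature graph idea described above (nodes as variables, edges as measures of statistical association) can be sketched as follows. The patent does not specify the association measure in the text summarized here, so the use of the Pearson correlation, the threshold of 0.8, the sample data, and all names are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def build_feature_graph(data, threshold=0.8):
    """Nodes are variable names; an edge is kept when |r| >= threshold."""
    edges = {}
    for a, b in combinations(sorted(data), 2):
        r = pearson(data[a], data[b])
        if abs(r) >= threshold:
            edges[(a, b)] = r  # edge weight = strength of association
    return {"nodes": sorted(data), "edges": edges}


# Hypothetical measurements of three variables.
data = {
    "temperature": [1.0, 2.0, 3.0, 4.0],
    "ice_sales": [2.0, 4.0, 6.0, 8.0],
    "noise": [1.0, -1.0, 1.0, -1.0],
}
graph = build_feature_graph(data)
```

In this toy data only `ice_sales` and `temperature` are strongly associated, so the graph keeps a single edge between them. A fuller version would also attach to each edge the data sets that support the association, as the patent describes.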

Patent US20200317093A1 proposes a query response system that converts natural language queries into standard queries using neural networks, with a processor that determines the relevance of documents and returns those determined to be relevant. The application describes a system and method in which a received natural language query is converted into a standard query using a sequence-to-sequence model. In some cases, the sequence-to-sequence model is associated with the layer of interest. The system then performs searches using the standard queries and can return various documents. The documents obtained by the search are scored based at least in part on their conditional entropy, which is determined from the natural language query and the documents.

6.2. Deep Learning

The importance of AI and deep learning algorithms to chatbots is obvious. However, this kind of emerging technology is less noticeable in patent documents. Architectures commonly used in chatbots include LSTM, transformer, and RNN. Interestingly, the bidirectional mechanism is applied to almost all of these architectures. Chatbot-related articles using a bidirectional architecture have appeared in large numbers since 2019, accounting for more than 80% of such articles across all years (see Table 13).

Patent CN111267097A proposes a natural language-based industrial robot-assisted programming method, which includes parsing language instructions, matching the analysis results, and combining coordinate output to generate the final robot-assisted code. The invention claims a method that assists the programming of industrial robots in natural language by generating executable code for the robot from language instructions and images of the environment. The invention is divided into three parts. First, an LSTM bidirectional recurrent neural network (Bi-RNN) and a fast regional convolutional neural network (F-RCNN) extract the features of the language instructions and of the factory environment. Second, an “attention mechanism” model serves as the alignment algorithm and correctly matches the machine translation of the instruction to the machine environment, so as to identify the specified object and output its coordinate point. Third, the output of the generation model is matched to the CoBlox modular programming model.

The technical development of DL in NLP is already quite mature. Although academic research continues to pursue better performance, current performance is more than sufficient at the applied level. With any of the frameworks commonly used today, even with little training data, a chatbot can be perceived as satisfactory by users [57]. Therefore, in addition to handling NLP tasks, the other main application of DL is to assist the dialogue management of the chatbot system.

Patent CN108282587B proposes a mobile customer service dialogue management method for the communication industry based on state tracking and policy orientation, which adopts a deep Q-network-based strategy optimization method to select the best action strategy. The method establishes a dialogue problem guiding strategy based on the partially observable Markov decision process (POMDP) model and applies an action to the user’s dialogue environment state through the internal action of the POMDP model, so that the state of the conversation environment changes and a certain return is obtained. The likelihood of executing a series of strategies is measured by the cumulative return obtained, which turns the problem into a strategy selection problem. A deep reinforcement learning algorithm for optimizing the problem-guiding strategy is then constructed from the dialogue problem guiding strategy obtained with the POMDP model, and a deep Q-network (DQN)-based strategy optimization method is adopted to select the best action strategy.
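The idea of choosing dialogue actions by cumulative return can be illustrated with a much simpler stand-in. The sketch below uses tabular Q-learning on a fully observable two-state toy dialogue; it is not the patent’s POMDP model and uses no deep network. The states, actions, rewards, and hyperparameters are all hypothetical.

```python
import random

ACTIONS = ["ask_clarification", "give_answer"]
STATES = ["unclear", "clear"]


def step(state, action):
    """Toy dialogue environment: returns (next_state, reward, done)."""
    if state == "unclear":
        if action == "ask_clarification":
            return "clear", -1.0, False   # small cost for an extra turn
        return "end", -5.0, True          # answering too early fails
    if action == "give_answer":
        return "end", 10.0, True          # user intent is clear: success
    return "clear", -1.0, False           # needless extra question


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        state, done = "unclear", False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)                        # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            # Move the estimate towards reward + discounted future return.
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)]
            )
            state = nxt
    return q


q = train()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}
```

A DQN replaces the table `q` with a neural network that estimates the same action values, which is what makes the approach workable when the dialogue state is a large belief vector rather than one of two labels.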

6.3. Speech-Related Technologies

Chatbots have developed towards integrated conversation systems, and in the context of multiperson conversations, speech segmentation and speaker recognition algorithms have been the main research topics in recent years [58, 59]. Li et al. [60] summarize the noise-robust ASR technologies developed over the past 30 years and propose classification standards for them, together with the pros and cons of using different noise-robust ASR technologies in actual application scenarios. For example, for stable voice-controlled operation, the environmental conditions of drones must be handled carefully, including environmental noise that can reduce recognition accuracy. Accordingly, Park and Na [61] studied voice-driven control of multiple unmanned aerial vehicles (UAVs) and the associated noise reduction methods.

Patent CN111768768A proposes a voice processing method in the fields of AI, DL, NLP, and voice interaction, which performs noise reduction on the voice data sent by peripheral control equipment. The specific implementation scheme is as follows: in response to a voice recognition interface call request sent by the peripheral control device, start the voice recognition process; acquire the type of the peripheral control device; determine the target voice noise reduction mode according to that type; perform noise reduction on the voice data sent by the peripheral control device in the target mode to obtain denoised voice data; and perform voice recognition on the denoised voice data to generate text data. Through this voice processing method, the noise that other operations of the peripheral control device introduce into the voice data is reduced.
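The control flow of this scheme (select a noise reduction mode by device type, denoise, then recognize) can be sketched as below. The device types, the mode table, and the function names are hypothetical, and a moving average is only a stand-in for a real speech denoiser; the recognizer is passed in as a callable because the patent’s recognition model is not described here.

```python
def moving_average(samples, window):
    """Trailing moving average, used here as a placeholder denoiser."""
    out = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# Hypothetical mapping from device type to noise reduction mode.
NOISE_MODES = {
    "remote_control": lambda s: moving_average(s, 2),
    "smart_speaker": lambda s: moving_average(s, 4),
}


def process_voice(device_type, samples, recognize):
    """Pick the mode for this device type, denoise, then run recognition."""
    denoise = NOISE_MODES.get(device_type, lambda s: list(s))  # unknown type: pass through
    cleaned = denoise(samples)
    return recognize(cleaned)


# Dummy recognizer that only reports how many samples it received.
text = process_voice("remote_control", [0, 2, 4], lambda s: f"{len(s)} samples")
```

Keeping the mode table separate from the pipeline means that supporting a new type of peripheral device only requires a new table entry.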

6.4. Speech-Driven Automated Control

Interactive Smart Agents (ISAs), which are controlled by users through natural language dialogues, are becoming a part of life, especially in smart home scenarios [62]. Patent WO2020203067A1 describes an information-processing device containing a control unit driven by natural language, which is arranged for controlling the movement of a moving object on the basis of results of a speech recognition process. Patent CN110654738A describes an automatic garbage classification and recycling device based on NLP. The garbage bins are, respectively, equipped with infrared sensors, and the lower box body is equipped with a mechanical transmission mechanism and an automatic classification mechanism. The device and method of the present invention have high recognition efficiency and high degree of automation.

6.5. Internet of Things (IoT)

Patent KR2020131299A (assignee: Google LLC) proposes a method of associating multiple remote automation assistant components through IoT devices, combined with voice recognition modules to monitor and send voice data. Patent US10543931B2 proposes a method for monitoring audible and message alerts received during flight in an aircraft IoT cockpit. After a plurality of alerts is received, including at least one audible alert or message alert, a first NLP task converts the audible alert into a text alert that is structurally consistent with the format used for aggregation or with a cascaded message alert, and a second NLP task identifies the context. The cascaded message alert is subsequently marked to associate it with a display element.

6.6. Applied Scenarios

According to the A-TFM results in Section 5.4, patents related to chatbot applications are still mainly focused on personalized services and e-commerce. Both types of applications use the chatbot either as a virtual assistant serving a specific purpose or as an expert in a specific field for knowledge acquisition. These applications, which provide utility or productivity, are progressing towards education [63, 64], medical [65], emotional [66, 67], and social services [68–70]. Under these conditions, the integration of socioemotional behavior and personality processing design principles can lead to a decisive competitive advantage [71]. The application trend of chatbots obtained from the patent analysis in this study is consistent with some studies [71, 72], which supports the effectiveness of this research.

7. Conclusion

The study conducts a comprehensive patent review on emerging technologies of natural language-enabled chatbots. The contribution of this study is addressed in Section 7.1, the managerial implication is described in Section 7.2, the practical/social implications for marketers are described in Section 7.3, and the limitations and future research are suggested in Section 7.4.

7.1. Contribution

This study contributes in three aspects. First, a patent analytic framework is proposed and shown to be effective. Second, emerging technologies are identified. Third, the application trend is addressed.

The patent analytic framework starts with patent-based ontology construction, followed by the patent management map and the TFM, and ends with the case study. The four-level hierarchical structure of the ontology is constructed with text-mining approaches, such as the k-means clustering algorithm and LDA topic modeling, to reduce human interference during the process. The ontology map can be used as the basis for strategic and sustainable R&D planning, from which researchers can quickly understand the development trends of key technologies and identify technology gaps. It is worth noting that in some past patent analysis articles, detailed patent query conditions were designed first, and the subsequent analysis was based on them [25]. In contrast, the patent analysis method proposed in this research uses an iterative process to find the most appropriate query conditions and patent information during the construction of the ontology. In addition to patent analysis, it is reasonable to look for emerging technologies in academic articles, for which the systematic literature review (SLR) is the main method. Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) was created by an international health collaborator network and provides a framework for the SLR to ensure methodological rigor and quality [73]. The conduct of an SLR largely depends on the scope and quality of the included research. Therefore, systematic reviewers may need to modify their original review plan during the process, and the PRISMA statement recognizes this iterative process [74]. This provides crucial support for the iterative method used in this study to continuously adjust the patent query conditions during ontology construction.

The emerging technologies are summarized as follows. Knowledge is the basis of the natural language-enabled chatbot, and the feature graph is a feature generation framework that has recently attracted attention. DL is the core method, and most DL algorithms are mature. In recent years, patents have focused on combining various DL algorithms to capture their respective advantages and compensate for each other’s shortcomings. In terms of speech technology, noise reduction is the focus of recent speech recognition technology. Sounds captured by the device, including voices and the noise of operating equipment, are converted into refined text data through the integration of DL and NLP technologies. Furthermore, context is found to be the main research subject, whether in the exploration of the knowledge base or in the logic of the algorithm. Previous NLP research focused on unstructured text, but in recent years it has clearly turned to messages in dialogue. In unstructured text, term frequency-based methods can give good results, but messages in dialogue rely on a large number of pronouns and on the continuity and relevance of the context, so anaphora is more complicated. Moreover, applying NLP to daily conversations requires a larger and broader domain and knowledge base. For this reason, chatbots of various specific domains are being integrated with each other into a more complete and powerful system, which makes communication technology and system integration very important as well.

As for the application trend, the increasing number of patents shows the rapid development of NLP chatbots in recent years, and the macroscopic patent trend analysis reveals how this development unfolded. Patents related to natural language-enabled chatbots started to appear in 2014 and have grown rapidly since 2016. At first, they were mainly based on NLP and knowledge bases. By 2018, speech recognition and communication technology had been developed and refined, and a large number of applications began to appear in 2019. These applications are concentrated in Silicon Valley’s technology giants, and they have brought significant improvements to people’s lives. The natural language-enabled chatbot is widely used in the field of e-commerce, focusing on customer service and medical consulting. With the popularization of 5G network technology, more and more voice-driven applications, such as speech-driven automated control for IoT and system integration, along with immersive human-computer interaction interfaces, provide a better user experience. In addition to e-commerce applications, more applications in the product life cycle process have begun to be observed. The application scenarios of the natural language-enabled chatbot have clearly begun to shift from e-commerce to engineering applications, such as product design, engineering asset management, smart manufacturing, and workshop management. As an emerging smart system architecture using AI, the natural language-enabled chatbot has become a service integration solution through the integration of devices, algorithms, and network communication technologies. It is also expected to continue to impact the traditional information system architecture in the future.

7.2. Managerial Implication

At present, the application of chatbots is still focused on personal assistants and customer services, and these application scenarios are confined to a very narrow field of knowledge.

From the early rule-based dialogue interaction systems to natural language interaction, coupled with the maturity of voice recognition technology, chatbots can now provide good dialogue quality in chit-chat and single-round dialogue. The bottleneck of service provision has shifted from system development to the establishment of an in-depth domain knowledge base. Many Internet service providers offer a convenient application framework for setting up a chatbot as an automated customer service or personal service assistant. The success of a chatbot service depends on whether it accurately interprets the user’s context or intended question and possesses the knowledge base needed to fully support the context and provide accurate replies.

The limitation of chatbots that focus on a single domain has begun to be noticed, and the practice of integrating multiple single-domain chatbots into a chatbot advisory group has appeared in recent patents and research. With these changes in chatbot system structure, knowledge from multiple domains is integrated into a complex system. In recent years, the strategy of focusing on data-driven innovation has led to new products and business models in the emerging and developing digital markets. However, when knowledge is mined from data, user privacy must be treated with caution [75, 76].

To sum up, the role of the chatbot is shifting from simple information provision to complex information integration and versatile decision support, which means that reasoning, automatic dialogue, and interface control must be addressed. Patents on the control of electronic devices for smart homes or cars also support this idea.

7.3. Practical/Social Implications for Marketers

The three main motivations of chatbot usage imply the importance of social media to the development of chatbots, the potential of chatbots and immersive technology in the entertainment industry, and the issues of chatbot implementation [72].

As a platform on which people initiate conversations, social media has become the main chatbot interface for end users. The integration of social media and chatbots in e-commerce sites continues to grow and evolve rapidly.

The second most important application motivation is entertainment, which is rarely addressed in patent documents. The realism of chatbots is still insufficient, but they can already provide rich and interesting interaction. In terms of industrial development, VR is at a similar stage. The VR experience itself is very attractive, much like an exciting game, so the user experience of a virtual environment is far more important than its degree of realism [77]. The TFM results also show that some patents combine chatbots with immersive technology to improve user experience. For digital marketers, this implies that combining VR and chatbots in marketing and entertainment is expected to bring users a more immersive and innovative experience.

The third most important application motivation concerns social services, such as social care for the elderly living alone. In the 3D-TFM proposed in this research, some patents for chatbot applications in social service and education scenarios have indeed been observed. The Turing Test was proposed in 1950 as a method to examine whether a machine behaves like a person [78]. Fifty years later, in 2000, the relationship between the Turing Test and AI development was still the subject of much controversy [79]. Now that mature DL technology brings clear productivity and benefits, however, whether a chatbot behaves like a person is no longer that important. An article on the application of chatbots in health care also noted that “AI needs to pass the implementation game, not the imitation game” [80]. The applications in service industries, such as entertainment, social service, and education, imply that a chatbot should not be regarded merely as an emulated person but as a system interface that talks in natural language and makes human-computer interaction more convenient. Although studies have shown that consumers generally prefer interacting with people to interacting with chatbots, giving chatbots human qualities can still effectively enhance the consumer experience [81]. For marketers, striking a balance between task competence and warm, anthropomorphic responses will be an important issue.

7.4. Limitations and Future Research

The first limitation is that the data source selected for this study is patent documents from the DI collective global database.

The smart search feature of the DI database uses natural language processing and deep learning methods to help find related patents that match the user’s domain description. Compared with traditional field search, this feature helps identify related patents faster and more accurately. Nonetheless, it means that building a comprehensive patent set depends on access to the paid DI database. The second limitation is that, even though data-driven ontology construction methods are investigated in this study, domain experts still need to be involved throughout the operation of the framework for two main purposes: key term extraction and result verification. When searching for patents in a specific domain, relevant terms appear in a large number of patent documents. Although the TF-IDF vectorization mechanism considers both the frequency of a term and its uniqueness across all documents, the clustering results show that each cluster still contains a large number of common terms. In the topic modeling results, these general terms are the main topics corresponding to the clustering results, which indirectly confirms the validity of the method of this research. However, even though the ontology is constructed from patent documents through a data-driven method, domain experts are still needed to verify its correctness. In addition, during the construction of the TFM, this research also explores the scenarios in which these technologies and functions are applied. Terms related to these scenarios are mentioned in the patent data but account for only a small number of words. This is also a limitation of TF-based text-mining methods.
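The common-term effect described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the three toy documents are invented, the function names `tf_idf` and `common_terms` and the `max_df_ratio` threshold are hypothetical, and the smoothed IDF form ln((1 + N) / (1 + df)) + 1 is only one common TF-IDF variant, assumed here for illustration (vectors are not length-normalized). Under these assumptions, a domain term that occurs in every document keeps a positive weight and, because of its high term frequency, can still outrank the terms that actually distinguish the documents.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    TF is the raw term count; IDF is the smoothed form
    ln((1 + N) / (1 + df)) + 1, so a term found in every
    document still receives a positive weight.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]


def common_terms(docs, max_df_ratio=0.8):
    """Return terms appearing in more than max_df_ratio of the documents."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t for t, d in df.items() if d / n_docs > max_df_ratio}


# Toy, invented "patent abstracts" (not data from the study).
docs = [
    "chatbot user chatbot speech recognition".split(),
    "chatbot user chatbot sentiment analysis".split(),
    "chatbot user chatbot knowledge graph".split(),
]

weights = tf_idf(docs)
# Highest-weighted term per document: the shared domain term wins each time.
top_terms = [max(w, key=w.get) for w in weights]
# Candidate general terms that an expert could review before clustering.
shared = common_terms(docs)

print(top_terms)
print(sorted(shared))
```

In this toy corpus the shared term dominates every document vector, which mirrors the observation that clusters contain many common terms. A document-frequency filter such as `common_terms` only flags candidates; deciding which flagged terms are genuinely uninformative would still call for the expert verification discussed above.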

Future research will address the problems mentioned above. The first direction is to expand the data sources. Beyond patent data, Ribeiro-Navarrete et al. [76] proposed an SLR method for analyzing academic articles and other nonpatent literature. Adding SLR in future research is expected to provide a more comprehensive view, and the results of SLR and patent mining can then be compared further. Moreover, better ways of eliminating repeated terms in unstructured documents, whether iteratively or through other approaches, will help text-mining methods focus on finding the terms that uniquely represent a specific domain. Since quantitative and similarity-based text-mining approaches have been applied and have reached their limits, advanced techniques for key term identification are clearly an important topic for future research. Despite the above limitations, the framework proposed in this study remains comprehensive and effective: it analyzes the development of natural language-enabled chatbots with supporting quantitative data, finds emerging technologies, and points out possible future development directions. In addition, the method and framework are general and can be readily applied to discover emerging technologies in other domains.

The patent analysis method proposed in this research is used to explore the emerging technologies and trends of natural language-enabled chatbots, and its findings are highly consistent with the indications given in academic research. The methodology is not restricted to a specific domain, so the authors hope that it can serve as a reference for researchers exploring emerging technologies and trends in other fields.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


Acknowledgments

This research was partially supported by a research grant funded by the Ministry of Science and Technology (grant no. MOST-108-2221-E-007-075-MY3). The authors also express their gratitude to Yi-An Su for helping refine the illustrations in the paper.