The use of natural language processing (NLP) methods in developing conversational systems for health diagnosis increases patients’ access to medical knowledge. In this study, a chatbot service was developed for the Covenant University Doctor (CUDoctor) telehealth system based on fuzzy logic rules and fuzzy inference. The service focuses on assessing the symptoms of tropical diseases in Nigeria. The Telegram Bot Application Programming Interface (API) was used to interconnect the chatbot and the system, while the Twilio API was used for interconnectivity between the system and a short messaging service (SMS) subscriber. The service uses a knowledge base consisting of known facts on diseases and symptoms acquired from medical ontologies. A fuzzy support vector machine (SVM) is used to predict the disease effectively from the symptoms entered. User inputs are recognized by NLP and forwarded to CUDoctor for decision support. Finally, a notification message announcing the end of the diagnosis process is sent to the user. The result is a medical diagnosis system that provides a personalized diagnosis, utilizing self-input from users to diagnose diseases effectively. The usability of the developed system was evaluated using the system usability scale (SUS), yielding a mean SUS score of 80.4, which indicates an overall positive evaluation.

1. Introduction

Remote diagnosis systems are becoming increasingly popular and accurate, with enormous advantages such as cost-effectiveness; fast and reliable decision support for medical diagnostics; and the treatment and prevention of disease, illness, injury, and other physical and mental damage in human beings. The rise in remote health services (or telehealth) offered by healthcare institutions coincided with the evolution of assisted living systems and environments, which aim to widen the access of older and disadvantaged people to appropriate healthcare services and thus improve their health status and clinical outcomes [1]. With the increase in innovative medical technologies, there is a need to adopt medical expert systems that oversee and control diagnosis and treatment processes [2]. Medical diagnostic processes carried out with the aid of computer-related technology, which is on the rise, have improved the experience and ability of physicians to diagnose diseases effectively while employing novel signal processing techniques for the analysis of patients’ physiological data [3, 4] and deep neural networks for decision support [5]. With the rise of artificial intelligence (AI) techniques, chatbots have appeared as a promising direction for streamlining communication between doctors and patients [6]. Such chatbots are becoming increasingly popular as remote health interventions implemented in the form of synchronous text-based dialogue systems [7]. Patients with chronic diseases could benefit the most from chatbots, which can continuously monitor their condition, provide reliable up-to-date information, and remind them to take their medication [8]. For the effective use of chatbots in the healthcare domain, chatbot technology needs advanced reasoning capabilities based on the formalization of medical knowledge (semantics) and the health state of patients, coupled with language vocabularies and dialogue engines [9].

Natural language processing (NLP) technology enables interaction between computers and humans, using linguistic analysis and deep learning methods to obtain knowledge from unstructured free text [10]. NLP systems have shown their uniqueness and importance in information retrieval, mostly in retrieving and processing large amounts of unstructured clinical records and returning structured information in response to user-defined queries. In general, an NLP system aims to represent explicitly the knowledge expressed by text written in a natural language. There are few applications of NLP techniques in diagnosing diseases, despite the enormous amount of text-based information that can be retrieved from patients’ self-narrations [11]. The main challenges in applying NLP to medical records are flexible formatting, structure without sentences, missing expected words and punctuation, unusual parts of speech (POS), medical jargon, and misspellings [12]. Linguistic structures such as coreferences make medical texts difficult to interpret [13]. Moreover, unique linguistic entities such as medical abbreviations make the inference of knowledge from medical texts much harder [14].

This study introduces the use of an NLP model via SMS and a chatbot platform to improve health self-assessment and decision support in digital healthcare systems. The extraction of knowledge from the electronic health record (EHR) is a growing area of interest in medicine, and the use of electronic medical records (EMRs) at healthcare centers and in the cloud [15] has provided a vast amount of data to be analyzed. An EMR is a digital record of health-related information that is created, collected, and managed by medical experts [16]. Complications in compiling existing and available medical data include integrating NLP into multiple EMRs, ensuring the privacy and security of patients’ data [17], and clinically validating a tool. All of these can be overwhelming for medical research aimed at improving patient care. However, applying NLP techniques to screen patients and assist medical experts in their diagnosis would boost healthcare services through effective analysis of the narrative text of symptoms provided by a patient.

For example, Langer et al. [18] used several NLP tools along with classification methods to process drug-related questions. They developed a natural language-based interface that enables users to phrase their queries, achieving an accuracy of up to 81% in classifying drug-related questions. Pendyala et al. [19] presented an application that allows machines to take on the function of life support. The study focused on medical diagnosis, and an experiment was conducted to show the relationship of information retrieval and text mining to the medical diagnosis problem. The study concludes that the proposed system would help in reaching the goal of ubiquitous medical diagnosis. Fernandez-Millan et al. [20] presented a rule-based expert system that produces a list of likely diseases from laboratory test results for diagnosis. The authors concluded that the proposed system gave better clinical accuracy and speed, thereby improving efficiency and quality of service. Atutxa et al. [21] used deep learning models to extract International Classification of Diseases (ICD) codes from death certificates written in regular natural language, obtaining F-scores of 0.963, 0.838, and 0.952 for Hungarian, French, and Italian, respectively. Combi et al. [22] proposed an NLP method for transcoding natural-text descriptions of adverse drug reactions into MedDRA standard terms, reaching an average precision and recall of 91.8% and 86.9%, respectively. Evans et al. [23] analyzed patient safety incident reports written in free text to categorize incident type and severity of outcome, reaching accuracies of 0.891 and 0.708, respectively. Kloehn et al. [24] generated explanations of complex medical terms in Spanish and English using WordNet synonyms and summaries, as well as word embedding vectors, as sources of knowledge. Sarker et al. [25] used fuzzy logic and set theory-based methods for learning from a limited number of annotated examples of unstructured answers in health examinations, recognizing correct concepts with an average F1-measure of 0.89. Zhou et al. [26] used deep learning models pretrained on general text sources to learn knowledge for information extraction from medical texts. Lauraitis et al. [27] used text input acquired by a smartphone app to evaluate cognitive and motor deficits in people showing symptoms of central nervous system (CNS) disorders, as part of Self-Administered Gerocognitive Examination (SAGE) tests.

The development of medical domain-oriented conversational chatbots has been addressed by several researchers. Such conversational agents, powered by AI techniques, may serve patients with minor health concerns while allowing medical doctors to allocate more time to treating more serious patients [28] or finding suitable donors [29]. A chatbot-powered healthcare service can promptly respond to problems that arise in daily life and to changes in the health state of people with chronic diseases such as obesity, diabetes, or hypertension [30]. For example, Ahmad et al. [31] developed a chatbot that is able to advise on the kind of drugs to be taken based on the data submitted by the user. Avila et al. [32] developed a chatbot to find the best prices for medicines and suggest their best possible substitutes. Bao et al. [33] suggested a hybrid model composed of a knowledge graph and a textual similarity model to construct a system for responding to medical questions using a Hierarchical BiLSTM Attention Model (HBAM). Chaix et al. [34] developed a chatbot for patients with breast cancer to provide support and answers to their concerns about their disease, as well as to remind them to take the prescribed drugs. Denecke et al. [35] developed a mobile app with a chatbot that uses elements of cognitive behavior therapy to support mentally ill people in addressing their psychological problems. Harilal et al. [36] developed a chatbot app aimed at supporting empathetic conversations, sensing the associated emotions, and extending medical advice to people with depression. Huang et al. [37] developed an AI-powered chatbot for promoting a healthy lifestyle and providing advice on weight management. Hussain and Athula [38] developed a chatbot that uses the MediaWiki API to extract information from Wikipedia to supplement the chatbot’s knowledge for advising diabetic patients on diabetes management. Kökciyan et al. [39] integrated data from commercial health sensors, EHRs, and clinical guidelines with a conversational chatbot that provides patients further explanations about their overall well-being based on argumentation-based dialogue. Ni et al. [40] suggested a knowledge-driven primary care chatbot system that has an analytic NLP-based engine for understanding descriptions of patients’ symptoms, a reasoner for mapping symptoms to possible causes, and a question generator for creating further dialogue questions. On the other hand, Zini et al. [41] developed a deep learning framework-based conversational agent to represent a virtual patient that can be used for teaching medical students patient examination.

Machine learning algorithms, especially SVM, have shown promising results in classifying free text, such as Georgian-language medical records [42]. An SVM with a polynomial kernel was used for classifying the content and severity of primary care patient safety incident reports [23]. However, the authors claimed that improved definitions and more training samples for select categories would further improve the performance of the system. Deep learning methods were proposed by Zhou et al. [26]. The authors presented transfer learning methods based on the traditional multilayer neural network (NN) model to develop a clinical information extraction system. Other methods in existing studies include an interactive NLP tool for identifying incidents in radiology reports presented by Trivedi et al. [43]. The authors implemented the tool and assessed its usability with an open-ended questionnaire and the system usability scale (SUS). A summary of the selected literature is given in Table 1.

Medical chatbots have been designed and implemented in various clinical areas as conversational tools with wide access to medical knowledge and healthcare issues. Existing chatbots are designed for either generic or disease-specific purposes. A novel approach based on AI methods was proposed by Madhu et al. [46] for designing a simple and interactive medical assistance chatbot for medicine dosage intake, considering the age and weight of patients. The Mandy chatbot system was designed to assist healthcare staff in automating the patient intake process [40]. The proposed chatbot consists of three parts: the mobile app front end for patient interaction, the diagnosis module, and the doctor’s interface for assessing patient records. The chatbot combines NLP with knowledge-driven diagnosis abilities. Similarly, Siangchin and Samancheun [47] developed a chatbot application using an auxiliary NLP library. The system was further compared with a traditional ICD-10 application based on the analytic hierarchy process (AHP) for analyzing, selecting, and classifying diabetes mellitus, trauma, and external causes. The integration of NLP and machine learning algorithms has also played a key role in creating chatbot applications for disease prediction and treatment recommendation [48]. A deep learning framework was proposed by Zini et al. [41] for enhancing virtual patients’ conversational skills. The authors integrated long short-term memory (LSTM) networks and convolutional neural networks (CNNs) for sentence embeddings in a given QA script. Other methods include a study by Roca et al. [8], which introduced a chatbot-patient interaction system for the specific chronic disease psoriasis. Rosruen and Samanchuen [49] also developed an intent-based chatbot known as MedBot, using Dialogflow, for medical consultation services. The authors claimed that the proposed system was able to maximize users’ convenience, increase service capability, and reduce operational cost.

The successful adoption of chatbot technology, as shown in Table 2, has enabled effective interaction between users and machines in various domains within the healthcare system. However, some of the methods proposed in the literature have limitations, such as the challenges associated with static local knowledge bases in chatbots and the time consumed during training, especially for a specific domain [38]. Therefore, future work is needed to develop chatbot software with more scalability, increased data sharing and reusability, and an improved standard conversation model [8].

The continuous growth of mobile technology has affected every facet of human life around the globe, as its support of healthcare objectives through telemedicine, telehealth, and m-health [52] has helped to diagnose and treat patients at low cost, especially in developing countries where options for diagnosis and treatment are limited. Of the various communication media available on mobile devices, the short messaging service (SMS) has proven to be unique and reliable due to its low cost, reliable delivery, personal nature, and independence from Internet connectivity [53, 54]. Considering the need to provide good medical care to everyone, including rural dwellers with poor electricity and slow Internet connections, it is important to integrate SMS with a medical diagnosis system, thus establishing an SMS-based medical diagnosis system that best meets the needs of the common man [55]. Despite the overall progress and research efforts made in improving e-health systems and designing decision support systems (DSS) [56–58], there is still much work to be done on effectively understanding and identifying key NLP-based features for enhancing diagnosis, thus improving the health and well-being of the global society at large.

In summary, existing medical diagnosis systems (MDS) often make poor decisions due to misinterpretation of the text-based input provided by the patient. Therefore, there is a need to automate MDS for efficient diagnosis of diseases and to support decisions based on the severity of symptoms. Moreover, medical experts need a platform to keep track of the large text-based chunks of knowledge narrated by patients in a natural language, thus improving healthcare delivery for remote patients.

The contribution of this paper is as follows: (1) we have developed a text-based medical diagnosis system that provides a personalized diagnosis, utilizing self-input from users to diagnose diseases effectively; (2) the proposed system combines NLP and machine learning algorithms for SMS and a Telegram bot; and (3) the system diagnoses using a direct question-and-answer approach to suggest a medical diagnosis.

The structure of the remaining parts of the paper is as follows. We present and discuss our methodology and the algorithm used in Section 2. In Section 3, we evaluate and discuss the results. We present conclusions and outline future work in Section 4.

2. Design Methodology

2.1. Outline of the Architecture

The study assesses the clinical data needs and requirements for diagnosing tropical diseases in Nigeria, as well as the patients’ clinical data found in EHRs or manual records. The architecture of the proposed text-based medical diagnosis system is depicted in Figure 1.

The steps involved in the proposed text-based medical diagnosis system are as follows: (1) description of the knowledge base; (2) preprocessing of text-based documents; (3) tagging of document; (4) extraction of answer; and (5) ranking of candidate answers.
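The five steps can be sketched as a toy pipeline. All function names and the miniature symptom-disease knowledge base below are illustrative assumptions, not taken from the CUDoctor implementation.

```python
# Illustrative sketch of the five-step diagnosis pipeline.
# Each step is simplified to its essence; names are hypothetical.

def preprocess(text):
    """Step 2: lowercase and strip punctuation noise from the input."""
    return "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())

def tag(tokens, knowledge_base):
    """Step 3: keep only tokens known to the knowledge base."""
    return [t for t in tokens if t in knowledge_base]

def extract_answers(tags, knowledge_base):
    """Step 4: collect candidate diseases linked to any tagged symptom."""
    return {d for t in tags for d in knowledge_base[t]}

def rank(candidates, tags, knowledge_base):
    """Step 5: order candidates by how many tagged symptoms they match."""
    score = lambda d: sum(d in knowledge_base[t] for t in tags)
    return sorted(candidates, key=score, reverse=True)

# Step 1: a toy knowledge base mapping symptoms to diseases.
kb = {"fever": ["malaria", "typhoid"], "chills": ["malaria"]}

tokens = preprocess("I have Fever and chills!").split()
tags = tag(tokens, kb)
ranked = rank(extract_answers(tags, kb), tags, kb)  # malaria matches both symptoms
```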

The diagnosis system framework was implemented in the Python language because it is cross-platform and offers a wide range of third-party libraries for machine learning and NLP tasks. The system uses Python library packages to access the machine learning and NLP functions needed for categorization.

2.2. Knowledge Base

The knowledge base is the principal source of data in a question answering system, and it can be in a structured or unstructured format. To develop the knowledge base, information from a medical database system was collected and divided into categories, which are referred to as the context knowledge of the disease.

The main sources of information in the knowledge base are as follows: (1) WordNet [59], which provides a lexical database and defines the relationships of words and phrases; (2) YAGO (Yet Another Great Ontology) [60], an open source knowledge base which we use to construct a knowledge graph of common knowledge entities; (3) UMLS (Unified Medical Language System), which combines many medical vocabularies, including ICD-10-CM and SNOMED CT, and is used to link medical terms and extract medical concepts, relationships, or knowledge; UMLS is recognized as a comprehensive knowledge source in the healthcare domain [61]; and (4) Disease Ontology (DO) [62], which contains a knowledge base of over 10,000 human diseases.

We used WordNet because it is used in most question-answering systems and has proved useful when dealing with words. Access to WordNet is implemented via the WordNet HTTP API. Access to YAGO is implemented using a SPARQL [63] query engine, which sends queries to the SPARQL endpoint and returns semantic fact triples (subject-predicate-object). Access to UMLS is implemented via the UMLS REST API using the Python language. Access to DO metadata for a specific DO term is via the REST metadata API by constructing an HTTP request. The knowledge base is specified using eXtensible Markup Language (XML), which ensures a common way to specify and share structurally organized data that is independent of any application.
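As an illustration of the SPARQL access path, the sketch below builds a SPARQL GET request using only the Python standard library; the endpoint URL, resource IRI, and query are placeholder assumptions, and a real client would also fetch and parse the JSON response into (subject, predicate, object) triples.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; an actual deployment would point this at the
# live YAGO SPARQL service and send the request with urllib or requests.
ENDPOINT = "https://yago-knowledge.org/sparql/query"

def build_sparql_request(subject_iri):
    """Construct the GET URL that asks for all facts about a subject."""
    query = f"SELECT ?p ?o WHERE {{ <{subject_iri}> ?p ?o }} LIMIT 10"
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})

# Placeholder resource IRI for illustration only.
url = build_sparql_request("http://yago-knowledge.org/resource/Malaria")
```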

For knowledge representation, we have adopted a three-layer model of disease-symptom-property (DSP) originally suggested in [64] as shown in Figure 2. The knowledge in the knowledge database is stored as resource description framework (RDF) triples (property, symptom, and disease), while the computational model is adopted from the disease compass [65], which allows us to query the causal chains of diseases.
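To make the triple-based representation concrete, the sketch below stores DSP-style facts as Python tuples and walks a causal chain in the style of a disease-compass query; all facts and predicate names are invented for illustration.

```python
# Toy RDF-style store of disease-symptom-property knowledge, with
# causal links between findings to mimic causal-chain queries.
# Every fact here is illustrative only.
triples = [
    ("malaria", "hasSymptom", "fever"),
    ("malaria", "hasSymptom", "chills"),
    ("fever", "hasProperty", "high_temperature"),
    ("fever", "causes", "weakness"),
]

def query(store, pred, subj=None):
    """Return objects of all triples matching a predicate (and subject)."""
    return [o for s, p, o in store if p == pred and (subj is None or s == subj)]

def causal_chain(store, start):
    """Follow 'causes' links from a finding until the chain ends."""
    chain = [start]
    nxt = query(store, "causes", start)
    while nxt:
        chain.append(nxt[0])
        nxt = query(store, "causes", nxt[0])
    return chain
```

For example, `query(triples, "hasSymptom", "malaria")` lists the stored symptoms, and `causal_chain(triples, "fever")` follows the causal link from fever to weakness.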

Our knowledge base includes 71 instances of diseases (mainly tropical ones) and 542 pieces of information. The model (see Figure 3) uses the knowledge from the knowledge database and applies the fuzzy rules described in Section 2.6.

2.3. Communication System

The communication system was implemented on top of the knowledge base for efficient communication between users (through Telegram or SMS) and the medical doctor using question-answer rules. Each diagnosis question has certain features and attributes that give additional information about the request. The attributes of a request/question are as follows:

(1) Diagnosis question: the actual diagnosis question that will be sent to the user
(2) Response: the list of responses shown to the user, denoting the answers that he/she can send to the system via the Telegram GUI or SMS
(3) Serial id: the order in which the question should be asked

The types of questions are as follows.

2.3.1. Basic Data Questions

These are the preliminary questions asked once a communication has been established by the user and the system. These questions are basic information about the users, some of which include information such as gender, age, height, and weight. The sample question sequence is demonstrated in Algorithm 1.

def question_data():
    return {
        'user_age': {
            'diagnosis_question': "What is your age?",
            'diagnosis_response': ['15-25', '25-40', '40-50', '>50'],
            'serial': 1
        },
        'user_weight': {
            'diagnosis_question': "What is your weight?",
            'diagnosis_response': ['40-50 kg', '50-70 kg', '70-90 kg', '>90 kg'],
            'serial': 2
        },
        'user_height': {
            'diagnosis_question': "What is your height?",
            'diagnosis_response': ['4-5 ft', '5-6 ft', '6-7 ft'],
            'serial': 3
        },
        'user_gender': {
            'diagnosis_question': "What is your gender?",
            'diagnosis_response': ['Male', 'Female', 'Unspecified'],
            'serial': 4
        }
    }
2.3.2. Symptom-Related Questions

These questions are asked to confirm whether the user is showing signs of a symptom. They gather either specific responses based on predetermined constraints or binary (yes/no) responses. These questions are subdivided into two types:

(i) Target diagnosis questions: questions that confirm whether a symptom exists
(ii) Linked diagnosis questions: questions used to ask for more information about a symptom in case the user replies affirmatively to the target question of that symptom

Each of these questions is designed to elicit an affirmative or negative response about the presence of a symptom. For instance, a top-level diagnosis question that asks the user about the presence of certain symptoms is represented in Algorithm 2.

def diagnosis_top_data():
    return {
        'fever': {
            'diagnosis_question': "Do you have a fever?",
            'diagnosis_response': ['Yes, High (>103 F)', 'Yes, Mild (101-103 F)',
                                   'Yes, Very Mild (99-101 F)', 'No']
        },
        'head_ache': {
            'diagnosis_question': "Do you have a headache?",
            'diagnosis_response': ['Yes', 'No']
        },
        'body_chills': {
            'diagnosis_question': "Are you experiencing body chills?",
            'diagnosis_response': ['Yes', 'No']
        },
        'diarrhea': {
            'diagnosis_question': "Are you having very frequent loose motions?",
            'diagnosis_response': ['Yes', 'No']
        },
        'extreme_weakness': {
            'diagnosis_question': "Do you experience extreme weakness?",
            'diagnosis_response': ['Yes', 'No']
        }
    }
2.4. Content Extraction and Text Preprocessing

The content package is responsible for extracting content knowledge from an SMS. For convenience, the package provides several different content extractors that specialize in extracting different sets of data from an SMS. The content extractors perform text processing using the NLP package. When a patient sends an SMS describing his or her symptoms, the SMS receiver of the system receives it and passes the text (SMS body) to the NLP module, which examines the text, makes corrections if needed, and extracts important keywords.

The text processing operations include three major steps: noise removal, sentence splitting, and tokenization of the document sentences.

2.4.1. Noise Removal

Text contains a sequence of both relevant and irrelevant characters. Noise is therefore removed from the raw documents, leaving only content related to the subject for further processing.

2.4.2. Tokenization

Tokenization involves the fragmentation of strings of characters into their lexical elements. In this context, a sentence-splitting process was used to split the text into separate sentences. Here, we used the Natural Language Toolkit (NLTK) tokenizer.
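For illustration, the standard-library sketch below mimics the two operations (sentence splitting, then word tokenization); in the actual system these are performed by the NLTK tokenizer, and the regular expressions here are simplified stand-ins for it.

```python
import re

def split_sentences(text):
    """Split text into sentences at terminal punctuation
    (a rough stand-in for nltk.sent_tokenize)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    """Break a sentence into lowercase word tokens
    (a rough stand-in for nltk.word_tokenize)."""
    return re.findall(r"[A-Za-z']+", sentence.lower())

text = "I have a fever. My head aches!"
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]
```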

2.4.3. Tagging of Document

Useful information from the knowledge source is tagged to identify quality information in a specific document. The names of diseases were used as labels for documents, and tools such as a parser and WordNet were used for document tagging.

2.4.4. Parser

The Stanford Parser was used as a tool for generating the parts of speech (POS) of each word in the user’s query and, in addition, in the candidate answers selected from the knowledge database. WordNet was used to discover the relationships between the words of the user query and the data source. Words are grouped into nouns, verbs, adjectives, and adverbs.

2.4.5. Term Matching

The system then queries the knowledge base to match the extracted words with the information stored there.
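Term matching can be reduced to intersecting the extracted keywords with the symptom terms stored in the knowledge base, as in this minimal sketch (the terms are illustrative):

```python
def match_terms(extracted_words, knowledge_terms):
    """Return knowledge-base terms that appear among the extracted words."""
    return sorted(set(extracted_words) & set(knowledge_terms))

# Illustrative knowledge-base vocabulary and extracted keywords.
kb_terms = {"fever", "chills", "headache", "diarrhea"}
words = ["i", "have", "fever", "and", "chills"]
matched = match_terms(words, kb_terms)  # ['chills', 'fever']
```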

2.5. Feature Selection and Extraction

The extracted collection of important words was then transformed into a feature vector suitable for use with the machine learning algorithm. For the transformation into a feature vector, we used the word embedding technique [66]. Word embeddings are representations of words in a multidimensional semantic space. For our system, we adopted the publicly available GloVe [67] embeddings trained on Twitter data, because they provide a good approximation of the informal English common on such communication channels. For effective training, the text messages were preprocessed and converted into feature vectors by feature extraction using the word embeddings. The feature vector of an unlabeled document was then given to the classifier’s decision function, which returns a category for the document.
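The transformation into a feature vector amounts to averaging the per-word embedding vectors of the in-vocabulary tokens. The sketch below uses three-dimensional toy vectors in place of the pretrained GloVe embeddings:

```python
# Averaging per-word embeddings into one document feature vector.
# The three-dimensional toy vectors stand in for pretrained GloVe vectors.
toy_glove = {
    "fever":  [0.9, 0.1, 0.0],
    "chills": [0.8, 0.2, 0.1],
}

def embed(tokens, vectors):
    """Mean of the embeddings of all in-vocabulary tokens."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:  # no token found: return the zero vector
        return [0.0] * len(next(iter(vectors.values())))
    return [sum(dim) / len(known) for dim in zip(*known)]

vec = embed(["i", "have", "fever", "chills"], toy_glove)
```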

2.6. Fuzzy Reasoning Module

The primary purpose of this module is to use fuzzy logic-based algorithms [68] to read and interpret the responses from the user, track and monitor all the symptoms that the user has already responded to, and administer to the user the questions that are most relevant given the maintained dataset of diseases. Each disease is modeled as a bucket, where each bucket is associated with its symptoms. Appropriate fuzzy rules are formulated to deal with multisymptomatic diseases. The algorithms employed by CUDoctor read the state of these buckets and send the most relevant question to the user. This helps narrow down the number of questions the system must ask to reach a diagnosis.

A weighted fuzzy logic rule system is used, in which each fuzzy rule has a weight assigned based on historic data. The Mamdani model of fuzzy inference was used, in which each rule is represented by an IF-THEN statement. The fuzzy rules were specified as “If x1 is A1 and y1 is B1, then z1 is C1,” where A1, B1, and C1 are fuzzy sets. The fuzzy rules are weighted by an assessment of the level of contribution of the properties and symptoms to diagnosing the disease. The crisp dataset D is described by n features [F1, F2, …, Fn], each with k fuzzy terms, and the n-dimensional tuple Ti = [a1, a2, …, an] is represented as a kn-dimensional feature vector

Ti′ = [μFT1(a1), …, μFTk(a1), μFT1(a2), …, μFTk(a2), …, μFT1(an), …, μFTk(an)],

where μFTj(ai) is the membership degree of the value ai of feature Fi in the fuzzy term FTj. If the fuzzy variable Fi has k fuzzy terms FT1, FT2, …, FTk, then for each value v of Fi, the fuzzy value is computed as max{μFT1(v), μFT2(v), …, μFTk(v)}.

The max-min operators were used for implication. To obtain the crisp output, center-of-gravity (centroid) defuzzification was employed, x* = Σi xi·μA(xi) / Σi μA(xi), where the weights are given by the degree of membership μA(xi) of the value xi in the concept modeled by the fuzzy set A.

The process is implemented (Figure 4) as follows:

(1) Fuzzification: transforms the crisp inputs into fuzzy values. Expert judgement is used to define the membership functions. During fuzzification, the fuzzy rule controller receives input data (fuzzy variables) and analyzes them according to the membership functions.
(2) Knowledge base: comprises a fuzzy definition database and a fuzzy IF-THEN rule base. The rule base describes the diseases for each combination of crisp input variables.
(3) Inference engine: applies the appropriate fuzzy rules to the input data.
(4) Defuzzification: produces the crisp output values from the fuzzy results.

The final decision is made by selecting the fuzzy rule that achieved the highest score. The system architecture, which was adopted from [64], is presented in Figure 5.

2.7. Classification Module

In this study, the choice of classifier depended on the practical requirements of the proposed application and the need for a classifier with good results in the established document classification literature. Specific requirements of the proposed application, such as the computational complexity of the training and testing phases, also influenced the selection of the classifier.

Machine learning was used to predict categories for the text messages using the fuzzy SVM classifier [69]. It has only one hyperparameter, the regularization parameter C, whose value was tuned by grid search. In a fuzzy SVM, a fuzzy membership value is assigned to each data point, and the SVM is reformulated so that different input points can make different contributions to learning the decision surface. Here, the fuzzy membership values are calculated using the algorithm suggested by Le et al. [70]: first, clustering techniques are used to find clusters of data; the fuzzy membership values of data points belonging to clusters are set to 1, while the fuzzy memberships of the remaining data points are determined by their distance to the closest cluster.
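The membership assignment can be sketched as follows; this is a simplification of the Le et al. procedure, with the cluster centers, radius, and exponential decay chosen purely for illustration.

```python
import math

def memberships(points, centers, radius, decay=1.0):
    """Membership 1.0 for points within the radius of some cluster
    center; otherwise decaying with distance to the closest center."""
    result = []
    for p in points:
        d = min(math.dist(p, c) for c in centers)
        result.append(1.0 if d <= radius else math.exp(-decay * (d - radius)))
    return result

# Illustrative cluster centers and data points in 2-D feature space.
centers = [(0.0, 0.0), (5.0, 5.0)]
points = [(0.1, 0.1), (2.5, 2.5), (5.2, 4.9)]
m = memberships(points, centers, radius=0.5)
```

Points inside a cluster get full membership; the mid-way outlier contributes less to the decision surface when the SVM is trained with these values as per-sample weights.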

The classifier was trained using a set of training documents that had been processed by the NLP package and converted into word embeddings used as feature vectors. The length of the feature vector is 300, as suggested by Mikolov et al. [66]. Finally, the feature vectors extracted from the user’s answers are passed to the fuzzy SVM model, which suggests a diagnosis by classifying the important words contained in the SMS and then sends the result to the patient via SMS.

2.8. Graphical User Interface (GUI)

The system initiates a communication/conversation with the user/patient to obtain more insights about their basic personal data such as gender, age, height, and weight. Once the basic data are acquired, the CUDoctor moves to the second stage and proceeds to query the patient for symptoms based on the above algorithms. The Telegram API was used for the GUI design with an additional custom keyboard that is also provided by the Telegram API. The screenshot of a question-answer subsystem is shown in Figure 6.

The SMS text request and response were integrated for communication, since this channel does not require an Internet connection and is compatible with all mobile devices. The python-telegram-bot library was used as the Python wrapper that communicates with the Telegram API; it makes it easy to set method hooks that are triggered whenever a function is executed on the Telegram chatbot side. Similarly, the application sends requests to the Twilio communication API, which formats SMS text on the server for incoming requests and passes responses to the logical layer for processing and presentation of the result to client users. The Twilio Python helper library was used as the wrapper that communicates with the Twilio SMS API. The SMS interface for a diagnosis conversation is shown in Figure 7.

3. Evaluation and Results

3.1. Data Collection

The data used in this study were collected from a medical database, and interviews were conducted to extract text content from experts and individuals knowledgeable about the selected diseases. The extracted text content was then stored in a local file of the system.

3.1.1. Selection of Participants

Respondents were selected by snowball sampling. The inclusion criteria required participants who had recently been diagnosed with one of the diseases under study, and/or specialists engaged in clinical research or having in-depth experience with the diseases from coming into contact with or treating patients who had recently been diagnosed. To be included in the questionnaire, the participants had to be actively working in the hospital and responsible for treating patients who had come to the hospital for treatment of the selected diseases. Individuals who did not match any of these inclusion criteria were excluded. The information extracted for the implementation of the system consisted of selected accounts of individuals' experience with the diseases (individuals who could provide expert information about the selected diseases and their varying symptoms, and individuals who had recently been diagnosed with or hospitalized for the selected diseases, were recruited).

3.2. Evaluation of Results

To evaluate the performance of the developed service, we used the Bilingual Evaluation Understudy (BLEU) score [71], which has become a typical metric for evaluating chatbot services [72, 73]. BLEU compares an output response from the service to a reference response; a BLEU score ranges from 0 to 1 and is commonly reported on a 0-100 scale. Here, we used BLEU-2, which is based on unigram and bigram matches between the generated and reference sentences. We also used the Recall-Oriented Understudy for Gisting Evaluation (ROUGE-L) metric [74], which is based on the longest common subsequence (LCS). The results are 25.29 for BLEU-2 and 31.56 for ROUGE-L.
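For illustration, sentence-level BLEU-2 is the geometric mean of the clipped unigram and bigram precisions multiplied by the brevity penalty. The minimal sketch below follows the standard definition of Papineni et al. [71]; it is not the evaluation script used in the study, and production work would normally rely on an established implementation such as NLTK's.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu2(candidate, reference):
    """Sentence-level BLEU-2: geometric mean of clipped unigram and
    bigram precisions, multiplied by the brevity penalty."""
    precisions = []
    for n in (1, 2):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        overlap = sum(min(c, ref[g]) for g, c in cand.items())  # clipped counts
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(candidate) > len(reference) else \
        math.exp(1 - len(reference) / max(len(candidate), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / 2)
```

Multiplying the result by 100 gives the 0-100 scale on which the scores in this section are reported.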

3.3. Usability Testing

To perform the usability testing of the developed system, we followed the recommendations outlined by Cameron et al. [51]. We used the system usability scale (SUS), a questionnaire that evaluates the ease of use of a system on a five-point Likert scale ranging from strongly disagree to strongly agree.

The SUS score evaluates usability in terms of effectiveness, efficiency, and overall ease of use. SUS is considered a reliable tool for measuring usability and allows the evaluation of a wide variety of products and services. It has become an industry standard, with references in over 1300 articles and publications [75], including medical chatbots and other NLP-based medical diagnostic systems such as those presented by Tielman et al. [76] and Valtolina et al. [77]. SUS is designed to support the assessment and comparison of the user experience when interacting with different tools and is recommended for inclusion in any evaluation of health chatbots [44].

SUS consists of only 10 questions. SUS results fall between 0 and 100, with a score of 68 considered average. SUS has already been applied to assess the usability of chatbots [78, 79]. The SUS questions are easily adaptable to different types of systems and were therefore adapted for our study. The SUS questions we asked are listed in Figure 8.

The usability test involved 27 participants (11 females and 16 males); 13 were aged 25 to 34, 9 were aged 35 to 44, and 6 were aged 45 to 53. The participants provided informed consent to take part in this research. Each usability session lasted less than 45 minutes. Information booklets describing the study were distributed to the participants, who were then asked to communicate with the chatbot using a mobile phone. All received answers were anonymized.

The SUS scores were computed using the procedure provided by Brooke [80]: subtract 1 from the score for questions 1, 3, 5, 7, and 9; subtract the score from 5 for questions 2, 4, 6, 8, and 10; and multiply the sum of the resulting scores by 2.5 to obtain the final evaluation score.
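Brooke's scoring procedure can be expressed directly in code. The sketch below assumes the responses are given as a list of ten Likert values (1 to 5) in question order; the function name `sus_score` is illustrative.

```python
def sus_score(responses):
    """Compute an SUS score from ten Likert responses (1-5), per Brooke [80].

    Odd-numbered questions (1, 3, 5, 7, 9) contribute (score - 1);
    even-numbered questions (2, 4, 6, 8, 10) contribute (5 - score);
    the sum is scaled by 2.5 onto a 0-100 range.
    """
    assert len(responses) == 10, "SUS has exactly ten questions"
    total = sum(r - 1 if i % 2 == 0 else 5 - r
                for i, r in enumerate(responses))
    return total * 2.5
```

For example, the best possible answer pattern (5 on every odd question, 1 on every even question) yields 100, while a uniform neutral response of 3 on every question yields 50.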

The SUS score is interpreted as follows: >80.3 (excellent), 68 to 80.3 (good), 68 (okay), 51 to 68 (poor), and <51 (bad). The mean SUS score obtained for CUDoctor was 80.4, which is above the average threshold of 68 and falls into the excellent range. The results for all SUS questions are presented in Figure 9. They show that the users gave the lowest evaluation to Q2 ("I found CUDoctor unnecessarily complex"), with a score of 72.7, which is still above the threshold of 68. The best-evaluated feature was integration (Q5), with a score of 91.6.

3.4. Comparison with Other NLP-Based Services

Several chatbots with medical-related applications are provided on social networking platforms such as Facebook. For example, the FLORENCE bot reminds users when to take their medication and monitors their weight and mood. SMOKEY warns users about bad air quality. HealthTap provides answers using a knowledge base of similar questions. Google provides the Dialogflow Application Programming Interface (API) for integrating NLP into target applications. Woebot provides a cognitive behavioral therapy service and has been tested with patients suffering from depression [81]; it reduced their symptoms of depression, as evaluated by the PHQ-9 depression questionnaire. XiaoIce is a social chatbot that emphasizes emotional connection [82] while using deep learning for meaningful dialogue tasks. Chatbots such as HARR-E and Wysa are also used in suicide prevention and cognitive behavioral therapy, targeting at-risk groups [83]. The main difference of the system described in this paper is that the service is delivered over SMS rather than social networks, which require good Internet connectivity that is often unavailable in remote rural regions of developing countries. Moreover, the described solution focuses on the niche domain of tropical disease symptom assessment, and we are not aware of any other NLP-based systems focusing on this application domain.

4. Conclusion and Future Work

Timely access to healthcare that avoids unnecessary waiting for patients is a major issue in sub-Saharan Africa. Considering the exponential growth of mobile users and the need for a real-time medical diagnosis assistance tool, it is important to explore a cost-effective telehealthcare platform that allows earlier detection of diseases and effective communication between patients (users) and a diagnosis system (a remote doctor by proxy). Based on these needs, this study successfully built a text-based medical diagnosis system that provides a personalized diagnosis using self-input responses from users to suggest a disease diagnosis. The proposed system combines NLP and a machine learning algorithm with SMS and a Telegram bot, and suggests a diagnosis through a direct question-and-answer approach. A limitation of the system is that it is not safeguarded against false positives, that is, falsely suggesting a disease; therefore, a final diagnosis must still be confirmed by a medical doctor.

Future recommendations include the automation of this medical diagnosis system to recognize diseases, recommend treatments, prescribe medication, and monitor medication adherence. Audio interaction will be incorporated to make the system more interactive. These improvements will serve to reduce cost and mortality rate, thereby reducing the workload burden on medical doctors in underdeveloped regions.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments
The authors gratefully acknowledge the support and sponsorship of Covenant University through the Centre for Research, Innovation, and Discovery (CUCRID).