Abstract

Machine learning is a branch of computing that studies the design of algorithms with the ability to “learn.” Deep learning is a subfield comprising techniques that use deep artificial neural networks, that is, networks with more than one hidden layer, to computationally imitate the structure and functioning of the human brain. The analysis of health-related images with deep learning is not limited to clinical diagnosis; it can also, for example, facilitate the surveillance of disease carriers. Among recent efforts to use deep learning as a diagnostic tool, chest X-rays are one approach to identifying tuberculosis, since abnormalities can be spotted by analysing the radiograph. This paper presents a method for detecting the presence of tuberculosis in medical X-ray images. Three classification methods were used to evaluate the method: support vector machines, logistic regression, and nearest neighbors. Two classification scenarios were used: cross-validation and the formation of training and test sets. The results obtained allow us to assess the method’s practicality.

1. Introduction

Since the origin of artificial intelligence in the 1940s, with some incipient work [1], researchers have sought to use computers as tools to solve problems of interest to humans [2]. Following the influential 1950 work of Alan Turing on determining whether a machine is intelligent, through the test that bears his name, the number of scientific contributions in artificial intelligence increased significantly. Machine learning emerged from this activity; its objective is to develop techniques that allow computers to learn. One technique used for machine learning is the artificial neural network, which consists of a set of units, called artificial neurons, connected to each other to transmit signals. The evolution of learning methods in conjunction with neural networks gave rise to deep learning (DL), a set of machine learning algorithms that attempt to model high-level abstractions in data using computational architectures that support multiple nonlinear and iterative transformations of data expressed in matrix or tensor form. These techniques are used in a large number of projects, including digital image processing. Digital image processing (DIP) is the set of techniques applied to digital images with the aim of improving their quality or facilitating the search for information, using a computer as the main tool. In recent years, it has acquired an important role in information and computing technologies, and today it is the basis for a growing variety of applications, including medical diagnostics, remote sensing, space exploration, and computer vision, among many others. DIP is now a specific research area in computing [3]. During the last 15 years, a growing number of techniques related to digital images and their processing have been introduced into medical practice.
In the present work, digital image processing is used for the detection of tuberculosis. Tuberculosis (TB), also known as consumption, is a chronic infectious disease caused by the bacterium Mycobacterium tuberculosis. The bacteria mainly attack the lungs but can also damage other organs of the human body. TB spreads through the air when an infected person coughs or sneezes. It is preventable and curable if detected early; otherwise, it can cause the death of the patient. Tests such as a chest X-ray or the culture of a sputum sample can be performed to find out whether a person has TB disease [4]. In Iraq, tuberculosis is one of the main causes of mortality, with a rate of 9.24% per 100,000 inhabitants, according to data from the Statistical and Death System of the General Directorate of Epidemiology [5]. According to the World Health Organization, tuberculosis is one of the top 10 causes of mortality in the world [6]. The infection is acquired by inhaling airborne droplets from an infected person’s cough or sneeze; this form is called primary tuberculosis. In 2015, about 10.4 million people became ill with tuberculosis and 1.8 million died from the disease. More than 95% of tuberculosis deaths occur in developing countries [6]. Figure 1 shows four X-ray images; the two images on the left show healthy patients and the two on the right show patients in whom the disease was detected.

Although the methods of diagnosis and treatment of the disease have improved, the goal of the End TB Strategy for 2020 will not be met: globally, an estimated 10 million individuals contracted tuberculosis (TB) in 2020, of whom 5.6 million were men, 3.3 million women, and 1.1 million children. Tuberculosis is seen in all nations and in all age groups, yet it is both treatable and preventable [5, 6]. Undoubtedly, this public health problem continues to be a great challenge for health systems, mainly in developing countries. A very interesting review of the evolution of medical image analysis and processing techniques since the 1980s can be found in [4]. The task of extracting classes of information from an image is referred to as image classification. There are two forms of classification: supervised and unsupervised. Supervised classification starts from a set of known classes, which must be characterized in terms of a set of variables measured on individuals whose membership of one of the classes is not in doubt. Unsupervised classification does not establish any class in advance, although the number of classes must be chosen, and a statistical procedure is left to define them.

In the present work, artificial intelligence is applied to the automatic classification of chest X-ray images of patients with and without tuberculosis.

2. Review of Literature

Image processing is widely used in medicine. For example, in [6] it is used to extract regions of interest with properties that can potentially be related to the medical diagnosis of Parkinson’s disease; computer-aided diagnosis technology is used to process the images, extract the textures, segment the image, and find the area of interest. In [7], image processing, pattern recognition, and artificial intelligence are used to help detect clusters of microcalcifications in digitized mammography images. Few articles [8, 9] used free databases, which makes it difficult to compare new techniques and even to replicate results. The first results based on them appeared in the literature after the consolidation of two open and free databases of radiographic images [9], but they relied on computer vision techniques for lung region segmentation [9]. Although the latter takes a different approach from the one discussed in this paper, it encourages the use of these freely available databases for model training and testing. Most of the studies that examined multilayer perceptron neural networks for TB detection did not use medical images as input to such artificial neural networks (ANNs). Instead, they used laboratory parameters (cholesterol, creatinine, blood pressure, amylase level, and so on) and data from office examinations (body temperature, cough, and difficulty breathing) to train the ANNs. Although these studies demonstrate the feasibility of using ANNs to detect tuberculosis from real-world data, they require medical examinations and trained professionals to provide the input parameters, which may not be available or feasible in certain circumstances, particularly given the majority profile of TB patients.
To address these limitations, the approach proposed in this paper uses only radiographic images of the lungs, a low-cost and widely available exam that is therefore more appropriate for realistic scenarios. Some works of interest that present a compendium of techniques for improving medical images are [8–10]; techniques for eliminating noise from the image, ranging from erosion and extraction to others commonly used in the state of the art, can be found in [11].

Further research on image processing is found in [9], which, like [12–15], supports the early diagnosis of breast cancer using image processing. This research uses segmentation by texture, and the images used as evidence come from a database of cancerous masses and microcalcifications manually labelled by experts. In [16–18], breast cancer is identified from thermal images through digital processing that uses texture analysis to identify and extract all the regions of interest.

3. Proposed Method

In the proposed method, the characteristics of the images that are used as classification attributes are extracted with Keras. Keras is an open-source neural network library written in Python that includes the ResNet50 architecture; this architecture is used to extract the characteristics of the images as arrays.

In the present work, three classification methods were used:

(i) The first method is based on support vector machines (SVMs), a supervised learning model with associated algorithms that analyse the data and recognize patterns. It may be used for classification and regression; SVMs are more typically used in classification problems, so that is where we concentrate in this paper. The data points closest to the hyperplane, that is, the elements of a data set that, if deleted, would change the location of the dividing hyperplane, are called support vectors. As a result, they may be regarded as the critical elements of a data set. When a data set is divided into two groups, the SVM can perform both linear and nonlinear classification; nonlinear classification is accomplished with a kernel function, and the kernels cited for nonlinear classification are the homogeneous polynomial and complex regression analysis [10].

(ii) The second method is based on logistic regression (LR), a supervised classification algorithm used to predict the probability of a categorical dependent variable that is dichotomous; that is, it contains data that can be classified in one of two possible categories (dead or alive, sick or healthy, yes or no, and so on). The logistic model is one of the most important statistical models for estimating the probability of a particular category or event, such as success or failure. It employs a number of predictor variables that can be either numeric or categorical. It can be extended to model several classes of events, such as determining whether an image contains a cat, tiger, fish, or another animal; each category is then assigned a probability between 0 and 1, and the probabilities sum to one. Logistic regression is also known as the logit model or the maximum-entropy classifier. It has developed a particularly positive reputation in the financial sector over the last two decades as a result of its ability to detect embezzlers. The general use of logistic regression and other common linear classifiers is depicted in the diagram of the network used to extract the image characteristics that serve as classification attributes. A logistic regression therefore requires that the dependent variable be binary. Also, factor level 1 of the dependent variable should represent the “desired” outcome. Only meaningful variables should be included as independent variables, and these should in turn be independent of each other [14].

(iii) The third method is based on nearest neighbors (KNN, K-Neighbors Classifier), an instance-based supervised machine learning algorithm. Nearest neighbor approaches have remained popular in practice primarily because of their empirical success over time. This explanation, however, may be too simple. We focus on four aspects of nearest neighbor approaches that we feel are important to their continuing popularity. First, the freedom to choose what “near” means in nearest neighbor prediction makes it easy to handle ad hoc distances or to reuse existing representation and distance learning machinery, such as deep networks or decision tree-based ensemble learning approaches. Second, the computational efficiency of a number of approximate nearest neighbor search algorithms allows nearest neighbor prediction to scale to the large, high-dimensional data sets that are typical in current applications. Third, nearest neighbor approaches are nonparametric: they make minimal modelling assumptions and instead let the data drive predictions directly. Finally, nearest neighbor approaches are interpretable: they show the nearest neighbors found as evidence for their predictions. This method is useful for classifying new samples (discrete values) or for predicting or estimating future values (regression, continuous values). It basically searches for the most similar data points (by proximity) learned during the training stage and classifies new points based on them [2].
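
The nearest neighbor rule described in item (iii) can be illustrated with a minimal sketch in plain Python. It assumes Euclidean distance and majority voting; the function name `knn_predict`, the toy two-dimensional feature vectors, and the value k = 3 are hypothetical choices for illustration and are not taken from the experiments reported in this paper.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training vector.
    distances = [
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    ]
    # Keep the k closest neighbours and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy feature vectors (hypothetical): label 1 = TB, label 0 = normal.
train_x = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25),
           (0.9, 0.8), (0.8, 0.9), (0.85, 0.75)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_x, train_y, (0.82, 0.88)))
print(knn_predict(train_x, train_y, (0.12, 0.18)))
```

In practice the vectors would be the high-dimensional features produced by the network rather than two-dimensional points, and the distance function and k would be tuned on the training data.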

For the present work, the Montgomery database was used. The X-ray images of this database were collected from the tuberculosis control program adopted by UNDP, Iraq. The set contains 138 radiographs, of which 80 correspond to healthy patients (normal) and 58 show manifestations of tuberculosis (abnormal). This database is available. All images have been de-identified and are in DICOM format. The set includes a wide variety of abnormalities, such as effusion patterns. The data set also contains the radiological readings in the form of a text file. Each image carries a label that aids identification: with TB (label “1,” success) or normal, without TB (label “0,” failure).

The images were preprocessed in two main steps: (1) padding and (2) resizing. Both steps are carried out after the images are extracted and yield, for each input image, a matrix of 224 × 224 pixels in 3 channels (RGB) with values from 0 to 255; the purpose is to provide the network with a matrix of these dimensions. Once these steps have been completed, the images enter the ResNet50 network. Figure 2 shows the process carried out by the network.
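
The two preprocessing steps can be sketched as follows. This is only an illustration under stated assumptions: the paper does not specify how the padding is placed, which interpolation is used, or how the three channels are obtained, so the centred zero padding, the nearest-neighbour sampling, the replication of a single gray channel, and the helper names `pad_to_square`, `resize_nearest`, and `to_rgb` are all hypothetical.

```python
def pad_to_square(image, fill=0):
    """Pad a 2-D grayscale image (list of rows) with `fill` so it becomes square."""
    height, width = len(image), len(image[0])
    size = max(height, width)
    top = (size - height) // 2
    left = (size - width) // 2
    padded = [[fill] * size for _ in range(size)]
    for r in range(height):
        for c in range(width):
            padded[top + r][left + c] = image[r][c]
    return padded


def resize_nearest(image, out_size=224):
    """Resize a square image to out_size x out_size by nearest-neighbour sampling."""
    in_size = len(image)
    return [
        [image[r * in_size // out_size][c * in_size // out_size]
         for c in range(out_size)]
        for r in range(out_size)
    ]


def to_rgb(image):
    """Replicate the single gray channel three times (H x W x 3)."""
    return [[[v, v, v] for v in row] for row in image]


# Hypothetical 4 x 6 grayscale image with pixel values in 0..255.
raw = [[(10 * (r + c)) % 256 for c in range(6)] for r in range(4)]
prepared = to_rgb(resize_nearest(pad_to_square(raw)))
print(len(prepared), len(prepared[0]), len(prepared[0][0]))
```

Padding before resizing keeps the aspect ratio of the radiograph, so the lungs are not stretched when the image is brought to the square input size expected by the network.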

In the last stage of the preprocessing, the images enter the network, which extracts the characteristics that will be used as classification attributes. The network takes as input the generated 224 × 224 × 3 matrix, and in each layer, it applies convolutions to the matrix and in this way generates feature maps.

In the penultimate layer of the network, a vector of dimension 2048 is obtained, which contains the general characteristics of the image such as saturation, luminosity, and intensity, among others.

3.1. Processing for Cross-Validation

Once the arrays with the characteristics of the TB and Normal images are obtained, the labels are created in a text document to name each of the images that will be used for training. Within the processing program, the labels and characteristics are loaded, and the relationships between labels and characteristics are created and later converted into arrays.

When the system finishes sorting the data, it converts the relations into the values 0 and 1 so that the classifiers can interpret them. Figure 3 shows the diagram of the processing carried out for the cross-validation scenario with each of the automatic classification methods.
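
The cross-validation scenario can be sketched as an index-splitting routine. The paper does not state the number of folds or how the samples were shuffled, so k = 5, the fixed seed, and the function name `k_fold_indices` are assumptions made only for illustration.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs so each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Distribute the shuffled samples as evenly as possible across k folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# 138 radiographs, as in the data set used here; k = 5 is an assumption.
for fold_no, (train_idx, test_idx) in enumerate(k_fold_indices(138, k=5), start=1):
    print(fold_no, len(train_idx), len(test_idx))
```

Each classifier would be trained on `train_idx` and scored on `test_idx` in every fold, and the fold scores would then be averaged.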

3.2. Processing for Training and Test Set

For the second scenario, training and test sets were created: 80% of the images were used for training and the remaining 20% for testing, so that the test images are never seen during training. For each classification performed, we recorded the following evaluation metrics: Accuracy, Precision, Recall, and F-Measure.
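
As an illustration of this scenario, the sketch below forms an 80/20 split and computes the four metrics from predicted labels. The random (non-stratified) split, the fixed seed, the function names, and the example labels are assumptions; the paper does not describe how the split was drawn, and the labels shown are not results from this work.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) for a random hold-out split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F-measure for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is absent or never predicted.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if precision + recall else 0.0
    )
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


train_idx, test_idx = train_test_split_indices(138)
print(len(train_idx), len(test_idx))

# Hypothetical labels for a small test set (1 = TB, 0 = normal).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
print(classification_metrics(y_true, y_pred))
```

With only 138 images, a stratified split that preserves the proportion of normal and TB cases in both sets may be preferable to a purely random one.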

The results obtained are shown in the next section.

4. Results

The graph in Figure 4 shows a comparison of results between both scenarios.

(i) Cross-Validation (CV). Learning the parameters of a prediction function and evaluating it on the same data x, y is a methodological error. A model that just repeats the labels of the samples it has just seen would get a perfect score but would fail to predict anything useful on yet unseen data. This situation is called overfitting. To avoid it, it is usual practice in a (supervised) machine learning experiment to set aside a portion of the available data as a test set, X test and y test. The term “experiment” is not reserved for academic purposes; even in commercial contexts, machine learning frequently begins as an experiment. In a typical cross-validation workflow for model training, grid search strategies may be used to find the optimal parameters.

(ii) Training Sets (TS). The study and construction of algorithms that learn from data and make predictions on them is a typical task in machine learning. These algorithms work by building a mathematical model from input data to make data-driven predictions or decisions. In most cases, the input data needed to develop the model are split into different data sets. Three data sets, in particular, are often employed at distinct phases of the model’s development: training, validation, and test sets.

For each of them, the four recorded evaluation metrics are shown.

The graph shows that the best result in both scenarios is obtained when using SVM as the learning method (Figure 4), while the worst values are those of nearest neighbors, for both classification scenarios and in practically all the metrics. Table 1 shows the results obtained in the cross-validation scenario for the four evaluation metrics, and Table 2 shows the results obtained with the training and test sets.

The scenario that shows better performance on average is the one with training and test sets. This is the most desirable classification scenario when there are enough instances to form both sets, because the test set is never seen during training, which avoids any influence or bias when assigning the category to the image under study. In both scenarios, the classifier that shows the best performance is SVM.

A test harness is required when constructing a framework for a predictive modelling problem. The test harness specifies how the sample of data from the domain will be used to assess and compare candidate models. There are several ways to organize a test harness, and there is no one-size-fits-all solution for all applications. A popular strategy is to use one portion of the data for training and tuning the models and another portion for an objective assessment of the tuned model’s skill on out-of-sample data. The data sample is split into a training data set and a test data set. The model is assessed on the training data set using a resampling approach such as k-fold cross-validation, and the training set may be further divided to obtain a validation set for tuning the model’s hyperparameters.

5. Conclusions

In the present work, results of automatic classification of medical images are presented in two categories: with and without tuberculosis.

To carry out the classification, features are extracted using deep learning and the ResNet50 neural network. Cross-validation and the formation of training and test sets were the two classification scenarios used. The scenario with the best results was the one in which training and test sets were formed, with an accuracy greater than 85%.

The classification method that shows the best performance in the two scenarios implemented in this work is SVM. The results obtained in the present work far exceed chance and allow the images to be classified efficiently.

Computed tomography (CT) of the abdomen, CT of the head, magnetic resonance imaging (MRI) of the brain, and MRI of the spine were all used in this investigation. Our suggested CNN architecture could automatically categorize these 4 sets of medical images by image modality and anatomic location after converting them to JPEG (Joint Photographic Experts Group) format. In both the validation and test sets, we achieved outstanding overall classification accuracy (>99.5%).

The collected results allow us to assess the viability of the methods adopted. They also allow us to identify the best classification scenario and machine learning method for classifying radiographs with and without tuberculosis.

Data Availability

The data underlying the results presented in the study are included within the manuscript.

Disclosure

This work was performed as part of the authors’ employment at their institutions.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this paper.