Abstract

The accurate prediction of cancer from microscopic biopsy images has always been a major challenge for medical practitioners and pathologists, who manually observe the shape and structure of cells in tissue under a microscope. Mathematical modelling of cell proliferation helps to predict tumour size and to optimize treatment procedures. This paper introduces a cell growth estimation function that uncovers the growth behaviour of benign and malignant cells. To analyse cellular-level information in tissue images, we propose a minimized cellular graph (MCG) development method. The method extracts cells and produces features that are useful for classifying benign and malignant tissues. Its graph features enable precise and timely exploration of large amounts of data and can support predictions and informed decisions. We also introduce an algorithm for constructing the minimized cellular graph that reduces computational complexity. A comparative study is performed with state-of-the-art classifiers: SVM, decision tree, random forest, nearest neighbor, LDA, Naive Bayes, and ANN. The experimental data are obtained from the BreakHis dataset, which contains 2480 benign and 5429 malignant histopathological images. The proposed technique achieves 97.7% classification accuracy, which is 7% higher than that of other graph feature-based classification methods. The comparative study also shows a performance improvement in breast cancer classification over state-of-the-art techniques.

1. Introduction

In 2020, there were almost 19.3 million new cancer cases and nearly 10 million cancer deaths worldwide (excluding nonmelanoma skin cancer). Female breast cancer has surpassed lung cancer as the most commonly diagnosed cancer, with approximately 2.3 million new cases (11.7%), followed by lung (11.4%), colorectal (10.0%), prostate (7.3%), and stomach (5.6%) cancers [1]. According to the American Cancer Society, cancer is a group of diseases characterized by the uncontrolled growth and spread of abnormal cells, which can result in death if the spread is not controlled [2]. Normal cells mature in a well-defined way, whereas cancer cells lack this characteristic [3]. Figure 1 shows microscopic histopathological images of breast tissue (benign and cancerous). The main difference between cancerous and normal cells is that normal cells undergo apoptosis, i.e., they die after some time, whereas cancerous cells do not, so they remain alive until external therapy is applied. Apoptosis is rare in cancer cells, so malignant cells seldom die [4]. Cancer is a state of dynamic tumour growth [5]. Tumour evolution is a process in which modified gene expression causes abnormal cell behaviour. A variety of tumour growth models are important for predicting tumour growth patterns. Estimating tumour growth rates is important because it helps in constructing accurate tumour growth models, which can be used to evaluate screening strategies and treatment protocols.

In [6], the authors modified the Viola–Jones model by adding segmentation capabilities; segmentation is performed by scanning ovarian and breast ultrasound images pixel by pixel using local pixel information. In [7], a computer-aided method was proposed to distinguish different types of tumours in breast and ovarian ultrasound images. In [8], the authors demonstrated a novel transfer learning method called DTL to overcome the scarcity of medical images; the approach works well on skin cancer and breast cancer histopathological images. In [9], the authors proposed a method for classifying breast cancer in mammogram images as benign or malignant. The approach uses the multifractal dimension (M-FD) to extract multiple features and a genetic algorithm to reduce the feature dimensionality; finally, an ANN classifies the features. In [10], the authors proposed a modified Xception model to resolve the overfitting problem and improve classification accuracy; these methods perform well on breast cancer histopathological images. Different microscopy imaging techniques capture the arrangement of cells and the functional characteristics of biological systems, including cultured cells, tissues, and organs. Several approaches form a graph of cells from a tissue image and compute graph-theoretical features to evaluate how the cells are distributed over the tissue [11–20]. The concept of cell-graph mining was introduced by Bilgin et al. in 2007 [14], who presented hierarchical cell graphs and obtained 81.8% accuracy in classifying breast cancer tissue. In another work [21], Bilgin et al. presented a paradigm shift in cell-graph mining by incorporating extracellular matrix (ECM) information and allowing multiple coloured cell graphs, each modelling a different type of cell structure, to coexist on the same tissue. They demonstrated this method on bone tissues representing healthy, fractured, and cancerous tissue. The main finding was that the healthy, fractured, and cancerous tissues are dissimilar overall, but cancerous and fractured tissues were not easily distinguishable. To the best of our knowledge, no previous work has developed and applied cell growth functions to differentiate between benign and malignant cells. In addition, the proposed graph feature-based technique is new for classifying benign and malignant cells in the BreakHis dataset [22]. Many deep learning-based studies have classified benign and malignant cells in the BreakHis dataset [22]. However, the features used in deep learning-based classification are hidden from the user, which is a major drawback. For classification to be useful in medical image analysis, the features should be sensitive and transparent enough for a pathologist to interpret.

Motivated by these observations, the primary contributions of this paper are as follows:
(1) We estimate the growth rates of normal and abnormal cells in breast tissue and find the difference between the two using a growth estimation function. The difference function describes the cell growth behaviour and is plotted graphically.
(2) We propose a minimized cellular graph (MCG) method, which efficiently distinguishes between benign and malignant breast cancer cells and requires little computation time. The experiments are carried out on the BreakHis dataset.
(3) We compare the proposed framework for classifying benign and malignant breast cancer cells in histopathological images across different classifiers on the BreakHis dataset. The proposed approach performs better than existing systems on this dataset.

The remainder of this paper is organized as follows. Section 2 describes the proposed system for generating a minimized cell graph from a tissue image and the graph mining procedure. Section 3 presents the experimental results. Section 4 concludes the paper and outlines future work.

2. Methodology

2.1. Dataset

The BreakHis dataset [22] consists of 7909 histopathological biopsy images collected from 82 patients. The samples were obtained by the P&D Laboratory in Brazil between January and December 2014. The images are divided into two classes: 2480 benign and 5429 malignant tissue images. They were acquired at four magnification levels (40×, 100×, 200×, and 400×) and are stored in PNG format.

2.2. Model

Figure 2 illustrates the framework of the proposed approach for growth function estimation and for classifying normal and abnormal cells in breast histology images. It comprises two major tasks: estimating the growth functions of normal and abnormal cells, and classification using the MCG.

2.2.1. Growth Estimation Function for Normal and Abnormal Cells

Assume that there are P cells in a unit area and that, after time t, these cells divide into Pn cells. In a multicellular organism, however, cells die in two ways:
(i) Cells are killed by things that destroy them, for instance toxic chemicals or physical injury; this process is called necrosis.
(ii) Cells are activated to undergo programmed cell death. The best-understood form of programmed cell death is apoptosis.

These processes keep the number of cells in the body in balance.

As shown in Figure 3, the left side of the figure represents the growth steps of normal cells, which undergo apoptosis, and the right side represents abnormal cell growth. The mathematical formulation of this flow is as follows. Let $P$ be the number of cells initially present in a unit area and $t$ the time interval between stages.

For normal cells, the $P$ cells divide into $Pn$ cells after time $t$, and $P$ cells die, so the total number of cells is $Pn - P$. In the second stage, after another interval $t$, the $Pn$ cells divide into $Pn^2$ cells, and the total becomes $Pn^2 - 2P$. Continuing in this way, after $m$ intervals the cells have divided into $Pn^m$ cells, and the total number of cells is $Pn^m - mP$. Writing $\Phi_m(n)$ for the number of normal cells at stage $t = mt$,

$$\Phi_0(n) = P,\quad \Phi_1(n) = Pn - P,\quad \Phi_2(n) = Pn^2 - 2P,\quad \ldots,\quad \Phi_m(n) = Pn^m - mP.$$

Abnormal cells do not die; let their number grow by a factor of $n \cdot n = n^2$ relative to the previous stage over each interval $t$. Writing $\Lambda_m(n)$ for the number of abnormal cells at stage $t = mt$,

$$\Lambda_0(n) = P,\quad \Lambda_1(n) = Pn^2,\quad \Lambda_2(n) = Pn^4,\quad \ldots,\quad \Lambda_m(n) = Pn^{2m}.$$

The difference between the two populations at each stage is

$$\begin{aligned}
\Lambda_0(n) - \Phi_0(n) &= P - P = 0,\\
\Lambda_1(n) - \Phi_1(n) &= Pn^2 - Pn + P = Pn(n - 1) + P,\\
\Lambda_2(n) - \Phi_2(n) &= Pn^4 - Pn^2 + 2P = Pn^2(n^2 - 1) + 2P,\\
&\;\;\vdots
\end{aligned}$$

Denoting by $D$ the difference between the numbers of cancerous and normal cells at stage $t = mt$, we obtain

$$D = \Lambda_m(n) - \Phi_m(n) = Pn^{2m} - Pn^m + mP = P\left(n^m\left(n^m - 1\right) + m\right).$$

The growth functions of normal and abnormal cells describe cell growth over the entire life cycle. The difference function D gives the difference in cell counts at the mth stage, i.e., at time mt. Because D increases with m (for n ≥ 1), the gap between the abnormal and normal cell populations widens as time passes.
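The stage-wise model can be checked numerically. The following is a minimal Python sketch, not the authors' implementation, that evaluates the normal-cell count (n-fold division with P deaths per interval), the abnormal-cell count (growth by n·n per interval with no deaths), and their difference D. The function names and the values P = 10 and n = 2 are hypothetical, chosen only for illustration.

```python
def normal_cells(P: int, n: int, m: int) -> int:
    """Phi_m(n): cells divide n-fold per interval and P cells die per interval."""
    return P * n**m - m * P


def abnormal_cells(P: int, n: int, m: int) -> int:
    """Lambda_m(n): abnormal cells multiply by n*n per interval and do not die."""
    return P * n**(2 * m)


def growth_difference(P: int, n: int, m: int) -> int:
    """D at stage t = m*t, computed directly as Lambda_m(n) - Phi_m(n)."""
    return abnormal_cells(P, n, m) - normal_cells(P, n, m)


if __name__ == "__main__":
    P, n = 10, 2  # hypothetical illustrative values, not taken from the dataset
    for m in range(5):
        print(m, normal_cells(P, n, m), abnormal_cells(P, n, m), growth_difference(P, n, m))
```

With P = 10 and n = 2, the printed differences for m = 0, ..., 4 are 0, 30, 140, 590, and 2440, matching Pn(n − 1) + P = 30 at the first stage and Pn²(n² − 1) + 2P = 140 at the second.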

Figure 4 plots the difference in the number of cellular graph nodes between normal and cancerous tissues. To obtain it, we automatically extract cellular graphs from the benign and malignant tissue images of the BreakHis dataset.

Figures 4 and 5(b) show that the graph obtained from the difference function and the graph obtained from the automatically extracted cellular graph node counts have very similar characteristics. The growth rate estimation function captures the cell growth characteristics of both benign and malignant tissues, and the behaviour described by the difference function is also reflected in the difference between the cell counts obtained from the dataset images.

2.3. Development of Minimized Cellular Graph (MCG)

Automatic cell extraction is part of the proposed minimized cellular graph generation and is described below. In this section, we propose a minimized cell graph algorithm (Algorithm 1); Figure 6 shows the block diagram of the proposed system. The purpose of the algorithm is to produce a minimized graph that reduces the time complexity while achieving good classification accuracy.

Algorithm 1: Minimized cellular graph (MCG) construction
Input:
A: RGB image of dimension m × n
Output:
G: minimized graph
(1) Start
(2) Convert the input RGB image A into a grayscale image.
(3) Apply k-means segmentation.
(4) Let Seg be the segmented image:
    If r_k == p
        Seg = Seg_k
    End
(5) Binarization, where t1 and t2 are threshold values:
    If t1 ≤ Seg(i, j) ≤ t2
        Seg(i, j) = 1
    Else
        Seg(i, j) = 0
    End
(6) Extract the connected components.
(7) Create the distance matrix from the extracted connected components.
(8) Compute the mean distance d̄ over all node pairs and use it as the threshold.
(9) If d_s > d̄
        Extract the nodes with their coordinates.
    End
(10) G = generate the graph with the extracted nodes.
(11) End
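Steps (5) and (6) of Algorithm 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the threshold values t1 and t2 are not given in the text and must be supplied by the caller, and 8-connectivity is assumed for the connected components (the text does not specify the neighbourhood). The function names are hypothetical.

```python
from collections import deque
from typing import Tuple

import numpy as np


def binarize(seg: np.ndarray, t1: float, t2: float) -> np.ndarray:
    """Step (5): pixels with t1 <= value <= t2 become 1, all others 0."""
    return ((seg >= t1) & (seg <= t2)).astype(np.uint8)


def connected_components(mask: np.ndarray) -> Tuple[np.ndarray, int]:
    """Step (6): label 8-connected foreground regions by breadth-first search.

    Returns a label image (0 = background, 1..k = components) and k.
    """
    labels = np.zeros(mask.shape, dtype=np.int32)
    rows, cols = mask.shape
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r, c] and labels[r, c] == 0:
                count += 1  # start a new component at an unlabelled foreground pixel
                labels[r, c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and mask[ny, nx] and labels[ny, nx] == 0):
                                labels[ny, nx] = count
                                queue.append((ny, nx))
    return labels, count
```

In practice, a library routine such as `scipy.ndimage.label` would replace the hand-written search; the explicit version is shown only to make the step transparent.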

Cells are extracted from the images by segmentation. In statistics and machine learning (ML), k-means clustering is a cluster analysis method that partitions n observations into k clusters, with each observation assigned to the cluster with the nearest mean. In this paper, we use the k-means algorithm to cluster the image pixels according to their L*a*b* values. Unlike the RGB colour space, the L*a*b* colour space is perceptually uniform, and it keeps colour and detail (lightness) information in separate channels [15].
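A minimal NumPy sketch of k-means pixel clustering (Lloyd's algorithm) is shown below. It assumes the image has already been converted to L*a*b* (for example with scikit-image's `skimage.color.rgb2lab`); the random initialization, iteration limit, and function names are assumptions of this sketch, and selecting the cluster that contains the cells (the r_k == p test in Algorithm 1) is left out because the text does not define it.

```python
import numpy as np


def kmeans(features: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Lloyd's k-means: assign each row to its nearest centre, then recompute centres.

    features: (N, d) array, e.g. per-pixel (L*, a*, b*) values.
    Returns (labels of shape (N,), centres of shape (k, d)).
    """
    rng = np.random.default_rng(seed)
    centres = features[rng.choice(len(features), size=k, replace=False)].astype(float)
    labels = np.zeros(len(features), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centre: shape (N, k).
        dists = ((features[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old centre if a cluster becomes empty.
        new_centres = np.array([
            features[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels, centres


def segment_image(lab_image: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Cluster the pixels of an (H, W, 3) L*a*b* image and return an (H, W) label map."""
    h, w, c = lab_image.shape
    labels, _ = kmeans(lab_image.reshape(-1, c).astype(float), k, seed=seed)
    return labels.reshape(h, w)
```

The label map can then be thresholded or reduced to the cluster of interest before binarization.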

In a cell graph, the cells or cell clusters of a tissue sample are the vertices. An edge is defined between a pair of cells or cell clusters based on an assumption with a biological basis (hypothesis) [21]. The cell graph approach captures the information encoded in the tissue through the spatial distribution of the cells and their cluster information. In our work, we present a novel computational model that relies solely on the topological characteristics of cancerous cells in the tissue. The complex dynamics and self-organizing clusters of cancerous cells exhibit graph properties that discriminate cancerous from noncancerous tissue. Graph metrics computed from the cell graphs of benign and malignant images can be discriminated with high accuracy [23].

The steps for graph minimization are as follows.

Step 1: connected component extraction. The number of connected components of a graph is an important topological invariant. Because the topological space X (the image matrix space) is not connected, X has topological subspaces, and it makes sense to investigate the maximal connected subspaces, which contain more structural information. Given a point $x \in X$, we define

$$C(x) = \bigcup \{\, A \subseteq X : x \in A \text{ and } A \text{ is connected} \,\},$$

i.e., C(x) is the union of all connected subspaces of X that contain x. Each connected component is considered a colony of cells, and extracting these colonies is one of the steps of graph minimization. Biologically, the count of normal cells per unit area is lower in cancerous tissue. Hence, we concentrate on connected components containing more cells, in other words, denser areas, for the minimization procedure.

Step 2: centroid calculation. A set of objects with the same properties is normally represented by the centre of their cluster, so it is desirable to find a centre that is intrinsically representative of each cell colony. Because a colony encloses similar types of cells, we take a single representative node for it: the centroid of the colony (connected component), which shares the characteristics of the other cells in the colony. The centroid also allows us to use the features surrounding it and the interactions between the centroid and the cells of the colony.

Step 3: removal of close centroids. We hypothesize an association between two nodes, i.e., centroids of connected components, when the distance between them is less than a threshold. The Euclidean distance between two points $p = (p_1, p_2)$ and $q = (q_1, q_2)$ in $\mathbb{R}^2$ is given by

$$d(p, q) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2}.$$
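Steps 2 and 3 can be sketched in NumPy as below. This is an illustrative sketch, not the authors' implementation: it assumes a label image in which 0 is background and 1..count are connected components (as produced by any connected-component labelling), and the function names are hypothetical.

```python
import numpy as np


def component_centroids(labels: np.ndarray, count: int) -> np.ndarray:
    """Step 2: (row, col) centroid of each labelled component 1..count.

    Assumes every label 1..count occurs at least once in `labels`.
    """
    centroids = []
    for k in range(1, count + 1):
        rows, cols = np.nonzero(labels == k)
        centroids.append((rows.mean(), cols.mean()))
    return np.array(centroids, dtype=float)


def euclidean(p, q) -> float:
    """Step 3: d(p, q) = sqrt(sum_i (p_i - q_i)^2)."""
    diff = np.asarray(p, dtype=float) - np.asarray(q, dtype=float)
    return float(np.sqrt(np.sum(diff ** 2)))
```

For example, a component covering pixels (0, 0) and (0, 1) has centroid (0, 0.5).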

We compute the distances for all possible node pairs, counting each pair only once, and take the mean of these Euclidean distances as the threshold. Nodes from pairs whose distance is less than this threshold are removed. We then generate the cluster centroids and construct the graph with the help of a Voronoi diagram. The main aim of the proposed method is to reduce processing time by considering fewer nodes without losing important characteristics.
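A minimal sketch of the mean-distance pruning follows. The text does not say which node of a close pair is removed, so this sketch makes a hypothetical choice: it visits nodes in order and keeps a node only if its distance to every node already kept exceeds the threshold (the strict comparison of step (9) in Algorithm 1). The Voronoi-based graph construction is not included; a routine such as `scipy.spatial.Voronoi` could be applied to the kept nodes.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def minimize_nodes(centroids: Sequence[Point]) -> List[Point]:
    """Prune centroids that lie within the mean pairwise distance of a kept centroid.

    The threshold is the mean Euclidean distance over all unordered pairs
    (each pair counted once). Keeping the earlier node of a close pair is an
    assumption of this sketch, not something specified in the text.
    """
    if len(centroids) < 2:
        return list(centroids)
    pair_dists = [math.dist(p, q) for p, q in combinations(centroids, 2)]
    threshold = sum(pair_dists) / len(pair_dists)
    kept: List[Point] = []
    for p in centroids:
        if all(math.dist(p, q) > threshold for q in kept):
            kept.append(p)
    return kept
```

For two tight pairs of centroids far apart, such as (0, 0), (0, 1), (10, 0), and (10, 1), the mean distance is about 7.0, and one node from each pair survives.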

Figure 7 presents the flow of the graph minimization method with all the detailed steps.

2.3.1. Graphical Feature Extraction and Classification

Features play a major role in classification, and a small set of highly discriminative features is desirable. We therefore use a reduced set of graph features, such as triangulation, the number of polygons, the convex hull area, and the number of edges, which still yields improved results.
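Two of the listed features, the convex hull area and the number of edges, can be computed from the node coordinates and an edge list as sketched below. This is an illustration under assumptions, not the authors' feature extractor: the convex hull uses Andrew's monotone chain and the area uses the shoelace formula. Triangulation and polygon counts would additionally require a Delaunay triangulation or Voronoi diagram (for example `scipy.spatial.Delaunay`) and are omitted.

```python
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, float]


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # drop duplicated end points


def polygon_area(vertices: Sequence[Point]) -> float:
    """Shoelace formula for a simple polygon whose vertices are given in order."""
    n = len(vertices)
    s = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def graph_features(nodes: Sequence[Point], edges: Iterable[Tuple[int, int]]) -> dict:
    """Node count, undirected edge count, and convex hull area of a graph."""
    edge_set = {tuple(sorted(e)) for e in edges}  # (0, 1) and (1, 0) count once
    return {
        "nodes": len(nodes),
        "edges": len(edge_set),
        "convex_hull_area": polygon_area(convex_hull(nodes)),
    }
```

For the four corners of a 4 × 3 rectangle plus one interior point, the hull area is 12.0, and the interior point does not change it.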

The upper plots in Figure 8 show the standard cell graph (CG) features for differentiating between benign and malignant tissues. The lower plots in Figure 8 show the MCG features, which distinguish between benign and malignant cells more efficiently and prominently.

To validate the discriminative power of these feature sets, we conduct experiments with different classifiers: SVM (including polynomial, Gaussian, and RBF variants), decision tree, random forest, nearest neighbor, LDA, Naive Bayes fitrensemble, and artificial neural network (ANN).
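As an illustration of how one of the listed classifiers could be evaluated on a graph feature matrix, the following NumPy sketch computes the leave-one-out accuracy of a 1-nearest-neighbour classifier. The evaluation protocol, the Euclidean metric, and the function name are assumptions of this sketch; the text does not state how the classifiers were trained or validated, and in practice the features would usually be standardized first.

```python
import numpy as np


def loo_nearest_neighbor_accuracy(X, y) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier (Euclidean distance).

    X: (N, d) feature matrix, e.g. one row of graph features per image.
    y: (N,) class labels, e.g. 0 = benign, 1 = malignant.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared distances; a sample must not be its own neighbour.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)
    predictions = y[d2.argmin(axis=1)]  # ties go to the lowest index
    return float((predictions == y).mean())
```

Other classifiers from the list would be evaluated the same way by swapping the prediction rule, for example via a machine learning library.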

3. Results

To evaluate the effect of the proposed MCG features, we conduct experiments on the BreakHis histopathological breast tissue images at 200× magnification. The performance of the different classifiers is given in Table 1 and plotted in Figure 9.

The MCG-based classification accuracies of the SVM, SVM-polynomial, SVM-Gaussian, SVM-RBF, decision tree, random forest, nearest neighbor, LDA, Naive Bayes fitrensemble, and artificial neural network (ANN) classifiers were 88.29%, 96.85%, 96.40%, 95.50%, 96.40%, 95.95%, 94.59%, 80.63%, 95.05%, and 97.7%, respectively. The cell graph (CG) feature-based classification accuracies were 47.75%, 74.77%, 74.32%, 71.17%, 72.52%, 68.47%, 73.87%, and 75.20%, all lower than every MCG-based result.

From Tables 1 and 2, it is observed that the ANN performs better than the other classifiers based on the MCG features. The experimental results also reveal that the MCG features give higher classification accuracy than the cell graph (CG) features for all the classifiers.

3.1. Computation Complexity Analysis

Our analysis indicates that the computational complexity of the proposed system is O(∛n) for an input image with n pixels. For CG, the complexity is proportional to the total number of pixels in the input image, i.e., O(n).

Table 3 shows that the running time of the classification system is 710.67 s with CG features and 238.49 s with MCG features. Hence, the proposed method improves classification accuracy while also reducing computation time.
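Expressed as a speedup ratio and a relative reduction (simple arithmetic on the two running times reported in Table 3):

$$\frac{710.67\ \text{s}}{238.49\ \text{s}} \approx 2.98, \qquad \frac{710.67 - 238.49}{710.67} = \frac{472.18}{710.67} \approx 66.4\%.$$

That is, the MCG-based system runs roughly three times faster on this experiment.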

3.2. Comparative Study and Discussion

There are very few works on CG-based classification systems. Using graph features, Bilgin et al. (2007) [14] reported 81% accuracy and Bilgin et al. (2010) [19] reported 90% accuracy in classifying cancerous and noncancerous tissues, whereas the MCG features achieve 97.7%. Table 4 presents a comparison with state-of-the-art techniques.

Several recent works classify benign and malignant cells in the BreakHis dataset with deep learning models (DCNN, CNN, RNN, Inception v3, and ResHist). Table 5 shows the performance of these methods on the BreakHis dataset; the proposed method still achieves higher classification accuracy.

During the experiments, we observed that the proposed approach is very effective for benign and malignant classification, as shown in the results section. In addition, because it reduces the complexity of the problem, the approach can help process large quantities of malignant histopathological tissue images for diagnostic purposes in less time. Benign and malignant classification was performed with the ten classifiers described in Section 2.3.1. Compared with existing classification techniques and graph feature-based classification techniques, our approach improves classification accuracy and reduces processing time. A limitation concerns generalization: more breast cancer histopathology datasets with multiclass tissues are needed to evaluate the proposed technique.

4. Conclusion

Estimating tumour growth rates is important because it can aid in building accurate tumour growth models, which can be used to evaluate screening strategies and treatment protocols. In this paper, we derive a growth difference estimation function that shows the typical characteristics of cell growth; this function may be useful in estimating tumour growth for patient survival purposes. Mathematical modelling of cell proliferation helps to predict tumour size and to optimize treatment procedures. The contributions of this research are as follows. We estimate the growth rates of normal and abnormal cells in breast tissue and find the difference between them using the growth estimation function; the difference function describes the behaviour of cell growth and is plotted graphically. We propose the minimized cellular graph (MCG) method, which efficiently distinguishes between benign and malignant breast cancer cells and requires less computation time. We compare the proposed framework for classifying benign and malignant breast cancer cells in histopathological images across different classifiers on the BreakHis dataset, where it performs better than existing systems. An advantage of our approach is that it is computationally less complex than existing approaches. In addition, our graph-based approach precisely extracts local and global structural features and relationship features, which are significant for classification tasks. The limitations of this research are as follows. Our experiments used only one publicly available dataset; owing to the limited availability of histopathological breast image datasets, we did not conduct experiments on other datasets. Moreover, we applied the proposed method only to breast cancer histopathological images.
Other types of cancer images were not considered in our work. Only binary classification was conducted, as the image dataset includes only benign and malignant images, so the performance of multiclass classification could not be assessed. The growth rate was estimated from mathematical formulations and has not been validated clinically. In future work, we plan to apply the system to other kinds of cancer histopathological images and to develop multiclass classification based on cancer severity grades. Further improvements are possible if image datasets are collected in collaboration with hospitals and clinicians [29, 30].

Data Availability

The data used to support the findings of this study are available from the corresponding author on request.

Disclosure

The experiments were performed on a publicly available dataset. The work presented here was conducted in the Bio-Medical Infrared Image Processing Laboratory, Computer Science and Engineering Department, Tripura University (a central university), Suryamaninagar-799022, Tripura West.

Conflicts of Interest

The authors declare that there are no conflicts of interest with respect to the authorship and/or publication of this article.

Acknowledgments

This work was supported by the short-term ICMR-DHR (Indian Council of Medical Research–Department of Health Research) International Fellowship Programme for Indian biomedical scientists, awarded to Mrinal Kanti Bhowmik for 2019-2020 under Grant no. INDO/FRC/452/S-69/2019-20-IHD, dated 9 December 2019. The second author is also grateful to the Department of Science and Technology (DST), Government of India, for the DST INSPIRE fellowship (Grant no. IF200476).