Abstract

The recommended tool for assessing knee injuries is magnetic resonance imaging (MRI). However, knee MRI interpretation is time-consuming and prone to clinical errors and inconsistency. An automated deep learning technique for reading knee MRIs could help physicians identify high-risk patients and make diagnosis easier. In this study, we propose a deep learning-based model, built on the ResNet50 transfer learning technique, to detect ACL tears, meniscus tears, and other knee abnormalities from MRIs. The dataset used is the MRNet dataset from Stanford University. We trained the model in three different settings, using 18, 3, and 1 slice per MRI exam. The best models in each setting achieved comparable performance; notably, the AUC results obtained with 1 slice per exam were close to those obtained with 18 and 3 slices and, in some cases, even better. Overall, the best results were achieved when training with 3 slices per MRI sample. The area under the receiver operating characteristic curve (AUC) values that our best models achieved for detecting ACL tears, meniscus tears, and other knee abnormalities are 0.87, 0.82, and 0.90, respectively. These results are comparable to some state-of-the-art models. Our models are fast and efficient to train and hence can help doctors make effective and rapid diagnoses based on knee MRIs.

1. Introduction

Magnetic resonance imaging (MRI) exams are among the most prevalent diagnostic methods, with approximately 40 million MRIs performed in the United States each year [1]. Knee MRIs can help identify abnormalities such as anterior cruciate ligament (ACL) tears, meniscus tears, and other similar conditions. Deep learning methods are increasingly being used in the evaluation of medical images, and an automated image analysis technology could help healthcare professionals detect high-risk patients and simplify diagnosis. MRI scans take significantly more time to conduct than CT scans, and patient satisfaction is a concern that may be aggravated by the technology. Even when conducted by experienced radiologists, proper interpretation of knee MRI is time-consuming and prone to errors because of the number and complexity of images in each exam [2]. MRIs aid in the identification of numerous knee ailments and in differentiating between different kinds of knee tears. An automated system to detect knee abnormalities can therefore prove very useful, and models of the kind proposed here are quick and efficient to train, which can enable clinicians to make effective and timely diagnoses based on knee MRIs. In recent years, deep learning models applied to medical image assessment have been found to perform better than, or on par with, human experts at diagnosing diseases in many cases.

In this study, we propose a ResNet50-based model to detect various abnormalities in knees using MRI. MRIs are helpful in the diagnosis of various knee injuries and help in distinguishing different types of knee tears from each other [3]. The dataset used in this study is the MRNet dataset, provided by Stanford University, which contains 1370 MRI exams of the knees. The dataset contains three types of MRIs taken from three planes, namely, sagittal, coronal, and axial planes. This dataset contains MRIs for ACL tears, meniscus tears, and other knee abnormalities. To detect these tears, MRIs from all three planes are required [4]. This dataset can be approached in multiple ways. In this study, we created a base model using ResNet50 and then we trained it 27 times using different combinations of input data.

The rest of this paper is laid out as follows: The motivation for this work is presented in Section 1.1. Work related to this paper is described in Section 2. The methodology and materials used in this study are described in Section 3. Section 4 presents the results and the discussion based on them. Section 5 compares a few state-of-the-art models with our proposed model, along with a comparison of all models employed in this study. Section 6 presents the conclusions and the future scope of this study.

1.1. Motivation

The usage of deep learning in medical image assessment is growing day by day, but most existing models are used for the analysis of brain, chest, and breast images [5]. Other parts of the human body have received far less attention in the research literature, one reason being the nonavailability of data. In this study, we decided to work with knee MRIs because very few research papers have been published on knee MRI analysis using deep learning. Knees are essential in the human body as they allow us to walk. If someone's knee hurts, doctors usually prescribe an MRI; however, a nonexpert cannot readily interpret one. An automated approach for detecting knee problems could therefore be highly valuable. Deep learning methods have been widely used in medical image analysis in recent years, and in many situations computational models have proven to be more effective than, or on par with, human specialists at identifying illnesses.

Even experts sometimes face difficulties in interpreting these MRIs. Therefore, we decided to create a deep learning model that, given only an MRI as input, can tell whether the exam is normal or not. This also gives an idea of how well a deep learning-based model performs compared to human experts.

Because the task is difficult for a person to perform, data are abundant, and deep learning has been successful elsewhere, a large amount of research has already been done on employing deep learning for MRI interpretation [68]. An accurate prognosis of cardiac illness might save a person's life, whereas wrong predictions can lead to death. Deep learning techniques can, for example, be used to locate and categorize brain tumors. Breast cancer is one of the most common tumors in women, and the use of MRI for early identification of breast cancer helps patients recover more quickly. Although a large number of research studies have been published on medical image analysis using deep learning, most of them focus on brain tumors, chest diseases, and similar areas [9]. Knees, and bones in general, have far fewer studies associated with them, which is one of the reasons we decided to work on knee MRI analysis in this study.

2. Related Work

Researchers in one study presented a thorough review of the use of deep learning for MRI image processing, covering a wide range of MRI applications. The study also describes the numerous issues that individuals and healthcare providers are encountering as a result of COVID-19 and explores several methods for controlling the influence of COVID-19 using the Internet of Things [10]. One conclusion to emerge from this work is that there are many different deep learning architectures from which researchers can choose when using deep learning to interpret MRI data.

In another study on knee MRIs, researchers developed a CNN that predicted the likelihood of having an abnormal exam when provided with a knee MRI sample [11]. A CNN model called MRNet was trained for each plane: axial, coronal, and sagittal, and for each form of knee injury, the probabilities from the model were blended to get a single probability.

ELNet, a CNN-based model designed for early knee MRI diagnosis, was presented in another study [12]. Unlike most techniques for knee MRI analysis, researchers trained ELNet from the ground up rather than utilizing a transfer learning method. The ELNet model is lightweight and performs as well as other models due to the merging of multislice normalization and Blur Pool operations.

In another study, researchers estimated the likelihood of ACL injury, meniscus tear, and other knee anomalies using ResNet18, ResNet50, and ResNet152 models. Two approaches were discussed. The first considered only the center slice of each MRI sample, on the assumption that the slices in the middle of the series contain more significant information than those at the beginning and end. The second used basic mathematical operations on a certain number of images to interpolate the number of slices to a fixed value, and this approach seemed to produce better results than using center slices alone. Unpermitted consumers misappropriate multimedia material by distributing it on multiple web domains to earn money fraudulently without the consent of the original copyright owners [13].

Because of its local connections and shared weights, a deep learning model such as a CNN can learn effective representations of images [14]. Another study based on the MRNet dataset investigated two deep CNN models (VGG16 and VGG19) with a transfer learning strategy to predict knee injury [15, 16]. Breast cancer is the most common invasive cancer in women and the second leading cause of cancer mortality in women; an exam may be classified as normal or abnormal [17].

3. Materials and Methods

3.1. Dataset

The dataset used in this study, MRNet, was provided by researchers from Stanford University and has been publicly available since 2018 [18]. It contains 1370 knee MRI exams from 1088 patients, including 1104 abnormal exams, 319 ACL tears, and 508 meniscal tears, all examined and labeled by human experts; the labels for each MRI were manually extracted from the patients' clinical records. The dataset was split into a training set (1130 exams), a validation set (120 exams), and a test set (120 exams); the test set is not publicly available. Overall, 80.6% of exams are abnormal, 23.3% contain ACL (anterior cruciate ligament) tears, and 37.1% contain meniscus tears. The collection comprises MRIs acquired in three different planes, namely, sagittal, coronal, and axial, and includes exams for ACL tears, meniscus tears, and other knee anomalies. Because patients who received an MRI were more likely to have a knee injury, the dataset is heavily skewed toward abnormal exams. The organization of the dataset is shown in Figure 1.

Each exam in the dataset consists of multiple axial, sagittal, and coronal slices. The sagittal plane, which is on the xz-plane, splits the knee into the left and right halves. The coronal plane, which is on the xy-plane, splits the knee into front and rear halves, and the axial plane is parallel to the ground and splits the knee into the top and bottom sections. Figure 2 depicts these planes.

All exams in the dataset are stored in .npy format. Labels for each exam are provided in three separate .csv files, namely, abnormal.csv, meniscus.csv, and acl.csv, with separate label files for the validation set. Each exam file has shape (x, 256, 256), where x represents the number of slices in that exam. The number of slices varies greatly across the dataset; for example, for the same patient, an exam had 44 slices in the axial plane but only 36 slices in the coronal and sagittal planes.
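As an illustration of how such a dataset can be loaded, the following sketch reads one exam and one label file. The directory layout, CSV column layout, and the helper names load_exam and load_labels are assumptions made for illustration; only the .npy exam format and the per-plane, per-label organization are taken from the description above.

import os
import numpy as np
import pandas as pd

DATA_DIR = "MRNet-v1.0"  # hypothetical path to the extracted dataset

def load_exam(split, plane, case_id):
    """Load one exam as an array of shape (x, 256, 256), where x is the slice count."""
    path = os.path.join(DATA_DIR, split, plane, f"{case_id}.npy")
    return np.load(path)

def load_labels(csv_path):
    """Assumed CSV layout: each row pairs a case id with a binary label (1 = tear/abnormal)."""
    df = pd.read_csv(csv_path, header=None, names=["case", "label"])
    return dict(zip(df["case"], df["label"]))

# Example usage (assumed file names):
labels = load_labels(os.path.join(DATA_DIR, "abnormal.csv"))
exam = load_exam("train", "axial", "0001")
print(exam.shape)  # e.g. (44, 256, 256); the slice count x varies per exam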

3.2. Data Preprocessing

Preprocessing refers to operations performed on images at the most fundamental level, where the intensity of an image is typically represented as a matrix of image function values. Geometric transformations of images (e.g., rotation and scaling) are examples of preprocessing techniques. The purpose of preprocessing is to enhance the image information by removing undesired noise or emphasizing the image features that matter for subsequent processing [19].

In this study, we initially applied some traditional preprocessing techniques: a Gaussian blur to remove noise from the MRI samples, a Laplacian filter to detect edges, and a sharpening filter. When the resulting samples were passed through our model, they did not yield satisfactory performance. We tried different combinations of these techniques and different kernels, but the results were close to those obtained without any preprocessing, or in some rare cases even worse. We therefore decided not to pursue this route [20].
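For reference, the following is a minimal sketch of the kind of classical filtering described above, implemented with SciPy; the specific parameter values and the unsharp-masking form of the sharpening filter are assumptions, since the exact kernels used are not specified here.

import numpy as np
from scipy import ndimage

def classical_preprocess(slice_2d, sigma=1.0, alpha=1.5):
    """Gaussian blur (denoising), Laplacian (edge detection), and unsharp-mask sharpening."""
    blurred = ndimage.gaussian_filter(slice_2d, sigma=sigma)   # noise removal
    edges = ndimage.laplace(blurred)                           # edge map
    sharpened = slice_2d + alpha * (slice_2d - blurred)        # simple sharpening
    return blurred, edges, sharpened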

As discussed above, the MRI samples in our dataset had a different number of slices for each sample. To fix this, we used interpolation of the MRI samples. In simple terms, interpolation is the use of known points to estimate unknown points; it is frequently used in image processing to change the size of an image, where the points of a new image matrix are calculated from the points of the old one [21]. Various interpolation algorithms exist; here, we used the simple "zoom" interpolation function included in the SciPy library and modified each MRI sample to contain 18 slices. Due to the limits of our computing hardware, the maximum number of slices we included in this study was 18 per MRI exam, but this does not detract from the results because, in our observation, using more than 3 slices did not produce any major differences. Based on this observation, we also trained our models using 3 slices and only 1 slice of each MRI sample, and the results produced using only 1 slice were comparable to those achieved using 3 or 18 slices. We obtained the zoom factors by dividing the required shape, i.e., (18, 256, 256) in the case of 18 slices, by the original shape (x, 256, 256), where x represents the number of slices in the original MRI sample. We then created one NumPy file per plane: one large file for axial samples, one for sagittal samples, and one for coronal samples, which made it easier to pass all the data to our model and to modify it.
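A minimal sketch of this interpolation step using scipy.ndimage.zoom is shown below; the helper name interpolate_exam and the final stacking/saving step are illustrative, following the description above (zoom factor = required shape divided by original shape).

import numpy as np
from scipy.ndimage import zoom

def interpolate_exam(exam, target_slices=18):
    """Resample an (x, 256, 256) exam to (target_slices, 256, 256) along the slice axis."""
    x = exam.shape[0]
    factors = (target_slices / x, 1.0, 1.0)  # zoom factors: required shape / original shape
    return zoom(exam, factors)

# One large NumPy file per plane, as described above (load_exam is the hypothetical
# loader sketched in Section 3.1):
# axial_stack = np.stack([interpolate_exam(load_exam("train", "axial", cid)) for cid in case_ids])
# np.save("train_axial_18.npy", axial_stack)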

3.3. Training Model

We trained our models in three different settings:
(i) using 18 slices of each MRI sample,
(ii) using 3 slices of each MRI sample, and
(iii) using only 1 slice of each MRI sample.

The rest of the setup was the same for every model. We built our model on top of the ResNet50 model, used as a feature extractor with pretrained ImageNet weights. The input to this base model has shape (x, 3, 256, 256), where x denotes the number of slices, 3 denotes the number of channels, and (256, 256) is the shape of each 2D MRI slice. Passing this through the feature extractor yields a tensor of shape (x, 2048, 8, 8). A global average pooling layer then transforms the data into shape (x, 2048), which we finally reshaped into (2048, x, 1). We did this for both the training and validation sets, so our final training data had shape (1130, 2048, x, 1) and our final validation data had shape (120, 2048, x, 1), where 1130 and 120 are the number of samples in each set, respectively.
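This feature extraction step can be sketched as follows, assuming a Keras/TensorFlow implementation with channels-last tensors and grayscale slices tiled to three channels to match the ImageNet-pretrained ResNet50 input; these implementation details are assumptions and are not prescribed above.

import numpy as np
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input

# ResNet50 backbone with ImageNet weights and no classification head: a pure feature extractor.
backbone = ResNet50(weights="imagenet", include_top=False, input_shape=(256, 256, 3))

def extract_features(exam):
    """exam: (x, 256, 256) grayscale slices -> (2048, x, 1) pooled feature tensor."""
    slices = np.repeat(exam[..., np.newaxis], 3, axis=-1)        # grayscale -> 3 channels
    feats = backbone.predict(preprocess_input(slices.astype("float32")), verbose=0)  # (x, 8, 8, 2048)
    pooled = feats.mean(axis=(1, 2))                             # global average pooling -> (x, 2048)
    return pooled.T[..., np.newaxis]                             # reshape -> (2048, x, 1)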

We then passed these data into a batch normalization layer followed by a max pooling layer [22]. The output of the max pooling layer was passed through a flatten layer and a dense layer with 128 units and ReLU activation. The output of this dense layer was passed into another dense layer with 64 units, followed by a dropout layer with a dropout rate of 0.15. Finally, the output was passed to a dense layer with a single unit and sigmoid activation, whose output (thresholded to 0 or 1) indicates the prediction of our model [23]. We compiled the model using Adam as the optimizer and binary cross-entropy as the loss function. The architecture of our ResNet50-based CNN model is shown in Figure 3.
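A minimal Keras sketch of this classification head is given below; the max pooling window, the activation of the 64-unit layer, and the reported metrics are not specified above and are assumptions.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_head(num_slices):
    """Classification head applied to the (2048, num_slices, 1) feature tensor."""
    model = models.Sequential([
        layers.Input(shape=(2048, num_slices, 1)),
        layers.BatchNormalization(),
        layers.MaxPooling2D(pool_size=(2, 1)),   # pooling window assumed; not stated above
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(64, activation="relu"),     # activation assumed for the 64-unit layer
        layers.Dropout(0.15),
        layers.Dense(1, activation="sigmoid"),   # single-unit sigmoid output
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy", tf.keras.metrics.AUC(name="auc")])
    return model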

We also employed callbacks, using validation accuracy as the monitored quantity for model checkpointing and validation loss for early stopping. We trained each model with a batch size of 20 for 50 epochs. In total, we trained 27 models: 9 using 18 slices per MRI sample, 9 using 3 slices, and 9 using only 1 slice. To obtain 18 slices for each MRI sample, we used interpolation; for 3 slices, we took the middle slice together with the slices two positions before and after it; and for 1 slice, we selected the middle slice of each MRI sample. The base models were exactly the same for all these settings [24]. Since the dataset was largely imbalanced, there was some difference between the training and validation accuracies and losses across epochs [25].
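The slice selection and callback setup can be sketched as follows; the checkpoint filename, the early-stopping patience, and the helper names are illustrative assumptions, while the monitored quantities, the batch size of 20, and the 50 epochs follow the description above.

from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping

def select_slices(exam, setting):
    """Pick slices per setting: 18 (interpolated), 3 (middle and middle +/- 2), or 1 (middle)."""
    mid = exam.shape[0] // 2
    if setting == 18:
        return interpolate_exam(exam, 18)     # interpolation helper sketched in Section 3.2
    if setting == 3:
        return exam[[mid - 2, mid, mid + 2]]
    return exam[[mid]]

callbacks = [
    ModelCheckpoint("best_model.h5", monitor="val_accuracy", save_best_only=True),
    EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True),  # patience assumed
]

# model = build_head(num_slices)
# model.fit(train_x, train_y, validation_data=(val_x, val_y),
#           batch_size=20, epochs=50, callbacks=callbacks)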

4. Results and Discussion

The results of the best models for each setting were quite comparable to each other. We trained the models using 18, 3, and 1 slice, and the results were quite surprising: the AUC results using 1 slice per MRI exam were comparable to those using 18 and 3 slices and, in some cases, even better. We also observed that increasing the number of slices beyond 3 did not improve the results much and, in some cases, made them worse.

After obtaining the AUC scores, we also computed the accuracy scores of the models, and a similar trend was present there as well. The best validation accuracy scores for abnormal exams and meniscus tears were obtained using 3 slices per exam, while for ACL tears, 18 slices per exam gave the best results. The accuracy scores of these models are given in Table 1. The ROC-AUC graphs of the best models for each type of knee tear in the different settings are shown in the figures that follow.

The best MRI plane in terms of model performance turned out to be the axial plane, although the results of the models trained using sagittal plane MRIs were close to those of the models trained using axial plane MRIs. For meniscus and abnormal MRIs, there was not a large difference between models trained using 1 slice and models trained using 18 slices, as can be seen in Table 1. The accuracy scores in the table follow the same pattern as the AUC scores: for meniscus and abnormal samples, models trained using 3 slices perform best, while for ACL samples, models trained with 18 slices perform better. The table and the AUC graphs also show that meniscus tears are best detected using sagittal plane MRIs, while ACL tears and abnormalities are best detected using axial plane MRIs.

4.1. For Abnormal MRIs

For abnormal MRIs, the best results were achieved when the models were trained using axial plane MRIs. The AUC scores in all three settings were comparable, but the model trained with 3 slices, with an AUC score of 0.90 and an accuracy score of 91.66%, performed marginally better than the models trained with 18 slices and 1 slice, as shown in Figures 4–6.

4.2. For ACL Tears

In the case of ACL tears, the best results were also achieved when the models were trained using axial plane MRIs. Here, the results of all three settings were quite close, with the models trained using 18 slices performing marginally better than those trained using 3 slices and 1 slice. The model trained using 18 slices achieved an AUC score of 0.92, compared to 0.87 and 0.89 for the models trained using 3 slices and 1 slice, respectively, as shown in Figures 7–9.

4.3. For Meniscus Tears

For meniscus tears, the best results were achieved when the models were trained using sagittal plane MRIs; meniscus tears were the only type of knee tear in our study that was best detected using the sagittal plane. Although the models trained using sagittal plane MRIs performed best, the models trained using axial slices were very close or, in some cases, just as good. Here, the models trained using 3 slices outperformed those trained using 18 slices and 1 slice, achieving an AUC score of 0.82 compared to 0.74 and 0.76 for the 18- and 1-slice models, respectively, as shown in Figures 10–12.

5. Comparison

A considerable amount of work has already been done in the field of MRI analysis using deep learning, and good work has been published even for the dataset considered in our study. However, most of these works focus on improving results by employing techniques such as preprocessing and hyperparameter tuning. Even where certain cutting-edge models outperform ours, our model is far faster and more efficient to train, and in industry, the ability and flexibility to train models efficiently are sometimes more important than small gains in performance metrics. Although ResNet has been shown to be effective in a wide range of areas, one major downside is that larger networks can require very long training times (even months), making them nearly impractical to use in many real-world settings. We employed a rather simple model with ResNet50 as a base and focused on obtaining results similar to state-of-the-art models while using as few slices of each MRI as possible. For 2 out of 3 types of tears (meniscus and abnormal samples), using only three slices for model training proved better than using 18 slices, and although the 18-slice models performed better for ACL tears, the difference in AUC and accuracy scores was not large.

One study based on the MRNet dataset reported a ROC-AUC value of 0.826 for detecting a meniscus tear, 0.956 for detecting an ACL tear, and 0.936 for detecting abnormalities [11]. To create a combined injury detection model across the coronal, sagittal, and axial planes, the researchers used a logistic regression-based ensemble learning strategy. Even though their model performed slightly better for abnormal exams and ACL tears, our model is much simpler and uses only 3 slices per MRI.

In another study, images were first selected based on noise and other parameters that might aid in the detection of disease. A region of interest was then located using a CNN and a denoising autoencoder, and finally a ResNet50 model was used for the diagnosis [20]. The best accuracy scores this model achieved for ACL, abnormal, and meniscus tears were 83.19%, 89.92%, and 77.12%, respectively; our model performed better in all three cases. Table 2 shows the comparison of all our models trained using the different settings.

6. Conclusions and Future Work

In this study, we explored how to set up a deep learning model to improve detection performance on a dataset of knee MRI images. We applied deep learning to the MRNet dataset and trained a range of ResNet50-based models to estimate the likelihood of a knee injury in a given knee MRI. We used scans from three planes, axial, coronal, and sagittal, since doctors routinely view MRIs from several perspectives.

Our best performing models for meniscus tears and abnormal MRIs were trained using 3 slices per MRI sample, while for ACL tears, the model trained using 18 slices provided the best results; the difference, however, was not large. To obtain 18 slices for each MRI sample, we used interpolation; for 3 slices, we took the middle slice together with the slices two positions before and after it; and for 1 slice, we selected the middle slice of each MRI sample. The three-slice model performed well even when compared to some state-of-the-art models, and although some state-of-the-art models performed marginally better in some cases, our model is much faster and more efficient to train. In industry, the capacity and flexibility to train models quickly are sometimes more essential than minor differences in performance. It is also worth noting that the models trained using axial plane MRIs consistently performed better than those trained using the other two planes, especially for detecting ACL tears and general abnormalities; for meniscus tears, models trained using sagittal plane MRIs were slightly better than those trained using axial plane MRIs.

Finally, regarding future work, ensemble approaches would probably improve overall performance. A multiclass model could also be developed to classify an MRI sample into ACL tear, meniscus tear, or other abnormality categories. When using MRI images to diagnose cancer, the location of the tumor is crucial, and studies have shown that focusing on a specific part of the MRI can yield better results than considering the whole image; the same concept could be employed here as well.

Data Availability

The data will be made available on request to the corresponding author.

Conflicts of Interest

The authors declare that they have no conflicts of interest.