Abstract

Accurate selection of embryos with the highest implantation potential is a necessary step to increase the effectiveness of in vitro fertilization (IVF) treatment. Deep learning algorithms have shown high potential for monitoring and visualizing embryo features, such as cell numbers and morphological development, as time series. Building on the capabilities of computer vision and deep learning, this paper presents a novel deep learning approach to detect embryo abnormalities in time-lapse systems and thereby distinguish live from non-live births in IVF. The approach combines a local binary convolutional neural network (LBCNN) with a long short-term memory (LSTM) network. The LBCNN improves classification accuracy by employing deep and local feature sets with fewer learnable parameters than a standard convolutional layer, and the LSTM network analyzes the temporal information of time-lapse embryo sequences. The results indicate that the proposed approach achieves an area under the ROC curve of 0.98 over 5 days of monitoring, compared with state-of-the-art models. In addition, the approach showed competitive results in early abnormality detection (72 hours), with 82.8% classification accuracy, compared to well-known pretrained convolutional neural network (CNN) models and a baseline CNN.

1. Introduction

Computer-aided detection (CAD) is designed to reduce the mistakes physicians make when interpreting healthcare data. Recently, CAD systems based on machine learning have been used for the treatment and detection of different types of disease [1]. These approaches can be categorized into deep [2] and shallow [3] models. The shallow models consist of optimization algorithms (genetic algorithm, particle swarm optimization, etc.) [4] and shallow machine learning algorithms (support vector machine, linear regression, etc.) [5]. For instance, a machine learning system based on an improved adaptive particle swarm optimization algorithm and an artificial immune recognition system was designed for wart disease treatment [6]. The main advantage of these approaches is that they achieve remarkable results with less training data and lower processing cost. Deep learning frameworks for CAD systems commonly use well-known pretrained models for deep feature learning and extraction [7]. The main reason for using transfer learning is to reduce processing cost and improve efficiency when training new models. For example, the authors in [8] proposed a deep and local convolutional neural network for brain anomaly detection. Furthermore, the use of CAD systems based on deep learning algorithms has grown significantly because of their high accuracy [9]. In this study, anomaly detection based on deep learning algorithms is used for CAD systems. Anomaly detection (also referred to as outlier detection) commonly implies the detection of rare items, events, or observations that deviate considerably from most of the data and do not conform to a well-defined notion of normal behavior. In the context of CAD systems, anomaly detection can be used to alert physicians to abnormal physiological data that could indicate health complications [10, 11].

Given the effects of CAD systems on healthcare, we present anomaly detection based on a deep learning approach for in vitro fertilization (IVF) [12]. Nowadays, many people worldwide suffer from infertility problems that prevent normal reproduction. One of the most widely used technologies for infertility treatment is IVF, which involves collecting multiple follicles for fertilization and in vitro culture [13, 14]. This technique is a useful and non-invasive method for reproduction since it allows the fetus to be evaluated without harm. Moreover, time-lapse videos and photos of fetuses, taken at brief time intervals, are used to follow growth over time, and they serve as an advanced technology for recording the growth of embryos in IVF. They are also widely applied in other settings, for example, in the supervision of reproduction by medical centers during embryonic development [15]. The embryo quality (cleavage embryo scoring) is categorized into four main stages. Stage one embryos have four blastomeres on day two and eight blastomeres on day three. Their blastomeres are of equal size, large, and round, with clear cytoplasm and no fragments. Although the number and shape of their blastomeres are similar to those of stage one embryos, stage two embryos might have 10% fragments and irregular blastomeres. In stage three embryos, the number of blastomeres is lower than normal, and the fragmentation ratio is 20% or above. The number of uneven blastomeres is higher than in stage one and two embryos. In stage four embryos, the structure and volume of the blastomeres differ from each other, and the fragmentation ratio is above 50%, as presented in Table 1 [16].

Gardner’s classification criteria were used to score the embryos. In Gardner’s classification, each embryo is scored based on blastocoel size, inner cell mass, and trophectoderm structure, as presented in Table 2 [17].

For these reasons, the main objective of this study is to present anomaly detection with a novel deep learning approach based on video data from a time-lapse device. In addition, implantation potential is assessed by morphological analysis of human embryos in the early stages of development (stage 1). In this scenario, an anomaly means that the blastocyst structure grade is not A, B, or C according to Gardner’s classification [18].

In the past, the common strategy for selecting quality embryos was mainly to examine the number of cells, the degree of fragmentation, and the number of nuclei at the cleavage stage, while poor quality embryos (based on morphology) were discarded [19]. Today, with improved laboratory culture conditions and the further development of physiological culture media, long cultures up to the blastocyst stage are performed. In humans, blastocyst formation begins about 5 days after fertilization, when a fluid-filled cavity forms in the morula, the early embryonic stage of a 16-cell embryo. It is essential to select high-quality embryos for IVF [20]. Scoring systems for the morphological evaluation of embryos have been developed to increase the birth rate. However, these methods are not sufficient to predict live birth, as there is no obvious link between morphology and chromosomal aneuploidy [21].

Recently, various applications of deep learning and computer vision methods have been proposed to automate embryo assessment for IVF analysis. For instance, in [22], a deep learning approach was presented that extracts hierarchical features from the input data instead of relying on rule-based image processing. In a similar study [23], Google’s Inception model was applied to time-lapse images for blastocyst selection in in vitro fertilization. The study in [14] proposed multitask deep learning with dynamic programming to classify embryo development from time-lapse images. Since stereoscopic cells can overlap at different sizes in time-lapse imaging, it is difficult even for an experienced embryologist to count the number of cells in a single image. Therefore, some studies focused on the early periods of embryo development. For instance, a set of zygotes was analyzed, and significant results in blastocyst stage classification were obtained on the second day after fertilization [24]. For the recognition of ploidy status, a two-stream inflated 3D ConvNet [25] was proposed for classifying time-lapse videos. Another important field of study in deep learning-based embryo monitoring systems is the automatic grading of blastocysts. In [26], a convolutional neural network was proposed to detect inner cell mass and trophectoderm grades from each image, and a recurrent neural network was placed on top of the network to classify the temporal information of blastocysts from video. Another time series-based embryo classification study [27] suggested a vote-based method with two classifiers using a convolutional neural network (CNN); in this study, the number of cells in the embryo was detected and classified. A similar study [19] presented non-invasive classification of embryos with the help of an attention branch network (ABN). Another study examined an automatic framework named Cell-Net [12]. This CNN-based approach includes a residual incremental atrous pyramid for counting and centroid localization of embryonic cells. A computer-automated time-lapse image analysis approach was presented in [28]. To address the small amount of training data and improve classification accuracy, [29] presented a novel approach named HEMIGEN. This approach employed a generative adversarial network (GAN) to produce one-, two-, and four-cell images and a deep neural network (DNN) for classification.

According to these studies, most research in the field of embryo monitoring systems has faced two main challenges: abnormality detection and qualitative classification. The first challenge is to extract robust and advanced features for identifying the one-, two-, and four-cell embryonic stages when training deep learning models. The second challenge is training a time series model with recurrent neural networks to evaluate cell division at specific times of embryo development. Therefore, we aimed to develop a novel deep learning approach for time series embryo abnormality detection. This approach uses local and deep features extracted from each input frame of the time-lapse embryoscope device. Owing to the fusion of deep and local features in the presented recurrent deep learning approach, classification accuracy improved, as did the detection of abnormalities in early stages. To sum up, the main contributions of this paper are as follows:

(1) This study proposed a novel deep learning framework, namely, LBCN-LSTM, which combines local and deep features in a time series manner for abnormality detection in embryo time-lapse sequences.

(2) The proposed LBCN-LSTM demonstrated significant results in accuracy rate and receiver operating characteristic (ROC) curve analysis compared to the experimented baseline CNN-LSTM and pretrained models.

(3) The proposed approach achieved competitive results in the early diagnosis of anomalies in embryo monitoring. Furthermore, the present technique is unique and more advantageous than traditional methods in terms of training cost.

(4) We presented a public access embryo monitoring database.

The rest of this paper is organized as follows. Section 2 presents the methodology, Section 3 presents the experimental results, and the last section gives the conclusion.

2. Methodology

Figure 1 summarizes the approach, an automated system for detecting abnormalities in embryo monitoring. The time-lapse image sequence is fed into the proposed deep learning framework as a time series. For each input image, deep and local features are extracted by the LBCNN [30]. The last layer of this CNN is a 2D global average pooling layer connected to a fully connected layer, whose output serves as the input of the LSTM network to leverage temporal information. Finally, the last layer of the LSTM network is a single-node layer with a SoftMax function that outputs the anomaly decision for the embryo time-lapse sequence. In the following paragraphs, the input images, the LBCNN, and the LSTM are described in more detail.
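As a rough illustration of how per-frame feature maps can become an LSTM input sequence, the sketch below applies 2D global average pooling to the feature maps of each frame and stacks the pooled vectors in time order. It is a minimal pure-Python sketch: the function names `global_average_pool` and `frames_to_sequence` and the toy values are assumptions made for illustration, not the implementation used in this study.

```python
from typing import List

FeatureMap = List[List[float]]   # one channel: H x W values


def global_average_pool(channels: List[FeatureMap]) -> List[float]:
    """Reduce each H x W channel to its mean, giving one value per channel."""
    pooled = []
    for fmap in channels:
        total = sum(sum(row) for row in fmap)
        count = sum(len(row) for row in fmap)
        pooled.append(total / count)
    return pooled


def frames_to_sequence(frames: List[List[FeatureMap]]) -> List[List[float]]:
    """Pool every frame; the result has shape (time steps, channels)."""
    return [global_average_pool(channels) for channels in frames]


# Two toy frames, each with two 2 x 2 feature maps (hypothetical values).
frame_a = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 4.0]]]
frame_b = [[[2.0, 2.0], [2.0, 2.0]], [[1.0, 1.0], [1.0, 1.0]]]
sequence = frames_to_sequence([frame_a, frame_b])
print(sequence)
```

Each inner list is one time step, so a recurrent model would read the rows of `sequence` in order.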

2.1. LBCNN

In this study, we applied the local binary convolutional neural network (LBCNN) in the deep learning approach to decrease the computational complexity of CNNs while improving classification accuracy. The implementation is inspired by the open-source code available at https://github.com/whoisraibolt/LBCNN. The LBCNN consists of local binary convolution (LBC) layers. Each LBC layer consists of fixed sparse binary filters, an activation function, and a set of trainable weights that are tuned by the optimization algorithm. A local binary pattern (LBP) [31] code is the weighted sum of 8 bit maps, which can be obtained with eight $3 \times 3$ convolutional filters followed by simple binarization. The predefined weight vector is $v = [2^7, 2^6, 2^5, 2^4, 2^3, 2^2, 2^1, 2^0]$. In this case, LBP can be reformulated as follows:

$$y = \sum_{i=1}^{8} \sigma(b_i * x) \cdot v_i$$

In this formula, $x$ is the vectorized input image, $b_i$ are the sparse convolutional filters, $\sigma$ is the non-linear binarization (Heaviside step function) operator, and $y$ is the resulting LBP image. In the LBCNN, the input image is filtered by the fixed convolutional filters of the LBC layer, which produces a set of feature maps (bit maps). To allow backpropagation through the LBC layer, the Heaviside step function is replaced by a differentiable activation function such as sigmoid or ReLU. Each LBC layer takes the feature map $x_l$ as input and feeds its output $x_{l+1}$ to the next layer, and the generalized multichannel form is as follows:

$$x_{l+1}^{t} = \sum_{i=1}^{m} \sigma\left(\sum_{s} b_{i,s} * x_{l}^{s}\right) \cdot V_{l,i}^{t}$$

where $t$ and $s$ are the output and input channels, respectively, and $m$ is the number of fixed filters. The last step, computing the weighted sum of the activations, can be implemented through a convolution operation with filters of size $1 \times 1$. Consequently, each LBC layer contains two consecutive convolutional stages, the first of which has fixed, non-learnable weights. The architecture of the LBCN layer is presented in Figure 2.
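To make the LBP reformulation concrete, the following pure-Python sketch computes an LBP code as eight sparse difference filters (+1 at a neighbour, -1 at the centre), a Heaviside binarization, and a weighted sum with powers of two. The neighbour ordering, the convention that a zero difference maps to 1, and the names `NEIGHBOUR_OFFSETS`, `heaviside`, and `lbp_image` are assumptions chosen for illustration; other LBP variants order the neighbours differently.

```python
from typing import List

Image = List[List[int]]

# Offsets (row, col) of the eight neighbours, clockwise from the top-left.
# Each one defines a sparse 3 x 3 filter: +1 at the neighbour, -1 at the centre.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]
WEIGHTS = [2 ** k for k in range(7, -1, -1)]   # [128, 64, ..., 1]


def heaviside(value: float) -> int:
    """Binarization: 1 for non-negative differences, else 0 (assumed convention)."""
    return 1 if value >= 0 else 0


def lbp_image(image: Image) -> Image:
    """LBP code for every interior pixel (border pixels are skipped)."""
    height, width = len(image), len(image[0])
    output = []
    for r in range(1, height - 1):
        row = []
        for c in range(1, width - 1):
            code = 0
            for (dr, dc), weight in zip(NEIGHBOUR_OFFSETS, WEIGHTS):
                difference = image[r + dr][c + dc] - image[r][c]  # response of b_i
                code += heaviside(difference) * weight            # sigma(.) * v_i
            row.append(code)
        output.append(row)
    return output


toy = [[9, 1, 9],
       [1, 5, 1],
       [9, 1, 9]]
print(lbp_image(toy))
```

In an LBC layer the fixed weights `WEIGHTS` are replaced by learnable ones and the step function by a differentiable activation, which is what makes the layer trainable.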

As presented in Figure 3, the proposed model includes residual LBCN blocks with different hidden layer sizes. Each residual LBCN block consists of two LBCNs with batch normalization and three Conv2D layers. Furthermore, the output of the first LBCN layer is fused with the last Conv2D layer in each block.

Moreover, we evaluated different hidden unit sizes for the LBCN blocks. According to the experimental results, the most efficient and accurate model has four LBCN layers with hidden unit sizes [512, 256, 128, 64]; it is named LBCNN-4L. The architecture of the LBCNN-4L is presented in Table 3.
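The parameter saving of LBC layers can be estimated with a short calculation. The sketch below counts learnable weights for a plain stack of standard 3 × 3 convolutions and for the corresponding LBC layers, where only the 1 × 1 combination weights are trained. It is a back-of-the-envelope sketch under stated assumptions: a single-channel input, as many fixed binary filters as input channels, and no biases, batch normalization, or the additional Conv2D layers of the residual blocks. It does not reproduce the exact parameter counts of the LBCNN-4L.

```python
from typing import List, Tuple


def conv_params(in_ch: int, out_ch: int, kernel: int = 3) -> int:
    """Learnable weights of a standard convolution (biases ignored)."""
    return in_ch * kernel * kernel * out_ch


def lbc_params(intermediate_ch: int, out_ch: int) -> int:
    """Learnable weights of an LBC layer: only the 1 x 1 combination step.

    The fixed sparse binary filters are not trained, so they are not counted.
    """
    return intermediate_ch * out_ch


def stack_totals(channels: List[int]) -> Tuple[int, int]:
    """Totals for a plain stack of layers with the given channel widths."""
    conv_total = 0
    lbc_total = 0
    for in_ch, out_ch in zip(channels, channels[1:]):
        conv_total += conv_params(in_ch, out_ch)
        # Assumption: as many fixed binary filters as input channels.
        lbc_total += lbc_params(in_ch, out_ch)
    return conv_total, lbc_total


# Assumed single-channel input followed by the four hidden unit sizes.
conv_total, lbc_total = stack_totals([1, 512, 256, 128, 64])
print(conv_total, lbc_total, conv_total / lbc_total)
```

Under these assumptions the ratio equals the kernel area (9 for a 3 × 3 kernel), since the spatial weights are fixed rather than learned.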

2.2. LSTM Networks

The recurrent neural network (RNN) [32] model has been widely employed for sequential data analysis in machine learning. Nevertheless, the RNN is limited in learning long-term dependencies because gradients vanish over repeated backpropagation steps. The LSTM [33] network was developed to reduce this weakness of RNN models over long time spans. The LSTM is an enhanced version of the RNN that can process long consecutive sequences with a lower gradient vanishing rate than other algorithms. With its long-term memory, the LSTM algorithm can predict multivariate time series data with high accuracy, and its block structure can model long-term dependencies in time series prediction. Therefore, this paper employed LSTM networks to predict the time series data.
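The gating mechanism behind this behaviour can be sketched in a few lines. The code below performs one step of a standard LSTM cell in pure Python, with input, forget, and output gates and an additive cell-state update. The helper names (`affine`, `lstm_step`) and the parameter layout (a dictionary mapping each gate to its input matrix, recurrent matrix, and bias) are assumptions for illustration and are not taken from the experimental code.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[Vector]
GateParams = Tuple[Matrix, Matrix, Vector]   # (W, U, b) for one gate


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W: Matrix, x: Vector, U: Matrix, h: Vector, b: Vector) -> Vector:
    """Compute W x + U h + b row by row."""
    return [
        sum(w * xi for w, xi in zip(W[k], x))
        + sum(u * hi for u, hi in zip(U[k], h))
        + b[k]
        for k in range(len(b))
    ]


def lstm_step(x: Vector, h_prev: Vector, c_prev: Vector,
              params: Dict[str, GateParams]) -> Tuple[Vector, Vector]:
    """One LSTM time step; returns the new hidden state and cell state."""
    pre = {name: affine(W, x, U, h_prev, b) for name, (W, U, b) in params.items()}
    i = [sigmoid(z) for z in pre["i"]]       # input gate
    f = [sigmoid(z) for z in pre["f"]]       # forget gate
    o = [sigmoid(z) for z in pre["o"]]       # output gate
    g = [math.tanh(z) for z in pre["g"]]     # candidate cell state
    # Additive update: the old cell state is scaled, not repeatedly squashed.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


# Toy cell with two inputs and one hidden unit; all weights are zero,
# so every gate outputs 0.5 and the candidate is 0.
zero_gate: GateParams = ([[0.0, 0.0]], [[0.0]], [0.0])
params = {name: zero_gate for name in ("i", "f", "o", "g")}
h, c = lstm_step([1.0, -1.0], [0.0], [2.0], params)
print(h, c)
```

The cell state is carried forward through a multiplication by the forget gate and an addition, which is the path that lets gradients survive over many time steps.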

3. Experimental Results

Table 4 shows the model configuration for the proposed approach. The experimental analysis was conducted with the default input image sizes of well-known pretrained networks, namely VGG16 [34], Resnet-50 [35], Inception V3 [36], MobileNet V2 [37], and Xception [38], together with the training configuration. For a fair comparison of these models, we used the same optimization algorithm (Adam), activation function (ReLU), momentum (0.9), weight decay (1e−6), mini-batch size (16), and number of epochs (1000). We chose Adam because it is a first-order gradient-based optimizer for stochastic objective functions that relies on adaptive estimates of lower-order moments, which makes it advantageous for problems with a large number of parameters [39]. The input window size of the LBCNN and the baseline CNN is set to (1024 × 1024). Because the pretrained models are fine-tuned, their learning rate (0.0001) is lower than that of the LBCNN and the baseline CNN (0.01).
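For readers unfamiliar with Adam, the sketch below shows a single update for one scalar parameter, using the learning rate of 0.01 reported for the LBCNN and treating the reported momentum of 0.9 as the first-moment decay rate. The second-moment decay rate (0.999) and the stability constant (1e-8) are common defaults assumed here; they are not stated in the training configuration. Weight decay is omitted for brevity.

```python
import math
from typing import Tuple


def adam_step(theta: float, grad: float, m: float, v: float, t: int,
              lr: float = 0.01, beta1: float = 0.9, beta2: float = 0.999,
              eps: float = 1e-8) -> Tuple[float, float, float]:
    """One Adam update for a single scalar parameter (t starts at 1)."""
    m = beta1 * m + (1.0 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad    # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                 # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# Minimise f(theta) = theta^2 from theta = 1.0; the gradient is 2 * theta.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 201):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t)
print(theta)
```

Because the step is normalised by the second-moment estimate, the first update has a magnitude close to the learning rate regardless of the gradient scale.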

3.1. Embryo Database

The analysis data were obtained from a Vitrolife Embryoscope device at Istanbul Aydin University. This database contains eight non-healthy embryos and twelve healthy embryos with 102 hours of monitoring. One example each of a healthy and a non-healthy embryo is presented in Figure 4, which shows frames from the healthy and non-healthy videos at specific times (2, 20, 50, and 90 hours). These embryo data were labeled by Dr. Esra Sen. The datasets used to support the findings of this study are available from the corresponding author upon request.

3.2. Architecture Analysis Details

To test the effect of the input image window size on the classification accuracy for the LBCNN model, we utilized different window sizes including (128 × 128), (256 × 256), (512 × 512), and (1024 × 1024) and compared the model results. As presented in Figure 5, the test images of each video of the blastocyst are extracted and the model is trained with SoftMax classification layer with a single node for the anomaly detection.

The blastocyst image dataset is split into training, validation, and test sets with a ratio of 70%, 20%, and 10%. The results showed that an input image size of 1024 × 1024 in the LBCNN achieved the highest accuracy among the tested input image sizes, including the baseline CNN with sizes such as (512 × 512) and (256 × 256). In addition, the results in Table 5 show that the LBP features with convolutional layers increased the classification accuracy compared to baseline convolutional layers. The architectures of the LBCNN and the baseline CNN are based on four convolutional layers with a 3 × 3 kernel size and 512, 256, 128, and 64 nodes.
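A 70%/20%/10% split of this kind can be sketched as follows. The function name `split_dataset`, the fixed random seed, and the use of integer identifiers are assumptions for illustration; whether the split is applied per image or per embryo is not specified here, and the sketch works on any list of identifiers.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(items: Sequence, train_pct: int = 70, val_pct: int = 20,
                  seed: int = 0) -> Tuple[List, List, List]:
    """Shuffle and split into train/validation/test; the remainder goes to test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# 1000 hypothetical image identifiers.
train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))
```

Passing embryo identifiers instead of image identifiers would keep all frames of one embryo in the same subset, which is one way to avoid overlap between training and test data.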

After testing the input image size, we analyzed the effect of network depth on the accuracy of abnormality detection. In this experiment, we designed three models with different node sizes for each LBCN block. The three LBCNN models are named LBCNN-4L, LBCNN-5L, and LBCNN-6L and use Conv layers with a 3 × 3 window size. All experimental models share the same input block, which contains two LBC layers with batch normalization followed by a Conv2D layer. We used hidden unit sizes [512, 256, 128, 64], [512, 256, 128, 64, 32], and [512, 256, 128, 64, 32, 16] for the LBCN blocks of the LBCNN-4L, LBCNN-5L, and LBCNN-6L models, respectively.

The highest accuracy was obtained with the LBCNN-4L and LBCNN-5L, whereas the lowest accuracy was obtained with the LBCNN-6L. The main reason for the poorer performance of the LBCNN-6L model is the lack of training data. According to the experimental results of the studied articles [110], a large amount of training data has a great impact on the performance of large convolutional neural network structures. The LBCNN-4L and LBCNN-5L obtained similar results, so we selected the LBCNN-4L to reduce the processing cost. Therefore, throughout this work, the four-layer architecture is used for the LBCNN. The architecture of the LBCNN-4L is presented in Figure 4, and the details are presented in Table 6.

3.3. Embryo Anomaly Detection

After the blastocyst anomaly detection experiments, we examined time series-based embryo anomaly detection with the LSTM neural network. To study the effect of LBP features in the proposed approach, we compared LBCNN + LSTM with a baseline CNN + LSTM in terms of training cost and accuracy. For a fair comparison between the two models, we applied the same input image window size, number of layers, and hidden node sizes to the baseline CNN; the baseline uses standard convolutional layers with the same number of nodes in place of the LBC layers. The architecture of the baseline CNN is presented in Table 7. We also applied the same LSTM configuration to both systems. This configuration was obtained by grid search, selecting the setting with the highest accuracy. In this paper, the LSTM sequential model is designed to analyze time series anomaly detection in time-lapse monitoring of developing embryos. It is a linear stack of two LSTM layers. The first LSTM layer contains 600 memory units and returns sequences. The second layer also includes 600 memory units. A dropout layer is applied after each LSTM layer. Finally, the last layer is a fully connected layer with a SoftMax activation function and one node for detecting anomalies.
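To give a sense of the size of this two-layer LSTM, the sketch below counts the trainable parameters of a standard LSTM layer (four gates, each with an input matrix, a recurrent matrix, and a bias). The per-frame feature dimension of 64 is an assumption made for illustration, since the size of the vector fed to the LSTM is not stated; the 600 memory units follow the configuration described above.

```python
def lstm_param_count(input_dim: int, units: int) -> int:
    """Trainable parameters of one standard LSTM layer.

    Four gates, each with an input matrix, a recurrent matrix, and a bias.
    """
    return 4 * (units * (input_dim + units) + units)


FEATURE_DIM = 64   # assumed size of the per-frame feature vector
UNITS = 600        # memory units per LSTM layer

first_layer = lstm_param_count(FEATURE_DIM, UNITS)
second_layer = lstm_param_count(UNITS, UNITS)   # fed by the first layer's sequence
print(first_layer, second_layer)
```

The second layer is larger because its input is the 600-dimensional output sequence of the first layer rather than the smaller per-frame feature vector.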

ROC curve analysis is used to show the trade-off between sensitivity and specificity. The ROC curve results showed that the LBCNN has a better area under the ROC curve (AUC) than the baseline CNN. As shown in Figure 6, the LBCNN and CNN achieved 0.985 and 0.989, respectively.
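The AUC has a simple probabilistic reading: it is the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition. The function name `roc_auc` and the toy labels and scores are illustrative assumptions and are unrelated to the values reported in Figure 6.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outscores a negative.

    Ties count as half a win (Mann-Whitney U formulation).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


labels = [0, 0, 1, 1]                 # 1 = anomaly (assumed positive class)
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for small test sets; rank-based formulas give the same value more efficiently.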

To examine the training cost, we compared the LBCNN and the baseline CNN in Figure 7. Both systems were trained for specific numbers of epochs and compared by classification accuracy. Their training and testing accuracies for each number of epochs are presented in Table 3. We selected 10, 25, 50, 75, and 100 epochs for training the LBCNN and the baseline CNN. The baseline CNN achieved its lowest accuracy, 32%, at 10 epochs, whereas the LBCNN model attained 37%. Furthermore, both experimented models achieved the same accuracy rate of 55%. At 100 epochs, the LBCNN improved the accuracy rate by 8% compared with the CNN model. In conclusion, these results show that the proposed method achieves better accuracy in a smaller number of training epochs because it has fewer trainable parameters than the baseline CNN.

Another reason for this advantage is the extraction of robust and enhanced features from the entire video for classification. To visualize the extracted features, we employed t-distributed stochastic neighbor embedding (t-SNE). Figure 8 shows that the LBCNN extracted more enhanced features and that the t-SNE features of anomalous embryos were more clearly separated from normal ones than with the baseline CNN. In this figure, healthy and non-healthy features are shown as blue and red points, respectively.

3.4. Comparison of Time Series-Based Extracted Deep and Local Features

In the second step, we examined different well-known pretrained models as deep feature extractors with the LSTM for video classification (Table 8). These results are compared with the LBCNN as the input of the time series classification model (LSTM). The embryo video dataset is split into training, validation, and test sets with a ratio of 70%, 20%, and 10%. The best experimental result is obtained by Resnet-50 + LSTM with values of 0.97, 0.98, 0.93, and 0.98, and the second highest accuracy is obtained by LBCNN + LSTM with values of 0.96, 0.95, 0.96, and 0.96 for accuracy, precision, recall, and F1-score, respectively. The lowest accuracy, 0.76, was achieved by LeNet 5 + LSTM. Based on these experimental results, it can be concluded that the LBCNN can extract robust features and achieve significant accuracy with fewer learnable parameters than the VGG16, Inception V3, and Xception models.
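The four reported measures are all derived from the confusion matrix. The sketch below shows the standard definitions; the function name `classification_metrics`, the choice of the anomaly class as the positive class, and the example counts are assumptions for illustration and do not correspond to the entries of Table 8.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts: 45 anomalies found, 5 false alarms, 3 missed, 47 normal.
metrics = classification_metrics(tp=45, fp=5, fn=3, tn=47)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Since the F1-score is the harmonic mean of precision and recall, it always lies between the two.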

3.5. Early Diagnosis of Embryo Anomaly

To test the early diagnosis of embryo anomalies, we conducted a test over different time sections. We selected five time-lapse video classes, namely, 12 h, 24 h, 48 h, 60 h, and 72 h, as presented in Figure 9 together with the standard morphology of the embryo. This figure presents each class of the data (train and test) separately. For instance, the 12 h class contains the video frames from 0.0 h up to 12.0 h. Similarly, the 24 h class contains the video frames from 0.0 h up to 24.0 h.
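The cumulative time classes can be built by keeping, for each cutoff, every frame recorded from the start of monitoring up to that cutoff. The sketch below illustrates this; the function name `cumulative_windows`, the frame identifiers, and the 6-hour sampling interval are hypothetical and chosen only to keep the example small.

```python
from typing import Dict, List, Tuple

Frame = Tuple[float, str]   # (hours since start of monitoring, frame identifier)


def cumulative_windows(frames: List[Frame],
                       cutoffs_h: List[float]) -> Dict[float, List[str]]:
    """For each cutoff, keep every frame recorded from 0 h up to that cutoff."""
    ordered = sorted(frames)
    return {
        cutoff: [name for hours, name in ordered if hours <= cutoff]
        for cutoff in cutoffs_h
    }


# Hypothetical frames sampled every 6 hours over 102 hours of monitoring.
frames = [(float(h), f"frame_{h:03d}") for h in range(0, 103, 6)]
windows = cumulative_windows(frames, [12, 24, 48, 60, 72])
print({cutoff: len(names) for cutoff, names in windows.items()})
```

Each class is a prefix of the next one, so a model evaluated on the 72 h class sees everything available to the 12 h class plus the later frames.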

As shown in Table 9, the LBCNN + LSTM achieves the highest accuracy in the 72 h class compared to well-known pretrained models. In this class, the Resnet-50, VGG16, and Inception V3 models with LSTM achieved 80.3%, 78.9%, and 79.7% classification accuracy, respectively. The findings also indicate that the proposed method achieved successful results compared to well-known pretrained models such as VGG16, LeNet 5, Resnet-50, Inception V3, MobileNet V2, and Xception. The improvement in early embryo anomaly detection in the proposed system can be explained by the use of combined deep and local features for classification.

3.6. Discussion

We also compared this study with other methods for quality analysis and abnormality detection of embryos in video or image analysis (Table 10). It can be concluded that quality analysis with a single blastocyst image by Khosravi et al. [23] had better accuracy than the CNN-LSTM video analysis technique by Kragh et al. [26]. Moreover, the LBCNN-LSTM approach achieved better results in abnormality detection, with AUC = 0.98, than the studies of Tran et al. [22], Lee et al. [25], and Sawada et al. [19], which reported AUC values of 0.93, 0.74, and 0.93, respectively. In addition, among the approaches proposed for abnormality detection, the LSTM with attention map [19] still achieves a significant result. In this case, we can conclude that blastocyst morphology analysis has more effect on abnormality detection than time series analysis. Furthermore, Payá et al. [40] proposed a supervised contrastive learning framework for grading and anomaly detection of embryos, which achieved 0.94 AUC in abnormality detection. Nevertheless, this comparison is superficial and unreliable because the researchers used different databases in terms of size and number of items.

4. Conclusion

This study presented a fully automated deep learning approach to analyze blastocyst morphology for anomaly detection from time-lapse imaging of human embryos. In this paper, we presented a novel deep learning approach, namely, LBCN-LSTM. This approach achieved significant results in classification accuracy and ROC curve analysis compared to existing well-known pretrained models and state-of-the-art algorithms. The main advantage of this model is that it uses deep and local features in an end-to-end manner with fewer trainable parameters than the baseline CNN. In addition, this model can detect embryo anomalies based on Gardner’s classification at an early stage (stage 1), based on the blastocyst stage table, with a higher accuracy rate than existing models. The results showed that the proposed LBCNN-LSTM model can be an efficient model for real-life applications with regard to diagnostic accuracy, processing cost, and early detection of abnormalities of the human embryo in a time-lapse incubator.

Data Availability

The human embryo dataset used to support the findings of this study has been deposited in the Sajad EINY repository ([email protected]). This dataset is available under certain terms and conditions upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.