Abstract

Neoadjuvant chemoradiotherapy (nCRT) followed by total mesorectal excision is the standard treatment for locally advanced rectal cancer (LARC). A noninvasive preoperative prediction method would greatly assist in evaluating the response to nCRT and in developing a personalized treatment strategy for patients with LARC. Assessment of the response to nCRT relies on imaging, and radiomics can extract valuable quantitative data from medical images. In this review, we examine the status of radiomic applications for assessing response to nCRT in patients with LARC and indicate potential directions for future research.

1. Introduction

Colorectal cancer is the third most common cancer in men and the second most common cancer in women, and its burden is expected to increase by 60% by 2030 [1], with rapidly rising morbidity and mortality rates in many low- and middle-income countries [1]. About 30% of patients with colorectal cancer have rectal cancer [2]. Locally advanced rectal cancer (LARC) is defined as rectal cancer with clinical tumor stage 3-4 (cT3-cT4, tumor invading through the muscularis propria) or positive clinical nodal stage (cN+, malignant lymph nodes detected) [3]. Neoadjuvant chemoradiotherapy (nCRT) followed by total mesorectal excision (TME) is the standard treatment for patients with LARC [4], with the goals of eradicating micrometastatic disease and improving survival. Responses to nCRT vary widely among patients with LARC, from a pathological complete response (pCR) to almost no response or, in a small group of patients, tumor progression [5, 6]. Approximately 20% of patients achieve pCR [5], which is defined as the absence of residual tumor on the histological report after standard excision [7]. Currently, the only accurate way to confirm pCR is pathological diagnosis after TME [8]. Thus, a feasible and accurate noninvasive preoperative prediction method would be helpful for assessing the efficacy of nCRT and for developing a personalized treatment plan [9]. Moreover, patients with favorable prediction results could be treated with organ-sparing therapy to preserve quality of life [10], and presurgical prediction could aid in the selection of the best therapies [11].

Magnetic resonance imaging (MRI) and computed tomography (CT) are the most common imaging modalities for patients with LARC. However, traditional imaging characteristics discernible to the human eye, such as tumor size, location, and enhancement pattern, cannot effectively predict the treatment response to nCRT. Advanced functional imaging methods such as diffusion-weighted imaging [12] have been shown to improve response assessment but can be costly and time-consuming. Therefore, a noninvasive approach for predicting the response to nCRT based on the initial pretreatment MR or CT images could potentially assist clinical management.

Radiomics, defined as the high-throughput extraction of quantitative features from medical images [13], has emerged as a promising tool for identifying imaging biomarkers of treatment response. Radiomics involves data acquisition and preprocessing, tumor segmentation, feature extraction, and modeling. Compared with traditional visual assessment of qualitative imaging features discernible to the human eye, radiomics uses computational techniques to mine the quantitative features contained in medical images. Radiomics can thereby provide convenient, repeatable, and objective information to support clinical decision making.

Over the past decade, the number of radiomic studies on rectal cancer has increased; approximately 60% of the relevant radiomic literature on LARC has focused on predicting treatment response and long-term prognosis after preoperative nCRT [11]. These studies addressed various aspects of rectal cancer, including pCR after nCRT, downstaging after nCRT [14], lateral lymph node metastasis [15, 16], and extramural vascular invasion [17]. Radiomic parameters such as skewness, entropy, kurtosis, and evenness have been used to assess and quantify intratumoral heterogeneity [18], which could potentially compensate for the fact that the TNM staging system does not capture spatial heterogeneity [19]. Previous radiomic studies on rectal cancer have identified various features and predictors of tumor response, but they differed in study design (multicenter versus single center, retrospective versus prospective), image segmentation method (manual versus automatic), imaging modality (CT, MRI, or PET), and predictive modeling (machine learning versus deep learning) [3, 9, 20–24]. As a result, there is still no consensus among researchers regarding the optimal application of radiomics to predict treatment response in LARC.

This review assesses the literature on radiomics and its application to predicting treatment response to nCRT in rectal cancer. The steps of radiomic analysis, including data acquisition, tumor segmentation, feature extraction, feature selection, and predictive modeling, are also reviewed. Based on the literature, we identify potential areas for future research. The studies included in this review are presented in Table 1 [3, 9, 12, 14–18, 20–76].

2. Radiomic Methods

Radiomics can be divided into classic radiomics and deep learning-based radiomics, depending on whether a deep learning technique is used. Classic radiomics involves the following steps: image acquisition and preprocessing, tumor segmentation, feature extraction, feature selection, modeling, and validation (Figure 1) [24]. Deep learning-based radiomics learns features through convolutional operations, typically with a convolutional neural network (CNN) (Figure 2) [75].

2.1. Classic Radiomics

Data acquisition is the first step in classic radiomics. Data used in radiomic studies may be collected retrospectively or prospectively, in a single-center or a multicenter setting. The imaging modality can be CT, MRI, or PET, and the choice should be made according to the research objective. The investigator should first identify the clinical problem to be addressed and should be aware that imaging protocols are not always standardized and may vary between institutions. In this regard, the recommendations of the Image Biomarker Standardization Initiative may help to reduce variability in image preprocessing before analysis [76].

To achieve repeatability and generalizability, several preprocessing steps are necessary after image acquisition, typically including intensity normalization, spatial smoothing, spatial resampling, noise reduction, and MR field nonuniformity correction [77].
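As an illustration of the intensity normalization step listed above, the following minimal sketch applies z-score normalization to a list of voxel intensities. The function name `zscore_normalize` and the sample intensities are hypothetical, and z-scoring is only one of several normalization schemes; the reviewed studies may have used others.

```python
from statistics import mean, pstdev


def zscore_normalize(intensities):
    """Rescale intensities to zero mean and unit (population) standard deviation."""
    mu = mean(intensities)
    sigma = pstdev(intensities)
    if sigma == 0:
        # A constant image has no contrast; return zeros instead of dividing by zero.
        return [0.0 for _ in intensities]
    return [(x - mu) / sigma for x in intensities]


# Hypothetical intensities of the same tissue acquired on two scanners
# whose signal scales differ by a factor of ten.
scanner_a = [100, 120, 140, 160, 180]
scanner_b = [1000, 1200, 1400, 1600, 1800]

normalized_a = zscore_normalize(scanner_a)
normalized_b = zscore_normalize(scanner_b)
```

After normalization the two lists are numerically the same, which is the property that makes features computed from them comparable across scanners in this simplified setting.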

Tumor segmentation is another basic step in radiomics. Researchers typically analyze the entire primary tumor and delineate either a region of interest (ROI) on a single image slice or a volume of interest (VOI) covering the tumor in three dimensions. Segmentation of the target ROI or VOI can be done manually, automatically, or semiautomatically. Manual segmentation is more accurate in some cases but less repeatable. Automatic segmentation depends on the algorithm; it is efficient and helps to eliminate subjective errors. However, accurate automatic segmentation algorithms are still lacking, so their application remains limited. Several measures have been used to improve radiomic performance, including involving different medical professionals, adopting consistent tumor segmentation methods, and standardizing imaging features [78].

Radiomic features are then extracted from the ROI or VOI. These quantitative imaging features are central to radiomics because they link medical images to the clinical endpoint. Features can be extracted directly from the original medical image or after transformation or filtering. Extraction can be performed using different software tools such as PyRadiomics, TexRAD, and MaZda, and the main method is based on the study published by Aerts et al. [79]. The extracted features differ in their mathematical definitions and are usually divided into the following subgroups. Shape features describe the geometric properties of the segmented ROI or VOI, such as the maximum diameter, maximum surface area, volume, compactness, or sphericity [80]. First-order statistical features, or histogram-based features, characterize the distribution of individual pixel or voxel intensity values within the segmented ROI or VOI. Second-order statistical features, or textural features, describe the spatial relationships between pairs of pixels and quantify intratumoral heterogeneity. Higher-order features are usually statistical features computed on matrices that consider relationships between three or more pixels. In addition, wavelet features and model-based features are also used in radiomic studies [53, 71].
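To make the first-order (histogram-based) features concrete, the sketch below computes the mean, variance, skewness, kurtosis, and entropy of a list of voxel values. It assumes the intensities have already been discretized into integer gray levels, and the function name `first_order_features` is hypothetical; dedicated packages define these features with additional options (for example bin width) that are not reproduced here.

```python
import math
from collections import Counter


def first_order_features(voxels):
    """First-order statistics of discretized voxel intensities inside an ROI."""
    n = len(voxels)
    mu = sum(voxels) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((v - mu) ** 2 for v in voxels) / n
    m3 = sum((v - mu) ** 3 for v in voxels) / n
    m4 = sum((v - mu) ** 4 for v in voxels) / n
    # Shannon entropy (base 2) of the gray-level histogram.
    counts = Counter(voxels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "mean": mu,
        "variance": m2,
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 if m2 > 0 else 0.0,
        "entropy": entropy,
    }


# Hypothetical ROI with four equally frequent gray levels.
features = first_order_features([1, 1, 2, 2, 3, 3, 4, 4])
```

For this symmetric toy histogram the skewness is zero and the entropy equals two bits, the maximum for four gray levels; a more heterogeneous tumor region would typically show a broader, less symmetric histogram.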

In radiomic analysis, feature selection is a necessary step to obtain features closely related to the target outcome. Hundreds or thousands of features are often extracted, yet a large proportion may not be useful for the task. Unstable features should be excluded to preserve the most informative features and prevent overfitting. The commonly used feature selection methods in radiomics fall into three broad categories: filter, wrapper, and embedded methods [81]. Least absolute shrinkage and selection operator (LASSO) regression and minimum redundancy maximum relevance (mRMR) are commonly used algorithms for feature selection.

After feature selection, a prediction model is established, which usually includes biological, imaging, and clinical feature parameters. Machine learning provides several modeling methods. The most commonly used methods in radiomics are linear and logistic regression, tree-based methods (e.g., decision trees and random forests), support vector machines, neural networks, and Cox proportional hazards models. Each modeling approach has its limitations. For example, logistic regression assumes feature independence, Bayesian networks require feature discretization, and deep learning depends on the network configuration. In building the model, researchers can use different software tools such as the R language and SPSS Modeler [82].

The models can be validated internally and externally. In addition to validation of the model, quality assessment should be performed to ensure reproducibility of the study. A model may be used potentially for clinical decision making only after a standardized assessment of its performance has been completed.

2.2. Deep Learning-Based Radiomics

Deep learning refers to a broad family of algorithms based on deep neural network architectures, which learn highly complex mathematical models for data representation and analysis. In classic radiomics, manual and semiautomatic segmentation methods are time-consuming, subject to a high degree of intraobserver and interobserver variability, and difficult to implement in clinical practice [83]. In addition, feature selection using the filter methods mentioned above is also time-consuming and laborious. Deep learning can help to solve some of these issues. Deep learning methods often rely on outcome information to select their features. In contrast to classic radiomics, deep learning-based radiomics skips the separate steps of image segmentation and feature extraction. Instead, it uses the entire non-segmented image to extract and select high-dimensional features through the neural network and to identify the inherent information contained in the images without manual segmentation [84]. Three types of deep learning models are commonly used for medical imaging: convolutional neural networks (CNNs), generative adversarial networks, and sparse autoencoders. Deep learning-based radiomics performs the learning process through convolutional operations within the CNN structure [82]. Compared with traditional radiomics, convolutional operations have a stronger feature extraction ability. Deep learning features are usually extracted from the convolutional layers. By changing the convolution kernels and modifying the architecture, the network can flexibly extract different task-related features, making the method more targeted. Each hidden layer in the network transforms the representation produced by the previous layer into a more abstract one. For example, the first layer may represent edges oriented in a particular direction in an image, the second may detect motifs formed by the observed edges, and the third could recognize objects from ensembles of motifs [85].
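The idea that an early convolutional layer responds to edges of a particular orientation can be illustrated with a single fixed kernel. The sketch below is a toy example under stated assumptions: it uses a hand-written Sobel kernel rather than learned weights, the helper `conv2d_valid` is hypothetical, and the 4 x 6 image is synthetic. In a trained CNN the kernel values would be learned from data.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding.

    This is cross-correlation, which is what CNN "convolution" layers compute.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Sobel kernel that responds to vertical edges (horizontal intensity change).
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

# Synthetic image: dark on the left, bright on the right.
image = [[0, 0, 0, 9, 9, 9] for _ in range(4)]

response = conv2d_valid(image, SOBEL_X)
```

The response is zero over the flat regions and large only where the window straddles the dark-to-bright boundary, which is the sense in which a first-layer filter "represents an edge".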

Like classical machine learning, deep learning includes supervised, unsupervised, and semisupervised methods. Supervised deep learning methods include CNNs and recurrent neural networks (RNNs). RNNs use their internal memory to process sequential inputs and take previous outputs as input; through learning, they can determine which data in the sequence are important and should be kept or discarded. Unsupervised learning algorithms include deep autoencoders (AEs) and restricted Boltzmann machines (RBMs) [75].

Deep learning-based radiomics also has its limitations. The main issue is the need for large datasets to train the model, because the features are learned from the training data rather than hand-crafted. Another issue is the lack of interpretability. Artificial neural networks build complex computational functions that can be challenging to interpret, as the generated features are not easily explained by tumor characteristics. A comparison between classic radiomic methods and deep learning-based radiomic methods is summarized in Table 2.

3. Radiomics for Predicting Response to nCRT in Patients with LARC

3.1. Image Acquisition

In this review, studies on patients with biopsy-proven non-mucinous LARC were included. Studies with poor image quality, incomplete tumor coverage on imaging, rectal perforation, or mucinous tumors were excluded.

Most radiomic studies on LARC used MRI or CT images, and few used PET/CT images. MRI is commonly used for imaging rectal cancer because it involves no radiation and offers high soft tissue resolution, which allows the rectal wall to be clearly identified [86]. In the MRI radiomic studies, T2-weighted imaging has been used as a morphological sequence. Multiparametric MRI protocols combining diffusion-weighted imaging (DWI), T2-weighted imaging, and dynamic contrast-enhanced imaging have also been used. For staging, MRI provides tissue details about the tumor location, extension, and relationship to surrounding tissues to establish markers for subsequent treatment. In addition, it reveals prognostic information such as mesenteric fat involvement, vascular invasion, and distance to the anal sphincter complex [87].

CT imaging has been reported to be capable of predicting lymph node metastasis [88], although a few studies have indicated otherwise [15–18, 29, 59, 60, 64, 71, 89]. The CT radiomic model built by Wang et al. improved the predictive performance for overall survival in patients with LARC treated with nCRT from 0.672, obtained with clinical characteristics alone, to 0.730 [71]. Hamerla et al. showed that random forest classification added no value to radiological data obtained from non-contrast CT scans in patients with rectal cancer [64]. Vandendorpe et al. predicted the clinical response to nCRT using contrast-enhanced CT and texture analysis, reaching an area under the curve (AUC) of 0.70 [14].

Six studies focused on PET/CT-based radiomics to predict the treatment response [27, 31, 33, 48, 58, 70]. 18F-FDG PET/CT was used in radiomic studies where texture analysis was performed. Giannini et al. reported that their logistic regression model could predict the complete response with an AUC of 0.84, with higher gray-level co-occurrence matrix (GLCM) contrast and lower GLCM homogeneity [27]. Shen et al. developed a random forest model based on 18F-FDG PET/CT to predict pCR after chemoradiotherapy in rectal cancer [48]. Martin-Gonzalez et al. assessed tumor heterogeneity in 18F-FDG PET [33, 90].
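GLCM contrast and homogeneity, the texture features mentioned above, can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the method of any cited study: the function name `glcm_features` is hypothetical, only horizontal neighbors at distance 1 are counted, the matrix is made symmetric, gray levels are assumed to be integers from 0 to `levels - 1`, and homogeneity is taken as the inverse difference moment.

```python
def glcm_features(image, levels):
    """Contrast and homogeneity from a symmetric, normalized GLCM.

    Co-occurrences are counted for horizontal neighbors at distance 1.
    """
    glcm = [[0.0] * levels for _ in range(levels)]
    for row in image:
        for left, right in zip(row, row[1:]):
            glcm[left][right] += 1
            glcm[right][left] += 1  # count each pair in both directions
    total = sum(map(sum, glcm))
    if total == 0:
        raise ValueError("image needs at least two columns")
    contrast = 0.0
    homogeneity = 0.0
    for i in range(levels):
        for j in range(levels):
            p = glcm[i][j] / total
            contrast += p * (i - j) ** 2
            homogeneity += p / (1 + (i - j) ** 2)
    return contrast, homogeneity


# Two synthetic 2-level textures: perfectly uniform versus checkerboard.
uniform = glcm_features([[1, 1, 1], [1, 1, 1]], levels=2)
checker = glcm_features([[0, 1, 0], [1, 0, 1]], levels=2)
```

The uniform texture yields zero contrast and maximal homogeneity, whereas the checkerboard yields higher contrast and lower homogeneity, which shows why the two features tend to move in opposite directions as a region becomes more heterogeneous.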

3.2. Tumor Segmentation, Feature Extraction, and Selection

In the literature presented in Table 1 [3, 9, 12, 14–18, 20–76], most researchers used manual segmentation, in which two experienced imagers were generally responsible for drawing the tumor ROI. A few used automatic tumor segmentation. Jin et al. proposed a multitask deep learning approach to predict treatment response and tested the model in a multi-institutional cohort of patients with rectal cancer. The deep neural network was trained on two different but related tasks simultaneously, namely, tumor segmentation and response prediction. The tumor segmentation produced by the proposed network was consistent with the expert delineation, and the results were similar to those of specialized deep neural networks trained on a single task. The AUC values for predicting treatment response in the internal and external validation cohorts were 0.95 and 0.92, respectively [21]. Leng et al. developed an endorectal co-registered photoacoustic microscopy (PAM) and ultrasonography system paired with a CNN to assess the rectal cancer treatment response, which enabled automatic ROI selection [22]. Pang et al. introduced a deep learning model for ROI characterization. A novel two-stage model, called two-stage rectal perception U-NET (TSRAU-NET), was proposed to replace manual assessment. AUC values of 0.829 and 0.815 in the internal and external validation sets supported the feasibility and stability of their method for pCR prediction [9].

Regarding tumor segmentation, it has been suggested that intraclass correlation coefficients (ICC) should be used to assess inter-reader and intrareader consistency [91]. In addition, given the possibility of subjective bias, segmentation results may be inconsistent, which may be mitigated by providing more training to imagers, or performing multiple segmentations [78].
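The use of the ICC to check reader consistency can be sketched as follows. The source does not state which ICC form was used, so this example assumes the two-way, consistency, single-reader form, often written ICC(3,1); the function name `icc_consistency` and the feature values are hypothetical.

```python
def icc_consistency(ratings):
    """ICC(3,1): two-way mixed effects, consistency, single reader.

    ratings[i][j] is the feature value for lesion i obtained from the
    segmentation drawn by reader j.
    """
    n = len(ratings)       # number of lesions
    k = len(ratings[0])    # number of readers
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Hypothetical feature values from two readers for three lesions.
offset_only = icc_consistency([[10, 12], [20, 22], [30, 32]])
small_noise = icc_consistency([[10, 11], [20, 19], [30, 31]])
```

A constant offset between readers leaves the consistency ICC at 1, while small reader-specific deviations pull it slightly below 1. Studies often keep only features whose ICC exceeds a prespecified threshold, with the threshold chosen by the investigators.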

Extracted radiomic features include intensity, shape and size, and texture features, as well as features derived from wavelet and Gabor filters [28]. Zwanenburg et al. standardized 174 radiomic features to enable verification and calibration of different radiomic software [76]. Their dataset consisted of features commonly used to quantify morphologic characteristics, first-order statistical aspects, and spatial relationships between voxels (texture) in three-dimensional images of the ROI. A commonly used platform is PyRadiomics, a flexible open-source platform that extracts a large number of engineered features from medical images and enables standardization of feature definitions [37].

Delta-radiomics is another radiomic method, which extracts features from a time series of images to capture how radiomic features change over time [92]. For instance, this method has shown an improvement over single-time-point radiomics for assessing overall survival in patients with recurrent glioblastoma [92]. A recent study showed that T2-weighted imaging-based delta-radiomics improved early response assessment in patients with soft tissue sarcomas [93]. A study by Davide et al. identified two delta-radiomics features, the change in the least axis length derived from principal component analysis (ΔLeast) and the change in gray-level nonuniformity calculated from the run-length matrix (ΔGLNU), as promising predictors of clinical complete response to nCRT in patients with rectal cancer [53].
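A delta-radiomics feature is simply the change in a feature between two imaging time points. The sketch below assumes the relative change, (post - pre) / pre; the cited studies may define the change differently (for example as an absolute difference or a ratio). The function name `delta_features` and all feature values are hypothetical.

```python
def delta_features(pre, post):
    """Relative change of each feature between two time points.

    Features missing at either time point, or equal to zero at baseline,
    are skipped because the relative change is undefined for them.
    """
    return {
        name: (post[name] - pre[name]) / pre[name]
        for name in pre
        if name in post and pre[name] != 0
    }


# Hypothetical feature values before and during treatment.
baseline = {"least_axis_length": 40.0, "glnu": 50.0, "entropy": 0.0}
midtreatment = {"least_axis_length": 30.0, "glnu": 60.0, "entropy": 1.2}

deltas = delta_features(baseline, midtreatment)
```

Here the least axis length falls by 25% and the gray-level nonuniformity rises by 20%; the resulting delta values, rather than the raw features, would then enter feature selection and modeling.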

Most radiomic studies have used a filter for coarse feature selection [91]. Filter methods can generally be divided into two types: univariate and multivariate. Univariate filters rank features using a test such as the Chi-square test or the Mann–Whitney U test. A multivariate filter consists of a collator and a subset selector. Another feature selection method commonly used in radiomics is the least absolute shrinkage and selection operator (LASSO), a shrinkage and selection method for linear regression proposed by Tibshirani [94]. In the study by Yi et al., MaZda software was first used to generate a total of 340 quantitative features, and the LASSO method was then used to select the most useful predictive features from the original dataset. A radiomic score (Rad-score) was calculated for each patient as a linear combination of the selected features weighted by their respective coefficients [95]. Several studies used a combination of methods for feature selection. For instance, Pang et al. adopted two feature selection steps, first calculating Harrell’s concordance index (C-index) between each feature and the pCR status to evaluate the discriminating ability of each single feature and then applying the LASSO method to the remaining features [9].
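The radiomic score described above is a weighted sum of the selected features. The sketch below shows that calculation only; the function name `rad_score`, the feature names, and the coefficient values are hypothetical and are not taken from any cited study. Features that LASSO shrinks to zero simply do not appear in the coefficient table.

```python
def rad_score(features, coefficients, intercept=0.0):
    """Linear combination of the selected features and their coefficients."""
    return intercept + sum(
        weight * features[name] for name, weight in coefficients.items()
    )


# Hypothetical coefficients retained after LASSO; "skewness" was dropped.
coefficients = {"entropy": 0.5, "contrast": -0.25}

# Hypothetical (already normalized) feature values for one patient.
patient = {"entropy": 2.0, "skewness": -0.5, "contrast": 1.0}

score = rad_score(patient, coefficients, intercept=0.25)
```

For this patient the score is 0.25 + 0.5 x 2.0 - 0.25 x 1.0 = 1.0. In practice the features should be scaled in the same way as during model fitting before the coefficients are applied.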

In almost all the studies included in this review, manual segmentation was performed, which was laborious and time-consuming. Various feature reduction methods were used to keep the number of features reasonable relative to the number of enrolled patients, thereby reducing the risk of overfitting and type I errors [11].

3.3. Modeling and Validation

Radiomics aims to construct predictive models for clinical outcomes, and several machine learning algorithms can be used to generate such models. Validation is an integral part of a complete radiomic analysis. Models validated on independent external data are more reliable than models validated only internally, because performance confirmed on independently acquired data is less likely to reflect overfitting to a single cohort. The receiver operating characteristic (ROC) curve, sensitivity, and specificity can be used to measure the performance of a radiomic model.
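The performance measures named above can be computed directly from predicted scores and true labels. The following minimal sketch computes the area under the ROC curve as the probability that a randomly chosen responder receives a higher score than a randomly chosen non-responder, and sensitivity and specificity at a fixed cutoff. The function names, labels, scores, and the 0.5 cutoff are all hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos for q in neg
    )
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when score >= threshold predicts positive."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical validation cohort: 1 = pCR, 0 = no pCR.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2]

auc = roc_auc(labels, scores)
sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
```

Unlike sensitivity and specificity, the AUC does not depend on the chosen cutoff, which is one reason it is the figure most often quoted when models from different studies are compared.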

There are usually two datasets in radiomic analysis: the training dataset (for training the model) and the validation dataset (for evaluating model performance). The validation set can come from internal or external sources, although few studies used external validation data because such data were unavailable. Most studies were retrospective and had a small sample size without external validation, which limits the robustness of their results and their applicability to clinical practice.

3.4. Deep Learning-Based Radiomics

A few studies have used deep learning to predict treatment response in patients with rectal cancer [3, 9, 20–24]. Trebeschi et al. constructed a deep learning model to segment rectal tumors by fusing diffusion-weighted imaging (DWI) and T2-weighted imaging (T2WI), and obtained Dice similarity coefficient (DSC) values of 0.70 and 0.68 [96]. Zhu et al. proposed a deep learning model for automatic segmentation of rectal tumors on DWI images, constructing a volumetric 3D U-Net to characterize the spatial features in all three directions, unlike the previous DWI and T2WI fusion model. This model was designed to perform segmentation using DWI data alone to avoid potential registration errors [97]. Leng et al. developed an imaging system consisting of an endorectal co-registered photoacoustic microscopy (PAM) and ultrasonography device paired with a CNN, which showed high diagnostic performance in assessing the treatment response, with potential for optimizing posttreatment management [22]. However, Khadidos et al. evaluated six traditional machine learning models and one deep learning model based on MRI texture analysis of patients with LARC and found that their deep learning CNN model did not show any predictive potential [24].
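The Dice similarity coefficient used to evaluate the segmentation models above measures the overlap between a predicted mask and a reference mask as twice the intersection divided by the sum of the two mask sizes. The sketch below uses small synthetic binary masks; the function name `dice_coefficient` is hypothetical, and two empty masks are treated as perfect agreement by assumption.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks of equal shape."""
    a = [v for row in mask_a for v in row]
    b = [v for row in mask_b for v in row]
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    size = sum(1 for x in a if x) + sum(1 for y in b if y)
    # Assumption: two empty masks agree perfectly.
    return 2 * intersection / size if size else 1.0


# Synthetic 2 x 3 masks: expert delineation versus automatic segmentation.
expert = [[1, 1, 0],
          [1, 1, 0]]
automatic = [[0, 1, 1],
             [0, 1, 1]]

dsc = dice_coefficient(expert, automatic)
```

Each mask contains four pixels and they share two, so the coefficient is 2 x 2 / 8 = 0.5. Values of about 0.70, as reported above, therefore indicate substantial but incomplete overlap with the expert contour.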

It should be noted that deep learning requires more data than traditional machine learning. In addition, if data from more than one scanner are used, variability in magnetic field strength and vendor-specific signal should be taken into consideration. A study based on two 1.5 Tesla MR scanners found that 75% of features were unstable due to vendor and image acquisition variability [49]. If the images were acquired at different magnetic field strengths, the differences would be even greater. Since few radiomic studies of LARC used deep learning, the information regarding its validity and efficacy is limited, and more studies should be performed to fully assess its potential for clinical applications.

4. Discussion and Future Perspective

In this review, we presented data to show radiomics as a promising noninvasive imaging-based method for predicting treatment response in patients with LARC. Future radiomic research should focus on independent validation of existing models while continuing to develop new models for novel research questions. The current knowledge on deep learning in LARC is limited and more research is needed to explore its potential for clinical applications. Incorporation of multimodal imaging data and other factors such as clinical features and surgery-related variables should enhance predictive model performance.

There are inherent issues with the interpretability of radiomic models. A lack of understanding of how machine learning predictions are generated remains a barrier to their adoption in clinical practice, and the same applies to deep learning approaches. These black-box-like networks are hard to understand and, lacking strong theoretical support, hard to correlate with clinical outcomes [98]. The lack of interpretability of predictive models can undermine interest and trust in them [99]. More work is needed to familiarize users with radiomic models and help them understand the associated interpretations [100]. Multidisciplinary teams need to create visual displays to help clinicians better understand how machine learning works [101].

The high variability of data acquisition should also be addressed in radiomics. Because imaging protocols and scanner parameters differ between research centers, radiomic results often differ as well, which greatly affects the reproducibility of radiomic data. It is therefore prudent to conduct multicenter studies and use standardized imaging parameters [46]. MRI and CT are the imaging modalities of choice for patients with rectal cancer. Staal et al. evaluated the quality of the published literature using QUADAS-2 and the radiomics quality score (RQS) and concluded that the high-quality studies were predominantly MRI-based radiomic analyses of the rectum [11]. Although CT does not provide tissue characterization as detailed as that of MRI for assessing treatment response in LARC, CT is still more commonly used than MRI in clinical practice. Nevertheless, combining imaging features from CT and MRI is desirable [37]. The multimodal radiomic model designed by Li et al. achieved an AUC of 0.925 in the training set and an AUC of 0.93 in the validation set [89]. However, the multimodal strategy is time-consuming and costly and carries a potential risk of overfitting.

There were also issues with the design of radiomic studies. Most existing radiomic studies are retrospective and draw on single-center data without independent external validation, which may limit the reliability and applicability of the results. Multicenter studies help to reduce bias but have not been carried out in most cases.

There was a lack of clinical features in predictive modeling in the studies included in this review, suggesting a need to take clinical features into consideration for future studies. For instance, preoperative factors [57], surgical approach, and postoperative treatment all have important effects on the prognosis of patients with LARC [102], which should be included in modeling. In addition, studies incorporating clinical features have a higher predictive value. Pizzi et al. presented a novel machine learning model combining clinical and MRI-based radiomic features, demonstrating that the combination of clinical and radiomic features contributed to improved performance of the prognostic models [37]. Their result was generally in line with that of a study by Staal et al. [11]. This review was limited due to the lack of surgery-related literature for prediction models of surgical planning, preoperative and postoperative complications. Future radiomic studies and predictive modeling should incorporate relevant surgical information to improve model performance for predicting LARC prognosis.

5. Conclusion

In summary, this review examines the status of radiomic application in predicting treatment response to nCRT for patients with LARC. The limitations of existing radiomic studies point to the need for large-scale prospective multicenter studies to avoid the potential pitfalls of small sample size, single-center data, imaging variability, and overfitting. In addition, clinical factors should be incorporated in predictive modeling to improve model performance and clinical relevance. More work needs to be done to render radiomic data more interpretable and explainable to enhance its clinical use. Radiomics has emerged as a potential tool for identifying imaging biomarkers for cancer treatment, which could assist in clinical decision making and personalized medicine for patients with cancer.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Ethical Approval

Institutional Review Board approval was obtained from the Research Ethics Committee, Xiangya Hospital (IRB: 2022020245).

Written informed consent was waived due to the literature review nature of this study.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Authors’ Contributions

Rui Chen performed the search and wrote the first draft of the manuscript. Xiaoping Yi and Hongyan Zai supervised the literature search and revised the manuscript for intellectual content. Qian Pei and Yan Fu contributed to the literature search and interpretation of results. Bihong Chen reviewed the study design and data analysis and edited the manuscript critically. All authors read and approved the final manuscript.

Acknowledgments

This research was funded by the Xiangya-Peking University, Wei Ming Clinical and Rehabilitation Research Fund (No. xywm2015I35), the Project Program of National Clinical Research Center for Geriatric Disorders (Xiang ya Hospital, Grant No. 2022LNJJ09), Natural Science Foundation of Hunan Province (2022JJ30979), and China Post-Doctoral Science Foundation (2018M632997, 2022M713536).