Abstract

The purpose of this study was to investigate the effect of image preprocessing on radiomic features estimation from computed tomography (CT) imaging of locally advanced rectal cancer (LARC). CT images of 20 patients with LARC were used to estimate 105 radiomic features of 7 classes (shape, first-order, GLCM, GLDM, GLRLM, GLSZM, and NGTDM). Radiomic features were estimated for 6 different isotropic resampling voxel sizes, using 10 interpolation algorithms (at fixed bin width) and 6 different bin widths (at fixed interpolation algorithm). The intraclass correlation coefficient (ICC) and the coefficient of variation (CV) were calculated to assess the variability in radiomic features estimation due to preprocessing. A repeated measures correlation analysis was performed to assess any linear correlation between radiomic feature estimate and resampling voxel size or bin width. Reproducibility of radiomic feature estimate, when assessed through ICC analysis, was nominally excellent () for shape features, good () or moderate () for first-order features, and moderate or poor () for textural features. A number of radiomic features characterized by good or excellent reproducibility in terms of ICC showed however median CV values greater than 15%. For most textural features, a significant () correlation between their estimate and resampling voxel size or bin width was found. In CT imaging of patients with LARC, the estimate of textural features, as well as of first-order features to a lesser extent, is appreciably biased by preprocessing. Accordingly, this should be taken into account when planning clinical or research studies, as well as when comparing results from different studies and performing multicenter studies.

1. Introduction

Radiomics concerns the management of standard-of-care digital medical images from different modalities (e.g., computed tomography (CT), magnetic resonance (MR), and nuclear medicine (NM)), with the aim of mining from them pathophysiologic changes underlying disease [1]. Specifically, quantitative morphological and textural characteristics of tissue can be obtained from medical images by measuring different mathematical indices, namely, “features.” In addition to other available data from demographics, pathology, blood biomarkers, and genomics, radiomic features can be used for diagnostic, prognostic, or predictive purposes exploiting statistical or machine learning methods [2]. Despite its potential, radiomics is not yet a widely used and well-consolidated tool in clinical practice, since it involves complex processes (e.g., image acquisition and reconstruction, image segmentation and rendering, features estimation, databases and data sharing, classification, and analysis), which need proper application and optimization in order to obtain reliable results [3, 4]. Moreover, accurate classification methods using radiomic features and artificial intelligence (AI) require large data sets. Given that each step of the radiomic workflow can introduce noise (i.e., variability) in radiomic features estimation, robust radiomic features should be obtained for adequately training AI predictive models, mostly when only limited data are available as in many practical applications [5].

Colorectal cancer is the most common gastrointestinal malignancy and the third leading cause of cancer-related death in Western countries. More than half of rectal cancer patients are diagnosed with locally advanced tumors (locally advanced rectal cancer (LARC): T3/T4 tumor and/or positive limphonodes) [6]. For this group of patients, preoperative radiochemotherapy (RTCT) followed by total mesorectal excision (TME) is the standard of cure [7]. However, some studies [8, 9] have reported better outcomes for the not negligible part of the patients who reach pathological complete response (pCR) after RTCT. In order to apply organ-preserving strategies, as well as to personalize treatments or to deescalate therapies [10], there is a great interest in stratifying the risk in patients with LARC, aimed at predicting pCR by exploiting various techniques which include radiomics [11]. Moreover, it would be of practical utility whether this could be accomplished by using the available CT images acquired for radiation therapy planning [1214]. In this regard, we note that previous studies have assessed the potential role of CT imaging radiomics in rectal cancer both for contrast-enhanced [12, 13, 1517] and noncontrast CT scans [14, 18].

Robustness of radiomic features relies on reproducibility and repeatability of their estimates considering different aspects of the radiomic workflow [19, 20]. Previous studies of CT imaging [2124], as well as of MR [2530] and NM [3134] imaging, have assessed the reproducibility and repeatability of radiomic features estimation for various applications. Phantom and in vivo CT studies have reported dependence of radiomic feature estimates on various factors such as scanner type [35, 36], tube current [3739], acquisition voxel size [21, 35], reconstruction kernel [4043], and number of gray levels [21] or gray level discretization [35]. Shafiq-Ul-Hassan et al. [21], given that in clinical studies CT images are acquired using different voxel sizes, have suggested that resampling all image data sets with the same isotropic voxel size allows to reduce variability in radiomic features estimation. This specific preprocessing step can be accomplished by using different interpolation algorithms. However, the effect of the used interpolation algorithm on CT imaging radiomic feature estimate is usually not taken into account.

Only few studies have investigated the robustness of radiomic features from CT imaging in rectal cancer. For instance, Hu et al. [44] have studied feature stability in repeated CT acquisition, showing that features normalized to the tumor volume and those calculated as average over slices exhibit greater values of intraclass correlation coefficient (ICC) and concordance correlation coefficient (CCC) with respect to the unnormalized ones. Van Timmeren et al. [45] have compared two different test-retest situations, i.e., the analysis of repeated CT acquisitions after 15 minutes and few days in lung and rectal cancer, respectively. They have found that 446/542 features have a higher CCC for the test-retest analysis of the data set of patients with lung cancer than for patients with rectal cancer, showing the importance of controlling factors such as scanner, imaging protocol, reconstruction methods, and time points in a radiomic analysis.

Therefore, the aim of this study was to specifically assess, for the first time, the effect of preprocessing—in terms of resampling voxel size, interpolation algorithm, and bin width—on radiomic features estimation from CT imaging in patients with LARC.

2. Material and Methods

2.1. Patients and CT Imaging

Twenty representative patients with LARC were enrolled in this retrospective study approved by the internal review board. All patients underwent clinical CT imaging for preoperative radiotherapy. CT scanner (manufacturer/model) and acquisition parameters are reported in Table 1.

For each patient, the rectal gross tumor volume (GTV) was delineated (avoiding the inclusion of air regions) on CT images by a single experienced radiation oncologist, using the Eclipse treatment planning system (version 8.6, Varian, Palo Alto, CA). Then, a binary mask of the GTV region was created by employing 3D Slicer (version 4.10.2) [46].

2.2. Preprocessing of CT Images

A lower threshold of -500 HU was applied to the CT images, in order to account for partial volume effect and exclude voxels containing air from analyses.

The original isotropic voxel size (i.e., the cube root of the acquisition voxel volume) of CT imaging ranged approximately from 1.5 mm to 2 mm across patients. Therefore, in order to assess any effect of resampling voxel size on radiomic features estimation, CT images were resampled to isotropic voxels with size of 1 mm, 1.3 mm, 1.6 mm, 1.9 mm, 2.2 mm, and 2.5 mm. For each resampling voxel size, this was performed by using 10 different interpolation algorithms available in PyRadiomics (version 3.0) [47] (namely, BSpline (BS), BlackmanWindowedSinc (BL), CosineWindowedSinc (CWS), Gaussian (G), HammingWindowedSinc (HWS), LabelGaussian (LG), LanczosWindowedSinc (LWS), Linear (L), NearestNeighbor (NN), and WelchWindowedSinc (WWS)), at fixed bin width of 5 HU.

Furthermore, in order to assess any effect of bin width on radiomic features estimation, different bin width values of 3 HU, 4 HU, 5 HU, 6 HU, 7 HU, and 8 HU were employed. Indeed, for all patients, a bin width of 5 HU yielded a suggested number of quantization levels between 30 and 130 [4, 33, 48]. Accordingly, only bin width values close to 5 HU were considered. This was performed for different resampling voxel sizes (i.e., 1 mm, 1.3 mm, 1.6 mm, 1.9 mm, 2.2 mm, and 2.5 mm), at a fixed interpolation algorithm (i.e., BS).

All preprocessing of CT images was carried out by using the open source PyRadiomics library [47] (version 3.0.1) with Python (version 3.7.3).

2.3. Radiomic Features Estimation

Radiomic features estimation was performed through PyRadiomics (version 3.0.1) [47]. All but 9 radiomic features were calculated in compliance with the Image Biomarker Standardization Initiative (IBSI) [49]. For the remaining 9 radiomic features, 7 (namely, voxel volume, mesh volume, maximum probability, joint energy, sum squares, uniformity, and entropy) were calculated using IBSI formulas but presented different names, 1 (i.e., kurtosis) was in accordance with IBSI except for an offset value (i.e., 3), and 1 (i.e., total energy) was not defined by IBSI.

For each GTV region and preprocessing parameter combination (in terms of different resampling voxel sizes, interpolation algorithms, and bin widths), a total of 105 features, divided into 7 classes, were estimated. In particular, 14 shape features (shape class), 18 first-order features (first-order class), 22 gray level cooccurrence matrix features (GLCM class), 14 gray level dependence matrix features (GLDM class, with coarseness parameter ), 16 gray level run length matrix features (GLRLM class), 16 gray level size zone features (GLSZM class), and 5 neighbouring gray tone difference matrix features (NGTDM class) were estimated. Second-order features estimation was performed according to the Chebyshev norm with a distance of 1 pixel. Only 3D versions of features were considered. GLCM and GLRLM features were computed from each 3D directional matrix (i.e., the 13 matrices identified by the 13 unique direction vectors within the 26 connected neighbouring voxels) and averaged over the 3D directions.

2.4. Statistical Analysis

In this study, four different effects on radiomic features estimation were assessed: (a)For each resampling voxel size, with fixed bin width (i.e., 5 HU), effect of using different interpolation algorithms;(b)For each resampling voxel size, with fixed interpolation algorithm (i.e., BS), effect of using different bin widths;(c)For each bin width, with fixed interpolation algorithm (i.e., BS), effect of using different resampling voxel sizes;(d)For each interpolation algorithm, with fixed bin width (i.e., 5 HU), effect of using different voxel sizes.

For each aforementioned effect of interest, any variability in radiomic feature estimate was assessed by means of ICC analysis [50, 51]. Specifically, the two-way mixed-effect model, with single rater and absolute agreement options, was applied to our data [50, 52]. Accordingly, the ICC coefficient was calculated as where MSR is the mean square for rows, MSE is the mean square for error, MSC is the mean square for columns, is the number of subjects, and is the number of raters, with ICC matrices realized considering each resampling voxel size, interpolation algorithm, or quantization bin width as a rater and each patient as a subject. ICC, which ranges between 0 (maximum variability) and 1 (minimum variability), expresses the variability of radiomic feature estimate associated with the effect of interest (i.e., resampling voxel size, interpolation algorithm, and bin width) with respect to the variance between subjects. Then, ICC values of radiomic features were nominally stratified as follows: poor (), moderate (), good (), and excellent () [36, 52, 53].

In order to better characterize the reproducibility of radiomic features estimation, an additional analysis of the coefficient of variation (CV) was performed. In particular, for each radiomic feature and patient, CV was calculated as the percentage ratio between standard deviation and mean values of feature estimates obtained by varying each considered preprocessing item (resampling voxel size, interpolation algorithm, or bin width) when the others were kept fixed.

Any linear correlation between radiomic features estimate and resampling voxel size or bin width was assessed through a repeated measures correlation analysis, namely, rmcorr [54]. This statistical technique accounts for nonindependence among observations (i.e., repeated measurements on the same subject with varying preprocessing) by using the analysis of covariance (ANCOVA) to adjust for individual differences.

Statistical analysis was performed by using R Studio (version 1.2.5033) and R (version 4.0.2) software packages [55].

3. Results

ICC results for effects A, B, C, and D are reported in detail in Figures 14, respectively. ICC values were excellent for all features belonging to the shape class. In general, radiomic features belonging to the first-order class showed good or moderate ICC values, while features belonging to the other textural classes (GLCM, GLRLM, GLSZM, GLDM, and NGTDM) expressed moderate or poor ICC values.

CV of radiomic features estimate for each effect of interest (i.e., A, B, C, and D) is summarized in Table 2, showing that for a number of features the variability associated with the four effects of interest can range up to 40% or more. Moreover, CV of radiomic features estimate for effect A, B, C, and D is reported in greater detail in Figures 58, respectively, showing features with both median (across subjects and resampling voxel sizes, bin widths, and interpolation algorithms, for effect A/B, C, and D, respectively) and .

The results of the analysis of the linear correlation between radiomic features estimate and bin width or resampling voxel size are reported in Tables 3 and 4, respectively. Most of the textural features (i.e., those belonging to GLCM, GLRLM, GLSZM, GLDM, or NGTDM classes) were characterized by a significant (, adjusted using Bonferroni correction) linear correlation with respect to both voxel size and bin width within the considered range of variation (i.e., 1-2.5 mm and 3-8 HU for voxel size and bin width, respectively).

4. Discussion

Recent studies have suggested a potential role of CT imaging radiomics in rectal cancer [1118]. Bibault et al. [12] have presented a novel approach combining deep learning with clinical and pretreatment CT imaging radiomic features to build a model predicting complete pathologic response in a multicenter cohort of patients with locally advanced rectal cancer treated with neoadjuvant chemoradiation, followed by surgery. They have found that this model correctly predicted complete response after neoadjuvant rectal chemoradiotherapy in 80% of patients. In another study [14], pretreatment CT-based radiomic signatures were developed and validated in two independent cohorts. This imaging biomarker has proven to provide a promising way to predict complete pathologic response and select patients for nonoperative management. On the other hand, Hamerla et al. [13] have reported no evidence of added value of a radiomic model based on noncontrast CT scans for prediction of complete pathologic response in locally advanced rectal cancer. Vandendorpe et al. [15] have aimed to determine the value of baseline contrast-enhanced CT texture analysis in the prediction of downstaging in patients with locally advanced rectal cancer, calculating a radiomic score as a linear combination of radiomic features. By using a multivariable prognostic score that included this radiomic score and clinical factors, they have shown that this approach may lead to more personalized treatment for each patient. Wang et al. [18] have suggested that, by supervised modeling, radiomic features from radiotherapy CT imaging can potentially predict overall survival for locally advanced rectal cancer patients with neoadjuvant chemoradiation treatment. Moreover, Huang et al. [17] have developed and validated a radiomic signature from contrast-enhanced CT imaging as a complementary tool to differentiate high-grade from low-grade colorectal adenocarcinoma, with an area under the receiver operating characteristic curve of 0.725.

There is increasing evidence that preprocessing can someway impact the estimation of radiomic features derived from CT imaging [2123], as well as from MR [2528] and NM [3133] imaging, in various clinical applications [19]. The degree and relevance of this effect can depend on imaging technique/modality and clinical application (e.g., anatomical region and lesion type). In this regard, Traverso et al. [19] have reviewed radiomic feature reproducibility and repeatability issues as reported by numerous research groups for different anatomical sites and various aspects of the radiomic workflow (i.e., image acquisition and reconstruction, image preprocessing, and feature extraction). They have submitted that further investigations are needed on these issues, possibly expanding the cohort of cancer types and providing details on feature extraction, image preprocessing, and statistical cutoff values used to distinguish stable features. To the best of our knowledge, no previous study has assessed the effect of preprocessing on CT-based radiomic features of locally advanced rectal cancer. Moreover, the effect of the interpolation algorithm was recently reported worthy of attention both in MR imaging radiomics of the LARC [25] and in PET imaging radiomics of oesophageal cancer [53]. Thus, we performed a rather comprehensive analysis, which considered multiple preprocessing elements such as resampling voxel size, interpolation algorithms, and bin widths.

We found that preprocessing can substantially bias the estimation of several CT imaging radiomic features in patients with LARC. In particular, the estimate of most textural features (i.e., features of GLCM, GLDM, GLRLM, GLSZM, and NGTDM classes) showed a relevant dependence () on interpolation algorithm (Figure 1), resampling voxel size (Figures 3 and 4), or bin width (Figure 2). Notably, except for a limited number of radiomic features, results of Figures 13 show that the degree of variability in radiomic features estimation due to interpolation algorithm/bin width and resampling voxel size is rather independent of resampling voxel size and bin width, respectively. However, in Figure 4, the heatmaps present a more pixelated appearance, indicating that the effect of resampling voxel size on textural features estimation can be more appreciably modulated by the used interpolation algorithm. Furthermore, for numerous textural features, a significant linear correlation between their estimates and bin width or resampling voxel size was observed (Tables 3 and 4).

As expected, based on ICC analysis, shape radiomic features did not show a relevant dependence on preprocessing (), with median CVs for different resampling voxel sizes within 0.6%-3.1% (Table 2). Moreover, first-order radiomic features were characterized in general by higher ICC values than textural features when varying preprocessing. In this regard, we note that, except for entropy and uniformity features, PyRadiomics—according to IBSI recommendations [49]—calculates intensity-based first-order features using original images without discretization, as indicated by the null median CVs for different bin widths (see Table 2).

ICC analysis allows only to assess how relevant is the variability in radiomic features estimation due to preprocessing with respect to intersubject variability. Therefore, we also calculated the CV for each effect of interest (i.e., A, B, C, and D), in order to obtain additional information on absolute variation in radiomic features estimate due to different resampling voxel sizes, interpolation algorithms, and bin widths. We found that several textural features showed median CVs greater than 40% or more (Table 2). Furthermore, we revealed that even some radiomic features with high ICC values (≥0.75) can have a nonnegligible variability in terms of CV (≥15%) (Figures 58). In particular, for some effects of interest, CV values of single subjects can range up to 60% or more—this should be considered when comparing radiomic data from different studies. The results of Figure 8, regarding a small number of only 6 radiomic features, might suggest a potential reduction in median CV values across subjects when using the Gaussian interpolation algorithms with respect to the other interpolation algorithms. The explanation of this potential effect, which however does not necessarily hold true for all radiomic features (data not shown), appears not straightforward. Indeed, the use of different interpolation algorithms for resampling voxel size can modify radiomic characteristics of images in a manner rather complex and not easily predictable [56, 57].

In radiomic analyses, image interpolation at the same voxel size is a common and recommended practice (especially in retrospective studies) to reduce any heterogeneity in acquisition voxel size, while image discretization is required to make texture features estimation computationally less burdensome [31, 49]. Nonetheless, it should be noted that these preprocessing steps modify de facto acquired image data and likely their radiomic characteristics. In particular, the alteration of acquired image data due to preprocessing can actually yield a possible variation in the estimates of radiomic features when using different resampling voxel sizes and bin widths. Therefore, professionals and researchers executing or planning clinical or research studies should be aware of this important aspect of radiomics.

Given the revealed nonnegligible effect of preprocessing on the estimate of CT radiomic features in LARC, some caution regarding this aspect is recommended in clinical studies [23, 33, 37], mostly when considering textural radiomic features [58]. This emphasizes the importance of clearly identifying, assessing, and reporting all the processes involved in the applied radiomic workflow [35, 11, 19, 31].

5. Conclusions

In patients with LARC, the estimate of CT imaging-derived texture radiomic features, as well as of intensity-based first-order radiomic features to a lesser extent, is appreciably biased by resampling voxel size, interpolation algorithm, and bin width. Accordingly, toward optimization and standardization of radiomic methods, this should be taken into account when planning a clinical study, as well as when performing multicenter studies.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Disclosure

Patrizio Barca current address is Department of Medical Physics, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this article.