Abstract

Prostate cancer occurs when cells in the prostate gland grow out of control, and almost all prostate cancers are adenocarcinomas. The survival rate of prostate cancer patients depends on the screening outcome, which can be no prostate cancer, early detection, late detection, or advanced-stage detection. The main objective of this study was to estimate the risk factors affecting the screening outcome of prostate cancer. Because the outcomes are ordinal, a generalized Bayesian ordinal logistic model was used in the analysis. This model allowed the coefficients of the risk factors to be estimated separately for each level of the prostate cancer screening outcome. Positive coefficients ($\beta > 0$) indicated that higher values of an explanatory variable increased the chances of the respondent being in a higher category of the outcome than the current one. Negative coefficients ($\beta < 0$) indicated that higher values of the explanatory variable increased the likelihood of being in the current or a lower category. For instance, the model contrasting negative with positive screening outcomes showed that an increase in weight lowered the chances of an individual having the disease.

1. Introduction

Cancer is a disease caused by abnormal cell growth that has the potential to invade or spread to other parts of the body [1]. In this process, normal cells are transformed into tumor cells, which progress from a precancerous lesion to a malignant tumor [2]. There are many types of cancer, including prostate, skin, cervical, breast, colorectal, lung, brain, and bone cancer [3].

Policies on cancer treatment and management in Kenya guided the development and implementation of legal frameworks for the delivery of cancer services at the national and county levels [4]. Key among them was the provision of health insurance cover that allows vulnerable people, orphans, and the elderly to access comprehensive health care, in line with the Sustainable Development Goals (SDGs) [5–7]. This was intended to encourage men to seek early prostate cancer screening without being deterred by the cost of treatment.

The National Cancer Screening Guidelines (2018) reported that most Kenyans did not present for early screening and diagnosis [8]. This was based on data from Kenyatta National Hospital indicating that, in 2014–2016, approximately 64% of cancer patients were diagnosed at stage III or IV, when effective treatment is difficult to achieve [9].

According to Bratt et al. [10], people gave different reasons that deterred them from screening for prostate cancer, such as age, health, race, obesity status, and family history. It was also believed that the majority of people failed to present for screening for several reasons:

(i) a lack of awareness training;
(ii) socioeconomic challenges;
(iii) worry about examination discomfort;
(iv) fear of a positive cancer diagnosis, owing to associated myths and stigma;
(v) the inability to obtain effective follow-up treatment [11, 12].

Motivated by these reasons, this study estimated the risk factors affecting the outcome of prostate cancer screening.

2. Literature Review

A study by Kramer et al. [13] critically evaluated the evidence for recommending the screening of asymptomatic men for prostate cancer with a blood test that detects prostate-specific antigen (PSA). They found that screening had the potential to save lives. However, overdiagnosis, screening, and subsequent therapy could have a net unfavourable effect on mortality, quality of life, or both. Without controlled clinical trials and prospective ascertainment of costs at the individual level, it was not possible to determine the net cost of mass prostate cancer screening programs. The uncertainty arose because detecting the disease at earlier stages lowers treatment costs, whereas overdiagnosis increases them.

Understanding the impact of risk factors on screening outcomes will help individuals seek screening services as early as possible. When the disease is detected early, there is a high chance of reducing treatment costs and saving lives.

Wolf et al. [14] sought to establish predictors of interest in PSA screening and the impact of informed consent on patients. In a randomized trial, they found that informed consent decreased patient interest in PSA screening. Among patients who were not informed, three univariate predictors of interest in PSA screening emerged: perceived screening efficacy, perceived seriousness of an abnormal PSA result, and willingness to accept treatment risks. In this study, estimating the risk factors affecting screening outcomes helped identify the major risk factors that need to be considered when encouraging men to take part in screening.

Bozkurt et al. [15] compared Bayesian networks and binary logistic regression for predicting prostate cancer. The aim of their study was to examine and compare these two methods for the early detection of prostate carcinoma. Both methods identified serum PSA level, age, digital rectal examination, and clinical symptoms as helpful for early detection of the tumor.

Karlsson [16] performed microsimulation modeling of prostate cancer screening in Sweden, developing a method based on approximate Bayesian computation and Markov chain Monte Carlo. The model was used to predict the effect of changing from current opportunistic PSA testing patterns to regular screening. The results revealed a reduction in prostate cancer incidence and an increase in mortality when 8-yearly screening of men aged 55–69 was introduced for over 20 years.

In contrast, this study focused on the effect of the risk factors considered by individuals before screening for prostate cancer. The effect was estimated for each outcome (measurement level) using a generalized Bayesian ordinal logistic regression model.

Brant et al. [17] carried out research on screening for prostate cancer using random-effects models. Male participants in a long-term longitudinal study were screened using posterior probabilities. Each man was classified into one of four diagnostic states: normal, benign prostatic hyperplasia, local cancer, or metastatic cancer. The classification used repeated PSA measurements collected when there was no clinical evidence of prostate disease. Overall, 86.8% or 88.3% of participants were correctly classified from the longitudinally collected PSA measurements, depending on whether a distinction was made between local and metastatic cancer. However, predictors such as diet and family history were not included in the model because the data were unavailable or insufficient. Their study focused on the PSA screening measurement and on the effect of age and group membership in predicting the next measurement level (i.e., PSA level) over time.

This study, in contrast, focused on the effect of risk factors on an individual's screening outcome (measurement level). It used a generalized Bayesian ordinal logistic regression model without random effects.

On the other hand, Liu and Koirala [18] indicated that the proportional odds (PO) assumption for ordinal regression analysis is often violated because it is strongly affected by the sample size and the number of covariate patterns. To address this issue, the partial proportional odds (PPO) model and the generalized ordinal logistic model were developed. However, these models are not typically used in research. One likely reason for this is the restriction of current statistical software packages: SPSS cannot perform the generalized ordinal logistic model analysis, and SAS requires data restructuring. Their article illustrated the use of generalized ordinal logistic regression models to predict mathematics proficiency levels using Stata and compared the results from fitting PO models and generalized ordinal logistic regression models.

Kulkarni et al. [19] noted that generalized linear models (GLMs) such as logistic regression are among the most widely used tools in the data analyst's repertoire, and they are often applied to sensitive datasets. Many prior works that investigated GLMs under differential privacy (DP) constraints provide only private point estimates of regression coefficients and cannot quantify parameter uncertainty. Using logistic and Poisson regression as running examples, they introduced a generic noise-aware DP Bayesian inference method for a given GLM, based on a noisy sum of summary statistics. Quantifying uncertainty allowed them to determine which regression coefficients were statistically significantly different from zero. They provided a tight privacy analysis. They also demonstrated experimentally that the posteriors obtained from their model were close to nonprivate posteriors while still adhering to strong privacy guarantees. This study likewise used the Bayesian paradigm, here to estimate the significant risk factors affecting each prostate cancer screening outcome.

Schorgendorfer et al. [20] noted that logistic regression is a popular tool for risk analysis in medical and population health science. With continuous response data, it is common to create a dichotomous outcome for logistic regression by specifying a threshold for positivity. An alternative is to fit a linear regression to the nondichotomized response while assuming a logistic sampling model for the data. This has been empirically shown to yield more efficient estimates of odds ratios than ordinary logistic regression of the dichotomized endpoint. They illustrated, however, that risk inference is not robust to departures from the parametric logistic distribution. Moreover, the proportional odds assumption is generally not satisfied when the data depart from a logistic distribution, which leads to biased inference from a parametric logistic analysis.

They therefore developed a novel Bayesian semiparametric methodology for testing the goodness of fit of parametric logistic regression with continuous measurement data. The testing procedures hold for any cutoff threshold, and the approach simultaneously provides semiparametric risk estimation. Bayes factors were calculated using the Savage–Dickey ratio to test the null hypothesis of logistic regression against a semiparametric generalization. They proposed two approaches to testing, one fully Bayesian and one computationally efficient empirical Bayesian. They also presented methods for semiparametric estimation of risks and odds ratios when parametric logistic regression fails. Theoretical results established the consistency of the empirical Bayes test. Results from simulated data showed that the approach provides accurate inference whether or not the parametric assumptions hold. An evaluation of risk factors for obesity showed that different inferences are derived from a real dataset when deviations from a logistic distribution are permitted in a flexible semiparametric framework.

In this study, Bayesian methods were used to estimate the parameters, with the coefficient of each predictor variable allowed to vary across the outcome categories. This made it possible to estimate the effect of each risk factor on each outcome of prostate cancer screening. Thus, the generalized Bayesian ordinal logistic model was used to estimate the parameters for each outcome of prostate cancer.

3. Methodology

Since McCullagh’s 1980 research paper on regression for ordinal data, cumulative-type models have become a standard tool in ordinal regression [21]. The ordered logit model is a regression model for an ordinal response variable. It is based on the cumulative probabilities of the response variable. In particular, the logit of each cumulative probability is assumed to be a linear function of the covariates with constant or varying coefficients across response categories [22].
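The following is a sketch of the cumulative-logit idea in the constant-coefficient case. It assumes the threshold parameterization $P(Y \ge j) = F(\eta - \tau_j)$, with one linear predictor $\eta = x^\top\beta$ shared by all cut points. Under that assumption, each category probability is the difference of two adjacent cumulative probabilities. The function names and numbers are illustrative only and are not taken from the study.

```python
import math


def logistic(t: float) -> float:
    """Standard logistic CDF F(t) = 1 / (1 + exp(-t)), written to avoid overflow."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)


def po_category_probs(eta: float, thresholds: list) -> list:
    """Category probabilities P(Y = 0), ..., P(Y = J - 1) under proportional odds.

    Assumes P(Y >= j) = F(eta - tau_j) with one linear predictor eta = x'beta
    shared by every cut point and strictly increasing thresholds tau_1 < ... < tau_{J-1}.
    """
    if any(lo >= hi for lo, hi in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative probabilities P(Y >= j) for j = 0, ..., J, padded with 1 and 0.
    at_least = [1.0] + [logistic(eta - tau) for tau in thresholds] + [0.0]
    # P(Y = j) = P(Y >= j) - P(Y >= j + 1)
    return [at_least[j] - at_least[j + 1] for j in range(len(thresholds) + 1)]


probs = po_category_probs(eta=0.5, thresholds=[-1.0, 0.5, 2.0])
print([round(p, 3) for p in probs])  # [0.182, 0.318, 0.318, 0.182]
```

Because a single $\eta$ is shared by all cut points, raising $\eta$ shifts every cumulative probability in the same direction. This is the restriction that the generalized model used in this study relaxes.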

In this study, the outcomes of prostate cancer screening are ordinal, and the predictor variables are continuous or categorical. The risk factors considered by those who presented for prostate cancer screening are:

(i) age (continuous);
(ii) traces of prostate cancer in the family, i.e., family history (binary, with no = 0 or yes = 1);
(iii) weight control (ordinal, with underweight = 0, normal weight = 1, overweight = 2, and obese = 3);
(iv) hereditary breast and ovarian cancer syndrome (binary, with no = 0 or yes = 1).

The screening outcome of the $i$th individual is therefore given by

$$Y_i = \begin{cases} 0, & \text{no prostate cancer},\\ 1, & \text{early detection},\\ 2, & \text{late detection},\\ 3, & \text{advanced-stage detection}. \end{cases} \tag{1}$$

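The effects of the risk factors on each outcome are estimated by a generalized Bayesian ordinal logistic model. The model is based on the coefficient parameters $\beta_{kj}$, for $k = 1, \dots, p$ and $j = 1, \dots, J - 1$, where $p = 4$ is the number of risk factors and $J = 4$ is the number of prostate cancer outcomes. This implies a $p \times (J - 1)$ matrix of coefficient parameters $B = (\beta_1, \beta_2, \beta_3)$, where $\beta_j = (\beta_{1j}, \dots, \beta_{pj})^\top$. Here $x_{ik}$ is the $k$th risk factor considered by the $i$th individual for prostate cancer screening. The risk factors of the $n$ individuals form an $n \times p$ matrix as follows:

$$X = \begin{pmatrix} x_1^\top \\ x_2^\top \\ \vdots \\ x_n^\top \end{pmatrix} = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}. \tag{2}$$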
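The following is a minimal sketch of how the risk-factor coding described for this study could be turned into the rows of $X$. The dictionary keys (`age`, `family_history`, `weight_control`, `hboc_syndrome`) are hypothetical. Weight control is entered as a single 0–3 score, as its ordinal coding suggests, rather than as dummy variables.

```python
# Hypothetical field names; the codes follow the study's description of the risk factors.
YES_NO = {"no": 0, "yes": 1}
WEIGHT_CONTROL = {"underweight": 0, "normal": 1, "overweight": 2, "obese": 3}


def encode_risk_factors(record: dict) -> list:
    """Return the row x_i = (x_i1, x_i2, x_i3, x_i4) for one screened individual."""
    return [
        float(record["age"]),                             # age, continuous
        float(YES_NO[record["family_history"]]),          # family history: no = 0, yes = 1
        float(WEIGHT_CONTROL[record["weight_control"]]),  # ordinal score 0..3
        float(YES_NO[record["hboc_syndrome"]]),           # HBOC syndrome: no = 0, yes = 1
    ]


def design_matrix(records: list) -> list:
    """Stack the encoded rows into the n x p matrix X (here p = 4)."""
    return [encode_risk_factors(r) for r in records]


records = [
    {"age": 62, "family_history": "yes", "weight_control": "overweight", "hboc_syndrome": "no"},
    {"age": 55, "family_history": "no", "weight_control": "normal", "hboc_syndrome": "yes"},
]
X = design_matrix(records)
print(X)  # [[62.0, 1.0, 2.0, 0.0], [55.0, 0.0, 1.0, 1.0]]
```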

The generalized ordinal logistic regression model for each outcome of prostate cancer is given by

$$P(Y_i \ge j \mid x_i) = F\left(x_i^\top \beta_j - \tau_j\right) = \frac{\exp\left(x_i^\top \beta_j - \tau_j\right)}{1 + \exp\left(x_i^\top \beta_j - \tau_j\right)}, \qquad j = 1, 2, 3, \tag{3}$$

where $F(\cdot)$ is the logistic (inverse-logit) function and $\tau_j$ is the threshold for the $j$th prostate cancer outcome. The subscript $j$ on the parameters indexes the prostate cancer outcomes. Since there are four outcomes, the generalized Bayesian ordinal logistic model consists of a series of three binary regression models:

$$\begin{aligned} \operatorname{logit} P(Y_i \ge 1 \mid x_i) &= x_i^\top \beta_1 - \tau_1,\\ \operatorname{logit} P(Y_i \ge 2 \mid x_i) &= x_i^\top \beta_2 - \tau_2,\\ \operatorname{logit} P(Y_i \ge 3 \mid x_i) &= x_i^\top \beta_3 - \tau_3, \end{aligned} \tag{4}$$

where the three sets of coefficient parameters $\beta_1$, $\beta_2$, and $\beta_3$ are defined as follows:
(i) $\beta_1$ are the coefficient parameters of the risk factors in the model for whether or not an individual has prostate cancer. That is, it compares no prostate cancer with being in one of the early, late, or advanced stages (0 versus 1, 2, and 3).
(ii) $\beta_2$ are the coefficient parameters of the risk factors for individuals with no prostate cancer or early-stage disease versus the later stages (0 and 1 versus 2 and 3).
(iii) $\beta_3$ are the coefficient parameters of the risk factors for individuals in the advanced stage versus the other stages (0, 1, and 2 versus 3).
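The following sketch computes the four category probabilities implied by the three binary splits when each split $j$ has its own coefficient vector $\beta_j$, i.e., $P(Y_i \ge j \mid x_i) = F(x_i^\top\beta_j - \tau_j)$. The parameter values are hypothetical. One known property of the generalized model is visible here. Because the slopes differ across splits, the cumulative probabilities can cross for extreme covariate values, which would give a negative category probability.

```python
import math


def logistic(t: float) -> float:
    """Standard logistic CDF F(t), written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)


def gologit_probs(x: list, betas: list, taus: list) -> list:
    """P(Y = 0), ..., P(Y = 3) from three binary splits with split-specific slopes.

    betas[j - 1] and taus[j - 1] belong to the split "Y >= j" (j = 1, 2, 3), so
    P(Y >= j | x) = F(x'beta_j - tau_j) with a separate coefficient vector per split.
    """
    at_least = [1.0]
    for beta_j, tau_j in zip(betas, taus):
        eta_j = sum(b * v for b, v in zip(beta_j, x))
        at_least.append(logistic(eta_j - tau_j))
    at_least.append(0.0)
    probs = [at_least[j] - at_least[j + 1] for j in range(len(taus) + 1)]
    # Because the slopes differ by split, the cumulative curves can cross for some x;
    # that shows up as a negative "probability", so report it instead of hiding it.
    if min(probs) < 0.0:
        raise ValueError(f"cumulative probabilities cross at x = {x}: {probs}")
    return probs


# Hypothetical parameters for p = 2 covariates, chosen only for illustration.
betas = [[0.8, 2.0], [0.5, 1.5], [0.3, 1.0]]
taus = [-0.5, 0.5, 1.5]
print(gologit_probs([1.0, 0.0], betas, taus))
```

With `x = [-10.0, 0.0]` the same parameters make $P(Y \ge 2)$ exceed $P(Y \ge 1)$, and the function raises an error rather than returning an invalid distribution.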

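The latent propensity of the generalized Bayesian ordinal regression model in equation (4) follows a series of logistic distributions with conditional mean matrix $XB$, where $B = (\beta_1, \beta_2, \beta_3)$. There is therefore a series of latent continuous random variables $y^*_{ij}$, one for each binary split $j$. In the proportional-odds special case ($\beta_1 = \beta_2 = \beta_3$), a single latent variable $y^*_i$ suffices. In that case $Y_i$ is observed such that $Y_i = j$ if $\tau_j < y^*_i \le \tau_{j+1}$, with $\tau_0 = -\infty$ and $\tau_4 = +\infty$. More specifically, a latent propensity variable is used as the basis for modeling the ordered ranking of the individual's prostate cancer screening outcome. It is assumed to be a linear function of the covariates $x_i$. In this study, therefore, the series of latent variables for the screening outcomes is given by

$$y^*_{ij} = x_i^\top \beta_j + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim \operatorname{Logistic}(0, 1), \qquad j = 1, 2, 3, \tag{5}$$

where the screening outcome of the $i$th individual is defined through

$$Y_i \ge j \iff y^*_{ij} > \tau_j, \qquad j = 1, 2, 3. \tag{6}$$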
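The following check of the latent-variable reading is numerical. It draws $y^*_{ij} = x_i^\top\beta_j + \varepsilon_{ij}$ with standard logistic noise and compares the empirical rate of $y^*_{ij} > \tau_j$ with $F(x_i^\top\beta_j - \tau_j)$. The two should agree up to Monte Carlo error. The values used for $x_i^\top\beta_j$ and $\tau_j$ are hypothetical.

```python
import math
import random


def logistic(t: float) -> float:
    """Standard logistic CDF F(t), written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)


def draw_standard_logistic(rng: random.Random) -> float:
    """Inverse-CDF draw from Logistic(0, 1): log(u / (1 - u)) with u ~ U(0, 1)."""
    u = rng.random()
    while u == 0.0:  # random() may return 0.0, and log(0) is undefined
        u = rng.random()
    return math.log(u / (1.0 - u))


def exceedance_rate(eta_j: float, tau_j: float, n_draws: int, seed: int = 1) -> float:
    """Monte Carlo estimate of P(y*_ij > tau_j) when y*_ij = eta_j + logistic noise."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_draws) if eta_j + draw_standard_logistic(rng) > tau_j)
    return hits / n_draws


eta_1, tau_1 = 0.8, -0.5  # hypothetical x_i'beta_1 and tau_1
print(exceedance_rate(eta_1, tau_1, 200_000), logistic(eta_1 - tau_1))
```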

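Following the Albert and Chib [23] joint posterior model for ordered multinomial responses, the joint distribution of the parameters and the latent variables in this study is given by

$$p(\beta, \tau, y^* \mid Y) \propto p(\beta, \tau) \prod_{i=1}^{n} \prod_{j=1}^{3} f\left(y^*_{ij} - x_i^\top \beta_j\right) \mathbb{1}\{y^*_{ij} > \tau_j\}^{z_{ij}} \, \mathbb{1}\{y^*_{ij} \le \tau_j\}^{1 - z_{ij}}, \tag{7}$$

where $f(\cdot)$ is the standard logistic density and $z_{ij} = \mathbb{1}(Y_i \ge j)$. For the model parameters $\beta_j$ and $\sigma^2$, with $j = 1, 2, 3$, the following priors are considered:

$$\beta_j \mid \sigma^2 \sim N_p\left(\mu_j, \sigma^2 I_p\right), \qquad \sigma^2 \sim \operatorname{IG}(a, b). \tag{8}$$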

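To obtain the conditional distribution of the parameter estimates $\beta_j$, the model parameters and the latent variables are combined in the following joint posterior distribution:

$$p(\beta, \sigma^2, \tau, y^* \mid Y) \propto L(\beta, \tau; y^*, Y) \left[\prod_{j=1}^{3} p(\beta_j \mid \sigma^2)\right] p(\sigma^2)\, p(\tau), \tag{9}$$

where $L(\beta, \tau; y^*, Y)$ is the likelihood function of the model parameters. This is the product over $i$ and $j$ of the standard logistic densities $f(y^*_{ij} - x_i^\top \beta_j)$, restricted to latent values consistent with the observed outcomes. Therefore, the fully conditional distribution of $\beta_j$ is given by

$$p(\beta_j \mid y^*, \sigma^2, \tau, Y) \propto \prod_{i=1}^{n} f\left(y^*_{ij} - x_i^\top \beta_j\right) \exp\left\{-\frac{(\beta_j - \mu_j)^\top (\beta_j - \mu_j)}{2\sigma^2}\right\}. \tag{10}$$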
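The study does not show how the full conditional of $\beta_j$ is sampled when the latent errors are logistic. The sketch below swaps in one generic option purely as an illustration; it is not the authors' stated method. It is a random-walk Metropolis step targeting $p(\beta_j \mid \cdot) \propto \prod_i f(y^*_{ij} - x_i^\top\beta_j)\exp\{-(\beta_j-\mu_j)^\top(\beta_j-\mu_j)/(2\sigma^2)\}$, where $f$ is the standard logistic density. The data, prior mean, variance, and step size are all hypothetical.

```python
import math
import random


def log_logistic_pdf(r: float) -> float:
    """log f(r) for the standard logistic density f(r) = exp(-r) / (1 + exp(-r))**2."""
    a = abs(r)  # f is symmetric, and using |r| keeps exp() from overflowing
    return -a - 2.0 * math.log1p(math.exp(-a))


def log_full_conditional(beta_j, ystar_j, X, mu_j, sigma2):
    """Unnormalised log p(beta_j | y*, sigma^2, tau, Y) for one split j."""
    loglik = sum(
        log_logistic_pdf(y - sum(b * v for b, v in zip(beta_j, row)))
        for y, row in zip(ystar_j, X)
    )
    logprior = -sum((b - m) ** 2 for b, m in zip(beta_j, mu_j)) / (2.0 * sigma2)
    return loglik + logprior


def metropolis_step(beta_j, ystar_j, X, mu_j, sigma2, step, rng):
    """One random-walk Metropolis update of beta_j; returns (new_beta, accepted)."""
    proposal = [b + rng.gauss(0.0, step) for b in beta_j]
    log_ratio = (log_full_conditional(proposal, ystar_j, X, mu_j, sigma2)
                 - log_full_conditional(beta_j, ystar_j, X, mu_j, sigma2))
    # Accept with probability min(1, exp(log_ratio)); 1 - random() lies in (0, 1].
    if math.log(1.0 - rng.random()) < log_ratio:
        return proposal, True
    return list(beta_j), False


rng = random.Random(0)
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]  # hypothetical design rows
ystar_1 = [0.2, 1.1, 2.3]                  # hypothetical latent draws y*_i1
beta_1, mu_1 = [0.0, 0.0], [0.0, 0.0]
for _ in range(1000):
    beta_1, _accepted = metropolis_step(beta_1, ystar_1, X, mu_1, sigma2=4.0, step=0.5, rng=rng)
print(beta_1)
```

In a full Gibbs-within-Metropolis sampler, this step would alternate with the updates of the latent variables, $\sigma^2$, and the thresholds.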

To prove equation (10), the chosen priors of the parameters in equation (8) and that of (assumed diffuse), are used whereby

Let in the equation. This implies thatwhere

Thus,

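On the other hand, with the defined prior for $\sigma^2$, its fully conditional distribution is derived as follows.

Since $\sigma^2$ follows an inverse gamma distribution with parameters $a$ and $b$, its prior is given by

$$p(\sigma^2) = \frac{b^{a}}{\Gamma(a)} \left(\sigma^2\right)^{-(a+1)} \exp\left(-\frac{b}{\sigma^2}\right), \qquad \sigma^2 > 0.$$

Under the normal prior $\beta_j \mid \sigma^2 \sim N_p(\mu_j, \sigma^2 I_p)$ for $j = 1, 2, 3$, the fully conditional distribution of $\sigma^2$ is given by

$$p(\sigma^2 \mid \beta, y^*, \tau, Y) \propto \left(\sigma^2\right)^{-(a+1)} e^{-b/\sigma^2} \prod_{j=1}^{3} \left(\sigma^2\right)^{-p/2} \exp\left\{-\frac{(\beta_j - \mu_j)^\top(\beta_j - \mu_j)}{2\sigma^2}\right\} = \left(\sigma^2\right)^{-\left(a + \frac{3p}{2} + 1\right)} \exp\left\{-\frac{1}{\sigma^2}\left[b + \frac{1}{2}\sum_{j=1}^{3} (\beta_j - \mu_j)^\top(\beta_j - \mu_j)\right]\right\},$$

which is the kernel of an inverse gamma distribution with

$$a^* = a + \frac{3p}{2}, \qquad b^* = b + \frac{1}{2}\sum_{j=1}^{3} (\beta_j - \mu_j)^\top(\beta_j - \mu_j),$$

and hence $\sigma^2 \mid \beta, y^*, \tau, Y \sim \operatorname{IG}(a^*, b^*)$.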
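The inverse gamma update translates directly into a sampling step. The sketch assumes the prior $\beta_j \mid \sigma^2 \sim N_p(\mu_j, \sigma^2 I_p)$. It computes $a^*$ as $a$ plus half the number of coefficients, and $b^*$ as $b$ plus half the sum of squared deviations of the $\beta_j$ from the $\mu_j$. It then draws $\sigma^2$ as the reciprocal of a gamma variate. The coefficient values and hyperparameters are hypothetical.

```python
import random


def sigma2_posterior_params(a, b, betas, mus):
    """(a*, b*) for sigma^2 | beta ~ IG(a*, b*), assuming beta_j | sigma^2 ~ N_p(mu_j, sigma^2 I_p)."""
    n_coef = sum(len(beta_j) for beta_j in betas)  # (J - 1) * p = 3p here
    ss = sum(
        (bk - mk) ** 2
        for beta_j, mu_j in zip(betas, mus)
        for bk, mk in zip(beta_j, mu_j)
    )
    return a + n_coef / 2.0, b + ss / 2.0


def draw_sigma2(a_star, b_star, rng):
    """Draw from IG(a*, b*): if G ~ Gamma(shape=a*, rate=b*), then 1 / G ~ IG(a*, b*)."""
    # random.gammavariate(alpha, beta) uses beta as a SCALE, so pass 1 / rate.
    return 1.0 / rng.gammavariate(a_star, 1.0 / b_star)


betas = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]  # hypothetical beta_1, beta_2, beta_3 (p = 2)
mus = [[0.0, 0.0]] * 3                        # hypothetical prior means mu_j
a_star, b_star = sigma2_posterior_params(2.0, 1.0, betas, mus)
print(a_star, b_star)  # 5.0 2.25
rng = random.Random(42)
print(draw_sigma2(a_star, b_star, rng))
```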

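For the other parameters in the model, the fully conditional posterior distributions of the latent variables are independent, with

$$y^*_{ij} \mid \beta_j, \tau_j, Y_i \sim \begin{cases} \operatorname{Logistic}\left(x_i^\top \beta_j, 1\right) \text{ truncated to } (\tau_j, \infty), & Y_i \ge j,\\ \operatorname{Logistic}\left(x_i^\top \beta_j, 1\right) \text{ truncated to } (-\infty, \tau_j], & Y_i < j. \end{cases}$$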
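The latent-variable step amounts to drawing from a logistic distribution truncated to one side of $\tau_j$. The following is a minimal inverse-CDF sketch. It assumes unit scale for the logistic errors, and the numbers are hypothetical.

```python
import math
import random


def logistic(t: float) -> float:
    """Standard logistic CDF F(t), written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)


def sample_truncated_logistic(mean, lower, upper, rng):
    """Inverse-CDF draw from Logistic(mean, 1) restricted to (lower, upper].

    Adequate when the bounds are within a few dozen units of the mean; farther out,
    F rounds to 0 or 1 in double precision and a tail-specific sampler is needed.
    """
    p_lo = 0.0 if lower == -math.inf else logistic(lower - mean)
    p_hi = 1.0 if upper == math.inf else logistic(upper - mean)
    u = p_lo + (p_hi - p_lo) * rng.random()
    u = min(max(u, 1e-300), 1.0 - 1e-16)  # keep log(u / (1 - u)) finite
    return mean + math.log(u / (1.0 - u))


def update_latent(eta_ij, tau_j, y_i, j, rng):
    """Redraw y*_ij given eta_ij = x_i'beta_j, the threshold tau_j and the outcome Y_i."""
    if y_i >= j:  # observed Y_i >= j requires y*_ij > tau_j
        return sample_truncated_logistic(eta_ij, tau_j, math.inf, rng)
    return sample_truncated_logistic(eta_ij, -math.inf, tau_j, rng)  # y*_ij <= tau_j


rng = random.Random(7)
print(update_latent(0.3, 1.0, y_i=2, j=1, rng=rng))  # a draw above tau_1 = 1.0
```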

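The fully conditional density of $\tau_j$ given the other parameters is, up to a proportionality constant,

$$p(\tau_j \mid y^*, \beta, \tau_{-j}, Y) \propto \mathbb{1}\{\tau_{j-1} < \tau_j < \tau_{j+1}\} \prod_{i:\,Y_i \ge j} \mathbb{1}\{y^*_{ij} > \tau_j\} \prod_{i:\,Y_i < j} \mathbb{1}\{y^*_{ij} \le \tau_j\},$$

with $\tau_0 = -\infty$ and $\tau_4 = +\infty$. This density is uniform on the interval

$$\left(\max\left\{\tau_{j-1},\ \max_{i:\,Y_i < j} y^*_{ij}\right\},\ \min\left\{\tau_{j+1},\ \min_{i:\,Y_i \ge j} y^*_{ij}\right\}\right).$$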
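Once the interval is known, the threshold step is a uniform draw. The sketch below makes two assumptions: the thresholds are kept ordered, with $\tau_0 = -\infty$ and $\tau_4 = +\infty$, and $Y_i \ge j$ corresponds to $y^*_{ij} > \tau_j$. When the interval is empty or unbounded, it raises an error instead of guessing. The latent values are hypothetical.

```python
import math
import random


def update_threshold(ystar_j, y, j, tau_below, tau_above, rng):
    """Draw tau_j from its uniform full conditional.

    ystar_j[i] holds y*_ij, y[i] the observed outcome Y_i (coded 0..3), and
    tau_below / tau_above are tau_{j-1} / tau_{j+1} (use -inf / +inf at the ends).
    """
    must_be_below = [s for s, y_i in zip(ystar_j, y) if y_i < j]   # y*_ij <= tau_j
    must_be_above = [s for s, y_i in zip(ystar_j, y) if y_i >= j]  # y*_ij >  tau_j
    lo = max([tau_below] + must_be_below)
    hi = min([tau_above] + must_be_above)
    if not lo < hi:
        raise ValueError("empty interval: latent draws are inconsistent with Y")
    if math.isinf(lo) or math.isinf(hi):
        raise ValueError("unbounded interval: a diffuse prior alone cannot fix tau_j here")
    return rng.uniform(lo, hi)


rng = random.Random(3)
ystar_1 = [-1.0, 0.2, 0.9, 2.0]  # hypothetical y*_i1 for four individuals
y = [0, 0, 1, 2]
print(update_threshold(ystar_1, y, j=1, tau_below=-math.inf, tau_above=math.inf, rng=rng))
```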

4. Results

In the study, $y^*_{i1}$, $y^*_{i2}$, and $y^*_{i3}$ are the latent variables for three models, respectively:

(i) individuals with negative versus positive prostate cancer outcomes;
(ii) individuals with no prostate cancer or early-stage disease versus those in the late or advanced stages;
(iii) individuals in the advanced stage versus lower stages of prostate cancer.

The solution for the coefficient parameters was then obtained, as shown in equation (13). In the solution, $\hat{\beta}_j$ represents the coefficient parameter estimates of the risk factors in the generalized Bayesian ordinal logistic model and $I$ is the identity matrix. The term $\mu_j/\sigma^2$ is the coefficient parameter mean for each outcome divided by the common variance $\sigma^2$.

The study assumed a normal prior on all of the fixed effects. For positive coefficients, $\beta_{kj} > 0$, higher values of the explanatory variable increase the chance that the respondent will be in a higher category of the dependent variable than the current one. Negative coefficients, $\beta_{kj} < 0$, signify that higher values of the explanatory variable increase the likelihood of being in the current or a lower category. Fitting the model to the data gave the coefficient parameter estimates presented in Table 1.

Posterior probabilities were used to test hypotheses about the risk factors and to identify those significantly affecting the outcome of prostate cancer. A one-sided hypothesis was considered supported when its posterior probability exceeded 95%. A two-sided hypothesis was considered supported when the value tested against lay outside the 95% credible interval. In addition, the posterior probabilities of point hypotheses assumed equal prior probabilities. For instance, age is the only variable that is not significant in the model for the split 0 vs. 1, 2, and 3. In the split 0 and 1 vs. 2 and 3, all of the variables are significant. In the split 0, 1, and 2 vs. 3, only weight control is not significant.
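Given posterior draws of a coefficient, the decision rules just described take only a few lines. The sketch below uses an equal-tailed 95% credible interval and a hypothetical, deterministic set of draws. It illustrates the rule and is not the software used in the study.

```python
def quantile(sorted_draws: list, q: float) -> float:
    """Linear-interpolation quantile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_draws) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_draws) - 1)
    frac = pos - lo
    return sorted_draws[lo] * (1.0 - frac) + sorted_draws[hi] * frac


def one_sided_support(draws: list, value: float = 0.0, level: float = 0.95):
    """Posterior P(beta > value) and whether it exceeds `level`."""
    prob = sum(d > value for d in draws) / len(draws)
    return prob, prob > level


def two_sided_support(draws: list, value: float = 0.0, level: float = 0.95):
    """Equal-tailed credible interval and whether `value` lies outside it."""
    s = sorted(draws)
    alpha = (1.0 - level) / 2.0
    lower, upper = quantile(s, alpha), quantile(s, 1.0 - alpha)
    return (lower, upper), not (lower <= value <= upper)


# Hypothetical posterior draws for one coefficient (not output from the study).
draws = [0.1 * k for k in range(-5, 96)]
print(one_sided_support(draws))  # about 0.94: below the 95% cut-off
print(two_sided_support(draws))  # interval about (-0.25, 9.25) contains 0
```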

From the coefficient estimates, age had a positive coefficient of 0.83, indicating that an increase in age increases the chances of an individual being in the early, late, or advanced stage of prostate cancer. Similarly, traces of family history and hereditary breast and ovarian cancer syndrome had coefficients of 2.82 and 0.76, respectively. This indicates that individuals with a family history of prostate cancer, or with hereditary breast and ovarian cancer syndrome, are more likely to have the disease. However, the coefficient of weight control was −1.56. This indicates that individuals in higher weight-control categories (coded from underweight = 0 to obese = 3) were more likely not to have prostate cancer. The results for the other categories can be interpreted from the signs of their coefficient parameters.
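Under the logit link, each coefficient also has an odds-ratio reading. This assumes the quoted estimates are on the logit scale of the model for the split 0 vs. 1, 2, and 3. In that case $e^{\beta}$ is the multiplicative change in the odds of being in a higher category per one-unit increase in the covariate, holding the others fixed. That gives roughly 2.29 per unit of age (in whatever units age was recorded), 16.78 for family history, 2.14 for hereditary breast and ovarian cancer syndrome, and 0.21 per weight-control category. As reported above, age was not significant in this split.

```python
import math

# Coefficient estimates quoted in the text for the split 0 vs. 1, 2, and 3.
coefficients = {
    "age": 0.83,
    "family_history": 2.82,
    "hboc_syndrome": 0.76,
    "weight_control": -1.56,
}
# Under the logit link, exp(beta) multiplies the odds of Y >= 1 (vs. Y = 0)
# per one-unit increase in the covariate, holding the other covariates fixed.
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}
for name, ratio in odds_ratios.items():
    print(f"{name:15s} beta = {coefficients[name]:+.2f}  odds ratio = {ratio:.2f}")
```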

The graphical representation of the parameters in this category is shown in Figure 1. The figure shows the posterior distributions of the intercepts and coefficient parameters, together with their sampled values in each of the four chains. The marginal effects, on the other hand, describe the average effect of a change in an explanatory variable (i.e., a risk factor) on the probability of each prostate cancer outcome. They also provide a direct and easily interpreted answer to research questions. In this research, Figure 2 illustrates the marginal effect for the model of individuals in advanced versus lower stages of prostate cancer, with the outcome categories labelled 1, 2, 3, and 4.
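A marginal effect of this kind can be approximated by a finite difference averaged over individuals. The sketch below does this for the probability of the advanced stage, $P(Y_i = 3 \mid x_i) = F(x_i^\top\beta_3 - \tau_3)$, using hypothetical rows and parameters. This is one common way to define an average marginal effect and is not necessarily how Figure 2 was produced.

```python
import math


def logistic(t: float) -> float:
    """Standard logistic CDF F(t), written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)


def prob_advanced(x, beta_3, tau_3):
    """P(Y = 3 | x) = F(x'beta_3 - tau_3): the top split of the generalized model."""
    return logistic(sum(b * v for b, v in zip(beta_3, x)) - tau_3)


def average_marginal_effect(X, k, beta_3, tau_3, delta=1.0):
    """Average change in P(Y = 3) when covariate k is raised by `delta` for every row."""
    total = 0.0
    for row in X:
        shifted = list(row)
        shifted[k] += delta
        total += prob_advanced(shifted, beta_3, tau_3) - prob_advanced(row, beta_3, tau_3)
    return total / len(X)


# Hypothetical rows (age, family history, weight control, HBOC) and split-3 parameters.
X = [[60.0, 1.0, 2.0, 0.0], [52.0, 0.0, 1.0, 1.0], [70.0, 0.0, 3.0, 0.0]]
beta_3, tau_3 = [0.05, 1.2, -0.4, 0.6], 4.0
for k, name in enumerate(["age", "family_history", "weight_control", "hboc_syndrome"]):
    print(name, round(average_marginal_effect(X, k, beta_3, tau_3), 4))
```

Because the logistic function is increasing, each average marginal effect has the same sign as the corresponding coefficient in $\beta_3$. This matches the way the slopes in Figure 2 are read in the text.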

Figure 2 shows that, as age increases, the predicted prostate cancer outcome also increases, indicating that older individuals are more likely to be at an advanced stage of the disease. The same pattern holds for traces of family history and for hereditary breast and ovarian cancer syndrome. The remaining variable, weight control, had a negative coefficient (slope), so its curve is decreasing.

5. Discussion

The findings gave the specific effects of the risk factors on the prostate cancer outcomes. Every outcome of prostate cancer screening was taken into account in the analysis. The first model estimated the risk factors distinguishing individuals with no prostate cancer from those with prostate cancer. The second estimated the risk factors distinguishing individuals with no prostate cancer or early-stage detection from those in the late or advanced stages. The final model compared individuals in the advanced stage with those in lower stages of prostate cancer.

The effect of weight control indicated that individuals with more weight were less likely to have prostate cancer. The other risk factors, namely family history, hereditary breast and ovarian cancer syndrome, and age, positively affected the likelihood of individuals having prostate cancer. All risk factors had a positive effect on being in the late or advanced stages versus the lower stages of prostate cancer. It was also observed that age, traces of family history, and hereditary breast and ovarian cancer syndrome were the variables that positively affected being in the advanced stage.

Data Availability

The data used for analysis were generated using R software.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The study was funded by the authors.