Impact of Using Double Positive Samples in Deming Regression

Adarkwa, Samuel Akwasi; Owusu, Frank Kofi; Okyere, Samuel

doi:https://doi.org/10.1155/2022/3984857

International Journal of Mathematics and Mathematical Sciences


Research Article | Open Access

Volume 2022 | Article ID 3984857 | https://doi.org/10.1155/2022/3984857


Academic Editor: Niansheng Tang

Received 14 Jul 2022

Revised 20 Jul 2022

Accepted 22 Jul 2022

Published 12 Aug 2022

Abstract

In the method comparison approach, both measurement methods are subject to error. The classical (linear) regression approach cannot be used for the analysis because it may yield biased and inefficient estimates. In view of that, the Deming regression is preferred over the classical regression. The focus of this work is to assess the impact of censored data on the traditional Deming regression, which deletes the censored observations, compared with an adapted version of the Deming regression that takes the censored data into account. The study was based on simulation, with the SAS procedure NLMIXED used to analyse the data. Eight different simulation settings were run, each made up of 100 datasets with 300 observations. The simulation studies suggest that the traditional Deming regression, which deletes censored observations, gives biased estimates and low coverage, whereas the adapted Deming regression, which takes censoring into account, gives estimates that are close to the true values and high coverage. When the analytical error ratio is misspecified, the estimates are likewise biased and unreliable.

1. Introduction

A biological assay (bioassay) is a scientific experiment in which a substance of interest is introduced to a living organism to assess its effects. In the area of drug development, a quantitative bioassay is one in which the effects of the substance are quantified. The quantitative bioassay is mainly applied in drug development and environmental pollution assessment [1]. For a firm to receive approval from the Food and Drug Administration (FDA) for a new medical measurement device or method, the firm must show that the new method is as accurate as the old standard method (gold standard method) [2].

As an example, suppose a pharmaceutical company has two methods, X and Y, that could be used to measure the CD4+ count in the blood of HIV patients. If there are measurement errors associated with one method, which may be due to calibration or to the way the scientist handled the device while taking the measurement, the classical (traditional) regression approach is normally employed. In this approach, the least-squares (LS) method is usually used to find “good” estimators of the regression parameters, the intercept and the slope [3]. The LS method assumes that the measurements of one of the methods are without random error; i.e., in the most familiar setting, X is measured without error and Y is a linear function of X plus a random measurement error, which is conventionally modeled by a normal distribution [4].

However, suppose that in this same pharmaceutical company a straight line is to be fitted to two variables, X and Y, both of which are measured with error. A statistician would want more clarification on the dataset: how it was collected and why there are errors in both X and Y. The scientist may be surprised by such questions, because to him errors in both variables may seem quite trivial, since all he needs is to see the straight line plotted through the data.

Suppose, for example, the company has in its possession a new method Y that is said to give a better reading of the CD4+ count in the blood of HIV patients. Both the new method and the old method X would be used to measure the CD4+ count of the patients. If both devices have measurement errors, fitting a straight line through the data points is not the same as when only one variable is measured with error. In such a situation, the least-squares method is not applicable, and another method that takes into account the measurement errors in both variables should be applied, such as the Deming regression, also known as the traditional Deming regression (which assumes that the measurements by both methods have random errors), the weighted Deming regression (which takes into account nonconstant analytical standard deviations for both methods, i.e., both methods are subject to proportional measurement errors), and the regression procedure based on the rank principle [5].

An extra complication occurs when the measurement methods are subject to a limit of quantitation (LoQ) below which the true amount of the substance cannot be accurately measured. According to Croghan and Egeghy [6], the LoQ is defined as “the term often used by laboratories to indicate the smallest amount that they consider to be reliably not quantifiable.” This calls for an adaptation of the Deming regression that takes the censored data into account rather than deleting them.

1.1. Study Objectives

The main aim of this project is to assess the impact of censored data on the traditional Deming regression based on the deletion of censored data compared to an adapted version that takes into account the censored nature of data.

2. Methodology

2.1. Sources of Error

Errors can arise from many sources. In their book, Good and Hardin [7] list several sources of error in statistical procedures, including the following:
(1) Using the same dataset both to formulate hypotheses and to test them
(2) Taking samples from the wrong population
(3) Failing to draw representative samples randomly
(4) Measuring the wrong variables or failing to measure what is to be measured
(5) Using wrong or inefficient statistical models
(6) Failing to validate the models used

In view of the sources of error outlined above, Good and Hardin [7] reported that three concepts are fundamental to the design of experiments and surveys, namely, variation, population, and sample. There is variability in virtually all observations; therefore, in designing an experiment or survey, we must always anticipate the possibility of errors arising from the measuring instrument and/or from the observer. According to them, a sample is any proper subset of a population. Proper means that the sample must be representative of the population, drawn at random, and reasonably large.

2.2. Measurement Error

In the traditional regression analysis, it is assumed that there is measurement error only in the dependent variable $Y$, and this dependent variable is known to relate to $X$ through a linear regression model,

$$Y = \alpha + \beta X + \varepsilon,$$

where $\varepsilon \sim N(0, \sigma_{\varepsilon}^2)$ depicts the variability of the dependent variable around the regression line. However, if there is measurement error in $X$ and $Y$, then $X$ can be replaced by $X^{*}$ with the assumption $X^{*} = X + U$, and $Y$ can be replaced by $Y^{*}$ assuming that $Y^{*} = Y + V$, where $U$ and $V$ are the measurement errors. The linear regression model relating the observed response and predictor would be assumed as

$$Y^{*} = \alpha^{*} + \beta^{*} X^{*} + \varepsilon^{*},$$

with $\varepsilon^{*} \sim N(0, \sigma_{\varepsilon^{*}}^2)$. The slope of the above model does not represent the true relationship between the response and predictor variables, which we would like to know. In view of that, Carroll et al. [8] reported that, in order to see the effect of the error on the estimated regression coefficients, the maximum likelihood estimate of a bivariate normal model with responses $X^{*}$ and $Y^{*}$ is used.
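The effect of measurement error in the predictor on the ordinary least-squares slope can be illustrated with a short simulation. The sketch below is not part of the original analysis, which was carried out in SAS; it is a minimal Python illustration assuming a true slope of 1, true values drawn from a normal distribution with mean 30 and standard deviation 3, and error standard deviations of 4 for X and 2 for Y (the values quoted in Section 4.1). The function names are hypothetical.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_attenuation(n=20000, beta=1.0, sd_true=3.0,
                         sd_x_err=4.0, sd_y_err=2.0, seed=1):
    """Compare the OLS slope on the error-free X with the slope on the noisy X."""
    rng = random.Random(seed)
    truth = [rng.gauss(30.0, sd_true) for _ in range(n)]
    x_obs = [t + rng.gauss(0.0, sd_x_err) for t in truth]   # X measured with error
    y_obs = [beta * t + rng.gauss(0.0, sd_y_err) for t in truth]
    return ols_slope(truth, y_obs), ols_slope(x_obs, y_obs)


slope_clean, slope_noisy = simulate_attenuation()
# Under these assumptions the slope on the noisy X shrinks toward
# beta * sd_true**2 / (sd_true**2 + sd_x_err**2) = 9 / 25.
attenuation_factor = 3.0 ** 2 / (3.0 ** 2 + 4.0 ** 2)
```

With error in X this large relative to the spread of the true values, the naive slope is pulled well below the true slope, which is the motivation for an errors-in-variables method.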

2.3. Censoring

Reporting values relative to an assay limit is common practice when a value cannot be analysed reliably because it lies below a certain threshold. According to Hawkins and Weckwerth [4], “the limit of quantitation (LoQ) is a value below which it is felt that the analyte cannot be measured reliably.” Censored observations are observations below a certain concentration level, with values known only to lie somewhere between zero and the laboratory detection limit. Such measurements are considered inaccurate if they are reported as a single number; therefore, they are usually reported as being less than an analytical threshold. Statisticians use the term censored observation when the observations are not quantified but are known to exceed or be less than a threshold value. In brief, values known only to be below a threshold are left censored, values known only to exceed a threshold are right censored, and values known only to lie within an interval are interval censored. The worst practice in dealing with censored observations is deleting them [9].

As a common practice, values below the threshold are left out of the analysis or, if they are left in, are marked as unreliable. However, for method comparison studies, there is no reason to select only the samples whose true values are above the threshold of the measuring instrument. The limit of quantitation may be reached in one or both of the paired observations. If the unquantifiable data are deleted, there will be a loss of information.

2.4. Statistical Software

The SAS version 9.4 software was used to fit the various models, and statistical significance was taken at a 5% level.

3. Data Simulation Procedure

3.1. Deming Regression

In the traditional Deming regression, censored data are deleted in the analysis. However, as an extension, we would like to compare the traditional Deming regression method with an adapted version that takes into account the censored nature of the data. For the purpose of the study, a linear regression model using the Deming regression was used; both the response variable and the predictor variable are continuous. The Deming regression is a method for fitting a straight line to bivariate data where the two variables are measured with error. It is different from the classical linear regression, where only the response variable has error in it. The Deming regression uses paired measurements, $(x_i, y_i)$, measured with errors, $\varepsilon_i$ and $\eta_i$, where

$$x_i = X_i + \varepsilon_i.$$

Figure 1 depicts a graphical representation of the measurement error problem with continuous data in variable x. It is assumed that a variable is measured with error as represented in the above model, with $X_i$ being the true value of the variable of interest for the ith subject and $\varepsilon_i$ the error made while recording the variable [1],

$$y_i = Y_i + \eta_i.$$

Figure 2 depicts a graphical representation of the measurement error problem with continuous data in variable y. It is assumed that a variable is measured with error as represented in the above model, with $Y_i$ being the true value of the variable of interest for the ith subject and $\eta_i$ the error made while recording the variable [1]. The paired measurements are used to estimate the intercept, $\alpha$, and the slope, $\beta$, in the equation

$$\hat{Y}_i = \alpha + \beta \hat{X}_i,$$

where $\hat{X}_i$ and $\hat{Y}_i$ are the estimates of the true values $X_i$ and $Y_i$.

3.2. Assumptions of Deming Regression

The Deming regression requires the following assumptions:
(1) The measurement errors in x and y are independent and normally distributed with expected values of zero and variances that are constant or at least proportional
(2) The measurement error variance ratio, or analytical error ratio, is constant and assumed to be known
(3) The subjects are independent of one another and are selected at random from a larger population
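When no observations are censored and the error variance ratio is known, the Deming estimates have a well-known closed form. The sketch below is a minimal Python illustration of that closed form, not the NLMIXED code used in the study. The function name `deming_fit` is hypothetical, and the parameter `var_ratio` is assumed here to be the error variance of y divided by the error variance of x; software packages differ in which ratio they ask for, so the convention should be checked before use.

```python
import math


def deming_fit(x, y, var_ratio):
    """Closed-form Deming regression for uncensored paired data.

    var_ratio is assumed to be Var(error in y) / Var(error in x).
    Returns (intercept, slope). Requires a nonzero sample covariance.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    syy = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    diff = syy - var_ratio * sxx
    slope = (diff + math.sqrt(diff ** 2 + 4.0 * var_ratio * sxy ** 2)) / (2.0 * sxy)
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Exactly collinear toy data: the fitted line should recover
# intercept 2 and slope 3 up to rounding, whatever ratio is supplied.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0 + 3.0 * v for v in x]
intercept, slope = deming_fit(x, y, var_ratio=0.25)
```

The closed form applies only to complete pairs, which is why the traditional approach has to delete censored observations before fitting.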

3.3. Simulation

The data used were simulated to follow a normal distribution and were then analysed with the SAS procedure NLMIXED. Two variables are of prime importance in the study. Random values were generated by the software, with X and Y being the variables of interest; they follow normal distributions because the measurement errors are assumed to have mean zero [10], in the form specified below,

$$X_i \sim N(\xi_i, \sigma_X^2)$$

and

$$Y_i \sim N(\alpha + \beta \xi_i, \sigma_Y^2),$$

where $\xi_i$ is the true value which both X and Y aim to measure, but it is not the parameter of interest; therefore, it is known as the nuisance parameter. The parameter $\lambda = \sigma_X / \sigma_Y$ is the ratio of the standard deviation of X to that of Y; it is assumed to be known, since it cannot be determined within the method comparison approach. From the distribution of X and Y, the measurement error of X is fixed and known, while the measurement error in Y is a multiple of it. It is then assumed that the standard deviations do not depend on $\xi_i$, i.e., we assume a constant variance [4].

The primary parameters of interest are the intercept $\alpha$ and the slope $\beta$ of the regression. The error standard deviation is in general a variance function, which may need additional parameters. However, it is assumed here to be constant for the two variables of interest, X and Y.
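The data-generating step described above can be sketched as follows. This is a minimal Python illustration rather than the SAS code used in the study. It assumes that the true value of each subject is drawn from a normal distribution with mean 30 and standard deviation 3, and that the error standard deviations of X and Y are 4 and 2 (the values given in Section 4.1); the function name and default arguments are hypothetical.

```python
import random


def simulate_dataset(n=300, alpha=0.0, beta=1.0, mean_true=30.0,
                     sd_true=3.0, sd_x=4.0, sd_y=2.0, seed=None):
    """Simulate n paired measurements (x, y) sharing a latent true value."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        true_value = rng.gauss(mean_true, sd_true)            # nuisance quantity
        x = true_value + rng.gauss(0.0, sd_x)                 # method X with error
        y = alpha + beta * true_value + rng.gauss(0.0, sd_y)  # method Y with error
        pairs.append((x, y))
    return pairs


# One simulation setting: 100 datasets of 300 observations each.
datasets = [simulate_dataset(seed=s) for s in range(100)]
```

Seeding each dataset separately keeps the runs reproducible, so the traditional and the adapted analyses can be applied to exactly the same data.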

Due to the limit of quantitation, $X$ and/or $Y$ is left censored (below the limit we cannot measure).

In the coding of the SAS procedure for the simulation, the following notation was used. Let $L_X$ be the limit of quantitation for an X measurement and $L_Y$ be the limit of quantitation for a Y measurement. In this report, when the X and Y methods have thresholds $L_X$ and $L_Y$, respectively, only a clear-cut X or Y value above the limit is reported for analysis.

The censoring indicator $\delta = 1$ means the value is above the threshold, therefore uncensored; $\delta = 0$ means the value is below the threshold, therefore censored.
(i) If $X \geq L_X$, then we observe $X$ (noncensored)
(ii) If $X < L_X$, then we observe $L_X$ (censored), so we know only that $X < L_X$; hence, the censoring indicator delta is 0
(iii) If $Y \geq L_Y$, then we observe $Y$ (noncensored)
(iv) If $Y < L_Y$, then we observe $L_Y$ (censored); therefore, the censoring indicator delta is 0

X and Y can take continuous positive and negative values. For this reason, values of X and Y are randomly selected from the normal distribution. The probability density function is used for the observed data, since it gives the likelihood of an exactly known value (i.e., $X = x$ or $Y = y$). The cumulative distribution function, however, is used for the censored data, because left-censored observations are known only to lie below the LoQ (i.e., $X < L_X$ or $Y < L_Y$).
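The censoring rule and the deletion step of the traditional approach can be sketched in a few lines. This is an illustrative Python sketch under stated assumptions, not the SAS code from the study: the function names are hypothetical, a value exactly equal to the limit is treated as uncensored, and a limit of `None` is used here to mean that the method has no limit of quantitation.

```python
def censor_left(value, loq):
    """Return (reported_value, delta); delta is 1 if uncensored, 0 if censored."""
    if loq is None or value >= loq:
        return value, 1
    return loq, 0  # only "value < loq" is known, so the limit itself is stored


def censor_pairs(pairs, loq_x, loq_y):
    """Apply left censoring to every (x, y) pair."""
    records = []
    for x, y in pairs:
        x_rep, delta_x = censor_left(x, loq_x)
        y_rep, delta_y = censor_left(y, loq_y)
        records.append((x_rep, delta_x, y_rep, delta_y))
    return records


def delete_censored(records):
    """Traditional approach: keep only pairs with both values quantified."""
    return [(x, y) for x, dx, y, dy in records if dx == 1 and dy == 1]


records = censor_pairs([(30.0, 31.0), (12.0, 28.0), (27.0, 15.0)],
                       loq_x=25.0, loq_y=20.0)
complete = delete_censored(records)
```

In this toy example two of the three pairs are lost under deletion, whereas the adapted approach would keep all three and use the indicators to choose the likelihood contribution.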

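For the log-likelihood contribution, we have the following.

When X is observed, the contribution is the logarithm of the normal density evaluated at the observed value; when X is censored, it is the logarithm of the normal cumulative distribution function evaluated at the limit of quantitation. The contributions for Y are analogous.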
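The rule stated earlier (density for an observed value, cumulative distribution for a left-censored one) can be written as a small function. The sketch below is a Python illustration under the assumption that, given the true value, each measurement is normal with a known mean and standard deviation; it is not the NLMIXED program from the study, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def loglik_contribution(reported, delta, mean, sd):
    """Log-likelihood contribution of one measurement.

    reported: the observed value if delta == 1, otherwise the LoQ.
    delta:    1 for an uncensored value, 0 for a left-censored one.
    mean, sd: mean and standard deviation of the measurement given the true value.
    """
    dist = NormalDist(mean, sd)
    if delta == 1:
        return math.log(dist.pdf(reported))   # exact value: density
    return math.log(dist.cdf(reported))       # censored: P(value < LoQ)


observed_term = loglik_contribution(30.0, 1, mean=30.0, sd=4.0)
censored_term = loglik_contribution(30.0, 0, mean=30.0, sd=4.0)
```

For a pair, the X and Y contributions are added, because the two measurement errors are assumed independent given the true value.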

3.4. Model Performance Measures

After the simulation plan was carried out and the model was fitted for every setting, performance measures were computed to check whether the results obtained are reliable enough to support statistical and biological inferences. According to Linnet [10], a number of performance measures are available, among them the bias of the slope estimate, the root mean squared error of the slope estimate, and hypothesis testing. In a broader sense, these performance measures check how well the models are performing.
(i) Bias of the slope estimate. This is the difference between the true value $\beta$ and the mean of the estimated slope values $\hat{\beta}_j$ over the $N$ simulation runs of a setting,

$$\text{Bias}(\hat{\beta}) = \bar{\hat{\beta}} - \beta, \qquad \bar{\hat{\beta}} = \frac{1}{N} \sum_{j=1}^{N} \hat{\beta}_j.$$

(ii) Root mean squared error of the slope estimate (RMSE). This is the estimate of the overall error of the slope estimate, including both the systematic part (bias) and the standard error. It measures the dispersion of $\hat{\beta}$ around the true value $\beta$ [5],

$$\text{RMSE}(\hat{\beta}) = \sqrt{\frac{1}{N} \sum_{j=1}^{N} \left(\hat{\beta}_j - \beta\right)^2}.$$

(iii) Hypothesis testing. The hypothesis is tested by comparing the observed and expected frequencies of rejection of the null hypothesis on the basis of the t-test for the slope carried out in each simulation run. The SAS macro described by Deal et al. [11] can also be used to check the performance of the results.
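The measures listed above can be computed from the per-dataset estimates and standard errors of a setting. The following Python sketch is illustrative only: the function name is hypothetical, the bias is taken as mean estimate minus true value (the sign convention is an assumption), and coverage is based on a normal-approximation interval with multiplier 1.96, which may differ from the intervals NLMIXED reports. The bias is returned on both the raw and the squared scale, since the values reported in Section 4.2 appear to be squared deviations (for instance, $(9.529 - 10)^2 \approx 0.2218$).

```python
import math


def performance(estimates, std_errors, true_value, z=1.96):
    """Summarise one parameter over the simulated datasets of a setting."""
    n = len(estimates)
    mean_est = sum(estimates) / n
    bias = mean_est - true_value
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    # Count datasets whose interval estimate +/- z * SE contains the true value.
    covered = sum(
        1 for e, s in zip(estimates, std_errors)
        if e - z * s <= true_value <= e + z * s
    )
    return {
        "bias": bias,
        "squared_bias": bias ** 2,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "coverage": covered / n,
    }


# Toy input: four slope estimates with equal standard errors, true slope 1.5.
summary = performance([1.4, 1.6, 1.5, 2.5], [0.1, 0.1, 0.1, 0.1], true_value=1.5)
```

In the toy input, the single outlying estimate drives both the bias and the mean squared error and is the only one whose interval misses the true value.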

3.5. Nonlinear Mixed Model

Stöckl et al. [12] recommended in their study that a nonlinear regression can be used for a bivariate response. The Deming regression was fitted using the NLMIXED procedure in the SAS software. This procedure fits models by numerically maximising an approximation to the marginal likelihood, that is, the likelihood integrated over the random effects. The integral is approximated by adaptive Gaussian quadrature, which is centred at the empirical Bayes estimates of the random effects; these are updated at every iteration [13]. Because this method is very efficient, one can typically obtain very satisfactory results. If convergence is achieved, the optimisation yields maximum likelihood estimates and not restricted maximum likelihood (REML) estimates.
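The idea of integrating the likelihood over the random effect can be illustrated for a single pair of measurements. The Python sketch below deliberately replaces adaptive Gaussian quadrature with a plain midpoint rule on a fixed grid, which is simpler but not what NLMIXED does. It also assumes that the latent true value plays the role of the random effect and is normal with mean 30 and standard deviation 3 (the values in Section 4.1); the function name, argument names, and grid settings are hypothetical.

```python
from statistics import NormalDist


def marginal_likelihood(x, delta_x, y, delta_y, alpha, beta, sd_x, sd_y,
                        mu=30.0, tau=3.0, nodes=2000, width=8.0):
    """Likelihood of one (x, y) pair with the latent true value integrated out.

    x, y are observed values when the matching delta is 1 and LoQs when it is 0.
    Integration uses a midpoint rule over mu +/- width * tau.
    """
    latent = NormalDist(mu, tau)
    lower = mu - width * tau
    step = 2.0 * width * tau / nodes
    total = 0.0
    for k in range(nodes):
        t = lower + (k + 0.5) * step                 # candidate true value
        dist_x = NormalDist(t, sd_x)
        dist_y = NormalDist(alpha + beta * t, sd_y)
        part_x = dist_x.pdf(x) if delta_x == 1 else dist_x.cdf(x)
        part_y = dist_y.pdf(y) if delta_y == 1 else dist_y.cdf(y)
        total += part_x * part_y * latent.pdf(t) * step
    return total


# X observed at 28; Y censored at a limit so high that it carries no information.
lik = marginal_likelihood(28.0, 1, 1e6, 0, alpha=0.0, beta=1.0, sd_x=4.0, sd_y=2.0)
```

In the usage line only X is informative, so the result should agree with the density at 28 of a normal distribution with mean 30 and standard deviation 5, which gives a simple check on the integration.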

4. Results and Discussions

4.1. Description of Simulation Settings

Different settings were used to produce the results for this report, obtained by altering the various parameters in the model. The mean of the true value is set at 30 with a standard deviation of 3, and the standard deviations of Y and X are set at 2 and 4, respectively. In addition, 100 simulated datasets, each with 300 observations, were used in every simulation study in the report. Eight different settings were considered. In the first setting, the LoQ values for X and Y were 29 and 35, with all other values unchanged; in the second setting, the LoQ values for X and Y were 25 and 20; in the third setting, the LoQ applied only to X, at 25; and in the fourth setting, the LoQ applied only to Y, at 20. In settings 5 to 8, these four settings were repeated with the analytical error ratio misspecified at a value of 2.
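The share of observations expected to fall below a given LoQ follows directly from these settings. The Python sketch below is a worked calculation under the assumption that the true values are normal with mean 30 and standard deviation 3 and that the errors are independent and normal, so each measurement is marginally normal. The function name is hypothetical, and the example uses the intercept 0 and slope 1 of settings 2 to 4.

```python
import math
from statistics import NormalDist


def censoring_probability(loq, alpha, beta, sd_err, mean_true=30.0, sd_true=3.0):
    """P(measurement < loq) when measurement = alpha + beta * true + error."""
    mean = alpha + beta * mean_true
    sd = math.sqrt((beta * sd_true) ** 2 + sd_err ** 2)
    return NormalDist(mean, sd).cdf(loq)


# X has error SD 4 and LoQ 25: marginally N(30, 5^2), so the LoQ sits
# one standard deviation below the mean (about 16% censored).
p_x = censoring_probability(25.0, alpha=0.0, beta=1.0, sd_err=4.0)

# Y has error SD 2 and LoQ 20: marginally N(30, 13), so the LoQ sits
# almost three standard deviations below the mean (well under 1% censored).
p_y = censoring_probability(20.0, alpha=0.0, beta=1.0, sd_err=2.0)
```

Under these assumptions very few Y values fall below 20, which would be consistent with the two approaches agreeing closely when only Y is censored.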

4.2. Results and Performance Measures

4.2.1. Simulation Run for Setting 1

Table 1 is built with an intercept of 10 and a slope of 1.5, where the limits of quantitation for X and Y are 25 and 50. In the traditional Deming regression, which deletes all censored observations, the parameter estimate for the intercept deviates from the true value, with a relatively high standard deviation of 11.51 and a very large bias. The coverage of the intercept is 79%, indicating that the confidence interval contains the true value in 79 out of 100 datasets. The mean square error of the intercept is 341.73. The slope estimate is 1.931 with a standard deviation of 0.37, a considerable deviation from the true value of the slope. The bias and mean square error of the slope are 0.186 and 0.3186. The confidence interval contains the true value of the slope in only 84 out of 100 datasets, representing 84% coverage.

The other part of Table 1 shows the Deming regression that takes the censored observations into account. Here the intercept $\alpha$ has an estimate of 9.529 and the slope $\beta$ an estimate of 1.5151. The bias of the intercept is 0.2218, which equals the squared deviation of the mean estimate from the true value, $(9.529 - 10)^2$. The bias is smaller than that of the traditional Deming regression, where censored observations are deleted from the analysis. The mean square errors for $\alpha$ and $\beta$ are 24.841 and 0.0275. The confidence intervals contain the true values of $\alpha$ and $\beta$ in 92% of the datasets.

4.2.2. Simulation Run for Setting 2

Table 2 is built with an intercept of 0 and a slope of 1, with X having a limit of quantitation of 25 and Y a limit of 20. Table 2 shows parameter estimates for the censored and traditional Deming regression for setting 2. For the traditional Deming regression, where all censored observations are deleted, the intercept $\alpha$ and the slope $\beta$ have estimates of -19.141 and 1.58, respectively. The coverage of the intercept is 6%, meaning that the confidence interval for the intercept contains the true value of 0 in 6 out of the 100 simulated datasets, while the coverage of the slope is 14%, meaning that the confidence interval for the slope contains the true value of 1 in 14 out of 100 datasets. The bias of the traditional Deming regression for this setting is 366.378 for the intercept and 0.337 for the slope. The mean square errors for the intercept and slope are 431.952 and 0.405. The standard errors for the intercept and slope are 8.139 and 0.263.

For the Deming regression that takes censoring into account, the results differ from those of the traditional Deming regression. The slope estimate is 1.501. The biases are 1.065 and 0.251 for the intercept and slope, respectively. The mean square errors of the intercept and slope are 10.306 and 1.197, respectively. The standard errors for the intercept and slope are 3.226 and 0.978, respectively. The coverages over the 100 datasets are 80% for the intercept and 70% for the slope, i.e., the proportions of datasets in which the confidence intervals contain the true values. By comparison, the coverage of the traditional Deming regression is very small.

The two approaches give broadly similar slope estimates. The coverage of the traditional Deming regression is very small, showing that deleting censored observations has an impact on the coverage. Moreover, the bias of the traditional Deming regression is much larger than that of the approach with censoring, showing how far its estimates deviate. The parameter estimates for the traditional approach are very large in magnitude and differ from the true values used in the simulation runs.

4.2.3. Simulation Run for Setting 3

Table 3 is built with the same assumptions and parameterisation as Table 2; however, in this setting, censoring is applied only to X, at a value of 25, with Y not being censored. Table 3 shows the parameter estimates and their performance measures. The traditional Deming regression for this setting has a slope estimate of 1.588. The biases are 375.945 and 0.346 for the intercept and slope, respectively. This shows how far the mean estimates of the parameters deviate from the true values. With such a high bias, the confidence intervals seldom contain the true values; hence, we observe small coverages for the intercept $\alpha$ and the slope $\beta$.

On the other hand, the Deming regression that is adapted to take censoring into account gives a slope estimate of 1.359. The estimates are closer to the true values in the simulation run, which the small bias and mean square error confirm. The coverage of this regression was much better: 80% for the intercept $\alpha$ and 76% for the slope $\beta$.

From the results of the two regression methods, it is observed that the Deming regression adapted to take censoring into account does better than the traditional Deming regression, which deletes censored observations. This is seen from its parameter estimates, which are closer to the true values. Its standard deviations are also smaller than those of the traditional Deming regression. Its bias and mean square error are likewise small, reflecting the small deviations of the regression method that takes censoring into account.

4.2.4. Simulation Run for Setting 4

Table 4 is built with the same setting as Table 2; however, in this setting, Y is the only variable censored, at 20. The table shows the parameter estimates and performance measures for the censored and traditional Deming regression for setting 4 of the simulation run. For the traditional Deming regression, the intercept $\alpha$ and slope $\beta$ have estimates of 0.010 and 1.023. These estimates are very close to the true values used in the simulation run for this setting. The standard deviations are 3.208 and 0.251 for $\alpha$ and $\beta$, respectively, and the bias is small for both parameters. This small bias is a result of the mean parameter estimates being close to the true values. The mean square errors for $\alpha$ and $\beta$ are 10.19 and 0.063. The coverages for this setting are 94% and 93%: the confidence interval contains the true value of $\alpha$ in 94% of the datasets and the true value of $\beta$ in 93% of the datasets.

The adapted Deming regression that takes censoring into account also gives estimates and performance measures for the intercept $\alpha$ and the slope $\beta$. The mean estimate for $\beta$ is 1.033. The estimates appear to be close to the true values in the simulation run. The bias of the intercept is 0.0973. The bias depicts the deviation from the true value, and the deviation observed here is small. The standard deviations reported were 3.238 for the intercept $\alpha$ and 0.25 for the slope $\beta$. The mean square error for the intercept is 10.482, and the mean square error for the slope is 0.063. The confidence intervals contain the true values of the intercept and slope in 95% and 94% of the datasets, respectively. On the other hand, when censoring is considered for the approach, $\alpha$ and $\beta$ give estimates of 7.87 and 1.29, respectively.

The intercepts of the traditional Deming regression and of the adapted Deming regression taking censoring into account show a small deviation from the true intercept of 0, and the slopes show little or no deviation from the true slope of 1. Both regression methods give consistent standard errors. The bias of both parameters is smaller in the traditional Deming regression than in the adapted Deming regression taking censoring into account. The mean square errors of the two regression methods are consistent. The coverages of both methods are high but are slightly better in the Deming regression adapted for censoring, where both parameters have higher coverage.

As an extension, we wanted to check how both methods would perform if the analytical error ratio, which is assumed to be constant and known, was misspecified. For that purpose, the misspecified value used was 2.

4.2.5. Simulation Run for Setting 5

From Table 5, it is observed that misspecification of the analytical error ratio has a massive impact on the estimates and the other performance measures. The mean estimates for the parameters are very large in both the traditional Deming regression and the Deming regression that is adapted to take censoring into account. Comparing the estimates in Table 5 with those in Table 1 shows that, when the analytical error ratio is misspecified, the estimates are inflated and the coverage is drastically reduced.

4.2.6. Simulation Run for Setting 6

In Table 6, the situation differs from the findings of Table 5. With the misspecified analytical error ratio, both approaches give similar estimates, but the traditional Deming regression has better coverage (71% for the intercept $\alpha$ and 60% for the slope $\beta$), while the adapted Deming regression, where censoring is taken into account, gives poor coverage (30% and 18% for $\alpha$ and $\beta$, respectively). Comparing the estimates in Table 2 with those in Table 6 shows that the estimates under the correctly specified analytical error ratio are strongly preferred to those under the misspecified ratio, since they are closer to the true values used in the simulation run.

4.2.7. Simulation Run for Setting 7

The estimates in Table 7 are for the misspecified analytical error ratio in setting 3, the setting where censoring was applied only to X. These estimates are also similar in both approaches, and both approaches have poor coverage. Comparing the estimates in Table 7 with those in Table 3, different estimates are observed: the misspecified setting gives seemingly small estimates together with small coverages, while the correctly specified analytical error ratio in setting 3 gives higher estimates together with higher coverage.

4.2.8. Simulation Run for Setting 8

Table 8 shows the estimates for the misspecified analytical error ratio in setting 4. In this setting, the estimates of the two approaches are similar, except that the coverage of the traditional Deming regression is higher than that of the adapted Deming regression that takes censoring into account. However, comparing the correctly specified and misspecified analytical error ratios, the estimates under the correctly specified ratio are close to the true values and have high coverage. Misspecification therefore has an impact on the estimates.

5. Conclusion

The report sought to check the impact of doubly censored observations in method comparison studies, in contrast to the usual practice of deleting or ignoring values below the limit of quantitation. The main objective of the study was met: the adapted Deming regression that takes censoring into account had better coverage and yielded a small mean square error. This indicates that taking censored data into account in the analysis has an impact on the results.

Misspecification of the analytical error ratio introduced considerable bias in the intercept and slope for both the traditional Deming regression, which does not take censoring into account, and the adapted Deming regression, which does.

Data Availability

No real data were used since we only used simulated data. All parameter values are duly cited and referenced.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the content of this article.

References

[1] E. Lesaffre and A. B. Lawson, Bayesian Biostatistics, John Wiley & Sons, 2012.
[2] C. Benson, “Guidance for industry and FDA staff: recommendations for clinical laboratory improvement amendments of 1988 (CLIA) waiver applications for manufacturers of in vitro diagnostic devices,” 2008, http://www.fda.gov.
[3] M. H. Kutner, C. Nachtsheim, J. Neter, and W. Li, Applied Linear Statistical Models, McGraw-Hill Irwin, 2005.
[4] D. M. Hawkins and C. Weckwerth, “Errors in variables regression with value-censored data,” Journal of Chemometrics, vol. 30, no. 6, pp. 332–335, 2016.
[5] K. Linnet, “Evaluation of regression procedures for methods comparison studies,” Clinical Chemistry, vol. 39, no. 3, pp. 424–432, 1993.
[6] C. Croghan and P. P. Egeghy, “Methods of dealing with values below the limit of detection using SAS,” Southern SAS User Group, vol. 22, pp. 22–24, 2003.
[7] P. I. Good and J. W. Hardin, Common Errors in Statistics and How to Avoid Them, John Wiley & Sons, Hoboken, NJ, USA, 2012.
[8] R. J. Carroll, D. Ruppert, L. A. Stefanski, and C. M. Crainiceanu, Measurement Error in Nonlinear Models: A Modern Perspective, CRC Press, 2006.
[9] D. R. Helsel, Statistics for Censored Environmental Data Using Minitab and R, John Wiley & Sons, vol. 77, 2011.
[10] K. Linnet, “Performance of Deming regression analysis in case of misspecified analytical error ratio in method comparison studies,” Clinical Chemistry, vol. 44, no. 5, pp. 1024–1031, 1998.
[11] A. M. Deal, V. W. Pate, and S. El Rouby, “A SAS® macro for Deming regression,” SouthEast SAS Users Group, vol. 17, pp. 1–4, 2009.
[12] D. Stöckl, K. Dewitte, and L. M. Thienpont, “Validity of linear regression in method comparison studies: is it limited by the statistical model or the quality of the analytical input data?” Clinical Chemistry, vol. 44, no. 11, pp. 2340–2346, 1998.
[13] J. C. Pinheiro and D. M. Bates, “Approximations to the log-likelihood function in the nonlinear mixed-effects model,” Journal of Computational & Graphical Statistics, vol. 4, no. 1, pp. 12–35, 1995.

Copyright

Copyright © 2022 Samuel Akwasi Adarkwa et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
