
Ezekiel N. N. Nortey, Reuben Pometsey, Louis Asiedu, Samuel Iddi, Felix O. Mettle, "Anomaly Detection in Health Insurance Claims Using Bayesian Quantile Regression", International Journal of Mathematics and Mathematical Sciences, vol. 2021, Article ID 6667671, 11 pages, 2021. https://doi.org/10.1155/2021/6667671

# Anomaly Detection in Health Insurance Claims Using Bayesian Quantile Regression

Academic Editor: Sergejs Solovjovs
Received: 07 Oct 2020
Revised: 08 Jan 2021
Accepted: 12 Feb 2021
Published: 24 Feb 2021

#### Abstract

Research has shown that current health expenditure in most countries, especially in sub-Saharan Africa, is inadequate and unsustainable. Yet, fraud, abuse, and waste in health insurance claims by service providers and subscribers threaten the delivery of quality healthcare. It is therefore imperative to analyze health insurance claim data to identify potentially suspicious claims. Typically, anomaly detection can be posited as a classification problem that requires the use of statistical methods such as mixture models and machine learning approaches to classify data points as either normal or anomalous. Additionally, health insurance claim data are mostly associated with problems of sparsity, heteroscedasticity, multicollinearity, and the presence of missing values. The analyses of such data are best addressed by adopting more robust statistical techniques. In this paper, we utilized the Bayesian quantile regression model to establish the relations between claim outcome of interest and subject-level features and further classify claims as either normal or anomalous. An estimated model component is assumed to inherently capture the behaviors of the response variable. A Bayesian mixture model, assuming a normal mixture of two components, is used to label claims as either normal or anomalous. The model was applied to health insurance data captured on 115 people suffering from various cardiovascular diseases across different states in the USA. Results show that 25 out of 115 claims (21.7%) were potentially suspicious. The overall accuracy of the fitted model was assessed to be 92%. Through the methodological approach and empirical application, we demonstrated that the Bayesian quantile regression is a viable model for anomaly detection.

#### 1. Introduction

Providing quality and accessible healthcare is a major political decision that governments make all over the world. A major challenge in achieving this is funding. According to Novignon et al. [1], the majority of countries in developing regions, especially sub-Saharan Africa (SSA), rely on donor grants and loans to finance healthcare. Such expenditures are not only unsustainable but also inadequate considering the enormous healthcare burden in the region. Apart from direct government investment in health services, health insurance schemes have been instituted in many countries to reduce the proportion of health service cost borne by the populace. However, fraud, abuse, and waste in health insurance claims by service providers and subscribers threaten the delivery of quality healthcare. The National Health Care Anti-Fraud Association (NHCAA) of the United States defines healthcare fraud as an intentional deception or misrepresentation made by a person or an entity, with the knowledge that the deception could result in some kind of unauthorized benefit to that person or entity [2, 3]. This “intentional deception or misrepresentation” can produce observations in the data that do not conform to normally observed patterns; these can therefore be regarded as anomalies or outliers. Several statistical methods are employed in the identification of fraud, especially in health insurance claims. Wilson [4] applied the logistic regression model to detect auto insurance fraud. This method is only possible if there is information on both legitimate and fraudulent claims; moreover, the labeling process may be fraught with error, and any model built on such labels may inherit these errors. Ekina et al. [2] used a Bayesian co-clustering model to detect health insurance fraud. This method has been used for the analysis of dyadic data, and since “conspiracy fraud” involves a connection between two entities, Bayesian co-clustering is an appropriate method.
Liu and Vasarhelyi [5] used a clustering method incorporating geo-location information to detect fraud in Medicare/Medicaid data in the US.

Ekin et al. [6] provided a comprehensive assessment of medical fraud research. They noted that health fraud research is an emerging field that has gained some attention from researchers but acknowledged that more research needs to be conducted, especially using Bayesian techniques. Bayesian methods have generally proven to be an effective approach to identifying fraud, because they do not impose the strong statistical assumptions that their frequentist counterparts do.

Below are some reasons why Bayesian techniques are appropriate for health fraud studies [6]:

(i) Medical data are sparse because fraudulent cases are rare compared to legitimate cases.
(ii) Data are also dynamic: both fraudulent and legitimate patterns evolve over time.
(iii) Fraudulent claims are not homogeneous.
(iv) Missing values are present, and these can lead to over- or undersampling, nonrepresentativeness, and potential bias in inference.
(v) Medical data are skewed and non-normal.

To address these challenges, this study proposes the use of Bayesian quantile regression techniques and Bayesian mixture model to identify fraud in health insurance data.

This study seeks to specify the posterior distribution of the parameters of interest given the data; identify the distribution of anomalous claims; and estimate the probability of each claim to identify potential anomalous claims.

#### 2. Materials and Methods

##### 2.1. Bayesian Method

From the Bayesian perspective, the parameter vector of a given model is assumed to have a probability distribution referred to as the prior probability distribution. This prior distribution is informed by the prior knowledge about the parameter. The data from which the parameter is to be estimated are generated by a probability process, and this generating process is referred to as the likelihood. Of interest is the posterior probability, which is the probability of the unknown parameter given the data. Given $\mathbf{y} = (y_1, \ldots, y_n)$, a vector of observations, independently and identically distributed, and $\boldsymbol{\theta}$, a vector of unknown parameters, the posterior probability distribution of the parameters is represented as

$$\pi(\boldsymbol{\theta} \mid \mathbf{y}) = \frac{f(\mathbf{y} \mid \boldsymbol{\theta})\,\pi(\boldsymbol{\theta})}{\int f(\mathbf{y} \mid \boldsymbol{\theta})\,\pi(\boldsymbol{\theta})\,d\boldsymbol{\theta}} \propto f(\mathbf{y} \mid \boldsymbol{\theta})\,\pi(\boldsymbol{\theta}),$$

where $\pi(\boldsymbol{\theta})$ is the prior probability of the parameter, $f(\mathbf{y} \mid \boldsymbol{\theta})$ is the likelihood, and $\pi(\boldsymbol{\theta} \mid \mathbf{y})$ is the posterior probability of $\boldsymbol{\theta}$ given the observed data $\mathbf{y}$.

Generally, a prior may be chosen based on information available, in which case it is called an informative prior. It may instead be chosen when little or no information is available; such priors are called noninformative priors. The uniform distribution or the Jeffreys prior is mostly used when there is no prior information about the unknown parameter. A prior may also be either proper or improper. A proper prior always yields a proper posterior distribution. An improper prior is a distribution that does not integrate to one; however, if an improper prior is chosen carefully, the resulting posterior distribution may still be proper.

There are other ways of choosing the prior. The prior can be chosen such that the resulting posterior distribution has the same functional form as the prior; such priors are called conjugate priors.

##### 2.2. Quantile Regression

This study used quantile regression to capture the relationship between the variables and to estimate the parameters of the model, because of its ability to describe the relationship across the entire conditional distribution of the response without imposing strong parametric assumptions. It is therefore an adequate technique for capturing the dynamics in the data and addressing the challenges associated with health insurance claims data.

Furthermore, given a set of response variables $Y$ and explanatory variables $X$, the conditional cumulative distribution function is

$$F(y \mid x) = P(Y \le y \mid X = x),$$

and for any $\tau$, $0 < \tau < 1$, the $\tau$th conditional quantile function is

$$Q_{\tau}(y \mid x) = \inf\{y : F(y \mid x) \ge \tau\} = F^{-1}(\tau \mid x).$$

The quantile function is monotonically increasing, so that for any two quantiles $\tau_1$ and $\tau_2$ where $\tau_1 < \tau_2$, $Q_{\tau_1}(y \mid x) \le Q_{\tau_2}(y \mid x)$.
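The check (pinball) loss underlying quantile estimation can be illustrated with a short sketch on hypothetical data: the minimizer of the total check loss over the sample is a sample $\tau$-quantile.

```python
# Sample quantile via the check (pinball) loss: the tau-th quantile of y
# minimizes the sum of rho_tau(y_i - q), where rho_tau(u) = u * (tau - 1[u < 0]).
# Illustrative sketch with hypothetical data, restricting candidates q to the
# data points themselves (among which a minimizer always lies).

def check_loss(u, tau):
    return u * (tau - (1 if u < 0 else 0))

def quantile_by_loss(y, tau):
    """Return the data point minimizing the total check loss."""
    return min(y, key=lambda q: sum(check_loss(yi - q, tau) for yi in y))

y = [1, 2, 3, 4, 100]
print(quantile_by_loss(y, 0.5))   # 3 (the median)
print(quantile_by_loss(y, 0.7))   # 4
```

Note the robustness this buys: the extreme value 100 does not drag the fitted median toward it, which is one reason quantile methods suit heavy-tailed claim amounts.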

##### 2.3. Description of the Method

To specify the density distribution that describes the anomalous claims, a posterior distribution of the parameters of interest ($\beta$, $\sigma$, and $z$) is first constructed using the quantile regression model

$$y_i = x_i^{\top}\beta + \theta z_i + \kappa\sqrt{\sigma z_i}\,u_i, \qquad i = 1, \ldots, n, \qquad (4)$$

where $z_i$ has an exponential distribution and $u_i$ has a standard normal distribution. The parameters to be estimated are the regression coefficients $\beta$, a scale parameter $\sigma$, and the mixture component $z = (z_1, \ldots, z_n)$.

From the Bayesian perspective, given that $y_1, \ldots, y_n$ are independently and identically distributed random variables from which the parameters $\beta$, $\sigma$, and $z$ are to be estimated, the posterior distribution is constructed to be proportional to

$$\pi(\beta, \sigma, z \mid y) \propto f(y \mid \beta, \sigma, z)\,\pi(\beta)\,\pi(\sigma)\,\pi(z).$$

Now,

$$\pi(\beta, \sigma, z \mid y) \propto g(\beta; \sigma, z, y)\,h(\sigma, z; y)$$

by the factorization theorem. To estimate, say, $\beta$, the parameters $\sigma$ and $z$ are treated as constants and their functions are absorbed into a normalizing constant.

Thus,

$$\pi(\beta \mid \sigma, z, y) \propto f(y \mid \beta, \sigma, z)\,\pi(\beta),$$

so that the full conditional of each parameter is proportional to the likelihood multiplied by that parameter's prior.

We then use the Gibbs sampler to sample from the conditional distribution $\pi(\beta \mid \sigma, z, y)$. This is the same as sampling from the posterior distribution of $\beta$. Samples are also drawn from the conditional distributions $\pi(\sigma \mid \beta, z, y)$ and $\pi(z \mid \beta, \sigma, y)$ to estimate $\sigma$ and $z$, respectively. It is assumed that the estimates for $z$ form a mixture distribution of unknown densities.

##### 2.4. The Model

Suppose $y_i$, $i = 1, \ldots, n$, are the response variables, identically and independently distributed, $x_i$ are the regressors or covariates, and $\beta$ are the regression coefficients.

The model for the Bayesian quantile regression as proposed by Kozumi and Kobayashi [7] is

$$y_i = x_i^{\top}\beta + \varepsilon_i, \qquad \varepsilon_i = \sigma\left(\theta w_i + \kappa\sqrt{w_i}\,u_i\right), \qquad (9)$$

where $\sigma$ is a scale parameter, $\varepsilon_i$ are the error terms, $\theta = (1 - 2p)/(p(1 - p))$, and $\kappa^2 = 2/(p(1 - p))$. Here $w_i$ has a standard exponential distribution, and $u_i$ has the standard normal distribution. Replacing $\sigma w_i$ with $z_i$, equation (9) becomes

$$y_i = x_i^{\top}\beta + \theta z_i + \kappa\sqrt{\sigma z_i}\,u_i, \qquad (10)$$

which is the same as equation (4).

Given a random error variable $\varepsilon$ with scale parameter $\sigma$, asymmetry parameter $p$, and location parameter $\mu$, $\varepsilon$ is said to have the asymmetric Laplace distribution (assumed by Yu and Moyeed [8]) given by the density

$$f(\varepsilon \mid \mu, \sigma, p) = \frac{p(1 - p)}{\sigma}\exp\left\{-\rho_p\!\left(\frac{\varepsilon - \mu}{\sigma}\right)\right\}, \qquad (11)$$

where $\rho_p(u) = u\left(p - I(u < 0)\right)$ is the loss or check function, $p$ is on the interval [0, 1], and the $p$th quantile of $\varepsilon - \mu$ is zero. The loss function weights positive errors by $p$ and negative errors by $1 - p$. Now if we let $\mu = x_i^{\top}\beta$, then equation (11) becomes

$$f(y_i \mid \beta, \sigma, p) = \frac{p(1 - p)}{\sigma}\exp\left\{-\rho_p\!\left(\frac{y_i - x_i^{\top}\beta}{\sigma}\right)\right\}. \qquad (12)$$

From equation (12), $x_i^{\top}\beta$ is the location parameter. The residual $y_i - x_i^{\top}\beta$ can only be zero through the latent variable $z_i$ when $z_i = 0$. The term $z_i$, however, is an exponential random variable and cannot be zero. The presence of anomaly in $y_i$ can therefore be demonstrated through $z_i$ by assuming that the predicted values of $z_i$ have a mixture distribution.

It is somewhat intractable to derive the full conditional posterior distributions of the parameters directly from equation (12). Markov chain Monte Carlo (MCMC) algorithms such as the Gibbs sampler [10] and the Metropolis–Hastings algorithm enable one to estimate the parameters from the posterior distribution.

MCMC algorithms generate a sequence of random draws whose transition probabilities are stationary and which therefore converge to an equilibrium distribution. In the Bayesian setting, this equilibrium distribution, given suitable initial conditions, mimics the posterior distribution. Having done this, we can compute statistics such as the mean and variance from the draws as estimates of posterior quantities.
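The mechanics of a Gibbs sampler can be sketched on a simpler model than the paper's: a normal likelihood with semi-conjugate priors on the mean and variance, where both full conditionals are available in closed form. This is an illustrative stand-in, not the authors' quantile-regression sampler (which additionally updates the latent exponential terms $z_i$).

```python
import random

# Generic Gibbs sampler for y_i ~ N(mu, sigma2) with semi-conjugate priors
# mu ~ N(m0, s0^2) and sigma2 ~ Inverse-Gamma(a0, b0). Each parameter is drawn
# from its full conditional in turn; the chain's stationary distribution is
# the joint posterior. All data and hyperparameters below are hypothetical.

random.seed(42)
y = [random.gauss(10.0, 2.0) for _ in range(200)]  # synthetic data
n, ybar = len(y), sum(y) / len(y)

m0, s0sq = 0.0, 100.0   # vague prior on mu
a0, b0 = 2.0, 2.0       # weak prior on sigma2

mu, sigma2 = 0.0, 1.0   # initial values
draws_mu, draws_s2 = [], []
for it in range(3000):
    # mu | sigma2, y : normal full conditional (precision-weighted mean)
    prec = 1.0 / s0sq + n / sigma2
    mean = (m0 / s0sq + n * ybar / sigma2) / prec
    mu = random.gauss(mean, (1.0 / prec) ** 0.5)
    # sigma2 | mu, y : inverse-gamma full conditional, drawn via 1/Gamma
    a_n = a0 + n / 2.0
    b_n = b0 + sum((yi - mu) ** 2 for yi in y) / 2.0
    sigma2 = 1.0 / random.gammavariate(a_n, 1.0 / b_n)
    if it >= 500:                      # discard burn-in
        draws_mu.append(mu)
        draws_s2.append(sigma2)

print(sum(draws_mu) / len(draws_mu))   # posterior mean of mu, near 10
print(sum(draws_s2) / len(draws_s2))   # posterior mean of sigma2, near 4
```

The same alternating structure carries over to the paper's sampler, with the normal draw for $\beta$, a generalized inverse Gaussian draw for each $z_i$, and an inverse-gamma draw for $\sigma$.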

Assuming $\varepsilon_i$ has an exponential–normal mixture distribution, it can be shown that $y_i$ conditional on $z_i$ will be normally distributed with mean $x_i^{\top}\beta + \theta z_i$ and variance $\kappa^2\sigma z_i$, and a normal likelihood density of the form

$$f(y \mid \beta, z, \sigma) \propto \prod_{i=1}^{n}\left(\kappa^2\sigma z_i\right)^{-1/2}\exp\left\{-\frac{\left(y_i - x_i^{\top}\beta - \theta z_i\right)^2}{2\kappa^2\sigma z_i}\right\}. \qquad (13)$$

The variables $\beta$, $z$, and $\sigma$ are estimated by assuming the following priors:

$$\beta \sim N(b_0, B_0), \qquad z_i \mid \sigma \sim \mathrm{Exp}(\sigma), \qquad \sigma \sim \mathrm{IG}\!\left(\frac{n_0}{2}, \frac{s_0}{2}\right),$$

where $\mathrm{Exp}(\sigma)$ denotes the exponential distribution with mean $\sigma$ and $\mathrm{IG}$ the inverse gamma distribution.

By specifying the above conjugate priors for the parameters, their full marginal posterior distributions can be derived and we can subsequently sample from them.

From the likelihood function in equation (13) and the above prior distributions, the posterior distribution is given as

$$\pi(\beta, z, \sigma \mid y) \propto f(y \mid \beta, z, \sigma)\,\pi(\beta)\,\pi(z \mid \sigma)\,\pi(\sigma),$$

which becomes

$$\pi(\beta, z, \sigma \mid y) \propto \prod_{i=1}^{n}\left(\sigma z_i\right)^{-1/2}\exp\left\{-\frac{\left(y_i - x_i^{\top}\beta - \theta z_i\right)^2}{2\kappa^2\sigma z_i}\right\}\exp\left\{-\frac{1}{2}\left(\beta - b_0\right)^{\top}B_0^{-1}\left(\beta - b_0\right)\right\}\sigma^{-n}\exp\left\{-\frac{\sum_{i=1}^{n} z_i}{\sigma}\right\}\sigma^{-(n_0/2)-1}\exp\left\{-\frac{s_0}{2\sigma}\right\}. \qquad (16)$$

The marginal conditional posterior distributions of the parameters are derived from equation (16) as follows:

The full conditional posterior distribution of $\beta$ is a normal distribution given as

$$\pi(\beta \mid z, \sigma, y) = C\exp\left\{-\frac{1}{2}\left[\sum_{i=1}^{n}\frac{\left(y_i - x_i^{\top}\beta - \theta z_i\right)^2}{\kappa^2\sigma z_i} + \left(\beta - b_0\right)^{\top}B_0^{-1}\left(\beta - b_0\right)\right]\right\}, \qquad (17)$$

where $C$ is a constant that does not depend on $\beta$.

Note that a function $g$ with domain on the whole real line is an unnormalized probability density function (pdf) if and only if $\int g < \infty$. In this case, the unnormalized pdf of the normal distribution with mean $\tilde{\beta}$ and variance $\tilde{B}$ is

$$g(\beta) \propto \exp\left\{-\frac{1}{2}\left(\beta - \tilde{\beta}\right)^{\top}\tilde{B}^{-1}\left(\beta - \tilde{\beta}\right)\right\}.$$

Comparing equation (17) to $g(\beta)$ and completing the square in $\beta$ shows that the distribution of $\beta$ is normal with parameters $\tilde{\beta}$ and $\tilde{B}$, where

$$\tilde{B}^{-1} = B_0^{-1} + \sum_{i=1}^{n}\frac{x_i x_i^{\top}}{\kappa^2\sigma z_i}, \qquad \tilde{\beta} = \tilde{B}\left(B_0^{-1}b_0 + \sum_{i=1}^{n}\frac{x_i\left(y_i - \theta z_i\right)}{\kappa^2\sigma z_i}\right).$$

It can also be derived from equation (16) that the full conditional distribution of $z_i$ is

$$\pi(z_i \mid \beta, \sigma, y) \propto z_i^{-1/2}\exp\left\{-\frac{1}{2}\left(\frac{\delta_i^2}{z_i} + \gamma^2 z_i\right)\right\}.$$

This is the generalized inverse Gaussian distribution with parameters $\tfrac{1}{2}$, $\delta_i$, and $\gamma$, written $z_i \mid \beta, \sigma, y \sim \mathrm{GIG}\!\left(\tfrac{1}{2}, \delta_i, \gamma\right)$, where

$$\delta_i^2 = \frac{\left(y_i - x_i^{\top}\beta\right)^2}{\kappa^2\sigma} \quad \text{and} \quad \gamma^2 = \frac{\theta^2}{\kappa^2\sigma} + \frac{2}{\sigma}.$$

Again from equation (16), the full conditional distribution of $\sigma$ follows the inverse gamma distribution given as

$$\pi(\sigma \mid \beta, z, y) \propto \sigma^{-\left(\tilde{n}/2\right)-1}\exp\left\{-\frac{\tilde{s}}{2\sigma}\right\}.$$

This is an inverse gamma distribution with shape parameter $\tilde{n}/2$ and scale parameter $\tilde{s}/2$, where $\tilde{n} = n_0 + 3n$ and $\tilde{s} = s_0 + 2\sum_{i=1}^{n} z_i + \sum_{i=1}^{n}\left(y_i - x_i^{\top}\beta - \theta z_i\right)^2 / \left(\kappa^2 z_i\right)$.

##### 2.5. Estimating the Mixture Components of z

We used the R package for Bayesian mixture models, “BayesMix,” to estimate the mixture components of $z$. BayesMix provides Gibbs sampling of the posterior distribution together with methods to set up the model and to specify the priors and initial values required for the Gibbs sampler.

To run the Gibbs sampler, we specify the prior distributions of the parameters and initial values for the hyperparameters in the proposed models. The regression coefficients $\beta$ are normally distributed with mean $b_0$ and variance $B_0$. We set the initial values of these hyperparameters to 0 and 10 (with precision 1/10), respectively.

The $z_i$ are exponentially distributed with rate parameter $1/\sigma$. The prior distribution of the scale parameter $\sigma$ is an inverse gamma distribution with hyperparameters $n_0$ and $s_0$. The quantity $n_0$ is the prior effective sample size for $\sigma$, and $s_0$ is the prior point estimate for $\sigma$. For our analysis, we set the prior point estimate to the rate of the exponential variable $z$. The initial values are generated randomly as uniform values on the intervals (0, 1) and (0, 2). This choice is subjective and arbitrary, but with the hope that the data will be informative enough to produce a stationary posterior distribution.
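BayesMix fits the two-component mixture by Gibbs sampling in R. As a language-neutral sketch of the same estimation task (with hypothetical data, not the paper's claims), a two-component normal mixture can also be fitted with the EM algorithm:

```python
import math, random

# EM for a two-component normal mixture -- a deterministic analogue of the
# Gibbs-based fit that BayesMix performs. Data below are hypothetical.

def norm_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def em_two_normals(x, iters=200):
    # Crude initialization from the data spread.
    mu1, mu2 = min(x), max(x)
    sd1 = sd2 = (max(x) - min(x)) / 4 or 1.0
    pi2 = 0.5
    for _ in range(iters):
        # E-step: responsibility of component 2 for each point.
        r = [pi2 * norm_pdf(xi, mu2, sd2) /
             (pi2 * norm_pdf(xi, mu2, sd2) + (1 - pi2) * norm_pdf(xi, mu1, sd1))
             for xi in x]
        # M-step: weighted means, standard deviations, and mixing proportion.
        w2 = sum(r); w1 = len(x) - w2
        mu1 = sum((1 - ri) * xi for ri, xi in zip(r, x)) / w1
        mu2 = sum(ri * xi for ri, xi in zip(r, x)) / w2
        sd1 = max(1e-6, (sum((1 - ri) * (xi - mu1) ** 2 for ri, xi in zip(r, x)) / w1) ** 0.5)
        sd2 = max(1e-6, (sum(ri * (xi - mu2) ** 2 for ri, xi in zip(r, x)) / w2) ** 0.5)
        pi2 = w2 / len(x)
    return mu1, sd1, mu2, sd2, pi2

random.seed(1)
# 80% "normal" component around 10, 20% "anomalous" component around 14.
x = [random.gauss(10, 1) for _ in range(160)] + [random.gauss(14, 1) for _ in range(40)]
mu1, sd1, mu2, sd2, pi2 = em_two_normals(x)
print(round(mu1, 2), round(mu2, 2), round(pi2, 2))  # roughly 10, 14, 0.2
```

Unlike the Bayesian fit, EM returns point estimates only; the Gibbs approach used in the paper additionally yields credible intervals such as those in Table 5.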

##### 2.6. Model Evaluation

The proposed model is evaluated by replicating new values of the response, $y^{\mathrm{rep}}$, and determining how close these values are to the observed ones. This approach is called posterior consistency and involves using a posterior predictive distribution.

The posterior predictive distribution is given by

$$p\!\left(y^{\mathrm{rep}} \mid y\right) = \int_{\Theta} p\!\left(y^{\mathrm{rep}} \mid \theta\right) p\!\left(\theta \mid y\right)\, d\theta,$$

where $\Theta$ is our parameter space and $\theta$ is the set of parameters we estimated from our model. We sample from this distribution as follows:
(1) Sample $\theta^{*}$ from the posterior distribution $p(\theta \mid y)$.
(2) Substitute $\theta^{*}$ in $p\!\left(y^{\mathrm{rep}} \mid \theta\right)$ and draw $y^{\mathrm{rep}}$.

To compare the posterior predictive distribution to our model, we can compute statistics such as the mean, standard deviation, and order statistics and compare the values. Graphical approaches may also be used. The histogram of $y^{\mathrm{rep}}$ gives the shape of the posterior predictive distribution, and this can be compared to the distribution of the observed $y$.
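The replicate-and-compare procedure above can be sketched as follows, using a simple normal model and stand-in posterior draws (all values hypothetical):

```python
import random

# Posterior predictive check, sketched for a simple normal model: for each
# posterior draw (mu, sigma), simulate a replicated data set and compare a
# summary statistic (here the mean) with the observed one.

random.seed(7)
y_obs = [random.gauss(10, 2) for _ in range(100)]
obs_mean = sum(y_obs) / len(y_obs)

# Stand-in posterior draws; in practice these come from the MCMC output.
posterior_draws = [(random.gauss(10, 0.2), abs(random.gauss(2, 0.15)))
                   for _ in range(500)]

rep_means = []
for mu, sd in posterior_draws:
    y_rep = [random.gauss(mu, sd) for _ in range(len(y_obs))]  # draw y_rep
    rep_means.append(sum(y_rep) / len(y_rep))

# Posterior predictive p-value: fraction of replicated means above the observed.
p_ppc = sum(m > obs_mean for m in rep_means) / len(rep_means)
print(round(p_ppc, 2))   # values far from 0 or 1 indicate adequate fit
```

Repeating the check with other statistics (standard deviation, extreme order statistics) probes different aspects of model fit, which matters here because anomalies live in the tail.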

###### 2.6.1. Computing the Probability Scores

The densities of the mixture components are estimated using the Bayesian method. Now, let $f_1$ and $f_2$ correspond to the density of the normal claims and the density of the anomalous claims (which will have the higher mean), respectively.

From the above preamble, we compute the probability of each claim belonging to the anomalous claim class, $P(C_2 \mid z_i)$. These probabilities are ranked for all observations, and those with values greater than 0.5 are classified into the anomalous claim class.

We treat $z$ as a mixture of two normal densities and estimate the proportion and density of each component. The component with the larger mean will be associated with $f_2$ (anomalous claims), while the smaller one will be associated with $f_1$ (normal claims). The probability $P(C_2 \mid z_i)$, which describes the outlier distribution, can be computed as

$$P\!\left(C_2 \mid z_i\right) = \frac{\pi_2 f_2\!\left(z_i\right)}{\pi_1 f_1\!\left(z_i\right) + \pi_2 f_2\!\left(z_i\right)}, \qquad (25)$$

where $\pi_1$ and $\pi_2$ are the estimated mixture proportions.
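Given estimated mixture parameters, the classification rule of equation (25) is a one-line computation; the component parameters below are hypothetical placeholders rather than the fitted values from Table 5.

```python
import math

# Posterior probability that a point belongs to the anomalous component,
# following the mixture rule of equation (25):
#   P(C2|z) = pi2 * f2(z) / (pi1 * f1(z) + pi2 * f2(z)).
# Component means, SDs, and proportions below are hypothetical placeholders.

def norm_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def p_anomalous(z, pi1, mu1, sd1, pi2, mu2, sd2):
    f1, f2 = norm_pdf(z, mu1, sd1), norm_pdf(z, mu2, sd2)
    return pi2 * f2 / (pi1 * f1 + pi2 * f2)

# Normal component around 10 (proportion 0.8), anomalous around 14 (0.2).
for z in (9.0, 12.0, 15.0):
    p = p_anomalous(z, 0.8, 10.0, 1.0, 0.2, 14.0, 1.5)
    print(z, round(p, 3), "anomalous" if p > 0.5 else "normal")
```

The 0.5 cutoff used in the paper corresponds to assigning each claim to its most probable component; a stricter threshold would trade recall for precision.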

###### 2.6.2. Sensitivity, Specificity, Precision, and Accuracy

The other metrics of performance evaluation can be calculated from Table 1.

| Classification | Actual positive | Actual negative | Total |
| --- | --- | --- | --- |
| Positive | TP | FP | TP + FP |
| Negative | FN | TN | FN + TN |
| Total | TP + FN | FP + TN | TP + FP + FN + TN |
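The four metrics can be computed directly from the confusion-matrix counts of Table 1; the counts used below are hypothetical, since the paper reports only the resulting percentages.

```python
# Classification metrics from confusion-matrix counts (Table 1 layout).
# The counts passed in below are hypothetical illustrations.

def metrics(tp, fp, fn, tn):
    return {
        "sensitivity": tp / (tp + fn),               # true-positive rate (recall)
        "specificity": tn / (tn + fp),               # true-negative rate
        "precision":   tp / (tp + fp),               # positive predictive value
        "accuracy":    (tp + tn) / (tp + fp + fn + tn),
    }

m = metrics(tp=20, fp=5, fn=0, tn=90)   # hypothetical counts
for name, value in m.items():
    print(name, round(value, 3))
```

Note that with zero false negatives the sensitivity is exactly 1, mirroring the 100% sensitivity the model achieves in Section 3.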
##### 2.7. Data Acquisition

Secondary data on health insurance claims by 115 people suffering from various cardiovascular diseases located in different states in the USA were used to fit the proposed model. The response variable is the claim amount, while the covariates/independent variables are location (state), diagnosis, gender, type of care, pharmacy, and type of health facility. All the covariates are categorical variables. Table 2 shows the claims data with nominal codes for the independent variables.

| Claim amount | Location | Diagnosis | Gender | Type of care | Pharmacy | Type of health facility |
| --- | --- | --- | --- | --- | --- | --- |
| 7257 | 1 | 1 | 2 | 1 | 1 | 1 |
| 5835 | 2 | 1 | 1 | 1 | 2 | 1 |
| 7023 | 2 | 1 | 1 | 1 | 3 | 2 |
| 9284 | 2 | 1 | 1 | 1 | 1 | 2 |
| 8148 | 2 | 1 | 1 | 1 | 3 | 2 |
| 9024 | 3 | 1 | 1 | 1 | 4 | 3 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 8256 | 3 | 1 | 1 | 1 | 3 | 1 |
| 8941 | 2 | 1 | 1 | 1 | 3 | 2 |
| 8385 | 3 | 1 | 1 | 1 | 4 | 2 |
| 11731.2 | 3 | 2 | 1 | 2 | 5 | 4 |
Location: 1, California; 2, Georgia; 3, Alabama; 4, Virginia. Diagnosis: 1, acute myocardial infarction; 2, heart failure. Gender: 1, female; 2, male. Type of care: 1, outpatient; 2, inpatient. Pharmacy: 1, heparin; 2, nitroglycerin; 3, eptifibatide; 4, clopidogrel; 5, furosemide; 6, metoprolol; 7, bisoprolol; 8, hydrochlorothiazide. Type of health facility: 1, ambulatory service; 2, outpatient clinic; 3, acute care; 4, hospital (source: https://data.cms.gov).

#### 3. Results and Discussion

We ran the Gibbs sampler at several quantile values to observe the trend of the changes in the regression coefficients $\beta$ and the scale parameter $\sigma$. Table 3 shows the estimates of the regression coefficients at the specified quantiles.

| Parameter |  |  |  |  |
| --- | --- | --- | --- | --- |
|  | 8.832456 | 8.81072 | 8.586895 | 6.582426 |
|  | 0.044465 | 0.056238 | 0.010074 | 0.09637 |
|  | 0.416721 | 0.273557 | −0.03593 | 0.23371 |
|  | 0.094641 | 0.127956 | −0.00163 | 0.229348 |
|  | 0.030974 | 0.217873 | 0.346222 | 0.393446 |
|  | −0.03138 | −0.03801 | −0.0054 | −0.06249 |
|  | −0.00134 | 0.000169 | −0.00819 | 0.009342 |
|  | 8.980103 | 3.062348 | 3.313797 | 0.803441 |

From Table 3, a high quantile value is preferred for estimating the anomalous claims in the data. The values of the estimated parameters at this quantile satisfy the assumption that an anomalous/fraudulent health insurance claim is usually located to the right of a legitimate claim. This suggests that we can estimate the parameters of the proposed model even at an extreme quantile. The potential scale reduction factors at the two candidate quantiles are 1.01 and 1.08, respectively, which informs the preferred quantile value.

The mean, standard deviation, and standard error of the mean for each parameter estimated at the preferred quantile are given in Table 4.

| Parameter | Mean | SD | Naive SE | Time-series SE |
| --- | --- | --- | --- | --- |
|  | 8.8107204 | 0.24925 | 0.000557 | 0.005705 |
|  | 0.0562377 | 0.06056 | 0.000135 | 0.000989 |
|  | 0.2735569 | 0.78952 | 0.001765 | 0.084209 |
|  | 0.1279555 | 0.10345 | 0.000231 | 0.001989 |
|  | 0.2178729 | 0.78366 | 0.001752 | 0.075442 |
|  | −0.0380131 | 0.04904 | 0.00011 | 0.000912 |
|  | 0.0001692 | 0.08409 | 0.000188 | 0.001818 |
|  | 3.5536795 | 1.6551 | 0.003701 | 0.008894 |
|  | 3.8391055 | 1.75822 | 0.003932 | 0.008311 |
|  | 3.2403607 | 1.48188 | 0.003314 | 0.005923 |
|  | 2.8009853 | 1.29663 | 0.002899 | 0.005247 |
|  | 2.9037541 | 1.33627 | 0.002988 | 0.005209 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
|  | 3.062346 | 32698 | 0.00073 | 0.00568 |

We provide the residual and normal Q–Q plots of the model in Figure 1 and also the trace plots and densities of $\beta$ and $\sigma$ in Figure 2.

The results from Figure 2 (density plots) are consistent with the inverse gamma and normal prior distributions assumed for $\sigma$ and $\beta$, respectively.

Our objective is to identify potentially fraudulent claims through the estimated values of the unobserved variable $z$. Figure 3 shows the histogram and density of $z$ at the preferred quantile.

##### 3.1. Identifying Anomalous Claims

The “BayesMix” package in R was used to set up an MCMC sampler to identify the two components of the mixture of $z$. The mean, standard deviation, and proportion of each component were estimated. Note that the probability of a claim being anomalous is $\pi_2$ and the probability of a claim being normal is $\pi_1 = 1 - \pi_2$. The results are shown in Table 5.

| Parameter | Mean | SD | 2.5% | 97.5% |
| --- | --- | --- | --- | --- |
|  | 10.48 | 0.6272 | 9.138 | 11.08 |
|  | 14.17 | 2.0379 | 10.964 | 18.73 |
|  | 0.8349 | 0.25 | 0.029028 | 0.9926 |
|  | 0.1651 | 0.25 | 0.007377 | 0.9710 |
|  | 3.4257 | 0.8005 | 2.2058 | 4.8034 |

The trace plots and densities of the mixture parameters are shown in Figures 4 and 5, respectively.

The probability of each claim belonging to the anomalous class, $P(C_2 \mid z_i)$, is computed through equation (25), using the estimated component densities $f_1$ and $f_2$ and the mixture proportions $\pi_1$ and $\pi_2$ from Table 5. Assuming the mixture components are normally distributed, we compute $P(C_2 \mid z_i)$ using equation (25) and provide the ranked results together with the claims in Table 6. It is evident from Table 6 that each of the first 10 claims has $P(C_2 \mid z_i)$ greater than 0.5; hence, they would each be classified into the anomalous class.

| Claim amount | $f_1$ | $f_2$ | $P(C_2 \mid z_i)$ |
| --- | --- | --- | --- |
| 9440 | 2.67E−15 | 0.000233 | 1 |
| 9638 | 3.17E−11 | 0.00063 | 1 |
| 9316 | 1.24E−10 | 0.000735 | 0.999999 |
| 8941 | 2.95E−09 | 0.001063 | 0.999982 |
| 9117 | 3.30E−09 | 0.001078 | 0.999981 |
| 9241 | 3.65E−09 | 0.001091 | 0.999979 |
| 9024 | 1.55E−08 | 0.001301 | 0.999924 |
| 9937 | 4.02E−08 | 0.001465 | 0.999826 |
| 9284 | 8.22E−08 | 0.001604 | 0.999675 |
| 9855 | 2.92E−07 | 0.00189 | 0.999019 |
| 6661 | 0.407214 | 0.030185 | 0.011533 |
| 6624 | 0.935536 | 0.066375 | 0.011045 |
| 12272 | 0.466905 | 0.032471 | 0.010828 |
| 7356 | 0.523207 | 0.034726 | 0.010339 |
| 9562.8 | 0.885195 | 0.05808 | 0.010222 |
| 6115 | 0.881762 | 0.057646 | 0.010186 |
| 7939 | 0.878415 | 0.057234 | 0.010152 |
| 7257 | 0.599579 | 0.038033 | 0.009886 |
| 6522 | 0.62609 | 0.039274 | 0.009777 |
| 9712.3 | 0.811345 | 0.050588 | 0.009719 |
| 7471 | 0.651635 | 0.040528 | 0.009695 |
| 10472.8 | 0.80352 | 0.049954 | 0.009691 |
| 7206 | 0.675306 | 0.04175 | 0.009638 |
| 11065.6 | 0.781346 | 0.04827 | 0.009631 |
| 6831 | 0.687836 | 0.042424 | 0.009615 |

The result of our analysis using classification metrics is summarised in Table 7.

| Metrics | Value (%) |
| --- | --- |
| Sensitivity | 100 |
| Specificity | 90 |
| Accuracy | 92 |
| Precision | 74 |

#### 4. Conclusions and Recommendations

The model identified 25 claims (about 22%), estimated at the preferred quantile level, as belonging to the anomalous class; these can thus be assumed to be fraudulent claims. The first 10 of these are shown in Table 6.

The model correctly identified all anomalous claims (sensitivity of 100%) and gave a high specificity value of 90%. The overall accuracy of the model was 92%, with a precision of 74%. This study therefore recommends Bayesian quantile regression as a viable model for anomaly detection. It is worth noting that the study assumed a normal density mixture for the $z_i$ in order to estimate the anomalous cluster in the data; a different choice of distribution for the $z_i$ could yield different findings. Also, instead of using a mixture model, a variable selection method through data augmentation could be used to select only the appropriate variables for the model.

#### Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest.

1. J. Novignon, S. A. Olakojo, and J. Nonvignon, “The effects of public and private health care expenditure on health status in sub-saharan africa: new evidence from panel data analysis,” Health Economics Review, vol. 2, p. 22, 2012. View at: Publisher Site | Google Scholar
2. T. Ekina, F. Leva, F. Ruggeri, and R. Soyer, “Application of bayesian methods in detection of healthcare fraud,” Chemical Engineering Transaction, vol. 33, 2013. View at: Google Scholar
3. W.-S. Yang and S.-Y. Hwang, “A process-mining framework for the detection of healthcare fraud and abuse,” Expert Systems with Applications, vol. 31, no. 1, pp. 56–68, 2006. View at: Publisher Site | Google Scholar
4. J. H. Wilson, “An analytical approach to detecting insurance fraud using logistic regression,” Journal of Finance and Accountancy, vol. 1, p. 1, 2009. View at: Google Scholar
5. Q. Liu and M. Vasarhelyi, “Healthcare fraud detection: a survey and a clustering model incorporating geo-location information,” in Proceedings of the 29th World Continuous Auditing and Reporting Symposium (29WCARS), Brisbane, Australia, November 2013. View at: Google Scholar
6. T. Ekin, F. Ieva, F. Ruggeri, and R. Soyer, “Statistical medical fraud assessment: exposition to an emerging field,” International Statistical Review, vol. 86, no. 3, pp. 379–402, 2018. View at: Publisher Site | Google Scholar
7. H. Kozumi and G. Kobayashi, “Gibbs sampling methods for bayesian quantile regression,” Journal of Statistical Computation and Simulation, vol. 81, no. 11, pp. 1565–1578, 2011. View at: Publisher Site | Google Scholar
8. K. Yu and R. A. Moyeed, “Bayesian quantile regression,” Statistics & Probability Letters, vol. 54, no. 4, pp. 437–447, 2001. View at: Publisher Site | Google Scholar
9. M. A. Tanner and W. H. Wong, “The calculation of posterior distributions by data augmentation,” Journal of the American Statistical Association, vol. 82, no. 398, pp. 528–540, 1987. View at: Publisher Site | Google Scholar
10. S. Geman and D. Geman, “Stochastic relaxation, gibbs distributions, and the bayesian restoration of images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, no. 6, pp. 721–741, 1984. View at: Publisher Site | Google Scholar