Mathematical Problems in Engineering | 2021 | Special Issue: Robust Estimation Methods in the Presence of Extreme Observations

Research Article | Open Access

Shokrya S. Alshqaq, Abdullah A. Ahmadini, Ali H. Abuzaid, "Some New Robust Estimators for Circular Logistic Regression Model with Applications on Meteorological and Ecological Data", Mathematical Problems in Engineering, vol. 2021, Article ID 9944363, 15 pages, 2021. https://doi.org/10.1155/2021/9944363

Some New Robust Estimators for Circular Logistic Regression Model with Applications on Meteorological and Ecological Data

Academic Editor: Ishfaq Ahmad
Received: 02 Apr 2021
Accepted: 06 May 2021
Published: 26 May 2021

Abstract

Maximum likelihood estimation (MLE) is often used to estimate the parameters of the circular logistic regression model due to its efficiency under a parametric model. However, evidence has shown that the classical MLE is strongly affected by outliers during parameter estimation. This article discusses the effect of outliers on circular logistic regression and extends four robust estimators, namely, the Mallows, Schweppe, Bianco and Yohai (BY), and weighted BY (WBY) estimators, to the circular logistic regression model. These estimators have been successfully used in linear logistic regression models for the same purpose. The four proposed robust estimators are compared with the classical MLE through simulation studies. They demonstrate satisfactory finite-sample performance in the presence of misclassification errors and leverage points. Meteorological and ecological datasets are analyzed for illustration.

1. Introduction

Circular data arise whenever the values of a random variable can be represented on the circumference of the unit circle. They are measured as angles with values between $0$ and $2\pi$ (or $-\pi$ and $\pi$); examples include wind directions, animal navigation, and the values of any periodic phenomenon, such as a 24-hour clock or days of the year, which can be converted to circular data [1]. Modeling the relationship between circular variables is so-called "circular regression," and it is classified into three main classes, namely, circular-circular, circular-linear, and linear-circular regression models [2].

Applications of circular regression models are widespread in many applied fields. Several regression models have been proposed to predict a continuous circular variable from other circular or linear predictors [3–6].

Logistic regression analysis is a useful statistical tool that analyzes the relationship between a binary response and a predictor. The theory of logistic regression is well developed [7]. Daffaie and Khan [8] proposed the circular logistic model to predict a binary variable from a circular variable, such as modeling rainfall (yes, no) against wind direction, or a fatal road accident (yes, no) against the time of the accident.

The existence of outliers is a common problem in regression analysis. For the linear logistic model, Feser and Pia [9] showed that maximum likelihood estimation can be strongly influenced by outliers. Croux et al. [10] found that the most dangerous outliers, termed "bad leverage points," are misclassified observations that are outlying in the design space of the predictor variables.

Circular logistic regression is also subject to outliers, as shown in [11], where an outlier detection procedure based on the penalized maximum likelihood was proposed and applied to different real datasets.

Several robust estimators that are less affected by outliers have been proposed in the literature to improve estimation performance in linear logistic regression models [12]. The authors in [13] introduced weights depending on the response and covariates and proposed the Mallows-type estimator. This estimator was analyzed in depth by Carroll and Pederson [14], who used the Mahalanobis distance to reduce the weights of high-leverage observations, and Bianco and Yohai [15] proposed a further robust method.

Since published work has considered only the detection of outliers in the circular logistic regression model, this article attempts to overcome the problem of outliers in this model by extending some robust estimators from the classical linear logistic case to the circular logistic case.

The rest of this paper is organized as follows. Section 2 reviews the formulation of the circular logistic regression model and its parameter estimation via MLE. Section 3 presents the types of outliers in circular logistic regression and derives the proposed robust estimators for the circular logistic regression model. Section 4 discusses the effect of outliers on the circular logistic estimators by computing their influence functions. Section 5 investigates the performance of the considered robust estimators through simulation. Section 6 applies the proposed estimators to meteorological and ecological datasets. Section 7 provides the conclusion.

2. Model Formulation

A circular logistic regression describes the relationship between a binary response and circular predictors. It shows potential for various applications in the environmental sciences. The authors in [8] assumed that binomial observations $y_i$ with a probability of success $\pi_i$, for $i = 1, \dots, n$, depend on a circular random variable $\theta_i$, and the proposed model is given as follows:

$$\operatorname{logit}(\pi_i) = \ln\!\left(\frac{\pi_i}{1-\pi_i}\right) = \beta_0 + \lambda \cos(\theta_i - \theta_0), \quad (1)$$

where $\beta_0$ is the value of the logit (log odds) when $\cos(\theta_i - \theta_0) = 0$ and $\theta_0$ is the angle where the logit reaches its highest value. Let $\beta_1 = \lambda \cos\theta_0$ and $\beta_2 = \lambda \sin\theta_0$, and equation (1) can be written as

$$\operatorname{logit}(\pi_i) = \beta_0 + \beta_1 \cos\theta_i + \beta_2 \sin\theta_i. \quad (2)$$
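As a minimal sketch (not the authors' code), the two equivalent parameterizations in equations (1) and (2) can be written as follows; the function names are illustrative only.

```python
import math

def circular_logit_prob(theta, beta0, lam, theta0):
    """Success probability under the circular logistic model (eq. (1)):
    logit(pi) = beta0 + lam * cos(theta - theta0)."""
    eta = beta0 + lam * math.cos(theta - theta0)
    return 1.0 / (1.0 + math.exp(-eta))

def circular_logit_prob_linearized(theta, beta0, beta1, beta2):
    """Equivalent form (eq. (2)) with beta1 = lam*cos(theta0) and
    beta2 = lam*sin(theta0)."""
    eta = beta0 + beta1 * math.cos(theta) + beta2 * math.sin(theta)
    return 1.0 / (1.0 + math.exp(-eta))
```

The two forms agree by the identity $\lambda\cos(\theta-\theta_0) = \lambda\cos\theta_0\cos\theta + \lambda\sin\theta_0\sin\theta$, and the probability peaks at $\theta = \theta_0$ when $\lambda > 0$.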

Suppose that binomial data of the form $y_i$ successes out of $m_i$ trials are observations from a binomial distribution; the likelihood function is then given by

$$L(\beta_0, \beta_1, \beta_2) = \prod_{i=1}^{n} \binom{m_i}{y_i} \pi_i^{y_i} (1 - \pi_i)^{m_i - y_i}. \quad (3)$$

Let $\eta_i = \beta_0 + \beta_1 \cos\theta_i + \beta_2 \sin\theta_i$, and by using the exponential function, we obtain

$$\pi_i = \frac{\exp(\eta_i)}{1 + \exp(\eta_i)}. \quad (4)$$

The maximum likelihood estimation is classically used for parameter estimation and is defined through an objective function as

$$\hat{\beta} = \arg\max_{\beta_0, \beta_1, \beta_2} \ell(\beta_0, \beta_1, \beta_2), \quad (5)$$

where

$$\ell(\beta_0, \beta_1, \beta_2) = \sum_{i=1}^{n} \left[ y_i \eta_i - m_i \ln\!\left(1 + \exp(\eta_i)\right) \right] + \text{const}. \quad (6)$$

The maximum likelihood equations are given as follows:

$$\frac{\partial \ell}{\partial \beta_0} = \sum_{i=1}^{n} (y_i - m_i \pi_i) = 0, \quad (7)$$
$$\frac{\partial \ell}{\partial \beta_1} = \sum_{i=1}^{n} (y_i - m_i \pi_i)\cos\theta_i = 0, \quad (8)$$
$$\frac{\partial \ell}{\partial \beta_2} = \sum_{i=1}^{n} (y_i - m_i \pi_i)\sin\theta_i = 0. \quad (9)$$

These equations are solved iteratively using the Newton–Raphson method. Recently, Abuzaid and ElShekh Ahmed [11] used the penalized maximum likelihood estimator (PMLE) to identify outliers in the circular logistic regression model and investigated its performance via simulation. The following section discusses some possible robust estimators for the circular logistic regression model.
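A minimal Newton–Raphson fit for the Bernoulli case ($m_i = 1$) based on the score equations (7)–(9) can be sketched as below; this is an illustration, not the paper's implementation, and the function name is an assumption.

```python
import numpy as np

def fit_circular_logistic_mle(theta, y, n_iter=50, tol=1e-10):
    """Newton-Raphson MLE for the circular logistic model.
    Design matrix columns: 1, cos(theta), sin(theta); Bernoulli responses."""
    theta = np.asarray(theta, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(theta), np.cos(theta), np.sin(theta)])
    beta = np.zeros(3)
    for _ in range(n_iter):
        eta = X @ beta
        pi = 1.0 / (1.0 + np.exp(-eta))
        score = X.T @ (y - pi)                 # score equations (7)-(9)
        W = pi * (1.0 - pi)
        hess = X.T @ (X * W[:, None])          # observed/Fisher information
        step = np.linalg.solve(hess, score)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

Because the logistic log-likelihood is concave, Newton iterations from the zero vector typically converge in a handful of steps.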

3. Robust Estimators for Circular Logistic Regression

3.1. Outliers in Circular Logistic Regression

This section distinguishes the different cases of outlying observations in circular logistic regression, where outliers may occur in the dependent variable, the independent variable, or both.

For binary data, all the $y_i$ values are either 0 or 1. Hence, an error in the response direction can only occur as a transposition of 0 to 1 or vice versa. This type of outlier is known as a residual outlier or misclassification-type error [16, 17].

A leverage outlier or leverage point occurs when the circular observation at position $j$ (i.e., $\theta_j$) is contaminated as follows: $\theta_j^{*} = (\theta_j + \delta\pi) \bmod 2\pi$, where $\theta_j^{*}$ is the value after contamination and $\delta$ is the degree of contamination in the range $0 \le \delta \le 1$. A leverage point can be considered a good leverage point when $y_j = 1$ with a large value of $\pi_j$, while it is a bad leverage point when $y_j = 1$ with a small value of $\pi_j$, and vice versa. Abuzaid and ElShekh Ahmed [11] considered the misclassification-type error outlier in their simulation study without any investigation of leverage point detection.
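The two contamination types can be sketched as below. The shift formula $\theta_j^{*} = (\theta_j + \delta\pi) \bmod 2\pi$ is the common scheme in the circular-outlier literature and is used here as an assumption; the function names are illustrative.

```python
import numpy as np

def misclassify(y, prop, rng):
    """Flip a random proportion of binary responses
    (residual / misclassification-type outliers)."""
    y = np.asarray(y).copy()
    n_flip = int(round(prop * len(y)))
    idx = rng.choice(len(y), size=n_flip, replace=False)
    y[idx] = 1 - y[idx]
    return y

def contaminate_angle(theta, j, delta):
    """Shift one circular observation: theta_j* = (theta_j + delta*pi) mod 2*pi,
    where 0 <= delta <= 1 controls the degree of contamination (a leverage point)."""
    theta = np.asarray(theta, dtype=float).copy()
    theta[j] = (theta[j] + delta * np.pi) % (2 * np.pi)
    return theta
```

Applying both contaminations to the same observation produces the "bad leverage point" scenario examined in the simulations.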

Alternatively, robust estimators are the common methods used for handling the problem of outliers in logistic models. The following subsections present four robust estimators for the parameters of the circular logistic model.

3.2. Circular Mallows Class

The proposed estimator extends the Mallows class in [18] to the circular logistic model by weighting the maximum likelihood estimating equations.

Assume $F$ is a continuous and increasing distribution function and $\pi_i$ is given by

$$\pi_i = F(\eta_i), \qquad \eta_i = \beta_0 + \beta_1 \cos\theta_i + \beta_2 \sin\theta_i. \quad (10)$$

Then, the partial derivatives in (7)–(9) become

$$\sum_{i=1}^{n} (y_i - \pi_i), \qquad \sum_{i=1}^{n} (y_i - \pi_i)\cos\theta_i, \qquad \sum_{i=1}^{n} (y_i - \pi_i)\sin\theta_i, \quad (11)$$

respectively. The robust estimates for the circular logistic regression model in equation (10) are given by the solution obtained by

$$\sum_{i=1}^{n} w_i (y_i - \pi_i) - c_0(\beta) = 0, \quad (12)$$
$$\sum_{i=1}^{n} w_i (y_i - \pi_i)\cos\theta_i - c_1(\beta) = 0, \quad (13)$$
$$\sum_{i=1}^{n} w_i (y_i - \pi_i)\sin\theta_i - c_2(\beta) = 0, \quad (14)$$

where $w_i$ are the weights that may depend on $\theta_i$, $y_i$, or both, and $c_0(\beta)$, $c_1(\beta)$, $c_2(\beta)$ are correction functions needed to ensure consistency. If $w_i = 1$ and $c_k(\beta) = 0$, then equations (12)–(14) give the usual circular logistic regression estimate. If $w_i = w(\theta_i)$ and $c_k(\beta) = 0$, then the weights depend only on $\theta_i$.
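A Mallows-type solver can reuse the Newton scheme with fixed observation weights, as in the following sketch; for simplicity the consistency correction $c(\beta)$ is omitted (an assumption), so this is a simplified illustration rather than the paper's estimator.

```python
import numpy as np

def fit_weighted_circular_logistic(theta, y, w, n_iter=100, tol=1e-10):
    """Solve Mallows-type weighted score equations
    sum_i w_i (y_i - pi_i) x_i = 0 (consistency correction omitted)
    by Newton iterations, with x_i = (1, cos theta_i, sin theta_i)."""
    theta = np.asarray(theta, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    X = np.column_stack([np.ones_like(theta), np.cos(theta), np.sin(theta)])
    beta = np.zeros(3)
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-(X @ beta)))
        score = X.T @ (w * (y - pi))                       # weighted score
        hess = X.T @ (X * (w * pi * (1.0 - pi))[:, None])  # weighted information
        step = np.linalg.solve(hess, score)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

With all weights equal to one, the solution coincides with the usual circular logistic MLE, matching the remark after equations (12)–(14).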

3.3. Circular Schweppe Class

Stefanski [19] stated that $d_i$ is a robust Mahalanobis distance for the vector $\mathbf{x}_i = (1, \cos\theta_i, \sin\theta_i)^{T}$ that depends on the covariance matrix of the regression model, which is given by

$$M(\beta) = \mathbf{X}^{T} V \mathbf{X}, \quad (15)$$

where $V$ is a diagonal matrix with $v_{ii} = \pi_i (1 - \pi_i)$ and $\pi_i$ is the probability of success; then, the Mahalanobis distance is given by

$$d_i^{2} = \mathbf{x}_i^{T} M(\beta)^{-1} \mathbf{x}_i. \quad (16)$$

If $w_i = w(d_i)$, then the estimator is the same as the linear logistic Schweppe class proposed in [13]. Here, $w_i$ depends on $y_i$ and the circular variable $\theta_i$. This estimator is called the circular conditionally unbiased bounded influence function (CUBIF) estimator.

3.4. Circular BY Estimators

Bianco and Yohai [15] proposed a robust method for the linear logistic model; in this section, we extend it to the circular case, referred to as the circular BY estimator.

Let the MLE be obtained by minimizing the deviance,

$$\hat{\beta}_{\mathrm{MLE}} = \arg\min_{\beta} \sum_{i=1}^{n} d\!\left(\pi_i(\beta); y_i\right), \quad (17)$$

where $d(\pi; y) = -2\left[y \ln \pi + (1 - y)\ln(1 - \pi)\right]$. By replacing the deviance function in (17) with a function $\rho$, the robust circular BY estimator is defined by

$$\hat{\beta}_{\mathrm{BY}} = \arg\min_{\beta} \sum_{i=1}^{n} \rho\!\left(d\!\left(\pi_i(\beta); y_i\right)\right), \quad (18)$$

where $\rho$ is a bounded, differentiable, and nondecreasing function defined in [15] and given by

$$\rho(t) = \begin{cases} t - \dfrac{t^{2}}{2c}, & t \le c, \\[4pt] \dfrac{c}{2}, & t > c, \end{cases} \quad (19)$$

where $c$ is a positive number.
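The bounded $\rho$ function of Bianco and Yohai in equation (19) is straightforward to implement, as in this sketch (the default tuning constant $c = 0.5$ is an illustrative assumption):

```python
def rho_by(t, c=0.5):
    """Bianco-Yohai rho function (eq. (19)): bounded above by c/2,
    differentiable, and nondecreasing on t >= 0."""
    if t <= c:
        return t - t * t / (2.0 * c)
    return c / 2.0
```

Note that $\rho(c) = c - c/2 = c/2$, so the two branches meet continuously at $t = c$, and the derivative $1 - t/c$ vanishes there, which is what bounds the influence of large deviances.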

3.5. Circular WBY Estimators

The extension of the BY estimator obtained by including weights reduces the influence of outliers in the covariate space. This weighted BY (WBY) estimator is extended to the circular logistic regression model and defined as

$$\hat{\beta}_{\mathrm{WBY}} = \arg\min_{\beta} \sum_{i=1}^{n} w_i\, \rho\!\left(d\!\left(\pi_i(\beta); y_i\right)\right), \quad (20)$$

where the weights $w_i$ are computed using the minimum covariance determinant (MCD) estimator, taken as a decreasing function of the robust Mahalanobis distances $RD(\theta_i)$, and given by (see [20])

$$w_i = W\!\left(RD(\theta_i)\right). \quad (21)$$
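Hard-rejection weights from squared Mahalanobis distances of the circular covariate, embedded as $(\cos\theta_i, \sin\theta_i)$, can be sketched as follows. The paper uses MCD-based robust distances; for brevity this sketch substitutes the classical mean and covariance (which is not robust), and the 0.975 chi-square cutoff follows the usual convention in [20].

```python
import numpy as np

CHI2_2_975 = 7.3778  # 0.975 quantile of the chi-square distribution, 2 df

def wby_weights(theta):
    """Indicator weights w_i = I(d_i^2 <= chi2_{2,0.975}) from Mahalanobis
    distances of (cos theta, sin theta).
    NOTE: classical mean/covariance used as a simplified stand-in for MCD."""
    Z = np.column_stack([np.cos(theta), np.sin(theta)])
    center = Z.mean(axis=0)
    cov = np.cov(Z, rowvar=False)
    D = Z - center
    d2 = np.einsum('ij,jk,ik->i', D, np.linalg.inv(cov), D)
    return (d2 <= CHI2_2_975).astype(float)
```

Observations whose embedded covariate lies far from the bulk of the data receive weight zero and therefore cannot act as leverage points in the WBY objective (20).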

4. Influence Function

Suppose two source populations of circular variables, both following von Mises distributions with different mean directions but the same concentration parameter $\kappa$. A circular variable $\theta$ can arise from one of these populations:

$$\theta \sim VM(\mu_1, \kappa) \quad \text{or} \quad \theta \sim VM(\mu_2, \kappa), \quad (22)$$

where $\mu_1 \ne \mu_2$. Let the binary variable $y$ indicate the source population of the corresponding $\theta$; then $y = 1$ if $\theta$ comes from the first population and $y = 0$ otherwise.

Let the joint distribution of $(\theta, y)$ be denoted by $H$ and $T$ be an estimator of the circular logistic parameters. Then, the influence function [21] is defined as

$$IF\!\left((\theta, y); T, H\right) = \lim_{\varepsilon \to 0} \frac{T\!\left((1-\varepsilon)H + \varepsilon \Delta_{(\theta, y)}\right) - T(H)}{\varepsilon}, \quad (23)$$

where $\Delta_{(\theta, y)}$ denotes the point mass at $(\theta, y)$.

If $T$ is the MLE, as shown in equations (7)–(9), then the influence function is unbounded in both the $\theta$ and $y$ spaces. Specifically, a small amount of contamination in the training data due to the presence of a possible outlier in $\theta$ or $y$ strongly affects the MLE, as shown in the simulation section.

Suppose $w_i = w(d(\theta_i))$, where the weights depend on the robust distance of observation $\theta_i$, and the robust distance is equal to the Mahalanobis distance of $\theta_i$ to the center of the data cloud. This condition reduces the influence of outlying observations in the $\theta$ space. Thus, the influence function is bounded with respect to $\theta$ but unbounded with respect to $y$. Similar conclusions can be derived for the circular Mallows estimator.

If $T$ is the circular Schweppe (CUBIF) or WBY estimator, which adds a weight depending on both $y$ and $\theta$, then a fully bounded influence function is obtained.

5. Simulation Study

5.1. Settings

This simulation aims to compare the robustness of the proposed robust estimators and the classical MLE. The independent circular variable $\theta$ is generated from a von Mises distribution with a fixed mean direction and concentration parameter $\kappa = 1, 2, 6, 10$, and $15$, with several sample sizes, the largest being $n = 300$; a large sample size is chosen to avoid separation problems. The true parameter values are fixed across all replications.

The simulation study covers a variety of situations. Initially, data without contamination are simulated. The robustness of all estimators with contaminated data is then examined in three different ways. First, a proportion of the responses $y_i$ is chosen at random and changed from either 0 to 1 or 1 to 0; this process constitutes the misclassification-type error. For each contaminated case, 5%, 10%, 20%, 30%, and 40% of the original data are contaminated. Second, the same proportions of the circular observations $\theta_i$ are contaminated to produce good leverage points. Finally, the same proportions are considered, and the generated data are contaminated with the two types of outliers simultaneously; this process constitutes bad leverage points.

Each simulation includes 1000 replications. The performance of the estimators is evaluated based on the bias and the median squared error (MedSE) for each parameter, which are defined as follows:

$$\mathrm{bias}(\hat{\beta}_k) = \left|\operatorname{median}_r\!\left(\hat{\beta}_k^{(r)}\right) - \beta_k\right|, \qquad \mathrm{MedSE}(\hat{\beta}_k) = \operatorname{median}_r\!\left(\hat{\beta}_k^{(r)} - \beta_k\right)^{2},$$

where $\hat{\beta}_k^{(r)}$ denotes the estimate of $\beta_k$ in replication $r$.
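Median-based summaries over simulation replications can be computed as below. The exact formulas are stated here under one standard reading of "bias" and "median squared error"; the function name is illustrative.

```python
import numpy as np

def bias_and_medse(estimates, true_beta):
    """Per-parameter bias and median squared error over replications.
    `estimates` has shape (n_replications, n_parameters).
    bias  = |median over replications of the estimates - true value|
    MedSE = median over replications of the squared estimation error"""
    est = np.asarray(estimates, dtype=float)
    diff = est - np.asarray(true_beta, dtype=float)
    bias = np.abs(np.median(diff, axis=0))
    medse = np.median(diff ** 2, axis=0)
    return bias, medse
```

Using medians rather than means keeps the performance summary itself resistant to the occasional replication in which an estimator diverges.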

A good estimator has bias and MedSE values that are relatively small or close to zero.

The simulation used the standard "Robust" package in R to obtain the estimators and the "CircStats" package for generating the circular variable $\theta$.

5.2. Results

The bias and MedSE of the five estimators are shown in Tables 1 to 7. Table 1 shows the results for uncontaminated data (i.e., clean data), where the biases and MedSE values of all five estimators are fairly close to each other. However, one of the estimators has a noticeably larger MedSE than the others for large concentration parameters ($\kappa = 6, 10$, and $15$) and hence performs worse than the other estimators in this situation.


[Tables 1–7: bias and MedSE of the five estimators (the classical MLE and the four proposed robust circular estimators) for clean data and for data contaminated with misclassification errors, good leverage points, and bad leverage points at levels of 5%–40%, across concentration parameters $\kappa = 1, 2, 6, 10$, and $15$. The numeric entries were fused during extraction and could not be reliably reconstructed, so they are omitted here.]