Abstract

This paper investigates a new family of distributions based on an inverse trigonometric function, the arctangent. In actuarial science, heavy-tailed probability distributions play an important role in modelling data sets, and actuaries seek such distributions in order to fit complex economic and actuarial data well. The current research uses a popular method for generating new distributions that are good candidates for heavy-tailed data. The proposed family, called the Arctan-X family of distributions, is introduced using the arctangent function. As an illustration, we study the Arctan-Weibull distribution as a special case of the family. The parameters of the Arctan-Weibull distribution are estimated by the frequentist approach of maximum likelihood estimation, and a Monte Carlo simulation study is used to assess the performance of the resulting estimators. The Arctan-Weibull model is applied to a real-world insurance data set and compared with well-known two-, three-, and four-parameter competitors: the Weibull, Kappa, Burr-XII, and beta-Weibull distributions. For model comparison, we use information criteria and goodness-of-fit tests to determine whether the Arctan-Weibull distribution is more useful than the competing models.

1. Introduction

Heavy-tailed probability distributions have been examined in numerous disciplines, including actuarial science, biomedical sciences, engineering, risk management, and economics. In recent years, several procedures have been proposed to generate new classes of heavy-tailed probability distributions that describe data adequately and offer a high degree of flexibility. Among these techniques, the use of trigonometric functions and their inverses has been at the forefront of the development of new families of probability distributions. One of the essential tasks of financial and actuarial science is the accurate forecasting of large financial losses. Underestimating such losses exposes a company to serious operational risks, such as bankruptcy and underpriced premiums. To mitigate these risks and provide precise forecasts of actuarial losses, actuaries frequently propose flexible heavy-tailed distributions.

Financial and actuarial data sets are generally heavy-tailed, unimodal, right-skewed, and positive [1, 2]. Complex financial data sets can be better modelled by developing new families of probability distributions [15]. Such models significantly raise the effectiveness of quantitative analysis methods, and substantial work has been devoted to establishing new statistical models. However, real data often exhibit features that do not fit most common statistical models. Statistical distributions are commonly used to capture real-world phenomena, and the theory of statistical distributions is widely studied along with new developments that extend their usefulness. Recent developments in distribution theory and its uses have resulted in a number of general families of probability distributions that have been applied successfully to a variety of statistical and probability problems. For more details, see [6–10].

Constructing flexible parametric models for various types of data is a difficult task for applied statisticians. In general, such models allow new features of real-world phenomena to be discovered and support informed predictions. Several families of distributions have been created in this regard using various techniques, such as (i) inducing a shape, skewness, or kurtosis parameter [6]; (ii) compounding of distributions [7]; (iii) transformation techniques [11–13]; (iv) finite mixtures of distributions [14–16]; and (v) composition of two or more distributions [17]. For more information about these techniques, see [6–10].

Unfortunately, the abovementioned generalization techniques for the classical probability distributions may face some constraints: (i) adding more parameters to a probability model enhances its flexibility, but such methods usually result in reparameterization problems; (ii) the larger number of model parameters makes the parameters more difficult to estimate and examine; (iii) several extension approaches reduce the tractability of the cdf, which makes manual calculation of statistical characteristics more difficult; and (iv) other generalization approaches make the pdf more complicated, leading to computational difficulties. In short, additional parameters enhance the flexibility of a model, which is a desirable characteristic, but they also make it more difficult to draw conclusions; for further reading, see [18, 19].

To make a model more flexible, most statistical models proposed in the literature have a large number of parameters. According to some authors, the estimators of these parameters are difficult to obtain numerically. It is therefore preferable to create models with a small number of parameters but a high degree of flexibility for modelling the data. To achieve this goal, a small group of researchers decided to look for new distributions using trigonometric functions and their inverses (see [18–27]).

The fundamental objective of this research is to present and examine a new family of probability distributions with a small number of parameters but a high degree of flexibility for data modelling. Several authors have pursued the same goal using trigonometric functions. Chesneau et al. [20] developed a new family of distributions called the sine Kumaraswamy-G family of probability distributions. Souza et al. [21] introduced a new family of probability distributions using the sine function. Souza et al. [22] proposed a new family of probability distributions using the tangent function. Souza [28] introduced other families of probability distributions using the cosine and secant functions. Mahmood et al. [27] developed a new family of probability distributions using the sine distribution. Chesneau et al. [23] developed a new family of probability distributions based on sine and cosine functions.

On the other hand, other researchers developed new families of probability distributions using inverse trigonometric functions. Lung et al. [18] developed a new family of probability distributions using the arcsine function. Rahman et al. [29] developed a new family of distributions by using the arcsine distribution. Chesneau et al. [24] developed a new distribution based on the arccosine function. Chaudary [26] introduced the Arctan Lomax distribution using the arctangent distribution. Muse et al. [30] introduced a new versatile log-logistic distribution using the tangent function. Furthermore, other researchers discussed the application of trigonometric functions and their inverses; for more information, see [19, 24, 27, 31–33].

Given the preceding discussion, statisticians are motivated to propose new distributions or distribution families based on an easily expressed probability density function (pdf) and a closed, tractable cumulative distribution function (cdf). Accordingly, this paper develops a new distribution family that avoids the issues mentioned above while also fitting financial data sets well. The proposed family is known as the Arctan-X family of distributions (abbreviated “AT-X”) and was developed using the arctangent function. The proposed family has a straightforward pdf expression as well as a tractable and closed cdf form.

To the best of the authors' knowledge, the present literature contains no published study that uses the arctangent function to develop a complete new family of distributions and examines its mathematical and practical characteristics. One of the reasons for writing this paper is to address this gap. Using the general setting of the Arctan-X class, we derive some of the fundamental mathematical and statistical properties of the new family, such as the hazard rate function (hrf), survivor function (sf), quantile function (qf), moments and moment generating function (mgf), skewness, kurtosis, and residual life function.

The remainder of the article is arranged as follows: the Arctan-X family and its main mathematical properties are discussed in Section 2. Section 3 presents special cases of the Arctan-X family. The Arctan-Weibull distribution is introduced in Section 4, and its mathematical and statistical features are examined in Section 5. Section 6 addresses the estimation of the Arctan-X family parameters. Section 7 outlines the Monte Carlo simulation for the suggested model. In Section 8, the performance of the suggested model is demonstrated using a real-world data application. Finally, Section 9 contains the concluding remarks and a summary of the work.

2. Arctan-X Family of Distributions

This section presents the AT-X family in detail. One merit of the AT-X family is that it has an easily expressed pdf and a closed, tractable cdf. If X is a random variable that belongs to the Arctan-X family, then its cdf can be written as the following expression:

Here, $G(x;\xi)$ is the cdf of the baseline (or parent) random variable, which depends on the parameter vector $\xi$; if $G(x;\xi)$ has pdf $g(x;\xi)$, then the pdf of the class is expressed as

The complementary cdf (or survival function) can be written as below:

The instantaneous failure rate (or hazard rate function (hrf)) can be written as below:

The retro hazard (or reversed hazard rate function) may be expressed in the following manner:

The integrated hazard rate (or cumulative hazard rate function) is given by
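For orientation, the functions listed above can be sketched under one assumption that is not confirmed by the surviving text: that the family cdf applies the arctangent to the baseline cdf and rescales by $4/\pi$ so that it reaches 1. This form is chosen here because it adds no extra parameter and inverts through the tangent function. The symbols $F$, $f$, $S$, $h$, $r$, $H$, $G$, $g$, and $\xi$ are illustrative notation.

```latex
\begin{align*}
F(x;\xi) &= \frac{4}{\pi}\arctan\{G(x;\xi)\}, \\
f(x;\xi) &= \frac{4}{\pi}\,\frac{g(x;\xi)}{1+G(x;\xi)^{2}}, \\
S(x;\xi) &= 1-\frac{4}{\pi}\arctan\{G(x;\xi)\}, \\
h(x;\xi) &= \frac{f(x;\xi)}{S(x;\xi)}
          = \frac{4\,g(x;\xi)}{\{1+G(x;\xi)^{2}\}\{\pi-4\arctan G(x;\xi)\}}, \\
r(x;\xi) &= \frac{f(x;\xi)}{F(x;\xi)}
          = \frac{g(x;\xi)}{\{1+G(x;\xi)^{2}\}\arctan G(x;\xi)}, \\
H(x;\xi) &= -\log S(x;\xi).
\end{align*}
```

Under this assumed form, $F$ increases from 0 to 1 because $\arctan(0)=0$ and $\arctan(1)=\pi/4$, and the pdf follows from the chain rule applied to $\arctan$.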

2.1. Quantile Function

The quantile function (also known as the inverse cdf) of the Arctan-X family follows by inverting the Arctan-X distribution function. It may be written in terms of the tangent function, with the argument $u$ taken in the interval $(0, 1)$. The quantile function may be used to generate random numbers from AT-X distributions.
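The inversion and its use for random number generation can be sketched as follows. This is a minimal illustration that assumes the family cdf has the form $(4/\pi)\arctan G(x)$; the function names and the unit-rate exponential baseline are hypothetical choices made only for the example.

```python
import math
import random

def atx_cdf(x, base_cdf):
    """Assumed AT-X cdf: (4/pi) * arctan(G(x)), with G the baseline cdf."""
    return 4.0 / math.pi * math.atan(base_cdf(x))

def atx_quantile(u, base_quantile):
    """Invert the assumed cdf: G(x) = tan(pi*u/4), so x = G^{-1}(tan(pi*u/4))."""
    if not 0.0 <= u < 1.0:
        raise ValueError("u must lie in [0, 1)")
    return base_quantile(math.tan(math.pi * u / 4.0))

def atx_sample(n, base_quantile, rng):
    """Inverse-transform sampling: push uniform draws through the quantile."""
    return [atx_quantile(rng.random(), base_quantile) for _ in range(n)]

# Illustrative baseline only (not taken from the paper): unit-rate exponential.
def exp_cdf(x):
    return -math.expm1(-x) if x > 0 else 0.0

def exp_quantile(p):
    return -math.log1p(-p)

rng = random.Random(1)
draws = atx_sample(5, exp_quantile, rng)
```

Any baseline with a closed-form quantile can be substituted for the exponential one; the only family-specific step is the tangent transformation of $u$.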

2.2. Moments

In actuarial and financial science, moments are very important, particularly in applications, because they help the researcher obtain the key properties and characteristics of the distribution under consideration. The rth moment of the AT-X family is calculated as

Substituting the pdf of the AT-X family into equation (8), we get

Using the Taylor series, we have

Let , in equation (10); we get

With the aid of equation (7) and substituting into equation (9), we obtain the following result:

The moment-generating function for the AT-X family can be expressed in a general form as follows:

3. Special Cases of the Arctan-X Family

This section discusses special cases of the proposed Arctan-X family of distributions obtained from different baseline cumulative distribution functions. In Table 1, we present fifteen special cases of the proposed family, based on the Weibull, Gompertz, log-logistic, Lomax, Kumaraswamy, Pareto, normal, Dagum, Burr-XII, Rayleigh, gamma, Lindley, exponential, Gumbel, and uniform distributions.

As an example, under an alternative parametrization, we take the cdf of the Weibull distribution as the baseline and introduce the Arctan-Weibull distribution as follows.

A random variable X is said to have a Weibull distribution with shape parameter and scale parameter denoted by X ∼ Wei . Its cdf is defined by the following:

And, the pdf is given by

We can write the reliability (survival) function as below:

The hrf is given by

The retro hazard rate is given by

The integrated hazard rate function is given by the following expression, in which the unknown parameters are collected in a vector.

4. Arctan-Weibull Distribution

The Weibull distribution has a straightforward mathematical definition, is mathematically tractable, and can be used in a variety of situations. It is regarded as a versatile model for loss modelling in general insurance because it can adequately model data with a high degree of positive skewness, which is a characteristic of claim amounts. Hence, this part of the paper introduces the proposed AT-W distribution and derives its basic probability functions, including the cdf, pdf, survivor function, hrf, retro hazard, and integrated hazard function. Taking the baseline cdf to be that of the two-parameter Weibull distribution, the cdf of the AT-W distribution, for $x > 0$, can be expressed as

The corresponding pdf to the above cdf is given by

The sf is expressed as follows:

The hrf is obtained by

The inverted hazard rate function is as follows:

The cumulative hazard function may be written as follows, where the same parameter vector appears in all of the above equations.
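The six AT-W functions can be evaluated numerically as in the sketch below. Two assumptions are made that the surviving text does not confirm: the family cdf is taken as $(4/\pi)\arctan G(x)$, and the Weibull baseline is parameterized as $G(x) = 1 - \exp\{-(x/\text{scale})^{\text{shape}}\}$, which may differ from the paper's alternative parametrization. All function names are hypothetical.

```python
import math

def weibull_cdf(x, shape, scale):
    """Baseline Weibull cdf under the assumed shape/scale parameterization."""
    return -math.expm1(-((x / scale) ** shape)) if x > 0 else 0.0

def weibull_pdf(x, shape, scale):
    """Baseline Weibull pdf, written as (shape/x) * z * exp(-z) with z = (x/scale)**shape."""
    if x <= 0:
        return 0.0
    z = (x / scale) ** shape
    return shape / x * z * math.exp(-z)

def atw_cdf(x, shape, scale):
    return 4.0 / math.pi * math.atan(weibull_cdf(x, shape, scale))

def atw_pdf(x, shape, scale):
    G = weibull_cdf(x, shape, scale)
    return 4.0 / math.pi * weibull_pdf(x, shape, scale) / (1.0 + G * G)

def atw_sf(x, shape, scale):
    return 1.0 - atw_cdf(x, shape, scale)

def atw_hrf(x, shape, scale):
    """Hazard rate: pdf divided by survival function."""
    return atw_pdf(x, shape, scale) / atw_sf(x, shape, scale)

def atw_rhrf(x, shape, scale):
    """Reversed (retro) hazard rate: pdf divided by cdf; defined for x > 0."""
    return atw_pdf(x, shape, scale) / atw_cdf(x, shape, scale)

def atw_chf(x, shape, scale):
    """Cumulative hazard: minus the log of the survival function."""
    return -math.log(atw_sf(x, shape, scale))

# Illustrative parameter values, chosen only for the example.
values = {name: fn(1.3, 1.5, 2.0) for name, fn in [
    ("cdf", atw_cdf), ("pdf", atw_pdf), ("sf", atw_sf),
    ("hrf", atw_hrf), ("rhrf", atw_rhrf), ("chf", atw_chf)]}
```

A quick sanity check on any such implementation is that the numerical derivative of the cdf matches the pdf.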

Figures 1, 2, and 3 show some possible shapes of the pdf, cdf, and sf of the AT-W distribution for different values of the model parameters.

5. Properties of the Proposed Model

This section uses numerical examples to derive some statistical and mathematical properties of the AT-W distribution, such as the quantile function, skewness and kurtosis, moments, and residual and reverse residual life functions.

5.1. Quantile Function

The inverse cdf is widely employed in distribution theory, for example in simulation and in applications. Simulation software uses a quantile function to create random samples. The quantile function of the AT-W distribution is given by the following expression, where $u$ is uniformly distributed on $(0, 1)$.

The quantile function of the AT-W model is as follows:

The median, lower quartile, and upper quartile of the AT-W distribution can be obtained from the quantile function by setting $u = 0.5$, $0.25$, and $0.75$, respectively.

5.2. Skewness and Kurtosis

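The Galton skewness (or asymmetry) and Moors kurtosis of the two-parameter AT-W model are given by the following expressions:

$$\text{Skewness} = \frac{Q(6/8) - 2Q(4/8) + Q(2/8)}{Q(6/8) - Q(2/8)}, \qquad \text{Kurtosis} = \frac{Q(7/8) - Q(5/8) + Q(3/8) - Q(1/8)}{Q(6/8) - Q(2/8)}.$$

Here, $Q(\cdot)$ denotes the quantile function.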

The preceding expressions can be written explicitly as functions of the AT-W quantile function. These measures have many advantages [34].
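Because both measures depend only on quantiles, they can be computed from any quantile function. The sketch below uses the standard quartile-based (Galton) and octile-based (Moors) definitions and, as an assumption, an AT-W quantile derived from the cdf form $(4/\pi)\arctan G(x)$ with a shape/scale Weibull baseline. The parameter values and function names are illustrative.

```python
import math

def atw_quantile(u, shape, scale):
    """AT-W quantile under the assumed cdf and Weibull parameterization."""
    p = math.tan(math.pi * u / 4.0)              # baseline probability level
    return scale * (-math.log1p(-p)) ** (1.0 / shape)

def galton_skewness(q):
    """Quartile-based skewness: (Q3 - 2*Q2 + Q1) / (Q3 - Q1)."""
    q1, q2, q3 = q(0.25), q(0.5), q(0.75)
    return (q3 - 2.0 * q2 + q1) / (q3 - q1)

def moors_kurtosis(q):
    """Octile-based kurtosis: (E7 - E5 + E3 - E1) / (E6 - E2)."""
    e = [q(i / 8.0) for i in range(1, 8)]        # e[0] = Q(1/8), ..., e[6] = Q(7/8)
    return (e[6] - e[4] + e[2] - e[0]) / (e[5] - e[1])

def q(u):
    """Quantile function for illustrative parameter values (shape 1.5, scale 2)."""
    return atw_quantile(u, 1.5, 2.0)

quartiles = (q(0.25), q(0.5), q(0.75))           # lower quartile, median, upper quartile
skew = galton_skewness(q)
kurt = moors_kurtosis(q)
```

Unlike moment-based measures, these quantities exist even when the moments of a heavy-tailed distribution do not.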

5.3. Moments

Moments are essential in statistical modelling, particularly in applications. The rth moment of the AT-W distribution is defined as

In fact, we have
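When a closed form is inconvenient, the rth moment can be approximated numerically. The sketch below assumes the cdf form $(4/\pi)\arctan G(x)$ with a shape/scale Weibull baseline and uses the substitution $z = (x/\text{scale})^{\text{shape}}$, under which $g(x)\,dx = e^{-z}\,dz$ and $G = 1 - e^{-z}$. A plain midpoint rule stands in for whatever integration method the paper used; the function name and the truncation point are hypothetical.

```python
import math

def atw_raw_moment(r, shape, scale, upper=40.0, steps=100_000):
    """Approximate E[X^r] by the midpoint rule after substituting z = (x/scale)**shape."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        z = (i + 0.5) * h
        G = -math.expm1(-z)                       # baseline cdf value, 1 - exp(-z)
        weight = 4.0 / math.pi * math.exp(-z) / (1.0 + G * G)
        total += (scale * z ** (1.0 / shape)) ** r * weight
    return total * h

# Illustrative parameter values (shape 1.5, scale 2).
mean = atw_raw_moment(1, 1.5, 2.0)
second = atw_raw_moment(2, 1.5, 2.0)
variance = second - mean ** 2
```

Setting r = 0 should return a value very close to 1, which is a convenient check that the density integrates to one.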

5.4. Residual Life and Reverse Residual Life

Residual life functions are widely used in actuarial science, survival analysis, and many other fields such as risk management; for more information, see [35]. The analysis of a device's lifetime after it has reached age $t$ is especially important in reliability and survival analysis. Let $X$ be the original lifetime with survival function $S(x)$; the corresponding residual life after age $t$ is then the random variable $X - t$ given $X > t$ [36].

The distribution of the residual life can be calculated using the definition of conditional probability, as in the following expression:

The residual life function of the AT-W random variable (r.v.) is given by the following equation:

In addition, we can obtain the reverse residual life of the AT-W r.v. as follows:
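The conditional-probability step behind both quantities can be written generically. The notation below is illustrative: $X$ is the lifetime, $t$ the age, and $S$ and $F$ the survival function and cdf of a continuous lifetime distribution.

```latex
\begin{align*}
S_{t}(x) &= P(X - t > x \mid X > t) = \frac{S(x+t)}{S(t)}, \qquad x \ge 0, \\
\bar{S}_{t}(x) &= P(t - X > x \mid X \le t) = \frac{F(t-x)}{F(t)}, \qquad 0 \le x \le t.
\end{align*}
```

For the AT-W model, $S$ and $F$ would be replaced by the sf and cdf of the AT-W distribution.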

6. Classical Method of Estimation

Classical methods play an important role in estimation, so this section is devoted to the maximum likelihood technique for estimating the parameters of the AT-X family of distributions from complete (uncensored) samples. Suppose that $X_1, X_2, \ldots, X_n$ is a random sample of independent random variables drawn from the AT-X family, with observed values $x_1, x_2, \ldots, x_n$. The likelihood function for the AT-X family is defined as follows:

We can express the log-likelihood function as below:

Obtaining the partial derivative of the log-likelihood function, we get

Equating the first derivative to zero and solving the resulting equation numerically gives the maximum likelihood estimator (MLE) of the parameter vector. With the aid of equation (35), the parameter estimates can then be obtained for any special case of the proposed family by inserting the corresponding baseline pdf and cdf.

The estimates can be found in two ways: directly, using the R program (AdequacyModel package), the Ox program (subroutine MaxBFGS), or SAS (PROC NLMIXED); or indirectly, by solving the nonlinear likelihood equations.
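As a rough illustration of direct numerical maximization, the sketch below fits the two AT-W parameters to simulated data. It rests on several assumptions: the cdf form $(4/\pi)\arctan G(x)$, a shape/scale Weibull baseline, and a simple derivative-free compass search that stands in for the BFGS-type routines named above. The function names, seed, and true values are hypothetical.

```python
import math
import random

def atw_quantile(u, shape, scale):
    """AT-W quantile under the assumed cdf and Weibull parameterization."""
    p = math.tan(math.pi * u / 4.0)
    return scale * (-math.log1p(-p)) ** (1.0 / shape)

def neg_loglik(log_params, data):
    """Negative AT-W log-likelihood; parameters are passed on the log scale."""
    try:
        shape, scale = math.exp(log_params[0]), math.exp(log_params[1])
        total = len(data) * math.log(4.0 / math.pi)
        for x in data:
            z = (x / scale) ** shape
            G = -math.expm1(-z)                   # baseline Weibull cdf
            log_g = math.log(shape) - math.log(x) + shape * math.log(x / scale) - z
            total += log_g - math.log1p(G * G)
    except OverflowError:
        return math.inf                           # reject extreme trial values
    return -total

def compass_search(f, start, step=0.5, tol=1e-6, max_iter=10_000):
    """Try +/- step along each coordinate; halve the step when nothing improves."""
    point, best = list(start), f(start)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(point)):
            for delta in (step, -step):
                trial = list(point)
                trial[i] += delta
                value = f(trial)
                if value < best:
                    point, best, improved = trial, value, True
        if not improved:
            step /= 2.0
    return point, best

# Simulate data from assumed true values (shape 1.5, scale 2), then refit.
rng = random.Random(2024)
data = [atw_quantile(1.0 - rng.random(), 1.5, 2.0) for _ in range(1000)]
log_hat, nll = compass_search(lambda p: neg_loglik(p, data), [0.0, 0.0])
shape_hat, scale_hat = math.exp(log_hat[0]), math.exp(log_hat[1])
```

Optimizing on the log scale keeps both parameters positive without explicit constraints. In practice a quasi-Newton optimizer would usually converge faster than this search.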

7. Monte Carlo Simulation Study

We assess the effectiveness of the maximum likelihood estimation (MLE) method for estimating the Arctan-Weibull distribution parameters using Monte Carlo simulation. A numerical evaluation of the performance of the MLEs of the AT-W model is performed using the nlminb() R function with the argument method = “BFGS”. The simulation study investigates the average bias (AB), mean square error (MSE), and root mean square error (RMSE) of the estimators of the proposed model's parameters.

We performed the simulation for various sample sizes and different values of the parameters. The samples were generated from the quantile function of the AT-W distribution. To obtain stable results, we ran 750 iterations for each sample size under three parameter scenarios, labelled set I, set II, and set III.

The MLEs are computed for each simulated data set, and the AB, MSE, and RMSE of each parameter are then computed by the following expressions:
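The three summary measures can be computed from a list of replicate estimates as in the sketch below. It assumes the usual definitions, with AB taken as the signed average deviation from the true value; if the study used absolute deviations, the first line of the computation would change accordingly. The function name and the toy inputs are hypothetical.

```python
import math

def simulation_metrics(estimates, true_value):
    """Average bias, mean square error, and root mean square error of replicate estimates."""
    n = len(estimates)
    errors = [est - true_value for est in estimates]
    ab = sum(errors) / n                          # signed average bias (assumed definition)
    mse = sum(e * e for e in errors) / n
    return {"AB": ab, "MSE": mse, "RMSE": math.sqrt(mse)}

# Toy replicate estimates of a parameter whose true value is 1.5.
metrics = simulation_metrics([1.4, 1.6, 1.7], 1.5)
```

In a full study, `estimates` would hold one MLE per Monte Carlo replicate for a given sample size and parameter set.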

7.1. Simulation Results

We explore the performance of the MLE method in estimating the AT-W parameters using an MC simulation study with 750 repetitions. We determine the mean of the estimated parameters, the AB, the mean square error (MSE), and the root mean square error (RMSE). The following steps were followed:
(i) We generated the samples by inverting the cdf given in equation (20).
(ii) Three different sets of true parameter values were considered.
(iii) We used different sample sizes, as listed in the simulation tables.

Tables 2–4 summarize the numerical findings of the MC simulation study, reporting the average of the estimated parameters together with the AB, MSE, and RMSE. The results indicate that the MLEs are effective in estimating the unknown parameters and that the resulting estimates are relatively stable and close to the true values. Furthermore, the AB, MSE, and RMSE decrease as the sample size increases. The MC simulation outcomes are depicted in Figures 4–6. These graphs show that increasing the sample size n decreases the AB, MSE, and RMSE and brings the average MLEs closer to their true values.

8. Practical Illustration Using the Insurance Data Set

This section examines a highly correlated real-world data set from the insurance business in order to show the value of the AT-W distribution. We compare the goodness-of-fit test results and information criterion measures of the proposed distribution with those of well-known competing distributions, namely the Weibull, Kappa, Burr-XII, and beta-Weibull distributions.

The primary attraction of the AT-W distribution is its applicability to data analysis problems, which makes it useful in a variety of fields, particularly those concerned with insurance data analysis. Lately, a number of potential distribution families for insurance data sets have been proposed. For more information on these, see [4, 5, 18, 19, 34, 35, 38–40].

The cdfs of the following competing models are fitted: (1) the Weibull distribution, (2) the Burr-XII distribution, (3) the beta-Weibull distribution, and (4) the Kappa distribution.

Certain analytical measures are used to identify the best-fitting distribution among the competitors. In this regard, the Akaike Information Criterion (AIC), Hannan–Quinn Information Criterion (HQIC), Corrected Akaike Information Criterion (CAIC), and Bayesian Information Criterion (BIC) values were used to select the most appropriate model. Apart from these discrimination measures, additional goodness-of-fit measures are also recorded: the Anderson–Darling (A) statistic, the Cramer–von Mises (W) statistic, and the Kolmogorov–Smirnov (K-S) statistic with its associated p value, as well as the log-likelihood function.

The best model has the lowest values of AIC, BIC, CAIC, and HQIC, as well as of the A, W, and K-S statistics. Furthermore, the model with the greatest log-likelihood value is preferred, and the p values of the K-S statistics are used to compare the competing distributions. We observed that, compared with the other distributions, the AT-W model provided the best fit because it has the smallest values of these measures.

The AIC is

$$\mathrm{AIC} = 2k - 2\ell.$$

The BIC is

$$\mathrm{BIC} = k\log(n) - 2\ell.$$

The CAIC is

$$\mathrm{CAIC} = \frac{2nk}{n-k-1} - 2\ell.$$

The HQIC is

$$\mathrm{HQIC} = 2k\log\{\log(n)\} - 2\ell.$$

Here, $\ell$ is the log-likelihood function evaluated at the MLEs, $n$ is the sample size, and $k$ is the number of parameters in the distribution. The following goodness-of-fit measurements are considered.
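The four criteria are simple functions of the maximized log-likelihood, the number of parameters, and the sample size. The sketch below uses their standard definitions, with CAIC taken as the small-sample corrected AIC. The function name is hypothetical and the log-likelihood value in the example is a made-up placeholder, not a result from this study.

```python
import math

def information_criteria(loglik, k, n):
    """Standard AIC, BIC, corrected AIC, and HQIC from a maximized log-likelihood."""
    return {
        "AIC": 2 * k - 2 * loglik,
        "BIC": k * math.log(n) - 2 * loglik,
        "CAIC": 2 * n * k / (n - k - 1) - 2 * loglik,
        "HQIC": 2 * k * math.log(math.log(n)) - 2 * loglik,
    }

# Placeholder inputs: two parameters, 58 observations, made-up log-likelihood.
ic = information_criteria(loglik=-250.0, k=2, n=58)
```

All four criteria penalize the same fit term, so they differ only in how strongly they charge for each additional parameter.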

We can compute the value of the Anderson–Darling (A) test statistic by using the following equation:

We can compute the value of the Cramer–von Mises (W) test statistic by using the following equation:

Here, $x_i$ is the $i$th value in the data vector once the data are sorted in ascending order.
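The three goodness-of-fit statistics can be computed from the sorted data and a fitted cdf. The sketch below uses the textbook forms of the Anderson–Darling, Cramer–von Mises, and Kolmogorov–Smirnov statistics; the study may have used modified versions, so this should be read as an assumption. The function name and the toy data are hypothetical.

```python
import math

def gof_statistics(data, cdf):
    """Textbook A, W, and K-S statistics for a fully specified continuous cdf."""
    x = sorted(data)
    n = len(x)
    F = [cdf(v) for v in x]                       # fitted cdf at the ordered data
    ad = -n - sum(
        (2 * i - 1) * (math.log(F[i - 1]) + math.log(1.0 - F[n - i]))
        for i in range(1, n + 1)
    ) / n
    cvm = 1.0 / (12 * n) + sum(
        ((2 * i - 1) / (2 * n) - F[i - 1]) ** 2 for i in range(1, n + 1)
    )
    ks = max(
        max(i / n - F[i - 1], F[i - 1] - (i - 1) / n) for i in range(1, n + 1)
    )
    return {"A": ad, "W": cvm, "KS": ks}

# Toy example: two points tested against the uniform(0, 1) cdf.
stats = gof_statistics([0.25, 0.75], lambda v: v)
```

The Anderson–Darling term requires fitted cdf values strictly between 0 and 1, so observations outside the support of the fitted model need separate handling.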

8.1. Data I: Insurance Data Set

This data set includes 58 observations and represents monthly unemployment insurance metrics from July 2008 to April 2013. It was reported by the Department of Labour, Licensing, and Regulation of the state of Maryland, USA. The data set contains 21 variables, of which variable number 12 is of particular interest; it was previously analysed using the alpha power-exponentiated exponential distribution [40]. The data can be found at:/catalog.data.gov/dataset/unemployment-insurance-data-july-2008-to-april-2013. The 58 observations are given in Table 5.

8.2. Analyses of Exploratory Data

The basic objective of research is to extract information from large amounts of data. In this paper, we employed four distinct tools for exploratory analysis: (1) descriptive statistics for the data set, particularly our variable of interest; (2) box plot; (3) TTT plot; and (4) histogram.

The total-time-on-test (TTT) plot is a graphical representation of the form of the failure rate curve. Qualitative information regarding the shape of the failure rate function may help in the selection of a specific distribution in a variety of real-world applications. The TTT plot for the data set used in this study is shown in Figure 7. Its form indicates a rising failure rate, which suggests using the Weibull distribution or one of its modifications.
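The points of a scaled TTT plot can be computed directly from the ordered data. The sketch below uses the standard scaled TTT transform; the function name and the toy data are hypothetical, and plotting is left out.

```python
def scaled_ttt(data):
    """Return the (r/n, T(r/n)) points of the scaled total-time-on-test transform."""
    x = sorted(data)
    n = len(x)
    total = sum(x)
    points = [(0.0, 0.0)]
    running = 0.0
    for r, value in enumerate(x, start=1):
        running += value                          # sum of the r smallest observations
        points.append((r / n, (running + (n - r) * value) / total))
    return points

# Toy data, used only to show the shape of the output.
ttt_points = scaled_ttt([3.0, 1.0, 2.0])
```

A curve that is concave and lies above the diagonal is usually read as an increasing failure rate, while a convex curve below the diagonal points to a decreasing one.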

Table 6 shows the descriptive statistics of the insurance data set, summarizing its central tendency and spread.

8.3. Analysis of Data Set I

Table 6 contains descriptive statistics for Data Set I. The underlying distribution of Data Set I is highly skewed (skewness estimated to be 2.436) with a heavy tail (kurtosis estimated to be 7.622). The proposed AT-W distribution has the lowest AIC, CAIC, BIC, and HQIC values and the highest log-likelihood value, as shown in Table 7. As a result, it is chosen as the most appropriate model among the alternatives evaluated in this paper.

Table 8 shows the parameter estimates, together with the Cramer–von Mises (W), Anderson–Darling (A), and Kolmogorov–Smirnov (K-S) test results and p values, for all competing distributions fitted to Data Set I. According to Table 8, the proposed AT-W distribution has the lowest values in the A, W, and K-S tests, as well as the highest p value. As a consequence, the AT-W distribution is selected as the most suitable model among the competing distributions studied in this research. Plots of the fitted cdfs and pdfs with a histogram of the observed data are shown in Figures 8 and 9. In addition, Figure 10 shows the estimated cdf, pdf, PP plot, and Kaplan–Meier plot of the AT-W distribution for Data Set I.

9. Conclusion

Distribution theory takes uncertainty into account and provides a framework for addressing financial and economic decision-making problems. Given the importance of distribution theory, we were motivated to introduce a new distribution family based on an inverse trigonometric function. The resulting AT-X family offers a good fit to many kinds of data, such as actuarial and financial data and data from related fields. The Arctan-Weibull (AT-W) distribution is defined as a special case of the family. The study developed the fundamental probability functions as well as some statistical properties of this submodel. The parameters of the AT-W model are estimated by classical inference using the maximum likelihood estimation technique. The proposed distribution is applied to an insurance data set with a high degree of granularity. The AT-W distribution was compared with some well-known competitors, including the Weibull, Kappa, Burr-XII, and beta-Weibull distributions. Four information criterion measures (AIC, BIC, CAIC, and HQIC) were used to make comparisons, as well as three goodness-of-fit measures (A, W, and K-S test statistics with the corresponding p values) and the log-likelihood function. Using these metrics, it is found that the AT-W model could be a good fit for analyzing high-dimensional financial data.

This study has many potential extensions. For instance, the special submodels of Table 1 may be investigated in future studies. Additionally, a variety of frequentist and Bayesian techniques may be employed to estimate the parameters of these particular submodels. The proposed family could also be extended to regression analysis as a generalized linear regression model (GLRM) [41].

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare no conflicts of interest regarding this paper.