Abstract

The Pareto distribution is widely used to model industrial, biological, engineering, and many other types of data. A new generalized model, the neutrosophic Pareto distribution (NPD), is developed in this article. The proposed model is a neutrosophic variant of the classical Pareto distribution and is potentially useful for analyzing vague, unclear, indeterminate, or imprecise data. The proposed distribution is unimodal and skewed to the right. Several characteristics of the NPD are investigated under the neutrosophic framework, and expressions for basic properties such as the mean, variance, raw moments, and shape coefficients are obtained. The maximum likelihood approach is presented for estimating the imprecise distributional parameters of the proposed model. Key reliability functions of the NPD, such as the survival and hazard functions, are also derived and explained. Finally, the practical benefits of the NPD are illustrated by analyzing two real datasets.

1. Introduction

The distribution of extreme values of natural phenomena (e.g., earthquakes, winds, floods, waves, and temperatures) is relevant in a wide variety of practical situations. For instance, the distributions of high sea waves and of large floods at dams are important when designing the corresponding structures. Largely as a result of this interest, extreme value theory has grown rapidly in popularity in recent years [1]. In recent decades, interest in safety and in reducing losses from man-made and natural disasters has increased substantially [2]. The combination of pressing social needs and the emergence of new theoretical methods has resulted in tremendous progress in this essential multidisciplinary field of research [3]. The study of the statistical characteristics of natural catastrophes is required not only for understanding the physical nature of the underlying processes but also for risk assessment [4]. Edwards and Das [5] provide a comprehensive list of major domains in which heavy-tailed distributions have proven useful. The Pareto distribution is one of the heavy-tailed distributions frequently encountered in physical systems to describe natural disasters (e.g., volcanic eruptions, earthquakes, hurricanes, and floods) [6]. Sea level fluctuations, river flow discharges, asteroid craters, wind velocities, forest fires, and other natural disasters follow the Pareto power law, which supports the Pareto distribution as a successful fitting model [7]. The Pareto distribution and its variants are also well known in the literature for their ability to describe heavy-tailed data, which commonly arise in wealth distribution, actuarial science, life testing, finance, economics, engineering, and survival analysis [8]. A large range of socioeconomic variables have heavy-tailed distributions that are reasonably well fitted by the Pareto model [9].
The shape of income distributions is governed by an underlying law [10]. Rootzen et al. [11] also listed a number of additional quantities, measured in diverse physical, biological, technical, and social systems, for which the Pareto law has been shown to fit well. In short, several studies have shown that the Pareto model is versatile in modeling many forms of heavy-tailed data.

A traditional method for analyzing extreme values in a population is based on a precisely characterized Pareto distribution. The classical Pareto model is appropriate when the data consist of exact values and the distributional parameters are precisely defined [12]. However, this strategy has been criticized because treating data as exact discards the information about imprecision contained in the data. Measurements of quantitative variables always carry a certain degree of inaccuracy [13]. Apart from continuous measurements, there are many situations in which exact reporting is impossible because of the irregular nature of the circumstances. For example, because water is constantly moving, the depth of an ocean cannot be measured exactly but can only be approximated. Fuzzy and neutrosophic statistics address this issue better than conventional statistical approaches [14]. The neutrosophic approach is a broader framework that extends both classical and fuzzy sets [15]. Neutrosophic philosophy takes into account truth, falsity, and indeterminacy [16]. The notion of neutrosophy is now used in a variety of application areas [17–19]. In many real-world scenarios, the collected data may be unclear, and several researchers have turned to neutrosophic philosophy to address the problems of incomplete data [20–22]. Within neutrosophic statistics, conventional statistical approaches have been extended to treat ambiguous data rigorously. New application areas for distribution theory are emerging and demand further attention, and the literature on statistical distributions contains many strategies for generalizing continuous distributions to improve their ability to describe a variety of datasets.

This study presents the NPD within the neutrosophic framework, thereby enhancing the flexibility of the Pareto model in dealing with uncertain datasets from a variety of real-world circumstances. This work also investigates the use of the NPD in healthcare data analysis and demonstrates the practical advantages of the suggested model.

The remainder of this work is structured as follows: Section 2 describes the proposed model and its key characteristics. Simulation studies based on the quantile function of the NPD are presented in Section 3. The estimation process under neutrosophic logic is presented in Section 4. Section 5 applies the proposed model to two real-world datasets. Section 6 summarizes the findings of the study.

2. Proposed Model with Some Useful Properties

This section gives an overview of the suggested distribution and presents it in a coherent framework. The following concepts link the proposed model to its uses in applied statistical methods. If the random variable $X_N$ with two parameters $\alpha_N$ and $\beta_N$ follows the neutrosophic Pareto model, then the density function of the proposed distribution is defined as
$$f_N(x_N) = \frac{\alpha_N \beta_N^{\alpha_N}}{x_N^{\alpha_N + 1}}, \quad x_N \ge \beta_N, \; \alpha_N, \beta_N > 0, \tag{1}$$
where $\alpha_N = [\alpha_L, \alpha_U]$ and $\beta_N = [\beta_L, \beta_U]$ are the neutrosophic shape and scale parameters, respectively, of the NPD. Note that the proposed model differs from the classical Pareto model, in which the shape and scale parameters are precisely determined. When the indeterminate part of the proposed model is zero, that is, $\alpha_L = \alpha_U$ and $\beta_L = \beta_U$, it reduces to the classical model. Different values of $\alpha_N$ and $\beta_N$ result in different density curves. A variety of density curves with different neutrosophic shape values and a fixed scale value are plotted in Figure 1.
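To make the "thick curve" idea concrete, the following minimal Python sketch evaluates the classical Pareto density $f(x) = \alpha\beta^{\alpha}/x^{\alpha+1}$ for every shape value on a grid spanning an interval $[\alpha_L, \alpha_U]$, with the scale held fixed, and reports the lower and upper envelope at a point $x$. The helper names and the parameter values are hypothetical illustrations, not taken from the article. A grid is used because, for fixed $x > \beta$, the density is not monotone in $\alpha$ (it peaks at $\alpha = 1/\ln(x/\beta)$).

```python
import math


def pareto_pdf(x: float, alpha: float, beta: float) -> float:
    """Classical Pareto density alpha * beta**alpha / x**(alpha + 1), zero below beta."""
    if x < beta:
        return 0.0
    return alpha * beta**alpha / x ** (alpha + 1)


def npd_pdf_band(x: float, alpha_n: tuple[float, float], beta: float,
                 steps: int = 200) -> tuple[float, float]:
    """Hypothetical helper: lower/upper envelope of the density at x when the
    shape parameter ranges over alpha_n = (alpha_L, alpha_U) and beta is fixed.

    A grid over alpha is used because the density is not monotone in alpha.
    """
    a_lo, a_hi = alpha_n
    grid = [a_lo + (a_hi - a_lo) * k / steps for k in range(steps + 1)]
    values = [pareto_pdf(x, a, beta) for a in grid]
    return min(values), max(values)


# Illustrative values only: alpha_N = [1.5, 2.5], beta = 1, evaluated at x = 2.
low, high = npd_pdf_band(2.0, (1.5, 2.5), 1.0)
print(f"density band at x = 2: [{low:.4f}, {high:.4f}]")  # prints [0.2210, 0.2652]
```

The width `high - low` at each `x` corresponds to the shaded imprecision region of a thick density curve.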

Figure 1 shows that different interval values of the shape parameter produce different thick curves of the NPD. It is clear from Figure 1 that the curves are not symmetric but skewed to the right. In the neutrosophic framework, the density is portrayed as a thick band instead of a single curve. The width of the band (shaded area) indicates the region of imprecision, and for every fixed parameter value within the intervals the area under the corresponding curve is one. Another important function in probability theory applications is the neutrosophic cumulative distribution function (CDF) of a density. The CDF $F_N(x_N)$, obtained by integrating (1) from $\beta_N$ to $x_N$, is given by
$$F_N(x_N) = 1 - \left(\frac{\beta_N}{x_N}\right)^{\alpha_N}, \quad x_N \ge \beta_N. \tag{2}$$

The function $F_N(x_N)$ gives the probability that the random variable takes a value less than or equal to $x_N$. Figure 2 shows the CDF curves for various interval values of the shape parameter of the proposed model.

Figure 2 depicts the cumulative distribution functions of the proposed model for various interval values of the shape parameter and a fixed value of the scale parameter. In each panel of Figure 2, the curve is nondecreasing and ranges from 0 to 1. The nondecreasing nature of $F_N(x_N)$ reflects the fact that a density cannot be negative, which is true for any distribution. Another useful function in applied statistics is the probability that an individual's lifetime exceeds a given time. This function is referred to as the survival function or simply the survival rate. In the neutrosophic framework, the survival function of the proposed model may be represented as follows:
$$S_N(x_N) = 1 - F_N(x_N) = \left(\frac{\beta_N}{x_N}\right)^{\alpha_N}, \quad x_N \ge \beta_N. \tag{3}$$

The graph of $S_N(x_N)$ is referred to as a survival curve. Figure 3 depicts the survival curves of the proposed NPD. A steep curve indicates a short survival period, that is, a low survival rate, as seen in Figure 3(b). A flat or gradually declining survival curve indicates a longer survival time, as seen in Figure 3(a).
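A minimal sketch of the CDF and survival function, again assuming a fixed scale and an interval shape $[\alpha_L, \alpha_U]$. For $x \ge \beta$, the quantity $(\beta/x)^{\alpha}$ decreases as $\alpha$ grows, so $F$ increases and $S$ decreases in $\alpha$, and the interval bounds are attained at the two endpoints of the shape interval. Function names and numbers are illustrative assumptions.

```python
def pareto_cdf(x, alpha, beta):
    """Classical Pareto CDF: F(x) = 1 - (beta / x)**alpha for x >= beta, else 0."""
    return 0.0 if x < beta else 1.0 - (beta / x) ** alpha


def pareto_survival(x, alpha, beta):
    """Survival function S(x) = 1 - F(x) = (beta / x)**alpha for x >= beta, else 1."""
    return 1.0 if x < beta else (beta / x) ** alpha


def endpoint_band(func, x, alpha_n, beta):
    """Interval [min, max] of func at x over alpha in alpha_n = (alpha_L, alpha_U).

    Valid here because F is increasing and S is decreasing in alpha for x >= beta.
    """
    a = func(x, alpha_n[0], beta)
    b = func(x, alpha_n[1], beta)
    return min(a, b), max(a, b)


alpha_n, beta = (1.5, 2.5), 1.0  # illustrative values only
for x in (1.0, 2.0, 5.0):
    f_lo, f_hi = endpoint_band(pareto_cdf, x, alpha_n, beta)
    s_lo, s_hi = endpoint_band(pareto_survival, x, alpha_n, beta)
    print(f"x = {x}: F in [{f_lo:.3f}, {f_hi:.3f}], S in [{s_lo:.3f}, {s_hi:.3f}]")
```

Because $S = 1 - F$, the lower CDF bound pairs with the upper survival bound and vice versa.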

Another critical function in reliability analysis is the neutrosophic hazard function $h_N(x_N)$, also known as the instantaneous failure rate. It is the ratio of the density function to the survival function and, for the proposed model, is given by
$$h_N(x_N) = \frac{f_N(x_N)}{S_N(x_N)} = \frac{\alpha_N}{x_N}, \quad x_N \ge \beta_N. \tag{4}$$

The function $h_N(x_N)$ gives the instantaneous rate of failure of an individual or item that has survived up to $x_N$. In general, a hazard function may increase, decrease, stay constant, or follow a more complex pattern. The graphical behavior of the hazard curves can be seen in Figure 4.

Figure 4 shows the hazard curves of the NPD at a fixed value of the scale parameter and interval values of the shape parameter. The hazard curves of the proposed model decrease in $x_N$, as expected because the hazard simplifies to $\alpha_N/x_N$.
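The sketch below computes the hazard as density divided by survival and confirms numerically that, for the Pareto form, it collapses to $\alpha/x$, which is why every curve in Figure 4 decreases. The function name and parameter values are hypothetical.

```python
def pareto_hazard(x, alpha, beta):
    """Hazard h(x) = f(x) / S(x) on the support x >= beta.

    For the Pareto model f(x) = alpha * beta**alpha / x**(alpha + 1) and
    S(x) = (beta / x)**alpha, so the ratio simplifies to alpha / x.
    """
    if x < beta:
        raise ValueError("hazard is evaluated on the support x >= beta")
    pdf = alpha * beta**alpha / x ** (alpha + 1)
    surv = (beta / x) ** alpha
    return pdf / surv


alpha_n, beta = (1.5, 2.5), 1.0  # illustrative values only
for x in (1.0, 2.0, 4.0, 8.0):
    band = (pareto_hazard(x, alpha_n[0], beta), pareto_hazard(x, alpha_n[1], beta))
    print(f"x = {x}: h in [{band[0]:.3f}, {band[1]:.3f}]")  # band narrows as x grows
```

Since $h$ is increasing in $\alpha$ at fixed $x$, the hazard band at $x$ is simply $[\alpha_L/x, \alpha_U/x]$.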

We now investigate the theoretical background further and present some key distributional properties of the proposed NPD in the context of neutrosophic logic. The properties below follow the parameterization given in (1).

Theorem 1. If $X_N$ follows the NPD, then $E(X_N) = \dfrac{\alpha_N \beta_N}{\alpha_N - 1}$ for $\alpha_N > 1$.

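Proof. By definition, the mean of the NPD is given by
$$E(X_N) = \int_{\beta_N}^{\infty} x_N f_N(x_N)\,dx_N = \alpha_N \beta_N^{\alpha_N} \int_{\beta_N}^{\infty} x_N^{-\alpha_N}\,dx_N. \tag{5}$$
For $\alpha_N > 1$, equation (5) further yields
$$E(X_N) = \alpha_N \beta_N^{\alpha_N} \left[\frac{x_N^{1-\alpha_N}}{1-\alpha_N}\right]_{\beta_N}^{\infty} = \alpha_N \beta_N^{\alpha_N}\,\frac{\beta_N^{1-\alpha_N}}{\alpha_N - 1}. \tag{6}$$
So,
$$E(X_N) = \frac{\alpha_N \beta_N}{\alpha_N - 1}, \quad \alpha_N > 1, \tag{7}$$
hence proved.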

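Theorem 2. If $X_N$ follows the NPD, then $\mathrm{Var}(X_N) = \dfrac{\alpha_N \beta_N^2}{(\alpha_N - 1)^2 (\alpha_N - 2)}$, $\alpha_N > 2$, is the variance of the proposed model.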

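Proof. The variance of the NPD is given by
$$\mathrm{Var}(X_N) = E(X_N^2) - \left[E(X_N)\right]^2. \tag{8}$$
Now,
$$E(X_N^2) = \int_{\beta_N}^{\infty} x_N^2 f_N(x_N)\,dx_N = \alpha_N \beta_N^{\alpha_N} \int_{\beta_N}^{\infty} x_N^{1-\alpha_N}\,dx_N. \tag{9}$$
Simplification of (9) for $\alpha_N > 2$ provides
$$E(X_N^2) = \frac{\alpha_N \beta_N^2}{\alpha_N - 2}. \tag{10}$$
Equation (8) thus becomes
$$\mathrm{Var}(X_N) = \frac{\alpha_N \beta_N^2}{\alpha_N - 2} - \frac{\alpha_N^2 \beta_N^2}{(\alpha_N - 1)^2} = \frac{\alpha_N \beta_N^2}{(\alpha_N - 1)^2 (\alpha_N - 2)}, \quad \alpha_N > 2. \tag{11}$$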

Theorem 3. If $X_N$ follows the NPD, then $m_N = \beta_N\, 2^{1/\alpha_N}$ is the median value.

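Proof. The median can be derived from the distribution function as
$$F_N(m_N) = 1 - \left(\frac{\beta_N}{m_N}\right)^{\alpha_N} = \frac{1}{2}, \tag{12}$$
where $m_N$ denotes the neutrosophic median value.
Furthermore, solving (12) for $m_N$ yields
$$m_N = \beta_N\, 2^{1/\alpha_N}. \tag{13}$$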
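The closed forms for the mean, variance, and median can be evaluated over interval parameters with a short sketch. It assumes each quantity is monotone in each parameter separately (true for these three on their finite domains: each decreases in $\alpha$ and increases in $\beta$), so the extremes over $[\alpha_L, \alpha_U] \times [\beta_L, \beta_U]$ occur at the four corners. The corner rule, the helper names, and the values are illustrative assumptions, not necessarily how the article computes its interval summaries.

```python
import itertools
import math


def pareto_mean(alpha, beta):
    """E(X) = alpha * beta / (alpha - 1); infinite when alpha <= 1."""
    return math.inf if alpha <= 1 else alpha * beta / (alpha - 1)


def pareto_variance(alpha, beta):
    """Var(X) = alpha * beta**2 / ((alpha - 1)**2 * (alpha - 2)); infinite when alpha <= 2."""
    if alpha <= 2:
        return math.inf
    return alpha * beta**2 / ((alpha - 1) ** 2 * (alpha - 2))


def pareto_median(alpha, beta):
    """Median m = beta * 2**(1 / alpha), from solving F(m) = 1/2."""
    return beta * 2 ** (1 / alpha)


def corner_interval(func, alpha_n, beta_n):
    """Hypothetical helper: evaluate func at the four corners of
    [alpha_L, alpha_U] x [beta_L, beta_U] and return (min, max).

    Assumes func is monotone in each parameter separately.
    """
    values = [func(a, b) for a, b in itertools.product(alpha_n, beta_n)]
    return min(values), max(values)


alpha_n, beta_n = (3.0, 4.0), (1.0, 1.2)  # illustrative values only
for name, func in [("mean", pareto_mean), ("variance", pareto_variance),
                   ("median", pareto_median)]:
    lo, hi = corner_interval(func, alpha_n, beta_n)
    print(f"{name}: [{lo:.4f}, {hi:.4f}]")
```

With these inputs the mean interval is $[4/3, 1.8]$: the lower bound comes from $(\alpha_U, \beta_L)$ and the upper bound from $(\alpha_L, \beta_U)$.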

Theorem 4. The $r$th raw moment of the NPD is $\mu'_r = E(X_N^r) = \dfrac{\alpha_N \beta_N^r}{\alpha_N - r}$ for $\alpha_N > r$.

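Proof. By definition, the $r$th moment of the NPD is given by
$$\mu'_r = E(X_N^r) = \int_{\beta_N}^{\infty} x_N^r f_N(x_N)\,dx_N = \alpha_N \beta_N^{\alpha_N} \int_{\beta_N}^{\infty} x_N^{r - \alpha_N - 1}\,dx_N. \tag{14}$$
From (14), for $\alpha_N > r$, we can write
$$\mu'_r = \frac{\alpha_N \beta_N^r}{\alpha_N - r}; \tag{15}$$
hence, (15) is the required result, a general expression for the raw moment about the origin of the NPD. By using the following relations, moments about the mean for the NPD can be derived as
$$\mu_2 = \mu'_2 - \mu_1'^2, \quad \mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2\mu_1'^3, \quad \mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2\mu_1'^2 - 3\mu_1'^4. \tag{16}$$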
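The raw moment formula and the raw-to-central relations can be checked with a short sketch for one endpoint of the shape interval. At $\alpha = 5$, $\beta = 1$ (illustrative values, not from the article), the central moments should match the closed forms $\mu_2 = \alpha\beta^2/((\alpha-1)^2(\alpha-2)) = 5/48$, $\mu_3 = 2\alpha(\alpha+1)\beta^3/((\alpha-1)^3(\alpha-2)(\alpha-3)) = 0.15625$, and $\mu_4 = 3\alpha\beta^4(3\alpha^2+\alpha+2)/((\alpha-1)^4(\alpha-2)(\alpha-3)(\alpha-4)) = 0.80078125$. The function names are hypothetical.

```python
def pareto_raw_moment(r, alpha, beta):
    """r-th raw moment E(X**r) = alpha * beta**r / (alpha - r), finite only if alpha > r."""
    if alpha <= r:
        raise ValueError(f"E(X^{r}) is infinite unless alpha > {r}")
    return alpha * beta**r / (alpha - r)


def central_moments(alpha, beta):
    """Second to fourth moments about the mean, built from raw moments with the
    standard raw-to-central relations. Requires alpha > 4."""
    m1, m2, m3, m4 = (pareto_raw_moment(r, alpha, beta) for r in (1, 2, 3, 4))
    mu2 = m2 - m1**2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1**3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1**2 - 3 * m1**4
    return mu2, mu3, mu4


mu2, mu3, mu4 = central_moments(5.0, 1.0)  # illustrative alpha = 5, beta = 1
print(mu2, mu3, mu4)
```

For an interval shape $[\alpha_L, \alpha_U]$ the same function would simply be evaluated at each endpoint, provided both exceed 4.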

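Theorem 5. The coefficient of skewness of the NPD is $\gamma_N = \dfrac{2(1 + \alpha_N)}{\alpha_N - 3}\sqrt{\dfrac{\alpha_N - 2}{\alpha_N}}$ for $\alpha_N > 3$.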

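Proof. By definition, the coefficient of skewness for the NPD is given by
$$\gamma_N = \frac{\mu_3}{\mu_2^{3/2}}, \tag{17}$$
where $\mu_2 = \dfrac{\alpha_N \beta_N^2}{(\alpha_N - 1)^2(\alpha_N - 2)}$ and $\mu_3 = \dfrac{2\alpha_N(\alpha_N + 1)\beta_N^3}{(\alpha_N - 1)^3(\alpha_N - 2)(\alpha_N - 3)}$.
Substituting these in (17) yields
$$\gamma_N = \frac{2(1 + \alpha_N)}{\alpha_N - 3}\sqrt{\frac{\alpha_N - 2}{\alpha_N}}, \tag{18}$$
where $\alpha_N > 3$.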

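Theorem 6. The coefficient of kurtosis for the NPD is $\kappa_N = \dfrac{3(\alpha_N - 2)(3\alpha_N^2 + \alpha_N + 2)}{\alpha_N(\alpha_N - 3)(\alpha_N - 4)}$ for $\alpha_N > 4$.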

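Proof. By definition, the coefficient of kurtosis is given by
$$\kappa_N = \frac{\mu_4}{\mu_2^2}, \tag{19}$$
where $\mu_2 = \dfrac{\alpha_N \beta_N^2}{(\alpha_N - 1)^2(\alpha_N - 2)}$ and $\mu_4 = \dfrac{3\alpha_N \beta_N^4 (3\alpha_N^2 + \alpha_N + 2)}{(\alpha_N - 1)^4(\alpha_N - 2)(\alpha_N - 3)(\alpha_N - 4)}$.
Substituting these in (19) yields
$$\kappa_N = \frac{3(\alpha_N - 2)(3\alpha_N^2 + \alpha_N + 2)}{\alpha_N(\alpha_N - 3)(\alpha_N - 4)}, \tag{20}$$
where $\alpha_N > 4$.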
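As a consistency check, the sketch below compares the closed-form skewness $\mu_3/\mu_2^{3/2}$ and (non-excess) kurtosis $\mu_4/\mu_2^2$ with values computed directly from the raw moments $\alpha\beta^r/(\alpha - r)$. Both shape coefficients depend only on $\alpha$, so the scale value drops out. The function names and the $(\alpha, \beta)$ pairs are illustrative assumptions.

```python
import math


def pareto_skewness(alpha):
    """Closed-form skewness 2(1 + alpha)/(alpha - 3) * sqrt((alpha - 2)/alpha), alpha > 3."""
    if alpha <= 3:
        raise ValueError("skewness is undefined unless alpha > 3")
    return 2 * (1 + alpha) / (alpha - 3) * math.sqrt((alpha - 2) / alpha)


def pareto_kurtosis(alpha):
    """Closed-form (non-excess) kurtosis mu4 / mu2**2, valid for alpha > 4."""
    if alpha <= 4:
        raise ValueError("kurtosis is undefined unless alpha > 4")
    return 3 * (alpha - 2) * (3 * alpha**2 + alpha + 2) / (alpha * (alpha - 3) * (alpha - 4))


def moment_based(alpha, beta):
    """Skewness and kurtosis from raw moments E(X**r) = alpha * beta**r / (alpha - r).

    Requires alpha > 4 so that the first four raw moments exist.
    """
    m1, m2, m3, m4 = (alpha * beta**r / (alpha - r) for r in (1, 2, 3, 4))
    mu2 = m2 - m1**2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1**3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1**2 - 3 * m1**4
    return mu3 / mu2**1.5, mu4 / mu2**2


for alpha, beta in [(5.0, 1.0), (6.5, 2.0)]:  # illustrative values only
    print(alpha, pareto_skewness(alpha), pareto_kurtosis(alpha), moment_based(alpha, beta))
```

Subtracting 3 from the kurtosis gives the excess kurtosis if that convention is preferred.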

3. Simulation Analysis of the Proposed Model

In this section, the Monte Carlo technique is employed to generate random numbers that follow the NPD. In general, the Monte Carlo method refers to any technique that solves a problem by means of random sampling. The objective is to verify the theoretical results of Section 2 by simulating random samples from the NPD with known parameter values. The inverse transform method is employed as the most straightforward technique for simulating random numbers from the proposed model, because it allows a computer's built-in pseudo-random number generator to be used directly. Setting $F_N(x_N) = u$ and solving for $x_N$ gives the inverse of the proposed model:
$$x_N = \beta_N (1 - u)^{-1/\alpha_N}, \tag{21}$$
where $u$ is a random number generated from the uniform distribution on $(0, 1)$, and $x_N$ is the corresponding percentile value of the proposed NPD. Random samples are drawn by the inverse method from the proposed model with specified values of $\alpha_N$ and $\beta_N$. The analytical results given in Section 2 are evaluated at the same baseline parameter values. The estimated values of the different distributional properties, along with the exact results, are provided in Table 1.
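The inverse-transform simulation can be sketched in plain Python, assuming the shape interval is handled by simulating each endpoint separately with a fixed scale. The seed, the sample size, and the parameter values are illustrative choices and are not the settings behind Table 1; the sketch only shows that simulated means and medians approach the closed forms $\alpha\beta/(\alpha-1)$ and $\beta 2^{1/\alpha}$.

```python
import math
import random
import statistics


def npd_quantile(u, alpha, beta):
    """Inverse CDF of the Pareto model: x = beta * (1 - u)**(-1 / alpha), 0 <= u < 1."""
    return beta * (1.0 - u) ** (-1.0 / alpha)


def simulate(n, alpha, beta, rng):
    """Draw n variates by the inverse-transform method using uniform numbers from rng."""
    return [npd_quantile(rng.random(), alpha, beta) for _ in range(n)]


rng = random.Random(2024)  # fixed seed for reproducibility
alpha_n, beta = (4.0, 5.0), 1.0  # illustrative values, not those behind Table 1
results = {}
for alpha in alpha_n:
    xs = simulate(100_000, alpha, beta, rng)
    results[alpha] = {
        "sim_mean": statistics.fmean(xs),
        "exact_mean": alpha * beta / (alpha - 1),
        "sim_median": statistics.median(xs),
        "exact_median": beta * 2 ** (1 / alpha),
        "min": min(xs),
    }
    print(alpha, results[alpha])
```

Pairing the lower and upper endpoint results then gives interval-valued simulated summaries in the spirit of Table 1.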

Table 1 displays the descriptive measures of the proposed model for known distributional parameter values. The descriptive measures of the simulated data are intervals because of the assumed indeterminacy in the parameters. The close agreement between the simulated and analytical results supports the basic framework of the proposed model.

4. Estimation of Neutrosophic Parameters

In this section, the well-known maximum likelihood (ML) technique is used to estimate the neutrosophic parameters of the proposed NPD. The ML technique treats the parameters as unknown and computes the joint density of all observations in a dataset, which are assumed to be independent and identically distributed. Once the likelihood of the NPD is established, its maximum is determined. ML estimators are important from a statistical viewpoint because of their asymptotic unbiasedness and minimum asymptotic variance. Let $x_{N1}, x_{N2}, \ldots, x_{Nn}$ be independent and identically distributed observations that follow the parametric model given in (1); then the joint density is given by
$$L(\alpha_N, \beta_N) = \prod_{i=1}^{n} \frac{\alpha_N \beta_N^{\alpha_N}}{x_{Ni}^{\alpha_N + 1}} = \frac{\alpha_N^{n} \beta_N^{n\alpha_N}}{\prod_{i=1}^{n} x_{Ni}^{\alpha_N + 1}}, \quad \beta_N \le \min_{i} x_{Ni}. \tag{22}$$

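Taking the logarithm of (22) and denoting it by $\ell$ gives
$$\ell = \ln L(\alpha_N, \beta_N) = \ln\left(\alpha_N^{n} \beta_N^{n\alpha_N}\right) - (\alpha_N + 1) \ln \prod_{i=1}^{n} x_{Ni}. \tag{23}$$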

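Simplification of (23) yields
$$\ell = n \ln \alpha_N + n\alpha_N \ln \beta_N - (\alpha_N + 1) \sum_{i=1}^{n} \ln x_{Ni}. \tag{24}$$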

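Partially differentiating (24) with respect to $\alpha_N$ and $\beta_N$ gives
$$\frac{\partial \ell}{\partial \alpha_N} = \frac{n}{\alpha_N} + n \ln \beta_N - \sum_{i=1}^{n} \ln x_{Ni} = 0, \qquad \frac{\partial \ell}{\partial \beta_N} = \frac{n\alpha_N}{\beta_N} > 0. \tag{25}$$
The first equation is solved by equating it to zero. Because $\partial \ell / \partial \beta_N$ is always positive, $\ell$ increases with $\beta_N$ and is maximized at the largest value allowed by the constraint $\beta_N \le \min_i x_{Ni}$.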

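Further solution of (25) provides the following estimates for the unknown parameters of the NPD:
$$\hat{\beta}_N = \min_{1 \le i \le n} x_{Ni}, \qquad \hat{\alpha}_N = \frac{n}{\sum_{i=1}^{n} \ln\left(x_{Ni} / \hat{\beta}_N\right)}. \tag{26}$$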
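The closed-form estimators for a crisp Pareto sample can be sketched in a few lines of Python. How interval-valued observations enter the likelihood is not spelled out in the text here, so `npd_mle` below makes a loudly labeled assumption: it fits the lower-endpoint and upper-endpoint samples separately and reports each parameter as an interval. The article itself reports using the R package EnvStats; this sketch does not reproduce that package.

```python
import math


def pareto_mle(xs):
    """ML estimates for a crisp Pareto sample:
    beta_hat = min(xs), alpha_hat = n / sum(log(x / beta_hat))."""
    beta_hat = min(xs)
    log_sum = sum(math.log(x / beta_hat) for x in xs)
    if log_sum == 0.0:
        raise ValueError("alpha_hat is undefined when all observations are equal")
    return len(xs) / log_sum, beta_hat


def npd_mle(lower, upper):
    """Hypothetical interval version (an assumption, not the article's stated method):
    fit the lower-endpoint and upper-endpoint samples separately and report each
    parameter as an interval (min, max)."""
    a1, b1 = pareto_mle(lower)
    a2, b2 = pareto_mle(upper)
    return (min(a1, a2), max(a1, a2)), (min(b1, b2), max(b1, b2))


alpha_hat, beta_hat = pareto_mle([1.0, 2.0, 4.0])
print(alpha_hat, beta_hat)  # alpha_hat = 3 / (ln 2 + ln 4) = 1 / ln 2, beta_hat = 1.0
```

Taking the sample minimum as the scale estimate follows directly from the likelihood increasing in $\beta_N$ up to the smallest observation.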

Note that $\hat{\alpha}_N$ and $\hat{\beta}_N$ will be in interval form because of imprecise sample data. Additionally, we analyze simulated data to demonstrate how the estimation procedure works in a neutrosophic environment. Random samples of different sizes are generated from the NPD with an interval value of $\alpha_N$ and a fixed value of $\beta_N$. The behavior of the ML estimators of the unknown shape parameter $\alpha_N$ and scale parameter $\beta_N$ is also investigated in terms of the neutrosophic root mean square error ($\mathrm{RMSE}_N$), which is estimated as
$$\mathrm{RMSE}_N = \sqrt{\frac{1}{m} \sum_{j=1}^{m} \left(\hat{\theta}_{Nj} - \theta_N\right)^2},$$
where $\theta_N$ and $\hat{\theta}_{Nj}$ are, respectively, the actual value and the estimate of the parameter in the $j$th run, and $m$ is the total number of simulation runs. The R packages EnvStats and Metrics have been used to estimate the model parameters and calculate the RMSE values. The estimates of $\alpha_N$ at a fixed value of the scale parameter $\beta_N$, along with the $\mathrm{RMSE}_N$ values, are reported in Table 2.
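A small RMSE study in the same spirit can be sketched in plain Python rather than with the R packages the article reports. It assumes the shape interval is studied endpoint by endpoint, with the scale estimated by the sample minimum; the endpoints, sample sizes, number of runs, and seed are illustrative choices, not the article's design. The expected pattern, mirroring Table 2, is an average estimate approaching the true value and an RMSE shrinking as $n$ grows.

```python
import math
import random


def pareto_sample(n, alpha, beta, rng):
    """Inverse-transform draws x = beta * (1 - u)**(-1 / alpha)."""
    return [beta * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


def mle_alpha(xs):
    """Shape MLE with the scale estimated by the sample minimum."""
    b = min(xs)
    return len(xs) / sum(math.log(x / b) for x in xs)


def rmse_study(alpha, beta, n, runs, seed=1):
    """Average estimate and RMSE = sqrt(mean((alpha_hat - alpha)**2)) over `runs` samples."""
    rng = random.Random(seed)
    estimates = [mle_alpha(pareto_sample(n, alpha, beta, rng)) for _ in range(runs)]
    avg = sum(estimates) / runs
    rmse = math.sqrt(sum((e - alpha) ** 2 for e in estimates) / runs)
    return avg, rmse


# Illustrative design: study each endpoint of a shape interval [2.0, 2.5] with beta = 1.
table = {}
for alpha in (2.0, 2.5):
    for n in (20, 50, 100, 500):
        table[(alpha, n)] = rmse_study(alpha, 1.0, n, runs=500)
        print(alpha, n, table[(alpha, n)])
```

Reading the two endpoint rows side by side at each sample size gives an interval-valued estimate and RMSE, analogous in form to the neutrosophic entries of Table 2.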

Table 2 shows that as the sample size increases, the estimate of $\alpha_N$ tends to its benchmark (true) value and $\mathrm{RMSE}_N$ decreases toward zero. This trend indicates that the neutrosophic ML estimators perform well with larger sample sizes. The estimator of the scale parameter $\beta_N$ shows a similar trend, so those results are not presented here.

5. Applications of the Proposed Model

Two real datasets are utilized in this section to show how the proposed NPD may be implemented.

5.1. The Dioxin Data

Dioxins are a class of highly toxic chemical compounds that are dangerous to humans [23] and pose a threat in the environment. Dioxins are a matter of concern because of their potentially hazardous impact on human health. Experimental studies have revealed that they can negatively affect the reproductive, developmental, and immune systems and organs of the human body [24].

Additionally, they can disrupt hormones and cause cancer. Once dioxins enter the body, they persist for a long time because of their chemical stability and their ability to be absorbed and retained by fatty tissue [25]. Dioxins are mostly produced as byproducts of industrial processes, although they can also be produced naturally. They are unintended byproducts of various manufacturing processes, including chlorine bleaching of paper pulp, smelting, and the production of some agricultural chemicals [26]. Uncontrolled waste incinerators are often the largest sources of dioxin release into the environment because of incomplete combustion. Most dioxins in the food supply are found in dairy products, meat, fish, and shellfish, which is why protecting the food supply is so important. Although dioxins are formed locally, they disperse through the environment worldwide, and many countries therefore monitor dioxin emissions regularly. In Japan, the total amount of dioxins emitted is monitored regularly by the Ministry of the Environment [27]. Owing to effective government policy, the amounts present in the ecosystem and in food are now very small, and routine levels of intake are extremely unlikely to cause acute toxicity. The current levels of dioxins in Japan indicate an extremely low risk of cancer. To assess the safety of dioxin exposure, the tolerable daily intake (TDI) is used as an indicator. The TDI is the amount of a chemical substance, per kg of body weight per day, that can be taken into the body over a long period without adverse health effects. Thus, the TDI serves as a benchmark for judging whether long-term daily intake poses a health risk [28].
The amount of dioxin intake from the average diet, estimated annually for the period 1998 to 2015, is published in the 2017 annual report on environmental statistics by the Ministry of the Environment, Japan [29]. First, the fit of the Pareto distribution to the dioxin intake data is evaluated using a distribution-fitting package in R. Figure 5 depicts basic probability plots and the empirical density.

Because the points in each graph show no systematic deviation from the straight line, the Pareto distribution appears to be an appropriate model for this dataset. The plots in Figure 5 also show that the data are right-skewed. It is, therefore, possible to investigate the data in further depth using the proposed model. Although the dioxin intakes are originally precise quantities, for demonstration purposes we assume the uncertain sample values shown in Table 3. The imprecise data are formed according to the strategy devised in [30].

Because of the uncertain values, a traditional Pareto analysis of these data is inappropriate. The suggested NPD can be used to summarize data containing indeterminacies. Table 4 provides a descriptive summary of dioxin intake from the average diet using the NPD.

Table 4 shows the estimated neutrosophic measures based on the suggested model. All the estimated values are expressed as intervals because of the indeterminacies inherent in the analyzed dataset. This illustrates that the suggested model can analyze incomplete data and estimate parameters under imprecision, which the classical Pareto model cannot do directly.

5.2. The Child Mortality Rate Data

The second dataset used in this analysis provides the under-five mortality rates for Saudi Arabia over the period 1995 to 2020. The data were obtained from the WHO's global health indicators database, a well-known source, and are expressed as a rate per 1,000 live births [31]. Even though child mortality has decreased significantly, to very low rates in many regions of the world, it is still regarded as a significant issue that requires close attention in national policy. Significant worldwide progress has been achieved since 1990 in lowering child mortality. The health of children and adolescents in Saudi Arabia has improved remarkably during the past two decades owing to major factors such as reduced malnutrition, immunization against infectious diseases, and diarrhea control [32]. The data from the source are crisp mortality rates during the first five years of life. To illustrate the suggested distribution, neutrosophic data are created using the approach provided in [30]. The interval childhood mortality rates for the period 1995–2020 are given in Table 5.

The uncertainty in Table 5 reflects the fact that the different estimation procedures typically used for reporting mortality rates preclude exact estimates. Depending on census errors and the estimation methodology, estimates for any particular country are likely to fluctuate. A distribution-fitting R tool is used to produce basic probability plots that test the applicability of the Pareto distribution to average child mortality rates, as shown in Figure 6.

In Figure 6, visual examination suggests that the Pareto distribution is a reasonable model for the mortality data, because the observations lie very close to the straight line. As interval childhood mortality rates are used in this investigation, the conventional Pareto analysis is inapplicable. The proposed model, however, can summarize data that include uncertainties. Table 6 displays a descriptive overview of the mortality statistics obtained with the suggested neutrosophic model.

Table 6 provides the estimated uncertainty bounds of some essential statistics based on the proposed distribution. All estimated values are provided as intervals because of the intrinsic imprecision of the dataset being studied. This illustrates the flexibility of the proposed model in evaluating imprecise datasets, which the classical Pareto model cannot handle directly.

6. Conclusions

The neutrosophic framework of the Pareto distribution and its applications in applied statistical methods are presented in this work. Statistical characteristics of the newly proposed model have been explored using neutrosophic logic. Key expressions for the suggested model, such as the cumulative distribution, survival (reliability), and hazard functions, have been derived and discussed in detail. ML estimators for the unknown parameters of the NPD have been developed. The theoretical characteristics of the proposed model have been verified using Monte Carlo simulation. The effectiveness of the suggested NPD has been demonstrated using two real datasets: average dietary dioxin intake in Japan and under-five mortality rates in Saudi Arabia.

A future study might concentrate on enhancing the capacity of the suggested distribution for various inference techniques and its utility for processing high-dimensional data.

Data Availability

The data that support the findings of this study are available within the manuscript.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work through Large Groups (project under grant no. RGP.2/34/43). This research was funded by the Deanship of Scientific Research at Princess Nourah bint Abdulrahman University, through the Research Funding Program (grant no. FRP-1443-21).