Innovative Correlation Coefficient Measurement with Fuzzy Data

Wu, Berlin; Hung, Chin Feng

doi:https://doi.org/10.1155/2016/9094832

Mathematical Problems in Engineering


Research Article | Open Access

Volume 2016 | Article ID 9094832 | https://doi.org/10.1155/2016/9094832


Academic Editor: László T. Kóczy

Received 17 Dec 2015

Revised 21 Apr 2016

Accepted 28 Apr 2016

Published 30 May 2016

Abstract

Correlation coefficients are commonly computed from crisp data. In this paper, we use Pearson’s correlation coefficient and propose a method for evaluating correlation coefficients for fuzzy interval data. Our empirical studies examine the relationship between mathematics achievement and other variables: weekly online time, weekly sleeping time, and Chinese scores.

1. Introduction

Human thought processes are mainly based on cognitive awareness of the environment and social phenomena. Human knowledge is fuzzy because of humans’ subjective awareness of time and space. Therefore, Wu [1] proposed fuzzy theory in reference to how humans perceive complex and uncertain environmental phenomena.

To determine the correlation between two phenomena $X$ and $Y$, a scatter plot is often used. Using a scatter plot, the correlation between $X$ and $Y$ can be judged to be positive or negative, or the two phenomena can be judged to be uncorrelated.

In traditional statistical analysis, correlation coefficients are often found using crisp data. In this paper, we use Pearson’s correlation coefficient to calculate correlation coefficients for fuzzy interval data. Fuzzy correlation coefficients are often applied in the fields of engineering or economics but have also been increasingly emphasized in social sciences.

Fuzzy correlations have been studied in the literature. For instance, Nguyen et al. [2, 3] provided the fundamentals of statistics with fuzzy data. Hong and Hwang [4] established the correlation coefficient of intuitionistic fuzzy sets in probability space by using the generalization of fuzzy sets by Zadeh [5]. Chiang and Lin [6] treated membership degrees as concrete observational values of the membership functions of fuzzy sets in order to define fuzzy correlation coefficients. Chaudhuri and Bhattacharya [7] investigated the correlation of two fuzzy sets defined by the members of their supports, which were ranked to evaluate the correlation coefficients of the two fuzzy sets. Hong [8] and Ni and Cheung [9] also suggested methods for calculating fuzzy correlations. Based on correlation coefficients developed by Liu and Kao [10], Xie and Wu [11] and Yang [12] established fuzzy correlation coefficients and obtained fuzzy correlation intervals from fuzzy interval sample data. R. Saneifard and R. Saneifard [13] calculated the correlation coefficient for fuzzy data by adopting a method based on the central interval. Cheng and Yang [14] proposed a method for determining fuzzy correlation coefficients and explained the application of fuzzy correlation. Hanafy et al. [15] evaluated the correlation coefficients of neutrosophic sets by the centroid method. Wu et al. [16] developed a new approach for determining fuzzy correlation and applied this approach to 12-year compulsory education in Taiwan. Lin et al. [17] investigated some problems in marketing research by using a soft computing technique and a new statistical tool.

The main purpose of this paper is to develop fuzzy correlation coefficients for fuzzy interval data. We propose a functional formula for determining fuzzy correlation coefficients of two variables. We find the maximum and minimum values by differentiating our proposed functional formula. Moreover, the formula can be applied not only when both data sets are fuzzy interval numbers but also when one of the two data sets consists of real numbers and when both data sets consist of real numbers. Using this method, we can provide researchers with information for explaining related phenomena in practice.

2. Research Approaches

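Let $(X_i, Y_i)$, $i = 1, 2, \ldots, n$, be a fuzzy sample set; then, the correlation coefficient between $X$ and $Y$ is defined as

$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}, \tag{1}$$

where $\bar{X}$ and $\bar{Y}$ are the sample means of $X_i$ and $Y_i$, respectively.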
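For crisp data, formula (1) is the usual Pearson sample correlation coefficient. A minimal Python sketch follows; the function name `pearson` and the error handling are illustrative choices, not part of the paper.

```python
import math


def pearson(xs, ys):
    """Pearson's sample correlation coefficient for crisp paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    # Centered cross-product and sums of squares.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined when a sample is constant")
    return s_xy / math.sqrt(s_xx * s_yy)


if __name__ == "__main__":
    print(pearson([1, 2, 3], [1, 3, 2]))
```

The check for a constant sample matters later: the coefficient is undefined whenever all chosen points share the same abscissa or the same ordinate.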

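Definition 1 (fuzzy interval number). Let a fuzzy number $[a, b]$ be an interval over the real numbers $\mathbb{R}$, let $m = (a + b)/2$ be the center of interval $[a, b]$, and let $\delta = (b - a)/2$ be the radius of interval $[a, b]$; then, interval $[a, b]$ can be expressed as $[m - \delta, m + \delta]$ or $(m; \delta)$. We call the interval $[a, b]$ a fuzzy interval number.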
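The two descriptions of an interval in Definition 1, by endpoints or by center and radius, can be captured in a small data type. This is only an illustrative sketch; the class name `FuzzyInterval` and its members are hypothetical and do not appear in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FuzzyInterval:
    """A closed interval [lower, upper] treated as a fuzzy interval number."""

    lower: float
    upper: float

    def __post_init__(self):
        if self.lower > self.upper:
            raise ValueError("lower endpoint must not exceed upper endpoint")

    @property
    def center(self):
        """Midpoint of the interval."""
        return (self.lower + self.upper) / 2

    @property
    def radius(self):
        """Half of the interval length."""
        return (self.upper - self.lower) / 2

    @classmethod
    def from_center(cls, center, radius):
        """Build the interval [center - radius, center + radius]."""
        if radius < 0:
            raise ValueError("radius must be non-negative")
        return cls(center - radius, center + radius)


if __name__ == "__main__":
    score = FuzzyInterval(2.0, 6.0)
    print(score.center, score.radius)
```

A real number is the special case with radius zero, which is how the crisp cases in the empirical section fit the same representation.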
Consider the fuzzy sample set $(X_i, Y_i)$, $i = 1, 2, \ldots, n$, where $X_i$ and $Y_i$ are fuzzy interval numbers, as shown in Figure 1.
The algorithm for computing the correlation coefficient between $X$ and $Y$ consists of the following five steps.

Step 1. For any fuzzy interval numbers $X_i = [a_i, b_i]$ and $Y_i = [c_i, d_i]$, the pair $(X_i, Y_i)$ is defined by a rectangle. The rectangle has four vertices, $A_i$, $B_i$, $C_i$, and $D_i$, the coordinates of which are $(a_i, c_i)$, $(b_i, c_i)$, $(b_i, d_i)$, and $(a_i, d_i)$, respectively, as shown in Figure 2.

Step 2. Choose a point $E_i$ lying in the line segment $A_iB_i$ such that the two segments’ proportion is $A_iE_i : E_iB_i = s : (1 - s)$, where $0 \le s \le 1$; the point coordinate $(a_i + s(b_i - a_i),\, c_i)$ is obtained, as shown in Figure 3.

Step 3. Choose a point $F_i$ lying in the line segment $D_iC_i$ such that $E_iF_i$ parallels $A_iD_i$. Next, choose a point $P_i$ lying in the line segment $E_iF_i$ such that the two segments’ proportion is $E_iP_i : P_iF_i = t : (1 - t)$, where $0 \le t \le 1$; the point coordinate $(a_i + s(b_i - a_i),\, c_i + t(d_i - c_i))$ is obtained, as shown in Figure 4.

Definition 2. The domain set is $\Omega = \{(s, t) \mid 0 \le s \le 1,\ 0 \le t \le 1\}$.

Step 4. Calculate the correlation coefficient function between $X$ and $Y$ by applying formula (1) to the points $P_i$. In this case, the correlation coefficient function is a function of the two variables $s$ and $t$ on the closed region bounded by $\Omega$ and is expressed as $r(s, t)$.

Step 5. By the differentiation method, we can find the maximum and minimum values of the correlation coefficient function $r(s, t)$.
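Steps 1 to 4 can be sketched in a few lines of Python. The sketch rests on assumptions made for illustration: each rectangle is stored as a pair of intervals, the names `s` and `t` stand for the two proportions chosen in Steps 2 and 3 (both between 0 and 1, with `(0, 0)` taken as the lower-left vertex), and the same `(s, t)` is used in every rectangle. The rectangle data are hypothetical and are not taken from any example in the paper.

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient for two crisp samples."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def corresponding_points(rects, s, t):
    """Pick the point at proportions (s, t) inside every rectangle.

    rects: list of ((x_lo, x_hi), (y_lo, y_hi)); s and t lie in [0, 1].
    (0, 0) is the lower-left vertex and (1, 1) the upper-right vertex.
    """
    return [
        (x_lo + s * (x_hi - x_lo), y_lo + t * (y_hi - y_lo))
        for (x_lo, x_hi), (y_lo, y_hi) in rects
    ]


def corr_at(rects, s, t):
    """Correlation coefficient function r(s, t) for the chosen points."""
    points = corresponding_points(rects, s, t)
    return pearson([p[0] for p in points], [p[1] for p in points])


# Hypothetical rectangles, not the data of any example in the paper.
RECTS = [((1, 2), (1, 3)), ((3, 5), (2, 4)), ((6, 7), (5, 8))]

if __name__ == "__main__":
    labelled = {
        "lower-left": (0, 0),
        "lower-right": (1, 0),
        "upper-right": (1, 1),
        "upper-left": (0, 1),
        "centroid": (0.5, 0.5),
    }
    for name, (s, t) in labelled.items():
        print(f"{name:12s} r = {corr_at(RECTS, s, t):.4f}")
```

Evaluating `corr_at` at the four corners and at the centroid reproduces the kind of comparison made in Section 2.1; Step 5 then asks for the extreme values over all admissible `(s, t)`.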

2.1. The Assumption of Corresponding Points of Each Rectangle

Our initial idea is to find the correlation coefficient for corresponding points of each rectangle. In Example 3, we find the correlation coefficient for the centroid of each rectangle.

Example 3. Consider the rectangle sample data , , and , as shown in Figure 5.
For Example 3, we can also find the correlation coefficient for the upper-right point coordinate, the correlation coefficient for the lower-right point coordinate, the correlation coefficient for the lower-left point coordinate, and the correlation coefficient for the upper-left point coordinate of each rectangle, as shown in Figures 6, 7, 8, and 9.
We can also find the correlation coefficient for an interior corresponding point of each rectangle; for example, the coordinate as shown in Figure 10.
We assume that the coordinate of the corresponding point is the same in every rectangle; for example, the coordinate for Figure 5 and the coordinate for Figure 6. We do not consider the case in which each rectangle has a different coordinate , for example, as shown in Figure 11.
In Figure 11, the coordinate of rectangle , the coordinate of rectangle , and the coordinate of rectangle are not the same.
According to Figures 6–9, we cannot say the maximal value of the correlation coefficient and the minimal value of the correlation coefficient , because there are infinitely many corresponding points for boundary points and interior points of each rectangle; we must evaluate every correlation coefficient for corresponding points of each rectangle. Therefore, we use Steps 1 to 5 and the differential rule of two variables to evaluate the maximal and the minimal values of the correlation coefficient.
Based on Example 3, three point coordinates, , , and , are obtained. We then find the sample means and , respectively. Therefore, the correlation coefficient function between and is for the closed region bounded by .
The first-order derivatives for and are, respectively, Let ; it follows that . There is no critical point for the equation bounded by . The reason is as follows: If , then we obtain , and if , then we obtain . Hence, their critical points on the equation do not belong to the set .
The boundary of the region consists of the lines , , , and . Consideration of extrema on the boundary of the region along leads to the function , . There is no critical point when setting the derivative with respect to equal to zero. The endpoints are and . Consideration of extrema on the boundary of the region along leads to the function , . There is no critical point when setting the derivative with respect to equal to zero. The endpoints are and . Consideration of extrema on the boundary of the region along leads to the function , . There is no critical point when setting the derivative with respect to equal to zero. The endpoints are and . Consideration of extrema on the boundary of the region along leads to the function , . There is no critical point when setting the derivative with respect to equal to zero. The endpoints are and . All candidates for the maximum and minimum values are listed in Table 1. We see that the minimum value is ; the maximum value is . Therefore, the fuzzy interval number of the correlation coefficient is .
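The candidate search over the interior, the four edges, and the four corners can be cross-checked numerically. The sketch below is a grid search, a numerical stand-in and not the differentiation procedure used in the paper. It assumes that the corresponding point in each rectangle is selected by two proportions `s` and `t` in [0, 1], and it uses hypothetical rectangles, so its output does not reproduce Table 1.

```python
import math


def corr_at(rects, s, t):
    """r(s, t): Pearson correlation of the points at proportions (s, t)."""
    xs = [x_lo + s * (x_hi - x_lo) for (x_lo, x_hi), _ in rects]
    ys = [y_lo + t * (y_hi - y_lo) for _, (y_lo, y_hi) in rects]
    n = len(rects)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def grid_extrema(rects, steps=100):
    """Approximate min and max of r over [0, 1] x [0, 1] on a uniform grid.

    Returns ((r_min, (s, t)), (r_max, (s, t))). The grid contains all four
    edges and corners, so boundary candidates are covered as well.
    """
    best_min = (math.inf, None)
    best_max = (-math.inf, None)
    for i in range(steps + 1):
        for j in range(steps + 1):
            s, t = i / steps, j / steps
            r = corr_at(rects, s, t)
            if r < best_min[0]:
                best_min = (r, (s, t))
            if r > best_max[0]:
                best_max = (r, (s, t))
    return best_min, best_max


def on_boundary(point):
    """True when (s, t) lies on an edge of the unit square."""
    s, t = point
    return s in (0.0, 1.0) or t in (0.0, 1.0)


# Hypothetical rectangles, not the data of Example 3.
RECTS = [((1, 2), (1, 3)), ((3, 5), (2, 4)), ((6, 7), (5, 8))]

if __name__ == "__main__":
    (r_min, p_min), (r_max, p_max) = grid_extrema(RECTS)
    print(f"min r = {r_min:.4f} at {p_min}, on boundary: {on_boundary(p_min)}")
    print(f"max r = {r_max:.4f} at {p_max}, on boundary: {on_boundary(p_max)}")
```

The pair `[r_min, r_max]` plays the role of the fuzzy interval number of the correlation coefficient. A finer grid tightens the approximation, but only the analytic candidates give exact values.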

Theorem 4. If is continuous on a closed, bounded region, then has a maximum value and a minimum value on the region. These extrema occur either (1) where all first partial derivatives of are zero, (2) where some first partial derivative of does not exist, or (3) on the boundary of the region.

Proof. See [18].
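To see why Theorem 4 applies to the correlation coefficient function, it may help to write that function in closed form. The notation below is an assumption introduced for illustration: rectangle $i$ has lower-left vertex $(a_i, c_i)$, width $w_i$, and height $h_i$; the corresponding point is chosen by two proportions $s$ and $t$; and $S_{uv} = \sum_i (u_i - \bar{u})(v_i - \bar{v})$ denotes a centered cross-product sum.

```latex
\begin{align*}
x_i(s) &= a_i + s\,w_i, \qquad y_i(t) = c_i + t\,h_i, \qquad 0 \le s,\, t \le 1,\\[4pt]
r(s,t) &= \frac{S_{ac} + s\,S_{wc} + t\,S_{ah} + s\,t\,S_{wh}}
               {\sqrt{\bigl(S_{aa} + 2s\,S_{aw} + s^{2} S_{ww}\bigr)
                      \bigl(S_{cc} + 2t\,S_{ch} + t^{2} S_{hh}\bigr)}}.
\end{align*}
```

The two factors under the square root are the sums of squared deviations of the $x_i(s)$ and of the $y_i(t)$. As long as neither set of coordinates becomes constant for some $s$ or $t$, both factors stay positive and $r(s,t)$ is continuous and differentiable on the unit square, so alternative (2) of Theorem 4 does not occur and only interior critical points and boundary points need to be examined. If all widths $w_i$ are equal, in particular if every $w_i = 0$, then $S_{wc}$, $S_{wh}$, $S_{aw}$, and $S_{ww}$ vanish and $r$ no longer depends on $s$.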

3. Case Studies

In this section, we discuss some cases of fuzzy correlation coefficients. First, we analyze a case in which the maximal value 1 or the minimal value −1 of the correlation coefficient occurs for the closed region bounded by . Second, we analyze a case in which neither the maximal value 1 nor the minimal value −1 of the correlation coefficient occurs for the closed region bounded by .

Case 1. The maximal value 1 or the minimal value −1 of the correlation coefficient occurs for the closed region bounded by .

Example 5. Consider the rectangle sample data , , and , as shown in Figure 12.
Based on the previous discussion, three point coordinates, , , and , are obtained. We then find the sample means and , respectively. Therefore, the correlation coefficient function between and is for the closed region bounded by .
The first-order derivatives for and are, respectively, Let ; it follows that Infinitely many critical points are found for the equation bounded by . For example, there are two points, or .
The boundary of the region consists of the lines , , , and . Consideration of extrema on the boundary of the region along leads to the function , . Setting the derivative with respect to equal to zero gives the point . The endpoints are and . Consideration of extrema on the boundary of the region along leads to the function , . Setting the derivative with respect to equal to zero gives the point . The endpoints are and . Consideration of extrema on the boundary of the region along leads to the function , . Setting the derivative with respect to equal to zero gives the point . The endpoints are and . Consideration of extrema on the boundary of the region along leads to the function , . Setting the derivative with respect to equal to zero gives the point . The endpoints are and . All candidates for the maximum and minimum values are listed in Table 2. We see that the minimum value is ; the maximum value is . Therefore, the fuzzy interval number of the correlation coefficient is .
In this case, the center points of these three rectangles are positively correlated, and . Moreover, these three rectangles are approximately symmetric to the straight line . Hence, the tendency of positive correlation of these three rectangles is high. In other words, the fuzzy correlation coefficient may have a smaller range.

Example 6. Consider the rectangle sample data , , and , as shown in Figure 13.
Based on the previous discussion, three point coordinates, , , and , are obtained. We then find the sample means and , respectively. Therefore, the correlation coefficient function between and is for the closed region bounded by .
The first-order derivatives for and are, respectively, Let ; it follows that . Infinitely many critical points can be found for equation bounded by . For example, there are two points, or .
The boundary of the region consists of the lines , , , and . Consideration of extrema on the boundary of the region along leads to the function , . There is no critical point when setting the derivative with respect to equal to zero. The endpoints are and . Consideration of extrema on the boundary of the region along leads to the function , . Setting the derivative with respect to equal to zero gives the point . The endpoints are and . Consideration of extrema on the boundary of the region along leads to the function , . There is no critical point when setting the derivative with respect to equal to zero. The endpoints are and . Consideration of extrema on the boundary of the region along leads to the function , . Setting the derivative with respect to equal to zero gives the point . The endpoints are and . All candidates for the maximum and minimum values are listed in Table 3. We see that the minimum value is ; the maximum value is . Therefore, the fuzzy interval number of the correlation coefficient is .
In this case, the center points of these three rectangles are positively correlated, but . Moreover, these three rectangles are not symmetric about any straight line. Hence, the tendency of positive correlation of these three rectangles is not evident. In other words, the fuzzy correlation coefficient may have a large range.

Case 2. Neither the maximal value 1 nor the minimal value −1 of the correlation coefficient occurs for the closed region bounded by .

Example 7. Consider the rectangle sample data , , and , as shown in Figure 14.
Based on the previous discussion, three point coordinates, , , and , are obtained. We then find the sample means and , respectively. Therefore, the correlation coefficient function between and is for the closed region bounded by .
The first-order derivatives for and are, respectively, Let ; it follows that . There is no critical point for equation bounded by . The reason is as follows: If , then we obtain , and if , then we obtain . Hence, the critical points for equation do not belong to the set .
The boundary of the region consists of the lines , , , and . Consideration of extrema on the boundary of the region along leads to the function , . There is no critical point when setting the derivative with respect to equal to zero. The endpoints are and . Consideration of extrema on the boundary of the region along leads to the function , . There is no critical point when setting the derivative with respect to equal to zero. The endpoints are and . Consideration of extrema on the boundary of the region along leads to the function , . There is no critical point when setting the derivative with respect to equal to zero. The endpoints are and . Consideration of extrema on the boundary of the region along leads to the function , . There is no critical point when setting the derivative with respect to equal to zero. The endpoints are and . All candidates for the maximum and minimum values are listed in Table 4. We see that the minimum value is ; the maximum value is . Therefore, the fuzzy interval number of the correlation coefficient is .
Based on the scatter plots of Examples 6 and 7, we intuitively think that the scatter plot of Example 7 is more dispersed than the scatter plot of Example 6. Therefore, the fuzzy correlation coefficient of Example 7 will have a larger range.

Example 8. Consider the rectangle sample data , , , and , as shown in Figure 15.
Based on the previous discussion, four point coordinates, , , , and , are obtained. We then find the sample means and , respectively. Therefore, the correlation coefficient function between and is for the closed region bounded by .
The first-order derivatives for and are, respectively, First, let ; it follows that . We obtain . This critical point for equation does not belong to the set . Therefore, no local maximum or minimum is in .
Second, the boundary of the region consists of the lines , , , and . Consideration of extrema on the boundary of the region along leads to the function , . There is no critical point when setting the derivative with respect to equal to zero. The endpoints are and . Consideration of extrema on the boundary of the region along leads to the function , . There is no critical point when setting the derivative with respect to equal to zero. The endpoints are and . Consideration of extrema on the boundary of the region along leads to the function , . There is no critical point when setting the derivative with respect to equal to zero. The endpoints are and . Consideration of extrema on the boundary of the region along leads to the function , . There is no critical point when setting the derivative with respect to equal to zero. The endpoints are and . All candidates for the maximum and minimum values are listed in Table 5. We see that the minimum value is ; the maximum value is . Therefore, the fuzzy interval number of the correlation coefficient is .
Comparing the scatter plots of Examples 7 and 8, we intuitively think that the scatter plot of Example 8 is more concentrated and shows a stronger tendency toward positive correlation than that of Example 7. Moreover, as the number of fuzzy interval data points in the scatter plot increases, the fuzzy correlation coefficient tends to have a smaller range.
Wu et al. [16] proposed the formula of the fuzzy correlation coefficient in the following four situations: where is the correlation coefficient of the center point of each rectangle, is the correlation coefficient of the interval lengths and of each of the fuzzy interval numbers and , and .
Next, the four scatter plots are observed as follows.
Intuitively, the degrees of spread of the four scatter plots (refer to Figures 16, 17, 18, and 19) do not seem to be the same. Hence, the fuzzy correlation coefficient should not be equal. But the formula (12) of Wu et al. [16] shows that the four fuzzy correlation coefficients are equal, and .
However, our proposed method obtains different results. The four fuzzy correlation coefficients (refer to Figures 16, 17, 18, and 19) obtained through our proposed method are , and , respectively. Therefore, our proposed method produces results that are more consistent with our intuition.

4. Empirical Studies

In this section, we discuss some applications of fuzzy correlation coefficients. First, we analyze a case in which two data sets are fuzzy interval numbers. Second, we change the case to one in which one data set is a fuzzy interval number, and the other is a real number. Finally, we analyze a case in which both data sets are real numbers.

To understand the factors influencing mathematics achievement at a school, we investigate 10 students’ data.

Example 9. Consider the rectangle sample data for 10 students: , , , , , , , , , and , where and denote the mathematics score and weekly online time, respectively, of a student , , as shown in Figure 20.
Based on the previous discussion, the correlation coefficient function between and is for the closed region bounded by .
The first-order derivatives for and are, respectively, First, let ; it follows that or . Therefore, no local maximum or minimum is in .
Second, based on the previous discussion, the fuzzy interval number of the correlation coefficient is .
Clearly, the mathematics score and the weekly online time have a highly negative correlation. In other words, a higher mathematics score correlates with a lower weekly online time. This suggests that a student’s weekly online time may negatively influence the student’s mathematics score.

Example 10. If Example 9 is adjusted to , , , , , , , , , and (i.e., is a real number, ), then the correlation coefficient function between and is bounded by the set .
The first-order derivative with respect to is Let ; it follows that , which is a critical point bounded by the set . Based on the previous discussion, the fuzzy interval number of the correlation coefficient is .

Example 11. If Example 9 is adjusted to , , , , , , , , , and (i.e., both and are real numbers, ), then the correlation coefficient between and is .
According to the results of Examples 9, 10, and 11, Example 9 is the general situation, and Examples 10 and 11 are special cases of it.
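The way Examples 10 and 11 arise as special cases can be illustrated numerically. In the sketch below, a real number is represented as an interval of zero width, and `s` and `t` are assumed names for the two proportions that select the corresponding point in each rectangle. The data are hypothetical and are not the students' data from Examples 9 to 11.

```python
import math


def corr_at(rects, s, t):
    """r(s, t): Pearson correlation of the points at proportions (s, t)."""
    xs = [x_lo + s * (x_hi - x_lo) for (x_lo, x_hi), _ in rects]
    ys = [y_lo + t * (y_hi - y_lo) for _, (y_lo, y_hi) in rects]
    n = len(rects)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data. MIXED: x is a fuzzy interval, y is a real number.
MIXED = [((1.0, 3.0), (2.0, 2.0)), ((4.0, 5.0), (3.0, 3.0)), ((6.0, 9.0), (7.0, 7.0))]
# CRISP: both x and y are real numbers.
CRISP = [((2.0, 2.0), (2.0, 2.0)), ((4.5, 4.5), (3.0, 3.0)), ((7.5, 7.5), (7.0, 7.0))]

if __name__ == "__main__":
    # With crisp y, changing t has no effect: r depends on s only.
    print(corr_at(MIXED, 0.3, 0.0), corr_at(MIXED, 0.3, 1.0))
    # With both variables crisp, r is a single number.
    print(corr_at(CRISP, 0.0, 0.0), corr_at(CRISP, 1.0, 1.0))
```

When one variable is crisp, the correlation coefficient function reduces to a function of one variable on an interval; when both are crisp, it collapses to the ordinary Pearson coefficient.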

Example 12. Consider the rectangle sample data of 10 students: , , , , , , , , , and , where and denote the mathematics score and weekly sleeping time, respectively, of a student , , as shown in Figure 21.
Based on the previous discussion, the correlation coefficient function between and is for the closed region bounded by .
The first-order derivatives for and are, respectively, First, let ; it follows that or . Therefore, no local maximum or minimum is in .
Second, based on the previous discussion, the fuzzy interval number of the correlation coefficient is .
Based on the previous discussion, there is a minor positive correlation between mathematics score and weekly sleeping time. Therefore, a higher mathematics score correlates with a slightly higher weekly sleeping time. The influence of weekly sleeping time on mathematics score is minor.

Example 13. Consider the rectangle sample data of 10 students: , , , , , , , , , and , where and denote the mathematics and Chinese scores, respectively, of a student , , as shown in Figure 22.
Based on the previous discussion, the correlation coefficient function between and is for the closed region bounded by .
The first-order derivatives for and are, respectively, First, let ; it follows that or . Therefore, no local maximum or minimum is in .
Second, based on the previous discussion, the fuzzy interval number of the correlation coefficient is .
Based on the previous discussion, the mathematics and Chinese scores have a highly positive correlation. Therefore, a higher mathematics score correlates with a higher Chinese score. This suggests that students’ Chinese scores may positively influence their mathematics scores.

5. Conclusion

Scientists are accustomed to using binary logic to analyze information. Human logic is fuzzy and complex, and applying binary logic to analyze human thought processes causes some distortion. Fuzzy logic is based on human thought processes and has therefore been increasingly applied in the social sciences.

Several methods of calculating fuzzy correlation coefficients have been proposed in the literature, but understanding most of the formulas requires a strong mathematical background. In this paper, we use Pearson’s correlation coefficient and the differentiation method to evaluate fuzzy correlation coefficients. The method applies when both data sets are fuzzy interval numbers, when one data set is a fuzzy interval number and the other is a real number, and when both data sets are real numbers.

This paper discusses only fuzzy correlation coefficients of fuzzy interval numbers. In future work, we plan to extend the method to triangular and trapezoidal fuzzy numbers.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

The work of Berlin Wu was partially supported by the National Science Council of the Republic of China under Contract 102-2410-H-004-182.

References

[1] B. Wu, Introduction of Fuzzy Statistics, Wu-Naan Book, Taipei, Taiwan, 2005.
[2] H. T. Nguyen, V. Kreinovich, B. Wu, and G. Xiang, Computing Statistics under Interval and Fuzzy Uncertainty, Springer, Heidelberg, Germany, 2011.
[3] H. Nguyen and B. Wu, Fundamentals of Statistics with Fuzzy Data, Springer, Heidelberg, Germany, 2006.
[4] D. H. Hong and S. Y. Hwang, “Correlation of intuitionistic fuzzy sets in probability spaces,” Fuzzy Sets and Systems, vol. 75, no. 1, pp. 77–81, 1995.
[5] L. A. Zadeh, “Fuzzy sets as a basis for a theory of possibility,” Fuzzy Sets and Systems, vol. 1, no. 1, pp. 3–28, 1978.
[6] D.-A. Chiang and N. P. Lin, “Correlation of fuzzy sets,” Fuzzy Sets and Systems, vol. 102, no. 2, pp. 221–226, 1999.
[7] B. B. Chaudhuri and A. Bhattacharya, “On correlation between two fuzzy sets,” Fuzzy Sets and Systems, vol. 118, no. 3, pp. 447–456, 2001.
[8] D. Hong, “Fuzzy measures for a correlation coefficient of fuzzy numbers under TW (the weakest t-norm)-based fuzzy arithmetic operations,” Information Sciences, vol. 176, no. 2, pp. 150–160, 2006.
[9] Y. Ni and J. Y. Cheung, “Correlation coefficient estimate for fuzzy data,” in Intelligent Systems Design and Applications, vol. 23 of Advances in Soft Computing, pp. 2138–2144, Springer, Berlin, Germany, 2003.
[10] S.-T. Liu and C. Kao, “Fuzzy measures for correlation coefficient of fuzzy numbers,” Fuzzy Sets and Systems, vol. 128, no. 2, pp. 267–275, 2002.
[11] M. C. Xie and B. Wu, “The relationship between high schools students time management and academic performance: an application of fuzzy correlation,” Educational Policy Forum, vol. 15, no. 1, pp. 157–176, 2012.
[12] C. C. Yang, “Correlation coefficient evaluation for the fuzzy interval data,” Journal of Business Research, vol. 69, no. 6, pp. 2138–2144, 2016.
[13] R. Saneifard and R. Saneifard, “Correlation coefficient between fuzzy numbers based on central interval,” Journal of Fuzzy Set Valued Analysis, vol. 2012, Article ID jfsva-00108, 9 pages, 2012.
[14] Y. T. Cheng and C. C. Yang, “The application of fuzzy correlation coefficient with fuzzy interval data,” International Journal of Innovative Management, Information & Production, vol. 4, no. 1, pp. 65–71, 2013.
[15] I. M. Hanafy, A. A. Salama, and K. M. Mahfouz, “Correlation coefficients of neutrosophic sets by centroid method,” International Journal of Probability and Statistics, vol. 2, no. 1, pp. 9–12, 2013.
[16] B. Wu, W. Lai, C. L. Wu, and T. K. Tienliu, “Correlation with fuzzy data and its applications in the 12-year compulsory education in Taiwan,” Communications in Statistics—Simulation and Computation, vol. 45, no. 4, pp. 1337–1354, 2016.
[17] H. Lin, C. Wang, J. C. Chen, and B. Wu, “New statistical analysis in marketing research with fuzzy data,” Journal of Business Research, vol. 69, no. 6, pp. 2176–2181, 2016.
[18] R. A. Hunt, Calculus with Analytic Geometry, Harper & Row, New York, NY, USA, 1998.

Copyright

Copyright © 2016 Berlin Wu and Chin Feng Hung. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
