Bayesian Methods for Predicting the Shape of Chinese Yam in Terms of Key Diameters
This paper proposes Bayesian methods for estimating the shape of Chinese yam (Dioscorea opposita) from a few key diameters. Shape prediction is applicable to determining the optimal cutoff positions of a yam for producing seed yams. Our Bayesian method, which combines a Bayesian estimation model with a predictive model, enables automatic, rapid, and low-cost processing of yams. Once the proposed models have been constructed from a sample data set collected in Japan, they predict the whole shape of a yam from only a few key diameters. The Bayesian method performed well in shape prediction, as measured by the mean squared error between the measured and predicted shapes. In particular, a multiple regression method with key diameters at two fixed positions attained the highest performance. We have developed automatic, rapid, and low-cost yam-processing machines based on the Bayesian estimation model and predictive model. The development of such shape prediction approaches, including our Bayesian method, can be a valuable aid in reducing the cost and time of food processing.
1. Introduction
Chinese yam (Dioscorea opposita) is one of the most exported crops from Japan. The value of yam exports reached 1.89 billion JPY in 2013 [1]. About 90% of the total yield of yam in Japan was produced in two prefectures, Hokkaido (45.8%) and Aomori (44.0%), in 2012 [2]. In both prefectures, mechanized cultivation is used for rapid expansion of production. However, seed yams (seed tubers of yams), which are yams cut into uniform pieces (Figure 1), are produced manually, and this requires 300 person·h/ha of labor. To reduce the cost of production and improve the yield of yams, the production of seed yams needs to be mechanized.
The problem in the mechanization of seed yam production is how to determine the cutoff positions for each yam. It is expected that a yam be uniformly cut with a desired weight and without much loss. Therefore, under the assumption of equal density among yams, it is required that the shape of the yam be measured, since the weight of each seed yam can be calculated using the shape and the cutoff positions.
A straightforward way to measure the shape of a yam is to scan it using sensors. However, this approach has three problems: (i) the cost of the sensor, (ii) the speed of the process, and (iii) the accuracy of the scanning (e.g., the trichomes of a yam can reduce the accuracy). Another way is to use images of yams for shape determination. Such an approach has been widely used in fruit and crop grading, classification, and removal before shipment [3–6], and computational and statistical methodologies have been provided [7–16]. In the case of producing seed yams, the problem is much simpler than the general problem for fruits and crops mentioned above: we can assume a regular pattern of yams (see Figure 1) and do not have to check strictly for damage, because the purpose here is to obtain the shape of a yam quickly and without many devices (i.e., at low cost).
In this paper, we propose a Bayesian framework to address issues (i) and (ii), that is, to provide a low-cost and high-speed way of predicting the shape of a yam. Our hypothesis is that the shape of a yam can be predicted from a few key diameters at fixed positions, under the assumption that the shape can be represented by a set of diameters. To examine this hypothesis, we need to construct a model that relates the diameters to be predicted to the key diameters, which can be measured. A difficulty in the model construction is that the diameter measurements for each sample are sparse and irregularly spaced. We therefore introduce a Bayesian framework to relieve this difficulty.
The Bayesian method is a technique for statistical inference that updates a prior probability for the random parameters of a model on the basis of observations. With Bayesian inference, we can set up a prior distribution for the parameters using information that is available in advance and thereby obtain robust parameter estimates even when observations are scarce; the Bayesian method is therefore especially useful when the observational data are insufficient for estimation. For this reason, methods of Bayesian data analysis are widely applied (e.g., [17]). Bayesian inference is particularly important in time series analysis. For example, [18] proposed Bayesian smoothness priors for analyzing time-varying structure in a dynamic system; the approach is useful when some data in the time series are missing. In this paper, we apply the technique of smoothness priors to the problem of shape prediction of Chinese yam.
The proposed method estimates the whole shape of a yam from a few measurements of its key diameters. The two issues concerning the measurement of yam shape are overcome by the proposed method, since the diameters of a yam are easily and accurately measured without any sensors. We estimated the optimal positions at which the diameters should be measured by minimizing the error of the shape prediction. We also demonstrated the high performance of the proposed method in estimating the shape of yams using a sample data set, which contains the length, weight, and diameters at intervals of 10 to 50 mm (Figure 2; see also Section 2.2) of 111 yams from Hokkaido, Japan. Once the proposed method has been constructed from the sample data set, it predicts the whole shape of a yam from a few key diameters without any scanners or images of the yam.
The rest of this paper is organized as follows: Section 2 describes the procedures for implementing the proposed methods, the results obtained from a set of sample data are shown in Section 3, and the results and performance of the proposed methods are discussed in Section 4. Finally, Section 5 concludes the paper.
2. Materials and Methods
2.1. Basic Consideration
In this section, we introduce our sample data set and proposed methods. After the construction of the proposed methods using the sample data set, the methods predict the whole shape of yam, which can be expressed by all the diameters along the length of a yam tuber shaft, based on a few key diameters that can be measured in advance.
We developed Bayesian methods to predict the shape of a yam in three steps.
Step 0. Arrange the measurements of all yams on a common grid of equally spaced points (Figure 3).
Step 1. Apply Bayesian estimation model to estimate missing diameters (Figure 4).
Step 2. Construct Bayesian predictive model for shape prediction (Figure 5).
First, as Step 0 of our Bayesian methods, the measurements of all yams are arranged on a common grid of equally spaced points (Figure 3). For example, in Figure 3, some grid points carry actual observations, and the diameters at the remaining grid points are missing. We need a model to estimate all missing diameters. However, a problem is that the number of missing diameters to be estimated exceeds the number of observations. We therefore applied the Bayesian estimation model to solve this problem (Step 1). In Step 2, we constructed a predictive model based on the observed diameters and the diameters estimated in Step 1. The details of the sample data set and the proposed methods are explained in the following subsections.
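Step 0 can be illustrated with a minimal sketch. The function name `arrange_on_grid`, the grid spacing, and the example measurements are hypothetical and chosen only for illustration; the sketch assumes that grid point n lies at n times the grid spacing from the tip and that a measurement is assigned to its nearest grid point.

```python
from typing import Dict, List, Optional


def arrange_on_grid(measurements: Dict[float, float], length_mm: float,
                    step_mm: float) -> List[Optional[float]]:
    """Place {position_mm: diameter_mm} measurements on an equally spaced grid.

    Grid point n (n = 1, 2, ...) is assumed to sit at n * step_mm from the tip.
    Grid points without a nearby measurement are returned as None (missing).
    """
    n_points = int(round(length_mm / step_mm))
    grid: List[Optional[float]] = [None] * n_points
    for position, diameter in measurements.items():
        n = int(round(position / step_mm))  # nearest grid index (1-based)
        if 1 <= n <= n_points:
            grid[n - 1] = diameter
    return grid


# Hypothetical yam: 100 mm long, measured at 50 mm and 100 mm, 10 mm grid.
grid = arrange_on_grid({50.0: 40.0, 100.0: 25.0}, length_mm=100.0, step_mm=10.0)
print(grid)
```

In this toy case only two of the ten grid points carry observations, which is the situation that motivates the Bayesian estimation model in Step 1.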
2.2. Sample Data Set
In this study, we used data from 111 yams in Hokkaido, Japan, to construct Bayesian models. Each yam had measurements of length (mm), weight (g), and diameters (mm) at suitable positions (Figure 2 and description below). All yams were automatically cut off at the position with a diameter of 25 mm (Figure 2). The mean length, weight, and diameter were (±64.31) mm, 783.24 (±205.67) g, and 44.30 (±14.43) mm, respectively. The diameters were measured at intervals of 25 mm for 87 yams and 50 mm for 24 yams. Out of the 87 yams, 60 had detailed measurements of the diameter at intervals of 10 mm at the front edge of the yam. A scatterplot of the length and weight of the 111 yams in this study is shown in Appendix A. Length and weight were highly correlated with each other (Pearson correlation coefficient , ), implying high quality of the data for model construction.
2.3. Step 1: Bayesian Estimation Model for Estimating Missing Diameters
For a sample yam $i$, we consider the following model for the observation of the diameter at the $n$-th point:
$$y_n^{(i)} = t_n^{(i)} + w_n^{(i)}, \quad n = 1, \ldots, N, \quad i = 1, \ldots, m, \tag{1}$$
where $y_n^{(i)}$, $t_n^{(i)}$, and $w_n^{(i)}$ are the observed diameter, the true diameter, and the measurement error, respectively, $m$ is the number of yams in the sample, and $N$ is the number of equally spaced points at which the true diameter is to be estimated. Note that when there is an observation near the $n$-th point, we regard it as the measurement $y_n^{(i)}$; otherwise, we consider $y_n^{(i)}$ to be missing.
A difficulty in estimating the unknown quantities $t_n^{(i)}$ for $n = 1, \ldots, N$ and $i = 1, \ldots, m$ is that there are more unknown quantities than observations; that is, too many diameter values are missing. To alleviate this difficulty, we used a Bayesian model. From the viewpoint of a Bayesian approach, $t_n^{(i)}$ is treated as a random variable whose distribution is described by stochastic difference equations called smoothness priors [18]. For a given sample $i$, we express the smoothness priors for $t_n^{(i)}$ by a second-order stochastic difference equation:
$$t_n^{(i)} - 2 t_{n-1}^{(i)} + t_{n-2}^{(i)} = v_n^{(i)}. \tag{2}$$
In (1) and (2), $w_n^{(i)} \sim N(0, \sigma^2)$ and $v_n^{(i)} \sim N(0, \tau^2)$ are Gaussian white noise sequences that are independent of each other, where $\sigma^2$ and $\tau^2$ are unknown parameters. By introducing the smoothness priors in (2) into the model in (1), we can construct a set of flexible Bayesian linear models for $t_n^{(i)}$.
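Now, we put
$$x_n^{(i)} = \begin{pmatrix} t_n^{(i)} \\ t_{n-1}^{(i)} \end{pmatrix}, \quad F = \begin{pmatrix} 2 & -1 \\ 1 & 0 \end{pmatrix}, \quad G = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad H = \begin{pmatrix} 1 & 0 \end{pmatrix}. \tag{3}$$
Then, the model in (1) and (2) can be expressed by the following state space model:
$$x_n^{(i)} = F x_{n-1}^{(i)} + G v_n^{(i)}, \qquad y_n^{(i)} = H x_n^{(i)} + w_n^{(i)}. \tag{4}$$
In the state space model (4), the parameter $t_n^{(i)}$ is included in the state vector $x_n^{(i)}$, so its estimate can be obtained from the estimate of $x_n^{(i)}$. Moreover, the variances $\sigma^2$ and $\tau^2$ can be estimated by the maximum likelihood method. The above Bayesian model for estimating the diameters of yams was first introduced in [19] for another application.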
When the parameters $\sigma^2$ and $\tau^2$ are given, we can obtain the estimate of the state $x_n^{(i)}$ using the Kalman filter algorithm. The estimates of $\sigma^2$ and $\tau^2$ are obtained by maximizing a likelihood function that is defined on the basis of the Kalman filter. See Appendix B for the Kalman filter algorithm and Appendix C for the details of the estimation of $\sigma^2$ and $\tau^2$. See also [18, 20].
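The estimation step can be sketched in plain Python for a single yam. This is an illustrative sketch rather than the authors' implementation: it assumes the state vector (t_n, t_{n-1}) of a second-order smoothness prior with transition matrix F = [[2, -1], [1, 0]], a diffuse initial distribution with mean zero and variance `v0` (a hypothetical choice), and given variances `sigma2` and `tau2`. Missing diameters are passed as `None`, and the filter step is skipped for them.

```python
from typing import List, Optional, Tuple

Mat = List[List[float]]
F: Mat = [[2.0, -1.0], [1.0, 0.0]]  # transition matrix of the 2nd-order trend


def matmul(a: Mat, b: Mat) -> Mat:
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(a: Mat) -> Mat:
    return [[a[0][0], a[1][0]], [a[0][1], a[1][1]]]


def inverse(a: Mat) -> Mat:
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def smooth_diameters(y: List[Optional[float]], sigma2: float, tau2: float,
                     v0: float = 1.0e4) -> Tuple[List[float], List[float]]:
    """Return smoothed diameters and their posterior variances."""
    x = [0.0, 0.0]                      # assumed diffuse initial state
    v: Mat = [[v0, 0.0], [0.0, v0]]
    x_pred, v_pred, x_filt, v_filt = [], [], [], []
    for obs in y:
        # One-step-ahead prediction: x <- F x, V <- F V F' + tau2 * G G'
        x = [2.0 * x[0] - x[1], x[0]]
        v = matmul(matmul(F, v), transpose(F))
        v[0][0] += tau2
        x_pred.append(x)
        v_pred.append(v)
        if obs is not None:             # filter step, skipped when missing
            r = v[0][0] + sigma2        # variance of the prediction error
            k = [v[0][0] / r, v[1][0] / r]
            err = obs - x[0]
            x = [x[0] + k[0] * err, x[1] + k[1] * err]
            v = [[(1.0 - k[0]) * v[0][0], (1.0 - k[0]) * v[0][1]],
                 [v[1][0] - k[1] * v[0][0], v[1][1] - k[1] * v[0][1]]]
        x_filt.append(x)
        v_filt.append(v)
    # Fixed-interval smoothing, backwards from the last point
    x_s, v_s = list(x_filt), list(v_filt)
    for n in range(len(y) - 2, -1, -1):
        a = matmul(matmul(v_filt[n], transpose(F)), inverse(v_pred[n + 1]))
        dx = [x_s[n + 1][j] - x_pred[n + 1][j] for j in range(2)]
        x_s[n] = [x_filt[n][i] + a[i][0] * dx[0] + a[i][1] * dx[1]
                  for i in range(2)]
        dv = [[v_s[n + 1][i][j] - v_pred[n + 1][i][j] for j in range(2)]
              for i in range(2)]
        corr = matmul(matmul(a, dv), transpose(a))
        v_s[n] = [[v_filt[n][i][j] + corr[i][j] for j in range(2)]
                  for i in range(2)]
    return [s[0] for s in x_s], [m[0][0] for m in v_s]


# Hypothetical diameters (mm) with two missing grid points.
t_hat, s2 = smooth_diameters([10.0, None, 14.0, None, 18.0, 20.0], 1.0, 0.01)
print([round(t, 2) for t in t_hat])
```

The first returned list plays the role of the estimated diameters at every grid point, and the second gives the corresponding posterior variances that the weighted averaging model can use.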
2.4. Step 2: Bayesian Predictive Model for Shape Prediction Using Key Diameter(s)
In this section, we propose three models for predicting the shape of a yam based on the results estimated from a set of samples. Let $d_p^{(i)}$ be the key diameter at position $p$ (mm) from the tip of the $i$-th yam (cf. Figure 5). Also, let $d_{p_1}^{(i)}$ and $d_{p_2}^{(i)}$ be the key diameters at positions $p_1$ (mm) and $p_2$ (mm) from the tip of the $i$-th yam.
2.4.1. Weighted Averaging (WA)
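We aim to predict the diameters at all points $n = 1, \ldots, N$ of a yam from the key diameter $d_p$.
Defining $\mu_n^{(i)} = \hat{t}_n^{(i)} / d_p^{(i)}$ and $\lambda_n^{(i)} = s_n^{(i)} / (d_p^{(i)})^2$, the posterior distribution of the normalized diameter $t_n^{(i)} / d_p^{(i)}$ is given by $N(\mu_n^{(i)}, \lambda_n^{(i)})$, where $\hat{t}_n^{(i)}$ is given by the first element of the smoothed state $x_{n|N}^{(i)}$, and $s_n^{(i)}$ is given by the $(1,1)$ element of its covariance matrix $V_{n|N}^{(i)}$, both of which were obtained from the fixed-interval smoothing mentioned above. The weighted average $\bar{\mu}_n(p)$ of the normalized diameters over the $m$ sample yams is then calculated; it can be regarded as the standard shape of the average yam.
Then, for a yam whose key diameter takes the value $d_p$, the predicted diameter at point $n$ is given by $\hat{y}_n(p) = d_p \, \bar{\mu}_n(p)$.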
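A minimal sketch of weighted averaging is given below. The weights are an assumption made for illustration: each yam's normalized diameter is weighted by the inverse of its posterior variance, which is one natural choice but is not confirmed by the text. The function names and the toy values are hypothetical.

```python
from typing import List


def standard_shape(t_hat: List[List[float]], s2: List[List[float]],
                   key_diams: List[float]) -> List[float]:
    """Precision-weighted average of normalized diameters at every grid point.

    t_hat[i][n] and s2[i][n] are the smoothed diameter and its variance for
    yam i at point n; key_diams[i] is the key diameter of yam i.
    """
    shape = []
    for n in range(len(t_hat[0])):
        num = den = 0.0
        for t_i, s2_i, d_i in zip(t_hat, s2, key_diams):
            weight = d_i * d_i / s2_i[n]   # assumed: inverse variance of t/d
            num += weight * t_i[n] / d_i
            den += weight
        shape.append(num / den)
    return shape


def predict_wa(shape: List[float], key_diam: float) -> List[float]:
    """Scale the standard shape by the key diameter of a new yam."""
    return [key_diam * r for r in shape]


# Two hypothetical yams with the same proportions; key diameter at index 1.
t_hat = [[20.0, 40.0, 30.0], [30.0, 60.0, 45.0]]
s2 = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
shape = standard_shape(t_hat, s2, key_diams=[40.0, 60.0])
prediction = predict_wa(shape, key_diam=50.0)
print(shape, prediction)
```

Because both toy yams have identical proportions, the standard shape equals their common normalized profile, and the prediction for a new yam is that profile scaled by its key diameter.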
2.4.2. Regression Models (RM)
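Single Regression Model (S-RM). For the estimated value $\hat{t}_n^{(i)}$ of the diameter and the value $d_p^{(i)}$ of the key diameter, a single regression model is constructed as
$$\hat{t}_n^{(i)} = a_n + b_n d_p^{(i)} + \varepsilon_n^{(i)}, \quad i = 1, \ldots, m,$$
where $\varepsilon_n^{(i)}$ is the error term. Then, we can obtain the estimates $\hat{a}_n$ and $\hat{b}_n$ of the regression coefficients $a_n$ and $b_n$ at point $n$ using the least squares method. For a given yam with key diameter $d_p$, the predictive value of the diameter at point $n$ is obtained by $\hat{y}_n(p) = \hat{a}_n + \hat{b}_n d_p$.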
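The per-point least squares fit of S-RM can be sketched as follows. The function names and the toy data are hypothetical; the sketch assumes that the estimated diameters of all sample yams are available on a common grid.

```python
from typing import List, Tuple


def fit_single(t_hat: List[List[float]],
               key_diams: List[float]) -> List[Tuple[float, float]]:
    """Fit t_hat[i][n] = a_n + b_n * key_diams[i] by least squares per point n."""
    m = len(key_diams)
    d_mean = sum(key_diams) / m
    sxx = sum((d - d_mean) ** 2 for d in key_diams)
    coefs = []
    for n in range(len(t_hat[0])):
        column = [t_i[n] for t_i in t_hat]
        t_mean = sum(column) / m
        sxy = sum((d - d_mean) * (t - t_mean)
                  for d, t in zip(key_diams, column))
        b = sxy / sxx
        coefs.append((t_mean - b * d_mean, b))
    return coefs


def predict_single(coefs: List[Tuple[float, float]],
                   key_diam: float) -> List[float]:
    return [a + b * key_diam for a, b in coefs]


# Hypothetical sample: point 0 follows 5 + 0.5 d, point 1 is the key point.
key_diams = [40.0, 50.0, 60.0]
t_hat = [[25.0, 40.0], [30.0, 50.0], [35.0, 60.0]]
coefs = fit_single(t_hat, key_diams)
print(predict_single(coefs, key_diam=55.0))
```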
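Multiple Regression Model (M-RM). Based on the estimated value $\hat{t}_n^{(i)}$ of the diameter and the values $d_{p_1}^{(i)}$ and $d_{p_2}^{(i)}$ of the two key diameters, a multiple regression model is built as
$$\hat{t}_n^{(i)} = a_n + b_n d_{p_1}^{(i)} + c_n d_{p_2}^{(i)} + \varepsilon_n^{(i)}, \quad i = 1, \ldots, m,$$
where $\varepsilon_n^{(i)}$ is the error term. Then, the predictive value of the diameter at point $n$ is obtained using the relation $\hat{y}_n(p_1, p_2) = \hat{a}_n + \hat{b}_n d_{p_1} + \hat{c}_n d_{p_2}$, with $\hat{a}_n$, $\hat{b}_n$, and $\hat{c}_n$ being the estimates of the regression coefficients $a_n$, $b_n$, and $c_n$, respectively.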
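For one grid point, the M-RM fit with two key diameters can be sketched with the closed-form least squares solution for two centered predictors. The function name and the toy data are hypothetical; the same fit would be repeated at every grid point.

```python
from typing import List, Tuple


def fit_multiple(t_col: List[float], d1: List[float],
                 d2: List[float]) -> Tuple[float, float, float]:
    """Least squares fit of t = a + b*d1 + c*d2 at a single grid point."""
    m = len(t_col)
    m1, m2, mt = sum(d1) / m, sum(d2) / m, sum(t_col) / m
    s11 = sum((u - m1) ** 2 for u in d1)
    s22 = sum((v - m2) ** 2 for v in d2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(d1, d2))
    s1t = sum((u - m1) * (t - mt) for u, t in zip(d1, t_col))
    s2t = sum((v - m2) * (t - mt) for v, t in zip(d2, t_col))
    det = s11 * s22 - s12 * s12       # zero if d1 and d2 are collinear
    b = (s22 * s1t - s12 * s2t) / det
    c = (s11 * s2t - s12 * s1t) / det
    a = mt - b * m1 - c * m2
    return a, b, c


# Hypothetical key diameters of four yams and diameters at one grid point
# generated from t = 2 + 0.3*d1 + 0.6*d2.
d1 = [40.0, 50.0, 60.0, 45.0]
d2 = [30.0, 35.0, 50.0, 42.0]
t_col = [2.0 + 0.3 * u + 0.6 * v for u, v in zip(d1, d2)]
a, b, c = fit_multiple(t_col, d1, d2)
print(a + b * 55.0 + c * 40.0)  # prediction for a new yam
```

The two key positions must give diameters that are not perfectly collinear across the sample; otherwise the determinant vanishes and the coefficients are not identifiable.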
2.5. Evaluating the Performance of the Bayesian Methods
As mentioned above, three kinds of predictive models were constructed. There were two issues related to these predictive models. One was how to determine the location parameters, that is, $p$ in the WA and S-RM models or $p_1$ and $p_2$ in the M-RM model. The other was how to evaluate these different models. A useful way to address both issues is to use the mean squared error (MSE) as a criterion for evaluating the predictive models (see, e.g., [21]).
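Specifically, for the WA and S-RM models, the MSE is defined by
$$\mathrm{MSE}(p) = \frac{1}{M} \sum_{i=1}^{m} \sum_{n \in I_i} \left( y_n^{(i)} - \hat{y}_n^{(i)}(p) \right)^2,$$
where $\hat{y}_n^{(i)}(p)$ is the predictive value of the diameter at the $n$-th point on the $i$-th yam with the location parameter $p$, $I_i = \{1, \ldots, N\} \setminus J_i$ is the index set of the observed points, with $J_i$ the index set for missing values (so $y_n^{(i)}$, $n \in I_i$, are the actual observations for the $i$-th yam), and $M = \sum_{i=1}^{m} |I_i|$ is the total number of indices with measurements. $\mathrm{MSE}(p)$ thus expresses the mean squared difference between the predictive values and the observations of the diameters. We can therefore determine the location parameter $p$ by minimizing the value of $\mathrm{MSE}(p)$ and then evaluate the predictive models based on the minimum values of $\mathrm{MSE}(p)$.
Similarly, for the M-RM model, the MSE is defined by
$$\mathrm{MSE}(p_1, p_2) = \frac{1}{M} \sum_{i=1}^{m} \sum_{n \in I_i} \left( y_n^{(i)} - \hat{y}_n^{(i)}(p_1, p_2) \right)^2,$$
where $\hat{y}_n^{(i)}(p_1, p_2)$ is the predictive value of the diameter at the $n$-th point on the $i$-th yam with the location parameters $p_1$ and $p_2$.
The predictive model with the smallest minimum value of $\mathrm{MSE}(p)$ or $\mathrm{MSE}(p_1, p_2)$ is considered to be the best model.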
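The selection of the location parameter by MSE can be sketched as below. The function names and the toy numbers are hypothetical; the predictions for each candidate position are assumed to have been computed beforehand by one of the predictive models, and missing observations are represented by `None` and excluded from the average.

```python
from typing import Dict, List, Optional, Tuple


def mse(pred: List[List[float]], obs: List[List[Optional[float]]]) -> float:
    """Mean squared error over the grid points that carry observations."""
    total, count = 0.0, 0
    for pred_i, obs_i in zip(pred, obs):
        for p, y in zip(pred_i, obs_i):
            if y is not None:
                total += (y - p) ** 2
                count += 1
    return total / count


def best_position(preds_by_pos: Dict[float, List[List[float]]],
                  obs: List[List[Optional[float]]]
                  ) -> Tuple[float, Dict[float, float]]:
    """Return the candidate position with the smallest MSE and all scores."""
    scores = {pos: mse(pred, obs) for pos, pred in preds_by_pos.items()}
    return min(scores, key=scores.get), scores


# Two hypothetical yams on a three-point grid, two candidate positions (mm).
obs = [[10.0, None, 20.0], [None, 30.0, 40.0]]
preds_by_pos = {
    100.0: [[11.0, 0.0, 19.0], [0.0, 29.0, 42.0]],
    150.0: [[10.0, 0.0, 22.0], [0.0, 30.0, 40.0]],
}
best, scores = best_position(preds_by_pos, obs)
print(best, scores)
```

For M-RM the same search would run over pairs of candidate positions instead of single positions.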
3. Results
First, as Step 0 of the proposed approach, the diameter measurements were placed on a grid of equally spaced points. For example, for a yam with a length of 500 mm and a measuring interval of 50 mm, measurements are available only at the grid points that coincide with the 50 mm measuring positions, and the diameters at all other grid points are missing. We then applied the Bayesian estimation model to estimate the diameter at every grid point as Step 1 of the proposed approach. In Step 2, predictive models were constructed using the estimated values of the diameters. Three approaches to predicting yam shape, namely, WA, S-RM, and M-RM, were applied to obtain predictions of the diameters. The position $p$ (mm) of the key diameter was varied over a set of candidate positions, and the MSE value was calculated for each value of $p$. In the case of M-RM, the two positions $p_1$ and $p_2$ defining the two key diameters were varied in the same manner. The minimum MSE values of WA, S-RM, and M-RM were 18.62, 15.71, and 11.48, respectively, each attained at the optimal position(s) of the key diameter(s). Thus, the smallest minimum MSE value was attained by M-RM. Figure 6 shows the change in the MSE value for the three methods. Figure 7 shows the estimated coefficients $\hat{a}_n$, $\hat{b}_n$, and $\hat{c}_n$ for M-RM at the optimal positions $p_1$ and $p_2$. The predictive value of the diameter at point $n$ is obtained by $\hat{y}_n = \hat{a}_n + \hat{b}_n d_{p_1} + \hat{c}_n d_{p_2}$; that is, only the two diameters $d_{p_1}$ and $d_{p_2}$ of a new yam need to be measured for whole shape prediction. Figures 8 and 9 show the observations together with the predictions of the diameters at each point using M-RM with two key diameters, one of which is at 255.0 mm, for the samples in this study.
4. Discussion
First, the three predictive models for yam shape prediction, WA, S-RM, and M-RM, which are constructed from the results of the Bayesian estimation model, are compared in terms of MSE. Although WA is simpler than the other methods, it achieved a small minimum MSE value of 18.62. The regression methods performed better than WA: the minimum MSE was 15.71 for S-RM and 11.48 for M-RM. According to the coefficients of M-RM in Figure 7, one key diameter affected the prediction positively in one range of points and negatively in another, whereas the other key diameter contributed to the estimate over a further range. The two diameters thus improve the performance of the estimate through their respective coefficients.
Once M-RM has been constructed from the sample data set in this study, it can be used for whole shape prediction from two diameters at fixed positions $p_1$ and $p_2$. The quality of the sample data set is therefore critical for the performance of the shape prediction. In our data set, yam length and weight were correlated with each other (Appendix A). This suggests that the yams had a uniform shape and that there were no outliers with an irregular shape; if the data set had contained thick (short and heavy) or thin (long and light) yams, they would be plotted in the upper-left or lower-right of the scatterplot, respectively, and the correlation would be lower. The quality of the sample data set used for the construction of M-RM therefore appears to be high enough for model construction.
The M-RM method performed well according to the MSE value (Figure 6) and visual inspection of the actual shape prediction (Figures 8 and 9). To evaluate the weight of the yams from the predicted shape, we assumed that (a) each yam is circular in cross-section and (b) the diameter changes linearly between each pair of adjacent positions. The weight was then estimated under assumptions (a) and (b) (Figure 10). M-RM successfully predicted the weight of the yams. Higher accuracy can be obtained by treating outliers appropriately (e.g., removing heavy yams weighing more than 1200 g, which is approximately the mean + 2 SD). We believe that the Bayesian approaches in this paper are applicable not only to shape prediction of yam but also to other shape prediction problems in agriculture.
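Under assumptions (a) and (b), each segment between two adjacent positions is a truncated cone, so its volume follows from the two end diameters. The sketch below illustrates this calculation; the function name, the grid spacing, and the density value of 1.0e-3 g/mm³ are hypothetical and are not taken from the data in this study.

```python
import math
from typing import List


def segment_weights(diams: List[float], step_mm: float,
                    density_g_per_mm3: float = 1.0e-3) -> List[float]:
    """Weight (g) of each segment between adjacent predicted diameters (mm).

    Each segment is treated as a truncated cone with circular cross-section:
    volume = pi * h * (d1^2 + d1*d2 + d2^2) / 12.
    The default density is an assumed placeholder value.
    """
    weights = []
    for d1, d2 in zip(diams, diams[1:]):
        volume = math.pi * step_mm * (d1 * d1 + d1 * d2 + d2 * d2) / 12.0
        weights.append(density_g_per_mm3 * volume)
    return weights


# Hypothetical predicted diameters every 10 mm: a cylinder of diameter 20 mm.
weights = segment_weights([20.0, 20.0, 20.0], step_mm=10.0)
print(sum(weights))
```

Accumulating the segment weights from the tip gives the weight up to any candidate cutoff position, which is what is needed to choose cutoff positions for seed yams of a desired weight.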
5. Conclusions
This paper proposed Bayesian methods, which combine a Bayesian estimation model with a predictive model, for shape prediction of yam. The three predictive models we applied were weighted averaging (WA) and the single and multiple regression models (S-RM and M-RM, resp.). The Bayesian method with the M-RM predictive model, which uses two diameters at fixed positions, attained the highest performance in terms of the MSE value. Once M-RM has been constructed from the sample data set in this study, it predicts the whole shape of a yam from two key diameters. Measuring two diameters at those positions of a yam is fairly easy, and this approach does not need any sensors for the shape estimation. The development of such shape prediction approaches, including our Bayesian method, will be required to reduce the cost and time of food processing.
A. Detailed Data for the Sample Yam Data
Figure 11 shows the scatterplot of the length and weight of the 111 yams in this study. Length and weight were highly correlated with each other (Pearson correlation coefficient , ), implying high quality of the data for model construction.
B. Algorithm for Estimating the Diameters
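For a given sample $i$ (the superscript $(i)$ is omitted in this appendix for brevity), let $x_0$ denote the initial value of the state and let $Y_k = \{y_1, \ldots, y_k\}$ denote the set of observations up to the time point $k$. Assume that $x_0 \sim N(x_{0|0}, V_{0|0})$. It is well known that the distribution of the state $x_n$ conditional on $Y_k$ is Gaussian, so it is only necessary to obtain the mean $x_{n|k}$ and the covariance matrix $V_{n|k}$ of $x_n$ with respect to $Y_k$.
When the values of $\sigma^2$ and $\tau^2$, the initial distribution $N(x_{0|0}, V_{0|0})$, and an observation set $Y_N$ up to the point $N$ are given, the estimates for the state $x_n$ can be obtained using the well-known Kalman filter (for $n = 1, \ldots, N$) and the fixed-interval smoothing (for $n = N-1, \ldots, 1$) recursively as follows (see, e.g., [18, 20]).
Kalman Filter (Step 1): One-Step Ahead Prediction
$$x_{n|n-1} = F x_{n-1|n-1}, \qquad V_{n|n-1} = F V_{n-1|n-1} F^{T} + \tau^2 G G^{T}.$$
Kalman Filter (Step 2): Filter
$$K_n = V_{n|n-1} H^{T} \left( H V_{n|n-1} H^{T} + \sigma^2 \right)^{-1}, \qquad x_{n|n} = x_{n|n-1} + K_n \left( y_n - H x_{n|n-1} \right), \qquad V_{n|n} = \left( I - K_n H \right) V_{n|n-1}.$$
Fixed-Interval Smoothing
$$A_n = V_{n|n} F^{T} V_{n+1|n}^{-1}, \qquad x_{n|N} = x_{n|n} + A_n \left( x_{n+1|N} - x_{n+1|n} \right), \qquad V_{n|N} = V_{n|n} + A_n \left( V_{n+1|N} - V_{n+1|n} \right) A_n^{T}.$$
Here, $I$ denotes an identity matrix. Note that the calculation in the filter step is skipped when $y_n$ is a missing value; in that case, $x_{n|n} = x_{n|n-1}$ and $V_{n|n} = V_{n|n-1}$.
Then, the posterior distribution of $x_n$ is given by $x_{n|N}$ and $V_{n|N}$, and subsequently the estimates for the parameter $t_n$ can be obtained because the state space model described by (4) in the main text incorporates $t_n$ in the state vector $x_n$. Hereafter, the estimates of $t_n$ are denoted by $\hat{t}_n$.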
C. Algorithm for Estimating the Variances
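When the observation data for the $i$-th sample are given (the superscript $(i)$ is omitted below), a likelihood function for the variances $\sigma^2$ and $\tau^2$ is defined approximately by
$$L(\sigma^2, \tau^2) = \prod_{n=1}^{N} f(y_n \mid Y_{n-1}), \tag{C.1}$$
where $f(y_n \mid Y_{n-1})$ is the conditional density function of $y_n$ given the past history $Y_{n-1}$, and the factors for missing $y_n$ are omitted. Assume that $Y_0$ is an empty set; then $f(y_1 \mid Y_0) = f(y_1)$. By taking the logarithm of $L(\sigma^2, \tau^2)$, the log-likelihood is obtained as
$$\ell(\sigma^2, \tau^2) = \sum_{n=1}^{N} \log f(y_n \mid Y_{n-1}). \tag{C.2}$$
As given by [18], under the use of the Kalman filter, the conditional density $f(y_n \mid Y_{n-1})$ is a normal density given by
$$f(y_n \mid Y_{n-1}) = \frac{1}{\sqrt{2 \pi r_n}} \exp \left\{ - \frac{\left( y_n - \hat{y}_{n|n-1} \right)^2}{2 r_n} \right\}, \tag{C.3}$$
where $\hat{y}_{n|n-1}$ is the one-step-ahead prediction for $y_n$ and $r_n$ is the variance of the predictive error, given by
$$\hat{y}_{n|n-1} = H x_{n|n-1}, \qquad r_n = H V_{n|n-1} H^{T} + \sigma^2, \tag{C.4}$$
respectively.
Thus, the estimates of $\sigma^2$ and $\tau^2$ can be obtained using the maximum likelihood method. Specifically, for a given value of $\sigma^2$, we can obtain the estimate $\hat{\tau}^2$ of $\tau^2$ by maximizing $\ell(\sigma^2, \tau^2)$ in (C.2) numerically. Then, the estimate $\hat{\sigma}^2$ of $\sigma^2$ is obtained similarly by maximizing $\ell(\sigma^2, \hat{\tau}^2)$.
By applying $\hat{\sigma}^2$ and $\hat{\tau}^2$ to the above algorithms of the Kalman filter and fixed-interval smoothing, we can obtain the final estimates $\hat{t}_n$ of $t_n$ and the corresponding variances from the results of $x_{n|N}$ and $V_{n|N}$.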
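The two-stage maximization can be sketched with a small Kalman filter and a grid search. This is an illustrative sketch under stated assumptions: the state is (t_n, t_{n-1}) with transition matrix [[2, -1], [1, 0]], the initial distribution is diffuse with mean zero and variance `v0`, and the candidate grids for the variances are hypothetical. A numerical optimizer could replace the grid search.

```python
import math
from typing import List, Optional, Sequence, Tuple


def log_likelihood(y: List[Optional[float]], sigma2: float, tau2: float,
                   v0: float = 1.0e4) -> float:
    """Log-likelihood from the prediction errors of the Kalman filter."""
    x0, x1 = 0.0, 0.0                  # assumed diffuse initial state
    a, b, c, d = v0, 0.0, 0.0, v0      # covariance [[a, b], [c, d]]
    loglik = 0.0
    for obs in y:
        # Prediction: x <- F x, V <- F V F' + tau2 * G G'
        x0, x1 = 2.0 * x0 - x1, x0
        a, b, c, d = (4.0 * a - 2.0 * b - 2.0 * c + d + tau2,
                      2.0 * a - c, 2.0 * a - b, a)
        if obs is None:                # missing value: no likelihood term
            continue
        r = a + sigma2                 # variance of the prediction error
        err = obs - x0
        loglik += -0.5 * (math.log(2.0 * math.pi * r) + err * err / r)
        k0, k1 = a / r, c / r          # Kalman gain
        x0, x1 = x0 + k0 * err, x1 + k1 * err
        a, b, c, d = (1.0 - k0) * a, (1.0 - k0) * b, c - k1 * a, d - k1 * b
    return loglik


def estimate_variances(y: List[Optional[float]], sigma2_grid: Sequence[float],
                       tau2_grid: Sequence[float]) -> Tuple[float, float, float]:
    """For each sigma2 pick the best tau2, then pick the best sigma2."""
    best = None
    for s2 in sigma2_grid:
        t2_hat = max(tau2_grid, key=lambda t2: log_likelihood(y, s2, t2))
        ll = log_likelihood(y, s2, t2_hat)
        if best is None or ll > best[2]:
            best = (s2, t2_hat, ll)
    return best


# Hypothetical diameters: a linear trend plus alternating +/-1 mm deviations.
y = [30.0 + 0.5 * n + (1.0 if n % 2 else -1.0) for n in range(1, 21)]
y[3] = y[8] = None
sigma2_grid = [0.01, 1.0, 100.0]
tau2_grid = [1.0e-4, 1.0e-2, 1.0]
sigma2_hat, tau2_hat, ll_max = estimate_variances(y, sigma2_grid, tau2_grid)
print(sigma2_hat, tau2_hat)
```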
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
References
[1] Trade Statistics of Japan, http://www.customs.go.jp/toukei/info/index_e.htm.
[2] Portal Site of Official Statistics in Japan, http://www.e-stat.go.jp/SG1/estat/eStatTopPortalE.do.
[3] N. Kondo, “Automation on fruit and vegetable grading system and food traceability,” Trends in Food Science & Technology, vol. 21, pp. 145–152, 2010.
[4] C. Costa, F. Antonucci, F. Pallottino, J. Aguzzi, D. Sun, and P. Menesatti, “Shape analysis of agricultural products: a review of recent research advances and potential application to computer vision,” Food and Bioprocess Technology, vol. 4, no. 5, pp. 673–692, 2011.
[5] S. Goto, H. Iwata, S. Shibano, K. Ohya, A. Suzuki, and H. Ogawa, “Fruit shape variation in Fraxinus mandshurica var. japonica characterized using elliptic Fourier descriptors and the effect on flight duration,” Ecological Research, vol. 20, no. 6, pp. 733–738, 2005.
[6] M. Z. Abdullah, J. Mohamad-Saleh, A. S. Fathinul-Syahir, and B. M. N. Mohd-Azemi, “Discrimination and classification of fresh-cut starfruits (Averrhoa carambola L.) using automated machine vision system,” Journal of Food Engineering, vol. 76, no. 4, pp. 506–523, 2006.
[7] M. S. Kim, Y.-R. Chen, B.-K. Cho et al., “Hyperspectral reflectance and fluorescence line-scan imaging for online defect and fecal contamination inspection of apples,” Sensing and Instrumentation for Food Quality and Safety, vol. 1, pp. 151–159, 2007.
[8] H. Sadrnia, A. Rajabipour, A. Jafary, J. Arzhang, and Y. Mostofi, “Classification and analysis of fruit shapes in long type watermelon using image processing,” International Journal of Agricultural and Biological Engineering, vol. 1, pp. 68–70, 2007.
[17] A. Gelman, J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin, Bayesian Data Analysis, vol. 2, CRC Press, Boca Raton, FL, USA, 2014.
[18] G. Kitagawa and W. Gersch, Smoothness Priors Analysis of Time Series, vol. 116, Springer, 1996.
[19] K. Kyo and M. Hachiya, “A statistical approach of identifying indexes crucial to characterizing chinese yams in terms of shape,” in Advances in Intelligent Systems, F. L. Gaol, Z. Chaczko, K. Hashimoto, T. Matsuo, and W. Grosky, Eds., pp. 27–34, WIT Press, 2014.
[20] B. D. O. Anderson and J. B. Moore, Optimal Filtering, Prentice-Hall, Inc, 1979.
[21] H. D. Vinod and A. Ullah, Recent Advances in Regression Methods, Marcel Dekker, Inc, 1981.