Abstract

Yield prediction and variety selection are critical components for assessing production and performance in breeding programs and precision agriculture. Since plants integrate their genetics, surrounding environments, and management conditions, crop phenotypes have been measured over cropping seasons to represent the traits of varieties. These days, unmanned aircraft systems (UAS) provide a new opportunity to collect high-quality images and generate reliable phenotypic data efficiently. Here, we propose high-throughput phenotyping (HTP) from multitemporal UAS images for tomato yield estimation. UAS-based RGB and multispectral images were collected weekly and biweekly, respectively. Phenotypic features of tomatoes such as canopy cover, canopy volume, and vegetation indices were estimated from UAS imagery throughout the entire season. To extract time-series features from UAS-based phenotypic data, crop growth and growth rate curves were fitted using mathematical functions and their first derivatives. Time-series features such as the maximum growth rate, the day of a specific event, and duration were extracted from the fitted curves of different phenotypes. The linear regression model produced high R² values even with different variable selection methods: all variables (0.79), forward selection (0.7), and backward selection (0.77). With factor analysis, we identified two significant factors, growth speed and timing, related to high-yield varieties. Five time-series phenotypes were then selected for yield prediction models explaining 65 percent of the variance in the actual harvest. The phenotypic features derived from RGB images played more important roles in predicting yield. This research also demonstrates that lower-performing tomato varieties can be screened out successfully. The results from this work may be useful in breeding programs and research farms for selecting high-yielding and disease-/pest-resistant varieties.

1. Introduction

Vegetable production is one of the most important components of agriculture, along with grain crops. Commercial vegetable production in the United States was approximately 33.9 million tons and $12.9 billion in 2018 [1]. Notably, tomatoes have the highest value of utilized production, and the value of tomatoes increased by over 10 percent ($1.9 billion) in 2018. Recently, tomato production has faced constant pressure from biotic and abiotic stresses such as climate, diseases, and pests that can cause significant losses of production and fruit quality [2]. To identify the potential yield performance of tomato, advanced phenotyping that can effectively map, monitor, and predict plant traits is required. Despite the importance of vegetable production, the traditional methods used to develop new cultivars in breeding programs, monitor crop growth and disease, and predict yield still rely on hand-sampling measurements, which are destructive, labor-intensive, time-consuming, and expensive [3, 4].

Precision agriculture requires large amounts of data to ensure informed decision-making at the specific crop and plot level. Remotely sensed data have been used to collect data in a timely or near real-time manner for agricultural applications because contactless sensor measurements are nondestructive and have become more efficient in recent decades [5]. However, satellite-based and airborne remote sensing is often unable to provide the suitable data required for plant- or plot-level assessment because data acquisition is affected by cloud cover, cost, low spatial resolution, and limited temporal resolution [6]. In recent years, unmanned aerial systems (UASs), also known as unmanned aerial vehicles (UAVs) or drones, have been regarded as a promising technology with high potential for agricultural applications such as crop growth monitoring, disease monitoring, yield prediction, and biomass estimation [7, 8]. UAS also provides new opportunities to collect data with finer spatiotemporal resolution for high-throughput phenotyping (HTP). Additionally, the hardware cost of UAS platforms and sensors is decreasing, creating a lower entry barrier so that students, researchers, and stakeholders can easily adopt UAS. These new technologies are an alternative solution to address the limitations of manual measurements and conventional remote sensing for measuring crop characteristics [3, 4].

In several studies, it has been proven that UAS-based remotely sensed data could measure crop traits such as canopy cover, plant height, and vegetation indices more frequently and consistently over a larger area than manual measurement [3, 9, 10]. The aboveground biomass (AGB) of arable crops has also been estimated by the UAV-based height model [11]. Jiang et al. [12] and Li et al. [13] also estimated AGB using UAV-based multispectral and hyperspectral images for rice and potatoes, respectively. Yield prediction using UAS data is another main topic in precision agriculture. Jung et al. [14] showed that UAS-based HTP could provide the rank of cotton genotypes, and the top high-yield varieties could produce yields 10 percent higher, while Maimaitijiang et al. [15] estimated soybean biomass from a UAS-based canopy volume model. Recent research has also adopted the artificial intelligence (AI) technique for biomass and yield estimation [16, 17]. Plant stresses such as drought, disease, nutrition deficiencies, pests, and weeds have been monitored and assessed by UAVs [18]. Previous studies extracted crop parameters from UAS data and used the variables directly to develop various methods, but UAS-based measurements could fluctuate due to errors depending on data collection conditions such as weather, sensors, date, and time.

Despite the commercial importance of tomatoes, few studies have employed UAS data for yield estimation. Enciso et al. [19] validated UAV measurements against field data for tomato varieties. Johansen et al. [20] used a time series of UAS imagery to monitor phenotypic traits of individual tomato plants, including canopy area, condition, and growth rate, to quantify responses to salinity stress and identify tomato plant accessions that performed best in terms of yield. Johansen et al. [21] also proposed modelling and predicting the biomass and yield of individual tomato plants at the farm scale through field- and UAS-based phenotyping. In recent years, a machine learning framework was developed for tomato yield estimation using multitemporal remote sensing data collected from UAS [22]. However, these initial works directly used a limited set of phenotypes and time-series data over the cropping season.

In this study, we propose a novel method to extract advanced phenotypic features from UAS data in the tomato field for yield estimation and variety selection. The growth and growth rate curves of phenotypic data were generated from multitemporal UAS data to extract crop traits indicating growth timing and speed over the whole cropping season. Factor analysis was applied to analyse more valuable phenotypic features. Finally, we generated a yield estimation model in the tomato field and then demonstrated the possibility of selecting the high-performing varieties and eliminating the lower-performing varieties.

2. Study Area and Materials

2.1. Study Area

The study area was located at the Texas A&M (Agricultural and Mechanical) AgriLife Research and Extension Center in Weslaco, Texas, USA (latitude: 26°9′24″N, longitude: 97°57′46″W) (Figure 1(a)). The tomato fields consist of two major components for identifying/characterizing resistance against insect-vector transmitted diseases and for evaluating planting dates and mulch cover to extend tomato production. The western side of the study area was selected to apply the UAS-based phenotyping method for high-yielding variety selection. Each experimental plot, which consisted of four individual tomato plants, was established with three planting dates (Feb. 29, Mar. 16, and Mar. 31, 2016), plastic mulch covers (black, white, and bare), and cultivars (9 different varieties). Each variety was replicated three times per planting date and mulch cover in a randomized deployment. Tomatoes were harvested 3 times from each plot, and the sum (total yield) of the three harvests was used for the yield prediction model and evaluation.

Ground control points (GCPs) were installed around the study area for precise georeferencing and coregistration of processed UAS data, including orthomosaic images and digital surface models (DSMs) [3]. Although the approximate location of all images was recorded by onboard GPS of UAV, a total of 9 GCPs were installed in this study (Figures 1(b) and 1(c)). Eight GCPs were located around the tomato site, and one GCP was installed in the middle of the study area to correct bowling effects by structure from motion (SfM), which is the most frequently used algorithm to generate orthomosaic images from UAS data [23]. The center coordinate of all GCPs was measured by using an APS-3 Real-Time Kinematic (RTK) GPS (Altus Positioning System, Inc., California, USA).

2.2. UAS Platforms and Sensors

DJI Phantom products (DJI, Shenzhen, China), the most popular commercialized models, and a UAV platform developed by the research team, consisting of an Iris quadcopter (3DR, Berkeley, USA) and a 12-megapixel Canon S110 digital camera (Canon, Tokyo, Japan), were used to collect RGB images (Figure 2(a)). For multispectral data, another UAS system was developed with an X8 octocopter (3DR, Berkeley, USA) and an ADC Snap camera (Tetracam, Chatsworth, USA) (Figure 2(b)), which collects 3 bands: green, red, and near-infrared (NIR). The Corpus Christi research group designed a mount to integrate the multispectral camera onto the bottom of the X8 platform. The mount was printed with a 3D printer and assembled with dampers. The developed systems collected geotagged RGB and multispectral images. Table 1 shows the specifications of the UAS-based RGB and multispectral systems used in this study for data collection.

2.3. UAS Data Collection

Two flight teams operated separate UAV platforms and collected data for more frequent time-series datasets. The Corpus Christi research group used the DJI Phantom series (Phantom 2 and 4) and X8 platform with a multispectral camera to collect RGB and a multispectral imagery, respectively, every 2 weeks. RGB and multispectral UAS systems operated separately for more efficient flights. The flight conditions including altitude, flight speed, and camera setting should be different since RGB and multispectral cameras have different specifications such as field of view (FOV) and focal length. Another flight team in Weslaco collected RGB imagery using the Iris platform with a Canon camera.

Tables 2 and 3 show the flight logs for RGB and multispectral images, including parameters such as flight altitude, image overlap, and resolution. A total of 18 flights were conducted to collect RGB images over 3 months (March–June 2016), while the multispectral UAS system flew every other week. The resolution is the ground sampling distance (GSD) of the orthomosaic image generated by the SfM algorithm.

3. UAS-Based High-Throughput Phenotyping (HTP)

3.1. Preprocessing of UAS Images
3.1.1. Orthomosaic Image DSM Generation

PhotoScan Pro software (Agisoft LLC, St. Petersburg, Russia) was used to apply the SfM algorithm to generate DSMs and orthomosaic images from UAS raw images. PhotoScan Pro is one of the most frequently used commercial software packages for generating 3D point clouds, DSMs, and orthomosaics from UAS raw images with SfM. The GPS coordinates of the GCPs were input into the software for precise georeferencing and coregistration of the DSMs and orthomosaics. Although the image locations of the UAS raw images were recorded in the metadata of the geotagged images, their precision was not sufficient, which could cause the bowling effect, a major error of the SfM algorithm. GCPs with high accuracy and good deployment can solve these problems when the SfM algorithm is applied to UAS raw images [24]. DSMs and orthomosaics of the RGB and multispectral images were generated using PhotoScan Pro with GCPs. The geolocation accuracy of the DSMs and orthomosaics was less than one pixel.

3.1.2. Radiometric Calibration for Multispectral Image

Radiometric calibration for multispectral images was conducted to calculate more accurate vegetation indices such as NDVI. In this study, we used radiometric calibration panels to convert the digital numbers (DNs) in the multispectral orthomosaic image to reflectance values using the empirical line method (ELM) [25]. Four reflectance panels (5%, 12%, 33%, and 52%) were placed in the tomato field before flying the multispectral platform. The average DN of each reflectance panel in the multispectral orthomosaic image was compared with the actual reflectance values measured in the laboratory (Figure 3). An empirical line (EL) was generated for each flight, and all pixel values of the multispectral orthomosaic image were converted to reflectance by the developed EL.
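The ELM step above reduces to a linear fit between panel DNs and laboratory reflectance, applied per band. The sketch below illustrates this with NumPy; the panel DN values are hypothetical stand-ins, not the study's actual measurements.

```python
import numpy as np

def empirical_line(panel_dns, panel_refl, image_dn):
    """Fit a linear empirical line (reflectance = gain * DN + offset) from
    calibration panel measurements and apply it to a band of DNs."""
    gain, offset = np.polyfit(panel_dns, panel_refl, 1)
    return gain * np.asarray(image_dn, dtype=float) + offset

# Hypothetical mean panel DNs (not the study's measurements) for the
# 5%, 12%, 33%, and 52% reflectance panels.
panel_dns = np.array([40.0, 110.0, 320.0, 510.0])
panel_refl = np.array([0.05, 0.12, 0.33, 0.52])
refl = empirical_line(panel_dns, panel_refl, np.array([[40.0, 510.0]]))
```

In practice one line is fitted per flight and per spectral band, since illumination and sensor response vary between acquisitions.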

3.2. Product Generation from UAS Data
3.2.1. Plot Polygon Generation

The rectangular polygons were generated as plot boundaries for each variety to extract plot-based phenotypic information (Figure 4(a)). To create rectangles of the same size ( meters) according to plot design, the centerline of four individual plants in each plot was manually delineated by using the canopy area in the orthomosaic image on March 28, 2018. The angle and center of the line were considered to determine the polygons automatically. The plot boundary properly included four individual plants (Figure 4(b)). The pixels within the polygon were selected to extract phenotypic data of a single variety. In the study area, there were 81 polygons () per planting date.

3.2.2. Geospatial Product Generation

Canopy cover (CC), canopy volume (CV), excess greenness index (ExG), and normalized difference vegetation index (NDVI) were generated from the RGB and multispectral orthomosaic images and DSMs. The canopy is the aboveground portion of a plant or crop. As the canopy is strongly related to crop status, health, and environment, canopy cover is a useful way of monitoring crop development and productivity. Although canopy cover has been measured using subjective methods [26, 27], canopy pixels in RGB images, which correspond to green areas, can be extracted nondestructively and easily. The Canopeo algorithm, defined as Equation (1), was employed to classify canopy pixels from the RGB orthomosaic images [28]:

R/G < P1 and B/G < P2 and 2G − R − B > P3, (1)

where R, G, and B are the digital numbers of the red, green, and blue bands, P1 and P2 are parameters to classify green pixels, and P3 is a parameter that sets the minimum excess green index. R/G and B/G indicate the predominance of green at each pixel, and 2G − R − B effectively classifies dark or gray pixels that cannot be adequately discriminated using R/G and B/G. Basically, we used the suggested P1, P2, and P3 values (0.95, 0.95, and 20) from a previous study [28], although the parameters were empirically adjusted depending on the color tone and hue of each RGB orthomosaic image. Since there are three plastic mulch covers (black, white, and bare) in the study area, two more conditions to determine noncanopy pixels were considered, as in Equation (2):

R + G + B > T1 or G − B < T2, (2)

where R + G + B is the sum of the digital numbers (DNs) of all bands at each pixel and G − B is the difference between the green and blue bands. T1 and T2 were used to remove the white and black plastic mulch covers, respectively. As white and black mulch showed very high or very low DNs, T1 and T2 values of 600 and 30 were employed to filter them out. A CC map indicating canopy and noncanopy pixels was generated, and the canopy area in each plot polygon was divided by the polygon size to calculate the CC value of each variety.
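A minimal sketch of this canopy classification, assuming the Canopeo ratio thresholds (0.95, 0.95, 20) and the mulch filters (600 on total brightness, 30 on the green-blue difference); the variable names and the inequality directions of the mulch filters are our reading of the text, not the authors' code.

```python
import numpy as np

def canopy_mask(rgb, p1=0.95, p2=0.95, p3=20.0, t1=600.0, t2=30.0):
    """Classify canopy pixels with Canopeo-style rules plus mulch filters.
    rgb: float array of shape (..., 3) holding raw digital numbers."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    eps = 1e-9  # guard against division by zero in the band ratios
    green = (r / (g + eps) < p1) & (b / (g + eps) < p2) & (2 * g - r - b > p3)
    not_white = (r + g + b) < t1   # white mulch: very high total DN
    not_black = (g - b) > t2       # black mulch: little green-blue contrast
    return green & not_white & not_black

pixels = np.array([[60.0, 140.0, 40.0],    # healthy green canopy
                   [240.0, 250.0, 240.0],  # white plastic mulch
                   [20.0, 25.0, 20.0]])    # black plastic mulch
mask = canopy_mask(pixels)
```

The plot-level CC value would then be `mask.sum() * pixel_area / polygon_area` over the pixels inside each plot polygon.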

To calculate the CV, a canopy height model (CHM) was generated by subtracting the digital terrain model (DTM) from the DSM for each UAS flight. The DTM was created from the 3D point cloud data generated from an early UAS flight on March 23, 2016. The 3D points were classified into ground and nonground points using LAStools, and the DTM was generated from the ground points by the natural neighbor interpolation algorithm. Each pixel value of the CHM, which represents canopy height, was multiplied by the pixel size to calculate the pixel volume. The sum of the pixel volumes in the plot polygon was considered the CV of each variety.
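The CHM subtraction and volume summation can be sketched in a few lines; clipping negative heights to zero is our own guard against DSM/DTM noise, not something the text specifies.

```python
import numpy as np

def canopy_volume(dsm, dtm, pixel_size):
    """Canopy volume from a canopy height model (CHM = DSM - DTM).
    Heights below the terrain (noise) are clipped to zero before summing
    pixel volumes (height * pixel area)."""
    chm = np.clip(np.asarray(dsm, float) - np.asarray(dtm, float), 0, None)
    return float(chm.sum() * pixel_size ** 2)

# Toy 2x2 rasters (meters) with a 0.5 m ground sampling distance.
dsm = np.array([[1.5, 2.0], [1.0, 0.9]])
dtm = np.array([[1.0, 1.0], [1.0, 1.0]])
cv = canopy_volume(dsm, dtm, pixel_size=0.5)
```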

Two vegetation indices, ExG and NDVI, were generated from the RGB and multispectral orthomosaic images, respectively. ExG was calculated from RGB color with Equation (3), while NDVI was generated from the multispectral image with Equation (4) [29]:

ExG = 2g − r − b, (3)

NDVI = (NIR − Red)/(NIR + Red), (4)

where r = R/(R + G + B), g = G/(R + G + B), and b = B/(R + G + B) are the normalized red, green, and blue chromatic coordinates, and NIR and Red are the reflectance values of the near-infrared and red bands. Only the ExG and NDVI values of the canopy area in the plot polygon were used to calculate the average ExG and NDVI as the representative values for each variety.
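The two indices are simple band arithmetic. The sketch below assumes the standard definitions cited above (ExG from normalized chromatic coordinates, NDVI from NIR and red reflectance).

```python
import numpy as np

def exg(r, g, b):
    """Excess greenness from normalized chromatic coordinates: 2g_n - r_n - b_n,
    computed equivalently as (2G - R - B) / (R + G + B)."""
    total = np.asarray(r, dtype=float) + g + b
    return (2.0 * g - r - b) / total

def ndvi(nir, red):
    """Normalized difference vegetation index from reflectance bands."""
    nir = np.asarray(nir, dtype=float)
    return (nir - red) / (nir + red)

# A bright green canopy pixel (DNs) and healthy-plant reflectance values.
e = exg(50.0, 150.0, 50.0)
n = ndvi(0.6, 0.2)
```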

3.3. Feature Extraction from Time-Series UAS Data
3.3.1. Growth and Growth Rate Curve

Multitemporal phenotypes including CC, CV, ExG, and NDVI enabled the extraction of advanced phenotypic features. Time-series measurements of these phenotypes were used to model the growth pattern of each variety. Sigmoidal and polynomial functions were adopted to fit the optimal curve depending on the input variables. CC and CV measurements over the growing season were fitted with a sigmoid function to generate growth curves (Figure 5). Although there were 18 UAS observations, the last three measurements were not used for the sigmoidal curve fitting. The first derivatives of the growth curves were taken as the growth rate curves.
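A minimal sketch of the sigmoid fitting step, using a logistic form and `scipy.optimize.curve_fit` on synthetic weekly canopy cover observations; the parameterization (K, r, t0) and the data are our illustrative assumptions, not the study's actual curve or measurements.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, K, r, t0):
    """Sigmoid growth curve: K = asymptotic maximum, r = rate, t0 = inflection DAP."""
    return K / (1.0 + np.exp(-r * (t - t0)))

def logistic_rate(t, K, r, t0):
    """First derivative of the logistic: the growth rate curve."""
    s = logistic(t, K, r, t0)
    return r * s * (1.0 - s / K)

# Synthetic weekly canopy cover fractions over the season (not real data).
dap = np.arange(10, 80, 7)
cc = logistic(dap, 0.85, 0.15, 40.0) + np.random.default_rng(0).normal(0, 0.01, dap.size)

popt, _ = curve_fit(logistic, dap, cc, p0=[1.0, 0.1, 45.0], maxfev=10000)
```

The fitted growth rate peaks at t0 with value rK/4, which is the kind of quantity extracted as a feature in the next subsection.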

A third-order polynomial function was fitted to the 18 ExG observations to form ExG progression curves (Figure 6). The ExG progression curve was divided into two parts by the date of the maximum ExG value. The left and right sides of this date indicate the increasing and decreasing canopy growth periods, respectively. In terms of NDVI, due to weather conditions and data acquisition times, four multispectral images were selected based on data quality to fit a second-order polynomial curve. The multispectral images collected early in the morning contained a significant number of shadowed and saturated pixels caused by a low sun angle and the effect of the surrounding windbreak plots. The NDVI progression curve was also divided into two parts, as with ExG.

3.3.2. Phenotypic Feature Extraction

In this study, a total of 22 phenotypic features were extracted from the growth and progression models derived from the time-series UAS measurements (Table 4). The maximum CC (F1) and maximum CV (F5) were extracted from the sigmoid curves (Figure 7(a)). From the growth rate curves, features including the maximum growth rate (F2 and F6), the day after planting (DAP) at the maximum growth rate (F3 and F7), and the duration of the period over half the maximum growth rate (F4 and F8), computed from the DAPs at half maximum, were calculated (Figure 7(b)). The maximum ExG value (F9) and the DAP at the maximum ExG value (F10) were extracted from the ExG progression curve. The ExG measurements of both periods were then fitted with linear models to calculate the slopes of the increasing and decreasing canopy periods (F11 and F15). The values of the increasing and decreasing lines at the maximum ExG DAP (F12 and F16), the duration (F13 and F17), and the area of each period (F14 and F18) were calculated from the fitted curves (Figure 7(c)). Four features were extracted from the NDVI progression curve: the maximum NDVI (F19) and the DAP at maximum NDVI (F20), plus the increasing NDVI slope (F21) and decreasing NDVI slope (F22), calculated by connecting the first observation with the maximum NDVI value and the maximum NDVI value with the last NDVI observation.
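For a logistic growth curve, several of these features have closed forms. The sketch below derives the maximum value, the maximum growth rate, its DAP, and the half-maximum duration; reading F4/F8 as the time between the two DAPs where the growth rate equals half its maximum is our interpretation of the text.

```python
import numpy as np

def sigmoid_features(K, r, t0):
    """Features from a fitted logistic curve K / (1 + exp(-r * (t - t0))):
    maximum value, maximum growth rate, DAP at maximum growth rate, and the
    duration between the two DAPs where the rate is half its maximum."""
    max_rate = r * K / 4.0                    # rate peaks at the inflection t0
    # rate(t) = r*K*u*(1-u) with u = logistic(t)/K; half maximum where u(1-u) = 1/8
    u = (1.0 + np.sqrt(0.5)) / 2.0
    half_width = np.log(u / (1.0 - u)) / r    # offset from t0 on each side
    return {"max_value": K,
            "max_rate": max_rate,
            "dap_max_rate": t0,
            "half_max_duration": 2.0 * half_width}

feats = sigmoid_features(K=0.85, r=0.15, t0=40.0)
```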

4. Results and Discussion

4.1. Correlation Coefficient of Phenotypic Features and Harvested Yield

Most studies have tried to find features highly correlated with yield using UAV data and have reported a positive correlation between vegetation indices and both biomass and yield [30–32]. A few studies have also focused on extracting discrete phenotypic data, such as canopy cover, canopy height, and vegetation indices, related to biomass and yield for tomato [21, 33]. Although they could predict biomass and yield using UAV-derived phenotypes and environmental conditions, the data volume was too limited to consider the whole growing cycle. It is impossible to collect UAV data at the same DAP in different seasons. In addition, discrete phenotype values such as vegetation indices vary according to the conditions of the surrounding environment and the UAV (platform and sensors).

Despite these challenges, UAS-based phenotypic features can play an important role in assisting plant breeding efforts. Because supports such as tomato cages and wooden stakes are installed in tomato fields, plant height (PH) does not show enough variability to differentiate tomato varieties. We therefore calculated the correlation coefficients between the phenotypic features from CC, CV, ExG, and NDVI in Table 4 and the actual yield in the 81 plots of the 1st planting (Table 5). The DAP at the maximum growth rate of CV (F7) and the DAP at the maximum value of ExG (F10) were highly correlated with the actual yield (both 0.63). Both features are likely related to robust early crop growth and development phases that expand plant size, and they likely also indicate a healthy (green) canopy. A high fruit load would cause a stronger shift in the source-sink relationship of the plants because the photosynthate produced at the canopy (leaf) level is consumed mostly to produce tomato fruits rather than for vegetative growth. Thus, we expect a high fruit load to cause faster canopy deterioration as the fruits take priority over vegetative growth for carbohydrates. This shift in the source-sink relationship, which ultimately leads to crop maturity, is usually expressed as a change in leaf color, leaf senescence, or a combination of both. Interestingly, when looking at ExG, the decreasing slope after the maximum for the latter part of the season (F15) also showed a good correlation with crop yield (-0.63). As the decreasing slope is a negative value, this correlation indicates that plants shifting energy to fruits faster can produce higher yields.

4.2. Yield Estimation Modelling

In recent years, AI techniques such as machine and deep learning have received great attention and produced remarkable results for predicting biomass and yield in various crops [21, 31, 34]. AI-based biomass and yield prediction models have achieved accuracies of over 90 percent; however, a critical issue of AI algorithms is that a large number of training samples are required to obtain robust and accurate machine learning models, and building a large training set requires a long time and heavy labor [35]. For example, Johansen et al. [21] predicted biomass and yield using 81 UAV-derived variables and a random forest algorithm for 1,200 tomato plants. Although AI algorithms can provide very accurate yield predictions, they could not be adopted for this study area because the limited number of plots would cause singularity issues. Therefore, we used linear regression to estimate tomato yield from UAV data for the 81 tomato plots.

The 81 plots of the first planting date with three ground cover conditions (white and black plastic and bare ground) were selected for linear regression. Yield estimation models were developed using the actual tomato yield as the dependent variable and the corresponding multitemporal phenotypic features as independent variables. All 22 phenotypic features from UAS data were used to develop the yield estimation model. In addition, two different feature selection approaches were employed to automatically select features and develop statistically significant models (Table 6): (1) a forward feature selection approach starting from a null model and (2) a backward feature elimination approach starting from a full model. All yield estimation models resulted in high R² values (>0.7) between the actual and estimated yields. Although an AI algorithm might show higher accuracy, linear regression can be an alternative depending on field conditions and data specifications. In general, the backward elimination approach resulted in a higher R² than forward selection, but the number of input variables that automatically remained in the final models was significantly greater than that of the forward selection approach. As backward elimination tends to overfit with many variables, the forward selection method can be simpler and more efficient.
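Forward selection over regression features can be sketched with scikit-learn's `SequentialFeatureSelector`; the data below are a synthetic stand-in for the 81 plots (only two of eight mock features actually drive the mock "yield"), not the study's dataset or feature set.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import SequentialFeatureSelector

# Synthetic stand-in: 81 plots x 8 mock features; only features 0 and 3
# drive the simulated yield, plus small noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(81, 8))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(0.0, 0.1, 81)

# Forward selection starting from a null model, as described in the text.
sfs = SequentialFeatureSelector(LinearRegression(),
                                n_features_to_select=2,
                                direction="forward", cv=5)
sfs.fit(X, y)
selected = sorted(np.flatnonzero(sfs.get_support()).tolist())
```

Setting `direction="backward"` gives the backward elimination variant, which starts from the full model and drops one feature at a time.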

In the forward selection model, the DAP at the maximum value of ExG (F10), the decreasing slope of ExG (F15), the DAP at the maximum growth rate of CC (F3), the DAP at the maximum growth rate of CV (F7), the duration over the half maximum period of CV (F8), and the duration of the increasing ExG period (F13) were selected sequentially. The three phenotypic features highly correlated with actual yield (F7, F10, and F15) all entered the forward selection model. The selected variables can be divided into two groups: crop growth timing (F3, F7, and F10) and crop growth duration (F8, F13, and F15). Crop growth timing indicates the specific day after planting when the plant reaches its greenest canopy and fastest growth. Crop growth timing has a positive relationship with actual yield, implying that maturity timing is strongly related to yield performance, as later maturity allows the plant to grow larger. Among the duration features, the length of the crop growth period is a critical variable affecting yield, and the deterioration speed is also correlated with it.

4.3. Factor Analysis for UAS-Based Phenotypic Features

Although variable importance can be calculated to predict biomass and yield, it is not consistent across UAV datasets and their specifications [21]. In addition, as several of the 22 phenotypic features proposed in this study were highly correlated with each other, factor analysis was performed to avoid collinearity and overfitting issues in the regression model and to identify statistically significant phenotypic features. Factor analysis describes the covariance relationships among many variables in terms of a few underlying, but unobservable, random quantities [36]. In this study, the maximum likelihood (ML) method was employed, and two major factors explaining over 75 percent of the cumulative variance were selected using the rotated factor matrix calculated by the Varimax method after seven iterations. The final two factors explained 75 percent of the variance and consisted of nine phenotypic features representing two crop traits (Table 7). The first factor included six phenotypic features: the DAP at the maximum ExG value (F10), the duration of half maximum canopy cover (F4), the DAP at the maximum canopy volume growth rate (F7), the DAP at the maximum NDVI (F20), the DAP at the maximum canopy cover growth rate (F3), and the duration of half maximum canopy volume (F8). These are dates and durations closely related to the "timing of crop growth events." Four of the six variables selected in the linear regression models with the forward selection method were included in the first factor. As mentioned above, crop growth timing and duration strongly affect tomato yield performance. The six variables of the first factor imply that plants growing faster and longer before canopy deterioration can produce more fruit, since larger plants can allocate more energy to fruit production.
The second factor included the increasing slope of ExG (F11), the increasing slope of NDVI (F21), and the triangular area under the decreasing slope of ExG (F18), which indicate the "speed of crop growth." In the factor analysis, these features (F11, F18, and F21) were selected for the crop growth speed group instead of the duration of the increasing ExG period (F13) and the decreasing slope of ExG (F15); that is, F13 and F15 were replaced by F11 and F18, which indicate similar growing-speed characteristics. Interestingly, phenotypic features from multispectral images were included in both the first and second factors. As the fitted curves of ExG and NDVI have similar trends, the NDVI-related features have a high correlation with the ExG features, and the two NDVI-based features also represented crop growth timing and growth speed. The factor analysis showed that growth timing, duration, and growth speed were the main traits affecting tomato yield. Our results indicate which crop characteristics should be considered in breeding programs for cultivar improvement.
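A maximum-likelihood factor analysis with varimax rotation can be sketched with scikit-learn. The data below are a synthetic stand-in mirroring the two-trait structure found in Table 7 (six "timing"-driven features and three "speed"-driven ones), not the study's actual feature matrix.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Synthetic stand-in: two latent traits ("timing" and "speed") generate
# nine observed phenotypic features across 81 plots.
rng = np.random.default_rng(2)
n = 81
timing = rng.normal(size=n)
speed = rng.normal(size=n)
X = np.column_stack([timing + 0.1 * rng.normal(size=n) for _ in range(6)]
                    + [speed + 0.1 * rng.normal(size=n) for _ in range(3)])

# ML factor analysis with varimax rotation, two factors.
fa = FactorAnalysis(n_components=2, rotation="varimax").fit(X)
loadings = fa.components_.T  # shape (9 features, 2 factors)
```

After rotation, each feature should load predominantly on the factor corresponding to the latent trait that generated it, which is how the timing and speed groups in Table 7 would emerge.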

The nine phenotypic features selected by factor analysis were input into the linear regression model to estimate tomato yield. A forward feature selection approach was used, and five phenotypic features were selected for the final model. The features remaining in the final regression model were the DAP at the maximum ExG value (F10), the DAP at the maximum canopy volume growth rate (F7), the DAP at the maximum canopy cover growth rate (F3), the duration of the half maximum canopy volume period (F8), and the triangular area under the decreasing ExG slope (F18). The yield prediction model, shown in Equation (5), explained 65 percent of the variation in the actual harvested yield (Figure 8). Four of the features (F3, F7, F8, and F10) are related to crop growth timing. We found that the phenotypic features related to crop growth timing play a more important role in predicting yield. In addition, the DAP at the maximum ExG value (F10) was determined to be the most critical crop parameter for yield estimation, not only by factor analysis but also by the general linear regression models.

All variables in the yield prediction model from factor analysis can be extracted from the time-series CC, CV, and ExG derived from RGB images, with accuracy similar to that of the phenotypic features from both RGB and multispectral imagery. The results indicate that RGB images can provide enough information to predict yield at the field level. As UAS platforms with RGB sensors are relatively cheap and easy to operate for data collection and calibration, they are a good alternative for collecting a large number of crop parameters for precision agriculture. As CC and CV can also be calculated from multispectral images and NDVI has a trend similar to ExG, it should be possible to adopt either RGB or multispectral images for predicting tomato yield at the field level.

4.4. Tomato Variety Selection

Based on the estimated yield from the final regression model with factor analysis, the best-performing tomato genotypes were ranked by predicted yield. The top 10, 20, 30, and 40 performing varieties based on the estimation were compared with the ranking by actual harvested yield to find matched varieties. Figure 9(a) shows how many varieties selected by the proposed method are correctly matched in the same number of varieties based on the actual yield. When the estimator selects the top 10 varieties (blue line), five of them were included in the top 10 actual-yield varieties, a 50 percent match ratio. All top 10 varieties by the estimator were included in the top 40 high-performing varieties by actual yield. In terms of the top 20 varieties selected by the estimation model, we achieved 100 percent selection accuracy within the top 50 varieties by actual yield. When 40 varieties were selected with the developed model, 34 varieties (85 percent) were included in the list of the top 40 actual yields. Although we could not select the high-performing varieties perfectly, our results show that the low-performing varieties can be successfully eliminated by the proposed framework. For example, scientists can eliminate the bottom 10 varieties identified by the proposed framework without losing any high-performance varieties. The advantage of this approach is that it is not necessary to harvest the eliminated varieties in the field. In other words, we can save economic resources, human power, and time to make breeding programs more efficient.
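The match-ratio comparison behind Figure 9(a) reduces to a set intersection of the top-k rankings. A minimal sketch with toy yields (illustrative numbers only, not the study's varieties):

```python
import numpy as np

def topk_match_ratio(estimated, actual, k):
    """Fraction of the top-k varieties by estimated yield that also appear
    in the top-k by actual yield."""
    top_est = set(np.argsort(estimated)[::-1][:k].tolist())
    top_act = set(np.argsort(actual)[::-1][:k].tolist())
    return len(top_est & top_act) / k

# Toy yields for six varieties.
est = np.array([9.1, 3.0, 7.5, 1.2, 8.8, 2.0])
act = np.array([8.0, 2.5, 1.0, 7.9, 9.0, 1.5])
ratio = topk_match_ratio(est, act, k=3)
```

The same function with the actual ranking truncated at a larger k reproduces the "top 10 estimated within top 40 actual" style of comparison reported above.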

Jung et al. [14] compared the average cotton lint yield among the remaining varieties after variety selection based on the UAV-derived phenotypes to verify the performance of the UAV-based variety selection framework. In the cotton field, the average lint yield increased by 10 percent compared to the original population after variety selection using UAV data. UAV-selected varieties matched over 70 percent of the same lists ranked by actual field harvest measurements. In this study, to see more details of eliminating low-performing varieties, the average actual yield of the selected varieties by UAV-estimated and actual yield was calculated (Figure 9(b)). When the top 10 high-yield varieties by UAV-estimated and actual yield are selected, the average tomato yield is different by about 10 kg. More variety selection decreases the disparity of the average yield between UAV and field groups. After selecting more than 50 varieties, the difference is less than 1 kg. The average tomato yield increased by 100 and 53 percent compared to all varieties after variety selection using actual and UAV data, respectively. It implies that the proposed method could provide feasible information for selecting high- or low-performance varieties without destructive hand-sampling and actual harvesting.

5. Conclusions

In this study, a novel HTP framework was proposed to predict tomato yield using multitemporal UAS data. Phenotypic features, including canopy cover, canopy volume, and vegetation indices derived at the plot level, were fitted with mathematical curves, and time-series phenotypes were extracted from the growth and growth rate curves. Although the individual time-series phenotypic features were only moderately correlated with actual yield, the linear regression models produced high R² values (>0.7). Based on the factor analysis, two significant factors, growth speed and timing, were found to be strongly related to the yield performance of tomato varieties. Finally, five time-series phenotypes were selected for the yield prediction model, explaining 65 percent of the variance of the actual harvest. The phenotypic features derived from RGB images played important roles, providing enough information to predict yield. We compared the actual yield with the estimated yield to assess the possibility of UAS-based variety selection in breeding programs. Although the high-performance varieties could not be perfectly selected using the estimator, the low-performance varieties were exactly matched between the UAS-based and actual yields. Ultimately, the proposed variety selection/elimination process using UAV data increased the average tomato yield of the remaining varieties by 53 percent. The results from this work can be useful in breeding programs to select high-yielding and disease-/pest-resistant varieties for tomato fields. In the future, we will adopt AI algorithms to develop more elaborate models for yield prediction and variety selection in agricultural applications.

Data Availability

The data used to support the findings of this study have not been made available because they are accessible only to researchers and collaborators of Texas A&M AgriLife Research and Extension.

Conflicts of Interest

The authors declare no conflict of interest.

Acknowledgments

This research was supported by Texas AgriLife Research.