Abstract

Artificial neural networks (ANNs) in conjunction with genetic algorithms (GAs) have been demonstrated to be an effective tool for system modelling and optimization in a variety of applications. This communication assesses the capacity of an ANN to predict the investment in cattle up to the age at first calving (AFC) and the milk production, based on data from 340 Vrindavani crossbred cattle developed at the ICAR-Indian Veterinary Research Institute in Izatnagar, India. Three ANN training algorithms, namely, Levenberg–Marquardt (LM), Bayesian regularization (BR), and gradient descent with momentum and adaptive learning rate backpropagation (GDX), were used to train the ANN infrastructure for determining milk production and investment with body weight and AFC as input variables. BR with 2 hidden layer neurons showed excellent prediction ability ($R^2$ = 0.999, MSE < $10^{-6}$), and the resulting model was therefore used as the objective function by GA for optimization. The optimized results revealed that higher milk production is achievable at lower investment if the age at first calving is 768 days with a body weight of ∼281 kg. The information generated by this investigation will aid in ensuring food security in terms of higher milk production while making the dairy business more sustainable and profitable for farmers.

1. Introduction

India, with 193.46 million cattle (50.42 million crossbred/exotic and 142.11 million indigenous) and a buffalo population of 109.85 million [1], is the largest producer of milk in the world (209 million tonnes with a growth rate of 5.81%). The income of Indian dairy farmers is mainly generated from the volume of milk produced and its fat composition. For this reason, most dairy cattle breeding programmes focus primarily on milk yield and milk composition. Crossbred cattle have played a critical role in meeting India’s growing demand for milk. Despite accounting for only 20.7% of India’s dairy herd, crossbreds contribute 26% of the country’s annual milk production of 209 million tonnes [1].

Although dairy farm units maintain official milking and farm records for their herds through monthly testing during lactation, there is a lack of understanding of what the optimum age at first calving and body weight of cattle should be for maximizing the milk yield at minimum investment. A few researchers have attempted to identify the age at first calving corresponding to higher milk production for other cattle such as Holstein [2] and Karan–Fries [3]. However, no similar work has been reported yet for Vrindavani, a new and emerging cattle crossbreed developed at ICAR-IVRI, India. Accurate prediction of milk production at lower investment is of prime importance for improving farmers’ economic status and productivity as well as for ensuring food security. To achieve this, several modelling and optimization strategies have been investigated over the past decade. Among these techniques, machine learning and evolutionary algorithms have gained tremendous popularity due to their robustness and wide applicability in animal science [4, 5].

Artificial neural networks (ANNs) are multiparametric empirical models that are capable of recognising complicated patterns in data. ANNs are used for the evaluation of different parameters and are able to conduct classification, estimation, prediction, and simulation on new data from identical or similar sources [6, 7]. An ANN model’s fundamental architecture consists of an input layer, one or more hidden layers (HLs), and an output layer (OL). Each layer is made up of a number of neurons, and the connections between neurons link the layers into a network. The connection strengths, or weights, are first assigned at random and then change as the network is trained. A large number of data sets ensures that the ANN model is adequately trained. A transfer function quantifies the relationship between the input and output signals of each layer. A training function is required for the network to learn and adapt to the learning rate [8]. The goal is to select a training function with a short processing time and good backpropagation performance [9]. After that, the trained model is tested and validated, and the prediction data are acquired by simulating the model. The developed model can then be used as an objective function for the optimization of the required parameters [10].

Genetic algorithm (GA) is an evolutionary algorithm (EA) based on Darwin’s theory of natural selection. It combines and matches the independent variables to produce superior “offspring,” boosting the model’s flexibility, efficiency, and efficacy. GA uses an objective function to perform three operations on a generated population: selection, crossover, and mutation. First, strings are selected stochastically according to their relative fitness. They are then allowed to enter the mating zone, where crossing sites are picked at random and crossover is applied with a predetermined probability. The term “crossover” refers to the swapping of bit values (genes), i.e., 1 for 0 and vice versa, between the selected strings. This process is repeated until the required population size is reached. Finally, mutation modifies randomly chosen genes within the new population with a predetermined probability of mutation. As a result, the new generation tends to have a higher average fitness level. Detailed descriptions of the stages involved in a genetic algorithm, from initiation to termination, can be found in the literature [11–14].
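The selection, crossover, and mutation cycle described above can be illustrated with a minimal Python sketch. It is not the implementation used in this study: the function name `run_ga`, the bit-string length, the population size, and the mutation probability are illustrative assumptions, and tournament selection is used here as one simple form of fitness-based selection.

```python
import random


def run_ga(fitness, n_bits=16, pop_size=20, generations=50,
           p_crossover=0.7, p_mutation=0.01, seed=0):
    """Minimal binary GA that maximizes `fitness` over bit strings.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Random initial population of bit strings.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def select():
        # Tournament of size two: the fitter of two random strings wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1, p2 = select()[:], select()[:]
            # One-point crossover at a random site, applied with p_crossover.
            if rng.random() < p_crossover:
                cut = rng.randint(1, n_bits - 1)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (p1, p2):
                # Bit-flip mutation, applied gene by gene with p_mutation.
                for i in range(n_bits):
                    if rng.random() < p_mutation:
                        child[i] = 1 - child[i]
                children.append(child)
        pop = children[:pop_size]
        gen_best = max(pop, key=fitness)
        if fitness(gen_best) > fitness(best):
            best = gen_best
    return best
```

For example, `run_ga(sum)` searches for the bit string with the most ones; any other function that scores a bit string can be passed as `fitness`.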

Considering the role of milk production and yield in the sustenance of food security and the application of advanced techniques such as ANN and GA in data prediction, the objective of this investigation was to develop an intelligent model where the optimum age at first calving and body weight can be identified corresponding to the maximum milk production at lowest investment, taking the case study of an emerging crossbreed cattle, Vrindavani.

2. Materials and Methods

2.1. Location and Climatic Variables

This research work was carried out at the Indian Veterinary Research Institute’s cattle and buffalo farm in Izatnagar (India), which is situated at an altitude of 169.2 metres above mean sea level, with a latitude of 28°22′ north and a longitude of 79°24′ east. The location is on India’s upper Gangetic plain [15]. The mean annual temperature is about 21°C. The average monthly temperature fluctuates between 13°C in January and 30°C in May, with extreme temperatures ranging from 5°C to 40°C.

2.2. Origin of Vrindavani Cattle

Vrindavani cattle are an emerging synthetic crossbred cattle strain of India developed in the year 2006 (Figure 1). Developed at the Indian Veterinary Research Institute, Izatnagar, Bareilly, India, the strain has the exotic inheritance of Holstein–Friesian (HF), Brown Swiss (BSW), and Jersey and the indigenous inheritance of Hariana cattle [16]. This initiative was later renamed the All-India Co-ordinated Research Project (AICRP), with the goal of developing a dairy cattle breed capable of producing 3000 kg or more of milk per lactation over the course of its life. Vrindavani is a four-breed synthetic crossbred cattle strain established in India by combining 1/2 Hariana × 1/2 HF, 1/4 Hariana × 1/2 HF × 1/4 BSW, and 1/4 Hariana × 1/2 HF × 1/4 Jersey. Currently, Vrindavani males are used for the collection and freezing of semen, which is used in the field for producing progeny in subsequent generations. This crossbreed is well adapted to the Rohilkhand region of India, which is the study area of this investigation.

2.3. Data Collection

For this study, data on birth weight, body weight, age at first calving, first calving milk output, feeding cost, and total investment were used. Data on birth were collected for 340 Vrindavani cattle from 2012 to 2020 (7 years) and on age at first calving from 2015 to 2020 (5 years). The investment made on each animal from birth until the age at first calving was calculated using the information on feed supplied (dry fodder, green fodder, and concentrate) and general maintenance of the adult livestock unit (ALU). ALU is the numerical representation of cattle in a herd based on the live weight of a mature cow (400 kg considered for this study) and was used for determining the feed requirements. The amount of feed supplied to the cattle at different levels of maturity and the ALU considered are given in Table 1.

2.4. Data Preprocessing

To promote better training of the ANN model, a min–max normalisation strategy was used to reduce the variation between the data sets of the independent variables. Equation (1) was used to carry out the data normalisation,

$$x_n = \frac{x - x_{\min}}{x_{\max} - x_{\min}}, \tag{1}$$

where $x$ represents the actual variable value, $x_n$ is the normalised variable value, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum variable values, respectively.
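A normalisation defined through the minimum and maximum variable values, as described above, can be sketched in a few lines of Python. The function names and the sample body weights are hypothetical, and the scaling to the range 0–1 is an assumption based on the standard min–max form.

```python
def min_max_normalise(values):
    """Scale a list of numbers to the range [0, 1] using its min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; cannot normalise")
    return [(v - lo) / (hi - lo) for v in values]


def denormalise(x_norm, lo, hi):
    """Map a normalised value back to the original scale."""
    return lo + x_norm * (hi - lo)


# Hypothetical body weights in kg, for illustration only.
body_weights = [200.0, 250.0, 300.0]
print(min_max_normalise(body_weights))
```

Keeping the minimum and maximum of each variable makes it possible to convert optimized values on the normalised scale back to days and kilograms with `denormalise`.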

2.5. Data Modeling and Optimization
2.5.1. Second-Order Modelling

Using the regression function in the data analysis tool pack of MS Excel v. 2012, a second-order model was built between the independent variables (age at first calving, body weight) and the dependent variables (milk yield, investment). Equation (2) is the mathematical expression of the second-order model,

$$Y = \beta_0 + \sum_{i=1}^{2} \beta_i X_i + \sum_{i=1}^{2} \sum_{j=i}^{2} \beta_{ij} X_i X_j, \tag{2}$$

where $\beta_0$, $\beta_i$, and $\beta_{ij}$ are the model coefficients, $X_1$ and $X_2$ are the variables associated with age at first calving (AFC) and body weight (BW), and $Y$ represents the response variable.
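As an illustration of how a second-order model in two variables can be fitted by least squares, the following Python sketch builds the intercept, linear, squared, and interaction terms and solves the normal equations. It is a stand-in for the spreadsheet regression tool rather than a reproduction of it, and all function names and the synthetic coefficients are assumptions.

```python
def design_row(x1, x2):
    """Terms of a full second-order model in two variables."""
    return [1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2]


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_second_order(x1s, x2s, ys):
    """Least-squares coefficients via the normal equations (X^T X) b = X^T y."""
    rows = [design_row(a, b) for a, b in zip(x1s, x2s)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_linear(xtx, xty)


# Synthetic check: recover known coefficients from noise-free data.
true_beta = [1.0, 2.0, -3.0, 0.5, 4.0, -1.5]
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
x1s = [a for a in grid for b in grid]
x2s = [b for a in grid for b in grid]
ys = [sum(c * t for c, t in zip(true_beta, design_row(a, b)))
      for a, b in zip(x1s, x2s)]
print(fit_second_order(x1s, x2s, ys))
```

The returned coefficients follow the order of `design_row`: intercept, the two linear terms, the two squared terms, and the interaction term.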

2.5.2. Artificial Neural Network (ANN) Modelling

The data were subjected to ANN modelling in MATLAB v. 2012a (MathWorks, Inc., USA). The ANN infrastructure consisted of an input layer and an output layer with two neurons each and a hidden layer with a varying number of neurons. A tan-sigmoid transfer function (TANSIG) was used to process the input signal to the hidden layer (selected by a trial-and-error approach), while a linear transfer function (PURELIN) was used on the output layer. The data were split randomly using the dividerand command in the ratio of 70 : 30 between training and testing. The effectiveness of the Bayesian regularization (BR), Levenberg–Marquardt (LM), and gradient descent with momentum and adaptive learning rate backpropagation (GDX) training algorithms was assessed individually. Models with varying numbers of hidden layer neurons (HLNs) and training algorithms were created, and the best one was chosen based on the highest correlation coefficient (R) and the lowest MSE. The termination criterion for model training was set as 1000 epochs or an error tolerance of $10^{-6}$, whichever was reached earlier. The model weights and biases were collected and substituted into equations (3)–(7) to create the final ANN model. The input layer signal to the hidden layer neurons ($HI_j$) was defined as

$$HI_j = w_{1j} x_1 + w_{2j} x_2 + b_j, \tag{3}$$

where $x_1$ and $x_2$ are the input parameters, $j$ is the neuron number (1 or 2) in the hidden layer, $w_{1j}$ and $w_{2j}$ are the network weights connecting the input and hidden layers, and $b_j$ is the network bias.

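The output signal from the hidden layer after processing ($HO_j$) can be expressed as equation (4) and can be expanded further as shown in equation (5).

$$HO_j = \mathrm{tansig}(HI_j) \tag{4}$$

$$HO_j = \frac{2}{1 + e^{-2 HI_j}} - 1 \tag{5}$$

The input signal to the output layer was processed by the PURELIN transfer function, and the final outputs for milk production and investment were expressed as equations (6) and (7), respectively,

$$\text{Milk production} = \sum_{j} v_{j1} HO_j + c_1 \tag{6}$$

$$\text{Investment} = \sum_{j} v_{j2} HO_j + c_2 \tag{7}$$

where $v_{j1}$ and $v_{j2}$ are the network weights connecting hidden layer neuron $j$ to the two output neurons, and $c_1$ and $c_2$ are the output layer biases.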

The final models were used as objective functions for performing data optimization using a genetic algorithm.
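The structure of such a network (two inputs, a tan-sigmoid hidden layer, and a linear output layer with two outputs) can be sketched as a forward pass in Python. The weights and biases below are hypothetical placeholders, not the trained values from this study, and the function names are assumptions.

```python
import math


def tansig(n):
    """Tan-sigmoid transfer function, 2 / (1 + exp(-2n)) - 1 (equal to tanh(n))."""
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: inputs -> tan-sigmoid hidden layer -> linear output layer.

    w_hidden[j] holds the input weights of hidden neuron j;
    w_out[k] holds the hidden-layer weights of output neuron k.
    """
    hidden = [tansig(sum(w * xi for w, xi in zip(w_row, x)) + b)
              for w_row, b in zip(w_hidden, b_hidden)]
    # Linear (purelin-style) output: weighted sum plus bias, no squashing.
    return [sum(w * h for w, h in zip(w_row, hidden)) + b
            for w_row, b in zip(w_out, b_out)]


# Hypothetical weights for a 2-2-2 network, for illustration only.
w_hidden = [[0.4, -0.2], [0.1, 0.3]]
b_hidden = [0.05, -0.1]
w_out = [[0.8, -0.5], [0.6, 0.9]]
b_out = [0.2, 0.1]

# Inputs are normalised AFC and body weight; outputs are the two responses.
print(forward([0.5, 0.7], w_hidden, b_hidden, w_out, b_out))
```

Because the trained network reduces to this closed-form expression, it can be called directly as an objective function by an optimizer.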

2.5.3. Genetic Algorithm (GA) Based Data Optimization

The MATLAB optimization toolbox’s “ga-multiobjective” function was used to carry out multiobjective optimization. The goal of the optimization was to maximize milk output while minimizing the investment. Because GA minimizes the objective function by default, the milk yield model was given a negative sign, so that minimizing the negated model drives the milk yield towards larger values. The optimization was terminated when the number of generations reached 400 or the distance between individuals in a generation fell below $10^{-6}$, whichever occurred earlier.
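The sign convention and the idea of a non-dominated (Pareto) solution set can be illustrated with a short Python sketch. It is not the MATLAB routine used in the study: the function names and the candidate values are hypothetical, and only the filtering of non-dominated points is shown, not the genetic search itself.

```python
def objectives(milk_yield, investment):
    """Both objectives in minimization form: milk yield is negated."""
    return (-milk_yield, investment)


def dominates(a, b):
    """True if a is no worse than b in every objective and better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (milk yield, investment) pairs on a normalised scale.
candidates = [(0.60, 0.30), (0.90, 0.80), (0.50, 0.40), (0.60, 0.50)]
front = pareto_front([objectives(m, c) for m, c in candidates])
print(front)  # [(-0.6, 0.3), (-0.9, 0.8)]
```

The two surviving points represent a trade-off: one gives more milk at higher investment, the other less milk at lower investment, and a final choice between them needs an additional preference.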

3. Results and Discussion

3.1. ANN Model

Age at first calving and body weight were taken as input variables to the ANN model, with milk production and investment as outputs. To avoid excessive network complexity, the number of HLNs was taken in the range of 1–2. Figure 2 depicts the topology of the model used to forecast milk output and investment.

The number of iterations was limited to 1000 based on preliminary model runs, which revealed no further improvement in target accomplishment beyond 1000 iterations. The prediction capability of the ANN model in terms of $R^2$ and MSE was found to be better with 2 HLNs than with 1 HLN. Furthermore, BR performed significantly better than the other training algorithms when 1 HLN was used, while LM and BR showed comparable performance in terms of $R^2$ when 2 HLNs were used. Dongre et al. [17] also reported a similar performance of LM and BR for the prediction of milk yield in Sahiwal cows. However, in the current study, BR showed lower MSE than LM with 2 HLNs and was therefore selected as the training algorithm for developing the final model. Similarly, Akilli and Atil [18] reported a better prediction ability of BR over other training algorithms for determining 305-day milk yield. Tables 2 and 3 show the comparative evaluation of all three training algorithms (BR, LM, and GDX).
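The two comparison metrics can be computed as in the following Python sketch. The definition of $R^2$ as the coefficient of determination is an assumption (it could equally be the squared correlation coefficient reported by the training software), and the sample values are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical normalised targets and predictions, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 2.0, 3.0, 3.0]
print(mse(observed, predicted), r_squared(observed, predicted))
```

A model with a higher $R^2$ and a lower MSE on the same data reproduces the observations more closely, which is the selection rule applied to the candidate networks.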

The ANN models for milk production and investment were determined from equations (8) and (9) as

3.2. Second-Order Model and Performance Comparison with ANN

Similar to the ANN model, AFC and BW were used as independent variables, with milk production and investment as dependent variables, in the second-order model. The $R^2$ for the milk production model was determined as 0.062, while it was 0.999 for the investment model (Table 4). The investment was found to be significantly influenced by the linear terms of BW and AFC and by the nonlinear term of BW, with no interaction effect between the variables. Unlike AFC, a gain in BW can influence the investment up to a certain point, after which it may be unaffected or even decline. This also suggests that optimising BW may be more essential in terms of investment than optimising AFC.

The ANN model outperformed the second-order model, as evidenced by its higher $R^2$ value of 0.999 and lower MSE. Furthermore, instead of utilising two distinct second-order models, which increases the computational effort, a single ANN model was sufficient to forecast both milk production and investment. Another benefit of the constructed ANN model is its capacity to adapt to new data, if supplied at a later stage, which is unlikely with second-order models. Dongre et al. [17] and Bhosale and Singh [19] also reported superior performance of ANN models over multiple regression models for the prediction of milk production.

3.3. Optimization by GA

A constraint-dependent generation function was used to generate a population of size 340 at random. The upper and lower bounds of the levels of process variables were used to set the constraints in the following order: milk yield, investment, with the lower bound corresponding to [0, 0] and the upper bound corresponding to [1, 1]. The tournament technique, with a default size of two, was used to choose the individuals within the population for crossover. A heuristic crossover function was used with a crossover probability of 0.7. For population mutation, an adaptive feasible function produced better optimized solutions than a constraint-dependent function, and hence the former was used. The search algorithm’s termination criterion was set at 400 generations or a functional tolerance level of $10^{-6}$, whichever was reached earlier. The optimum fitness values increased over generations until they partially converged at roughly 200 generations. The search method was allowed to run for 400 generations, yielding 11 solution sets (Table 5), out of which the best solution was chosen based on higher milk yield and lower investment.

The ideal points for both generations are located in the Pareto front (Figure 3(a)). The average Pareto spread, or spread of the solutions, was plotted with increasing generations which indicated that the average spread of solutions was smaller than 0.1, marking the onset of solution convergence (Figure 3(b)). Here, convergence was defined as the solutions becoming more similar and eventually achieving the same values. The spread across generations grew by 200–300%. The overall best fit for the multi-objective optimization problem was selected from the final solution set.

It was observed that if the age at first calving of a cow is 768 days and the body weight is ∼281 kg, the milk production could reach as high as 2660 kg (2595.31 L) at a relatively lower investment (₹113,906.38 or ∼$1430). Although the results of Table 5 suggest that higher milk production (5700–6700 kg) is possible, it greatly increases the cost as well as the age at first calving, which ultimately becomes unsustainable. For cross-validation of the optimized results, cattle with age at first calving in the range of 760–800 days were selected. It was observed that their body weight ranged from 200.6 to 270.5 kg. The corresponding milk production was found to increase from ∼1500 kg (at 760 days AFC) to 2600–3500 kg (at 783 days AFC) for varying body weights and subsequently reduced at higher AFC. The actual body weight at an AFC of 783 days, at which the highest milk production was recorded (∼3500 kg), was 240.3 kg, which is lower than the optimum value. Lowered body weight would increase the investment, thus affirming the importance of the optimum points reported in this investigation.

Currently, farmers do not have any technique to determine what the age at first calving and corresponding body weight should be for higher and economic milk production. The work presented here provides a method and a model to determine what the optimum or rather lowest age at first calving should be for increasing the lactation length while maintaining higher body weight, and ultimately increasing the milk production. Since the researchers in the agricultural sector are in constant touch with small and marginal farmers, they will be able to provide and convey the results of this study to the farming community in a way that the latter can understand. This work, therefore, intends to uplift the socio-economic status of the farmers.

4. Conclusion

In this work, the age at first calving and cattle body weight were used to predict the milk yield and investment in cattle through a machine learning approach. An ANN infrastructure with the BR training algorithm and 2 HLNs was found suitable for the prediction, with an $R^2$ of 0.999 and an MSE of $9.39 \times 10^{-7}$. The optimized solution set determined by GA (AFC of 768 days with a body weight of 280.89 kg) corresponded to the lowest investment in cattle with the highest milk yield (milk yield of ∼2595 L at ₹113,906.38 investment). Furthermore, the work revealed that the ANN model outperformed the second-order model and, therefore, may be well suited for forecasting other livestock production and management parameters. The results of this investigation will help dairy business operators and farmers to increase their milk production and revenue, given the high global demand for milk and milk products.

Data Availability

All relevant data pertaining to this work have been provided within the article. Any additional data can be made available on reasonable request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.