#### Abstract

Computer science and technology in the era of big data are closely tied to the development of modern agriculture, and applying information processing technology to aquaculture promotes its scientific development. Water quality directly affects aquaculture outcomes. Therefore, on the basis of a dynamic water quality monitoring model, the factors affecting water quality were analyzed and a prediction model of aquaculture water quality was constructed. Considering the complex relationship between dissolved oxygen and the other water quality factors, a PCA-BP (principal component analysis, back propagation) water quality prediction model was proposed. The thresholds and weights of the BP neural network in the PCA-BP model were then optimized by a genetic algorithm, yielding an improved model, GPCA-BP. The experimental results show that, in water quality prediction experiments at different times and in different regions, the relative error of the GPCA-BP model for dissolved oxygen content is less than 0.76%, the best prediction accuracy among the compared models. The GPCA-BP model also performs well in tests of convergence accuracy, prediction accuracy, and MAE. The research provides a useful reference for the application of information technology in modern aquaculture.

#### 1. Introduction

Traditional aquaculture faces problems such as outdated technology, frequent disease outbreaks, and low output, which hinder the development of modern aquaculture. In modern freshwater aquaculture, water quality is the key factor, and it depends on many variables; once water quality deteriorates, large numbers of fish can die. Therefore, to analyze the complex relationships among water quality factors, a water quality monitoring model was applied to the aquaculture environment. Based on a review of the literature and aquaculture materials, the model is built around dissolved oxygen monitoring, and a machine learning model of dissolved oxygen is constructed with big data and computer technology. The approach combines the Internet of Things with a water quality monitoring system: hardware monitors the hydrological environment, and machine learning algorithms predict and process the hydrological data. This includes predicting and analyzing the dissolved oxygen concentration, the oxygen consumption of aquatic organisms, and other parameters, and regulating the aquaculture environment according to preset concentration thresholds. With big data technology, aquaculture can be managed and controlled precisely, diseases can be prevented and controlled effectively, and aquaculture quality can be improved overall. The research combines big data technology and Internet of Things technology to achieve dynamic management and control of aquaculture, providing a reference for the development of modern agriculture.

#### 2. Related Work

With the development of computer science, big data mining technology is widely used in data classification, mining, electric power, and other areas of modern society. Researchers at home and abroad have studied big data mining algorithms extensively. An efficient filter-based scheme for massive data storage has been proposed, which uses fuzzy operations to move hash data from one filter to another to reduce storage requirements; experimental results show that the amount of data processed by this scheme is 1.9 times that of the standard [1]. Chen et al. found that big data systems in reinforcement learning have problems with randomness and reliability. They proposed a dynamic coherence quality metric based on an axiomatic framework and applied it to three empirical studies of wavelet-based big data systems; the results show that the scheme performs well in efficiency and robustness [2]. A combined dynamic DNA image encryption algorithm has been proposed in which three-neuron fractional discretization is used as a pseudorandom chaotic sequence generator. Experimental test results show that the algorithm has better performance than the algorithm reported in [3]. Deblais et al. noted that technological advances have enabled the generation and storage of large sets of information and used big data to examine foodborne pathogens in poultry; their results show that genomic approaches reveal the importance of gut microbiota in health and disease [4]. Transmitting data with limited energy is a challenge for industrial big data technologies. An approximation offline algorithm and an online algorithm with a competitive ratio of max{2, β} were therefore proposed, and the analysis shows that no online algorithm has a constant competitive ratio [5]. Ni et al. found that a cyberphysical system is effective for processing large amounts of heterogeneous data and proposed a joint network structure; experimental tests show that the method performs well [6]. Guan and Zhao studied the trajectories and behavior patterns of shrimp fishing boats in a fishing trajectory system and used big data technology to design a shrimp farm distribution management system based on a back propagation algorithm. Experimental results show that the solution can effectively monitor trajectories and shrimp distribution [7]. Kumar found that information data security affects people's daily work and designed a malware monitoring scheme using big data and machine learning technology; simulation results show that the accuracy of the proposed model reaches 99.8% [8]. Gu et al. used big data technology to analyze enterprise organizational resources and improve enterprise performance; the results show that an enterprise's data analysis ability directly affects supplier and individual performance development [9]. Another study analyzed the origin of Internet of Things technology and its current status at home and abroad and, drawing on the management concept of the Internet of Things, built a system with a variety of sensors, including identification and communication technologies; simulation tests show that the scheme can manage employees effectively [10].

Computer information processing technology has important application value in agricultural production, and big data algorithms have outstanding effects in aquaculture. Experts at home and abroad have conducted a lot of research on this. Shi et al. found that water quality directly affects fish survival during long-distance transportation. In the past research, the simulation was mainly aimed at the transportation time, which could not play an effective role. According to the research on aquatic water quality, it is found that parameters such as nitrite and dissolved oxygen affect the survival of fish. Therefore, through data collection, a water quality model is constructed through multiple layers of neural networks. After testing, the proposed model can effectively monitor the water quality status with low error and meet the requirements of aquatic product transportation [11]. Mathisen et al. conducted research on the development of modern aquaculture, and traditional aquaculture faces many challenges. Therefore, an aquaculture case system was developed and designed, based on the analysis of traditional cases, to provide data reference for modern aquaculture and to verify the feasibility of the scheme through the Siamese neural network. Tests show that the proposed scheme is reliable in aquaculture [12]. Li et al. found that dissolved oxygen is an important parameter for freshwater aquaculture, and the content of dissolved oxygen will affect the development of aquaculture. The traditional water quality testing scheme has the characteristics of multivariate, and the prediction effect is low. Therefore, a model is constructed by combining convolution and long- and short-term networks, and the model time series is processed. The test results show that the proposed scheme has better dissolved oxygen and oxygen prediction performance compared with the traditional scheme and at the same time has lower error performance, which meets the requirements of aquaculture [13]. 
Zhang et al. designed a monitoring system for aquaculture through research on the Internet of Things technology. The equipment includes sensors such as dissolved oxygen monitoring and water temperature monitoring, and they developed a communication platform based on a wireless communication network monitoring and processing. After testing, this scheme can effectively realize the transmission of data and monitor the water environment [14].

Research on related technologies at home and abroad shows that big data mining technology and artificial intelligence algorithm technology are widely used in the field of data analysis and prediction. The application of neural network technology in modern aquaculture has a positive impact on the development of aquaculture.

#### 3. Construction of Aquaculture Water Quality Prediction Model Based on Improved PCA-BP

##### 3.1. Research on the Prediction Model of Dissolved Oxygen in Water Quality

In modern aquaculture, the dissolved oxygen parameters of fish pond water will determine the quality effect of aquaculture. Therefore, aquaculture needs to master the content of dissolved oxygen in the pond to ensure the healthy survival of aquatic organisms in the fish pond [15]. The research on the dissolved oxygen level in fish ponds found that many factors in the fish ponds have an impact on the parameters of dissolved oxygen index [16]. Therefore, combining modern aquaculture technology with dissolved oxygen monitoring methods can achieve accurate prediction of dissolved oxygen in fish ponds.

Fish pond water quality prediction is a complex nonlinear mapping problem, and common prediction methods struggle to give accurate results. The principal component analysis (PCA) method is therefore combined with a BP (back propagation) neural network to handle the nonlinear mapping and predict aquaculture water quality effectively. PCA uses the covariance structure of the sample data to extract the main directions of variation and their weights [17]. Let $n$ be the total number of samples, $p$ the number of variables per sample, and $x_{ij}$ the $j$th variable of the $i$th sample; the initial sample matrix is then

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}. \tag{1}$$

PCA applies a linear transformation to the matrix that retains the information in the original samples while generating new variables. The new variables, denoted $F_1, F_2, \ldots, F_p$, are mutually uncorrelated, and each is a linear combination of the original variables:

$$F_i = a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{ip}x_p, \quad i = 1, 2, \ldots, p. \tag{2}$$

In Equation (2), $x_1, x_2, \ldots, x_p$ denote the variables (columns) of the matrix $X$. In matrix form, the linear combinations are the product of a coefficient matrix $A$ and the vector of original variables $x$:

$$F = A x. \tag{3}$$

In Equation (3), each row of the matrix $A$ is a unit row vector, $\sum_{j=1}^{p} a_{ij}^2 = 1$. The matrix $A$ is

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{21} & a_{22} & \cdots & a_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ a_{p1} & a_{p2} & \cdots & a_{pp} \end{pmatrix}, \tag{4}$$

and the vectors $F$ and $x$ are

$$F = (F_1, F_2, \ldots, F_p)^{\mathsf T}, \qquad x = (x_1, x_2, \ldots, x_p)^{\mathsf T}. \tag{5}$$

In Equations (3)-(5), $F_i$ and $F_j$ have covariance 0 for $i \neq j$. The rows of $A$ are ordered so that the variances of the linear combinations decrease from $F_1$ to $F_p$, which ranks the new variables by how much of the original variation they carry. When the cumulative variance of the first $m$ principal components exceeds 85% of the total variance of the original variables, these $m$ components can replace the original variables. The dimension of the data is thereby reduced, which simplifies data processing while retaining the original data information.
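
The 85% rule described above can be illustrated with a short NumPy sketch. It is not the authors' MATLAB implementation: the function name `pca_reduce`, the synthetic five-variable data set, and the use of the covariance matrix with `numpy.linalg.eigh` are all assumptions made for illustration.

```python
import numpy as np

def pca_reduce(X, threshold=0.85):
    """Project X (n samples x p variables) onto the fewest principal
    components whose cumulative variance share reaches `threshold`."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each variable
    cov = np.cov(Xc, rowvar=False)           # p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # re-sort from large to small variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    share = np.cumsum(eigvals) / eigvals.sum()
    m = int(np.searchsorted(share, threshold) + 1)  # first m with share >= threshold
    A = eigvecs[:, :m].T                     # rows are unit coefficient vectors
    return Xc @ A.T, A, share[:m]

# Hypothetical data: 5 correlated variables driven by 2 latent factors.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
X = np.column_stack([base[:, 0], 2 * base[:, 0], base[:, 1],
                     base[:, 0] + base[:, 1], -base[:, 1]])
X += 0.01 * rng.normal(size=X.shape)

F, A, share = pca_reduce(X)
print(F.shape, share)
```

Because the five synthetic variables are built from two latent factors, two components are enough to pass the threshold, and the projected columns of `F` are uncorrelated with each other.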

A BP neural network is a feedforward network trained by error back propagation; it learns the mapping between input data and output data. The network consists mainly of an input layer, a hidden layer, and an output layer. Figure 1 is a schematic diagram of the basic BP neural network structure [17].

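To illustrate the principle of the three-layer structure, let the input layer, hidden layer, and output layer of the BP neural network have $n$, $q$, and $m$ nodes, respectively, and let $K$ be the total number of data samples. The $i$th input value of the $k$th sample is $x_i^k$, the weight from the $i$th node of the input layer to the $j$th node of the hidden layer is $w_{ij}$, the threshold of the $j$th node of the hidden layer is $\theta_j$, and the weight from the $j$th node of the hidden layer to the $t$th node of the output layer is $v_{jt}$. The hidden layer node of the BP neural network is then

$$h_j^k = f\big(s_j^k\big), \qquad s_j^k = \sum_{i=1}^{n} w_{ij}\, x_i^k - \theta_j, \quad j = 1, \ldots, q. \tag{6}$$

In Equation (6), $h_j^k$ is the hidden layer neural node, $w_{ij}$ and $\theta_j$ are the hidden layer node parameters, and $f$ is the activation function. The nodes of the output layer are then computed as

$$y_t^k = f\big(u_t^k\big), \qquad u_t^k = \sum_{j=1}^{q} v_{jt}\, h_j^k - \gamma_t, \quad t = 1, \ldots, m. \tag{7}$$

In Equation (7), $y_t^k$ is the output layer neural node, and $v_{jt}$ and the threshold $\gamma_t$ are the output layer node parameters. The global error function of the network is

$$E = \frac{1}{2} \sum_{k=1}^{K} \sum_{t=1}^{m} \big(d_t^k - y_t^k\big)^2. \tag{8}$$

In Equation (8), $d_t^k$ is the ideal output and $y_t^k$ is the actual output. The weights from the hidden layer to the output layer are adjusted by gradient descent, with the learning rate $\eta$ taken in the range 0.1 to 0.3:

$$\Delta v_{jt} = -\eta\, \frac{\partial E}{\partial v_{jt}} = \eta \sum_{k=1}^{K} \delta_t^k\, h_j^k, \qquad \delta_t^k = \big(d_t^k - y_t^k\big)\, f'\big(u_t^k\big). \tag{9}$$

In Equation (9), $\Delta v_{jt}$ is the change applied to the weight $v_{jt}$. Propagating the error back one more layer gives the adjustment formula for the weights of the hidden layer neurons:

$$\Delta w_{ij} = -\eta\, \frac{\partial E}{\partial w_{ij}} = \eta \sum_{k=1}^{K} \Big( \sum_{t=1}^{m} \delta_t^k\, v_{jt} \Big) f'\big(s_j^k\big)\, x_i^k. \tag{10}$$

In Equation (10), $\delta_t^k$ is the output node error signal. The principle flow of the BP neural network is shown in Figure 2.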
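
As a rough illustration of the forward pass and weight adjustment described above, the following sketch trains a tiny three-layer network on a single sample in plain Python. The sigmoid activation, the 2-2-1 layer sizes, the starting weights, and the names `W`, `theta`, `V`, and `gamma` are assumptions chosen for the example; only the learning rate range (0.1 to 0.3) comes from the text.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W, theta, V, gamma):
    # Hidden node j: weighted input sum minus threshold, through the activation.
    h = [sigmoid(sum(W[i][j] * x[i] for i in range(len(x))) - theta[j])
         for j in range(len(theta))]
    # Output node t: weighted hidden sum minus threshold, through the activation.
    y = [sigmoid(sum(V[j][t] * h[j] for j in range(len(h))) - gamma[t])
         for t in range(len(gamma))]
    return h, y

def train_step(x, d, W, theta, V, gamma, eta=0.2):
    """One gradient-descent update on one sample; returns the error before the update."""
    h, y = forward(x, W, theta, V, gamma)
    error = 0.5 * sum((d[t] - y[t]) ** 2 for t in range(len(d)))
    # Output-node error signals (sigmoid derivative is y * (1 - y)).
    delta = [(d[t] - y[t]) * y[t] * (1 - y[t]) for t in range(len(d))]
    # Hidden-node error signals, computed before V is modified.
    e = [h[j] * (1 - h[j]) * sum(delta[t] * V[j][t] for t in range(len(d)))
         for j in range(len(h))]
    for j in range(len(h)):
        for t in range(len(d)):
            V[j][t] += eta * delta[t] * h[j]
    for t in range(len(d)):
        gamma[t] -= eta * delta[t]
    for i in range(len(x)):
        for j in range(len(h)):
            W[i][j] += eta * e[j] * x[i]
    for j in range(len(h)):
        theta[j] -= eta * e[j]
    return error

# Hypothetical starting parameters and one normalized training sample.
W = [[0.1, -0.2], [0.4, 0.3]]   # input-to-hidden weights
theta = [0.0, 0.1]              # hidden thresholds
V = [[0.2], [-0.3]]             # hidden-to-output weights
gamma = [0.05]                  # output threshold
x, d = [0.6, 0.9], [0.7]

errors = [train_step(x, d, W, theta, V, gamma) for _ in range(200)]
print(errors[0], errors[-1])
```

The error shrinks over the 200 updates, which is the behavior the adjustment formulas are designed to produce; a real model would loop over all samples rather than one.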

Dissolved oxygen content is an important indicator in aquaculture, but the factors that determine water quality are complex. The water quality monitoring model is therefore built on the BP network, which handles the nonlinear relationships among these factors and makes effective monitoring of dissolved oxygen possible. This subsection established the data model and analyzed the BP principle and the data processing flow.

##### 3.2. Construction of Aquaculture Water Quality Prediction Model Based on Improved GPCA-BP

Considering the complex relationship between the factors of aquaculture water quality, BP neural network is used to deal with complex nonlinear problems, which has powerful computing power, but the processing and computing effect of high-dimensional output variables is not ideal. Therefore, the idea of combining PCA and BP algorithm is adopted to reduce the dimension of the data and establish a new PCA-BP model. The principle of the model is shown in Figure 3.

The aquaculture water quality prediction model comprehensively considers the dissolved oxygen content together with the factors that affect it: the pH (hydrogen ion concentration) value of the water, water temperature, air humidity, ammonia nitrogen concentration, light, and wind [18]. To improve the accuracy of water quality testing, the factors affecting dissolved oxygen need to be fully analyzed. Let $x$ denote an original fish pond water quality value, and let $x_{\min}$ and $x_{\max}$ denote the minimum and maximum of the original fish pond data. The data range is compressed so that the data can be used in model testing:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}. \tag{11}$$

The data of each factor in the pond are compressed and then standardized according to Equation (12), whose terms are the input value and the output value of the normalization process and the input and output error signals.

Let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$ be the eigenvalues of the correlation matrix $R$ of the standardized data, with corresponding eigenvectors $e_1, e_2, \ldots, e_p$. The eigenvalues are obtained from the characteristic equation

$$\lvert R - \lambda I \rvert = 0. \tag{13}$$

The contribution of the $i$th principal component to the data is

$$c_i = \frac{\lambda_i}{\sum_{k=1}^{p} \lambda_k}. \tag{14}$$

The cumulative contribution of the first $m$ principal components is

$$C_m = \frac{\sum_{i=1}^{m} \lambda_i}{\sum_{k=1}^{p} \lambda_k}. \tag{15}$$

Then, according to Equations (11)-(15), the principal component loading matrix is calculated, and the expression of the dissolved oxygen influence factors, Equation (16), is obtained.
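
The preprocessing and contribution steps above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming min-max compression to the range 0 to 1 and the usual eigenvalue-share definition of contribution; the function names, the sample temperatures, and the eigenvalues are hypothetical.

```python
def min_max_compress(values):
    """Compress raw readings into [0, 1] using their minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant series: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def contribution_rates(eigenvalues):
    """Per-component and cumulative contribution, largest eigenvalue first."""
    total = sum(eigenvalues)
    rates = [lam / total for lam in sorted(eigenvalues, reverse=True)]
    cumulative, running = [], 0.0
    for r in rates:
        running += r
        cumulative.append(running)
    return rates, cumulative

def components_needed(eigenvalues, threshold=0.85):
    """Smallest number of components whose cumulative contribution reaches the threshold."""
    _, cumulative = contribution_rates(eigenvalues)
    for m, c in enumerate(cumulative, start=1):
        if c >= threshold:
            return m
    return len(cumulative)

temps = [24.0, 26.0, 25.0, 28.0]      # hypothetical water temperatures
eig = [3.0, 1.5, 0.3, 0.2]            # hypothetical eigenvalues
print(min_max_compress(temps))
print(contribution_rates(eig))
print(components_needed(eig))
```

With these made-up eigenvalues the first two components carry about 90% of the total, so two components would be kept under the 85% rule.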

In the above analysis, principal component analysis is used to reduce the dimensionality of the pond data to avoid the influence of the mutual interference between complex factors on the prediction effect of the model. However, the weight parameters and threshold parameters will directly affect the performance of the model, so consider using genetic algorithm (GA) to optimize the weights and thresholds of the BP neural network.

The GA searches for an optimum by simulating the process of biological evolution. To optimize the BP neural network, the parameters to be optimized are encoded as the genes of a GA chromosome. Multiple chromosomes form a population, and the best genes are selected according to the population fitness rules. Because fixed crossover and mutation probabilities make this selection difficult, both probabilities are adapted during the run so that the best genes are inherited. The adaptive crossover rate is given by Equation (17).

In Equation (17), the parameters are the maximum and minimum values of the crossover rate, a transformation constant with the value 9.9034, the change in individual fitness, and the maximum fitness. The adaptive mutation rate is given by Equation (18).

In Equation (18), the parameters are the maximum and minimum values of the mutation rate. The parameters of the BP neural network are optimized by the GA and are encoded as individuals; the encoding includes the input layer weights, the hidden layer thresholds, and the output layer parameters of the BP neural network. The individual chromosomes of the genetic algorithm are adjusted with the adaptive rates of Equations (17) and (18). At each step, it is checked whether the chromosome settings and the decoded thresholds and weights meet the requirements. If they do not, the fitness is recalculated in a loop until the best iterative result is obtained. The optimal chromosome finally obtained by the GA is decoded into the optimal initial thresholds and weights of the BP neural network. Figure 4 shows the principle of the improved GPCA-BP water quality prediction model after GA optimization.
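
The adaptive equations themselves are not reproduced here, so the sketch below shows one common sigmoid-shaped form from the adaptive-GA literature rather than the authors' exact formula. Only the constant 9.9034 comes from the text; the function name, the use of the population average fitness `f_avg`, the scaling to the interval from -1 to 1, and the rate bounds in the example are assumptions.

```python
import math

A = 9.9034  # transformation constant quoted in the text

def adaptive_rate(fitness, f_avg, f_max, rate_max, rate_min, a=A):
    """Assumed sigmoid-shaped adaptive rate.

    Individuals below the average fitness keep the maximum rate; above the
    average, the rate falls smoothly toward the minimum as fitness nears f_max.
    """
    if fitness < f_avg or f_max == f_avg:
        return rate_max
    # Map fitness in [f_avg, f_max] onto [-1, 1] before applying the sigmoid.
    scaled = 2.0 * (fitness - f_avg) / (f_max - f_avg) - 1.0
    return (rate_max - rate_min) / (1.0 + math.exp(a * scaled)) + rate_min

# Hypothetical bounds: crossover rate in [0.4, 0.9], mutation rate in [0.01, 0.1].
for f in (3.0, 5.0, 7.5, 10.0):
    pc = adaptive_rate(f, f_avg=5.0, f_max=10.0, rate_max=0.9, rate_min=0.4)
    pm = adaptive_rate(f, f_avg=5.0, f_max=10.0, rate_max=0.1, rate_min=0.01)
    print(f, round(pc, 4), round(pm, 4))
```

The design intent of such schemes is that strong individuals are disturbed less by crossover and mutation while weak individuals are recombined aggressively; the same function serves both rates with different bounds.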

The traditional BP model handles the complex relationships among water quality factors well, but it cannot cope with high-dimensional variables, so it tends to fall into local convergence and cannot monitor dissolved oxygen in the water effectively. PCA is therefore used to reduce the dimension of the high-dimensional water quality data. In addition, the choice of parameters affects the overall training of the BP model, so the genetic algorithm is used to optimize the BP parameters and improve the overall applicability of the model.

#### 4. Test of Aquaculture Water Quality Prediction Model

The performance of the improved PCA-BP algorithm is tested with a three-layer BP neural network, and the data are simulated and analyzed on the MATLAB R2012b platform. The operating system is Windows 10, the processor runs at 3.5 GHz, and the memory is 8 GB. Because the thresholds and weights of the traditional BP neural network are chosen randomly, the initial parameter values of the BP neural network are obtained through the GA. The population size is set to 40, the number of hidden layer nodes of the BP neural network to 22, the maximum number of GA iterations to 100, and the crossover rate to 0.6. Fish pond data collected over 12 days in May at a breeding base serve as the experimental samples. The data include the water temperature, humidity, pH, nitrogen, and oxygen concentration collected by the system, 3000 records in total. The data were normalized and preprocessed to obtain the experimental feature data. Part of the initial aquaculture water quality monitoring data is shown in Table 1.

Because the weight and threshold parameters of the PCA-BP water quality prediction model affect its performance, the GA was run for 100 iterations to obtain the optimal BP neural network parameters, which are shown in Table 2. The thresholds and weights of the BP neural network are taken from the output of the GA optimization.

The optimal initial parameters obtained through the GA are used as the training parameters of the BP model in the improved PCA-BP water quality prediction model, and the number of BP training iterations is set to 100.
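
To make the step from GA chromosome to BP parameters concrete, the following sketch decodes a flat list of genes into weight matrices and threshold vectors. The gene ordering, the function names, and the choice of 3 input nodes are assumptions for illustration; the 22 hidden nodes come from the experimental setup, and a single output node is assumed for the dissolved oxygen prediction.

```python
import random

def chromosome_length(n_in, n_hidden, n_out):
    """Number of genes: all weights plus all thresholds of a three-layer network."""
    return n_in * n_hidden + n_hidden + n_hidden * n_out + n_out

def decode(chromosome, n_in, n_hidden, n_out):
    """Split a flat chromosome into (W, theta, V, gamma) in an assumed fixed order."""
    expected = chromosome_length(n_in, n_hidden, n_out)
    if len(chromosome) != expected:
        raise ValueError(f"expected {expected} genes, got {len(chromosome)}")
    genes = iter(chromosome)
    W = [[next(genes) for _ in range(n_hidden)] for _ in range(n_in)]   # input-to-hidden weights
    theta = [next(genes) for _ in range(n_hidden)]                      # hidden thresholds
    V = [[next(genes) for _ in range(n_out)] for _ in range(n_hidden)]  # hidden-to-output weights
    gamma = [next(genes) for _ in range(n_out)]                         # output thresholds
    return W, theta, V, gamma

# Hypothetical sizes: 3 retained principal components, 22 hidden nodes, 1 output.
N_IN, N_HIDDEN, N_OUT = 3, 22, 1
rng = random.Random(0)
best = [rng.uniform(-1.0, 1.0) for _ in range(chromosome_length(N_IN, N_HIDDEN, N_OUT))]
W, theta, V, gamma = decode(best, N_IN, N_HIDDEN, N_OUT)
print(len(best), len(W), len(W[0]), len(theta), len(V), len(gamma))
```

In a full GA loop, each individual would be decoded this way, loaded into the network, and scored by its prediction error to obtain its fitness.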

The convergence accuracy of the models is compared first. As Figure 5(a) shows, the proposed GPCA-BP water quality prediction model has the best convergence performance among the traditional BP, PCA-BP, and GPCA-BP models: it converges after 10 training iterations, with a convergence accuracy value of 0.012. The traditional BP model performs worst; it tends to converge only after 18 iterations, with an accuracy value of 0.009. Figure 5(b) shows the prediction performance of the water quality prediction models. Since the traditional prediction model cannot adapt to high-dimensional sample data, only the PCA-BP and GPCA-BP models are compared. The GPCA-BP model predicts more accurately, with an overall fluctuation range of -0.02 to 0.03, while the traditional PCA-BP model fluctuates more, over the interval -0.04 to 0.05. The proposed GPCA-BP water quality prediction model therefore performs well. For the next test, the time period from 9:00 to 10:00 in May was selected, and 20 dissolved oxygen concentration tests were conducted on the data of the aquaculture fish ponds. Before the experiment, the GPCA-BP water quality prediction model reduced the dimension of the data and obtained its initial thresholds and weights through GA optimization. The predicted dissolved oxygen results for the fish ponds in this period are shown in Figure 6.

**Figure 5.** (a) Comparison results of convergence accuracy. (b) Error comparison results.

The red curve in the figure is the real measured value; a total of 20 groups of real data samples were measured. The data show that the dissolved oxygen content of the fish ponds differs between time periods. Temperature, light, ammonia nitrogen concentration, and pH value all influence dissolved oxygen content; these factors were ignored here, and the experimental average was taken to ensure the accuracy of the experiment. The prediction curve of the traditional BP neural network model differs markedly from the measured concentration curve. Sample 8 has the largest relative error, 1.26%; at this point, the actual value of dissolved oxygen in the fish pond was 7.76 mg/L, and the measured value was 8.06. The overall prediction performance of this model is poor. The prediction curve of the PCA-BP water quality prediction model is closer to the true value curve, but it still fluctuates considerably. The GPCA-BP water quality prediction model gives the best prediction: its curve is basically consistent with the actual value curve, and its overall fluctuation is small. In samples 2, 4, 6, and 8, the measured dissolved oxygen concentrations were 6.95 mg/L, 7.42 mg/L, 7.16 mg/L, and 7.68 mg/L, respectively, with a relative error of less than 0.8%. This shows that the GPCA-BP model predicts well in practice. However, this test covers a short time range and few samples. Therefore, the MAE performance in different time periods and different fish pond waters was also tested, and the test results are shown in Figure 7.

**Figure 7.** (a) Test location 1. (b) Test location 2. (c) Test location 3.

Three representative areas of the pond were selected for model error performance testing. Figure 7(a) shows the water quality test results for the feeding area. The maximum error occurs at 15:00. Because the feeding area is easily affected by complex factors, the results of all three models fluctuate strongly at this time, mainly owing to air temperature and water temperature. Nevertheless, the proposed GPCA-BP water quality prediction model has the best MAE performance, with an MAE below 0.196 at 15:00. Figure 7(b) shows the water intake area of the fish pond, which has the fewest influencing factors. The high dissolved oxygen content in the water intake area reduces the interference of sunlight and temperature with the test results, so the prediction accuracy of the PCA-BP and GPCA-BP models improves. The traditional BP water quality prediction model, however, cannot handle complex high temperature data; its overall error is large, with an MAE greater than 0.34. Figure 7(c) shows the central area of the fish pond, where the influencing factors are relatively balanced, so it reflects the actual prediction performance for most fish ponds. There, the traditional BP neural network reaches its maximum MAE, 0.379, at 15:00, the worst prediction error performance. The PCA-BP prediction model reaches its maximum MAE, 0.286, at 7:00, and the GPCA-BP water quality prediction model reaches its maximum MAE, 0.192, at 13:00. The proposed GPCA-BP model thus predicts accurately in different time periods and different regions. Finally, 20 water quality tests were carried out on fish ponds at different times and in different regions. The results of each water quality prediction model are shown in Table 3.
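
The per-time-slot error comparison can be reproduced with a small helper. The sketch reads MAE as mean absolute error, its usual expansion, which is an assumption because the text does not define it; the readings below are made-up values, not data from the experiment.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between measured and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical dissolved oxygen readings (mg/L) grouped by time of day:
# each entry holds (measured values, model predictions).
readings = {
    "07:00": ([6.9, 7.1], [7.0, 7.0]),
    "13:00": ([7.8, 7.6], [7.7, 7.8]),
    "15:00": ([8.2, 8.0], [7.9, 8.3]),
}

mae_by_slot = {slot: mean_absolute_error(a, p) for slot, (a, p) in readings.items()}
worst_slot = max(mae_by_slot, key=mae_by_slot.get)
print(mae_by_slot, worst_slot)
```

Computing the metric per slot, as here, is what allows statements such as "the maximum error occurs at 15:00" to be made for each model and test location.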

It can be seen from the table data that the GPCA-BP water quality prediction model has the best prediction performance for dissolved oxygen content: the relative error between the actual value and the predicted value is reduced to less than 0.76%, while the BP model and the PCA-BP model have the largest errors, 1.96% and 1.16%, respectively. In summary, compared with the PCA-BP model, the GPCA-BP model performs better and has lower error. In actual aquaculture prediction, its accuracy is high, its predicted values are close to the actual values, and its water quality prediction is the best of the three models, which meets the requirements of modern aquaculture.
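
The relative error used in this comparison is assumed here to be the absolute difference between predicted and actual values divided by the actual value, expressed as a percentage. The sketch below applies that definition to hypothetical readings; none of the numbers are taken from Table 3.

```python
def relative_error_percent(actual, predicted):
    """Relative error of one prediction, in percent of the actual value."""
    if actual == 0:
        raise ValueError("actual value must be non-zero")
    return abs(predicted - actual) / abs(actual) * 100.0

# Hypothetical dissolved oxygen values in mg/L.
actual = [8.00, 7.50, 6.40]
predicted = [7.94, 7.56, 6.42]

errors_pct = [relative_error_percent(a, p) for a, p in zip(actual, predicted)]
worst_index = max(range(len(errors_pct)), key=errors_pct.__getitem__)
print([round(e, 3) for e in errors_pct], worst_index)
```

Reporting the maximum of these per-sample values is one way to arrive at a bound of the kind quoted above, such as "less than 0.76%".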

The traditional BP model, the PCA-BP model optimized by principal component analysis, and the GPCA-BP model optimized by both principal component analysis and the genetic algorithm were compared. In the basic model performance test, the hybrid GPCA-BP model outperforms the other two models in both iteration count and convergence. In water quality detection, it has better prediction accuracy and lower error across different time periods and water quality environments. Aquaculture places very high demands on water quality; differences in dissolved oxygen concentration, pH value, water temperature, and similar factors directly affect the survival of fish. Overall, the GPCA-BP model better meets the requirements of aquaculture development. Compared with the traditional water quality detection scheme, the proposed scheme combines machine learning and data mining technology and improves the traditional BP model, giving better overall monitoring accuracy.

#### 5. Discussion and Conclusion

Modern aquaculture is inseparable from advanced intelligent technology, and the application of information technology in aquaculture is of great significance to the development of modern fisheries. Dissolved oxygen content is an important criterion for measuring aquaculture water quality. The factors affecting water quality were analyzed, and a BP neural network water quality prediction model was constructed. Because water quality monitoring data are high-dimensional, principal component analysis was used to reduce their dimension, giving the PCA-BP water quality prediction model. The weight and threshold parameters of the PCA-BP model were then optimized by a genetic algorithm, giving the improved GPCA-BP water quality prediction model. The experimental results show that, in the model performance test, the GPCA-BP model converges faster than the BP and PCA-BP models: it tends to converge after 10 iterations, with a convergence accuracy of 0.012, whereas the traditional BP model, the worst performer, tends to converge after 18 training iterations, with a convergence accuracy value of 0.009. In the dissolved oxygen prediction for the fish ponds in the selected period, the GPCA-BP predictions for samples 2, 4, 6, and 8 were 6.95 mg/L, 7.42 mg/L, 7.16 mg/L, and 7.68 mg/L, respectively, each with a relative error of less than 0.8%, the best prediction accuracy. In the error tests across different periods and regions, the MAE value of the GPCA-BP water quality prediction model is less than 0.196, the best among the BP, PCA-BP, and GPCA-BP models. Compared with the traditional aquaculture water quality monitoring scheme, the proposed scheme is therefore more intelligent and more accurate, and the combined learning scheme can handle more complex water quality variables than traditional schemes can. However, the research also has limitations: the relationships between pH, temperature, and other data still need to be considered more fully, which is left for later work.

#### Data Availability

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

#### Conflicts of Interest

The authors declare no conflict of interest in this article.

#### Acknowledgments

In 2020, “Intelligent Aquaculture System Based on Artificial Intelligence Technology” was funded by the Special Project of Key Areas of Colleges and Universities in Guangdong Province (Rural Revitalization) (Project No. 2020ZDZX1077). In 2022, “Research on Key Technologies of Adaptive Intelligent Oxygenation Control System” was funded by the Special Project of Key Areas of Colleges and Universities in Guangdong Province (Rural Revitalization) (Project No. 2022ZDZX4105). In 2021, “Development and Application of Zhuhai Baijiao Sea Bass Intelligent Breeding System Under the Background of Rural Revitalization” was funded by the Social Development Science and Technology Plan Project in Zhuhai (Project No. ZH22036201210141PWC). In 2021, Zhuhai City Polytechnic supported the Natural Sciences Key Scientific Research Project “Research on DO Intelligent Monitoring and Prediction System Based on Deep Learning Algorithm in Complex Breeding Environment” (Project No. KY2021Z02Z).