#### Abstract

Establishing the indoor and outdoor humidity values in a greenhouse allows us to describe the crop yield during its entire developmental cycle. This study seeks to develop a predictive model of indoor relative humidity values in a greenhouse with high accuracy and interpretability through the use of optimized fuzzy inference systems, in order to offer greenhouse users a clear and simple description of their behaviour. The three-phase methodology applied made use of descriptive statistics techniques, correlation analysis, and the prototyping paradigm for the iterative and incremental development of the predictive model, validated through error measurement. The research resulted in six models which define the behaviour of humidity as a function of temperature, CO_{2}, and soil moisture, with effectiveness percentages above 90%. The implementation of a Mamdani-type fuzzy inference system, optimized by a hybrid method combining genetic and interior point algorithms, made it possible to predict the relative humidity in greenhouses with high interpretability and precision, with an effectiveness percentage of 90.97% and a mean square error (MSE) of 8.2e-3.

#### 1. Introduction

Greenhouse farming uses facilities and technologies that meet crop growth and development requirements, with the aim of establishing or improving environmental meteorological factors to provide a suitable growing environment. This makes it possible to reduce dependence on the natural environment, protect plantations from bad weather, diseases, and pests, increase production and yield, and use less land area for agricultural activity [1–6].

Similarly, it has been shown that crop production is affected by the climate generated inside the greenhouse, which is influenced by external climatic conditions such as wind speed, solar radiation, outdoor temperature, and humidity; by the structure of the greenhouse; by the type and state of the crop; and by activation control signals such as ventilation, heating, and CO_{2} injection, which influence photosynthesis and evaporative cooling to enrich humidity and decrease air temperature [1–3].

The greenhouse climate then becomes a time-varying, nonlinear, and distributed parameter system, with multiple interactions between system inputs and outputs. It presents a dynamic behavior that involves the combination of physical processes such as energy transfer (radiation and heat) and mass transfer (water vapor fluxes and CO_{2} concentration) that take place inside and outside the greenhouse [5–11].

Some proposals have modeled the greenhouse environment through differential equations for air temperature, humidity, and CO_{2} concentration, deriving the appropriate mass and energy balances for indoor temperature and indoor humidity [1, 3, 8, 12–15]. However, these models are difficult to describe mathematically, especially when the structure of the system and the relationships between variables are not well known or are too complex. To overcome these limitations, artificial intelligence (AI) techniques such as artificial neural networks (ANNs), fuzzy logic (FL), and genetic algorithms (GAs) are used, which enable the development of various control and modeling techniques [6, 14].

The main environmental factors affecting crops are humidity, temperature, and shade conditions. Proper management of these conditions increases the degree of vitality and yield of plants [13, 16]. Control and modeling of humidity for greenhouse climate management plays a key role here, as it improves the biological processes of plants in relation to transpiration and photosynthesis, which translate into a positive effect on crop growth and production [3, 6, 9].

The use of fuzzy inference systems constitutes a framework for modeling complex nonlinear relationships that provides advantages such as the reasoning mechanism in understandable human terms, the ability to take linguistic information from human experts and combine it with numerical data, simultaneous manipulation of numerical and linguistic information, the ability to approximate complex nonlinear functions with simple models, and representation in linear time-invariant local models [1, 4, 6–8, 10, 17, 18].

Similarly, fuzzy logic has been combined with control for the development of fuzzy controllers to manage greenhouse environments demonstrating that these systems perform with high accuracy in practical applications by maintaining variables at desired conditions and being resilient to disturbances [19, 20].

The processes and interactions of the greenhouse environment make this type of system a potent, efficient, and successful tool to precisely control the management of greenhouse systems in combination with GAs and ANNs [6, 15, 21]. In addition, the implementation of fuzzy clustering techniques allows automatic generation of fuzzy models. K-means, C-means, and subtractive fuzzy clustering are some useful techniques to describe complex dynamical systems and provide an automatic way to generate robust fuzzy systems [17].

On the other hand, many fuzzy modeling approaches concentrate on the accuracy of the model, that is, fitting the data as accurately as possible, for which they implement both heuristic and exact AI-based optimization mechanisms, achieving significant improvements in model performance [9, 22–27]. However, they pay little attention to model simplicity and interpretability, which is considered the main objective of fuzzy inference systems [6]. Therefore, it is necessary to find solutions that maintain both the accuracy of the system and its interpretability, guaranteeing better performance as well as a greater understanding of the modeled system [28].

This research seeks to develop a predictive model of indoor relative humidity values in a greenhouse with high accuracy and interpretability through the use of fuzzy inference systems, optimized by heuristic and exact methods, in order to offer greenhouse users a clear and simple description of their behaviour.

A humidity prediction model with high interpretability is presented, based on a complete analysis of the variables that interact in a greenhouse environment and guaranteeing high accuracy through the optimization of each model, a strategy that is not currently exploited [28]. The remainder of the paper is organized into the following sections: materials and methods, results, discussion, and conclusions.

#### 2. Materials and Methods

The methodology applied to obtain the model comprises three phases, as indicated in Figure 1. It starts with the analysis of data obtained from a greenhouse environment, continues with the construction of the model through the iterative and incremental elaboration of software prototypes, and ends with model validation, which measures the degree of accuracy and interpretability in order to select the predictive model.

##### 2.1. Data Analysis

A statistical analysis was performed on data obtained from the greenhouse environment of a bean crop in the city of Bogotá, Colombia [31]. Relative humidity, temperature, soil moisture, light intensity, CO_{2} concentration, luminosity, and the activation of the ventilation, irrigation, and heating systems were measured. The crop’s behaviour was thus defined, and possible dependencies between the variables were found using RStudio and Orange statistical software.

Due to the data used, the model is restricted to values obtained under climatological conditions in the city of Bogotá D.C. and limited to the ranges of the sensors used:

- Internal relative humidity: 0 to 100%.
- Internal temperature: −40 to 80°C.
- Soil humidity: 1 to 1023 V transformed to 0 to 100%.
- Light intensity: (*λ*0.5) 350 to 375 nm.
- CO_{2} concentration: 300 to 10000 ppm.
- Luminosity: 188 to 88000 Lux.

##### 2.2. Characterization of Variables

Exploratory statistics allow one to obtain the mean of each variable, its standard deviation, the coefficient of skewness, kurtosis, and the quantiles 0, 0.25, 0.5, 0.75, and 1. This provides a detailed characterization of the variables that form the fuzzy sets.
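As an illustration of the characterization described above, the following Python sketch computes the same summary quantities for one variable. It is not the RStudio or Orange workflow used in the study; the function name `describe` and the moment-based formulas for skewness and excess kurtosis are assumptions chosen for the example.

```python
import statistics


def describe(values):
    """Mean, sample standard deviation, skewness, excess kurtosis, and quantiles."""
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    # Population central moments used by the moment-based shape coefficients.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2 - 3.0  # excess kurtosis: 0 for a normal distribution
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": mean,
        "std": std,
        "skewness": skewness,
        "kurtosis": kurtosis,
        # Quantiles 0, 0.25, 0.5, 0.75, and 1.
        "quantiles": [min(values), q1, q2, q3, max(values)],
    }


if __name__ == "__main__":
    # Hypothetical normalized humidity readings, for demonstration only.
    print(describe([0.42, 0.55, 0.61, 0.63, 0.70, 0.74, 0.81, 0.95]))
```

With this convention, a positive skewness indicates a longer right tail and a negative one a longer left tail.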

##### 2.3. Correlation Analysis of Variables

A multivariate statistical analysis was performed for each of the variables using two correlation coefficients, Pearson and Spearman, and the results were compared with the attribute selection obtained using the RReliefF method [32]. The dependencies between variables and their relationship with the indoor relative humidity were identified.
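The two coefficients can be sketched in plain Python as follows. This is only an illustration of the definitions, not the statistical software used in the study; the helper names `pearson`, `ranks`, and `spearman` are assumptions. Spearman's coefficient is computed here as the Pearson coefficient of the rank-transformed data, with tied values sharing their average rank.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values receive the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

A monotonic but nonlinear relationship yields a Spearman coefficient of magnitude 1 while the Pearson coefficient stays below 1, which is one reason to report both.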

##### 2.4. Model Building

A prototype-based, incremental, and iterative development cycle was applied, determining and verifying each of the parameters that make up the proposed fuzzy inference systems together with a hybrid optimization mechanism, guaranteeing a high degree of accuracy in each of the proposals configured using Matlab.

##### 2.5. Determination of Input Variables

The input variables of each fuzzy inference system are those that showed a significant effect on the prediction of indoor relative humidity in the correlation analysis.

#### 3. Prototyping from Different Perspectives

Six prototypes based on the Mamdani, Takagi–Sugeno (TS), and ANFIS fuzzy inference systems were developed. For each of the configurations, a variation was implemented using C-means fuzzy clustering algorithms. 70% of the data from the dataset were used for training, and the remaining 30% were used for validation.

The Mamdani fuzzy inference system was defined with the *mamfis* function in Matlab. It uses the minimum of the fuzzy input values as the AND operator, the maximum of the fuzzy input values as the OR operator, the centroid of the area under the output fuzzy set as the defuzzification method to calculate crisp output values, an implication method that computes the consequent fuzzy set by truncating the consequent membership function at the value of the antecedent result, and an aggregation method that combines rule consequents as the maximum of the consequent fuzzy sets.

The Sugeno fuzzy inference system, in turn, was defined with the *sugfis* function in Matlab. It uses the product of the fuzzy input values as the AND operator, the maximum of the fuzzy input values as the OR operator (the probabilistic OR in the cluster-based variation), the weighted mean of all rule outputs as the defuzzification method to calculate crisp output values, an implication method that computes the consequent fuzzy set by scaling the consequent membership function by the value of the antecedent result, and an aggregation method that combines rule consequents as the sum of the consequent fuzzy sets.
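The Mamdani operators (minimum for AND, truncation for implication, maximum for aggregation, centroid for defuzzification) can be illustrated with a minimal Python sketch. It is not the Matlab configuration of the study: the two inputs, the membership parameters in `TEMP`, `SOIL`, and `HUMIDITY`, and the two rules are hypothetical values chosen only to make the example run.

```python
import math


def gauss(x, center, sigma):
    """Gaussian membership function."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def tri(x, a, b, c):
    """Triangular membership function with feet a and c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


# Hypothetical fuzzy sets on normalized [0, 1] universes (not the tuned sets).
TEMP = {"low": (0.2, 0.15), "high": (0.8, 0.15)}
SOIL = {"low": (0.3, 0.15), "high": (0.8, 0.15)}
HUMIDITY = {"low": (0.0, 0.25, 0.5), "high": (0.5, 0.75, 1.0)}

# Each rule: (temperature label, soil moisture label) -> humidity label.
RULES = [(("high", "low"), "low"), (("low", "high"), "high")]


def mamdani(temp, soil, steps=200):
    """Crisp humidity estimate for one pair of normalized inputs."""
    # AND operator: minimum of the antecedent membership degrees.
    strengths = [
        (min(gauss(temp, *TEMP[t]), gauss(soil, *SOIL[s])), out)
        for (t, s), out in RULES
    ]
    num = den = 0.0
    for k in range(steps + 1):
        y = k / steps
        # Implication: truncate each consequent at its rule strength (min).
        # Aggregation: maximum over all truncated consequents.
        mu = max(min(w, tri(y, *HUMIDITY[out])) for w, out in strengths)
        num += y * mu
        den += mu
    # Centroid of the aggregated output set (0.5 if no rule fires at all).
    return num / den if den else 0.5


if __name__ == "__main__":
    print(mamdani(0.2, 0.8), mamdani(0.8, 0.3))
```

The centroid is approximated by sampling the output universe on a regular grid, which keeps the sketch short at the cost of a small discretization error.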

The configuration of the ANFIS fuzzy inference system starts from the configuration of the Sugeno fuzzy inference system and uses the *anfisOptions* and *anfis* functions. Starting from the Sugeno model, the maximum number of training epochs is set to 200, the initial training step size to 0.1, and the optimization method to the hybrid method, with backpropagation for the input parameters and least squares for the output parameters.

The fuzzy clustering for the three types of fuzzy inference systems is defined through the Matlab *genfisOptions* and *genfis* functions. A configuration with five clusters using the C-means method, a fuzzy partition matrix exponent of 2.0, a maximum of 100 iterations, and a minimum improvement of the objective function of 1e-5 is implemented for the Mamdani and Sugeno fuzzy inference systems.
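To show what these clustering parameters control, the following Python sketch implements a basic fuzzy C-means loop with the same defaults (five clusters, exponent 2.0, at most 100 iterations, minimum improvement 1e-5). It is a simplified stand-in for the Matlab implementation: the function name `fuzzy_c_means`, the random initialization, and the exact stopping test are assumptions.

```python
import random


def fuzzy_c_means(points, n_clusters=5, m=2.0, max_iter=100,
                  min_improvement=1e-5, seed=0):
    """Return (centers, memberships) for a list of equal-length point vectors."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial partition matrix; each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])
    previous = float("inf")
    centers = []
    for _ in range(max_iter):
        # Centers: means of the points weighted by membership ** m.
        centers = []
        for c in range(n_clusters):
            weights = [u[i][c] ** m for i in range(n)]
            total = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])
        # Squared distances, floored to avoid division by zero.
        dist2 = [
            [max(sum((p[d] - ctr[d]) ** 2 for d in range(dim)), 1e-12)
             for ctr in centers]
            for p in points
        ]
        # Objective evaluated with the memberships that produced these centers.
        objective = sum(u[i][c] ** m * dist2[i][c]
                        for i in range(n) for c in range(n_clusters))
        # Membership update: proportional to (1 / squared distance) ** (1 / (m - 1)).
        power = 1.0 / (m - 1.0)
        for i in range(n):
            inverse = [(1.0 / d) ** power for d in dist2[i]]
            total = sum(inverse)
            u[i] = [v / total for v in inverse]
        # Stop when the objective improves by less than the threshold.
        if abs(previous - objective) < min_improvement:
            break
        previous = objective
    return centers, u
```

Each resulting cluster center can then seed one membership function per input, which is the general idea behind generating a fuzzy system from clusters.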

##### 3.1. Implementing Optimization Mechanisms

A combination of heuristic and exact techniques is used to optimize all elaborated model proposals, fitting the values that determine the shape of inputs and outputs with a hybrid function that executes a minimization by interior point algorithm after being processed by a genetic algorithm.

The interior point algorithm is configured with the *optimoptions* function in Matlab to create the function that continues the optimization after the genetic algorithm finishes, with a maximum of 3000 function evaluations and a maximum of 30 iterations before the algorithm stops.

The genetic algorithm, in turn, is configured with the same *optimoptions* function but includes the interior point configuration in its parameters as a hybrid function through *HybridFcn*. It sets a population size of 50 individuals; a maximum of 200 iterations (generations) before the algorithm stops; an elite count of 2, that is, the number of individuals of the current generation that are guaranteed to survive into the next generation; a crossover fraction of about 80%, that is, the fraction of the next generation (not including the individuals guaranteed to survive) that is created by the crossover function; and a time limit of 300 seconds, after which the algorithm stops.
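The two-stage idea (a global genetic search followed by a local refinement started from its best individual) can be sketched in Python as follows. This is not the Matlab `ga` routine or an interior point solver: the selection, crossover, and mutation operators are simplified assumptions, and the local stage is a bounded coordinate search used as a stand-in for the interior point algorithm. Only the population size, generation limit, elite count, and crossover fraction mirror the values given in the text.

```python
import random


def genetic_then_local(objective, bounds, pop_size=50, generations=200,
                       elite=2, crossover_fraction=0.8, seed=0):
    """Minimize objective(list_of_floats) inside per-parameter (low, high) bounds."""
    rng = random.Random(seed)

    def clip(value, low, high):
        return max(low, min(high, value))

    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    # Number of non-elite children produced by crossover in each generation.
    n_cross = round(crossover_fraction * (pop_size - elite))

    for _ in range(generations):
        population.sort(key=objective)
        parents = population[: pop_size // 2]              # truncation selection
        children = [ind[:] for ind in population[:elite]]  # elites survive unchanged
        for k in range(pop_size - elite):
            a, b = rng.sample(parents, 2)
            if k < n_cross:
                # Crossover child: random convex blend of two parents.
                t = rng.random()
                child = [t * x + (1 - t) * y for x, y in zip(a, b)]
            else:
                # Mutation child: Gaussian perturbation of one parent.
                child = [clip(x + rng.gauss(0, 0.1 * (hi - lo)), lo, hi)
                         for x, (lo, hi) in zip(a, bounds)]
            children.append(child)
        population = children

    best = min(population, key=objective)
    best_value = objective(best)

    # Local refinement (stand-in for the interior point step): bounded
    # coordinate search that halves its step when no move improves.
    step = 0.1
    while step > 1e-6:
        improved = False
        for i, (lo, hi) in enumerate(bounds):
            for delta in (step, -step):
                trial = best[:]
                trial[i] = clip(trial[i] + delta, lo, hi)
                value = objective(trial)
                if value < best_value:
                    best, best_value, improved = trial, value, True
        if not improved:
            step /= 2
    return best, best_value
```

In the setting of the paper, the parameter vector would hold the tunable membership function parameters and the objective would be the prediction error on the training data; the 300-second time limit is omitted from the sketch.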

Finally, the optimized fuzzy inference system was obtained by implementing the Matlab functions *tunefisOptions* and *tunefis*.

The hybrid function was defined as the optimization method starting with the genetic algorithm and following with the interior point algorithm, optimizing the existing input, output, and rule parameters without learning new rules and obtaining only the adjustable parameter configuration of the input and output of the fuzzy inference system by means of the *getTunableSettings* function in Matlab.

##### 3.2. Model Validation

Each of the proposed fuzzy inference systems was evaluated with regard to accuracy and interpretability in order to select the predictive model.

##### 3.3. Measuring the Degree of Accuracy

The six prototypes of fuzzy inference systems developed were each run 30 times, and the maximum and minimum values of the mean square error (MSE), root mean square error (RMSE), and mean absolute error (MAE) were computed for the training and validation phases.
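For reference, the three error measures are usually defined as follows. These are the standard textbook definitions, assumed here to match the ones used in the study; the symbols $y_i$ (measured humidity), $\hat{y}_i$ (predicted humidity), and $n$ (number of samples) are introduced only for this illustration.

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}},
\qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|
```

Because the data are normalized to [0, 1], all three measures are expressed in normalized humidity units rather than in percentage points of relative humidity.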

##### 3.4. Comparison of Degree of Accuracy and Interpretability

The proposals were compared taking into account the highest degree of accuracy and interpretability of each, selecting the model with the highest degree of interpretation and precision. A *t*-test of two samples was performed to determine whether the data obtained by the prediction in relation to the actual data came from the same distribution with the same mean and variance.

The level of interpretation is defined on a scale of 1 to 6, where 1 indicates the lowest level and 6 the highest. Similarly, the degree of accuracy is defined by assigning a value from 1 to 6 to the respective model, where 1 represents the lowest and 6 the highest value of accuracy. This process is performed for the lowest error value obtained from the 30 runs of the model and for the mean value over all its runs, and the two precision values are then averaged.

Finally, the interpretation level and precision degree values were added to define a selection ranking, where the model to choose corresponds to the one presenting the highest value.
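The selection ranking can be sketched in Python as below. The function names and the three toy models "A", "B", and "C" with their error and interpretability values are hypothetical and do not reproduce the values of the study; the sketch only shows how the accuracy scores of the best run and of the mean over runs are averaged and added to the interpretability level. Ties between models are not handled.

```python
def accuracy_scores(errors):
    """Score 1..n per model, where the lowest error receives the highest score."""
    ordered = sorted(errors, key=errors.get, reverse=True)  # worst model first
    return {name: position for position, name in enumerate(ordered, start=1)}


def selection_ranking(best_error, mean_error, interpretability):
    """Interpretability level plus the average of the two accuracy scores."""
    best = accuracy_scores(best_error)
    mean = accuracy_scores(mean_error)
    return {
        name: interpretability[name] + (best[name] + mean[name]) / 2
        for name in interpretability
    }


if __name__ == "__main__":
    # Hypothetical inputs for three toy models (not the paper's Table 6).
    ranking = selection_ranking(
        best_error={"A": 0.005, "B": 0.009, "C": 0.008},
        mean_error={"A": 0.006, "B": 0.010, "C": 0.0085},
        interpretability={"A": 1, "B": 2, "C": 3},
    )
    print(ranking, max(ranking, key=ranking.get))
```

In this toy case the most accurate model is not the one selected, because a higher interpretability level can outweigh a moderate loss of accuracy.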

The percentage of effectiveness of the model is defined using the following equation.
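The relation below is a reconstruction inferred from the figures reported for the selected model (an error value of 9.03e-2 and an effectiveness of 90.97%), not a formula recovered from the source. The symbol $e$ stands for the error value of the run on the normalized [0, 1] scale; which error measure it denotes is not stated in the surviving text.

```latex
\text{Effectiveness}\,(\%) = (1 - e) \times 100,
\qquad
(1 - 0.0903) \times 100 = 90.97
```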

##### 3.5. Software Prototyping Paradigm

The prototyping-based paradigm offers the best methodology to evaluate the efficiency of proposed models and their adaptability in iterative and incremental software developments [30]. The process is characterized by five stages, described as follows:

1. Communication: the overall objectives of the software solution are defined, identifying the specific parameters and configurations of each model.
2. Quick plan: quick planning of the iteration is done to make a prototype that meets the set needs.
3. Modeling and quick design: modeling is carried out in the form of a quick design that shows the configuration of the model, as well as the configuration of the user interaction interfaces.
4. Construction of the prototype: the development of the software prototype of the model continues.
5. Deployment, delivery, and feedback: the implemented prototype is evaluated and validated, in order to identify functionality and performance improvements if required in future iterations.

#### 4. Results and Discussion

##### 4.1. Information Analysis and Correlation

A statistical and correlation analysis of a total of 61,925 records is performed with the dataset using statistical software, following these preprocessing tasks: removal of seconds in record dates to facilitate unification of data, selection of the study variables, deletion of the records with missing information, and normalization in the interval of [0, 1].
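The preprocessing tasks listed above can be sketched in Python as follows. The record layout (a list of dictionaries with a `created_at` ISO timestamp) and the function name `preprocess` are assumptions made for the example, not the format of the published dataset.

```python
from datetime import datetime


def preprocess(records, fields):
    """Clean and normalize records given the names of the study variables."""
    cleaned = []
    for rec in records:
        # Delete records with missing information in any study variable.
        if any(rec.get(f) is None for f in fields):
            continue
        # Remove the seconds so that timestamps can be unified per minute.
        stamp = datetime.fromisoformat(rec["created_at"]).replace(
            second=0, microsecond=0)
        # Keep only the selected study variables.
        cleaned.append({"created_at": stamp,
                        **{f: float(rec[f]) for f in fields}})
    # Min-max normalization of each variable to the interval [0, 1].
    for f in fields:
        low = min(r[f] for r in cleaned)
        high = max(r[f] for r in cleaned)
        span = (high - low) or 1.0  # constant columns map to 0
        for r in cleaned:
            r[f] = (r[f] - low) / span
    return cleaned
```

Normalizing with the minimum and maximum of the whole dataset is the simplest choice; when a model is deployed, the same bounds would have to be reused for new readings.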

The statistical data for the variables are shown in Table 1. Most variables show positive kurtosis, except for soil moisture and temperature. Additionally, relative humidity and soil moisture present a negative skewness coefficient, which indicates a longer left tail, with the data concentrated toward the higher values. The remaining variables present positive skewness, with the data concentrated toward the lower values, and only temperature has a coefficient close to zero.

The correlation analysis using the Pearson correlation coefficient is shown in Table 2, where the variables most directly correlated with indoor humidity are temperature, followed by luminosity and then the performance of the ventilation and heating systems.

The results of Spearman’s correlation coefficient are shown in Table 3. They mostly indicate the same order of direct correlation between humidity and the environment variables as that obtained with Pearson’s coefficient.

The RReliefF attribute selection method resulted in the following order for humidity as the target variable of the greenhouse environment:

1. Temperature
2. Luminosity
3. Soil moisture
4. Intensity of light
5. Concentration of CO_{2}
6. Ventilation activation
7. Activation of the irrigation system
8. Activation of heating system

Based on the previous analyses, which compare the correlation coefficients and the attribute selection, the variables that most affect indoor relative humidity values within greenhouses are:

- Temperature, which in turn relates to the values of luminosity and intensity of light
- Concentration of CO_{2}, which in turn relates to the actuators
- Soil moisture, which in turn relates to activation of the irrigation system

##### 4.2. Construction of the Models

Six proposed models were constructed using the Mamdani, Takagi–Sugeno (TS), and ANFIS fuzzy inference systems, along with their variation using C-means fuzzy clustering. They were optimized by a hybrid algorithm that combined the genetic and interior point algorithms. The current data and two shifts in time for temperature, CO_{2}, and soil moisture were used as input for the prediction of humidity. For each of the inputs, the following input fuzzy sets were proposed:

- temp: current temperature
- temp-1: temperature with one shift in time
- temp-2: temperature with two shifts in time
- co2_ppm: current CO_{2} concentration
- co2_ppm-1: CO_{2} concentration with one shift in time
- co2_ppm-2: CO_{2} concentration with two shifts in time
- ground_humidity_per: current soil moisture
- ground_humidity_per-1: soil moisture with one shift in time
- ground_humidity_per-2: soil moisture with two shifts in time
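The nine inputs listed above can be built from the three time series with a small Python sketch. The function names are assumptions, and the chronological 70/30 split is also an assumption, since the text does not state whether the training and validation partitions were taken in time order or at random.

```python
def lagged_rows(series, lags=2):
    """Build rows holding the current value and `lags` shifted values per variable.

    series: dict mapping a variable name to its values ordered in time.
    """
    names = list(series)
    length = len(series[names[0]])
    rows = []
    # The first `lags` samples have no complete history and are skipped.
    for t in range(lags, length):
        row = {}
        for name in names:
            row[name] = series[name][t]
            for k in range(1, lags + 1):
                row[f"{name}-{k}"] = series[name][t - k]
        rows.append(row)
    return rows


def split_70_30(rows):
    """Chronological split: first 70% for training, last 30% for validation."""
    cut = len(rows) * 7 // 10
    return rows[:cut], rows[cut:]
```

A row built this way carries, for example, the keys `temp`, `temp-1`, and `temp-2`, which match the input names used by the models.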

In the models without fuzzy clustering, each input comprised three Gaussian membership functions representing the values of the variable: low, medium, and high. In the models with clustering, each input comprised five membership functions of the same type.

The output fuzzy set comprises five membership functions representing the values of the variable: very low, low, medium, high, and very high. The output membership functions are linear in the Sugeno and ANFIS models and triangular in the Mamdani models. For the Mamdani models with fuzzy clustering, the output is composed of Gaussian membership functions. The ranges of values for the system inputs and outputs are defined as follows:

- Temperature between 0.2 and 0.8
- CO_{2} concentration between 0.0 and 0.4
- Soil moisture between 0.3 and 0.8
- Humidity between 0.4 and 1.0
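To make the linear outputs of the Sugeno and ANFIS models concrete, the following Python sketch evaluates a first-order Sugeno system with product AND and weighted-mean defuzzification. The two rules, their Gaussian parameters, and the linear coefficients in `RULES` are hypothetical values for illustration, not the parameters obtained in the study.

```python
import math


def gauss(x, center, sigma):
    """Gaussian membership function."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


# Hypothetical rules: Gaussian (center, sigma) for temperature and soil moisture,
# followed by the coefficients (a_temp, a_soil, bias) of the linear output.
RULES = [
    (((0.8, 0.15), (0.3, 0.15)), (-0.2, 0.3, 0.45)),
    (((0.2, 0.15), (0.8, 0.15)), (-0.2, 0.3, 0.70)),
]


def sugeno(temp, soil):
    """Weighted mean of the linear rule outputs for normalized inputs."""
    num = den = 0.0
    for (temp_mf, soil_mf), (a, b, c) in RULES:
        # AND operator: product of the membership degrees.
        weight = gauss(temp, *temp_mf) * gauss(soil, *soil_mf)
        # First-order consequent: a linear function of the inputs.
        output = a * temp + b * soil + c
        num += weight * output
        den += weight
    return num / den
```

Unlike the Mamdani case, no output fuzzy set is aggregated: each rule contributes a crisp value, which is why ANFIS can fit the linear coefficients by least squares.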

The exploratory analysis defined the rules implemented in the models without fuzzy clustering as follows:

- If the historical and current temperature is high, the CO_{2} with two shifts is medium and the previous and current CO_{2} is low, and the historical and current soil moisture is low, then humidity is very low.
- If the temperature with two shifts is medium and the previous and current temperature is high, the historical CO_{2} concentration is low and the current is medium, and the soil moisture with two shifts is low and the previous and current soil moisture is medium, then humidity is low.
- If the historical and current temperature is medium, the historical and current CO_{2} is medium, and the historical and current soil moisture is medium, then humidity is medium.
- If the temperature with two shifts is medium and the previous and current temperature is low, the historical CO_{2} concentration is high and the current is medium, and the soil moisture with two shifts is high and the previous and current soil moisture is medium, then humidity is high.
- If the historical and current temperature is low, the CO_{2} with two shifts is medium and the previous and current CO_{2} is high, and the historical and current soil moisture is high, then humidity is very high.

#### 5. Measurement of the Error in Models

For the humidity predictive models, each of the six proposals was run 30 times. Table 4 presents the maximum and minimum values of the MSE, RMSE, and MAE, together with their arithmetic mean, for the training and validation phases.

For each of the six proposals, the implemented model corresponds to the run that presents the best level of precision:

- Sugeno with fuzzy clustering, run 13 with an MSE of 9.10e-3.
- ANFIS with fuzzy clustering, run 26 with an MSE of 4.75e-3.
- Mamdani with fuzzy clustering, run 12 with an MSE of 8.45e-3.
- Sugeno, run 21 with an MSE of 9.40e-3.
- ANFIS, run 20 with an MSE of 5.20e-3.
- Mamdani, run 5, 12, or 27 with an MSE of 8.20e-3.

The arithmetic means of the error values for the six proposed models are shown in Table 5, where the lowest errors are presented in the ANFIS models.

The values predicted by each of the models in the training phase are compared with the real values in Figure 2, where the abscissa shows the time period between May 24 and July 22, 2021, and the ordinate shows the relative humidity values in percentage (%).

The comparison of the values obtained from the prediction of the models and the real values in the validation phase is shown in Figure 3, obtaining homogeneous results in each model with a high degree of precision.

The error obtained in each of the models for the training phase, computed between the predicted moisture data and the real values, is shown in Figure 4, where the abscissa corresponds to time and the ordinate corresponds to error.

Similarly, the error obtained in the models for the validation phase, computed between the predicted moisture data and the real values, is shown in Figure 5.

##### 5.1. Model Selection

To select the model to be implemented, the six proposals are compared as shown in Table 6, which indicates the MSE of the best iteration, the mean over all iterations, and the precision value for both columns. An interpretability value is assigned to each specific type of model, from which the ranking of each model is obtained, and the model with the highest value is chosen.

The model selected for relative humidity prediction is run 5 of the Mamdani-type fuzzy inference system, optimized by a hybrid method using genetic and interior point algorithms.

The temperature input of the model is shown in Figure 6, where the current temperature value input set corresponds to Figure 6(a), that with one shift corresponds to Figure 6(b), and that with two shifts corresponds to Figure 6(c).

The CO_{2} concentration input of the model is shown in Figure 7, where the current CO_{2} concentration value input set corresponds to Figure 7(a), that with one shift corresponds to Figure 7(b), and that with two shifts corresponds to Figure 7(c).

The soil moisture input is shown in Figure 8, where the current soil moisture value input set corresponds to Figure 8(a), that with one shift corresponds to Figure 8(b), and that with two shifts corresponds to Figure 8(c).

The output of the model, indoor relative humidity, is shown in Figure 9, where the initial structure of the very low, low, medium, high, and very high sets is maintained and the values of their membership functions are adjusted.

The ruled surface that reflects the behaviour of the rules across the sample space of the input and output variables is shown in Figure 10, with the three input components taken at their current-time values. The surface formed by the inputs temp and co2_ppm for the output variable is shown in Figure 10(a), the surface formed by temp and ground_humidity_per with the output variable in Figure 10(b), and the surface formed by co2_ppm and ground_humidity_per with the output variable in Figure 10(c).

A two-sample *t*-test is performed comparing the real data with those generated by the model. At the chosen significance level, the test produces a *p*-value greater than that level, so the null hypothesis is not rejected. Both samples are therefore taken to come from distributions with equal means and variances, where the test yields an estimated variance value of 0.1301. Furthermore, taking into account the error value of the run, 9.03e-2, the effectiveness percentage obtained for this model is 90.97%.
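The test can be sketched in Python with the standard library. This is an illustration rather than the procedure run in the study: the function name `two_sample_t` is an assumption, the statistic uses the pooled variance (equal variances assumed), and the *p*-value relies on a normal approximation to the *t* distribution, which is reasonable only for large samples such as the tens of thousands of records used here.

```python
import math
import statistics


def two_sample_t(a, b):
    """Pooled-variance t statistic and a large-sample two-sided p-value."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    # Pooled variance estimate under the equal-variance assumption.
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(
        pooled * (1 / na + 1 / nb))
    # Normal approximation to the t distribution (large samples only).
    p = 2 * (1 - statistics.NormalDist().cdf(abs(t)))
    return t, p
```

A *p*-value above the significance level means the test finds no evidence of a difference in means between measured and predicted humidity; it does not by itself prove that the two samples are identical.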

#### 6. Discussion

Based on correlated variables and Mamdani fuzzy inference systems, the model used enabled a highly interpretable approach that allows users with no specialised knowledge in mathematical representations to interpret the behaviour of indoor humidity, facilitating monitoring and control of the variable in the development of greenhouse crops.

The models obtained have high levels of precision, without affecting their interpretability, optimized by a hybrid method that combines genetic and interior point algorithms.

A comparison with other greenhouse indoor humidity prediction models, based on the MSE values reported in other studies, is shown in Table 7. The developed model presents the highest accuracy, with an MSE of 8.2e-3. It is also evident that the use of fuzzy inference systems does not significantly affect the accuracy value compared to the use of other accuracy-focused techniques such as neural networks or deep learning techniques.

It must be taken into account that the models are designed to consider the variables of indoor relative humidity, indoor temperature, soil moisture, light intensity, CO_{2}, luminosity, and the activation of the ventilation, irrigation, and heating systems. If it is necessary to add another actuator or sensor, the correlation between the variables must be analyzed and it must be verified whether the model can be implemented without modifications. In addition, if it is necessary to improve the precision for a specific crop, the model can be trained again, maintaining the same configuration and structure in the fuzzy inference systems.

The developed models can be used for the management of automated greenhouses, implementing fuzzy controllers or predictive controllers based on models. Similarly, to increase the levels of precision while maintaining interpretability, a network of fuzzy inference systems can be created by combining different model proposals developed in this study. However, due to the seasonal behaviour of humidity, ways can be investigated to guarantee a more precise model behaviour, either through combinations of various training methods for fuzzy inference systems and recurrent neural networks or through the development of an automatic adjustment of the models based on the error trend and a set of *n* historical data.

#### 7. Conclusions

The six configurations established achieved a high level of interpretability and a high degree of precision. The selected predictive model of indoor relative humidity in greenhouses, based on correlated variables and a Mamdani fuzzy inference system, reached an effectiveness percentage of 90.97% and an MSE of 8.2e-3.

For the prediction of indoor humidity in greenhouses, the use of fuzzy inference system configurations that implement optimization through a hybrid method made up of a genetic algorithm and an interior point algorithm significantly increases the degree of accuracy of the models, without affecting their level of interpretability.

#### Data Availability

The greenhouse data used to support the findings of this study have been deposited in the greenhouse_dataset repository (https://thingspeak.com/channels/1342776).

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.