Abstract

Prediction of monthly mean sea surface temperature (SST) values has many applications ranging from climate predictions to planning of coastal activities. Past studies have shown usefulness of neural networks (NNs) for this purpose and also pointed to a need to do more experimentation to improve accuracy and reliability of the results. The present work is directed along these lines. It shows usefulness of the nonlinear autoregressive type of neural network vis-à-vis the traditional feed forward back propagation type. Neural networks were developed to predict monthly SST values based on 61-year data at six different locations around India over 1 to 12 months in advance. The nonlinear autoregressive (NAR) neural network was found to yield satisfactory predictions over all time horizons and at all selected locations. The results of the present study were more attractive in terms of prediction accuracy than those of an earlier work in the same region. The annual neural networks generally performed better than the seasonal ones, probably due to their relatively high fitting flexibility.

1. Introduction

The temperature of water at around 1 m below the ocean surface, commonly referred to as sea surface temperature (SST), is an important parameter for understanding the exchange of momentum, heat, gases, and moisture across the air-sea interface. Its knowledge is necessary to explain and predict important climate and weather processes including the summer monsoon and El Nino events. SST predictions are also sought after by coastal communities engaged in fishing and sports. Like the air above it, SST changes significantly over time, although less frequently owing to the high specific heat of water. The changes in water temperature over a vertical are high at the sea surface due to large variations in the heat flux, radiation, and diurnal wind near the surface, and hence SST estimations involve a considerable amount of uncertainty.

There are a variety of techniques for measuring SST. These include the thermometers and thermistors mounted on drifting or moored buoys and remote sensing by satellites. In case of satellites the ocean radiation in certain wavelengths of an electromagnetic spectrum is sensed and related to SST. Microwave radiometry based on an imaging radiometer called the moderate resolution imaging spectroradiometer is also popularly used to record SST.

In order to predict SST, physically based as well as data driven methods are practiced. The latter type is often preferred when site specific information is required and also for its convenience. The data driven schemes include a seasonally varying Markov model (Wu, [1]), an analogue method that searches for a similar time progression from the past (Xue and Leetmaa, [2]), empirical models designed through canonical correlation analysis (Agarwal et al., [3]), a regression model based on a lagged relationship (Laepple et al., [4]), models based on hurricane numbers (Neetu et al., [5]), and genetic algorithms and empirical orthogonal functions (Wu et al., [6]). One of the most popular methods among modern data driven approaches, however, is the neural network (NN), also called the artificial neural network. Some investigators have in the recent past applied this technique to predict SST, as described below.

2. SST Predictions Using Neural Networks

Tripathi et al. [7] produced seasonal predictions of SST over a certain region in the tropical Pacific using an NN that had the input of seven modes of the initial wind stress empirical orthogonal function and also that of the SST anomalies themselves. Over a lead time of 6 months the NN models yielded predictions with a similar level of accuracy as that of the El Nino Southern Oscillation (ENSO) based models. The NN worked satisfactorily even up to a 12-month prediction horizon in ENSO predictions. A comparison of NN with canonical correlation analysis and a sophisticated version of linear statistical regression in predicting equatorial Pacific SST was made by Tanvir and Mujtaba [8], who did not find significant difference in prediction skills probably because of the linearity of the dynamics at seasonal scales. The study of Pozzi et al. [9] indicated usefulness of NN as a complementary tool to more conventional approaches in SST and paleoceanographic data analysis. Wasserman [10] predicted SST anomalies over the tropical Pacific using NN. Authors derived the SST principal components over a 3- to 15-month lead time using the input of SST anomalies and sea level pressure. Garcia-Gorriz and Garcia-Sanchez [11] developed many NN based correlations for predicting the temperature elevation of sea water during desalination.

SST data for a certain region in the Indian Ocean were analyzed by Tripathi et al. [7]. Using area-average SST twelve different NN models were developed for the twelve months in a year. Typically the model to predict SST for the month of January was based on all past observations of January. On evaluating the performance of the networks the authors found that the models were able to predict the anomalies with a good accuracy and whenever the dependence of present anomalies on past anomalies was nonlinear the NN models worked better than the linear statistical models. Collins et al. [12] used meteorological variables as input to predict targeted satellite-derived SST values in the western Mediterranean Sea. The networks trained in this way predicted the seasonal as well as the interannual variability of SST well. The impact of the heat wave that occurred during the summer of 2003 on SST was also reproduced satisfactorily. Kug et al. [13] compared prediction skills of different methods based on certain transfer functions, regressions, and NN. Authors analyzed data of radiolarian faunal abundance from surface sediments observed at the Antarctic and Pacific Oceans. The error statistics associated with the NN predictions were found to be more attractive than the other methods, although NN yielded lesser geographic trends than those of the other methods.

The prediction skills of a Bayesian NN with support vector machine and linear regression were also compared by Tang et al. [14]. Authors used SST and sea level pressure together with warm water volume across the equatorial Pacific as input. It was found that nonlinear methods were better than the linear ones, but support vector machine could not produce better overall predictions than the NN. Tanang et al. [15] used NN for identifying sources of errors in satellite-based SST estimates. The temperature of air, direction of wind, and relative humidity affected the SST derivations significantly.

A review of the readily available publications as above showed that the technique of NN is promising in predicting SST but needs more experimentation to improve on the accuracy and reliability of the results. The present work is directed along these lines. It shows usefulness of the nonlinear autoregressive type of neural network vis-à-vis the traditional feed forward back propagation type. Further, it deals with the actual numerical values of SST rather than the anomalies obtained by subtracting long-term means, since neural networks implicitly take care of such means in their training procedures and since the data used were the result of reanalysis and thus bias corrected. Further, the range of the data values (up to around 6 degrees) was not so small as to necessitate the use of mean subtracted values. The prediction of such SST values, rather than anomalies, could have applications in planning coastal activities such as sports events and fishing expeditions and also in calibration of satellite measurements.

3. The SST Data

The data used in this study were made available by the Indian National Centre for Ocean Information Services (INCOIS) and pertained to monthly mean SST at six different locations in the North Indian Ocean, code named AS, BOB, EEIO, SOUTHIO, THERMO, and WEIO, as shown in Figure 1. These data were collected from various sources such as voluntary observing ships, moorings, drifters, and Argo. The gaps in observations were filled up by suitable interpolation methods at the source level. The data involve NCEP/NCAR global reanalysis products generated through assimilation and modeling and use all ship and buoy SSTs and satellite derived SSTs from the NOAA Advanced Very High Resolution Radiometer (AVHRR). The duration ranged from January 1945 to December 2005. There were thus 732 values over the period of 61 years.

Basic data statistics, namely, the minimum, maximum, and mean values, standard deviation, kurtosis, and skewness can be seen in Table 1. As can be seen from the table the temperatures varied from 23.61°C to 30.67°C at the different locations. All of the locations had comparable maximum values while the locations near the land boundary (AS, BOB, and THERMO) had lower minimum values than those in the open ocean and near the equator (EEIO and WEIO). As expected, the sites of EEIO and WEIO that are closer to the equator had higher means. While the minimum temperatures showed relatively large changes (from 23.61°C to 27.34°C) the standard deviations at locations AS, BOB, and THERMO were higher than those at sites EEIO, SOUTHIO, and WEIO, which could be due to their coastal proximity and shallower depths. Most of the data with the exception of EEIO were associated with negative skewness as well as negative kurtosis values, indicating, respectively, concentration of data to the right of the mean in the frequency distributions and distributions that were flatter than the normal distribution, with wider peaks and spreads around the mean. These complexities indicate the necessity of applying a nonlinear technique like neural networks to predict future values. Neural networks do not require data preprocessing as a precondition for training and hence the same was not performed. Incidentally, no statistically significant trend was seen in the observations either.
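The statistics listed in Table 1 can be computed along the following lines. This is only a minimal sketch: the paper does not state which estimators were used, so the population (divide by n) moments and the excess form of kurtosis (zero for a normal distribution, consistent with negative values meaning flatter distributions) are assumptions, and the sample values are hypothetical.

```python
import math

def describe(values):
    """Return min, max, mean, standard deviation, skewness and excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    # Central moments in population form (dividing by n): an assumption.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "std": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        # Excess kurtosis: 0 for a normal distribution, negative for flatter ones.
        "kurtosis": m4 / m2 ** 2 - 3.0,
    }

# Hypothetical monthly SST values in degrees C, for illustration only.
sample = [26.1, 27.4, 28.9, 29.6, 29.8, 28.7, 27.9, 27.5, 27.8, 28.2, 27.6, 26.5]
stats = describe(sample)
```

A negative `stats["skewness"]` would correspond to the left-tailed distributions described above, and a negative `stats["kurtosis"]` to distributions flatter than the normal one.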

4. Networks and Training

A neural network in general consists of interconnected neurons, each acting as an independent computational element. The most common network is in the form of a multilayered perceptron with ability to approximate any continuous function (see Figure 2). Basic information on neural networks can be seen in text books such as those of Martinez and Hseih [16] and Lee et al. [17]. The primary network operation is however mentioned below.

Each neuron of the hidden and output layers sums up the weighted input, adds a bias term to it, passes the result through a transfer function, and produces the output (Figure 2). Mathematically the four-step procedure followed in obtaining the network output is as given below.

(1) Sum up the weighted inputs, that is,

$$ S_j = \sum_{i=1}^{N_I} w_{ij} x_i + b_j, $$

where $S_j$ = summation for the $j$th hidden node; $N_I$ = total number of input nodes; $w_{ij}$ = connection weight between the $i$th input and $j$th hidden node; $x_i$ = normalized input at the $i$th input node; and $b_j$ = bias value at the $j$th hidden node.

(2) Transform the weighted input:

$$ O_j = f(S_j), $$

where $O_j$ = output from the $j$th hidden node and $f$ = transfer function of the hidden layer.

(3) Sum up the hidden node outputs:

$$ S_k = \sum_{j=1}^{N_H} w_{jk} O_j + b_k, $$

where $S_k$ = summation for the $k$th output node; $N_H$ = total number of hidden nodes; $w_{jk}$ = connection weight between the $j$th hidden and $k$th output node; and $b_k$ = bias at the $k$th output node.

(4) Transform the weighted sum:

$$ O_k = g(S_k), $$

where $O_k$ = output at the $k$th output node and $g$ = transfer function of the output layer.
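The four-step procedure can be illustrated with a short forward pass. The sketch below assumes a sigmoid transfer function in the hidden layer and a linear one in the output layer; the function and argument names are hypothetical, and the weights in the usage lines are arbitrary numbers rather than trained values.

```python
import math

def sigmoid(z):
    """Logistic transfer function (assumed for the hidden layer)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a single-hidden-layer perceptron.

    x        : list of normalized inputs
    w_hidden : w_hidden[j][i] is the weight from input i to hidden node j
    b_hidden : b_hidden[j] is the bias of hidden node j
    w_out    : w_out[k][j] is the weight from hidden node j to output node k
    b_out    : b_out[k] is the bias of output node k
    """
    hidden = []
    for w_j, b_j in zip(w_hidden, b_hidden):
        s_j = sum(w * x_i for w, x_i in zip(w_j, x)) + b_j  # step 1: weighted sum
        hidden.append(sigmoid(s_j))                          # step 2: transform
    outputs = []
    for w_k, b_k in zip(w_out, b_out):
        s_k = sum(w * h for w, h in zip(w_k, hidden)) + b_k  # step 3: weighted sum
        outputs.append(s_k)                                  # step 4: linear transfer
    return outputs

# Arbitrary illustrative weights: 2 inputs, 2 hidden nodes, 1 output node.
y = forward([0.2, 0.8], [[0.5, -0.3], [0.1, 0.4]], [0.0, 0.1], [[1.0, -1.0]], [0.2])
```

With 24 inputs and one output node, the same function would have the shape of the network used for the SST series.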

Before its actual application the network has to be trained from examples or by presenting input and output pairs to the network and thereby determining the values of connection weights and bias through a training algorithm and over many epochs (presentation of complete data sets once to the network). The network is presented with the input-output pairs till the training error between target and realized outputs reaches the error goal or alternatively till no further reduction in the error is achieved despite increasing the number of epochs, as done in the present case.
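The two stopping conditions mentioned above (reaching the error goal, or no further reduction in error with more epochs) can be sketched as a generic training loop. This is an illustration under stated assumptions, not the procedure of any particular toolbox: `run_epoch` is a hypothetical callback that performs one pass over the training data and returns the training error, and `patience` and `max_epochs` are assumed parameters.

```python
def train(run_epoch, error_goal=1e-4, patience=10, max_epochs=1000):
    """Run epochs until the error goal is met or the error stops improving.

    Returns (epochs_run, last_error, reason).
    """
    best = float("inf")
    stalled = 0  # consecutive epochs without improvement
    error = float("inf")
    epoch = 0
    for epoch in range(1, max_epochs + 1):
        error = run_epoch()
        if error <= error_goal:
            return epoch, error, "goal reached"
        if error < best:
            best = error
            stalled = 0
        else:
            stalled += 1
            if stalled >= patience:
                return epoch, error, "no further reduction"
    return epoch, error, "max epochs"

# Usage with a made-up error history standing in for real training.
history = iter([1.0, 0.5, 0.25, 0.25, 0.25, 0.25])
result = train(lambda: next(history), error_goal=0.0, patience=3)
```

In the made-up history the error stops decreasing after the third epoch, so the loop ends on the second condition.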

In the present work SST values were predicted using a time series forecasting approach. A sequence of past values was fed as input to the network so as to enable it to recognize some hidden pattern in it and produce the forecast by moving along the time scale. The number of past values considered at a time equals the number of input neurons, while the output neuron corresponds to the forecasted value over the next or any future time step. By trials aimed at getting the best outcome the past sequence length was selected as 24 months, and predictions were made over the subsequent 12 months, but one at a time. Note that the total sample size was 732 months as mentioned in the preceding section, and out of it the sequence of the past 24 months at every current time step (month) was used as input to the network in a sliding window manner. This is indicated mathematically as follows:

$$ SST_{t+k} = f(SST_t, SST_{t-1}, \ldots, SST_{t-23}), $$

where $SST_t$ = SST at month $t$, $t$ = current month, and $k = 1, 2, \ldots, 12$.
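The sliding window arrangement can be sketched as follows. The function name and the ordering of values inside each window (oldest first) are assumptions made for illustration; the stand-in series simply uses the month index in place of real SST values.

```python
def make_windows(series, n_lags=24, lead=1):
    """Build (input, target) pairs from a monthly series.

    Each input holds the n_lags most recent values up to the current month t
    (oldest first); the target is the value `lead` months after t.
    """
    inputs, targets = [], []
    for t in range(n_lags - 1, len(series) - lead):
        inputs.append(series[t - n_lags + 1 : t + 1])
        targets.append(series[t + lead])
    return inputs, targets

# Stand-in series of 732 months (values are just the month index).
series = list(range(732))
x1, y1 = make_windows(series, n_lags=24, lead=1)     # 1 month ahead
x12, y12 = make_windows(series, n_lags=24, lead=12)  # 12 months ahead
```

For a 732-month record this gives 708 pairs at a lead of 1 month and 697 pairs at a lead of 12 months, since longer leads leave fewer months with a known target.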

To begin with, the most common network of feed forward back propagation type (FFBP) (Figure 2) was used. However, in order to see whether improved predictions were possible, another network called the nonlinear autoregressive network (NAR) was also employed. The NAR is a recurrent network with a feedback arrangement as shown in Figure 3 (left part), where the output is fed back to the input of the feed forward network (right part). There is thus a feedback of the true output instead of the estimated one in the input. This makes the feed forward architecture more accurate and allows the use of only static back propagation while training. The networks were trained using the common Levenberg-Marquardt algorithm. The feed forward network had a sigmoid activation function for the hidden layer neurons and a linear activation function for the output layer neurons. The number of input neurons was 24 while that of the output neurons was 1.
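The difference between feeding back the true output and feeding back the estimated one can be sketched with two small helpers. This is a hedged illustration only: `predict_next` is a hypothetical stand-in for a trained one-step network, and the paper does not state whether its multi-month forecasts were produced in this closed-loop way.

```python
def forecast_open_loop(series, predict_next, n_lags=24):
    """One-step-ahead forecasts, each made from observed (true) values only."""
    return [predict_next(series[t - n_lags : t]) for t in range(n_lags, len(series))]

def forecast_closed_loop(history, predict_next, steps=12, n_lags=24):
    """Multi-step forecast in which each estimate is fed back as an input."""
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(steps):
        y_hat = predict_next(window)
        forecasts.append(y_hat)
        window = window[1:] + [y_hat]  # drop the oldest value, append the estimate
    return forecasts

# Stand-in predictor (linear extrapolation), used only to make the sketch runnable.
def extrapolate(window):
    return 2 * window[-1] - window[-2]

observed = [float(i) for i in range(30)]
one_step = forecast_open_loop(observed, extrapolate)
twelve_ahead = forecast_closed_loop(observed, extrapolate, steps=12)
```

In the open-loop form every input window contains observations, which is what permits static back propagation during training; in the closed-loop form errors in earlier estimates can propagate to later ones.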

5. Testing of Networks and Results

In order to assess the performance of these networks, the predictions yielded by both FFBP and NAR were compared with actual observations for the testing period of 1997–2005, while the training was performed on monthly SST for the period 1945–1996. As explained in the preceding section, for any given time step (month) the preceding 24 months’ sequence was used as network input during the training and testing exercise. The assessment of testing was done through time history comparisons and scatter diagrams, an example of which is given in Figure 4 for the location AS. In this figure the two panels at the top show comparative predictions for the next and the next-to-next month, that is, 1 and 2 months ahead, while the bottom two panels indicate the same for the 11th and 12th months in future. These plots pertain to the application of the FFBP network. Similar comparisons based on the NAR network predictions are given in Figure 5 for the same location and the same testing period.
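The chronological split into training and testing periods amounts to simple index arithmetic on the monthly record. The sketch below assumes the series starts in January 1945 and is stored as a plain list; the helper names are hypothetical.

```python
START_YEAR = 1945  # first year of the record (January 1945)

def month_index(year, month):
    """Zero-based position of a calendar month in the record."""
    return (year - START_YEAR) * 12 + (month - 1)

def split_by_year(series, first_test_year=1997):
    """Split chronologically: everything before first_test_year is training."""
    cut = month_index(first_test_year, 1)
    return series[:cut], series[cut:]

# Stand-in record of 732 months (January 1945 to December 2005).
record = list(range(732))
train_part, test_part = split_by_year(record)
```

With these dates the training part holds 624 months (1945–1996) and the testing part 108 months (1997–2005).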

All the scatter diagrams and time history based plots as above for all the locations indicated that both FFBP and NAR were able to predict the SST values over the future time horizon varying from 1 to 12 months satisfactorily as reflected in the closeness between the observed and the predicted SST values. In order to further confirm this quantitatively and also to study the relative performance of both the networks error statistics of correlation coefficient (CC), mean absolute error (MAE), mean square error (MSE), and Nash-Sutcliffe efficiency (NSE) coefficient were worked out. The underlying expressions as well as the strengths and weaknesses of these parameters are given in the appendix.

Table 2 gives an example of the magnitudes of the error measures CC, MAE, MSE, and NSE and of their distribution over lead times ranging from 1 to 12 months. This information pertains to location AS and to the NAR network. It indicates that for all the lead times the CC values were high, varying from 0.95 to 0.99, and so also the NSE, which ranged from 89.20% to 96.90%. Further, the MAE and MSE were low and ranged from 0.15°C to 0.30°C and from 0.04°C² to 0.15°C², respectively.

Similarly, for all the sites and over all prediction horizons the correlation coefficients and the NSE coefficients were high and close to 1.0, while the MAE and MSE were low. This, however, held for the NAR network rather than the FFBP one. An example is given in Figure 6(a), which shows the changes in MAE with increasing prediction horizon at locations AS and BOB, and in Figure 6(b), which shows the same for the NSE coefficient. The MAE gives an idea of the average error and is not unduly influenced by large individual errors. Figure 6(a) shows that the MAE is very low in the first month of prediction and rises subsequently, but remains confined to low values of 0.30°C and 0.38°C at the two locations shown when the NAR network was used. The value in the next month is highly dependent on that of the current month, and hence the MAE is low for the first month. A similar discussion is valid for the Nash-Sutcliffe efficiency coefficient shown in Figure 6(b). The changes in the values of CC and MSE with increasing prediction horizon at sites EEIO, SOUTHIO, THERMO, and WEIO are shown in Figures 6(c) and 6(d), respectively. As far as the coefficient of correlation is concerned (Figure 6(c)), both FFBP and NAR networks showed good performance, but the NAR network was clearly more advantageous than the FFBP, indicating its better capability to handle the data nonlinearities. The locations EEIO and WEIO, which were near the equator, showed lower prediction performance than the sites SOUTHIO and THERMO, indicating high data nonlinearities that were somewhat difficult to model. Figure 6(d) further confirms the edge of NAR over FFBP through lower MSE values at all the locations. The NAR network predicted the SST values with almost equal efficiency at most of the locations.

Tripathi et al. [7] had predicted monthly SST values for 12 months in advance over the Indian Ocean region using FFBP models. For this purpose the sample size was 52 years, from which 12 different time series were formed, trained, and tested, unlike the sequential modeling done in this study based on 61 years’ data. The testing in the present work has been done with the data segment of the last 9 years (1997–2005) as against the last 5 data points of each month in the work of Tripathi et al. [7]. Table 3 shows a comparison of the present NAR based model (while testing) with this past study in terms of CC between the target and the predicted values. Although the locations were not the same, a generally better performance (higher CC values) of the present modeling can be noticed. Neither the past study nor the present one indicated a systematic lowering of CC with increasing lead time, as might be expected; however, the past study had outlying CC values of 0.59 and 0.76, which were not seen in this study, indicating more consistent predictions here. Table 3 also shows better performances at sites AS, BOB, and THERMO, where the means were lower and the standard deviations higher than at the other sites, which might have made the model fitting more adaptable.

The above networks pertained to annual models where a past sequence of 24 months was used to predict SST values for 12 months in advance. Alternative SST predictions were attempted based on only the past season’s data at the six locations. Four seasons, namely, winter (December, January, and February), summer (March, April, and May), monsoon (June, July, August, and September), and after monsoon (October and November), were considered. The monthly SST values for the next season were predicted based on the same past season and also the past month. It was however found that such seasonal predictions were not as good as the annual predictions described in the preceding sections. This can be seen from the example of Table 4, which shows the correlation coefficients between the predicted and target values at the SOUTHIO location for every calendar month, for both the season-based predictions and those based on the past 24 months’ data. This could be due to a higher level of fitting flexibility involved when the past 24 values were considered in training as against the 3 or so of the seasonal models. A longer training sequence can thus be recommended.
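The four-season grouping described above can be written as a simple month-to-season lookup. This sketch covers only the grouping of monthly values by season; how the seasonal network inputs were actually assembled is not detailed in the text, and the names used here are hypothetical.

```python
# Calendar months (1 = January) belonging to each season, as defined in the text.
SEASONS = {
    "winter": (12, 1, 2),
    "summer": (3, 4, 5),
    "monsoon": (6, 7, 8, 9),
    "after monsoon": (10, 11),
}
MONTH_TO_SEASON = {m: name for name, months in SEASONS.items() for m in months}

def group_by_season(monthly_values, start_month=1):
    """Group a monthly series by season; monthly_values[0] falls in start_month."""
    groups = {name: [] for name in SEASONS}
    for i, value in enumerate(monthly_values):
        month = (start_month - 1 + i) % 12 + 1
        groups[MONTH_TO_SEASON[month]].append(value)
    return groups

# One stand-in year whose values equal the calendar month number.
one_year = group_by_season(list(range(1, 13)))
```

The seasons contain two to four months each, which illustrates how much shorter the seasonal input is than the 24-month sequence of the annual models.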

It is known that the climate systems of El Nino Southern Oscillation (ENSO) and Indian Ocean Dipole (IOD) are governed by higher or lower than normal SST in the Pacific and Indian Ocean, respectively. These phenomena should be discernable from the SST records. This was not attempted in the present work, since a correct way to do so would be to analyze the SST anomalies (with amplified differences) rather than the real SST values as considered here. Nonetheless a good match between the predicted and target SST values at the crests and troughs seen in the time history comparisons exemplified in Figure 5 is indicative of capturing the extremes associated with the large scale climate systems mentioned above.

It may be noted that neural networks are basically site-specific models, since they are built on data collected at a given location. Their spatial applicability would depend on the area around the particular location from where the SST values were binned.

The success of SST predictions using neural networks described in this paper may inspire application of neural networks to SST predictions over smaller time intervals such as weeks. It would also be useful to explore whether other network types such as recursive networks and other training schemes like conjugate gradient descent and quasi-Newton result in better learning.

6. Conclusions

The nonlinear autoregressive (NAR) neural network was found to yield satisfactory predictions over all time horizons and at all selected locations and such predictions were superior to those based on the common FFBP type network architecture.

For most of the locations the error statistics indicated highly satisfactory performance of the NAR network, with correlation coefficients between predicted and measured values lying above 0.90, MSE and MAE less than 0.23°C² and 0.38°C, respectively, and NSE coefficient of the order of 80%.

A sequence of past 24 months’ SST was found necessary to provide adequate training.

The results of the present study were more attractive in terms of prediction accuracy than an earlier work in the same region.

The network trained using the input of past 24 months’ SST was found to be more beneficial than the one trained with the help of a smaller data segment.

Appendix

The Expressions of the Error Measures

Correlation coefficient (CC)

$$ CC = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}, $$

where $x_i$ = observed SST, $\bar{x}$ = mean of $x_i$, $y_i$ = predicted SST, $\bar{y}$ = mean of $y_i$, and $n$ = number of observations.

The correlation coefficient, $CC$, shows the extent of the linear association and similarity of trends between the target and the realized outcome. It lies between -1 and 1; the closer it is to 1, the better the model fit is. It however gets heavily affected by the extreme values.

Mean absolute error (MAE)

$$ MAE = \frac{1}{n} \sum_{i=1}^{n} \left| x_i - y_i \right| $$

The mean absolute error has the advantage that it does not distinguish between over- and underestimation and is not unduly influenced by higher values. The lower the value of MAE is, the better the forecasting performance is.

Mean square error (MSE)

$$ MSE = \frac{1}{n} \sum_{i=1}^{n} (x_i - y_i)^2 $$

The mean square error is suited to iterative algorithms and is a better measure for high values. It offers a general picture of the errors involved in the prediction but is also sensitive to high values.

Nash-Sutcliffe efficiency (NSE) coefficient

$$ NSE = 1 - \frac{\sum_{i=1}^{n} (x_i - y_i)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} $$

It is the ratio of the sum of squared errors to the sum of squared deviations of the observed values from their mean, subtracted from 1.0. It takes the value “0” for a “no knowledge” model (every forecast is the same as the observed mean) and “1” for a perfect model. Negative values are also possible, indicating a model that performs worse than the “no knowledge” model.
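The four error measures described in this appendix can be computed together as in the following sketch. It uses the standard definitions (Pearson correlation for CC, and NSE as one minus the ratio of the error sum of squares to the sum of squared deviations of the observations from their mean); the function name and the sample values are hypothetical.

```python
import math

def error_measures(observed, predicted):
    """Return CC, MAE, MSE and NSE for paired observed and predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    pairs = list(zip(observed, predicted))
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in pairs)
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    sse = sum((o - p) ** 2 for o, p in pairs)  # sum of squared errors
    return {
        "CC": cov / math.sqrt(ss_obs * ss_pred),
        "MAE": sum(abs(o - p) for o, p in pairs) / n,
        "MSE": sse / n,
        "NSE": 1.0 - sse / ss_obs,
    }

# Hypothetical observed and predicted SST values in degrees C.
obs = [27.0, 28.0, 29.0, 30.0]
pred = [27.2, 27.9, 29.3, 29.8]
measures = error_measures(obs, pred)
```

Multiplying `measures["NSE"]` by 100 gives the percentage form in which NSE is reported in the main text. Note that CC is undefined when either series is constant, since the denominator is then zero.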