Abstract

This study evaluates the performance of all forty global climate models (GCMs) that participate in the Coupled Model Intercomparison Project Phase 5 (CMIP5) in simulating climatological temperature and precipitation for Southeast Asia. Historical simulations of climatological temperature and precipitation from the 40 GCMs are evaluated against observation and reanalysis datasets for the 40-year period of 1960–1999 over both land and sea and for the 99-year period of 1901–1999 over land. Nineteen different performance metrics are employed. The results show that the performances of different GCMs vary greatly. CNRM-CM5-2 performs best among the 40 GCMs; the total error of the worst-performing GCM is about 3.25 times that of CNRM-CM5-2. The performance of CNRM-CM5-2 is compared with those of the ensemble average of all 40 GCMs (40-GCM-Ensemble) and the ensemble average of the 6 best GCMs (6-GCM-Ensemble) for four categories, i.e., temperature only, precipitation only, land only, and sea only. While 40-GCM-Ensemble performs best for temperature, 6-GCM-Ensemble performs best for precipitation. 6-GCM-Ensemble performs best for temperature and precipitation simulations over sea, whereas CNRM-CM5-2 performs best over land. Overall results show that 6-GCM-Ensemble performs best and is followed by CNRM-CM5-2 and 40-GCM-Ensemble, respectively. The total errors of 6-GCM-Ensemble, CNRM-CM5-2, and 40-GCM-Ensemble are 11.84, 13.69, and 14.09, respectively. 6-GCM-Ensemble and CNRM-CM5-2 agree well with observations and can provide useful climate simulations for Southeast Asia. This suggests the use of 6-GCM-Ensemble and CNRM-CM5-2 for climate studies and projections for Southeast Asia.

1. Introduction

Global climate change has been observed and poses a fundamental threat to humanity. The evidence of rapid global climate change includes global temperature rise, warming oceans, shrinking ice sheets, glacial retreat, decreased snow cover, sea level rise, declining Arctic sea ice, more extreme weather events, and ocean acidification [1, 2]. Better understanding of climate change and ability to predict the future climate change and its potential impacts are important for climate change adaptation and mitigation. Since climate change varies from region to region, it affects different regions of the world differently. Hence, climate change studies for each region in detail are important for increasing resilience of the society to climate change.

Southeast Asia is one of the most vulnerable regions to climate change, where its average temperatures have risen every decade since 1960. A report by Germanwatch on the global climate risk index [3] has listed Vietnam, Myanmar, the Philippines, and Thailand to be among ten countries in the world most affected by climate change during the period of 1997–2016. Vietnam has also been listed by the World Bank to be among five countries most likely to be affected by global warming in the future [4]. According to the Asian Development Bank, Southeast Asia could suffer bigger losses than most regions in the world [5].

Southeast Asia’s climate is tropical [6]. Its weather is mostly hot and humid with high annual precipitation amounts. Precipitation is mostly convective [7]. Intense convective precipitation could cause floods and landslides. Southeast Asia has often been affected by weather-related natural disasters, i.e., floods, droughts, landslides, and tropical cyclones. Since extreme weather events can be intensified by climate change [8], it is essential to understand and accurately project climate change and its impacts to the region. To accomplish this, a climate model that can provide useful climate simulations and projections for Southeast Asia is required.

Climate models are important tools for understanding and predicting the complex Earth’s climate. Several global climate models (GCMs) have been developed by several research centers around the world. Forty GCMs from 20 research groups have participated in the Coupled Model Intercomparison Project Phase 5 (CMIP5) [9]. Their global climate simulations and projections are publicly available. Since there are many GCMs that can be used and they could perform differently for different regions of the world, the main objective of this study is to evaluate the performance of these GCMs in order to find GCMs that perform well in Southeast Asia and that should be employed for climate simulations and projections in the region.

Previous studies have evaluated the performances of these GCMs for specific regions, e.g., eastern Tibetan Plateau [10], Australia [11], US Pacific Northwest [12], northeastern Argentina [13], northern Eurasia [14], US continental areas [15], and Southeast Asia [16, 17]. Since climate differs between regions and GCMs also perform differently for different regions, results for different regions cannot be directly compared. Although Southeast Asia is one of the regions most vulnerable to climate change, only a few previous studies [16, 17] evaluate the performances of CMIP5 GCMs in the region. Raghavan et al. [16] evaluated the performance of CMIP5 GCMs for Southeast Asia, focusing only on historical precipitation simulations for the 20 years of 1986–2005 without considering temperature simulations. Although the results of [16] show that no particular model performs well for climatological precipitation simulations in Southeast Asia, that study evaluates only 10 of the 40 CMIP5 GCMs and employs few performance metrics.

Even though our preliminary study [17] has evaluated all 40 CMIP5 GCMs for climate simulations for Southeast Asia, it does not address detailed results for each performance metric and does not consider the performances of ensemble averages of different GCMs. The performances of all 40 CMIP5 GCMs for simulating climatological temperature and precipitation in the twentieth century are evaluated in further detail in this study using observation and reanalysis datasets, where 19 performance metrics are employed for evaluation. The performance of the best GCM is also compared with those of ensemble averages of different GCMs.

Section 2 describes the research methodology employed in this study, which includes the study area, GCMs, observation and reanalysis datasets, and performance metrics. Section 3 presents the evaluation results. Section 4 summarizes and concludes the paper.

2. Research Methodology

2.1. Study Area

Southeast Asia is a subregion of Asia and consists of 2 main portions, i.e., the mainland and a string of archipelagoes to the south and east of the mainland. Southeast Asia is composed of 11 sovereign states, including Brunei Darussalam, Cambodia, East Timor, Indonesia, Laos, Malaysia, Myanmar, the Philippines, Singapore, Thailand, and Vietnam. Figure 1 shows topography (m) above the mean sea level of the study area, where the Shuttle Radar Topography Mission digital elevation model (SRTM) [18] with the spatial resolution of 90 m is employed. The study area covers latitudes from 12.75°S to 24.25°N and longitudes from 88.25°E to 144.75°E.

Southeast Asia lies in the tropics and has a tropical climate [6]. Since the incident angle of solar radiation is small, temperatures are generally high and do not fluctuate much throughout the year. The observation and reanalysis datasets used in this study, which are described in Section 2.3, show that the mean annual temperature for years 1960–1999 for the study area is 26.38°C. The 40-year monthly average temperatures for the study area range from a minimum of 25.10°C in January to a maximum of 27.30°C in May. The entire region is strongly affected by the southwest and northeast monsoons [19], which are due to differences in land and sea temperatures caused by solar radiation. The southwest monsoon typically lasts from late May to September. It particularly affects Thailand and Myanmar and brings their rainy season during this period. The northeast monsoon typically lasts from November to March. It brings relatively dry and cool air and little precipitation to the mainland and causes rain in the southern part of Southeast Asia during this period. Southeast Asia receives considerable annual precipitation. The same datasets show that the mean annual precipitation for years 1960–1999 for the study area is ∼2,034.25 mm. The mean annual precipitation for each of the 40 years ranged from 1,810.20 mm in 1972 to 2,336.72 mm in 1996. Most precipitation in this region is strongly driven by convection [7]. Southeast Asia is often affected by weather-related natural disasters. The Philippines and Vietnam are often affected by tropical cyclones. Hence, climate change studies for this region are crucial.

2.2. Global Climate Models

The Coupled Model Intercomparison Project Phase 5 (CMIP5) [9] is a collaborative effort with the aim to improve the climate change knowledge. CMIP5 involves 20 climate modeling research groups around the world with 40 GCMs. CMIP5 outputs include historical climate simulations for years 1850–2005 and climate projections for near term (out to about 2035) and long term (out to 2100 and beyond) by considering 4 Representative Concentration Pathways (RCPs). To evaluate the performances of these GCMs, their simulated climatological temperature and precipitation for years 1901–1999 are employed.

The forty GCMs evaluated in this study, together with their spatial resolutions and numbers of ensemble members driven by different initial conditions, are shown in Table 1. For GCMs with more than one ensemble member, the average of all ensemble members is employed for evaluation. The GCM outputs employed in this study include monthly averages of near-surface air temperature, daily-minimum near-surface air temperature, daily-maximum near-surface air temperature, and surface precipitation.

2.3. Observation and Reanalysis Datasets

Two observation datasets and two reanalysis datasets are employed in this study to evaluate GCMs and are listed in Table 2. The two observation datasets include the University of Delaware Air Temperature and Precipitation (UD) version 3.01 [20] and the University of East Anglia Climatic Research Unit (CRU) TS3.10.01 [21]. The two reanalysis datasets are the National Centers for Environmental Prediction (NCEP)-National Center for Atmospheric Research (NCAR) 40-Year Reanalysis [22], which will be later called NCEP and the European Center for Medium-Range Weather Forecasts 40-Year Reanalysis (ERA40) [23].

UD global monthly temperature and precipitation data are on a 0.5° × 0.5° grid and are available for years 1901–2010. UD is produced using observations from the Global Historical Climate Network and the archive of Legates and Willmott. CRU global monthly temperature and precipitation data are on a 0.5° × 0.5° grid and are available for years 1901–2009. CRU is produced using observations from the National Meteorological Services and other external agents. Both UD and CRU are available over land only.

ERA40 monthly reanalysis is produced using a data assimilation system employing many sources of observations including radiosondes, balloons, aircraft, buoys, satellites, and scatterometers. It is available on a 2.5° × 2.5° grid for 45 years from 1957 to 2002. NCEP monthly reanalysis is produced using a data assimilation system employing many sources of observations including land surface measurements, ships, rawinsondes, pibals, aircraft, satellites, and other data. It is on a 1.9° × 1.9° grid and is available from 1948 to 2012. NCEP and ERA40 are available for both land and sea.

Observation datasets differ among themselves because of the different original observations and methods employed. Although NCEP and ERA40 reanalysis datasets are produced using numerical models with observation assimilation, several studies have employed them for evaluating historical climate simulations [11, 12, 24–26]. Figure 2 compares average annual temperatures (°C) for years 1960–1999 of CRU, UD, NCEP, and ERA40. It shows obvious differences among all datasets both in terms of value and resolution. Although CRU and UD are both observations, they are significantly different. To evaluate GCM temperature simulations, averages of CRU, UD, NCEP, and ERA40 temperature are employed. Since only CRU and NCEP provide monthly averages of daily-minimum and daily-maximum near-surface air temperature, their averages are employed for these parameters. Figure 3 compares average annual precipitation (mm) for years 1960–1999 of CRU, UD, NCEP, and ERA40. Precipitation from these datasets is also obviously different. Since ERA40 precipitation is significantly lower than the others, it is not employed in this study. To evaluate GCM precipitation simulations, averages of CRU, UD, and NCEP precipitation are employed.

2.4. Performance Metrics

Several performance metrics for evaluating the performances of GCMs have been proposed. Most performance metrics employed in this study are from [12] and [27], where the root mean squared error (RMSE) is added. The performance metrics and time periods used for computing each performance metric are listed in Table 3.

There are 19 performance metrics employed in this study. Eleven performance metrics are computed for land only, sea only, and both land and sea for the 40-year period of 1960–1999 when UD and CRU are available. The eleven performance metrics include (1) mean annual temperature (Mean-T), (2) mean annual precipitation (Mean-P), (3) mean diurnal temperature range (MDTR-MMM), where MMM designates a season, (4) mean seasonal cycle amplitude of temperature (Season-Amp-T) defined as the temperature difference between warmest and coldest months, (5) mean seasonal cycle amplitude of precipitation (Season-Amp-P) defined as the precipitation difference between wettest and driest months, (6) correlation coefficient between simulated and observed mean temperatures (Cor-MMM-T), (7) correlation coefficient between simulated and observed mean precipitation (Cor-MMM-P), (8) standard deviation of mean temperature (STD-MMM-T), (9) standard deviation of mean precipitation (STD-MMM-P), (10) root mean squared error of mean temperature (RMSE-MMM-T), and (11) root mean squared error of mean precipitation (RMSE-MMM-P). MDTR-MMM, Cor-MMM-T, Cor-MMM-P, STD-MMM-T, STD-MMM-P, RMSE-MMM-T, and RMSE-MMM-P are computed separately for 3 seasons, including the hot season from February to April, the rainy season from May to October, and the cold season from November to January. To compute these 11 metrics, 40-year averages for individual pixels are computed first and are then averaged for all pixels. For example, to compute Mean-T, 40-year mean annual temperatures for individual pixels are computed first and are then averaged to get Mean-T.
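The two-step averaging described above (time average per pixel first, then average over pixels) can be sketched as follows. This is a minimal illustration, not the study's code: it assumes each pixel is represented by a list of 12 climatological monthly means, weights all months and pixels equally, and uses hypothetical function names and made-up values.

```python
from statistics import mean

def mean_annual(pixels):
    """Average the monthly climatology per pixel, then average across pixels."""
    per_pixel = [mean(months) for months in pixels]
    return mean(per_pixel)

def season_amp(pixels):
    """Warmest-minus-coldest month per pixel, then average across pixels."""
    per_pixel = [max(months) - min(months) for months in pixels]
    return mean(per_pixel)

# Two toy pixels, 12 monthly means each (deg C); values are made up.
toy = [
    [25.0, 26.0, 27.0, 28.0, 28.0, 27.0, 27.0, 27.0, 27.0, 26.0, 26.0, 25.0],
    [20.0, 22.0, 24.0, 26.0, 26.0, 25.0, 25.0, 25.0, 24.0, 23.0, 21.0, 19.0],
]

print(mean_annual(toy))  # Mean-T style value for the toy pixels
print(season_amp(toy))   # Season-Amp-T style value for the toy pixels
```

The same pattern applies to precipitation by replacing the temperature values with monthly precipitation totals.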

Mean-T and Mean-P are employed to evaluate the biases in model simulated temperatures and precipitation, respectively. MDTR-MMM is employed to evaluate performances of the models to simulate differences in seasonal daily-maximum and daily-minimum near-surface air temperatures and is computed using monthly averages of daily-maximum and daily-minimum near-surface air temperature provided by each GCM, CRU, and NCEP. Season-Amp-T is employed to evaluate performances of the models to simulate temperature differences between warmest and coldest months. Season-Amp-P is employed to evaluate performances of the models to simulate precipitation differences between wettest and driest months. Cor-MMM-T and Cor-MMM-P are employed to evaluate performances of the models to simulate the spatial patterns of temperature and precipitation, respectively. STD-MMM-T and STD-MMM-P are employed to evaluate performances of the models to simulate the spatial variations of temperature and precipitation, respectively. RMSE-MMM-T and RMSE-MMM-P are employed to evaluate performances of the models to simulate the values of temperature and precipitation, respectively.
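The spatial pattern, spatial variation, and value metrics can be illustrated with a short sketch. It assumes the simulated and observed seasonal means have already been placed on a common grid and flattened into paired lists of pixel values; the standard deviation is divided by that of the observations, as in the normalized values reported in the results tables. The function name, equal pixel weighting, and the toy data are assumptions made for illustration.

```python
import math

def spatial_stats(sim, obs):
    """Return (correlation, normalized std, RMSE) over paired pixel values."""
    n = len(obs)
    ms, mo = sum(sim) / n, sum(obs) / n
    cov = sum((s - ms) * (o - mo) for s, o in zip(sim, obs)) / n
    sd_s = math.sqrt(sum((s - ms) ** 2 for s in sim) / n)
    sd_o = math.sqrt(sum((o - mo) ** 2 for o in obs) / n)
    rmse = math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / n)
    return cov / (sd_s * sd_o), sd_s / sd_o, rmse

obs = [24.0, 26.0, 28.0, 30.0]
sim = [23.0, 25.0, 27.0, 29.0]  # same pattern with a uniform 1 deg C cold bias
cc, nstd, rmse = spatial_stats(sim, obs)
print(cc, nstd, rmse)
```

In this toy case the pattern and spread match the observations, so only the RMSE picks up the uniform bias, which is why the three metrics are reported together.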

The other eight performance metrics evaluate the long-term performance of simulated climatological temperature and precipitation for the 99-year period of 1901–1999. Due to the availability of observations, they are computed for land only and include (1) variance of annual average temperature (Var-T) defined as the variance of 99-year annual average temperatures, (2) coefficient of variation of annual precipitation (CV-P) defined as the mean-normalized standard deviation of 99-year annual precipitation, (3) root mean squared error of annual average temperature (RMSE-T), (4) root mean squared error of annual precipitation (RMSE-P), (5) linear trend of annual average temperature (Trend-T) defined as the slope of the best-fit line for the time series of 99-year annual average temperature, (6) linear trend of annual precipitation (Trend-P) defined as the slope of the best-fit line for the time series of 99-year annual precipitation, (7) correlation coefficient of the cold-season mean temperature and Niño3.4 index (ENSO-T), and (8) correlation coefficient of the cold-season precipitation and Niño3.4 index (ENSO-P). To compute these 8 metrics, the values for individual pixels for the 99 years are computed first. Then, values for all pixels are averaged. For example, to compute Trend-T, the slope of the line that best fits the 99 annual average temperature values for each pixel is first computed. The slopes for all pixels in the study area are then averaged to get Trend-T.
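A minimal sketch of the per-pixel computation for Trend-T and CV-P follows. It assumes each pixel holds a list of annual values in chronological order, fits the slope by ordinary least squares against the year index, and uses the population standard deviation for the coefficient of variation. The source does not state which estimator was used, so these choices, the function names, and the toy values are assumptions.

```python
from statistics import mean, pstdev

def ols_slope(values):
    """Least-squares slope of values against their index (units per step)."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den

def trend_metric(pixel_series):
    """Slope per pixel first, then the average of the slopes."""
    return mean(ols_slope(series) for series in pixel_series)

def cv_metric(pixel_series):
    """Coefficient of variation (std / mean) per pixel, then averaged."""
    return mean(pstdev(s) / mean(s) for s in pixel_series)

# Two toy pixels with 5 annual values each (made-up numbers).
toy = [[26.0, 26.1, 26.2, 26.3, 26.4], [20.0, 20.3, 20.6, 20.9, 21.2]]
print(trend_metric(toy))  # slope per year; multiply by 100 for a per-century rate
print(cv_metric(toy))
```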

Var-T and CV-P are employed to evaluate performances of the models to simulate the 99-year variations of temperatures and precipitation, respectively. RMSE-T and RMSE-P are employed to evaluate performances of the models to simulate the values of temperature and precipitation, respectively. Trend-T and Trend-P are employed to evaluate performances of the models to simulate the 99-year trends of temperature and precipitation, respectively. Since the Niño3.4 index measures anomalies of sea surface temperatures in the east-central tropical Pacific, ENSO-T is employed to evaluate performances of the models to simulate the linear relationships between anomalies of sea surface temperatures in the east-central tropical Pacific and temperatures in the study area. ENSO-P is employed to evaluate performances of the models to simulate the linear relationships between anomalies of sea surface temperatures in the east-central tropical Pacific and precipitation in the study area.
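The ENSO-T and ENSO-P metrics can be sketched in the same per-pixel style: a Pearson correlation is computed between each pixel's cold-season time series and the Niño3.4 index, and the correlations are then averaged over pixels. The function names and the toy index and series below are hypothetical, and any detrending or anomaly processing the study may have applied is not represented.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def enso_metric(pixel_series, nino34):
    """Correlate each pixel's cold-season series with the index, then average."""
    return sum(pearson(s, nino34) for s in pixel_series) / len(pixel_series)

nino34 = [-1.0, 0.0, 1.0, 2.0]     # made-up index values, one per year
toy = [
    [25.0, 25.5, 26.0, 26.5],      # rises with the index
    [27.0, 26.0, 25.0, 24.0],      # falls with the index
]
print(enso_metric(toy, nino34))
```

The toy case shows why maps of the per-pixel correlations are informative alongside the area average: opposite-signed pixels can cancel in the mean.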

2.5. Evaluation Method

Since spatial resolutions and grid locations of different GCMs and observation and reanalysis datasets are different, they are bilinearly interpolated into the same 0.15° × 0.15° grid covering the study area. To evaluate each performance metric, averages of observation and reanalysis datasets listed in Table 3 are employed.
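The regridding step can be illustrated with a small bilinear interpolation routine. This is a sketch under simplifying assumptions: the source grid is regular with ascending latitude and longitude axes, target points lie inside the source grid, and there are no missing values or land-sea masks. The function names and the 2 × 2 toy field are hypothetical; the study's target grid spacing is 0.15°, whereas the example uses a coarser spacing for readability.

```python
def bilinear(grid, lats, lons, lat, lon):
    """Bilinearly interpolate grid[i][j] (at lats[i], lons[j]) to (lat, lon).

    lats and lons must be ascending; the target must lie inside the grid.
    """
    # Index of the grid cell whose lower edges are at or below the target.
    i = max(k for k in range(len(lats) - 1) if lats[k] <= lat)
    j = max(k for k in range(len(lons) - 1) if lons[k] <= lon)
    ty = (lat - lats[i]) / (lats[i + 1] - lats[i])
    tx = (lon - lons[j]) / (lons[j + 1] - lons[j])
    south = grid[i][j] * (1 - tx) + grid[i][j + 1] * tx
    north = grid[i + 1][j] * (1 - tx) + grid[i + 1][j + 1] * tx
    return south * (1 - ty) + north * ty

def regrid(grid, lats, lons, new_lats, new_lons):
    """Interpolate a coarse field onto every point of a finer target grid."""
    return [[bilinear(grid, lats, lons, la, lo) for lo in new_lons]
            for la in new_lats]

lats, lons = [0.0, 2.5], [100.0, 102.5]
grid = [[26.0, 27.0], [24.0, 25.0]]   # made-up temperatures (deg C)
fine = regrid(grid, lats, lons, [0.0, 1.25, 2.5], [100.0, 101.25, 102.5])
print(fine[1][1])  # value at the cell centre
```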

For each of the 19 performance metrics listed in Table 3, the absolute error $A_{i,j}$ for performance metric $i$ and global climate model $j$ is computed as $A_{i,j} = |O_i - S_{i,j}|$, where $O_i$ and $S_{i,j}$ are the performance metric $i$ of observations and the simulated performance metric $i$ of the global climate model $j$, respectively. Due to the different magnitude scales of different performance metrics, the relative error for each performance metric $i$ and each global climate model $j$, i.e., $R_{i,j} = (A_{i,j} - A_{i,\min})/(A_{i,\max} - A_{i,\min})$, is used, where $A_{i,\min}$ and $A_{i,\max}$ are the minimum and maximum absolute errors over all models for performance metric $i$, respectively. The total error of a global climate model $j$ is computed as $\sum_{i=1}^{n} R_{i,j}$, where $n$ is the total number of performance metrics.
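The scoring scheme can be sketched in a few lines, taking the absolute error as the magnitude of the observed-minus-simulated difference, rescaling it per metric with the minimum and maximum over all models, and summing the relative errors per model. The dictionary layout, the model names, the metric values, and the guard for the case where all models have the same error are assumptions made for illustration.

```python
def total_errors(obs, sims):
    """obs: {metric: value}; sims: {model: {metric: value}} -> {model: total}."""
    totals = {model: 0.0 for model in sims}
    for metric, o in obs.items():
        abs_err = {m: abs(o - sims[m][metric]) for m in sims}
        lo, hi = min(abs_err.values()), max(abs_err.values())
        for m in sims:
            # Min-max scaling: best model gets 0, worst gets 1 on this metric.
            # If every model has the same error, contribute 0 (an assumption).
            totals[m] += (abs_err[m] - lo) / (hi - lo) if hi > lo else 0.0
    return totals

obs = {"Mean-T": 26.0, "Mean-P": 2000.0}          # made-up reference values
sims = {
    "model_a": {"Mean-T": 25.5, "Mean-P": 2100.0},
    "model_b": {"Mean-T": 25.0, "Mean-P": 2300.0},
    "model_c": {"Mean-T": 27.5, "Mean-P": 2200.0},
}
totals = total_errors(obs, sims)
print(sorted(totals, key=totals.get))  # models ordered from best to worst
```

Because each relative error lies between 0 and 1, every metric contributes equally to the total regardless of its physical units, and the score of a model depends on which other models are in the comparison.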

3. Results

3.1. Performances of 40 Global Climate Models

Figure 4 shows relative errors for all performance metrics for the 40 GCMs. The GCMs are listed on the left of the figure in the order of lowest to highest total errors from top to bottom, respectively. Relative errors are very different for different GCMs. Each GCM has mixed results for different performance metrics. Comparison of relative errors of 40 GCMs for all performance metrics obviously shows that CNRM-CM5-2 performs best.

Figure 5 shows total errors of the 40 GCMs computed using all performance metrics. Total errors of different GCMs vary considerably. This emphasizes the need for this study to find GCMs that can provide useful climatological temperature and precipitation for Southeast Asia. The six GCMs that perform best are CNRM-CM5-2, CNRM-CM5, BNU-ESM, CESM1-BGC, CESM-CAM5, and CCSM4, respectively. GISS-E2-R performs worst, where its total error is ∼3.25 times higher than that of CNRM-CM5-2.

GCMs are further evaluated for 4 different categories, including temperature only, precipitation only, land only, and sea only, where only performance metrics for each category are considered. The six best GCMs for each category are listed in Table 4. The numbers shown in Table 4 are total errors for different categories. Results show that each GCM performs differently for different categories. For example, although CNRM-CM5 performs best and second best for sea and temperature, respectively, it performs sixth for precipitation. CCSM4 performs second best for precipitation, but it performs sixth for sea and worse than sixth for temperature and land. CNRM-CM5-2 is the only GCM that is in the top three for all categories. The best GCMs for temperature, precipitation, land, and sea are CNRM-CM5-2, CESM1-BGC, CNRM-CM5-2, and CNRM-CM5, respectively. CNRM-CM5 is the second best for temperature, and its total error is close to that of CNRM-CM5-2. The performances of the top six GCMs for simulations over sea are not that different. When all categories are considered, the total error of the second best GCM, i.e., CNRM-CM5, is only 7.44% higher than that of CNRM-CM5-2, but then the total error of the third best jumps to 24.17% higher than that of CNRM-CM5-2.

Since there is no single GCM that performs best for all categories, overall performance of an ensemble average of different GCMs that perform well in each category could be better than that of a single GCM. It is observed from Table 4 that all top two GCMs for individual categories are within top six when results for all categories are combined. Hence, the next three sections compare the performance of the single best GCM, i.e., CNRM-CM5-2, with those of the ensemble average of the six best GCMs based on the total error, which will be called 6-GCM-Ensemble, and the ensemble average of all 40 GCMs, which will be called 40-GCM-Ensemble, for temperature, precipitation, and overall simulations, respectively.
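Building an ensemble from the best-ranked models can be sketched as follows: models are sorted by total error, the first k are kept, and their fields are averaged pixel by pixel with equal weights. The function names, model names, and values are hypothetical, and the fields are assumed to be flattened onto a common grid already.

```python
def ensemble_mean(fields):
    """Pixel-wise average of equally sized flat fields, one per model."""
    return [sum(vals) / len(vals) for vals in zip(*fields)]

def best_k_ensemble(total_error, fields, k):
    """Average the fields of the k models with the lowest total error."""
    best = sorted(total_error, key=total_error.get)[:k]
    return best, ensemble_mean([fields[m] for m in best])

total_error = {"model_a": 10.0, "model_b": 14.0, "model_c": 12.0}
fields = {
    "model_a": [26.0, 27.0],
    "model_b": [20.0, 21.0],
    "model_c": [28.0, 29.0],
}
best, avg = best_k_ensemble(total_error, fields, k=2)
print(best, avg)
```

Setting k to the number of available models gives the all-model ensemble with the same code.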

3.2. Performances of CNRM-CM5-2 and GCM Ensembles for Simulating Temperature

Figure 6 compares the mean annual temperatures (Mean-Ts) (°C) for years 1960–1999 of observations, CNRM-CM5-2, 6-GCM-Ensemble, and 40-GCM-Ensemble. They agree well with observations both in terms of temperature values and patterns. They all show temperature gradient from south to north of the mainland and lower temperature for highly elevated areas. All models are biased slightly lower than observations, where the mean errors (MEs; E[model–observations]) of CNRM-CM5-2, 6-GCM-Ensemble, and 40-GCM-Ensemble are −0.56, −0.33, and −0.18°C, respectively. Historical temperature simulations by averaging all 40 GCMs are least biased.

The mean diurnal temperature ranges (MDTRs) (°C) of observations, CNRM-CM5-2, 6-GCM-Ensemble, and 40-GCM-Ensemble are 3.10, 3.02, 2.69, and 2.61°C, respectively, for the hot season, are 2.64, 2.43, 2.07, and 2.12°C, respectively, for the rainy season, and are 2.69, 2.77, 2.42, and 2.37°C, respectively, for the cold season. Results show that CNRM-CM5-2 performs best and is the best to provide the information about daily temperature variation for all seasons.

Figure 7 compares the mean seasonal cycle amplitudes of temperature (Season-Amp-Ts) (°C) for years 1960–1999 of observations, CNRM-CM5-2, 6-GCM-Ensemble, and 40-GCM-Ensemble. Overall Season-Amp-T values and patterns of all models and observations agree well. The main difference is over the northern part of the mainland, where Season-Amp-T of observations is lower than that of all models. When the mean of Season-Amp-T for all pixels in the study area is computed, Season-Amp-T for observations, CNRM-CM5-2, 6-GCM-Ensemble, and 40-GCM-Ensemble are 2.20, 2.22, 2.23, and 2.19°C, respectively. All models perform comparably well in providing the temperature difference between warmest and coldest months.

Figure 8 compares average seasonal temperatures for years 1960–1999 of observations, CNRM-CM5-2, 6-GCM-Ensemble, and 40-GCM-Ensemble separately for hot (FMA), rainy (MJJASO), and cold (NDJ) seasons. Temperature values and patterns of all models agree well with observations. Table 5 shows correlation coefficients (CCs) between observations and model simulations, standard deviations (STDs) normalized by the standard deviation of observations, and root mean squared errors (RMSEs) of average seasonal temperature for the three seasons for years 1960–1999 of CNRM-CM5-2, 6-GCM-Ensemble, and 40-GCM-Ensemble. Bold face shows the best value for each performance metric. CCs of all models are high and are almost the same for all seasons. Simulated seasonal temperatures of all models are strongly correlated with observations. Since STD presented in Table 5 is normalized by the standard deviation of observations, the number closest to 1.0 is the best. When the three seasons are considered, 40-GCM-Ensemble performs best in terms of STD, although CNRM-CM5-2 has the best STD for the hot season. 40-GCM-Ensemble also has the lowest RMSEs for all seasons and is followed by 6-GCM-Ensemble and CNRM-CM5-2, respectively.

There are 4 performance metrics employed for evaluating long-term temperature simulations for years 1901–1999. The 99-year standard deviations of annual average temperature (STD-Ts) for observations, CNRM-CM5-2, 6-GCM-Ensemble, and 40-GCM-Ensemble are 0.31, 0.34, 0.22, and 0.16, respectively. CNRM-CM5-2 agrees best with observations and is followed by 6-GCM-Ensemble and 40-GCM-Ensemble, respectively. The 99-year root mean squared errors of annual average temperature (RMSE-T) (°C) of CNRM-CM5-2, 6-GCM-Ensemble, and 40-GCM-Ensemble are not much different and are 1.56, 1.46, and 1.44°C, respectively. The 99-year linear trends of annual average temperature (Trend-Ts) (°C century−1) of observations, CNRM-CM5-2, 6-GCM-Ensemble, and 40-GCM-Ensemble are 0.16, 0.34, 0.55, and 0.45, respectively. CNRM-CM5-2 is the best to provide the rate of change in long-term annual average temperatures in the study area.

Figure 9 compares the correlation coefficients of cold-season temperature and Niño3.4 index (ENSO-Ts) for years 1901–1999 of observations, CNRM-CM5-2, 6-GCM-Ensemble, and 40-GCM-Ensemble. The results show that CNRM-CM5-2 agrees best with observations and is the best to tell how well anomalies of sea surface temperatures in the east-central tropical Pacific represented by the Niño3.4 index are correlated with temperatures in the study area. When ENSO-Ts are averaged for all pixels in the study area, they are 0.28, 0.19, 0.40, and 0.69 for observations, CNRM-CM5-2, 6-GCM-Ensemble, and 40-GCM-Ensemble, respectively. Although ENSO-T of 6-GCM-Ensemble agrees well with that of observations over the mainland, it is obviously higher over archipelagoes to the south and east of the mainland, particularly for Papua New Guinea. ENSO-T of 40-GCM-Ensemble is significantly higher than that of observations for most of the study area.

3.3. Performances of CNRM-CM5-2 and GCM Ensembles for Simulating Precipitation

Figure 10 compares the mean annual precipitation (Mean-P) (mm·y−1) for years 1960–1999 of observations, CNRM-CM5-2, 6-GCM-Ensemble, and 40-GCM-Ensemble. Overall precipitation patterns of all models agree well with observations. CNRM-CM5-2 and 6-GCM-Ensemble obviously have higher precipitation over high mountains in Papua New Guinea. MEs (E[model–observations]) of CNRM-CM5-2, 6-GCM-Ensemble, and 40-GCM-Ensemble are 158.83, 167.77, and 277.57 mm·y−1, respectively. Although CNRM-CM5-2 simulated mean annual precipitation has some wavy patterns, it is obviously the least biased.

Figure 11 compares the mean seasonal cycle amplitudes of precipitation (Season-Amp-Ps) (%) for years 1960–1999 of observations, CNRM-CM5-2, 6-GCM-Ensemble, and 40-GCM-Ensemble. Season-Amp-P is calculated as percentage of the mean annual precipitation. Overall Season-Amp-P values and patterns of all models and observations agree well. When the mean of Season-Amp-P for all pixels in the study area is computed for each model and observations, Season-Amp-Ps for observations, CNRM-CM5-2, 6-GCM-Ensemble, and 40-GCM-Ensemble are 4.10, 2.53, 3.51, and 3.97%, respectively. 40-GCM-Ensemble agrees best with observations and is the best to provide difference in precipitation in wettest and driest months.

Figure 12 compares average seasonal precipitation for years 1960–1999 of observations, CNRM-CM5-2, 6-GCM-Ensemble, and 40-GCM-Ensemble for hot (FMA), rainy (MJJASO), and cold (NDJ) seasons. Overall seasonal precipitation values and patterns of observations and simulations of all models agree well. The main discrepancies for the average seasonal precipitation in the hot season include the following: (1) all models have higher precipitation than observations over high mountains in Papua New Guinea, and (2) 40-GCM-Ensemble has obviously higher precipitation over the areas in the lower half of the figure than other models and observations. The main discrepancies for the average seasonal precipitation in the rainy season include the following: (1) observations have higher precipitation than all models along the west coast of the mainland, and (2) CNRM-CM5-2 has higher precipitation than observations over the sea south of the mainland. The main discrepancy for the average seasonal precipitation in the cold season is that CNRM-CM5-2 and 6-GCM-Ensemble have higher precipitation over high mountains in Papua New Guinea. CNRM-CM5-2 has wavy patterns for all three seasons.

Table 6 shows CCs between observations and model simulations, STDs normalized by the standard deviation of observations, which means that the best is the closest to 1.0, and RMSEs of average seasonal precipitation for hot (FMA), rainy (MJJASO), and cold (NDJ) seasons for years 1960–1999 of CNRM-CM5-2, 6-GCM-Ensemble, and 40-GCM-Ensemble. All models are highly correlated with observations for all seasons. CCs of 6-GCM-Ensemble are the highest for all seasons and are very close to those of 40-GCM-Ensemble. When the three seasons are considered, 6-GCM-Ensemble performs best in terms of STD, although CNRM-CM5-2 has the best STD for the rainy season. 6-GCM-Ensemble obviously has the lowest RMSEs for all seasons and is followed by 40-GCM-Ensemble and CNRM-CM5-2, respectively.

There are 4 performance metrics for evaluating long-term performance of precipitation simulations for years 1901–1999. The 99-year coefficients of variation of annual precipitation (CV-P) of observations, CNRM-CM5-2, 6-GCM-Ensemble, and 40-GCM-Ensemble are 0.13, 0.14, 0.05, and 0.02, respectively. CNRM-CM5-2 performs best, and its CV-P almost equals that of observations. The 99-year root mean squared errors of annual precipitation (RMSE-Ps) (mm) of CNRM-CM5-2, 6-GCM-Ensemble, and 40-GCM-Ensemble are 246.03, 590.37, and 548.92 mm, respectively. CNRM-CM5-2 performs much better than 6-GCM-Ensemble and 40-GCM-Ensemble. The 99-year linear trends of annual precipitation (Trend-P) (% century−1) of observations, CNRM-CM5-2, 6-GCM-Ensemble, and 40-GCM-Ensemble are 5.26, 4.17, 3.70, and 2.16, respectively. CNRM-CM5-2 is the best to provide the rate of change in long-term annual precipitation in the study area.

Figure 13 compares the correlation coefficients of cold-season precipitation and Niño3.4 index (ENSO-Ps) for years 1901–1999 of observations, CNRM-CM5-2, 6-GCM-Ensemble, and 40-GCM-Ensemble. Results show that CNRM-CM5-2 agrees best with observations and is the best to tell how well anomalies of sea surface temperatures in the east-central tropical Pacific represented by the Niño3.4 index are correlated with cold-season precipitation in the study area. When ENSO-Ps are averaged for all pixels in the study area, they are −0.10, 0.19, −0.20, and 0.69 for observations, CNRM-CM5-2, 6-GCM-Ensemble, and 40-GCM-Ensemble, respectively. ENSO-P of 6-GCM-Ensemble is obviously lower than that of observations for the lower part of the mainland and is obviously higher than that of observations over archipelagoes to the south and east of the mainland. ENSO-P of 40-GCM-Ensemble is significantly higher than that of observations for all of the study area.

3.4. Overall Performances of CNRM-CM5-2 and GCM Ensembles

This section evaluates the overall performances of CNRM-CM5-2, 6-GCM-Ensemble, and 40-GCM-Ensemble. Figure 4 also shows the relative errors of 6-GCM-Ensemble and 40-GCM-Ensemble for each performance metric. When all performance metrics are considered, 6-GCM-Ensemble performs best and is followed by CNRM-CM5-2 and 40-GCM-Ensemble, respectively. Although 40-GCM-Ensemble performs worst among the three, it performs better than all of the remaining single GCMs.

The performances of CNRM-CM5-2, 6-GCM-Ensemble, and 40-GCM-Ensemble for temperature-only, precipitation-only, land-only, and sea-only categories are compared in Table 7. Numbers in the table are total errors for different categories. The best models for temperature-only, precipitation-only, land-only, and sea-only categories are 40-GCM-Ensemble, 6-GCM-Ensemble, CNRM-CM5-2, and 6-GCM-Ensemble, respectively. When all categories are considered, 6-GCM-Ensemble performs best, and overall total errors of CNRM-CM5-2 and 40-GCM-Ensemble are 15.63 and 19.00% higher than that of 6-GCM-Ensemble, respectively. The performance of 6-GCM-Ensemble for temperature simulations is close to that of 40-GCM-Ensemble, as the total error of 6-GCM-Ensemble is only 3.12% higher.
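The percentage differences quoted above follow directly from the overall total errors reported in the abstract (11.84, 13.69, and 14.09). The short check below reproduces them; the variable names are illustrative only.

```python
# Overall total errors as reported in the abstract.
total_error = {
    "6-GCM-Ensemble": 11.84,
    "CNRM-CM5-2": 13.69,
    "40-GCM-Ensemble": 14.09,
}

# The best model is the one with the lowest total error.
best = min(total_error, key=total_error.get)

# Percentage by which each total error exceeds that of the best model.
pct_higher = {
    name: 100.0 * (err - total_error[best]) / total_error[best]
    for name, err in total_error.items()
}

for name, pct in pct_higher.items():
    print(f"{name}: {pct:.2f}% higher than {best}")
```

Note that the reference in each percentage is the total error of 6-GCM-Ensemble, so the figures describe how much larger the other two total errors are, not how much smaller the error of 6-GCM-Ensemble is.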

4. Summary and Conclusion

The performances for simulating climatological temperature and precipitation for Southeast Asia of 40 CMIP5 GCMs are evaluated using observation and reanalysis datasets for both land and sea for the 40-year period of 1960–1999 and for land for the 99-year period of 1901–1999. Nineteen different performance metrics are employed, where the sum of relative errors of all performance metrics is used to evaluate each GCM. Results are also subdivided into 4 different categories, including temperature only, precipitation only, land only, and sea only. Since averaging different GCMs could improve the simulation performance, the performance of the best GCM is compared with those of the ensemble averages of the 6 best GCMs called 6-GCM-Ensemble and the ensemble averages of all 40 GCMs called 40-GCM-Ensemble.
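The construction of an ensemble average from the best-ranked GCMs can be sketched as follows. This is a minimal illustration under stated assumptions: members are weighted equally, all fields are already on a common grid, and the function name `ensemble_mean` and the synthetic numbers are hypothetical.

```python
import numpy as np

def ensemble_mean(fields, total_errors, k=None):
    """Average the k members with the lowest total error.

    fields: array of shape (n_models, ...), one field per GCM on a common grid
    total_errors: array of shape (n_models,), sum of relative errors per GCM
    k: number of best members to average (all members if None)
    Returns (mean field, indices of the members used, best first).
    Assumption: equal weights for all selected members.
    """
    fields = np.asarray(fields, dtype=float)
    total_errors = np.asarray(total_errors, dtype=float)
    order = np.argsort(total_errors)   # lowest total error first
    chosen = order if k is None else order[:k]
    return fields[chosen].mean(axis=0), chosen

# Three synthetic "models" on a two-pixel grid, with made-up total errors.
fields = np.array([[10.0, 20.0],
                   [30.0, 40.0],
                   [14.0, 24.0]])
total_errors = np.array([1.0, 3.0, 2.0])

best2_mean, used = ensemble_mean(fields, total_errors, k=2)
all_mean, _ = ensemble_mean(fields, total_errors)
print(used, best2_mean, all_mean)
```

With `k=6` and forty members this corresponds to the idea behind 6-GCM-Ensemble, and with `k=None` to that behind 40-GCM-Ensemble.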

The performances of the 40 GCMs differ greatly: the total error of the worst GCM is ∼3.25 times higher than that of the best GCM. This emphasizes the need for this study to find GCMs that can provide useful climate simulations for Southeast Asia. When all performance metrics are considered, CNRM-CM5-2 has the lowest total error among all 40 GCMs. Although no GCM performs best for all categories, CNRM-CM5-2 is the only GCM that is in the top three for all categories. The top two GCMs for each category are within the top six when all categories are considered.

Comparisons of CNRM-CM5-2, 6-GCM-Ensemble, and 40-GCM-Ensemble show that when all categories are combined, 6-GCM-Ensemble performs best and is followed by CNRM-CM5-2 and 40-GCM-Ensemble, respectively. The total errors of CNRM-CM5-2 and 40-GCM-Ensemble are 15.63 and 19.00% higher than that of 6-GCM-Ensemble, respectively. The 40-GCM-Ensemble, 6-GCM-Ensemble, CNRM-CM5-2, and 6-GCM-Ensemble perform best for temperature-only, precipitation-only, land-only, and sea-only categories, respectively. Although 6-GCM-Ensemble performs second best for temperature simulations, its total error is only 3.12% higher than that of 40-GCM-Ensemble.

Detailed comparisons of 6-GCM-Ensemble and CNRM-CM5-2 simulations with observations for each performance metric show that their simulations agree well with observations. Results in this study lead to conclusions different from those of the previous study [16], which evaluates only 10 of the 40 CMIP5 GCMs and focuses only on precipitation for the relatively short period of 1986–2005. Although results from [16] show that no model performs well for climatological precipitation simulations in Southeast Asia, five of the six best GCMs found in this study are not evaluated in [16].

This study finds that 6-GCM-Ensemble and CNRM-CM5-2 can provide useful simulated climatological temperature and precipitation for Southeast Asia. This suggests the use of 6-GCM-Ensemble and CNRM-CM5-2 for climate simulations and projections for Southeast Asia. There is a tradeoff to be considered between using 6-GCM-Ensemble and CNRM-CM5-2. Although 6-GCM-Ensemble is more accurate (the total error of CNRM-CM5-2 is 15.63% higher), using the averages of 6 GCMs involves ∼6 times as much data as using a single GCM and hence requires more time and computational resources, particularly for complex applications of these models, e.g., the use of GCM outputs as inputs for mesoscale models for dynamical downscaling in order to obtain climate simulations and projections at high resolution [28–31].

Data Availability

CMIP5 data employed in this study are available at the Program for Climate Model Diagnosis and Intercomparison (PCMDI) website (http://pcmdi9.llnl.gov/). The observation and reanalysis datasets are publicly available.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was supported by the Interdisciplinary Graduate School of Earth System Science and Andaman Natural Disaster Management of the Prince of Songkla University, Phuket Campus, Thailand.