The explosion of data in the information age has provided an opportunity to explore the possibility of characterizing the climate patterns using data mining techniques. Nigeria has a unique tropical climate with two precipitation regimes: low precipitation in the north leading to aridity and desertification and high precipitation in parts of the southwest and southeast leading to large scale flooding. In this research, four indices have been used to characterize the intensity, frequency, and amount of rainfall over Nigeria. A type of Artificial Neural Network called the self-organizing map has been used to reduce the multiplicity of dimensions and produce four unique zones characterizing extreme precipitation conditions in Nigeria. This approach allowed for the assessment of spatial and temporal patterns in extreme precipitation in the last three decades. Precipitation properties in each cluster are discussed. The cluster closest to the Atlantic has high values of precipitation intensity, frequency, and duration, whereas the cluster closest to the Sahara Desert has low values. A significant increasing trend has been observed in the frequency of rainy days at the center of the northern region of Nigeria.

1. Introduction

One of the visible impacts of climate change and climate variability is the extreme weather events that occur from time to time in several parts of the globe. A climate extreme is “the occurrence of a value of a weather or climate variable above (or below) a threshold value near the upper (or lower) ends of the range of observed values of the variable” [1]. Climate extremes can result in changes in the frequency, intensity, spatial extent, duration, and timing of climatic phenomena. There is a growing global concern that anthropogenic activities are a major cause of the variability in the intensity and frequency of weather and climate extremes [2–4]. However, it can also be argued that climate extremes are simply part of decadal global climate cycles and variability [5].

Climate extremes, including the ones related to precipitation, can be analyzed using several approaches [6–8]. The use of indices to characterize the frequency, intensity, and duration of precipitation extremes is one of the ways to assess them [9–12]. The joint working group CCI/CLIVAR/JCOMM Expert Team on Climate Change Detection and Indices (ETCCDI) (http://etccdi.pacificclimate.org/index.shtml) has 27 standardized and recommended climate extreme indices [2, 13–15]. This set of indices includes both temperature and precipitation indices.

Precipitation is one of the most important climatic parameters responsible for flood and drought in several vulnerable parts of the world. According to [1], there is a high chance that the frequency of heavy precipitation and total precipitation will increase in several parts of the globe in the 21st century. Evidence of climate change in Nigeria has been discussed by [16], who reported increasing damage caused by wind and rainstorms, which are projected to increase in southern regions [17]. The northern part of the country has been experiencing a reduction in rainfall and an increase in the rates of dryness and heat [18, 19], while the rainfall amounts have been increasing in the southern part with an irregular pattern [19, 20]. Moreover, climate projections for the 21st century show a significant increase of temperature over all the ecological zones [17] which may have negative impacts on agriculture and food security. Reference [21] evaluated how crop yield might respond to climate change in Nigeria, and [22] discussed the challenges of agricultural adaptation to climate change.

Proper study of climate extremes depends on the quality and quantity of data, as well as on how rigorously they are analyzed [1]. When dealing with precipitation extremes, it is important to consider their synoptic aspects and dimensions. The recent increase in data from ground observations, satellites, and numerical models provides an opportunity to explore the use of data mining techniques in climatic studies [6]. One such technique is Cluster Analysis (CA), which has been more widely adopted for gaining insight from geographical data since the 1990s [23]. Several attempts have been made in the past to characterize extreme climate using machine learning methods. Self-Organizing Maps (SOM) have been successfully applied in several areas of climate research, including the analysis of atmospheric circulation variability [24], the time evolution of seasonal climate [25], climate model downscaling [26], and the assessment of the stationarity of global climate models [27].

Reference [28] studied the spatial and temporal variation of extreme weather in the Iberian Peninsula using seven different temperature and precipitation indices. These indices were analyzed using geostatistics, and Inverse Distance Weighting (IDW) maps were produced to compare the spatial and temporal surface distribution of the indices. Clustering was then performed using SOM to identify areas with similar climatic characteristics for two time periods, 1951–1980 and 1981–2010. The SOM analysis was done separately for temperature and precipitation indices, as mixing both sets of indices did not provide consistent conclusions in the Iberian Peninsula. Reference [6] also visualized extreme climate conditions using linear models (ordinary kriging and ordinary cokriging) and a nonlinear model (a three-dimensional SOM). They made use of indices calculated from precipitation data obtained between 1998 and 2000 from nineteen meteorological stations covering Madeira Island in Portugal.

Nigeria has had its fair share of climate extremes in recent times. The floods of July 10, 2011, in Lagos and of August 26, 2011, in Ibadan, and more recently the nationwide floods of 2012, all point to the extreme precipitation being experienced in the country [29]. Using nine indices derived from temperature and precipitation data, the authors of [30] studied climate extremes over Kano. The temperature data revealed a warming trend characterized by an increase in the number of warm days and warm spells. The rainfall data showed a similar increase in the amount of rainfall over the region. Other studies have used linear approaches to study temperature and rainfall trends over various parts of the country [31–33]. However, none of these studies has attempted to study their local spatial patterns. A first attempt to use data mining techniques in climatic studies of Nigeria was carried out by [34]. Using a Time Lagged Feedforward Network (TLFN) and a recurrent network, they were able to predict the future values of 8 climatic parameters.

This study, however, focuses on the geoexploration of climate extremes rather than on their prediction. Based on the framework proposed by [6], this research uses a SOM to visualize the phenomenon from a global perspective.

The atmosphere is a continuum, and SOM aids the visualization of this continuum by placing similar atmospheric states on adjacent nodes and very different atmospheric states on distant nodes. A two-dimensional sheet (the default SOM shape) has 4 vertices and hence produces an edge effect [35]; this can be addressed with cylindrical and toroidal shapes, which provide a continuous surface [36].

The objective of this research is to cluster precipitation extremes over Nigeria, characterize regions of similar precipitation patterns, and characterize their evolution over a period of 31 years. Our study area is first introduced, with emphasis on its climate, to give a background on the type of climate we are working with. The CHIRPS dataset is then introduced as the precipitation data from which the precipitation extreme indices are calculated. The methodology used for this research is explained and the results obtained are laid out. We close the paper with a conclusion.

2. Materials and Methods

2.1. Study Area

Nigeria is a country located in West Africa. It lies between 4°N and 14°N latitude and 4°E and 14°E longitude and is bordered in the south by the Atlantic Ocean and in the north by the Sahara Desert (Figure 1). This gives the country a very wide range of climatic patterns throughout the year.

Its weather system is controlled by the Intertropical Discontinuity (ITD). The ITD is the area of lowest pressure over West Africa separating the moist Southwest Monsoon from the Atlantic Ocean and the dry northeast trade winds from the Sahara Desert. Hence, the ITD can be located with the aid of a change in wind direction as well as the amount of moisture in the atmosphere. This is also given by the dew point temperature. The south of the ITD usually has a dew point temperature greater than 15°C while the north of the ITD usually has a dew point temperature lower than 15°C. The atmosphere to the south of the ITD is usually moist, aiding the formation of clouds at low altitude, including fog, while the atmosphere to the north of the ITD is usually dry and dusty, preventing the formation of clouds except altocumulus and cirrus at high level [37–39].

The movement of the ITD controls the weather systems in Nigeria. It has three unique overlapping movements throughout the year: an annual movement, which follows the path of the sun and is responsible for the seasons; a diurnal movement, consisting of a slight southward shift in the morning and a slight northward shift in the afternoon; and intermediate movements observed during the northern hemisphere winter months [37].

According to the Koppen climate classification [40], Nigeria has four climatic zones: the warm desert climate in the northeast, the warm semiarid climate in the other parts of the north, the monsoon climate in the Niger-Delta, and the tropical savannah climate in the middle belt and parts of the southwest. The main ecological zones in Nigeria are the tropical rainforest in the south, savannah in the middle belt, and semiarid zones in the north.

2.2. Data

A set of high-resolution daily reanalysis precipitation data from the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS), covering our study area from 1985 to 2015, was used (ftp://chg-ftpout.geog.ucsb.edu/pub/org/chg/products/CHIRPS-2.0/; accessed: September, 2016). CHIRPS is a quasiglobal (50°S–50°N), satellite- and observation-based precipitation estimate over land, provided as a gridded dataset at 0.05-degree resolution (about 5.5 kilometres) [41]. Reanalysis data, unlike conventional data, provide a more comprehensive view of global climatic circulation and can be used as an alternative to ground observation data [42]. The data come in the Network Common Data Form (NetCDF) format, a file format for storing multidimensional scientific data (variables) such as temperature, humidity, and rainfall, and were manipulated using Matlab®, Microsoft Excel®, and ArcGIS® software. The CHIRPS data have been specifically validated for the monitoring of climatic extremes, with good performance. For details, see [43].

This research utilized four of the precipitation indices as defined by ETCCDI (Table 1). This is because each of the indices, by itself, shows only a part of the problem [13, 14]. The four selected are able to achieve a holistic characterization of precipitation in its different perspectives [6]. By holistic characterization, we are referring to their ability to capture changes in amount, frequency, and intensity.
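As an illustration of how the four indices can be derived from a daily series, the following minimal Python sketch computes them for a single grid cell and a single year. It is not the Matlab code used in this study. The function name, the dictionary keys, and the wet-day threshold of 1 mm (the ETCCDI convention of counting days with at least 1 mm) are assumptions made for the example, and the input is assumed to contain at least five daily values.

```python
import numpy as np

WET_DAY_MM = 1.0  # assumed wet-day threshold (ETCCDI convention: >= 1 mm)


def precipitation_indices(daily_mm):
    """Compute four extreme-precipitation indices from one year of daily totals (mm).

    Hypothetical helper for illustration; assumes at least five daily values.
    """
    p = np.asarray(daily_mm, dtype=float)
    wet = p >= WET_DAY_MM
    n_wet = int(wet.sum())
    # Totals over every window of five consecutive days.
    five_day_totals = np.convolve(p, np.ones(5), mode="valid")
    return {
        "Rx1d": float(p.max()),                # maximum 1-day precipitation
        "Rx5d": float(five_day_totals.max()),  # maximum consecutive 5-day precipitation
        "R1": n_wet,                           # number of wet days
        # Mean precipitation on wet days; defined as 0 when there is no wet day.
        "SDII": float(p[wet].sum() / n_wet) if n_wet else 0.0,
    }


example = precipitation_indices([0, 2, 10, 0.5, 3, 0, 20, 0])
```

Applying such a function to every grid cell and year would yield one four-dimensional vector per location and year, which is the form of input the clustering step expects.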

2.2.1. Data Preparation and Preprocessing

The representation and quality of data are very important in determining the quality of the resulting clusters [44]. Hence, some preprocessing of the data is needed before clustering. Given that we are using gridded reanalysis data, we expect the data to be consistent and coherent, without outliers or missing values. For this research, several data preparation tasks were carried out prior to the actual analysis, including the following:

(i) Dimensionality Reduction. Out of the several temperature and precipitation indices defined by ETCCDI to characterize climate extremes, four precipitation indices (Table 1) were selected, based on previous studies by [6] and expert opinion, to characterize the duration, intensity, and frequency of precipitation over our study area.

(ii) Data Extraction. Matlab was used to calculate the four indices and extract them into Microsoft Excel tables for each year. ArcMap® was then used to clip out our area of interest (Nigeria) from the extracted dataset and export the results back to Microsoft Excel for exploratory data analysis.

(iii) Data Normalization. This is the rescaling of the values of the selected indices to a common range. This step prevents one variable from dominating all others (or allows it, if that is the aim), thus enabling the data analysis method to treat the data “fairly” [45]. Normalizing the data standardizes the scale effect each variable has on the final results. To achieve this, each index was rescaled using its minimum and maximum values, which ensured that the values of each variable range between 0 and 1.
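The normalization step can be sketched as follows. This is a minimal Python illustration of min-max scaling, x' = (x − min) / (max − min), applied column by column; the function name is hypothetical, and the handling of constant columns (mapped to 0) is an assumption added only to avoid division by zero.

```python
import numpy as np


def min_max_scale(values):
    """Rescale each column (index) of a 2-D array to the range [0, 1]."""
    x = np.asarray(values, dtype=float)
    lo = x.min(axis=0)
    hi = x.max(axis=0)
    # Assumption: a constant column has zero span, so divide by 1 and map it to 0.
    span = np.where(hi > lo, hi - lo, 1.0)
    return (x - lo) / span


# Rows are locations; columns are two indices on very different scales.
scaled = min_max_scale([[23.0, 8.1], [244.0, 271.2], [133.5, 139.65]])
```

After scaling, an index measured in days and one measured in millimetres contribute on the same footing to the Euclidean distances used by the SOM.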

2.3. Methodology

SOM are unsupervised neural networks used for clustering, dimension reduction, and visualization. This research aims to achieve these three objectives by visually showing the areas with similar precipitation extreme characteristics using the indices outlined before. SOM can map high-dimensional data onto one or two dimensions while maintaining the topology of the data structure [46]. SOM works by mapping an n-dimensional data space onto a grid of neurons. This grid is usually two-dimensional and rectangular. During training, the Euclidean distance between an input vector and every neuron is calculated, and the closest neuron is selected. This neuron is called the Best Matching Unit (BMU). The process is iterated, and a parameter called the learning rate is used to ensure that the training converges. Although no preference is given to the spatial property of our climatic data, spatial autocorrelation makes it likely that the data points sharing a BMU are geographically close, thereby creating clusters that are geographically contiguous. Compared to other clustering methods, SOM are robust even in the presence of outliers, and their results are easier to understand and interpret. They are also supported by very good visualization tools such as the GeoSOM suite [47], which is based on the SOM toolbox in Matlab and was used for clustering the climatic data. The dataset was used to train the SOM, producing several views that were explored interactively to gain insight into the data. Several parameter settings were used to initialize the SOM to obtain different models for each year; the final parameters used are given in Table 2. The model with the lowest quantization error was chosen as the best fit [48].
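To make the training loop concrete, the sketch below implements a small one-dimensional SOM in Python with NumPy. It is not the GeoSOM implementation used in this study: the function names, the linear decay of the learning rate and neighbourhood radius, the Gaussian neighbourhood function, and all default parameter values are assumptions chosen for illustration (the parameters actually used are those of Table 2).

```python
import numpy as np


def train_som(data, n_nodes=4, n_iter=2000, lr0=0.5, sigma0=1.0, seed=0):
    """Train a one-dimensional (n_nodes x 1) SOM with online updates."""
    rng = np.random.default_rng(seed)
    x = np.asarray(data, dtype=float)
    # Initialise the node weights from randomly chosen samples.
    weights = x[rng.choice(len(x), size=n_nodes, replace=False)].copy()
    positions = np.arange(n_nodes)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                   # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.01)  # neighbourhood radius shrinks
        sample = x[rng.integers(len(x))]
        # Best Matching Unit: the node whose weights are closest to the sample.
        bmu = np.argmin(np.linalg.norm(weights - sample, axis=1))
        # Gaussian neighbourhood: nodes near the BMU on the map move more.
        h = np.exp(-((positions - bmu) ** 2) / (2.0 * sigma ** 2))
        weights += lr * h[:, None] * (sample - weights)
    return weights


def pairwise_distances(data, weights):
    """Euclidean distance from every sample to every node."""
    x = np.asarray(data, dtype=float)
    return np.linalg.norm(x[:, None, :] - weights[None, :, :], axis=2)


def assign_clusters(data, weights):
    """Label each sample with the index of its BMU."""
    return pairwise_distances(data, weights).argmin(axis=1)


def quantization_error(data, weights):
    """Mean distance between each sample and its BMU."""
    return float(pairwise_distances(data, weights).min(axis=1).mean())
```

With a map of only four nodes, each node acts as a cluster centroid, so the BMU index of a location can be read directly as its cluster label. Comparing `quantization_error` across runs with different settings mirrors the model selection criterion described above.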

For this research, GeoSOM was also used to detect outliers, to carry out a sensitivity analysis of the method parameters, to analyze the U-matrix and component planes, and to perform the final clustering. After removing the outliers, a new 4 × 1 SOM was trained using the parameters in Table 2.

After clustering, the index values of the centroid of each cluster are calculated and their trend is analyzed through time. The Mann-Kendall test is used to verify if those index values exhibit a monotonic trend. The Mann-Kendall statistic is calculated as follows. Let $S$ be the number of positive differences minus the number of negative differences between data values:

$$S = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \operatorname{sgn}(x_j - x_i),$$

where $x_i$ and $x_j$ are data values, $n$ is the number of years under study, and sgn is an indicator function that takes on the values 1, 0, or −1 according to the sign of $x_j - x_i$.

A positive (negative) value of $S$ indicates an increasing (decreasing) trend. $S$ is normally distributed [49, 50] with variance given by

$$\operatorname{Var}(S) = \frac{n(n-1)(2n+5)}{18}.$$

The test statistic is given by

$$Z = \begin{cases} \dfrac{S-1}{\sqrt{\operatorname{Var}(S)}}, & S > 0, \\ 0, & S = 0, \\ \dfrac{S+1}{\sqrt{\operatorname{Var}(S)}}, & S < 0. \end{cases}$$

The significance of the trend can be verified by comparing the observed value of $Z$ with the appropriate percentiles of the standard normal distribution (critical values), for a given significance level. We used the 5% significance level to test the null hypothesis that no monotonic trend is present, against the alternative hypothesis that a (upward or downward) monotonic trend is present.
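The test can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the code used in this study; the function name and return layout are hypothetical, the variance uses the form without a correction for tied values (an assumption), and the two-sided p-value is obtained from the standard normal distribution.

```python
import math


def mann_kendall(series, alpha=0.05):
    """Two-sided Mann-Kendall trend test (no tie correction).

    Returns (S, Z, p_value, significant).
    """
    x = list(series)
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = x[j] - x[i]
            s += (diff > 0) - (diff < 0)  # sign of the difference: 1, 0, or -1
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return s, z, p_value, p_value < alpha


# A strictly increasing 10-year series: every pairwise difference is positive.
s, z, p_value, significant = mann_kendall(range(10))
```

For the strictly increasing example, all 45 pairwise differences are positive, so S = 45 and Var(S) = 125, and the null hypothesis of no monotonic trend is rejected at the 5% level.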

3. Results and Discussion

3.1. Exploratory Analysis

A total of 30155 equally spaced points spread over Nigeria were analyzed for each year. Descriptive statistics were computed for each of the four indices for each year, as well as collectively for the entire period under study (Table 3). With Nigeria sandwiched between the moist Atlantic Ocean and the dry Sahara Desert in the tropics, Table 3 shows the huge climatic disparity within the country: some parts of northern Nigeria have only 23 wet days in a year, while some areas in the south have as many as 244 wet days. Also, some areas have a maximum one-day precipitation of 8.1 mm, while other areas have a maximum one-day precipitation of 271.2 mm. Scatter plots (not shown) of the indices for each year provide evidence of a strong positive linear relationship between all indices. This linear relationship was summarized using the correlation coefficient (Table 4). The p-values associated with the correlation coefficients are all less than the significance level (0.05); hence, there is evidence of a significant linear relationship between the indices. Therefore, we can conclude that all indices are moderately correlated, thus indicating their suitability to characterize the different features of the precipitation regimes in Nigeria [6]. This means that all four indices are moderately interconnected: where we have a very high maximum 5-day precipitation, for example, we will most likely also have a high precipitation intensity, and vice versa.

The temporal variation of the mean index values was also investigated (Figure 2). It shows that 2006 had both the highest mean consecutive 5-day precipitation (R×5d index) averaged over our entire study area, with a value of 112.2 mm, and the highest maximum consecutive 5-day precipitation, with a value of 502 mm. The lowest maximum consecutive 5-day precipitation occurred in 1989, with a value of 19.1 mm. The year 2013 had the highest number of wet days, with 244 days having rainfall greater than 1 mm (R1 index), while 1987 had just 23 wet days, the lowest in our study period. The highest average intensity of rainfall on a rainy day (SDII) was recorded in 1999 as 29.1 mm/day, while the lowest average rainfall intensity was recorded in 1989 as 3.4 mm/day. Furthermore, the highest maximum 1-day precipitation (R×1d index) of 271.2 mm was observed in 2004, and the minimum was 8.1 mm in 2009.

Histograms (not shown) were also plotted for each index by year to check for moderate and heavy extreme values which could be typical values, outliers, or perhaps errors. It also helped to graphically perceive the distribution frequency of the indices. Some years exhibit a “bell-shaped” curve in the histograms, but the majority shows a positively skewed distribution.

The previous analyses are based on values averaged over the entire study region; thus, they do not reflect the unique precipitation properties of specific regions. Nigeria has a broad range of precipitation extremes in its various regions. Hence, further analyses were conducted to explore and characterize each region. Exploratory spatial data analysis was used to detect spatial patterns and formulate hypotheses based on the geography of the data. From the posting of the data points (Figure 3), it is clear that the spatial resolution of the dataset is so high (approximately 5.5 km) that the maps look like interpolated surfaces. Hence, there is no need to interpolate the data points to a surface using a linear model (e.g., ordinary kriging or cokriging). The overall trend in the study region corresponds to decreasing values from the southern part of Nigeria to the north in all indices. However, each index shows a unique pattern of variation (Figure 3).

3.2. GeoSOM Results
3.2.1. Outlier Analysis

Training was first done using a 20 × 20 hexagonal sheet SOM on all the indices for each year. Outliers were subsequently identified by searching for very high values in the U-matrix. A U-matrix is a visual representation of a SOM in which the Euclidean distance between neighbouring neurons is represented in a colour-coded image. If the image uses colours ranging from blue to red, then blue shades indicate closely spaced neurons, while red shades indicate distant neurons. Therefore, a group of blue cells can be regarded as a cluster, and red cells can be regarded as the boundaries of the cluster. In our analysis, outliers can be clearly identified as bright red spots at the upper left corner of the U-matrix (Figure 4(a)). Plotting the data on a boxplot confirms them as outliers, as they have very large values outside the quantile range of the dataset (Figure 4(b)). Mapping these data points (Figure 4(c)) shows that they mostly fall in the southeastern and southwestern regions of Nigeria, which are noted for extremely high precipitation. Hence, it can be argued that these high values are not outliers but simply extreme values that should be of interest for further analysis. However, including them can significantly affect the overall results. Hence, these outliers were removed to improve the understanding and clustering of the index values.
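The idea behind the U-matrix can be illustrated with a short Python sketch. For simplicity it assumes a rectangular grid in which each neuron has up to four neighbours, whereas the map used here is hexagonal (up to six neighbours); the function name and the choice of averaging the neighbour distances are likewise assumptions for illustration only.

```python
import numpy as np


def u_matrix(weights):
    """Mean Euclidean distance between each neuron and its grid neighbours.

    weights: array of shape (rows, cols, dim) holding the neuron weight vectors.
    Assumes a rectangular grid with 4-connectivity (a simplification).
    """
    w = np.asarray(weights, dtype=float)
    rows, cols, _ = w.shape
    total = np.zeros((rows, cols))
    count = np.zeros((rows, cols))
    # Distances between vertically adjacent neurons, credited to both ends.
    d_vert = np.linalg.norm(w[1:] - w[:-1], axis=2)
    total[1:] += d_vert
    count[1:] += 1
    total[:-1] += d_vert
    count[:-1] += 1
    # Distances between horizontally adjacent neurons, credited to both ends.
    d_horiz = np.linalg.norm(w[:, 1:] - w[:, :-1], axis=2)
    total[:, 1:] += d_horiz
    count[:, 1:] += 1
    total[:, :-1] += d_horiz
    count[:, :-1] += 1
    return total / np.maximum(count, 1)


# A 2 x 2 map with one neuron (value 10) far away from the others.
um = u_matrix([[[0.0], [1.0]], [[0.0], [10.0]]])
```

In the small example, the neuron with weight 10 receives the largest value, which is how an isolated group of extreme observations shows up as a hot spot when the matrix is colour coded.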

3.2.2. Spatial Trends

A 4 × 1 SOM was used because we wanted to obtain 4 clusters, in line with the number of Koppen climate classification zones over Nigeria. As expected, there is evidence of spatial autocorrelation in the pattern of the clusters (Figure 5). Four regions with similar precipitation characteristics are evident, varying from the south to the north through the middle belt of Nigeria. Although these clusters bear some resemblance to the Koppen climatic zones, there are still several variations in the spatial extent of each zone when both are compared. However, this is not surprising, since the Koppen classification characterizes mean climatic characteristics, whereas the clusters are based on extreme precipitation indices. For better clarity in interpretation, we have applied a colour scheme to the value of each precipitation index in tabular form, so that the colours representing high (red), mean (yellow), and low (green) values of each index can be easily identified. The table in Figure 5 shows the mean value of each index in a cluster.

Cluster 1 covers the Niger-Delta and the southeast region of the country. It is characterized by very high precipitation amount, intensity, and frequency. This is because of the southwestern trade winds that bring a lot of moisture inland from the Atlantic Ocean in the south. Because of the high moisture, this region experiences heavy and abundant monsoonal rainfall and is cloudy all year round. Hence, it has a typical tropical monsoon climate.

Cluster 2 covers the southwest and extends further inland. Although the values are not as high as in cluster 1, it is also characterized by high intensity, frequency, and amount of rainfall. This region experiences two peaks of rainfall in a year, with a short dry season in August. The first rainfall peak is usually characterized by thunderstorms and occurs in June, while the second rainfall peak is usually monsoonal and occurs around September. A portion of this cluster appears as an island in the middle belt around Jos, Kaduna, and Abuja, surrounded by cluster 3. This might be explained by the terrain of this region, which is a plateau with high elevation and hence has a semitemperate climate. It is interesting to note that it bears similar precipitation characteristics to southwestern Nigeria.

Cluster 3 covers a major part of the middle belt. Unlike cluster 2, this region exhibits just a single rainfall maximum during the rainy season. Rainfall in this region is primarily associated with thunderstorms and high wind gusts. This is the region where the south-westerlies meet the north-easterlies to create what is known as the Intertropical Discontinuity (ITD). The ITD is a region of low pressure and an important factor for Nigeria's weather and climate [51].

Cluster 4 is located predominantly in the northern part of the country. This is a dry and dusty area, with the north-easterly trade winds bringing in dry and dusty air masses from the Sahara Desert. Results show that this region experiences minimal rainfall in terms of amount, duration, frequency, and intensity (see table in Figure 5). The rainy season in this region is very short, lasting only about three months.

3.2.3. Temporal Trends

There was a tendency for the maximum 1-day precipitation, maximum 5-day precipitation, and frequency of rainy days to increase with time in clusters 1, 2, and 4. This result is consistent with a previous study done in Kano State, Nigeria, which showed an increase in maximum 5-day precipitation with time [30]. In contrast, the maximum 1-day precipitation decreased with time in cluster 3. The intensity of rainfall was constant in cluster 1 throughout the period under study, while a decreasing trend was noticed in rainfall intensity and maximum 1-day precipitation in cluster 3. However, according to the results of the Mann-Kendall test, those trends were not significant (Figure 6). The only statistically significant upward trend was found in the frequency of rainy days in cluster 4. These results are consistent with those reported for some other regions of the world [52]. Especially in the tropics, the intensity of precipitation extremes is expected to increase with a warming climate [53].

4. Conclusion

The main objective of this research was to study the spatial and temporal patterns of precipitation extremes in Nigeria. This was achieved by computing a GeoSOM with four precipitation indices computed for the period from 1985 to 2015. The four clusters created for each year summarize the precipitation dynamics that underlie the indices. The spatial extents of the clusters have some resemblance to the Koppen climatic zones, but there are some relevant differences throughout the years. We also identified a significant increasing trend in the frequency of rainy days at the centroid of cluster 4, which predominantly covers the northern part of the country. This trend significantly increases the risk of flooding in Nigeria. Floods near the coast in southern Nigeria will be exacerbated by rising sea level, with fixed infrastructure at risk. Because of these precipitation trends, the opportunities for adapting agricultural practices will be important in determining future crop yields in the region.

We have been able to identify the spatial and temporal patterns of extreme precipitation and thus gain valuable insights into the spatial and temporal dynamics of precipitation in Nigeria. However, the period covered by the dataset (31 years) is too short to adequately characterize the long-term behavior of extreme precipitation in Nigeria. Further research with a dataset covering a longer period should be pursued.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.


Acknowledgments

The work described in this paper was supported by the European Commission (EC) Education, Audiovisual and Cultural Executive Agency (EACEA) Erasmus Mundus scholarship. The authors also gratefully acknowledge the Information Management School of the Universidade Nova de Lisboa (NOVA IMS), the Department of Geoinformatics at the University of Munster, and Universitat Jaume I.