Computational Intelligence and Neuroscience

Volume 2019, Article ID 2537689, 16 pages

https://doi.org/10.1155/2019/2537689

## Arabic Sentiment Classification Using Convolutional Neural Network and Differential Evolution Algorithm

^{1}School of Computer Science and Technology, Wuhan University of Technology, 122 Luoshi Road, Wuhan, Hubei 430070, China

^{2}Department of Mathematics, Faculty of Science, Zagazig University, Zagazig, Egypt

Correspondence should be addressed to Shengwu Xiong; xiongsw@whut.edu.cn

Received 20 October 2018; Revised 18 January 2019; Accepted 30 January 2019; Published 26 February 2019

Academic Editor: Rodolfo Zunino

Copyright © 2019 Abdelghani Dahou et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

In recent years, the convolutional neural network (CNN) has attracted considerable attention owing to its impressive performance in various applications, such as Arabic sentence classification. However, building a powerful CNN for Arabic sentiment classification can be highly complicated and time consuming. In this paper, we address this problem by combining the differential evolution (DE) algorithm and CNN, where the DE algorithm is used to automatically search for the optimal configuration, including the CNN architecture and network parameters. To achieve this goal, five CNN parameters are searched by the DE algorithm: the convolution filter sizes that control the CNN architecture, the number of filters per convolution filter size (NFCS), the number of neurons in the fully connected (FC) layer, the initialization mode, and the dropout rate. In addition, the effect of the mutation and crossover operators of the DE algorithm was investigated. The performance of the proposed framework, DE-CNN, is evaluated on five Arabic sentiment datasets. Experimental results show that DE-CNN achieves higher accuracy and is less time consuming than state-of-the-art algorithms.

#### 1. Introduction

People and organizations post their information and opinions on various social media platforms such as Twitter and Facebook. Understanding the public sentiments, emotions, and concerns expressed on these platforms is a crucial issue and the focus of sentiment analysis (SA). SA is a natural language processing (NLP) application that automatically determines and classifies the sentiment of large amounts of text or speech [1, 2]. Arabic is a Semitic language known for its morphological richness and its different written and spoken forms, such as modern standard Arabic (MSA) and its various dialects. The complexity of Arabic morphology and structure creates many challenges, such as the shortage of large datasets and the limited tools available for sentiment analysis [3, 4]. Although deep neural networks (DNNs) [5] and convolutional neural networks (CNNs) [6] have shown promising and encouraging performance, little research on sentiment analysis using deep learning (DL) techniques has been done for the Arabic language [7–9], whereas much work exists for other languages [10–12]. Research on Arabic using deep-learning techniques is thus still relatively scarce and worth investigating.

To choose the best architecture and hyperparameters for a DL model and apply it to Arabic sentiment classification, the model is usually evaluated over different architectures and hyperparameter combinations manually, or previously successful models are reused directly [13]. Moreover, building a DL model for SA remains a crucial process that requires the involvement of specialists in the domain and language, or the integration of feature engineering techniques. In addition, designing a DL model is still a complex and time-consuming task. Assessing DL model parameters requires a fitting and evaluation process on the test data, which can be very expensive and infeasible on small computing units. Therefore, an effective methodology for selecting the best DL architecture with optimal hyperparameters is needed to build a successful Arabic sentiment classification system. Much work has been done to evolve DL models using NeuroEvolution (NE) methods [14] on different tasks, such as image classification using particle swarm optimization (PSO) [15] and handwritten digit recognition based on genetic algorithms and grammatical evolution [16]. In the same context, this paper presents an alternative NE approach for Arabic sentiment classification using differential evolution (DE) [17]. The DE algorithm is adopted because it is known for its remarkable performance with different mutation strategies in the literature and because it has fewer parameters to fine-tune. To the best of our knowledge, this is the first work that attempts to automatically build a deep neural network model for Arabic sentiment classification using the DE algorithm. The proposed DE-CNN model uses the DE algorithm to automatically find and tune appropriate parameters to build the optimal CNN model.
Since CNNs have been applied extensively to sentiment classification in other languages, Arabic sentiment classification is chosen as a well-known and widely used task that constitutes a good environment to validate and evaluate the performance of DE-CNN.

DE-CNN starts by generating a population in which each individual represents a configuration selected randomly from the possible values of each parameter. Then, DE-CNN evaluates each individual by computing the fitness function value for the current configuration. After that, all individuals in the population are updated using the DE algorithm operators. These steps are repeated until the termination criteria are satisfied. To evaluate the performance of the proposed framework, various Arabic sentiment classification datasets covering Twitter data are used. The evaluations on these datasets show that the proposed framework outperforms existing methods.
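The search loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the parameter values, helper names, and the toy fitness function are assumptions, and in the real framework the fitness is the validation accuracy of a CNN trained with the candidate configuration (the DE update operators are also elided here for brevity).

```python
import random

# Hypothetical search space mirroring the five parameters tuned by DE-CNN
# (value lists are illustrative, not the authors' exact ranges).
SEARCH_SPACE = {
    "filter_sizes": [(3,), (3, 4), (3, 4, 5), (2, 3, 4, 5)],
    "nfcs": [50, 100, 150, 200],          # filters per filter size
    "fc_neurons": [50, 100, 200],
    "init_mode": ["uniform", "glorot_uniform", "he_normal"],
    "dropout": [0.2, 0.3, 0.4, 0.5],
}

def random_individual(rng):
    """Sample one configuration: one value per parameter."""
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}

def fitness(config):
    """Stand-in for training a CNN with `config` and returning its
    validation accuracy; here a deterministic toy score."""
    return config["nfcs"] / 200 + (1 - config["dropout"])

def search(pop_size=6, generations=3, seed=0):
    """Keep the best configuration seen over several generations."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # In DE-CNN the population would be updated with DE mutation and
        # crossover; here we simply resample to keep the sketch short.
        population = [random_individual(rng) for _ in range(pop_size)]
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate
    return best
```

The returned dictionary is the best-scoring configuration, which in the full framework would be used to build and train the final CNN.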

The main contributions of this paper can be summarized as follows:

(i) Modeling the problem of evolving CNNs as a metaheuristic optimization task to build an Arabic sentiment classification system

(ii) Using two different fitness evaluation techniques to assess the generalization of the CNN

(iii) Integrating two different mutation strategies to improve the exploration and exploitation ability of the DE algorithm

(iv) Building and training different CNN architectures with a variable number of parallel convolution layers

The rest of this paper is organized as follows: Section 2 provides related works with respect to Arabic sentiment classification and NE. Section 3 describes the fundamentals of DE algorithm and CNN. The proposed framework is introduced in Section 4. Section 5 presents the evaluation of the proposed framework, while Section 6 gives the conclusion and presents our future work.

#### 2. Related Work

In this section, we review the most recent works related to Arabic sentiment classification and NE. Recently, many works have targeted SA for English and other European languages, but only a few studies have focused on the Arabic language [18, 19] using DL models. Sallab et al. [20] trained several DL models, as described in their original papers, for Arabic sentiment classification, including deep neural networks (DNNs), deep belief networks (DBNs), deep autoencoders (DAEs), and recursive autoencoders (RAEs). Al-Azani et al. [21] investigated the problem of imbalanced datasets for Arabic sentiment polarity determination. They compared different traditional machine learning classifiers and ensembles such as k-nearest neighbors (k-NN), support vector machines (SVMs), voting, bagging, boosting, stacking, and random forests. Moreover, Al-Azani et al. [22] conducted an empirical evaluation of two state-of-the-art DL models, unidirectional and bidirectional Long Short-Term Memory (LSTM) and its simplified variant, the Gated Recurrent Unit (GRU), to detect the sentiment polarity of Arabic microblogs. Alayba et al. [23] used DNNs and CNNs alongside several machine learning algorithms to perform Arabic SA on health services. In their experiments, they reported that the best classifiers were SVM and stochastic gradient descent (SGD), but they did not investigate the effect of the DL model architecture and parameters.

NE is considered a subfield of artificial intelligence (AI). It aims to automatically evolve neural network architectures and hyperparameters using evolutionary algorithms. For example, Young et al. [24] presented a framework named multinode evolutionary neural networks for deep learning (MENNDL) to learn optimized CNN hyperparameters via a genetic algorithm (GA). Restricting the network to three convolutional layers, they optimized hyperparameters such as the filter size and the number of filters of each convolutional layer. Verbancsics and Harguess [25] proposed a modification of hypercube-based NeuroEvolution of augmenting topologies (HyperNEAT) [26] to evolve a CNN for an image classification task; the methodology was evaluated on the MNIST dataset. Tirumala et al. [27] studied the feasibility of using evolutionary approaches to evolve deep architectures with the aim of reducing the training time of DNNs. Evaluating their approach on the MNIST dataset, they accelerated DNN training over the regular approach by more than 6 hours. Based on reinforcement learning, Baker et al. [28] proposed a metamodeling algorithm named MetaQNN. For a learning task such as image classification, MetaQNN automates the generation of CNN architectures and was tested on the MNIST dataset. Loshchilov and Hutter [29] proposed an alternative to grid search, random search, and Bayesian optimization for deep neural network hyperparameter optimization: the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) was used to evolve several hyperparameters of the optimizer and the convolution and fully connected layers. Based on Cartesian genetic programming, Suganuma et al. [30] performed image classification on the CIFAR-10 dataset by automatically building and optimizing CNN architectures.
Their research focused on convolution blocks and tensor concatenation and did not consider dense layers or hyperparameter optimization; nevertheless, they automatically generated CNN architectures that can compete with state-of-the-art networks. Xie and Yuille [31] adopted genetic algorithms (GAs) to evolve CNN architectures by proposing a binary encoding method that represents GA individuals as fixed-length strings. They used two common datasets, MNIST and CIFAR-10, to perform visual recognition and evolve the CNN architectures based on recognition accuracy. Following the same working principles as NEAT [32], Miikkulainen et al. [33] introduced an automated approach for evolving deep neural networks named Cooperative DeepNEAT (CoDeepNEAT), which learns complex convolutional, feedforward, and recurrent layers to evolve the network architecture. Real et al. [34] introduced a technique based on GAs to generate a fully trained neural network that does not require any postprocessing.

#### 3. Preliminaries

##### 3.1. Differential Evolution

The differential evolution (DE) algorithm is one of the most popular evolutionary algorithms, introduced by Storn and Price [17, 35]. DE has been used in different optimization tasks such as computer vision [36, 37] and text classification [38]. DE starts by initializing its parameters, namely, the population size $N$, the individual dimension $D$, the mutation scaling factor $F$, and the crossover probability $C_r$. At the beginning, a population $X$ of size $N$ and dimension $D$ is generated using

$$X = L + \mathrm{rand}(N, D) \times (U - L), \tag{1}$$

where $L$ and $U$ represent the lower and upper boundaries of the search space, respectively, and $\mathrm{rand}(N, D)$ is the function used to generate a random $N \times D$ matrix in the interval $[0, 1]$.
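Equation (1) can be realized in a few lines of NumPy; this sketch assumes per-dimension bound vectors, which is the usual convention.

```python
import numpy as np

def init_population(N, D, L, U, rng=None):
    """Initialize a DE population following X = L + rand(N, D) * (U - L).

    N individuals, each a D-dimensional vector; L and U are length-D
    arrays of lower and upper bounds. Broadcasting places every
    component uniformly inside its own [L_j, U_j] interval."""
    rng = np.random.default_rng(rng)
    return L + rng.random((N, D)) * (U - L)
```

For example, `init_population(5, 4, np.zeros(4), np.full(4, 10.0))` returns a 5 x 4 matrix whose entries all lie in [0, 10).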

*Mutation operator* is used to create a mutant individual $v_i^{t+1}$ from the current parent individual $x_i^t$. The classic DE/rand/1/bin scheme performs the mutation operation defined in Equation (2):

$$v_i^{t+1} = x_{r_1}^{t} + F \times \left( x_{r_2}^{t} - x_{r_3}^{t} \right), \tag{2}$$

where $x_{r_1}^t$, $x_{r_2}^t$, and $x_{r_3}^t$ are distinct individuals randomly chosen from the population at iteration $t$.

*Crossover operator* is used to generate an offspring (trial) individual $u_i^{t+1}$ from $x_i^t$ and $v_i^{t+1}$ as follows:

$$u_{i,j}^{t+1} = \begin{cases} v_{i,j}^{t+1} & \text{if } \mathrm{rand}_j \le C_r \text{ or } j = j_{\mathrm{rand}}, \\ x_{i,j}^{t} & \text{otherwise}, \end{cases} \tag{3}$$

where $\mathrm{rand}_j$ is a random value in $[0, 1]$ chosen for the $j$th decision variable and $j_{\mathrm{rand}}$ is a random decision variable index taken from $\{1, \dots, D\}$.

Then, the fitness function value $f(x_i^t)$ of the parent individual and the fitness function value $f(u_i^{t+1})$ of the offspring are computed.

*Selection operator* is used to select the better of the parent individual $x_i^t$ and the offspring $u_i^{t+1}$ based on the computed fitness function values, as defined in the following equation (for a minimization problem):

$$x_i^{t+1} = \begin{cases} u_i^{t+1} & \text{if } f(u_i^{t+1}) \le f(x_i^{t}), \\ x_i^{t} & \text{otherwise}. \end{cases} \tag{4}$$

The previous steps are repeated until the stop condition is met. When it is satisfied, DE stops and returns the best individual; otherwise, the algorithm continues from the mutation phase. The DE algorithm can use different mutation strategies, some of which are designed to improve the exploration and exploitation of the search space [39, 40]. These strategies are distinguished using the notation "DE/a/b", where "DE" refers to differential evolution, "a" indicates the solution to be mutated, and "b" represents the number of difference vectors used. In this paper, only two strategies are used. The first one is "DE/best/1", given as

$$v_i^{t+1} = x_{\mathrm{best}}^{t} + F \times \left( x_{r_1}^{t} - x_{r_2}^{t} \right), \tag{5}$$

whereas the second one is "DE/best/2", given as

$$v_i^{t+1} = x_{\mathrm{best}}^{t} + F \times \left( x_{r_1}^{t} - x_{r_2}^{t} \right) + F \times \left( x_{r_3}^{t} - x_{r_4}^{t} \right), \tag{6}$$

where $x_{\mathrm{best}}^{t}$ represents the best solution at iteration $t$.

##### 3.2. Convolutional Neural Network

Deep learning approaches, known for their ability to automatically learn features, have shown remarkable performance in various fields, for example, computer vision (CV) [41], speech recognition [42, 43], NLP [44, 45], and a large variety of other applications [46]. In this section, a common deep learning model for sentence classification, the parallel convolutional neural network (CNN), is described. Figure 1 shows the parallel CNN architecture, in which one-dimensional parallel convolution layers (1D-CNN) capture local semantic features using a unique filter size at each parallel convolutional layer [44]. To select global semantic features, a one-dimensional pooling layer is applied at the end of each convolution layer. The outputs of the pooling layers are concatenated and fed to a fully connected (FC) layer. Finally, an FC layer with a sigmoid or Softmax activation acts as the output layer, producing the classification results from the features passed on by the previous layers. CNN is characterized by its convolution operation, which uses filters, where each filter can learn to produce a feature map; within the same layer, the filter weights are shared. The CNN takes as input a matrix that represents a sentence, where each row is a $d$-dimensional vector assigned to a specific word of the sentence. These word vectors are built using a neural language model (NLM) such as word2vec [47, 48], which represents the semantic relations between words as vectors. As an example, if the input sentence has 20 words and each word is represented as a $d$-dimensional vector, then the size of the input layer of the CNN will be $20 \times d$. To address the problem of overfitting, layers such as pooling and dropout are commonly used. For the convolution and fully connected layers, the Sigmoid, Hyperbolic Tangent (tanh), and Rectifier (ReLU) [49] activation functions can be applied.
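The parallel 1D-convolution-plus-global-max-pooling pipeline described above can be sketched in plain NumPy. This is a forward-pass illustration only, assuming randomly initialized (untrained) filters and ReLU activation; the function and parameter names are illustrative, not the paper's implementation.

```python
import numpy as np

def conv1d_valid(S, W, b):
    """Valid 1D convolution over a sentence matrix S (n_words x d)
    with one filter W (h x d): one feature per window of h words,
    followed by a ReLU activation."""
    n, _ = S.shape
    h = W.shape[0]
    out = np.array([np.sum(S[i:i + h] * W) + b for i in range(n - h + 1)])
    return np.maximum(out, 0.0)

def parallel_cnn_features(S, filter_sizes=(3, 4, 5), nfcs=2, rng=None):
    """For each filter size, apply `nfcs` random filters in parallel,
    global-max-pool each feature map, and concatenate the pooled
    values into the vector that would feed the FC layer."""
    rng = np.random.default_rng(rng)
    pooled = []
    for h in filter_sizes:
        for _ in range(nfcs):
            W = rng.standard_normal((h, S.shape[1])) * 0.1
            fmap = conv1d_valid(S, W, 0.0)   # local semantic features
            pooled.append(fmap.max())        # global max pooling
    return np.array(pooled)
```

For a 20-word sentence with $d = 50$ (`S` of shape `(20, 50)`), three filter sizes and two filters each yield a concatenated feature vector of length 6, one pooled value per filter.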