Abstract

The challenges involved in traditional cloud computing paradigms have prompted the development of architectures for next-generation cloud computing. These new architectures can generate and handle huge amounts of data, which was not possible with traditional architectures. Deep learning algorithms are able to process such volumes of data and, thus, fit naturally into next-generation computing paradigms. Consequently, deep learning has become the state-of-the-art approach for solving various tasks, most importantly in the field of recognition. In this work, recognition of handwritten city names is proposed, which is one of the potential research application areas in the field of postal automation. Recognition is carried out using a segmentation-free (holistic) approach. The proposed work demystifies the role of the convolutional neural network (CNN), one of the principal deep learning techniques. The proposed CNN model is trained, validated, and analyzed using the Adam and stochastic gradient descent (SGD) optimizers with batch sizes of 2, 4, and 8 and learning rates (LRs) of 0.001, 0.01, and 0.1. The model is trained and validated on 10 different classes of handwritten city names written in Gurmukhi script, where each class has 400 samples. Our analysis shows that the CNN model, using the Adam optimizer, a batch size of 4, and a LR of 0.001, achieves the best average validation accuracy of 99.13%.

1. Introduction

Cloud computing is equipped with good solutions to meet the increasing demand for data storage. It has provided users with various benefits, reducing the effort needed to manage data in an efficient and effective manner. The cloud relies mainly on data centers, which are located far away from the user and are further linked together to build data center networks. Since these data centers are distant from the user, next-generation cloud computing architectures have made it possible to process data closer to the user instead of at the data center. These emerging paradigms of cloud computing generate huge amounts of data, but there is a possibility that this data may never be analyzed to uncover new information [1]. Various algorithms can be applied for the analysis of data. The performance of traditional algorithms decreases as the amount of data increases, whereas the performance of deep learning algorithms improves as the amount of data increases. Deep learning is a subset of machine learning, and it has gained the attention of various researchers due to its strength in handling huge amounts of data. This is the reason why applications of deep learning in the emerging paradigms of cloud computing are also gaining the attention of the research community. These days, most of the information in our lives is processed through electronic devices. As computers are involved in every field, there is a requirement to transfer information between humans and computers with the help of efficient and fast algorithms. Text recognition serves this purpose by providing an interface through which humans and computers can interact. An example of such systems lies in the digitization of document images, which may be handwritten or printed. Such systems are relevant in many applications like automatic form processing, postal automation, cheque processing, preservation of historical documents, etc. In this proposed work, recognition of city names written in Gurmukhi script is implemented, which is one of the application research areas of postal automation. Manual sorting of mail is a cumbersome as well as labor-intensive task due to the high labor cost involved in the process, so it is necessary to develop a postal automation system that can read all the required fields of a postal document and help the document reach its destination. Gurmukhi is the script of Punjabi, the official language of the state of Punjab, and is used by Punjab state government officials to communicate their documents. An example of such a document is shown in Figure 1, in which the address field of the document to be posted is written in Gurmukhi [2] script, and the city name “Mohali” is highlighted.

Several researchers are already working in the field of postal automation, and research is ongoing for the recognition of various scripts like Devanagari, Bangla, English, Russian, etc., but no postal automation system exists yet for Gurmukhi script. This proposed work aims to employ a deep learning technique based on CNN for the recognition of Gurmukhi handwritten city names. Among deep learning techniques, CNN is widely used for the purpose of text recognition [1]. The most important challenge in recognizing Gurmukhi script is its cursive writing style: the characters are written so closely that their segmentation becomes a difficult task. Hence, this paper proposes a recognition technique using a segmentation-free approach, known as the holistic approach. Figure 2 shows a sample of handwritten Gurmukhi script.

2. Related Work

Emerging cloud computing paradigms have helped the user by generating and processing data at the edge of the network without transferring it to the far-away cloud center. In this section, a literature survey is given on the use of CNNs in emerging cloud computing architectures. Huang et al. [3] have proposed a ConvNet model using an edge computing algorithm for the classification of various mosquitoes. A device was developed for the detection of mosquitoes and the preprocessing of videos before sending them to the data center; a ConvNet model was then employed for the classification. Similarly, Liu et al. [4] have proposed a CNN-based model for the recognition of food. A server running CentOS 7.0 was used at the cloud, and the CNN-based model was implemented for the identification and classification of food images. Azimi et al. [5] have used a ConvNet for the diagnosis of heart diseases; a local Wi-Fi connection was programmed for reading the files, and a machine with an Apache server was used for uploading POST requests to the edge device. Hosain et al. [6] have proposed a ConvNet model using edge cloud computing, in which data is sent to the center using radio access technology; an MEC server is used for the formation of the edge cloud, and a CNN is then used for the classification. Similarly, in the field of postal automation, i.e., for the recognition of fields like city name, pin code, and street name, a huge amount of data is required, which can be processed efficiently by deep learning algorithms. Among the relevant works, recognition of such fields has been done in various scripts using either character-level recognition, in which the word is segmented into individual characters before recognition (analytical approach), or word-level recognition, which is a segmentation-free approach (holistic approach). Pal et al. [7] have employed character-level recognition using machine learning for the recognition of multilingual script-based city names for Indian postal automation. Similarly, Thadchanamoorthy et al. [8] have proposed a technique for the recognition of Tamil city names, obtaining an accuracy of 96.89%. Some authors have worked on the recognition of pin codes only, written in different scripts like English, Bangla, etc. One such example is Vajda et al. [9], where the authors have recognized pin codes written in Bangla as well as in English script using a nonsymmetric half-plane hidden Markov model and achieved accuracies of 94.13% and 93%, respectively. Sahoo et al. [10] have implemented a holistic approach for the recognition of city names written in Bangla script using shape-context features, while multiple classifiers are employed for the classification. The datasets used here are large datasets stored on the cloud. There are other examples where authors have implemented only a holistic approach for the recognition of various scripts like English, Bangla, and Arabic, achieving accuracies of 90.3%, 83.64%, and 63%, respectively [11–13]. Manchala et al. [11] have used a neural network for the recognition of English script using the holistic approach. Similarly, Bhowmik et al. [12] have proposed a technique for the recognition of Bangla script. Wahbi et al. [13] have worked on the recognition of Arabic script, again with the holistic approach, employing a hidden Markov model for recognition.
A few other authors have also used the holistic approach for the recognition of text, in which features are manually extracted [14, 15], while others have used CNNs for character recognition, in which features are extracted automatically [16]. The aim of this proposed work is to employ a CNN with the holistic approach for the recognition of Gurmukhi handwritten city names. All these techniques have been employed on huge datasets stored on the cloud.

2.1. Contributions of the Proposed Work

(i) A dataset of 4000 samples of handwritten images in the Gurmukhi script for 10 city names has been generated.
(ii) A CNN model for the recognition of Gurmukhi handwritten city names, aimed at the automation of the postal system, has been prepared. The model can recognize all 10 city names with an average validation accuracy of 99.13%.

3. Present Work

In this proposed work, a dataset of 4000 Gurmukhi handwritten city names is created for 10 different classes (city names), where each class has 400 samples that can be fed to the model. As manual sorting of postal documents is a labor-intensive task, recognition of city names will help in the automation of the postal system for the state of Punjab. Finally, the designed model is evaluated on various performance parameters for the recognition of city names. The methodology followed for the recognition is shown in Figure 3.

3.1. Dataset

For the preparation of the dataset, each city name was written 10 times by each of 40 different writers, generating 4000 samples. For collecting the dataset, two sheets were given to each writer to write each word five times on each sheet, generating a total of 10 samples for each word from both sheets; thus, each writer generated 100 samples. Writers were selected from different age groups, educational levels, and dialects, and were free to use any colored pen.

3.2. Digitization and Preprocessing of Dataset

For the digitization of the collected samples, a scanner at 300 dpi was used to scan the sheets. For preprocessing, each scanned sheet is converted into a grayscale image, and then normalization is performed. Further, brightness, contrast, and intensity level adjustments are performed to improve the quality of the image before cropping the word samples from the digitized sheets; Adobe Photoshop is used for these adjustments and for cropping the images. The cropped samples were then placed in folders named after their respective city names, and the prepared dataset is stored on the cloud, from where it is accessed by the deep learning network.
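As a rough illustration, the grayscale conversion and normalization steps can also be reproduced programmatically. The sketch below uses OpenCV; the file path is hypothetical, and the brightness/contrast values are illustrative stand-ins for the manual Adobe Photoshop adjustments described above.

```python
import cv2
import numpy as np

def preprocess_sheet(path):
    sheet = cv2.imread(path)                           # scanned sheet (300 dpi)
    gray = cv2.cvtColor(sheet, cv2.COLOR_BGR2GRAY)     # grayscale conversion
    # Linear brightness/contrast adjustment; alpha and beta are illustrative
    # stand-ins for the manual Photoshop adjustments.
    adjusted = cv2.convertScaleAbs(gray, alpha=1.2, beta=10)
    return adjusted.astype(np.float32) / 255.0         # intensity normalization to [0, 1]

sample = preprocess_sheet("sheets/writer_01_sheet_1.png")  # hypothetical path
```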

Once the preprocessing is done, the dataset is divided into training and validation sets [17, 18]: 80% of the data is kept for training the model, and 20% is kept for validating its performance. Table 1 shows each city name and its corresponding handwritten digitized image in the Gurmukhi script.
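A minimal sketch of this 80/20 split using the Keras utilities, assuming the cropped samples sit in one folder per city name; the directory name and image size are illustrative assumptions:

```python
import tensorflow as tf

# 80% of the images for training; the same seed keeps the split consistent.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "gurmukhi_city_names/",      # 10 subfolders, one per city name (assumed layout)
    validation_split=0.2,
    subset="training",
    seed=42,
    color_mode="grayscale",
    image_size=(64, 64),
    batch_size=4)

# The remaining 20% for validation.
val_ds = tf.keras.utils.image_dataset_from_directory(
    "gurmukhi_city_names/",
    validation_split=0.2,
    subset="validation",
    seed=42,
    color_mode="grayscale",
    image_size=(64, 64),
    batch_size=4)
```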

3.3. Data Augmentation

Data augmentation is a technique that helps increase the amount of available data. Here, the available data is further increased by flipping or rotating the images, using the augmentation facilities built into the framework used for the proposed model. Rotation of the city name “Amritsar” is shown in Figure 4.
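A minimal sketch of such built-in augmentation with the Keras `ImageDataGenerator`; the rotation range is an illustrative assumption:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=15,       # rotate word images by up to +/-15 degrees (assumed range)
    horizontal_flip=True,    # flip images, as mentioned above
    rescale=1.0 / 255)       # normalize pixel intensities

# Augmented batches can then be drawn with, e.g.,
# augmenter.flow_from_directory("gurmukhi_city_names/", target_size=(64, 64),
#                               color_mode="grayscale", batch_size=4)
```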

3.4. Model Design

To build the CNN model, three types of layers are required: (i) convolution layer; (ii) pooling layer; (iii) output layer. The primary function of the first convolution layer is to apply filters to derive features from the image; the number of feature maps produced depends on the number of filters used. The complexity of the extracted features keeps increasing with model depth, and the last convolution layer of the model generates the feature maps closest to the required recognition task. The next layer is the pooling layer, and the most commonly used pooling technique is max pooling, which helps preserve the features by selecting the maximum value, as this has the closest similarity to the required features. The pooling layer also helps in reducing the size of the image by discarding features that are not important. The last layer is the fully connected layer, from which the output classes are obtained. The proposed CNN model is shown in Figure 5.

The first convolution layer used in this proposed work has 32 filters of size 3 × 3 with a stride of 1 × 1, producing 32 feature maps, and is followed by the ReLU activation function. The obtained feature maps are then passed to a max pooling layer with a 2 × 2 filter size and a stride of 2 × 2, which reduces the size of the feature maps by a factor of 2. The pooled feature maps are passed to the next convolution layer, which has 64 filters of size 3 × 3 and a stride of 1 × 1, again followed by a max pooling layer of size 2 × 2 with a stride of 2 × 2, which is further followed by another max pooling layer of the same size. Lastly, a fully connected block is introduced, taking 2048 input neurons (the flattened feature maps), followed by 120 neurons in the middle layer and 10 neurons in the output layer with the SoftMax activation function. The fully connected block transforms the obtained feature maps into the 10 classes. The rectified linear unit (ReLU) is used as the activation function for all other layers in the model, except the pooling layers.
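A minimal Keras sketch of the architecture described above. Since the paper does not state the input image size, (64, 64, 1) is assumed here, so the flattened feature-vector length will differ from the 2048 stated above unless the actual input dimensions are used; the pixel-rescaling layer is also an assumption.

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Rescaling(1.0 / 255, input_shape=(64, 64, 1)),  # pixel normalization (assumed)
    layers.Conv2D(32, (3, 3), strides=(1, 1), activation="relu"),   # 32 feature maps
    layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
    layers.Conv2D(64, (3, 3), strides=(1, 1), activation="relu"),   # 64 feature maps
    layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
    layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),  # second pooling stage of the same size
    layers.Flatten(),                          # flattened feature vector
    layers.Dense(120, activation="relu"),      # middle fully connected layer
    layers.Dense(10, activation="softmax"),    # 10 city-name classes
])
model.summary()
```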

4. Experiments and Results

The proposed model is implemented on the dataset of 4000 images using Python with the Keras and TensorFlow machine learning libraries.

4.1. Experimental Setup and Performance Metrics Used

The efficiency of the model is impacted by various parameters, but in this paper, three important ones are considered: the optimizer, the LR, and the batch size. The optimizer is used to update the network weights, and its choice can mean the difference between obtaining good results in minutes, hours, or days. The LR determines how rapidly the network weights are adapted, while the batch size is the number of samples processed before the model is updated. The proposed model is analyzed using two different optimizers, Adam and stochastic gradient descent (SGD); three different LRs, 0.001, 0.01, and 0.1; and three batch sizes, 2, 4, and 8. The batch size is known to be an important hyperparameter for deep learning systems: a large batch size helps speed up computation but leads to poor generalization [19], so it is preferable to use a small batch size. The proposed model has given good generalization only with batch sizes of 2, 4, and 8, while the accuracy drops drastically when the batch size is further increased. The response time for training and validation varies from model to model, depending on the dataset to be trained, the batch size, and the LR, and it also depends on the hardware of the system used, such as the CPU, GPU, RAM, etc. [20]. To evaluate the proposed CNN model, various parameters are calculated, including training and validation loss, validation accuracy [21, 22], precision, and recall. All these parameters are computed from the entries of the confusion matrix: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). The parameters are defined in the following subsections.
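A minimal sketch of how this optimizer/LR/batch-size grid can be run; `build_model` is a hypothetical helper that returns a fresh instance of the architecture defined earlier, and `train_ds`/`val_ds` come from the dataset sketch above:

```python
from tensorflow.keras import optimizers

results = {}
for opt_name, opt_cls in [("adam", optimizers.Adam), ("sgd", optimizers.SGD)]:
    for lr in (0.001, 0.01, 0.1):
        for batch_size in (2, 4, 8):
            model = build_model()          # hypothetical helper: fresh weights per run
            model.compile(optimizer=opt_cls(learning_rate=lr),
                          loss="sparse_categorical_crossentropy",
                          metrics=["accuracy"])
            history = model.fit(train_ds.unbatch().batch(batch_size),
                                validation_data=val_ds.unbatch().batch(batch_size),
                                epochs=10)  # 10 epochs, as in the experiments
            results[(opt_name, lr, batch_size)] = max(history.history["val_accuracy"])
```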

4.1.1. Accuracy

Accuracy is defined as the ratio of the number of correct predictions made by the model to the total number of predictions, as shown below:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

4.1.2. Precision

Precision is the proportion of samples predicted as positive that are actually positive, as shown below:

$$\text{Precision} = \frac{TP}{TP + FP}$$

4.1.3. Recall (Sensitivity)

The recall measures the ability of the designed model to detect positive samples. It is calculated as the sum of true positives across all classes divided by the sum of true positives and false negatives across all classes, as shown below:

$$\text{Recall} = \frac{TP}{TP + FN}$$
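As a sketch, all three metrics can be computed from a multiclass confusion matrix by summing TP, FP, and FN across classes, following the definitions above (rows are true labels, columns are predicted labels):

```python
import numpy as np

def metrics_from_confusion_matrix(cm):
    tp = np.diag(cm).astype(float)   # correct predictions per class
    fp = cm.sum(axis=0) - tp         # predicted as the class, but wrong
    fn = cm.sum(axis=1) - tp         # samples of the class that were missed
    accuracy = tp.sum() / cm.sum()
    precision = tp.sum() / (tp.sum() + fp.sum())
    recall = tp.sum() / (tp.sum() + fn.sum())
    return accuracy, precision, recall
```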

4.2. Results Obtained Using Adam Optimizer with a Batch Size of 4

In this section, results for various parameters are obtained for three different LRs, using the Adam optimizer with a batch size of 4.

4.2.1. Results Obtained with a LR of 0.001

Table 2 shows the values of various parameters obtained with a LR of 0.001, the Adam optimizer, and a batch size of 4. The maximum validation accuracy obtained on the validation dataset is 99.8% with a minimum validation loss of 0.01, and the average validation accuracy is 99.13%.

Figure 6 shows the convergence plots of the different parameters for the training as well as the validation dataset. The Y-axis shows the value of the parameter, while the X-axis shows the number of epochs for which the model is trained. Validation accuracy is the main parameter when checking the model’s performance for text recognition, and it can be observed that the accuracy plot increases almost monotonically after the first few epochs. The maximum validation accuracy obtained is 99.8%. The loss should be low, and the minimum loss obtained is 0.01 and 0.07 for the validation and training datasets, respectively. The values of the other parameters approach 1, which shows that the designed model is reasonably good.

Results can also be analyzed by plotting the confusion matrix. Figure 7 shows the confusion matrix for the multiclass results of Figure 6. The predicted labels are depicted on the X-axis, while the true labels are depicted on the Y-axis. The confusion matrix summarizes the actual (true) and predicted classifications made by the designed model. The values highlighted in blue boxes represent the true positives, which indicate how many samples of each positive class the model predicted correctly [23]. For example, for the city Amritsar, the designed model has correctly predicted all 80 samples, while for the city Fazilka, the model has correctly predicted 78 samples and incorrectly predicted 2 samples as Ludhiana, as can be observed in Figure 7.
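A minimal sketch of producing such a confusion matrix with scikit-learn and matplotlib, assuming `model` and `val_ds` from the earlier sketches:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

# Collect true and predicted labels in a single pass over the validation set,
# so batch ordering does not matter.
y_true, y_pred = [], []
for images, labels in val_ds:
    y_true.extend(labels.numpy())
    y_pred.extend(np.argmax(model.predict(images, verbose=0), axis=1))

cm = confusion_matrix(y_true, y_pred)
ConfusionMatrixDisplay(cm, display_labels=val_ds.class_names).plot(cmap="Blues")
plt.show()
```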

4.2.2. Results Obtained with a LR of 0.01

Now, the LR is changed to 0.01, while the optimizer remains Adam. Table 3 shows the outcomes for various parameters when the LR is changed from 0.001 to 0.01, while the other parameters are kept the same. From the obtained results, it can be observed that the LR of the model impacts the accuracy. The validation accuracy obtained at the 10th epoch is 99%, and the values of training loss, validation loss, validation precision, and validation recall are 0.11, 0.01, 0.99, and 0.99, respectively. The average validation accuracy is 96.95%, which is less than the 99.13% average validation accuracy obtained with a LR of 0.001. Figure 8 shows the parameter convergence plots, and the confusion matrix for the same is shown in Figure 9, where the model has misclassified samples in 4 classes. It can be observed from Figure 8 that the plots are not as smooth as those in Figure 6, with stable behavior observed only in the last few epochs [24].

4.2.3. Results Obtained with a LR of 0.1

Table 4 shows the outcomes for various parameters when the LR is further changed from 0.01 to 0.1, while the other parameters are kept the same. At the 10th epoch, the maximum validation accuracy obtained is 99.3%. The obtained values for training loss, validation loss, validation precision, and validation recall are 0.10, 0.00, 0.99, and 0.99, respectively. The average validation accuracy is 98.17%, which is less than the 99.13% average validation accuracy obtained with a LR of 0.001. Figure 10 shows the parameter convergence plots, and the confusion matrix for the same is shown in Figure 11. In Figure 10, all the plots vary steadily: the loss plot is almost monotonically decreasing, and the other plots approach 1.

From the confusion matrix for a LR of 0.1 in Figure 11, it can be observed that the proposed model has misclassified a few more classes as compared to the previous confusion matrices: 56 samples of the city Fazilka, 73 samples of the city Hoshiarpur, 78 of Jalandhar, and so on, are correctly predicted.

It can be observed from Tables 2, 3, and 4 that the LR of the model impacts the recognition accuracy. A LR of 0.001 has produced the highest average validation accuracy, 99.13%, compared to the accuracies obtained with the other LRs.

4.2.4. Optimal LR Selection with Adam Optimizer

The validation accuracies from Tables 2, 3, and 4 are analyzed in Figure 12, which shows that the Adam optimizer with a LR of 0.001 has performed best on the available dataset. Figure 12 plots the validation accuracy over 10 epochs for all three LRs: the Y-axis shows the validation accuracy, while the X-axis shows the 10 epochs for the three LRs. It can be observed that at the 10th epoch, the validation accuracy obtained with a LR of 0.001 is the highest compared to the LRs of 0.01 and 0.1, and that the validation accuracy is higher for most of the epochs with a LR of 0.001. Figure 13 shows some of the misclassified results obtained with the Adam optimizer: the city name “Ludhiana” is misclassified as “Patiala” in Figure 13(a), while the city “Fazilka” is misclassified as “Ludhiana” in Figure 13(b).

From the above discussion of the accuracy results obtained using the Adam optimizer with LRs of 0.001, 0.01, and 0.1, it can be observed that the proposed model has achieved the best validation accuracy with a LR of 0.001, although even then the model misclassifies a few images, as shown in Figure 13.

4.3. Results Obtained Using Adam Optimizer: LR of 0.001 on Different Batch Sizes

Figure 12 shows that a LR of 0.001 achieves better validation accuracy than LRs of 0.01 and 0.1. It has also been observed that the batch size of the model impacts the accuracy results [25]. The model is therefore analyzed using three different batch sizes (2, 4, and 8) with the Adam optimizer and a LR of 0.001. Table 6 shows the results obtained by changing the batch size while the LR is fixed at 0.001. It can be observed from the table that a batch size of 4 has given better accuracy than batch sizes of 2 and 8: the best average validation accuracy achieved is 99.13% with a batch size of 4, while it is 97.24% and 92.07% with batch sizes of 8 and 2, respectively.

4.3.1. Comparison of Three Different Batch Sizes Using Adam with a LR of 0.001

It can be observed from Figure 12 that a LR of 0.001 has given good results compared to LRs of 0.1 and 0.01. In Table 6, various parameters are obtained using three different batch sizes: the average validation accuracy obtained with batch size 2 is 92.07%, with batch size 4 it is 99.13%, and with batch size 8 it is 97.24%. The validation accuracies obtained over the 10 epochs of Table 6 are compared across the three batch sizes in Figures 14 and 15. In Figure 14, the Y-axis represents the validation accuracy, while the X-axis represents the three batch sizes over 10 epochs; in Figure 15, the Y-axis represents the average validation accuracy, while the X-axis represents the batch sizes 2, 4, and 8. It can be observed from Figure 14 that batch size 4 has given higher accuracies compared to the other batch sizes, while Figure 15 shows that the highest average validation accuracy is given by a batch size of 4.

4.4. Results Obtained Using SGD Optimizer with a LR of 0.001 and Batch Size of 4

The optimizer employed in the CNN model also affects the accuracy obtained. In this section, the results are obtained using the SGD optimizer with a LR of 0.001 and a batch size of 4, chosen based on the best accuracy results obtained in the previous sections. Table 7 shows the results for various parameters obtained using the SGD optimizer, while Figure 16 shows the parameter convergence plots and Figure 17 shows the confusion matrix for the same. Figure 16 shows that the plots vary steadily for all the parameter values of Table 7: the training and validation loss in Figure 16(a) approaches 0 with each epoch, while the plots in Figures 16(b)–16(d) for training and validation accuracy, precision, and recall approach 1. The maximum validation accuracy obtained in the last epoch is 98.8%, which is less than the best accuracy obtained with the Adam optimizer.

4.4.1. Comparison of Three Different Batch Sizes Using SGD Optimizer with a LR of 0.001

A comparison of the SGD optimizer across the three batch sizes is carried out in this section. Training loss, validation loss, validation accuracy, and average validation accuracy are compared over 10 epochs. The average validation accuracy obtained with batch size 2 is 87.6%, with batch size 4 it is 94.18%, and with batch size 8 it is 93.31%; it can be observed from Table 8 that the highest average validation accuracy is obtained with batch size 4. Figure 18 shows the comparison plot of validation accuracy for batch sizes 2, 4, and 8 using the SGD optimizer with a LR of 0.001, while Figure 19 shows the corresponding plot of average validation accuracy. It can be concluded that here also a batch size of 4 has worked best for the available dataset, although the validation accuracy achieved with SGD is less than that achieved with the Adam optimizer.

5. Testing of Some Images

In this section, some randomly selected images are given to the CNN model that employs the Adam optimizer with a LR of 0.001 and a batch size of 4. Figure 20(a) shows that the image of the city name “Fazilka” has been recognized with a confidence score of 84.12%, while the image of the city name “Amritsar” has been recognized with 100% confidence.
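A minimal sketch of such a single-image test, assuming `model` is the trained network and `val_ds` the dataset from the earlier sketches; the file path is hypothetical, and the printed score is the SoftMax confidence for the predicted class:

```python
import numpy as np
import tensorflow as tf

class_names = val_ds.class_names   # folder names from the dataset sketch above

img = tf.keras.utils.load_img("test/fazilka_sample.png",   # hypothetical path
                              color_mode="grayscale", target_size=(64, 64))
x = tf.keras.utils.img_to_array(img)[np.newaxis, ...]      # add a batch dimension

probs = model.predict(x)[0]        # SoftMax scores over the 10 classes
best = int(np.argmax(probs))
print(f"Predicted: {class_names[best]} ({100 * probs[best]:.2f}% confidence)")
```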

6. Analysis

It can be seen from Tables 6 and 8 that the Adam optimizer with a LR of 0.001 and a batch size of 4 has given the best results in terms of average validation accuracy: the best average validation accuracy obtained by Adam is 99.13%, while that obtained by the SGD optimizer is 94.18%. The analysis plot of the average validation accuracies obtained by the Adam and SGD optimizers is shown in Figure 21. It can be concluded that Adam with a LR of 0.001 and a batch size of 4 has performed much better than the SGD optimizer with the same LR and batch size.

7. Conclusion

Deep learning is a subset of machine learning that is applied to large datasets. Deep learning requires a huge amount of data to train the network, which can be stored on the cloud. Therefore, cloud computing helps make deep learning accessible, eases the handling of large amounts of data, and allows the training of algorithms to be carried out on distributed hardware. It also provides access to various hardware configurations such as FPGAs, GPUs, and high performance computing systems. It can be concluded that the emerging paradigms of cloud computing work very well with deep learning algorithms rather than traditional algorithms. The performance of a deep learning model depends on various hyperparameters like the LR, the batch size, the choice of optimizer, and so on. In Tables 2, 3, and 4, various parameters are obtained using the Adam optimizer with LRs of 0.001, 0.01, and 0.1, and the obtained average validation accuracies are 99.13%, 96.95%, and 98.17%, respectively. Table 6 also reports the average validation accuracy for batch sizes of 2, 4, and 8, while the LR is kept fixed at 0.001. Results are also analyzed by changing the Adam optimizer to the SGD optimizer with a LR of 0.001: the average validation accuracy obtained for the SGD optimizer with batch sizes of 2, 4, and 8 is 87.6%, 94.18%, and 93.31%, respectively, while the best average validation accuracy obtained by Adam and SGD is 99.13% and 94.18%, respectively. From the above experimentation, it has been observed that the Adam optimizer with a batch size of 4 and a LR of 0.001 has achieved the best average validation accuracy of 99.13%. In the future, this model can be implemented for word recognition in other scripts as well.

Data Availability

The data will be available from the author upon request ([email protected]).

Conflicts of Interest

The authors declare that they have no conflicts of interest.