Abstract

Handwritten character recognition (HCR) is a mainstream input method for mobile devices and has attracted significant research interest. Although previous studies have delivered reasonable recognition accuracy, it remains difficult to embed an advanced HCR service directly into mobile device software and obtain results that are both accurate and fast. Cloud computing is a relatively new provider of online computational resources that can satisfy the elastic resource requirements of an advanced HCR service with high recognition accuracy. However, because character recognition is delay sensitive, the performance loss of traditional cloud virtualization technology (e.g., the kernel-based virtual machine (KVM)) may impair the service. In addition, improper computational resource scheduling in cloud computing impairs not only performance but also resource utilization. Thus, an online HCR service must both guarantee performance and improve resource utilization in cloud computing. To address these problems, we propose an HCR container as a service (HCRCaaS) in cloud computing. We make several key contributions: (1) designing an HCR engine based on deep convolutional neural networks as a demo of an advanced HCR engine with better recognition accuracy, (2) providing an isolated lightweight runtime environment for high performance and easy expansion, and (3) designing a greedy resource scheduling algorithm based on performance evaluation to optimize resource utilization under a quality of service (QoS) guarantee. Experimental results show that our system not only reduces the performance loss compared with traditional cloud computing under the advanced HCR algorithm but also improves resource utilization under the QoS guarantee. This study also provides a valuable reference for other related studies.

1. Introduction

With the increasing number of mobile devices (e.g., smartphones, tablet computers, and laptops), the input method has become one of the most important applications. Thus, handwritten character recognition (HCR) technology, one of the main input methods for smartphones, has received considerable research attention and has consequently improved in quality [1, 2]. Nevertheless, some advanced HCR algorithms are difficult to embed in mobile devices because of the devices' limited resource capacity and the time complexity of the algorithms. Furthermore, embedding bespoke HCR engines into applications is resource and effort intensive, which limits the use of and research on advanced HCR algorithms by individual enterprises. Cloud computing [3, 4] now provides an innovative networking application model with supercomputing resource capacity. It provides a parallel framework to achieve high performance and also supports cross-platform clients [5], freeing clients from the limitations of the computational power and resources of local devices. Furthermore, cloud computing can provide elastic distributed resources that are dynamically allocated to meet varying computing needs. Hence, offloading the HCR task to cloud computing is an effective way to resolve the conflict between the resource capacity limitations of mobile devices and the time complexity of HCR.

However, offloading tasks to cloud computing also brings a new challenge: under the pay-per-use model [6, 7], the resource size is adjusted according to the workload, which may impair the quality of service (QoS) and resource utilization [8]. In particular, the complex distributed architectures used in cloud computing (e.g., Hadoop and Spark) are ill suited to delay-sensitive HCR tasks. In addition, the key technologies of cloud computing (e.g., computational resource virtualization and resource scheduling) are also important factors that affect performance.

With these in mind, we propose an HCR container as a service (HCRCaaS) based on a QoS guarantee policy, which not only provides an advanced HCR algorithm (e.g., a deep convolutional neural network (DCNN) [9]) for better recognition accuracy but also reduces the performance loss by using container technology to meet the delay-sensitive requirement. To guarantee the QoS as well as high resource utilization, we propose a resource scheduling algorithm based on performance evaluation under a greedy resource scheduling policy. Our main contributions are as follows:
(i) Designing an HCR engine based on DCNNs as a demo of an advanced HCR algorithm for high recognition accuracy in cloud computing
(ii) Using containers to deploy the service in order to reduce the performance loss of the virtualization layer and easily expand the resources under different workloads
(iii) Designing a greedy resource scheduling algorithm based on performance evaluation in order to improve resource utilization under the QoS guarantee

The rest of this paper is organized as follows: The related work is described in Section 2. The details of the overall architecture of the system and the communication architecture are presented in Section 3. A description of the HCR engine design and the experiments are presented in Section 4. The resource scheduling method is presented in Section 5. Our experimental design and results compared to traditional systems are presented in Section 6, and we conclude in Section 7.

2. Related Work

Some new systems providing cloud computing online machine learning services have recently been introduced. Triguero et al. [10] developed a MapReduce-based architecture to distribute functions and overcome the challenges of classifying large datasets. Wettinger et al. [11] proposed a new architecture that was different from the systematic classification of DevOps artifacts to model and deploy application topologies. Kaceniauskas et al. [12] developed cloud software services for patient-specific computational analyses of blood flow through the aortic valve on a private university cloud, while Anjum et al. [13] designed a cloud-based video analytics framework for the scalable and robust analysis of video streams based on cloud computing. Verbelen et al. [14] designed and evaluated graph partitioning algorithms that allocated software components to machines in the cloud. Tao et al. [15] proposed an image annotation scheme that transmitted mobile images compressed by Hamming-compressed sensing to the cloud. Tripathy and Mittal [16] designed and combined kernel and possibilistic approaches for image processing based on Hadoop, while Xia et al. [17] introduced a short-term traffic flow forecasting system also based on Hadoop. Xin et al. [18] proposed the novel “Adaptive Distributed Extreme Learning Machine” using MapReduce for distributed computing. Similarly, Zhang et al. [19] proposed a distributed algorithm for training the RBM model based on MapReduce. Thus, the task offloading to a cloud computing platform became a hot research field. To date, these proposed systems have not provided appropriate resource scheduling methods to improve the resource utilization or to guarantee the QoS.

To improve the resource utilization of cloud computing, Xia et al. [20] used a queueing model to evaluate the expected request completion time and rejection probability of a system. Chiang et al. [21] proposed an efficient green control algorithm based on three queueing models. The aim of their work was to find the proper parameters to reduce the power consumption. Du et al. [22] used a queueing model to analyze cloud computing resources. This model optimized the QoS of a video online service in order to reduce the queue length and time delay. To reduce the cost of a hybrid cloud computing platform, Li et al. [23] proposed minimizing the communication costs with an online dynamic provision algorithm based on a queueing model. Khazaei et al. [24] used an M/G/m/m+r queueing model to evaluate the performance of a cloud computing online service. On the basis of this research, they considered that the queueing model presented the relationship between the number of servers and the input buffer size and they could obtain important performance metrics including the task blocking probability and total waiting time incurred during user requests. Bi et al. [25] considered a cloud data center as an M/M/1/n/∞ queueing system. Vakilinia et al. [26] considered that the job arrival rate followed the Poisson process, and the number of jobs in the system could be modeled as an M/G/n/n queueing system. Furthermore, Zhang et al. [27] used an M/G/n queueing model to present the container service process of a Google cluster. Based on the queueing model, the researchers evaluated the average service time. Cao et al. [28] modeled a multicore server processor as a queueing system with multiservers. Based on the model, they proposed an algorithm to optimize the speed of the cores. Feng et al. [29] considered the cloud market as a multi M/M/1 queueing model. 
Maguluri and Srikant [30] proposed an optimization job-scheduling algorithm to optimize the QoS of a cloud computing service and used a queueing model to present the cloud service process.

Based on these works, the structure of a cloud computing service can be regarded as a queueing model. Although these research works are useful for improving the QoS or resource utilization of cloud computing, there are limitations in their approaches, which ignore the resource overbooking that can impact the performance of services as well as the resource utilization. This creates a large gap between the real execution behavior and the behavior initially expected.

To improve the HCR accuracy, traditional methods including the modified quadratic discriminant function (MQDF) [31] and the graphical lasso quadratic discriminant function (GLQDF) [32] have successfully been used to improve recognition accuracy. Graham used DeepCNet [33] based on DCNNs with good effect at ICDAR 2013 [34], which is the premier competition for document analysis and recognition, and is a Chinese handwritten character recognition competition. Since then, different methods have been proposed to improve character recognition, for example, Murru and Rossini [35] proposed an original algorithm to initialize the weights in a back propagation neural net to improve character recognition training, and Tao et al. [36] proposed a new dimension-reduction method termed sparse discriminative information preservation (SDIP) for Chinese character font recognition. Wang et al. [37] proposed a unified framework to expand short texts based on word embedding clustering and convolutional neural networks (CNNs). Zhong et al. [38] proposed the GoogLeNet models to improve Chinese handwritten character recognition accuracy. The studies [39, 40] also proposed the DCNN-based models to obtain high handwritten character recognition accuracy. These previous studies indicated that DCNN-based models can achieve better recognition accuracy. Based on these works, we also proposed an advanced HCR engine based on DCNN as a demo to present how the advanced HCR cloud service is designed as a real project.

Although HCR is a traditional online machine learning service, it differs from the services above in several respects. First, because of the required character input speed, the HCR service is a delay-sensitive application, which calls for a simple system architecture. As mentioned above, the HCR service performs poorly under the resource capacity limitations of mobile devices, so how to design an HCR service in cloud computing that provides higher performance is an issue worth studying. Second, the HCR service is also sensitive to recognition accuracy. Thus, how to design an HCR service in cloud computing that provides better recognition accuracy is a research hotspot. We design an HCR engine based on a DCNN model to achieve better recognition accuracy. Third, owing to the huge number of mobile devices, how to improve resource utilization in cloud computing also needs to be studied. Based on a queueing model, we design a resource scheduling algorithm that combines a performance evaluation with a greedy policy. This resource scheduling guarantees the QoS while improving resource utilization. To the best of our knowledge, this is the first work to address these three problems when designing a high-efficiency HCR service in a real cloud computing project.

3. System Design

3.1. Scheme of Handwritten Character Recognition Container as a Service

Cloud computing uses virtualization technology as a resource-sharing method to provide elastic and configurable on-demand resources for tenants. As mentioned in Section 1, HCR is a common application on mobile devices with delay-sensitive requirements. Additionally, handwritten character data are composed of point data, so the size and dimensions of a handwritten character are small [2]. Thus, the data transmission delay between mobile devices and the cloud can be neglected, and the delay of the recognition process becomes the main factor that affects performance.

However, VMs based on traditional virtualization technology, for example, KVM, Xen, and Hyper-V, are complete virtualization technologies with a full-guest operating system (OS). They cause such high performance loss that only a few virtual machines (VMs) can be created from one physical machine [41]. According to [42], the time required for creating a VM is 15 s, which makes the resource scheduling lag behind workload changes.

Compared with VMs, containers have different architectures that are useful tools for software deployment and packaging in differently configured environments. The container uses the container engine instead of a Hypervisor layer to isolate the configurable resources environment. Thus, the container can directly run the CPU threading of the physical machine (PM) without a virtualization layer, and it is generally considered that a lightweight virtualization technology is less resource consuming [43, 44]. IBM conducted a performance test of VMs and containers [43], and experiments showed that containers are superior to VMs in terms of CPU, memory, and I/O performance. The startup time of containers is expressed in milliseconds whereas that of VMs is expressed in seconds. Furthermore, containers have been suggested as a solution for more interoperable application packing in the cloud [45]. Hence, containers in cloud computing are more appropriate for HCR service deployment.

Based on the elastic service architecture in cloud computing [46], the HCR container as a service (HCRCaaS) includes these key components:
(i) The container: the container is used for resource isolation and for configuring a lightweight virtualized running environment for the HCR service.
(ii) The host cluster (resource pool): each PM in the cluster can be considered as a container host, which runs a daemon thread with the container engine service to provide the container environment. The PM also provides the resources, for example, CPU, memory, and storage. To support resource scheduling management, the Python RabbitMQ client library [47] is run to listen for messages from the resource management server for on-demand resource scheduling. According to the message, the PM creates or deletes containers.
(iii) Registry container image storage: the container image storage provides server storage for container images, so that the user can upload container images to and download them from the server. It is used to manage the standard HCR container templates for batch elastic expansion. Based on the HCR engine, we design a standard container image based on the Ubuntu OS image downloaded from the official container hub. Then, to create the HCR container image, a Dockerfile is used to build the running configuration environment and copy the engine binary file. We set the engine binary file to start automatically in the final line of the Dockerfile.
(iv) The load balance server: the load balance server schedules the data from the client devices to the containers to build parallel computing. It is designed to provide the reverse proxy of HCRCaaS by using Nginx 1.9. To provide transmission control protocol (TCP) load balancing, Nginx is built with the --with-stream configuration parameter, and the task balance policy is weighted round robin. Since the system containers are the same, the weight values are equal. The server runs a daemon thread with the Python RabbitMQ client library to listen for messages from the resource management server for adding a container to or removing it from the load balance configuration.
(v) Resource scheduling management: the resource scheduling manager provides central resource management in HCRCaaS and allocates containers to container hosts. Based on Python, three software frameworks are used in the design of the resource scheduling management: (1) the Python NumPy library is used to implement the resource scheduling algorithm, (2) the Web Server Gateway Interface (WSGI) provides the tenants with a hypertext transfer protocol (HTTP) interface, and (3) message queueing is implemented with the RabbitMQ server software package to send the messages for creating or deleting containers on the container hosts.
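To illustrate the balancing policy of the load balance server, the following is a minimal sketch of weighted round robin in Python. It is not the Nginx implementation (Nginx may order requests differently when weights are unequal), and the container addresses are hypothetical placeholders. With equal weights, as in HCRCaaS, the policy reduces to plain round robin.

```python
from itertools import cycle


def weighted_round_robin(backends):
    """Return an endless iterator over backend names, each repeated per cycle
    according to its integer weight (naive expansion, assumed for illustration)."""
    expanded = [name for name, weight in backends.items() for _ in range(weight)]
    return cycle(expanded)


# Hypothetical container endpoints; all weights are equal because the
# containers are identical.
containers = {"10.0.0.11:9000": 1, "10.0.0.12:9000": 1, "10.0.0.13:9000": 1}
dispatcher = weighted_round_robin(containers)

# The first six tasks are spread evenly over the three containers.
first_six = [next(dispatcher) for _ in range(6)]
```

Because every container receives the same share of tasks, the arrival intensity seen by each container is the total arrival intensity divided by the number of containers, which is the property the scheduling model in Section 5 relies on.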

The architecture of the proposed HCR system based on cloud computing is shown in Figure 1.

3.2. Communication Architecture

As shown in Figure 1, there are two types of communication architecture in the system:
(i) Service communication is responsible for transmitting the handwritten character data to the container and returning the recognition result. The handwritten character data are time-continuous data that are sampled by the client device, for example, a smartphone. The client device stores the character index dictionary, in which the character index is the same as that in the recognition engine. The system receives the data points from the client device and returns the recognition result, which contains the index of the character class with the largest probability. The client device uses this index to provide the candidate character for users.
(ii) Management communication provides the RabbitMQ message queueing service between the resource scheduling manager and the other servers. It is mainly responsible for transmitting messages from the tenants to the servers. Using the creation of a container as an example, the tenant sends a message to the resource management server. Then, the message is transmitted to the message queueing server. The server processes the message according to the resource scheduling policy and sends a command message to the target server for creating the container. After finishing the action, the target server replies with a message to the tenant through the message queueing server. The management communication architecture is shown in Figure 2.
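The create/delete exchange in the management communication can be sketched as follows. This is only an illustration under stated assumptions: `queue.Queue` stands in for the RabbitMQ broker, and the JSON message fields (`action`, `status`, `containers`) and the host name are hypothetical, since the message format is not specified here.

```python
import json
import queue


def handle_command(raw_message, host_state):
    """Apply one management command on a container host and build the reply."""
    command = json.loads(raw_message)
    action = command.get("action")
    if action == "create":
        host_state["containers"] += 1
    elif action == "delete" and host_state["containers"] > 0:
        host_state["containers"] -= 1
    else:
        # Unknown action, or nothing left to delete.
        return json.dumps({"host": host_state["name"], "status": "error"})
    return json.dumps({"host": host_state["name"], "status": "ok",
                       "containers": host_state["containers"]})


# In-process stand-ins for the command and reply queues of the broker.
command_queue = queue.Queue()
reply_queue = queue.Queue()
host = {"name": "host-1", "containers": 0}

# The resource manager publishes two commands for this host.
command_queue.put(json.dumps({"action": "create"}))
command_queue.put(json.dumps({"action": "delete"}))

# The host consumes each command and replies through the reply queue.
while not command_queue.empty():
    reply_queue.put(handle_command(command_queue.get(), host))

replies = [json.loads(reply_queue.get()) for _ in range(2)]
```

In a deployment, the two queues would be replaced by RabbitMQ queues and `handle_command` would invoke the container engine instead of updating a counter.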

4. Handwritten Character Recognition Engine Design

In recent years, DCNN has achieved excellent results in image classification. It has a better model expression capability. A previous work [48] indicated that each layer in DCNN can be equivalent to a special function component, for example, the convolution layer can be considered as a feature extraction component, the max pooling layer can be considered as a local extremum component, and the activation function can be considered as a nonlinear regression. DCNN can extract high-order features layer by layer, and the final fully connected layer integrates the output features of the final convolution layer or max pooling layer for classification. Thus, DCNN can be considered as a multilayer and nonlinear complex model. Furthermore, owing to the back propagation training method for the entire model, the parameter values of each layer in DCNN can be unitedly adjusted to make the data processing in each layer more coordinated.

To obtain the advanced HCR model, first, we use a comparison experiment to determine the proper structure of the DCNN-based model. On the basis of previous work [33, 49], we design three DCNN-based models, which have structures with multiple convolutional layers and fully connected layers. Second, for the normalization, we use batch normalization (BN) [50], which normalizes the layer inputs and stabilizes their distribution by reducing the internal covariate shift, allowing higher learning rates to be used to expedite network convergence. For some deep networks, BN can also effectively mitigate the problem of vanishing gradients. Third, we use the gradient back propagation training approach. For parameter optimization in the training process, when the training loss decreases, the learning rate should decay to prevent oscillations near the best point. However, setting a small learning rate from the start may slow training and cause it to fall into a local optimum. Thus, we set the learning decay rate, which increases with the number of training epochs. The learning rate can be obtained as follows: where is the number of training epochs, is the learning rate base value, and is the decay rate.

The three models use the CASIA-HWDB 1.1 dataset [2], which has 300 sets and 1,174,364 handwritten samples, to train on a stand-alone computer. The training dataset includes 240 sets. We use softmax regression as the output. The output is described as follows:where denotes the weight and bias corresponding to the output, and is the input feature. The output of the softmax regression layer can be regarded as the confidence probability of the input handwritten characters, which belong to different character classes.
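As an illustration of the output layer, the sketch below computes the standard softmax regression output, i.e., the exponential of each class score divided by the sum of the exponentials over all classes, where each score is the weighted sum of the input features plus a bias. The names `weights`, `biases`, and `features` and the toy values are assumptions for this example and are not taken from the trained models.

```python
import math


def softmax_output(weights, biases, features):
    """Return the class probabilities softmax(W x + b) for one feature vector."""
    scores = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(score - shift) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


# Toy example with three classes and two input features.
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
biases = [0.0, 0.0, -0.5]
features = [2.0, 1.0]

probabilities = softmax_output(weights, biases, features)
# Index of the most probable class, which the client maps to a character.
predicted_index = max(range(len(probabilities)), key=probabilities.__getitem__)
```

The index with the largest probability is what the service returns to the client device, which looks it up in its character index dictionary.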

To demonstrate that the DCNN-based methods can achieve better recognition accuracy, we use the ICDAR 2013 competition dataset [34] to conduct the experiment. We not only compared the proposed three models with the traditional methods [2, 51, 52] but also compared them with other DCNN-based methods [34, 38–40]. The comparison results are listed in Table 1. The Number 1–3 models are the proposed models and the others are the comparison models.

The results in Table 1 clearly show that the DCNN-based methods outperform the traditional methods and that the proposed Number 3 model achieves the best result. Furthermore, to demonstrate that the Number 3 model in Table 1 fits HCR best, we test variants of it with different structures, for example, with one convolutional layer added or removed, with one max pooling layer added or removed, and with one fully connected layer added or removed.

As shown in Table 2, the recognition accuracies of the different numbers of layers in the DCNN-based models are lower than that of the original DCNN-based model (i.e., Number 3 model in Table 1). This demonstrates that the Number 3 model in Table 1 is the proper structure for 3,755 classes of Chinese handwritten character recognition.

Some previous studies, for example, the Fisher vector-based method [53, 54], defensive distillation DCNN [55], discriminative spatiality embedded dictionary learning-based representation (DSEDR) [56], data augmentation [57], robust and sparse fuzzy K-Means with capped (RSFKM) [58], and GA-Bayes [59], have made prominent achievements in classification. In addition, note that the 3,755 categories only contain Chinese handwritten samples. Thus, we compare the three proposed models with these previous methods using the MNIST dataset [60], which contains 10 classes of handwritten digit samples, as shown in Table 3.

In Table 3, the results clearly show that the proposed models can also achieve comparable results that are greater than 99% under the digit handwritten dataset. To test the Latin handwritten recognition, we also use the EMNIST letters dataset [61], which contains 26 balanced classes of Latin handwritten samples, to conduct the experiment. The classified number of output layers of the proposed models is modified to 26. The experiment results are listed in Table 4.

From the results in Table 4, the proposed model also achieves a recognition accuracy of 93% or above in the Latin handwritten dataset. The Number 3 model can achieve the best results.

Although DCNN-based models outperform the traditional methods, their high time complexity causes a serious response delay on mobile devices compared with their performance on PMs. We use the Number 3 model, which has the best result in Table 1, to compare the average processing time for a single handwritten sample between a mobile device and a PM. Loop unrolling is a well-known and efficient strategy to improve speed, especially for large loops. In addition, the BLAS library has been shown to be an efficient way to implement CNNs on CPUs. We also use these configuration optimization methods, for example, the BLAS library, loop unrolling, and GPU, to obtain the performance comparison results. The comparison results for a Huawei MATE 7 mobile device and the Caffe [63] deep learning framework on a PM are listed in Table 5.

Table 5 shows that the HCR service has a serious delay on mobile devices, which motivates designing the HCR service based on cloud computing. To deploy the most accurate parallel handwritten character recognition service on HCRCaaS, we select the model (Number 3) with the best results in Table 1 as the advanced HCR model demo. Based on this model, we used the Loop unrolling + BLAS LIB method in Table 5 and combined the model with a TCP interface library to design a feed-forward DCNN-based HCR engine. Since handwritten samples do not all contain the same number of data points, they cannot be directly used as input data. Therefore, before the data are input to the HCR engine, the system connects the sampled points in time order to form a handwritten picture as the input of the DCNN model. The feed-forward DCNN structure is shown in Figure 3.
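The preprocessing step that turns a variable-length sequence of sampled points into a fixed-size picture can be sketched as follows. This is a minimal illustration, not the engine's actual preprocessing: the function name, the binary grid, the image size, and the straight-line interpolation between consecutive points are all assumptions.

```python
def rasterize_strokes(strokes, size=32):
    """Draw strokes (non-empty lists of (x, y) points) on a size x size binary grid."""
    points = [point for stroke in strokes for point in stroke]
    min_x = min(x for x, _ in points)
    min_y = min(y for _, y in points)
    span = max(max(x for x, _ in points) - min_x,
               max(y for _, y in points) - min_y) or 1.0
    scale = (size - 1) / span  # keep the aspect ratio of the character
    image = [[0] * size for _ in range(size)]

    def to_pixel(x, y):
        return round((x - min_x) * scale), round((y - min_y) * scale)

    for stroke in strokes:
        pixels = [to_pixel(x, y) for x, y in stroke]
        if len(pixels) == 1:  # an isolated dot
            col, row = pixels[0]
            image[row][col] = 1
        # Connect consecutive points in time order with a straight segment.
        for (c0, r0), (c1, r1) in zip(pixels, pixels[1:]):
            steps = max(abs(c1 - c0), abs(r1 - r0), 1)
            for i in range(steps + 1):
                col = round(c0 + (c1 - c0) * i / steps)
                row = round(r0 + (r1 - r0) * i / steps)
                image[row][col] = 1
    return image


# A plus-like shape: one horizontal and one vertical stroke.
strokes = [[(0, 0), (10, 0)], [(5, 0), (5, 10)]]
image = rasterize_strokes(strokes, size=11)
```

Whatever the number of sampled points, the output always has the same dimensions, which is what allows it to be fed to a DCNN with a fixed input layer.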

After compiling the recognition engine using C++ in a Caffe deep learning framework, we deployed it to the container and built the container as a container image for expansion on demand.

5. Resource Scheduling Algorithm

As mentioned above, containers, which are a lightweight virtualization technology, share the resources of their host, for example, the CPU. If the total number of containers is lower than the number of physical CPU cores, each container can use one isolated core. However, owing to the resource capacity limitation of the PM, if there are too many containers in a single container host, the containers must share the CPU cores, resulting in resource overbooking and degraded performance. On the other hand, resource overbooking means that a PM achieves higher resource utilization. Thus, the resource scheduling method needs to achieve a trade-off between resource utilization and performance. Based on the architecture shown in Figure 1, we consider that the HCR service should follow the “first come-first served” (FCFS) principle, which means that each container can be regarded with an queueing model aswhere and are the total arrival intensity and the arrival intensity of the container following the Poisson distribution, and the expected number of task arrivals is equal to during time . The load balance server uses the round-robin policy to allocate the task to each container; thus, the arrival intensity of each container is when there are containers in the system. The service rate , which is the number of tasks that can be processed in time , can be divided into two cases: (1) when the number of containers is lower than the number of PM CPU cores, is equal to the service rate of each CPU core; (2) when the number of containers is higher than the number of CPU cores, more than one container will share the same core service rate, and will be lower than that of each core.

The average length of queue can be calculated as

The expected wait time can easily be found using Little’s formula, which is defined as

Note that is also the average processing time, and we consider as the QoS and performance metric. More importantly, from Equations (4) and (5), it can be seen that a lower service rate leads to a worse service quality according to the queueing model.
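The relationship between the service rate and the waiting time can be made concrete with a small numerical sketch. It assumes that each container behaves as an M/M/1 queue (Poisson arrivals, exponential service, one server) and that the metric of interest is the time a task spends in the system; both are assumptions made for illustration, and the function and parameter names are hypothetical.

```python
def mm1_metrics(total_arrival_rate, containers, service_rate):
    """Return (mean number of tasks in system, mean time in system) per container."""
    # Round-robin balancing splits the total arrivals evenly over the containers.
    per_container_rate = total_arrival_rate / containers
    utilization = per_container_rate / service_rate
    if utilization >= 1:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    mean_in_system = utilization / (1 - utilization)
    # Little's formula: mean time in system = mean number in system / arrival rate.
    mean_time = mean_in_system / per_container_rate
    return mean_in_system, mean_time


# 40 tasks/s spread over 4 containers, each able to serve 20 tasks/s.
fast_length, fast_time = mm1_metrics(40.0, 4, 20.0)
# Same workload on overbooked containers that only reach 12.5 tasks/s.
slow_length, slow_time = mm1_metrics(40.0, 4, 12.5)
```

With these numbers, the mean time in the system grows from 0.1 s to 0.4 s when the per-container service rate drops from 20 to 12.5 tasks/s, which shows how strongly a reduced service rate degrades the QoS.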

According to a different number of containers in the same PM, we test the average processing time of each sample () to obtain a relative performance function. The system performance for different numbers of containers in the same PM is listed in Table 6.

When the PM is overbooked, the containers in the same PM will share CPU resources. We consider that there is a linear relationship between the service rate of each container and the number of containers as

Based on Table 6, we obtain a linear performance degradation relationship function using order-1 linear differential equations and the least squares method. The fitting effect and the R-squared metric are shown in Figure 4.
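A least squares line fit and its R-squared value can be computed as in the following sketch. The data points are synthetic placeholders chosen to lie exactly on a line; they are not the measurements in Table 6, and the function name is hypothetical.

```python
def fit_line(xs, ys):
    """Least squares fit y = slope * x + intercept; returns (slope, intercept, R^2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - residual sum of squares / total sum of squares.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Synthetic placeholder data: containers per PM vs. per-container service rate.
container_counts = [4, 8, 12, 16]
service_rates = [20.0, 18.0, 16.0, 14.0]

slope, intercept, r_squared = fit_line(container_counts, service_rates)
```

A negative slope expresses the performance degradation per additional container, and an R-squared value close to 1 indicates that a linear relationship describes the measurements well.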

The linear performance degradation relationship function is calculated as

The R-squared value is 0.998, meaning that Equation (7) describes the relationship between the performance and the number of containers well. On the basis of this knowledge, we propose a greedy performance evaluation (GPE) resource scheduling algorithm to evaluate the performance of each container in the same PM. Taking resource overbooking into consideration, the algorithm finds the proper PM on which to place the containers so as to guarantee the QoS and improve resource utilization under the greedy policy. The resource scheduling is triggered by the tenant or by the load balance server under QoS monitoring. The pseudocode for the GPE algorithm is shown in Algorithm 1.

The pseudocode of the GPE algorithm
Input:
(1)Set the current arrival number for each container
(2)The number of containers in each container host , and is the number of hosts in the system
(3)The number of CPU cores of the container hosts
(4)The maximum waiting time
(5)The container scheduling trigger command , where if launching the new containers, if keeping the containers, and if deleting the containers
(6)The performance relationship function
Output: the updated expected waiting time
Step 1:
If
 Sort the indices of the container hosts by their number of containers in descending order
 Find the host which has the minimum containers
 Set
 Re-calculate
 Calculate the of the containers in container host from (3)–(7)
 If
  Deleting a container in container host
  Go to Step 3
 Else
  Set
  Repeat Step 1
 End If
End If
Step 2:
If
 Sort the indices of the container hosts by their number of containers in ascending order
 Find the host which has the maximum containers
 Set
 Re-calculate
 Calculate the in container host from (3)–(7)
 If
  Creating a container in container host
  Go to Step 3
 Else
  Set
  Repeat Step 2
 End If
End If
Step 3: Updating and adding or deleting the container in the load balance server.
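
The greedy idea behind the algorithm can be sketched in Python as follows. This is an illustration under explicit assumptions, not the authors' implementation: each container is modeled as an M/M/1 queue, the per-container service rate is assumed to stay at the core rate until the host is overbooked and then to drop linearly, new containers are tried on the fullest host first, deletions are tried on the emptiest host first, and all names and numbers are hypothetical.

```python
import math


def service_rate(n_on_host, cores, core_rate, slope):
    """Per-container service rate on one host (assumed piecewise-linear form)."""
    if n_on_host <= cores:
        return core_rate  # one dedicated core per container
    # Overbooked host: the rate drops linearly with each container beyond the cores.
    return max(core_rate + slope * (n_on_host - cores), 0.0)


def worst_wait(hosts, arrival_rate, cores, core_rate, slope):
    """Largest expected M/M/1 time in system over all containers."""
    total = sum(hosts)
    if total == 0:
        return math.inf
    per_container_arrivals = arrival_rate / total  # round-robin split
    worst = 0.0
    for n_on_host in hosts:
        if n_on_host == 0:
            continue
        mu = service_rate(n_on_host, cores, core_rate, slope)
        if mu <= per_container_arrivals:
            return math.inf  # unstable queue
        worst = max(worst, 1.0 / (mu - per_container_arrivals))
    return worst


def choose_host(hosts, delta, arrival_rate, max_wait, cores, core_rate, slope):
    """Pick the host on which to create (delta=+1) or delete (delta=-1) one
    container; return its index, or None if every choice would break the QoS."""
    if delta > 0:
        # Fill the fullest hosts first to raise utilization.
        order = sorted(range(len(hosts)), key=lambda i: hosts[i], reverse=True)
    else:
        # Drain the emptiest non-empty hosts first.
        order = sorted((i for i in range(len(hosts)) if hosts[i] > 0),
                       key=lambda i: hosts[i])
    for idx in order:
        trial = list(hosts)
        trial[idx] += delta
        if worst_wait(trial, arrival_rate, cores, core_rate, slope) <= max_wait:
            return idx
    return None


# Hypothetical setting: two 4-core hosts holding 4 and 2 containers.
params = dict(arrival_rate=60.0, cores=4, core_rate=20.0, slope=-2.0)
hosts = [4, 2]
loose_create = choose_host(hosts, +1, max_wait=0.2, **params)
strict_create = choose_host(hosts, +1, max_wait=0.1, **params)
loose_delete = choose_host(hosts, -1, max_wait=0.2, **params)
strict_delete = choose_host(hosts, -1, max_wait=0.1, **params)
```

In this example, a loose waiting-time bound lets the new container overbook the fullest host, whereas a strict bound forces it onto the host that still has free cores; likewise, a deletion is refused when the remaining containers could no longer meet the bound.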

6. Experiments and Analysis

To demonstrate the efficiency of the proposed approach, we design an HCRCaaS prototype system. The experimental system is composed of seven nodes including one controller server, one load balance server, one image storage server, and four container hosts. The software configurations of each node are listed in Table 7, and the hardware configurations of the nodes are listed in Table 8.

Furthermore, a stand-alone server is also built with the same hardware and software specifications as the container host for a comparison experiment. We perform a series of experiments to evaluate system performance and resource utilization. We consider that the processing time is the key metric of the HCR service. The longer the processing time, the lower the performance.

6.1. Performance Comparison between Stand-Alone Server, Container, and KVM

Similar to the previous work [64], to demonstrate the efficiency of the HCR delay-sensitive service under the container, the objective of this experiment is to obtain a performance loss (i.e., processing time delay) comparison between KVM and the container. We test the stand-alone server without virtualization technology as the standard of the highest performance. The experiment is conducted using 32,768 samples from CASIA-HWDB [2] for testing the performance of the HCR service in different environments. The processing times of the HCR engine in the container, KVM, and stand-alone server are shown in Figure 5.

It can be seen from Figure 5 that the processing time of the stand-alone server is the shortest, which shows that the stand-alone server performs best. The processing time delay ratio between the container and the stand-alone server, that is, , is smaller than that between KVM and the stand-alone server, that is, . This is because KVM is a large and complex software process. Each KVM instance has its own virtualized hardware resources, including CPU, memory, and NIC. Furthermore, KVM must run its own guest operating system to provide a software environment; therefore, the architecture itself results in performance loss. By contrast, the container is a lightweight virtualization technology based on Linux containers (LXC) that can directly exploit the container host's hardware resources, including its CPU and memory. It also does not run a guest operating system to provide the running environment, so it can be regarded as a tool for providing different isolated configurations on the stand-alone server. Because the HCR service is delay sensitive, the lower response delay of the container helps provide a resilient and agile computation service. This demonstrates that the container can achieve higher performance than KVM in cloud computing.
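
A delay ratio of this kind can be computed as in the sketch below. The definition used here (extra processing time relative to the stand-alone baseline) is an assumption about the intended formula, and the timings are placeholders chosen only to show the calculation; they are not the measurements plotted in Figure 5.

```python
def delay_ratio(t_env: float, t_standalone: float) -> float:
    """Extra processing time of an environment relative to the stand-alone baseline.

    Assumed definition: (t_env - t_standalone) / t_standalone.
    """
    if t_standalone <= 0:
        raise ValueError("baseline processing time must be positive")
    return (t_env - t_standalone) / t_standalone


# Placeholder timings in seconds (hypothetical, not results from the paper).
baseline_s, container_s, kvm_s = 10.0, 10.5, 12.0

container_loss = delay_ratio(container_s, baseline_s)
kvm_loss = delay_ratio(kvm_s, baseline_s)
print(f"container: {container_loss:.1%}, KVM: {kvm_loss:.1%}")
```

A smaller ratio means the environment stays closer to the performance of the unvirtualized server.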

6.2. Performance Comparison between HCRCaaS and Stand-Alone Server

The performance of a stand-alone server and HCRCaaS (i.e., 16 containers in HCRCaaS) is compared under the same hardware specification. The processing time is tested for different numbers of samples from 128 to 131,072. The processing time of HCR is measured from the data arriving at the load balance server to when the test client receives the result. The results of the performance testing are shown in Figure 6 and Table 9.

In Figure 6, the red line denotes the processing time of HCRCaaS, and the blue line denotes the processing time using a stand-alone server; the values are listed in Table 9. When the number of samples is lower than 128 (a light workload), the HCR engine can completely process these data in a short period. Owing to the load balance server in HCRCaaS, the processing time of HCRCaaS is shorter than that of the stand-alone server. However, when the number of samples increases and the workload becomes large, the processing time increases dramatically in the stand-alone server but not when using HCRCaaS. For example, when the number of samples increases from 128 to 512, the processing time in the stand-alone server increases by 9.589 s, while it increases by only 0.5331 s in HCRCaaS. Furthermore, the processing time comparison ratio between the stand-alone server and HCRCaaS is for 131,072 samples. The efficiency of HCR can therefore be significantly improved using HCRCaaS.

We also compare the processing time of 32,768 samples for different numbers of containers on one host. The result is shown in Figure 7. The performance increases linearly with the number of containers while that number is lower than 8. This is because each host contains eight cores, and each container completely occupies one core. By contrast, when the number of containers exceeds the number of cores, overbooking forces more than one container to share the same core; container performance degrades, and cloud performance therefore increases only slowly.

Furthermore, when the number of containers is much higher than the number of cores, resource overbooking becomes serious and degrades overall system performance, which puts the QoS guarantee at risk. Accordingly, the processing time increases when the number of containers is greater than 32. Overall, performance can be improved by increasing the number of containers as long as resource overbooking is not serious, but when the system is heavily overbooked, system performance degrades significantly; hence the need for the GPE algorithm.
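
The core-sharing argument can be illustrated with a deliberately idealized model, given here as a rough sketch. It assumes that containers time-share the cores evenly and that scheduling has no overhead; both function names are hypothetical. The model accounts for the linear region below eight containers and for the shrinking per-container share, but it does not reproduce the slow gains between 8 and 32 containers or the degradation beyond 32 reported for Figure 7, which depend on effects it leaves out.

```python
def ideal_parallelism(containers: int, cores: int = 8) -> int:
    """Number of containers that can run truly in parallel on one host."""
    if containers < 1 or cores < 1:
        raise ValueError("containers and cores must be positive")
    return min(containers, cores)


def core_share(containers: int, cores: int = 8) -> float:
    """Fraction of one core available to each container under even sharing."""
    if containers < 1 or cores < 1:
        raise ValueError("containers and cores must be positive")
    return min(1.0, cores / containers)


# Eight cores per host, as in the experimental setup described above.
for n in (4, 8, 16, 32, 64):
    print(n, ideal_parallelism(n), core_share(n))
```

In this model, each container keeps a full core up to eight containers, and the share halves each time the container count doubles beyond that point.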

6.3. Comparison of Performance with Load Balance Server and without Load Balance Server

The load balance server is a key component of a parallel computing system and is responsible for scheduling the HCR data to different containers. Its performance may also have an impact on the QoS. We test the processing time of six groups of samples from 128 to 131,072 with and without use of the load balance server, as shown in Figure 8 and Table 10. When the number of samples is lower than 512, the processing times with and without the load balance server are similar. Moreover, when the number of samples increases, the processing time with the load balance server also increases slightly. This is because the data size of the HCR is small, as mentioned above, and we consider that the load balance can be regarded as an queueing model. Thus, the performance loss of the load balance service is minor. The results show that the largest loss of performance is still very small at .

6.4. Performance Comparison between GPE and Greedy Algorithms

To evaluate how the QoS is affected by the resource scheduling algorithm, we test a large workload under the GPE algorithm and under a plain greedy scheduling algorithm, in which each host may be heavily overbooked to achieve the highest resource utilization. We set the average maximum waiting time to 0.032 s, the number of users to 16, and the number of samples to 32,768. The results are shown in Figure 9 and Table 11.

It can be seen from Figure 9 and Table 11 that the greedy algorithm does not guarantee the QoS as the number of containers increases. This is because the greedy algorithm considers only the resource utilization and therefore places as many containers as possible in the same container host, which leads to heavy resource overbooking. With the GPE algorithm, when the number of containers is lower than 32, the containers are created in the same container host as long as the performance can guarantee the QoS. However, when the number of containers is higher than 32 and the maximum number of containers in the container host is 15 to guarantee the QoS (based on the average maximum waiting time), the GPE algorithm tries to find a suitable container host to reduce resource overbooking. The resource scheduling strategy in these situations is equivalent to the average resource scheduling strategy. Therefore, the GPE algorithm can achieve a proper trade-off between resource allocation balance and utilization.

6.5. Resource Utilization Evaluation and Analysis

To highlight the resource utilization improvement, we compare and analyze the resource utilization of the traditional PM cluster and HCRCaaS. We define the resource utilization comparison ratio between HCRCaaS and the PM cluster as follows: where is the number of containers in the container host, is the container resource utilization, is each core utilization of the container host, and is the number of hosts. When the result of Equation (8) increases, more containers are created in the cluster. If the ratio is higher than 1, the number of containers is higher than that of the PMs. Thus, the larger the result of Equation (8), the higher the resource utilization. According to Section 5, we suppose that . Based on the hardware configuration, we compare the resource utilization comparison ratio between HCRCaaS and the PM cluster under different numbers of containers when the number of PMs is 4 (i.e., k = 4). The results are listed in Table 12, which shows that the resource utilization improves as the number of system containers increases.

7. Conclusion

In this paper, we designed a handwritten character recognition system based on a container cloud to better utilize handwritten character recognition technology. Using parallel computing and lightweight virtualization technology, we successfully improved the system performance. To overcome problems caused by resource overbooking, we proposed a performance evaluation approach to evaluate the performance of each container as the resource size changed. Using a greedy policy, we designed a GPE algorithm to guarantee the QoS and improve the resource utilization. Our experiments showed that the system efficiency increased significantly with container expansion. This system can easily be extended to other applications, for example, text line recognition, formula recognition, image pattern recognition, and video pattern recognition. The system can also easily be deployed via Amazon, Rackspace, or Windows Azure and private cloud computing platforms.

Future work will aim to improve the system in three respects. First, we will improve the model using more features such as HOG or SIFT to obtain more accurate recognition. Second, we will evaluate more models, such as the long short-term memory model, in order to provide more efficient and accurate recognition. Third, we will improve the resource scheduling method to provide an adaptive scalable method in which the number of containers is automatically adjusted according to the workload. In addition, we will design a workload prediction model for a proactive resource scheduling policy. This will avoid frequent tenant monitoring of the workload and optimize the resource utilization.

Data Availability

Previously reported CASIA datasets are used to support this study and are available at http://www.nlpr.ia.ac.cn/databases/handwriting/home.html. MNIST datasets are used to support this study and are available at http://yann.lecun.com/exdb/mnist/. EMNIST datasets are used to support this study and are available at https://www.nist.gov/itl/iad/image-group/emnist-dataset.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was supported in part by Guangdong Science Technology Project (Grant nos. 2017A010101027, 2015B010101004, and 2015B010131004), National Natural Science Foundation of China (Grant nos. 61472144 and 61673182), GD-NSF (no. 2017A030312006), Science and Technology Program of Guangzhou, China (Grant no. 201707010160), and the National Key Research and Development Program of China (no. 2016YFB1001405).