Abstract

Image recognition and image processing usually rely on image segmentation, and the quality of the segmentation results directly affects the accuracy of subsequent recognition and processing. The essence of image segmentation is to divide an image, or each frame of a video, into multiple specific objects or regions and to represent them with different labels. This paper focuses on the image segmentation task for intelligent monitoring of Mandarin exams, where the segmentation results are usually visualized for image analysis. We first investigate performance improvement techniques for semantic segmentation in this task, improving pixel classification by semantic transfer and, for the first time, substantially extending the dataset by style transformation to improve the model’s recognition of high-level features. In addition, to further address the shortcomings of the dataset, this paper improves the performance of image segmentation using synthetic datasets by investigating improvement techniques that reduce the reliance on manually annotated data. Image segmentation techniques continue to advance, and thousands of segmentation methods have been developed to date; they can be broadly classified as region-based, threshold-based, edge-based, specific-theory-based, and deep-learning-based methods. The methods used in this paper have all been experimentally demonstrated to improve segmentation effectiveness and shown to outperform other existing methods in the same field on the publicly available LSUN, Cityscapes, and GTA5 datasets, respectively.

1. Introduction

Image segmentation is a fundamental technique for numerous computer vision applications such as scene understanding, human parsing, and autonomous driving, and this wide range of applications has made it highly valued by researchers. With the rapid development of convolutional neural networks, especially fully convolutional networks, a large amount of excellent work has driven the advancement of image segmentation techniques [1]. In recent years, image segmentation has been applied in more and more fields and subtasks; for example, indoor scene reconstruction can greatly improve its final accuracy by estimating the layout of indoor rooms with segmentation techniques. However, since pixel-level segmentation labels incur very expensive annotation costs, existing datasets often lack sufficiently rich annotated samples, which has led some researchers to devote themselves to more realistic weakly supervised and unsupervised learning methods. Meanwhile, with advances in computer graphics, neural networks can now be trained on synthetic datasets. Using synthetic data saves a great deal of labor, but problems arise when real annotated images are completely absent [2].

When no labels for real images are available, models trained on synthetic datasets struggle to achieve the desired performance, and domain adaptation techniques, which address the domain mismatch between real and synthetic images, therefore have significant research value. Image recognition and image processing usually rely on image segmentation, and the quality of the segmentation results directly affects their accuracy. The essence of image segmentation is to divide an image, or each frame of a video, into multiple specific objects or regions and to represent them with different labels. The resulting segmentation is usually intuitive for image analysis. Through the continuous efforts of researchers, image segmentation techniques keep progressing, and thousands of segmentation methods have been developed so far; they can be broadly classified as region-based, threshold-based, edge-based, specific-theory-based, and deep-learning-based methods. Another view holds that image segmentation labels the pixels in an image that belong to the same particular object with the same number or notation [3].

As computers appear smarter and smarter, there is a growing expectation that they will have human-like perception and understanding. This has made computer science and technology, especially artificial intelligence, one of the hottest and fastest-growing fields today. For computers to understand the world as humans do and adapt to different tasks, researchers have designed algorithms that give computers sensory and understanding capabilities modeled on human perception of the outside world. Among them, the algorithms that give the computer “vision” are essential for it to perceive the outside world effectively. Computer vision is an integral part of many application areas. Images are the basis of vision, and the first step in computer vision is understanding images: if a computer cannot understand an image effectively, it cannot process it, and image segmentation occupies the most critical position in image understanding [4].

Image segmentation techniques have a wide range of applications in scene understanding, human parsing, autonomous driving, medical diagnosis, military engineering, and other fields. In addition, they play an important role in other key technologies such as scene reconstruction and object recognition. In recent years, convolutional neural networks, especially fully convolutional networks, have been updated rapidly, and a large amount of excellent work has driven the advancement of image segmentation. Segmentation also increasingly serves as a preliminary step for many advanced techniques and is applied in a growing number of domains and subtasks, such as the image segmentation task for intelligent monitoring of Mandarin exams. It also provides a better understanding of indoor scenes, which is crucial for tasks such as indoor navigation, object detection, and depth recovery; applying high-level indoor scene representations to intelligent robotics and augmented reality is also feasible. The indoor room layout estimation task therefore has great research value as a subtask of image segmentation. In deep learning, the size of the dataset directly affects the effectiveness of model training: the larger the dataset, the stronger the performance the model can achieve. However, pixel-level segmentation labels incur very expensive annotation costs, and existing datasets often lack sufficiently rich annotated samples, which has led some researchers to devote themselves to more realistic weakly supervised and unsupervised learning methods, such as domain adaptation, one of the commonly used methods for unsupervised learning.
In image segmentation, the study of domain adaptation is essential because manual pixel labeling is rather expensive, while with advances in computer graphics, neural networks can be trained on synthetic datasets. This saves a great deal of labor but suffers from the domain mismatch between real and synthetic images, so model performance is usually not comparable to supervised learning directly on real image datasets. In the most difficult case, when labels for the real images are completely absent, the performance of a model trained on a synthetic dataset is hardly satisfactory, and domain adaptation, which solves the domain mismatch between real and synthetic images, has significant research value.

The main contributions and innovations of this article are as follows: we investigate performance improvement techniques for semantic segmentation in the image segmentation task for intelligent monitoring of Mandarin exams; we substantially extend the dataset by style transformation to improve the model’s recognition of high-level features; and we improve the performance of image segmentation using synthetic datasets by investigating improvement techniques that reduce the reliance on manually annotated datasets.

The rest of the article is organized as follows: Section 2 discusses related work; Section 3 presents image segmentation technology for intelligent monitoring of Putonghua examinations; Section 4 presents experimental results and analysis; Section 5 summarizes the full text.

2. Related Work

Image segmentation has received a great deal of attention from researchers since its inception. This section describes the current state of research, from early traditional segmentation algorithms to deep learning segmentation algorithms based on convolutional neural networks, and finally analyzes the future development trend of image segmentation [5]. The objective evaluation of segmentation algorithms began only when the literature first proposed corresponding evaluation methods and metrics for binary image segmentation. Later work proposed methods and guidelines for evaluating the performance of segmentation algorithms on conventional images (including color images, depth images, and medical images), conducted an important study on the evaluation of segmentation algorithms, proposed the objective “final accuracy criterion,” and recognized the importance of systematic research on evaluation methods and criteria; this pioneering work laid the foundation of segmentation evaluation research and greatly promoted its progress. Other literature also contributed insights and claims about segmentation evaluation, though unfortunately still without in-depth study.
In general, evaluation methods for segmentation algorithms at this stage have not completely escaped reliance on subjective participation, and researchers tend to present their findings on segmentation evaluation using single frames or small sets of sample images; such findings lack statistical significance, and their conclusions cannot easily be generalized to out-of-sample images or other application situations [6].

It has been distinctly pointed out in the literature that using only small samples of images is not conducive to a comprehensive, accurate, and objective evaluation of an algorithm, and that convincing evaluation results can only be achieved by segmenting and evaluating all images of an image library containing a large amount of data. Because of this, some scholars even argue that objective performance evaluation can only be achieved by incorporating the task context of the algorithm, in so-called system-level evaluation methods [7]. It was not until the beginning of the 21st century that the academic status and significance of segmentation evaluation, and the urgency of conducting such research, were generally recognized and fully affirmed, and experts and scholars from universities and commercial organizations began, one after another, to research evaluation methods for segmentation algorithms. Some works have proposed very systematic and novel evaluation methods; one group even opened a webpage on image segmentation evaluation and released its evaluation framework as finished software to the public, which promoted the development of image segmentation evaluation and made important contributions to the field. It must be noted, however, that the research units and individuals working on segmentation evaluation are mainly concentrated in universities and individual commercial companies in Europe and America. In summary, compared to the brilliant achievements of segmentation algorithm research over the past decades, research on the evaluation of segmentation algorithms lags far behind, and few scholars or scientific institutions work on evaluation methods for image segmentation in a targeted manner, which is a worrying situation.
So far there are only a few segmentation evaluation methods, far from enough for the thousands of image segmentation techniques now easily available; moreover, all of these methods suffer from poor generality. The number of new publications on segmentation evaluation each year is only in the single digits, or even zero [8]; most are evaluations of conventional image segmentation algorithms, and almost no one addresses the evaluation of segmentation performance or result quality for non-conventional images, such as SAR images. Systematic studies of evaluation methods are rarer still, leaving what may be called an “academic gap.”

In general, although there have been a few preliminary discussions on the evaluation of image segmentation techniques at home and abroad, the research results are generally fragmented and unsystematic, and research on evaluation methods supported by a mature theoretical background is completely blank. Therefore, this paper is devoted to sorting out, summarizing, and concluding the existing research results on segmentation evaluation, improving the current loose research situation, and proposing novel segmentation evaluation methods to provide theoretical support and application examples for better utilization, development, and improvement of image segmentation techniques [9]. Image segmentation is a bottleneck in scientific research fields such as image engineering and computer vision, and research on it will have a meaningful impact on image analysis, image understanding, and the semantic description of images [10]. The study of image segmentation was, still is, and will remain a pressing problem, so applied research on the performance analysis and evaluation of segmentation algorithms is not only of great theoretical importance but also of non-negligible practical value [11].

3. Research on Image Segmentation Technology for Intelligent Monitoring of Putonghua Examinations

3.1. Research on Image Segmentation Techniques

The process of image segmentation is the process of labeling pixels in an image according to certain properties [12]. However, a generic definition of image segmentation has been controversial for many years, and mainstream computer vision and image processing textbooks give slightly different formal definitions; that is, image segmentation so far lacks a universal definition, and this paper adopts the definition from the literature. Image segmentation techniques have a place both in traditional digital image analysis and processing and in computer pattern recognition, and they are among the most widely used computer vision algorithms. Based on the similarity of pixel feature attributes, the image is divided into multiple mutually non-intersecting subsets of pixels, with each independent subset having consistent specific features. In other words, the regions of interest in the image are separated from the remaining, uninteresting background to facilitate further image analysis. Here, we can define an original image as a set of pixels R, which segmentation divides into n subsets R1, R2, …, Rn satisfying R1 ∪ R2 ∪ … ∪ Rn = R and Ri ∩ Rj = ∅ for all i ≠ j, with the pixels of each Ri sharing the chosen feature attribute [13].
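The partition conditions just stated can be checked mechanically for any label map. The following minimal Python sketch (illustrative only, not part of the system described in this paper) verifies that the regions induced by a label map are pairwise disjoint and together cover every pixel:

```python
import numpy as np

def regions_from_labels(labels):
    """Split a label map into per-class pixel-index sets (one region per label)."""
    return {k: set(zip(*np.nonzero(labels == k))) for k in np.unique(labels)}

def is_valid_segmentation(labels):
    """Check the partition conditions: regions are pairwise disjoint
    and their union covers every pixel of the image."""
    regions = list(regions_from_labels(labels).values())
    all_pixels = set().union(*regions)
    covered = len(all_pixels) == labels.size          # union = whole image
    disjoint = sum(len(r) for r in regions) == len(all_pixels)  # no overlap
    return covered and disjoint

# A toy 3x3 label map with three regions (labels 0, 1, 2).
labels = np.array([[0, 0, 1],
                   [0, 2, 1],
                   [2, 2, 1]])
print(is_valid_segmentation(labels))  # True: every pixel lies in exactly one region
```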

The first formula is the core of this article: its purpose is to segment the image accurately and then process it, so as to better monitor the Putonghua test. Whether in automatic intelligent driving, augmented reality, smart security devices, biometric identification, or medical image analysis [14], image segmentation techniques can be broadly classified into instance segmentation and semantic segmentation according to the purpose of the separation. Semantic segmentation labels the target pixels in an image, like a classification operation: the classes of the targets to be labeled are identified uniformly at the pixel level. In a semantic segmentation result, objects of the same category are labeled with the same color, and different colors represent different categories within the same image. Instance segmentation can be seen as a combination of semantic segmentation and object detection: the pixels of each specific individual of the categories of interest are marked separately, so each region of interest in the result becomes a single independent individual. There are many image segmentation methods. Among them are edge-detection-based methods: in the early days, when computer technology was not very advanced, most of the earliest segmentation methods studied were based on edge detection. The pixels at an object’s edges vary strongly, and the pixel features inside and outside the edge differ markedly, so the discontinuity of such features is exploited with differential operators sensitive to boundary gray-value changes, and the edge points are finally determined by computing the extremum points of the first derivative or the zero crossings of the second derivative.
Commonly used edge-detection operators include the Sobel, Canny, Laplace, and Roberts operators [15]. Because these derivative operators are sensitive to noise, a Gaussian smoothing filter is usually convolved with the original image for noise reduction before differentiation. Edge-detection-based methods are widely used for grayscale images, where the gray level varies strongly at target edges. The flow chart of the method is shown in Figure 1.
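The edge-detection pipeline described above can be sketched with a minimal NumPy implementation of the Sobel operator; the toy image, the naive convolution loop, and the threshold are illustrative choices (Gaussian pre-smoothing is omitted for brevity since the toy image is noise-free):

```python
import numpy as np

def conv2d(img, k):
    """Plain 'valid' 2-D correlation, sufficient for a 3x3 kernel demo."""
    h, w = k.shape
    out = np.zeros((img.shape[0] - h + 1, img.shape[1] - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + h, j:j + w] * k)
    return out

sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
sobel_y = sobel_x.T

# Toy image: dark left half, bright right half -> one vertical step edge.
img = np.zeros((8, 8))
img[:, 4:] = 255.0

gx, gy = conv2d(img, sobel_x), conv2d(img, sobel_y)
magnitude = np.hypot(gx, gy)                 # first-derivative magnitude
edges = magnitude > 0.5 * magnitude.max()    # keep the extremum responses
print(edges.any(axis=0))  # only the columns around the step are flagged
```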

Region-based segmentation methods exploit the internal connectivity of the same target object in the image and the disjointness between different targets. The most commonly used region segmentation methods are thresholding, region growing, and region splitting and merging. The commonly used thresholding method converts the original image into a binary image whose pixel values are 0 or 1, indicating background and foreground: pixel values within a target are basically the same, while values of different targets differ greatly, so a threshold value T is selected and applied uniformly to the whole image [16]. This works only for single-target segmentation; such simple threshold selection is not applicable when multiple targets must be separated, so automatic threshold selection based on the maximum-entropy principle has been proposed. The region split-and-merge method can be seen as the inverse of region growing: from a macroscopic perspective, the whole image is split into non-overlapping sub-regions, and adjacent regions are merged when certain merging conditions are satisfied, until the desired regions are finally reached. The region-based segmentation method is shown in Figure 2.
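The maximum-entropy threshold selection mentioned above can be sketched as follows. This is a generic Kapur-style implementation run on a synthetic bimodal image; the bin count, noise model, and test data are illustrative assumptions, not the exact procedure used in this paper:

```python
import numpy as np

def max_entropy_threshold(img, bins=256):
    """Kapur-style maximum-entropy thresholding: choose T maximizing the
    sum of the entropies of the background and foreground distributions."""
    hist, _ = np.histogram(img, bins=bins, range=(0, bins))
    p = hist / hist.sum()
    cum = np.cumsum(p)
    best_T, best_H = 0, -np.inf
    for T in range(1, bins - 1):
        w0, w1 = cum[T], 1 - cum[T]          # class probabilities
        if w0 <= 0 or w1 <= 0:
            continue
        p0, p1 = p[:T + 1] / w0, p[T + 1:] / w1
        H = -sum(q * np.log(q) for q in p0 if q > 0) \
            - sum(q * np.log(q) for q in p1 if q > 0)
        if H > best_H:
            best_T, best_H = T, H
    return best_T

# Bimodal toy image: background around gray level 40, foreground around 200.
rng = np.random.default_rng(0)
img = np.concatenate([rng.normal(40, 5, 500),
                      rng.normal(200, 5, 500)]).clip(0, 255)
T = max_entropy_threshold(img)
binary = img > T   # the resulting 0/1 "binary image"
print(T)
```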

Before deep learning was widely used in image segmentation for intelligent monitoring of Mandarin exams, a considerable number of segmentation methods were developed to accomplish the task of image understanding. During this period, disciplines such as digital image processing, topology, and mathematics served as principles or tools for image segmentation, giving birth to many original segmentation methods with different ideas. With the continuous development of deep learning, combined with the continuous enhancement of computer hardware, artificial intelligence brought great innovation to the domain, and hand-crafted feature-based image segmentation methods are no longer competitive. Advances in hardware and the high speed of convolutional neural networks have made deep learning one of the mainstream research directions. Its formula is as follows [17].

A convolutional neural network is a powerful visual model, and its great results in image classification, structured output, object detection, and key-point prediction have led the image segmentation field to experiment with deep learning frameworks to better perform segmentation tasks [18]. The pooling layer of a convolutional neural network actively discards a large amount of information to reduce computation and redundancy. However, this process is irreversible and leads to substantial information loss; in some cases, such as when the network is too deep, too much information may be lost, leaving a severe shortage of usable information. The formula is shown below.

To solve this problem, a basic idea is up-sampling: this operation can make up for some of the missing information to some extent and improve the resolution of the image. The image is shown in Figure 3.
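The interplay of pooling and up-sampling described above can be illustrated with a toy NumPy example (illustrative only): max pooling discards information, and nearest-neighbour up-sampling restores the resolution but cannot restore the discarded detail:

```python
import numpy as np

def max_pool2x2(x):
    """2x2 max pooling: keeps the strongest activation in each block,
    discards the other three values."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample_nearest2x(x):
    """Nearest-neighbour up-sampling: restores the spatial resolution,
    but the fine detail lost in pooling cannot be recovered exactly."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

x = np.arange(16, dtype=float).reshape(4, 4)
pooled = max_pool2x2(x)                 # (4,4) -> (2,2): lossy
restored = upsample_nearest2x(pooled)   # (2,2) -> (4,4): resolution back
print(x.shape, pooled.shape, restored.shape)
print(np.array_equal(x, restored))      # False: pooling is irreversible
```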

The advantage of VGGNet is that, although its last three fully connected layers use most of the parameters of the whole network, its convolutional design keeps the parameter count from exploding as the network deepens. In addition, by the basic properties of convolutional neural networks [19], the receptive field of several stacked convolutional layers with small kernels is equivalent to that of one layer with a large kernel; for example, a stack of three 3×3 convolutional layers has the same receptive field as a single 7×7 convolutional layer (a stack of two 3×3 layers corresponds to a 5×5 layer). The equation is shown below.

27 is much less than 49; that is, the number of parameters with small convolutional kernels is much smaller than with large ones, the computation is simpler, and the learning capacity is more scalable. The excessive number of fully connected layers in VGGNet also makes it use more parameters and occupy more memory than a fully convolutional network. As deep learning networks become deeper and deeper, a very important difficulty arises: the deeper the network, the smaller the gradient becomes during backpropagation, and beyond a certain depth the gradient even vanishes. This worsening of error propagation is a side effect of continuously deepening any network and is the reason deep networks cannot be deepened infinitely to increase capacity; once the gradient vanishes, training becomes impossible. The formula is as follows [20].
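The parameter comparison can be made concrete with a small back-of-the-envelope calculation. The counts below are per input/output channel pair, ignoring channel multiplicities and biases (a simplifying assumption): a stack of n 3×3 kernels costs n·9 weights, while a single k×k kernel costs k² weights, at equal receptive field:

```python
# Weights per input/output channel pair, biases and channels ignored.
def stacked_3x3_params(n_layers):
    """A stack of n 3x3 convolutional layers."""
    return n_layers * 3 * 3

def single_kernel_params(k):
    """One convolutional layer with a k x k kernel."""
    return k * k

# Two 3x3 layers (receptive field 5x5) vs. one 5x5 layer:
print(stacked_3x3_params(2), single_kernel_params(5))  # 18 25
# Three 3x3 layers (receptive field 7x7) vs. one 7x7 layer:
print(stacked_3x3_params(3), single_kernel_params(7))  # 27 49
```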

ResNet has made substantial contributions to the research of making deep learning networks deeper and has become another benchmark work in the development of deep learning network frameworks. Currently, ResNet is up to 152 layers deep, far surpassing VGGNet’s 19 layers, and is considered a turning point in deep learning. The mystery of ResNet lies in its residual module design, where the network is deepened by learning residuals in an ever-stacking manner. Its structure is shown in Figure 4.
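The residual design described above can be sketched in a few lines of NumPy. This is a toy fully connected block, not ResNet's actual convolutional implementation: the layer learns a residual F(x), and the identity shortcut adds x back, so the block degrades gracefully to the identity map:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Minimal residual block: output = activation(x + F(x)).
    The identity shortcut lets gradients flow around the transform,
    which is the key idea enabling very deep stacks."""
    f = relu(x @ w1) @ w2   # the learned residual F(x)
    return relu(x + f)      # identity shortcut + residual

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 8))
# Zero-initialised residual weights: the block starts as the identity
# (after activation), one intuition for why extra residual layers do not hurt.
w1 = np.zeros((8, 8))
w2 = np.zeros((8, 8))
print(np.array_equal(residual_block(x, w1, w2), relu(x)))  # True
```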

In addition, the Discriminative Feature Network (DFN) focuses on the macro level and tries to solve the intra-class inconsistency and inter-class indistinction problems that are difficult for pixel-level segmentation to handle; it proposes two sub-networks to handle the two problems separately, improving segmentation accuracy. The context encoding module simplifies the image segmentation problem: it captures the semantic context of the scene and performs more efficient label selection through context dependency. Simply put, the module propagates class information across the whole image. Pixel-level segmentation labels have very expensive annotation costs, and existing datasets often lack annotated examples and diversity of object classes, which severely limits the generality and scalability of segmentation algorithms. The formula is shown below.

In response to these problems, weakly supervised methods have attracted the attention of researchers as a way to make semantic segmentation models more scalable and practical. The common denominator of weakly supervised approaches is the use of lower-cost annotations such as bounding boxes and scribbles. These annotations are weaker than pixel-level labels but, being cheaper to produce, are easily obtained and can significantly increase the amount of annotated data. Among the various types of weak annotations for semantic segmentation, image-level class labels have been used most widely by researchers.

3.2. Design of Intelligent Monitoring System for Putonghua Exam

For the design of the intelligent monitoring system for the Putonghua examination, we analyzed the system requirements, weighed the advantages and disadvantages of the two-tier C/S architecture and the three-tier B/S architecture, and adopted the three-tier B/S architecture. The first layer is the presentation layer, also called the business appearance layer in the system, i.e., the client browser, through which the user connects to the entire system. The client application is reduced to common browser software, such as Microsoft IE. The server uses JSP technology to generate the web page file requested by the user; the page has interactive features that allow the user to enter information on the provided form and submit it to the backend with a processing request. The formula is as follows.

The second layer is the functional layer, also known as the business logic layer, which implements the business rules and calls the data access layer and the web server. Behind the client is the web server; the Tomcat server is used in this design. It calls the corresponding application process to respond to the client’s request, generates the requested page dynamically with JSP technology, encapsulates the request information in a JavaBean component and passes it to the database, and then returns the database result to the client’s browser. The third layer is the data layer, also known as the database access layer, which mainly refers to the underlying database platform; an Access database is used in this system. Its formula is as follows.

JSP technology and JavaBean components are used to access the database, and the request results are fed back to the functional layer, providing a transparent database access process. The architecture of the B/S-based Mandarin test management system is designed to realize the test-taking and administration functions, so the system is divided into three layers: the core is the data layer, which stores all the information management data. Test-takers can transparently access the database server through a web browser to take the Mandarin test, and the administrator can likewise update and maintain the content stored in the database server through a browser from anywhere on the web, check candidates’ basic information and test results, and issue instructions to all candidates. The formula is as follows.

Database design is the work done when developing a database or when a database is needed for some application system. Generally, no database exists purely on its own; a database is designed in dependence on a particular application system. Database design therefore means constructing, for a given application environment, the optimal database storage model and establishing the database so that the application system can access the data quickly and efficiently and meet the various requirements of different users. The formula is as follows.

Usually, a disciplined database design can be divided into six different phases; they are (1) requirements analysis, (2) conceptual architecture design, (3) logical architecture design, (4) database physical design, (5) database implementation, and (6) database operation and maintenance.

4. Experimental Results and Analysis

4.1. Experimental Results

To demonstrate the efficiency and accuracy of image segmentation under the synthetic-dataset segmentation framework proposed in this paper, its performance is compared with previous work in quantitative experiments using GTA5 as the source-domain dataset and Cityscapes as the target-domain dataset. Self-supervised learning (SSL) is widely used in semi-supervised settings such as insufficient dataset labels or noisy datasets. In the flow from F to M in this paper, a large number of images with possible translation errors are used, which coincides with the case where self-supervised learning addresses noise in the dataset. There is therefore good reason and motivation to improve the segmentation network M with a self-supervised learning approach. The efficiency of the two algorithms is compared in Figure 5.
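As one hedged illustration of how a self-training step could filter the translation noise mentioned above, the sketch below keeps only confident pseudo-labels for training the segmentation network; the confidence threshold and the ignore index are illustrative assumptions, not the paper's exact SSL procedure:

```python
import numpy as np

def pseudo_labels(probs, threshold=0.9):
    """Confidence-based pseudo-labelling, a common self-training step:
    keep the argmax class only where the model is confident, and mark
    the rest as 'ignore' (-1) so noisy pixels do not contribute to the
    segmentation loss."""
    conf = probs.max(axis=-1)
    labels = probs.argmax(axis=-1)
    labels[conf < threshold] = -1  # ignored by the loss
    return labels

# Toy per-pixel class probabilities for a 2x2 image with 3 classes.
probs = np.array([[[0.95, 0.03, 0.02], [0.40, 0.35, 0.25]],
                  [[0.10, 0.85, 0.05], [0.05, 0.02, 0.93]]])
print(pseudo_labels(probs))
# Confident pixels keep their class; ambiguous pixels become -1.
```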

By comparing experiments with the U-Net network and the MultiResNet network, the latest upgrade of U-Net, we conclude that both the MultiBridge network and the MultiResNet network outperform the original U-Net. Both networks introduce residual connectivity and cross-region connection channels, so it is assumed that these two features make the networks sharper at processing edge information. However, the MultiResNet network lags behind the MultiBridge network in dealing with more ambiguous edges and imperfect voids. Experiments demonstrate that the inverted U-shaped structure of the MultiBridge network and the convolutional blocks designed for different levels can well resolve under-segmentation, and that the independent connection paths designed for different layers largely facilitate the fusion of front- and back-layer features. The image representation is shown in Figure 6.

A histogram is a statistical tool that records the relationship between the gray levels of an image and the number of pixels at each gray level. Usually the vertical axis of a histogram represents the number of pixels at a given gray level, and the horizontal axis represents the gray level itself. Since a histogram roughly describes an image’s grayscale range, gray-level distribution, and average brightness, this section presents the histograms of images containing different amounts of salt-and-pepper noise; from a histogram, information about an image’s pixels can be understood in a simple way. For better observation, histograms of images containing salt-and-pepper noise at densities of 0.01, 0.10, and 0.50 are given for two different types of histograms, as shown in Figure 7.
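The effect described can be reproduced with a short NumPy sketch; the flat mid-grey test image and the noise model are illustrative assumptions. Salt-and-pepper noise flips pixels to pure black or white, so the histogram keeps its original peak but grows spikes at gray levels 0 and 255 as the density increases:

```python
import numpy as np

def add_salt_pepper(img, density, rng):
    """Flip a fraction `density` of pixels to pure black (0) or white (255)."""
    noisy = img.copy()
    mask = rng.random(img.shape) < density
    noisy[mask] = rng.choice([0, 255], size=mask.sum())
    return noisy

rng = np.random.default_rng(0)
img = np.full((64, 64), 128, dtype=np.uint8)  # flat mid-grey test image

for density in (0.01, 0.10, 0.50):
    noisy = add_salt_pepper(img, density, rng)
    hist, _ = np.histogram(noisy, bins=256, range=(0, 256))
    # Peak stays at gray level 128; spikes at 0 and 255 grow with density.
    print(density, hist[0] + hist[255], hist[128])
```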

The principle of the point-multiplication operation is to make the dark parts of an image darker and the light parts brighter by multiplying each pixel by a constant. Images acquired from external sources often contain noise, with Gaussian noise being one of the most common types. In this case, point multiplication widens the separation between target information and background information, making the image easier to distinguish and identify and benefiting the delineation of segmentation regions (or making that delineation finer and more distinct). An image segmentation method based on the point-multiplication operation therefore achieves a very noticeable segmentation effect and facilitates subsequent image processing. Its segmentation efficiency is shown in Figure 8.
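One way to realize the point operation described above is a multiplicative contrast stretch. Centering the multiplication on mid-gray is an assumption added here (the text only says "multiplying a constant") so that dark pixels genuinely move toward black while bright pixels move toward white, matching the stated behavior.

```python
import numpy as np

def contrast_stretch(img, c, mid=128):
    """Point operation: scale each pixel's distance from mid-gray by a
    constant c > 1, then clip to the valid range [0, 255]. Pixels below
    `mid` get darker; pixels above `mid` get brighter, which widens the
    gap between background and target before segmentation."""
    out = c * (img.astype(float) - mid) + mid
    return np.clip(out, 0, 255).astype(np.uint8)

img = np.array([[30, 100], [180, 240]], dtype=np.uint8)
out = contrast_stretch(img, 2.0)
# 30 -> 0 (clipped), 100 -> 72, 180 -> 232, 240 -> 255 (clipped)
```

A threshold-based segmentation applied after such a stretch separates foreground from background more cleanly, since the two populations of gray levels overlap less.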

5. Conclusion

This paper establishes the value of this research by analyzing the background and significance of the branch of image segmentation technology used for intelligent monitoring of Mandarin examinations, reviews the state of image segmentation research at home and abroad, and introduces deep learning, currently the most rapidly developing technology in the field. The concepts and classifications of image segmentation techniques are briefly organized and further demonstrated in conjunction with convolutional neural networks in deep learning. To address problems such as the varied morphology of the tissues in cell-nucleus images and the limited amount of data, a series of image preprocessing approaches based on data augmentation is proposed to expand the dataset. A novel convolutional neural network is then proposed, and combining residual connections and skip connections is confirmed to improve the predictions of the final model. Finally, by analyzing this network against the traditional network, a multi-bridge convolutional neural network is proposed as the basis of the intelligent image segmentation method for Mandarin exams. Comparative experiments on the overall design demonstrate that this network performs more effectively in image segmentation.

Regarding image segmentation technology, we believe that future development can extend it to video: frames captured from a video can be segmented and linked together to better handle the content that needs to be analyzed. The main innovative work of this paper is summarized as follows. To address the problems that traditional segmentation techniques are sensitive to the external image acquisition environment and require substantial human intervention, a deep-learning-based convolutional neural network for cell-nucleus segmentation is proposed; it combines residual links and skip links in U-Net to redesign a convolutional block, implemented with the Keras and TensorFlow frameworks. Ultimately, comparison with other networks shows that this convolutional block design outperforms the other two networks in cell-nucleus segmentation.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The author declares that he has no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.