Research Article

Separating Chinese Character from Noisy Background Using GAN

Figure 5

Data synthesis method. For each font type, 708 basic Chinese characters that cover a variety of basic components are chosen to synthesize overlapping scenarios. Handwritten and printed characters are then randomly paired and overlapped to form the data samples of the training set. The test set is synthesized in the same way but contains more Chinese characters with more complex structures. Each data sample has a corresponding ground truth (i.e., the original printed or handwritten character) for evaluating performance. The test set contains approximately 12,200 unique overlapping characters.
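
The pairing-and-overlapping step described in the caption can be illustrated with a minimal sketch. The caption does not specify how the two glyphs are composited, so the code below assumes grayscale images with dark ink on a white background and keeps the darker pixel at each position; the function names `overlap` and `synthesize_samples`, the sample dictionary layout, and the fixed random seed are hypothetical choices made for illustration, not the authors' implementation.

```python
import random

WHITE = 255  # assumed background intensity; ink pixels are darker


def overlap(handwritten, printed):
    """Composite two same-sized grayscale glyphs by keeping the darker pixel.

    Assumption: glyphs are lists of rows of integers in [0, 255].
    """
    if len(handwritten) != len(printed) or any(
        len(h_row) != len(p_row) for h_row, p_row in zip(handwritten, printed)
    ):
        raise ValueError("glyph images must have the same shape")
    return [
        [min(h, p) for h, p in zip(h_row, p_row)]
        for h_row, p_row in zip(handwritten, printed)
    ]


def synthesize_samples(handwritten_glyphs, printed_glyphs, n_samples, seed=0):
    """Randomly pair handwritten and printed glyphs and overlap each pair.

    Each sample keeps the overlapped input together with both original
    glyphs, so either one can serve as the ground truth.
    """
    rng = random.Random(seed)  # seeded for reproducible pairing
    samples = []
    for _ in range(n_samples):
        hw = rng.choice(handwritten_glyphs)
        pr = rng.choice(printed_glyphs)
        samples.append(
            {"input": overlap(hw, pr), "handwritten": hw, "printed": pr}
        )
    return samples


# Toy 2x2 "glyphs" standing in for rendered character images.
hw_glyph = [[0, WHITE], [WHITE, WHITE]]
pr_glyph = [[WHITE, WHITE], [WHITE, 0]]
demo = synthesize_samples([hw_glyph], [pr_glyph], n_samples=3)
```

In practice the glyph lists would hold rendered images of the 708 basic characters per font type, and a different compositing rule (for example, alpha blending) could be substituted in `overlap` without changing the pairing logic.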