Abstract

An approach is proposed for automatic, adaptive subtitle coloring using a fuzzy logic-based algorithm. The system changes the color of the video subtitle or caption to a “pleasant” color according to color harmony and the visual perception of the image background colors. In the fuzzy analyzer unit, the R, G, and B values of the subtitle or caption color are computed from the RGB histograms of the background image using fixed fuzzy IF-THEN rules derived entirely from color harmony theories, so as to satisfy the complementary color and subtitle-background color harmony conditions. A real-time hardware structure is proposed for implementing the front-end processing unit as well as the fuzzy analyzer unit.

1. Introduction

Subtitles are textual versions of the dialog in movies and television programs, usually displayed at the bottom of the screen. They can be either a written translation of dialog in a foreign language or a written rendering of dialog in the same language, with or without added information intended to help viewers follow the dialog. In the context of learning technologies, deaf and hearing-impaired students also use subtitles to read dialog and identify characters. Some simple manipulations, such as increasing the text size and changing the color, are often made available statically.

Subtitling is included with analogue and digital video broadcasts, DVD movies, and other multimedia platforms. Additionally, subtitles are produced to enable hearing audiences to watch foreign language films and television [1]. Until now, subtitled movies, pictures taken by camera, and TV shows have had subtitles in a constant color such as white, yellow, or green. This sometimes made the subtitles unpleasant to view when the background color was similar to the subtitle color. Humans perceive color (like most other things) relatively. If yellow subtitles are used, for example, the rest of the frame will look too blue.

In this area, the work in [2] focused mostly on algorithms that encode subtitle data to reduce the processing power needed to impose high-resolution subtitles on videos. In the approach proposed in this paper, the system uses fuzzy logic-based algorithms to adaptively change the color of the subtitle to a “pleasant” color, considering the visual perception of the background view and color psychology concerns.

These criteria are captured in a set of fixed fuzzy IF-THEN rules applied to a fuzzy analyzer structure. The concept of this approach is depicted in the block diagram of Figure 1. The method also addresses a problem of present subtitling systems, in which a dark bar is placed at the bottom of the screen to highlight the subtitle text, hiding the part of the image under the bar.

The idea is to compute the subtitle color so that it tends to be opposite to the average color of the background image on the color wheel (see Figure 2). However, rigid color opposition may not produce a pleasant combination of subtitle and background image colors. In the proposed method, using color theory [3] and color wheels [4], the fuzzy analyzer chooses a pleasant subtitle color for each frame of the video so that the frames shown on the screen are harmonious. Analogous color schemes use colors that are next to each other on the color wheel. They usually match well and create comfortable scenes. Analogous color schemes are often found in nature and are harmonious and pleasing to the eye [5-7].

The fuzzy rules come from tests in which a group of people viewed different subtitle colors on the same movie scene. From these tests, we found that the pleasant opposite (complementary) colors may not be directly opposite on the color wheel. The selected color was sometimes cooler (toward blue) or warmer (toward red), depending on the image background (see Figure 2) [8].
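The notion of an exact, a cooler, and a warmer opposite color can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the HSV hue circle from the standard `colorsys` module stands in for the color wheel of Figure 2, blue is taken at 240 degrees and red at 0 degrees, and the function names and the `step_deg` value are hypothetical, not taken from the paper. Note that a hue-based opposite is in general not the same as a channel-wise complement.

```python
import colorsys

BLUE_HUE = 240.0  # degrees on the HSV hue circle (assumed "cool" pole)
RED_HUE = 0.0     # degrees on the HSV hue circle (assumed "warm" pole)

def rotate_toward(hue_deg, target_deg, step_deg):
    """Move hue_deg by at most step_deg along the shorter arc toward target_deg."""
    diff = (target_deg - hue_deg + 180.0) % 360.0 - 180.0  # signed arc in [-180, 180)
    if abs(diff) <= step_deg:
        return target_deg % 360.0
    return (hue_deg + (step_deg if diff > 0 else -step_deg)) % 360.0

def candidate_colors(background_rgb, step_deg=30.0):
    """Return exact, cooler, and warmer opposite colors as 8-bit RGB tuples.

    step_deg is a hypothetical tuning value, not a value from the paper.
    """
    r, g, b = (c / 255.0 for c in background_rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    opposite = (h * 360.0 + 180.0) % 360.0  # half a turn around the hue circle

    def to_rgb(hue_deg):
        rgb = colorsys.hsv_to_rgb(hue_deg / 360.0, s, v)
        return tuple(int(round(c * 255)) for c in rgb)

    return {
        "exact": to_rgb(opposite),
        "cooler": to_rgb(rotate_toward(opposite, BLUE_HUE, step_deg)),
        "warmer": to_rgb(rotate_toward(opposite, RED_HUE, step_deg)),
    }
```

For a pure red background, for instance, the exact opposite is cyan, the cooler candidate moves from cyan toward blue, and the warmer candidate moves from cyan toward green and yellow, that is, along the arc that leads back to red.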

The procedure was to gather 25 men and women and show them 30 screenshots from different movies, 15 with a warm average background color and 15 with a cooler average background color. Each screenshot was subtitled in five different colors: (1) exact opposite color, (2) cooler opposite color, (3) warmer opposite color, (4) fixed yellow, and (5) fixed white. Each person was given a questionnaire and asked to rate the most pleasant subtitle color that he or she saw on the screen for each screenshot.

Combining all the data obtained from the questionnaires and computing the averages, we found that for the screenshots with a cooler average background nearly 70% of the people chose the subtitle with the cooler opposite color, and for the screenshots with a warmer average background nearly 65% chose the warmer opposite color. Fixed white and fixed yellow were the second choices in most of the screenshots. In this paper we develop the fuzzy rules and membership functions manually. Using the data set obtained from the tests, it is also possible to develop the input and output membership functions and the rule base using machine learning techniques (such as recursive least squares (RLS), the gradient method (GM), modified learning from example (MLFE), or neural networks (NN)). Automated development of the rule base and membership functions would probably save much of the manual effort of setting up the models, since they are trained automatically [9].

The color wheel is a visual representation of color theory. Colors are arranged according to their chromatic relationship. Primary colors are positioned equidistant from one another and are connected by a bridge using secondary and tertiary colors. The first color wheel has been attributed to Isaac Newton, who in 1706 arranged red, orange, yellow, green, blue, indigo, and violet into a natural progression on a rotating disk. As the disk spins, the colors blur together so rapidly that the human eye sees white. From there the organization of color has taken many forms, from tables and charts, to triangles and wheels [10, 11].

The arrangement of colors around the color circle is often considered to be in correspondence with the wavelengths of light, as opposed to hues, in accord with the original color circle of Isaac Newton. Modern color circles, however, include the purples between red and violet. In this paper, like color scientists and psychologists, we use the additive primaries red, green, and blue, and refer to the arrangement around a circle as a color circle as opposed to a color wheel [11, 12].

Section 2 develops the assumptions and describes the image front-end processing unit (FPGA) and the computational resources it needs (see Figure 1). Section 3 explains the structure of the fuzzy analyzer and the computational resources it needs (DSP). Section 4 presents some simulation results.

2. Image Front-End Processing

This section describes the computational resources needed for image front-end processing, which prepares the information needed for fuzzy analysis. For the sake of subtitle visibility, the image pixels closer to the subtitle should have more influence on the subtitle color. There are several ways to implement this idea. One method is to use a distance formula that computes the distance of each pixel to the center of the subtitle area. In this case, a huge number of computations must be done for each frame, which tends to be impractical on the FPGAs and DSPs available in the front-end processing unit.

To avoid a huge number of distance computations, we define fixed regions of the image and assign a weight to each. Assuming that the subtitle is captioned in the bottom part of the screen, we divide the image area into three fixed regions: (1) the full screen image (F), which contains all pixels of the image; (2) the bottom part of the image (B); and (3) the subtitle region (S), a rectangular space in which the subtitle is placed (see Figure 3). The regions F, B, and S are weighted differently according to their importance for visibility:

$$W(S) > W(B) > W(F), \quad (1)$$

where $W(x)$ is the weight of region $x$. For the sake of subtitle visibility, the background color of region S is more important than that of regions B and F. The regional weighting results in a subtitle color that is more opposite to region S than to regions B and F.

For simplicity, the weights are assumed to be constants (a, b, c). The weights are defined with respect to $A_F$, $A_B$, and $A_S$, the areas of the F, B, and S regions of the given image. According to the block diagram shown in Figure 1, in the front-end processing stage the gray-level histograms of the red, green, and blue channels of the image are computed and weighted separately for the F, B, and S regions based on inequality (1). Then, the weighted histograms of each channel are summed over all regions:

$$H_x = W(F)\,h_x^{F} + W(B)\,h_x^{B} + W(S)\,h_x^{S}, \quad x \in \{R, G, B\}, \quad (3)$$

where $H_R$, $H_G$, and $H_B$ are the sums of the regional weighted histograms for the red, green, and blue channels, respectively, and $h_x^{y}$ is the x-channel histogram of region y.
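The regional weighting and summation described above can be sketched in Python with NumPy. This is a minimal sketch, not the FPGA implementation: the function name, the region geometry arguments, and the numeric weights (1, 2, 4) are assumptions chosen only to satisfy the ordering in which S outweighs B and B outweighs F. The mean of each summed histogram is computed as its intensity centroid, which is the quantity passed on to the fuzzy analyzer.

```python
import numpy as np

def weighted_channel_means(image, bottom_rows, subtitle_box, weights=(1.0, 2.0, 4.0)):
    """Mean gray level of the region-weighted histogram for each RGB channel.

    image        : H x W x 3 uint8 array
    bottom_rows  : number of rows at the bottom of the frame forming region B
    subtitle_box : (top, bottom, left, right) bounds of region S
    weights      : assumed (W(F), W(B), W(S)) with W(S) > W(B) > W(F)
    """
    w_f, w_b, w_s = weights
    top, bottom, left, right = subtitle_box
    regions = [
        (image, w_f),                            # F: full frame
        (image[-bottom_rows:], w_b),             # B: bottom part of the frame
        (image[top:bottom, left:right], w_s),    # S: subtitle rectangle
    ]
    levels = np.arange(256)
    means = []
    for channel in range(3):
        summed = np.zeros(256)
        for pixels, weight in regions:
            hist = np.bincount(pixels[..., channel].ravel(), minlength=256)
            summed += weight * hist              # weighted regional histogram
        means.append(float((levels * summed).sum() / summed.sum()))
    return tuple(means)
```

With a frame whose top half is black and whose bottom half has gray level 200, the weighted mean is pulled well above the unweighted value of 100, because the bottom and subtitle regions are counted again with larger weights.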

The hardware platform proposed for the front-end processing, which involves intensive histogram computation, is an FPGA platform with a fairly large number of logic cells (e.g., Virtex-2 [13] or Virtex-5, programmed with VHDL code [14] using Xilinx ISE Foundation, or with a graphical tool using LabVIEW FPGA [15]). Computation of the R, G, and B histograms for the S, B, and F regions can be performed entirely in parallel. That is why the proposed hardware includes an FPGA at the front end, where the image is acquired. For high-definition (HD) movies, in which a huge amount of data is acquired, the FPGA can handle the parallel computations in real time. The mean values of the summed R, G, and B histograms are the values given to the fuzzy analyzer unit.

3. Fuzzy Analyzer Unit

In this section the fuzzy analyzer and its membership functions, fuzzy rules, defuzzification method, and needed computational resources are discussed briefly. Three fuzzy variables (mean values of red, green, and blue histograms of the background image) are inputs of the subtitle fuzzy analyzer. After fuzzification of inputs, the rules of subtitle colouring are applied, and the output matrix (R, G, and B values of the subtitle color) is generated and defuzzified using the centre of area defuzzification method [16].

The block diagram of the fuzzy analyzer is depicted in Figure 4, in which the left blocks are the mean values of the red, green, and blue histograms of the background image given to the fuzzy analyzer, and the right blocks are the red, green, and blue channel values of the color selected for the subtitle.

The fuzzy sets of the RGB channels used for input fuzzification on the RGB cube are shown in Figure 5.

The fuzzy analyzer consists of 48 fuzzy rules derived from the tests on people’s visual preferences for subtitle color, from color psychology, and from color harmony between the subtitle color and the image background colors. We used three principles in forming the rules: (1) to maximize subtitle visibility, the subtitle color should be opposite to the averaged summation of the image background colors; (2) in an image with warm colors (red, orange, or yellow), do not use very cool colors such as blue as the subtitle color, and vice versa; (3) in a dark image, do not use very bright subtitle colors, and vice versa.

RR1: If (Red is L) and (Green is L) and (Blue is L) then (SubtitleRed is H).
RR2: If (Red is M) and (Green is M) and (Blue is L) then (SubtitleRed is ML).
RR3: If (Red is H) and (Green is L) and (Blue is H) then (SubtitleRed is L).

Here zero (Z), low (L), medium (M), and high (H) are the linguistic values of the fuzzy sets for the averaged summation of the red, green, and blue histograms of the image (see Figure 6).

The fuzzy sets of the generated subtitle color, shown in Figure 7, are zero (Z), very low (LL), low (L), medium-low (ML), medium (M), medium-high (MH), and high (H).
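Fuzzification of one input channel into the linguistic values Z, L, M, and H can be sketched as follows. The actual membership functions are those of Figure 6, which is not reproduced here, so the breakpoints below (triangles evenly spaced over the 0 to 255 range) are purely an assumption, as are the names `triangle`, `INPUT_SETS`, and `fuzzify`.

```python
def triangle(x, left, peak, right):
    """Degree of membership of x in a triangular fuzzy set (left, peak, right)."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Assumed breakpoints: evenly spaced triangles over the 8-bit range.
# The outer feet lie outside 0..255 so that Z peaks at 0 and H peaks at 255.
INPUT_SETS = {
    "Z": (-85.0, 0.0, 85.0),
    "L": (0.0, 85.0, 170.0),
    "M": (85.0, 170.0, 255.0),
    "H": (170.0, 255.0, 340.0),
}

def fuzzify(value):
    """Map a crisp channel mean (0..255) to membership degrees in Z, L, M, H."""
    return {name: triangle(value, *abc) for name, abc in INPUT_SETS.items()}
```

With these assumed sets, a channel mean of 127.5 belongs to L and M with degree 0.5 each, so rules with either antecedent fire partially.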

We use a general form to describe the fuzzy rules:

$RR_i$: If (Red is $X_{1i}$) and (Green is $X_{2i}$) and (Blue is $X_{3i}$), then (SubtitleRed is $Y_i$), $i = 1, \ldots, 48$.
$RG_i$: If (Red is $X_{1i}$) and (Green is $X_{2i}$) and (Blue is $X_{3i}$), then (SubtitleGreen is $Y_i$), $i = 1, \ldots, 48$.
$RB_i$: If (Red is $X_{1i}$) and (Green is $X_{2i}$) and (Blue is $X_{3i}$), then (SubtitleBlue is $Y_i$), $i = 1, \ldots, 48$.

where $X_{1i}$, $X_{2i}$, and $X_{3i}$ are triangle-shaped fuzzy term sets and $Y_i$ is a fuzzy singleton. Let X and Y be the input and output spaces, and let P and V be arbitrary fuzzy sets in X. Then a fuzzy set in Y can be determined by each rule. We use the sup-product compositional rule of inference (product t-norm), in which $\mu_R$, $\mu_G$, and $\mu_B$ are the membership functions of the averaged summations of the red, green, and blue histograms of the image, respectively. The 3D visualizations of some rules are shown in Figure 8. Some samples of these rules (RR1 to RR3) are listed earlier in this section [17, 18].

Using the center of area (centroid) method in the defuzzifier, the crisp outputs can be obtained as

$$y = \frac{\sum_{i} c_i\,a_i}{\sum_{i} a_i}, \quad (5)$$

where $c_i$ and $a_i$ are the center and the area of the $i$th fuzzy set, respectively [19]. There are few options for hardware implementation of the fuzzy analyzer, since all of the fuzzy computation must be executed in real time. The proposed hardware platform, which can execute the fuzzy equations and is flexible enough to allow the subtitle coloring criteria to be modified, is DSP-based. Floating-point DSPs (e.g., TMS-C6713 [20], programmed with TI Code Composer Studio or IDE Diamond [21]) are preferred because of the membership function computations in fuzzification and defuzzification.
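The inference and defuzzification steps can be sketched together in Python for the red output channel. This is a sketch under several assumptions and not the authors' implementation: only the three sample rules RR1 to RR3 are used instead of the full 48-rule base, they are assumed to drive the red output, the input breakpoints and output centers are evenly spaced guesses in place of Figures 6 and 7, and each output set is taken to be a triangle of equal base width whose implied area scales with the rule firing strength. The firing strength is the product of the three antecedent membership degrees.

```python
def triangle(x, left, peak, right):
    """Degree of membership of x in a triangular fuzzy set (left, peak, right)."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Assumed input sets and output centers (evenly spaced over 0..255).
IN_SETS = {"Z": (-85.0, 0.0, 85.0), "L": (0.0, 85.0, 170.0),
           "M": (85.0, 170.0, 255.0), "H": (170.0, 255.0, 340.0)}
OUT_CENTER = {"Z": 0.0, "LL": 42.5, "L": 85.0, "ML": 127.5,
              "M": 170.0, "MH": 212.5, "H": 255.0}
OUT_BASE = 85.0  # assumed base width of every output triangle

# (Red, Green, Blue, output) for the sample rules RR1, RR2, RR3.
RED_RULES = [("L", "L", "L", "H"), ("M", "M", "L", "ML"), ("H", "L", "H", "L")]

def subtitle_red(r, g, b, rules=RED_RULES):
    """Crisp red value of the subtitle, or None when no rule fires."""
    numerator = denominator = 0.0
    for x1, x2, x3, out in rules:
        # Product t-norm over the three antecedents gives the firing strength.
        alpha = (triangle(r, *IN_SETS[x1]) * triangle(g, *IN_SETS[x2])
                 * triangle(b, *IN_SETS[x3]))
        area = alpha * OUT_BASE / 2.0      # area of the scaled output triangle
        numerator += OUT_CENTER[out] * area
        denominator += area
    return numerator / denominator if denominator > 0 else None
```

Because all output triangles are assumed to share one base width, the center-of-area result reduces to an average of the output centers weighted by firing strength. With the full rule base every input combination would be covered, so the `None` case arises here only because three rules are used.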

Generally, fixed-point computation is preferred in FPGAs because memory usage and speed of operation are optimal only for fixed-point numbers of limited word length. Converting a floating-point model to a fixed-point model usually introduces some error into the results, which can be kept small by choosing the word length and the position of the fractional point carefully and by checking the word length and operand ranges before and after each mathematical operation. For the floating-point DSPs proposed for this project, no such limitation exists. A fixed-point processor (e.g., an FPGA) can perform the front-end processing, in which the mean values of the histograms are computed.
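The effect of word length and fractional-point position on conversion error can be illustrated with a small Python sketch. The signed two's-complement format with saturation, the 16-bit default word length, and the helper names `to_fixed` and `from_fixed` are assumptions for illustration only and do not describe the actual FPGA design.

```python
def to_fixed(value, frac_bits, word_bits=16):
    """Quantize a real value to a signed fixed-point integer with saturation."""
    scale = 1 << frac_bits
    lowest = -(1 << (word_bits - 1))
    highest = (1 << (word_bits - 1)) - 1
    raw = int(round(value * scale))
    return max(lowest, min(highest, raw))  # clamp instead of wrapping around

def from_fixed(raw, frac_bits):
    """Convert the fixed-point integer back to a real value."""
    return raw / (1 << frac_bits)

def quantization_error(value, frac_bits, word_bits=16):
    """Absolute error introduced by a round trip through fixed point."""
    return abs(value - from_fixed(to_fixed(value, frac_bits, word_bits), frac_bits))
```

As long as the value stays within range, the round-trip error is at most half of one least significant bit, that is, 2^-(frac_bits + 1). Values outside the representable range saturate, which is why operand ranges need to be checked before and after each operation.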

4. Simulation Results

A simulation has been developed in MATLAB to demonstrate the results of the proposed method. The MATLAB code can be converted to VHDL and C code for implementation on the proposed hardware structure. The code developed for image preprocessing and fuzzy analysis takes the image file and the subtitle text and produces the output image on which the colored subtitle is overlaid. The RGB histograms of the image are shown together with the R, G, and B values of the subtitle color. The following photos have been subtitled using the approach proposed in this paper, with the subtitles colored according to the input images.

Two sorts of comparison can be made to evaluate the proposed approach: (1) comparing the colored subtitle with a fixed-color subtitle (white, yellow, etc.) and (2) comparing it with a solid complementary-color subtitle computed using (6) from the R, G, B centroid values of the background image:

$$R_c = 255 - R, \quad G_c = 255 - G, \quad B_c = 255 - B. \quad (6)$$

We use the second comparison method, which allows a more precise comparison. To compare the fuzzy logic-based approach with the mathematical approach, the peak signal-to-noise ratio (PSNR) is used (7):

$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right), \quad \mathrm{MSE} = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\left[I_F(i,j) - I_M(i,j)\right]^2, \quad (7)$$

where $I_F$ and $I_M$ are the $m \times n$ subtitled images colored with the fuzzy and mathematical approaches, respectively. In (7), MAX is the maximum value of a pixel in the image. For an 8-bit image format MAX is 255.
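The exact complementary color and the PSNR comparison can be sketched in Python with NumPy. The sketch assumes that the complement is taken channel by channel as 255 minus the centroid value, which agrees with the exact opposition values reported for Figures 9 and 10, and it uses the standard PSNR definition based on the mean squared error. The function names are illustrative.

```python
import numpy as np

def complement(rgb, max_value=255):
    """Channel-wise complementary color of an 8-bit RGB triple."""
    return tuple(max_value - c for c in rgb)

def psnr(image_a, image_b, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    a = np.asarray(image_a, dtype=np.float64)
    b = np.asarray(image_b, dtype=np.float64)
    mse = np.mean((a - b) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

A per-channel value such as PSNR(R) follows from passing only the red planes of the two subtitled images, for example `psnr(fuzzy[..., 0], exact[..., 0])`, where `fuzzy` and `exact` are hypothetical arrays holding the two results.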

The higher the PSNR, the higher the similarity in subtitle color between the two images, the first generated using fuzzy logic and the second using the mathematical approach; a lower PSNR therefore indicates a greater difference. Typical value for the PSNR in image comparison is  dB. For simplicity of visual comparison and to save space, each figure is divided into two parts: the left part shows the image subtitled using the fuzzy approach and the right part shows the same image subtitled with the solid complementary color.

The centroids of the histograms of the RGB channels for Figure 9 are R = 129, G = 121, and B = 133. The R, G, and B values of the subtitle color are (149, 215, and 196). The R, G, and B channels of the exact opposite color are (126, 134, and 122).  dB. Judging by the minimum PSNR values for each channel and the background colors of Figure 9, the fuzzy coloring is more visible.

The centroid values of the histograms for the RGB channels in Figure 10 are (56, 44, and 9). The R, G, and B values of the fuzzy subtitle color are (214, 198, and 234). The R, G, and B channels of the exact opposite color are (199, 211, and 246).  dB.

Given the warm colors of the background image, the fuzzy subtitle color is warmer than the exact complementary color. PSNR(R) shows that the fuzzy subtitle is redder than the mathematical subtitle. The centroid values of the histograms for the RGB channels in Figure 11 are , and . The R, G, and B values of the fuzzy subtitle color are (149, 199, and 196). The R, G, and B channels of the exact opposite color are (199, 211, and 246).  dB. Given the cool colors of the background image in Figure 11, the fuzzy subtitle color is cooler and brighter than the exact complementary color.

The centroids of the histograms of the RGB channels for Figure 12 are , , and . The R, G, and B values of the subtitle color using the fuzzy approach are and using the mathematical approach (exact opposition) are (163, 217, and 227). PSNR(R) = 32.83 dB and PSNR(B) = 37.81 dB, which means that in the red and blue channels there are large differences (noise) that make the fuzzy colors warmer than the mathematical colors. In the green channel there is no distinguishable change. In Figure 13 the subtitle of Figure 12 can be compared with a fixed white subtitle. The contrast of white on a fairly dark background makes it too vivid, while its contrast with bright backgrounds is negligible, which causes the subtitle to lose visibility.

5. Conclusion

Real-time subtitle and caption coloring using fuzzy logic concepts, along with its hardware implementation issues, has been discussed in this paper. The main objective of the paper has been to show, with some examples, how to determine the color of subtitles and captions for video in real time according to the visual perception of the background image colors and color psychology concerns. The proposed fuzzy analyzer uses information extracted from the images to select the R, G, and B values for coloring the subtitle, using perceptual fuzzy rules based on the principles discussed in Section 3. A set of MATLAB codes has been developed and tested on some photos to demonstrate the performance of the proposed fuzzy analyzer. Based on the nature of the computations needed for fuzzy-based subtitle coloring, a real-time hardware structure consisting of an FPGA and a DSP has been introduced. The fixed-point computations of the histogram mean values are parallelized on the FPGA-based part of the proposed hardware platform. The floating-point computations of the fuzzy analysis are to be executed on a DSP platform linked to the FPGA. This work can be expanded to use other input information, such as the context of the subtitle and emotional concerns. Future work would probably include machine learning techniques to develop the input and output membership functions and the fuzzy rule base.