Abstract

To improve the accuracy of motion pattern recognition, this paper combines an artificial visual neural network to construct a motion pattern recognition system. The paper first discusses the psychological perception properties of the human eye in response to color stimuli and describes the observation field of view in which a color stimulus is located. It then analyzes the phenomenon of chromatic adaptation and provides a modeling approach to color appearance matching that achieves matching under variable observation conditions. Based on the chromatic adaptation transform, a color appearance model is given that can predict corresponding colors as well as the color appearance attributes of color stimuli under given observation conditions. Finally, the paper constructs an intelligent motion pattern recognition system combined with the artificial visual neural network. The experimental results show that the motion pattern recognition system based on the artificial visual neural network can accurately identify motion pattern categories.

1. Introduction

Trajectory resampling refers to making the time interval between consecutive sampling points in a trajectory uniform by inserting the mean or median value between sampling points. Trajectory segmentation refers to splitting a trajectory at important turning points, determined by MDL values or the DP algorithm, and dividing one trajectory into multiple segments for analysis and mining. Trajectory feature extraction refers to extracting features with strong discriminative power from the trajectory; the quality of feature extraction largely determines the accuracy of motion pattern recognition and is a key step in trajectory motion pattern recognition. The extracted trajectory features generally include motion features, shape features, location features of where the trajectory appears, and temporal features. Classifier construction refers to building a classifier from trajectory data of known categories; the resulting model can then assign trajectory data of unknown category to one of the given categories. Establishing a classifier means using the feature vectors of the training trajectories as input to train a classifier for motion pattern recognition. In the test phase, the feature vector of a test trajectory is used as input, and the motion pattern category of the test trajectory is obtained through the trained classifier. Commonly used classifiers include k-nearest neighbors, decision trees, random forests, neural networks, support vector machines, Bayesian networks, and conditional random fields.
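As a minimal sketch of the training and testing pipeline just described, the following Python example extracts simple speed and heading statistics from (t, x, y) trajectories and trains a random-forest classifier. The feature set, the classifier choice, and the placeholder names train_trajs, train_labels, and test_trajs are illustrative assumptions, not the configuration used in this paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def trajectory_features(traj):
    """Simple motion features (speed/heading statistics) for one (t, x, y) trajectory."""
    t, x, y = traj[:, 0], traj[:, 1], traj[:, 2]
    dt = np.diff(t)
    vx, vy = np.diff(x) / dt, np.diff(y) / dt
    speed = np.hypot(vx, vy)                    # instantaneous speed
    heading = np.arctan2(vy, vx)                # instantaneous motion direction
    accel = np.diff(speed) / dt[1:]             # tangential acceleration
    turn = np.abs(np.diff(heading))             # heading change between steps
    return np.array([speed.mean(), speed.std(), speed.max(),
                     np.abs(accel).mean() if accel.size else 0.0,
                     turn.mean() if turn.size else 0.0])

# Training phase: feature vectors of labeled trajectories train the classifier.
X_train = np.array([trajectory_features(tr) for tr in train_trajs])
clf = RandomForestClassifier(n_estimators=100).fit(X_train, train_labels)

# Testing phase: the trained classifier predicts motion patterns of new trajectories.
y_pred = clf.predict(np.array([trajectory_features(tr) for tr in test_trajs]))
```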

With the popularization of electronic devices with global positioning system functions, the behavioral characteristics and patterns of individuals or groups can be analyzed by collecting the location information of moving objects, providing an important basis for command decision-making in large-scale operations and related fields. Group movement pattern analysis examines the movement state of a group and extracts its behavior pattern from the time-varying positions of each member, so as to comprehensively evaluate the group's behavior pattern and the effect of its actions. The process of group motion pattern analysis mainly includes data preprocessing, constructing a time series, using an interpolation algorithm to calculate the member positions corresponding to the time series, and calculating various descriptive parameters for data analysis. Among them, data preprocessing mainly includes coordinate transformation, time synchronization, and exclusion of abnormal nodes. Since each member node records and reports its position at its own rhythm, the original dataset generally does not contain the positions of all nodes at any given time. It is therefore necessary to construct a unified time series for the group on the basis of each member's time series and then use an interpolation algorithm to determine the positions of the member nodes at each moment of that series.
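The unified-time-series step described above can be sketched as follows; member_tracks (one (t, x, y) array per member) and the 1-second grid step are illustrative assumptions.

```python
import numpy as np

def unify_positions(member_tracks, step=1.0):
    """Build a common time grid and linearly interpolate each member's position onto it."""
    t0 = max(tr[:, 0].min() for tr in member_tracks)   # latest common start time
    t1 = min(tr[:, 0].max() for tr in member_tracks)   # earliest common end time
    grid = np.arange(t0, t1, step)                     # unified group time series
    positions = []
    for tr in member_tracks:
        # Interpolation supplies positions at moments a member never reported exactly.
        x = np.interp(grid, tr[:, 0], tr[:, 1])
        y = np.interp(grid, tr[:, 0], tr[:, 2])
        positions.append(np.stack([x, y], axis=1))
    return grid, np.stack(positions)                   # (members, len(grid), 2)
```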

In this paper, an artificial visual neural network is combined to construct a motion pattern recognition system, improving the effect and, in particular, the accuracy of motion pattern recognition.

2. Related Work

Human body motion pattern recognition refers to the process of identifying various motion states of the human body, mainly by analyzing inertial sensor data. Motion pattern recognition technology is widely used in competitive sports, health monitoring, medical research, pedestrian navigation, and other fields [1]. In recent years, it has also been used in rescue work [2]. Literature [3] placed accelerometers on the waist and thighs and used the wavelet transform to analyze postures such as walking, jogging, and lying down; the experimental results were compared with video recordings, indicating that accelerometer data can be used to identify human posture. Literature [4] proposes a recognition algorithm in which the accelerometer is placed on the wrist and compares the recognition results of human behavior patterns for wrist-worn and hip-worn placements. Literature [5] places the sensor node on the front right hip and proposes a new human action recognition framework based on compressed sensing and sparse representation theory. Literature [6] uses decision trees and logistic regression to propose a new prediction model based on machine learning classification; the recognition results for running and sitting are good, but those for going up and down stairs are poor. Literature [7] analyzed the data of two inertial measurement units placed on the foot and shoulder to realize indoor pedestrian navigation. Literature [8] uses multiple sensors placed at different positions on the body and proposes a classification and recognition algorithm suitable for wearable sensor platforms, whose accuracy is higher than that of a single sensor.

Current anomaly detection methods can be broadly divided into two categories: trajectory-based methods [9] and appearance-feature-based methods [10]. The former, as the most traditional anomaly detection approach, usually consists of two steps: tracking targets in the video scene to obtain motion trajectories and then modeling and analyzing the tracked trajectories. In complex scenes with many targets, both target tracking and trajectory analysis are difficult, resulting in high computational cost. The most common technique to address this problem is feature extraction; one such method proposes a visual feature called optical flow texture and combines it with spatial information to detect abnormal behavior, but it is only suitable for detecting behaviors whose motion differs from normal movement patterns. Optical flow mainly captures the motion information of objects and often ignores nondynamic abnormal information [11]. More complete representations have been proposed to ensure the inclusion of both dynamic and static information. Literature [12] uses optical flow and contour features to generate spatiotemporal descriptions and uses Nonnegative Locality-Constrained Linear Coding (NLLC) to detect abnormal behaviors. Many different kinds of abnormal behavior detection models have emerged in recent years. Literature [13] proposes to segment the context of a video into semantic regions, establish a semantic context model for moving objects, and then use the model to detect abnormal behavior. Literature [14] proposed a Jointly Sparse Model (JSM) that trains on the trajectories in the training samples to obtain an overcomplete dictionary, uses the dictionary to sparsely reconstruct the trajectories of test samples, and then detects anomalies through the reconstruction error. Topic models can identify behavior patterns in scenes and detect anomalies through underlying features that appear at different levels, and they have achieved great success in the field of behavior recognition. Literature [15] utilizes hierarchical Bayesian models, such as Latent Dirichlet Allocation (LDA) and the Hierarchical Dirichlet Process (HDP), to describe typical behaviors in videos. Literature [16] proposed a trajectory analysis method based on the LDA model and used it for abnormal behavior detection. Literature [17] proposed using the Probabilistic Latent Semantic Analysis (PLSA) method to build a topic model with local information, quantifying the location and size information of the image through rich spatiotemporal gradient descriptors to extend topic-based analysis with local descriptors and thereby capture easily ignored local anomaly information. However, traditional probabilistic topic models lack a mechanism to directly control the sparsity of document representations. Literature [18] proposed a Sparse Topical Coding (STC) method to find latent representations in large datasets and directly control the sparsity of model representations through sparsity-inducing regularization terms. Finally, a method for detecting abnormal video behavior based on motion pattern analysis has been proposed: in the low-level processing stage, spatiotemporal descriptors are extracted and combined with position information to generate visual words, so that the visual words contain sufficient dynamic and static information for the detection of various abnormal behaviors [19].

Previous related research work mainly focused on extracting the motion features of trajectories, including velocity, acceleration, and motion direction. If only the motion features of the trajectory are extracted, the important location features of the trajectory are often ignored. In some special cases, such as in traffic jams, the motion characteristics of multiple motion modes are similar and difficult to distinguish. Literature [20] believes that the location information of the trajectory can help researchers improve the accuracy of trajectory motion pattern recognition, so it proposes a region classification rule considering duration and a path classification rule mining algorithm considering duration.

Due to the large drift and low accuracy of MEMS devices, Zero Velocity Update (ZUPT) is usually used as an auxiliary means to suppress error accumulation, and solution accuracy is further improved by periodically clearing errors. During walking, the time when a foot remains relatively stationary with respect to the ground is very short. Therefore, the detection of the zero-velocity interval is not only the core of zero-velocity correction but also a key link in the pedestrian navigation solution [21]. Missed judgments or misjudgments may occur during zero-velocity interval detection: a missed judgment means that some zero-velocity points are not detected as such, and a misjudgment means that non-zero-velocity points are judged to be in the zero-velocity state. Since the boundary between zero-velocity and non-zero-velocity states cannot be determined exactly, missed judgments are inevitable [22].
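A minimal zero-velocity-interval detector of the kind ZUPT relies on is sketched below, assuming acc holds foot-mounted accelerometer samples in m/s² at sampling rate fs; the window length and both thresholds are illustrative and would be tuned per device.

```python
import numpy as np

def zero_velocity_mask(acc, fs, g=9.81, mag_tol=0.4, var_tol=0.05, win=0.1):
    """Flag samples where the foot is plausibly at rest (candidate ZUPT points)."""
    n = max(1, int(win * fs))                 # half-window length in samples
    mag = np.linalg.norm(acc, axis=1)         # acceleration magnitude
    near_g = np.abs(mag - g) < mag_tol        # magnitude close to gravity alone
    var = np.array([mag[max(0, i - n):i + n].var() for i in range(len(mag))])
    still = var < var_tol                     # low local variance => stationary
    return near_g & still
```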

3. Machine Vision Feature Recognition Algorithms

If the apparent brightness of a color stimulus is B, its lightness is L, and the apparent brightness of the reference white point is $B_W$, the relationship between them can be expressed as

$L = B / B_W$

If the colorfulness of a color stimulus is M, its chroma is C, and the apparent brightness of the reference white point (as for lightness) is $B_W$, the relationship between them can be expressed as

$C = M / B_W$

Saturation indicates the purity of a color, or its difference from a neutral gray. If hue is the perception of the dominant wavelength, saturation is the degree to which light of other wavelengths is mixed into the dominant wavelength: the wider the range of wavelengths a color contains, the less saturated it appears.

If the saturation of a color stimulus is recorded as S, then

$S = M / B$

that is, saturation is the colorfulness of the stimulus judged relative to its own brightness.
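Written out directly, the three relative attributes above are just ratios; the following is a minimal sketch of these definitions (B_w denotes the apparent brightness of the reference white), not a full appearance model.

```python
def lightness(B, B_w):
    return B / B_w        # L = B / B_W

def chroma(M, B_w):
    return M / B_w        # C = M / B_W

def saturation(M, B):
    return M / B          # S = M / B, colorfulness relative to own brightness
```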

The observation condition attribute is used to describe the scene (Scene) in which a color stimulus is observed. The scene is often referred to as the viewing field or viewing conditions. Viewing conditions have a large impact on color perception. This subsection will define a simple typical observation field (Figure 1), which consists of four parts: the color stimulus (Stimulus), the proximal field (Proximal field), the background (Background), and the surrounding environment (Surround).

The phenomenon of spatial structure refers to the change of color appearance with the spatial structure and background of color stimuli. The most famous example is simultaneous contrast, that is, the phenomenon in which the color appearance of a stimulus shifts in the direction opposite to that of its surroundings. A brighter background induces a stimulus to appear darker, and a darker background induces a brighter appearance. In the same way, red induction produces green, green induction produces red, yellow induction produces blue, and blue induction produces yellow. Figure 2 shows how the color appearance changes when grays of the same shade are placed against different backgrounds. The two color blocks at the top are placed on the same gray background and are perceived identically. However, when they are placed against the different backgrounds below, the gray on the black background is perceived as “brighter,” while the gray on the white background appears slightly darker. Figure 3 further summarizes the strong effect of background changes on color appearance. In the figure, the background becomes brighter from left to right, and the chromaticity values along each color bar are exactly the same, yet a color difference from left to right is perceived because of the background gradient.

The change in the sensitivity of the human eye caused by a change in the illumination is shown in Figure 4, which reflects the transition from sunlight to incandescent illumination. Under daylight, since the spectral distribution of sunlight is roughly flat, the red, green, and blue sensitivities are roughly balanced. When changing to incandescent lighting, the red component of the illumination increases and the blue component decreases, so the sensitivity of the red photoreceptors decreases and that of the blue photoreceptors increases, keeping the overall response roughly constant. This explains why the color appearance remains approximately unchanged.

According to the von Kries hypothesis, a chromatic adaptation model can be established through the following steps.

(1) First, the algorithm converts the CIEXYZ tristimulus values to the LMS cone response values (sometimes also called RGB responses); the conversion relationship is

$[L, M, S]^T = \mathbf{M} \cdot [X, Y, Z]^T$

Among them, $\mathbf{M}$ represents the Hunt-Pointer-Estevez transformation matrix, whose elements are determined by the chromaticity coordinates of the three basic primary colors.

(2) By multiplying the initial cone responses LMS of the human eye to the color stimulus by independent gain control coefficients $k_L$, $k_M$, and $k_S$, the adapted cone signals $(L_a, M_a, S_a)$ are obtained. The gain control coefficients are the most critical part of most chromatic adaptation models. In the von Kries model, the reciprocal of the scene's maximum LMS cone response is used as the independent gain control coefficient. Usually, the maximum LMS cone response of the scene is the cone response of the scene white point, so von Kries adaptation is also called white point adaptation, and we have [23]

$L_a = k_L L, \quad M_a = k_M M, \quad S_a = k_S S$

Among them,

$k_L = 1 / L_W, \quad k_M = 1 / M_W, \quad k_S = 1 / S_W$

(3) Through the adapted cone signals $(L_a, M_a, S_a)$, the algorithm calculates the adapted color stimulus values as follows:

$[X_a, Y_a, Z_a]^T = \mathbf{M}^{-1} \cdot [L_a, M_a, S_a]^T$
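The three von Kries steps can be sketched compactly as below, using the Hunt-Pointer-Estevez matrix for the cone transform; this is a sketch of the model as described above, not of any particular published implementation.

```python
import numpy as np

# Hunt-Pointer-Estevez XYZ -> LMS matrix
M_HPE = np.array([[ 0.38971, 0.68898, -0.07868],
                  [-0.22981, 1.18340,  0.04641],
                  [ 0.00000, 0.00000,  1.00000]])

def von_kries_adapt(xyz, xyz_white):
    lms = M_HPE @ xyz                     # (1) XYZ -> LMS cone responses
    lms_w = M_HPE @ xyz_white             # white-point cone responses
    lms_a = lms / lms_w                   # (2) gain control: k = 1 / white response
    return np.linalg.inv(M_HPE) @ lms_a   # (3) back to adapted tristimulus values
```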

For any given color stimulus, obtaining the adapted CIEXYZ values is very useful, but it is often more important to obtain the corresponding color of the stimulus under another observation condition. The Chromatic Adaptation Transform (CAT) is used to calculate the corresponding color. The derivation of the von Kries chromatic adaptation transform is as follows.

Let the color stimulus value under the source observation condition be XYZ, and let the corresponding color under the target observation condition be $X'Y'Z'$. The task of the von Kries chromatic adaptation transform is to match the color appearance of the two color stimuli under the source and destination observation conditions, that is, to make the adapted cone response signals equal under the two observation conditions.

For convenience of expression, we represent this transformation as a linear matrix. From the von Kries model introduced above, the adapted cone response values under the source observation condition are

$L_a = L / L_{W1}, \quad M_a = M / M_{W1}, \quad S_a = S / S_{W1}$

The adapted cone response values under the target observation condition are

$L_a = L' / L_{W2}, \quad M_a = M' / M_{W2}, \quad S_a = S' / S_{W2}$

Among them, $L_{W2}$, $M_{W2}$, and $S_{W2}$ represent the cone response values of the target observation white point. Equating the two sets of adapted signals, the corresponding color under the target observation condition is

$[X', Y', Z']^T = \mathbf{M}^{-1} \cdot \mathrm{diag}\!\left( \frac{L_{W2}}{L_{W1}}, \frac{M_{W2}}{M_{W1}}, \frac{S_{W2}}{S_{W1}} \right) \cdot \mathbf{M} \cdot [X, Y, Z]^T$
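The corresponding-color computation derived above amounts to scaling the cone signals by the ratio of the two white points; a sketch (with M_HPE as in the previous snippet) follows.

```python
import numpy as np

M_HPE = np.array([[ 0.38971, 0.68898, -0.07868],
                  [-0.22981, 1.18340,  0.04641],
                  [ 0.00000, 0.00000,  1.00000]])

def von_kries_corresponding(xyz, xyz_white_src, xyz_white_dst):
    """Corresponding color X'Y'Z' under the destination white (von Kries CAT)."""
    ratio = (M_HPE @ xyz_white_dst) / (M_HPE @ xyz_white_src)
    return np.linalg.inv(M_HPE) @ ((M_HPE @ xyz) * ratio)
```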

The Nayatani model combines the linear process established by the von Kries model with a nonlinear process established by power-function adaptation.

When calculating the adapted cone response signals $(L_a, M_a, S_a)$, the Nayatani model multiplies the gain adjustment coefficient by a power function whose exponent is a variable based on the adapting field luminance. This enables the Nayatani model to reflect luminance-dependent phenomena such as the Hunt and Stevens effects. In addition, a noise term is added for threshold prediction. The gain adjustment factor is chosen so that nonselective samples (grays) at the adapting field luminance remain stable, avoiding complete color constancy. The specific formula is as follows:

$L_a = a_L \left( \frac{L + L_n}{L_W + L_n} \right)^{\beta_L}$

with analogous expressions for $M_a$ and $S_a$.

Among them, $L_n$, $M_n$, and $S_n$ represent the added additive noise, $\beta_L$, $\beta_M$, and $\beta_S$ are variables based on the adapting luminance level, and $a_L$, $a_M$, and $a_S$ are the coefficients that make neutral gray stimuli exhibit color constancy.
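One channel of the Nayatani-style nonlinear step then reads as below; beta_L, the noise term L_n, and the gain a_L are model parameters taken from published tables, and this single-channel form is a hedged reconstruction of the formula above.

```python
def nayatani_channel(L, L_w, a_L, beta_L, L_n):
    """Adapted signal: gain times a power function of the noise-shifted ratio."""
    return a_L * ((L + L_n) / (L_w + L_n)) ** beta_L
```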

The basic structure of the Fairchild model is consistent with the von Kries model; only the method of calculating the gain control coefficients is more complicated. The formula for calculating the adapted cone response values is as follows:

$L_a = a_L L, \quad M_a = a_M M, \quad S_a = a_S S$

The formula for calculating the coefficient $a_L$ is as follows (the other two coefficients $a_M$ and $a_S$ are calculated in a similar way):

$a_L = \frac{p_L}{L_W}, \quad p_L = \frac{1 + Y_n^{1/3} + l_E}{1 + Y_n^{1/3} + 1/l_E}, \quad l_E = \frac{3 (L_W / L_E)}{L_W/L_E + M_W/M_E + S_W/S_E}$

Among them, $Y_n$ refers to the luminance of the adaptation field in cd/m², $L_W$, $M_W$, and $S_W$ represent the cone response values of the adapting white point stimulus, and $L_E$, $M_E$, and $S_E$ represent the cone response values of the equal-energy illuminant. When $l_E$ is equal to 1 (the adapting white point equals the equal-energy illuminant E, so that $p_L = 1$), adaptation is considered complete, and the model reduces exactly to the von Kries model. When $Y_n$ is 0 and $l_E$ is 1, it is considered zero adaptation. When the values lie between the two, adaptation is incomplete. The degree of adaptation depends on the adapting field luminance level and on the deviation of the white point from the equal-energy illuminant. The chromatic adaptation transform built from this model is similar to the von Kries transform described above.
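A sketch of the Fairchild-style gain coefficient as reconstructed above (the same form applies to a_M and a_S); since the exact published constants should be checked against the original model, treat this as illustrative.

```python
def fairchild_gain(Y_n, L_w, M_w, S_w, L_e, M_e, S_e):
    """Gain a_L from adapting luminance Y_n (cd/m^2), white-point cone responses
    (L_w, M_w, S_w), and equal-energy-illuminant responses (L_e, M_e, S_e)."""
    l_E = 3.0 * (L_w / L_e) / (L_w / L_e + M_w / M_e + S_w / S_e)
    p_L = (1.0 + Y_n ** (1.0 / 3.0) + l_E) / (1.0 + Y_n ** (1.0 / 3.0) + 1.0 / l_E)
    return p_L / L_w   # p_L = 1 (full adaptation) recovers the von Kries gain
```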

The main difference between spectral sharpening chromatic adaptation models and the abovementioned chromatic adaptation models lies in the first step, that is, the calculation of the cone response values. This type of model first normalizes the CIEXYZ tristimulus values by dividing by the Y component. The resulting RGB response values do not represent physiological cone signals but spectrally sharpened cone signals, which can often better maintain color saturation and color constancy. A specific spectral sharpening model, the Bradford (BFD) chromatic adaptation model, implements the following steps.

(1) The algorithm normalizes the CIEXYZ tristimulus values and converts them to the RGB spectrally sharpened cone response space through the BFD matrix. The formula is

$[R, G, B]^T = \mathbf{M}_{BFD} \cdot [X/Y,\; Y/Y,\; Z/Y]^T$

Among them, the elements of $\mathbf{M}_{BFD}$ were obtained by optimization over the Lam & Rigg dataset.

(2) The algorithm calculates the adapted spectrally sharpened cone response values $(R_a, G_a, B_a)$.

The model uses a nonlinear power function for the short-wave channel, and the calculation formulas are as follows:

$R_a = \left[ \frac{D}{R_W} + 1 - D \right] R, \quad G_a = \left[ \frac{D}{G_W} + 1 - D \right] G, \quad B_a = \left[ \frac{D}{B_W^{\,p}} + 1 - D \right] |B|^{p}, \quad p = B_W^{0.0834}$

Among them, $R_W$, $G_W$, and $B_W$ represent the spectrally sharpened cone response values of the adapting white point, and D represents the degree of adaptation. Usually, D is taken as 1.0 for hard copy, meaning full chromatic adaptation; that is, the effect of the illumination is completely discounted. For soft copy, D is taken as 0, meaning zero adaptation: the influence of the illumination is not discounted. Under darker conditions such as projection, D takes an intermediate value, meaning that the effect of the illumination is partially discounted.
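The Bradford step, including the nonlinear short-wave channel, can be sketched as follows; the matrix values are the standard Bradford coefficients, and D is the degree of adaptation discussed above.

```python
import numpy as np

M_BFD = np.array([[ 0.8951,  0.2664, -0.1614],
                  [-0.7502,  1.7135,  0.0367],
                  [ 0.0389, -0.0685,  1.0296]])

def bfd_adapt(xyz, xyz_w, D=1.0):
    """Adapted sharpened responses (R_a, G_a, B_a) for stimulus xyz and white xyz_w."""
    rgb = M_BFD @ (xyz / xyz[1])          # normalize by Y, then sharpen
    rgb_w = M_BFD @ (xyz_w / xyz_w[1])
    p = rgb_w[2] ** 0.0834                # exponent for the short-wave channel
    Ra = (D / rgb_w[0] + 1.0 - D) * rgb[0]
    Ga = (D / rgb_w[1] + 1.0 - D) * rgb[1]
    Ba = (D / rgb_w[2] ** p + 1.0 - D) * abs(rgb[2]) ** p
    return np.array([Ra, Ga, Ba])
```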

The model then calculates the adapted tristimulus values, and the method of chromatic adaptation transformation is similar to the aforementioned von Kries transformation.

The input parameters of the CIELAB model include two sets of CIEXYZ tristimulus values: one for the color stimulus itself and one for the reference white point. The model calculation formulas are as follows:

$L^{*} = 116\, f(Y/Y_n) - 16$

$a^{*} = 500\, [\, f(X/X_n) - f(Y/Y_n) \,]$

$b^{*} = 200\, [\, f(Y/Y_n) - f(Z/Z_n) \,]$

In the formulas, the function $f$ is defined as follows:

$f(x) = x^{1/3}$ for $x > 0.008856$, and $f(x) = 7.787\,x + 16/116$ otherwise.

Among them, X, Y, and Z are the tristimulus values of the color stimulus, $X_n$, $Y_n$, and $Z_n$ are the tristimulus values of the reference white point, and the normalization makes $Y_n = 100$.

The definition of a color appearance model requires the calculation of at least three attributes: lightness, chroma, and hue. The CIELAB model provides the calculation of lightness $L^{*}$, chroma $C^{*}_{ab}$, and hue angle $h_{ab}$, computed from the $a^{*}$ and $b^{*}$ coordinate values using the following formulas:

$C^{*}_{ab} = \sqrt{a^{*2} + b^{*2}}, \quad h_{ab} = \arctan(b^{*}/a^{*})$
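The CIELAB formulas above translate directly into code; this sketch returns lightness, the opponent coordinates, chroma, and hue angle, assuming the white point is normalized so that Y_n = 100.

```python
import numpy as np

def _f(t):
    return np.where(t > 0.008856, np.cbrt(t), 7.787 * t + 16.0 / 116.0)

def xyz_to_lab(xyz, xyz_n):
    fx, fy, fz = _f(xyz[0] / xyz_n[0]), _f(xyz[1] / xyz_n[1]), _f(xyz[2] / xyz_n[2])
    L = 116.0 * fy - 16.0                     # lightness L*
    a = 500.0 * (fx - fy)                     # red-green opponent coordinate
    b = 200.0 * (fy - fz)                     # yellow-blue opponent coordinate
    C = np.hypot(a, b)                        # chroma C*_ab
    h = np.degrees(np.arctan2(b, a)) % 360.0  # hue angle h_ab
    return L, a, b, C, h
```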

Although the CIELAB model is the prototype of color appearance models, it has the following defects:

(1) Its incorrect von Kries-style chromatic adaptation leads to inaccurate color appearance predictions.

(2) The CIELAB model does not consider most color appearance phenomena, such as luminance level, surround, background, and off-neutral illumination.

(3) The CIELAB model always assumes 100% white point adaptation and does not consider incomplete chromatic adaptation.

(4) The CIELAB model was designed to predict small color differences between similar objects under fixed viewing conditions, so it is difficult to generalize it as a color appearance model.

(5) The CIELAB model lacks hue constancy, especially in the blue region, which matters for image processing techniques such as gamut mapping.

The RLAB model first converts the color tristimulus values to the corresponding colors under a reference standard environment (D65, 2° observer, adapting luminance of 318 cd/m², hard copy media) through chromatic adaptation conversion. Then, it calculates the color appearance attributes from the corresponding colors. Corresponding colors under other conditions can be calculated through the inverse of the model.

(1) The algorithm calculates the LMS cone response values:

$[L, M, S]^T = \mathbf{M} \cdot [X, Y, Z]^T$

Among them, $\mathbf{M}$ still represents the Hunt-Pointer-Estevez transformation matrix. The matrix is normalized so that equal cone response values (L = M = S = 100) are obtained when the color stimulus is the equal-energy illuminant (i.e., X = Y = Z = 100). The white point tristimulus values are also converted to cone response values.

(2) The algorithm calculates the cone response values after chromatic adaptation. Using the chromatic adaptation matrix $\mathbf{A} = \mathrm{diag}(a_L, a_M, a_S)$, the adapted cone response values are obtained as follows:

$[L_a, M_a, S_a]^T = \mathbf{A} \cdot [L, M, S]^T$

Among them,

$a_L = \frac{p_L + D\,(1 - p_L)}{L_W}$

The D factor describes the degree to which the illumination is discounted in chromatic adaptation. In general, D = 1.0 under hard copy conditions, indicating complete chromatic adaptation; D = 0 for soft copy, meaning the illumination is not discounted; under darker conditions such as projection, D takes an intermediate value, meaning that the illumination is partially discounted. Its value depends on the specific observation conditions and is generally taken as 0.5 empirically. $a_M$ and $a_S$ are calculated similarly.

(3) The algorithm calculates the corresponding color under the reference standard conditions. The reference standard condition is D65 illumination, a 2° observer, hard copy, and an adapting field luminance of 318 cd/m²; the calculation formula is as follows:

$[X_{ref}, Y_{ref}, Z_{ref}]^T = \mathbf{R} \cdot \mathbf{A} \cdot \mathbf{M} \cdot [X, Y, Z]^T$

Among them, $\mathbf{R} = (\mathbf{A}_{D65}\,\mathbf{M})^{-1}$, where $\mathbf{A}_{D65}$ is built from the cone response values of the D65 illumination white point. Therefore, $\mathbf{R}$ is a constant matrix.

(4) The algorithm calculates the color appearance attributes.

The related color appearance attribute parameters and calculation methods are as follows:

$L^{R} = 100\,(Y_{ref}/100)^{\sigma}$

$a^{R} = 430\,[\,(X_{ref}/100)^{\sigma} - (Y_{ref}/100)^{\sigma}\,], \quad b^{R} = 170\,[\,(Y_{ref}/100)^{\sigma} - (Z_{ref}/100)^{\sigma}\,]$

$C^{R} = \sqrt{(a^{R})^{2} + (b^{R})^{2}}, \quad h^{R} = \arctan(b^{R}/a^{R})$

Among them, the exponent $\sigma$ is the surround parameter, which is 1/2.3 for an average surround, 1/2.9 for a dim surround, and 1/3.5 for a dark surround.

The LLAB model adopts the Bradford chromatic adaptation transform, and the specific implementation steps are as follows.

(1) The algorithm calculates the RGB cone response values:

$[R, G, B]^T = \mathbf{M}_{BFD} \cdot [X/Y,\; Y/Y,\; Z/Y]^T$

Similarly, the RGB response values of the source adapting-field white point and of the reference standard white point can be calculated, denoted $(R_0, G_0, B_0)$ and $(R_{0r}, G_{0r}, B_{0r})$, respectively. Among them, the white point under the reference standard condition is CIE D65, whose tristimulus values are constants.

(2) The algorithm performs the chromatic adaptation conversion and calculates the corresponding color under the reference standard conditions. It first calculates the response values under the reference standard condition:

$R_r = (R_{0r}/R_0)\,R, \quad G_r = (G_{0r}/G_0)\,G$

If $B < 0$, then

$B_r = -B_{0r}\,|B/B_0|^{\beta}$

Otherwise,

$B_r = B_{0r}\,(B/B_0)^{\beta}$

Among them, $\beta = (B_0/B_{0r})^{0.0834}$. Then, the algorithm calculates the corresponding color under the reference standard condition:

$[X_r, Y_r, Z_r]^T = \mathbf{M}_{BFD}^{-1} \cdot [R_r Y,\; G_r Y,\; B_r Y]^T$

(3) The algorithm calculates the color appearance attributes.

The related color appearance attribute parameters and calculation methods are as follows:

In the formula, the function is defined as follows:

In addition to lightness, chroma, and hue, the model can also predict colorfulness, saturation, etc. Since these color appearance attributes are beyond the scope of this study, their specific formulas are not given.

The spectral sharpening space conversion uses the energy-balanced matrix $\mathbf{M}_{CAT02}$ to convert the CIEXYZ tristimulus values to the sharpened RGB space; the white point tristimulus values are likewise converted to $(R_W, G_W, B_W)$. The specific calculation formula is as follows:

$[R, G, B]^T = \mathbf{M}_{CAT02} \cdot [X, Y, Z]^T$

The adapted sharpened signals $(R_c, G_c, B_c)$ are obtained by multiplying the RGB signals by independent gain control coefficients $\alpha_R$, $\alpha_G$, and $\alpha_B$, namely:

$R_c = \alpha_R R, \quad G_c = \alpha_G G, \quad B_c = \alpha_B B$

Among them,

$\alpha_R = \frac{Y_W}{R_W} D + 1 - D$

and $\alpha_G$ and $\alpha_B$ are defined analogously with $G_W$ and $B_W$.

The D factor is used to reflect the degree of chromatic adaptation and is a function of the surround and the adapting luminance $L_A$. Theoretically, the value of D ranges from 0 (no adaptation to the adopted white point) to 1 (complete adaptation). In practice, the minimum value of D does not fall below 0.65 even in a dark surround, and D grows exponentially toward 1 with $L_A$:

$D = F \left[ 1 - \frac{1}{3.6}\, e^{-(L_A + 42)/92} \right]$

The curves of D versus $L_A$ under the three surrounds are shown in Figure 5.
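The gain and degree-of-adaptation formulas above can be sketched as follows; F is the surround factor (commonly 1.0 for average, 0.9 for dim, and 0.8 for dark surrounds) and L_A the adapting luminance in cd/m².

```python
import numpy as np

def degree_of_adaptation(F, L_A):
    """D grows toward F as the adapting luminance L_A increases."""
    return F * (1.0 - (1.0 / 3.6) * np.exp(-(L_A + 42.0) / 92.0))

def adapt_gains(rgb_w, D, Y_w=100.0):
    """Per-channel gains alpha = (Y_w / R_w) * D + 1 - D for the sharpened signals."""
    return (Y_w / np.asarray(rgb_w)) * D + (1.0 - D)
```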

The cone response space transformation converts $(R_c, G_c, B_c)$ into the cone response space before the nonlinear response compression. Experimental comparison shows that performing the nonlinear response compression in a space similar to the physiological cone response space predicts color appearance attribute values better. The conversion formula is as follows:

$[R', G', B']^T = \mathbf{M}_{HPE} \cdot \mathbf{M}_{CAT02}^{-1} \cdot [R_c, G_c, B_c]^T$

The specific formula of the nonlinear response compression is as follows:

$R'_a = \frac{400\,(F_L R'/100)^{0.42}}{27.13 + (F_L R'/100)^{0.42}} + 0.1$

Among them,

$F_L = 0.2\,k^4\,(5 L_A) + 0.1\,(1 - k^4)^2\,(5 L_A)^{1/3}, \quad k = \frac{1}{5 L_A + 1}$

$G'_a$ and $B'_a$ are calculated similarly, and the white point responses $R'_{aW}$, $G'_{aW}$, and $B'_{aW}$ can also be converted in the same way.
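The luminance-level factor F_L and the nonlinear compression then read as below, applied per channel to the cone-space responses.

```python
import numpy as np

def luminance_factor(L_A):
    """F_L as a function of the adapting luminance L_A (cd/m^2)."""
    k = 1.0 / (5.0 * L_A + 1.0)
    return 0.2 * k**4 * (5.0 * L_A) + 0.1 * (1.0 - k**4) ** 2 * np.cbrt(5.0 * L_A)

def compress(Rp, fl):
    """Compressed adapted response R'_a from a cone-space response R'."""
    x = (fl * Rp / 100.0) ** 0.42
    return 400.0 * x / (27.13 + x) + 0.1
```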

The calculation method of the color appearance attributes is as follows:

$a = R'_a - \frac{12 G'_a}{11} + \frac{B'_a}{11}, \quad b = \frac{1}{9}\,(R'_a + G'_a - 2 B'_a), \quad h = \arctan(b/a)$

Among them, the achromatic response is computed from the adapted responses as

$A = \left[ 2 R'_a + G'_a + \frac{1}{20} B'_a - 0.305 \right] N_{bb}$

A represents the achromatic response, and the constant term −0.305 in the formula determines the minimum luminance value, so that when Y is 0, A is also 0; the achromatic response $A_W$ of the white point is calculated similarly. The formula for calculating lightness J is

$J = 100\,(A/A_W)^{cz}$

Among them, c is a surround-dependent factor, and

$z = 1.48 + \sqrt{n}, \quad n = Y_b / Y_W$

With the lightness J and the temporary variable t, the chroma C can be calculated as follows:

$C = t^{0.9} \sqrt{J/100}\,(1.64 - 0.29^{n})^{0.73}, \quad t = \frac{(50000/13)\,N_c N_{cb}\, e_t \sqrt{a^2 + b^2}}{R'_a + G'_a + (21/20) B'_a}$
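Assembling the achromatic response, lightness, and chroma from the adapted responses gives the following sketch; N_bb, c, z, n, and t are the model parameters defined in the text.

```python
import numpy as np

def achromatic(Ra, Ga, Ba, N_bb):
    """Achromatic response A; the -0.305 term makes A = 0 when Y = 0."""
    return (2.0 * Ra + Ga + Ba / 20.0 - 0.305) * N_bb

def lightness_J(A, A_w, c, z):
    return 100.0 * (A / A_w) ** (c * z)

def chroma_C(t, J, n):
    return t**0.9 * np.sqrt(J / 100.0) * (1.64 - 0.29**n) ** 0.73
```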

4. Accurate Recognition of Motion Patterns Based on Artificial Visual Neural Network

Figure 6(a) is a schematic diagram of the visual movement of an RS neuron activated by a moving target. In this figure, a moving target (the solid-line sphere) rotates counterclockwise around the center point. At different moments of the rotational motion (the dotted spheres), the translational motion direction of the moving target changes regularly; that is, relative to the previous moment, the translational movement direction of the moving target at the current moment always changes incrementally in the counterclockwise direction. Figure 6(b) is a schematic diagram of the basic visual mechanism by which RS neurons recognize rotational motion patterns, where each step corresponds to a specific visual signal processing stage in the visual channel.

Figure 7(a) presents a schematic diagram of the basic visual mechanism by which DRS neurons perceive depth rotational motion patterns. To perceive the depth rotational motion of a moving target, visual signals are processed in four steps in the visual pathway of DRS neurons: (1) processing the visual signal and perceiving luminance changes, (2) acquiring the changes in visual excitation and inhibition, (3) extracting the changes in the translation direction of the moving target and its depth motion behavior, and (4) synthesizing the depth motion behavior and the translation direction changes into the DRS neuron's output membrane potential. Each step corresponds to a specific visual signal processing stage in the visual pathway. After layer-by-layer visual information processing, DRS neurons can perceive the spatiotemporal energy changes caused by depth rotational motion in the visual field. Figure 7(b) is a schematic diagram of the visual motion behavior implied by a moving target performing a counterclockwise depth rotation on the horizontal plane.

On the basis of the above research, the motion pattern recognition method based on the artificial visual neural network proposed in this paper is verified, and its recognition performance is summarized in Table 1 and Figure 8.

Combined with the simulation recognition experiment, it can be seen that the motion pattern recognition model based on artificial visual neural network can accurately identify the motion pattern category.

5. Conclusion

The motion pattern recognition of moving objects is generally divided into a training process and a testing process. In the training process, all training trajectories (trajectories with known motion patterns) are preprocessed first, then features are extracted for each trajectory, and finally a classifier is constructed. In the testing phase, all test trajectories (trajectories with unknown motion patterns) are preprocessed first, then features are extracted for each test trajectory, and finally motion pattern recognition is performed on the test trajectories using the classifier obtained in the training phase. This paper combines an artificial visual neural network to construct a motion pattern recognition system to improve the motion pattern recognition effect. The experimental results show that the motion pattern recognition system based on the artificial visual neural network can accurately identify motion pattern categories.

Data Availability

The labeled dataset used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The author declares that there are no conflicts of interest.

Acknowledgments

This study was sponsored by Hubei University of Technology.