Abstract
This paper investigates the application of digital image processing techniques to the detection of neurological disorders. Visual information extracted from the postures and movements of a human gait cycle can be used by an experienced neurologist to determine the mental health of the person. However, the current visual assessment used in diagnosing neurological disorders is based very much on subjective observation, and hence the accuracy of diagnosis relies heavily on experience. Other diagnostic techniques involve imaging systems that can only be operated in highly controlled environments. A prototype has been developed in this work that captures the subject's gait on video in a relatively simple setup and processes selected frames of the gait on a computer. Based on static visual features such as swing distances and joint angles of human limbs, the system identifies patients with Parkinsonism among the test subjects. To our knowledge, this is the first time swing distances have been utilized and identified as an effective means for characterizing human gait. The experimental results show promising potential for medical applications that assist clinicians in diagnosing Parkinsonism.
1. Introduction
Parkinsonism or Parkinson syndrome is a clinical entity produced by
several different etiological agents, and it is associated with a variety of
pathological processes which damage the extrapyramidal system. The diagnosis is usually not difficult
when the full clinical picture (tremor, rigidity, postural instability, and a
decrease in spontaneous movement) is present. However, in the early stage, most patients show only fragments
of the total syndrome, and the diagnosis may then not be
completely certain [1–4]. Generally, follow-up as other
symptoms and signs develop makes the diagnosis possible.
Patients with
Parkinson syndrome stand in a posture of general flexion with the spine bent
forward, the head bowed, the arms moderately flexed at the elbows, and the hips
and knees mildly flexed. They stand immobile and rigid, with a paucity of
automatic movements and a mask-like face. Although the arms are held immobile,
there may be a slow frequency tremor that involves the fingers and wrists. This
is often accentuated or even brought out once walking commences.
When walking
commences, there is a restricted rotation of the trunk. As a result of the body
being carried on the toes, the trunk bends even further forward, which pushes the
center of gravity ahead of the foot support. This results in a propulsive gait
with an inability to halt forward progression and risk of falling. One or both
arms may fail to swing. The legs remain bent at the hips, knees, and ankles with reduced angular excursion at all the
joints. The step cycle is lengthened in duration mainly due to an increase in
the stance phase whereas the swing phase is reduced. The feet scrape and
shuffle along the floor due to a reduced step height and a reduced stride
length.
There is a
disturbance in postural reactions due to abnormalities of central reflexes
involved with postural adjustments. If the patient is pushed backwards or
forwards,
he may not be able to compensate with flexion
or extension movements of the trunk, and he may fall precipitously. He often has trouble initiating gait after
standing still or sitting in a chair. The gait may become arrested by minor
visual or proprioceptive stimuli and, if psychological stress is added, the
patient may become "frozen". On the other hand, it is well recognised that, in some patients,
walking may be facilitated by different external stimuli. Trivial signals, for
instance an object placed on the floor, may help initiate walking [4–6].
Parkinsonism patients show significant
movement restrictions of the limbs, which have not been adequately utilized in
traditional diagnosis because it depends solely on the experience and judgment of
the clinicians, which could be subjective and inconsistent. To provide
accurate and quantitative measurements, two kinds of image processing
diagnostic systems were proposed in the literature. One requires complex
laboratory settings and body attachments, such as motion marker systems and
ground reaction force plates, to measure various features for Parkinson syndrome,
which may affect the gait movement [7–11]. The other is based on automatic
analysis and recognition of human behavior by gait in videos recorded in an
environment with relatively simple settings [12–17]. The second approach is
of particular interest to this work.
Automatic
analysis and recognition of human behavior by gait is attracting increasing interest, and it has the unique
capability to recognize people at a distance when other biometrics are obscured. Its interest
is reinforced by the longstanding computer vision interest in automated
noninvasive analysis of human motion. Its recognition capability is supported
by studies in other domains such as medicine (biomechanics), mathematics, and
psychology which continue to suggest that gait is unique. Current approaches
confirm the early results that suggested gait could be used for identification,
and now on much larger databases. Gait has benefited from the developments in
other biometrics and has led to new insight particularly in view of covariates.
As such, gait is an interesting research area, with contributions not only to
the field of biometrics but also to the stock of new techniques for the
extraction and description of objects moving within image sequences. In
biomedical applications such as analyzing neurological disorder and monitoring
rehabilitation process after orthopedic surgeries, this could greatly reduce
the hardware setup and give the patients the maximum possible comfort during
diagnosis process. Recent survey papers on gait analysis gave comprehensive
treatment of the subject, especially in the framework of two prominent gait
analysis programs: Human ID at a Distance [18, 19], and Human
ID Gait Challenge [20].
The work reported in this paper falls in the
second category: automatic analysis and recognition of human behavior by gait. Based on the forward striking instant of the
gait posture, the system measures, in a simple setting, features such as the
swing distances of arms and legs as well as the angles of various joints to give
an objective measurement to assist clinicians in determining potential Parkinsonism
in the patients.
2. System Structure
As shown in Figure 1,
our video analysis system comprises three subsystems: (1) image acquisition,
(2) image processing and analysis, and (3) decision-making.
Figure 1: System diagram.
(i)
The main purpose of the image acquisition
subsystem is to digitize and store the images from video sequences, so that the
images can be processed and analyzed in the next stage.
(ii)
The image processing subsystem aims
at extracting important visual information from the digitized images. Features
such as swing distances and joint angles are extracted in this stage. The most effective
features are then selected by a combination process of sequential backward
selection (SBS)/general regression neural network (GRNN), and histogram
analysis.
(iii)
The decision-making subsystem is
based on a feed-forward neural network, which is trained by the features
selected previously. The network is then used to diagnose new neurological
data.
3. The Image Acquisition Subsystem
In this work, the image acquisition
subsystem comprises two major steps: (1) video recording and (2) image capturing
and digitization. These two steps can be carried out independently or
at the same time.
Several distinctive
colors are used on the tracksuit to highlight different parts of the body. Black
color is used for the torso and the leg facing the camera. White color is used for
the leg further away from the camera, and red color for the arms (see Figure 2).
Rather than using expensive materials for the costume, the new suit was made
out of tracksuit materials, which
are close to the normal outfits
people wear daily, so subjects
tend to behave more naturally in
the taped video. As a result, more realistic gait patterns were observed, and
more accurate data were collected. In addition, this practical laboratory
setting would be more beneficial in the context of medical research.
Figure 2: Image acquisition subsystem. (a) Example of the captured image (b) hardware settings.
A high-quality
portable S-VHS video camera is used to capture the walking sequence of the
human subjects. In this work, only the side-on views of the subjects are
investigated. An example of the captured image is shown in Figure 2(a). The
entire gait sequence was transferred to a computer using Videonics Python, a
video capturing device as illustrated in Figure 2(b), and the relevant frames
can be precisely selected for the experiment.
The specially designed tracksuit and uniformly
colored backdrop were used to simplify the segmentation process. In a realistic
system, these constraints should be removed, with the subjects wearing normal
outfits and the experiment carried out in any indoor environment. This may
involve stereoscopic imaging to obtain the distance of
the limbs from the camera, and hence to perform spatial segmentation.
4. The Image Processing Subsystem
Segmentation subdivides an image into
its constituent parts or objects. The level to which this subdivision is
carried depends
on how much detail is required for the overall function to be successfully
completed. Before an image was segmented, it was cropped to the desired size, and
then filtered by a low-pass filter to remove random noise and a median
filter to eliminate speckle noise. The resulting image serves as the input
to the segmentation phase.
4.1. Color Segmentation
Although the previous work to segment
images in the grayscale domain
produced promising results, fine adjustments were required to improve the image
quality every time a new image was processed. In addition, although the image
acquisition subsystem was able to capture high-quality color images, processing
those images in the grayscale
domain wasted the rich information carried in the color images. Hence, color
domain processing is investigated in this work. Furthermore, because color
segmentation is more sensitive to changes of color, this approach gives us
more flexibility in the costume design (less distinct colors can be used).
Hence, the capability of the image processing subsystem will eventually be
improved. In this paper, the RGB color format is chosen, as illustrated in
Figure 3.
Figure 3: RGB components: (a) original
image, (b) red component, (c) green component, (d) blue component.
Segmentation
of the grayscale images was
done by a threshold technique to trace the boundary of each segment. Subsequently,
a region growing algorithm was applied to each enclosed region to complete the
segmentation process. To yield an optimal outcome, the contrast of each region
needs to be significant. As a result, image qualities such as brightness and
contrast need to be adjusted manually for every new image. If the same
procedure were applied to the three layers of the RGB image format, it would
be even harder to obtain a good color segmentation. Furthermore, as observed
from Figure 3, it is difficult even for human eyes to distinguish red and
black in both the green and blue layers,
let alone for computer programs. Hence, a neural network approach for color
segmentation is proposed.
The RGB values of various pixels from
different regions were used to train a back propagation neural network to guide
the system to recognize the different colors present in the image. The network
has three inputs corresponding to the RGB values of each pixel. Four
outputs represent the four colors that appear in the image:
red (the upper limb color), black (the torso color), white (the back leg color), and blue (the color of the backdrop). The network was
trained on small patches of pixels taken from each region, so that it learned to
recognize the color of each region. The trained network is then applied
to separate the different regions in the images. Once the network has identified
the color represented by each pixel, it produces a segmentation map
according to the outcome for each pixel. Figure 4(a) shows an example of the
segmentation result in the RGB color space with red = (255,0,0), black =
(0,0,0), white = (255,255,255), and blue = (175,190,240). These fixed values ensure
that the color of each region is uniformly presented in the image. Pixels
that cannot be classified successfully retain their original RGB values. Morphological
processing techniques [21] are then applied to remove the misclassified
small regions. Dilation has the effect of expanding an image whereas erosion
has the opposite effect. Different combinations of dilation and erosion could
be used to remove the small regions effectively. It was experimentally
determined that the best sequence of dilation and erosion operations was
Dilation → Dilation → Erosion → Erosion, as illustrated in Figure 4(b).
Note that although the segmentation of the head is still not satisfactory, this
had little effect on our current gait analysis.
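The step from per-pixel network outputs to the segmentation map can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the ordering of the four outputs, the helper name `to_segmentation_color`, and the 0.5 acceptance threshold are hypothetical; only the four canonical colors are taken from the text.

```python
# Canonical colors from the text; the output ordering is an assumption.
CLASS_COLORS = [
    (255, 0, 0),      # red: arms
    (0, 0, 0),        # black: torso and front leg
    (255, 255, 255),  # white: back leg
    (175, 190, 240),  # blue: backdrop
]


def to_segmentation_color(outputs, original_rgb, threshold=0.5):
    """Map the four network outputs for one pixel to a canonical color.

    If no output reaches `threshold` (a hypothetical acceptance rule),
    the pixel is treated as unclassified and keeps its original RGB value.
    """
    best = max(range(len(outputs)), key=lambda k: outputs[k])
    if outputs[best] < threshold:
        return original_rgb
    return CLASS_COLORS[best]
```

A pixel with outputs `[0.9, 0.1, 0.0, 0.2]` would be painted pure red, while a pixel whose strongest output is below the threshold is left unchanged for the later morphological clean-up.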
Figure 4: (a) Color segmentation before dilation and erosion, (b) resultant image
after dilation and erosion.
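The dilation, dilation, erosion, erosion sequence can be illustrated on a binary mask of a single color class. The sketch below is a minimal pure-Python version that assumes a 3×3 square structuring element and treats pixels outside the image as background; the paper does not state which structuring element was used, and the function names are hypothetical. Two dilations followed by two erosions amount to a morphological closing, which fills small holes and gaps in a region.

```python
def _window(mask, r, c):
    """Yield the 3x3 neighborhood of (r, c); outside the image counts as 0."""
    rows, cols = len(mask), len(mask[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            yield mask[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0


def dilate(mask):
    """A pixel becomes 1 if any pixel in its 3x3 window is 1."""
    return [[1 if any(_window(mask, r, c)) else 0
             for c in range(len(mask[0]))] for r in range(len(mask))]


def erode(mask):
    """A pixel stays 1 only if every pixel in its 3x3 window is 1."""
    return [[1 if all(_window(mask, r, c)) else 0
             for c in range(len(mask[0]))] for r in range(len(mask))]


def clean_region(mask):
    """Apply dilation, dilation, erosion, erosion in that order."""
    for operation in (dilate, dilate, erode, erode):
        mask = operation(mask)
    return mask
```

In this sketch a single misclassified pixel inside a solid region is filled, while the outline of the region itself is preserved.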
A boundary line has also been traced
between different regions. The process was done by scanning the pixels of the
segmentation map horizontally. At each point, the values of the previous pixel
and the next pixel were compared. If the values were not the same, that is, a
color change was taking place, the original color of the current pixel was
replaced by a green dot. The scan then moved to the next pixel. Figure 5 illustrates
the flowchart of the process.
Figure 5: Flow chart for boundary detection.
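The horizontal scan described above can be sketched in a few lines. This is a hedged illustration rather than the flowchart of Figure 5 itself: the segmentation map is assumed to be a list of rows of class labels, `GREEN` is a hypothetical marker value, and the comparison is made on the unmodified map so that already marked pixels do not influence later comparisons.

```python
GREEN = "G"  # hypothetical marker standing in for the green boundary dot


def mark_boundaries(seg_map):
    """Return a copy of seg_map in which boundary pixels are set to GREEN.

    A pixel is a boundary pixel when its left and right neighbors in the
    same row carry different labels, i.e. a color change takes place there.
    """
    marked = [row[:] for row in seg_map]
    for r, row in enumerate(seg_map):
        for c in range(1, len(row) - 1):
            if row[c - 1] != row[c + 1]:
                marked[r][c] = GREEN
    return marked
```

Because both pixels adjacent to a color change satisfy the test, each transition is marked with a boundary two pixels wide in this sketch.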
The boundary outlines the shape of
each region, and the medial point can be determined by taking the average of
the two boundary points at every horizontal cross section. The resultant color
segmentation is shown in Figure 4(b), which consists of one torso segment
(black), two arm segments (red), and one back leg segment (white).
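The medial point computation, averaging the two boundary points on each horizontal cross section, can be sketched as follows. The example assumes a binary mask for one segment in which every row crosses the segment at most once; the function name is hypothetical.

```python
def medial_points(mask):
    """Return (row, column) medial points of a single segment.

    For every row that contains segment pixels, the medial point is the
    average of the leftmost and rightmost segment columns in that row.
    """
    points = []
    for r, row in enumerate(mask):
        cols = [c for c, value in enumerate(row) if value]
        if cols:
            points.append((r, (cols[0] + cols[-1]) / 2))
    return points
```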
The dilation/erosion processes
remove the unclassified points on the face and the hand. They also fill the impurity at the edge of
the white leg (see Figure 4(a)) to form a complete region (see Figure 4(b)).
The body segment is
incomplete due to the fact that the front arm obscures part of the body. Therefore,
further processing of this segment is required, such that the skeletonization procedure
can be properly carried out on the body segment. The body restoration process
is adopted from the algorithm developed in [15]. A reference image, which shows
that the position of the front arm is located within the torso region, is
chosen. This method assumes that the position of the upper body, in terms of
tilt or stoop, does not vary
much through the gait sequence. Figures 6(a) and 6(b) display the original and
reference images. Figures 6(c) and 6(d) represent the body segment before and
after restoration.
Figure 6: Image restoration example.
4.2. Skeletonization
Following the segmentation of the
image, we are ready to proceed with the extraction of the relevant
features of the subjects. Clinical experience shows that these features
are joint angles and swing distances. Extracting these features is made easier by
skeletonizing the segmented
image. The skeletons in this paper are assumed to be the medial axis of each
particular segment found in segmentation. The first task is to thin each of the
segments obtained earlier. The process is done by adopting the algorithm
developed in [22]. The algorithm performs the medial axis transform on each body
segment separately. Each thinned body segment is then further processed to
remove all the erroneous branches in the partial skeleton, which were caused by
portions of segment protruding from the proper image segment. This is important
especially when it comes to the body segment. In some cases, a small protrusion
is left as a remnant after the body segment restoration. This is enough for the
thinning algorithm to see it as a separate part to the segment that needs to be
thinned. Thus, a branch will be created out from the central skeleton of the
segment, as illustrated in Figure 7(a). The skeletons of each segment are then
combined to give the whole body skeleton. This skeleton is used to measure the
arm swing distances. Figure 7(a) represents a skeleton that was combined
without branch removal, as compared to the clean whole body skeleton shown in Figure
7(b).
Figure 7: Skeletonization examples of (a) whole body skeleton, and (b) skeleton after
debranch.
5. Feature Extraction
Neurological signs in PD patients are typically
characterized by symptoms such as tremor, bradykinesia, rigidity, and gait/posture instability. These symptoms
can show significant differences between normal people and PD patients
in terms of the gait features, such as joint angles, swing distances, and swing
trajectories of the limbs, which can
be analyzed from both static and dynamic perspectives. This research focuses on
analyzing Parkinsonism from the static processing perspective due to its
simplicity and relative effectiveness. Specifically, two groups of features
will be extracted: (1) swing distances between the ends of the limbs and those
between the ends of the limbs and the median axis of the torso, (2) joint
angles between sections of the limbs and those between the limbs and the torso.
The extracted features will be analyzed and used as the inputs to the decision
making subsystem for the classification.
In this work, four features were
considered in the distance group and six in the angle group. These features are
illustrated in Figure 8.
Figure 8: Locations of all features.
The four features in
the distance group are
(i)
front hand to the median axis of
torso (F8),
(ii)
back hand to the median axis of
torso (F9),
(iii)
front hand to back hand (F10),
(iv)
heel of the front foot to the toe
of the back foot (F7).
The value of the arm swing distance
may be negative to indicate the swing direction. The purpose is to distinguish
the abnormal arm swing of the patients from that of normal people. To
keep the measurement uniform amongst different samples, all the distance
features are normalized by the height of the human subject.
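A signed, height-normalized swing distance can be sketched as below. The details are assumptions made for illustration: the distance is taken as the horizontal pixel offset between the hand and the torso axis, a positive value means the hand is ahead of the torso in the walking direction, and the function and parameter names are hypothetical.

```python
def normalized_swing(hand_x, torso_axis_x, subject_height, facing_right=True):
    """Signed hand-to-torso distance divided by the subject height.

    All inputs are in pixels. The sign convention (positive when the hand
    is ahead of the torso in the walking direction) is an assumption.
    """
    offset = hand_x - torso_axis_x
    if not facing_right:
        offset = -offset  # mirror the sign for subjects walking to the left
    return offset / subject_height
```

With a subject 400 pixels tall, a hand 60 pixels ahead of the torso axis gives 0.15, and the same hand 60 pixels behind the axis gives -0.15.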
In the joint angle
group, six features are extracted:
(i)
two knee-joint angles (F1 and F4),
(ii)
two
ankle-joint angles (F2 and F3),
(iii)
two
joint angles at the elbows
(F5 and F6).
5.1. Distance Feature Calculation
It is noted that the swing distance of
the limb is the most suitable feature to assess the ability of the limbs to
stretch (stretchability). It is also apparent that there are certain
correlations between the
distances and the joint angles, which makes it feasible to substitute the
distance for the joint angle of the same limb. For example, when the arms fully
swing, both the swing distance and the shoulder joint angle will reach their
maximum values simultaneously. The best option to represent the flexibility of
the limb would be the joint angle features. However, it is in general difficult to obtain precise angle
measurements. We therefore also consider using distance features.
The advantages of
using the distance features are manifold. First of all, they are relatively
easy to obtain: only two end points are required to extract such a feature. Secondly,
the results of the distance calculation are robust to noise, because
the calculation depends only on the two end points. In
general, the obtained distance features are more accurate than the angle features
calculated by the current skeleton-based method.
The distance measurement is based on
the skeleton figure obtained, since it can accurately represent the general
structure of the human subject. We identify the end points of the limbs from
the skeleton, and then calculate the distances. First, the center of the torso
in the horizontal direction needs to be found. This is not as easy as it
sounds, because the variations in the physical build of the human subjects can be substantial.
Particularly in the aged group, the median point of the body around the chest
in the horizontal direction is not very
accurate. Through the analysis of the data samples, it is apparent that the
merging point of the two legs around the hip area more accurately represents the real
median point of the body. It is also noted that the bearing
of the gait is crucial
when extracting the distance between the front foot heel and the back foot toe
(F7). In this project, the black leg is always in the front, and the white leg
is away from the camera. Since the walking direction can be either to the right
or to the left, the walking direction for each individual image has to be
identified before any further processing. For the distance between the two
feet, a chain coding algorithm is used [22]. The measurement of the arm
distances is based on the skeleton of the arm region only, because the
end points are identified from the skeleton of the arm.
5.2. Joint Angle Calculation
Previous work attempted to use chain
coding to represent the skeleton figure, and from the directional information
of the chain code, to calculate the various joint angles. This method was found
to be very sensitive to straight line segments. Therefore, a more robust method
based on the Hough transform is investigated.
5.2.1. Hough Transform
The Hough transform [1] transforms an
image into the parameter space, where the image is represented by the
parameters of straight lines or curves inherent in the original image plane.
One of the most popular applications of Hough transform is edge linking, where
the objective is to link together separate segments of lines in an image [23].
This involves finding the subsets of points, which lie on the same straight
lines.
Consider a point $(x, y)$ on a straight line in an image.
The Hough transform can take two forms to represent the straight line as the
input, one in the slope-intercept form, $y = ax + b$, and the other in the normal form,
$x\cos\theta + y\sin\theta = \rho$. The drawback of using the slope-intercept
form is that the gradient becomes infinite for vertical lines. Therefore, the
normal representation of a line is the preferred form. This is illustrated
graphically in Figure 9. Since
a computer processes information digitally, the original image is quantized
spatially. Similarly, the Hough transformed image must also be quantized. This
is just a matter of using a 2D matrix with a desired number of elements for
$\theta$ and $\rho$. In Figure 9, the transformed image has a
$\theta_{\min}$ of $-90$ degrees, $\theta_{\max}$ of $+90$ degrees,
$\rho_{\min}$ of $-\sqrt{2}D$ pixels, and $\rho_{\max}$ of $+\sqrt{2}D$ pixels, where
$D$ is the distance between the corners of the original image in
pixels. The actual number of elements in the parameter space depends on the
desired accuracy and resolution. The larger the number of $\theta$
cells, the finer the resolution of the angle of
a single line. The larger the number of $\rho$
cells, the straighter the line represented by
one cell of the transformed image. Each cell contains the number of pixels in
the image that lie on the line with the corresponding parameter values.
Figure 9: Hough transform using normal representation of a line.
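The voting scheme behind the normal-form Hough transform, in which a line is written as x cos θ + y sin θ = ρ, can be sketched in pure Python. This is a minimal illustration, not the authors' code: the function names, the one-pixel ρ cells, and the default of 360 θ cells over the range from -90 to +90 degrees (0.5 degree per cell) are choices made for the example.

```python
import math


def hough_accumulate(points, rho_max, n_theta=360):
    """Vote in a (theta, rho) accumulator for a list of (x, y) pixels.

    theta runs from -90 (inclusive) to +90 (exclusive) degrees in n_theta
    cells; rho is rounded to whole pixels in [-rho_max, +rho_max], where
    rho_max is an integer bound the caller must choose large enough.
    """
    thetas = [-90.0 + 180.0 * k / n_theta for k in range(n_theta)]
    acc = [[0] * (2 * rho_max + 1) for _ in range(n_theta)]
    for x, y in points:
        for k, theta in enumerate(thetas):
            rad = math.radians(theta)
            rho = x * math.cos(rad) + y * math.sin(rad)
            acc[k][int(round(rho)) + rho_max] += 1
    return acc, thetas


def strongest_line(points, rho_max, n_theta=360):
    """Return (theta in degrees, rho in pixels) of the cell with most votes."""
    acc, thetas = hough_accumulate(points, rho_max, n_theta)
    best_k, best_j, best_votes = 0, 0, -1
    for k, row in enumerate(acc):
        for j, votes in enumerate(row):
            if votes > best_votes:
                best_k, best_j, best_votes = k, j, votes
    return thetas[best_k], best_j - rho_max
```

For the pixels of a long vertical skeleton segment at x = 5, the strongest cell lies at θ = 0 degrees and ρ = 5 pixels. Short segments can produce ties between neighboring θ cells, which is one reason the cell resolution has to be matched to the segment length.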
5.2.2. Joint Angles Extraction
The knee joint angle is the angle
between the upper and lower legs.
This angle is always more than 90 degrees for walking sequences, since a person
has to be running to achieve a knee angle of less than 90 degrees. Once the
different limbs have been extracted, the Hough transform is applied to obtain
their parametric representations. The cell with the highest pixel count in the
parameter matrix represents the straight line with the most pixels on
it and thus the most likely line through that part of the skeleton.
The absolute angle of this line can be found immediately by reading the
$\theta$ coordinate of the cell. This process is repeated
until the straight lines for all the limbs are found.
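One way to turn the absolute angles of two limb lines into a joint angle is sketched below. How the authors combine the two angles is not stated, so this is an assumption: the undirected angle between the two lines is computed, and the statement that the knee angle exceeds 90 degrees during walking is used to resolve the ambiguity between the angle and its supplement. The function name is hypothetical.

```python
def knee_angle(theta_upper_deg, theta_lower_deg):
    """Joint angle in degrees between two limb lines.

    The inputs are the absolute Hough angles of the upper and lower leg.
    A straight leg gives 180 degrees; the result always lies in [90, 180],
    which matches the walking assumption stated in the text.
    """
    difference = abs(theta_upper_deg - theta_lower_deg) % 180.0
    bend = min(difference, 180.0 - difference)  # undirected angle between the lines
    return 180.0 - bend
```

For example, lines at 5 and -20 degrees differ by 25 degrees, giving a knee angle of 155 degrees.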
5.3. Quantitative Evaluation of the Precision of the Features
Both the distance and the angle features are
estimated based on the stick figure calculated by skeletonization. When the
image is noise free, the stick figure is the medial axis of the human body (with
subpixel precision), and the accuracy of the distances is also within subpixel
precision. The precision of estimating the joint angles depends on the resolution
of the $\theta$
cells. The larger the number of $\theta$
cells, the finer the resolution of the joint
angle one can obtain, but the more computationally intensive the algorithm
becomes. In this work, the resolution of the angle is approximately 0.5 degree,
which is a reasonable compromise between accuracy and efficiency.
6. Feature Selection
Previous work attempted to use
histogram analysis to select the most significant features to train a
multilayer back propagation neural network [15]. Although it achieved a classification accuracy of approximately
85%, it requires a subjective judgment of the result to obtain a reliable
feature selection. Therefore, we investigated a hybrid feature selection method
combining the strengths of a sequential backward selection (SBS) procedure for
feature selection and a general regression neural network (GRNN) for feature
evaluation.
6.1. General Regression Neural Networks (GRNN)
GRNN can be considered as a special
example of the radial basis function (RBF) network, where the units in the
hidden layer adopt the Gaussian kernel as the nonlinear activation function
while the second layer consists of linear summation units. Unlike the
conventional RBF network where the centers and the widths of the Gaussian
kernels are determined by iterative clustering procedures, the corresponding
parameters in GRNN are represented as a deterministic function of the training
data. In other words, no iterative training procedure is required to
reconstruct a mapping using GRNN, hence allowing rapid evaluation of the
relevancy of different feature subsets [18, 24].
6.1.1. Mathematical Background of GRNN
GRNN is a memory-based feed-forward neural network. The
regression of a dependent variable, $y$, on an independent variable, $x$, is the
computation of the most probable value of $y$ for each value of $x$ based on a finite
number of possibly noisy measurements of $x$ and the associated values of $y$. The
variables $x$ and $y$ are usually vectors. In system identification, the dependent
variable, $y$, is the system output, and the independent variable, $x$, is
the system input. In order to implement system identification, it is usually
necessary to assume some functional form with unknown parameters. The values of the
parameters are chosen to make the best fit to the observed data. In the case of
pattern recognition, the independent variable $x$ denotes the feature vector of the
pattern to be classified, and $y$ is the classification result. Assume
that $f(x, y)$ represents the known
joint continuous probability density function of a vector random variable,
$x$, and a scalar random variable, $y$. Let $X$ denote a
particular measured value of the random variable $x$. The conditional mean of
$y$ given $X$ is
$$E[y \mid X] = \frac{\int_{-\infty}^{\infty} y\, f(X, y)\, dy}{\int_{-\infty}^{\infty} f(X, y)\, dy}, \tag{1}$$
where the density $f(x, y)$ is usually
unknown, and in GRNN, this probability density function is usually estimated
from samples of observations of $x$ and $y$ using nonparametric estimators. The
estimator used in this work is the class of consistent estimators proposed by
Parzen [18]. This probability
estimator $\hat{f}(X, Y)$ is based upon sample values $X_i$ and $Y_i$
of the random variables $x$ and $y$:
$$\hat{f}(X, Y) = \frac{1}{(2\pi)^{(p+1)/2}\,\sigma^{p+1}} \cdot \frac{1}{n} \sum_{i=1}^{n} \exp\left[-\frac{(X - X_i)^{T}(X - X_i)}{2\sigma^{2}}\right] \exp\left[-\frac{(Y - Y_i)^{2}}{2\sigma^{2}}\right], \tag{2}$$
where $n$ denotes the
number of samples, and $p$ denotes the
dimension of the vector variable $x$. A
physical interpretation of the probability estimator $\hat{f}(X, Y)$
is that it assigns sample probability of width $\sigma$ for
each sample $X_i$ and $Y_i$, and the probability
estimate is the sum of those sample probabilities. Substituting the joint
probability estimate $\hat{f}$ into the conditional mean yields
$$\hat{Y}(X) = \frac{\sum_{i=1}^{n} Y_i \exp\left(-D_i^{2} / 2\sigma^{2}\right)}{\sum_{i=1}^{n} \exp\left(-D_i^{2} / 2\sigma^{2}\right)}, \tag{3}$$
where $D_i^{2}$ is
defined as
$$D_i^{2} = (X - X_i)^{T}(X - X_i). \tag{4}$$
The only unknown
parameter in the above equation is the width $\sigma$ of the estimating kernel, which can
be estimated by using a cross validation method called the leave-one-out
method. For a particular value of $\sigma$ with
a training data set of $n$ samples, the
leave-one-out method removes one sample at a time and constructs the GRNN using
the remaining $n - 1$ samples. Then,
the GRNN is used to classify the sample excluded. This is repeated
$n$ times, and each classification result is
stored. Then, the mean square classification error $J(\sigma)$
is
calculated.
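The kernel-weighted conditional mean and the leave-one-out choice of the kernel width described above can be sketched in pure Python. This is a hedged illustration with hypothetical function names, not the authors' implementation; the fallback to the plain mean when all kernel weights underflow to zero and the use of a fixed list of candidate widths are assumptions added for the example.

```python
import math


def grnn_predict(x, train_x, train_y, sigma):
    """Kernel-weighted average of train_y, weighted by distance to x."""
    numerator = denominator = 0.0
    for xi, yi in zip(train_x, train_y):
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))  # squared distance
        weight = math.exp(-d2 / (2.0 * sigma ** 2))
        numerator += yi * weight
        denominator += weight
    if denominator == 0.0:  # all weights underflowed (assumed fallback)
        return sum(train_y) / len(train_y)
    return numerator / denominator


def loo_mse(train_x, train_y, sigma):
    """Leave-one-out mean square error for one kernel width."""
    total = 0.0
    for i in range(len(train_x)):
        rest_x = train_x[:i] + train_x[i + 1:]
        rest_y = train_y[:i] + train_y[i + 1:]
        prediction = grnn_predict(train_x[i], rest_x, rest_y, sigma)
        total += (prediction - train_y[i]) ** 2
    return total / len(train_x)


def best_sigma(train_x, train_y, candidates):
    """Pick the candidate width with the smallest leave-one-out error."""
    return min(candidates, key=lambda s: loo_mse(train_x, train_y, s))
```

Because the network stores the training samples directly, evaluating a new width only requires recomputing the weighted averages, which is what makes the leave-one-out search inexpensive.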
6.1.2. General Regression Neural Network Architecture
The above-mentioned
regression algorithm can be implemented in a neural network architecture, which
is shown in Figure 10. It consists
of four layers: the input layer, the hidden layer, the summation layer, and the
output layer. The function of the input layer is to pass the input vector variable $X$
to all the units in the hidden layer.
The hidden layer consists of all the training samples $X_1, \ldots, X_n$.
When an unknown pattern $X$ is
presented, the squared distance $D_i^{2}$
between the unknown pattern and each training
sample is calculated and passed through the kernel function. The summation
layer has two units, A and B. Unit A computes the summation of
$\exp(-D_i^{2} / 2\sigma^{2})$ multiplied by the $Y_i$ associated with $X_i$.
Unit B computes the summation of $\exp(-D_i^{2} / 2\sigma^{2})$.
The output unit divides A by B to provide the prediction result.
Figure 10: GRNN architecture.
6.2. The Feature Selection Procedure
The walking sequences
of 90 people were video-taped and processed, including 50 from normal people (the
control group) and 40 from the patients group. These samples were evenly
divided into four subgroups: NA, NB, PA, and PB, where N represents normal
people, P represents patients, A represents the training data, and B represents
the testing data. Various techniques were carried out to select the best
features for classification.
(i)
Features selected by color
histograms.
(ii)
Features selected by SBS/GRNN.
(iii)
Features selected by the
combinations of color histogram and SBS/GRNN.
(iv)
All the features calculated in
feature extraction.
6.2.1. Features Selected from Histogram Analysis
In the histogram
analysis technique, ten color histogram features are obtained. The features
with the best classification power between the control group and the patient
group had histograms showing little overlap between the normal and patient
data. On the other hand, features whose color histograms display substantial
overlap perform poorly in classification (see Figure 11). We observed that the most
distinguishing features obtained in this way are the front arm swing distance, the
front elbow angle, the back foot's ankle angle, the back arm swing distance, and the front leg
angle at the knee [25].
Figure 11: Examples of (a) good and (b) bad feature. The green area represents the patient group, and the blank area
represents the control group.
6.2.2. Feature Selection by SBS/GRNN
The sequential
backward selection (SBS) method is a simple top-down feature selection
procedure. Starting from the complete set of features,
features are discarded one by one at each stage. The feature
discarded is the one with the least discriminatory power from the current feature
set. Assume $k$ features have been
discarded to form a feature set, $F_k$, which has
$d - k$ features at this moment, where $d$ is the total number of features. Then, feature
$f_j$ is discarded from the
remaining $d - k$ features if
$$J(F_k \setminus \{f_j\}) \le J(F_k \setminus \{f_i\}) \quad \text{for all } f_i \in F_k,\ i \ne j, \tag{5}$$
where $J$ is
the mean square error function defined before, evaluating classification
performance. Then, $F_{k+1} = F_k \setminus \{f_j\}$. The algorithm is
initialized by setting $k = 0$, with $F_0$ the complete feature set.
To identify the most discriminatory feature
set, the algorithm stops at the point where the classification error begins
to increase if more features are removed from the current feature set. To find the
relative importance of all the $d$ features within the set $F_0$, the
algorithm continues until only one feature remains.
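The backward elimination loop can be sketched as follows. The function name and the toy error function in the usage note are hypothetical; in the setting described here, `error_fn` would be the leave-one-out mean square error of a GRNN built on the given feature subset.

```python
def sbs_ranking(features, error_fn):
    """Discard features one at a time, always removing the least useful one.

    error_fn(subset) must return the classification error obtained with
    that list of features. Returns (history, last_feature), where history
    lists (discarded_feature, error_after_discarding) in discard order.
    """
    current = list(features)
    history = []
    while len(current) > 1:
        # The feature whose removal gives the smallest error is discarded.
        worst = min(
            current,
            key=lambda f: error_fn([g for g in current if g != f]),
        )
        current.remove(worst)
        history.append((worst, error_fn(current)))
    return history, current[0]
```

As a toy usage example, suppose each feature lowers an error of 10 by a fixed gain, with F1 acting as a noisy feature of negative gain: `gain = {"F7": 5, "F10": 3, "F1": -1}` and `error_fn = lambda s: 10 - sum(gain[f] for f in s)`. The loop then discards F1 first, F10 second, and keeps F7. Reading the recorded errors from the start shows where the error begins to rise, which is the stopping point for choosing the most discriminatory subset.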
The SBS method is
first used to find out the discriminatory power of the 10 features described
in the previous section. The order of the discriminatory power of those 10
features is shown in Table 1. The mean square error versus the number of features
included in classification is plotted in Figure 12. We observe that the minimum
mean square error occurs when the top eight discriminatory features are
included.
Table 1: The order of the
discriminatory power of the 10 features using the SBS method.
Figure 12: The result of sequential backward selection method.
As shown in the
resulting graph for the SBS algorithm (see Figure 12), the error curve
exhibits a distinct minimum at the eight-feature subset,
beyond which the error starts to
increase again. Since the GRNN modeling process does not incorporate any
explicit trainable parameters, it is difficult for the GRNN to model the
characteristics of the features beyond a certain maximum number of features. Because of the
limited number of training samples and their increasing sparseness in
high-dimensional spaces, the error starts to rise beyond the local minimum
point.
6.2.3. Combination of Histogram and SBS/GRNN
Although the
discriminatory power given in Table 1 is supposed to indicate the importance of
the individual features in characterizing the normal and patient groups, this may
not be the case for those features beyond the minimum point. Hence, a technique
which combines SBS/GRNN and histogram analysis was proposed. It obtains five good
features using the SBS/GRNN technique, and then uses the histogram to select the two
features with the most discriminatory power from the remaining four features.
Table 2: Classification result.
(i) The resulting five features from
the SBS algorithm are front hand to back hand distance, front elbow angle, back
elbow angle, heel of the front foot to the toe of the back foot distance, and
back ankle angle.
(ii) The best two of the remaining
features obtained by histogram analysis are back knee joint angle and front
hand to torso distance.
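The two-stage combination can be sketched as a small helper. The function name, the argument names, and the use of a per-feature overlap score as the histogram criterion are assumptions made for illustration; the feature labels in the usage note are placeholders, not the features reported in this paper.

```python
def combine_selection(sbs_ranking, overlap, n_sbs=5, n_hist=2):
    """Two-stage selection.

    sbs_ranking: feature names ordered from most to least discriminatory
                 according to SBS/GRNN.
    overlap:     histogram overlap score per feature name (lower is better).
    """
    # Stage 1: keep the top-ranked features from SBS/GRNN.
    from_sbs = list(sbs_ranking[:n_sbs])
    # Stage 2: among the rest, keep those with the least histogram overlap.
    remaining = [f for f in sbs_ranking if f not in from_sbs]
    from_hist = sorted(remaining, key=lambda f: overlap[f])[:n_hist]
    return from_sbs + from_hist
```

For example, with a hypothetical ranking `["a", "b", "c", "d", "e", "f", "g", "h", "i"]` and overlap scores in which `"i"` and `"g"` are the lowest among the last four features, the helper returns the first five names followed by `"i"` and `"g"`.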
7. Classification Results
We implemented the proposed processing
method in our video analysis system and tested the system on a database of
both PD patients (the patient group) and healthy people (the control group).
The dataset was videotaped at Sydney Westmead Hospital
and at the University of Sydney. In total, 90
images (40 from the patient group and 50 from the normal group) were extracted
from the videos and processed. The samples from the two groups were evenly
divided into four subgroups: NA, NB, PA, and PB (N stands for normal and P for
patient). The data were divided in this fashion so that when group A (NA
and PA) was used as training data, group B was used as testing data, and vice versa.
The decision-making part consists of a
three-layered back propagation neural network. It was trained on the features
selected in the previous stage, and the performance of the system was tested
using new images. A separate neural network was constructed for the
feature set selected by each of the above techniques.
(1) Histogram analysis. A network with 6 input
neurons, 4 hidden neurons, and 2 output neurons was constructed. The inputs
correspond to the six most significant features selected by histogram
analysis, namely, back knee angle (F4), both elbow angles (F5, F6), front hand
swing distance (F8), back hand swing distance (F9), and the distance between the
front hand and back hand (F10). The
network was trained using these six features.
(2) Sequential backward selection. The
second network consists of 8 input neurons, 5 hidden neurons, and 2 output
neurons. The eight most effective features determined by the SBS algorithm were
used to train this network.
(3) Combined SBS and histogram analysis. This network consists of 7 input neurons, each
corresponding to one of the features obtained by this technique, together with 6
hidden neurons and 2 output neurons. Its inputs are five of the most significant
features selected using SBS/GRNN and two other features, backhand elbow angle
(F6) and backhand swing distance (F8), obtained from histogram analysis.
(4) All the features included. The
last network was trained with all 10 features so that its results could be compared with
the classification results obtained by the feature selection strategies
described above. Six hidden neurons and 2 output neurons were implemented in
this network.
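The four network configurations can be summarized in a short sketch. Only the layer sizes come from the text; the sigmoid activation, the bias terms, the weight initialization range, and the class and variable names are assumptions, and back propagation training itself is omitted.

```python
import math
import random


class ThreeLayerNet:
    """Input -> hidden -> output network with sigmoid units (an assumption)."""

    def __init__(self, n_in, n_hidden, n_out=2, seed=0):
        rng = random.Random(seed)
        # Each row holds one neuron's weights; the last entry is its bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                      for _ in range(n_out)]

    @staticmethod
    def _layer(weights, inputs):
        extended = list(inputs) + [1.0]  # constant input that feeds the bias
        return [1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(row, extended))))
                for row in weights]

    def forward(self, features):
        """Return the activations of the output neurons for one sample."""
        return self._layer(self.w_out, self._layer(self.w_hidden, features))

    def n_weights(self):
        """Number of adjustable parameters, including biases."""
        return sum(len(r) for r in self.w_hidden) + sum(len(r) for r in self.w_out)


# (input neurons, hidden neurons) for the four networks described above.
ARCHITECTURES = {
    "histogram": (6, 4),
    "sbs": (8, 5),
    "combined": (7, 6),
    "all_features": (10, 6),
}

networks = {name: ThreeLayerNet(n_in, n_hid)
            for name, (n_in, n_hid) in ARCHITECTURES.items()}
```

Counting parameters with `n_weights()` shows how quickly the number of adjustable weights grows with the number of input features, which matters when the training set is small.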
Four experiments were conducted: (1) using NA and PA as the training
set and NB and PB as the testing set, (2) using NB and PB as the training set and NA and
PA as the testing set, (3) using NA and PB as the training set and NB and PA as the testing
set, and (4) using NB and PA for
training and NA and PB for testing. The data were subdivided into these four
groups to avoid overtraining in the case of a large sample space. The four
experiments were performed for cross validation.
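The four train/test pairings can be enumerated programmatically. The sketch below assumes that the even split yields 25 normal and 20 patient images per subgroup (half of 50 and half of 40); the constant and function names are illustrative.

```python
SUBGROUPS = ("NA", "NB", "PA", "PB")

# Assumed subgroup sizes: 50 normal and 40 patient images, each split in half.
SIZES = {"NA": 25, "NB": 25, "PA": 20, "PB": 20}

# Each experiment trains on one normal subgroup and one patient subgroup.
TRAINING_SETS = [("NA", "PA"), ("NB", "PB"), ("NA", "PB"), ("NB", "PA")]


def experiments():
    """Return a list of (training subgroups, testing subgroups) pairs."""
    plan = []
    for train in TRAINING_SETS:
        # The two subgroups not used for training form the test set.
        test = tuple(g for g in SUBGROUPS if g not in train)
        plan.append((train, test))
    return plan


def n_images(groups):
    """Total number of images in the given subgroups."""
    return sum(SIZES[g] for g in groups)


def average_rate(rates):
    """Mean of the per-experiment correct classification rates."""
    return sum(rates) / len(rates)
```

Under the assumed sizes, every experiment trains on 45 images and tests on the other 45, and the reported rate for a method is the average over the four experiments.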
From Table 2, we have the following observations.
(i) The histogram analysis method
selects the features according to the data distribution for each feature. By
discarding the “bad” features, the system uses only “good” and “medium” features for
classification. The system yields an 83.7% correct detection rate, which is
acceptable.
(ii) When features were selected by SBS/GRNN, the
classification results were slightly improved compared with those obtained from
histogram analysis. This indicates that SBS/GRNN, the more systematic feature
selection method, is able to better explore the information carried in the
database in terms of the features being studied.
(iii) By combining the best features
selected by SBS/GRNN and
histogram analysis, the average classification rate was further improved
to 88.4%. This clearly demonstrates the complementary discriminatory powers of
the two feature selection methods.
(iv) When all the features were used
for classification, the results were substantially worse than those obtained
using the selected sets of features.
In conclusion, there is not enough evidence
to determine whether distance features are better than joint angle features in
identifying patients with Parkinsonism. However, from the results, it is clear that
certain features, such as front arm swing distance and front elbow angle, are
more significant than others in classifying Parkinsonian symptoms. When all
features, including those with less significance, were used for
classification, the classification rate clearly dropped.
8. Conclusions and Future Work
This paper applies image processing
techniques to assist clinical analysis of patients with gait-altering neurological
disorders. The main objective of the work is to remove the constraints of a laboratory
environment and to provide a more realistic setup for assisting in diagnosing
neurological disorders. In the paper, a neural network based color image
segmentation method was introduced to effectively utilize the rich information
in the color domain to improve the quality of object segmentation, and a feature
selection method based on sequential backward selection and the general regression
neural network was combined with a histogram feature selection mechanism to select
an effective feature set for robust decision making.
The system developed
in this work is able to automatically extract features, namely swing distances and
joint angles, under the given environment setting. Hence, it provides a solid
foundation for developing intelligent computer-assisted systems to assist
neurologists in diagnosing posture and movement disorders.
The
image acquisition system in this work still requires some constrained settings,
owing to the difficulty of image segmentation. The
state of the art in image processing is still not able to support
high quality segmentation in a completely setup-free environment. However, new
techniques in interactive computer vision such as graph cut [26] have demonstrated their effectiveness in segmenting objects in relatively complex
environments in an interactive fashion, thus providing a potential solution to
image segmentation when real-time processing is not required, as in the application
at hand. We will investigate the application of graph cut to video analysis
of human gait and posture for assisting clinicians in diagnosing Parkinson's
disease.
References
- D. H. Ballard and C. M. Brown, Computer Vision, Prentice-Hall, Englewood-Cliffs, NJ, USA, 1982.
- S. Geman and D. Geman, “Stochastic relaxation, Gibbs distribution, and the Bayesian restoration of images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, no. 6, pp. 721–741, 1984.
- M. Selzer, S. Clarke, L. Cohen, P. Duncan, and F. Gage, Textbook of Neural Repair and Rehabilitation, Cambridge University Press, Cambridge, UK, 2006.
- C. K. William and R. L. Watts, Movement Disorders: Neurologic Principle & Practice, McGraw-Hill Professional, New York, NY, USA, 2004.
- A. C. England and R. S. Schwab, “Parkinson's syndrome,” The New England Journal of Medicine, vol. 265, pp. 785–887, 1961.
- J. M. S. Pearce, Parkinson's Disease and Its Management, Oxford University Press, New York, NY, USA, 1992.
- L. A. Gundersen, D. R. Valle, A. E. Barr, J. V. Danoff, S. J. Stanhope, and L. Snyder-Mackler, “Bilateral analysis of the knee and ankle during gait: an examination of the relationship between lateral dominance and symmetry,” Physical Therapy, vol. 69, no. 8, pp. 640–650, 1989.
- J. Han, H. S. Jeon, B. S. Jeon, and K. S. Park, “Gait detection from three dimensional acceleration signals of ankles for patients with Parkinson's disease,” in Proceedings of the International Special Topic Conference on Information Technology in Biomedicine, Ioannina, Epirus, Greece, October 2006.
- L. Lee and W. E. L. Grimson, “Gait analysis for recognition and classification,” in Proceedings of the 5th IEEE International Conference on Automatic Face and Gesture Recognition (FGR '02), pp. 148–155, Washington, DC, USA, May 2002.
- H. Mitoma, R. Hayashi, N. Yanagisawa, and H. Tsukagoshi, “Characteristics of parkinsonian and ataxic gaits: a study using surface electromyograms, angular displacements and floor reaction forces,” Journal of the Neurological Sciences, vol. 174, no. 1, pp. 22–39, 2000.
- M. W. Whittle, Gait Analysis: An Introduction, Elsevier Health Sciences, Oxford, UK, 2002.
- R. Chang, L. Guan, and J. A. Burne, “An automated form of video image analysis applied to classification of movement disorders,” Disability and Rehabilitation, vol. 22, no. 1-2, pp. 97–108, 2000.
- R. D. Green and L. Guan, “Quantifying and recognizing human movement patterns from monocular video images-part II: applications to biometrics,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 2, pp. 191–198, 2004.
- R. D. Green, L. Guan, and J. A. Burne, “Real-time gait analysis for diagnosing movement disorders,” Journal of Electronic Imaging, vol. 5, pp. 253–269, 1999.
- T. Tan, L. Guan, and J. A. Burne, “A real-time image analysis system for computer-assisted diagnosis of neurological disorders,” Real-Time Imaging, vol. 5, no. 4, pp. 253–269, 1999.
- L. Wang, T. Tan, W. Hu, and H. Ning, “Automatic gait recognition based on statistical shape analysis,” IEEE Transactions on Image Processing, vol. 12, no. 9, pp. 1120–1131, 2003.
- C.-Y. Yam, M. S. Nixon, and J. N. Carter, “Extended model-based automatic gait recognition of walking and running,” in Proceedings of the 3rd International Conference on Audio- and Video-Based Biometric Person Authentication (AVBPA '01), pp. 278–283, Halmstad, Sweden, June 2001.
- M. S. Nixon and J. N. Carter, “Advances in automatic gait recognition,” in Proceedings of the 6th IEEE International Conference on Automatic Face and Gesture Recognition (FGR '04), pp. 139–144, Seoul, Korea, May 2004.
- M. S. Nixon and J. N. Carter, “Automatic recognition by gait,” Proceedings of the IEEE, vol. 94, no. 11, pp. 2013–2024, 2006.
- S. Sarkar, P. J. Phillips, Z. Liu, I. R. Vega, P. Grother, and K. W. Bowyer, “The humanID gait challenge problem: data sets, performance, and analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 2, pp. 162–177, 2005.
- A. K. Jain, Fundamentals of Digital Image Processing, Prentice-Hall, Englewood-Cliffs, NJ, USA, 1989.
- R. Chang, L. Guan, and J. A. Burne, “A computer assisted image analysis system for diagnosing movement disorders,” in Proceedings of the 10th Australian Joint Conference on Artificial Intelligence (AI '97), pp. 290–301, Perth, Australia, November-December 1997.
- R. C. Gonzalez and R. E. Woods, Digital Image Processing, Addison-Wesley, Reading, Mass, USA, 1993.
- D. F. Specht, “A general regression neural network,” IEEE Transactions on Neural Networks, vol. 2, no. 6, pp. 568–576, 1991.
- H. Lee, Video analysis of human gait and posture to determine neurological disorders, Master of Engineering thesis, University of Sydney, Sydney, Australia, August 2000.
- Y. Y. Boykov and M.-P. Jolly, “Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images,” in Proceedings of the 8th IEEE International Conference on Computer Vision (ICCV '01), vol. 1, pp. 105–112, Vancouver, BC, Canada, July 2001.