Abstract

The problem considered in this paper is how to measure the degree of resemblance between nonarthritic and arthritic hand movements during rehabilitation exercise. The solution to this problem stems from recent work on a tolerance space view of digital images and the introduction of image resemblance measures. The motivation for this work is both to quantify and to visualize differences between hand-finger movements in an effort to provide clinicians and physicians with indications of the efficacy of the prescribed rehabilitation exercise. The more recent introduction of tolerance near sets has led to a useful approach for measuring the similarity of sets of objects and their application to the problem of classifying image sequences extracted from videos showing finger-hand movement during rehabilitation exercise. The approach to measuring the resemblance between hand movement images introduced in this paper is based on an application of the well-known Hausdorff distance measure and a tolerance nearness measure. The contribution of this paper is an approach to measuring as well as visualizing the degree of separation between images in arthritic and nonarthritic hand-finger motion videos captured during rehabilitation exercise.

1. Introduction

This paper presents an approach to quantifying and visualizing the degree of separation between images in arthritic and non-arthritic hand-finger motion videos captured during rehabilitation exercises. The proposed approach is based on tolerance near set theory. In this paper, a complete procedure for determining the degree of resemblance between non-arthritic and arthritic hand movements is presented. Measuring resemblances between hand motions during rehabilitation exercise has two main advantages: (i) apart from measurements of stiffness and pain before and after rehabilitation exercise, the separation as well as the degree of resemblance between what would be considered normal hand-finger motion and arthritic hand-finger motion can be measured (resemblance between sequences of non-arthritic and arthritic hand-finger movements is reported in this paper) and (ii) hand motion resemblance measurements provide a basis for assessing the efficacy of rehabilitation exercise regimes for arthritic patients. The image sequences analyzed in this paper come from videos made during hand-finger motion tracking as part of a telerehabilitation system for automatic tracking and assessment of rehabilitation exercise by those with arthritis (see, e.g., [1]). The approach presented here can be used for assessment and comparison in problem domains that can be formulated in terms of a set of objects with descriptions represented by feature value vectors. A feature vector is an 𝑛-dimensional vector of numerical features representing an object description. Disjoint sets containing objects with similar descriptions are near sets. As an example of the degree of nearness between two sets, consider the two pairs of ovals containing colored segments in Figure 1. Each color in the figures corresponds to an equivalence class where all pixels in the class have matching descriptions, for example, pixels with matching colors.
Thus, the ovals in Figure 1(a) are closer (more near) to each other in terms of their descriptions than the sets in Figure 1(b). Specifically, in comparing hand-finger movement images, image patches (collections of subimages with similar descriptions) provide information and reveal patterns of interest. The contribution of this paper is an approach to measuring as well as visualizing the degree of separation between images in arthritic and non-arthritic hand-finger motion videos captured during rehabilitation exercise.

This paper is organized as follows. Section 2 presents related works to help establish a context for this research. Section 3 gives a brief introduction to near set theory, Section 4 presents the image processing necessary to perform feature extraction on the hand images, Section 5 presents the algorithm used to generate the results presented in this paper, and finally Section 6 presents a discussion on the results.

2. Related Works

The hand-finger motion classification method reported in this paper is an outgrowth of earlier work on medical imaging [2, 3] and, in particular, on comparing hand movement image sequences [4]. The term arthritis is derived from the Greek word arthron (referring to joints) and the suffix itis (inflammation of). Arthritis has not always been studied with as much fervor as other human ailments, particularly since the most common form (osteoarthritis) is not likely to be fatal [5]. However, human life expectancy has continued to improve, and, hence, an increase in arthritis cases is highly probable. Typically, with age there is a much greater likelihood of joints degrading and potentially wearing out. There are a number of factors that lead to arthritis, for example, lifestyle, heredity, joint trauma, and even work-related, repetitive tasks [5]. Although the prognosis may not be fatal, quality of life for arthritis patients can be severely limited due to pain and disability. The resulting costs associated with health care for arthritis patients have become significant. Forbes published a list of the most expensive diseases and arthritis made the list in the USA, totaling 7.8 billion dollars of annual spending reported from 2002 [6]. As a result of reduced quality of life and the burden placed on health-care systems, research efforts are ongoing in drugs, joint replacements, intra-articular injections, and other experimental treatments of the disease [7].

A principal contribution of this paper is an application of near set theory in providing a basis for quantifying the extent that hand-finger motion images resemble each other. Near set theory has connections in topology [8], proximity spaces [9, 10], metric spaces [11], tolerance spaces [12, 13], and approach spaces [14, 15]. Near sets have proved to be useful in solving problems based on human perception [8] that arise in areas such as image analysis [2, 4, 14, 16], image processing [2, 4, 12, 13, 16–18], face recognition [19], ethology [20], image morphology, and segmentation evaluation [21, 22] as well as many engineering and science problems.

While the applications presented in this paper are based on the comparison of hand movement images, the proposed approach is suitable for investigation of problems formulated in a similar manner. For example, Schubert et al. [23] presented a neural cell detection system to measure fluorescent lymphocytes in images of tissue sections. Their approach was to use a neural network, trained from a set of cell image patches, to determine if a pixel is the centre of one fluorescent cell. Each pixel was associated with a 6-dimensional feature vector generated by principal component analysis (PCA) on a 15×15 subimage centred on the pixel. Another example of a problem formulated in a manner conducive to the proposed approach to discovering affinities in medical data is given by Yu et al. [24] in terms of protein-protein interaction extraction from biomedical text. Given an abstract of an article containing instances of proteins, the system detects whether a relationship exists for each pair of proteins in the abstract. This problem is solved by using support vector machines, where each sentence containing a reference to proteins in a given abstract is considered an object and lexical and syntactic features are used to create a feature-value vector.

3. Tolerance Near Sets

Tolerance near sets are defined in the context of tolerance spaces. The term tolerance space was coined by Zeeman in 1961 in modelling visual perception with tolerances [25]. A tolerance space $\langle X, \simeq \rangle$ consists of a set $X$ and a binary relation $\simeq$ on $X$ (i.e., $\simeq\ \subset X \times X$) that is reflexive (for all $x \in X$, $x \simeq x$) and symmetric (for all $x, y \in X$, if $x \simeq y$, then $y \simeq x$), but transitivity of $\simeq$ is not required. Here we write $x \simeq y$ instead of $(x, y) \in\ \simeq$. In this case, $\simeq$ is called a tolerance relation (on $X$) or simply a tolerance.

All sets in near set theory consist of perceptual objects, defined as something that has its origin in the physical world. Moreover, all objects need to be described in some manner. This is accomplished by a probe function, a real-valued function representing a feature of a perceptual object [26]. Next, a perceptual system $\langle O, \mathbb{F} \rangle$ consists of a non-empty set $O$ of sample objects and a non-empty set $\mathbb{F}$ of real-valued functions $\phi \in \mathbb{F}$ such that $\phi : O \to \mathbb{R}$ [8]. The elements of $O$ are called perceptual objects and the functions in $\mathbb{F}$ are called probe functions. The description of an object $x \in O$ is a vector given by
$$\boldsymbol{\phi}(x) = \bigl(\phi_1(x), \phi_2(x), \ldots, \phi_i(x), \ldots, \phi_l(x)\bigr), \tag{1}$$
where $l$ is the length of the vector $\boldsymbol{\phi}$ and each $\phi_i(x)$ in $\boldsymbol{\phi}(x)$ is a probe function value that is part of the description of the object $x \in O$. Keeping these concepts in mind, a perceptual tolerance relation can be described as follows.

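Definition 1 (perceptual tolerance relation [12, 13]; see [2, 17] for applications). Let $\langle O, \mathbb{F} \rangle$ be a perceptual system and let $\varepsilon \in \mathbb{R}$. For every $\mathcal{B} \subseteq \mathbb{F}$ a reflexive and symmetric tolerance relation $\cong_{\mathcal{B},\varepsilon}$ is defined as follows:
$$\cong_{\mathcal{B},\varepsilon}\ = \bigl\{ (x, y) \in O \times O : \|\boldsymbol{\phi}(x) - \boldsymbol{\phi}(y)\|_2 \le \varepsilon \bigr\}. \tag{2}$$
For notational convenience, this relation can be written as $\cong_{\mathcal{B}}$ instead of $\cong_{\mathcal{B},\varepsilon}$ with the understanding that $\varepsilon$ is inherent to the definition of the tolerance relation.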
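As an illustration of Definition 1, the following minimal Python sketch checks whether two objects satisfy the perceptual tolerance relation. The object representation (a dictionary holding one normalized feature), the helper names description and is_tolerant, and the feature values are hypothetical and are not taken from the experiments reported here.

```python
import math

def description(obj, probes):
    """Return the feature vector phi(obj), one value per probe function."""
    return [phi(obj) for phi in probes]

def is_tolerant(x, y, probes, eps):
    """Perceptual tolerance relation: ||phi(x) - phi(y)||_2 <= eps."""
    return math.dist(description(x, probes), description(y, probes)) <= eps

# Hypothetical objects, each described by a single normalized feature.
probes = [lambda obj: obj["orientation"]]
a = {"orientation": 0.00}
b = {"orientation": 0.08}
c = {"orientation": 0.16}

print(is_tolerant(a, b, probes, eps=0.1))  # a and b lie within eps of each other
print(is_tolerant(b, c, probes, eps=0.1))  # so do b and c
print(is_tolerant(a, c, probes, eps=0.1))  # a and c do not: the relation is not transitive
```

The three calls show that the relation is reflexive and symmetric by construction but need not be transitive, which is what distinguishes a tolerance relation from an equivalence relation.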

Definition 1 gives rise to two very useful types of sets, namely, a neighbourhood and a tolerance class. A neighbourhood of an object $x \in O$ is defined as
$$N(x) = \{ y \in O : x \cong_{\mathcal{B},\varepsilon} y \}. \tag{3}$$
An example of a neighbourhood in 2D feature space is given in Figure 2, where the positions of the objects are given by the numbers 1 to 21 and the neighbourhood is defined with respect to the object labelled 1. Notice that the distance between each object and object 1 is less than or equal to $\varepsilon = 0.1$ but that not all the pairs of objects in the neighbourhood of $x$ satisfy the tolerance relation. In contrast, all the pairs of objects within a pre-class must satisfy the tolerance relation. A set $X \subseteq O$ is a pre-class when $x \cong_{\mathcal{B},\varepsilon} y$ for any pair $x, y \in X$ [27]. A maximal pre-class with respect to inclusion is called a tolerance class. An example of a tolerance class is given in Figure 2, since no object can be added to the orange set and still satisfy the condition that any pair $x, y \in X$ must be within $\varepsilon$ of each other.
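The difference between a neighbourhood and a pre-class can be made concrete with a small sketch. It assumes that objects are already represented by their feature vectors (tuples of floats); the function names and the sample points are illustrative only.

```python
import math
from itertools import combinations

def neighbourhood(x, objects, eps):
    """All objects within eps of x (x itself included, by reflexivity)."""
    return [y for y in objects if math.dist(x, y) <= eps]

def is_preclass(objects, eps):
    """True when every pair of objects satisfies the tolerance relation."""
    return all(math.dist(a, b) <= eps for a, b in combinations(objects, 2))

centre = (0.0, 0.0)
points = [centre, (0.09, 0.0), (-0.09, 0.0), (0.5, 0.5)]

hood = neighbourhood(centre, points, eps=0.1)
print(hood)                                         # the centre and its two close neighbours
print(is_preclass(hood, eps=0.1))                   # False: the two neighbours are 0.18 apart
print(is_preclass([centre, (0.09, 0.0)], eps=0.1))  # True
```

Every member of the neighbourhood is within $\varepsilon$ of the centre, yet the neighbourhood as a whole fails the pairwise test, so it is not a pre-class.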

As mentioned above, we are interested in sets that have some objects that are similar to each other, where the term “similar” is quantified by the tolerance relation given in Definition 1. Thus, we introduce the following definition for tolerance near sets.

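Definition 2 (tolerance near set relation [12, 13]). Let $\langle O, \mathbb{F} \rangle$ be a perceptual system, and let $X, Y \subseteq O$, $\varepsilon \in \mathbb{R}$. A set $X$ is near to a set $Y$ within the perceptual system $\langle O, \mathbb{F} \rangle$ (written $X \bowtie_{\mathbb{F}} Y$) if and only if there exist $x \in X$ and $y \in Y$ and there is $\mathcal{B} \subseteq \mathbb{F}$ such that $x \cong_{\mathcal{B},\varepsilon} y$.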

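Definition 3 (tolerance near sets [12, 13]). Let $\langle O, \mathbb{F} \rangle$ be a perceptual system, and let $\varepsilon \in \mathbb{R}$, $\mathcal{B} \subseteq \mathbb{F}$. Further, let $X, Y \subseteq O$ denote disjoint sets with coverings determined by the tolerance relation $\cong_{\mathcal{B},\varepsilon}$, and let $H_{\cong_{\mathcal{B},\varepsilon}}(X)$, $H_{\cong_{\mathcal{B},\varepsilon}}(Y)$ denote the sets of tolerance classes for $X$, $Y$, respectively. Sets $X$, $Y$ are tolerance near sets if and only if there are tolerance classes $A \in H_{\cong_{\mathcal{B},\varepsilon}}(X)$, $B \in H_{\cong_{\mathcal{B},\varepsilon}}(Y)$ such that $A \bowtie_{\mathbb{F}} B$.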

As a practical example, consider an application in the area of image processing. Define an RGB image as $f = \{\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_T\}$, where $\mathbf{p}_i = (c, r, R, G, B)^{\mathrm{T}}$, $c \in [1, M]$, $r \in [1, N]$, $R, G, B \in [0, 255]$, and $M$, $N$, respectively, denote the width and height of the image and $M \times N = T$. Further, define a square subimage as $f_i \subset f$ such that $f_1 \cap f_2 \cap \cdots \cap f_s = \emptyset$ and $f_1 \cup f_2 \cup \cdots \cup f_s = f$, where $s$ is the number of subimages in $f$. Next, $O$ can be defined as the set of all subimages, that is, $O = \{f_1, \ldots, f_s\}$, and $\mathbb{F}$ is a set of functions that operate on images. Then the sets $X, Y \subseteq O$ are perceptually near each other if there are $x \in X$ (i.e., subimages from $X$) and $y \in Y$ (i.e., subimages from $Y$) and there is $\mathcal{B} \subseteq \mathbb{F}$ such that $x \cong_{\mathcal{B}} y$. This would be the case when there are two or more subimages that have similar descriptions using the probe functions in $\mathcal{B}$.

Definition 2 provides a means of determining whether two sets of perceptual objects are near each other. Suppose, however, that we want to consider the problem of comparing objects in tolerance near sets (such as sets created by separate images) and measure the degree of similarity between the two sets. This problem is of interest because its solution provides a formal basis for measuring the resemblance of sets of objects that are described by feature value vectors and has many applications, such as the problem of measuring image resemblance presented in this paper. In other words, a method for determining the degree to which two tolerance near sets are similar is needed. Let $X$ and $Y$ be two disjoint sets, and let $Z = X \cup Y$. Then a nearness measure [2, 28, 29] between two sets is given by
$$\mathrm{tNM}_{\cong_{\mathcal{B},\varepsilon}}(X, Y) = \Biggl( \sum_{C \in H_{\cong_{\mathcal{B},\varepsilon}}(Z)} |C| \Biggr)^{-1} \cdot \sum_{C \in H_{\cong_{\mathcal{B},\varepsilon}}(Z)} |C| \, \frac{\min(|C \cap X|, |C \cap Y|)}{\max(|C \cap X|, |C \cap Y|)}. \tag{4}$$

The idea behind (4) is that similar sets should produce tolerance classes that are evenly divided between 𝑋 and 𝑌. This is measured by counting the number of objects that belong to sets 𝑋 and 𝑌 for each tolerance class and then comparing these counts as a proper fraction. Then, the measure is simply a weighted average of all the fractions. A weighted average was selected to give preference to larger tolerance classes with the idea that a larger tolerance class contains more perceptually relevant information.
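The weighted average described above can be sketched in a few lines of Python. The sketch assumes that the tolerance classes of $Z$ have already been found and are supplied as collections of object identifiers; the identifiers and the function name tnm are hypothetical.

```python
def tnm(classes, X, Y):
    """Tolerance nearness measure: weighted average of min/max class splits."""
    X, Y = set(X), set(Y)
    total = sum(len(C) for C in classes)
    score = 0.0
    for C in classes:
        in_x = len(X & set(C))   # objects of this class that come from X
        in_y = len(Y & set(C))   # objects of this class that come from Y
        larger = max(in_x, in_y)
        if larger:               # guard against an empty class
            score += len(C) * min(in_x, in_y) / larger
    return score / total

# Hypothetical example: three tolerance classes covering Z = X u Y.
X = {"x1", "x2", "x3"}
Y = {"y1", "y2", "y3"}
classes = [{"x1", "y1"}, {"x2", "x3", "y2"}, {"y3"}]

print(tnm(classes, X, Y))  # (2*1 + 3*0.5 + 1*0) / 6
```

In this example the evenly split class contributes a fraction of 1, the class with two objects from $X$ and one from $Y$ contributes 1/2, and the class drawn entirely from $Y$ contributes 0, each weighted by its cardinality.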

Calculating the proper fraction for a single tolerance class 𝐶 is shown graphically in Figure 3 using the example given above concerning subimages. Figure 3(a) gives a sample tolerance class in 3D feature space, while Figure 3(b) shows the position of the subimages in the images (i.e., sets 𝑋 and 𝑌) that belong to the tolerance class in Figure 3(a). Observe that a tolerance class in feature space can be distributed throughout the images and that tNM would compare the number of objects from the tolerance class in set 𝑋 to the number of objects from the tolerance class in set 𝑌. In this case, the ratio would be close to 1 because the number of objects from the tolerance class is roughly the same in sets 𝑋 and 𝑌.

4. Segmenting Hand Motion Images and Feature Extraction

Recall that the focus of this paper is to present an application of the tolerance near set approach by way of comparing the hand movements of an arthritic patient with normal hand movements during rehabilitation exercises. Digital images obtained from video captured during the exercises are used to make the comparison (see, e.g., [30]). Accordingly, this section briefly presents the image segmentation and feature extraction methods used in the reported experiments. Measuring the resemblance of hand motion images is made possible by segmenting the images. Segmenting a digital image separates it into nonoverlapping regions and is important in this work since it facilitates separation of the image background from the portion of a hand in an image (see, e.g., Figure 4).

4.1. Mean Shift Segmentation Algorithm

The mean shift algorithm (introduced in [31]) segments an image using kernel density estimation, a nonparametric technique for estimating the distribution of a random variable. Nonparametric techniques are characterized by their lack of assumptions about the distributions and differ from parametric techniques which assume a given distribution and then estimate parameters which describe the density, like mean or variance [34]. The estimate of the distribution at a point 𝐱 is calculated from the number of observations within a volume in 𝑑-dimensional space centred on 𝐱 and a kernel that weights the importance of the observations [34]. The segmentations used in this paper were created using an implementation of the mean shift segmentation algorithm called EDISON [32], a system for which both the source code and binaries are freely available online. A sample segmentation produced by the EDISON system is given in Figure 4.

4.2. Multiscale Edge Detection

Mallat's multiscale edge detection method uses wavelet theory to find edges in an image [33]. Edges are located at points of sharp variation in pixel intensity, identified by calculating the gradient of a smoothed image (i.e., an image that has been blurred). Then, edge pixels are defined as those that have locally maximal gradient magnitudes in the direction of the gradient. Examples of our own implementation of Mallat's edge detection and edge orientation methods are given in Figure 5.

4.3. Feature Extraction

An example of the type of images obtained directly from the video is given in Figure 6(a). These images needed to be further processed to remove the common background (e.g., all the images contain the white desktop, the square blue sensor, etc.) that would produce results indicating that all the images were similar. Therefore, the mean shift segmentation algorithm was used to create a segment containing only the hand in each image. The resultant segmented image is given in Figure 6(b) where pixels with similar colour are now grouped together into segments. The next step was to use the segment representing the hand as a mask to separate the hand from the original image (given in Figure 6(c)). Next, notice the absence of the majority of the black background (representing the masked pixels in the original image) in Figure 6(d). Each image was cropped to an image containing only the hand because the output of probe functions on the black background would be the same for each image.

Next, perceptual objects are created in the same manner as the example given in Section 3. Specifically, each image was divided into square subimages such that no subimage overlapped, where each subimage represents an object in the near set sense and a probe function is then any function that can operate on images. In this case, we used only one probe function, namely, the average orientation of lines within a subimage. For example, the orientation can be determined (using the process given in Section 4.2) for each pixel considered part of a line detected in an image. Then, the probe function takes an average of all the orientations for pixels belonging to edges within a specific subimage. An example of the output of this probe function is given in Figure 6(d).
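The probe function described above can be sketched as follows. The sketch assumes that a per-pixel orientation map and a boolean edge map have already been computed (for instance by an edge detector such as the one in Section 4.2) and are given as nested lists. Taking a plain arithmetic mean and assigning 0.0 to subimages without edge pixels are assumptions made for illustration, as are the function name and the sample data.

```python
def average_orientation_features(orientation, is_edge, size):
    """One feature per non-overlapping size x size subimage:
    the mean orientation over the edge pixels in that subimage."""
    rows, cols = len(orientation), len(orientation[0])
    features = []
    for r0 in range(0, rows - size + 1, size):
        for c0 in range(0, cols - size + 1, size):
            values = [orientation[r][c]
                      for r in range(r0, r0 + size)
                      for c in range(c0, c0 + size)
                      if is_edge[r][c]]
            # Assumption: a subimage with no edge pixels gets the value 0.0.
            features.append(sum(values) / len(values) if values else 0.0)
    return features

# Hypothetical 2 x 4 image split into two 2 x 2 subimages.
orientation = [[0.0, 1.0, 0.2, 0.4],
               [1.0, 0.0, 0.6, 0.8]]
is_edge = [[True, True, False, False],
           [False, False, False, True]]

print(average_orientation_features(orientation, is_edge, size=2))
```

Each value in the returned list is the one-dimensional description of one subimage, that is, of one perceptual object.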

5. Tolerance Class Algorithm

The practical application of the nearness measure, tNM, rests on the ability to efficiently find all the classes for a set $Z = X \cup Y$. In the case where $\varepsilon = 0$, the process is straightforward, that is, the first object is assigned to a tolerance class, then the description of each subsequent object is compared to objects in each existing tolerance class. If a given object's description does not match any of the descriptions of the existing tolerance classes, then a new class is created. Thus, the algorithm runtime ranges from order $O(|Z|^2)$ in the worst case, which occurs when none of the object descriptions match, to $O(|Z|)$, which occurs when all the object descriptions are equivalent. In practice, the runtime is somewhere between these two extremes.
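For the case of a zero tolerance, the procedure just described amounts to grouping objects with identical descriptions. A minimal sketch, assuming descriptions can be compared for equality and using a hypothetical describe function:

```python
def classes_eps_zero(objects, describe):
    """Group objects whose descriptions match exactly (the eps = 0 case)."""
    classes = []  # list of (description, members) pairs
    for obj in objects:
        d = describe(obj)
        for rep, members in classes:
            if rep == d:          # description matches an existing class
                members.append(obj)
                break
        else:                     # no match found: start a new class
            classes.append((d, [obj]))
    return [members for _, members in classes]

# Hypothetical objects 0..6 described by their remainder modulo 3.
print(classes_eps_zero(range(7), lambda x: x % 3))
```

Each new object is compared with one representative description per existing class, so the number of comparisons grows with the number of distinct descriptions, in line with the best and worst cases noted above.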

The approach to finding tolerance classes in the case where $\varepsilon \neq 0$ is based on the observations presented in the following propositions.

Proposition 1. Given a tolerance space $\langle X, \cong_{\mathcal{B},\varepsilon} \rangle$, all tolerance classes containing $x \in X$ are subsets of the neighbourhood $N(x)$.

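Proof. Given a tolerance space $\langle X, \cong_{\mathcal{B},\varepsilon} \rangle$ and a tolerance class $A \in H_{\cong_{\mathcal{B},\varepsilon}}(X)$, then $(x, y) \in\ \cong_{\mathcal{B},\varepsilon}$ for every $x, y \in A$. Let $N_{\cong_{\mathcal{B},\varepsilon}}(x)$ be a neighbourhood of $x \in X$ and assume that $x \in A$. For $y \in A$, $(x, y) \in\ \cong_{\mathcal{B},\varepsilon}$. Hence, $A \subseteq N_{\cong_{\mathcal{B},\varepsilon}}(x)$. As a result, $N_{\cong_{\mathcal{B},\varepsilon}}(x)$ is a superset of all tolerance classes containing $x$.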

Proposition 2. Let $z_1, \ldots, z_n \in Z$ be a succession of objects, called query points, such that $z_n \in N(z_{n-1}) \setminus \{z_{n-1}\}$ and $N(z_n) \subseteq N(z_{n-1}) \setminus \{z_{n-1}\} \subseteq \cdots \subseteq N(z_1) \setminus \{z_1\}$, and define $N(z_0) \setminus \{z_0\}$ as the original set of objects (i.e., $N(z_0) \setminus \{z_0\} = Z$). In other words, the series of query points $z_1, \ldots, z_n \in Z$ is selected such that each subsequent object $z_n$ (where $z_n \neq z_{n-1}$) is obtained from the neighbourhood $N(z_{n-1})$, which is created using only objects from the previous neighbourhood. Then, under these conditions, the set $\{z_1, \ldots, z_n\}$ is a pre-class.

Proof. For $n \ge 2$, let $S(n)$ be the statement that $\{z_1, \ldots, z_n\}$ is a pre-class given the conditions in Proposition 2.
Base Step ($n = 2$)
Let $z_1 \in Z$ be the first query point, and let $N(z_1)$ be the first neighbourhood. Next, let $z_2$ represent the next query object. Since $z_2$ must come from $N(z_1)$ and all objects $x \in N(z_1)$ satisfy the tolerance relation $z_1 \cong_{\mathcal{B},\varepsilon} x$, $S(2)$ holds.
Inductive Step
Fix some $k \ge 2$ and suppose that the inductive hypothesis holds, that is, $\{z_1, \ldots, z_k\}$ is a pre-class, and choose $z_{k+1}$ from $N(z_k) \setminus \{z_k\}$. Since $N(z_k) \subseteq N(z_{k-1}) \setminus \{z_{k-1}\} \subseteq \cdots \subseteq N(z_1) \setminus \{z_1\}$, $z_{k+1}$ must satisfy the perceptual tolerance relation with all the objects in $\{z_1, \ldots, z_k\}$. Consequently, $\{z_1, \ldots, z_{k+1}\}$ is also a pre-class.

Therefore, by mathematical induction, $S(n)$ is true for all $n \ge 2$.

Corollary 1. Let $z_1, \ldots, z_n \in Z$ be a succession of objects, called query points, such that $z_n \in N(z_{n-1}) \setminus \{z_{n-1}\}$ and $N(z_n) \subseteq N(z_{n-1}) \setminus \{z_{n-1}\} \subseteq \cdots \subseteq N(z_1) \setminus \{z_1\}$, and define $N(z_0) \setminus \{z_0\}$ as the original set of objects (i.e., $N(z_0) \setminus \{z_0\} = Z$). In other words, the series of query points $z_1, \ldots, z_n \in Z$ is selected such that each subsequent object $z_n$ (where $z_n \neq z_{n-1}$) is obtained from the neighbourhood $N(z_{n-1})$, which is created using only objects from the previous neighbourhood. Then, under these conditions, the set $\{z_1, \ldots, z_n\}$ is a tolerance class if $|N(z_n)| = 1$.

Proof. Since the cardinality of $N(z_1)$ is finite for any practical application and the conditions given in Proposition 2 dictate that each successive neighbourhood will be smaller than the last, there is an $n$ such that $|N(z_n)| = 1$. By Proposition 2 the series of query points $\{z_1, \ldots, z_n\}$ is a pre-class, and by Proposition 1 there are no other objects that can be added to the class $\{z_1, \ldots, z_n\}$. As a result, this pre-class is maximal with respect to inclusion and by definition is called a tolerance class.

The above observations are visualized in Figure 7 using the example first introduced in Figure 2. Starting with the proof of Proposition 2, a visual example of the base step is given in Figures 7(a) and 7(b). In this case, only the first 21 objects of $Z$ are shown, where $z_1$ is the object labelled 1 and $N(z_1)$ is the circle containing the objects $\{1, \ldots, 21\}$. Next, according to Proposition 2, another query point $z_2 \in N(z_1) \setminus \{z_1\}$ is selected (i.e., $z_2$ can be any object in $N(z_1)$ except $z_1$). Here, $z_2 = 20$ is selected because it is the next object closest to $z_1$. Since $z_1 \cong_{\mathcal{B},\varepsilon} z_2$, the class $\{z_1, z_2\}$ is a pre-class. Note that Figure 7(b) also gives an example of $N(z_2) \subseteq N(z_1)$ as the area shaded grey, and the area shaded red is the part of $N(z_1)$ that does not satisfy the tolerance relation with $z_2$. Continuing on, an example of the inductive step from the proof of Proposition 2 is given in Figure 7(e). In this case, there are $k = 5$ objects and $\{z_1, \ldots, z_5\} = \{1, 20, 10, 6, 15\}$. The area shaded grey represents $N(z_5) \setminus \{z_5\} \subseteq \cdots \subseteq N(z_1) \setminus \{z_1\}$ along with the query points $\{z_1, \ldots, z_5\}$ (according to the conditions given in Proposition 2, query points are not included in subsequent neighbourhoods), and the other shaded areas represent the parts of successive neighbourhoods that no longer satisfy the tolerance relation with every query point. For instance, all the colours except red are in $N(20)$, and all the colours except red and purple are in $N(10)$ and $N(6)$. Notice that all the objects in the grey area satisfy the tolerance relation with all the query points but that the grey area does not represent a pre-class. Moreover, any new query point selected from $N(z_5) \setminus \{z_5\} = \{16, 18, 3, 14, 11\}$ will also satisfy the tolerance relation with all the query points $\{z_1, \ldots, z_5\}$. Finally, Figure 7(f) demonstrates the idea behind Corollary 1. In this figure, the area shaded grey represents the neighbourhood of $z_7 = 3$ along with all previous query points. Observe that (besides query points) the shaded area only contains one object, namely, $z_7$. Also, note that there are no more objects that will satisfy the tolerance relation with all the objects in the shaded area. As a result, the set $\{z_1, \ldots, z_7\}$ is a tolerance class.

Using Propositions 1 and 2 and Corollary 1, the following algorithm gives the pseudocode for an approach for finding all the tolerance classes on a set of objects $Z$. The general concept of the algorithm is, for a given object $z \in Z$, to recursively find all the tolerance classes containing $z$. The first step, based on Proposition 1, is to set $z$ as a query point and to find the neighbourhood $N(z)$. Next, consider the nearest neighbour of $z$ from the neighbourhood $N(z)$ as a query point and find its neighbourhood only considering objects in $N(z)$. Continue this process until the result of a query produces a neighbourhood with cardinality 1. (The result of a query will always be at least 1 since the tolerance relation is reflexive.)

Lastly, the series of query points becomes the tolerance class.

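Algorithm 1 (see [28]).
(1) Take an element $z \in Z$ and find $N_{\cong_{\mathcal{B},\varepsilon}}(z)$.
(2) Add $z$ to a new tolerance class $C$. Select an object $z' \in N_{\cong_{\mathcal{B},\varepsilon}}(z)$ such that $z' \neq z$.
(3) Add $z'$ to $C$. Find the neighbourhood $N_{\cong_{\mathcal{B},\varepsilon}}(z')$ using only objects from $N_{\cong_{\mathcal{B},\varepsilon}}(z)$. Do not include $z$ in $N_{\cong_{\mathcal{B},\varepsilon}}(z')$. Select a new object $z'' \in N_{\cong_{\mathcal{B},\varepsilon}}(z')$ such that $z'' \neq z'$. Relabel $z \leftarrow z'$, $z' \leftarrow z''$, and $N_{\cong_{\mathcal{B},\varepsilon}}(z) \leftarrow N_{\cong_{\mathcal{B},\varepsilon}}(z')$.
(4) Repeat step 3 until a neighbourhood of only 1 element is produced. When this occurs, add the last element to $C$ and then add $C$ to $H_{\cong_{\mathcal{B},\varepsilon}}(Z)$.
(5) Perform step 2 (and subsequent steps) until each object in $N_{\cong_{\mathcal{B},\varepsilon}}(z)$ has been selected at the level of step 2.
(6) Perform step 1 (and subsequent steps) for each object in $Z$.
(7) Delete any duplicate classes.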

Finally, note the following. We used an added heuristic for step 2 to reduce the computation time of the algorithm. Namely, an object from $N_{\cong_{\mathcal{B},\varepsilon}}(z)$ can only be selected as $z'$ in step 2 if it has not already been added to a tolerance class created from $N_{\cong_{\mathcal{B},\varepsilon}}(z)$ (i.e., this rule is reset each time step 1 is visited). In addition, the Fast Library for Approximate Nearest Neighbours [35, 36] was used to find all the neighbourhoods in this algorithm.
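The steps of Algorithm 1, together with the heuristic for step 2, can be sketched in Python as follows. This is an illustrative sketch rather than the implementation used to produce the reported results: neighbourhoods are found by brute force instead of with an approximate nearest neighbour library, objects are assumed to be given directly as feature vectors, and choosing each subsequent query point as the candidate nearest to the current query point is an assumption.

```python
import math

def tolerance_classes(points, eps):
    """Find tolerance classes among feature vectors by shrinking neighbourhoods."""
    def near(a, b):
        return math.dist(points[a], points[b]) <= eps

    found = set()                        # frozensets, so duplicates vanish (step 7)
    for z in range(len(points)):         # step 6: every object takes a turn as z
        hood = [y for y in range(len(points)) if y != z and near(z, y)]  # step 1
        if not hood:                     # z tolerates only itself
            found.add(frozenset([z]))
            continue
        used = set()                     # heuristic: skip objects already placed
        for start in sorted(hood, key=lambda y: math.dist(points[z], points[y])):
            if start in used:            # step 5 with the heuristic applied
                continue
            members, candidates, query = [z], hood, start      # step 2
            while True:                  # steps 3 and 4
                members.append(query)
                # keep only candidates that also tolerate the newest query point
                candidates = [y for y in candidates
                              if y != query and near(query, y)]
                if not candidates:
                    break
                q = query                # assumption: next query is nearest to q
                query = min(candidates,
                            key=lambda y: math.dist(points[q], points[y]))
            used.update(members)
            found.add(frozenset(members))
    return found

# Hypothetical 1D descriptions: 0.00 ~ 0.08 ~ 0.16, while 1.0 is isolated.
points = [(0.0,), (0.08,), (0.16,), (1.0,)]
result = tolerance_classes(points, eps=0.1)
print(sorted(sorted(c) for c in result))
```

At every stage the candidate list holds exactly the objects that satisfy the tolerance relation with all members collected so far, so a class is closed only when nothing more can be added to it, which is the maximality condition of Corollary 1.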

The tolerance class originally given in Figure 2 was produced using this algorithm, and the intermediate steps of this algorithm are visualized in Figure 7. To begin with, Figure 7(a) represents Step 1 of the algorithm with $z = 1$. Step 2 is given in Figure 7(b), where $z' = 20$. Steps 3 and 4 are given in Figures 7(c)–7(f). Observe that in Figure 7(f) $|N_{\cong_{\mathcal{B},\varepsilon}}(3)| = 1$ since all the other bold objects in the grey area have been added to $C$ and, as such, are not allowed to be included in subsequent neighbourhoods. Step 5 can be explained as follows. Figure 7 shows the sequence of steps for selecting $z' = 20$ (the closest object to 1) at the level of Step 2. Hence, Step 5 states that each object in the neighbourhood of 1 (except 1 itself) should be selected at Step 2. Moreover, the heuristic given after the algorithm states that any object added to a tolerance class derived from the neighbourhood of 1 should not be considered at Step 2. As a result, in this example, the objects $\{3, 6, 10, 15, 16\}$ should not be considered again at Step 2 for finding tolerance classes derived from the neighbourhood of object 1. Lastly, note that Step 1 must be performed for all objects in $Z$.

Finally, this section is concluded by mentioning a few observations about the algorithm. The runtime of the algorithm in the worst case is $O(|Z|^2 T)$, where $T$ is the complexity of finding an object's neighbourhood among the other $|Z| - 1$ objects. However, it should be noted that the algorithm is rarely run on worst-case data. The worst case suggests that either the epsilon value is much too large or that the data is so clustered that, from a perceptual point of view, every pair of objects in the set resembles each other. In either case, the data is not interesting from a nearness measure or image correspondence perspective. The runtime on typical data is of order $O(|Z| c T)$, where $c \le |Z|$ is a constant based on the object $z \in Z$ that has the largest neighbourhood. Lastly, this algorithm lends itself to parallel processing techniques, and the results in this paper were also obtained using multithreading on a quad core processor. A comparison of two images used to generate the results in this paper using our implementation was on the order of 0.2 sec.

6. Results

The goal of this paper is to present an application of the tolerance near set approach by way of comparing the hand movements of an arthritic patient with normal hand movements during rehabilitation exercises. Consequently, this section presents results of comparing images from three patients, two of whom do not have arthritis, using the tolerance near set approach. As mentioned, the images were obtained from a video taken during a rehabilitation exercise (see, e.g., [30]). This section presents the selection of parameters used to obtain the results and ends with a comparison of tNM with an existing measure called the Hausdorff distance.

6.1. Selection of Epsilon

For normalized feature values, the largest distance between two objects occurs when one object has a feature vector (object description) of all zeros and the other has a feature vector of all ones. As a result, $\varepsilon$ is in the interval $[0, \sqrt{l}]$, where $l$ is the length of the feature vectors. In any given application, there is always an optimal $\varepsilon$ when performing experiments using the perceptual tolerance relation. For instance, a value of $\varepsilon = 0$ produces few or no pairs of objects that satisfy the perceptual tolerance relation, and a value of $\varepsilon = \sqrt{l}$ means that all pairs of objects satisfy the tolerance relation. Consequently, $\varepsilon$ should be selected such that the objects that are relatively close in feature space satisfy the tolerance relation, and the rest of the pairs of objects do not (what counts as "relatively close" is determined by the application). The selection of $\varepsilon$ is straightforward when a metric is available for measuring the success of the experiment. In this instance, the value of $\varepsilon$ is selected based on the best result of the evaluation metric, where a plot of $\varepsilon$ versus the metric usually resembles an inverted parabola. Fortunately, in this case, precision versus recall plots, defined in the context of image retrieval, can be used to evaluate the effectiveness of $\varepsilon$.

To demonstrate the selection of 𝜀, the database of hand-finger movement images from three patients is used. One of the patients has rheumatoid arthritis, while the other two do not. Here, the goal is to perform content-based image retrieval and separate the images into three categories, one for each patient. An image belonging to one of the three patients is used as a query image, and then the images are ranked in descending order based on the value of tNM with the query image. The database contains 98 images, of which 30 are from the patient with arthritis, and, respectively, 39 and 29 are from the two patients without arthritis. Each image is in turn selected as the query image, and a value of tNM between the query image and every other image in the database is determined. Subsequently, a tolerance 𝜀 can be selected based on the number of images that are retrieved from the same category as the query image before a false positive occurs (i.e., before an image from a category other than that of the query image occurs).

Using this approach, Figure 8 contains a plot showing the number of images retrieved before the precision dropped below 90% for a given value of 𝜀. The image (out of all possible 98 images) that produced the best query results is given in red, and the average is given in blue. Notice that the best results in the average case occur with tolerance 𝜀=0.05, which is close to the 𝜀=0.07 in the best case. This plot suggests that retrieval of images in this database benefits from a slight easing of the equivalence condition, but not much.
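The quantity plotted against 𝜀 can be computed with a short routine such as the one sketched below. It assumes that the tNM value between the query image and every other image is already available; the image identifiers, patient labels, and tNM values are made up for illustration, and counting the images retrieved up to the last rank at which precision is still at least the threshold is one reasonable reading of the criterion.

```python
def rank_by_tnm(scores):
    """Order image identifiers by descending tNM with the query image."""
    return [img for img, _ in sorted(scores, key=lambda s: s[1], reverse=True)]

def retrieved_before_precision_drop(ranked_labels, query_label, min_precision=0.9):
    """Count the images retrieved before precision first falls below the threshold."""
    hits = 0
    retrieved = 0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
        if hits / rank < min_precision:
            break
        retrieved = rank
    return retrieved

# Hypothetical tNM values for a query image taken from patient B.
labels = {"b1": "B", "b2": "B", "b3": "B", "c1": "C", "a1": "A"}
scores = [("b1", 0.9), ("c1", 0.4), ("b2", 0.8), ("b3", 0.7), ("a1", 0.3)]

ranked = rank_by_tnm(scores)
count = retrieved_before_precision_drop([labels[i] for i in ranked], "B")
print(ranked, count)
```

Repeating this count for each query image and each candidate value of 𝜀, and then averaging over the query images, would give curves of the kind described for Figure 8.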

Verifying the validity of selecting 𝜀 in this manner can be accomplished both by the visualization of the nearness measure for all pairs of images in the experiment and by observing the precision recall plots directly. First, an image can be created where the height and width are equal to the number of images in the database, each pixel corresponds to the value of tNM generated from the comparison of two images, and the colours black and white correspond to a nearness measure of 0 and 1, respectively. For example, an image of size 98×98 can be created like the one in Figure 9(a), where patient B is the one with arthritis and each pixel corresponds to the nearness measure between a pair of images in the database. Notice that a checkered pattern is formed with a white line down the diagonal. The white line corresponds to the comparison of an image with itself in the database, naturally producing a nearness measure of 1. Moreover, the lightest squares in the image are formed from comparisons between images from the same patient, and the darkest squares are formed from comparisons between the arthritis and healthy images. Also notice that the boundaries in Figures 9(c) and 9(d) are more distinct than for images created by other values of 𝜀, suggesting that 𝜀=0.05 or 𝜀=0.07 is the right choice of 𝜀. Similarly, the square corresponding to patient C has crisp boundaries in Figures 9(a) and 9(h) and is also the brightest area of the figure, suggesting that a value of 𝜀=0.3 would also be a good choice for images belonging to patient C.

Next, Figure 10 gives plots of the average precision versus recall for each patient. These plots were created by fixing a value of 𝜀 and calculating precision versus recall for each image belonging to a patient. Then, the average of all the precision/recall values for a specific value of 𝜀 is added to the plot for each patient. The results for selecting 𝜀=0.05 are given in red, and, in the case of patients B and C, the choice of 𝜀 that produced a better result than 𝜀=0.05 is also highlighted.
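
A minimal sketch of this averaging step is given below. It assumes that the curves are averaged rank by rank, that every query returns a ranked list of the same length, and that each list contains at least one relevant image; the function names and the 0/1 encoding of relevance are illustrative assumptions.

```python
def precision_recall_curve(ranked_relevance):
    """Return (recall, precision) after each retrieved image.

    ranked_relevance: 1 if the retrieved image belongs to the query's
    patient, 0 otherwise, ordered from most to least similar.
    """
    total_relevant = sum(ranked_relevance)  # assumed to be at least 1
    points, hits = [], 0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        hits += relevant
        points.append((hits / total_relevant, hits / rank))
    return points


def average_curve(curves):
    """Average recall and precision rank by rank over all query images."""
    n = len(curves)
    return [
        (sum(c[k][0] for c in curves) / n, sum(c[k][1] for c in curves) / n)
        for k in range(len(curves[0]))
    ]


# Two hypothetical query images of one patient at a fixed epsilon.
curves = [precision_recall_curve([1, 0, 1]), precision_recall_curve([1, 1, 0])]
patient_curve = average_curve(curves)
```

Repeating this for each value of 𝜀 yields one averaged curve per tolerance, which can then be overlaid in a single plot per patient.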

6.2. Hausdorff Distance

This section introduces an additional measure for determining the degree to which near sets resemble each other. The Hausdorff distance is used to measure the distance between sets in a metric space [37] (see [38] for English translation) and is defined as

$$d_H(X,Y)=\max\left\{\sup_{x\in X}\,\inf_{y\in Y} d(x,y),\ \sup_{y\in Y}\,\inf_{x\in X} d(x,y)\right\}, \tag{5}$$

where sup and inf refer to the supremum and infimum and 𝑑(𝑥,𝑦) is the distance metric (in this case it is the 𝑙² norm). The distance is calculated by considering the distance from a single element 𝑥 in a set 𝑋 to every element of set 𝑌, and the shortest distance is selected as the infimum. This process is repeated for every 𝑥 ∈ 𝑋, and the largest distance (supremum) is selected as the directed distance from the set 𝑋 to the set 𝑌. This process is then repeated for the set 𝑌 because the two distances will not necessarily be the same, and the larger of the two is the Hausdorff distance. Keeping this in mind, the measure tHD [29] is defined as

$$\mathrm{tHD}_{\cong_{\mathcal{B}},\varepsilon}(X,Y)=\Biggl(\sum_{C\in H_{\cong_{\mathcal{B}},\varepsilon}(Z)}|C|\Biggr)^{-1}\cdot\sum_{C\in H_{\cong_{\mathcal{B}},\varepsilon}(Z)}|C|\,\Bigl(\sqrt{l}-d_H(C\cap X,\,C\cap Y)\Bigr). \tag{6}$$

Observe that low values of the Hausdorff distance correspond to a higher degree of resemblance than larger distances. Consequently, the distance is subtracted from the largest distance √𝑙. The Hausdorff distance is a natural choice for comparison with the tNM nearness measure because it measures the distance between sets in a metric space. Recall that tolerance classes are sets of objects with descriptions in 𝑙-dimensional feature space. The nearness measure evaluates the split of a tolerance class between sets 𝑋 and 𝑌, where the idea is that a tolerance class should be evenly divided between 𝑋 and 𝑌 if the two sets are similar (or the same). In contrast, the Hausdorff distance measures the distance between two sets. Here the distance being measured is between the portions of a tolerance class in sets 𝑋 and 𝑌. Thus, two different measures can be used on the same data, namely, the tolerance classes obtained from the union of 𝑋 and 𝑌.
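
For finite sets of feature vectors, the supremum and infimum in (5) reduce to a maximum and a minimum, so both measures can be sketched directly. The code below is a hedged illustration rather than the implementation of [29]. It assumes that the tolerance classes have already been computed and are supplied as pairs (portion in 𝑋, portion in 𝑌), that 𝑋 and 𝑌 are disjoint so that the size of a class is the sum of its two portions, that feature values are scaled to [0, 1] so that √𝑙 is the largest possible distance, and that a class lying entirely in one set is treated as maximally distant. All function names are hypothetical.

```python
import math


def l2(p, q):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def hausdorff(X, Y):
    """Hausdorff distance between two finite, non-empty sets of vectors."""
    forward = max(min(l2(x, y) for y in Y) for x in X)   # directed X -> Y
    backward = max(min(l2(x, y) for x in X) for y in Y)  # directed Y -> X
    return max(forward, backward)


def thd(split_classes, l):
    """Weighted average of (sqrt(l) - Hausdorff distance) over tolerance classes.

    split_classes: list of (in_x, in_y) pairs, the portions of each
    tolerance class that fall in X and in Y (assumed precomputed).
    l: number of features in each description.
    """
    largest = math.sqrt(l)
    weighted, total = 0.0, 0
    for in_x, in_y in split_classes:
        size = len(in_x) + len(in_y)
        # Assumption: a class with no objects in X or in Y contributes zero.
        dist = hausdorff(in_x, in_y) if in_x and in_y else largest
        weighted += size * (largest - dist)
        total += size
    return weighted / total


# Hypothetical tolerance classes over two features (toy values only).
classes = [
    ([(0.1, 0.2)], [(0.1, 0.2)]),  # identical descriptions in X and Y
    ([(0.0, 0.0)], [(0.3, 0.4)]),  # portions separated in feature space
]
value = thd(classes, l=2)
```

The pairwise loops make each Hausdorff computation quadratic in the class size, which is acceptable for a sketch but would call for a spatial index on large classes.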

6.3. Comparison between Hausdorff and tNM Measures

Next, Figure 11 contains the comparison of the two measures. The precision recall data for the Hausdorff distance was generated with 𝜀=0.5. Again, the data was obtained by taking an average of all the precision (and recall) values for each image belonging to a particular patient. Notice that the nearness measure performs better, that is, the precision recall plot is closer to ideal for all three patients using the nearness measure. The reason is that the performance of the Hausdorff distance is poor for low values of 𝜀, since, as tolerance classes start to become equivalence classes (i.e., as 𝜀 → 0), the Hausdorff distance approaches 0 as well. Thus, if each tolerance class is close to an equivalence class, the resulting distance will be zero and consequently the measure will produce a value near to 1, even if the images are not alike. In contrast, as 𝜀 increases, the members of classes tend to become separated in feature space, and, as a result, only classes whose objects in 𝑋 are close to their objects in 𝑌 will produce a distance close to zero. What does this imply? If, for a relatively large value of 𝜀, the set of objects 𝑍 = 𝑋 ∪ 𝑌 still produces tolerance classes with objects that are tightly clustered, then this measure will produce a high measure value. Notice that this distinction is only made possible if 𝜀 is relaxed. Otherwise, all tolerance classes will be tightly clustered. Finally, Figures 12, 13, and 14 show the top five retrieved results for a randomly selected query image of each category. Observe that the results all belong to the right category, which is as expected based on the precision recall plots.

7. Concluding Remarks

This paper focuses on the analysis, classification, and visualization of hand-finger movement images extracted from videos made during rehabilitation exercise sessions for osteoarthritic clients. This work stems from the need to provide healthcare providers and clients with resemblance measures, hand-finger movement image analysis, and visualization of the results of content-based image retrieval. Two forms of image resemblance measures are considered: the Hausdorff distance measure tHD and a tolerance near set resemblance measure tNM. The results reported in this paper suggest that the tNM measure is more accurate than the well-known Hausdorff distance measure. In addition, two forms of visualization of a tolerance space view of hand-finger motion during rehabilitation exercise are presented. In addition to watching videos of rehabilitation therapy sessions, it is now possible to compare arthritic and non-arthritic hand movements in entirely different ways; that is, comparisons can be made using checkerboard grids and precision recall plots. A checkerboard greyscale grid like the one in Figure 9 gives a qualitative view of hand-finger movement images extracted from rehabilitation exercise videos. That is, the greater the contrast between the grey areas reflecting arthritic and non-arthritic hand-finger movements, the greater the disparity between client hand movements. By contrast, precision recall plots like the ones in Figure 10 give a quantitative comparison of the results of different tolerances in measuring resemblance between hand-finger movement images.