By analyzing interpoint comparisons, we obtain significant results describing the
relationship in “hippocampus shape space” of clinically depressed, high-risk, and control
populations. In particular, our analysis demonstrates that the high-risk population
is closer in shape space to the control population than to the clinically depressed population.
1. Introduction
Major depressive disorder (MDD) is a mental disorder
affecting about 16% of the US adult population, and is a major cause for
concern not only in the United States but the world over. It is a disorder
characterized by depressed mood, diminished interest or pleasure, significant
weight loss, feelings of guilt or low self-worth, insomnia or hypersomnia,
fatigue, poor concentration, or recurrent thoughts of death. The symptoms are
widespread, and tend to be quite stable. In 2000, the World Health Organization
(WHO) estimated depression to be the leading cause of disability as measured by
years lived with disability (YLD) and the fourth leading contributor to the
global burden of disease. See [8].
Over the years, a significant amount of research has
been dedicated to finding physiological causes of MDD. One such study involved
the catecholamine hypothesis [1] that suggested that MDD is caused by decreased
levels of the neurotransmitters norepinephrine and serotonin. This finding led
to most modern-day medication for MDD, which works by preventing the reuptake
of these neurotransmitters. Neuroimaging research has also shown that enlarged
ventricles and sulci, and reduced volume of the frontal lobe and basal ganglia,
are associated with depressive episodes [1].
The aforementioned studies examined the brain
once MDD had already set in. The physiological changes they report are associated with the
symptoms themselves. What about physiological predictors for MDD? Such predictors would
facilitate the diagnosis of the disorder well before the onset of the symptoms,
perhaps allowing measures to prevent the symptoms from ever appearing.
A vast amount of research is being conducted in order
to find biological predispositions to MDD. There is evidence correlating shape
differences of the hippocampus to depression [6] and schizophrenia [10]. In this
manuscript we analyze interpoint comparisons [2] to investigate the
relationship in “hippocampus shape space” of three populations among twins. The
subjects fall into three categories: the affected subjects
(clinically depressed, or MDD), the nonaffected cotwins of the MDD subjects
(high-risk, or HR), and the nonaffected twin pairs (control, or CTRL). The
dataset includes both monozygotic (MZ) and dizygotic (DZ) twin subjects.
According to established literature, the concordance
rate for monozygotic (MZ) HR subjects is 40%, and for dizygotic (DZ) HR
subjects 11% [15]. This demonstrates that the subjects labeled HR (because
their twin is MDD) are in fact high risk: they develop MDD at a
higher rate than the general population.
2. Data
Our data set includes 114 subjects (57
twin pairs): 29 CTRL-CTRL pairs, 22 HR-MDD pairs, and 6 MDD-MDD pairs. The
subjects are young female twins recruited through an epidemiological sample
based on Missouri birth records. To ensure that hippocampus shape space is the
only independent variable, other factors had to be controlled; all of the subjects were right-handed and
were screened for factors that may cause structural changes of the brain, such
as loss of consciousness for more than 5 minutes, chronic medical or
neurological illnesses, or pregnancy.
To obtain images of the hippocampus, very high
resolution magnetic resonance imaging (MRI) scans were required. The Siemens
Vision/Sonata 1.5T scanner was used to acquire three MPRAGE scans [19] (160 slices at FoV, 1.0 mm³ isotropic
voxels). Using Analyze [12], the images were registered and averaged, converted
to 8 bits while optimizing the intensity range, and interpolated to 0.5 mm
isotropic voxels. This imaging protocol allows for optimal
comparative analysis.
For each of the left and right hemispheres separately, 22
three-dimensional landmarks were identified for each hippocampus and were used
to generate and align hippocampal subcubes to a standardized orientation. It is
these landmark data that we employ herein.
3. Shape
Using the landmark data, for each pair of subjects and
for each of left and right hippocampus, we produce an interpoint shape
comparison, as described below.
For two subjects (for the left
hemisphere, say), we consider their
corresponding landmarks, 22 per subject.
3.1. Landmark Matching
Finding the shape comparison involves a landmark
matching (LM) transformation. The transformation is nonparametric, and this
flexibility implies that overfitting must be guarded against via regularization.
LM finds a diffeomorphism that minimizes
an error criterion which includes both landmark mismatch and transformation
complexity. Transformation complexity is measured by a geodesic
distance in a group of diffeomorphisms [4], and a
regularization parameter controls the relative contribution of
transformation complexity versus landmark mismatch to the optimization
objective. The algorithm solves the nonlinear Euler equation by a Newton method
combined with a shooting procedure [18].
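For orientation, an objective of the kind described above is often written in the following form in the landmark-matching literature. This is a sketch under assumed notation, not necessarily the exact criterion used here: the symbols $\varphi$ (the diffeomorphism), $x_{i,k}$ and $x_{j,k}$ (the $k$-th landmarks of the two subjects), $d$ (the geodesic distance), and $\lambda$ (the regularization parameter), as well as the placement of $\lambda$ on the complexity term, are all assumptions.

```latex
\hat{\varphi}
  = \arg\min_{\varphi}
    \Big\{ \lambda \, d(\mathrm{id}, \varphi)^2
         + \sum_{k=1}^{22} \big\| \varphi(x_{i,k}) - x_{j,k} \big\|^2 \Big\}
```

Under this assumed form, the first term penalizes transformation complexity and the second penalizes landmark mismatch, so a larger $\lambda$ favors simpler transformations at the cost of a looser landmark fit.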
We use the energy of the minimizing diffeomorphism as the
shape comparison between the two subjects (for the left
hemisphere, say).
3.2. Interpoint Comparison Matrices
Applying LM to the left or right hippocampus data for
each pair of subjects yields an interpoint comparison matrix. This matrix is 114 × 114 and hollow (zeros on the diagonal), but it is asymmetric.
That is, we obtain two asymmetric matrices, one for the left hippocampus and one for the right.
The nature of the hippocampus shape space is such that
under ideal conditions, it should yield a symmetric distance matrix. The
asymmetry of the matrix does not
reflect the true nature of the hippocampus shape space, and is in fact a result
of the limitations of the LM method. Hence, before further
investigation, each matrix must be
symmetrized using an appropriate symmetrization technique [5].
In this work we symmetrize via .
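As an illustration of what symmetrizing a hollow, asymmetric comparison matrix can look like, the following minimal sketch averages each entry with its transpose counterpart. Averaging is only one of several possible rules and is an assumption here, not necessarily the rule used in this work; the function name `symmetrize` and the toy values are hypothetical.

```python
def symmetrize(matrix):
    """Average a square matrix with its transpose (an assumed rule)."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square")
    # Entry (i, j) of the result is the mean of entries (i, j) and (j, i).
    return [[(matrix[i][j] + matrix[j][i]) / 2.0 for j in range(n)]
            for i in range(n)]


# Toy 3 x 3 hollow, asymmetric comparison matrix (made-up values).
raw = [[0.0, 1.0, 4.0],
       [3.0, 0.0, 2.0],
       [6.0, 8.0, 0.0]]
sym = symmetrize(raw)
```

Averaging keeps the diagonal at zero, so the result is still hollow; other rules (for example taking the entrywise minimum or maximum) share that property but weight the two directions differently.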
Figure 1 depicts the structure of the interpoint
comparison matrices for the 114 subjects. Figure 2 depicts the actual
interpoint comparison matrix (after
symmetrization) for the 114 subjects.
Figure 1: Structure of
the interpoint comparison matrices for the 114 subjects.
Figure 2: The interpoint
comparison matrix, after symmetrization, for the 114 subjects. The
comparison values are color-coded, ranging from red for zero (e.g., the
diagonal entries) to green for large values.
4. Statistical Analysis
Our task is to begin describing the relationship of
the three populations (MDD, HR, CTRL) amongst one another in the hippocampus
shape space elicited by the LM interpoint comparisons. First, we present a
multidimensional scaling (MDS) [13] scatter plot; unfortunately, we see in
Figure 3 that no significant relationship can be discerned from this plot.
Employing linear discriminant analysis (LDA) after MDS for all possible MDS
target dimensionalities (analysis via LDA ∘ MDS ∘ LM, à la Miller et al. [9]) yields no
classification performance statistically significantly superior to chance.
Nevertheless, we will see in Figures 4 and 5 a suggestion that perhaps progress
can be made on our task, given a sufficiently clever methodology.
Figure 3: A multidimensional scaling scatter plot of the interpoint comparisons mapped into two dimensions. Little can be discerned from this plot regarding the
relationship in hippocampus shape space of the three populations (MDD, HR,
CTRL); no class-conditional differentiation is apparent.
Figure 4: Kernel probability density estimates for the LM-Left comparisons. The solid line depicts comparisons between HR and CTRL and the dashed
line depicts comparisons between HR and MDD.
Figure 5: Quantile-quantile plot of the individual P-values for a
Wilcoxon-Mann-Whitney test of each HR subject, in turn, based on that subject's two
samples: comparisons to CTRL and comparisons to MDD.
Figure 4 depicts kernel probability density estimates
[7] for the LM-Left comparisons to show that the entries of the interpoint
comparison matrix that correspond
to comparisons between HR and CTRL (the solid line in Figure 4) are, overall, smaller than the entries that correspond
to comparisons between HR and MDD. That is, Figure 4 suggests a stochastic ordering relationship [3]: HR-to-CTRL comparisons tend to be smaller than HR-to-MDD comparisons. Such a result is precisely what we seek. However,
dependencies amongst the entries of the matrix make it
difficult to assess the statistical significance of the result depicted in
Figure 4.
Each row of the interpoint comparison matrix corresponding to a single HR subject gives rise to
two samples: comparisons to CTRL and comparisons to MDD. That is, we have the vector of comparisons from that
HR subject to every CTRL subject, and we have the vector of comparisons from
that HR subject to every MDD subject. (We do not include in these vectors the
twin of the particular HR subject under consideration; excluding the twin
eliminates bias in similarity
between a subject and her twin that is not due to condition (MDD, HR, CTRL).) For
this individual HR subject's two-sample data, a Wilcoxon-Mann-Whitney test [17]
of the null hypothesis that the distribution of HR-to-CTRL comparisons is the same as
the distribution of HR-to-MDD comparisons, against the alternative of stochastic ordering, yields a P-value.
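The per-subject test described above can be sketched as follows. This is a minimal pure-Python illustration that uses the normal approximation to the Wilcoxon-Mann-Whitney statistic with a continuity correction and no tie correction; whether the analysis here used an exact or an approximate P-value is not stated, so that choice is an assumption. The function names and the toy comparison values are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """U statistic for x: number of pairs with x_i > y_j, ties counting 1/2."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def p_value_less(x, y):
    """One-sided P-value for the alternative that x is stochastically
    smaller than y (normal approximation, continuity-corrected)."""
    n, m = len(x), len(y)
    u = mann_whitney_u(x, y)
    mean = n * m / 2.0
    sd = math.sqrt(n * m * (n + m + 1) / 12.0)
    z = (u - mean + 0.5) / sd
    # Standard normal CDF evaluated at z; small U gives a small P-value.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Toy samples for one HR subject (made-up values, twin already excluded).
hr_to_ctrl = [1.0, 1.2, 0.9, 1.1, 1.3]
hr_to_mdd = [2.0, 2.1, 1.9, 2.4, 2.2]
p = p_value_less(hr_to_ctrl, hr_to_mdd)
```

Repeating this for every HR row of the comparison matrix gives one P-value per HR subject.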
Figure 5 provides a quantile-quantile plot of these P-values. Under the null hypothesis, these P-values would
be expected to be distributed approximately uniform(0,1). The plot demonstrates
a clear deviation from a uniform distribution, again suggesting a stochastic ordering relationship. Again, dependencies amongst the entries of the matrix make it
difficult to assess the statistical significance of the result depicted in
Figure 5.
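The comparison against a uniform(0,1) distribution can be sketched as below: sorted P-values are paired with uniform quantiles, and points far from the diagonal indicate departure from uniformity. The plotting positions (i + 0.5)/n are one common convention and an assumption here; the function names and the toy P-values are hypothetical.

```python
def uniform_qq_points(p_values):
    """Pair each sorted P-value with its expected uniform(0,1) quantile."""
    n = len(p_values)
    observed = sorted(p_values)
    expected = [(i + 0.5) / n for i in range(n)]  # assumed plotting positions
    return list(zip(expected, observed))


def max_deviation(points):
    """Largest vertical distance between the points and the diagonal."""
    return max(abs(obs - exp) for exp, obs in points)


# Toy P-values (made-up), concentrated near zero.
toy_p_values = [0.20, 0.01, 0.03, 0.02]
points = uniform_qq_points(toy_p_values)
deviation = max_deviation(points)
```

Such a summary is descriptive only; because the underlying comparisons are dependent, it does not by itself supply a significance level.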
The quantile-quantile plot independently reiterates
the suggestion of a stochastic ordering relationship that was first seen using
the kernel probability density estimates. Thus, while Figures 4 and 5 give an
inkling of the type of information that can be gleaned regarding the
relationship in hippocampus shape space of the three populations (MDD, HR,
CTRL) amongst one another, it remains to assess the
figures' suggestion rigorously.
5. Classification
To further uncover the characteristics of hippocampus
shape space, we consider the task of classifying each HR subject as either MDD
or CTRL.
As before, we consider the two samples (comparisons to CTRL and comparisons to MDD) associated with each individual HR subject. We
classify the HR subject as belonging to MDD or CTRL based on the Wilcoxon-Mann-Whitney
test statistic, as
described in [16] (see also [3, page 183]).
Once we have classified each of the HR subjects in
this way, we assess the relative similarity of HR to CTRL versus MDD based on
the classifier's performance, that is, on the collection of HR subjects'
classifier-assigned class labels, taken as a whole.
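One plausible reading of this procedure is sketched below: an HR subject is labeled CTRL when her comparisons to CTRL tend to be smaller than her comparisons to MDD, labeled MDD in the opposite case, and given no decision when the statistic sits exactly at its null expectation. The precise rule in [16] may differ; the threshold at half the number of pairs, the function names, and the toy data are assumptions for illustration.

```python
def classify_by_wmw(to_ctrl, to_mdd):
    """Return 'CTRL', 'MDD', or None (no decision) for one subject."""
    closer_to_ctrl = 0.0  # pairs in which the CTRL comparison is smaller
    for c in to_ctrl:
        for m in to_mdd:
            if c < m:
                closer_to_ctrl += 1.0
            elif c == m:
                closer_to_ctrl += 0.5
    half = len(to_ctrl) * len(to_mdd) / 2.0  # expected count under the null
    if closer_to_ctrl > half:
        return "CTRL"
    if closer_to_ctrl < half:
        return "MDD"
    return None


def tally_labels(samples):
    """Count assigned labels over a collection of (to_ctrl, to_mdd) samples."""
    counts = {"CTRL": 0, "MDD": 0, "no decision": 0}
    for to_ctrl, to_mdd in samples:
        label = classify_by_wmw(to_ctrl, to_mdd)
        counts[label if label is not None else "no decision"] += 1
    return counts


# Three toy subjects (made-up values): one per possible outcome.
toy_samples = [([1.0, 2.0], [3.0, 4.0]),
               ([5.0, 6.0], [1.0, 2.0]),
               ([1.0, 4.0], [2.0, 3.0])]
counts = tally_labels(toy_samples)
```

The tally of labels over all subjects is the quantity on which the population-level assessment is based.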
This procedure can be employed with LM interpoint
comparisons obtained on Left, Right, or both Left and Right hippocampuses, and
with any of the three populations (HR, CTRL, MDD) as the population of interest (the role of HR in the description above).
6. Results
Classifying the 22 HR subjects as either MDD or CTRL
using LM-Left results in 19
classified as CTRL versus 3 classified as MDD. The probability of obtaining a
result this extreme or more extreme (the P-value) under
the least favorable null hypothesis (HR are equally likely to be classified as MDD
as CTRL) is against each
one-sided alternative. LM-Right yields 16 classified as CTRL versus 6
classified as MDD, a classification performance not statistically
significantly distinguishable from chance. Combining left and right, the shape
comparisons LM (LM-Left and Right) yields 20 classified as CTRL versus 2
classified as MDD ( for each
one-sided alternative) and strong statistical evidence that HR is more like CTRL
than MDD in hippocampus shape space.
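The tail probability described above, the chance of a split this extreme or more extreme when each subject is equally likely to receive either label, can be sketched as an exact binomial calculation. Treating the subjects' labels as independent fair coin flips is the assumption behind this sketch, and the function name is hypothetical.

```python
from math import comb


def one_sided_binomial_p(successes, trials):
    """P(X >= successes) for X ~ Binomial(trials, 1/2)."""
    favorable = sum(comb(trials, k) for k in range(successes, trials + 1))
    return favorable / 2 ** trials


# Counts reported above for the 22 HR subjects.
p_left = one_sided_binomial_p(19, 22)      # 19 CTRL versus 3 MDD
p_combined = one_sided_binomial_p(20, 22)  # 20 CTRL versus 2 MDD
```

Under this calculation, 19 of 22 corresponds to 1794/2^22 (about 0.00043) and 20 of 22 to 254/2^22 (about 0.00006).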
An analogous analysis (classifying the 33 MDD
subjects as either HR or CTRL using LM-Left) shows that MDD is more like
CTRL than it is like HR ( for each
one-sided alternative), and that the left carries more information than does
the right: the P-values are smaller, indicating that the signal is
stronger.
The results obtained from classifying the 59 CTRL
subjects as either HR or MDD are more nuanced: in this case, using LM-Left
indicates that CTRL is more like HR than it is like MDD (P < .0005) while using
LM-Right indicates that CTRL is more like MDD than it is like HR (P < .0005). This hemispherical ambiguity provides further
insight into hippocampus shape space.
Finally, we note that in the last two columns of Table 1 we consider classifying the 22 HR subjects (via leave-one-out
cross-validation) as HR or MDD and as HR or CTRL. These results are consistent
with our other findings: HR is more difficult to distinguish from CTRL than
from MDD, and the information extracted via LM-Left is more powerful for this
task than is LM-Right.
Table 1: Output of classifier based on the
Wilcoxon-Mann-Whitney test statistic. For example, the first numerical column,
“H : CvM,” gives the number of HR classified as CTRL versus MDD and the second
numerical column, “H : MvC,” gives the number of HR classified as MDD versus
CTRL. Thus, we find that combining left and right, the shape comparisons LM
(LM-Left and Right) yields 20 HR classified as CTRL versus 2 HR classified as
MDD: strong statistical evidence that HR is more like CTRL than MDD in
hippocampus shape space. (This analysis is based on 22 HR subjects, 33 MDD
subjects, and 59 CTRL subjects. Thus, e.g., the two HR numbers, H : CvM
and H : MvC, should sum to 22. Discrepancies are due to situations in which the
classifier makes “no decision” as described in [16]; see also [3, page 183].)
7. Conclusions/Discussion
Our analysis indicates that HR is more like CTRL than it is like MDD, MDD is more like CTRL than it is like HR, and CTRL is not obviously more like one or the other. Also, we discern that the left
hippocampus carries more information than does the right.
If hippocampus shape space were one-dimensional (if
the population shapes could be accurately represented on the real line), then the
joint relationship described by these three results could be depicted as in
Figure 6, with the CTRL population between the HR and MDD populations in
terms of shape. However, it must be noted that this depiction (Figure 6) offers
only a simplified view of the true infinite-dimensional nature of the shape
space configuration, as suggested by the fact that the 2-dimensional MDS
embedding depicted in Figure 3 presents little or no class separation.
Figure 6: Artist's rendition of what hippocampus shape space might
look like were it one-dimensional (if the population shapes
could be accurately represented on the real line). Our results suggest that the joint relationship of
the three populations, in terms of shape, puts the CTRL population between the HR and MDD populations. The
relationship depicted here holds for both LM-Left and LM-Right, although our
results suggest that for the left hippocampus CTRL is shifted closer to HR
while for the right hippocampus CTRL is shifted closer to MDD.
7.1. On Populations
Our stated task is in terms of populations: to begin describing the
relationship in hippocampus shape space of the three populations (MDD, HR,
CTRL) amongst one another. However, our results are conditional: using LM-Left we classify, for example, the 22 HR subjects representing the HR population as belonging to
either the MDD or the CTRL class, conditionally on “training” data from MDD
and CTRL. This, in fact, is the standard approach in probabilistic
pattern recognition; see, for example, [11]. The difference between a focus on
populations versus conditionals is indicative of a difference between “policy
science” and “laboratory science” [14]. A justification for the conditional
approach in “laboratory science” is given in [11] where it is claimed that the
unconditional approach “ would be
unnatural, because in a given application, one has to live with the [training]
data at hand.” In “policy science”, however, knowledge about the populations
themselves may be the focus.
By performing our analysis thrice, for each of the
three populations in turn conditionally on the “training” data from the other
two, we obtain three conditionals. Letting denote the
class-conditional sample sizes for each of the three classes, we see that the
joint distribution for our sample is -dimensional
(where is the presumed
“shape space” dimensionality of each observation). Each conditional
considered is -dimensional,
with one population remaining. The overall joint distribution of interest (the three populations in “shape space”) is of course not simply the
product of our three conditionals. However, some population inferences regarding
stochastic ordering can be performed via the (multiple) conditionals, and in
particular the conditional approach justifies the simplistic view of our three
populations in “shape-space” given by Figure 6.
Acknowledgment
Sincere appreciation to Michael Bowers (JHU), Timothy
Brown (JHU), Anthony Kolasny (JHU), Tomoyuki Nishino (WashU), and the others
for their valuable assistance.