Schizophrenia is a complex disorder with many comorbid conditions. In this study, we used polygenic risk scores (PRSs) from schizophrenia and comorbid traits to explore consistent cluster structure in schizophrenia patients. With 10 comorbid traits, we found a stable 4-cluster structure in two datasets (MGS and SSCCS). When the same traits and parameters were applied for the patients in a clinical trial of antipsychotics, the CATIE study, a 5-cluster structure was observed. One of the 4 clusters found in the MGS and SSCCS was further split into two clusters in CATIE, while the other 3 clusters remained unchanged. For the 5 CATIE clusters, we evaluated their association with the changes of clinical symptoms, neurocognitive functions, and laboratory tests between the enrollment baseline and the end of Phase I trial. Class I was found responsive to treatment, with significant reduction for the total, positive, and negative symptoms (, 0.0099, and 0.0028, respectively), and improvement for cognitive functions (VIGILANCE, ; PROCESSING SPEED, ; WORKING MEMORY, ; and REASONING, ). Class II had modest reduction of positive symptoms () and better PROCESSING SPEED (). Class IV had a specific reduction of negative symptoms () and modest cognitive improvement for all tested domains. Interestingly, Class IV was also associated with decreased lymphocyte counts and increased neutrophil counts, an indication of ongoing inflammation or immune dysfunction. In contrast, Classes III and V showed no symptom reduction but a higher level of phosphorus. Overall, our results suggest that PRSs from schizophrenia and comorbid traits can be utilized to classify patients into subtypes with distinctive clinical features. This genetic susceptibility based subtyping may be useful to facilitate more effective treatment and outcome prediction.

1. Introduction

Schizophrenia is a severe mental disorder with heterogeneous genetic architecture and clinical presentation [1–4]. As a heritable disorder, schizophrenia has an estimated heritability of about 80% [5], and genome-wide association studies (GWASs) have identified more than 100 loci [6–9]. Clinically, schizophrenia patients present positive and negative symptoms and cognitive deficits [3, 4, 10–12]. Furthermore, symptoms presented in individuals may change as the disease progresses [2]. All these impose great challenges for both genetic and clinical studies, hindering effective treatment and therapy of this disorder.

Subtyping is an effective approach to reduce heterogeneity, and it has been applied to complex diseases such as breast cancer [13, 14] and stroke [15, 16]. However, subtyping psychiatric disorders is challenging. Specific to schizophrenia, attempts to subtype with clinical symptoms [4, 17–20], neurocognitive functions [12, 21–26], age of onset [27, 28], treatment responses [29–31], and specific genetic risk factors [24, 32–35] have been reported in the literature. A 5-subtype classification based on clinical symptoms was adopted in the Diagnostic and Statistical Manual of Mental Disorders, 4th Edition (DSM IV) [19]. However, most of these subtyping systems lack biological underpinning, measurement objectivity, or systematic perspectives. As a result, they have not been broadly implemented in clinical practice and have not demonstrated utility in patient care. For these reasons, the 5-subtype classification was removed from DSM V. Given the challenges and potential benefits, it is important to consider whether we can develop a data-driven method to subtype schizophrenia so that the resulting subtypes have a more homogeneous biological mechanism and can be used to guide clinical practice.

Recent findings from large-scale GWASs [36] indicated that pleiotropy is pervasive [37, 38] and that comorbid traits share some genetic liability [37, 39]. These findings present us with such an opportunity. We reasoned that, because schizophrenia is comorbid with many other mental disorders [40, 41] and physical diseases [42] and many comorbid conditions share genetic liability, genetic factors identified for both schizophrenia and the diseases and traits comorbid with schizophrenia may be used as effective classifiers to subtype schizophrenia. Since these diseases and traits share only partial genetic liability with schizophrenia, i.e., some schizophrenia patients share genetic liability with one condition, while others share liability with a different condition, these conditions could collectively segregate schizophrenia patients into different classes or subtypes. Furthermore, this differential sharing of genetic liability implies that the resulting subtypes have distinctive underlying biology, and therefore, more targeted and subtype-specific treatments may be applied for better outcomes. In this study, we hypothesize that the partial sharing of genetic liability can be used to classify schizophrenia into distinct subtypes with different dimensions of genetic risk, and the resulting subtypes may reveal different pathophysiology. An expected outcome of this hypothesis is that these genetically informed subtypes have a unique underlying genetic architecture and underpinning biology that can be verified objectively and accurately by clinical and laboratory tests.

In this report, we describe our study to test the hypothesis. We started with the selection of traits that are genetically related to schizophrenia and estimated their polygenic risk scores (PRSs) in three independent datasets: the Molecular Genetics of Schizophrenia (MGS), the Swedish Schizophrenia Case-Control Study (SSCCS), and the Clinical Antipsychotic Trials for Intervention Effectiveness (CATIE) datasets. Next, we used hierarchical cluster algorithms to group subjects by their shared genetic liability in the MGS dataset. Then, we verified the cluster structure with the SSCCS and the CATIE datasets. Finally, we validated the resulting subtypes with clinical, neurocognitive, and laboratory tests in the CATIE dataset. The overall study design is shown in Figure 1. Our results suggest that it is possible to classify schizophrenia patients based on the partially shared genetic liability with other comorbid conditions, highlighting the potential in the genetically based treatment and intervention for schizophrenia.

2. Results

2.1. Selection for Traits Genetically Related to Schizophrenia

We started our study with a PubMed search using the keywords “schizophrenia”, “comorbidity”, and “genome-wide association study”, or “GWAS” and cross-linked the traits with data at the GWAS catalog repository website (https://www.ebi.ac.uk/gwas/) and other sources. As a result, we obtained GWAS summary statistics for 25 diseases and traits (Table S1). Using the markers with P values ≤0.05 in both schizophrenia and the candidate traits, PRSs from these traits were calculated for the subjects in the MGS [43], SSCCS [7], and CATIE [44, 45] datasets for this study. We conducted logistic regression to evaluate the genetic relationships between schizophrenia diagnosis and the PRSs calculated from these candidate traits. Only those traits showing suggestive association signals (P ≤ 0.15) and the same direction of effect in both the MGS and SSCCS datasets were selected for inclusion in our subtype classification analyses. As shown in Table 1, 10 traits showed a consistent correlation with schizophrenia. As expected, bipolar disorder (BIP) [44], cannabis dependence (cannabis) [45], and ever smoker vs never smoker (evrSmk) [46] showed a positive correlation with schizophrenia, whereas subjective wellbeing (SWB) [47] and verbal and numeric reasoning (VNR) [48] were negatively correlated with schizophrenia. Surprisingly, working memory (MEM) [48], neoopenness (OPEN) [49], and years of school attended (YoS) [50] were also positively correlated with schizophrenia.

2.2. Cluster Analyses

Based on the analyses of genetic relationships, the PRSs of the 10 selected traits along with those of schizophrenia were used in our cluster analyses. Our objective here was to find a stable and consistent cluster structure as the base for our subtyping analyses. Towards this goal, we first used the R package “NbClust” [51] to explore the appropriate number of clusters for the MGS dataset and then validated the structure in the SSCCS dataset. We found that the 4-cluster solution was the one with the most endorsements (11 out of 20 indices) for cluster assessment (Figure 2(a)) for the MGS dataset. When we used the same parameters to verify the cluster structure with the independent SSCCS dataset, we found that the 4-cluster solution also had the most endorsements (7 out of 20) (Figure 2(b)). Based on these analyses, we concluded that the 4-cluster solution was a stable and consistent cluster structure for schizophrenia patients when assessed with PRSs from the 10 selected candidate traits.

With the same parameters from the analyses of both the MGS and SSCCS datasets, we ran the analyses for the independent CATIE dataset. A 3-cluster solution had the most endorsements (6 out of 20). However, further examination of the dendrograms indicated that cluster 1 could be divided further into three groups for the CATIE dataset (the red, blue, and green clusters in Figure 2(c)). Therefore, we decided to use a 5-cluster solution for the CATIE dataset. Of note, if Classes I and IV in CATIE were merged, the result would be a 4-cluster structure similar to that observed in the MGS and SSCCS datasets.

As mentioned earlier, the central goal of this study was to evaluate whether schizophrenia had a stable and consistent subtype structure with genetic and biological underpinning. The PRS-based cluster analyses seemed to support this notion. In the sections below, we test this hypothesis by examining the association of class membership with a variety of clinical, neurocognitive, and laboratory tests.

2.3. Subtype Class Association with Clinical Symptoms and Treatment Outcomes

To examine whether the classes had any pattern in clinical symptoms, we analyzed the association of class membership with clinical symptoms using the CATIE study, a multivisit clinical trial of antipsychotic treatment for schizophrenia whose Phase I lasted up to 18 months [52, 53]. During the trial, clinical evaluations, psychological and behavioral assessments, and laboratory tests were performed to evaluate the patients' responses to antipsychotics at entrance (baseline assessments), at each follow-up visit, and at the end of the trial. To evaluate class membership association with clinical symptoms, we used the clinical data at baseline and at the end of the 18-month Phase I trial. We first analyzed the total, positive, and negative symptoms at baseline across the classes and found that Class II had lower total symptoms and negative symptoms at baseline as compared to Class I, which was used as the reference (Figure 3).

Next, we examined if any classes were associated with treatment outcomes as defined by the differences of symptom counts between baseline and at the end of the 18-month Phase I trial. When all subjects were analyzed together, the total, positive, and negative symptoms were all significantly reduced (Table 2). The total symptom counts reduced by 4.17 (95% CI 2.59–5.75, E-07), the reduction of positive and negative symptoms was 1.18 (95% CI 0.63–1.73, E-05) and 0.93 (95% CI 0.39–1.48, E-04), respectively. This suggested that the antipsychotic treatment during Phase I trial had positive outcomes for the CATIE patients as a whole. When the classes were analyzed separately, Class I had the most significant improvements in symptom counts for the total (mean diff = 5.47, 95% CI 2.76–8.19, E-04), positive (mean diff = 1.26, 95% CI 0.31–2.22, E-03), and negative (mean diff = 1.36, 95% CI 0.47–2.25, E-03) symptoms by the end of Phase I trial. Class I results remained significant after Bonferroni correction (3 tests for each category). Class II had lower total symptoms (mean diff = 3.90, 95% CI 0.78–7.03, mean diff = 2.18), and Class IV had lower negative symptoms (mean diff 2.18, 95% CI 0.53–3.82, ). In clear contrast, Classes III and V had no significant changes for the total, positive, and negative symptoms.
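
The Bonferroni check mentioned above can be illustrated with a minimal Python sketch. It assumes a family-wise significance level of 0.05, which the text does not state, and the function name is hypothetical. The two P values are the Class I positive- and negative-symptom values quoted in the abstract; the number of tests (3) follows the text.

```python
def bonferroni_pass(p_values, n_tests, alpha=0.05):
    """Return, for each named test, whether p <= alpha / n_tests.

    alpha = 0.05 is an assumed family-wise level, not a value from the study.
    """
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    threshold = alpha / n_tests
    return {name: p <= threshold for name, p in p_values.items()}


# Class I P values for positive and negative symptoms, as quoted in the abstract.
class_one = {"positive": 0.0099, "negative": 0.0028}
result = bonferroni_pass(class_one, n_tests=3)
print(result)
```

With three tests, the per-test threshold is 0.05/3, or about 0.0167, so both quoted P values fall below it under this assumption.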

2.4. Subtype Class Association with Neurocognitive Functions

We examined the changes of neurocognitive functions by comparing the data from baseline and Visit 6, which was approximately 170 days into Phase I trial (, ). We used the data from Visit 6, because it gave us the largest sample size for neurocognitive function analyses. As shown in Table 3, Phase I treatment did have a significant improvement for cognitive functions when all subjects were analyzed together (VIGILANCE, mean diff = −0.30, 95% CI −0.40–−0.19, E-07; PROCESSING SPEED, mean diff = −0.24, 95% CI −0.32–−0.17, E-10; WORKING MEMORY, mean diff = −0.20, 95% CI −0.29–−0.11, E-05; and REASONING, mean diff = −0.27, 95% CI −0.38–−0.17, E-07). When the classes or subtypes were analyzed separately, Class I showed significant improvements for all cognitive domains (VIGILANCE, E-07, 95% CI −0.40–−0.06, ; PROCESSING SPEED, , 95% CI −0.31–−0.09, ; WORKING MEMORY, mean diff = −0.21, 95% CI −0.34–−0.08, ; and REASONING, mean diff = −0.28, 95% CI −0.45–−0.11, ), which all passed Bonferroni correction (4 tests for each domain). Class II had an improvement in PROCESSING SPEED only, and Class III had improvement for VIGILANCE only. Class V had improvements for VIGILANCE and PROCESSING SPEED. Class IV showed a tendency of improvement in all domains but only REASONING survived multiple testing correction.

2.5. Subtype Class Association with Clinical Laboratory Tests

We included these tests to determine whether any laboratory tests were associated with our subtype classification. Using the baseline data and Class I as the reference, we found that patients from Class II had lower levels of bilirubin (effect size = −0.09, ) and uric acid (effect size = −0.46, ) and that patients from Class V had a higher level of prolactin (, ) (Table S2).

Next, we conducted paired t-tests to determine whether any of the laboratory tests changed between the baseline and the end of the Phase I trial. These statistical tests provided information on how Phase I treatment affected the patients and how the changes related to our subtype classification. When all subjects were analyzed together at the end of the Phase I trial, patients had a higher level of calcium (, 95% CI 0.04–0.12, ) and phosphorus (, 95% CI 0.04–0.17, ), as compared to their baseline measurement (Table 4). When each class was analyzed separately, Class I patients had lower HDL than that measured at baseline (mean diff = −2.25, 95% CI −3.70–−0.79, ). Both Classes III and V had a higher level of phosphorus ( and 0.17; 95% CI 0.02–0.37 and 0.02–0.32; and 0.0276, respectively). In addition, Class V also had higher levels of HDL and calcium ( and 0.14; 95% CI 0.30–4.13 and 0.04–0.24; and 0.0084, respectively). The results from Classes III and V suggested that the patients from these two groups might have metabolic syndrome, a possible side effect of antipsychotics. Furthermore, Class IV had significant changes in cell counts, with decreased lymphocytes (mean diff = −3.12, 95% CI −5.32–−0.92, ) and increased neutrophils (, 95% CI 0.47–5.61, ). Therefore, the neutrophil to lymphocyte ratio (NLR) was increased in this group, suggesting that those patients had ongoing inflammation or immune dysfunction.

2.6. Post Hoc Analyses of Other Clinical Features and Laboratory Tests

To explore whether these subtypes had other specific clinical features documented in the CATIE dataset, we conducted post hoc analyses with the data of the structured clinical interview for DSM-IV axis I disorders (SCID), the clinical global impression (CGI), and trial discontinuation, the major outcome measures for the CATIE study. From the SCID data, we found that compared to Class I, Class IV was older when first prescribed antipsychotic medicine (, ) and was more likely to have a family history of mental illness (, ) (Table S3). From the CGI data, Class II was more likely to have used tobacco products in the last 3 months (, ) (Table S4). From the data on medication discontinuation, we found that Class IV patients were less likely to discontinue the trial even when there was no treatment effect (effect size = −0.87, ) (Table S5). While the results from the post hoc analyses were suggestive and could not survive multiple test correction, they were consistent with the results from the analyses of clinical symptoms and neurocognitive functions.

3. Conclusion and Discussion

In this study, we used the differentially shared genetic liability between schizophrenia and comorbid traits to classify and subtype schizophrenia. Using the PRSs from 10 comorbid traits, we classified the 435 patients in the CATIE study into 5 classes. As summarized in Table 5, patients from Class I had the best responses to antipsychotic therapy for symptom reduction at the end of the Phase I trial and improvements for cognitive functions at Visit 6. This class could be considered the treatment-responsive group. After treatment, Class II showed signs of symptom reduction for the total symptoms and positive symptoms and improvement for PROCESSING SPEED. Class IV was uniquely responsive to treatment for negative symptoms and tended to improve for all cognitive domains. Interestingly, Class IV had a significant decrease in cell count for lymphocytes but an increase for neutrophils at the end of the Phase I trial, suggesting that the patients in Class IV may have ongoing inflammation or immune dysfunction. In contrast, patients in both Classes III and V did not have symptom reduction by the end of the Phase I trial and had only partial improvements for cognitive function. Patients in Class III had a higher level of phosphorus by the end of Phase I. Similarly, patients in Class V had a higher level of phosphorus, and they additionally had higher levels of HDL and calcium. Based on their treatment outcomes, Classes III and V could be considered treatment-resistant. The changes in laboratory tests from Classes III and V were indicative of the development of metabolic syndrome related to antipsychotic drug treatment. A significant increase in phosphorus levels has been linked to some schizophrenia clinical subtypes [54]. The side effects may interfere with treatment compliance, which in turn may contribute to treatment resistance. 
Overall, these five classes showed some unique patterns with regard to clinical symptoms, cognitive functions, treatment outcomes, laboratory tests, and other clinical features. These results were consistent with the hypothesis that partially shared genetic liability between schizophrenia and comorbid traits could be used to classify patients into subtypes with distinct underlying biology.

As mentioned in the introduction, although there have been attempts to subtype schizophrenia using varying approaches, our study is the first to combine genetic liability to schizophrenia and other comorbid conditions for subtype classification. Our approach differs from previous studies in two aspects. First, our approach is systematic and data-driven. We used genome-wide association data to evaluate genetic liability to schizophrenia. More and more evidence suggests that the genetic architecture of schizophrenia is complex, and causal variants in patients may differ substantially. Profiling genetic liability across the entire genome could reveal the difference between individual patients. Grouping patients based on their genetic liability, i.e., subtyping, could reduce the heterogeneity within the group, leading to a better understanding of the underlying biology and new treatment strategies. The availability of genome-wide genetic data makes this data-driven approach feasible. To our knowledge, no previous study has used genome-wide data to address this issue. While there is room for improvement, our exploratory results are promising and encouraging. Second, our approach integrates partially overlapping genetic liability from comorbid conditions to subtype schizophrenia patients. Previous studies, whether they used clinical symptoms, neurocognitive functions, or treatment outcomes, were schizophrenia-centric, and comorbid conditions rarely contributed to the subtyping. This is partially due to the difficulty of differentiating the same symptoms arising from schizophrenia and from comorbid conditions. With the use of genome-wide genetic profiling, i.e., PRSs, from comorbid conditions, we could distinguish the extent to which bipolar disorder and major depression factors contribute to schizophrenia. When multiple comorbid conditions are incorporated, the difference between individual patients would be more distinctive, leading to better separation of subgroups or subtypes. 
The finding that different classes have a distinct association with clinical symptoms and laboratory tests is supportive of this notion.

Our study has important implications. First, our classification is associated with treatment outcomes. Classes I and IV had better outcomes as measured by symptom count reduction and improvements in 4 aspects of cognitive function. Class IV was also less likely to discontinue medication even though the antipsychotic drugs showed no effects. In contrast, Classes III and V had no reduction of symptoms but only partial improvements in cognitive function by the end of the Phase I trial. This contrast in treatment outcomes is of clinical importance, especially for prognostic prediction. Second, our classification is associated with specific laboratory tests. Clinical laboratory tests are a cornerstone of modern medicine; they constitute the basis for the diagnosis and treatment for many physical diseases. Our analyses of the CATIE dataset indicated that Class IV was associated with the cell count changes of lymphocytes and neutrophils between the baseline and the end of Phase I trial. As we all know, both lymphocytes and neutrophils are important parts of the immune system. Lymphocytes are responsible for antigen recognition and antibody production, while neutrophils respond to inflammation and kill invaded microorganisms and damaged cells. Since the connection between schizophrenia and the immune system is well documented in the literature, this specific association of Class IV and the immune cells suggests unique underlying pathophysiology. On the other hand, an elevated NLR is widely considered as an indicator of inflammation, because the physiological response of leukocytes to inflammation often leads to higher neutrophils and lower lymphocytes in the body [55]. Indeed, NLR has been reported to be related to the different stages of schizophrenia, supporting inflammation or immune hypothesis in schizophrenia [56]. 
For Classes III and V, the elevated phosphorus accompanied by other laboratory test changes such as HDL and electrolytes indicates that those patients may have ongoing metabolic syndrome. The specific association between these laboratory tests and subtype classes implies distinctive underlying biology. These tests may be used as biomarkers for subtyping and treatment evaluation. Third, the features associated with the subtype classes provide new insights for our understanding of disease pathophysiology and new strategies for treatment. For example, Class IV had a significant reduction of negative symptoms, but not of positive symptoms, after the Phase I trial. An elevated NLR from a simple complete blood count (CBC) often reveals ongoing inflammation. This could explain why positive symptoms were not reduced among those patients, for whom treatment including adjunctive use of nonsteroidal anti-inflammatory drugs might be beneficial, as previously reported [57]. Class IV was also associated with older age when the patients were first prescribed antipsychotic medicine and with a family history of mental illness. This subtype also had an improvement in reasoning after Phase I. When all this information is considered together, some intriguing questions emerge. Is there a relationship between positive symptoms and inflammation or immune dysfunction? Would a treatment strategy that combines antipsychotics and anti-inflammatory drugs or immune therapy perform better for Class IV patients than a standard antipsychotic treatment? Do the age of onset (inferred from the age at the first prescription of antipsychotics) and family history of mental illness relate to treatment outcomes and reasoning? For Class V, due to its link to metabolic syndrome, would simultaneous treatment with antipsychotics and drugs for metabolic syndrome, or avoidance of antipsychotic drugs that increase the risk of developing metabolic syndrome, lead to better outcomes? 
Is an elevated level of phosphorus related to treatment resistance? Further investigation of these questions could help our understanding of subtype-specific mechanisms and provide the basis for the development of subtype-specific treatments.

Our study has some limitations. First, although we found class-specific associations with symptom counts, treatment outcomes, cognitive improvements, and clinical laboratory tests in CATIE patients, we could not validate these findings with independent samples. This is largely due to the limitation of available data. Further study with appropriate data is necessary to confirm our results. Second, although we screened 25 comorbid traits, our screening was by no means exhaustive. When more traits are screened, more candidate traits could be included and different cluster structures may be found. It may require multiple iterations and additional comorbid traits to reach an optimal cluster structure. Third, the demographics and the clinical data from the MGS and SSCCS datasets are currently not publicly available. Based on the diagnosis only, we may have missed some hidden confounder factors when we compare the CATIE sample with those two samples.

In summary, this study provides the first demonstration that differentially shared genetic variants between comorbid traits can be utilized to subtype schizophrenia into classes associated with specific clinical features, treatment outcomes, cognitive improvements, and laboratory tests. Patients in the CATIE study were classified into 5 classes. Classes I and IV had varying levels of treatment response as measured by symptom reduction and had cognitive improvements for all measured domains, with Class I having better outcomes. They could be considered the treatment-responsive group. Classes III and V had no symptom reduction and only partial cognitive improvement after the Phase I trial. They could be considered the treatment-resistant group. Using laboratory tests as a measure, Class IV may have ongoing inflammation or immune dysfunction, and Classes III and V may have metabolic syndrome. This classification may be translated into a class-specific treatment strategy. Our study is of clinical importance and mechanistic significance; it provides evidence that data-driven, biology-based subtyping and subtype-specific treatment of schizophrenia may be achievable. However, due to the limitation of data availability mentioned above, our findings need further validation.

4. Methods

4.1. Genotype Datasets

We applied for and obtained the genotype and clinical data for the MGS [43], SSCCS [7], and CATIE [52, 53] datasets from the NIMH Genetics Repository (https://www.nimhgenetics.org/). Both the MGS and SSCCS datasets were large genetic studies of schizophrenia using a case-control design. The CATIE study was a clinical trial to evaluate the efficacy of antipsychotic treatment for schizophrenia. The MGS and CATIE datasets were genotyped with the Affymetrix 6.0 microarray with about 906,600 SNPs. The SSCCS was typed with the Illumina OmniExpress array with 713,599 SNPs. In order to have the same markers across the MGS, SSCCS, and CATIE datasets, we used IMPUTE2 [58] to impute the downloaded genotypes with the 1000 Genomes haplotypes as reference. Markers with an INFO value <0.4 were filtered out. Details of imputation were described previously [59].

The genetic scores were used for the evaluation of the genetic relationships between schizophrenia and comorbid traits. Imputed genotype data were used for PRS calculation from MGS, SSCCS, and CATIE datasets including subjects (cases and controls). All subjects used were of European ancestry.

For cluster analyses or subtyping, only cases (affected schizophrenia individuals) were used as we are only interested in the subtype for the patients. The MGS dataset had 2,681 cases, SSCCS had 2,895 cases, and CATIE had 435 cases. Overall, MGS and SSCCS were used to discover traits comorbid with schizophrenia and explore cluster structure. The CATIE was used to verify the cluster structure and validate subtypes with clinical symptoms, treatment outcomes, neurocognitive functions, and laboratory tests (Figure 1).

4.2. Comorbid Trait Selection and Genetic Relationship with Schizophrenia

Based on a literature search with the keywords “schizophrenia”, “comorbidity”, and “genome-wide association studies”, or “GWAS” at https://pubmed.ncbi.nlm.nih.gov/, we selected 25 psychiatric and physical diseases/traits that are comorbid with schizophrenia (Table S1). We then downloaded the GWAS summary statistics from the GWAS catalog website and other sources. We calculated PRSs for each specific trait [60] using markers with P values ≤0.05 in both schizophrenia and the candidate traits, a threshold that had been shown to optimally capture phenotypic variance in a previous study [8]. Scores were weighted by the logarithm of the odds ratio (OR) for dichotomous traits or by beta for quantitative traits according to PLINK [61].
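
The scoring rule described above can be sketched in a few lines of Python. This is a simplified illustration, not the PLINK implementation used in the study: it takes a plain weighted sum of risk-allele dosages, whereas PLINK's scoring options may normalize differently. The function names, marker identifiers, and numeric values are hypothetical.

```python
import math


def select_weights(scz_p, trait_stats, p_max=0.05, dichotomous=True):
    """Keep markers with P <= p_max in both GWASs and assign a weight.

    scz_p:       {marker: P value in the schizophrenia GWAS}
    trait_stats: {marker: (effect, P value)} for the comorbid trait, where
                 effect is an odds ratio (dichotomous) or a beta (quantitative).
    """
    weights = {}
    for marker, (effect, p_trait) in trait_stats.items():
        if marker not in scz_p:
            continue
        if scz_p[marker] <= p_max and p_trait <= p_max:
            weights[marker] = math.log(effect) if dichotomous else effect
    return weights


def polygenic_risk_score(dosages, weights):
    """Weighted sum of allele dosages over the selected markers."""
    return sum(weights[m] * d for m, d in dosages.items() if m in weights)


# Hypothetical summary statistics and one subject's allele dosages.
scz_p = {"rs1": 0.001, "rs2": 0.03, "rs3": 0.01}
trait_stats = {"rs1": (1.20, 0.01), "rs2": (0.90, 0.04), "rs3": (1.50, 0.20)}
weights = select_weights(scz_p, trait_stats)
score = polygenic_risk_score({"rs1": 2, "rs2": 1, "rs3": 0}, weights)
print(sorted(weights), round(score, 4))
```

In this toy input, rs3 is dropped because its trait P value exceeds 0.05, so only rs1 and rs2 contribute to the score.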

Next, we evaluated the genetic relationships between these traits and schizophrenia using logistic regression. Only those traits that had regression P values ≤0.15 in both the MGS and SSCCS datasets and the same direction of effect in both were included as potential classifiers for cluster analyses.
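
The inclusion rule above amounts to a simple filter, sketched here in Python. The function name, data layout, and regression values are hypothetical and serve only to show the two conditions (P value threshold in both datasets, and matching sign of effect).

```python
def select_traits(results, p_max=0.15):
    """Return traits with P <= p_max in both datasets and concordant effect sign.

    results: {trait: {"MGS": (beta, p), "SSCCS": (beta, p)}}
    """
    selected = []
    for trait, by_dataset in results.items():
        beta_mgs, p_mgs = by_dataset["MGS"]
        beta_sw, p_sw = by_dataset["SSCCS"]
        same_direction = beta_mgs * beta_sw > 0
        if p_mgs <= p_max and p_sw <= p_max and same_direction:
            selected.append(trait)
    return selected


# Hypothetical regression results for three candidate traits.
results = {
    "trait_A": {"MGS": (0.12, 0.01), "SSCCS": (0.08, 0.10)},    # kept
    "trait_B": {"MGS": (0.10, 0.02), "SSCCS": (-0.05, 0.03)},   # opposite signs
    "trait_C": {"MGS": (0.07, 0.30), "SSCCS": (0.06, 0.04)},    # P too large in MGS
}
print(select_traits(results))
```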

4.3. Cluster Analyses

We used the R platform to conduct our analyses. PRSs for schizophrenia [8] and the 10 selected traits were used in cluster analyses. The 10 selected traits were PGC Phase II bipolar disorder (BIP) [44]; body mass index (BMI) [62]; cannabis dependence (cannabis) [45]; ever smokers vs never smoker (evrSmk) [46]; working memory (MEM) [48]; verbal and numeric reasoning (VNR) [48]; neoopenness (OPEN) [49]; one person income per household (OPPH) [63]; subjective wellbeing (SWB) [47]; and years of schooling (YoS) [50]. We used the “NbClust” package [51] to explore the appropriate solution for the number of clusters based on Euclidean distances. The majority rule was used to select the number of clusters. In these analyses, the MGS dataset was used to explore likely cluster structure with different clustering parameters. Once a reasonable structure was found, the SSCCS dataset was used to validate the cluster structure using the same parameters. When a stable and consistent cluster structure was identified, the same parameters would be applied to the CATIE dataset to cluster the patients. The clusters from the CATIE dataset were used for membership association analyses with clinical, neurological, and laboratory data.
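
The majority rule for choosing the number of clusters can be illustrated with a short Python sketch. It is not the NbClust code itself: it assumes that each validity index has already proposed a number of clusters and simply counts the proposals. The index labels are placeholders, and only the 11-of-20 tally for the 4-cluster solution in MGS comes from the text; the split of the remaining 9 votes is made up.

```python
from collections import Counter


def majority_rule(index_votes):
    """Return (k, n_votes) for the cluster number proposed by the most indices.

    index_votes: {index_name: proposed number of clusters}
    Ties are resolved in favour of the value encountered first.
    """
    if not index_votes:
        raise ValueError("no index votes supplied")
    counts = Counter(index_votes.values())
    best_k, n_votes = counts.most_common(1)[0]
    return best_k, n_votes


# 20 placeholder indices: 11 propose 4 clusters; the other 9 are hypothetical.
votes = {f"index_{i}": 4 for i in range(11)}
votes.update({f"index_{i}": 2 for i in range(11, 16)})
votes.update({f"index_{i}": 3 for i in range(16, 20)})
print(majority_rule(votes))
```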

4.4. Clinical and Neurocognitive Data

The CATIE study was a double-blind randomized clinical trial to evaluate the effectiveness of antipsychotic drugs. There were 5 drugs used in the Phase I trial: perphenazine, olanzapine, quetiapine, risperidone, and ziprasidone. Participants were assigned a drug randomly and evaluated with extensive assessments for clinical symptoms, neurocognitive functions, and laboratory tests at enrollment (baseline) and at regular follow-ups for 18 months. We used the data obtained at baseline, at Visit 6 (about 6 months into Phase I), and at the end of the Phase I trial. Clinical symptoms were evaluated with the positive and negative syndrome scale (PANSS) [64]. Symptom count data were treated as quantitative measures without any transformation. We used the difference in symptom counts between the baseline and the end of the Phase I trial to define treatment outcomes. If the difference was statistically significant, then the treatment was judged to be effective. We used a paired t-test to compare the means for each symptom category separately. Data from the structured clinical interview for DSM IV (SCID), the clinical global impression (CGI), and treatment discontinuation were used in post hoc analyses. We used the neurocognitive data from baseline and Visit 6 to determine the improvement of neurocognitive functions. Neurocognitive functions, as defined in the CATIE study, included vigilance (VIGILANCE), processing speed (PROCESSING SPEED), reasoning (REASONING), and working memory (WORKING MEMORY). We took the data as provided by the NIMH Genetics Repository. More details of the CATIE neurocognitive tests were described elsewhere [65, 66].

4.5. Clinical Laboratory Test Data

The CATIE study collected data for some standard laboratory tests at enrollment and at follow-up checks. For this study, we selected the data for bilirubin, uric acid, prolactin, and cell counts for neutrophils, eosinophils, lymphocytes, and monocytes. The motivation for including laboratory tests was to evaluate whether any of them had the potential to serve as biomarkers for the classes resulting from our cluster analysis. The tests selected were related to oxidative stress, inflammation, hyperprolactinemia, and immune functions. It has been suggested in the literature that oxidative stress could be an underlying factor for schizophrenia [67] and that antipsychotic medication leads to increased levels of prolactin [68]. Dysregulation of the immune system and inflammation in patients with schizophrenia are well documented [69–71]. Data from both the baseline and the end of Phase I were used.

4.6. Statistical Analyses

Once we grouped the patients into classes, we used linear and logistic regression to evaluate whether these classes were associated with clinical features, treatment outcomes, neurocognitive functions, and laboratory tests. Baseline assessments of clinical symptoms, neurocognitive functions, and laboratory tests were used in these analyses. Functional data were treated as quantitative outcomes, and class memberships were treated as factorial predictors with Class I as the reference; sex, age, and assigned antipsychotics were included as covariates. The outcomes of Phase I treatment were evaluated by paired t-tests comparing the means of tested items between the baseline, Visit 6, and the end of the Phase I trial. In all tests, P values were reported without multiple test correction.
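The role of Class I as the reference can be made concrete with a small sketch of the linear model. Each non-reference class receives a 0/1 indicator, so its coefficient estimates the covariate-adjusted difference from Class I. The Python code below is a minimal illustration under stated assumptions and is not the analysis code of this study: the function names are hypothetical, the data are synthetic and noise-free, only age and sex are included as covariates (the assigned antipsychotic would be indicator-coded in the same way), and the logistic case is not shown.

```python
LEVELS = ("I", "II", "III", "IV", "V")


def design_row(cls, age, is_male, reference="I"):
    """Intercept, one 0/1 indicator per non-reference class, then covariates."""
    dummies = [1.0 if cls == lv else 0.0 for lv in LEVELS if lv != reference]
    return [1.0] + dummies + [float(age), float(is_male)]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        if abs(A[piv][c]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Synthetic subjects: (class, age, is_male). Values are illustrative only.
subjects = [("I", 30, 1), ("I", 45, 0), ("II", 28, 0), ("II", 52, 1),
            ("III", 35, 1), ("III", 41, 0), ("IV", 33, 0), ("IV", 60, 1),
            ("V", 25, 1), ("V", 48, 0)]
# Assumed coefficients: intercept, II, III, IV, V (each vs. Class I), age, sex.
true_beta = [50.0, -3.0, 1.0, -5.0, 2.0, 0.1, 2.0]

X = [design_row(*s) for s in subjects]
y = [sum(b * x for b, x in zip(true_beta, row)) for row in X]  # no noise
beta_hat = ols(X, y)
print([round(b, 6) for b in beta_hat])
```

Because the synthetic outcome contains no noise, the fitted coefficients recover the assumed values up to rounding error; with real data, each class coefficient would carry a standard error and a P value.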

Data Availability

All data used in this study were obtained from the NIMH Genetic Repository (https://www.nimhgenetics.org/) and are available to qualified researchers.

Conflicts of Interest

The authors have no conflict of interest.

Authors’ Contributions

TM and JW collected and processed genetic and phenotypical data used in this study and conducted polygenic analyses for comorbid traits. EH, VN, KSK, DA, EO, and AN contributed to the writing of the article. JC and XC conceived the idea, conducted cluster and association analyses, and wrote the paper. All authors reviewed the paper and agreed on its content.


Acknowledgments

This work was supported in part by grant MH101054 to XC from the National Institute of Mental Health, pilot grant U54GM104944 to JC from NIGMS-CTRIN, and grant P20 GM121325 to JC from NIGMS-COBRE. We thank the subjects and investigators involved in the MGS, SSCCS, and CATIE studies.

Supplementary Materials

Supplementary Table S1: traits screened for genetic relationship with schizophrenia. Table S2: baseline laboratory tests in Classes II to V as compared to Class I. Table S3: class association with age when antipsychotics were first prescribed and with family history. Table S4: class association with the use of tobacco products. Table S5: class association with treatment discontinuation. Table S6: Phase I discontinuation outcome.