Abstract

Chronic myeloid leukemia (CML) is a myeloproliferative disease derived from an abnormal hematopoietic stem cell (HSC) and is consistently associated with the formation of the Philadelphia (Ph) chromosome. Tyrosine kinase inhibitors (TKIs) are highly effective in treating chronic phase CML but do not eliminate leukemia stem cells (LSCs), which are believed to be related to disease relapse. Therefore, one major challenge in current CML research is to understand the biology of LSCs and to identify the molecular differences between LSCs and their normal stem cell counterparts. Comparing the gene expression profiles of LSCs and normal HSCs by DNA microarray assay is a systematic and unbiased approach to this issue. In this paper, we present a DNA microarray dataset for CML LSCs and normal HSCs to show that the microarray assay will benefit current and future studies of the biology of CML stem cells.

1. Introduction

Chronic myeloid leukemia (CML) is a clonal myeloproliferative disorder that originates from an abnormal hematopoietic stem cell (HSC) harboring the Philadelphia chromosome (Ph) [1–3]. The Ph is caused by a reciprocal translocation between chromosomes 9 and 22 (t(9; 22)(q34; q11)), which forms the BCR-ABL oncogene encoding the chimeric BCR-ABL protein [1, 4]. BCR-ABL is a constitutively active tyrosine kinase, which has provided the molecular basis for designing therapeutic drugs. Imatinib was the first tyrosine kinase inhibitor (TKI) found to be highly effective in CML treatment through inhibiting BCR-ABL kinase activity [5–7]. Targeted therapy with TKIs induces a complete hematologic and cytogenetic response in more than 90% of chronic phase CML patients [8]. However, TKIs cannot efficiently eradicate leukemia stem cells (LSCs) of CML because LSCs are insensitive to TKIs [9, 10]. Therefore, developing new strategies to target LSCs is necessary and critical for curing CML, and the success of this approach requires a full understanding of the biology of LSCs. Although increasing evidence has demonstrated that CML stem cells utilize signaling pathways that are independent of BCR-ABL kinase activity for their maintenance and survival [11, 12], the underlying molecular mechanisms remain unclear. To address these challenging issues, we performed a DNA microarray assay to compare the gene expression profiles of LSCs and normal HSCs. Here, we present the entire microarray dataset in open access format to demonstrate the usefulness of the microarray data in studying LSCs in CML.

2. Methodology

Our previous study has shown that BCR-ABL-expressing Lin−Sca-1+c-Kit+ (LSK) cells function as LSCs [10]. In order to isolate LSCs, eight- to twelve-week-old C57BL/6J mice were used for the induction of CML by BCR-ABL transduction of bone marrow cells in our bone marrow transduction/transplantation mouse model [13–15]. As a parallel control, non-BCR-ABL-expressing bone marrow cells (transduced with the empty GFP vector) were also transplanted into recipient mice, and GFP+LSK cells derived from these mice were used as normal hematopoietic stem cells. At day 13 after bone marrow transplantation, 30 CML mice were treated with imatinib and another 30 CML mice were treated with placebo. Similarly, 30 control mice were treated with imatinib and another 30 control mice were treated with placebo (Table 1). The drug was given orally by gavage in a volume of 0.25 mL (100 mg/kg; 6 doses in total; 1 dose every 4 hours). Water was used as placebo. The mice were sacrificed at day 14 after bone marrow transplantation to collect bone marrow cells for the isolation of total RNA. Red blood cells (RBCs) were depleted using RBC lysis buffer (containing NH4Cl, KHCO3, and EDTA). For antibody staining, bone marrow cells were suspended in staining medium (Hank’s Balanced Salt Solution (HBSS) with 2% heat-inactivated calf serum) and incubated at 4°C for 15 min with a biotin-labeled lineage antibody cocktail containing antibodies against CD3, CD4, CD8, B220, Gr-1, Mac-1, and Ter119, so that lineage-positive cells could be identified and excluded. After washing, the fluorochrome-labeled secondary antibody that recognizes biotin (APC-Cy7-conjugated streptavidin), together with PE-conjugated c-Kit and APC-conjugated Sca-1 antibodies, was added to the cells for 15 min at 4°C in the dark. All antibodies were purchased from eBioscience, USA. After washing, LSCs and normal HSCs were sorted by fluorescence-activated cell sorting (FACS). About 20,000 cells were sorted directly into RLT buffer (RNeasy Micro Kit, Qiagen, CA, USA) (Figure 1).

Total RNA was extracted using the RNeasy Micro Kit (Qiagen, CA, USA), and RNA quality was assessed using a 2100 Bioanalyzer instrument and the RNA 6000 Pico LabChip assay (Agilent Technologies, Palo Alto, CA, USA). The ribosomal RNA ratios (28S/18S) were around 1.4, the RNA integrity numbers (RINs) of all samples were above 8, and all RNA samples had an OD 260/280 ratio of 1.8 to 2. Using the GeneChip Whole Transcript Sense Target Labeling Assay kit (Affymetrix, Santa Clara, CA, USA), 20 ng of total RNA was reverse transcribed with random hexamers tagged with a T7 sequence. The double-stranded cDNA generated was then amplified by T7 RNA polymerase to produce cRNA. Second-cycle first-strand cDNA synthesis was performed with incorporation of dUTP, which later served as the sites of fragmentation by a uracil DNA glycosylase and apurinic/apyrimidinic endonuclease 1 enzyme mix. The fragmented cDNA was then labeled by terminal transferase, which attached a biotin molecule using the Affymetrix proprietary DNA labeling reagent. Approximately 2.0 μg of fragmented and biotin-labeled cDNA was hybridized onto a Mouse Gene ST 1.0 Array (Affymetrix, Santa Clara, CA, USA) for 16 hours at 45°C. Posthybridization staining and washing were performed according to the manufacturer's protocols using the Fluidics Station 450 instrument (Affymetrix). Finally, the arrays were scanned with a GeneChip Scanner 3000. Images were acquired, and CEL files were generated and then used for analysis.

Microarray data analysis was performed as follows. The average signal intensities for each probe set within arrays were calculated by the rma function provided within the affy package for R. The quality control of the microarray data is shown in Figure 2. The RMA method incorporates convolution background correction and summarization based on a multiarray model that is fitted robustly using the median polish algorithm. Data were quantile-normalized to bring arrays onto a common scale. In order to systematically identify the critical genes that are specific for LSCs, we used the Shaoguang/Dongguang Li (SDLI) optimization method, which we developed [16]. Exploratory hierarchical clustering analysis was conducted to classify samples across time points. Array preprocessing and fold-change analysis were conducted for each batch separately. The data were sorted to show the pattern of the fold changes for all genes. Expression of all genes was analyzed simultaneously. To find the gene subset that corresponds to LSC function at the minimum/maximum extent, an Orthogonal Array (OA) sampling procedure was performed to construct a multi-subset class predictor with a pyramidal hierarchy [16]. The detailed mathematical information is described as follows.
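The quantile normalization step mentioned above can be illustrated with a minimal Python sketch. This is not the affy implementation: it works on a small hypothetical matrix (`toy`), breaks ties by position instead of averaging them, and the function name `quantile_normalize` is introduced here only for illustration.

```python
from statistics import mean

def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length lists (one list per array).

    Simplification: ties are broken by original position, not averaged.
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all arrays must have the same number of probes")
    # For each array, probe indices ordered from lowest to highest intensity.
    orders = [sorted(range(n), key=col.__getitem__) for col in columns]
    # Reference distribution: mean intensity across arrays at each rank.
    reference = [
        mean(col[order[rank]] for col, order in zip(columns, orders))
        for rank in range(n)
    ]
    # Write the reference value for each rank back to the probe holding that rank.
    normalized = []
    for order in orders:
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized

# Hypothetical intensities for 3 arrays x 4 probes (not taken from the dataset).
toy = [[5.0, 2.0, 3.0, 4.0],
       [4.0, 1.0, 4.5, 2.0],
       [3.0, 4.0, 6.0, 8.0]]
normalized = quantile_normalize(toy)
```

After normalization every array shares the same set of values, and only the assignment of values to probes differs, which is what brings the arrays onto a common scale.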

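To sort the data based on chromosome groups, the average change was calculated: average change $= \frac{1}{n_j}\sum_{i=1}^{n_j} x_{ij}$, where $x_{ij}$ are the expression fold changes and $n_j$ are the numbers of genes in every chromosome group, $i = 1, \ldots, n_j$, $j = 1, \ldots, m$, with $m$ the number of chromosome groups.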

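To enhance the signal-to-noise ratio, the dynamics of all the gene expression levels were calculated: average absolute change $= \frac{1}{n_j}\sum_{i=1}^{n_j} |x_{ij}|$, where $x_{ij}$ are the expression fold changes and $n_j$ are the numbers of genes in every chromosome group, $i = 1, \ldots, n_j$, $j = 1, \ldots, m$. Next, the absolute difference between two expression groups was calculated: absolute difference $= \left| \frac{1}{n_j}\sum_{i=1}^{n_j} x^{(A)}_{ij} - \frac{1}{n_j}\sum_{i=1}^{n_j} x^{(B)}_{ij} \right|$, where $x^{(A)}_{ij}$ and $x^{(B)}_{ij}$ are the expression fold changes in the two expression groups and $n_j$ are the numbers of genes in every chromosome group, $i = 1, \ldots, n_j$, $j = 1, \ldots, m$.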
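As a rough illustration of the per-chromosome summaries described above, the following Python sketch computes the average change and the average absolute change for each chromosome group. The function name, the record layout (chromosome label plus signed fold change), and the toy values are assumptions made for this example and are not taken from the dataset.

```python
from collections import defaultdict

def summarize_by_chromosome(records):
    """Summarize fold changes per chromosome group.

    records: iterable of (chromosome_label, fold_change) pairs.
    Returns {chromosome: (n_genes, average_change, average_absolute_change)}.
    """
    groups = defaultdict(list)
    for chromosome, fold_change in records:
        groups[chromosome].append(fold_change)
    summary = {}
    for chromosome, values in groups.items():
        n = len(values)
        average_change = sum(values) / n
        average_absolute_change = sum(abs(v) for v in values) / n
        summary[chromosome] = (n, average_change, average_absolute_change)
    return summary

# Hypothetical signed fold changes (for example log2 ratios); illustrative only.
toy_records = [("chr1", 1.5), ("chr1", -0.5),
               ("chr2", 2.0), ("chr2", -2.0), ("chr2", 3.0)]
summary = summarize_by_chromosome(toy_records)
```

In the toy data, opposite-signed changes on chr2 partly cancel in the average change but not in the average absolute change, which is why the latter is the more sensitive measure of overall expression dynamics.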

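To find the gene subset that corresponds to the minimum (or maximum) of the objective function to give optimum solutions, an Orthogonal Array (OA) sampling procedure was combined with some search space reduction strategies for constructing a multi-subset class predictor with a pyramidal hierarchy. To discover the optimum solution, we consider a multidimensional continuous function $f(x)$ with multiple global minima and local minima on a subset $G$ of $\mathbb{R}^n$.

(i) Definition of local minima: for a given point $x^* \in G$, if there exists a $\delta$-neighborhood of $x^*$, $O(x^*, \delta)$, such that $f(x) \ge f(x^*)$ for every $x \in O(x^*, \delta)$, then $x^*$ is called a local minimal point of $f(x)$.

(ii) Definition of global minima: if for every $x \in G$ the inequality $f(x) \ge f(x^*)$ is correct, then $x^*$ is called a global minimum of $f(x)$ on $G$ and the global minima of $f(x)$ on $G$ form a global minimum set.

(iii) Finding of the global minima: for a given constant $c_0$ such that the level set $H_{c_0} = \{x \in G : f(x) \le c_0\}$ is nonempty, if $\mu(H_{c_0}) = 0$, where $\mu$ is the Lebesgue measure of $H_{c_0}$, then $c_0$ is the minimum of $f(x)$ and $H_{c_0}$ is the global minimum set. Otherwise, assume that $\mu(H_{c_0}) > 0$ and $c_1$ is the mean value of $f(x)$ on $H_{c_0}$. Then $c_1 = \frac{1}{\mu(H_{c_0})}\int_{H_{c_0}} f(x)\,d\mu$ and $c_0 \ge c_1 \ge f(x^*)$, and then the level set $H_{c_k}$ and mean value $c_{k+1}$ of $f(x)$ on $H_{c_k}$ were gradually constructed as follows:

$H_{c_k} = \{x \in G : f(x) \le c_k\}, \quad c_{k+1} = \frac{1}{\mu(H_{c_k})}\int_{H_{c_k}} f(x)\,d\mu.$

With the assistance of OA’s sampling, a decreasing sequence of mean values $\{c_k\}$ and a sequence of level sets $\{H_{c_k}\}$ were obtained.

Let $c^* = \lim_{k \to \infty} c_k$ and $H^* = \lim_{k \to \infty} H_{c_k}$, where $c^*$ is the minimum of $f(x)$ on $G$ and $H^*$ is the global minimum set.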
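The level-set iteration described above can be sketched in a few lines of Python. Two simplifications are assumed here and should not be read as the authors' implementation: uniform random sampling is swapped in for Orthogonal Array sampling, and the objective is a toy quadratic, not a gene-subset objective. The names `level_set_minimize` and `toy_objective` are hypothetical.

```python
import random

def level_set_minimize(f, bounds, n_samples=200, n_iter=40, tol=1e-9, seed=0):
    """Approximate the global minimum of f on a box by shrinking level sets.

    bounds: list of (low, high) pairs, one per dimension.
    Returns (best_value, best_point, history), where history holds the
    sampled mean values c_0, c_1, ...
    """
    rng = random.Random(seed)
    box = [(float(lo), float(hi)) for lo, hi in bounds]

    def sample(current_box):
        # Uniform random sampling (stand-in for OA sampling).
        points = ([rng.uniform(lo, hi) for lo, hi in current_box]
                  for _ in range(n_samples))
        return [(f(p), p) for p in points]

    scored = sample(box)
    c = sum(v for v, _ in scored) / len(scored)  # c_0: sampled mean on the domain
    history = [c]
    for _ in range(n_iter):
        # Sampled level set: points with f(x) <= c_k.
        level = [(v, p) for v, p in scored if v <= c]
        if not level:  # can only happen through floating-point rounding
            break
        # Search space reduction: bounding box of the level-set samples.
        box = [(min(p[d] for _, p in level), max(p[d] for _, p in level))
               for d in range(len(box))]
        scored = level + [(v, p) for v, p in sample(box) if v <= c]
        c_next = sum(v for v, _ in scored) / len(scored)  # c_{k+1}: mean on level set
        history.append(c_next)
        if c - c_next < tol:
            break
        c = c_next
    best_value, best_point = min(scored, key=lambda item: item[0])
    return best_value, best_point, history

def toy_objective(x):
    # Toy function with its minimum at (1, -2); illustrative only.
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

best_value, best_point, history = level_set_minimize(
    toy_objective, [(-5, 5), (-5, 5)])
```

Each pass keeps only the samples at or below the current mean, so the recorded means form a decreasing sequence and the sampled level sets contract toward the minimizer.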

This analytical method allows us to narrow down candidate genes to a list of 10–20 genes for further functional tests.

3. Dataset Description

The dataset associated with this Dataset Paper consists of 3 items which are described as follows.

Dataset Item 1 (Table). Probe set IDs and signal intensities of 8 samples. The RMA function was used to normalize and calculate the signal intensities. Each sample column presents the normalized signal intensity of one sample: HSC 1 and HSC 2 are HSC samples 1 and 2; LSC 1 and LSC 2, LSC samples 1 and 2; HSC with Imatinib 1 and HSC with Imatinib 2, HSC samples 1 and 2 with imatinib treatment; LSC with Imatinib 1 and LSC with Imatinib 2, LSC samples 1 and 2 with imatinib treatment. A brief summary is also shown in Table 1.

  • Column 1: Probe Set ID
  • Column 2: HSC 1
  • Column 3: LSC 1
  • Column 4: HSC with Imatinib 1
  • Column 5: LSC with Imatinib 1
  • Column 6: HSC 2
  • Column 7: LSC 2
  • Column 8: HSC with Imatinib 2
  • Column 9: LSC with Imatinib 2

Dataset Item 2 (Table). An updated annotation file with gene designations and probe set ID numbers based on tables available from the Affymetrix website. This will aid the reader in finding a gene of interest based on gene symbols and gene products through the probe ID number. The column Species Scientific Name presents the genus and species of the organism represented by the probe set; Sequence Source, the database from which the sequence used to design this probe set was taken; Representative Public ID, the accession number of a representative sequence; Gene Symbol, a gene symbol when one is available (from UniGene). Note that for consensus-based probe sets, the representative sequence is only one of several sequences (sequence subclusters) used to build the consensus sequence and it is not directly used to derive the probe sequences. The representative sequence is chosen during array design as a sequence that is best associated with the transcribed region being interrogated by the probe set.

  • Column 1: Probe Set ID
  • Column 2: GenBank Accession Number
  • Column 3: Species Scientific Name
  • Column 4: Sequence Type
  • Column 5: Sequence Source
  • Column 6: Target Description
  • Column 7: Representative Public ID
  • Column 8: Gene Title
  • Column 9: Gene Symbol

Dataset Item 3 (Table). We compared gene expression between LSCs and HSCs. This item lists the genes that are differentially expressed in LSCs compared with HSCs. The cutoff for relative fold change was set to 2. The column Relative Fold Change 1 presents the relative fold change comparing LSC sample 1 with HSC sample 1, and Relative Fold Change 2 presents the relative fold change comparing LSC sample 2 with HSC sample 2.

  • Column 1: Probe Set ID
  • Column 2: Relative Fold Change 1
  • Column 3: Relative Fold Change 2
  • Column 4: Gene Symbol
  • Column 5: Gene Title
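A filter of the kind described for Dataset Item 3 could look like the Python sketch below. It rests on assumptions that are not stated in the paper: that the intensities in Dataset Item 1 are on a log2 scale (as RMA output usually is), that the relative fold change is the linear LSC/HSC ratio, and that a probe is kept only when both sample pairs pass the cutoff in the same direction. The probe IDs and values are hypothetical.

```python
def fold_change(log2_lsc, log2_hsc):
    """Linear-scale LSC/HSC ratio computed from log2 intensities."""
    return 2.0 ** (log2_lsc - log2_hsc)

def select_probes(rows, cutoff=2.0):
    """Keep probes that change by at least `cutoff`-fold in both sample pairs.

    rows: iterable of (probe_id, hsc1, lsc1, hsc2, lsc2) log2 intensities.
    Returns a list of (probe_id, fold_change_1, fold_change_2).
    """
    selected = []
    for probe_id, hsc1, lsc1, hsc2, lsc2 in rows:
        fc1 = fold_change(lsc1, hsc1)
        fc2 = fold_change(lsc2, hsc2)
        up = fc1 >= cutoff and fc2 >= cutoff
        down = fc1 <= 1.0 / cutoff and fc2 <= 1.0 / cutoff
        if up or down:
            selected.append((probe_id, fc1, fc2))
    return selected

# Hypothetical log2 intensities; probe IDs are placeholders, not real probe sets.
toy_rows = [
    ("probe_a", 7.0, 9.0, 7.5, 8.5),   # up in both pairs
    ("probe_b", 8.0, 8.5, 8.0, 9.5),   # passes the cutoff in only one pair
    ("probe_c", 10.0, 8.0, 9.0, 8.0),  # down in both pairs
]
selected = select_probes(toy_rows)
```

Requiring agreement between the two sample pairs is a conservative choice for a design with two replicates; a looser rule, such as thresholding the mean fold change, would return a longer list.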

4. Concluding Remarks

We present a detailed description of a microarray dataset that allows investigators to compare the gene expression profiles of normal HSCs and LSCs with or without imatinib treatment. It should also make it easier to identify critical regulators of LSCs in CML. By comparing the gene expression profiles of LSCs with those of HSCs, we identified several genes that might be critical for the maintenance and self-renewal of LSCs. A brief gene list is shown in Table 2. Among these genes, the roles of several, including Alox5 [17], Blk [12], and Scd1 [18], have been functionally validated using a series of genetic approaches. Ongoing and further studies from our laboratory and other groups on the biology of LSCs in CML will be facilitated by this dataset in an open access format.

Dataset Availability

The dataset associated with this Dataset Paper is dedicated to the public domain using the CC0 waiver and is available at http://dx.doi.org/10.1155/2013/520285.

Conflict of Interests

The authors declare that there is no conflict of interests.

Acknowledgments

This work was supported by grants from the Leukemia & Lymphoma Society and the National Institutes of Health (R01-CA122142, R01-CA114199) to Shaoguang Li.

Dataset Files

  • 520285.item.1.xlsx: Dataset Item 1 (Table)
  • 520285.item.2.xlsx: Dataset Item 2 (Table)
  • 520285.item.3.xlsx: Dataset Item 3 (Table)