Review Article

Complexity of Gene Expression Evolution after Duplication: Protein Dosage Rebalancing

Table 1

Analysis of the duplication-degeneration-complementation (DDC) model using expression profiles of within-species paralogs (gene X versus genes Y1/Y2).

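Column 1 (predicted by DDC): X is more similar to Y1 + Y2 than to Y1 and than to Y2. Number of cases: 16
Column 2 (contrary to DDC prediction): X is less similar to Y1 + Y2 than to Y1 and than to Y2. Number of cases: 15
Column 3 (contrary to DDC prediction): X is more similar to Y1 + Y2 than to one of Y1 and Y2, but less similar than to the other. Number of cases: 46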

P = 0.24, for 16 (expected 0.25) versus 15 + 46 (expected 0.75)
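The comparison of 16 cases against 15 + 46 cases under an expected fraction of 0.25 can be sketched as a binomial tail probability. This is a minimal illustration, not the author's script: the source does not say whether the test was one-sided or two-sided, so the lower-tail form used here is an assumption, and the function name `binomial_lower_tail` is hypothetical.

```python
from math import comb


def binomial_lower_tail(k: int, n: int, p: float) -> float:
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Counts taken from the table.
ddc_cases = 16            # column 1 (predicted by DDC)
other_cases = 15 + 46     # columns 2 and 3 (contrary to DDC prediction)
n_total = ddc_cases + other_cases

# Assumption: one-sided test against the expected fraction 0.25.
p_value = binomial_lower_tail(ddc_cases, n_total, 0.25)
print(f"one-sided lower-tail probability: {p_value:.3f}")
```

The expected fraction of 0.25 for column 1 is the value quoted in the table row; the sketch only checks how unusual 16 out of 77 is under that expectation.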

Kendall’s rank correlation coefficient was used to measure the similarity between expression profiles of pairs of human-mouse paralogs (I analyzed cases in which one genome contains a single gene copy, X, and the other genome contains two copies, Y1 and Y2). The number of cases where the expression profile of X shows a greater similarity to the combined expression profile (Y1 + Y2), as predicted by the DDC model (the first column), is compared with the number of cases where X shows a greater similarity to Y1, Y2, or both (the second and third columns) using the binomial test. The ortholog-paralog cluster construction protocol began with an all-against-all comparison of protein sequences from the analyzed human and mouse genomes using the BLASTP program, with low-complexity regions masked by the SEG program [34]. In the second step, orthologs were identified as symmetrical best hits. Paralogs were delineated from within-species and between-species BLASTP hits (E-value < 10^-20) by single-linkage clustering, with 50% identity as the threshold [34]. RPKM values, that is, reads per kilobase of exon model per million mapped reads [33], were calculated from the read counts for each of the four tissues shared by human and mouse (heart, kidney, liver, and lung) [34]. The expression data and clusters of orthologs and paralogs are available at ftp://ftp.ncbi.nlm.nih.gov/pub/managdav/paper_suppl/ortholog_conjecture/.
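The classification of a single X, Y1, Y2 triplet into the three table columns can be sketched as follows. This is an illustrative sketch under stated assumptions, not the author's pipeline: the expression values are invented, the function names (`rpkm`, `kendall_tau`, `classify`) are hypothetical, Kendall's tau is computed in its simple tau-a form, and ties between correlation values are reported separately because the source does not say how they were handled.

```python
from itertools import combinations

TISSUES = ("heart", "kidney", "liver", "lung")


def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads."""
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


def kendall_tau(a, b):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    pairs = list(combinations(range(len(a)), 2))
    score = 0
    for i, j in pairs:
        s = (a[i] - a[j]) * (b[i] - b[j])
        score += (s > 0) - (s < 0)  # tied pairs contribute zero
    return score / len(pairs)


def classify(x, y1, y2):
    """Assign a triplet to one of the three table columns."""
    combined = [u + v for u, v in zip(y1, y2)]
    t_sum = kendall_tau(x, combined)
    t1, t2 = kendall_tau(x, y1), kendall_tau(x, y2)
    if t_sum > t1 and t_sum > t2:
        return "column 1 (predicted by DDC)"
    if t_sum < t1 and t_sum < t2:
        return "column 2 (contrary to DDC prediction)"
    if (t_sum > t1 and t_sum < t2) or (t_sum < t1 and t_sum > t2):
        return "column 3 (contrary to DDC prediction)"
    return "undetermined (tie)"  # assumption: ties are left unassigned


# Invented RPKM-like values in the order given by TISSUES.
x = [8.0, 6.0, 4.0, 2.0]
y1 = [5.0, 1.0, 3.0, 0.5]
y2 = [2.0, 4.5, 0.5, 1.0]
print(classify(x, y1, y2))
```

With only four tissues the correlation coefficient can take just a handful of distinct values, so ties are common; any real analysis would need an explicit rule for them.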