Research Article

Iterative Variable Gene Discovery from Whole Genome Sequencing with a Bootstrapped Multiresolution Algorithm

Figure 4

(a) Density distributions from the iterative learning algorithm of VgeneFinder over successive iterations, using 14 primate WGS datasets. (b) Total number of sequences as a function of iteration for two feature vector transforms: the AA frequency transform considers consecutive pairs of amino acids, while the AA physicochemical transform forms a feature vector from physical properties that depend on the position of the amino acids. (c) Number of sequences below the prediction threshold as a function of iteration, indicating that exons quite distant from the initial training set (but nonetheless viable V-genes) are gradually included as the iterative process evolves. (d) Example of a TRAV multispecies tree for the starting set (with H. sapiens) and 2 iterations (see the more detailed view in Figure 5(c)).
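As a rough illustration of the AA frequency transform mentioned in panel (b), the sketch below builds a feature vector from counts of consecutive amino acid pairs. It is only a minimal sketch under stated assumptions, not VgeneFinder's implementation: the function name `dipeptide_frequencies`, the 20-letter alphabet, the normalization to relative frequencies, and the skipping of pairs containing non-standard residues are all choices made here for illustration.

```python
from collections import Counter
from itertools import product

# Assumption: the standard 20-letter amino acid alphabet.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 400 ordered pairs, in a fixed order so every vector has the same layout.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
VALID_PAIRS = set(DIPEPTIDES)


def dipeptide_frequencies(sequence):
    """Return a 400-element list of relative frequencies of consecutive
    amino acid pairs in `sequence` (hypothetical helper, for illustration).

    Pairs containing characters outside AMINO_ACIDS are ignored.
    """
    seq = sequence.upper()
    # Overlapping windows of length 2: positions (0,1), (1,2), ...
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    counts = Counter(p for p in pairs if p in VALID_PAIRS)
    total = sum(counts.values())
    if total == 0:
        # Sequence too short or no valid pairs: return an all-zero vector.
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


if __name__ == "__main__":
    vec = dipeptide_frequencies("ACAC")
    print(len(vec), vec[DIPEPTIDES.index("AC")])
```

Vectors of this kind have a fixed length regardless of exon length, which is one reason pair-frequency features are convenient as classifier input; a position-dependent physicochemical transform would need a different construction.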