Identification and Quantification of Genomic Repeats and Sample Contamination in Assemblies of 454 Pyrosequencing Reads
Table 1
Annotation of the high-read-depth contigs in the P. rubescens and A. flos-aquae assemblies. For each contig with an estimated copy number of at least 5x, the table lists its length, read depth, and estimated copy number ("Est. copy number") with the lower and upper confidence interval limits (CL). BLASTX results are also shown (maximum E value 10⁻¹⁶). When a contig had hits in several distinct regions, the hits are separated by commas. The species to which each BLAST hit belongs is shown in parentheses.
(a) P. rubescens
| Contig | Length (bp) | Read depth | Est. copy number | Lower CL | Upper CL | Features |
|---|---:|---:|---:|---:|---:|---|
| 13664 | 1309 | 259.7 | 12.4 | 9.1 | 19.1 | transposase, IS4 family protein (Nostoc punctiforme PCC 73102) |
| 13972 | 937 | 190.3 | 9.2 | 6.5 | 15.1 | transposase, IS4 family protein (Cyanothece sp. PCC 8802) |
| 13823 | 1190 | 180.2 | 8.9 | 6.3 | 12.9 | transposase (Microcystis aeruginosa NIES-843) |
| 13688 | 1109 | 176.3 | 8.8 | 5.8 | 12.9 | transposase (Trichodesmium erythraeum IMS101) |
| 136 | 3509 | 173.3 | 8.6 | 5.6 | 13.7 | No hits |
| 13792 | 9424 | 173.1 | 8.7 | 5.3 | 13.3 | hypothetical protein Npun_R2618 (Nostoc punctiforme PCC 73102), DnaB domain protein helicase domain protein (Cyanothece sp. PCC 7822) |
| 13610 | 852 | 163.2 | 8.0 | 5.2 | 12.6 | No hits |
| 13711 | 1051 | 145.0 | 7.2 | 5.3 | 10.2 | No hits |
| 13735 | 2163 | 144.0 | 7.2 | 4.0 | 11.5 | conserved hypothetical protein (Cyanothece sp. PCC 7425) |
| 13901 | 611 | 140.9 | 7.1 | 4.7 | 9.8 | transposase, IS605 OrfB family (Cyanothece sp. PCC 8801) |
| 13846 | 902 | 132.8 | 7.2 | 2.5 | 9.8 | hypothetical protein L8106_22791 (Lyngbya sp. PCC 8106) |
| 14014 | 770 | 123.8 | 6.0 | 4.3 | 9.8 | Histone-like DNA-binding protein (Lyngbya sp. PCC 8106) |
| 13843 | 1712 | 111.4 | 5.6 | 3.5 | 8.5 | RNA-directed DNA polymerase (Microcystis aeruginosa NIES-843) |
| 13469 | 669 | 104.2 | 5.2 | 3.8 | 7.4 | No hits |
| 13858 | 641 | 99.5 | 5.1 | 3.0 | 7.2 | hypothetical protein L8106_22631 (Lyngbya sp. PCC 8106) |
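The inclusion rules stated in the Table 1 caption (an estimated copy number of at least 5x, and BLASTX hits with an E value no larger than 10⁻¹⁶) amount to a simple filter. The Python sketch below illustrates those two rules under stated assumptions; it is not the authors' pipeline. The `Contig` record, the function names, and the extra low-copy contig in the usage example are hypothetical; the other three records are copied from Table 1(a). The copy-number estimates and confidence limits are taken as given rather than recomputed. Sorting by read depth simply reproduces the row order seen in Table 1(a).

```python
from dataclasses import dataclass
from typing import Iterable, List

# Reporting thresholds as stated in the Table 1 caption.
MIN_COPY = 5.0      # minimum estimated copy number for a contig to be listed
MAX_EVALUE = 1e-16  # largest BLASTX E value accepted as a feature annotation


@dataclass(frozen=True)
class Contig:
    """One row of Table 1 (hypothetical record type, for illustration only)."""
    contig_id: str
    length_bp: int
    read_depth: float
    est_copy: float
    lower_cl: float
    upper_cl: float


def keep_hit(evalue: float, max_evalue: float = MAX_EVALUE) -> bool:
    """A BLASTX hit is reported only if its E value is at most the cutoff."""
    return evalue <= max_evalue


def ci_brackets_estimate(c: Contig) -> bool:
    """Sanity check: the point estimate should lie within its confidence limits."""
    return c.lower_cl <= c.est_copy <= c.upper_cl


def high_copy_table(contigs: Iterable[Contig], min_copy: float = MIN_COPY) -> List[Contig]:
    """Keep contigs at or above the copy-number cutoff, highest read depth first."""
    selected = [c for c in contigs if c.est_copy >= min_copy]
    return sorted(selected, key=lambda c: c.read_depth, reverse=True)


# Three rows copied from Table 1(a), plus one HYPOTHETICAL single-copy contig
# (not from the paper) that the copy-number filter should drop.
rows = [
    Contig("13858", 641, 99.5, 5.1, 3.0, 7.2),
    Contig("13664", 1309, 259.7, 12.4, 9.1, 19.1),
    Contig("13972", 937, 190.3, 9.2, 6.5, 15.1),
    Contig("hypothetical_low_copy", 500, 20.0, 1.0, 0.6, 1.5),
]

table = high_copy_table(rows)
for c in table:
    print(c.contig_id, c.read_depth, c.est_copy, ci_brackets_estimate(c))
```

Keeping the thresholds as default arguments makes it easy to see how the list would shrink or grow under a stricter or looser copy-number cutoff.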