Local Packing Density Is the Main Structural Determinant of the Rate of Protein Sequence Evolution at Site Level
Table 1. Site-specific properties.
| Symbol | Property measured | Name and description |
| --- | --- | --- |
| CS | Rate of evolution | ConSurf rate of evolution: estimated rate relative to the overall average, computed with an empirical Bayesian approach from the phylogenetic tree topology and branch lengths and the JTT probability matrix of amino acid substitutions, as implemented in the ConSurf web server. |
| ET | Sequence variability | Real-valued evolutionary trace: sequence variability score computed as a weighted average of sequence entropy, with weights that account for the topology of the phylogenetic tree. |
| KBSP | Sequence conservation | Karlin & Brocchieri Sum-of-Pairs: sequence conservation score computed by summing amino acid similarity scores over all amino acid pairs of the site’s column in a multiple sequence alignment (MSA). Similarity scores are obtained using a normalized JTT250 matrix. |
| VTSP | Sequence conservation | Valdar & Thornton Sum-of-Pairs: sequence conservation score computed by summing amino acid similarity scores over all amino acid pairs of the site’s column in an MSA. Sequences are weighted, and similarity scores are obtained using a min-max normalized JTT250 matrix. |
| EN | Sequence variability | Entropy: Shannon information entropy computed from the amino acid frequencies observed at the site’s MSA column. |
| CN | Local packing | Contact number: the number of Cα atoms within various distances of the site’s Cα atom. The cut-off distance ranges from 9 to 30 Å. |
| WCN | Local packing | Weighted contact number: measure of contact density obtained by summing the inverse square distances between the site’s Cα atom and those of the rest of the sites of the protein. |
| ASA | Solvent accessibility | Accessible surface area: solvent accessibility of the site computed by rolling a 1.4-Å sphere over the residue’s molecular surface. |
| RSA | Solvent accessibility | Relative solvent accessibility: solvent accessibility of the site computed by rolling a 1.4-Å sphere over the residue’s molecular surface, divided by the maximum value for residues of the same type. We consider three different tables of values of maximum ASA, resulting in three RSA measures: RSAR, RSAM, and RSAT. |
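Several of the simpler properties in Table 1 (EN, CN, WCN, and RSA) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes one representative atom per site (for example, the Cα atom) supplied as an N×3 NumPy array of coordinates in Å, natural logarithms for the entropy, and alignment gaps ignored. The function names and the `max_asa` argument are hypothetical; the maximum ASA value must come from whichever reference table is being used.

```python
import math
from collections import Counter

import numpy as np


def shannon_entropy(column):
    """EN: Shannon entropy (in nats) of the residue frequencies in one MSA column."""
    residues = [aa for aa in column if aa != "-"]  # assumption: gaps are ignored
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def pairwise_distances(coords):
    """All-against-all Euclidean distances for an (N, 3) coordinate array."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def contact_number(coords, cutoff):
    """CN: number of other sites within `cutoff` (same units as coords) of each site."""
    within = pairwise_distances(coords) <= cutoff
    np.fill_diagonal(within, False)  # a site is not counted as its own contact
    return within.sum(axis=1)


def weighted_contact_number(coords):
    """WCN: sum of inverse squared distances from each site to all other sites."""
    dist = pairwise_distances(coords)
    np.fill_diagonal(dist, np.inf)  # self term contributes 1/inf**2 = 0
    return (1.0 / dist ** 2).sum(axis=1)


def relative_solvent_accessibility(asa, max_asa):
    """RSA: ASA divided by the maximum ASA for that residue type (caller-supplied)."""
    return asa / max_asa
```

CN depends on the chosen cut-off, whereas WCN has no cut-off parameter: every other site contributes, with nearby sites weighted most heavily.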