Research Article

Assigning Significance in Label-Free Quantitative Proteomics to Include Single-Peptide-Hit Proteins with Low Replicates

Table 2

Overview of the key steps in extended selection of differentially regulated proteins.

Step | Procedure | Justification | Utilized data

1 | Establish a null distribution | A null distribution provides an estimate of the measurement noise arising from biological sample preparation and the analytical procedure. This noise dictates the threshold used to distinguish regulated proteins from unregulated ones. | Protein abundances in the four control (c) quantitation categories (Figure 1; Table 1). These four quantitation categories represent the replicate analyses of the same [15N]-labeled control protein sample run together with the other two unlabeled protein samples. Thus, regulated proteins are not expected from any pairing between these four quantitation categories.
2 | Model local noise in the null distribution | The measurement noise is not evenly distributed across the range of peptide and protein abundances. Instead, the noise depends locally on the signal strength, that is, on the peptide and protein abundances in a region. Thus, the threshold for selecting regulated proteins can differ at different protein abundance levels. Modeling the distribution of noise as a function of protein abundance helps to discern more subtle changes for more abundant proteins while reducing false positives for less abundant proteins. |
3 | Select regulated proteins with the PLGEM-STN statistic | The PLGEM-STN statistic has been used to analyze microarray data and spectral-count-based quantitative proteomics data. The PLGEM approach models the distribution of noise as a function of gene/protein abundance level. In combination with the STN statistic, adaptive thresholds are applied according to protein abundance level to maximize the selection of regulated proteins at higher abundance levels while reducing false positives for lower-abundance proteins. | For determining false positives: use the protein abundances in the four control (c) quantitation categories (Figure 1; Table 1). For determining positives: use the protein abundances in the four unlabeled-sample quantitation categories (Figure 1; Table 1). Two of these categories represent the duplicate analyses of the unlabeled protein sample from the acid-stressed culture S, and the other two represent the duplicate analyses of the unlabeled protein sample from the reference neutral-pH culture R. Thus, regulated proteins are expected from the four pairings between these quantitation categories.
4 | Apply the MPSP rule | Because of the imperfections commonly found in data sets and statistical models, the PLGEM-STN statistic alone was not stringent enough to reduce false discovery rates in the label-free quantitative proteomics analysis. The MPSP rule is introduced to further reduce false discovery rates. Under the MPSP rule, a protein is accepted as regulated only if it is found regulated in multiple permutations of sample pairings, using any kind of statistic, such as a t-test, PLGEM-STN, or even a fold-change threshold. |
5 | Select regulated proteins with the PLGEM-STN-MPSP approach | Combining the PLGEM-STN statistic with the MPSP rule reduces false discovery rates compared with the PLGEM-STN statistic alone. |
6 | Select regulated proteins with a fold-change-MPSP approach | The PLGEM-STN statistic over-penalizes proteins with low abundances. A fold-change threshold in combination with MPSP was found to be more effective for selecting regulated proteins in the lower-abundance region. |
7 | Comparison of the PLGEM-STN-MPSP and fold-change-MPSP approaches | While the PLGEM-STN-MPSP approach over-penalizes lower-abundance proteins, the fold-change-MPSP approach over-penalizes higher-abundance proteins. Thus, the two approaches are complementary and can be used in combination. |
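
The fold-change-MPSP idea in steps 1, 4, and 6 can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function names, the 95% quantile used as a noise cutoff, the requirement of at least 3 concordant pairings out of 4, the same-direction condition, and the restriction to stressed-versus-reference pairings are all hypothetical choices made for the example.

```python
import math
from itertools import product


def null_fold_change_threshold(control_runs, quantile=0.95):
    """Estimate an |log2 ratio| cutoff from pairings of control replicates.

    control_runs: list of equal-length abundance lists, aligned by protein.
    No regulation is expected between control replicates, so the spread of
    their log ratios is treated as measurement noise (assumed procedure).
    """
    abs_log_ratios = []
    for i in range(len(control_runs)):
        for j in range(i + 1, len(control_runs)):
            for a, b in zip(control_runs[i], control_runs[j]):
                if a > 0 and b > 0:
                    abs_log_ratios.append(abs(math.log2(a / b)))
    if not abs_log_ratios:
        raise ValueError("no usable control pairings")
    abs_log_ratios.sort()
    idx = min(len(abs_log_ratios) - 1, int(quantile * len(abs_log_ratios)))
    return abs_log_ratios[idx]


def mpsp_regulated(stressed, reference, threshold, min_pairings=3):
    """Apply a fold-change threshold in every stressed/reference pairing.

    stressed, reference: lists of dicts (one per replicate) mapping a
    protein identifier to its abundance. A protein is accepted only if
    it passes the threshold, in the same direction, in at least
    min_pairings of the pairings (hypothetical choice of "multiple").
    """
    shared = set.intersection(*(set(run) for run in stressed + reference))
    accepted = {}
    for protein in sorted(shared):
        up = down = 0
        for s_run, r_run in product(stressed, reference):
            s, r = s_run[protein], r_run[protein]
            if s <= 0 or r <= 0:
                continue  # a log ratio is undefined for this pairing
            log_fc = math.log2(s / r)
            if log_fc >= threshold:
                up += 1
            elif log_fc <= -threshold:
                down += 1
        if up >= min_pairings:
            accepted[protein] = "up"
        elif down >= min_pairings:
            accepted[protein] = "down"
    return accepted


# Illustrative, made-up abundances for two stressed and two reference runs.
S1 = {"P1": 400, "P2": 100, "P3": 90}
S2 = {"P1": 380, "P2": 250, "P3": 110}
R1 = {"P1": 100, "P2": 100, "P3": 100}
R2 = {"P1": 95, "P2": 105, "P3": 100}
result = mpsp_regulated([S1, S2], [R1, R2], threshold=1.0)
```

With these made-up numbers, P2 exceeds the two-fold threshold in only two of the four pairings and is therefore rejected, which shows how a single outlying replicate is filtered out when several pairings must agree.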