BioMed Research International

Review Article

Incorporating Pathway Information into Feature Selection towards Better Performed Gene Signatures

Penalty terms used in the penalty methods.


Methods	Mathematical notation	Characteristics

Li & Li, 2008 [27]	$\lambda \sum_{u \sim v} w(u,v)\left(\frac{\beta_u}{\sqrt{d_u}} - \frac{\beta_v}{\sqrt{d_v}}\right)^2$. Here, $d_u$ is the degree of gene u, i.e., the sum of the weights of all edges connecting gene u to other genes, and $w(u,v)$ is the weight of the edge between genes u and v.	Smooths the coefficients over the network, but ignores that neighboring genes might have $\beta$'s in opposite directions.

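[55]	$\lambda \sum_{u \sim v} w(u,v)\left(\frac{\mathrm{sign}(\tilde{\beta}_u)\,\beta_u}{\sqrt{d_u}} - \frac{\mathrm{sign}(\tilde{\beta}_v)\,\beta_v}{\sqrt{d_v}}\right)^2$. Here, $\tilde{\beta}_u$ is the estimated value of the coefficient for gene u, and sign(x) is the sign of x: sign(x)=1 if x>0, sign(x)=-1 if x<0, and sign(x)=0 otherwise.	Accounts for the possibility that two connected genes have $\beta$'s with different signs, but may not work well because the signs of the $\beta$'s are difficult to estimate.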

[30]		Shrinks the weighted $\beta$'s of two neighboring genes toward each other, but the estimates may be severely biased.

[26, 56]	for , it becomes	A 2-step procedure is used to reduce biases; it is proved that this performs better than that with smaller

[57]	Here, I(x) is an indicator function: I(x)=1 if the condition x is true, and I(x)=0 otherwise.	Encourages simultaneous selection of neighboring genes in the network. However, the indicator function I is not continuous, so the penalty needs special care during optimization.

The generalized elastic net [29]	Here, D and P are additional penalty weights for individual genes (gene-level penalty) and for gene pairs (pathway-level penalty), respectively.	Includes the network-constrained penalty of [27] as a special case and can accommodate any positive semi-definite measure of dissimilarity between pairs of genes.
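To make the first two rows of the table concrete, the following minimal sketch computes the network-constrained penalty of Li & Li [27] and its sign-adjusted variant [55] on a toy network. It also checks that the penalty of [27] equals the quadratic form $\beta^T L \beta$ with the normalized graph Laplacian $L = I - D^{-1/2} W D^{-1/2}$, which is one way to see how a gene-pair penalty matrix P can recover [27] as a special case. The sketch assumes undirected edges without self-loops, positive edge weights, and that the pathway-level term of [29] takes the quadratic form $\beta^T P \beta$. The function names and the toy data are hypothetical and for illustration only.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Each undirected edge (u, v, w(u, v)) is listed once; genes are indexed 0..n-1.
Edge = Tuple[int, int, float]


def degrees(n: int, edges: Sequence[Edge]) -> List[float]:
    """d_u: sum of the weights of all edges incident to gene u (no self-loops assumed)."""
    d = [0.0] * n
    for u, v, w in edges:
        d[u] += w
        d[v] += w
    return d


def sign(x: float) -> int:
    """sign(x) = 1 if x > 0, -1 if x < 0, and 0 otherwise."""
    return 1 if x > 0 else (-1 if x < 0 else 0)


def network_penalty(beta: Sequence[float], edges: Sequence[Edge], lam: float = 1.0,
                    beta_tilde: Optional[Sequence[float]] = None) -> float:
    """Network-constrained penalty of Li & Li (2008).

    If beta_tilde (initial coefficient estimates) is given, each degree-scaled
    coefficient is multiplied by sign(beta_tilde_u): the sign-adjusted variant.
    """
    d = degrees(len(beta), edges)
    total = 0.0
    for u, v, w in edges:
        s_u = sign(beta_tilde[u]) if beta_tilde is not None else 1
        s_v = sign(beta_tilde[v]) if beta_tilde is not None else 1
        diff = s_u * beta[u] / math.sqrt(d[u]) - s_v * beta[v] / math.sqrt(d[v])
        total += w * diff ** 2
    return lam * total


def normalized_laplacian(n: int, edges: Sequence[Edge]) -> List[List[float]]:
    """L = I - D^{-1/2} W D^{-1/2}; diagonal entries of isolated genes are left at 0."""
    d = degrees(n, edges)
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if d[i] > 0:
            lap[i][i] = 1.0
    for u, v, w in edges:
        off = w / math.sqrt(d[u] * d[v])
        lap[u][v] -= off
        lap[v][u] -= off
    return lap


def quadratic_form(beta: Sequence[float], p: Sequence[Sequence[float]]) -> float:
    """beta^T P beta for a square matrix P given as nested lists."""
    n = len(beta)
    return sum(beta[i] * p[i][j] * beta[j] for i in range(n) for j in range(n))


# Hypothetical toy network: path 0 - 1 - 2 with unit edge weights.
edges = [(0, 1, 1.0), (1, 2, 1.0)]
beta = [1.0, 2.0, -1.0]

plain = network_penalty(beta, edges)                          # about 6.0
quad = quadratic_form(beta, normalized_laplacian(3, edges))   # same value as plain
signed = network_penalty(beta, edges, beta_tilde=beta)        # about 0.343
print(round(plain, 6), round(quad, 6), round(signed, 6))
```

Here the degrees are d = (1, 2, 1). The plain penalty is $(3 - 2\sqrt{2}) + (3 + 2\sqrt{2}) = 6$, whereas the sign-adjusted penalty drops to $6 - 4\sqrt{2} \approx 0.343$. The drop occurs because genes 1 and 2 have coefficients of opposite sign, which is exactly the situation the penalty of [27] ignores.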