Mathematical Problems in Engineering

Research Article

Efficient Data Mining Algorithms for Screening Potential Proteins of Drug Target

EM for Negative Selection Algorithm.

Input: The unlabeled dataset U, the positive dataset P, the number of selections L
Initialize the reliable negative set RN = NULL
Run EM on the mixture model using U and P to derive the mixture probability distributions

For each sample in U:
    Compute the probability that the sample is assigned to the negative class

Rank the samples in U by the above probability in decreasing order
Select the top L samples and append them to RN
Output: The reliable negative samples RN
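
The steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the source does not specify the form of the mixture model, so the sketch assumes a two-component Gaussian mixture with diagonal covariance, keeps the samples in P pinned to the positive component, and starts EM by treating every sample in U as negative. The names `em_negative_selection`, `weighted_fit`, `log_gauss`, and the parameter `n_iter` are hypothetical.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def weighted_fit(data, weights, min_var=1e-6):
    """M-step for one component: weighted mean, variance, total weight."""
    total = max(sum(weights), 1e-12)  # guard against an empty component
    dims = range(len(data[0]))
    mean = [sum(w * x[d] for x, w in zip(data, weights)) / total for d in dims]
    var = [max(sum(w * (x[d] - mean[d]) ** 2
                   for x, w in zip(data, weights)) / total, min_var)
           for d in dims]
    return mean, var, total

def em_negative_selection(U, P, L, n_iter=30):
    """Return the L samples of U most likely to be negative (the set RN)."""
    data = list(P) + list(U)
    # Assumption: P is fixed as positive; every sample in U starts as negative.
    neg_resp = [0.0] * len(P) + [1.0] * len(U)
    for _ in range(n_iter):
        # M-step: refit both components from the current responsibilities.
        pos_mean, pos_var, pos_total = weighted_fit(
            data, [1.0 - r for r in neg_resp])
        neg_mean, neg_var, neg_total = weighted_fit(data, neg_resp)
        log_pi_pos = math.log(pos_total / len(data))
        log_pi_neg = math.log(neg_total / len(data))
        # E-step: update the negative-class posterior for samples in U only.
        for i, x in enumerate(U):
            diff = ((log_pi_pos + log_gauss(x, pos_mean, pos_var))
                    - (log_pi_neg + log_gauss(x, neg_mean, neg_var)))
            # 1 / (1 + exp(diff)), with a guard against overflow in exp.
            neg_resp[len(P) + i] = (0.0 if diff > 700
                                    else 1.0 / (1.0 + math.exp(diff)))
    # Rank U by negative-class probability (decreasing) and keep the top L.
    ranked = sorted(range(len(U)),
                    key=lambda i: neg_resp[len(P) + i], reverse=True)
    return [U[i] for i in ranked[:L]]
```

Calling `em_negative_selection(U, P, L)` with `U` and `P` given as lists of equal-length numeric tuples returns RN as a list of samples drawn from `U`. Pinning P to the positive component is one common choice in positive-unlabeled learning; other mixture families or initializations would fit the same outline.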