Input: the training sample set; the radius of neighborhood: w.
Output: the new training sample set.
Step 1: (Initialization)
  SampleSet = ∅; // SampleSet is the generated synthetic sample set.
  BoundSet = ∅; // BoundSet is the minority class sample set in the boundary region, which needs over-sampling.
  Initialize the set of majority class samples in the lower approximation of decision to ∅;
Step 2: (Compute the majority class sample set and minority class sample set)
  According to the decision values 1 to N, divide the training sample set into N subsets;
  Compute the minority class sample set;
  Compute the majority class sample set;
Step 3: (Compute boundary region and lower approximation of decision)
  FOR each sample in the training sample set DO
    According to formulas (2) and (3), compute the distance between the sample and every other sample in the training sample set;
    ;
    ;
    According to formula (10), compute the threshold of the sample;
    Compute the neighborhood of the sample;
    IF the sample is a minority class sample which belongs to the boundary region
    THEN BoundSet = BoundSet ∪ {the sample};
    ELSE IF the sample is a majority class sample which belongs to the lower approximation of decision
    THEN add the sample to the set of majority class samples in the lower approximation of decision;
    END IF
  END FOR
Step 4: (Generate synthetic samples from BoundSet)
  FOR each sample in BoundSet DO
    BOOL ;
    Compute the sample's nearest neighbors with the same classification;
    ;
    WHILE DO
      Randomly choose one of these nearest neighbors;
      ;
      // Generate a synthetic sample.
      ;
      // Judge whether the synthetic sample affects the lower approximation of decision.
      FOR each majority class sample in the lower approximation of decision DO
        IF THEN
          ;
          BREAK;
        END IF
      END FOR
      // Add the synthetic sample to SampleSet.
      IF THEN
        SampleSet = SampleSet ∪ {the synthetic sample};
      END IF
    END WHILE
  END FOR
Step 5: (Return)
  Merge the training sample set and SampleSet into the new training sample set;
  RETURN the new training sample set.
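
The five steps above can be sketched in Python as follows. This is only an illustrative sketch under several assumptions, not the authors' implementation: formulas (2), (3) and (10) are not reproduced in this section, so Euclidean distance stands in for the distance, and the threshold is assumed to be min + w * (max - min) over a sample's distances to all other samples. The generation step is assumed to be the usual SMOTE interpolation between a sample and a same-class neighbor, a synthetic sample is assumed to "affect" the lower approximation when it falls inside the neighborhood of one of its majority samples, and a fixed number of attempts per boundary sample replaces the WHILE loop. The problem is assumed to be two-class. The names nrs_boundary_smote, distance, threshold, k, n_new and seed are hypothetical.

```python
import math
import random
from collections import Counter


def distance(a, b):
    # Assumption: Euclidean distance stands in for formulas (2) and (3).
    return math.dist(a, b)


def threshold(i, X, w):
    # Assumption: formula (10) is not shown in this section; use
    # min + w * (max - min) over the distances from sample i to all others.
    d = [distance(X[i], X[j]) for j in range(len(X)) if j != i]
    return min(d) + w * (max(d) - min(d))


def nrs_boundary_smote(X, y, w=0.05, k=3, n_new=2, seed=0):
    """Return (X_new, y_new): the input plus the accepted synthetic samples."""
    rng = random.Random(seed)
    n = len(X)

    # Step 2: the rarest label is the minority class (two-class assumption).
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    minority_idx = [i for i in range(n) if y[i] == minority]

    # Step 3: per-sample threshold, neighborhood, and region membership.
    delta = [threshold(i, X, w) for i in range(n)]
    bound_set, lower_majority = [], []
    for i in range(n):
        neigh = [j for j in range(n)
                 if j != i and distance(X[i], X[j]) <= delta[i]]
        pure = all(y[j] == y[i] for j in neigh)
        if y[i] == minority and not pure:
            bound_set.append(i)        # minority sample in the boundary region
        elif y[i] != minority and pure:
            lower_majority.append(i)   # majority sample in the lower approximation

    # Step 4: interpolate towards same-class neighbors, rejecting candidates
    # that land inside the neighborhood of a lower-approximation majority sample.
    sample_set = []
    for i in bound_set:
        neighbors = sorted((j for j in minority_idx if j != i),
                           key=lambda j: distance(X[i], X[j]))[:k]
        if not neighbors:
            continue
        for _ in range(n_new):         # fixed number of attempts per sample
            j = rng.choice(neighbors)
            gap = rng.random()
            new = [a + gap * (b - a) for a, b in zip(X[i], X[j])]
            if all(distance(new, X[m]) > delta[m] for m in lower_majority):
                sample_set.append(new)

    # Step 5: merge the original samples with the synthetic ones.
    return list(X) + sample_set, list(y) + [minority] * len(sample_set)
```

Because the number of attempts is fixed, a boundary sample whose candidates are all rejected simply contributes fewer synthetic samples; a loop that retries until a quota is met would need an explicit cap to guarantee termination.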