Mathematical Problems in Engineering

Research Article

A Novel Boundary Oversampling Algorithm Based on Neighborhood Rough Set Model: NRSBoundary-SMOTE

Neighborhood rough set boundary SMOTE algorithm for oversampling (NRSBoundary-SMOTE).

Input: the training sample set $U$; the radius of neighborhood $w$.
Output: the new training sample set $U'$.
Step 1: (Initialization)
SampleSet = $\emptyset$; // SampleSet is the generated synthetic sample set.
BoundSet = $\emptyset$; // BoundSet is the minority class sample set in the boundary region, which needs oversampling.
PosSet = $\emptyset$; // PosSet is the majority class sample set in the lower approximation of decision.
Step 2: (Compute the majority class sample set and the minority class sample set)
According to the decision values 1 to $N$, divide $U$ into $N$ subsets: $X_1, X_2, \ldots, X_N$;
Compute the minority class sample set $X_{\mathrm{min}}$;
Compute the majority class sample set $X_{\mathrm{maj}}$;
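Step 2 amounts to grouping the samples by decision value. The minimal Python sketch below assumes (the text does not say) that the minority class is the smallest subset and that every other sample counts as majority. `split_by_class` is a hypothetical helper name, and sample indices stand in for the samples themselves.

```python
from collections import defaultdict

def split_by_class(labels):
    """Step 2 sketch: group sample indices by decision value.

    Assumption: the minority class is the smallest subset; all remaining
    samples are treated as the majority class sample set.
    """
    groups = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[label].append(idx)
    # min() returns the first smallest class encountered if sizes tie.
    minority_label = min(groups, key=lambda c: len(groups[c]))
    minority = set(groups[minority_label])
    majority = set(range(len(labels))) - minority
    return dict(groups), minority_label, minority, majority

labels = [0, 0, 0, 1, 0, 1, 0]
groups, min_label, X_min, X_maj = split_by_class(labels)
print(min_label, sorted(X_min), sorted(X_maj))  # 1 [3, 5] [0, 1, 2, 4, 6]
```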
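Step 3: (Compute the boundary region and the lower approximation of decision)
FOR each $x_i$ in $U$ DO
  According to formulas (2) and (3), compute the distance $\Delta(x_i, x_k)$ between $x_i$ and every other sample $x_k$ in $U$;
  $\Delta_{\max} = \max_{k \neq i} \Delta(x_i, x_k)$;
  $\Delta_{\min} = \min_{k \neq i} \Delta(x_i, x_k)$;
  According to formula (10), compute the threshold $\delta_i$ of $x_i$;
  Compute the neighborhood of $x_i$: $\delta(x_i) = \{x_k \in U \mid \Delta(x_i, x_k) \le \delta_i\}$;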
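  IF $x_i \in X_{\mathrm{min}}$ AND its neighborhood $\delta(x_i) \not\subseteq X_{\mathrm{min}}$ // minority class sample that belongs to the boundary region
  THEN BoundSet = BoundSet $\cup \{x_i\}$;
  ELSE IF $x_i \in X_{\mathrm{maj}}$ AND its neighborhood $\delta(x_i) \subseteq X_{\mathrm{maj}}$ // majority class sample that belongs to the lower approximation of decision
  THEN PosSet = PosSet $\cup \{x_i\}$;
  END IF
END FOR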
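Step 3 can be sketched in Python as follows. Because formulas (2), (3), and (10) are not reproduced in this section, the sketch assumes Euclidean distance for $\Delta$ and an assumed threshold of the form $\delta_i = \Delta_{\min} + w(\Delta_{\max} - \Delta_{\min})$. `neighborhood_regions` is a hypothetical name; index lists stand in for BoundSet and PosSet, and the per-sample thresholds are returned because Step 4 needs them.

```python
import math

def neighborhood_regions(X, labels, minority_label, w):
    """Step 3 sketch: return (bound_set, pos_set, deltas) as sample indices.

    Assumptions: Euclidean distance stands in for formulas (2)-(3), and
    delta_i = d_min + w * (d_max - d_min) over distances to the other
    samples stands in for formula (10).
    """
    n = len(X)
    bound_set, pos_set, deltas = [], [], []
    for i in range(n):
        dists = [math.dist(X[i], X[j]) for j in range(n)]
        others = [d for j, d in enumerate(dists) if j != i]
        d_min, d_max = min(others), max(others)
        delta_i = d_min + w * (d_max - d_min)
        deltas.append(delta_i)
        # Labels found in the neighborhood delta(x_i), x_i itself included.
        nbr_labels = {labels[j] for j in range(n) if dists[j] <= delta_i}
        if labels[i] == minority_label:
            if nbr_labels != {minority_label}:
                bound_set.append(i)  # minority sample in the boundary region
        elif minority_label not in nbr_labels:
            pos_set.append(i)  # majority sample in the lower approximation
    return bound_set, pos_set, deltas

X = [[0.0], [1.0], [2.0], [3.0], [3.5], [9.0]]
labels = [0, 0, 0, 0, 1, 1]
bound_set, pos_set, deltas = neighborhood_regions(X, labels, minority_label=1, w=0.2)
print(bound_set, pos_set)  # [4, 5] [0, 1]
```

Majority samples 2 and 3 fall in neither set: their neighborhoods reach the minority sample at 3.5, so they belong to the boundary rather than to the lower approximation.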
Step 4: (Generate synthetic samples from BoundSet)
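FOR each $x$ in BoundSet DO
  BOOL flag;
  Compute the $k$ nearest neighbors of $x$ with the same class: $K(x)$;
  flag = false;
  WHILE (flag = false) DO
    Choose one sample, denoted by $\hat{x}$, randomly from $K(x)$;
    flag = true;
    // Generate a synthetic sample.
    $x_{\mathrm{new}} = x + \mathrm{rand}(0, 1) \times (\hat{x} - x)$;
    // Judge whether $x_{\mathrm{new}}$ affects the lower approximation of decision.
    FOR each $y$ in PosSet DO
      IF $x_{\mathrm{new}} \in \delta(y)$ THEN
        flag = false;
        BREAK;
      END IF
    END FOR
    // Add $x_{\mathrm{new}}$ to SampleSet.
    IF (flag = true) THEN
      SampleSet = SampleSet $\cup \{x_{\mathrm{new}}\}$;
    END IF
  END WHILE
END FOR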
Step 5: (Return)
$U' = U \cup$ SampleSet;
RETURN $U'$.
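Steps 4 and 5 can be sketched in Python as below: SMOTE-style interpolation from each boundary minority sample toward a random same-class neighbor, rejecting any synthetic point that lands inside the neighborhood $\delta(y)$ of a majority sample $y$ in the lower approximation. Assumptions not taken from the text: Euclidean distance, $k = 5$ by default, thresholds `deltas` supplied from Step 3, and a bounded retry count `max_tries`. `nrs_boundary_oversample`, `max_tries`, and `seed` are hypothetical names.

```python
import math
import random

def nrs_boundary_oversample(X, labels, bound_set, pos_set, deltas,
                            k=5, max_tries=10, seed=None):
    """Steps 4-5 sketch: oversample boundary minority samples, rejecting
    synthetic points that fall in delta(y) for any y in pos_set."""
    rng = random.Random(seed)
    synthetic, synthetic_labels = [], []
    for i in bound_set:
        # k nearest neighbors of X[i] among samples with the same label.
        same = [j for j in range(len(X)) if j != i and labels[j] == labels[i]]
        same.sort(key=lambda j: math.dist(X[i], X[j]))
        neighbors = same[:k]
        if not neighbors:
            continue
        # Bounded retry (an assumption, not part of the pseudocode).
        for _ in range(max_tries):
            j = rng.choice(neighbors)
            gap = rng.random()  # plays the role of rand(0, 1)
            x_new = [a + gap * (b - a) for a, b in zip(X[i], X[j])]
            # Accept only if x_new lies outside every delta(y), y in pos_set.
            if all(math.dist(x_new, X[y]) > deltas[y] for y in pos_set):
                synthetic.append(x_new)
                synthetic_labels.append(labels[i])
                break
    # Step 5: union of the original samples and the synthetic samples.
    return [list(x) for x in X] + synthetic, list(labels) + synthetic_labels

X = [[0.0], [1.0], [5.0], [5.5], [6.0]]
labels = [0, 0, 1, 1, 1]
X_out, y_out = nrs_boundary_oversample(
    X, labels, bound_set=[2], pos_set=[0], deltas={0: 1.5}, seed=42)
print(len(X_out), y_out)  # 6 [0, 0, 1, 1, 1, 1]
```

Capping the retries keeps the sketch from looping indefinitely when every interpolated point falls inside some lower-approximation neighborhood; a boundary sample that exhausts its retries simply contributes no synthetic sample.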