Research Article

A Novel Boundary Oversampling Algorithm Based on Neighborhood Rough Set Model: NRSBoundary-SMOTE

Algorithm 1

Neighborhood rough set boundary SMOTE algorithm for oversampling (NRSBoundary-SMOTE).
Input: the training sample set; the radius of neighborhood: w.
Output: the new training sample set.
Step 1: (Initialization)
 SampleSet = ∅; // SampleSet is the generated synthetic sample set.
 BoundSet = ∅; // BoundSet is the minority class sample set in the boundary region, which needs oversampling.
 Initialize to ∅ the majority class sample set in the lower approximation of decision.
Step 2: (Compute the majority class sample set and minority class sample set)
 According to the decision values 1 to N, divide the training sample set into N subsets;
 Compute the minority class sample set;
 Compute the majority class sample set;
Step 3: (Compute boundary region and lower approximation of decision)
 FOR each sample in the training sample set DO
  According to formulas (2) and (3), compute the distance between
   the sample and the other samples in the training sample set;
   ;
   ;
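  According to formula (10), compute the neighborhood threshold of the sample;
  Compute the neighborhood of the sample;
  IF the sample is a minority class sample that belongs to the boundary region
   THEN add the sample to BoundSet;
  ELSE IF the sample is a majority class sample that belongs to the lower approximation of decision
   THEN add the sample to the majority class sample set in the lower approximation of decision;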
  END IF
 END FOR
Step 4: (Generate synthetic samples from BoundSet)
 FOR each sample in BoundSet DO
  BOOL ;
  Compute the sample's nearest neighbors with the same classification;
   ;
  WHILE DO
   Randomly choose one sample from these nearest neighbors;
    ;
   //Generate a synthetic sample.
    ;
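   //Judge whether the synthetic sample affects the lower approximation of decision.
   FOR each sample in the majority class sample set in the lower approximation of decision DO
    IF the synthetic sample falls within the neighborhood of that sample THEN
      mark the synthetic sample as rejected;
      BREAK;
    END IF
   END FOR
   //Add the synthetic sample to SampleSet
   IF the synthetic sample was not rejected THEN
     add the synthetic sample to SampleSet;
   END IF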
  END WHILE
 END FOR
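Step 5: (Return)
 Form the new training sample set as the union of the training sample set and SampleSet;
 RETURN the new training sample set.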
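
The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Euclidean distance stands in for formulas (2) and (3). The threshold is assumed to be the minimum distance plus w times the range of distances, standing in for formula (10). A sample's neighborhood is taken to be the other samples within its threshold; a minority sample is treated as a boundary sample when its neighborhood contains a majority sample, and a majority sample is treated as lying in the lower approximation when its neighborhood contains no minority sample. A fixed number of attempts per boundary sample replaces the WHILE condition. The names nrs_boundary_smote, threshold, euclidean, k, attempts, and seed are hypothetical.

```python
import math
import random


def euclidean(a, b):
    # Assumption: Euclidean distance stands in for formulas (2) and (3).
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def threshold(i, X, w):
    # Assumed form of formula (10): min distance + w * (max - min distance).
    d = [euclidean(X[i], X[j]) for j in range(len(X)) if j != i]
    return min(d) + w * (max(d) - min(d))


def nrs_boundary_smote(X, y, minority, w=0.1, k=3, attempts=5, seed=0):
    """Binary-class sketch; k, attempts, and seed are hypothetical parameters."""
    rng = random.Random(seed)
    n = len(X)
    delta = [threshold(i, X, w) for i in range(n)]

    # Step 3: boundary minority samples and lower-approximation majority samples.
    bound_set, lower_set = [], []
    for i in range(n):
        neigh = [j for j in range(n)
                 if j != i and euclidean(X[i], X[j]) <= delta[i]]
        if y[i] == minority and any(y[j] != minority for j in neigh):
            bound_set.append(i)
        elif y[i] != minority and all(y[j] != minority for j in neigh):
            lower_set.append(i)

    # Step 4: SMOTE interpolation, rejecting samples that would enter the
    # neighborhood of a lower-approximation majority sample.
    sample_set = []
    for i in bound_set:
        same = sorted((j for j in range(n) if j != i and y[j] == minority),
                      key=lambda j: euclidean(X[i], X[j]))[:k]
        if not same:
            continue
        for _ in range(attempts):  # assumption: replaces the WHILE condition
            j = rng.choice(same)
            gap = rng.random()
            synth = [a + gap * (b - a) for a, b in zip(X[i], X[j])]
            rejected = any(euclidean(synth, X[m]) <= delta[m]
                           for m in lower_set)
            if not rejected:
                sample_set.append(synth)

    # Step 5: merge the synthetic samples into the training set.
    new_X = list(X) + sample_set
    new_y = list(y) + [minority] * len(sample_set)
    return new_X, new_y, bound_set, lower_set
```

For example, calling nrs_boundary_smote(X, y, minority=1) on a list of feature vectors X with labels y returns the enlarged training set together with the indices of the boundary and lower-approximation samples, which makes it easy to inspect which minority samples were oversampled.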