Research Article

Multi-Rule Based Ensemble Feature Selection Model for Sarcasm Type Detection in Twitter

Algorithm 2

Feature selection.
Input: Preprocessed tweets
Output: Set of features
 (1) Read the data from the file
 (2) Create an empty list for each feature to be extracted.
 (3) For each tweet, ti, do the following:
   //Extracting various features (illustrative Python sketches of each extractor follow the algorithm)
  3.1 tagged = nltk.pos_tag(indiv_tokens)
  3.2 nouns = ['NN', 'NNS', 'NNP', 'NNPS']
  3.3 verbs = ['VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ']
  3.4 for each (word, tag) in tagged, if tag in nouns then
     increment noun_count
  3.5 else if tag in verbs then
     increment verb_count
  3.6 return the normalized noun and verb counts
  3.7 Initialize pos_int and neg_int to 0
  3.8 sent_id = SentimentIntensityAnalyzer()
  3.9 for each token in tokens
  3.10 if token in intensifier_list then
     score = sent_id.polarity_scores(token)
  3.11 if score is negative then
     increment neg_int count
  3.12 else
     increment pos_int count
  3.13 return pos_int, neg_int
  3.14 Initialize sk_value = 0
  3.15 var = [x for x in nltk.skipgrams(tokens, n, j)]
    //n is the degree of the n grams & j is the skip distance
  3.16 for i in range(len(var)):
  3.17 for k in range(n):
  3.18 score = sent_id.polarity_scores(var[i][k])
  3.19 if the score is positive then
     increment sk_value
  3.20 else
     decrement sk_value
  3.21 return sk_value
  3.22 Read a tweet from the input data set
  3.23 Load the dictionary containing popular emojis
  3.24 for each emoji in emoji_list:
  3.25 if emoji in tweet then
  3.26 update the emoji list and increment the sentiment value based
     on the total occurrences of that emoji
  3.27 return the normalized emoji_sentiment value
  3.28 Initialize the interjection counter to 0
  3.29 Load the file containing the list of interjections
     for each interjection in the list
  3.30 if the tweet contains that interjection then
  3.31 increment the interjection count
  3.32 for every word in tokens
  3.33 if word.isupper()
     increment uppercase count
  //Apply a regular expression to find repeated letters
  3.34 result = re.compile(r'(.)\1*')
  3.35 for each matched segment of repeated letters in the tweet
  3.36 if the length of the segment is at least 3 //minimum 3 consecutive
     occurrences of the same letter
        increment the repeated-letter count
  3.37 Initialize pos_count, neg_count, flip_count to 0
  3.38 for each word in tokens:
  3.39 sent_score = sent_id.polarity_scores(word)
  3.40 if the score obtained is negative then
     Increment neg_count
  3.41 if the previous word encountered is positive then
     Increment flip_count value
  3.42 if the score obtained is positive then
     Increment pos_count
  3.43 if the previous word encountered is negative then
     Increment flip_count value
  3.44 return pos_count, neg_count, flip_count
  3.45 punct = punctuations_counter(tweet, ['!', '?', '…'])
  3.46 append punct['!'] to the exclamation list
  3.47 append punct['?'] to the questionmark list
 (4) Extract the features and append them to the lists created in step (2).
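The sketches below are one possible Python realization of the individual extractors in Algorithm 2; they are illustrative only, and any word lists, dictionaries, parameter values, or helper names not given in the paper are placeholders. The first sketch covers the noun/verb feature of steps 3.1-3.6, assuming NLTK's default Penn Treebank tagger and reading "normalized" as division by the number of tokens in the tweet:

import nltk
# nltk.download('averaged_perceptron_tagger') may be required once

NOUN_TAGS = {'NN', 'NNS', 'NNP', 'NNPS'}
VERB_TAGS = {'VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ'}

def noun_verb_features(indiv_tokens):
    """Return noun and verb counts normalized by tweet length."""
    noun_count = verb_count = 0
    for word, tag in nltk.pos_tag(indiv_tokens):   # (token, Penn Treebank tag) pairs
        if tag in NOUN_TAGS:
            noun_count += 1
        elif tag in VERB_TAGS:
            verb_count += 1
    n = max(len(indiv_tokens), 1)                  # guard against empty tweets
    return noun_count / n, verb_count / n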
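Steps 3.7-3.13 score intensifiers with VADER's SentimentIntensityAnalyzer. A minimal sketch, assuming the intensifier word list is supplied by the caller (the paper's list is not reproduced here):

from nltk.sentiment.vader import SentimentIntensityAnalyzer
# nltk.download('vader_lexicon') may be required once

sent_id = SentimentIntensityAnalyzer()

def intensifier_counts(tokens, intensifier_list):
    """Count intensifiers whose VADER polarity is negative vs. non-negative."""
    pos_int = neg_int = 0
    for token in tokens:
        if token in intensifier_list:
            score = sent_id.polarity_scores(token)['compound']
            if score < 0:
                neg_int += 1
            else:
                pos_int += 1
    return pos_int, neg_int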
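Steps 3.14-3.21 accumulate a signed sentiment score over (n, j) skip-grams. A sketch using nltk.util.skipgrams; the default values of n and j are placeholders, and, as in the pseudocode, any word that is not positive decrements the score:

from nltk.util import skipgrams
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sent_id = SentimentIntensityAnalyzer()

def skipgram_sentiment(tokens, n=2, j=2):
    """Sum +1/-1 over the words of every n-gram with up to j skips."""
    sk_value = 0
    for gram in skipgrams(tokens, n, j):
        for word in gram:
            if sent_id.polarity_scores(word)['compound'] > 0:
                sk_value += 1      # positive word
            else:
                sk_value -= 1      # neutral or negative word (step 3.20)
    return sk_value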
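Steps 3.22-3.31 derive an emoji-sentiment value and an interjection count from external resources. A sketch with tiny, hypothetical stand-ins for the emoji dictionary and interjection file used in the paper:

EMOJI_SENTIMENT = {'😂': 0.2, '😢': -0.4, '🙄': -0.3}   # placeholder scores, not the paper's dictionary
INTERJECTIONS = ['wow', 'yay', 'ugh', 'huh']            # placeholder list, not the paper's file

def emoji_sentiment(tweet):
    """Sum per-emoji sentiment weighted by occurrences, normalized by the number of hits."""
    total, hits = 0.0, 0
    for emoji, score in EMOJI_SENTIMENT.items():
        count = tweet.count(emoji)
        total += score * count
        hits += count
    return total / hits if hits else 0.0

def interjection_count(tweet):
    """Count occurrences of known interjections in the tweet."""
    words = tweet.lower().split()
    return sum(words.count(ij) for ij in INTERJECTIONS)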
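Steps 3.37-3.44 count positive words, negative words, and polarity flips between consecutive polar words. A sketch with VADER; letting neutral words leave the previous polarity unchanged is our assumption, since the pseudocode does not say:

from nltk.sentiment.vader import SentimentIntensityAnalyzer

sent_id = SentimentIntensityAnalyzer()

def polarity_flip_counts(tokens):
    """Return (pos_count, neg_count, flip_count) for a token list."""
    pos_count = neg_count = flip_count = 0
    prev = 0                                  # sign of the last polar word seen
    for word in tokens:
        score = sent_id.polarity_scores(word)['compound']
        if score < 0:
            neg_count += 1
            if prev > 0:                      # positive followed by negative
                flip_count += 1
            prev = -1
        elif score > 0:
            pos_count += 1
            if prev < 0:                      # negative followed by positive
                flip_count += 1
            prev = 1
    return pos_count, neg_count, flip_count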
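Steps 3.32-3.36 and 3.45-3.47 cover the surface-form features: fully uppercase tokens, letters repeated at least three times in a row, and punctuation counts. The punctuations_counter helper is not shown in the paper, so a plain Counter-based stand-in is used; the {2,} quantifier folds the length check of step 3.36 into the regular expression:

import re
from collections import Counter

REPEAT_RE = re.compile(r'(.)\1{2,}')   # a character followed by itself at least twice more

def uppercase_count(tokens):
    """Count fully uppercase tokens (step 3.33)."""
    return sum(1 for word in tokens if word.isupper())

def repeated_letter_count(tweet):
    """Count runs of the same character repeated three or more times (step 3.36)."""
    return len(REPEAT_RE.findall(tweet))

def punctuation_counts(tweet, marks=('!', '?', '…')):
    """Return the count of each punctuation mark of interest (step 3.45)."""
    char_counts = Counter(tweet)
    return {mark: char_counts[mark] for mark in marks}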