Research Article

Multi-Rule Based Ensemble Feature Selection Model for Sarcasm Type Detection in Twitter

Algorithm 2

Feature selection.
Input: Preprocessed tweets
Output: Set of features
 (1) Read the data from the file
 (2) Create an empty list for each feature to be extracted.
 (3) For each tweet, ti, do the following:
   //Extracting various features (illustrative Python sketches of each extractor follow the algorithm)
  3.1 tagged = nltk.pos_tag(indiv_tokens)
  3.2 nouns = ['NN', 'NNS', 'NNP', 'NNPS']
  3.3 verbs = ['VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ']
  3.4 for each (word, tag) in tagged, if tag in nouns then
     increment noun_count
  3.5 else if tag in verbs then
     increment verb_count
  3.6 return the normalized noun and verb counts
  3.7 Initialize pos_int and neg_int to 0
  3.8 sent_id = SentimentIntensityAnalyzer()
  3.9 for each token in tokens
  3.10 if token in intensifier_list then
     score = sent_id.polarity_scores(token)
  3.11 if score is negative then
     increment neg_int count
  3.12 else
     increment pos_int count
  3.13 return pos_int, neg_int
  3.14 Initialize sk_value = 0
  3.15 var = [x for x in nltk.skipgrams(tokens, n, j)]
    //n is the degree of the n grams & j is the skip distance
  3.16 for i in range(len(var)):
  3.17 for k in range(n):
  3.18 score = sent_id.polarity_scores(var[i][k])
  3.19 if the score is positive then
     increment sk_value
  3.20 else
     decrement sk_value
  3.21 return sk_value
  3.22 Read a tweet from the input data set
  3.23 Load the dictionary containing popular emojis
  3.24 for each emoji in emoji_list:
  3.25 if emoji in tweet then
  3.26 update the emoji list and increment the sentiment value based
     on the total occurrences of that emoji
  3.27 return the normalized emoji_sentiment value
  3.28 Initialize the interjection counter to 0
  3.29 Load the file containing the list of interjections
     for each interjection in the list
  3.30 if the tweet contains that interjection then
  3.31 increment the interjection count
  3.32 for every word in tokens
  3.33 if word.isupper()
     increment uppercase count
  //Apply a regular expression to find repeated letters
  3.34 result = re.compile(r'(.)\1*')
  3.35 for each matched segment of repeated letters in the tweet
  3.36 if the length of the segment is at least 3 //minimum 3 consecutive
     occurrences of the same letter
        increment the repeated-letter count
  3.37 Initialize pos_count, neg_count, flip_count to 0
  3.38 for each word in tokens:
  3.39 sent_score = sent_id.polarity_scores(word)
  3.40 if the score obtained is negative then
     Increment neg_count
  3.41 if the previous word encountered is positive then
     Increment flip_count value
  3.42 if the score obtained is positive then
     Increment pos_count
  3.43 if the previous word encountered is negative then
     Increment flip_count value
  3.44 return pos_count, neg_count, flip_count
  3.45 punct = punctuations_counter(tweet, ['!', '?', '…'])
  3.46 append punct['!'] to the exclamation list
  3.47 append punct['?'] to the questionmark list
 (4) Extract the features and append them to the lists created in step (2).
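The sketches below are one possible Python realization of the individual extractors in Algorithm 2; they are illustrative only, and any word lists, dictionaries, parameter values, or helper names not given in the paper are placeholders. The first sketch covers the noun/verb feature of steps 3.1-3.6, assuming NLTK's default Penn Treebank tagger and reading "normalized" as division by the number of tokens in the tweet:

import nltk
# nltk.download('averaged_perceptron_tagger') may be required once

NOUN_TAGS = {'NN', 'NNS', 'NNP', 'NNPS'}
VERB_TAGS = {'VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ'}

def noun_verb_features(indiv_tokens):
    """Return noun and verb counts normalized by tweet length."""
    noun_count = verb_count = 0
    for word, tag in nltk.pos_tag(indiv_tokens):   # (token, Penn Treebank tag) pairs
        if tag in NOUN_TAGS:
            noun_count += 1
        elif tag in VERB_TAGS:
            verb_count += 1
    n = max(len(indiv_tokens), 1)                  # guard against empty tweets
    return noun_count / n, verb_count / n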
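Steps 3.7-3.13 score intensifiers with VADER's SentimentIntensityAnalyzer. A minimal sketch, assuming the intensifier word list is supplied by the caller (the paper's list is not reproduced here):

from nltk.sentiment.vader import SentimentIntensityAnalyzer
# nltk.download('vader_lexicon') may be required once

sent_id = SentimentIntensityAnalyzer()

def intensifier_counts(tokens, intensifier_list):
    """Count intensifiers whose VADER polarity is negative vs. non-negative."""
    pos_int = neg_int = 0
    for token in tokens:
        if token in intensifier_list:
            score = sent_id.polarity_scores(token)['compound']
            if score < 0:
                neg_int += 1
            else:
                pos_int += 1
    return pos_int, neg_int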
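Steps 3.14-3.21 accumulate a signed sentiment score over (n, j) skip-grams. A sketch using nltk.util.skipgrams; the default values of n and j are placeholders, and, as in the pseudocode, any word that is not positive decrements the score:

from nltk.util import skipgrams
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sent_id = SentimentIntensityAnalyzer()

def skipgram_sentiment(tokens, n=2, j=2):
    """Sum +1/-1 over the words of every n-gram with up to j skips."""
    sk_value = 0
    for gram in skipgrams(tokens, n, j):
        for word in gram:
            if sent_id.polarity_scores(word)['compound'] > 0:
                sk_value += 1      # positive word
            else:
                sk_value -= 1      # neutral or negative word (step 3.20)
    return sk_value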
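Steps 3.22-3.31 derive an emoji-sentiment value and an interjection count from external resources. A sketch with tiny, hypothetical stand-ins for the emoji dictionary and interjection file used in the paper:

EMOJI_SENTIMENT = {'😂': 0.2, '😢': -0.4, '🙄': -0.3}   # placeholder scores, not the paper's dictionary
INTERJECTIONS = ['wow', 'yay', 'ugh', 'huh']            # placeholder list, not the paper's file

def emoji_sentiment(tweet):
    """Sum per-emoji sentiment weighted by occurrences, normalized by the number of hits."""
    total, hits = 0.0, 0
    for emoji, score in EMOJI_SENTIMENT.items():
        count = tweet.count(emoji)
        total += score * count
        hits += count
    return total / hits if hits else 0.0

def interjection_count(tweet):
    """Count occurrences of known interjections in the tweet."""
    words = tweet.lower().split()
    return sum(words.count(ij) for ij in INTERJECTIONS)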
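Steps 3.37-3.44 count positive words, negative words, and polarity flips between consecutive polar words. A sketch with VADER; letting neutral words leave the previous polarity unchanged is our assumption, since the pseudocode does not say:

from nltk.sentiment.vader import SentimentIntensityAnalyzer

sent_id = SentimentIntensityAnalyzer()

def polarity_flip_counts(tokens):
    """Return (pos_count, neg_count, flip_count) for a token list."""
    pos_count = neg_count = flip_count = 0
    prev = 0                                  # sign of the last polar word seen
    for word in tokens:
        score = sent_id.polarity_scores(word)['compound']
        if score < 0:
            neg_count += 1
            if prev > 0:                      # positive followed by negative
                flip_count += 1
            prev = -1
        elif score > 0:
            pos_count += 1
            if prev < 0:                      # negative followed by positive
                flip_count += 1
            prev = 1
    return pos_count, neg_count, flip_count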
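Steps 3.32-3.36 and 3.45-3.47 cover the surface-form features: fully uppercase tokens, letters repeated at least three times in a row, and punctuation counts. The punctuations_counter helper is not shown in the paper, so a plain Counter-based stand-in is used; the {2,} quantifier folds the length check of step 3.36 into the regular expression:

import re
from collections import Counter

REPEAT_RE = re.compile(r'(.)\1{2,}')   # a character followed by itself at least twice more

def uppercase_count(tokens):
    """Count fully uppercase tokens (step 3.33)."""
    return sum(1 for word in tokens if word.isupper())

def repeated_letter_count(tweet):
    """Count runs of the same character repeated three or more times (step 3.36)."""
    return len(REPEAT_RE.findall(tweet))

def punctuation_counts(tweet, marks=('!', '?', '…')):
    """Return the count of each punctuation mark of interest (step 3.45)."""
    char_counts = Counter(tweet)
    return {mark: char_counts[mark] for mark in marks}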