| Input: Preprocessed tweets |
| Output: Set of features |
| (1) Read the Data from the file |
| (2) Create empty lists for each feature that is to be extracted. |
| (3) For each tweet, ti, do the following: |
| //Extracting various features |
| 3.1 tagged = nltk.pos_tag(indiv_tokens) |
| 3.2 nouns = ['NN', 'NNS', 'NNP', 'NNPS'] |
| 3.3 verbs = ['VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ'] |
| 3.4 for each (word, tag) in tagged, if tag in nouns then |
| increment noun_count |
| 3.5 else if tag in verbs then |
| increment verb_count |
| 3.6 return the noun and verb counts normalized by the number of tokens. |
| 3.7 Initialize pos_int and neg_int to 0 |
| 3.8 sent_id = SentimentIntensityAnalyzer() |
| 3.9 for each token in tokens |
| 3.10 if token in intensifier_list: |
| score = sent_id.polarity_scores(token) |
| 3.11 if score is negative then |
| increment neg_int count |
| 3.12 else |
| increment pos_int count |
| 3.13 return pos_int, neg_int |
| 3.14 Initialize sk_value = 0 |
| 3.15 var = [x for x in nltk.skipgrams(tokens, n, j)] |
| //n is the degree of the n grams & j is the skip distance |
| 3.16 for i in range(len(var)): |
| 3.17 for k in range(n): |
| 3.18 score = sent_id.polarity_scores(var[i][k]) |
| 3.19 if the score is positive |
| increment the sk_value |
| 3.20 else |
| decrement the sk_value |
| 3.21 return sk_value |
| 3.22 Read a tweet from the input data set |
| 3.23 Load the dictionary containing popular emojis |
| 3.24 for each emoji in emoji_list: |
| 3.25 if the emoji occurs in tweet: |
| 3.26 update the emoji list and increment the sentiment value based |
| on the total occurrences of that particular emoji |
| 3.27 return the normalized emoji_sentiment value |
| 3.28 Initialize the interjection counter to 0 |
| 3.29 Load the file containing the list of interjections; for each interjection in the list |
| 3.30 if the tweet contains the corresponding interjection |
| 3.31 increment the interjection count |
| 3.32 for every word in tokens |
| 3.33 if word.isupper() |
| increment uppercase count |
| //Apply regular expression to find out repeating letters |
| 3.34 result = re.compile(r'(.)\1*') |
| 3.35 for each text segment matched by result (repeating letters) |
| 3.36 if the length of the text segment is at least 3 //minimum 3 consecutive |
| occurrences of the same letter |
| Increment the repeated words count |
| 3.37 Initialize pos_count, neg_count, flip_count to 0 |
| 3.38 for words in tokens: |
| 3.39 sent_score = sent_id.polarity_scores(words) |
| 3.40 if the score obtained is negative then |
| Increment neg_count |
| 3.41 if the previous word encountered is positive then |
| Increment flip_count value |
| 3.42 if the score obtained is positive then |
| Increment pos_count |
| 3.43 if the previous word encountered is negative then |
| Increment flip_count value |
| 3.44 return pos_count, neg_count, flip_count |
| 3.45 punct = punctuations_counter(tweet, ['!', '?', '…']) |
| 3.46 append punct['!'] to the exclamation list |
| 3.47 append punct['?'] to the questionmark list |
| (4) Extract the features and append them to the lists that were created initially. |
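The part-of-speech feature (steps 3.1–3.6) can be sketched in Python as follows. The function name `noun_verb_features` is illustrative; it assumes the tweet's tokens have already been tagged with `nltk.pos_tag`, and it takes the tagged pairs directly so the counting logic is self-contained:

```python
# Hypothetical helper for steps 3.1-3.6: count nouns and verbs in a
# POS-tagged tweet and normalize by the number of tokens.
NOUN_TAGS = {'NN', 'NNS', 'NNP', 'NNPS'}
VERB_TAGS = {'VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ'}

def noun_verb_features(tagged):
    """tagged: list of (word, tag) pairs, e.g. from nltk.pos_tag(tokens)."""
    if not tagged:
        return 0.0, 0.0
    noun_count = sum(1 for _, tag in tagged if tag in NOUN_TAGS)
    verb_count = sum(1 for _, tag in tagged if tag in VERB_TAGS)
    n = len(tagged)
    return noun_count / n, verb_count / n
```

In practice the tagged pairs would come from `nltk.pos_tag(nltk.word_tokenize(tweet))`, which requires the NLTK tagger model to be downloaded.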
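The intensifier feature (steps 3.7–3.13) and the sentiment-flip feature (steps 3.37–3.44) can be sketched as below. The function names are illustrative, and `polarity` is an assumed callable returning a score in [-1, 1] for a single word — in the algorithm this would wrap VADER's `SentimentIntensityAnalyzer().polarity_scores`:

```python
# Steps 3.7-3.13 (sketch): count intensifiers occurring in positive vs
# negative sentiment contexts.
def intensifier_counts(tokens, intensifier_list, polarity):
    pos_int = neg_int = 0
    for token in tokens:
        if token in intensifier_list:
            if polarity(token) < 0:
                neg_int += 1
            else:
                pos_int += 1
    return pos_int, neg_int

# Steps 3.37-3.44 (sketch): count positive words, negative words, and
# polarity flips (a positive word following a negative one, or vice versa).
def flip_counts(tokens, polarity):
    pos_count = neg_count = flip_count = 0
    prev = 0.0  # polarity of the last sentiment-bearing word seen
    for token in tokens:
        score = polarity(token)
        if score < 0:
            neg_count += 1
            if prev > 0:
                flip_count += 1
        elif score > 0:
            pos_count += 1
            if prev < 0:
                flip_count += 1
        if score != 0:
            prev = score
    return pos_count, neg_count, flip_count
```

Passing the analyzer in as a callable keeps the counting logic testable without the VADER lexicon loaded.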
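The skip-gram feature (steps 3.14–3.21) can be sketched as follows. A pure-Python stand-in for `nltk.skipgrams` is included so the sketch is self-contained; it emits every n-gram whose elements span at most j skipped tokens in total, which matches NLTK's behaviour for the cases used here. `polarity` is again an assumed per-word score callable, and `skipgram_score` is an illustrative name:

```python
from itertools import combinations

def skipgrams(sequence, n, k):
    """Stand-in for nltk.skipgrams(sequence, n, k): n-grams allowing up to
    k skipped tokens in total between the chosen positions."""
    seq = list(sequence)
    grams = []
    for i in range(len(seq)):
        # remaining n-1 positions must fit in a window of n-1+k tokens
        window = range(i + 1, min(len(seq), i + n + k))
        for combo in combinations(window, n - 1):
            grams.append(tuple([seq[i]] + [seq[j] for j in combo]))
    return grams

def skipgram_score(tokens, n, k, polarity):
    """Steps 3.14-3.21 (sketch): +1 per positive word, -1 per negative word,
    summed over every word of every skip-gram."""
    sk_value = 0
    for gram in skipgrams(tokens, n, k):
        for word in gram:
            score = polarity(word)
            if score > 0:
                sk_value += 1
            elif score < 0:
                sk_value -= 1
    return sk_value
```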
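The surface features — emoji sentiment (3.22–3.27), interjections (3.28–3.31), uppercase words (3.32–3.33), repeated letters (3.34–3.36), and punctuation counts (3.45–3.47) — can be sketched together. The function name and parameters are illustrative: `emoji_sentiment` stands for the loaded emoji dictionary and `interjections` for the loaded interjection list; the normalization of the emoji score (step 3.27) is left to the caller since its basis is not specified here:

```python
import re

# Step 3.34: a run of the same character; the {2,} quantifier enforces the
# minimum of 3 consecutive occurrences directly in the pattern.
REPEAT_RE = re.compile(r'(.)\1{2,}')

def surface_features(tweet, tokens, emoji_sentiment, interjections):
    # 3.22-3.27: sum emoji sentiment weighted by occurrence count
    emoji_score = sum(tweet.count(e) * s for e, s in emoji_sentiment.items())
    # 3.28-3.31: interjections present in the tweet
    interj = sum(1 for i in interjections if i in tokens)
    # 3.32-3.33: fully uppercase tokens
    uppercase = sum(1 for w in tokens if w.isupper())
    # 3.34-3.36: runs of 3+ repeated characters
    repeated = len(REPEAT_RE.findall(tweet))
    # 3.45-3.47: punctuation counts
    punct = {p: tweet.count(p) for p in ('!', '?', '…')}
    return emoji_score, interj, uppercase, repeated, punct
```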