Research Article
Edge-Based Detection and Classification of Malicious Contents in Tor Darknet Using Machine Learning
Input: Web Set
Output: Corpus set ([0], [])
(1) for each page in Web Set do
(2)     content = obtain the HTML content of the page
(3)     use the “lxml()” function to parse the content, then remove HTML tags, scripts, and so on
(4)     text = preserve the text content displayed on the page
(5)     for each character in text do
(6)         if the character = ‘’ or the character = ‘’ then
(7)             the character = ‘’
(8)         end if
(9)     end for
(10)    Lowercase all English words
(11)    for each character in text do
(12)        if the character is a punctuation mark or a number then
(13)            the character = ‘’
(14)        end if
(15)    end for
(16)    PorterStemmer(text) // Unify all word formats
(17)    word_list(, …) = text.split(‘’)
(18)    for each index i of word_list do
(19)        if word_list[i] is in stopwords(, …) or len(word_list[i]) is outside the range 2 to 12 then
(20)            delete word_list[i]
(21)        end if
(22)    end for
(23)    SET word_list(, …).join(‘’) // Words are concatenated into strings
(24) end for
(25) return Corpus set
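The preprocessing steps listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the standard-library `html.parser` in place of lxml so that it runs without extra packages, it assumes the characters replaced in steps (5) to (9) are newlines and tabs, it replaces punctuation and digits with a space so that adjacent words do not fuse, and it treats the length bounds 2 and 12 as inclusive. The names `preprocess`, `build_corpus`, and `STOPWORDS` are hypothetical, and the stopword list is a deliberately tiny placeholder. Stemming is left as a pluggable `stem` argument; NLTK's `PorterStemmer().stem` could be passed there if NLTK is installed.

```python
import string
from html.parser import HTMLParser

# Hypothetical, deliberately tiny stopword list (assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "on", "for"}


class _VisibleText(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth counter for script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def preprocess(html, stem=lambda w: w, stopwords=STOPWORDS, min_len=2, max_len=12):
    """Turn one HTML page into a cleaned, space-joined string of words."""
    # Steps (2)-(4): parse the page and keep only the displayed text.
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts)

    # Steps (5)-(9): assumed to replace newlines and tabs with spaces.
    text = text.replace("\n", " ").replace("\t", " ")

    # Step (10): lowercase.
    text = text.lower()

    # Steps (11)-(15): replace punctuation and digits with spaces (assumption).
    drop = string.punctuation + string.digits
    text = text.translate(str.maketrans(drop, " " * len(drop)))

    # Steps (16)-(17): split into words and stem each one.
    words = [stem(w) for w in text.split()]

    # Steps (18)-(22): drop stopwords and words outside the assumed length range.
    kept = [w for w in words if w not in stopwords and min_len <= len(w) <= max_len]

    # Step (23): concatenate the remaining words into one string.
    return " ".join(kept)


def build_corpus(pages, **kwargs):
    """Apply preprocess to every page in the web set (outer loop, steps 1-25)."""
    return [preprocess(page, **kwargs) for page in pages]
```

Building a filtered list, rather than deleting entries from `word_list` while iterating over it by index, avoids skipping elements when the list shrinks during the loop.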