Research Article

Phishing Target Identification Based on Neural Networks Using Category Features and Images

Table 1

Features for phishing target identification.

Features | No. | Feature identifier | Description

URL features | F1 | Scheme | Scheme of URL (HTTP or HTTPS)
URL features | F2 | Domain | Domain of URL
URL features | F3 | top_domain | Top-level domain of URL
URL features | F4 | second_domain | Second-level domain of URL
URL features | F5 | domain_level | Depth of domain level
URL features | F6 | domain_len | Length of domain
URL features | F7 | behind_domain_len | Length of path
URL features | F8 | dash_count | Number of “-” in URL
URL features | F9 | num_count | Number of digits in URL
URL features | F10 | slash_count | Number of “/” in URL
URL features | F11 | special_symbol_count | Number of “@”, “_”, “%”, and “#” characters in URL
URL features | F12 | top_char | The character that appears most frequently in URL
URL features | F13 | top_symbol | The symbol that appears most frequently in URL
URL features | F14 | sens_words_url | Sensitive words (i.e., “secure,” “account,” “login,” “signing,” and “confirm”) in URL
URL features | F15 | url_word_top3 | Top three words with the highest word frequency in URL

Host features | F16 | valid_days | Valid days of domain
Host features | F17 | registrant_country | Registrant country of domain
Host features | F18 | A | IP in A record for domain (A.B.C.D)
Host features | F19 | A_1 | IP in A record for domain (A.B.C)
Host features | F20 | A_2 | IP in A record for domain (A.B)
Host features | F21 | A_IP_num | Number of IPs in A record for domain
Host features | F22 | CNAME | CNAME in CNAME record for domain

Web resources’ features | F23 | tag_count | Number of specific tags in HTML source code (i.e., “link,” “script,” “img,” and “form”)
Web resources’ features | F24 | sens_words_html | Sensitive words (i.e., “secure,” “account,” “login,” “signing,” and “confirm”) in HTML text
Web resources’ features | F25 | brand_words_html | Brand names in HTML text
Web resources’ features | F26 | tfidf_top3 | Top three words with the highest tf-idf in HTML text
Web resources’ features | F27 | html_text_symbol | Number of symbols in HTML text (Unicode FF00-FFEF)
Web resources’ features | F28 | icon_str | Hex string converted from the .ico file

OCR features | F29 | sens_words_ocr | Sensitive words in web page OCR results
OCR features | F30 | brand_words_ocr | Brand names in web page OCR results
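
To make some of the definitions in Table 1 concrete, the following is a minimal Python sketch of how a subset of the URL features (F1, F2, F5 to F11, and F14) and the A-record prefixes (F18 to F20) might be computed. It is an illustration under stated assumptions, not the extraction code used in this work: the function names `url_features` and `a_record_prefixes` are hypothetical, the slash count assumes the counted character is “/”, the domain depth is assumed to be the number of dot-separated labels, and F7 is taken as the length of the URL path, following the table's description.

```python
from urllib.parse import urlsplit

# Sensitive-word list as given in Table 1 (F14).
SENSITIVE_WORDS = ("secure", "account", "login", "signing", "confirm")


def url_features(url: str) -> dict:
    """Compute a subset of the Table 1 URL features (hypothetical helper)."""
    parts = urlsplit(url)
    domain = parts.hostname or ""
    lowered = url.lower()
    return {
        "scheme": parts.scheme,                                  # F1
        "domain": domain,                                        # F2
        # Assumption: depth = number of dot-separated labels.
        "domain_level": domain.count(".") + 1 if domain else 0,  # F5
        "domain_len": len(domain),                               # F6
        # Assumption: "length of path" means the URL path component only.
        "behind_domain_len": len(parts.path),                    # F7
        "dash_count": url.count("-"),                            # F8
        "num_count": sum(ch.isdigit() for ch in url),            # F9
        "slash_count": url.count("/"),                           # F10
        "special_symbol_count": sum(url.count(s) for s in "@_%#"),  # F11
        "sens_words_url": [w for w in SENSITIVE_WORDS if w in lowered],  # F14
    }


def a_record_prefixes(ip: str) -> dict:
    """Split an IPv4 address into the A, A_1, and A_2 features (F18 to F20)."""
    octets = ip.split(".")
    return {
        "A": ".".join(octets[:4]),    # A.B.C.D
        "A_1": ".".join(octets[:3]),  # A.B.C
        "A_2": ".".join(octets[:2]),  # A.B
    }


if __name__ == "__main__":
    print(url_features("https://secure-login.example.com/account/verify?id=123"))
    print(a_record_prefixes("93.184.216.34"))
```

Keeping the three IP prefixes as separate string values treats each as its own category, so that hosts sharing only a network prefix can still share a feature value.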