Research Article

CBR-Based Decision Support Methodology for Cybercrime Investigation: Focused on the Data-Driven Website Defacement Analysis

Table 1

Case vector design, highlighting two groups of features.

| Case vector | Used in process (S) | Used in process (C) | Description |
| --- | --- | --- | --- |
| Encoding | O | O | Represents the different types of language information on the computer and determines the usable characters and how they are expressed. The feature was normalized based on the MS Windows and ISO character sets |
| IP address | O | N/A | A unique number that allows devices on the network to identify and communicate with each other |
| Domain: service name | O | N/A | The service name is chosen individually and differs according to the service category, such as gTLD or ccTLD |
| Domain: gTLD | O | O | gTLD values with the same meaning were normalized into a single element (e.g., .go, .gob, and .gobr were normalized into .gov) |
| Domain: ccTLD | O | O | A unique code assigned to the domain name that represents a country, a specific region, or an international organization. The ccTLD normalized by continent is used in the clustering process, and the original ccTLD is used in the similarity measure |
| Date | O | N/A | The date on which the attack was performed by the hacker or hacking group |
| OS | O | O | The part of a computer system that manages all hardware and software (e.g., Windows, Linux, and UNIX) |

S, similarity measure; C, clustering processing.
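
The split between the two feature groups in Table 1 can be illustrated with a minimal Python sketch. It assumes a hypothetical `CaseVector` record and two hypothetical helper functions; the field names, the sample values, and the ccTLD-to-continent mapping are illustrative assumptions and are not taken from the article. Only the gTLD aliases (.go, .gob, .gobr mapped to .gov) come from the table itself, and encoding normalization is omitted.

```python
from dataclasses import dataclass

# gTLD variants listed in Table 1, mapped to their normalized element.
GTLD_ALIASES = {".go": ".gov", ".gob": ".gov", ".gobr": ".gov"}

# Illustrative ccTLD-to-continent mapping (an assumption, not the article's table).
CCTLD_CONTINENT = {
    ".kr": "Asia",
    ".jp": "Asia",
    ".de": "Europe",
    ".br": "South America",
}


@dataclass(frozen=True)
class CaseVector:
    """Hypothetical container for the features listed in Table 1."""
    encoding: str
    ip_address: str
    service_name: str
    gtld: str
    cctld: str
    date: str
    os: str


def normalize_gtld(gtld: str) -> str:
    """Map gTLD variants with the same meaning onto one element."""
    key = gtld.lower()
    return GTLD_ALIASES.get(key, key)


def similarity_features(case: CaseVector) -> dict:
    """Features marked O in column S: all of them, with the original ccTLD."""
    return {
        "encoding": case.encoding,
        "ip_address": case.ip_address,
        "service_name": case.service_name,
        "gtld": normalize_gtld(case.gtld),
        "cctld": case.cctld.lower(),
        "date": case.date,
        "os": case.os,
    }


def clustering_features(case: CaseVector) -> dict:
    """Features marked O in column C, with the ccTLD reduced to a continent."""
    return {
        "encoding": case.encoding,
        "gtld": normalize_gtld(case.gtld),
        "cctld": CCTLD_CONTINENT.get(case.cctld.lower(), "Unknown"),
        "os": case.os,
    }


# Sample values below are made up for illustration only.
example = CaseVector(
    encoding="UTF-8",
    ip_address="192.0.2.10",
    service_name="example",
    gtld=".gob",
    cctld=".br",
    date="2015-03-01",
    os="Linux",
)
print(clustering_features(example))
print(similarity_features(example))
```

Keeping the two projections as separate functions mirrors the table: the clustering view drops the features marked N/A in column C (IP address, service name, and date) and coarsens the ccTLD, while the similarity view keeps every feature at its original granularity.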