CBR-Based Decision Support Methodology for Cybercrime Investigation: Focused on the Data-Driven Website Defacement Analysis
Table 1
Case vector design, highlighting two groups of features.
| Case vector | Used in process: S (similarity) | Used in process: C (clustering) | Description |
|---|---|---|---|
| Encoding | O | O | Represents the different types of language information on the computer. It determines the usable characters and the methods used to express them. The feature was normalized based on the MS Windows and ISO character sets. |
| IP address | O | N/A | A unique number that allows devices on the network to identify and communicate with each other. |
| Domain: Service name | O | N/A | Each service name is created individually and differs depending on the service category, such as gTLD or ccTLD. |
| Domain: gTLD | O | O | The gTLD feature was normalized by merging elements that have the same meaning (e.g., .go, .gob, and .gobr were normalized into .gov). |
| Domain: ccTLD | O | O | A unique code assigned to the domain name that represents a country, a specific region, or an international organization. The ccTLD normalized by continent is used in the clustering process, and the original ccTLD is used in the similarity process. |
| Date | O | N/A | The date on which the hacker or hacking group performed the attack. |
| OS | O | O | The part of a computer system that manages all hardware and software (e.g., Windows, Linux, and UNIX). |
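
The two domain normalizations described in Table 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation. Only the .go, .gob, and .gobr to .gov aliases come from the table; the names `GTLD_ALIASES`, `CCTLD_CONTINENT`, `normalize_gtld`, `cctld_for_clustering`, and `cctld_for_similarity` are hypothetical, as are the continent lookup entries and the "Unknown" fallback.

```python
# Alias table for gTLD normalization. Only these three entries are taken
# from Table 1; a real table would presumably contain more aliases.
GTLD_ALIASES = {"go": "gov", "gob": "gov", "gobr": "gov"}

# Illustrative ccTLD-to-continent lookup (assumed, not the paper's mapping).
CCTLD_CONTINENT = {
    "kr": "Asia",
    "jp": "Asia",
    "de": "Europe",
    "fr": "Europe",
    "br": "South America",
    "us": "North America",
}


def _clean(label: str) -> str:
    """Lower-case a TLD label and drop surrounding spaces and leading dots."""
    return label.strip().lower().lstrip(".")


def normalize_gtld(label: str) -> str:
    """Map gTLD variants with the same meaning onto one canonical label."""
    key = _clean(label)
    return "." + GTLD_ALIASES.get(key, key)


def cctld_for_clustering(cctld: str) -> str:
    """Coarse form for the clustering process: ccTLD reduced to a continent."""
    return CCTLD_CONTINENT.get(_clean(cctld), "Unknown")


def cctld_for_similarity(cctld: str) -> str:
    """Fine-grained form for the similarity process: the original ccTLD."""
    return "." + _clean(cctld)
```

Keeping two separate ccTLD views mirrors the table: the continent-level value gives clustering a small number of categories, while the similarity step still compares the original country code.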