Research Article
Similarity Digest Search: A Survey and Comparative Analysis of Strategies to Perform Known File Filtering Using Approximate Matching
Table 1
Similarity digest search strategies: characteristics.
| Strategy | Tools | Main technology | Input | Output (subject to threshold) | Match decision | Insert/remove elements | Owning database |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Brute force (sdhash) | sdhash | Bloom filters | sdhash digest | Digest | Bloom filter comparison | ✓/✓ | × |
| Brute force (ssdeep) | ssdeep | Rolling hash | ssdeep digest | Digest | Edit distance | ✓/✓ | × |
| Brute force (TLSH) | TLSH | LSH | TLSH digest | Digest | Header/body distance | ✓/✓ | × |
| DHTnil | Nilsimsa | DHT (Chord) + Voronoi diagram | Bit vector | Number of matches | Adapted Euclidean distance | ✓/✓ | × |
| iCTPH | ssdeep | DHT (Chord) + iDistance | ssdeep digest | Number of matches | Edit distance | ✓/✓ | × |
| F2S2 | ssdeep | Indexing (n-grams) + hash table | ssdeep digest | Candidates sharing the queried n-gram | Edit distance | ✓(*)/✓ | ✓ |
| MRSH-NET | sdhash, mrsh-v2 | Single, huge Bloom filter | Object features | Yes/No (consecutive features found in the filter) | Bloom filter matches | ×/× | ✓ |
| BF-based tree | sdhash, mrsh-v2 | Bloom filter tree structure | Object features | Candidate with the highest number of features found in the filter | Bloom filter matches | ×/× | ✓ |
| MRSH-CF | sdhash, mrsh-v2 | Cuckoo filter | Object features | Yes/No (consecutive features found in the filter) | Cuckoo filter matches | ×/✓ | ✓ |

Observation (*): the data set may grow beyond its real capacity, at the cost of performance.
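To make the brute force rows of Table 1 concrete, the following is a minimal sketch of a threshold-based brute force search over a database of digests. It is an illustration under stated assumptions, not the implementation used by ssdeep, sdhash, or TLSH: the names `brute_force_search`, `toy_score`, and `edit_distance` are hypothetical, and the score is a simple normalized edit distance chosen only to mirror the "Edit distance" match decision listed for ssdeep.

```python
from typing import Callable, Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance (insert, delete, substitute cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def toy_score(a: str, b: str) -> int:
    """Toy similarity in [0, 100]. Assumption: NOT the scoring used by ssdeep."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100
    # Integer arithmetic (floor) keeps the score deterministic.
    return 100 * (longest - edit_distance(a, b)) // longest


def brute_force_search(
    query: str,
    database: Dict[str, str],
    compare: Callable[[str, str], int],
    threshold: int,
) -> List[Tuple[str, int]]:
    """Compare the query digest against every stored digest (linear scan)
    and return the entries whose score reaches the threshold, best first."""
    hits = [(name, compare(query, digest)) for name, digest in database.items()]
    hits = [(name, score) for name, score in hits if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical digests; real digests would come from an approximate matching tool.
    db = {"a": "abcdefgh", "b": "abcdefgx", "c": "zzzzzzzz"}
    print(brute_force_search("abcdefgh", db, toy_score, threshold=80))
```

Every query touches every stored digest, so the cost grows linearly with the database size. Inserting or removing an element is just a dictionary update, which is consistent with the ✓/✓ entries for the brute force strategies.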