Research Article
Effective and Fast Near Duplicate Detection via Signature-Based Compression Metrics
Algorithm 1
SigNCD duplicate detection.
Require: document list; similarity threshold; number of threads; compressor.
Ensure: duplicate set.

function DUPDETECT()
    [...], [...]
    for all documents in the document list, using the threads in parallel, do
        preprocess the document to filter out noisy information
        compute the signature of the preprocessed document
        compute the length of the compressed [...]
    end for
    sort all [...] by compressed length in ascending order
    for all [...] in the sorted list, using the threads in parallel, do
        if [...] is in the duplicate set then
            continue
        end if
        [...]
    end for
    return the duplicate set
end function

function [...]([...], [...], [...])
    [...]
    find the index of the boundary object of the matching partition of [...] on [...]
    for all [...] in [...] do
        if [...] is in [...] then
            continue
        end if
        [...]
        if [...] then
            [...]
            [...]
        end if
    end for
    return [...]
end function
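The control flow of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: the noise filter in `preprocess`, the placeholder `signature` (the preprocessed text itself), the use of `zlib` as the compressor, and the boundary rule in `find_dup`. That rule stops scanning the sorted list once the compressed lengths alone force the distance above the threshold, which presumes the compressor is roughly monotone. The distance used is the standard normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)). Only the first loop is threaded here; the second loop is kept sequential for brevity.

```python
import re
import zlib
from concurrent.futures import ThreadPoolExecutor


def preprocess(text):
    # Assumed noise filter: lowercase, collapse non-alphanumerics to spaces.
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def signature(text):
    # Placeholder signature (an assumption): the preprocessed text as bytes.
    return text.encode("utf-8")


def compressed_len(data):
    # Assumed compressor: zlib at its highest compression level.
    return len(zlib.compress(data, 9))


def ncd(x, y, cx, cy):
    # Normalized compression distance, given the two compressed lengths.
    return (compressed_len(x + y) - min(cx, cy)) / max(cx, cy)


def prepare(doc):
    # Per-document work of the first loop: signature and compressed length.
    sig = signature(preprocess(doc))
    return sig, compressed_len(sig)


def find_dup(pos, order, sigs, lens, threshold, duplicates):
    # Compare the object at sorted position `pos` with the longer ones after it.
    i = order[pos]
    found = set()
    for j in order[pos + 1:]:
        # Assumed boundary rule: if C(xy) >= max(C(x), C(y)), then
        # NCD >= 1 - C(x)/C(y), so later (longer) objects cannot match.
        if lens[j] * (1 - threshold) > lens[i]:
            break
        if j in duplicates:
            continue
        if ncd(sigs[i], sigs[j], lens[i], lens[j]) <= threshold:
            found.add(j)
    return found


def dup_detect(docs, threshold, threads=4):
    # First loop: preprocess, sign, and compress each document in parallel.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        prepared = list(pool.map(prepare, docs))
    sigs = [p[0] for p in prepared]
    lens = [p[1] for p in prepared]
    # Sort document indices by compressed length, ascending (stable sort).
    order = sorted(range(len(docs)), key=lens.__getitem__)
    duplicates = set()
    for pos, i in enumerate(order):
        if i in duplicates:
            continue
        duplicates |= find_dup(pos, order, sigs, lens, threshold, duplicates)
    return duplicates
```

In this sketch `dup_detect` returns the indices of the documents flagged as duplicates, keeping the first object of each group (in sorted order) as its representative. Sorting by compressed length is what makes the early `break` possible: every candidate after the boundary is at least as long as the one that failed the length test.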