Research Article

An Efficient Mechanism for Deep Web Data Extraction Based on Tree-Structured Web Pattern Matching

Algorithm 2

Inputs: web database WD, data records DR, data items DI, and users U.
Step 1: Start with the set of collected input web pages.
Step 2: Construct a tree T.
Step 3: For each tree representation T:
Step 4: Create a matrix M.
Step 5: Determine the presence of content similarity among the web pages.
Step 6: Traverse the nodes from left to right, or vice versa.
Step 7: End For.
Step 8: Form a set of rules R.
Step 9: For each rule in R:
Step 10: Form a vector v in WP.
Step 11: Determine whether a schema is present.
Step 12: End For.
Step 13: Distinguish between the exact representation and the leaf child nodes.
Step 14: End.
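Steps 1 to 7 of Algorithm 2 can be illustrated with a minimal Python sketch. It assumes that the tree T is the tag tree of each page and that the matrix M holds pairwise structural similarities between page trees. This part of the paper does not name a similarity measure, so the sketch substitutes the classic Simple Tree Matching (STM) algorithm, which aligns child nodes from left to right using dynamic programming. The names `build_tree`, `simple_tree_match`, and `similarity_matrix` are hypothetical, and the code is not the authors' implementation.

```python
from html.parser import HTMLParser

# Elements that never take an end tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "wbr"}


class Node:
    """A node of the tag tree T; text content is stored in '#text' leaves."""

    def __init__(self, tag, text=""):
        self.tag = tag
        self.text = text
        self.children = []

    def size(self):
        return 1 + sum(child.size() for child in self.children)


class TreeBuilder(HTMLParser):
    """Builds a tag tree from an HTML string (Step 2)."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest matching open element; ignore unmatched end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].children.append(Node("#text", data.strip()))


def build_tree(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def simple_tree_match(a, b):
    """Simple Tree Matching: size of the largest order-preserving common
    subtree, aligning children left to right by dynamic programming.
    Only tags are compared, so '#text' leaves match regardless of content."""
    if a.tag != b.tag:
        return 0
    m, n = len(a.children), len(b.children)
    w = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w[i][j] = max(w[i - 1][j], w[i][j - 1],
                          w[i - 1][j - 1]
                          + simple_tree_match(a.children[i - 1], b.children[j - 1]))
    return w[m][n] + 1


def similarity_matrix(trees):
    """Matrix M: normalized STM score in [0, 1] for every pair of page trees."""
    return [[2 * simple_tree_match(s, t) / (s.size() + t.size()) for t in trees]
            for s in trees]


pages = [
    "<html><body><div><h2>Book A</h2><span>$10</span></div></body></html>",
    "<html><body><div><h2>Book B</h2><span>$12</span></div></body></html>",
    "<html><body><table><tr><td>Login</td></tr></table></body></html>",
]
trees = [build_tree(p) for p in pages]
M = similarity_matrix(trees)
```

With these inputs, the two product pages score 1.0 against each other because they share the same tag structure. The login page scores 0.4 against either of them. A threshold on M could then separate pages that were generated from a common template. STM is used here only because it is a standard tree-similarity measure whose results are easy to verify.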
Algorithm 2 outlines the complete process of identifying schemas and templates in order to improve data extraction from deep web pages.
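The rule-based part of Algorithm 2 (Steps 8 to 13) can be illustrated with a minimal Python sketch. The sketch rests on the following assumptions, none of which come from the paper:

- The pages are well-formed XHTML, so Python's `xml.etree.ElementTree` can parse them.
- Each page's vector v is a mapping from leaf tag paths to text.
- "A schema is present" means that all pages share the same set of leaf paths.
- A single simple rule stands in for the unspecified rule set R: leaf text that is identical on every page belongs to the template, and leaf text that varies is a data item.

The names `leaf_vector`, `has_common_schema`, and `split_template_and_data` are hypothetical.

```python
import xml.etree.ElementTree as ET


def leaf_vector(xhtml):
    """Vector v for one page: maps each leaf element's tag path to its text.
    Assumes well-formed XHTML; mixed content and tail text are ignored."""
    root = ET.fromstring(xhtml)
    vector = {}

    def walk(elem, path):
        children = list(elem)
        if not children:
            vector[path] = (elem.text or "").strip()
            return
        for idx, child in enumerate(children):
            walk(child, f"{path}/{child.tag}[{idx}]")

    walk(root, root.tag)
    return vector


def has_common_schema(vectors):
    """Hypothetical schema test: every page exposes the same set of leaf paths."""
    if not vectors:
        return False
    first = set(vectors[0])
    return all(set(v) == first for v in vectors[1:])


def split_template_and_data(vectors):
    """Hypothetical rule: leaf text constant across pages is template,
    leaf text that varies is a data item (one value per page)."""
    template, data = {}, {}
    for path in vectors[0]:
        values = [v[path] for v in vectors]
        if len(set(values)) == 1:
            template[path] = values[0]
        else:
            data[path] = values
    return template, data


pages = [
    "<div><h2>Book A</h2><span>Price:</span><b>$10</b></div>",
    "<div><h2>Book B</h2><span>Price:</span><b>$12</b></div>",
]
vectors = [leaf_vector(p) for p in pages]
schema_found = has_common_schema(vectors)
template, data = split_template_and_data(vectors) if schema_found else ({}, {})
```

In this example, `template` keeps the constant label `Price:`, and `data` collects the varying title and price from each page. These varying leaf child nodes are the candidate data items.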