Research Article

An Embedded-Based Weighted Feature Selection Algorithm for Classifying Web Document

Algorithm 2

Embedded-based weighted feature selection algorithm.
1: procedure FeatureSelection
2:  BOW ⟵ Bag-of-words representation of the corpus
3:  L ⟵ Set of Leaders from Algorithm 1
4:  G ⟵ The graph model from Algorithm 1
5:  begin:
6:  W={}
7:  foreach word w in BOW do
8:   if w ∈ HEF (such as link text, headings, meta information, image descriptions, etc.) then
9:    W[w] = BOW[w] × 1.5
10:   else
11:    W[w] = BOW[w]
12:  DocClass={} - Represents the document-class vector; every document is initialized to 1 for every class.
13:  foreach leader l in L do
14:   class, prob = get the class and probability of l from the classifier
15:   update DocClass, value=l, class and prob, weight=1.5
16:   foreach neighbor n of l in G do
17:    update DocClass, value=n, class and prob, weight=1.5
18:  foreach document d in the corpus do
19:   class, prob = get the class and probability of d from the classifier
20:   update DocClass, value=d, class and prob, weight=1
21:   Assign to d the class with the highest prob × weight value.
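
The procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `classify` callable (which returns a class and a probability), and the dictionary form of the graph are hypothetical. The listing does not say how an update combines values, so the sketch assumes each update adds prob × weight to the document's score for that class, and that neighbors receive the class and probability of their leader.

```python
HEF_WEIGHT = 1.5  # boost the listing applies to HEF words, leaders, and their neighbors


def weight_features(bow, hef_words):
    """Steps 6-11: multiply the count of each HEF word by 1.5, keep the rest unchanged."""
    return {w: count * HEF_WEIGHT if w in hef_words else count
            for w, count in bow.items()}


def assign_classes(docs, leaders, graph, classify, classes):
    """Steps 12-21: accumulate weighted votes and pick the best class per document.

    classify(x) is a hypothetical callable returning (class, probability).
    graph maps each leader to its neighbor documents; leaders and neighbors
    are assumed to be members of docs.
    """
    # Step 12: every document starts at 1 for every class.
    doc_class = {d: {c: 1.0 for c in classes} for d in docs}

    def update(doc, cls, prob, weight):
        # Assumption: an update adds prob * weight to the score of (doc, cls).
        doc_class[doc][cls] += prob * weight

    # Steps 13-17: a leader's prediction is applied to itself and to its neighbors.
    for leader in leaders:
        cls, prob = classify(leader)
        update(leader, cls, prob, HEF_WEIGHT)
        for neighbor in graph.get(leader, ()):
            update(neighbor, cls, prob, HEF_WEIGHT)

    # Steps 18-20: each document's own prediction counts with weight 1.
    for d in docs:
        cls, prob = classify(d)
        update(d, cls, prob, 1.0)

    # Step 21: assign the class with the highest accumulated score.
    return {d: max(scores, key=scores.get) for d, scores in doc_class.items()}
```

With additive updates, the initial value of 1 shifts every class score equally and does not affect which class wins; under a multiplicative reading of the update step the initialization would matter, so that choice should be checked against the full paper.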