Research Article

Alternate Low-Rank Matrix Approximation in Latent Semantic Analysis

Algorithm 1

The LSA modeling procedure.
% input:
 % monolingual textual corpus
% output:
 % graph of terms and documents
% read the corpus, parse it, and execute the morphological step,
% weight the terms and obtain the term-document matrix
doc_file = 'files/document.txt';
stopword_file = 'files/stopword_en.txt';
A = doc_term_mat(doc_file, stopword_file, weight);
% apply the SVD decomposition
[W, Sigma, Y] = svd(A);
% obtain the components of A_k by the SVD
W_k = W(:, 1 : k);
Sigma_k = Sigma(1 : k, 1 : k);
Y_k = Y(:, 1 : k);
% apply the TULVD
[U, L, V] = TULV(A);
% obtain the components of A_k by the TULVD
U_k = U; L_k = L; V_k = V;
% find term and document vectors in k-space
term_vec = U_k * L_k;
doc_vec = L_k * transpose(V_k);
% represent the query q in k-space
query_vec = query(q, U_k, L_k);
% find semantic relationship in the corpus using cosine similarity
semantic_sim = cosine_sim(query_vec, doc_vec);
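The steps of Algorithm 1 can be sketched in Python with NumPy, using the truncated SVD branch of the pipeline (the TULVD step is omitted here). The toy term-document matrix, the rank `k`, the query vector, and the helper name `lsa_pipeline` are illustrative assumptions, not part of the original algorithm:

```python
import numpy as np

def lsa_pipeline(A, k, q):
    """Rank-k LSA sketch: truncate the SVD of the term-document
    matrix A, fold the query q into the k-dimensional space, and
    score each document by cosine similarity."""
    # SVD of the term-document matrix (terms x documents)
    W, sigma, Yt = np.linalg.svd(A, full_matrices=False)
    # keep the k leading components (W_k, Sigma_k, Y_k of Algorithm 1)
    W_k = W[:, :k]
    Sigma_k = np.diag(sigma[:k])
    Y_k = Yt[:k, :].T
    # document vectors in k-space, one row per document
    doc_vecs = Y_k @ Sigma_k
    # fold the query term vector into k-space
    query_vec = q @ W_k
    # cosine similarity between the query and every document
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    return doc_vecs @ query_vec / np.maximum(norms, 1e-12)

# assumed toy data: 4 terms x 3 documents, query over terms 1 and 3
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
q = np.array([1.0, 0.0, 1.0, 0.0])
sims = lsa_pipeline(A, k=2, q=q)
```

`sims` holds one cosine score per document, so ranking the corpus against the query is a single `np.argsort` over this vector.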