Research Article

Alternate Low-Rank Matrix Approximation in Latent Semantic Analysis

Algorithm 1

The LSA modeling procedure.
% input:
 % monolingual textual corpus
% output:
 % graph of terms and documents
% read the corpus, parse it, and execute the morphological step,
% weight the terms and obtain the term-document matrix
doc_file = 'files/document.txt';
stopword_file = 'files/stopword_en.txt';
A = doc_term_mat(doc_file, stopword_file, weight);
% apply the SVD decomposition
[W, Sigma, Y] = svd(A);
% obtain the components of A_k by the SVD
W_k = W(:, 1 : k);
Sigma_k = Sigma(1 : k, 1 : k);
Y_k = Y(:, 1 : k);
% apply the TULVD
[U, L, V] = TULV(A);
% obtain the components of A_k by the TULVD
U_k = U; L_k = L; V_k = V;
% find term and document vectors in k-space
term_vec = U_k * L_k;
doc_vec = L_k * transpose(V_k);
% represent the query q in k-space
query_vec = query(q, U_k, L_k);
% find semantic relationship in the corpus using cosine similarity
semantic_sim = cosine_sim(query_vec, doc_vec);
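The steps of Algorithm 1 can be sketched in Python with NumPy, using the truncated SVD branch of the pipeline (the TULVD step is omitted here). The toy term-document matrix, the rank `k`, the query vector, and the helper name `lsa_pipeline` are illustrative assumptions, not part of the original algorithm:

```python
import numpy as np

def lsa_pipeline(A, k, q):
    """Rank-k LSA sketch: truncate the SVD of the term-document
    matrix A, fold the query q into the k-dimensional space, and
    score each document by cosine similarity."""
    # SVD of the term-document matrix (terms x documents)
    W, sigma, Yt = np.linalg.svd(A, full_matrices=False)
    # keep the k leading components (W_k, Sigma_k, Y_k of Algorithm 1)
    W_k = W[:, :k]
    Sigma_k = np.diag(sigma[:k])
    Y_k = Yt[:k, :].T
    # document vectors in k-space, one row per document
    doc_vecs = Y_k @ Sigma_k
    # fold the query term vector into k-space
    query_vec = q @ W_k
    # cosine similarity between the query and every document
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    return doc_vecs @ query_vec / np.maximum(norms, 1e-12)

# assumed toy data: 4 terms x 3 documents, query over terms 1 and 3
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
q = np.array([1.0, 0.0, 1.0, 0.0])
sims = lsa_pipeline(A, k=2, q=q)
```

`sims` holds one cosine score per document, so ranking the corpus against the query is a single `np.argsort` over this vector.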