Research Article

Stock Price Prediction Based on Natural Language Processing

Algorithm 1

Experiment methodology.
Input initial seed keywords from the literature
Stage 1: BERT word vector similarity selection
(1) Initialize empty similar words vocabulary
(2) For each seed keyword do
(3)  Collect corresponding Baidu Baike text
(4)  Construct potential keywords vocabulary based on JIEBA segmentation
(5)  Vectorize seed keyword and potential keywords in vocabulary based on BERTvec
(6)  For each keyword in potential keywords vocabulary do
(7)   Calculate cosine similarity score between the seed keyword and the potential keyword
(8)   If the similarity score exceeds the threshold then
(9)    Add the potential keyword to similar words vocabulary
(10)  End for
(11) End for
(12) Output similar words vocabulary
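
As a rough illustration of the Stage 1 filter, the following Python sketch scores each potential keyword against its seed keyword by cosine similarity and keeps those above a threshold. The function names, the toy three-dimensional vectors (standing in for BERT word vectors), the threshold value of 0.8, and the strict "greater than" comparison are all assumptions made for illustration; none of them is taken from the experiment.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def select_similar_words(seed_vectors, candidates_by_seed, threshold):
    """Keep candidates whose similarity to their seed keyword exceeds the threshold.

    seed_vectors:       {seed keyword: vector}
    candidates_by_seed: {seed keyword: {potential keyword: vector}}
    Returns {potential keyword: best similarity score seen}.
    """
    similar = {}
    for seed, seed_vec in seed_vectors.items():
        for word, vec in candidates_by_seed.get(seed, {}).items():
            score = cosine_similarity(seed_vec, vec)
            if score > threshold:  # assumed direction of the comparison
                similar[word] = max(score, similar.get(word, score))
    return similar


# Toy vectors for illustration only; real inputs would be BERT word vectors.
seeds = {"stock": [1.0, 0.0, 0.0]}
candidates = {"stock": {"share": [0.9, 0.1, 0.0], "weather": [0.0, 1.0, 0.0]}}
similar_words = select_similar_words(seeds, candidates, threshold=0.8)
```

In this toy run only "share" survives the filter, since "weather" is orthogonal to the seed vector.
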
Stage 2: NEZHA word importance selection
(13) Initialize empty similar and important vocabulary
(14) Collect data from the CLUE data set in the form of (keywords, text)
(15) Randomly select words from text as pseudo-keywords at a ratio of 1:1
(16) Build fine-tuning data set of (keyword/pseudo-keyword, text, label) samples
(17) Construct training set and development set from the fine-tuning data set
(18) Fine-tune BERT-TensorFlow, BERT-MindSpore, and NEZHA-MindSpore on the training set
(19) Select the best-performing model (NEZHA-MindSpore) by precision on the development set
(20) For each keyword in similar words vocabulary do
(21)  Calculate context importance score based on the selected model
(22)  Add the keyword and its importance score to similar and important vocabulary
(23) End for
(24) Keep words with the top 100 importance scores in vocabulary
Output similar and important vocabulary
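
The data preparation and final ranking in Stage 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the text is taken to be already segmented into tokens, pseudo-keywords are drawn only from tokens that are not true keywords, true keywords are labeled 1 and pseudo-keywords 0, and the helper names `build_finetune_dataset` and `top_k_by_importance` are hypothetical. The fine-tuning of the BERT and NEZHA models themselves is not shown.

```python
import random


def build_finetune_dataset(samples, seed=0):
    """Turn (keywords, tokens) pairs into (word, text, label) triples.

    For each sample, as many pseudo-keywords as true keywords are drawn
    (a 1:1 ratio) whenever the text has enough non-keyword tokens.
    """
    rng = random.Random(seed)
    dataset = []
    for keywords, tokens in samples:
        keyword_set = set(keywords)
        # dict.fromkeys removes duplicate tokens while keeping their order.
        non_keywords = [t for t in dict.fromkeys(tokens) if t not in keyword_set]
        k = min(len(keywords), len(non_keywords))
        pseudo_keywords = rng.sample(non_keywords, k)
        text = " ".join(tokens)
        dataset.extend((w, text, 1) for w in keywords)         # assumed label: 1
        dataset.extend((w, text, 0) for w in pseudo_keywords)  # assumed label: 0
    return dataset


def top_k_by_importance(scores, k=100):
    """Keep the k words with the highest importance scores."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:k])


# Toy sample for illustration only; real samples would come from CLUE.
samples = [(["stock", "index"], ["the", "stock", "index", "rose", "sharply", "today"])]
finetune_data = build_finetune_dataset(samples)
```

With `k=100`, `top_k_by_importance` corresponds to the final filtering step, applied to importance scores produced by the selected model.
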
Stage 3: LSTM stock index forecast
(25) For each keyword in similar and important vocabulary do
(26)  For lagging term in 1 to 10 do
(27)   Calculate lagged search index time series
(28)  End for
(29)  Use Pearson correlation coefficient to select the most related lagged term
(30) End for
(31) Train LSTM to forecast CSI300 stock index on the 2215-day training data set
(32) Calculate and compare model RMSE on the 243-day test data set
Output model RMSE
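
The lag selection and the evaluation metric in Stage 3 can be illustrated with the short Python sketch below. It assumes that "most related" means the largest absolute Pearson correlation, that a lag of d days pairs the search index on day t - d with the stock index on day t, and that both series are equal-length daily lists. The function names are hypothetical, and the LSTM itself is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # convention chosen here for constant series
    return cov / math.sqrt(var_x * var_y)


def best_lag(search_index, stock_index, max_lag=10):
    """Return (lag, r) for the lag in 1..max_lag with the largest |r|."""
    best = None
    for lag in range(1, max_lag + 1):
        # Pair search_index[t - lag] with stock_index[t].
        r = pearson(search_index[:-lag], stock_index[lag:])
        if best is None or abs(r) > abs(best[1]):
            best = (lag, r)
    return best


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic series for illustration: the "stock" series copies the
# "search" series with a three-day delay.
search = [(i * 7) % 11 for i in range(60)]
stock = [0, 0, 0] + search[:-3]
lag, r = best_lag(search, stock)
```

On this synthetic pair the three-day lag is recovered because the shifted series match exactly; real search index data would give weaker correlations, and the selected lag would differ from keyword to keyword.
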