Computational Intelligence and Neuroscience
Volume 2018, Article ID 6305246, 15 pages
https://doi.org/10.1155/2018/6305246
Research Article

Using Internet Search Trends and Historical Trading Data for Predicting Stock Markets by the Least Squares Support Vector Regression Model

¹Department of Information Management, National Chi Nan University, 1 University Rd., Puli, Nantou 54561, Taiwan
²Department of Information Management, Lunghwa University of Science and Technology, No. 300 Sec. 1, Wanshou Rd., Guishan District, Taoyuan 33306, Taiwan
³Institute of Innovation and Circular Economy, Asia University, Taichung 41354, Taiwan

Correspondence should be addressed to Ping-Feng Pai; paipf@ncnu.edu.tw

Received 9 January 2018; Revised 23 May 2018; Accepted 14 June 2018; Published 24 July 2018

Academic Editor: Paolo Gastaldo

Copyright © 2018 Ping-Feng Pai et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Historical trading data, which are inevitably associated with the framework of causality both financially and theoretically, have been widely used to predict stock market values. With the popularity of social networking and Internet search tools, the ways of collecting information have diversified. Beyond theoretical causality alone, data relations have become increasingly important in forecasting. Thus, the aim of this study was to investigate the performance of forecasting stock markets with data from Google Trends, historical trading data (HTD), and hybrid data. The keywords employed for Google Trends are collected in three different ways: users’ definitions (GTU), trending searches of Google Trends (GTTS), and tweets (GTT). The hybrid data combine Internet search trends from Google Trends with historical trading data. In addition, the correlation-based feature selection (CFS) technique is used to select independent variables, and a one-step-ahead policy is adopted by the least squares support vector regression (LSSVR) model for predicting stock markets. Numerical experiments indicate that using hybrid data provides more accurate forecasting results than using historical trading data alone or Google Trends data alone. Thus, using hybrid data of Internet search trends and historical trading data with LSSVR models is a promising alternative for forecasting stock markets.

1. Introduction

With recent advances in the Internet and communication technologies, the growing amount of data from social networks has changed the ways in which data are collected and analyzed. Google Trends (http://www.google.com/trends) can be used to track the search trends of keywords, and data from Google Trends have been applied in many fields, such as economics, elections, and medicine. Compared with structured data, data collected from social networks offer another way to depict the issues concerned, and thus may reveal interesting and essential insights that traditional data collection does not capture. Stock markets have been hard to predict ever since they began, yet they have profound effects on a country. In the past, the forecasting of stock markets relied heavily on historical trading data, and most forecasting models using historical trading data are grounded in theoretical causality. With the widespread use of Internet search, people tend to seek information on the Internet and express opinions on social networks. Stephens-Davidowitz [1] indicated that when issues subject to social censoring are studied, Internet search behavior reflects what people really think better than survey data do, and such data can be obtained closer to real time [2–6]. However, the importance of historical trading data in forecasting stock market values should not be disregarded. This study therefore combines data from Google Trends with historical trading data to predict stock markets, and it compares the performance of the hybrid data with that of each single data type in forecasting stock market closing values.
Five stock market indices, namely, the Dow Jones Industrial Average Index (DJIA), Nasdaq Composite Index (IXIC), Russell 2000 Index (RUT), Standard & Poor’s 500 Index (S&P 500), and Chicago Board Options Exchange Volatility Index (VIX), and the stocks of three companies, the Apple corporation (APPL), the Alphabet corporation (GOOGL), and the Microsoft Corporation (MSFT), were forecasted by least squares support vector regression models with different data types. The rest of this article is organized as follows: Section 2 reviews the related work. Section 3 introduces the methods employed in this study. Section 4 illustrates the proposed stock-forecasting framework and numerical examples. Section 5 draws conclusions.

2. Related Work

Hassan [7] noted that predicting stock markets using complex calculations does not help much. The author proposed a forecasting technique combining the hidden Markov model and fuzzy concepts to predict stock markets. The results showed that the presented model outperformed the autoregressive integrated moving average model, the neural network model, and other hidden Markov models. Hadavandi et al. [8] claimed that a successful stock market forecasting technique is one that obtains accurate forecasting results with the smallest amount of input data and the simplest stock market model. Their study combined genetic fuzzy systems and neural networks to forecast stock prices of information technology companies and airline companies. In the data-preprocessing stage, stepwise regression analysis was used to select factors, and the self-organizing map approach was then employed to cluster the data. The experimental results showed that the proposed approach can obtain more accurate results than some other forecasting methods. Singh and Borah [9] designed a forecasting model consisting of fuzzy theory and the particle swarm optimization technique to predict stock markets by using historical data from the State Bank of India. The numerical results illustrated that the proposed forecasting model is superior to the grey model, artificial neural networks, and regression models.

Another trend in forecasting stock markets is to feed financial indicators into forecasting models. Laboissiere et al. [10] developed a model including correlation analysis and artificial neural networks to predict stock prices of Brazilian electric companies. In addition to the historical trading data, some indices such as the Ibovespa index, the Electric Power index, and the American dollar quote were employed to predict stock prices. The numerical results were promising in terms of forecasting accuracy. Lincy and John [11] presented a multiple fuzzy inference systems model to predict the prices of selected stocks on the Nasdaq stock exchange. Four indicators, Moving Average Convergence/Divergence, Relative Strength Index, Stochastic Oscillator, and Chaikin Oscillator, were used by the proposed model, and decision rules were generated by using fuzzy set theory and multicriteria decision-making approaches. Simulation results revealed that the presented model is an effective way to analyze stock prices in terms of profit return. de Oliveira et al. [12] used artificial neural networks to forecast Petrobras’ PETR4 stock with fundamental and technical factors that may influence stock markets. After the data-preprocessing procedure, the essential factors that remained were used as inputs to the artificial neural networks. This study reported that the testing accuracy of stock market directions was more than ninety percent. Göçken et al. [13] applied metaheuristics, which were employed to select essential indicators, together with artificial neural networks in stock price prediction. In addition, this study examined the suitable number of hidden neurons in the hidden layer in order to deal with the overfitting or underfitting problems of artificial neural networks. The results indicated that the proposed forecasting model was a dominant way to predict stock markets.

Because the use of social networks is booming, data from social networks offer valuable insights into what people think and want. Thus, these data have become more and more popular for collecting opinions and for forecasting. Stephens-Davidowitz [1] studied the relation between voting in the American presidential election and racially charged language. The author pointed out that Google search queries were more useful than survey data when issues subject to social censoring were investigated. The results showed that there was a relation between voting and the search queries of racial animus. Gunn III and Lester [5] employed Google Trends with three terms to analyze the relation between the three terms and monthly suicide rates. They reported that the information from Internet searches is correlated with the number of suicides, and thus, it is a faster way of monitoring possible suicide trends than compiling suicide statistics. Yang et al. [14] analyzed the relation between Internet search trends and suicide death. The conclusions revealed that suicide-related search terms were related to suicide death, and thus, keyword-driven Internet search results are essential knowledge for reducing suicide deaths. Frijters et al. [4] studied the relationship between macroeconomic conditions and an indicator of problem drinking derived from Google searches. The results showed that macroeconomic conditions are associated with health in some ways, and the real-time data provided by Google searches are crucial information for policy-makers. Smith [15] investigated volatility in forecasting foreign currency exchange rates by using three Google search keywords and time-series models. The results demonstrated that the information from Google searches is important in forecasting the market for foreign currency. Fondeur and Karamé [16] used Google search data to enhance the prediction accuracy of youth unemployment in France. The results indicated that Google search data did improve the prediction of unemployment. Li et al. [17] used both statistical data and Google search data to predict the consumer price index by a mixed-data sampling model. Numerical results revealed that the proposed approach was helpful in forecasting the consumer price index by using data from user-generated content. Takeda and Wakao [18] studied the relation between Google search intensity, stock trading volume, and stock prices. It was reported that the positive relationship between Google search intensity and trading volume is stronger than that between Google search intensity and stock prices. Araz et al. [2] used Google Flu Trends data to forecast influenza-like illness, and a strong positive relation between Google Flu Trends data and influenza-like illness was revealed. In addition, using Google Flu Trends data as independent variables can result in accurate forecasting results. Some studies have examined the relation between Internet searches and medical topics, such as disease-related genes [19], kidney stones [20, 21], epilepsy [3, 22], allergy [23], and restless legs [24].

Most data on social networks are unstructured. Therefore, to find meaningful information from social networks, text mining has been one of the major tools employed. Mostafa [25] used tweet samples on some famous companies to analyze sentiments of users to forecast the Prosperity index of each company. This investigation concluded that text mining in social networks is a helpful way to capture consumers’ views and preferences of products. Ikeda et al. [26] investigated Japanese Twitter users and developed a hybrid text-based and community-based method for the demographic prediction of Twitter users. The proposed method can analyze a user’s hobby, occupation, marital status, age, gender, and area. The authors reported that the proposed hybrid method can increase the precision of the text-based method. He et al. [27] collected social media data from both their own sites and the competitors’ sites in the pizza industry. This study indicated that social media competitive analysis is essential and can help companies to form marketing strategies. Yu and Wang [28] gathered real-time tweets during 2014 World Cup games and employed text mining tools to distinguish positive and negative comments which may reflect moods of the soccer fans during matches. This study showed that opinions of sports fans can be learned from Twitter, and the results were fairly close to the predictions of the disposition theory. Chae [29] used a collection of Twitter hashtags related to the supply chain to gain some insight into supply chain management. The presented model consists of four approaches, descriptive analytics, content analytics, integrating text mining and sentiment analysis, and network analytics. Some interesting and valuable conclusions have been reached from the studies on the professional use of Twitter, organizational use of Twitter, and supply chain research, respectively.

3. Methodology

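Proposed by Hall [30], correlation-based feature selection (CFS) is a feature identification technique used for determining features with critical influence on prediction classes. The influence of features is related to the correlation between the feature and the prediction class labels. The correlation function is represented as follows:

$$M_p = \frac{NV \, \overline{r_{iq}}}{\sqrt{NV + NV (NV - 1) \, \overline{r_{ii}}}}, \tag{1}$$

where $M_p$ is the degree of importance of a feature set p, NV is the number of features in the subset p, $\overline{r_{iq}}$ is the average correlation between the feature i in the subset p and the class q, and $\overline{r_{ii}}$ is the average intercorrelation between features. The best-first search algorithm [31] was employed to generate the appropriate feature subset, and the Weka [30, 32] software was utilized to perform CFS in this investigation.

The support vector machines [33, 34] model has been one of the most prevalent classification techniques in the past two decades. The support vector machines model was extended to cope with regression problems, and support vector regression [35–37] has become popular in solving function approximation problems. Both support vector machines and support vector regression have to solve quadratic programming problems, which is a time-consuming task. This restriction has been overcome by transforming the quadratic programming problem into a set of linear equations. The least squares support vector regression (LSSVR) [38] model can be represented as follows:

$$\min_{w, b, e} \; J(w, e) = \frac{1}{2} w^{T} w + \frac{\gamma}{2} \sum_{i=1}^{N} e_i^{2} \quad \text{subject to} \quad y_i = w^{T} \varphi(x_i) + b + e_i, \quad i = 1, \ldots, N, \tag{2}$$

where $w$ is the weight vector or the normal of the hyperplane, $\gamma$ is the penalty parameter that controls the balance between the minimization of estimation error and the smoothness of the estimated function, $e_i$ is the error of the ith sample point, $\varphi(x_i)$ is the nonlinear function mapping of $x_i$ from the original space into a high-dimensional feature space, $b$ is the bias parameter, $x_i$ and $y_i$ are input data and output value, respectively, and $N$ is the number of sample points.

Due to the difficulty of solving the optimization problem directly, the Lagrange function is developed and the dual problem can be represented as follows:

$$L(w, b, e, \alpha) = \frac{1}{2} w^{T} w + \frac{\gamma}{2} \sum_{i=1}^{N} e_i^{2} - \sum_{i=1}^{N} \alpha_i \left\{ w^{T} \varphi(x_i) + b + e_i - y_i \right\}, \tag{3}$$

where $\alpha_i$ are the Lagrange multipliers.

The solution of the problem is achieved when all derivatives are equal to zero, based on the Karush–Kuhn–Tucker conditions [39–41]. The optimal conditions are shown as follows:

$$\begin{aligned}
\frac{\partial L}{\partial w} = 0 \;&\Rightarrow\; w = \sum_{i=1}^{N} \alpha_i \varphi(x_i), \\
\frac{\partial L}{\partial b} = 0 \;&\Rightarrow\; \sum_{i=1}^{N} \alpha_i = 0, \\
\frac{\partial L}{\partial e_i} = 0 \;&\Rightarrow\; \alpha_i = \gamma e_i, \\
\frac{\partial L}{\partial \alpha_i} = 0 \;&\Rightarrow\; w^{T} \varphi(x_i) + b + e_i - y_i = 0.
\end{aligned} \tag{4}$$

By removing $w$ and $e_i$ from (4), the following linear equation can be obtained:

$$\begin{bmatrix} 0 & \mathbf{1}_N^{T} \\ \mathbf{1}_N & K + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix}, \tag{5}$$

where $y = [y_1, \ldots, y_N]^{T}$, $\mathbf{1}_N = [1, \ldots, 1]^{T}$, $\alpha = [\alpha_1, \ldots, \alpha_N]^{T}$, and $I$ is the identity matrix.

K is a kernel matrix and determined by

$$K_{ij} = \varphi(x_i)^{T} \varphi(x_j) = k(x_i, x_j), \quad i, j = 1, \ldots, N, \tag{6}$$

where $k(x_i, x_j)$ indicates the kernel function satisfying Mercer’s condition [42].

In this study, the radial basis function represented by (7) was employed as a kernel function:

$$k(x_i, x_j) = \exp\left( -\frac{\lVert x_i - x_j \rVert^{2}}{2 \sigma^{2}} \right), \tag{7}$$

where $\sigma$ is the kernel width. By solving (5), $\alpha$ and $b$ can be obtained, and the LSSVR function is represented as follows:

$$f(x) = \sum_{i=1}^{N} \alpha_i \, k(x, x_i) + b. \tag{8}$$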
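The training procedure described above reduces to solving one linear system, which can be sketched in a few lines. The following is a minimal illustration, assuming NumPy and a radial basis function kernel of the form exp(-||x_i - x_j||^2 / (2 sigma^2)); the function names `rbf_kernel`, `lssvr_fit`, and `lssvr_predict` and the argument names `gamma` (penalty parameter) and `sigma` (kernel width) are hypothetical and are not taken from the authors' implementation.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """RBF kernel matrix between the rows of A and the rows of B."""
    sq_dist = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma ** 2))


def lssvr_fit(X, y, gamma, sigma):
    """Solve the LSSVR linear system (5) for the bias b and multipliers alpha."""
    n = X.shape[0]
    K = rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                      # first row:    [0, 1^T]
    A[1:, 0] = 1.0                      # first column: [0, 1]^T
    A[1:, 1:] = K + np.eye(n) / gamma   # K + I / gamma
    rhs = np.concatenate(([0.0], y))
    solution = np.linalg.solve(A, rhs)
    return solution[0], solution[1:]    # b, alpha


def lssvr_predict(X_train, alpha, b, X_new, sigma):
    """Evaluate f(x) = sum_i alpha_i * k(x, x_i) + b for each row of X_new."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b


if __name__ == "__main__":
    # Toy data for illustration only (not the stock data used in the study).
    X = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
    y = np.sin(2.0 * np.pi * X[:, 0])
    b, alpha = lssvr_fit(X, y, gamma=1000.0, sigma=0.2)
    print(lssvr_predict(X, alpha, b, X[:3], sigma=0.2))
```

Because the equality constraints replace the inequality constraints of standard support vector regression, no quadratic programming solver is needed; the cost is that every training point generally receives a nonzero multiplier.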

4. The Proposed Stock Market-Forecasting Framework and Numerical Examples

4.1. The Proposed Framework

Figure 1 shows the framework of this study. Three major types of data, namely, data from Google Trends, historical trading data, and hybrid data, were gathered. When Google Trends data are used as independent attributes for forecasting, the choice of search keywords strongly influences the forecasting results. Thus, keywords for Google Trends were collected in three ways: users’ definitions (GTU), trending searches of Google Trends (GTTS), and tweets (GTT). First, for GTU data, users specified keywords subjectively based on domain knowledge or intuition. Second, for GTTS data, keywords were gathered from the trending searches of Google Trends, which calculates the activity levels of keywords. When a specific term is queried, the results list related keywords ranked from the highest activity level to the lowest, and users can select keywords according to this ranking. Third, the GTT method generates keywords from texts collected on Twitter; the word clusters tool provided by KH Coder [43] was used to select the first 100 terms according to the calculated scores. For all three methods, only keywords that have Google Trends scores were used as independent variables to forecast stock markets, because some keywords have no scores owing to low search frequencies. Three hybrid data sets, shown in Table 1, were generated by combining the historical data set with the three Google Trends data sets. Hybrid data I, hybrid data II, and hybrid data III combine historical data with GTU, GTTS, and GTT data, respectively.
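The construction of a hybrid data set can be sketched as a join on trading dates. The example below is a minimal illustration under stated assumptions: the function name `build_hybrid`, the dictionary layouts, and the `gt_` column prefix are hypothetical; a keyword is treated as "without scores" when it has no nonzero score at all; and a missing score on a trading day is filled with 0, which is a choice made here for illustration rather than something the study specifies.

```python
def build_hybrid(historical, trends):
    """Combine historical trading rows with Google Trends scores by date.

    historical: {date: {column: value}}, dates as ISO strings (YYYY-MM-DD)
    trends:     {keyword: {date: score}}
    """
    # Keep only keywords that have at least one nonzero score (assumption).
    usable = {kw: scores for kw, scores in trends.items() if any(scores.values())}

    hybrid = {}
    for date in sorted(historical):          # ISO strings sort chronologically
        row = dict(historical[date])
        for kw, scores in usable.items():
            # Assumption: a missing score on a trading day is filled with 0.
            row["gt_" + kw] = scores.get(date, 0)
        hybrid[date] = row
    return hybrid


if __name__ == "__main__":
    # Made-up values for illustration only.
    historical = {
        "2016-06-14": {"close": 1.0},
        "2016-06-15": {"close": 2.0},
    }
    trends = {
        "dow": {"2016-06-14": 50, "2016-06-15": 60, "2016-06-18": 70},
        "rare": {},  # a keyword without scores, dropped from the hybrid data
    }
    print(build_hybrid(historical, trends))
```

Iterating over the trading dates rather than the Google Trends dates keeps only working days, so weekend search scores are ignored.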

Figure 1: The proposed stock market-forecasting framework.
Table 1: Three hybrid data sets.

Then, the correlation-based feature selection technique was performed to determine the essential independent variables for predicting stock markets. Since the GTU data and historical trading data have only a small number of features, all data sets except these two were processed by the feature selection procedure. Therefore, a total of 12 types of independent variables were used in this study to forecast stock markets. A one-step-ahead policy was employed to predict values of stock markets for all data sets. All 12 types of data were divided into three parts, namely, training data, validation data, and testing data, for LSSVR models to predict five stock markets. The training and validation data were used to select the LSSVR models, and the testing data were utilized to evaluate the forecasting performance of the LSSVR models. Genetic algorithms [44] were employed to determine the parameters of the LSSVR models [45]. In addition, the mean absolute percentage error (MAPE) and mean absolute error (MAE) were used to measure the performance of the LSSVR models. The MAPE can be represented as follows:

$$\mathrm{MAPE}\,(\%) = \frac{100}{N} \sum_{t=1}^{N} \left| \frac{A_t - F_t}{A_t} \right|, \tag{9}$$

where $N$ is the number of forecasting periods, $A_t$ is the actual value at period $t$, and $F_t$ is the forecasting value at period $t$.
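The two accuracy measures named above can be computed as follows. This is a minimal sketch: the function names `mape` and `mae` are hypothetical, MAPE is taken as the mean of |A_t - F_t| / |A_t| expressed in percent, and MAE follows its standard definition as the mean of |A_t - F_t|. Actual values are assumed to be nonzero, which holds for index levels and stock prices.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equal in length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


def mae(actual, forecast):
    """Mean absolute error, in the units of the series."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equal in length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


if __name__ == "__main__":
    # Made-up values for illustration only.
    print(mape([100.0, 200.0], [110.0, 190.0]))
    print(mae([100.0, 200.0], [110.0, 190.0]))
```

MAPE is scale-free, so it allows comparison across series with very different levels, such as DJIA and VIX, whereas MAE stays in the original units of each series.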

4.2. Numerical Examples

Daily data for five stock markets, the Dow Jones Industrial Average Index (DJIA), Russell 2000 Index (RUT), Standard & Poor’s 500 Index (S&P 500), Volatility Index (VIX), and Nasdaq Composite Index (IXIC), and for three companies, the Apple corporation (APPL), the Alphabet corporation (GOOGL), and the Microsoft Corporation (MSFT), were obtained from Yahoo Finance (http://finance.yahoo.com) and employed in this study. The data from Google Trends and the historical trading data of the current working day were used to predict the stock market values or stock prices of the next working day. Because of a functional limitation of Google Trends, daily search data can be collected only within a time horizon of 270 days. Within this horizon, data for working days (excluding weekends and national holidays) were gathered, and the one-step-ahead policy was employed to predict values of stock markets for all data sets. The Google Trends data and historical trading data span June 14, 2016, to March 9, 2017, and were divided into the training data set (from June 14, 2016, to December 9, 2016), the validation data set (from December 12, 2016, to January 25, 2017), and the testing data set (from January 26, 2017, to March 9, 2017). The training, validation, and testing data sets contain 126, 30, and 30 observations, respectively. For the data from Google Trends, three types of data, namely, GTU data, GTTS data, and GTT data, were used in this study. The Google Trends search keywords determined by users, trending searches, and tweets are listed in Tables 2–4, respectively. When the GTT data were collected, the terms for the five stock markets and three corporations, namely, Dow Jones Industrial Average Index, Russell 2000 Index, S&P 500, Volatility Index, Nasdaq Composite Index, APPL, GOOGL, and MSFT, were searched by the Twitter search engine and related tweets were determined. Then, KH Coder [43] was used as a text mining tool to select terms from tweets. The top 100 terms provided by KH Coder were put into the Google Trends search. Not all keywords selected by KH Coder could be observed in the Google Trends search because of insufficient search volume. Subsequently, CFS was performed to select the essential Google Trends keywords determined by trending searches and by tweets. The results are shown in Tables 5 and 6, respectively.
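To make the selection step concrete, the sketch below scores a feature subset with the CFS merit described in Section 3 and grows the subset greedily. It is an illustration under stated assumptions, not the Weka procedure used in the study: correlations are measured here with the absolute Pearson coefficient, a simple greedy forward search stands in for best-first search, and the names `pearson`, `cfs_merit`, and `greedy_cfs` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def cfs_merit(features, target, subset):
    """Merit = k * r_cf / sqrt(k + k * (k - 1) * r_ff) for the named subset."""
    k = len(subset)
    r_cf = sum(abs(pearson(features[name], target)) for name in subset) / k
    if k == 1:
        return r_cf
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(features, target):
    """Add the feature that most improves the merit until no feature helps."""
    selected, best = [], 0.0
    remaining = list(features)
    while remaining:
        merit, name = max(
            (cfs_merit(features, target, selected + [n]), n) for n in remaining
        )
        if merit <= best + 1e-12:   # no meaningful improvement: stop
            break
        selected.append(name)
        remaining.remove(name)
        best = merit
    return selected, best


if __name__ == "__main__":
    # Made-up values for illustration only.
    features = {"a": [1, 2, 3, 4, 5], "noise": [1, -1, 0, -1, 1]}
    target = [1.1, 2.0, 2.9, 4.2, 5.0]
    print(greedy_cfs(features, target))
```

The denominator penalizes redundancy: adding a keyword that is highly correlated with those already selected raises the average intercorrelation and so does not raise the merit, which is why CFS tends to keep a small set of mutually distinct keywords.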

Table 2: Google Trends search keywords determined by users.
Table 3: Google Trends search keywords determined by trending searches.
Table 4: Google Trends search keywords determined by tweets.
Table 5: Selected keywords obtained from trending searches by using CFS.
Table 6: Selected keywords obtained from tweets by using CFS.

Five variables, including opening values, maximum values, minimum values, closing values, and trading volume, were used as condition variables, and the closing values of the next day were used as the predicted variable [7–9, 12, 46]. Three types of hybrid data were used to predict stock markets. Tables 7–9 show the keywords and historical data attributes selected by CFS from the three hybrid data sets for the five stock markets. Tables 10–17 indicate the testing MAPE and MSE values and the two LSSVR parameters of different data types for predicting five stock markets and three corporations. The point-to-point comparisons of actual and predicted values obtained with the various data types for the stock markets and corporations are presented in Figures 2–9. The experimental results revealed that using hybrid data with LSSVR models does improve forecasting performance on the closing values of five stock markets and three corporations.
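The one-step-ahead setup and the chronological split can be sketched as follows. This is a minimal illustration: the function names `make_one_step_pairs` and `chronological_split`, the row layout (opening, maximum, minimum, closing, volume), and the toy values are hypothetical, and the split sizes are passed in as arguments rather than fixed to those of the study.

```python
def make_one_step_pairs(rows, close_index=3):
    """Pair the variables of day t with the closing value of day t + 1.

    rows: chronologically ordered tuples, assumed here to be
          (opening, maximum, minimum, closing, volume).
    """
    inputs = [list(row) for row in rows[:-1]]          # days 0 .. T-2
    targets = [row[close_index] for row in rows[1:]]   # closing of days 1 .. T-1
    return inputs, targets


def chronological_split(inputs, targets, n_train, n_valid):
    """Split in time order (no shuffling) into training, validation, and testing."""
    if n_train + n_valid >= len(inputs):
        raise ValueError("no observations left for the testing set")
    end_valid = n_train + n_valid
    train = (inputs[:n_train], targets[:n_train])
    valid = (inputs[n_train:end_valid], targets[n_train:end_valid])
    test = (inputs[end_valid:], targets[end_valid:])
    return train, valid, test


if __name__ == "__main__":
    # Ten made-up trading days for illustration only.
    rows = [(t, t + 1, t - 1, t + 0.5, 1000 + t) for t in range(10)]
    X, y = make_one_step_pairs(rows)
    train, valid, test = chronological_split(X, y, n_train=5, n_valid=2)
    print(len(train[0]), len(valid[0]), len(test[0]))
```

Keeping the split in time order matters for this kind of forecasting: shuffling would let observations that come after the testing period leak into the training set.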

Table 7: Selected keywords and historical data attributes obtained from hybrid data I by using CFS.
Table 8: Selected keywords and historical data attributes obtained from hybrid data II by using CFS.
Table 9: Selected keywords and historical data attributes obtained from hybrid data III by using CFS.
Table 10: Values of forecasting indices and LSSVR parameters of DJIA.
Table 11: Values of forecasting indices and LSSVR parameters of RUT.
Table 12: Values of forecasting indices and LSSVR parameters of S&P 500.
Table 13: Values of forecasting indices and LSSVR parameters of VIX.
Table 14: Values of forecasting indices and LSSVR parameters of IXIC.
Figure 2: Closing values of the DJIA stock market.
Figure 3: Closing values of the RUT stock market.
Figure 4: Closing values of the S&P 500 stock market.
Figure 5: Closing values of the VIX stock market.
Figure 6: Closing values of the IXIC stock market.
Figure 7: Closing values of the Apple corporation.
Figure 8: Closing values of the Alphabet corporation.
Figure 9: Closing values of the Microsoft Corporation.
Table 15: Values of forecasting indices and LSSVR parameters of the Apple corporation.
Table 16: Values of forecasting indices and LSSVR parameters of the Alphabet corporation.
Table 17: Values of forecasting indices and LSSVR parameters of the Microsoft Corporation.

5. Conclusions

Many forecasting models have been proposed for stock market forecasting in the past decades. With the rise of social networking and Internet search tools, the types of data employed for predicting stock markets have diversified. This study proposed a framework to explore the influence of Internet search trends, historical trading data, and hybrid data on the prediction of stock markets by least squares support vector regression models. Numerical experiments indicate that using hybrid data can provide satisfactory forecasting results. The superior performance of the proposed framework is most likely due to combining the distinct advantages of Internet search data and historical trading data. Empirically, the Google data may capture a part of the nonlinear data patterns [47], and therefore, the variety of the data has a chance to improve the forecasting performance. The promising results achieved in this study reveal the potential of the proposed framework for forecasting stock markets.

Keywords of Google Trends significantly affect the forecasting accuracy: Naccarato et al. [48] pointed out that the selection of keywords results in different data sets for analysis and thus generates different numerical results. This study provided three ways, namely, users’ definitions, trending searches of Google Trends, and tweets, to determine keywords for Google Trends. The three ways can be easily and systematically reproduced for future use. Other advanced techniques for determining appropriate keywords for Google Trends could be an essential direction for future study. In addition, numerical examples from developed markets were employed to illustrate the proposed framework. For emerging markets, owing to the restriction of the languages used on Twitter and Google Trends, some hurdles have to be overcome before the performance of the proposed framework can be analyzed.

Data Availability

The data used to support the findings of this study are available through the website links included within the article.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Acknowledgments

The authors would like to thank the Ministry of Science and Technology of the Republic of China, Taiwan, for financially supporting this research under Contract nos. MOST 103-2410-H-260-020, MOST 104-2410-H-260-018, and MOST 105-2410-H-260-017-MY2. The authors acknowledge Hsiao-Ting Hsu, Pao Hsiung Huang, Fang-Ru He, Chia-Hsin Liu, and Yi-Ting Huang who assisted with data collection and analysis.

References

  1. S. Stephens-Davidowitz, “The cost of racial animus on a black candidate: evidence using Google search data,” Journal of Public Economics, vol. 118, pp. 26–40, 2014. View at Publisher · View at Google Scholar · View at Scopus
  2. M. Araz, D. Bentley, and R. L. Muelleman, “Using Google Flu Trends data in forecasting influenza-like-illness related ED visits in Omaha, Nebraska,” The American Journal of Emergency Medicine, vol. 32, no. 9, pp. 1016–1023, 2014. View at Publisher · View at Google Scholar · View at Scopus
  3. F. Brigo, S. C. Igwe, H. Ausserer et al., “Why do people Google epilepsy?: An infodemiological study of online behavior for epilepsy-related search terms,” Epilepsy & Behavior, vol. 31, pp. 67–70, 2014. View at Publisher · View at Google Scholar · View at Scopus
  4. P. Frijters, D. W. Johnston, G. Lordan, and M. A. Shields, “Exploring the relationship between macroeconomic conditions and problem drinking as captured by Google searches in the US,” Social Science & Medicine, vol. 84, pp. 61–68, 2013. View at Publisher · View at Google Scholar · View at Scopus
  5. J. F. Gunn III and D. Lester, “Using Google searches on the internet to monitor suicidal behavior,” Journal of Affective Disorders, vol. 148, no. 2-3, pp. 411-412, 2013. View at Publisher · View at Google Scholar · View at Scopus
  6. N. Oliveira, P. Cortez, and N. Areal, “The impact of microblogging data for stock market prediction: using Twitter to predict returns, volatility, trading volume and survey sentiment indices,” Expert Systems with Applications, vol. 73, pp. 125–144, 2017. View at Publisher · View at Google Scholar · View at Scopus
  7. M. R. Hassan, “A combination of hidden Markov model and fuzzy model for stock market forecasting,” Neurocomputing, vol. 72, no. 16–18, pp. 3439–3446, 2009. View at Publisher · View at Google Scholar · View at Scopus
  8. E. Hadavandi, H. Shavandi, and A. Ghanbari, “Integration of genetic fuzzy systems and artificial neural networks for stock price forecasting,” Knowledge-Based Systems, vol. 23, no. 8, pp. 800–808, 2010. View at Publisher · View at Google Scholar · View at Scopus
  9. P. Singh and B. Borah, “Forecasting stock index price based on M-factors fuzzy time series and particle swarm optimization,” International Journal of Approximate Reasoning, vol. 55, no. 3, pp. 812–833, 2014. View at Publisher · View at Google Scholar · View at Scopus
  10. L. A. Laboissiere, R. A. Fernandes, and G. G. Lage, “Maximum and minimum stock price forecasting of Brazilian power distribution companies based on artificial neural networks,” Applied Soft Computing, vol. 35, pp. 66–74, 2015. View at Publisher · View at Google Scholar · View at Scopus
  11. G. R. M. Lincy and C. J. John, “A multiple fuzzy inference systems framework for daily stock trading with application to NASDAQ stock exchange,” Expert Systems with Applications: An International Journal, vol. 44, pp. 13–21, 2016. View at Publisher · View at Google Scholar · View at Scopus
  12. F. A. de Oliveira, C. N. Nobre, and L. E. Zárate, “Applying artificial neural networks to prediction of stock price and improvement of the directional prediction index–case study of PETR4, Petrobras, Brazil,” Expert Systems with Applications, vol. 40, no. 18, pp. 7596–7606, 2013.
  13. M. Göçken, M. Özçalıcı, A. Boru, and A. T. Dosdoğru, “Integrating metaheuristics and artificial neural networks for improved stock price prediction,” Expert Systems with Applications, vol. 44, pp. 320–331, 2016.
  14. A. C. Yang, S. J. Tsai, N. E. Huang, and C. K. Peng, “Association of Internet search trends with suicide death in Taipei City, Taiwan, 2004–2009,” Journal of Affective Disorders, vol. 132, no. 1-2, pp. 179–184, 2011.
  15. G. P. Smith, “Google internet search activity and volatility prediction in the market for foreign currency,” Finance Research Letters, vol. 9, no. 2, pp. 103–110, 2012.
  16. Y. Fondeur and F. Karamé, “Can Google data help predict French youth unemployment?” Economic Modelling, vol. 30, pp. 117–125, 2013.
  17. X. Li, W. Shang, S. Wang, and J. Ma, “A MIDAS modelling framework for Chinese inflation index forecast incorporating Google search data,” Electronic Commerce Research and Applications, vol. 14, no. 2, pp. 112–125, 2015.
  18. F. Takeda and T. Wakao, “Google search intensity and its relationship with returns and trading volume of Japanese stocks,” Pacific-Basin Finance Journal, vol. 27, pp. 1–18, 2014.
  19. J. Kim, H. Kim, Y. Yoon, and S. Park, “LGscore: a method to identify disease-related genes using biological literature and Google data,” Journal of Biomedical Informatics, vol. 54, pp. 270–282, 2015.
  20. B. N. Breyer, S. Sen, D. S. Aaronson, M. L. Stoller, B. A. Erickson, and M. L. Eisenberg, “Use of Google insights for search to track seasonal and geographic kidney stone incidence in the United States,” Urology, vol. 78, no. 2, pp. 267–271, 2011.
  21. S. D. Willard and M. M. Nguyen, “Internet search trends analysis tools can provide real-time data on kidney stone disease in the United States,” Urology, vol. 81, no. 1, pp. 37–42, 2013.
  22. J. S. van Campen, E. van Diesse, W. M. Otte, M. Joels, F. E. Jansen, and K. P. Braun, “Does Saint Nicholas provoke seizures? Hints from Google Trends,” Epilepsy & Behavior, vol. 32, pp. 132–134, 2014.
  23. O. Zuckerman, S. H. Luster, and L. Bielory, “Internet searches and allergy: temporal variation in regional pollen counts correlates with Google searches for pollen allergy related terms,” Annals of Allergy, Asthma & Immunology, vol. 113, no. 4, pp. 486–488, 2014.
  24. D. G. Ingram and D. T. Plante, “Seasonal trends in restless legs symptomatology: evidence from Internet search query data,” Sleep Medicine, vol. 14, no. 12, pp. 1364–1368, 2013.
  25. M. M. Mostafa, “More than words: social networks’ text mining for consumer brand sentiments,” Expert Systems with Applications, vol. 40, no. 10, pp. 4241–4251, 2013.
  26. K. Ikeda, G. Hattori, C. Ono, H. Asoh, and T. Higashino, “Twitter user profiling based on text and community mining for market analysis,” Knowledge-Based Systems, vol. 51, pp. 35–47, 2013.
  27. W. He, S. Zha, and L. Li, “Social media competitive analysis and text mining: a case study in the pizza industry,” International Journal of Information Management, vol. 33, no. 3, pp. 464–472, 2013.
  28. Y. Yu and X. Wang, “World Cup 2014 in the Twitter World: a big data analysis of sentiments in US sports fans’ tweets,” Computers in Human Behavior, vol. 48, pp. 392–400, 2015.
  29. B. K. Chae, “Insights from hashtag #supplychain and Twitter Analytics: considering Twitter and Twitter data for supply chain practice and research,” International Journal of Production Economics, vol. 165, pp. 247–259, 2015.
  30. M. A. Hall, Correlation-Based Feature Selection for Machine Learning, Department of Computer Science, University of Waikato, Hamilton, New Zealand, 1999.
  31. J. Pearl, Heuristics: Intelligent Search Strategies for Computer Problem Solving, Addison-Wesley Publishing Co., Reading, MA, USA, 1984.
  32. M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, “The WEKA data mining software: an update,” ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, pp. 10–18, 2009.
  33. C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.
  34. V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 1995.
  35. S. Mukherjee, E. Osuna, and F. Girosi, “Nonlinear prediction of chaotic time series using support vector machines,” in Proceedings of the IEEE Workshop on Neural Networks for Signal Processing VII, pp. 511–520, Amelia Island, FL, USA, September 1997.
  36. K. R. Müller, A. J. Smola, G. Rätsch, B. Schölkopf, J. Kohlmorgen, and V. Vapnik, “Predicting time series with support vector machines,” in Proceedings of the International Conference on Artificial Neural Networks, pp. 999–1004, Berlin, Heidelberg, October 1997.
  37. V. Vapnik, S. E. Golowich, and A. J. Smola, “Support vector method for function approximation, regression estimation and signal processing,” in Proceedings of the Advances in Neural Information Processing Systems, pp. 281–287, Denver, CO, USA, 1997.
  38. J. A. Suykens and J. Vandewalle, “Least squares support vector machine classifiers,” Neural Processing Letters, vol. 9, no. 3, pp. 293–300, 1999.
  39. R. Fletcher, “Conjugate direction methods,” in Practical Methods of Optimization, Wiley, Hoboken, NJ, USA, 2nd edition, 1987.
  40. W. Karush, “Minima of functions of several variables with inequalities as side conditions,” University of Chicago, Chicago, IL, USA, 1939, M.Sc. thesis.
  41. H. W. Kuhn and A. W. Tucker, “Nonlinear programming,” in Proceedings of the 2nd Berkeley Symposium on Mathematical Statistics and Probability, pp. 481–492, Berkeley, CA, USA, July-August 1951.
  42. J. Mercer, “Functions of positive and negative type, and their connection with the theory of integral equations,” Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 209, no. 441–458, pp. 415–446, 1909.
  43. K. Higuchi, KH Coder: A Free Software for Quantitative Content Analysis or Text Mining, 2001.
  44. J. H. Holland, Adaptation in Natural and Artificial Systems, The University of Michigan Press, Ann Arbor, MI, USA, 1975.
  45. P. F. Pai, K. C. Hung, and K. P. Lin, “Tourism demand forecasting using novel hybrid system,” Expert Systems with Applications, vol. 41, no. 8, pp. 3691–3702, 2014.
  46. J. L. Ticknor, “A Bayesian regularized artificial neural network for stock market forecasting,” Expert Systems with Applications, vol. 40, no. 14, pp. 5501–5506, 2013.
  47. D. Fantazzini and Z. Toktamysova, “Forecasting German car sales using Google data and multivariate models,” International Journal of Production Economics, vol. 170, pp. 97–135, 2015.
  48. A. Naccarato, S. Falorsi, S. Lorig, and A. Pierini, “Combining official and Google Trends data to forecast the Italian youth unemployment rate,” Technological Forecasting & Social Change, vol. 130, pp. 114–122, 2018.