
Stock Market Prediction through Technical and Public Sentiment Analysis

Kien Wei Siah, Paul Myers

I. INTRODUCTION

Stock market price behavior has been studied extensively. It is influenced by a myriad of factors, including political and economic events, and is a complex non-linear time-series problem. Traditionally, stock price forecasting has been performed using technical analysis, which searches for patterns in price history. More recently, research has shown that public sentiment is correlated with stock market events [1], [2], [3]. This project studies the potential of using both behavioral and technical features in stock price prediction models based on traditional classifiers and popular neural networks. We believe that behavioral data may offer insights into financial market dynamics beyond those captured by technical analysis. An improved price forecasting model can yield enormous rewards in stock market trading.

A. Problem Statement

For this project, we focus on the Nikkei 225 (N225), the stock market index for the Tokyo Stock Exchange. It is a price-weighted average of 225 top-rated Japanese companies listed on the Tokyo Stock Exchange. With Japan currently the third largest economy in the world, and Tokyo one of the largest global financial centers, the N225 index is a critical financial indicator closely watched by traders and banks around the world.

We formulate stock price prediction as a binary classification problem: whether the future daily return of N225 will be positive (1) or negative (0), i.e. whether N225's closing price tomorrow will be higher (1) or lower (0) than today's closing price. The daily return is defined as

R_i = (C_i - C_{i-1}) / C_{i-1}    (1)

where R_i is the daily return for the i-th day and C_i is the N225 closing price for the i-th day.
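A minimal sketch of Equation 1 and of the resulting binary targets (function names are ours, not the paper's):

```python
import numpy as np

def daily_returns(close):
    """Daily returns R_i = (C_i - C_{i-1}) / C_{i-1} (Equation 1)."""
    close = np.asarray(close, dtype=float)
    return (close[1:] - close[:-1]) / close[:-1]

def direction_labels(close):
    """Binary targets: 1 if tomorrow's close is higher than today's, else 0.
    The label for day i is the sign of the *future* return R_{i+1}, so the
    last day in the data set has no label."""
    r = daily_returns(close)  # r[i] is the return from day i to day i+1
    return (r > 0).astype(int)
```

Note that the label series is one element shorter than the price series: the final day cannot be labeled without knowing the next close.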
The daily return for day i is essentially the percent change in closing price from day (i-1) to day i. The future daily return for day i is simply R_{i+1}. Note that to obtain the classification target, we take the sign of the future daily return R_{i+1} rather than its numerical value. As described in the introduction, we investigate the use of price histories and public sentiment indicators available up to day i to predict sign(R_{i+1}). Subsequent sections cover the data collection process. Since this is framed as a classification task, we use classification accuracy as the metric for evaluating the performance of the various models.

II. DATA COLLECTION AND FEATURE GENERATION

A. Price History

We queried daily historical prices of N225, for all trading days spanning January 1, 2004 to December 31, 2014, from Yahoo! Finance. However, financial time series are well known to be non-stationary, with means, variances and covariances that change over time. Such non-stationary data are difficult to model and will likely give poor classification accuracy when used directly as features. By viewing the daily prices as random walks, we attempted to stationarize the price history (through differencing and lagging) before using it as predictors. To this end, we used three main types of conventional technical indicators as features [13]:

1) n-Day Returns

R_{i,n} = (C_i - C_{i-n}) / C_{i-n}    (2)

where R_{i,n} is the i-th day return with respect to the (i-n)-th day, i.e. the percentage difference between the i-th day closing price C_i and the (i-n)-th day closing price C_{i-n}. Positive values imply that the N225 index has risen over the n days. For n = 1, we recover the simple daily returns equation (Equation 1).

2) n-Day Returns Moving Average

MA_{i,n} = (R_{i,1} + R_{(i-1),1} + ... + R_{(i-n),1}) / n    (3)

where MA_{i,n} is the average return over the previous n days, and n > 1 because a one-day average is just the day's return itself.

3) n-Time Lagged 1-Day Returns

R_{i,1}, R_{(i-1),1}, ..., R_{(i-n),1}    (4)

where R_{(i-n),1} is the (i-n)-th day's 1-day return.
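The three indicator families above can be sketched as follows; the function name and the exact windowing of the moving average (we average the last k one-day returns) are our illustrative choices, not the paper's code:

```python
import numpy as np

def technical_features(close, i, n):
    """Sketch of the three indicator families for day i (assumes i >= n + 1).

    `close` is an array of daily closing prices. Returns the concatenation of
    n-day returns (Eq. 2), moving averages of 1-day returns (Eq. 3 variant),
    and n-time lagged 1-day returns (Eq. 4)."""
    close = np.asarray(close, dtype=float)
    # one-day returns: r1[j] = (C_j - C_{j-1}) / C_{j-1}; r1[0] is undefined
    r1 = np.full(len(close), np.nan)
    r1[1:] = np.diff(close) / close[:-1]

    n_day = [(close[i] - close[i - k]) / close[i - k] for k in range(1, n + 1)]  # Eq. 2
    mov_avg = [r1[i - k + 1:i + 1].mean() for k in range(2, n + 1)]              # Eq. 3
    lagged = [r1[i - k] for k in range(1, n + 1)]                                # Eq. 4
    return np.array(n_day + mov_avg + lagged)
```

For a given n this yields n + (n - 1) + n feature values per day, which is how the feature-vector dimension grows with n.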
By varying n, we obtain different numbers of features containing varying degrees of information about price trends and past prices. This is one of several parameters we vary and decide upon using cross validation.

B. Public Sentiment Indicators

In addition to conventional technical indicators, we also looked at public sentiment indicators. The theory of behavioral economics postulates that emotions play a significant role in the economic decisions of individuals, and research has shown that this applies at the scale of societies as well. In fact, Bollen et al. used Twitter messages as indicators of public mood states and demonstrated that they were correlated with, and predictive of, the Dow Jones Industrial Average over time [2]. In another study, Preis et al. found patterns in Google query volumes, for search terms related to finance, that constitute "early warning signs" of stock market movements. They hypothesize that investors search for information about the markets online before eventually deciding whether to buy or sell stocks, which suggests that search query data from Google Trends may contain valuable predictive information about the information-gathering process that precedes trading decisions in the stock market [3]. This project takes inspiration from these two widely cited studies and attempts to integrate some aspects of public sentiment analysis into our features, in the hope that combining behavioral data with technical price indicators will lead to improved performance.

To this end, we used behavioral data from two sources: Bloomberg Businessweek and Google Trends. We were unable to replicate Bollen et al.'s study using Twitter messages, as Twitter has restricted public access to very limited amounts of data, and other Twitter data sources required paid subscriptions. Therefore, similar to [3], we used trends in Google query volume for finance-related search terms as a proxy for public sentiment. Further, we wrote a script to crawl a free online news archive, Bloomberg Businessweek, for articles published from 2004 to 2014; approximately 210,000 articles were gathered. We hoped that the state of the economy and prevalent stock market conditions could be extracted from these articles through sentiment analysis.

For Google Trends, we focused on the daily search volumes of the five finance-related search terms that showed the greatest predictive potential for stock market forecasting in [3], namely economics, debt, inflation, risk and stocks.
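Google Trends returns scaled scores rather than raw counts; a one-function sketch of that peak-normalization convention (our illustration of the convention, not Google's code):

```python
def trends_scale(volumes):
    """Rescale raw query volumes to Google Trends' 0-100 convention:
    each value is expressed relative to the peak within the date range."""
    peak = max(volumes)
    return [round(100 * v / peak) for v in volumes]
```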
Google Trends scores the daily query volumes on a scale of 0-100, normalized with respect to the peak within the date range (2004 to 2014 in our case).

Subsequently, we performed a relatively simple sentiment analysis on the news articles crawled from Bloomberg Businessweek to obtain daily sentiment scores. First, we obtained lists of "positive" and "negative" words, both finance-specific and general. For the finance-specific words, we used the lists published by McDonald, originating from his research on sentiment analysis of financial texts [4]. This is particularly relevant in our case, as words with positive meanings in a general context may actually be negative in a financial context. For the general case, we used the lists of positive and negative opinion words (sentiment words) by Hu and Liu [5]. To compute the sentiment score for each article, we used the following equation:

Score = (POS - NEG) / (POS + NEG)    (5)

where POS is the number of positive words (from the lists obtained earlier) counted in the article and NEG is the number of negative words counted in the article. Positive and negative words were counted as many times as they appeared. A score of 1 implies an entirely positive article, 0 (or when no words are counted) implies a neutral article, and -1 implies an entirely negative article. Daily scores were obtained by averaging over all the articles published that day. Computing this score for the 210,000 articles crawled took several days and had to be done in batches; a more sophisticated sentiment analysis would likely have required even longer and was unfeasible within the time frame of this project.
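A minimal sketch of the word-counting score in Equation 5; the tokenizer and the tiny word lists used in the test are placeholders, not the actual McDonald or Hu-Liu lists:

```python
import re

def sentiment_score(text, positive, negative):
    """Equation 5: (POS - NEG) / (POS + NEG); 0 when no listed words occur.
    Words are counted as many times as they appear."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in positive for w in words)
    neg = sum(w in negative for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

def daily_score(articles, positive, negative):
    """Average the per-article scores for one day."""
    scores = [sentiment_score(a, positive, negative) for a in articles]
    return sum(scores) / len(scores)
```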
C. Missing Data and Look-Ahead Bias

By crawling for our own data, we inevitably face the problem of missing data: price histories for some days are missing, and the Bloomberg Businessweek archive does not have articles for every trading day. In dealing with this issue, we have three options: mean imputation, interpolation based on the previous and next data points, or sample-and-hold. We opted for the last option (using the last observed valid data point), as we felt that mean imputation and interpolation would introduce some degree of look-ahead bias (using information that would not have been available at the time). For instance, the interpolation of prices or returns implicitly uses the future price, i.e. the interpolated point will be higher if the next price is high. This would lead to inaccurate results. While there are certainly more sophisticated and effective techniques for dealing with missing data, we considered only the simpler methods in view of time constraints.

III. RECURRENT NEURAL NETWORK

A. Vanilla Recurrent Neural Network

Recurrent Neural Networks (RNNs) have shown great potential in many Natural Language Processing tasks (e.g. machine translation, language modeling) and are becoming increasingly popular. Unlike vanilla Neural Networks (NNs), an RNN's network topology allows it to make use of sequential information. This is a natural fit for stock market prediction, a time-series problem: knowing previous days' prices may help us predict tomorrow's price.

Fig. 1. Recurrent Neural Network topology [6].

As illustrated in Figure 1, an RNN performs the same operations, with the same weights, for each element of the sequence. It takes into account the previous step's state (s_{t-1}) while computing the output for the current step. This recurrent property gives it the 'memory' mentioned earlier. The relevant equations are:

s_t = tanh(U x_t + W s_{t-1})    (6)
o_t = sigmoid(V s_t)    (7)

where U, W and V are the weight matrices shared across all time steps, x_t is the input at time step t, s_t is the hidden state at time step t, and o_t is the output at time step t. We may think of s_t as the 'memory' of the RNN, containing information about the inputs and computations of all previous time steps (subject to the vanishing gradient problem elaborated below). As described earlier, the hidden state is computed from the previous hidden state s_{t-1} and the current input x_t (Equation 6). The first hidden state s_0 is typically initialized with zeros.

In our stock market prediction problem, we can think of x_t as the feature vector for each day (composed of the features from Section II). Figure 1 shows outputs at all time steps, but in our case we are only concerned with the output at the final step, which is the prediction of whether the price will rise or fall. In other words, we input feature vectors from the previous t days into the RNN sequentially, and o_t (a sigmoid output, Equation 7) represents the probability of the price rising on the (t+1)-th day. This allows the RNN to capture more temporal information than classifiers (e.g. Support Vector Machines, NNs, Logistic Regression) that take input from only one time step.

Training for RNNs is similar to that for vanilla NNs: backpropagation. However, for RNNs we backpropagate through time to obtain dL/dU, dL/dW and dL/dV. The idea is to 'unfold' the RNN across time (as in Figure 1) and do backpropagation as if it were a normal NN. Since this is a classification problem, we can use the binary cross entropy loss as the error function L. Because we only look at the final output, we can mask all other outputs and consider only the loss from the final output. From here, we may use stochastic gradient descent to minimize the error.

There is one caveat: the vanishing gradient problem. As we know from NN backpropagation in class, the gradients dL/dU, dL/dW and dL/dV are derived from the chain rule, meaning they are products of multiple derivatives. These chain-rule derivatives have upper bounds of 1 (apparent from the tanh and sigmoid activation functions used).
This means that gradient values can shrink exponentially fast and 'vanish' after a few time steps, particularly when the neurons are saturated. Because gradients 'vanish' within a limited number of time steps, the vanilla RNN model typically has trouble learning long-range dependencies, i.e. the RNN will not learn much from inputs more than a certain number of time steps before the final output. From this, we know that the number of time steps in the input sequence for this RNN model cannot be too large; we may determine this hyper-parameter by cross validation. Note that this is a problem in deep NNs as well. Exploding gradients can also be a problem, but they can be circumvented effectively by clipping the gradients.

For this project, we implemented the RNN model described above from scratch in Python and tested its performance on the stock market prediction problem.

B. Gated Recurrent Unit

We also implemented from scratch in Python a more sophisticated RNN variant: the Gated Recurrent Unit (GRU). GRUs are identical to the vanilla RNN described above (they take sequential inputs) except in the way the hidden states s_t are calculated. They were designed to alleviate the vanishing gradient problem through the use of gates (Figure 2), as illustrated in Equations 8-12.

Fig. 2. Gated Recurrent Unit topology [8], [9].

z = sigmoid(U^z x_t + W^z s_{t-1})    (8)
r = sigmoid(U^r x_t + W^r s_{t-1})    (9)
h = tanh(U^h x_t + W^h (s_{t-1} · r))    (10)
s_t = (1 - z) · h + z · s_{t-1}    (11)
o_t = sigmoid(V s_t)    (12)

where · denotes element-wise multiplication. The GRU has two gates: a reset gate r and an update gate z. The reset gate r determines how to combine the new input x_t with the previous hidden state s_{t-1}, while the update gate z determines how much of the previous hidden state s_{t-1} to retain in the current hidden state s_t. We obtain the vanilla RNN by setting r to all 1's and z to all 0's [8].
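The two recurrences (Equations 6-7 and 8-12) can be sketched in a few lines of NumPy; the weight shapes and the driver function are our illustrative assumptions, not the project's exact implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rnn_step(x_t, s_prev, U, W):
    """Vanilla RNN hidden-state update (Equation 6)."""
    return np.tanh(U @ x_t + W @ s_prev)

def gru_step(x_t, s_prev, Uz, Wz, Ur, Wr, Uh, Wh):
    """GRU hidden-state update (Equations 8-11)."""
    z = sigmoid(Uz @ x_t + Wz @ s_prev)        # update gate
    r = sigmoid(Ur @ x_t + Wr @ s_prev)        # reset gate
    h = np.tanh(Uh @ x_t + Wh @ (s_prev * r))  # candidate state
    return (1 - z) * h + z * s_prev

def predict(seq, s0, step, V, **weights):
    """Feed a sequence of daily feature vectors through the recurrence and
    return only the final sigmoid output (Equation 7 / 12): P(price rises)."""
    s = s0
    for x_t in seq:
        s = step(x_t, s, **weights)
    return sigmoid(V @ s)
```

Note that forcing r to all 1's and z to all 0's in `gru_step` collapses Equation 11 to the vanilla update of Equation 6, matching the remark above.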
The GRU is a relatively new model published in recent years. It has fewer parameters than the Long Short-Term Memory (another RNN variant), rendering it faster to train and requiring less data to generalize. We tested our implementation of the GRU on the stock market prediction problem as well.

IV. METHODOLOGY

A. Baseline and Other Models

Since we have framed stock market prediction as a binary classification problem, Logistic Regression (LR) is a natural choice for a baseline model. Beyond LR, we also tested several other more sophisticated models (some of which were not covered in lectures) to gain exposure to common machine learning algorithms: Support Vector Machines with an RBF kernel (SVM RBF), K-Nearest Neighbors (KNN) and AdaBoost (implemented in Scikit-Learn).

B. Experiment Design

The data collected (price history and sentiment scores) span 11 years from January 1, 2004 to December 31, 2014. In this project, we would like to predict whether tomorrow's price will be higher (1) or lower (0) than today's price. Thus each day may be viewed as an observation from which a training or testing example may be constructed. We created feature vectors based on the features described in Section II: each vector is essentially a concatenation of price technical indicators and public sentiment scores. The target variable is binary and is simply the sign of tomorrow's 1-day return. We show an example feature vector x^(i) and target variable y^(i) pair for some arbitrary i-th day below:

x^(i) = [ R_{i,1}, R_{i,2}, ..., R_{i,n},
          MA_{i,2}, ..., MA_{i,n},
          R_{(i-1),1}, R_{(i-2),1}, ..., R_{(i-n),1},
          GT_{i,econ}, GT_{i,debt}, GT_{i,inflat}, GT_{i,risk}, GT_{i,stocks},
          Score_i ]

y^(i) = Sign(R_{(i+1),1})

where the notation remains the same as introduced in Section II, and GT_{i,YYY} refers to the Google Trends query volume for the word "YYY". It is important that the feature vector x^(i) does not contain any future information; it only uses information available up to day i.

n determines the amount of information about past prices and price trends incorporated into the feature vector; the dimension of the feature vector changes with n. Note that because we are predicting tomorrow's price change, we lose one day: no prediction can be made for the last day in the data set, December 31, 2014, because we do not know the true price on January 1, 2015. Also, depending on the n chosen, we have to drop the first n days' observations: to calculate the n-day returns, the n-day returns moving average and the n-time lagged 1-day returns, we need the previous n days' prices, so these features cannot be computed for the first n days in the data set (we do not know prices prior to the first day, January 1, 2004).
We select n by cross validation.

Since this is a binary classification task, we may use the binary cross entropy error function as the objective to minimize for LR and the RNN:

f(θ) = - Σ_{i=1}^{N} [ y^(i) log(q(x^(i))) + (1 - y^(i)) log(1 - q(x^(i))) ]    (13)

where N is the number of training examples and q(x^(i)) is the predicted probability that y^(i) = 1.

TABLE I
TRAIN AND TEST SET SPLIT

Data Set:   January 2004 to December 2014
Train Set:  January 2004 to December 2012
Test Set:   January 2013 to December 2014

Before we began training, we split the data set of observations into train and test sets, roughly 80% and 20% respectively (Table I). We train our models (RNN, GRU, LR, SVM, KNN and AdaBoost) on the train set, and subsequently evaluate their performance on the untouched test set.

C. RNN Training

For conventional classifiers like LR, the training method is straightforward: for each prediction, we use x^(i) as input and y^(i) as target, and minimize the error function either stochastically (stochastic gradient descent) or collectively (batch gradient descent). This is not the case for RNNs. Recall that one of the properties of RNNs is that they can process sequential data. This means that we are not restricted to using one feature vector for each prediction; we may input feature vectors from some previous t days into the RNN sequentially and take the final output as the prediction (minimizing the cross entropy error of the final-step prediction). Using t = 3 as a concrete example:

x^(i-2) -> RNN -> s^(1); [s^(1), x^(i-1)] -> RNN -> s^(2); [s^(2), x^(i)] -> RNN -> o^(i)

where s^(t) are the hidden state vectors at time step t of the RNN, and s^(0) is initialized with all zeros. To train the RNN, we use inputs that are sequences of feature vectors [x^(i-t+1), ..., x^(i-1), x^(i)]. We feed them into the RNN sequentially, beginning from x^(i-t+1) and ending at x^(i), and the final output gives us a probability for the target variable y^(i), since we use the sigmoid function (Equation 7). Again, as in the previous section, depending on t we have to drop the first few days of training examples.

This allows the RNN to capture some temporal information that LR does not (e.g. a finer-grained picture of how returns change from day to day).
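The construction of these input sequences can be sketched as a sliding window over the day-level feature vectors (function name is ours):

```python
def make_sequences(X, y, t):
    """Build RNN training pairs: each input is the t consecutive daily
    feature vectors [x^(i-t+1), ..., x^(i)]; each target is y^(i).
    The first t-1 days are dropped since they lack full histories."""
    seqs, targets = [], []
    for i in range(t - 1, len(X)):
        seqs.append(X[i - t + 1:i + 1])
        targets.append(y[i])
    return seqs, targets
```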
The larger t is, the more temporal information we feed into the RNN. However, as mentioned in Section III, t is intrinsically limited by the vanishing gradient problem. t, together with the dimension of the hidden state vectors s^(t), are the hyper-parameters we tune using cross validation.

The above training method also applies to GRUs (a variant of RNN). However, we may expect better results from GRUs, as they should theoretically suffer less from the vanishing gradient problem.

D. Cross Validation for Time Series

Cross validation is an important step in model selection and parameter tuning. It provides a measure of the generalization error of the trained classifier. To a certain extent, this technique allows us to avoid over-fitting the training data (and perhaps under-fitting), and consequently to do better on the test data. For independent data, we can typically use K-Folds cross validation, where the training data is randomly split into K ideally equally sized folds. Each fold is then used as a validation set while the remaining (K-1) folds become the new training set. We cycle through the K folds so that each fold is left out of training and used for validation once. By averaging the error over these K validation folds, we get an estimate of the generalization error (i.e. how well the classifier is likely to perform on unseen test sets).

However, for this project the data involved are financial time series, which are not independent! Correlation between adjacent observations is often prevalent in time series data; the data has some intrinsic order. The K-Folds cross validation method described above breaks down because, if we randomly split the training data into K folds, the validation and training samples are no longer independent. Furthermore, the train set should not contain any information that occurs after the validation set; by splitting the data randomly, we cannot be sure of that.

TABLE II
CROSS VALIDATION FOR TIME SERIES

Fold 1: Train Set 2004; Validation Set 2005
Fold 2: Train Set 2004, 2005; Validation Set 2006
Fold 3: Train Set 2004, 2005, 2006; Validation Set 2007
Fold 4: Train Set 2004, 2005, 2006, 2007; Validation Set 2008

A more principled approach to time series cross validation is forward chaining [7]. Using 5 years of training time series data from 2004 to 2008 as an example, we may split it into 4 folds and perform cross validation as in Table II. This more accurately reflects the situation during testing, where we train on past data and predict future price changes. We adopted this approach for cross validation in this project. In Table III, we summarize the hyper-parameters for each model we tested, and the respective ranges over which we did a grid search.

TABLE III
GRID SEARCH HYPER-PARAMETERS

All models:  n (refer to Section II): 3, 4, 5, 6, 7, 8, 9; GT, Score: with and without
LR:          Regularization C: 10e-2, 10e-1, 10e-0, 10e1, 10e2
SVM RBF:     Bandwidth γ: 10e-2, 10e-1, 10e-0, 10e1, 10e2; C: 10e-2, 10e-1, 10e-0, 10e1, 10e2
KNN:         No. of neighbors: 5, 10, 25, 50, 75, 100
AdaBoost:    No. of estimators: 5, 10, 25, 50, 75, 100; Learning rate: 0.01, 0.05, 0.1, 0.5, 1
RNN:         Time steps t: 2, 4, 6; Hidden state s^(t) dimensions: 10, 30, 50
GRU:         Time steps t: 2, 4, 6; Hidden state s^(t) dimensions: 10, 30, 50

V. RESULTS AND DISCUSSION

A. Grid Search Cross Validation Results

We performed extensive grid searches for each model to choose the best hyper-parameters based on the resulting cross validation accuracy. Selected results are presented as heat maps in Figures 3, 4, 5, 6, 7 and 8.

Fig. 3. Grid search heat map for Logistic Regression. The optimal parameters from cross validation are n = 8 and regularization C = 0.1, without sentiment scores.

Fig. 4. Grid search heat map for K-Nearest Neighbors. The optimal parameters from cross validation are n = 8 and no. of neighbors = 5, without sentiment scores.
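The forward-chaining folds of Table II can be generated with a short helper (a sketch; names are ours):

```python
def forward_chaining_folds(years):
    """Expanding-window splits: each fold trains on all years up to a
    cutoff and validates on the next year (as in Table II)."""
    folds = []
    for k in range(1, len(years)):
        folds.append((years[:k], years[k]))
    return folds
```

Each successive fold grows the training window by one year, so no validation year ever precedes its training data.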
Using the best hyper-parameter combinations, we trained fresh models (LR, KNN, AdaBoost, SVM RBF, RNN and GRU) on the entire train set (January 2004 to December 2012) and tested them on the unseen test set (January 2013 to December 2014). The results are summarized in Table IV.

TABLE IV
BEST CROSS VALIDATION ACCURACY AND TEST ACCURACY

Model          Best Cross Validation Accuracy   Test Accuracy
LR (baseline)  0.509                            0.510
KNN            0.511                            0.495
AdaBoost       0.520                            0.523
SVM RBF        0.568                            0.565
RNN            0.534                            0.531
GRU            0.561                            0.558

Fig. 5. Grid search heat map for AdaBoost. We swept n as mentioned in Table III; for ease of visualization we present only the heat map for the best n. The optimal parameters from cross validation are n = 3, no. of estimators = 5 and learning rate = 1, without sentiment scores.

Fig. 6. Grid search heat map for Support Vector Machine RBF. We swept n as mentioned in Table III; for ease of visualization we present only the heat map for the best n. The optimal parameters from cross validation are n = 8, bandwidth γ = 0.1 and C = 1000, without sentiment scores.

Fig. 7. Grid search heat map for the Recurrent Neural Network. We swept n as mentioned in Table III; for ease of visualization we present only the heat map for the best n. The optimal parameters from cross validation are n = 5, hidden state s^(t) dimensions = 30 and time steps t = 4, without sentiment scores.

Fig. 8. Grid search heat map for the Gated Recurrent Unit. We swept n as mentioned in Table III; for ease of visualization we present only the heat map for the best n. The optimal parameters from cross validation are n = 5, hidden state s^(t) dimensions = 50 and time steps t = 4, without sentiment scores.

B. Discussion

From our grid search experiments, we found that including Google query volumes and sentiment scores did not necessarily lead to improved performance. In fact, for some models (like KNN and LR), including these sentiment scores caused a significant drop in test accuracy. The reason becomes apparent when we overlay the Google query volumes and sentiment scores with the N225 price index.

Fig. 9. Plot of Bloomberg Businessweek sentiment scores and the N225 price index over time from 2007 to 2009.

Fig. 10. Plot of Google Trends query volume for the word "debt" and the N225 price index over time from 2010 to 2014.

From Figures 9 and 10, we can see that neither score appears consistently correlated with the N225 price, nor predictive of N225 price changes (while the figures are plotted at the monthly level, the same holds true when we zoom in to the daily level). This likely explains why the sentiment score features do not improve the classifiers' performance: they do not provide useful additional information. It seems that our simple sentiment analysis (scoring by counting positive and negative words from pre-specified lists) is too coarse to extract useful information.
Perhaps using more sophisticated sentiment analysis methods that go beyond the word level (such as OpinionFinder in [2], which considers sentence-level subjectivity) would yield more informative scores. In addition, it may be useful to crawl articles from multiple news archives, rather than just Bloomberg Businessweek, to obtain a more diverse corpus that may be more representative of the state of world affairs.

Unlike the results reported in [3], Google search volume trends did not improve our results. This could simply be because we are analyzing the N225 in this project, rather than the Dow Jones Industrial Average as in the original paper. In hindsight, perhaps using volume trends for search terms in the Japanese language would have been more appropriate, since English is not Japan's first language (but then again, with globalization, the N225 is tradable from almost anywhere in the world). Further, [3] used a greater set of search terms; we restricted ourselves to 5 finance-related terms to keep data collection and computation time reasonable.

Out of all the models tested, LR gave one of the poorest accuracies, at 0.510. This is only slightly better than random guessing (0.5). However, such a result is consistent with our understanding that LR is ultimately a linear classification model (we did not kernelize LR for this project); it is natural that stock market prediction, a non-linear problem, cannot be well modeled by a linear model. Nevertheless, it serves as a baseline benchmark against which to evaluate the more sophisticated algorithms.

Both the RNN and GRU performed better than LR. Because these are non-linear models, it is natural that they can give better accuracy than LR. One observation is that the GRU (0.558) performs slightly better than the vanilla RNN (0.531), suggesting that the GRU gating architecture may indeed have helped alleviate the vanishing gradient problem, allowing it to learn better. We also note that both the RNN and GRU required significantly longer training times than the other models. This posed an issue, particularly for time series cross validation: as a result, we only managed to sweep 3 values for both the time steps t and the hidden state s^(t) dimensions (on top of n), and sweeping these parameters took over a day for each of the two models.

Finally, we see that the GRU's performance is comparable to, or slightly below, that of the SVM RBF (0.565).
In general, our SVM RBF accuracy is consistent with that reported in the literature and in other implementations online ([10], [11], [12] and [13]). However, we believe the GRU has the potential to outperform the SVM RBF classifier. Firstly, as mentioned earlier, we swept only 3 values for each GRU hyper-parameter; given more time and resources, we could sweep the parameters at finer resolutions and over larger ranges, which would likely give better performance. In addition, we used simple stochastic gradient descent in the GRU implementation; more sophisticated optimization methods (such as RMSprop) could potentially lead to improved accuracy. Lastly, we are currently working with daily data, which gives us around 2000 training examples. This data set size may be insufficient to learn the reset and update gates' weights effectively; with minute-scale data (which would vastly increase the number of training examples), the GRU might perform much better than the SVM RBF.

We did not have sufficient time to thoroughly analyze the results for KNN and AdaBoost. As mentioned in Section IV, we tested these models mostly to gain exposure to a wider range of common machine learning algorithms.

VI. CONCLUSION

In this project we collected price history from Yahoo! Finance, crawled articles from Bloomberg Businessweek, and obtained Google query volumes from Google Trends for the period 2004 to 2014. Using these data, we generated price technical indicators and sentiment scores to be used as features for predicting the direction of tomorrow's price change. We implemented a vanilla RNN and a GRU from scratch in Python and tested them against LR as a baseline. Through grid searches and time series cross validation, we chose the optimal (according to cross validation error) hyper-parameters for each model.

From our experiments, sentiment scores and Google query volumes did not improve classifier performance.
This is likely because our simple sentiment analysis does not extract useful information from the news articles. Consistent with our expectations, LR performed the poorest, trailing the SVM RBF, RNN and GRU. It is logical that a linear model cannot adequately describe a complex non-linear problem such as stock price movement.
