Trend Following: A Machine Learning Approach


Stanford University
MS&E 448: Big Financial Data and Algorithmic Trading
Authors: Art Paspanthong, Divya Saini, Joe Taglic, Raghav Tibrewala, Will Vithayapalert
June 10, 2019

Contents

Introduction and Strategy
Data
    Investment Universe Selection
    Data Exploration
Feature Generation
    Continuous Variables
    Categorical Variables
Models
    Linear Model
        Results
    RNN Model
        Results
    Neural Net Model
        Comparison with Linear Regression
        Results
    Summary and Comparison
Portfolio Construction
    Portfolio Optimizer
    Stop Loss
Risk Management Philosophy
Portfolio Results
    Baseline Strategy
    Comparison of Results from Different Models
Execution Discussion
Retrospective Discussion

List of Figures

1. Correlation of Returns of 36 Different Assets
2. Predicted versus actual values of unregularized linear regression model
3. Histogram of error values of unregularized linear regression model
4. Beta values of unregularized linear model and their significance values
5. Portfolio over 2017-2018 using unregularized linear model predictions
6. Predicted vs actual values of the lasso regression model
7. Lasso regression model histogram of errors
8. Portfolio over 2017-2018 using lasso model predictions
9. Portfolio over 2017-2018 using 5-day linear regression return predictions
10. The architecture of 3-layer LSTM
11. Correlation of actual next day's returns and predicted next day's returns
12. Correlation of actual next 5-day's returns and predicted next 5-day's returns
13. Histogram of errors for prediction on next day's returns
14. Histogram of errors for prediction on next 5-day's returns
15. Portfolio value over 2017-18 using LSTM model prediction on next day's returns
16. Portfolio value over 2017-18 using LSTM model prediction on next 5-day's returns
17. Different Results given by Neural Net Model due to Stochastic Nature of Neural Nets
18. Loss as a function of epochs
19. Comparison of Linear Regression and Neural Network without Activation
20. Correlation: predicted and actual returns
21. Histogram of Errors from Neural Net Model
22. Final Saved Portfolio from the Neural Net Model compared to the Naive Strategy
23. Plots of Portfolio Value over Time for Linear Regression Portfolio with Stop Loss (No SL, 15%, 10%, 5%)
24. Comparison of the portfolio over time for different models

Introduction and Strategy

Trend following is one of the most classic investment styles and has been used by investors for decades. The concept is relatively simple: when there is a trend, follow it; when things move against you or when the trend isn't really there, cut your losses.

However, because of this simplicity, our team believes that a trend following strategy on its own might not capture the nuance and complexity of the financial market. Consequently, with the increased availability of data, we believe machine learning techniques could play an important role in constructing a better trend following portfolio. Our task for this project is therefore to replicate and improve on the basic ideas of trend following.

Data

Investment Universe Selection

As per the project proposal, we narrowed our universe of assets down to futures markets. Data sets from Quandl give us access to many different futures contracts. We first selected 9 different commodities to start with, including Crude Oil, Natural Gas, Gasoline, Gold, Silver, Copper, Agriculture, Corn, Wheat, and Soybean. We consider 6 different contracts for each commodity (1 to 6 months to expiration). The primary reason for looking at a diverse set of assets is to diversify the portfolio. In addition, the volume of futures contracts for a specific commodity can be much smaller than volume in equity markets, so large buy or sell orders could potentially move the market. That is why we want to invest in many different contracts.

After inspecting each data set, we ended up selecting 7 different commodities, dropping Natural Gas and Gasoline from our study because their data sets were incomplete. We also filtered out commodity futures with low volume. In the end, we have a total of 36 different contracts from 7 commodities.

Data Exploration

Since the data sets we selected are relatively complete, we did not encounter any challenging problems. However, the original features in the data set are somewhat limited, so we decided to add approximately 50 new "trend-following" features. Details of these features are discussed in the next section.

We also explored the correlation between different assets. The correlation plot is shown in the figure below.

Figure 1: Correlation of Returns of 36 Different Assets

In the plot above, there are quite a few noticeable clusters of assets with high positive correlation. Each such cluster consists of contracts on the same commodity with different expiration periods. It is also notable that among all the assets we selected, no pair of futures contracts has a high negative correlation.

Feature Generation

Features selected for the modeling were based on traditional trend following indicators. These were used to predict the final response variable, the next day return, $(P_{t+1} - P_t)/P_t$.
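The response variable and the return correlation matrix described above can be sketched in a few lines of pandas. This is a minimal illustration, not the authors' code: the function names, the contract labels (`CL1`, `CL2`, `GC1`), and the toy prices are all hypothetical, and the real data would come from the Quandl futures data sets.

```python
import pandas as pd


def next_day_return(prices: pd.DataFrame) -> pd.DataFrame:
    """Response variable (P_{t+1} - P_t) / P_t per contract; the last row is NaN."""
    return prices.shift(-1) / prices - 1.0


def return_correlation(prices: pd.DataFrame) -> pd.DataFrame:
    """Pairwise correlation of realized daily returns across contracts."""
    daily = prices / prices.shift(1) - 1.0
    return daily.corr()


# Hypothetical toy prices: two expirations of one commodity and one unrelated contract.
prices = pd.DataFrame(
    {
        "CL1": [100.0, 102.0, 101.0, 103.02],
        "CL2": [50.0, 51.0, 50.5, 51.51],
        "GC1": [1300.0, 1290.0, 1310.0, 1305.0],
    }
)

targets = next_day_return(prices)   # one column of targets per contract
corr = return_correlation(prices)   # 3 x 3 correlation matrix
```

In this toy example the two `CL` columns move in lockstep, which mirrors the clusters of same-commodity contracts visible in Figure 1.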

Continuous Variables

1. Simple Moving Average (SMA)
2. Exponential Moving Average (EMA)
3. Moving Average Convergence Divergence (MACD)
4. Momentum Indicator
5. Days Since Cross
6. Number of days up - down

The simple moving average, the momentum indicator, and the number of days of upward price movement minus the number of days of downward price movement were calculated over several lookback windows: 5, 10, 15, 20, 50, and 100 days. EMA variables were included over lookback windows of 10, 12, 20, 26, 50, and 100 days. MACD was calculated as the 12-day EMA minus the 26-day EMA. Days since cross indicates the number of days since the last crossover between an asset price and its EMA.

Categorical Variables

1. SMA Crossover indicator variables
2. EMA Crossover indicator variables
3. MACD Crossover indicator variables

The categorical variables were labeled at each timestep as 1 to indicate a crossover with a buy signal, 0 to indicate no crossover, and -1 to indicate a crossover with a sell signal. They were calculated as asset price crossovers with all the SMA, EMA, and MACD indicator variables mentioned in the continuous variables section. In traditional trend following strategies, these crossover variables are important indicators for detecting upward or downward trends that can be ridden for profit. Our reasoning for feeding all of them into our models was to let the algorithm determine which ones are more accurate predictors of next day returns.

Models

Linear Model

First, a linear regression model was trained on 2014-2017 data and tested on 2017-2018 data. The technique provided fairly stable, predictable patterns, and the unregularized version used all parameters mentioned in the feature generation section of this paper. A separate regression was run on each asset available in the training data so that each model could fit asset-specific behavior. The advantages of using a linear model on this problem are that it is simple and easy to understand, and that it fits the data decently well.

Second, a regularized lasso regression model was trained on the same training data and tested on the same test data.

Finally, a linear regression model was trained to predict returns over a longer time frame, specifically 5-day returns. We attempted this model because a non-ideal trading system has frictions: one-day returns are small and may be erased by transaction costs, and we might not enter the position until the next day. The question became whether we could reliably predict 5-day returns and whether doing so would improve the efficacy of our trading algorithm.

Results

The figures below show the predicted versus actual values as well as a histogram of the linear regression errors.

Figure 2: Predicted versus actual values of unregularized linear regression model.
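The continuous and categorical features listed in the Feature Generation section can be sketched with pandas as follows. This is an illustrative reconstruction under stated assumptions rather than the authors' implementation: the function names are hypothetical, momentum is assumed to be the w-day percentage change, the MACD crossover is assumed to be a zero-line cross, and "days since cross" is shown only for the 10-day EMA.

```python
import numpy as np
import pandas as pd


def crossover_signal(price: pd.Series, indicator: pd.Series) -> pd.Series:
    """+1 when price crosses above the indicator, -1 when it crosses below, else 0."""
    above = (price > indicator).astype(int)
    signal = above.diff().fillna(0).astype(int)
    # Suppress signals while the indicator is still warming up (NaN today or yesterday).
    valid = indicator.notna() & indicator.shift(1).notna()
    return signal.where(valid, 0)


def days_since_cross(signal: pd.Series) -> pd.Series:
    """Rows elapsed since the last non-zero signal (NaN before the first cross)."""
    idx = pd.Series(np.arange(len(signal)), index=signal.index)
    last_cross = idx.where(signal != 0).ffill()
    return idx - last_cross


def build_features(price: pd.Series) -> pd.DataFrame:
    feats = {}
    step = price.diff()
    up_minus_down = (step > 0).astype(int) - (step < 0).astype(int)
    for w in (5, 10, 15, 20, 50, 100):  # lookback windows from the text
        sma = price.rolling(w).mean()
        feats[f"sma_{w}"] = sma
        feats[f"mom_{w}"] = price / price.shift(w) - 1.0  # assumed momentum definition
        feats[f"updown_{w}"] = up_minus_down.rolling(w).sum()
        feats[f"sma_cross_{w}"] = crossover_signal(price, sma)
    for w in (10, 12, 20, 26, 50, 100):  # EMA windows from the text
        ema = price.ewm(span=w, adjust=False).mean()
        feats[f"ema_{w}"] = ema
        feats[f"ema_cross_{w}"] = crossover_signal(price, ema)
    macd = feats["ema_12"] - feats["ema_26"]  # 12-day EMA minus 26-day EMA
    feats["macd"] = macd
    # Assumption: MACD crossover taken as MACD crossing the zero line.
    feats["macd_cross"] = crossover_signal(macd, pd.Series(0.0, index=price.index))
    feats["days_since_ema_cross_10"] = days_since_cross(feats["ema_cross_10"])
    return pd.DataFrame(feats)


# Hypothetical toy price path: a slow oscillation around 100.
price = pd.Series(100.0 + 10.0 * np.sin(np.arange(300) / 15.0))
feats = build_features(price)
```

Rows inside the longest warm-up window (100 days here) contain NaNs and would need to be dropped before fitting any of the models discussed in this section.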

Figure 3: Histogram of error values of unregularized linear regression model

The overall train MSE was 2.187E-04. The test MSE was 1.47E-04. In analyzing the beta values of the linear regression, we noticed that exponential moving averages are generally better predictors than simple moving averages, in the sense that their betas have higher absolute values. One indicator at each of the 5-day, 10-day, 12-day, and 100-day horizons was statistically significant at the five percent level. We therefore concluded that recent trends are most significant, though longer term trends are not irrelevant. Finally, we noticed that the sign changes between the beta values of the EMA 10, 12, and 20 indicator variables point to the importance of recent crosses, which validates the inclusion of categorical crossover variables in our feature selection. These beta values are summarized with their p-values in the chart below.

Figure 4: Beta values of unregularized linear model and their significance values.

The overall trading strategy based on the linear regression model's return predictions performed quite well. Below is a chart of the portfolio growth based on the linear regression model compared to a naive strategy. Over the course of 2017-2018, the portfolio grew to 1.3x using the linear regression model return predictions.

Figure 5: Portfolio over 2017-2018 using unregularized linear model predictions.

Next, we trained a lasso model to reduce some of the overfitting of the plain linear regression. Lasso accomplishes this by automatically selecting only the more important features. The advantages of this model are that it is less likely to overfit and is less prone to noise, of which we believe there is a lot in the pricing data. The disadvantages are that it does not solve the complexity issue and can reduce the expressiveness that we may need to explain returns. The lasso model's predicted versus actual distribution and its error histogram are displayed below.

Figure 6: Predicted vs actual values of the lasso regression model.
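The contrast between the unregularized and lasso fits can be sketched in plain NumPy. This is a minimal sketch under loud assumptions: the data are synthetic, the penalty `alpha` is arbitrary, the function names are hypothetical, and the lasso solver is a simple coordinate descent stand-in (the source does not say which solver or library was used). It minimizes (1/2n)·||y − Xb||² + alpha·||b||₁ and uses a chronological train/test split, in the spirit of the 2014-2017 / 2017-2018 split above.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with intercept; returns (intercept, betas)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]


def soft_threshold(z, g):
    return np.sign(z) * max(abs(z) - g, 0.0)


def fit_lasso(X, y, alpha, n_iter=200):
    """Lasso via cyclic coordinate descent on centered data (illustrative solver)."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    col_sq = (Xc ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Partial residual that excludes feature j's current contribution.
            resid_j = yc - Xc @ beta + Xc[:, j] * beta[j]
            rho = Xc[:, j] @ resid_j / n
            beta[j] = soft_threshold(rho, alpha) / col_sq[j]
    return y_mean - x_mean @ beta, beta


def mse(intercept, beta, X, y):
    return float(np.mean((intercept + X @ beta - y) ** 2))


# Synthetic stand-in for one asset: 8 features, only the first two matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
true_beta = np.array([0.5, -0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ true_beta + 0.1 * rng.normal(size=500)

# Chronological split: earlier rows train, later rows test.
X_tr, y_tr, X_te, y_te = X[:400], y[:400], X[400:], y[400:]

b0_ols, b_ols = fit_ols(X_tr, y_tr)
b0_lasso, b_lasso = fit_lasso(X_tr, y_tr, alpha=0.05)

ols_test_mse = mse(b0_ols, b_ols, X_te, y_te)
lasso_test_mse = mse(b0_lasso, b_lasso, X_te, y_te)
n_selected = int(np.count_nonzero(b_lasso))  # features lasso kept
```

Running one such fit per asset matches the per-asset regressions described above; the lasso fit keeps fewer non-zero coefficients at the cost of shrinking the ones it retains, which is the trade-off between noise robustness and expressiveness noted in the text.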

Figure 7: Lasso regression model histogram of errors.

Although the MSEs were relatively similar to those of the unregularized linear model, with a train MSE of 2.281E-04 and a test MSE of 1.353E-04, the overall strategy based on the lasso return predictions performed worse over the course of our test period. The portfolio growth compared to the naive strategy is displayed below.

Figure 8: Portfolio over 2017-2018 using lasso model predictions.

Finally, for the 5-day return predictions, we noticed that 5-day returns are generally about 2-3x larger than 1-day returns. Because squared error scales with the square of the return magnitude, a roughly 6.5x increase in mean squared error (MSE: 9.47E-04) indicates that the predictions are about equivalent in quality to the 1-day predictions. The portfolio performed as shown in the figure below. The 5-day return portfolio did not perform as well as our 1-day return portfolio, with merely a 1.2x growth factor compared to the earlier 1.3x growth factor over this test period.

Figure 9: Portfolio over 2017-2018 using 5-day linear regression return predictions.

Interestingly, the daily returns of this portfolio and of the naive portfolio are fairly comparable (0.04% vs. 0.02%), but the 5-day returns are notably better (0.22% vs. 0.07%).

RNN Model

The Recurrent Neural Network (RNN) is considered one of the most powerful model families for predicting future stock prices. In particular, the Long Short Term Memory (LSTM) model has a configuration that incorporates historical information to capture patterns in the data. Furthermore, most of the research we reviewed concluded that neural network structures outperform simple linear regression by substantial margins, although those studies did not explicitly explain how specific hyper-parameters were selected. We therefore chose to build an LSTM architecture to investigate whether it can drive up the profitability of our trend-following strategy.

In this project, our RNN architecture consists of 3 LSTM layers and one fully-connected layer at the end. Each layer has 128 hidden units, with a linear activation in the last step, as
