Friday, May 29, 2015

Stock Forecasting With Machine Learning - Seven Possible Errors

Here are at least seven reasons why pumping the last ten days of SP500 O/H/L/C/V into a neural network in an attempt to solve for the next day's O/H/L/C is a bad idea. 

1. Not enough data.  Ten days of data is simply not enough.
2. No feature engineering.  The plan used raw data.  A better approach might be to solve for percentage gain or loss.  How about daily range converted to a percentage volatility value?  How about a volume spike true/false flag?
3. No separate train vs. test set.  You have no way of determining the accuracy of the model on unseen data.
4. If you are training and predicting in a trending market, the neural network is being asked to solve for values outside the range it saw during training.  That is not a task well suited to a Neural Network.
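5. A separate Neural Network should be used for each output value.  A single network can be built with several output nodes, but all four targets then share one set of hidden weights and one combined error measure, which makes the model harder to tune and to diagnose than one network per target.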
6. The Neural Network is a very brittle and opaque algorithm.  Sometimes it does not converge at all, and when it does, it is very difficult to understand the results of the model.
7. Attempting to forecast the next day's numbers based on a series of End-of-Day values is a fool's errand.  For more information refer to this book: The (Mis)Behaviour of Markets: A Fractal View of Risk, Ruin and Reward by Benoit Mandelbrot and Richard L. Hudson
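To make points 2 and 3 concrete, here is a minimal sketch in plain Python.  It is not the code from the original experiment.  The function names, the dictionary keys, the 3-day volume window and the 1.5x spike threshold are all assumptions picked for illustration.  It turns raw bars into percentage change, percentage range and a volume spike flag, then splits the rows by time so the test set is strictly later than the training set.

```python
from statistics import mean

def make_features(bars, vol_window=3, spike_factor=1.5):
    """Turn raw O/H/L/C/V bars (oldest first) into relative features.

    `bars` is a list of dicts with keys open, high, low, close, volume.
    The window and spike factor are illustrative assumptions, not tuned values.
    """
    rows = []
    for i in range(vol_window, len(bars)):
        prev, cur = bars[i - 1], bars[i]
        # Average volume over the preceding days only (no look-ahead).
        avg_vol = mean(b["volume"] for b in bars[i - vol_window:i])
        rows.append({
            # Close-to-close percentage gain or loss.
            "pct_change": (cur["close"] - prev["close"]) / prev["close"] * 100.0,
            # Daily range expressed as a percentage of the close.
            "range_pct": (cur["high"] - cur["low"]) / cur["close"] * 100.0,
            # True when today's volume is well above the recent average.
            "volume_spike": cur["volume"] > spike_factor * avg_vol,
        })
    return rows

def chronological_split(rows, train_fraction=0.8):
    """Split by time: the earliest rows train, the latest rows test."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]
```

The split is chronological rather than shuffled because shuffling would let the model train on days that come after the days it is tested on.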

However, on the plus side, there was an important kernel of truth here: the notion that the model needs to be adaptable to current market conditions.  Failing to adapt is the bane of many black box or mechanical trading systems.  If the model does not adapt to current conditions, the best it can do is average over long periods, which are likely not representative of today's market.
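One common way to get that kind of adaptability is to refit the model on a sliding window of recent data before every forecast (often called walk-forward testing).  The sketch below is an illustration only, not what "Operation Make Millions" did.  The names walk_forward, fit_mean and predict_mean are hypothetical, and the "model" is just the mean of the window, standing in for whatever learner you would actually use.

```python
def walk_forward(values, window, fit, predict):
    """Refit on the most recent `window` observations before each forecast.

    Returns a list of (forecast, actual) pairs, one per step.
    """
    results = []
    for t in range(window, len(values)):
        # Only data strictly before time t is visible to the model.
        model = fit(values[t - window:t])
        results.append((predict(model), values[t]))
    return results

# Stand-in "model" (an assumption for illustration): the window average.
def fit_mean(history):
    return sum(history) / len(history)

def predict_mean(model):
    return model
```

Calling walk_forward(daily_pct_changes, 10, fit_mean, predict_mean) would forecast each day from the ten days before it.  A short window adapts quickly but is noisy, and a long window drifts back toward the long-period average described above.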

Bottom line: this model is crap.  "Operation Make Millions" was naive and ill-advised.  It never made $10.00!

Feedback?  Hit me up on Twitter @anlytcs