I have been applying data mining to finance for a few months now, so here is some insight into my main project: stock market prediction. When I started at my company, I saw several projects (so-called “screeners”, i.e. stock-picking rules built on technical indicators, with no use of data mining). Most of them make two assumptions:
- The rules based on technical indicators don’t evolve over time
- Stocks are selected (and sometimes processed) differently according to the sector they belong to (e.g. health and care, industry, etc.)
I take the opposite view on both points: i) rules based on indicators should evolve over time and ii) each stock should be processed independently. Note that the second point doesn’t mean that there is no correlation between a particular stock and the sector it belongs to. It only means that stocks may behave differently and thus should be treated independently. Information from their sector could still be used in the forecasting process.
When seen as a black box, the system takes information about a specific stock (such as open, high, low, close, volume, etc.) as input and produces a class value as output. The class is defined this way:
1 if close[j+n] > (x% * close[j]) + close[j]
Here n is the number of days between the current day j and the predicted day, and
x is a value chosen to take transaction fees into account (note that a fixed value could also be chosen instead of a percentage). The class predictions are thus made for each stock independently. One year of daily data is used for training and the following month for testing. This window is then shifted forward so that the system adapts itself to the current market.
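To make the class definition and the train/test split concrete, here is a minimal sketch in plain Python. It is not the production code: the function names `label_days` and `sliding_windows` are made up for illustration, the label 0 for the "otherwise" case is my assumption, and so are the window lengths of 252 trading days (about one year) and 21 trading days (about one month) and the choice to shift by one test period at a time.

```python
from typing import List, Optional, Tuple


def label_days(close: List[float], n: int, x_pct: float) -> List[Optional[int]]:
    """Label day j with 1 if close[j+n] > (x% * close[j]) + close[j].

    Assumptions: the other class is encoded as 0, and the last n days,
    whose future close is not known yet, are labeled None.
    """
    labels: List[Optional[int]] = []
    for j in range(len(close)):
        if j + n >= len(close):
            labels.append(None)  # no close available n days ahead
        else:
            threshold = (x_pct / 100.0) * close[j] + close[j]
            labels.append(1 if close[j + n] > threshold else 0)
    return labels


def sliding_windows(
    n_days: int, train_len: int = 252, test_len: int = 21
) -> List[Tuple[range, range]]:
    """Return (train, test) index ranges over n_days of daily data.

    Each window trains on train_len days and tests on the next test_len
    days, then moves forward by test_len days (an assumed step size).
    """
    windows = []
    start = 0
    while start + train_len + test_len <= n_days:
        train = range(start, start + train_len)
        test = range(start + train_len, start + train_len + test_len)
        windows.append((train, test))
        start += test_len
    return windows


if __name__ == "__main__":
    closes = [100.0, 102.0, 102.5, 110.0]
    print(label_days(closes, n=1, x_pct=1.0))
    for train, test in sliding_windows(300):
        print(train, test)
```

One thing to watch with this kind of split: the labels of the last n days of a training window depend on closes that fall in the test period, so those rows may need to be dropped before training.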
Here are the steps of the overall methodology, which uses decision trees for stock prediction:
1. Stock filtering
2. Data preprocessing
3. Classification tree
4. Risk management
In the following posts, I will explain each of these steps in detail.