A common rule of thumb in data mining projects is that about 80% of the time goes to data preprocessing and the remaining 20% to the data mining task itself. However, when data mining is integrated into a larger system (such as a stock picking system), tuning the parameters of that overall system becomes an important task as well.
For example, in such a stock picking system, several parameters have to be set in order to obtain satisfactory results. Here is a list of these parameters:
- Number of stocks to analyze (depends on the computational resources)
- Number of stocks to select as the best ones (a fixed number, or a threshold on the validation accuracy and the minimum number of trades)
- Short- or long-term prediction (predict the increase/decrease of given stocks in X days)
- Cost matrix for the classifier (how to penalize each type of error in the classifier's confusion matrix)
- Size of the shifting window (i.e. size of the training/validation set)
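To make the list above more concrete, here is a minimal Python sketch of how such parameters could be grouped and used. Everything in it is an assumption made for illustration: the names (`TuningParams`, `shifting_windows`, `select_stocks`), the default values, and the choice to move the window forward by one validation block at a time are hypothetical and not taken from any particular system.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass(frozen=True)
class TuningParams:
    """Hypothetical container for the parameters listed above (values are placeholders)."""
    n_stocks: int = 500                # number of stocks to analyze
    top_k: int = 10                    # maximum number of stocks to select
    min_accuracy: float = 0.55         # threshold on the validation accuracy
    min_trades: int = 20               # minimum number of trades
    horizon_days: int = 5              # predict increase/decrease in X days
    false_positive_cost: float = 2.0   # penalty for predicting a rise that does not happen
    false_negative_cost: float = 1.0   # penalty for missing a rise
    train_size: int = 250              # training part of the shifting window
    valid_size: int = 50               # validation part of the shifting window


def shifting_windows(
    n_samples: int, train_size: int, valid_size: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train, validation) index ranges, shifting by one validation block each time."""
    start = 0
    while start + train_size + valid_size <= n_samples:
        train = range(start, start + train_size)
        valid = range(start + train_size, start + train_size + valid_size)
        yield train, valid
        start += valid_size


def select_stocks(
    results: Dict[str, Tuple[float, int]], params: TuningParams
) -> List[str]:
    """Keep stocks that pass both thresholds, then return the top_k by validation accuracy.

    `results` maps a ticker to (validation accuracy, number of trades).
    """
    eligible = [
        (accuracy, ticker)
        for ticker, (accuracy, trades) in results.items()
        if accuracy >= params.min_accuracy and trades >= params.min_trades
    ]
    eligible.sort(reverse=True)  # highest validation accuracy first
    return [ticker for _, ticker in eligible[: params.top_k]]
```

Grouping the parameters in one object makes it easier to log each tuning run and compare configurations. The validation block always follows the training block in time, so the model is never validated on data older than its training set.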
These parameters vary from project to project. For example, you can have a look at the parameters mentioned in a post by Themos Kalafatis. Feel free to comment and give examples of parameters that you have to tune.