Forecasting: varying factors

Have you ever worked with time series? This topic is special in several respects, which I will discuss in a series of posts on Data Mining Research. In this post, I explain the varying factors that may influence forecasting results.

A time series is an ordered sequence of values of a variable at equally spaced time intervals. Time series prediction (forecasting) uses past data to predict future values of the series. These forecasts can support planning decisions in a company. Examples of applications include electricity demand, call volume and stock requirements. One of the main difficulties in time series analysis is the number of parameters to tune. Here are examples:

  • The time scale must be chosen carefully: daily, monthly, yearly? At the day level, the data are precise, but it may be difficult to highlight patterns. Daily predictions may also be too detailed for some users, such as managers. In certain situations, daily data may simply not be available.
  • The forecast horizon is critical: should you predict 1, 5 or 20 weeks ahead? The answer depends heavily on your business problem. It also depends on the available data: the longer the forecast horizon, the more data you need to build and test your model.
  • The time window size is the number of past time steps you use to build your model. The bigger the time window, the more information you include in your model. As with the forecast horizon, a bigger time window requires more data.
  • Additional data can be included in your time series model. For example, external data from various sources can be incorporated: you can add any time-stamped event data that may influence the values of your time series.
  • Several prediction methods are available for forecasting. They range from the simple moving average (and its extensions, such as the weighted and exponential moving averages) to ARIMA and more advanced techniques such as neural networks and support vector regression. There is no single best choice: it depends on your data and on your experience with these algorithms.
  • The evaluation criterion is key, since it allows you to estimate your performance and to compare prediction techniques and time series with each other. Choosing the wrong evaluation criterion may lead to incorrect comparisons and wrong conclusions.
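To make several of these factors more concrete (time scale, time window size, forecast horizon, prediction method and evaluation criterion), here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not a method from this blog. The function names (aggregate, backtest, simple_ma, weighted_ma, exponential_ma, mae, rmse, mape) are hypothetical. The linear weights of the weighted moving average, the smoothing factor alpha=0.5 and the sample demand values are also assumptions chosen for the example. It covers only the moving-average family and leaves out ARIMA, neural networks, support vector regression and additional external data.

```python
import math
from statistics import mean


def aggregate(values, period):
    """Change the time scale by summing consecutive groups of `period` values
    (e.g. 7 daily values -> 1 weekly value). An incomplete last group is dropped.
    Summing suits quantities such as demand; averaging may suit others."""
    n_groups = len(values) // period
    return [sum(values[i * period:(i + 1) * period]) for i in range(n_groups)]


# --- Prediction methods: each maps a window of past values to one forecast ---

def simple_ma(window):
    # Every value in the window has the same weight.
    return mean(window)


def weighted_ma(window):
    # Assumed linear weights 1, 2, ..., n: the most recent value counts most.
    weights = range(1, len(window) + 1)
    return sum(w * v for w, v in zip(weights, window)) / sum(weights)


def exponential_ma(window, alpha=0.5):
    # Exponential smoothing restricted to the window; alpha=0.5 is an assumed default.
    level = window[0]
    for value in window[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


def backtest(series, method, window_size, horizon):
    """Slide a window of `window_size` past values over the series and forecast
    the value `horizon` steps after the window ends. Moving averages give a flat
    forecast, so the same value is used whatever the horizon."""
    actuals, forecasts = [], []
    for end in range(window_size, len(series) - horizon + 1):
        window = series[end - window_size:end]
        forecasts.append(method(window))
        actuals.append(series[end + horizon - 1])
    return actuals, forecasts


# --- Evaluation criteria ---

def mae(actual, forecast):
    # Mean absolute error, in the units of the series (scale-dependent).
    return mean(abs(a - f) for a, f in zip(actual, forecast))


def rmse(actual, forecast):
    # Root mean squared error: penalizes large errors more than MAE.
    return math.sqrt(mean((a - f) ** 2 for a, f in zip(actual, forecast)))


def mape(actual, forecast):
    # Mean absolute percentage error (scale-free); undefined if an actual value is zero.
    return 100 * mean(abs((a - f) / a) for a, f in zip(actual, forecast))


if __name__ == "__main__":
    # Made-up daily demand values, for illustration only.
    daily = [12, 15, 14, 18, 20, 22, 19, 17, 16, 21, 23, 25, 24, 22]
    print("weekly totals:", aggregate(daily, 7))  # weekly totals: [120, 148]

    methods = [("SMA", simple_ma), ("WMA", weighted_ma), ("EMA", exponential_ma)]
    for name, method in methods:
        actual, forecast = backtest(daily, method, window_size=3, horizon=2)
        print(f"{name}: MAE={mae(actual, forecast):.2f} "
              f"RMSE={rmse(actual, forecast):.2f} "
              f"MAPE={mape(actual, forecast):.1f}%")
```

Changing the aggregation period, `window_size`, `horizon` or the method changes the error figures, which illustrates why these factors have to be tuned together. The choice of criterion matters too. MAE and RMSE are expressed in the units of the series, so they are not directly comparable across series of different scales. MAPE is scale-free but breaks down when actual values are zero.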

As you can see, forecasting involves more varying factors than most other data mining domains. I will describe some of these factors in upcoming posts, so stay tuned!

Update: link to forecasting methods and evaluation criteria.

