To continue the first post of a series on forecasting, let’s discuss standard methods used for predicting time series. The first two methods can serve as benchmarks for comparison with more advanced models:

**Mean value:** the mean of the time series used for training is used as the forecast for all values in the test time series.

**Last value:** the last value of the time series is used as the forecast for the next one.

**Simple Moving Average (SMA):** the last *m* values are averaged to predict the next one.

**Weighted Moving Average (WMA):** the last *m* values are averaged, with a more important weight for recent values, to predict the next one.
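The four benchmark forecasts above can be sketched in a few lines of plain Python (function names and the linear weighting scheme for the WMA are illustrative choices, not prescriptions):

```python
def mean_forecast(history):
    # Mean value: average of the whole training series.
    return sum(history) / len(history)

def last_value_forecast(history):
    # Last value (naive): repeat the most recent observation.
    return history[-1]

def sma_forecast(history, m):
    # Simple Moving Average: mean of the last m values.
    return sum(history[-m:]) / m

def wma_forecast(history, m):
    # Weighted Moving Average: last m values with linearly
    # increasing weights, so recent observations count more.
    window = history[-m:]
    weights = range(1, m + 1)
    return sum(w * x for w, x in zip(weights, window)) / sum(weights)

series = [10, 12, 11, 13, 14, 15]
print(mean_forecast(series))        # 12.5
print(last_value_forecast(series))  # 15
print(sma_forecast(series, 3))      # 14.0
print(wma_forecast(series, 3))      # (1*13 + 2*14 + 3*15) / 6 ≈ 14.33
```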

Other approaches exist, such as Exponential Moving Average (EMA), ARIMA, Neural Networks (NN), and Support Vector Regression (SVR). Do you use other techniques for forecasting? Which one works best in your experience?

Structural models (for example SAS’ unobserved components model (UCM)).

These forecast models generally work very well against real-world data, can accept multiple input variables, and decompose the forecast into outputs that can be used very powerfully to answer business questions (the effect of promotions on sales, what-if scenarios, etc.).

I’m currently working on a project forecasting arrivals into a hospital emergency department, as well as how many hospital beds will be needed over the next 3 days. This helps with planning for healthcare demand. UCMs are proving to generate the best results.

Cheers!

Tim Manns

Like Tim, I am a software vendor. Our software, Autobox, uses ARIMA together with outlier detection (i.e. pulses, level shifts, local time trends, and seasonal pulses) to build robust models. Most other approaches are unable to reproduce our results. For example, take the time series 1,9,1,9,1,9,1,5, apply the methods listed above or give it to a forecasting tool, and it will fail to find the inlier. Is this a made-up example? Yes. Do inliers exist in real-world data? Yes.
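To see why the final 5 is called an “inlier”, note that it sits inside the overall range of the series but breaks the period-2 pattern. A minimal sketch of one way to flag it (this is not Autobox’s algorithm; the function, window logic, and threshold are assumptions for illustration) is to compare each point with the other points at the same seasonal position:

```python
def flag_pattern_breaks(series, period, threshold=2.0):
    # Flag points that deviate strongly from the mean of the
    # other observations at the same seasonal position.
    flagged = []
    for i, x in enumerate(series):
        peers = [y for j, y in enumerate(series)
                 if j != i and j % period == i % period]
        mean = sum(peers) / len(peers)
        if abs(x - mean) > threshold:
            flagged.append(i)
    return flagged

series = [1, 9, 1, 9, 1, 9, 1, 5]
print(flag_pattern_breaks(series, period=2))  # [7] -- the final 5
```

A plain outlier test on the range of the data would never flag the 5, which is exactly the point of the example.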

Tom Reilly

http://www.autobox.com

Hey Tim,

Have you found this technique (new to me) comparable (superior) to ARIMA in your applications?

Yes, but with some effort (I find UCMs less ‘plug and play’ :). High forecast accuracy was key – quite literally a matter of life or death.

I’ve just completed a very short (15-day) off-site project forecasting historical hospital data. Basically, forecasting the number of people walking into the Emergency Dept, and also forecasting the number of beds needed in the hospital. This is used to help set doctor and nurse staffing levels.

I was forecasting hourly, 72 hours (3 days) ahead, but I decided to use a simple bit of data preparation to create a lag/offset of the dependent variable rather than forecast the full 72 intervals ahead (I only needed the 72nd step).

So I was forecasting at hourly intervals, and the ‘next step’ was 72 hours/intervals ahead. Yep, lots of data prep during building and validation…
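The lag/offset trick described above can be sketched as follows (variable names and the single-input layout are assumptions; in practice there would be many input columns per hour): shift the target 72 hours back so that a one-step model learns the 72-step-ahead mapping directly.

```python
HORIZON = 72  # hours ahead

def make_lagged_pairs(values, horizon):
    # Each training row pairs the inputs at hour t with the target
    # at hour t + horizon; the last `horizon` hours have no target yet.
    return [(values[t], values[t + horizon])
            for t in range(len(values) - horizon)]

hourly_arrivals = list(range(200))  # stand-in for real hourly counts
pairs = make_lagged_pairs(hourly_arrivals, HORIZON)
print(pairs[0])    # (0, 72): hour 0's inputs paired with hour 72's target
print(len(pairs))  # 128 rows usable for training
```

The model then only ever makes a one-step prediction, which avoids chaining 72 recursive forecasts.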

I used UCM because I was able to add inputs such as holiday indicator (we know if a date will be a holiday or special date ahead of time) that proved to be very influential in arriving at accurate forecasts.

The data also showed a lot of variation (higher actuals) due to weather extremes (e.g. many low-priority arrivals into Emergency during a heatwave), so weighting more recent events (which UCM does well) might be why UCM outperformed ARIMA.

I’ll try to add something to my own blog later, but it’s difficult to discuss my work in any level of detail due to NDAs (the pains of being a consultant…). That is the reason my blog http://timmanns.blogspot.com.au/ has virtually died this year…

Hope that helps

Tim

Apart from SVR (I wrote two posts on my blog applying that model to a slightly different topic), which is very powerful, especially for data with high dimensionality, I also experimented with cubic splines, with good results.

In my opinion the methodology is important, but filtering the data (in order to detect peaks and outliers) is one of the keys to high predictability.
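A minimal sketch of the filtering idea above, assuming a rolling-median approach (the window size and threshold are illustrative choices, not prescriptions): flag points that sit far from their local median before fitting any model.

```python
def rolling_median_outliers(series, window=5, threshold=3.0):
    # Flag points whose distance from the rolling median of their
    # neighbourhood exceeds the threshold.
    flagged = []
    half = window // 2
    for i, x in enumerate(series):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        neighbourhood = sorted(series[lo:hi])
        median = neighbourhood[len(neighbourhood) // 2]
        if abs(x - median) > threshold:
            flagged.append(i)
    return flagged

data = [10, 11, 10, 12, 50, 11, 10, 12, 11]
print(rolling_median_outliers(data))  # [4] -- the spike at 50
```

The median is used rather than the mean because a single large peak would drag the mean toward itself and mask the very point being tested.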

cheers

Thanks for your input! I also obtained good results using Support Vector Regression (SVR). It’s very interesting to read about your applications!

In revenue management applications, where the goal is to allocate capacity to different customer segments based on their willingness to pay, we require a forecast of the remaining demand that has yet to book/make a reservation, since capacity cannot be optimized for existing customers whose decisions have already been made (i.e. water under the bridge). We find issues with ESM/ESMT (exponential smoothing and exponential smoothing with trend) that make the forecasts untenable for revenue management, even though the historical in-sample fit is optimal:

a. Since ESM is a convex combination of the last smoothed value and the current observation, it has a tendency to drift to 0 or to rise far above the physical capacity of the resource being forecast, and this phenomenon becomes more pronounced as the forecast horizon increases.

b. ESM/ESMT cannot incorporate exogenous inputs such as day-of-week or monthly class variables to capture those levels of seasonality.

c. ESM/ESMT cannot incorporate exogenous inputs such as additive outliers or level shifts, which capture periods of high/low demand.

d. The ARIMA(0,1,1) model generates forecasts that are equivalent to simple exponential smoothing, where the third (MA) parameter determines the smoothing parameter.
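The convex combination in point (a) can be sketched as the standard simple-exponential-smoothing recursion (the series and alpha below are illustrative; under one common sign convention the equivalent ARIMA(0,1,1) has MA coefficient theta = 1 − alpha, per point (d)):

```python
def exponential_smoothing(series, alpha):
    # Simple exponential smoothing: each new level is a convex
    # combination of the current observation and the previous level.
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level  # flat forecast for every future horizon

demand = [100, 120, 110, 130, 125]
print(exponential_smoothing(demand, alpha=0.5))  # 122.5
```

Because the forecast is flat at the final level (or extrapolates a fitted trend, in ESMT), nothing anchors it to the physical capacity of the resource over long horizons, which is the drift problem described in (a).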

When the series is highly irregular, without clear seasonality in the historical period, most customers’ subjective interpretation of a reasonable forecast is one at the level of the mean of the later historical period, with some day-of-week seasonality to provide enough volatility to remove their cognitive dissonance with the forecast.

In forecasting, I have learned there is no silver bullet or easy way out: promotions, pricing, product availability set by revenue managers, competitors, and fiscal planning all cause changes to the forecasts, so the forecast models must be re-estimated, selected, evaluated, and monitored on an ongoing basis. There are millions of these models, which in turn are used to make millions of capacity allocation and pricing decisions, which in turn must also be evaluated and monitored. It is an art to understand which parameters make the forecasts unstable and which improve them, and the science provides a mathematical form to map our investigative art onto.