Research | Optimal Statistics

High Frequency data

High-frequency intraday data is becoming more accessible in the financial markets and contains valuable information, particularly for modeling and forecasting return variance and correlation. This is the current focus of research at Optimal Statistics.

It has been shown that, under some assumptions, daily variance calculated from intraday observations (realized variance) is an unbiased ex post estimator of actual variance and is asymptotically free of measurement error, so it can be treated as if it were actually observed. The assumptions underpinning this statement include a stable instantaneous mean and zero correlation between returns. In real markets, the higher the sampling frequency, the greater the problem with market microstructure noise, especially bid-ask bounce, which produces negative correlation between successive returns. Microstructure noise can seriously bias variance estimation and negate the obvious benefit of having high-frequency data. Many researchers have skirted this problem by using only lower-frequency data (e.g. 10 to 30 minute intervals), but Optimal Statistics has developed methods to model the microstructure noise in a robust way and preserve the information in the higher-frequency data. Registered users will be notified when the upgrade is available for download.
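The effect of bid-ask bounce on variance computed from intraday observations can be illustrated with a small simulation. The sketch below is not the Optimal Statistics method. It assumes a Gaussian random walk for the efficient log price, a fixed half spread, 390 one-minute observations per day and a daily variance of 0.0001, all of which are hypothetical choices made for illustration.

```python
import math
import random


def realized_variance(prices, step=1):
    """Sum of squared log returns, sampling every `step`-th price."""
    sampled = prices[::step]
    return sum(math.log(b / a) ** 2 for a, b in zip(sampled, sampled[1:]))


def simulate_day(n_obs, daily_var, half_spread, rng):
    """Return (efficient, observed) intraday price paths for one day.

    Assumption: the efficient log price is a Gaussian random walk and each
    observed price sits half a spread above or below it (bid-ask bounce).
    """
    sigma = math.sqrt(daily_var / n_obs)
    log_p = math.log(100.0)
    efficient, observed = [], []
    for _ in range(n_obs + 1):
        bounce = rng.choice((-1.0, 1.0)) * half_spread
        efficient.append(math.exp(log_p))
        observed.append(math.exp(log_p + bounce))
        log_p += rng.gauss(0.0, sigma)
    return efficient, observed


def average_rv(n_days=200, n_obs=390, daily_var=1e-4, half_spread=5e-4, seed=0):
    """Average realized variance over simulated days for three sampling choices."""
    rng = random.Random(seed)
    totals = {"efficient_1min": 0.0, "observed_1min": 0.0, "observed_10min": 0.0}
    for _ in range(n_days):
        efficient, observed = simulate_day(n_obs, daily_var, half_spread, rng)
        totals["efficient_1min"] += realized_variance(efficient)
        totals["observed_1min"] += realized_variance(observed)
        totals["observed_10min"] += realized_variance(observed, step=10)
    return {name: total / n_days for name, total in totals.items()}


if __name__ == "__main__":
    for name, value in average_rv().items():
        print(f"{name:>15}: {value:.6f}")
```

Under these assumptions the one-minute estimate on observed prices is inflated well above the true daily variance, while sampling every tenth observation reduces the bias at the cost of discarding most of the data. That is the trade-off the paragraph above describes.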

Most variance models currently in use, including the proprietary NL-SWARCH model used in Optimal Investor™, use a derivative of the commonly available open, high, low and close prices as a proxy for the actual variance. The popular ARCH class of models typically uses only the close-to-close returns. These models can be insensitive because changes in the variance are expressed in the close-to-close return only in a random or near-random way.

•ARCH(1) model: Xt = (A0 + A1 * X(t-1)^2)^0.5 * Zt

•Xt = ln(Pt / P(t-1)) (the natural log return series)

•A0 and A1 are both positive parameters (variance cannot be negative) whose values are chosen to maximize the fit to the data. The ARCH(1) example uses only one lag, but one can use as many lags as necessary to fit the data.

•Zt is typically a sequence of independent normally distributed random variables with mean zero and variance one.
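The ARCH(1) definition above can be sketched in code as a simulation followed by a fit. The example assumes a Gaussian likelihood and uses a crude grid search in place of a proper optimizer. The function names, the grid resolution and the "true" parameter values A0 = 0.0001 and A1 = 0.4 are hypothetical.

```python
import math
import random


def simulate_arch1(a0, a1, n, rng):
    """Simulate Xt = sqrt(a0 + a1 * X(t-1)^2) * Zt with Zt ~ N(0, 1)."""
    x_prev = 0.0
    series = []
    for _ in range(n):
        x_prev = math.sqrt(a0 + a1 * x_prev ** 2) * rng.gauss(0.0, 1.0)
        series.append(x_prev)
    return series


def neg_log_likelihood(a0, a1, series):
    """Gaussian negative log-likelihood, conditioning on the first observation."""
    total = 0.0
    for prev, cur in zip(series, series[1:]):
        var = a0 + a1 * prev ** 2
        total += 0.5 * (math.log(2.0 * math.pi * var) + cur ** 2 / var)
    return total


def fit_arch1_grid(series, steps=20):
    """Pick the (a0, a1) grid point with the smallest negative log-likelihood.

    Assumption: a0 lies in (0, sample variance] and a1 lies in (0, 1).
    """
    sample_var = sum(x * x for x in series) / len(series)
    best = None
    for i in range(1, steps):
        a1 = i / steps
        for j in range(1, steps + 1):
            a0 = sample_var * j / steps
            nll = neg_log_likelihood(a0, a1, series)
            if best is None or nll < best[0]:
                best = (nll, a0, a1)
    return best[1], best[2]


if __name__ == "__main__":
    returns = simulate_arch1(1e-4, 0.4, 2000, random.Random(0))
    a0_hat, a1_hat = fit_arch1_grid(returns)
    print(f"A0 estimate: {a0_hat:.6f}, A1 estimate: {a1_hat:.2f}")
```

The grid keeps both parameters positive, matching the constraint that variance cannot be negative. In practice a numerical optimizer would replace the grid search.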

The common feature of variance models is the idea that today’s variance has an influence on tomorrow’s variance and so on. The autocorrelation function (ACF) in fact usually displays significant correlation across very long time periods, which has led many researchers to propose long memory models for variance.

Long memory in a time series is best understood by considering that the summed correlations across time approach infinity for a long memory series and a finite limit for a short memory series. Analysis of the ACF can be misleading: there is evidence that changes in the unconditional variance of a series, a form of non-stationarity, can produce the slow power law decay in the ACF that is typically attributed to long memory. The unconditional variance is the part of the variance that does not depend on its previous values and is often assumed to be constant. Optimal Statistics has done extensive research on the analysis and identification of regime changes and breaks in the variance series and has developed a proprietary filter already utilised in Optimal Investor™.
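A short simulation can show how a change in the unconditional variance alone leaves the sample ACF of absolute returns elevated at long lags, even when the returns are independent. The sketch assumes a single known break halfway through the sample, and the rescaling step is a hypothetical stand-in, not the proprietary filter. With only one break the ACF stays roughly flat and does not trace an exact power law.

```python
import random


def sample_acf(series, max_lag):
    """Sample autocorrelations for lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + lag] for t in range(n - lag)) / denom
        for lag in range(1, max_lag + 1)
    ]


def abs_returns(sigmas, rng):
    """Absolute values of independent Gaussian returns, one sigma per day."""
    return [abs(rng.gauss(0.0, s)) for s in sigmas]


def rescale_by_regime(series, break_index):
    """Hypothetical filter: divide each regime by its own mean absolute return."""
    out = []
    for part in (series[:break_index], series[break_index:]):
        level = sum(part) / len(part)
        out.extend(x / level for x in part)
    return out


def run_demo(n=4000, max_lag=100, seed=1):
    """ACFs for constant variance, a variance break, and the rescaled break series."""
    rng = random.Random(seed)
    half = n // 2
    constant = abs_returns([1.0] * n, rng)
    broken = abs_returns([1.0] * half + [2.0] * (n - half), rng)
    filtered = rescale_by_regime(broken, half)
    return {
        "constant": sample_acf(constant, max_lag),
        "break": sample_acf(broken, max_lag),
        "filtered": sample_acf(filtered, max_lag),
    }


if __name__ == "__main__":
    for name, acf in run_demo().items():
        print(f"{name:>8}: lag 1 = {acf[0]:+.3f}, lag 100 = {acf[99]:+.3f}")
```

The break series shows clearly positive correlation at lag 100 although no day depends on any other. Once each regime is rescaled to a common level the correlation disappears.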

The following 250 day autocorrelation functions (ACF) are based on almost 6000 absolute daily changes in a typical share price index. The red horizontal line is the 95% confidence level based on the assumption of Gaussian white noise. The first two are on unfiltered data, and the power law is very clearly a superior fit, suggesting a long memory effect. The next two are after the data has been filtered for changes in the unconditional variance. It is now clear (statistically as well as visually) that the exponential fit is superior and the series can be modeled accordingly. Correlations now converge to zero within approximately 50 days, which seems a more reasonable time period. The proprietary NL-SWARCH model used in Optimal Investor™ is parameterized on filtered data.
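One simple way to compare a power law fit with an exponential fit is to regress the log of the ACF on log lag and on lag respectively, then compare the squared errors. The sketch below does this on synthetic ACF values. The fitting method, the function names and the synthetic inputs are assumptions for illustration, since the page does not state how its fits were produced. The band function uses the usual 1.96 divided by the square root of the sample size approximation for Gaussian white noise.

```python
import math


def white_noise_band(n_obs, z=1.96):
    """Approximate 95% band for sample autocorrelations of Gaussian white noise."""
    return z / math.sqrt(n_obs)


def linear_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


def compare_decay(acf):
    """Fit power law and exponential decay to the positive ACF values in log space.

    acf[0] is the lag-1 autocorrelation. Non-positive values are skipped
    because their logarithm is undefined.
    """
    pairs = [(lag, rho) for lag, rho in enumerate(acf, start=1) if rho > 0]
    lags = [float(lag) for lag, _ in pairs]
    log_rho = [math.log(rho) for _, rho in pairs]
    p_slope, _, p_sse = linear_fit([math.log(k) for k in lags], log_rho)
    e_slope, _, e_sse = linear_fit(lags, log_rho)
    return {
        "power_sse": p_sse,
        "power_exponent": -p_slope,
        "exp_sse": e_sse,
        "exp_rate": -e_slope,
    }


if __name__ == "__main__":
    # Synthetic 250-lag ACFs (hypothetical values, not the page's data).
    slow = [0.3 * k ** -0.4 for k in range(1, 251)]
    fast = [0.3 * math.exp(-k / 15.0) for k in range(1, 251)]
    print("band for 6000 observations:", round(white_noise_band(6000), 4))
    print("slow decay:", compare_decay(slow))
    print("fast decay:", compare_decay(fast))
```

A fitted power law exponent of 1 or less implies that the summed correlations diverge, which is the long memory case. Any exponential rate gives a finite sum.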

The leptokurtic (fat-tailed) nature of the return distribution is well known and is principally the result of changes in the variance. An important and intuitively satisfying test of any variance model is the degree to which it normalizes the return distribution. The following frequency histograms demonstrate one of the most satisfying aspects of using realized variance.

• The first is a frequency distribution of a typical daily share price return history minus the normal distribution inferred from the sample mean and standard deviation. The high peak and fat tails are typical of most return series. The Kolmogorov-Smirnov test confirms that there is practically zero chance that the distribution is normal.

•This is the same share price return series but this time standardized using a standard ARCH model. Progress has been made towards normality but the distribution is still not normal.

•The series is now standardized using a variance series derived from high-frequency intraday data. It is clear that this series is now normal and does in fact pass the standard statistical tests. It should, however, be remembered that the variance series is constructed from data contained within the daily series.
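The normalizing effect of realized variance described above can be reproduced in a simulation. This sketch assumes that each day's variance is drawn from a lognormal distribution and that intraday returns are independent Gaussians, with 78 observations per day. It uses sample kurtosis as a simpler stand-in for the Kolmogorov-Smirnov test. All names and parameter values are hypothetical.

```python
import math
import random


def kurtosis(values):
    """Sample kurtosis (equals 3 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2


def simulate_days(n_days=2000, n_intraday=78, log_var_sd=1.0, seed=2):
    """Return (daily_returns, realized_variances).

    Assumption: each day's variance is lognormal around 1e-4 and the intraday
    returns are independent Gaussians sharing that day's variance.
    """
    rng = random.Random(seed)
    daily, realized = [], []
    for _ in range(n_days):
        day_var = 1e-4 * math.exp(rng.gauss(0.0, log_var_sd))
        sigma = math.sqrt(day_var / n_intraday)
        intraday = [rng.gauss(0.0, sigma) for _ in range(n_intraday)]
        daily.append(sum(intraday))
        realized.append(sum(r * r for r in intraday))
    return daily, realized


def standardize(daily, realized):
    """Divide each daily return by that day's realized volatility."""
    return [r / math.sqrt(rv) for r, rv in zip(daily, realized)]


if __name__ == "__main__":
    daily, realized = simulate_days()
    print("raw kurtosis:         ", round(kurtosis(daily), 2))
    print("standardized kurtosis:", round(kurtosis(standardize(daily, realized)), 2))
```

In this setup the raw daily returns are strongly fat-tailed, while the standardized returns have kurtosis close to 3. They tend to sit slightly below 3 because the realized variance is built from the same intraday returns that make up the daily return, which is the caveat noted in the last bullet.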