I've been writing rather prolifically this past week. Last week marked the end of my midterms (for now!), the continuation of my job search, and preparation for my final 5 weeks of university (!). I hope my readers are finding my posts interesting and enlightening.

As mentioned in an earlier post on statistical arbitrage, the interesting part comes when we consider multi-leg portfolios. The traditional way to construct a multi-leg portfolio is to employ a multivariate linear regression (factor model). The intuition is that we are estimating a fair value for an asset from various predictors, or independent variables. For example, we know that the S&P 500 is composed of stocks from various sectors. Therefore, an intuitive approach is to derive a fair value for the S&P 500 using the 9 sector SPDRs with the following equation:
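A sketch of what such a fair-value regression might look like, assuming the nine sector SPDR tickers XLY, XLP, XLE, XLF, XLV, XLI, XLB, XLK, and XLU as the predictors $X_{1,t}, \dots, X_{9,t}$ and a fit on prices, as in the code at the end of this post. The symbols $\beta_k$ and $\text{spread}_t$ are my own notation:

```latex
\begin{aligned}
\widehat{\mathrm{SPY}}_t &= \beta_0 + \sum_{k=1}^{9} \beta_k \, X_{k,t}, \\
\text{spread}_t &= \mathrm{SPY}_t - \widehat{\mathrm{SPY}}_t .
\end{aligned}
```

Here $\widehat{\mathrm{SPY}}_t$ plays the role of the fair value, and $\text{spread}_t$ is the part of SPY that the sector ETFs do not explain.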

The residual return that is left over, “alpha”, is considered to be neutral (uncorrelated) to the industry sectors. With this framework, we can make ourselves neutral to essentially any factor we want. For example, we have access to a wide variety of ETFs that mimic the movements of underlying asset classes. If we want to be neutral to interest rates, credit risk, and volatility, we can use the ETFs TLT, HYG, and VXX, respectively. Below is a chart demonstrating this, showing the estimated fair value of SPY relative to the actual ETF:

Below is the spread that can be traded by taking long and short positions in the individual legs:

The concept of being able to control the factors we are exposed to is very appealing, as it allows us to potentially shy away from turbulent events that stem from specific assets. Not only that, these uncorrelated return streams, when combined into a portfolio, allow significant risk reduction. As Dalio said, combining 15 uncorrelated return streams allows us to effectively reduce risk by 80%. (Chart below) Interestingly, from my understanding of what Bridgewater does, I am pretty confident they are employing spread trading too, but from a purely fundamental angle. For example, how does a set of asset classes react to the movements of economic indicators? From there, they construct synthetic spreads to trade off of these relationships.
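The diversification arithmetic behind that claim can be sketched under a simplifying assumption: every stream has the same volatility, the portfolio is equally weighted, and all pairs share one correlation `rho`. The function name `portfolio.vol` and these assumptions are mine, not Dalio's.

```r
# Volatility of an equal-weight portfolio of n streams, each with volatility
# sigma and a common pairwise correlation rho (illustrative assumptions).
portfolio.vol <- function(n, sigma = 1, rho = 0) {
  sigma * sqrt(1 / n + (1 - 1 / n) * rho)
}

n <- c(1, 5, 10, 15, 20)

# Fraction of single-stream risk removed when the streams are uncorrelated.
risk.reduction <- 1 - portfolio.vol(n, rho = 0) / portfolio.vol(1, rho = 0)
print(round(data.frame(n = n, reduction = risk.reduction), 3))
```

With `rho = 0` the portfolio volatility is `sigma / sqrt(n)`, so 15 streams remove roughly 74% of the risk under these assumptions, in the same neighborhood as the quoted figure. Raising `rho` shows how quickly the benefit flattens out once the streams are even moderately correlated.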

Below is the code that generated the data for this post:

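    library(xts)  # for coredata(); data$prices is assumed to be an xts object

    # Rolling multivariate regression of y.symbol on x.symbol
    spread.analysis <- function(data, y.symbol, x.symbol, lookback = 250) {
      y <- data$prices[, y.symbol]
      x <- data$prices[, x.symbol]
      lm.holder <- list()
      fv <- NA * data$prices[, 1]
      colnames(fv) <- 'FairValue'
      for (i in (lookback + 1):nrow(data$prices)) {
        cat(i, '\n')
        # trailing window; note that it includes bar i itself
        hist.y <- as.numeric(y[(i - lookback):i, ])
        hist.x <- coredata(x[(i - lookback):i, ])
        lm.r <- lm(hist.y ~ hist.x)
        lm.holder[[i]] <- lm.r
        # fair value = intercept + sum(beta * current predictor prices)
        fv[i, ] <- coef(lm.r)[1] + sum(coef(lm.r)[-1] * as.numeric(x[i, ]))
      }
      mat <- merge(x, y, fv)
      return(list(mat = mat, fv = fv, reg.model = lm.holder))
    }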
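For readers without the price data, here is a minimal, self-contained sketch of the same rolling-regression idea on plain base R matrices. Everything in it is an assumption for illustration: the simulated random-walk "prices", the column names `A`, `B`, `C`, and the helper `rolling.fair.value`. It shows how the spread falls out of the fair value, and checks that in-sample OLS residuals are uncorrelated with the regressors.

```r
set.seed(42)

# Simulated price paths (purely illustrative, not market data):
# three predictor series as random walks, and a target built from them plus
# a stationary AR(1) noise term, so the spread should mean-revert.
n <- 600
x <- apply(matrix(rnorm(n * 3), n, 3), 2, cumsum) + 100
colnames(x) <- c("A", "B", "C")
noise <- as.numeric(stats::filter(rnorm(n), 0.9, method = "recursive"))
y <- 5 + as.numeric(x %*% c(0.5, 0.3, 0.2)) + noise

# Hypothetical helper: rolling OLS fair value on plain matrices.
rolling.fair.value <- function(y, x, lookback = 250) {
  fv <- rep(NA_real_, length(y))
  for (i in (lookback + 1):length(y)) {
    idx <- (i - lookback):i              # trailing window, includes bar i
    fit <- lm(y[idx] ~ x[idx, , drop = FALSE])
    b <- coef(fit)
    fv[i] <- b[1] + sum(b[-1] * x[i, ])  # intercept + betas * current prices
  }
  fv
}

fv <- rolling.fair.value(y, x, lookback = 250)
spread <- y - fv                         # actual minus fair value

# In-sample check on the last window: OLS residuals are uncorrelated with
# every regressor by construction.
idx <- (n - 250):n
fit <- lm(y[idx] ~ x[idx, ])
print(round(cor(resid(fit), x[idx, ]), 10))
```

The zero correlations hold only inside the fitting window. Out of sample, the spread can and does pick up factor exposure as the betas drift, which is one reason for refitting on a rolling basis.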

Also here are some links I’ve found to be very informative.

- Mean Reversion - Quantivity
- Wonder of Residuals - Quantivity
- Ed Thorp Article Series On Stat Arb (Sick stuff here!!!)
- Market Neutrality (Quantum Financier)
- High Frequency & Dynamic Pairs Trading (Energy Stocks) - Quantivity Paper Feed
- Gekko Quant's 3-Part Series on Stat Arb (Part I, II, III)
- Treasury Term Structure Arbitrage - Equametrics

The paper on high frequency statistical arbitrage is a particularly relevant one, as it relates to my previous blog posts on energy-related pairs trading. Essentially, the author constructs a meta-algorithm for ranking pairs to trade. This meta-algorithm is composed of a correlation coefficient, the minimum squared distance between normalized price series, and a cointegration test value. I have neither intraday equities data (the paper used 15-minute bars) nor the infrastructure to test it, but the idea resonates with me from my research on top-N momentum systems. There are a lot of ways to improve it.
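A rough sketch of how such a ranking could be put together in base R. This is not the paper's algorithm: the function names `pair.metrics` and `rank.pairs`, the simulated prices, the unaugmented Dickey-Fuller statistic used as a cointegration score, and the choice to combine the three criteria by average rank are all assumptions of mine.

```r
# Score one pair of price series on three illustrative criteria.
pair.metrics <- function(a, b) {
  # 1) correlation of simple returns
  ret.cor <- cor(diff(a) / head(a, -1), diff(b) / head(b, -1))
  # 2) sum of squared distances between prices normalized to start at 1
  ssd <- sum((a / a[1] - b / b[1])^2)
  # 3) Engle-Granger style score: Dickey-Fuller t-statistic on the residual
  #    of a ~ b (no lag augmentation, no critical values; a rough score only)
  e <- resid(lm(a ~ b))
  de <- diff(e)
  e.lag <- head(e, -1)
  df.t <- summary(lm(de ~ 0 + e.lag))$coefficients[1, "t value"]
  c(cor = ret.cor, ssd = ssd, df.t = df.t)
}

# Rank all pairs; combining by average rank is an assumption of this sketch.
rank.pairs <- function(prices) {
  pairs <- t(combn(colnames(prices), 2))
  m <- t(apply(pairs, 1, function(p) pair.metrics(prices[, p[1]], prices[, p[2]])))
  # high correlation, small distance, very negative DF statistic rank best
  score <- (rank(-m[, "cor"]) + rank(m[, "ssd"]) + rank(m[, "df.t"])) / 3
  out <- data.frame(leg1 = pairs[, 1], leg2 = pairs[, 2], m, score = score)
  out[order(out$score), ]
}

# Toy data: X1 and X2 share a common random walk, X3 is independent.
set.seed(1)
common <- cumsum(rnorm(500))
prices <- cbind(X1 = 100 + common + rnorm(500, sd = 0.5),
                X2 = 100 + common + rnorm(500, sd = 0.5),
                X3 = 100 + cumsum(rnorm(500)))
ranked <- rank.pairs(prices)
print(ranked)
```

In the toy data, the pair that shares a common trend should come out on top on all three criteria. On real data the criteria will disagree, and how they are weighted against each other is exactly where there is room to experiment.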

I tried to replicate the results of the paper on high frequency statistical arbitrage, and it looks like the author has a look-ahead bias. I think he first uses both in-sample and out-of-sample data to normalize the spreads, and only after this normalization does he trade out-of-sample. Using these look-ahead means and standard deviations, one can get the equity curves shown in the paper; when one instead uses means and standard deviations from the in-sample period only, the equity curve slopes downward…
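The normalization issue described here can be illustrated with a toy example. The spread below is simulated with a deliberate level shift between the two halves, and the variable names are made up; it says nothing about the paper's actual data, only about how the two ways of computing a z-score differ.

```r
set.seed(7)

# Toy spread with a level shift halfway through (illustrative data only).
spread <- c(rnorm(250, mean = 0), rnorm(250, mean = 2))
in.sample <- 1:250
out.sample <- 251:500

# Look-ahead version: mean and sd computed over the whole sample,
# including the period that is later "traded".
z.lookahead <- (spread - mean(spread)) / sd(spread)

# Version without look-ahead: parameters estimated on the in-sample window only.
mu <- mean(spread[in.sample])
sigma <- sd(spread[in.sample])
z.honest <- (spread - mu) / sigma

# Compare how the out-of-sample period looks under each normalization.
print(c(lookahead = mean(z.lookahead[out.sample]),
        honest    = mean(z.honest[out.sample])))
```

With full-sample statistics, the out-of-sample z-scores sit much closer to zero because the mean has already absorbed the shift. A mean-reversion rule then appears to work on a spread that, judged from in-sample information alone, has simply moved away and stayed there.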