Energy Stat Arb Part 2

In my previous post, I rushed through a lot of technical details on how I implemented the strategy. For that I apologize! I am here to make up by providing more on how I approached it and hopefully make my analysis more understandable.

In this post, I want to re-visit energy pairs (XLE vs OIL) trading but with the traditional spread construction approach through regression analysis. My data comes from QuantQuote, all adjusted for dividends and splits. To read in the data, I used the following code:

from matplotlib.pylab import *
import pandas as pd
import numpy as np
import datetime as dt
xle = pd.read_csv('/Users/mg326/xle.csv', header=None,parse_dates = [[0,1]])
xle.columns = ['Timestamp', 'open', 'high', 'low', 'close', 'volume', 'Split Factor', 'Earnings', 'Dividends']
xle['Timestamp'] = xle['Timestamp'].apply(lambda x: dt.datetime.strptime(x, '%Y%m%d %H%M'))
xle = xle.set_index('Timestamp')
xle = xle[["open", "high", "low", "close"]]

For minute data, there are approximately 391 rows of data per day. Taking in to account OHLC, there are a total of 391 * 4 = 1564 data observations per day. Heres a image displaying May 9th, 2013:


If you look in to the data, you may see a price for 9:45AM but the next data point comes in at 9:50AM. This means that there was a 5 minute gap where no shares were traded. To fix and align this, the following function will align the two data sets.

def align_data(data_leg1,data_leg2,symbols):
    combined_df = pd.concat([data_leg1,data_leg2],axis=1)
    combined_df = combined_df.fillna(method='pad')
    data_panel = pd.Panel({symbols[0]: combined_df.ix[:,0:4], symbols[1]:combined_df.ix[:,4:9]}) #dict of dataframes

To construct the spread, we will run a rolling regression on the prices to extract the hedge ratio. This is then piped in to the following equation:


Given two series of prices, the following helper function will return a dictionary of the model and the spread. Following is the spread displayed.

def construct_spread(priceY,priceX,ols_lookback):
    data = pd.DataFrame({'x':priceX,'y':priceY})
    model = {}
    model['model_ols'] = pd.ols(y=data.y, x=data.x, window=ols_lookback,intercept=False)
    model['spread'] = data.y - (model['model_ols'].beta*data.x)
    return model


To normalize it, simply subtract a rolling mean and divide that by the rolling standard deviation. Image of normalized spread follows.

zscore = lambda x: (x[-1] - x.mean()) / x.std(ddof=1)
sprd['zs'] = pd.rolling_apply(sprd, zs_window, zscore)  # zscore


Without changing our parameters, a +-2 std will be our trigger point. At this threshold, there is a total of 16 trades. Here is the performance if we took all the trades for the day, frictionless:


Pretty ugly in my opinion but its only a day. Lets display all the daily equity performance distributed for the whole year of 2013.


The flat line initially is for the 60 bar lookback window for each day, unrealistic but it does give a rough picture on the returns. The average final portfolio gain is 0.06 with std of 0.13. The performance is pretty stellar when you look at 2013 as a whole. Comparing this to the other spread construction in my last post, its seems to reduce the variance of returns when incorporating a longer lookback period.

2013Coming up in the next instalment I want to investigate whether incorporating Garch models for volatility forecasting will help improve the performance of spread trading.

Thanks for reading,


Energy Stat Arb

Back to my roots. Haven’t tested outright entry exit trading systems for a while now since the Mechanica and Tblox days but I aim to post more about these in the future.

I’ve been looking and reading about market neutral strategies lately to expand my knowledge. Long only strategies are great but sometimes constant outright directional exposure may leave your portfolio unprotected to the downside when all assets are moving in the same direction. A good reminder would be the May of last year when gold took a nose dive.

Below are some tests I conducted on trading related energy pairs. Note that I haven’t done any elaborate testing for whether the spread is mean reverting,etc. I just went with my instincts. No transaction costs. Spread construction based on stochastic differential, 10 day lookback, +-2/0 std normalized z score entry/exit, and delay 1 bar execution.

Crude Oil and Natural Gas Futures (Daily) (Daily don’t seem to work that well no more):


OIL and UNG ETF (1 Min Bar)


XLE and OIL ETF (1 Min Bar)


Pair trading is the simplest form of statistical arbitrage but what gets interesting is when you start dealing with a basket of assets. For example, XLE tracks both Crude Oil and Natural Gas companies, therefore a potential 3 legged trade would be to trade XLE against both OIL and UNG. Another well-known trade would be to derive value for the SPY against TLT (rates), HYG (corp spreads), and VXX (Vol).

The intuition behind relative value strategies is to derive a fair value of an asset “relative” to another. In basic pair trading, we are using one leg to derive the value of another, or vice versa. Any deviations are considered opportunities for arbitrage. In the case for multi legged portfolio, a set of assets are combined in some way (optimization, factor analysis, PCA) to measure the value. See (Avellaneda) for details.

While the equity lines above look nice, please remember that they don’t account for transaction costs and are modelled purely on adjusted last trade price. A more realistic simulation would be to test the sensitivity of entry and order fills given level 1 bid-ask spreads. For that, a more structured backtesting framework should be employed.

(Special thanks to QF for tremendous insight)

Thanks for reading,


Random Subspace Optimization: Max Sharpe

I was reading David’s post on the idea of Random Subspace Optimization and thought I’d provide some code to contribute to the discussion. I’ve always loved ensemble methods since combining multiple streams of estimates makes more robust estimation outcomes.

In this post, I will show how RSO overlay performs using max sharpe framework. To make things more comparable, I will employ the same assets as David for the backtest. One additional universe I would like to incorporate is the current day S&P 100 (survivorship bias).

Random subspace method is a generalization of the random forest algorithm. Instead of generating random decision trees, the method can employ any desired classifiers. Applied to portfolio management, given N different asset classes and return streams, we will randomly select `k` assets `s` times. Given `s` different random asset combinations, we can perform a user defined sizing algorithm for each of them. The last step is to combined them though averaging to get the final weights. In R, the problem can be easily formulated via `lapply` or `for` loops as the base iterative procedure. For random integers, the function `sample` will be employed. Note my RSO function employs functions inside Systematic Investors Toolbox.

 size.fn =$weight.function)
 if(k > ia$n) stop("K is greater than number of assets.")
 space = seq(1:ia$n)
 index.samples =t(replicate(s,sample(space,size=k)))
 weight.holder = matrix(NA,nrow = s , ncol = ia$n)
 colnames(weight.holder) = ia$symbol.names

 hist = coredata(ia$hist.returns)
 constraints = new.constraints(k, lb = 0, ub = 1)
 constraints = add.constraints(diag(k), type='&;=', b=0, constraints)
 constraints = add.constraints(diag(k), type='<=', b=1, constraints)

 #SUM x.i = 1
 constraints = add.constraints(rep(1, k), 1, type = '=', constraints)

 for(i in 1:s){
 ia.temp = create.historical.ia(hist[,index.samples[i,]],252)
 weight.holder[i,index.samples[i,]] = size.fn(ia.temp,constraints)
 final.weight = colMeans(weight.holder,na.rm=T)


The above function will take in a `ia` object, short for input assumption. It calculates all the necessary statistics for most sizing algorithms. Also, I’ve opted to focus on long only.

The following are the results for 8 asset class. All backtest hereafter will keep `s` equal to 100 while varying `k` from 2 to N-1, where N equals the total number of assets. The base comparison will be that of simple max sharpe and equal weight portfolio.


The following is for 9 sector asset classes.


Last but not least is the performance for current day S&P 100 stocks.


The RSO method seems to improve all the universes that I’ve thrown at it. For a pure stock universe, it is able to reduce volatility by more than 300-700 basis points depending on your selection of k. In a series of tests across different universes, I have found that the biggest improvements from RSO comes from applying it to a universe of instruments that belong to the same asset class. Also, I’ve found that for a highly similar universe (stocks), a lower `k` is better than a higher `k`. One explanation: since the max sharpe portfolio of X identical assets is equal to that of an equal weight portfolio, we can postulate that when the asset universe is highly similar or approaching equivalence, resampling with a lower `k` Y times where Y approaches infinity, we are in a sense approaching the limit of a equally weighted portfolio. This is in line with the idea behind curse of dimensionality: for better estimates,  the data required grows exponentially when the number of assets increase.  In this case, with limited data, a simple equal weight portfolio will do better which conforms to a better performance for lower `k`.

For a well specified universe of assets, RSO with a higher `k` yields better results than lower `k`. This is most likely caused by the fact that simple random sampling of such universe with a small `k` will yield samples that contain highly mis-specified universe. This problem is magnified when the number of diversifying assets like bonds are significantly out-numbered by other assets like equities as the probability of sampling an asset with diversification benefits are far lower than sampling an asset without such benefits. Another word, with a lower `k`, one will most likely end up with a portfolio that contain a lot of risky assets relative to lower risk assets.

Possible future direction would be to figure out some ways of having to specify the `k` and `s` in a RSO. For example, randomly selecting `k` OR selecting a `k` such that it targets a certain risk/return OR maximize an user defined performance metric.

Thanks for reading,


Engineering Risks and Returns

In this post, I want to present a framework for formulating portfolio with targeted risk or return. The basic idea was inspired by controlling risk from a different point of view. The traditional way of controlling for portfolio risk was to apply a given set of weights to historical data to calculate historical risk. If estimated portfolio risk exceeds a threshold, we peel off allocation percentages for each asset. In this framework, I focus on constructing portfolios that target a given risk or return on a efficient risk return frontier.

First lets get some data to so we can visualize traditional portfolio optimization’s risk return characteristics. I will be using a 8 asset ETF universe.

con = gzcon(url('', 'rb'))
tickers = spl('EEM,EFA,GLD,IWM,IYR,QQQ,SPY,TLT')
data <- new.env()
getSymbols(tickers, src = 'yahoo', from = '1980-01-01', env = data, auto.assign = T)
for(i in ls(data)) data[[i]] = adjustOHLC(data[[i]], use.Adjusted=T)
bt.prep(data, align='keep.all', dates='2000:12::')

Here are the return streams we are working with


The optimization algorithms I will employ are the following:

  • Minimum Variance Portfolio
  • Risk Parity Portfolio
  • Equal Risk Contribution Portfolio
  • Maximum Diversification Portfolio
  • Max Sharpe Portfolio

To construct the risk return plane, I will put together the necessary input assumptions (correlation, return, covariance, etc). This can be done with create.historical.ia function in the SIT tool box.

#input Assumptions
prices = data$prices
ret = prices/mlag(prices)-1
ia = create.historical.ia(ret,252)
# 0 <= x.i <= 1
constraints = new.constraints(n, lb = 0, ub = 1)
constraints = add.constraints(diag(n), type='>=', b=0, constraints)
constraints = add.constraints(diag(n), type='<=', b=1, constraints)

# SUM x.i = 1
constraints = add.constraints(rep(1, n), 1, type = '=', constraints)

With the above we can go ahead and input both ‘ia’ and ‘constraints’ in to the above optimization algorithms to get weights. With the weights, we can derive the portfolio risk and portfolio return. These then can be plotted on a risk return plain visually.

# create efficient frontier
ef = portopt(ia, constraints, 50, 'Efficient Frontier')
plot.ef(ia, list(ef),


The risk return plain in the above image shows all the possible space to which a portfolio’s risk and return characteristic can reside. Anything that is beyond to the left side of the frontier do not exist (unless leverage, to which the EF will also shift leftward too). Since I am more of a visual guy, I tend to construct this risk return plain whenever I am working on new allocation algorithms. This allows me to compare with other portfolio the expected risk and return.

As you can see, each portfolio algorithm has their own set of characteristics. Note that these characteristics fluctuate across the frontier were we to frame this rolling through time. A logical extension to these risk return concepts is to construct a portfolio that aims to target ether a given risk or a given return on the frontier. To formulate this problem in SIT for the return component, simply modify the constraints as follows:

constraints = add.constraints(ia$expected.return,type='>=', b=target.return, constraints)

Note that the target.return variable is simply a variable storing the desired target return. After adding the constraint, simply run a minimum variance portfolio and you will get a target return portfolio. On the other hand, targeting risk is a bit more complicated. If you look at the efficient frontier, you will find that for a given level of risk there is two portfolios that line on it.  (The sub-optimal portion of the efficient frontier is hidden). I solved for the weights using a multi optimization framework which employed both linear and quadratic (dual) optimization.


 max.w = max.return.portfolio(ia, constraints)
 min.w = min.var.portfolio(ia, constraints)
 max.r = sum(max.w * ia$expected.return)
 min.r = sum(min.w * ia$expected.return)
 max.risk = portfolio.risk(max.w,ia)
 min.risk = portfolio.risk(min.w,ia)

 # If target risk exists as an efficient portfolio else
 # return weights of 0
 if(target.risk >= min.risk | target.risk <= max.risk){
 out <-optimize(f =target.return.risk.helper,
 interval = c(0,max.r),
 target.risk = target.risk,
 ia = ia,
 constraints = constraints)$minimum


Below is a simple backtest that takes the above assets and optimizes for the target return or target risk component. Each will run with a target of 8%.

Backtest1Now the model itself requires us to specify a return or risk component. What if instead we make that a dynamic component such that we extract ether the risk or return component of a alternative sizing algorithm. Below are the performance of the dynamic risk or return component extracted from naive risk parity.



Not surprisingly, whenever we target risk, the strategy tends to become more risky. This confirms confirms risk based allocations are superior if investors are aiming to achieve low long term volatility.


Thanks for reading,


Some Shiny Stuff

At the beginning of the summer I knew Shiny was going to be an indispensable tool for my work to connect with my readers. For those of you who don’t yet know what Shiny is, it is a web application package developed by RStudio. It integrates flawlessly with R and has been nothing but excitement playing with it.

In terms of difficultly, Shiny differs in its structure. While it may be intimidating initially to R veterans, I urge you to be patient with it. I learned it by examples, and Systematic Investors’ Michael Kapler has got numerous posts on how the basic framework come together (here).

I thought I’d share my own Market Dashboard. To access the dashboard, simply go to the follow link.

The Market Dashboard is divided into 3 Tabs. The first tab is called “Main.” This tab is entirely created for single-asset and cross asset comparisons. The charts in the entire application are interactive. This is useful when you want to rank assets based on the table at the bottom of the page. I created 6 charts (no reason for any one, just came to mind). These include:

  • normalized Equity prices given lookback
  • 12 Month Performance
  • Annualized CAGR
  • Percent Volatility Rank
  • Financial Turbulence Index
  • Efficiency Ratio
  • Table of Statistics (Can be ranked when you click on the title)


On the second tab, I created a broad based Asset Analytics tab. This tab aims to put all asset classes (ETF as proxy) together in coherent fashion for easy digestion. There are three main sections. First section is all asset comparison (a) (this is my attempt to replicate this: here 😉 ), second is individual asset class comparison (b), and lastly is a cluster chart comparison (Varadi & Kapler).





Last tab is called the “Macro Analytics” Tab. Here I aim to bring together US macro fundamentals in to a single page. Fundamentals include:

  • Real GDP
  • Inflation
  • Yield Curve
  • Inflation Expectation
  • Industrial Production

Untitled4 Untitled5 Untitled6

This app is something I just pulled together real quick. There will be design issues but I just wanted to get the idea across: Shiny can be very powerful! Hope you guys will have fun with this and find it useful.

Thanks for reading,


Note: The application is highly un-optimized (slow). It downloads all data over the internet (Yahoo and Quandl). This is entirely for educational purposes. Please do not make financial decisions based on the applications output. I do not guarantee the correctness of the code.

Principal Component Analysis in Portfolio Management

I’ve spend quite some time with PCA for the last couple of months and I thought I would share some of my notes on this subject. I must caution that this is one of the longest post I’ve done. But I find the subject very interesting as it has a lot of potential in the asset allocation process.

PCA stands for principal component analysis and it is a dimensionality reduction procedure to simpify your dataset. According to Wikipedia, “PCA is a mathematical procedure that uses orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.” It is designed in a way such that each successive principal components explains a decreasing amount of variation in your dataset. Below is a screeplot of the percentage variance explained by each factor on a four asset class universe.


PCA is done via an eigen decomposition on a square matrix which in finance is ether a covariance matrix or a correlation matrix of asset returns. Through eigen decomposition, we will get eigenvectors (loadings) and eigenvalues. Each eigenvalue, which is the variance of that factor, is associated with an eigenvector. The following equation relates both the eigenvectors and eigenvalues.


The eigen vector E of a square matrix A equals to an eigenvalue (lambda) multiplied by the eigenvector. While I must confess I was bewildered towards the potential application of such values and vectors at school, it does come to make more sense when you place it in to financial context.

In asset allocation, PCA can be used to decompose a matrix of return into statistical factors. These latent factors usually represent unobservable risk factors that are imbedded inside asset classes. Therefore allocating across these may improve one’s portfolio diversification.

The loadings (eigenvectors) of a PCA decomposition can be treated as principal factor weights. Another words, they represents asset weights towards each principal component portfolio. The total number of principal portfolios equals to the number of principal components. Not surprisingly, the variance of each principal portfolio is its corresponding eigenvalue. Note that the loadings are designed to have values ranging from +1 to -1 meaning short sales are entirely possible.

Since I am not a math major, I don’t really understand any math equations until I actually see it being programmed out. Lets get our hands dirty with some data:

sit = getURLContent('', binary=TRUE, followlocation = TRUE, ssl.verifypeer = FALSE)

con = gzcon(rawConnection(sit, 'rb'))

data <- new.env()

getSymbols(tickers, src = 'yahoo', from = '1980-01-01', env = data, auto.assign = T)
for(i in ls(data)) data[[i]] = adjustOHLC(data[[i]], use.Adjusted=T)

bt.prep(data, align='', dates='1990::2013')

ret<-na.omit(prices/mlag(prices) - 1)

To calculate return we can simply


This can be represented by in R by

p.ret<-(weight) %*% t(ret)

Note that if R is a multi-row matrix return series, you will actually get back a vector of return series with the same number of rows as R. This is simply the portfolio return series, assuming daily rebalancing.

There are like 3 different way of doing a PCA in R. I will show a foundational way and a functionalized way of doing it. Such two-step process will help see what’s going on under the hood.

#PCA Foundational
demean = scale(coredata(ret), center=TRUE, scale=FALSE)
evec<-eigen(cov(demean), symmetric=TRUE)$vector[] #eigen vectors
eval<-eigen(cov(demean), symmetric=TRUE)$values #eigen values
#PCA Functional
evec<-pca$rotation[] #eigen vectors
eval <- pca$sdev^2 #eigen values

The foundational way uses the build in “eigen” function to extract the eigenvectors and eigenvalues. This way requires that you demean (scale function) the data before calculating the covariance matrix. The return values are stored in a R list data structure. On the other hand, the  functional PCA just takes in an return series vector and it does the rest.

After calculating the eigenvector, eigenvalues, and the covariance matrix, the following equation will hold true:


In the above equation, E and lambda, again, are the eigenvector and eigenvalues respectively. Sigma represents the demeaded covariance matrix. In R, this can be computed by:

diag(t(evec) %*% covm %*% evec) #reverse calculate eigenvalues

Now we have all the components to calculate principal portfolios. These return series are simply the latent factors that are embedded inside asset classes. Each asset class are exposed to each factor pretty consistently through time which may help further understand the inherent risk structure. To calculate the principal component portfolios, we will use the following formula:


This equation is very intuitive as just like earlier, the eigenvectors are “weights,” therefore applying it to the returns will yield the N different principal portfolio return streams.  In R:

inv.evec<-solve(evec) #inverse of eigenvector
pc.port<-inv.evec %*% t(ret)


In Meucci’s paper “Managing Diversification”, he showed that with the following formula one can convert a vector of weights from the asset space to the principal component space. More specifically, given a vector of asset weights, one could now show the exposure to each principal component.


Next I would like to show how from the above equation one will be able to calculate the exposure of traditional portfolio optimization weights to each principal component factors. I will be using the same universe of assets and applying Minimum Variance, Risk Parity, Max Diversification, Equal Risk Contribution, Equal Weight, and David Varadi’s et al Minimum Correlation Algorithm (mincorr2).ExposurePCA Table_exposures

As you can see, risk base portfolio optimization in the traditional asset space leads to excessive exposure towards the interest rate factor (bonds). The general equity risk factor (factor 1) accounts for the second largest concentration. Prudent investors looking at this chart will immediately notice the diversification potential if one were to allocate to factors 2 and 3.

In R, the principal component exposure equation can be calculated as follows:

factor.exposure<-inv.evec %*% t(weight)

Through simple algebra, one can convert back and forth between the asset space and the principal space easily. In the following example, I am reconstructing using equal weights.

pc.port<-as.xts(t(pc.port))  #correct dimension of principal portfolio returns
p.ret1<-t(factor.exposure) %*%  t(pc.port)

If you are following along, you will notice that the equity curves derived from the variables “p.ret” and “p.ret1” are identical. This confirms that we have correctly converted back to the asset space.

Equity Curves

Next, to calculate the variance contribution due to the N-the principal component we can simply:

A neat aspect is that from the above equation, we can arrive at the portfolio variance in the asset space with the following equation (sum because PCs are uncorrelated):

The following R code confirms that their variance are the same in the PC space and the asset space.

sum((factor.exposure^2)*eval) #variance concentration from PCA
apply(p.ret,2,var) #portfolio risk from equal weight

With this as proof, one can actually reformulate the minimum variance portfolio from the principal space. We know that portfolio variance can be calculated according to the following formula

Just to repeat, we know that given a set of asset weights we can calculate our exposure to each principal factor using:

pca-exposureIsolating asset weights, substituting and then simplifying:

Since we know that


Then portfolio variance is simply

This is intuitive as since the correlation between principal portfolios are zero, the covariance are all zero too. What is left behind are the variances of the eigen portfolios. Minimizing the above and converting the weights back to asset space should give you equivalent weights from a MVO. While I understand that there is no practicality in the above steps as it adds complexity and computing power, I just wanted to illustrate that these two spaces are tied together in many aspects.

This post is going to get out of hands if I don’t stop. As you can see PCA is a very interesting technique, it allows you to access latent factors and how they interact with the asset space. I hope curious readers can go on exploring more and post below if you find any interesting things you stumble across.

Thanks for reading,