Energy Stat Arb Part 2

In my previous post, I rushed through a lot of the technical details of how I implemented the strategy. For that I apologize! I am here to make up for it by explaining more about how I approached it, and hopefully making my analysis easier to follow.

In this post, I want to revisit energy pairs trading (XLE vs OIL), this time building the spread the traditional way, through regression analysis. My data comes from QuantQuote and is adjusted for dividends and splits. To read in the data, I used the following code:


from matplotlib.pylab import *
import pandas as pd
import numpy as np
import datetime as dt

# The file has no header row; columns 0 and 1 (date, time) are combined into one column
xle = pd.read_csv('/Users/mg326/xle.csv', header=None, parse_dates=[[0, 1]])
xle.columns = ['Timestamp', 'open', 'high', 'low', 'close', 'volume', 'Split Factor', 'Earnings', 'Dividends']
# Parse the combined "YYYYMMDD HHMM" string into a datetime
xle['Timestamp'] = xle['Timestamp'].apply(lambda x: dt.datetime.strptime(x, '%Y%m%d %H%M'))
xle = xle.set_index('Timestamp')
# Keep only the OHLC columns
xle = xle[["open", "high", "low", "close"]]
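
For readers on a recent pandas, here is a minimal sketch of the same loading step. It assumes the layout implied by the column names above (date, time, then OHLC, volume, split factor, earnings, dividends); the function name `load_minute_bars`, the `RAW_COLUMNS` list and the two sample rows are made up for illustration and are not real quotes.

```python
import io
import pandas as pd

# Assumed raw layout: date and time in the first two columns, no header row.
RAW_COLUMNS = ["date", "time", "open", "high", "low", "close",
               "volume", "Split Factor", "Earnings", "Dividends"]

def load_minute_bars(source):
    """Read a headerless minute-bar CSV and return OHLC indexed by timestamp."""
    df = pd.read_csv(source, header=None, names=RAW_COLUMNS,
                     dtype={"date": str, "time": str})
    # Pad times such as "930" to "0930" so every row matches one format string.
    stamp = df["date"] + " " + df["time"].str.zfill(4)
    df.index = pd.to_datetime(stamp, format="%Y%m%d %H%M")
    df.index.name = "Timestamp"
    return df[["open", "high", "low", "close"]]

# Two made-up rows in the assumed layout, used only to exercise the function.
sample = io.StringIO(
    "20130509,930,79.50,79.55,79.45,79.52,1000,1,0,0\n"
    "20130509,931,79.52,79.60,79.50,79.58,800,1,0,0\n"
)
bars = load_minute_bars(sample)
print(bars)
```

In practice you would pass a file path instead of the `StringIO` object.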

For minute data, there are approximately 391 rows of data per day. Taking into account OHLC, that is a total of 391 * 4 = 1564 data observations per day. Here is an image of the data for May 9th, 2013:

[Figure: minute data for May 9th, 2013]
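
The 391 rows per day can be sanity-checked with a few lines. This sketch assumes a regular session with one bar per minute from 9:30 to 16:00 inclusive; the variable names are just for illustration.

```python
import pandas as pd

# One regular session at 1-minute resolution, both endpoints included (assumption).
session = pd.date_range("2013-05-09 09:30", "2013-05-09 16:00", freq="min")

bars_per_day = len(session)               # one row per minute
observations_per_day = bars_per_day * 4   # open, high, low, close per row

print(bars_per_day, observations_per_day)
```

Days with gaps in trading will have fewer rows than this, which is why the count is only approximate.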

If you look into the data, you may see a price for 9:45AM but the next data point comes in at 9:50AM. This means there was a 5 minute gap in which no shares were traded. To fix this, the following function combines the two data sets on a common set of timestamps and forward-fills the gaps with the last known price.

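def align_data(data_leg1,data_leg2,symbols):
    # Join both legs side by side on the union of their timestamps
    combined_df = pd.concat([data_leg1,data_leg2],axis=1)
    # Forward-fill bars where one leg did not trade
    combined_df = combined_df.fillna(method='pad')
    # First four columns are leg 1's OHLC, the remaining columns are leg 2's
    data_panel = pd.Panel({symbols[0]: combined_df.ix[:,0:4], symbols[1]:combined_df.ix[:,4:9]}) #dict of dataframes
    return data_panel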
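
`pd.Panel` and `.ix` are no longer available in current pandas releases, so here is a rough sketch of the same alignment idea for newer versions. It is not the code I ran for this post: the name `align_data_modern` is hypothetical, it returns one DataFrame with a two-level column index instead of a Panel, and the toy prices are invented.

```python
import pandas as pd

def align_data_modern(data_leg1, data_leg2, symbols):
    """Outer-join two OHLC frames on their timestamps and forward-fill gaps."""
    # keys= labels each leg's columns with its symbol (two-level columns).
    combined = pd.concat([data_leg1, data_leg2], axis=1, keys=symbols)
    # Sort by time first so the forward fill always carries the last known bar.
    return combined.sort_index().ffill()

# Toy data: leg 1 has no bar at 09:47, leg 2 does.
cols = ["open", "high", "low", "close"]
leg1 = pd.DataFrame([[10.0, 10.2, 9.9, 10.1], [10.1, 10.4, 10.0, 10.3]],
                    index=pd.to_datetime(["2013-05-09 09:45", "2013-05-09 09:50"]),
                    columns=cols)
leg2 = pd.DataFrame([[20.0, 20.1, 19.9, 20.0], [20.0, 20.3, 20.0, 20.2],
                     [20.2, 20.5, 20.1, 20.4]],
                    index=pd.to_datetime(["2013-05-09 09:45", "2013-05-09 09:47",
                                          "2013-05-09 09:50"]),
                    columns=cols)

aligned = align_data_modern(leg1, leg2, ["XLE", "OIL"])
print(aligned["XLE"])  # the 09:47 row repeats the 09:45 bar
```

Selecting `aligned["XLE"]` or `aligned["OIL"]` gives back an ordinary OHLC DataFrame for that leg.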

To construct the spread, we will run a rolling regression on the prices to extract the hedge ratio. This is then plugged into the following equation:

$$\text{spread}_t = y_t - \beta_t \, x_t$$

Given two series of prices, the following helper function returns a dictionary containing the model and the spread. The spread is plotted after the code.

def construct_spread(priceY,priceX,ols_lookback):
    data = pd.DataFrame({'x':priceX,'y':priceY})
    model = {}
    # Rolling OLS of y on x over ols_lookback bars, with no intercept
    model['model_ols'] = pd.ols(y=data.y, x=data.x, window=ols_lookback,intercept=False)
    # Spread = y minus the rolling hedge ratio times x
    model['spread'] = data.y - (model['model_ols'].beta*data.x)
    return model
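
`pd.ols` is also gone from current pandas releases. With a single regressor and no intercept, the OLS slope over a window is just sum(x*y) / sum(x*x), so the rolling hedge ratio can be sketched with plain rolling sums. This is an illustration under that assumption, not the exact code behind the charts here; `rolling_beta_no_intercept` and `construct_spread_modern` are hypothetical names and the prices are toy values.

```python
import numpy as np
import pandas as pd

def rolling_beta_no_intercept(price_y, price_x, lookback):
    """Rolling no-intercept OLS slope of y on x: sum(x*y) / sum(x*x)."""
    xy = (price_x * price_y).rolling(lookback).sum()
    xx = (price_x ** 2).rolling(lookback).sum()
    return xy / xx

def construct_spread_modern(price_y, price_x, lookback):
    """Return the rolling hedge ratio and the spread y - beta * x."""
    beta = rolling_beta_no_intercept(price_y, price_x, lookback)
    return {"beta": beta, "spread": price_y - beta * price_x}

# Toy check: if y is exactly 2 * x, beta should be 2 and the spread 0.
x = pd.Series(np.arange(1.0, 11.0))
y = 2.0 * x
out = construct_spread_modern(y, x, lookback=5)
print(out["beta"].tail())
print(out["spread"].tail())
```

The first `lookback - 1` values are NaN because the window is not yet full.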

[Figure: the XLE vs OIL spread]

To normalize it, subtract the rolling mean and divide the result by the rolling standard deviation. An image of the normalized spread follows the code.

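# z-score of the latest value in a window: (last - window mean) / window sample std
zscore = lambda x: (x[-1] - x.mean()) / x.std(ddof=1)
# sprd is the spread and zs_window the rolling lookback in bars
sprd['zs'] = pd.rolling_apply(sprd, zs_window, zscore)  # zscore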
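
In newer pandas, `pd.rolling_apply` has been replaced by the `.rolling()` method. A small sketch of the same normalization, with a hypothetical helper name `rolling_zscore` and made-up numbers, could look like this:

```python
import numpy as np
import pandas as pd

def rolling_zscore(series, window):
    """(latest value - rolling mean) / rolling sample std over `window` bars."""
    mean = series.rolling(window).mean()
    std = series.rolling(window).std(ddof=1)
    return (series - mean) / std

# Toy series, not market data.
s = pd.Series([1.0, 2.0, 3.0, 4.0, 7.0])
zs = rolling_zscore(s, window=3)

# Same calculation written like the lambda above, for comparison.
zs_apply = s.rolling(3).apply(
    lambda x: (x[-1] - x.mean()) / x.std(ddof=1), raw=True)

print(zs)
```

The first `window - 1` values come out as NaN, since there are not enough observations yet to fill a window.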

[Figure: normalized spread]

Without changing our parameters, ±2 standard deviations will be our trigger point. At this threshold, there are 16 trades in total. Here is the performance if we took all the trades for the day, frictionless:

[Figure: frictionless performance of all trades for the day]
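
To make the ±2 trigger concrete, here is a rough sketch of turning a z-score series into positions and counting entries. Only the ±2 entry threshold comes from this post. The exit rule (close when the z-score crosses back through zero), the function name `threshold_positions` and the toy z-scores are all assumptions for illustration.

```python
import numpy as np
import pandas as pd

def threshold_positions(zs, entry=2.0, exit_level=0.0):
    """+1 = long the spread, -1 = short the spread, 0 = flat."""
    position = 0
    out = []
    for z in zs:
        if not np.isnan(z):
            if position == 0:
                if z >= entry:
                    position = -1      # spread rich: short it
                elif z <= -entry:
                    position = 1       # spread cheap: long it
            elif position == -1 and z <= exit_level:
                position = 0           # assumed exit: back through zero
            elif position == 1 and z >= -exit_level:
                position = 0
        out.append(position)
    return pd.Series(out, index=zs.index)

# Toy z-scores, not market data.
zs = pd.Series([np.nan, 0.5, 2.1, 1.0, -0.1, -2.3, -1.0, 0.2])
positions = threshold_positions(zs)

# An entry is any bar where we go from flat to holding a position.
entries = ((positions != 0) & (positions.shift(1, fill_value=0) == 0)).sum()
print(positions.tolist(), entries)
```

With a different exit rule the trade count will change, so treat the numbers this produces as illustrative only.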

Pretty ugly in my opinion, but it's only one day. Let's display the daily equity performance for the whole year of 2013.

[Figure: daily equity curves for 2013]

The initial flat line is the 60 bar lookback window at the start of each day. That is unrealistic, but it does give a rough picture of the returns. The average final portfolio gain is 0.06 with a std of 0.13. The performance is pretty stellar when you look at 2013 as a whole. Compared to the spread construction in my last post, incorporating a longer lookback period seems to reduce the variance of returns.

[Figure: 2013]

Coming up in the next instalment, I want to investigate whether incorporating GARCH models for volatility forecasting will help improve the performance of spread trading.

Thanks for reading,

Mike

3 comments

  1. Thanks for sharing your thoughts!
    It looks like the strategy can’t even withstand a 1 cent round trip commission per contract: the upward sloping equity curve rapidly turns into a downward one…
    Do you have any ideas on how to deal with that?

  2. Thank you for your post!
    Can you share the trading strategy code? If you can’t, I am wondering how you close the position, since the hedge ratio (beta) is time-varying. The beta when you open the position is totally different from the beta when you want to close it.
    And how did you calculate the zscore in the first rolling window?

  3. Be careful with the QuantQuote adjusted data. Their back adjuster is unreliable and known to have problems (missing dividends is common). If you are not careful, the price history will drift over time. Compare with CSI for an example. Or use raw, un-adjusted price data.
