Queue Position Simulation

First off, Happy Thanksgiving! If time permits in the coming months I’d like to explore more on how I look at High Frequency (HF) data. Hopefully along the way I can spark some new discussion and improve on my thought process.

HFT strategy “simulation” is no easy task. I call it a simulation because it is purely an approximation of how a strategy would have performed given a set of execution assumptions the researcher made beforehand. Should the assumptions change, the results would also change (significantly).

In my line of work, the edge we are seeking is generally less than a tick (futures). To make this even worthwhile, costs must be low AND we need to trade a lot. This may sound foreign to most of my readers, as their time frames are generally much longer (days, weeks, even months). But at the end of the day, how much money we make is a simple function of our alpha * number of times we trade.

In HFT, execution is king. You can be right about where the market moves on the next tick, but if you can’t get a fill, you are not making any money. Therefore it is paramount that when we conduct HF simulations, we make accurate execution assumptions.

Queue position is worth a lot. Being first in line and getting a fill is like owning a call option in my world (where the premium is exchange fees per contract). The worst that can happen is you scratch, assuming you are not the slowest one and there are people behind you. The image below is an analysis of the expected edge you’d get N events out (x-axis), assuming you are in various spots within the FIFO queue (QP_0 = first in line, QP_0.1 = 10th in line if there was 100 qty). As you can see, the further back in line you are, the more you are going to be exposed to toxic flow, a fancy word for informed traders.

How does one take this into account when simulating a strategy? When you place a limit order on the bid, how do you know when you will be filled? This depends on two factors: your place in line and trade flow. As time progresses there will be people who add orders to the FIFO queue, people who cancel orders and people who take liquidity (trade). These actions need to be tracked tick by tick (or packet by packet) during a simulation.

While most people assume tick data is the finest-grained dataset one can have for such simulations, there actually exists packet data. Tick data simply gives you an aggregated snapshot of what an orderbook looks like: best bid, best offer, bid qty, ask qty (this is known as Market by Price). Packet data, on the other hand, contains all the actions taken by all the market participants, including trade matches and order submissions. This feed is also known as Market by Order, and it is up to the market participant to build and maintain their own orderbook. Using packet data for simulation would be optimal, as you will know exactly where you are in line.

When you only have tick data, the only way to conduct this type of simulation is to make assumptions. Here is a simple example. When you place a limit buy on the bid you are going to be last in line. You keep track of two variables, qty_in_front and qty_behind; at the moment you join, qty_in_front is the displayed bid quantity and qty_behind is zero. Additions are straightforward: just add them to qty_behind. Cancels are a little trickier because you don’t know whether they are coming from people in front of you or people behind. A workaround is to have something I call a reduce ratio. It can take a value between 0 and 1 and it controls the percentage of cancelled quantity that is assumed to sit in front of you. For example, in ES simulations, I would set this to around 0.1, i.e. when there is a total of 100 qty cancelled, I’d assume 10 happens in front of me and 90 happens behind me. There are edge cases but I’ll leave the reader to figure them out. This is just one way, not the only way, of going about simulating a FIFO queue. More complicated ways include dynamically adjusting the reduce ratio as you approach the front of the queue.
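The bookkeeping described above can be sketched in a few lines of R. This is only an illustration under stated assumptions: the function name simulate.queue, the pre-classified event table (add / cancel / trade at our price level) and the way the cancel edge case is capped are all hypothetical choices, not part of any real feed handler.

```r
# Hypothetical FIFO queue tracker for one resting limit order.
# reduce.ratio = assumed fraction of cancelled quantity that sat in front of us.
simulate.queue <- function(events, initial.qty, order.qty = 1, reduce.ratio = 0.1) {
  qty.in.front <- initial.qty   # we join at the back of the displayed queue
  qty.behind   <- 0
  filled       <- 0

  for (i in seq_len(nrow(events))) {
    type <- events$type[i]
    qty  <- events$qty[i]

    if (type == "add") {
      # New orders always queue up behind us
      qty.behind <- qty.behind + qty
    } else if (type == "cancel") {
      # Split the cancel using the reduce ratio, capping each side
      # at the quantity that is actually there
      from.front <- min(qty.in.front, round(qty * reduce.ratio))
      from.back  <- min(qty.behind, qty - from.front)
      # Whatever the back could not absorb is assumed to come from the front
      from.front <- min(qty.in.front, from.front + (qty - from.front - from.back))
      qty.in.front <- qty.in.front - from.front
      qty.behind   <- qty.behind - from.back
    } else if (type == "trade") {
      # Trades consume the queue from the front; any overflow fills our order
      eaten        <- min(qty.in.front, qty)
      qty.in.front <- qty.in.front - eaten
      filled       <- filled + min(order.qty - filled, qty - eaten)
    }
    if (filled >= order.qty) break
  }
  list(qty.in.front = qty.in.front, qty.behind = qty.behind, filled = filled)
}

# Toy event stream at our price level (made-up numbers)
events <- data.frame(type = c("add", "cancel", "trade", "trade"),
                     qty  = c(200, 100, 60, 40),
                     stringsAsFactors = FALSE)
res <- simulate.queue(events, initial.qty = 100)
```

In this toy run the 100-lot cancel removes 10 from the front and 90 from the back, and the one-lot order only fills once the two trades have eaten through the remaining 90 in front of it.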

Constant Maturity Data

I’ve been asked multiple times why/when I use constant maturity data for research and modelling. I thought I’d cover it here on my blog since it’s been a while. I hope to post more in the coming months/future as it has been a good way for me to organize my thoughts and share what I’ve been working on.

Constant maturity (CM) data is a way of stitching together non-continuous time series, just like the back-adjusted method. It is used heavily in derivative modelling because any single derivative contract (options, futures, etc.) is listed and traded for only a short span of time.

What is it and how is it used?

The CM methodology essentially holds time to expiration constant. Derivative contracts behave differently as they approach expiration, so researchers developed this method to account for that and study the statistical properties through time.

I’ll provide a couple of usages.

In options trading, we know that time is one of the major factors that affect the price of an option as it approaches expiry. Options that expire further out in time are more expensive than options that expire closer to today. The reason for this is the implied volatility (IV). Researchers who want to study IV across time without taking the expiration effect into account need to hold time constant. An example is the study of how IV changes as a stock option approaches an earnings announcement.

In futures, the CM methodology can be used to model the covariance matrices for risk analysis. For example, if you are trading futures under the same root (Crude) across various expirations, this method has been shown to be rather useful in managing portfolio level risk.

For cash, the standout example is the recent proliferation of volatility ETPs. Most of these products are structured to maintain a constant exposure to a given DTE. They will buy/sell calendar spread futures daily to rebalance their existing position.

How do you calculate it?

I’ve come across multiple ways of doing this. I will show you the most basic way and readers can test out which suits them best. The method I’ve used in the past is a simple linear interpolation between two points. So assuming you are calculating IV for 30 days but you only have IV for a 20 and a 40 DTE ATM option, the equation is:

cm.pt = ( (dte.back - target.dte) * price_1 + (target.dte - dte.front) * price_2 ) / (dte.back - dte.front)

Here target.dte is the expiration you want to calculate, price_1 is the value for the front contract and price_2 the value for the back contract. dte.front should be < dte.back, as the front expires before the back. Each contract is weighted by how close the other one is to the target, so the nearer contract gets the larger weight. This is not the only way; there are others, such as non-linear interpolation. Carol Alexander’s books provide more examples and much better explanations than I ever can!
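As a quick worked example of the linear interpolation, here is a minimal R sketch. The function name cm.interpolate and the IV numbers are made up for illustration; the contract whose DTE is closer to the target receives the larger weight.

```r
# Linear interpolation of a constant maturity point between two contracts.
# price.front / price.back can be any quoted value (IV, futures price, ...).
cm.interpolate <- function(target.dte, dte.front, dte.back, price.front, price.back) {
  stopifnot(dte.front < dte.back,
            target.dte >= dte.front, target.dte <= dte.back)
  # Weight on the back contract grows as the target moves toward its DTE
  w.back <- (target.dte - dte.front) / (dte.back - dte.front)
  (1 - w.back) * price.front + w.back * price.back
}

# 20 DTE option at 18% IV, 40 DTE option at 22% IV (illustrative values)
iv.30 <- cm.interpolate(30, dte.front = 20, dte.back = 40,
                        price.front = 0.18, price.back = 0.22)
iv.25 <- cm.interpolate(25, dte.front = 20, dte.back = 40,
                        price.front = 0.18, price.back = 0.22)
```

With the target halfway between the two expirations the result is the simple average (20%); moving the target to 25 DTE pulls the estimate toward the front contract (19%).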

Hope this helps!

Mike

Energy Stat Arb

Back to my roots. I haven’t tested outright entry/exit trading systems since the Mechanica and Tblox days, but I aim to post more about these in the future.

I’ve been looking at and reading about market neutral strategies lately to expand my knowledge. Long only strategies are great, but constant outright directional exposure may leave your portfolio unprotected to the downside when all assets are moving in the same direction. A good reminder would be May of last year, when gold took a nose dive.

Below are some tests I conducted on trading related energy pairs. Note that I haven’t done any elaborate testing for whether the spread is mean reverting, etc. I just went with my instincts. No transaction costs. Spread construction is based on a stochastic differential, with a 10 day lookback, +-2/0 std normalized z-score entry/exit, and 1 bar delayed execution.
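To make the entry/exit rule concrete, here is a minimal R sketch of a rolling z-score signal with a 10 bar lookback, entries at +-2, exits at 0 and a one bar execution delay. It takes an already-built spread as input; how the spread itself is constructed is not covered, and the function name zscore.signal and the toy spread are assumptions for illustration only.

```r
# Rolling z-score mean reversion signal on a precomputed spread.
# Returns the position held on each bar: +1 long spread, -1 short spread, 0 flat.
zscore.signal <- function(spread, lookback = 10, entry = 2, exit = 0) {
  n <- length(spread)
  z <- rep(NA_real_, n)
  for (t in lookback:n) {
    window <- spread[(t - lookback + 1):t]
    z[t] <- (spread[t] - mean(window)) / sd(window)
  }

  pos <- numeric(n)
  for (t in 2:n) {
    pos[t] <- pos[t - 1]
    if (is.na(z[t])) next                      # not enough history (or flat window)
    if (pos[t] == 0) {
      if (z[t] >  entry) pos[t] <- -1          # spread rich: sell it
      if (z[t] < -entry) pos[t] <-  1          # spread cheap: buy it
    } else if ((pos[t] ==  1 && z[t] >= exit) ||
               (pos[t] == -1 && z[t] <= exit)) {
      pos[t] <- 0                              # z-score crossed back through the exit level
    }
  }
  c(0, pos[-n])                                # act one bar after the signal
}

# Toy spread: small oscillation, one spike, then back to normal
spread <- c(rep(c(0.1, -0.1), 4), 0.1, 1.0, 0.0, 0.1)
sig <- zscore.signal(spread)
```

In the toy series the spike on bar 10 triggers a short, which is only held on bar 11 because of the one bar delay, and the position is flat again on bar 12.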

Crude Oil and Natural Gas Futures (Daily) (daily bars don’t seem to work that well anymore):

OIL and UNG ETF (1 Min Bar)

XLE and OIL ETF (1 Min Bar)

Pair trading is the simplest form of statistical arbitrage, but what gets interesting is when you start dealing with a basket of assets. For example, XLE tracks both Crude Oil and Natural Gas companies, therefore a potential 3 legged trade would be to trade XLE against both OIL and UNG. Another well-known trade would be to derive value for the SPY against TLT (rates), HYG (corp spreads), and VXX (Vol).

The intuition behind relative value strategies is to derive a fair value of an asset “relative” to another. In basic pair trading, we are using one leg to derive the value of another, or vice versa. Any deviations are considered opportunities for arbitrage. In the case of a multi-legged portfolio, a set of assets is combined in some way (optimization, factor analysis, PCA) to measure the value. See (Avellaneda) for details.

While the equity lines above look nice, please remember that they don’t account for transaction costs and are modelled purely on adjusted last trade price. A more realistic simulation would be to test the sensitivity of entry and order fills given level 1 bid-ask spreads. For that, a more structured backtesting framework should be employed.

(Special thanks to QF for tremendous insight)

Mike

Random Subspace Optimization: Max Sharpe

I was reading David’s post on the idea of Random Subspace Optimization and thought I’d provide some code to contribute to the discussion. I’ve always loved ensemble methods, since combining multiple streams of estimates makes for more robust estimation outcomes.

In this post, I will show how an RSO overlay performs within a max Sharpe framework. To make things more comparable, I will employ the same assets as David for the backtest. One additional universe I would like to incorporate is the current-day S&P 100 (which carries survivorship bias).

The random subspace method is a generalization of the random forest algorithm. Instead of generating random decision trees, the method can employ any desired classifier. Applied to portfolio management, given N different asset classes and return streams, we randomly select k assets s times. Given s different random asset combinations, we can run a user defined sizing algorithm on each of them. The last step is to combine them through averaging to get the final weights. In R, the problem can be easily formulated via lapply or for loops as the base iterative procedure. For random integers, the function sample will be employed. Note my RSO function employs functions inside Systematic Investor Toolbox (SIT).


rso.optimization <- function(ia, k, s, list.param) {
  # Sizing function supplied by the caller
  size.fn = match.fun(list.param$weight.function)
  if (k > ia$n) stop("K is greater than number of assets.")

  # s random draws of k asset indices, one draw per row
  space = seq_len(ia$n)
  index.samples = t(replicate(s, sample(space, size = k)))
  weight.holder = matrix(NA, nrow = s, ncol = ia$n)
  colnames(weight.holder) = ia$symbol.names
  hist = coredata(ia$hist.returns)

  # 0 <= x.i <= 1
  constraints = new.constraints(k, lb = 0, ub = 1)
  constraints = add.constraints(diag(k), type = '>=', b = 0, constraints)
  constraints = add.constraints(diag(k), type = '<=', b = 1, constraints)

  # SUM x.i = 1
  constraints = add.constraints(rep(1, k), 1, type = '=', constraints)

  for (i in 1:s) {
    ia.temp = create.historical.ia(hist[, index.samples[i, ]], 252)
    weight.holder[i, index.samples[i, ]] = size.fn(ia.temp, constraints)
  }
  # Average each asset's weight over the draws in which it was selected
  final.weight = colMeans(weight.holder, na.rm = T)

  return(final.weight)
}



The above function takes in an ia object, short for input assumptions, which holds all the statistics most sizing algorithms need. Also, I’ve opted to focus on long only.
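For readers without SIT installed, the sample-size-average loop can be sketched in base R. This is not the function above: rso.sketch and inv.vol.weights are hypothetical names, inverse-volatility sizing stands in for max Sharpe purely to keep the example dependency-free, and assets that are not drawn in a given sample count as a zero weight (rather than NA), so the averaged weights still sum to one.

```r
# Hypothetical stand-in sizing function: long-only inverse-volatility weights
inv.vol.weights <- function(returns) {
  w <- 1 / apply(returns, 2, sd)
  w / sum(w)
}

# Random subspace overlay: size s random k-asset subsets, then average
rso.sketch <- function(returns, k, s, size.fn = inv.vol.weights) {
  n <- ncol(returns)
  if (k > n) stop("k is greater than the number of assets.")
  weights <- matrix(0, nrow = s, ncol = n,
                    dimnames = list(NULL, colnames(returns)))
  for (i in seq_len(s)) {
    idx <- sample.int(n, size = k)                 # random subspace of k assets
    weights[i, idx] <- size.fn(returns[, idx, drop = FALSE])
  }
  colMeans(weights)                                # average the s sub-portfolios
}

# Simulated daily returns for six made-up assets
set.seed(42)
returns <- matrix(rnorm(250 * 6, sd = 0.01), ncol = 6,
                  dimnames = list(NULL, paste0("A", 1:6)))
w <- rso.sketch(returns, k = 3, s = 100)
```

Any sizing function that maps a return matrix to a weight vector can be passed as size.fn, which is the same plug-in idea as list.param$weight.function above.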

The following are the results for the 8 asset class universe. All backtests hereafter keep s equal to 100 while varying k from 2 to N-1, where N equals the total number of assets. The base comparison is against simple max Sharpe and equal weight portfolios.

The following is for the 9 sector universe.

Last but not least is the performance for current-day S&P 100 stocks.

The RSO method seems to improve all the universes that I’ve thrown at it. For a pure stock universe, it is able to reduce volatility by roughly 300 to 700 basis points depending on your selection of k. In a series of tests across different universes, I have found that the biggest improvements from RSO come from applying it to a universe of instruments that belong to the same asset class. Also, I’ve found that for a highly similar universe (stocks), a lower k is better than a higher k. One explanation: the max Sharpe portfolio of X identical assets is equal to an equal weight portfolio, so we can postulate that when the asset universe is highly similar or approaching equivalence, resampling with a lower k Y times, where Y approaches infinity, approaches the limit of an equally weighted portfolio. This is in line with the idea behind the curse of dimensionality: for better estimates, the data required grows exponentially as the number of assets increases. In this case, with limited data, a simple equal weight portfolio will do better, which is consistent with better performance for lower k.

For a well specified universe of assets, RSO with a higher k yields better results than a lower k. This is most likely because simple random sampling of such a universe with a small k will yield samples that are highly mis-specified. This problem is magnified when the diversifying assets like bonds are significantly outnumbered by other assets like equities, as the probability of sampling an asset with diversification benefits is far lower than that of sampling an asset without such benefits. In other words, with a lower k, one will most likely end up with a portfolio that contains a lot of risky assets relative to lower risk assets.

A possible future direction would be to figure out ways to avoid having to specify k and s in an RSO. For example, randomly selecting k OR selecting a k such that it targets a certain risk/return OR maximizes a user defined performance metric.

Mike

Engineering Risks and Returns

In this post, I want to present a framework for formulating portfolios with a targeted risk or return. The basic idea was inspired by looking at risk control from a different point of view. The traditional way of controlling portfolio risk is to apply a given set of weights to historical data to calculate historical risk. If estimated portfolio risk exceeds a threshold, we peel off allocation percentages for each asset. In this framework, I focus on constructing portfolios that target a given risk or return on an efficient risk return frontier.

First let’s get some data so we can visualize traditional portfolio optimization’s risk return characteristics. I will be using an 8 asset ETF universe.

rm(list=ls())
# setInternet2 is Windows-only and may be unnecessary on recent R versions
setInternet2(TRUE)
con = gzcon(url('http://www.systematicportfolio.com/sit.gz', 'rb'))
source(con)
close(con)
# getSymbols comes from quantmod
load.packages('quantmod')
tickers = spl('EEM,EFA,GLD,IWM,IYR,QQQ,SPY,TLT')
data <- new.env()
getSymbols(tickers, src = 'yahoo', from = '1980-01-01', env = data, auto.assign = T)
bt.prep(data, align='keep.all', dates='2000:12::')


Here are the return streams we are working with.

The optimization algorithms I will employ are the following:

• Minimum Variance Portfolio
• Risk Parity Portfolio
• Equal Risk Contribution Portfolio
• Maximum Diversification Portfolio
• Max Sharpe Portfolio

To construct the risk return plane, I will put together the necessary input assumptions (correlation, return, covariance, etc). This can be done with the create.historical.ia function in the SIT toolbox.

#input Assumptions
prices = data$prices
n = ncol(prices)
ret = prices / mlag(prices) - 1
ia = create.historical.ia(ret, 252)

# 0 <= x.i <= 1
constraints = new.constraints(n, lb = 0, ub = 1)
constraints = add.constraints(diag(n), type='>=', b=0, constraints)
constraints = add.constraints(diag(n), type='<=', b=1, constraints)

# SUM x.i = 1
constraints = add.constraints(rep(1, n), 1, type = '=', constraints)

With the above we can go ahead and input both ‘ia’ and ‘constraints’ into the optimization algorithms listed earlier to get weights. With the weights, we can derive the portfolio risk and portfolio return. These can then be plotted on a risk return plane.

# create efficient frontier
ef = portopt(ia, constraints, 50, 'Efficient Frontier')
plot.ef(ia, list(ef), transition.map=F)

The risk return plane in the above image shows the entire space in which a portfolio’s risk and return characteristics can reside. Anything beyond the left side of the frontier does not exist (unless leverage is used, in which case the EF itself shifts leftward too). Since I am more of a visual guy, I tend to construct this risk return plane whenever I am working on new allocation algorithms. This allows me to compare the expected risk and return against other portfolios.

As you can see, each portfolio algorithm has its own set of characteristics. Note that these characteristics fluctuate across the frontier were we to frame this rolling through time. A logical extension of these risk return concepts is to construct a portfolio that aims to target either a given risk or a given return on the frontier. To formulate this problem in SIT for the return component, simply modify the constraints as follows:

constraints = add.constraints(ia$expected.return,type='>=', b=target.return, constraints)


Note that target.return is simply a variable storing the desired target return. After adding the constraint, run a minimum variance portfolio and you will get a target return portfolio. Targeting risk, on the other hand, is a bit more complicated. If you look at the efficient frontier, you will find that for a given level of risk there are two portfolios that lie on it. (The sub-optimal portion of the efficient frontier is hidden.) I solved for the weights using a multi optimization framework which employs both linear and quadratic (dual) optimization.

target.risk.obj <- function(ia, constraints, target.risk) {

  max.w = max.return.portfolio(ia, constraints)
  min.w = min.var.portfolio(ia, constraints)
  max.r = sum(max.w * ia$expected.return)
  min.r = sum(min.w * ia$expected.return)
  max.risk = portfolio.risk(max.w, ia)
  min.risk = portfolio.risk(min.w, ia)

  # If target risk exists as an efficient portfolio, solve for it;
  # else return weights of 0
  if (target.risk >= min.risk && target.risk <= max.risk) {
    # target.return.risk.helper is not shown in this post
    out <- optimize(f = target.return.risk.helper,
                    interval = c(0, max.r),
                    target.risk = target.risk,
                    ia = ia,
                    constraints = constraints)$minimum
    weight = target.return.portfolio(out)(ia, constraints)
  } else {
    weight = rep(0, ia$n)
  }

  return(weight)
}
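
The search idea behind targeting risk (scan target returns until the minimum variance portfolio for that return has the desired risk) can be illustrated without SIT. The sketch below is a simplification under loud assumptions: it only enforces the budget constraint, so shorting is allowed, it uses uniroot rather than optimize with a helper, and all function names and the three-asset inputs are made up for illustration.

```r
# Minimum variance weights for a given target return (weights sum to 1,
# shorting allowed), solved from the first-order conditions as a linear system
min.var.target.return <- function(mu, sigma, target.return) {
  n <- length(mu)
  kkt <- rbind(cbind(2 * sigma, 1, mu),
               c(rep(1, n), 0, 0),
               c(mu, 0, 0))
  solve(kkt, c(rep(0, n), 1, target.return))[1:n]
}

portfolio.sd <- function(w, sigma) sqrt(drop(t(w) %*% sigma %*% w))

# Find the efficient portfolio whose risk equals target.risk
target.risk.weights <- function(mu, sigma, target.risk) {
  ones  <- rep(1, length(mu))
  w.min <- solve(sigma, ones)
  w.min <- w.min / sum(w.min)                 # global minimum variance portfolio
  r.min <- sum(w.min * mu)
  r.max <- max(mu)
  gap <- function(r) {
    portfolio.sd(min.var.target.return(mu, sigma, r), sigma) - target.risk
  }
  # Return zero weights when the target is not reachable on this segment
  if (r.min >= r.max || gap(r.min) > 0 || gap(r.max) < 0) return(rep(0, length(mu)))
  r.star <- uniroot(gap, lower = r.min, upper = r.max, tol = 1e-10)$root
  min.var.target.return(mu, sigma, r.star)
}

# Illustrative annualized inputs for three made-up assets
mu    <- c(0.04, 0.08, 0.12)
vol   <- c(0.05, 0.12, 0.20)
corr  <- matrix(0.2, 3, 3); diag(corr) <- 1
sigma <- diag(vol) %*% corr %*% diag(vol)
w <- target.risk.weights(mu, sigma, target.risk = 0.08)
```

Searching only between the minimum variance return and the maximum return is what keeps the solution on the upper, efficient branch of the frontier rather than the hidden sub-optimal one.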



Below is a simple backtest that takes the above assets and optimizes for the target return or target risk component. Each will run with a target of 8%.

Now the model itself requires us to specify a return or risk component. What if instead we make that a dynamic component, such that we extract either the risk or return component of an alternative sizing algorithm? Below is the performance of the dynamic risk or return component extracted from naive risk parity.

Not surprisingly, whenever we target risk, the strategy tends to become more risky. This confirms that risk based allocations are superior if investors are aiming to achieve low long term volatility.

Mike

Max Decorrelation Portfolio

It’s been almost two months since I posted. Finishing the school year off with exams and moving twice forced me to put the blog on hold. I hope to post more in the future!

Today I humbly attempt to formulate in R the maximum decorrelation algorithm for constructing portfolios. This method was formulated by Peter Christoffersen et al. (a fellow Canadian at Rotman School of Management) and presented by EDHEC in a paper called “Scientific Beta Maximum Decorrelation Indices”. For those interested in asset allocation and risk management, EDHEC has a treasure trove of papers and research.

In traditional mean variance optimization, we are minimizing the portfolio risk given estimates of the covariance matrix. More specifically, we need to estimate both volatility and correlation, which are used to construct the covariance. The objective function to minimize is:

$$\min_{w} \; w^\top \Sigma w$$

where w is the vector of portfolio weights and Σ is the covariance matrix.

The problem with portfolio optimization models is that we are making forecasts about future covariance structures. As it is unlikely they will hold in the future, what may be optimal today may not be optimal in the next period. This is what most practitioners term “estimation error”. Over the years, there have been different ways to overcome this. Methods ranging from covariance shrinkage to re-sampled efficient frontiers are the most widely known. Some have instead scrapped the entire optimization process and focused on simple heuristic algorithms for estimating optimal portfolio weights.

The Maximum Decorrelation portfolio attempts to reduce the number of inputs and uses solely the correlation matrix as its main input assumption. Instead of focusing on volatility, the strategy assumes that individual asset volatilities are identical. The objective function to maximize is therefore:

$$\max_{w} \; 1 - w^\top C w$$

where C is the correlation matrix.

The idea is that there is less to estimate, which should mean lower estimation error.

In R, the objective function becomes:

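max.decorr <- function(weight, correl) {
  # Rescale so the weights sum to one
  weight <- weight / sum(weight)
  # drop() turns the 1x1 matrix into a plain number for optim
  obj <- 1 - drop(t(weight) %*% correl %*% weight)
  # optim minimizes, so return the negative of what we want to maximize
  return(-obj)
}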


I am using R’s optim function. This is my first time formulating the objective function from scratch. While I am 90% sure I am correct, I am but a student and am all ears if there are any mistakes and errors (or a more efficient way of implementing it). Please leave comments below : ).
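For a quick sanity check, the objective can be run through optim on a toy correlation matrix. The box-constrained "L-BFGS-B" method, the bounds and the three-asset matrix below are my own illustrative assumptions, not necessarily the settings used for the backtest.

```r
# Objective as defined above (repeated so this snippet runs on its own)
max.decorr <- function(weight, correl) {
  weight <- weight / sum(weight)
  obj <- 1 - drop(t(weight) %*% correl %*% weight)
  -obj
}

# Toy universe: assets 1 and 2 move together, asset 3 is the diversifier
correl <- matrix(c(1.0, 0.9, 0.1,
                   0.9, 1.0, 0.1,
                   0.1, 0.1, 1.0), nrow = 3, byrow = TRUE)
n <- ncol(correl)

# Long-only search started from equal weights
fit <- optim(par = rep(1 / n, n), fn = max.decorr, correl = correl,
             method = "L-BFGS-B", lower = 1e-6, upper = 1)
w <- fit$par / sum(fit$par)   # rescale, since the objective normalizes internally
```

The two highly correlated assets end up splitting roughly half of the portfolio between them, while the low-correlation asset takes the other half (about 49% in the analytic solution for this matrix).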

I took the algorithm for a test drive and below are the results for the standard 10 asset class.

For benchmark purposes, I have used minimum variance and equal weight portfolios. The Max Decor strategy earned higher returns but with higher volatility, hence the lower sharpe compared to Min Var.

Code can be found here: Dropbox

Mike

Equity Bond Exposure Management

I did a post last October (here) looking at varying allocation between stocks/bonds, and at the end I hinted towards a tactical overlay between the two asset classes. Six months later, I finally found a decent overlay I feel may hold value.

In a paper called “Principal Components as a Measure of Systemic Risk” (SSRN), Kritzman et al. presented a method for identifying “fragile” market states. To do this, they constructed the Absorption Ratio. Here is the equation:

$$AR = \frac{\sum_{i=1}^{n} \sigma^2_{E_i}}{\sum_{j=1}^{N} \sigma^2_{A_j}}$$

The sigma in the numerator represents the variance of the ith eigenvector, while the one in the denominator equals the variance of the jth asset. In the paper, n = 1/5 of the total number of assets (N). The interpretation is simple: the higher the ratio, the more “fragile” the market state. The intuition behind this ratio is that when it is high, risk is very concentrated. On the other hand, when it is low, risk is dispersed and spread out. Think weak and strong.

Following is the raw AR through time of the DJ 30 components.

As you can see, the ratio spikes during the tech bubble and the recent financial crisis. How would it look when used as a filter? Below are two pictures comparing the signals generated by the 200 day SMA and the standardized AR.

Pretty good at timing, in my opinion. In line with the paper, I reconstructed the strategy that switches between stocks (DIA) and bonds (VBMFX). When the standardized AR is between 1 and -1, we split 50/50. When it’s above 1, we are in love with bonds, and when it’s below -1, we are in love with stocks. Simple. Results:

And here is the code: (I know it’s messy, didn’t have a lot of time! :)

Note: There is survivorship bias. I used the current day DJ30.
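
The ratio and the switching rule described in this post can be sketched in base R as follows. This is a minimal illustration, not the original code: absorption.ratio and stock.weight are hypothetical names, the standardized AR is taken as an input because the standardization window is not spelled out here, and "in love with" is read as a 100% allocation, which is my assumption.

```r
# Absorption ratio: share of total variance explained by the top
# frac * N eigenvectors of the asset covariance matrix
absorption.ratio <- function(returns, frac = 1 / 5) {
  ev <- eigen(cov(returns), symmetric = TRUE, only.values = TRUE)$values
  n  <- max(1, round(frac * ncol(returns)))
  sum(ev[seq_len(n)]) / sum(ev)   # eigenvalues sum to the total asset variance
}

# Stock weight from a standardized AR reading (bonds get the remainder)
stock.weight <- function(std.ar) {
  ifelse(std.ar > 1, 0, ifelse(std.ar < -1, 1, 0.5))
}

# Simulated universes: one driven by a common factor, one independent
set.seed(1)
common       <- rnorm(500)
concentrated <- sapply(1:10, function(i) common + rnorm(500, sd = 0.1))
dispersed    <- matrix(rnorm(500 * 10), ncol = 10)

ar.conc <- absorption.ratio(concentrated)
ar.disp <- absorption.ratio(dispersed)
alloc   <- stock.weight(c(-2, 0, 2))
```

The common-factor universe produces a ratio close to one (risk is concentrated, the "fragile" state), while the independent universe sits much lower.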