# Natural Language Processing

I’ve recently pushed some MATLAB machine learning code to GitHub. Today I’ll be adding some Natural Language Processing (NLP) code to that. The main concept we covered in class was n-gram modelling, which is a Markovian process. This means that future states or values have a conditional dependence on past values. In NLP, this concept is applied by training n-gram probability models on given texts. For example, if we set N = 3, then each word in a given sentence depends on the previous two words.

The conditional probability of an event A given B is P(A|B) = P(A ∩ B) / P(B). Extending this to multiple sequential events, the chain rule generalizes it to P(w1, …, wn) = P(w1) P(w2|w1) P(w3|w1, w2) … P(wn|w1, …, wn-1). This equation is very useful for modelling sequential data like sentences. Extensions of these concepts to finance are used heavily in hidden Markov models, which attempt to model states in various markets. I hope interested readers will comment below with other interesting applications.
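To make the chain rule concrete, here is a minimal Python sketch (the toy corpus and function names are my own) that estimates trigram probabilities by maximum likelihood, i.e. P(w3 | w1, w2) = count(w1 w2 w3) / count(w1 w2):

```python
from collections import defaultdict

def train_trigrams(tokens):
    """Count bigram and trigram frequencies over a token list."""
    bi, tri = defaultdict(int), defaultdict(int)
    for i in range(len(tokens) - 2):
        bi[(tokens[i], tokens[i + 1])] += 1
        tri[(tokens[i], tokens[i + 1], tokens[i + 2])] += 1
    return bi, tri

def trigram_prob(bi, tri, w1, w2, w3):
    """Maximum-likelihood estimate of P(w3 | w1, w2)."""
    if bi[(w1, w2)] == 0:
        return 0.0
    return tri[(w1, w2, w3)] / bi[(w1, w2)]

corpus = "the cat sat on the mat the cat ate the rat".split()
bi, tri = train_trigrams(corpus)
# "the cat" is followed once by "sat" and once by "ate"
print(trigram_prob(bi, tri, "the", "cat", "sat"))  # 0.5
```

A real model would add smoothing for unseen trigrams, but the counting step is the whole idea.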

The last topic we are covering in class is computer vision. Right now, topics like image noise reduction via Gaussian filtering, edge detection, and segmentation are being covered. I will post more about them in the future.

Cheers,

Mike

# Artificial Intelligence

Artificial intelligence surrounds much of our lives. The aim of this branch of computer science is to build intelligent machines that are able to operate as individuals, much like humans. I am sure most of us have watched the Terminator movie series and wondered to what extent our own society will converge to the one in the movies. While that may sound preposterous, much of what automated system developers do revolves around building adaptive systems that react to changes in markets. Inspired by a course I am taking at school right now, I would like to use this post as a general introduction to the fundamentals of AI.

If you ask people what intelligence is, most will initially find it hard to put the idea into words. We just know what it is, and our gut tells us that we humans are the pinnacle of what defines intelligence. But the fact is, intelligence encompasses a great deal. According to the first sentence of the Wikipedia article, there are ten different ways to define it.

“Intelligence has been defined in many different ways including logic, abstract thought, understanding, self-awareness, communication, learning, having emotional knowledge, retaining, planning, and problem solving.” -Wikipedia

Since it encompasses so much, it is not easy to define in a single sentence. What can be said is that intelligence relates to one’s ability to solve problems, reason, perceive relationships, and learn.

Now that I’ve offered a sense of what intelligence means, what, on the other hand, is artificial intelligence? Artificial intelligence is the field of designing machines that are capable of intelligent behavior; machines that are able to reason; machines that are able to learn. More precisely, definitions of AI can be organized into four categories:

• Thinking Humanly
• Thinking Rationally
• Acting Humanly
• Acting Rationally

The first two relate to thought processes while the last two relate to behavior. Thinking humanly revolves around whether the entity in question is able to think and have a mind of its own; this essentially means making decisions, learning, and solving problems. Acting humanly asks whether a machine is able to process and communicate language, store information or knowledge, act based on what it knows, and adapt based on new information. This set of required traits is formulated around the famous Turing Test, which examines whether a machine can act like a human when answering questions posed by another human being. The machine passes the test if the questioner cannot determine whether it is a machine or a human. Thinking rationally closely incorporates the study of logic and logical reasoning. It was first introduced by Aristotle, who attempted to provide a systematic way of inferring a proposition from a given premise. A famous example: “Socrates is a man; all men are mortal; therefore, Socrates is mortal.” Lastly, acting rationally is the idea of choosing the most suitable behavior, the one that produces the best expected outcome. In other words, an agent is rational if, given all its knowledge and experience, it selects the action that maximizes its own performance measure/utility.

## Agents

When studying AI, the term agent represents an entity/model that interacts with the environment. More precisely, an agent perceives the environment through its sensors and acts on it through actuators. Comparing this to humans, imagine sensors as eyes and ears, and actuators as arms and legs. At each time step the sensors take inputs, called percepts, which are then processed by the agent program. The agent program passes the inputs into an agent function. The agent function maps inputs to the appropriate outputs (actions), which are then sent via the agent program to the actuators. This agent-based framework closely relates to automated trading systems. The environment is the market and the changing prices at each time interval. The agent program would be our trading system, which takes in daily price information and pipes it into the agent function, the logic of the trading system. For example, today’s new price is updated and passed into the trading logic. The logic specifies that if the current price is \$10, it will sell. The sell action is passed back to the environment as a sell order.
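The percept-to-action loop described above can be sketched in a few lines of Python. Everything here is hypothetical (class name, rule shape, the \$10 threshold from the example); a real system would be wired to live market data:

```python
class ReflexTradingAgent:
    """Maps the current percept straight to an action via condition-action rules."""
    def __init__(self, rules, default="hold"):
        self.rules = rules      # list of (condition, action) pairs
        self.default = default  # fallback when no rule fires

    def act(self, percept):
        # agent function: first matching condition determines the action
        for condition, action in self.rules:
            if condition(percept):
                return action
        return self.default

# Rule from the example: if the current price reaches $10, sell
agent = ReflexTradingAgent(rules=[(lambda price: price >= 10, "sell")])
print(agent.act(8))   # "hold"
print(agent.act(10))  # "sell"
```

The `act` method plays the role of the agent function, and the surrounding program that feeds it prices and routes the returned order is the agent program.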

The above example is a very basic type of agent known as the simple reflex agent. This type of agent makes decisions based solely on the current percept (price); it has no memory of previous states. A more complex agent, the model-based reflex agent, keeps a memory of the past, known as its percept sequence. This agent also has an internal understanding of how the environment works, captured in its own model. The model of the world takes inputs and identifies the state the agent is in. Given the state, the model forecasts what the environment will likely look like in the next time step. A proper action is then recommended and executed via the actuators. (Think of Markov models.) So far, the agents I’ve introduced largely resemble a function that takes an input and spits out an output. To make things more human, the next agent I will introduce is the goal-based agent. This is similar to how, given our current circumstances, we aim to maximize our objective function; the objective can be money or anything that makes us happy. More concretely, the goal-based agent extends the model-based reflex agent by assigning a score to each recommended action and choosing the one that maximizes its own objective function.
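A goal-based agent can be sketched as scoring each candidate action with a utility function and picking the best. The model, utility, and names below are all illustrative toys, not a real trading system:

```python
class GoalBasedAgent:
    """Keeps a percept history and scores candidate actions with a utility function."""
    def __init__(self, model, utility):
        self.model = model          # forecasts the outcome of an action given a percept
        self.utility = utility      # objective function over forecast outcomes
        self.percept_sequence = []  # memory of past percepts

    def act(self, percept, actions):
        self.percept_sequence.append(percept)
        # choose the action whose predicted outcome maximizes utility
        return max(actions, key=lambda a: self.utility(self.model(percept, a)))

# Toy world model: holding earns the price's distance above 10, selling locks in 0
model = lambda price, action: price - 10 if action == "hold" else 0
agent = GoalBasedAgent(model, utility=lambda outcome: outcome)
print(agent.act(12, ["hold", "sell"]))  # "hold"
print(agent.act(8, ["hold", "sell"]))   # "sell"
```

The difference from the reflex agent is that actions are compared by predicted outcome rather than matched against fixed condition-action rules.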

The reader will most likely ask how this knowledge helps them make money in the markets. What I can say is that finance is entering a brave new world in which technology is transforming how money is made in the markets. An understanding of finance and statistics, in my opinion, is not enough; those are the waters where your competitors are (mostly) already fishing. Knowledge in areas like AI, speech recognition, natural language processing, machine learning, and computer vision (to name a few) will allow you to be more creative in design. I urge curious minds to explore the unexplored!

# Beyond Pairs

I’ve been writing rather prolifically this past week. Last week marked the end of my midterms (for now!), the continuation of my job search, and preparation for my final 5 weeks of university(!). I hope my readers are finding my posts interesting and enlightening.

As mentioned in an earlier post on statistical arbitrage, the interesting aspect comes when we consider multi-leg portfolios. The traditional way to construct a multi-leg portfolio is to employ a multivariate linear regression (factor model). The intuition is that we are trying to estimate a fair value for an asset using various predictors, or independent variables. For example, we know that the S&P 500 is composed of stocks from various sectors. An intuitive approach is therefore to derive a fair value for the S&P 500 from the 9 sector SPDRs: SPY_t = α + β1·Sector1_t + … + β9·Sector9_t + ε_t. The residual return that is left over, the “alpha”, is considered neutral (uncorrelated) to the industry sectors.

With this framework, we can essentially make ourselves neutral to any factors we want. For example, we have access to a wide variety of ETFs that mimic underlying asset class movements. If we want to be neutral to interest rates, credit risk, and volatility, we can employ the ETFs TLT, HYG, and VXX respectively. Below is a chart demonstrating this, showing the estimated fair value of SPY relative to the actual ETF. Below that is the spread that can be traded by going long and short each leg.

Being able to control the factors we are exposed to is very appealing, as it allows us to potentially shy away from turbulent events that transpire in specific assets. Not only that, combining these uncorrelated return streams into a portfolio allows significant risk reduction. As Dalio has said, combining 15 uncorrelated return streams can effectively reduce risk by 80%. (Chart below.) Interestingly, from my understanding of what Bridgewater does, I am pretty confident they employ spread trading too, but from a purely fundamental angle. For example, how does a set of asset classes react to movements in economic indicators? From there they construct synthetic spreads to trade off these relationships.
Below is the code that generated the data for this post:

```
# Rolling fair-value regression of y on x (the wrapper's name is illustrative;
# the original post omitted the function signature)
construct.fair.value <- function(data, y.symbol, x.symbol, lookback) {
  y = data$prices[, y.symbol]
  x = data$prices[, x.symbol]
  lm.holder <- list()
  fv = NA * data$prices[, 1]
  colnames(fv) = c('FairValue')

  for (i in (lookback + 1):nrow(data$prices)) {
    hist.y = y[(i - lookback):i]
    hist.x = x[(i - lookback):i]
    lm.r = lm(hist.y ~ hist.x)
    lm.holder[[i]] = lm.r
    # fair value = intercept + beta * current predictor value
    fv[i, ] = lm.r$coefficients[1] + sum(lm.r$coefficients[-1] * x[i, ])
  }
  mat = merge(x, y, fv)
  return(list(mat = mat, fv = fv, reg.model = lm.holder))
}
```

Also here are some links I’ve found to be very informative.

The paper on high-frequency statistical arbitrage is particularly relevant, as it relates to my previous blog posts on energy pairs trading. Essentially, the author constructs a meta-algorithm for ranking pairs to trade. This meta-algorithm combines the correlation coefficient, the minimum squared distance of the normalized price series, and a co-integration test value. I don’t have intraday equities data (the paper used 15-minute bars), nor the infrastructure to test it, but the idea resonates with my research on top-N momentum systems. There are a lot of ways to improve.

# Energy Stat Arb Part 2

In my previous post, I rushed through a lot of the technical details of how I implemented the strategy. For that I apologize! I am here to make up for it by providing more detail on my approach and, hopefully, making my analysis more understandable.

In this post, I want to revisit energy pairs trading (XLE vs. OIL), but with the traditional spread construction approach via regression analysis. My data comes from QuantQuote, fully adjusted for dividends and splits. To read in the data, I used the following code:

```
from matplotlib.pylab import *
import pandas as pd
import numpy as np
import datetime as dt

# QuantQuote minute files have no header row; the filename here is illustrative
xle = pd.read_csv('xle_minute.csv', header=None)
xle.columns = ['Timestamp', 'open', 'high', 'low', 'close', 'volume',
               'Split Factor', 'Earnings', 'Dividends']
xle['Timestamp'] = xle['Timestamp'].apply(lambda x: dt.datetime.strptime(x, '%Y%m%d %H%M'))
xle = xle.set_index('Timestamp')
xle = xle[['open', 'high', 'low', 'close']]
```

For minute data, there are approximately 391 rows per day. Taking into account the OHLC fields, that is a total of 391 * 4 = 1564 data observations per day. Here’s an image displaying May 9th, 2013. If you look into the data, you may see a price for 9:45 AM while the next data point comes in at 9:50 AM. This means there was a 5-minute gap in which no shares were traded. To fix and align this, the following function aligns the two data sets.

```
def align_data(data_leg1, data_leg2, symbols):
    # concatenating on the time index and dropping NA rows keeps only
    # the minutes where both legs traded
    combined_df = pd.concat([data_leg1, data_leg2], axis=1).dropna()
    data_panel = pd.Panel({symbols[0]: combined_df.ix[:, 0:4],
                           symbols[1]: combined_df.ix[:, 4:8]})  # dict of DataFrames
    return data_panel
```

To construct the spread, we run a rolling regression on the prices to extract the hedge ratio β. This is then piped into the following equation: spread_t = y_t − β_t · x_t. Given two price series, the following helper function returns a dictionary holding the model and the spread. The resulting spread is displayed below.

```
def construct_spread(priceY, priceX, ols_lookback):
    data = pd.DataFrame({'x': priceX, 'y': priceY})
    model = {}
    # rolling OLS over the lookback window yields the time-varying hedge ratio
    model['model_ols'] = pd.ols(y=data.y, x=data.x, window=ols_lookback, intercept=False)
    # spread = y - beta * x
    model['spread'] = data.y - model['model_ols'].beta['x'] * data.x
    return model
```

To normalize it, simply subtract a rolling mean and divide by the rolling standard deviation. An image of the normalized spread follows.

```
# z-score of the latest observation relative to the rolling window
zscore = lambda x: (x[-1] - x.mean()) / x.std(ddof=1)
sprd['zs'] = pd.rolling_apply(sprd['spread'], zs_window, zscore)
```

Without changing our parameters, ±2 standard deviations will be our trigger point. At this threshold, there are a total of 16 trades. Here is the performance if we took all the trades for the day, frictionless. Pretty ugly in my opinion, but it’s only a single day. Let’s display the distribution of daily equity performance for the whole of 2013. The initial flat line comes from the 60-bar lookback window at the start of each day; unrealistic, but it gives a rough picture of the returns. The average final portfolio gain is 0.06 with a standard deviation of 0.13. The performance is pretty stellar when you look at 2013 as a whole. Compared to the spread construction in my last post, incorporating a longer lookback period seems to reduce the variance of returns. In the next instalment I want to investigate whether incorporating GARCH models for volatility forecasting will help improve the performance of spread trading.

Mike

# Energy Stat Arb

Back to my roots. I haven’t tested outright entry/exit trading systems for a while now, not since the Mechanica and Tblox days, but I aim to post more about them in the future.

I’ve been reading about market-neutral strategies lately to expand my knowledge. Long-only strategies are great, but constant outright directional exposure may leave your portfolio unprotected on the downside when all assets move in the same direction. A good reminder is May of last year, when gold took a nosedive.

Below are some tests I conducted on trading related energy pairs. Note that I haven’t done any elaborate testing of whether the spread is mean reverting, etc.; I just went with my instincts. No transaction costs. Spread construction is based on the stochastic differential, with a 10-day lookback, ±2/0 standard deviation normalized z-score entry/exit, and delay-1-bar execution.

Crude Oil and Natural Gas futures (daily bars; daily doesn’t seem to work that well anymore), OIL and UNG ETFs (1-minute bars), and XLE and OIL ETFs (1-minute bars). Pair trading is the simplest form of statistical arbitrage, but things get interesting when you start dealing with a basket of assets. For example, XLE tracks both crude oil and natural gas companies, so a potential three-legged trade would be XLE against both OIL and UNG. Another well-known trade is to derive value for SPY against TLT (rates), HYG (corporate spreads), and VXX (volatility).

The intuition behind relative value strategies is to derive a fair value of an asset “relative” to another. In basic pair trading, we use one leg to derive the value of the other, or vice versa. Any deviations are considered opportunities for arbitrage. In the case of a multi-legged portfolio, a set of assets is combined in some way (optimization, factor analysis, PCA) to measure the value. See Avellaneda for details.
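As a small illustration of deriving one leg’s fair value from the other, here is a Python sketch on synthetic (not ETF) data: a least-squares hedge ratio is fitted and the residual is taken as the tradable spread. All numbers here are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic pair: y is driven by x (a random walk) plus stationary noise
x = 100 + np.cumsum(rng.normal(0, 1, 500))
y = 2.0 * x + rng.normal(0, 1, 500)

# Hedge ratio via least squares: y ~ beta * x + alpha
A = np.column_stack([x, np.ones_like(x)])
beta, alpha = np.linalg.lstsq(A, y, rcond=None)[0]

# The deviation from fair value is the spread we would trade
spread = y - (beta * x + alpha)
print(round(beta, 1))  # 2.0
```

In practice the hedge ratio is usually estimated on a rolling window, as in the regression-based construction discussed in the neighboring posts, rather than on the full sample at once.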

While the equity lines above look nice, please remember that they don’t account for transaction costs and are modelled purely on adjusted last trade prices. A more realistic simulation would test the sensitivity of entries and order fills given level 1 bid-ask spreads. For that, a more structured backtesting framework should be employed.

(Special thanks to QF for tremendous insight)

Mike

# Random Subspace Optimization: Max Sharpe

I was reading David’s post on the idea of Random Subspace Optimization and thought I’d contribute some code to the discussion. I’ve always loved ensemble methods, since combining multiple streams of estimates yields more robust outcomes.

In this post, I will show how an RSO overlay performs in a max-Sharpe framework. To make things comparable, I will employ the same assets as David for the backtest. One additional universe I would like to incorporate is the current-day S&P 100 (note the survivorship bias).

The random subspace method is a generalization of the idea behind the random forest algorithm: instead of generating random decision trees, the method can employ any desired classifier. Applied to portfolio management, given N different asset classes and return streams, we randomly select `k` assets `s` times. For each of the `s` random asset combinations, we run a user-defined sizing algorithm. The last step is to combine the results through averaging to get the final weights. In R, the problem can be easily formulated via `lapply` or `for` loops as the base iterative procedure. For random integers, the function `sample` is employed. Note that my RSO function uses functions inside the Systematic Investor Toolbox.

```
rso.optimization <- function(ia, k, s, list.param) {
  size.fn = match.fun(list.param$weight.function)
  if (k > ia$n) stop("K is greater than number of assets.")
  space = seq(1:ia$n)
  index.samples = t(replicate(s, sample(space, size = k)))
  weight.holder = matrix(NA, nrow = s, ncol = ia$n)
  colnames(weight.holder) = ia$symbol.names

  hist = coredata(ia$hist.returns)

  # 0 <= x.i <= 1
  constraints = new.constraints(k, lb = 0, ub = 1)
  constraints = add.constraints(diag(k), type = '>=', b = 0, constraints)
  constraints = add.constraints(diag(k), type = '<=', b = 1, constraints)

  # SUM x.i = 1
  constraints = add.constraints(rep(1, k), 1, type = '=', constraints)

  for (i in 1:s) {
    ia.temp = create.historical.ia(hist[, index.samples[i, ]], 252)
    weight.holder[i, index.samples[i, ]] = size.fn(ia.temp, constraints)
  }
  final.weight = colMeans(weight.holder, na.rm = T)

  return(final.weight)
}
```

The above function takes in an `ia` object, short for input assumptions, which holds all the necessary statistics for most sizing algorithms. Also, I’ve opted to focus on long-only portfolios.

The following are the results for the 8 asset classes. All backtests hereafter keep `s` equal to 100 while varying `k` from 2 to N-1, where N is the total number of assets. The base comparisons are the simple max-Sharpe and equal-weight portfolios. Next are the results for the 9 sector asset classes, and last but not least the performance for current-day S&P 100 stocks. The RSO method seems to improve every universe I’ve thrown at it. For a pure stock universe, it is able to reduce volatility by 300-700 basis points, depending on the selection of `k`. In a series of tests across different universes, I have found that the biggest improvements from RSO come from applying it to a universe of instruments that belong to the same asset class. I’ve also found that for a highly similar universe (stocks), a lower `k` is better than a higher `k`. One explanation: since the max-Sharpe portfolio of X identical assets is equal to an equal-weight portfolio, we can postulate that when the asset universe is highly similar or approaching equivalence, resampling with a lower `k` a large number of times approaches, in the limit, an equally weighted portfolio. This is in line with the curse of dimensionality: the data required for good estimates grows exponentially as the number of assets increases. With limited data, a simple equal-weight portfolio does better, which is consistent with the better performance of lower `k`.
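The claim that the max-Sharpe portfolio of (near-)identical assets collapses to equal weight can be sanity-checked with the closed-form tangency weights, w ∝ Σ⁻¹μ (ignoring the long-only constraints used in the backtests above; the numbers are illustrative):

```python
import numpy as np

def max_sharpe_weights(mu, cov):
    """Tangency portfolio (zero risk-free rate): w proportional to inv(cov) @ mu."""
    w = np.linalg.solve(cov, mu)
    return w / w.sum()

# Four near-identical assets: same mean, same variance, pairwise correlation 0.95
n = 4
mu = np.full(n, 0.08)
cov = 0.04 * (0.95 * np.ones((n, n)) + 0.05 * np.eye(n))
print(max_sharpe_weights(mu, cov))  # [0.25 0.25 0.25 0.25]
```

By symmetry, any universe of exchangeable assets yields equal tangency weights, which is why averaging many small random subsets of a highly similar universe drifts toward the equal-weight portfolio.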

For a well-specified universe of assets, RSO with a higher `k` yields better results than a lower `k`. This is most likely because simple random sampling of such a universe with a small `k` will yield samples that represent a highly mis-specified universe. The problem is magnified when diversifying assets like bonds are significantly outnumbered by assets like equities, as the probability of sampling an asset with diversification benefits is far lower than that of sampling one without. In other words, with a lower `k`, one will most likely end up with portfolios that hold many risky assets relative to lower-risk assets.

A possible future direction would be to figure out ways to avoid having to specify `k` and `s` in RSO, for example by randomly selecting `k`, or by selecting a `k` that targets a certain risk/return or maximizes a user-defined performance metric.