Data

Queue Position Simulation

First off, Happy Thanksgiving! If time permits in the coming months, I’d like to explore more of how I look at High Frequency (HF) data. Hopefully along the way I can spark some new discussion and improve my own thought process.

HFT strategy “simulation” is no easy task. I refer to it as a simulation because it is purely an approximation of how a strategy would have performed given a set of execution assumptions the researcher made beforehand. Should the assumptions change, the results would also change (significantly).

In my line of work, the edges we seek are generally less than a tick (futures). To make this even worthwhile, the constraints are that costs must be low AND we need to trade a lot. This may sound foreign to most of my readers, as their time frames are generally much longer (days, weeks, even months). But at the end of the day, how much money we make is a simple function of our alpha * the number of times we trade.

In HFT, execution is king. You can be right about where the market moves the next tick, but if you can’t get a fill, you are not making any money. Therefore it is paramount that when we conduct HF simulations, we make accurate execution assumptions.

Queue position is worth a lot. Being first in line and getting a fill is like owning a call option in my world (where the premium is the exchange fee per contract). The worst that can happen is you scratch, assuming you are not the slowest one and there are people behind you. The image below is an analysis of the expected edge you’d get N events out (x-axis), assuming you sit at various spots within the FIFO queue (QP_0 = first in line, QP_0.1 = 10th in line if there were 100 qty). As you can see, the further back in line you are, the more you are exposed to toxic flow, a fancy word for informed traders.

[Figure: expected edge N events out, by position in the FIFO queue]

How does one take this into account when simulating a strategy? When you place a limit order on the bid, how do you know when you will be filled? It depends on two factors: your place in line and trade flow. As time progresses, there will be people who add orders to the FIFO queue, people who cancel orders, and people who take liquidity (trade). These actions are something one needs to keep track of tick by tick (or packet by packet) during a simulation.

While most people assume tick data is the most fine-grained dataset one can have for such simulations, there actually exists packet data. Tick data simply gives you an aggregated snapshot of what an order book looks like: best bid, best offer, bid qty, ask qty (this is known as market by price). Packet data, on the other hand, contains all the actions taken by all the market participants, including trade matches and order submissions. This feed is also known as market by order, and it is up to each market participant to build and maintain their own order book. Using packet data for simulation is optimal, as you will know exactly where you are in line.
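To make the market-by-order case concrete, here is a minimal Python sketch of one price level’s FIFO queue maintained from an order-by-order feed. All the names and message fields (on_add, on_cancel, on_trade, order ids) are my own hypothetical ones; real exchange feeds have their own schemas and more message types (modifies, implied orders, etc.), but the point stands: with order-level data, your place in line is known exactly.

from collections import OrderedDict

class PriceLevelQueue:
    """FIFO queue of resting orders at a single price level,
    rebuilt from a market-by-order feed."""

    def __init__(self):
        # order_id -> remaining qty, kept in arrival (time priority) order
        self.orders = OrderedDict()

    def on_add(self, order_id, qty):
        self.orders[order_id] = qty

    def on_cancel(self, order_id):
        self.orders.pop(order_id, None)

    def on_trade(self, order_id, qty):
        # The feed identifies exactly which resting order traded.
        if order_id in self.orders:
            self.orders[order_id] -= qty
            if self.orders[order_id] <= 0:
                del self.orders[order_id]

    def qty_in_front(self, my_order_id):
        # With market-by-order data this is exact, not an estimate.
        ahead = 0
        for oid, qty in self.orders.items():
            if oid == my_order_id:
                return ahead
            ahead += qty
        return None  # our order is not resting at this level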

When you only have tick data, the only way to conduct this type of simulation is to make assumptions. Here is a simple example. When you place a limit buy on the bid, you are going to be last in line. You keep track of two variables, qty_in_front and qty_behind. Additions are straightforward: just add them to qty_behind. Cancels are a little trickier because you don’t know whether they come from people in front of you or people behind you. A workaround is to have something I call a reduce ratio. It can take a value between 0 and 1, and it controls the percentage of cancels assumed to happen in front of you. For example, in ES simulations, I would set this to around 0.1, i.e. when a total of 100 qty cancels, I’d assume 10 happen in front of me and 90 happen behind me. There are edge cases, but I’ll leave the reader to figure those out themselves. This is just one way, not the only way, of simulating a FIFO queue (a sketch follows below). More complicated ways include dynamically adjusting the reduce ratio as you approach the front of the queue.
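Here is a minimal sketch of that reduce-ratio bookkeeping, under the assumptions above. The class and method names are mine, and reduce_ratio=0.1 mirrors the ES example; the edge cases (partial fills of our own order, dynamically adjusting the ratio) are deliberately left out.

class QueueEstimate:
    """Approximate queue position for a resting limit order,
    using only tick (market-by-price) data."""

    def __init__(self, displayed_qty, reduce_ratio=0.1):
        # We join the back of the queue: everything displayed is in front.
        self.qty_in_front = float(displayed_qty)
        self.qty_behind = 0.0
        self.reduce_ratio = reduce_ratio  # share of cancels assumed in front of us

    def on_add(self, qty):
        # New arrivals have lower time priority than us.
        self.qty_behind += qty

    def on_cancel(self, qty):
        # Split the cancelled qty between front and back by the reduce ratio,
        # e.g. 100 cancelled with ratio 0.1 -> 10 in front, 90 behind.
        front = min(self.qty_in_front, qty * self.reduce_ratio)
        self.qty_in_front -= front
        self.qty_behind = max(0.0, self.qty_behind - (qty - front))

    def on_trade(self, qty):
        # Trades consume the FIFO queue from the front.
        self.qty_in_front = max(0.0, self.qty_in_front - qty)

    def filled(self):
        return self.qty_in_front <= 0.0

For example, joining the bid behind 500 lots and then seeing 200 lots added, 100 cancelled and 450 traded leaves an estimated 40 lots in front of us:

q = QueueEstimate(displayed_qty=500)
q.on_add(200)     # 200 new lots behind us
q.on_cancel(100)  # assume 10 cancel in front of us, 90 behind
q.on_trade(450)   # 450 lots trade at our price
print(q.qty_in_front, q.filled())  # 40.0 False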

How do you guys go about this? I’d love to hear.

 


NDX-100 Constituents

A couple of months ago I mentioned the importance of survivorship bias when testing strategies that trade equities. I thought it would help traders achieve better testing results if they could build the constituent lists themselves. I got from my university a list of NDX-100 constituents going all the way back to 1995. Those of you who are interested can email me if you want a copy. I would have uploaded it, but I can’t with my current WordPress account.

email: michaelguan326@gmail.com

 

Survivorship Bias

For the past month and a half, I have been in the process of constructing a survivorship-bias-free database for the NDX 100. I am currently about 90% done, but I have to say this is no easy task.

According to wikipedia:

“Survivorship bias is the logical error of concentrating on the people or things that “survived” some process and inadvertently overlooking those that didn’t because of their lack of visibility. This can lead to false conclusions in several different ways. The survivors may literally be people, as in a medical study, or could be companies or research subjects or applicants for a job, or anything that must make it past some selection process to be considered further.”

Coming from testing futures trading systems, I initially failed to grasp the need to test trading systems on a survivorship-bias-free database. But from numerous sources and evidence, I found that testing systems on biased data overstates results significantly. I suggest that the trader/researcher give this topic a good, thorough thought and check whether they are employing a survivorship-bias-free database.

For those who don’t want to create their own, Frank Hassler’s blog, Engineering Returns, offers two databases for some cash.

Good trading,
SE

Charting

CSI offers end-of-day (EOD) data at a cost. NinjaTrader paired with Kinetick is free. Now I have access to end-of-day data plus charting for literally all futures markets at my fingertips, for free 😀