Some mathematical constructs appear so often across trading strategies that they evolve into mental models for quantitative trading. Each of these constructs deserves its own post, given its fundamental role in algorithmic trading.

Residuals are one of these wonderful constructs.

Many market-neutral strategies are derived from residuals, particularly the arbitrage family: volatility arbitrage, dispersion arbitrage, index arbitrage, basket arbitrage, correlation arbitrage, etc. Residuals also open up an infinite universe of investable assets for trading, rather than just the instruments for which quotes are available from Yahoo or Bloomberg.

Conceptually, a residual is deceptively simple:

A residual is the difference between a sample and its corresponding estimated function value.

For example, the Mean Reversion post estimated the following linear relationship between SPY and PRF during the 2007-2008 period:

`PRF = -7.356 + (0.455 * SPY)`
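As a rough illustration of where such coefficients might come from, here is a minimal sketch of an ordinary least squares fit in plain Python. It assumes OLS is the estimator (the later mention of "non-OLS residuals" suggests so, but the original fit is not shown here). The price series are made-up numbers generated around the quoted line, not real SPY or PRF quotes, and `ols_fit` is a hypothetical helper name.

```python
import random

def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up data for illustration only, NOT real market quotes:
# SPY drawn uniformly, PRF placed on the quoted line plus Gaussian noise.
rng = random.Random(0)
spy = [rng.uniform(80.0, 150.0) for _ in range(500)]
prf = [-7.356 + 0.455 * s + rng.gauss(0.0, 0.5) for s in spy]

intercept, slope = ols_fit(spy, prf)
print(f"PRF = {intercept:.3f} + ({slope:.3f} * SPY)")
```

On this synthetic data the fit should land close to the coefficients used to generate it. With real quotes the estimates depend on the sample window chosen.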

This function generates an estimated value of PRF, given a value for SPY. Translating this into trading: buy one share of PRF, short 0.455 shares of SPY, and hold \$7.356 in cash. This combination forms a basket (a generalization of the classic pair), which is a synthetic instrument. The synthetic can be bought or sold at an exchange at any time, just like any traditional asset.

Given this synthetic, the first question is: how is it priced? You guessed it: the price of the synthetic is equal to the residual! That is, it equals the difference between the actual price of PRF (the sample) and the value estimated by the equation. Rearranging the function to solve for the residual term (ε) gives the price of the synthetic:

`ε = PRF - (0.455 * SPY) + 7.356`
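To make the pricing concrete, the following sketch evaluates the residual for a single pair of quotes. The coefficients come from the equation above. The quotes (PRF at 38.50, SPY at 100.00) and the function name `synthetic_price` are hypothetical, chosen only for illustration.

```python
def synthetic_price(prf, spy, intercept=-7.356, slope=0.455):
    """Residual: the actual PRF price minus the PRF value estimated from SPY."""
    estimated_prf = intercept + slope * spy
    return prf - estimated_prf

# Hypothetical quotes, not real market data.
eps = synthetic_price(prf=38.50, spy=100.00)
print(f"{eps:.3f}")  # prints 0.356
```

Here the estimated PRF is 38.144, so the synthetic is priced at about 0.356. A negative residual would mean PRF is trading below its estimated value.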

This innocent-looking equation belies wonder, and is worth briefly dwelling upon: a residual prices a synthetic linear combination of assets (SPY, PRF, and cash in this example). Similarly, the profit from holding this basket between yesterday and today equals today’s value of the residual minus yesterday’s value. More generally, the profit from holding the synthetic can be calculated between any two times t and t+1:

`profit = ε_{t+1} - ε_t`
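A short sketch of the profit calculation, using hypothetical quotes at two consecutive times. It also checks a consequence of the algebra: with fixed coefficients the intercept cancels in the difference, so the profit equals the change in PRF minus 0.455 times the change in SPY. The names `residual`, `profit`, and `leg_pnl` are illustrative only.

```python
SLOPE = 0.455
INTERCEPT = -7.356

def residual(prf, spy):
    """Price of the synthetic: actual PRF minus the value estimated from SPY."""
    return prf - (INTERCEPT + SLOPE * spy)

# Hypothetical (PRF, SPY) quotes at times t and t+1, not real market data.
prf_t, spy_t = 38.50, 100.00
prf_t1, spy_t1 = 39.10, 101.00

# Profit as the change in the residual between the two times.
profit = residual(prf_t1, spy_t1) - residual(prf_t, spy_t)

# The same profit computed leg by leg: long 1 PRF, short 0.455 SPY.
# The constant cash term drops out of the difference.
leg_pnl = (prf_t1 - prf_t) - SLOPE * (spy_t1 - spy_t)

print(f"{profit:.3f} {leg_pnl:.3f}")  # prints 0.145 0.145
```

This sketch ignores transaction costs, financing, and any re-estimation of the coefficients between the two times.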

This simple example hints at a very deep generalization, owing to basic algebra: an infinite number of linear combinations exist for two or more assets, let alone for the investable universe of assets (worldwide equities, commodities, FX, and corresponding derivatives). If there are an infinite number of linear combinations, then there are an infinite number of residuals. If there are an infinite number of residuals, then there are an infinite number of synthetics.

In other words, residuals open the door to investing in an infinite number of potential assets, rather than the mere thousands in the traditional investable universe. Pretty amazing for the little Greek letter epsilon, which statistics ironically refers to as the “error”.

Synthetics are just one wonderful fruit of residuals. Subsequent Wonder posts will cover non-OLS residuals (e.g. generalized regression, state space models, and wavelets) and techniques for using residuals to choose trades amongst this infinite-sized universe of investments and evaluate their stability via techniques such as variance analysis and structural change (e.g. variance ratio test, ACF, Chow / Quandt tests, and iterative coefficient estimates).

1. November 19, 2009 11:41 am

Quantivity,

Good work… The thing though … Infinite number of trading instruments does not make trading easier… It actually makes decision making process harder… I prefer simplicity and mastering trading “portfolio of algos” against one instrument rather than portfolio of instruments… Less randomness … I.e., you are trying to crack one nut with whole toolbox…

Inspyring post for sure

2. August 10, 2011 7:20 pm

Great post! I read this post awhile ago and have been thinking about it on and off over the past few months. I’m not sure whether you actually trade residuals but if you do have experience in this area I would greatly appreciate if you could comment on how the beta estimates hold up out of sample. Also how often do you perform a *re-estimation* of the betas?

I have conceptually thought about this for awhile and tried to tinker with baskets in the past but I found that the relationship (cointegration) fell apart out of sample. I thought maybe the answer was to re-estimate at the end of each bar (whatever time frequency you are using – I was using daily bars) but that did not work as the value of the coefficients changed often and sometimes experienced trending like behavior. For some reason I am convinced that there is some edge here but so far I have been unsuccessful in finding it. If you can provide some guidance I would greatly appreciate that. You can email me or reply via the blog. Thank you.
