Unmasking a phenomenon $f$ into its constituent parts $\textbf{g}$, via functional decomposition $\phi$, is one of the great beauties of mathematics:

$f(\textbf{x}) = \phi(g_1(\textbf{x}), g_2(\textbf{x}), \dots, g_n(\textbf{x}))$

This technique is used surprisingly often in quant models.

Ongoing analysis and trading based on proxy hedging, exemplified by the series beginning with Proxy / Cross Hedging, suggests the potential for an equity decomposition model based on the relationship between the returns of a stock $r_t$ and its corresponding index $i_t$:

$r_t = s_t \left[ \alpha_t | z_t | + (1 - \alpha_t) \beta | i_t | \right] + \epsilon_t$
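To make the notation concrete, here is a minimal Python sketch (assuming NumPy) that evaluates the right-hand side for given values of $s_t$, $\alpha_t$, $z_t$, $i_t$, $\beta$ and $\epsilon_t$. The function name `model_return` and its input checks are illustrative choices, not part of the original model.

```python
import numpy as np


def model_return(s, alpha, z, i, beta, eps=0.0):
    """Evaluate r_t = s_t [alpha_t |z_t| + (1 - alpha_t) beta |i_t|] + eps_t.

    Inputs may be scalars or equal-length NumPy arrays (evaluated elementwise).
    """
    s = np.asarray(s, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    # s_t is a pure sign and alpha_t a weight, so reject values outside their domains.
    if not np.all(np.isin(s, (-1.0, 1.0))):
        raise ValueError("s_t must be -1 or +1")
    if np.any((alpha < 0.0) | (alpha > 1.0)):
        raise ValueError("alpha_t must lie in [0, 1]")
    # Magnitude: alpha-weighted blend of the idiosyncratic and beta-scaled index magnitudes.
    magnitude = alpha * np.abs(z) + (1.0 - alpha) * beta * np.abs(i)
    return s * magnitude + eps


# Uninformed regime (alpha_t = 0): with s_t = sign(i_t) the result is beta * i_t.
print(model_return(-1, 0.0, 0.05, -0.01, 1.2))
# Informed regime (alpha_t = 1): with s_t = sign(z_t) the result is z_t.
print(model_return(1, 1.0, 0.05, -0.01, 1.2))
```

The first call gives approximately $-0.012 = \beta i_t$ and the second gives $0.05 = z_t$, matching the two regimes described below.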

To explain this model, let’s build it up from intuition.

To begin, consider a trading observation: interday returns of individual stocks have a subtle relationship with their corresponding index. On some days, the return of a given stock follows its index; on other days, the returns of stock and index diverge strongly. This difference in behavior is commonly attributed to stock-specific “news”, interpreted broadly, whether known publicly or only privately.

This intuition can be formalized as a two-state regime model:

• Uninformed regime: stock return $r_t$ follows an index $i_t$, scaled by a proportional factor $\beta$
• Informed regime: stock return $r_t$ follows an idiosyncratic path $z_t$, conditionally independent of its index

The relationship between regimes can be modeled in two ways via $\alpha$. A switching model arises when regimes are binary: $\alpha \in \{ 0, 1 \}$. An ensemble model arises when regimes are smooth: $\alpha \in [ 0, 1 ]$. For the latter, $\alpha$ can be understood as a proportional weighting between the two return series, and thus provides smooth mixing between the regimes. Finally, the sign of returns is explicitly decomposed as $s_t \in \{ -1, 1 \}$, acknowledging the greater regularity of absolute-valued return series.
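The switching and ensemble variants can be compared by simulation. The sketch below (Python with NumPy) is a toy generator built on hypothetical assumptions not taken from the post: Gaussian index and idiosyncratic returns, a Bernoulli $\alpha_t$ for the switching case, a Beta(1, 4) $\alpha_t$ for the ensemble case, and $s_t$ set to the sign of the blended signed path $\alpha_t z_t + (1 - \alpha_t) \beta i_t$. The function name and all default parameters are illustrative.

```python
import numpy as np


def simulate_regimes(n, beta, mode, p_informed=0.2, index_vol=0.01,
                     idio_vol=0.02, noise_vol=0.001, seed=0):
    """Toy simulation of r_t = s_t [alpha_t |z_t| + (1 - alpha_t) beta |i_t|] + eps_t.

    mode="switching": alpha_t in {0, 1}, informed with probability p_informed.
    mode="ensemble":  alpha_t in [0, 1], drawn from Beta(1, 4) (mean 0.2).
    Assumption (not from the post): s_t is the sign of
    alpha_t z_t + (1 - alpha_t) beta i_t, with ties mapped to +1.
    """
    rng = np.random.default_rng(seed)
    i = rng.normal(0.0, index_vol, n)   # hypothetical Gaussian index returns
    z = rng.normal(0.0, idio_vol, n)    # hypothetical Gaussian idiosyncratic path
    if mode == "switching":
        alpha = (rng.random(n) < p_informed).astype(float)
    elif mode == "ensemble":
        alpha = rng.beta(1.0, 4.0, n)
    else:
        raise ValueError("mode must be 'switching' or 'ensemble'")
    blended = alpha * z + (1.0 - alpha) * beta * i
    s = np.where(blended >= 0.0, 1.0, -1.0)
    eps = rng.normal(0.0, noise_vol, n)
    r = s * (alpha * np.abs(z) + (1.0 - alpha) * beta * np.abs(i)) + eps
    return r, i, z, alpha


for mode in ("switching", "ensemble"):
    r, i, z, alpha = simulate_regimes(5000, beta=1.2, mode=mode)
    print(mode, "corr(r, i) =", round(float(np.corrcoef(r, i)[0, 1]), 3))
```

With this sign rule, the switching case reproduces the two regimes exactly: $r_t = \beta i_t + \epsilon_t$ when $\alpha_t = 0$ and $r_t = z_t + \epsilon_t$ when $\alpha_t = 1$. The ensemble case produces returns that only partially track the index on every day.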

Note that the following are latent variables: the idiosyncratic path $z_t$ from the informed regime, the proportional factor $\beta$, and the regime parameter $\alpha$. Obviously, the challenge of this model lies in their estimation. One potential trick is to exploit triangular relationships, as described below.

One stylized fact not explicitly accommodated by this model is the well-known asymmetry of uninformed regimes, apparent from analysis of market breadth: stocks often go down together uniformly (think big down days), but much less often go up together uniformly (the majority of rallies). It is unclear whether this fact arises naturally via $\alpha$ or needs to be modeled explicitly.
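The breadth asymmetry can be examined empirically before deciding how to model it. A minimal sketch, assuming a NumPy array of daily stock returns (days × stocks) and the matching index returns, measures breadth as the fraction of stocks whose return has the same sign as the index, averaged separately over down-index and up-index days. This is one simple breadth proxy among many, and the function name is hypothetical.

```python
import numpy as np


def breadth_by_index_direction(stock_returns, index_returns):
    """Mean fraction of stocks moving with the index, split by index direction.

    stock_returns: shape (n_days, n_stocks); index_returns: shape (n_days,).
    Returns (mean breadth on down-index days, mean breadth on up-index days);
    a value is NaN if there are no days of that direction.
    """
    stock_returns = np.asarray(stock_returns, dtype=float)
    index_returns = np.asarray(index_returns, dtype=float)
    # True where a stock's return has the same sign as that day's index return.
    same_sign = np.sign(stock_returns) == np.sign(index_returns)[:, None]
    breadth = same_sign.mean(axis=1)  # per-day fraction moving with the index
    down = index_returns < 0.0
    up = index_returns > 0.0
    return float(breadth[down].mean()), float(breadth[up].mean())


# Toy inputs for illustration only (4 days x 4 stocks), not market data.
stocks = np.array([[-0.01, -0.02, -0.03, -0.01],
                   [ 0.01, -0.01,  0.02, -0.02],
                   [-0.02, -0.01, -0.01,  0.01],
                   [ 0.02,  0.01, -0.01, -0.01]])
index = np.array([-0.02, 0.005, -0.01, 0.004])
print(breadth_by_index_direction(stocks, index))  # (0.875, 0.5)
```

On real data, a consistently higher down-day value than up-day value would be one way to see the asymmetry described above.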

Readers familiar with machine learning (ML) may recognize how to reformulate this as an additive model:

$r(\textbf{x}) = \sum\limits_{k=1}^2 w_k f_k(\textbf{x})$

Where $\textbf{x} \equiv \{ z, i, \alpha, \beta \}$.
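One possible mapping onto this form (an illustration, not necessarily the one the author intends) drops the noise term $\epsilon_t$ and folds $\beta$ into the second weight:

$r_t \approx \underbrace{\alpha_t}_{w_1} \underbrace{s_t |z_t|}_{f_1} + \underbrace{(1 - \alpha_t) \beta}_{w_2} \underbrace{s_t |i_t|}_{f_2}$

Expanding the right-hand side gives $s_t \left[ \alpha_t | z_t | + (1 - \alpha_t) \beta | i_t | \right]$, which is the original model without $\epsilon_t$.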

This model can be interpreted in numerous ways within ML, depending on the desired objective. For example, $z$ and $i$ can be interpreted as basis functions. Alternatively, boosting can be applied by interpreting them as weak learners. Graphical models can be applied by introducing conditional dependence between $r$, $z$, and $i$. Hierarchical models and decision trees naturally arise when $z$ and $i$ are further functionally decomposed.
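As a small illustration of the basis-function reading, the sketch below regresses $|r_t|$ on $|z_t|$ and $|i_t|$ by least squares without an intercept. With constant $\alpha$ and $\beta$ and no noise, the weights come out as $w_1 = \alpha$ and $w_2 = (1 - \alpha)\beta$, so both parameters can be recovered. The key assumption, which does not hold in practice, is that $z_t$ is observed; here it is synthetic, and all values and names are hypothetical.

```python
import numpy as np


def fit_basis_weights(abs_r, abs_z, abs_i):
    """Least-squares weights for |r_t| ~ w1 |z_t| + w2 |i_t| (no intercept)."""
    X = np.column_stack([abs_z, abs_i])
    w, *_ = np.linalg.lstsq(X, abs_r, rcond=None)
    return w


# Synthetic, noise-free check with constant alpha and beta, treating z as observed.
rng = np.random.default_rng(1)
alpha_true, beta_true = 0.3, 1.5
z = rng.normal(0.0, 0.02, 500)
i = rng.normal(0.0, 0.01, 500)
abs_r = alpha_true * np.abs(z) + (1.0 - alpha_true) * beta_true * np.abs(i)

w1, w2 = fit_basis_weights(abs_r, np.abs(z), np.abs(i))
alpha_hat = w1                  # w1 = alpha
beta_hat = w2 / (1.0 - w1)      # w2 = (1 - alpha) * beta
print(alpha_hat, beta_hat)
```

With real data, $z_t$ is latent and $\alpha_t$ varies over time, so this only shows how the weights relate to the model parameters, not how to estimate them.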

Given this model, an interesting question is how to use it predictively, whether directionally or not. For example, models for two stocks that share a common index could be combined to introduce the notion of equity triangle arbitrage on the joint $z$.
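One simple way to make the two-stock idea concrete, offered here as a stand-in and not as the post's definition of triangle arbitrage, is a beta-normalized spread $d_t = r^A_t / \beta_A - r^B_t / \beta_B$. If both stocks are uninformed ($\alpha_t = 0$, $\epsilon_t = 0$), each return equals $\beta i_t$ and the spread is zero, so a nonzero spread is driven by the stocks' idiosyncratic components (plus noise). The sketch assumes $\beta_A, \beta_B > 0$ have already been estimated; the function names and the $k$-standard-deviation threshold are hypothetical.

```python
import numpy as np


def index_neutral_spread(r_a, r_b, beta_a, beta_b):
    """Beta-normalized spread d_t = r_a / beta_a - r_b / beta_b (assumes betas > 0).

    Zero when both stocks follow the index exactly (r = beta * i_t), so nonzero
    values reflect idiosyncratic components and noise.
    """
    return np.asarray(r_a, dtype=float) / beta_a - np.asarray(r_b, dtype=float) / beta_b


def flag_divergence(spread, k=2.0):
    """Mark days where |d_t| exceeds k sample standard deviations of the spread."""
    spread = np.asarray(spread, dtype=float)
    return np.abs(spread) > k * spread.std()


# Two uninformed stocks on the same (toy) index returns give a zero spread.
i = np.array([0.01, -0.02, 0.005])
print(index_neutral_spread(1.2 * i, 0.8 * i, 1.2, 0.8))
```

Flagged days are candidates for stock-specific news in one or both names; whether they carry predictive value is exactly the open question raised above.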

1. December 15, 2011 7:58 am

Interesting post. I’m familiar with ML and it’s not immediately obvious why you can’t estimate/approximate the series using a variety of methods that are empirically motivated. But I haven’t read enough of your work to have a good sense of your style yet.

December 15, 2011 10:58 am

@Stephen: thanks for the comment. To clarify my note, my sense is that estimation is interesting due to two considerations: (a) the lack of a parsimonious way to estimate the model via a single algorithm and (b) potentially differing time scales per variable. Instead, the model may be best estimated in multiple steps. For example, consider starting by estimating beta using weekly data; then estimate alpha using daily data, and then back out z (at any frequency). Of course, a key estimation question is deciding which latent variables to estimate versus which to back out.

Perhaps this is what you mean by “empirically motivated methods”?
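The multi-step procedure described above might be sketched as follows in Python with NumPy. The helper names, the weekly aggregation of daily log returns, and the OLS slope for $\beta$ are illustrative assumptions, not the author's estimators. $\alpha_t$ is taken as an input from whatever daily estimator is chosen, since no method is specified. Only the magnitude $|z_t|$ is backed out, and $\epsilon_t$ is ignored.

```python
import numpy as np


def to_weekly(daily_log_returns, days_per_week=5):
    """Sum daily log returns into non-overlapping weeks (drops a partial final week)."""
    x = np.asarray(daily_log_returns, dtype=float)
    n = len(x) // days_per_week * days_per_week
    return x[:n].reshape(-1, days_per_week).sum(axis=1)


def estimate_beta(r, i):
    """Placeholder step 1: OLS slope through the origin of r on i."""
    r, i = np.asarray(r, dtype=float), np.asarray(i, dtype=float)
    return float(np.dot(i, r) / np.dot(i, i))


def back_out_abs_z(r, i, alpha, beta, min_alpha=1e-6):
    """Step 3: solve |r_t| = alpha_t |z_t| + (1 - alpha_t) beta |i_t| for |z_t|.

    Ignores eps_t, so values can come out negative on noisy data (not clipped).
    Returns NaN where alpha_t is (near) zero, since z_t is unidentified there.
    """
    r, i, alpha = (np.asarray(a, dtype=float) for a in (r, i, alpha))
    resid = np.abs(r) - (1.0 - alpha) * beta * np.abs(i)
    safe_alpha = np.where(alpha > min_alpha, alpha, 1.0)  # avoid division by zero
    return np.where(alpha > min_alpha, resid / safe_alpha, np.nan)
```

If informed returns are uncorrelated with the index, days in the informed regime dilute the plain OLS slope toward a value smaller than $\beta$, so `estimate_beta` is best read as a starting point rather than an unbiased estimator.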

• December 15, 2011 11:35 am

Yes, I’m jumping ahead. When you begin to use different distributions to estimate the parameters, ultra-high-frequency data (UHFD) has properties, including variance and covariance, that are extremely different from those of daily, weekly, monthly, quarterly, etc. data. The implication I see is that the estimation process will probably not be grounded in the same event space, which will complicate the model to the point that selecting estimation methods through, for example, cross-validation may be more effective than trying to justify the selection of, say, a graphical model or boosting.

December 15, 2011 1:01 pm

@Stephen: indeed; one of the more interesting questions is the preferred time scale on which to conduct each estimation step. My sense is that intraday data is sufficient for descriptive estimation, and that building on it at interday frequency may be interesting for predictive estimation. Perhaps a topic worth a follow-up post.

I will ping you privately to discuss in more detail.

2. December 16, 2011 11:18 am

Sorry, but I’m not really familiar with all the terminology that you’re using. Can you explain what s_t is and what ML is?

December 16, 2011 3:23 pm

@s: s_t is the integer-valued sign of the return at time t (i.e., 1 or -1); ML = machine learning. Updated post to clarify both.

3. December 16, 2011 3:21 pm

ML = machine learning

I assumed s_t is a stock’s price at time t.