
Return Decomposition via Mixing

December 28, 2011

A variety of techniques exist for estimating parameters of the return decomposition model, previously introduced in Index Return Decomposition. This post considers estimation of an independent mixture model via maximum likelihood (MLE), a workhorse of frequentist statistics and always a nice place to begin.

Recall z is unobserved, and thus the model cannot be estimated directly via MLE; hence, a decision is required on how to approach estimation of this latent variable. One way is to be naive, and simply assume z is the deterministic difference in return between stock and index (technically, this generates a profile likelihood as formalized by Severini and Wong [1992], which Murphy and Van Der Vaart [2000] verify is well-behaved and consistent with the exact likelihood):

   z_t = r_t - i_t

This assumption permits focus on estimating \alpha, providing insight into the mixing behavior of the return being decomposed: if a stock return behaves like its index, then mixing is low and \alpha is small (in the limit, \alpha = 0 when a stock behaves identically to its index, as no mixing is required); in contrast, if the stock return regularly behaves independently of its index, then mixing is high and \alpha is large.

Conditional dependence often exists between i and z, consistent with previous posts in the Proxy / Cross Hedging series (as illustrated by the r-z copula for the CRM / QQQ example). Autocorrelation of i and z is also worth consideration, as it helps determine whether time indexing is required for \alpha: for returns with insignificant autocorrelation (common for signed equity returns), the time index can be dropped and a single \alpha estimated.
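This autocorrelation check is straightforward in R via the Ljung-Box test (a minimal sketch, assuming r and i are numeric vectors of stock and index returns):

# test for autocorrelation in the index and in the naive z = r - i;
# insignificant p-values support estimating a single time-invariant alpha
z <- r - i
Box.test(i, lag=10, type="Ljung-Box")
Box.test(z, lag=10, type="Ljung-Box")
acf(z)  # visual confirmation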

Use of this identity for z_t transforms the decomposition model into:

   r_t = s_t \left[ \alpha_t |r_t - i_t| + (1 - \alpha_t) \beta |i_t| \right]

The model is further simplified into a familiar independent mixture model by dropping the sign s_t and \beta, and estimating via MLE using mixture density f with component densities \phi_i:

   f(r_t; \boldsymbol{\theta}) = \alpha \phi_1(r_t - i_t) + (1 - \alpha) \phi_2(i_t)
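To make the mixture concrete, this density can be written directly in R; a sketch assuming normal components (mixDensity is an illustrative helper, not part of the code listing below):

# mixture density: f(r_t) = alpha * phi1(r_t - i_t) + (1 - alpha) * phi2(i_t)
mixDensity <- function(r, i, alpha, mu1, s1, mu2, s2) {
  alpha * dnorm(r - i, mean=mu1, sd=s1) + (1 - alpha) * dnorm(i, mean=mu2, sd=s2)
}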

MLE estimation requires assuming parametric distributions for the \phi_i; common choices from the literature are normal, Student-t, skew-t, or skew hyperbolic Student-t (Aas and Haff [2006]). The next question is how to estimate the \boldsymbol{\theta} parameters: \alpha and the family of \phi_i parameters (e.g. \mu_i and \sigma_i if \phi_i is assumed to be normal). As i_t is observed, one way to proceed is via two-step estimation:

  1. Estimate \phi_2 parameters via MLE from i_t
  2. Jointly estimate \alpha and \phi_1 parameters via MLE on the mixture, holding \phi_2 parameters constant

For both steps, recall the likelihood \mathcal{L} and log likelihood \ln \mathcal{L} are defined as:

   \mathcal{L} = \prod\limits_{t=1}^T f(r_t; \boldsymbol{\theta})

   \ln \mathcal{L} = \sum\limits_{t=1}^T \ln f(r_t; \boldsymbol{\theta})
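In R, the log likelihood is simply a sum of log densities; for example, under a normal assumption with hypothetical parameters mu and sigma:

# log=TRUE sums log densities directly, avoiding underflow from
# multiplying many small density values
logLike <- sum(dnorm(r, mean=mu, sd=sigma, log=TRUE))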

MLE of the mixture is then maximization of the log likelihood over \boldsymbol{\theta}, where the log is chosen for numeric stability:

   \displaystyle \max_{\boldsymbol{\theta}} \ln \mathcal{L} = \max_{\boldsymbol{\theta}} \sum\limits_{t=1}^T \ln \left[ \alpha \phi_1(r_t - i_t) + (1 - \alpha) \phi_2(i_t) \right]

This optimization can be performed numerically in R by minimizing the negative log likelihood negLogLikeFun using DEoptim (the negation is due to DEoptim performing minimization, versus maximization of \ln \mathcal{L}). DEoptim is chosen due to its rapid convergence on non-smooth global optimization problems.
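For readers unfamiliar with DEoptim, it requires only an objective function and box constraints, with no gradient or starting values; a toy sketch minimizing a non-smooth function:

# minimize |p1 - 1| + |p2 + 2| over the box [-10, 10]^2
obj <- function(p) abs(p[1] - 1) + abs(p[2] + 2)
opt <- DEoptim(obj, lower=c(-10, -10), upper=c(10, 10), control=list(trace=FALSE))
opt$optim$bestmem  # approximately (1, -2)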

For example, continuing the CRM / QQQ example introduced in the previous posts on Proxy / Cross Hedging generates the following results:

> symbols <- c("CRM", "QQQ")
> endDate <- Sys.Date()
> startDate <- endDate - as.difftime(52*5, unit="weeks")
> quoteType <- "Close"
> p <- do.call(cbind, lapply(symbols, get.hist.quote, start=startDate, end=endDate,  quote=quoteType))
> colnames(p) <- symbols
> doReturnDecomp(p)
normal mix likelihood: -3485.55 phi2 params: 0.0003471366 0.01673634 alpha, phi1 params: 0.2546208 -0.001208877 0.0113988
skew-t mix likelihood: -3566.512 phi2 params: 0.003844969 0.01079941 -0.3252923 2.893228 alpha, phi1 params: 0.2357737 -0.004188977 0.01099266 0.4157174 26.5643
skew-hyp-t likelihood: -3083.700 phi2 params: 0.01051675 0.1485529 -3.945452 10.10836 alpha, phi1 params: 0.8295289 -0.0003636940 0.03071332 -0.5 5

These results correspond to the following density functions for the skew-t mixture (the plot produced by mixtureSkewTIndexDecomp).

One interesting observation of these densities is that their location parameters lie on opposing sides of zero: the index \phi_2 has positive location, while the idiosyncratic \phi_1 (the z component) has negative location. One interpretation is that positive returns from CRM disproportionately originate from the index i_t, while negative returns originate from the idiosyncratic z_t; economically, this suggests market-wide moves drove gains over this sample, while idiosyncratic news skewed negative.

Several additional inferences can be drawn from these results:

  • Model selection: likelihood favors the skew-t model, indicating long tails and skewness (matching stylized facts); see the AIC sketch following this list
  • Mixing: \alpha = 0.24, indicating over 75% of CRM returns are determined by the corresponding QQQ index; the remaining 25% are determined by the unobserved z return series
  • Tails: the index fit \phi_2 has df = 2.89, indicating significantly thicker tails than the idiosyncratic \phi_1 with df = 26.56, suggesting tail risk over this sample was predominantly market-wide
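To make the model selection bullet concrete, the printed objective values (minimized negative log likelihoods) convert to AIC via AIC = 2k + 2 NLL; a rough sketch, assuming k = 5 parameters for the normal mixture and k = 9 for the skew-t and skew hyperbolic mixtures, and ignoring the subtleties of two-step estimation:

# lower AIC is better; skew-t wins, agreeing with the raw likelihood ranking
nll <- c(normal=-3485.55, skewt=-3566.512, skewhypt=-3083.700)
k <- c(normal=5, skewt=9, skewhypt=9)
2 * k + 2 * nll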

Subsequent posts may consider alternative estimation techniques for this model.


R code for generating two-step MLE estimation of return decomposition via mixing:

library("MASS")
library("stats")
library("DEoptim")
library("sn")
library("SkewHyperbolic")

normalMixtureIndexDecomp <- function(r, i)
{
  # Two-step MLE estimation of return decomposition model, assuming both
  # return distributions are normal.
  #
  # Args:
  #   r: return series being decomposed
  #   i: index series used for decomposition
  #
  # Return value: MLE parameter estimates
  
  z <- r - i
  id <- fitdistr(i, "normal")$estimate
  negLogLikeFun <- function(p) {
    a <- p[1]; mu1 <- p[2]; s1 <- p[3];
    ll <- (-sum(log(a * dnorm(z,mu1,s1) + (1 - a) * dnorm(i, id[1], id[2]))));
    return (ll); 
  }
  mle <- DEoptim(negLogLikeFun, c(0, -0.5, 0), c(1, .5, .5), control=list(trace=FALSE))
  
  cat("normal mix likelihood:", last(mle$member$bestvalit), "phi1 params:",id, "params", last(mle$member$bestmemit),"\n")
  mle <- last(mle$member$bestmemit)
  
  x <- seq(-.25,.25,length.out=500)
  dnorm1 <- dnorm(x, id[1], id[2])
  dnorm2 <- dst(x, mle[2], mle[3])
  plot(x, dnorm1, type='l', ylim=c(0, max(dnorm1,dnorm2)), ylab="Density", main="Normal Mixture")
  lines(x, dnorm2, col='red')
  abline(v=id[1], lty=2)
  abline(v=mle[2], col='red', lty=2)
  legend("topleft",legend=c("phi1", "phi2"), fill=colors, cex=0.5)
    
  return (mle)
}

mixtureSkewTIndexDecomp <- function(r, i)
{
  # Two-step MLE estimation of return decomposition model, assuming both
  # return distributions are skew-t.
  #
  # Args:
  #   r: return series being decomposed
  #   i: index series used for decomposition
  #
  # Return value: MLE parameter estimates

  z <- r - i
  idp <- st.mle(y=i)$dp
  negLogLikeFun <- function(p) {
    a <- p[1]; mu1 <- p[2]; s1 <- p[3]; s2 <- p[4]; df1 <- p[5]
    ll <- (-sum(log(a * dst(z,location=mu1,scale=s1,shape=s2,df=df1) + (1 - a) * dst(i, dp=idp))));
    return (ll); 
  }
  mle <- DEoptim(negLogLikeFun, c(0, -0.5, 0, 0, 2), c(1, .5, .5, 5, 50), control=list(trace=FALSE))
  
  cat("skew-t mix likelihood:", last(mle$member$bestvalit), "phi1 params:", idp, "params", last(mle$member$bestmemit),"\n")
  mle <- last(mle$member$bestmemit)

  
  x <- seq(-.25,.25,length.out=500)
  dst1 <- dst(x, dp=idp)
  dst2 <- dst(x, dp=mle[2:5])
  plot(x, dst1, type='l', ylim=c(0, max(dst1,dst2)), ylab="Density", main="Skew T Mixture")
  lines(x, dst2, col='red')
  abline(v=idp[1], lty=2)
  abline(v=mle[2], col='red', lty=2)
  legend("topleft",legend=c("phi1", "phi2"), fill=colors, cex=0.5)
  
  return (mle)
}

mixtureSkewHypTIndexDecomp <- function(r, i)
{
  # Two-step MLE estimation of return decomposition model, assuming both
  # return distributions are skew hyperbolic student-t.
  #
  # Args:
  #   r: return series being decomposed
  #   i: index series used for decomposition
  #
  # Return value: MLE parameter estimates

  z <- r - i
  iparam <- skewhypFit(i,plots=FALSE,printOut=FALSE)$param
  negLogLikeFun <- function(p) {
    a <- p[1];
    ll <- (-sum(log(a * dskewhyp(z,param=p[2:5]) + (1 - a) * dskewhyp(i, param=iparam))));
    return (ll); 
  }
  # note: the bounds pin the skew parameter at -0.5 (lower == upper) and cap df at 5
  mle <- DEoptim(negLogLikeFun, c(0, -5, 0, -0.5, 0), c(1, 5, .5, -0.5, 5), control=list(trace=FALSE))
  
  cat("skew-hyp-t likelihood:", last(mle$member$bestvalit), "phi1 params:",iparam,"params", last(mle$member$bestmemit),"\n")
  mle <- last(mle$member$bestmemit)

  
  x <- seq(-.25,.25,length.out=500)
  dskewhyp1 <- dskewhyp(x, param=iparam)
  dskewhyp2 <- dskewhyp(x, param=mle[2:5])
  plot(x, dskewhyp1, type='l', ylim=c(0, max(dskewhyp1,dskewhyp2)), ylab="Density", main="Skew Hyperbolic Student-T")
  lines(x, dskewhyp2, col='red')
  abline(v=iparam[1], lty=2)
  abline(v=mle[2], col='red', lty=2)
  legend("topleft",legend=c("phi1", "phi2"), fill=colors, cex=0.5)

  return (mle)
}

doReturnDecomp <- function(p)
{
  # Decompose return of two series, using several parametric distributions.
  #
  # Args:
  #   p: p[,1] is return being decomposed; p[,2] is index returns
  #
  # Return value: none

  r <- ROC(p[,1], type="discrete", na.pad=FALSE)
  i <- ROC(p[,2], type="discrete", na.pad=FALSE)
  
  normalMixtureIndexDecomp(r, i)
  mixtureSkewTIndexDecomp(r,i)
  mixtureSkewHypTIndexDecomp(r,i)
}
Comments

  1. sameersoi
     January 3, 2012 1:24 pm

     hi quantivity, nice post! the skew-t has more parameters than the normal; did you consider the possibility of over-fitting? perhaps using the Akaike or Bayesian information criterion, imperfect as they are?

     • quantivity
       January 3, 2012 8:44 pm

       @sameersoi: thanks for the compliment. Intent here is more exploratory analysis than pure statistical inference, and thus preference is given to gaining insight from parsimonious models over model selection considerations (i.e. AIC / BIC).

       Separately, overfitting is a pervasive concern with ML, worth discussion for this model (e.g. cross-validation is natural in this context), and deliberately not considered in this post for brevity.
