# Empirical Distributions: Minimum Variance

Trading experience reminds Quantivity that the distributions of security returns are rarely normal (or log-normal), despite ubiquitous mathematical presumptions to the contrary. Yet this raises an obvious question (for beginners and experts alike): *if returns are not normal, then what distribution are they*?

This question is particularly interesting in the context of minimum variance portfolios (MVPs), as they refute the risk premium and demonstrate significant performance differences compared with standard equity benchmarks. Further, attempts to build intuition about the higher and mixed moments of MVPs depend upon understanding the corresponding returns distribution.

This post analyzes empirical distributions to posit an answer, rather than theorize.

The analysis rests on two assumptions. First, *return distributions change over time*, both in central moments and shape. In other words, throw away the comforting assumption that all returns for an instrument are drawn from the same distribution. Second, *ignore theoretical probability distributions* and instead let the data speak. Although traders do not use such abstruse words, they constantly echo both sentiments.

Without further ado, consider the following *two* empirical probability density functions (PDFs), shown as solid lines, estimated from US minimum variance sector rotation for 2004 (a randomly chosen year in the sample):

This graph is delightfully surprising.

The solid red line is the *empirical* PDF of S&P 500 log returns; the solid black line is the *empirical* PDF of log returns for the minimum variance portfolio. For comparison with theory, the dashed lines are normal distributions whose parameters (*i.e.* mean and variance) are equal to those of the corresponding empirical distribution.

In other words, the observed *frequency* of returns, expressed as a density (y-axis), is plotted against the return *magnitude* (x-axis). The empirical PDFs are estimated via kernel density estimation (aka Parzen window), using the R `density` function. Under standard conditions (a bandwidth that shrinks suitably as the sample grows), the kernel estimate converges in probability to the actual PDF. Here the number of empirical observations equals the trading days in the year (roughly 250), which should be ample for the bulk of the distribution, although the far tails remain noisier.
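As a minimal sketch of the estimation step just described, the snippet below runs `density` on synthetic data standing in for one year of daily log returns. The variable names and the simulated returns are assumptions for illustration only, not the data behind the graphs.

```r
# Synthetic stand-in for ~250 daily log returns (assumption: not the post's data)
set.seed(42)
returns <- rnorm(250, mean = 0.0004, sd = 0.007)

# Gaussian kernel density estimate; "nrd0" is density()'s default bandwidth rule
kde <- density(returns, kernel = "gaussian", bw = "nrd0", n = 512)

# Trapezoid-rule check that the estimated density integrates to roughly 1
area <- sum(diff(kde$x) * (head(kde$y, -1) + tail(kde$y, -1)) / 2)

# Normal density with the same mean and standard deviation, on the same grid
normalY <- dnorm(kde$x, mean = mean(returns), sd = sd(returns))
```

Calling `plot(kde)` followed by `lines(kde$x, normalY, lty = 3)` would overlay the two curves in the manner of the solid and dashed lines described above.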

Recall 2004 was a trending market, with SP500 finishing up not quite 10%. Numerous aspects of this graph are notable (especially higher moments), demonstrating empirical divergence from theory:

- MVP bi-modality: PDF for the MVP has two modes, nearly symmetric on both sides of zero (mixture of two distributions, perhaps?)
- MVP platykurtosis: MVP exhibits a peak which is flatter and wider than its corresponding normal, and thinner tails
- SP500 negative skew: SP500 returns are negatively skewed, with bulk of the skew in the (bad) far left tail
- SP500 leptokurtosis: legendary “long tails” of SP500 returns are clearly visible, compared with corresponding normal
- SP500 tail bumps: two PDF bumps are clearly visible in SP500, at -0.01 and +0.01; the left bump is particularly disheartening
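The skew and kurtosis observations in the list above are read off the graph, but they can also be checked numerically. A minimal sketch in base R follows, using moment-based estimators; the function names and the synthetic sample are hypothetical and not part of the original analysis.

```r
# Moment-based sample skewness: third central moment over variance^(3/2)
sampleSkewness <- function(x) {
  z <- x - mean(x)
  mean(z^3) / mean(z^2)^1.5
}

# Moment-based excess kurtosis: fourth central moment over variance^2, minus 3
# (zero for a normal; negative is platykurtic, positive is leptokurtic)
sampleExcessKurtosis <- function(x) {
  z <- x - mean(x)
  mean(z^4) / mean(z^2)^2 - 3
}

# Synthetic flat-topped sample (assumption: illustration only, not real returns)
set.seed(1)
flatTopped <- runif(250, min = -0.012, max = 0.012)
flatKurtosis <- sampleExcessKurtosis(flatTopped)
flatSkewness <- sampleSkewness(flatTopped)
```

Applied to a year of daily returns, a negative excess kurtosis would agree with the platykurtosis noted for the MVP, while a negative skewness with positive excess kurtosis would agree with the SP500 observations.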

All of these observations are consistent with the *a priori* intuition expected of returns generated by smaller variance, as optimized by the MVP criterion. More importantly, the above observations suggest returns generated by the MVP should have comparatively better profit performance.

Now comes even more fun: consider the *difference in estimated densities* between MVP and SP500 distributions (calculated as the arithmetic difference between the linear interpolation of the two kernel density functions, evaluated over the observed returns range), with blue lines across both zero origins:

The shape of this differenced density is also consistent with minimizing variance: two peaks illustrate MVP returns have comparatively higher observed frequency of smaller values, for both positive and negative returns; two troughs illustrate MVP returns have comparatively lower observed frequency of higher values, for both positive and negative returns. As an unexpected treat, a small local minimum deep in the right tail indicates a few large positive surprises.
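One way to sketch the differenced density without interpolation is to evaluate both kernel estimates on a single shared grid via the `from`, `to`, and `n` arguments of `density`. This is a variation on the interpolation approach described above, not the method used for the graphs, and the series `mvp` and `bench` are synthetic placeholders.

```r
# Synthetic placeholders for two daily return series (assumption: not real data)
set.seed(7)
mvp   <- rnorm(250, mean = 0.0005, sd = 0.005)  # lower-variance series
bench <- rnorm(250, mean = 0.0003, sd = 0.008)  # higher-variance series

# Shared evaluation range covering both samples
lo <- min(mvp, bench)
hi <- max(mvp, bench)

# With identical from/to/n, both estimates are evaluated at the same x values
mvpD   <- density(mvp,   from = lo, to = hi, n = 512)
benchD <- density(bench, from = lo, to = hi, n = 512)

# Pointwise difference of the two estimated densities
densityDiff <- mvpD$y - benchD$y

# Difference at the grid point closest to a zero return
diffNearZero <- densityDiff[which.min(abs(mvpD$x))]
```

Each series still gets its own default bandwidth here; passing a common `bw` to both calls is another design choice worth considering when the two samples differ greatly in scale.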

In restricting analysis to 2004, we have yet to justify the supposition that empirical distributions change over time. To do so now, consider the same analysis as above applied to the years covered by the previously considered US Sector Rotation (*i.e.* 2000–2010):

Note the remarkable difference in density for the *mode* of each year (as all plots use same y-axis), ranging from 80 in 2007 to under 40 in 2002/2008/2009. This nicely illustrates kurtosis in action, with the tails expanding dramatically during market dislocation—consistent with the increased frequency of comparatively larger extreme values, as CNBC so loved to hype at the time.
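The peak density per year can be tabulated directly rather than read off the plots. The sketch below does so for two synthetic regimes (the names `calm` and `turbulent` and their parameters are hypothetical). Keep in mind that for a normal density the peak height is 1/(σ√(2π)), so the height of the mode reflects the scale of returns as well as the shape of the tails.

```r
# Two synthetic regimes standing in for different years (assumption: not real data)
set.seed(3)
returnsByRegime <- list(
  calm      = rnorm(250, mean = 0, sd = 0.005),
  turbulent = rnorm(250, mean = 0, sd = 0.02)
)

# Height of the mode of each kernel density estimate
peakDensity <- sapply(returnsByRegime, function(r) max(density(r)$y))

# Peak height implied by a normal with each sample's standard deviation
normalPeak <- sapply(returnsByRegime, function(r) 1 / (sd(r) * sqrt(2 * pi)))
```

Comparing `peakDensity` against `normalPeak` separates the part of the peak explained by volatility alone from the part attributable to a non-normal shape.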

And the same density differences between MVP and SP500:

To begin unraveling why Global MVP Rotation was comparatively less profitable than US Sector MVP, consider the corresponding empirical densities for the MVP generated by optimizing international ETFs:

Note the same wide variability in mode across different years, more than 3x difference: from 60 in 2006 to 20 in 2009. Positive kurtosis at work, once again. Consistent with SPY densities above, there are large bumps in the left tail for Global MVP in 2006, 2009, and 2010. Finally, the shape of MVP and EFA densities are broadly more mutually-consistent than MVP/SPY. All of these factors likely contribute to comparative underperformance versus US sectors.

And the corresponding density differences between MVP and EFA:

These distributions shed some light on the underperformance versus EFA benchmark. The 2010 density shows significant negative difference for positive returns larger than 0.02, illustrating why the MVP trailed EFA incrementally worse through the year. The large negative peak in 2009 goes a long way towards explaining why MVP did not outperform EFA during the rush to the financial crisis bottom. Finally, MVP for 2007 has a large negative peak and does not compensate with any positive peaks.

Readers interested in applying empirical distribution analysis further to portfolio optimization may wish to consider *Portfolio Selection with Robust Estimation* by DeMiguel and Nogales [2009].

The R code to generate the above empirical distribution analysis is:

```r
# Assumes dailyPnL (MVP daily returns), diffspy (benchmark daily returns), and
# annualReturnNames (the years to plot) already exist; they are not defined here.

# empirical distributions (densities) for benchmark and MVP
quartz()  # macOS graphics device; dev.new() is the portable equivalent
par(mfrow=c(3,4))
sapply(annualReturnNames, function (yr) {
  plot(density(dailyPnL[yr]), type='l', xlab="", ylab="Density", ylim=c(0,80), main=format(yr))
  lines(density(diffspy[yr]), col='red')
  # dotted lines: normal densities with matching mean and standard deviation
  curve(dnorm(x, mean=mean(dailyPnL[yr]), sd=sd(dailyPnL[yr])), col = 'black', add = TRUE, lty=3)
  curve(dnorm(x, mean=mean(diffspy[yr]), sd=sd(diffspy[yr])), col = 'red', add = TRUE, lty=3)
})

# differential empirical distributions (densities)
# requires interpolation via approxfun(), as density() generates $x at different spacing
quartz()
par(mfrow=c(3,4))
sapply(annualReturnNames, function (yr) {
  dailyd <- density(dailyPnL[yr])
  spyd <- density(diffspy[yr])
  densityX <- dailyd$x
  dailyFn <- approxfun(x=dailyd$x, y=dailyd$y)
  spyFn <- approxfun(x=spyd$x, y=spyd$y)
  densityDiff <- function(x) {dailyFn(x) - spyFn(x)}
  diffY <- densityDiff(densityX)
  plot(x=densityX, y=diffY, type='l', xlab='', ylab='Density Difference', main=format(yr))
  abline(h=0, v=0, col='blue')
})
```

Great post (again!)

Let me add two references related to the subject of this post.

In this paper:

Okhrin, Y. and W. Schmid (2006). Distributional properties of portfolio weights. Journal of Econometrics, 235–256.

the authors provide explicit expressions for the multivariate density of the global minimum-variance portfolio.

In this paper:

Kan, R. and D. R. Smith (2008). The Distribution of the Sample Minimum-Variance Frontier. Management Science 54, 1364–1380.

the authors provide finite-sample expressions for the minimum-variance frontier.

@Javier: thanks for additional references.

Have you used daily annualized returns to construct your density functions? It is not very clear from the code snippet. Thanks!

@Tanya: yes, daily returns.
