Combining Regression & Ranking
Many types of trading analysis problems boil down to a combination of regression and ranking in one guise or another, whether through classical methods or by minimizing custom loss functions. Yet the relationship between these two techniques is subtle, and their interdependence is subject to myriad practical difficulties. One familiar example is that performance on one does not imply performance on the other: excellent regression may result in poor ranking, and vice versa.
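To make the lack of performance equivalence concrete, here is a minimal sketch with made-up numbers. The helper names (`mse`, `pairwise_accuracy`) and the toy returns are assumptions for illustration only. One set of predictions sits very close to the targets but inverts their order; the other is far off in magnitude but orders every pair correctly.

```python
from itertools import combinations

def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def pairwise_accuracy(y_true, y_pred):
    """Fraction of pairs with distinct targets whose predicted order matches."""
    correct = total = 0
    for i, j in combinations(range(len(y_true)), 2):
        if y_true[i] == y_true[j]:
            continue  # ties in the target carry no ordering information
        total += 1
        if (y_true[i] - y_true[j]) * (y_pred[i] - y_pred[j]) > 0:
            correct += 1
    return correct / total

# Hypothetical targets, e.g. next-day returns for four instruments.
y_true = [0.010, 0.011, 0.012, 0.013]

# Numerically close to the targets, but in exactly the reverse order.
close_but_misordered = [0.013, 0.012, 0.011, 0.010]

# Far from the targets in magnitude, but perfectly ordered.
far_but_ordered = [0.10, 0.20, 0.30, 0.40]

for name, pred in [("close_but_misordered", close_but_misordered),
                   ("far_but_ordered", far_but_ordered)]:
    print(name, "MSE:", mse(y_true, pred),
          "pairwise accuracy:", pairwise_accuracy(y_true, pred))
```

The first predictor wins on squared error and loses every pairwise comparison; the second does the opposite. A strategy that only needs to pick the best instrument would prefer the second, while one that sizes positions by predicted magnitude would prefer the first.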
KDD-2010 recently included a paper, Combined Regression and Ranking by D. Sculley, which describes an approach that combines both techniques by optimizing a regression objective and a ranking objective simultaneously. Specifically, from p. 1:
Model that performs well on two distinct families of metrics. The first set of metrics are regression based metrics, such as Mean Squared Error, which reward a model for predicting a numerical value y′ that is near to the true target value y for a given example, and penalize predictions far from y. The second set are rank-based metrics, such as Area under the ROC curve (AUC), which reward a model for producing predicted values with the same pairwise ordering y′₁ > y′₂ as the true values y₁ > y₂ for a pair of given examples.
The purported benefits of this combined approach are particularly interesting for financial data:
- Stability: Guards against learning degenerate models that perform well on one set of metrics but poorly on another
- Non-normal distributions: Improves regression performance in the case of rare events, including long-tailed and extreme minority class distributions
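The optimization objective, equation (3) on p. 3, is:

$$\min_{w \in \mathbb{R}^m} \; \alpha L(w, D) + (1 - \alpha) L(w, P) + \frac{\lambda}{2} \lVert w \rVert_2^2$$

where $L(w, D)$ is the regression loss over the labeled data set $D$, $L(w, P)$ is the ranking loss over the set of candidate pairs $P$, and $\alpha \in [0, 1]$ is the loss weight parameter; the final term is an L2 regularization penalty with strength $\lambda$. The algorithm uses stochastic gradient descent.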
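As a rough illustration of how stochastic gradient descent can handle a weighted sum of a regression loss and a pairwise ranking loss, here is a minimal sketch. It is not the paper's implementation. The names `fit_combined`, `sgd_step`, `alpha`, `lam` and `lr` are hypothetical, and so are the choices of squared loss for both terms, a constant learning rate, uniform pair sampling, and the synthetic data. At each step it takes a regression step on a single example with probability `alpha`, and otherwise a ranking step on the difference of a randomly chosen pair.

```python
import random

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def sgd_step(w, x, y, lr, lam):
    """One SGD step on 0.5 * (y - w.x)^2 + (lam / 2) * ||w||^2."""
    err = dot(w, x) - y
    return [wi - lr * (err * xi + lam * wi) for wi, xi in zip(w, x)]

def fit_combined(X, y, alpha=0.5, lam=1e-4, lr=0.05, steps=5000, seed=0):
    """Linear model trained on a mix of regression and pairwise steps.

    alpha is the probability of taking a regression step; alpha=1 is
    pure regression and alpha=0 is pure pairwise ranking.
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    w = [0.0] * m
    for _ in range(steps):
        if rng.random() < alpha:
            # Regression step: fit one example's target directly.
            i = rng.randrange(n)
            w = sgd_step(w, X[i], y[i], lr, lam)
        else:
            # Ranking step: fit the difference of a pair, so only the
            # relative ordering (and gap) of the two examples matters.
            i, j = rng.randrange(n), rng.randrange(n)
            if y[i] == y[j]:
                continue  # equal targets give no ordering signal
            x_diff = [a - b for a, b in zip(X[i], X[j])]
            w = sgd_step(w, x_diff, y[i] - y[j], lr, lam)
    return w

# Synthetic, noise-free data from a known linear rule (illustrative only).
data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(200)]
y = [2.0 * a - 1.0 * b for a, b in X]

w = fit_combined(X, y, alpha=0.5)
print("learned weights:", w)
```

On this toy data both losses agree on the same weights, so the mix is harmless; the trade-off only becomes interesting when the two objectives pull apart, for instance with heavy-tailed targets, which is where tuning `alpha` on held-out data would matter.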