Interview: Concordant, Discordant and Tied Pairs for model validation


What are Concordant, Discordant and Tied Pairs for model validation?

A friend who interviewed with Amazon for a data-related position was asked this question. Here is a very clear explanation:

http://www.listendata.com/2014/08/modeling-tips-calculating-concordant.html

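The basic idea: put the observations whose actual label is 1 in one group (a of them) and the observations whose actual label is 0 in another group (b of them), then take the Cartesian product of the two groups to get a × b pairs. For each pair, compare the predicted value of the observation that should be 1 with the predicted value of the observation that should be 0. If the actual-1 observation has the higher predicted value, the pair is a concordant pair; if the actual-0 observation has the higher predicted value, it is a discordant pair; if the two predicted values are equal, it is a tied pair.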

What a good model looks like: more concordant pairs, fewer discordant and tied pairs.

As a rule of thumb, it is good when concordant pairs make up more than 80% of all pairs.
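The pairing procedure described above can be sketched in a few lines of Python. This is only an illustration, not code from the linked article: the function name `count_pairs` and the toy labels and scores are made up for the example, and the brute-force loop over the Cartesian product is meant for clarity rather than speed.

```python
from itertools import product


def count_pairs(actuals, scores):
    """Count concordant, discordant and tied pairs by brute force.

    `actuals` holds the true labels (1 = event, 0 = non-event) and
    `scores` holds the predicted values, in the same order.
    (Hypothetical helper, written for this illustration.)
    """
    # Split the predicted values by the actual label.
    events = [s for a, s in zip(actuals, scores) if a == 1]
    non_events = [s for a, s in zip(actuals, scores) if a == 0]

    concordant = discordant = tied = 0
    # Cartesian product: every event is compared with every non-event.
    for event_score, non_event_score in product(events, non_events):
        if event_score > non_event_score:
            concordant += 1
        elif event_score < non_event_score:
            discordant += 1
        else:
            tied += 1
    return concordant, discordant, tied


# Toy data (assumed for the example): 3 events and 2 non-events -> 6 pairs.
actuals = [1, 1, 1, 0, 0]
scores = [0.9, 0.6, 0.3, 0.6, 0.2]
print(count_pairs(actuals, scores))  # (4, 1, 1)
```

In this toy data set 4 of the 6 pairs are concordant (about 67%), which would fall short of the 80% rule of thumb mentioned above.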

Citation:

“

Steps to calculate concordance / discordance and AUC

Calculate the predicted probability in logistic regression model.
Divide the data into two datasets. One dataset contains observations having actual value of dependent variable with value 1 (i.e. event) and corresponding predicted probability values. And the other dataset contains observations having actual value of dependent variable 0 (non-event) against their predicted probability scores.
Compare each predicted value in first dataset with each predicted value in second dataset.

Total Number of pairs to compare = x * y
x: Number of observations in first dataset (actual values of 1 in dependent variable)
y: Number of observations in second dataset (actual values of 0 in dependent variable).

In this step, we are performing cartesian product (cross join) of events and non-events. For example, you have 100 events and 1000 non-events. It would create 100k (100*1000) pairs for comparison.

A pair is concordant if 1 (observation with the desired outcome i.e. event) has a higher predicted probability than 0 (observation without the outcome i.e. non-event).
A pair is discordant if 0 (observation without the desired outcome i.e. non-event) has a higher predicted probability than 1 (observation with the outcome i.e. event).
A pair is tied if 1 (observation with the desired outcome i.e. event) has same predicted probability than 0 (observation without the outcome i.e. non-event).
The final percent values are calculated using the formula below –

Percent Concordant = (Number of concordant pairs)/Total number of pairs
Percent Discordance = (Number of discordant pairs)/Total number of pairs
Percent Tied = (Number of tied pairs)/Total number of pairs
Area under curve (c statistics) = Percent Concordant + 0.5 * Percent Tied

In general, higher percentages of concordant pairs and lower percentages of discordant and tied pairs indicate a more desirable model.

”
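The percentage formulas in the quotation above can be turned into a small Python sketch. It is an assumption-laden illustration rather than the quoted article's implementation: the name `concordance_summary`, the dictionary keys and the sample data are invented here, and instead of materialising all x * y pairs it sorts the non-event scores and uses binary search, which should give the same counts with less work on large data sets.

```python
from bisect import bisect_left, bisect_right


def concordance_summary(actuals, scores):
    """Return percent concordant / discordant / tied and the c statistic.

    Hypothetical helper for illustration. `actuals` are the true labels
    (1 = event, 0 = non-event), `scores` the predicted probabilities.
    """
    events = [s for a, s in zip(actuals, scores) if a == 1]
    non_events = sorted(s for a, s in zip(actuals, scores) if a == 0)

    total = len(events) * len(non_events)  # x * y pairs
    if total == 0:
        raise ValueError("need at least one event and one non-event")

    concordant = tied = 0
    for event_score in events:
        below = bisect_left(non_events, event_score)      # strictly lower non-events
        at_or_below = bisect_right(non_events, event_score)
        concordant += below
        tied += at_or_below - below
    discordant = total - concordant - tied

    return {
        "percent_concordant": concordant / total,
        "percent_discordant": discordant / total,
        "percent_tied": tied / total,
        # AUC (c statistic) = Percent Concordant + 0.5 * Percent Tied
        "auc": (concordant + 0.5 * tied) / total,
    }


# Sample data (assumed): 2 events and 4 non-events -> 8 pairs.
sample_actuals = [1, 1, 0, 0, 0, 0]
sample_scores = [0.8, 0.5, 0.5, 0.4, 0.7, 0.1]
summary = concordance_summary(sample_actuals, sample_scores)
print(summary)
```

For this sample, 6 pairs are concordant, 1 is discordant and 1 is tied, so the sketch reports 75% concordant, 12.5% discordant, 12.5% tied and an AUC of 0.8125.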
