\documentclass{article}
%\VignetteIndexEntry{Definition of the weighted ROC curve}
\usepackage[cm]{fullpage}
\usepackage{verbatim}
\usepackage{hyperref}
\usepackage{graphicx}
\usepackage{natbib}
\usepackage{amsmath,amssymb}
\DeclareMathOperator*{\argmin}{arg\,min}
\DeclareMathOperator*{\Diag}{Diag}
\DeclareMathOperator*{\TPR}{TPR}
\DeclareMathOperator*{\FPR}{FPR}
\DeclareMathOperator*{\FN}{FN}
\DeclareMathOperator*{\FP}{FP}
\DeclareMathOperator*{\argmax}{arg\,max}
\DeclareMathOperator*{\maximize}{maximize}
\DeclareMathOperator*{\minimize}{minimize}
\newcommand{\RR}{\mathbb R}

\begin{document}

\title{Weighted ROC analysis}
\author{Toby Dylan Hocking}
\maketitle

\section{Introduction}

In binary classification, we are given $n$ observations. For each
observation $i\in \{1, \dots, n\}$ we have an input/feature
$x_i\in\mathcal X$ and an output/label $y_i\in\{-1, 1\}$. For
example, say that $\mathcal X$ is the space of all photographs, and
we want to find a binary classifier that predicts whether a
particular photograph $x_i$ contains a car ($y_i=1$) or does not
contain a car ($y_i=-1$). In weighted binary classification we also
have observation-specific weights $w_i\in\RR_+$, which give the cost
of making a prediction error on that observation. The goal is thus
to find a classifier $c:\mathcal X \rightarrow \{-1, 1\}$ that
minimizes the weighted zero-one loss on a set of test data
\begin{equation}
  \minimize_c \sum_{i\in\text{test}} I\left[ c(x_i) \neq y_i \right] w_i,
\end{equation}
where $I$ is the indicator function, which is 0 for a correct
prediction and 1 otherwise. Instead of directly learning a
classification function $c$, binary classifiers often learn a score
function $f:\mathcal X\rightarrow \RR$: observations with large
scores are more likely to be positive ($y_i=1$), and observations
with small scores are more likely to be negative ($y_i=-1$). One way
of evaluating such a model is the weighted Receiver Operating
Characteristic (ROC) curve, as explained in the next section.

\section{Weighted ROC curve}

Let $\hat y_i=f(x_i)\in\RR$ be the predicted score for each
observation $i\in\{1, \dots, n\}$, let $\mathcal I_1=\{i:y_i=1\}$ be
the set of positive examples, and let
$\mathcal I_{-1}=\{i:y_i=-1\}$ be the set of negative examples. Then
the total positive weight is $W_1=\sum_{i\in\mathcal I_1} w_i$ and
the total negative weight is
$W_{-1} = \sum_{i\in\mathcal I_{-1}} w_i$. For any threshold
$\tau\in\RR$, define the thresholding function
$t_\tau:\RR\rightarrow\{-1, 1\}$ as
\begin{equation}
  \label{eq:t_tau}
  t_\tau(\hat y) =
  \begin{cases}
    1 & \text{ if } \hat y \geq \tau \\
    -1 & \text{ if } \hat y < \tau.
  \end{cases}
\end{equation}
We define the weighted false positive count as
\begin{equation}
  \label{eq:weighted_FP}
  \FP(\tau) = \sum_{i\in\mathcal I_{-1}} I\left[ t_\tau(\hat y_i) \neq -1 \right] w_i
\end{equation}
and the weighted false negative count as
\begin{equation}
  \label{eq:weighted_FN}
  \FN(\tau) = \sum_{i\in\mathcal I_{1}} I\left[ t_\tau(\hat y_i) \neq 1 \right] w_i.
\end{equation}
We define the weighted false positive rate as
\begin{equation}
  \label{eq:weighted_FPR}
  \FPR(\tau) = \frac{1}{W_{-1}} \sum_{i\in\mathcal I_{-1}} I\left[ t_\tau(\hat y_i) \neq -1 \right] w_i
\end{equation}
and the weighted true positive rate as
\begin{equation}
  \label{eq:weighted_TPR}
  \TPR(\tau) = \frac{1}{W_{1}} \sum_{i\in\mathcal I_{1}} I\left[ t_\tau(\hat y_i) = 1 \right] w_i.
\end{equation}
A weighted ROC curve is drawn by plotting $\TPR(\tau)$ against
$\FPR(\tau)$ for all thresholds $\tau\in\RR$.
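To make these definitions concrete, consider the small data set used
in the code below: $n=5$ observations with labels $y=(-1,-1,1,1,1)$,
weights $w=(1,1,1,4,5)$, and predicted scores $\hat y=(1,2,3,1,1)$,
so $W_{-1}=2$ and $W_1=10$. At threshold $\tau=3$, only observation
3 satisfies $\hat y_i \geq 3$, so
\begin{equation*}
  \FP(3)=0,\quad \FPR(3)=0,\quad \TPR(3)=\frac{w_3}{W_1}=\frac{1}{10}.
\end{equation*}
At $\tau=2$, observations 2 and 3 are predicted positive, so
$\FPR(2)=w_2/W_{-1}=1/2$ while $\TPR(2)=1/10$; at $\tau=1$, every
observation is predicted positive, giving $\FPR(1)=\TPR(1)=1$.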
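As a sanity check, the following chunk is a minimal sketch of how
the definitions above translate directly into R code. It is an
illustration written for this example, not the implementation used
by the WeightedROC package: it simply evaluates $\FPR(\tau)$ and
$\TPR(\tau)$ by brute force over a grid of thresholds.

<<>>=
## Minimal sketch: compute the weighted ROC points directly from the
## definitions above, by brute force over a grid of thresholds.
## This is an illustration, not the WeightedROC implementation.
y <- c(-1, -1, 1, 1, 1)
w <- c(1, 1, 1, 4, 5)
y.hat <- c(1, 2, 3, 1, 1)
W.pos <- sum(w[y == 1])   # total positive weight W_1
W.neg <- sum(w[y == -1])  # total negative weight W_{-1}
## One threshold at each unique score, plus one above all scores
## (which predicts every observation negative).
tau.vec <- sort(unique(c(y.hat, max(y.hat) + 1)))
roc.points <- t(sapply(tau.vec, function(tau) {
  pred <- ifelse(y.hat >= tau, 1, -1)  # thresholding function t_tau
  c(tau = tau,
    FPR = sum(w[y == -1 & pred == 1]) / W.neg,
    TPR = sum(w[y == 1 & pred == 1]) / W.pos)
}))
roc.points
@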
The weighted ROC curve can be computed and plotted using the
WeightedROC package:

<<fig=TRUE>>=
y <- c(-1, -1, 1, 1, 1)
w <- c(1, 1, 1, 4, 5)
y.hat <- c(1, 2, 3, 1, 1)
library(WeightedROC)
tp.fp <- WeightedROC(y.hat, y, w)
library(ggplot2)
ggplot()+
  geom_path(aes(FPR, TPR), data=tp.fp)+
  coord_equal()
@

\section{Weighted AUC}

The Area Under the Curve (AUC) summarizes the ROC curve as a single
number: a perfect score function achieves AUC $=1$, while a constant
or random score function achieves AUC $=1/2$ in expectation. It may
be computed using the R code

<<>>=
WeightedAUC(tp.fp)
@

\end{document}