%\documentclass[article]{jss}
\documentclass[nojss]{jss}
\usepackage{amsmath}
\newcommand{\class}[1]{``\code{#1}''}

\author{Kenneth Knoblauch\\
  Inserm, U846 \And Laurence T.\ Maloney\\
  New York University}
\Plainauthor{Kenneth Knoblauch, Laurence T. Maloney}

\title{\pkg{MLDS}: Maximum Likelihood Difference Scaling in \proglang{R}}
\Plaintitle{MLDS: Maximum Likelihood Difference Scaling in R}

\Abstract{
This introduction to the \proglang{R} package \pkg{MLDS} is a modified and
updated version of \citet{KnoblauchMaloney2008}, published in the
\emph{Journal of Statistical Software}.  The \pkg{MLDS} package in the
\proglang{R} programming language can be used to estimate perceptual scales
based on the results of psychophysical experiments using the method of
difference scaling.  In a difference scaling experiment, observers compare two
supra-threshold differences $(a,b)$ and $(c,d)$ on each trial.  The approach is
based on a stochastic model of how the observer decides which perceptual
difference (or interval), $(a,b)$ or $(c,d)$, is greater, and the parameters of
the model are estimated using a maximum likelihood criterion.  We also propose
a method to test the model by evaluating the self-consistency of the estimated
scale.  The package includes an example in which an observer judges the
differences in correlation between scatterplots.  The example may be readily
adapted to estimate perceptual scales for arbitrary physical continua.
}
\Keywords{difference scaling, sensory magnitude, proximity, psychophysics, signal detection theory, GLM}

\Address{
  Kenneth Knoblauch\\
  Inserm, U846, Bron, France\\
  Stem Cell and Brain Research Institute\\
  Department of Integrative Neurosciences\\
  Universit\'e de Lyon, Lyon 1\\
  18 avenue du Doyen L\'epine\\
  69500 Bron, France\\
  Telephone: +33/472913477\\
  Fax: +33/472913461\\
  E-mail: \email{ken.knoblauch@inserm.fr}\\
  URL: \url{http://www.sbri.fr/members/kenneth-knoblauch.html}\\
  Laurence T.\ Maloney\\
  Department of Psychology\\
  Center for Neural Science\\
  New York University\\
  6 Washington Place, 8th Floor\\
  New York, NY 10011, United States of America\\
  Telephone: +1/212/9987851\\
  E-mail: \email{ltm1@nyu.edu}\\
  URL: \url{http://www.psych.nyu.edu/maloney/}
}

%% need no \usepackage{Sweave.sty}
%\VignetteIndexEntry{MLDS: Maximum Likelihood Difference Scaling in R}
%\VignetteKeywords{difference scaling, sensory magnitude, proximity, psychophysics, signal detection theory, GLM, R}
%\VignettePackage{MLDS}

\begin{document}

\section{Introduction}

\begin{Scode}{echo=FALSE,results=hide}
library(MLDS)
\end{Scode}

Difference scaling is a psychophysical procedure used to estimate
supra-threshold perceptual differences for stimuli distributed along a physical
continuum.  On each of a set of trials, an observer is presented with a
quadruple, $(I_{a},I_{b},I_{c},I_{d})$, drawn from an ordered set of stimuli,
$\{I_{1} < I_{2} < \ldots < I_{p}\}$.  For convenience, the two pairs
$(I_{a},I_{b})$ and $(I_{c},I_{d})$ are often ordered so that
$I_{a} < I_{b}$ and $I_{c} < I_{d}$, but this is not necessary.  On each trial,
the observer judges which pair shows the greater perceptual difference.  The
data are used to estimate scale values $\psi_{1}, \psi_{2}, \ldots, \psi_{p}$
corresponding to the stimuli; these values should capture the observer's
judgments in the sense that, whenever the difference of the pair
$(I_{a},I_{b})$ is judged greater than that of $(I_{c},I_{d})$,
\begin{equation}
|\psi _{b}-\psi _{a}| > |\psi _{d}-\psi _{c}|,
\label{equ1}
\end{equation}%
that is, the difference scale predicts judgment of perceptual difference.
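As a small numerical illustration of the deterministic rule in
Equation~\ref{equ1} (with made-up scale values rather than estimates from any
experiment), the following \proglang{R} lines check which of two pairs is
predicted to be judged as showing the larger difference.
\begin{Scode}{eval=FALSE}
## hypothetical scale values for p = 5 stimuli
psi <- c(0, 0.1, 0.35, 0.7, 1)
## for the quadruple (a, b; c, d) = (1, 3; 2, 4):
## is |psi_b - psi_a| > |psi_d - psi_c| ?
abs(psi[3] - psi[1]) > abs(psi[4] - psi[2])
\end{Scode}
Here the comparison evaluates to \code{FALSE}, so the second pair,
$(I_{2}, I_{4})$, is predicted to be judged as showing the larger difference.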
It is unlikely that human observers would be so reliable in judgment as to
satisfy the criterion just given, particularly if the perceptual differences
$| \psi_b - \psi_a |$ and $| \psi_d - \psi_c |$ are close.  \citet{MalYang03}
proposed a stochastic model of difference judgment that allows the observer to
exhibit some stochastic variation in judgment.

Let $L_{ab} = | \psi_b - \psi_a |$ denote the unsigned perceived length of the
interval $I_a, I_b$.  The proposed decision model is an equal-variance Gaussian
signal detection model \citep{GrSw74} in which the signal is the difference in
the lengths of the intervals,
\begin{equation}
\delta(a, b; c, d) = L_{cd} - L_{ab} = | \psi_d - \psi_c | - | \psi_b - \psi_a |.
\label{equ2}
\end{equation}
If $\delta$ is positive, the observer should choose the second interval as
larger; when it is negative, the first.  When $\delta$ is small, negative or
positive, relative to the Gaussian standard deviation, $\sigma$, we expect the
observer, presented with the same stimuli, to give different, apparently
inconsistent judgments.  The decision variable employed by the observer is
assumed to be
\begin{equation}
\Delta(a, b; c, d) = \delta(a, b; c, d) + \epsilon = L_{cd} - L_{ab} + \epsilon,
\label{equ3}
\end{equation}
where $\epsilon \sim \mathcal{N}(0, \sigma^2)$.  Given the quadruple
$\left( a, b\,; c, d \right)$, the observer selects the pair $I_c, I_d$
precisely when
\begin{equation}
\Delta(a, b\,; c, d) > 0.
\label{equ4}
\end{equation}

\subsection{Direct maximization of likelihood}

In each experimental condition the observer completes $n$ trials, each based on
a quadruple $\mathbf{q}^k = \left( a^k, b^k ; c^k, d^k \right)$,
$k = 1, \ldots, n$.  The observer's response is coded as $R_k = 0$ (the
difference of the first pair is judged larger) or $R_k = 1$ (second pair judged
larger).  We denote the outcome ``$cd$ judged larger than $ab$'' by
$cd \succ ab$ for convenience.  We fit the parameters
$\mathbf{\Psi} = \left( \psi_1, \psi_2, \ldots, \psi_p \right)$ and $\sigma$ by
maximizing the likelihood,
\begin{equation}
L(\mathbf{\Psi}, \sigma) = \prod^n_{k= 1}\Phi \left (\frac{\delta\left (%
\mathbf{q}^k \right)}{\sigma} \right )^{R_k} \left (1 - \Phi \left (\frac{%
\delta\left (\mathbf{q}^k \right)}{\sigma} \right ) \right )^{1 - R_k},
\label{like}
\end{equation}
where $\Phi(x)$ denotes the cumulative standard normal distribution and
$\delta \left (\mathbf{q}^k \right ) = \delta \left (a^k, b^k ; c^k, d^k \right )$
as defined in Equation~\ref{equ2}.

At first glance, it would appear that the stochastic difference scaling model
just presented has $p + 1$ free parameters: $\psi_1, \ldots, \psi_p$ together
with the standard deviation of the error term, $\sigma$.  However, any linear
transformation of the $\psi_1, \ldots, \psi_p$ together with a corresponding
scaling by $\sigma^{-1}$ results in a set of parameters that predicts exactly
the same performance as the original parameters.  Without any loss of
generality, we can set $\psi_1 = 0$ and $\psi_p = 1$, leaving us with the
$p - 1$ free parameters, $\psi_2, \ldots, \psi_{p-1}$ and $\sigma$.  When scale
values are normalized in this way, we describe them as standard scales.

Equation~\ref{like} can be recognized as the likelihood for a Bernoulli
variable.  Taking the negative logarithm allows the parameters to be estimated
simply with a minimization function such as \code{optim} in \proglang{R}
\citep[for example][p.~445]{VR02}.

\subsubsection{Reparameterization}

In practice, we define the minimization functions to work with the parameters
on transformed scales: the $p-2$ free scale values by a logit transformation,
\begin{equation}
\log \left( \frac{x}{1-x}\right),
\label{logistic}
\end{equation}%
so that they are constrained to lie in the interval $(0,1)$, and $\sigma$ by
the logarithm, so that it remains positive.  These transformations have no
theoretical significance; they are used to avoid problems in numerical
optimization.  Maximum likelihood methods are invariant under such
reparameterization \citep[pp.~284--285]{MoodGray74}.
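To make the direct approach concrete, the following unevaluated sketch defines
such a negative log-likelihood and passes it to \code{optim}.  It is not the
implementation used by \code{mlds}; it assumes that the trial data are stored
in a data frame with a binary response \code{resp} and columns
\code{S1}--\code{S4} giving the indices of the four stimuli presented on each
trial (treat these column names as an assumption here), and it uses the
reparameterization just described.
\begin{Scode}{eval=FALSE}
## Minimal sketch of direct maximization; not the mlds() implementation.
## d: data frame with columns resp (0/1) and S1, S2, S3, S4 giving the
##    indices of the four stimuli on each trial; nstim: number of levels.
negLogLik <- function(par, d, nstim) {
    psi <- c(0, plogis(par[seq_len(nstim - 2)]), 1)  # psi_1 = 0, psi_p = 1
    sigma <- exp(par[nstim - 1])                     # log scale keeps sigma > 0
    del <- (abs(psi[d$S4] - psi[d$S3]) -
            abs(psi[d$S2] - psi[d$S1])) / sigma
    ## resp == 1 (second pair judged larger) has probability Phi(delta/sigma)
    -sum(ifelse(d$resp == 1, pnorm(del, log.p = TRUE),
                             pnorm(-del, log.p = TRUE)))
}
nstim <- 11                        # 11 stimulus levels, as in the kk example
start <- c(qlogis(seq(0.1, 0.9, len = nstim - 2)), log(0.2))
fit <- optim(start, negLogLik, d = kk, nstim = nstim, method = "BFGS")
psi.hat <- c(0, plogis(fit$par[seq_len(nstim - 2)]), 1)
sigma.hat <- exp(fit$par[nstim - 1])
\end{Scode}
The back-transformed \code{psi.hat} is then on the standard scale described
above.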
\subsection{GLM method}

In this section, we show how the above formulation can be framed as a GLM.  A
generalized linear model \citep[GLM;][]{McCNeld89} is described by
\begin{equation}
\eta( \E [ Y ] ) = X \beta,
\end{equation}
where $\eta$ is a (link) function transforming the expected value of the
elements of the response vector, $Y$, to the scale of a linear predictor given
by the product of the model matrix, $X$, and a vector of coefficients, $\beta$,
and the elements of $Y$ are distributed as a member of the exponential family.
In the present case, the responses of the observer can be considered as
Bernoulli variables and thus can be modeled with the binomial distribution,
which belongs to this family.  The canonical link function for the binomial
case is the logistic transformation, Equation~\ref{logistic}.  However, other
links are possible; the inverse cumulative Gaussian, or quantile function,
corresponds to Equation~\ref{like} described above and would be equivalent to a
probit analysis.  In this section, we assume that we have re-ordered each
quadruple $\left( a, b; c, d \right)$ so that $a < b$ and $c < d$.  With this
ordering, the absolute values in Equation~\ref{equ2} can be dropped, so that
the decision variable is a linear function of the scale values, as required for
the linear predictor of a GLM.

\subsection{Diagnostics of the estimated scale}

The structure of the object returned by the bootstrap diagnostics of the
probit-link fit to the \code{kk} data set, \code{kk.diag.prob}, is shown below.
\begin{verbatim}
> str(kk.diag.prob)
List of 5
 $ NumRuns  : int [1:10000] 195 193 205 199 177 177 201 211 187 209 ...
 $ resid    : num [1:10000, 1:990] -5.25 -4.79 -4.77 -4.56 -4.52 ...
 $ Obs.resid: Named num [1:990] 0.458 0.164 0.413 1.155 -1.610 ...
  ..- attr(*, "names")= chr [1:990] "1" "2" "3" "4" ...
 $ ObsRuns  : int 173
 $ p        : num 0.0159
 - attr(*, "class")= chr [1:2] "mlds.diag" "list"
\end{verbatim}
\code{NumRuns} is an integer vector giving the number of runs in the sorted
deviance residuals for each simulation.  \code{resid} is a numeric matrix, each
row of which contains the sorted deviance residuals from a simulation.
\code{Obs.resid} is a numeric vector providing the residuals from the obtained
fit.  \code{ObsRuns} is the number of observed runs in the sorted deviance
residuals, and, finally, \code{p} gives the proportion of runs from the
simulations less than the observed number.  A \code{plot} method is supplied to
visualize the results of the two analyses, which are shown in
Figure~\ref{diagnostics} for each of the link functions.

The distributions of the residuals for the probit and logit links seem
reasonable, based on 10000 simulations.  Tendencies toward deviation from the
envelopes are small, in any case.  These contrast with the cauchit link, which
displays systematic deviations from the bootstrapped envelope.  The histograms
indicate that there are too few runs in the residuals using the probit link.
For the logit, the observed number falls well within the distribution of
bootstrapped values, while the cauchit value, given its performance with the
previous diagnostic, is debatable.  The proportion of simulated runs less than
the observed value for each link is given in the third column of
Table~\ref{GOF}.
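As a check on how these components relate, the reported \code{p} can be
recomputed directly from the object displayed above (a one-line illustration;
it assumes \code{kk.diag.prob} is available in the session).
\begin{Scode}{eval=FALSE}
## proportion of simulated runs falling below the observed number of runs
with(kk.diag.prob, mean(NumRuns < ObsRuns))
\end{Scode}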
Two points on the far left of the cdfs of Figure~\ref{diagnostics} stand out as
having unusually large residual deviances.  These points, as well as a third
one, are also flagged by the diagnostics generated by the \code{plot} method
for \code{glm} objects.  The three trials are easily extracted from the data
set.
\begin{Scode}{}
kk[residuals(kk.mlds$obj) < -2.5, ]
\end{Scode}
Interestingly, if these points are removed from the data set, the value of
\code{p} for the probit link increases to 0.24, more in line with that obtained
using the logit link.  The number of runs does not change in the observed data,
but the bootstrapped distribution of the number of runs shifts to a mean near
171.

Judging from the estimated scale as well as the stimuli, it seems surprising
that the observer would have judged the correlation difference between 0 and
0.1 to be greater than that of 0.3 (or 0.4) and 0.9.  We suspect that these
correspond to non-perceptual errors on the part of the observer, such as finger
slips, lack of concentration or momentary confusion about the instructions.  A
few such errors nearly always occur in psychophysical experiments, even with
practiced and experienced observers.  In modeling data from detection
experiments, it has proven useful to include a nuisance parameter to account
for these occasional errors \citep{WichHill01}.  The error rates are modeled by
modifying the lower and upper asymptotes of the inverse link function.  We can
get a sense of the impact of adding these two nuisance parameters by using the
links \code{mafc.probit} and \code{probit.lambda} from the \pkg{psyphy}
package, which permit specifying these asymptotes differently from 0 and 1
\citep{psyphy}.  Preliminary experimentation on the full data set indicates
that the AIC is reduced by 48 if the lower estimate is set to about 0.06 and
the upper to 0.007.  These values also lower the number of runs in the
distributions of bootstrapped residuals, so that the observed value yields a
\code{p} $= 0.7$.

\subsubsection{Bias-reduced MLDS}

When using the default argument \code{method = "glm"} with \code{mlds}, a
fitting method can be passed on to \code{glm} with the argument
\code{glm.meth}.  Its default value is \code{"glm.fit"}, but other methods are
possible.  For example, the \pkg{brglm} package \citep{Kosmidis2007} contains
the function \code{brglm.fit}, which performs a bias-reduced fit based on
\citet{Firth93}:
\begin{Scode}{eval=FALSE}
library(brglm)
mlds(kk2, glm.meth = brglm.fit)
\end{Scode}

\subsection{The six-point test}

Performing a six-point test on these data with 10000 simulations requires about
15 minutes on the same machine indicated above.
\begin{CodeChunk}
\begin{CodeInput}
kk.6pt <- simu.6pt(kk.mlds, 10000)
str(kk.6pt)
\end{CodeInput}
\begin{CodeOutput}
List of 4
 $ boot.samp: num [1:10000] -488 -539 -531 -502 -447 ...
 $ lik6pt   : num -425
 $ p        : num 0.848
 $ N        : num 10000
\end{CodeOutput}
\end{CodeChunk}
Examination of the structure of the returned list with \code{str} shows the
log-likelihood for the number of six-point violations in the data and the
associated p-value, and indicates that the observer did not make a
significantly greater number of six-point violations than an ideal observer.
Figure~\ref{boot6pt} shows a histogram of the log-likelihoods from such a
simulation, with the observed log-likelihood indicated by a thick vertical
line.  These results support the appropriateness of the scale as a description
of the observer's judgments.
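For reference, the six-point condition evaluated by this test can be stated as
follows (our paraphrase of the standard condition; see \citealp{MalYang03}):
for two ordered triples of stimuli $a < b < c$ and $a' < b' < c'$,
\begin{displaymath}
ab \succ a'b' \quad \textrm{and} \quad bc \succ b'c'
\quad \Longrightarrow \quad ac \succ a'c'.
\end{displaymath}
Any difference scale necessarily satisfies this implication, since adding the
premises $\psi_b - \psi_a > \psi_{b'} - \psi_{a'}$ and
$\psi_c - \psi_b > \psi_{c'} - \psi_{b'}$ yields
$\psi_c - \psi_a > \psi_{c'} - \psi_{a'}$; the simulation asks whether the
observer violates the condition more often than would an ideal observer whose
judgments follow the fitted model.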
\begin{figure}[htb!]
\begin{center}
\includegraphics[width=0.6\textwidth]{Figures/boothist}
\caption{Histogram of log-likelihood values from the six-point test applied to
the data set \code{kk}.  The thick vertical line indicates the observed
six-point log-likelihood for the data set.}
\label{boot6pt}
\end{center}
\end{figure}

\subsection{Fitting a parametric difference scale}

The results of the difference scaling experiment with scatterplots suggest that
the perception of correlation increases quadratically with correlation
(Fig.~\ref{figplot}).  We can refit the data while constraining the estimated
difference scaling curve to be a parametric curve, such as a power function,
using the formula method for \code{mlds}.  Under the hypothesis that perception
of correlation depends on $r^2$, we would expect the exponent to be
approximately $2$.
\begin{Scode}{keep.source=TRUE}
kk.frm <- mlds(~ (sx/0.98)^p[1], p = c(2, 0.2), data = kk)
\end{Scode}
We specify the parametric form for the difference scale with a one-sided
formula.  The operations take on their mathematical meaning here, as in
formulas for \code{nls}, but unlike in formulas for \code{lm} and \code{glm},
where they must be isolated (for example, with \code{I()}) to be interpreted
arithmetically.  The stimulus scale is indicated by a variable \code{sx} and
the parameters to estimate by a vector, \code{p}.  The fitting procedure
adjusts the parameters to best predict the judgments of the observer on the
standard scale.  The equation is normalized by the highest value tested along
the stimulus dimension, $r = 0.98$, so that the fitted curve passes through
$1.0$ at this value.  The optimization is performed by \code{optim}, and
initial values of the parameters are provided by the vector given as the second
argument.  The last element of this vector is always the initial value for
\code{sigma}.  Finally, a data frame with the experimental results is also
required.

The estimated parameters are returned in the component \code{par} of the model
object and the judgment variability, as usual, in the component \code{sigma}.
Standard errors for each of these estimates can be obtained from the Hessian
matrix in the \code{hess} component.
\begin{Scode}{}
c(p = kk.frm$par, sigma = kk.frm$sigma)
sqrt(diag(solve(kk.frm$hess)))
\end{Scode}
The standard error for the exponent provides no evidence to reject an exponent
of $2$.

The parametric fit with $2$~parameters is nested within the $10$~parameter
nonparametric fit.  This permits us to test the parametric fit with a
likelihood ratio test.  We extract the log likelihoods from each model object
with the \code{logLik} method.  The degrees of freedom of each fit are obtained
from the \code{df} attribute of the \class{logLik} object.
\begin{Scode}{keep.source=TRUE}
ddf <- diff(c(attr(logLik(kk.frm), "df"), attr(logLik(kk.mlds), "df")))
pchisq(-2 * c(logLik(kk.frm) - logLik(kk.mlds)), ddf, lower.tail = FALSE)
\end{Scode}
The results reject the power function description of the data.  A predicted
curve under the parametric model fit to the data is obtained from the
\code{func} component of the model object, which takes two arguments: the
estimated parameters, obtained from the \code{par} component of the model
object, and a vector of stimulus values at which to calculate predicted values.
The plot of the predicted values against the values obtained by \code{glm}
indicates that the power function fit provides a reasonable description of the
data for correlations greater than 0.4 and a lack of fit for lower
correlations, for which this observer shows no evidence of discrimination
(Fig.~\ref{fig:mldsFormMeth}).

\begin{figure}[h!]
\begin{center}
\setkeys{Gin}{width=0.65\textwidth}
\begin{Scode}{fig=TRUE,eps=FALSE,keep.source=TRUE}
plot(kk.mlds, standard.scale = TRUE,
     xlab = expression(r^2),
     ylab = "Difference Scale Value")
xx <- seq(0, 0.98, len = 100)
lines(xx, kk.frm$func(kk.frm$par, xx))
\end{Scode}
\setkeys{Gin}{width=0.8\textwidth}
\caption{Difference scales estimated for the perception of correlation obtained
using the default (points) and formula (line) methods to fit the data.  The
formula used was a power function.}
\label{fig:mldsFormMeth}
\end{center}
\end{figure}

\section{The Method of Triads}

In the Method of Quadruples, observers compare the intervals $(a,b)$ and
$(c,d)$, and it is not difficult to see how the resulting difference scale
would lend itself to predicting such judgments.  We can also use a slightly
restricted method for presenting stimuli that we refer to as the Method of
Triads.  On each trial, the observer sees three stimuli $(a, b, c)$ with
$a < b < c$ and judges whether the perceptual difference between the pair
$(a,b)$ is greater than the perceptual difference between the pair $(b,c)$.
For $p > 3$ stimuli, there are
\begin{equation}
\left( \begin{array}{c} p \\ 3 \end{array} \right) =\frac{p!}{3!\:(p-3)!}
\end{equation}
possible non-overlapping triads.  If $p=11$, as in the example, there are
\begin{equation}
\left( \begin{array}{c} 11 \\ 3 \end{array} \right) =\frac{11!}{3!\:8!}=165
\end{equation}
different, non-overlapping triads.  Of course, the number of trials judged by
the observer affects the accuracy of the estimated difference scale
\citep[see][]{MalYang03}, as we illustrate below.  Judging two repetitions of
all 165 triads (330 trials in total) requires the same number of trials as one
repetition of all 330 non-overlapping quadruples.
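The two counts used above are easy to verify directly in \proglang{R} (a
trivial check, included only for completeness):
\begin{Scode}{eval=FALSE}
choose(11, 3)  # 165 non-overlapping triads for p = 11 stimuli
choose(11, 4)  # 330 non-overlapping quadruples for p = 11 stimuli
\end{Scode}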
\section{Future directions}

There are several directions in which \pkg{MLDS} might be developed, three of
which will be mentioned here.  First, we do not know what would be the most
efficient choice of stimuli along a continuum or of quadruples for a particular
application of MLDS.  These depend on the observer's actual scale and judgment
uncertainty $\sigma$, but given pilot data for the observer or previous results
for other observers, it would be interesting to work out methods for selecting
stimuli and quadruples.  For example, having seen the results for one observer
in the scatterplot example, we might consider stimuli that are equally spaced
along the scale $r^2$ and not the scale $r$ in future experiments.

Second, we plan to develop a more systematic method of assessing the asymptotic
probabilities of the inverse link function to take into account unsystematic
errors by the observers.  The difficulty is that these parameters are not part
of the linear predictor.  One possibility is to profile the nuisance parameters
(as we did here) or, alternatively, to develop a method that switches back and
forth between adjusting the nuisance parameters and the coefficients of the
linear predictor.

Third, it would be useful to incorporate random effects that influence the
scale when an observer repeats the experiment or to account for variations
between individuals.  Such heterogeneity is indeed apparent if we compare the
three scales obtained on different days in the data set \code{kk}.  For these
data, there is only one random factor, the Run.  It might be possible to treat
this as an effect due to a randomized block \citep[p.~295]{VR02}.  However, the
ratio of the scale values between any two of the repetitions is approximately
constant across the physical scale, which suggests that the estimate of
$\sigma$ across runs, or equivalently the maximum scale value, is a more likely
candidate to explain this source of variability, acting as a multiplicative
rather than as an additive effect.  We develop some approaches to this problem
using the \pkg{lme4} package elsewhere (Knoblauch and Maloney, \emph{Modeling
Psychophysical Data in \proglang{R}}, Springer, in preparation).

%\section*{Computational details}
%
%The bivariate point distributions were generated
%with the \code{mvrnorm} function from the package \pkg{MASS} (version 7.2-40) \citep{VR02}.
%The \pkg{ggplot} package (version 0.4.2) \citep{ggplot} was used to create Figure~\ref{overlap}.
%All \proglang{R} code for generating the figures is available in an \proglang{R}
%script along with this paper

\section*{Acknowledgments}

This research was funded in part by Grant EY08266 from the National Institutes
of Health (LTM).

\bibliography{MLDS}

\end{document}