\documentclass{article} \usepackage{enumerate} \usepackage{listings} \newcommand{\rcode}[1]{\lstinline[language=R,basicstyle=\normalsize\ttfamily]!#1!} %\VignetteEngine{knitr::knitr} %\VignetteIndexEntry{Using fitPS to fit number of groups data} \title{Fitting a zeta distribution to a {P} survey---number of groups data} \begin{document} <>= options(width=70) knitr::opts_chunk$set(message = FALSE, #background = opcolour, comment = "", highlight = TRUE, prompt = TRUE, tidy = TRUE, tidy.opts = list(arrow = FALSE), warning = FALSE) suppressWarnings(suppressPackageStartupMessages({library(fitPS)})) @ In this example we will learn how to use \rcode{fitPS} to fit a zeta distribution to some data from a survey where the number of groups of glass found is recorded. The data in this example comes from @roux2001 who surveyed the footwear of 776 individuals in south-eastern Australia, and is summarised in the table below. <>= library(xtable) tbl = Psurveys$roux$data colnames(tbl) = c("$n$", "$r_n$") tbl = xtable(tbl, align = "rrr", digits = 0) print(tbl, type="latex", , include.rownames = FALSE, sanitize.text = function(x){x}) @ This data set is built into the package and can be accessed from the \rcode{Psurveys} object. That is, we can type: <>= data("Psurveys") roux = Psurveys$roux @ The data is stored as an object of class \rcode{psData}. This probably will not be of importance to most users. If you are interested in the details, then these can be found in the \textbf{Value} section of the help page for \rcode{readData}. There is an S3 \rcode{print} method for objects of time, meaning that if we print the object---either by typing its name at the command prompt, or by explicitly calling \rcode{print}---then we will get formatted printing of the information contained within the object. Specifically: \begin{enumerate}[-] \item the values of $n$ and $r_n$ will be printed out in tabular format (where $n$ is the number of groups or the size of the groups, and $r_n$ is the number of times $n$ has been observed in the survey), \item the type of survey will be printed---either "Number of Groups" or "Group Size", \item and if the object has a reference or notes attached to it, then these will be printed as well \end{enumerate} For example <>= roux @ It is very simple to fit a zeta distribution to this data set. We do this using the \rcode{fitDist} function. <>= fit = fitDist(roux) @ The function returns an object of class \rcode{psFit}, the details of which can be found in the help page for \rcode{fitDist}. There are both S3 \rcode{print} and S3 \rcode{plot} methods for objects of this class. The \rcode{print} method method displays an estimate of the shape parameter $s$, an estimate of the standard deviation---the standard error---of the estimate of $s$ ($\widehat{\mathrm{sd}}(\hat{s})=\mathrm{se}(\hat{s})$). **NOTE**: it is important to understand that the value of the shape parameter that is displayed, and the value that is stored in the fitted object differ by 1. That is, $s$ is shown,m and $s^\prime = s - 1$ is stored. This is done because the package which assists in the fitting, \rcode{VGAM}, is parameterised in terms of $s^\prime$. This difference only has consequences if the fitted value is being used in conjuction with other \rcode{VGAM} functions. The \rcode{print} method also displays the first 10 fitted probabilities from the model by default. <>= fit @ The package provides a \rcode{confint} method for the fitted value. The method returns both a Wald confidence interval and profile likelihood interval. The Wald interval takes the usual the form where the lower and upper bound are given by $\hat{s} \pm z^*(1-\alpha/2)\times se(\hat{s}))$. The profile likelihood interval finds the end-points of the interval that satisfies \[ -2\left[\ell(\hat{s};\mathbf{x})-\ell(s;\mathbf{x})\right] \le \chi^2_1(\alpha) \] where $\ell(s;\mathbf{x})$ is value of the log-likelihood given shape parameter $s$. The two intervals are returned as elements of a \rcode{list} named \rcode{wald} and \rcode{prof} respectively. <>= ci = confint(fit) ci$wald ci$prof @ You will notice that neither of these intervals contain the value shown in the previous output. However, this simply because they are confidence intervals on $s^\prime$ and not $s$. This can be remedied by adding one to each interval: <>= ci$wald + 1 ci$prof + 1 @ The reason for not \emph{correcting} these intervals is that the method mostly exists to feed into other parts of the package, especially the \rcode{plot} method. \end{document}