NEWS	R Documentation

memisc News

Version 0.99

NEW FEATURES

A new object-oriented infrastructure for the creation of HTML code is used in format_html() methods. This infrastructure is exposed by the html() function.
Support for model groups in mtable(). c.mtable() now creates groups of models if its arguments are tagged.
Flattened contingency tables ("ftable" objects as created by the eponymous function in the stats package) can now be combined into ftable_matrix() objects. This can be done by using rbind() or cbind().
There is now an object class for survey items containing dates (without times), called "Date.item".
Support for including sandwich estimates of sampling variances and standard errors into the output of summary() and mtable(), by the new generic functions withVCov() and withSE().
Support for different parameter sections is added to mtable. This is intended to allow output of mixed effects models to distinguish between ("fixed effects") coefficients and variance parameters.
Objects created by mtable() also can have several header lines. Facilities to add additional header lines will be added soon.
Optionally, mtable() shows the left-hand sides of model equations. This can be controlled by the optional argument show.eqnames and by the global option "mtable.show.eqnames".
Output of mtable() objects also includes, if applicable, a note that explains the "significance stars" for p-values.
Summary statistics reported by mtable() can now be selected for each object or object class (via calls to options()) separately.
It is now possible to compress the output concerning control variables in mtable().
Support for HTML and LaTeX output in Jupyter notebooks is added to objects created by mtable() and ftable() etc.
The toLatex() method for "ftable" objects gains a fold.leaders option (with default value FALSE) which allows the row labels (leaders) to remain in a single column.
A function codeplan() creates a data frame describing the structure of an "importer", "data.set" or "item" object. The structure so described can be copied from one "data.set" object to another or to a data frame.
New $ and [[ operators for "importer" objects make it possible to create codebooks for single items/variables in imported data files.
A duplicated_labels() function shows and describes duplicated labels, and a deduplicate_labels() function removes such duplicates.
New operators %#%, %##%, and %@% to manipulate annotations and other attributes.
A List() function adds names to its elements by deparsing arguments in the same way as data.frame() does.
A new function Groups() splits a data frame or a "data.set" into groups based on factors in a more convenient way. There are methods of with() and within() to deal with the resulting objects of class "grouped.data". For example, the within() method makes it possible to subtract group means from the observations within groups. withinGroups() splits a data frame or "data.set" object into groups, makes within-group computations, and recombines the groups into the order of the original data frame or "data.set" object.
A new function Reshape() simplifies the syntax to reshape data frames and "data.set" objects from wide into long or from long into wide format.
'tibbles', including those created with the haven package, can be translated into "data.set" objects without loss of information. Also, "data.set" objects can be translated into 'tibbles' with minimal loss of information.
An extendable function view() makes it possible to use the View() facilities provided by graphical user interfaces (in particular RStudio) with objects not originally supported by these user interfaces. In addition, view() methods for "codeplan", "descriptions", "data.set", and "importer" objects are provided, which make it convenient to inspect the contents of these objects in RStudio.
An as.data.table() method for coercing "data.set" objects directly into "data.table" objects.
It is now possible to specify the measurement level for a set of variables in a "data.set" object, either by using the assignment operator with measurement() or by using the new function set_measurement().
There are convenience wrappers such as Mean() etc. for mean() etc. that have the default setting na.rm=TRUE instead of na.rm=FALSE.
A new deduplicate_labels() function deals with duplicate labels (where several codes have the same label).
It is now possible to create codebooks for weighted data.
The function trim_labels() trims codes from value labels.
The function reverse() reorders the codes of a survey item in reverse order.
The generic function Means() conveniently obtains group means, optionally with standard errors and/or confidence intervals.
The colon operator (:) can be used to refer to ranges of variables in foreach().
Code plans (objects in class "codeplan") can now be exported to and imported from YAML and JSON files.
A new generic function format_md() (contributed by Mael Astrud-Le Souder) formats R objects in Markdown. Currently, methods for codebooks (and entries in codebooks) are implemented.
A new generic function coarsen() coarsens numeric vectors into factors, based on a given number of categories.
A new generic function measurement_autolevel() automatically selects the appropriate measurement level for survey items.
A new operator %if% assigns values to a variable for observations that satisfy a condition.
A new operator %$$% abbreviates object modifications using within(), i.e. instead of a <- within(a, { ... }) you can write a %$$% { ... }.
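A few of the additions listed above can be tried out in a short session. The following is only a minimal sketch, not an excerpt from the package documentation: it assumes that memisc 0.99 or later is installed, and all data and variable names (scores, grp, df, x_centered, x2) are made up for illustration.

```r
library(memisc)

# Hypothetical example data; all names are made up for illustration
scores <- c(1, 2, NA, 4)
grp    <- factor(c("a", "a", "b", "b"))

# List() names its elements after the arguments, as data.frame() does
l <- List(scores, grp)
names(l)

# Mean() defaults to na.rm=TRUE, whereas mean() defaults to na.rm=FALSE
mean(scores)
Mean(scores)

# withinGroups(): subtract group means; the result keeps the
# row order of the original data frame
df <- data.frame(g = factor(c("a", "b", "a", "b")),
                 x = c(1, 10, 3, 20))
centered <- withinGroups(df, ~g, {
    x_centered <- x - mean(x)
})
centered

# %$$% abbreviates df <- within(df, { ... })
df %$$% {
    x2 <- 2 * x
}
names(df)
```

The by-argument of withinGroups() is given here as a one-sided formula naming the grouping factor; several factors would be combined with +.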

IMPROVEMENTS

Subset methods for importer objects are much more memory efficient and now can handle files of size larger than 1GB.
The useDcolumn and useBooktabs arguments of toLatex() methods now have global options as defaults.
toLatex() methods optionally escape dollar, subscript and superscript symbols. This can be set either by an explicit (new) argument toLatex.escape.tex or by a global option with the same name.
The toLatex() method for "ftable" objects has a new option fold.leaders.
spss.system.file() now translates numeric variables with any SPSS date format into a "datetime.item".
The function List() adds names to the elements of the resulting list in a way similar to how data.frame() adds names to the columns of a data frame.
Stata.file() now handles files in format rev. 117 and later, as they are created by Stata versions later than 13.
User-defined missing values are now reported in separate tables in entries created by codebook() even if these entries refer to items with measurement level "interval" or "ratio".
If the annotation or the labels of a non-item is set to NULL this no longer causes an error.
Changing variable names to lowercase while importing data sets with Stata.file(), spss.portable.file(), and spss.system.file() is now optional.
Importer methods Stata.file(), spss.portable.file(), and spss.system.file() now have optional arguments for dealing with variable labels or value labels in non-native encoding (e.g. CP1252 on a UTF-8 platform).
A function spss.file() acts as a common interface to spss.portable.file() and spss.system.file().
The functions head() and tail() now work with "data.set" and "importer" objects in the same sensible way as they do with data frames.
The function recode() behaves more coherently: If a labelled vector is the result of recode(), it gets the measurement level "nominal". Factor levels explicitly created come first in the order of factor levels.
The function spss.system.file() now handles buggy SPSS system files that lack information about the number of variables in their header. (These files are typically created by the library ReadStat, used e.g. by the R package 'haven'.)
SPSS syntax files are now converted to the encoding of the host system if they have a different one. By default, the original encoding is assumed to be Codepage 1252 (extended Latin-1).
codebook(), codeplan(), labels(), value.filter, and related functions return NULL for NULL arguments.
codeplan() also works with individual survey items and can be set to NULL, which means that all memisc-specific information is removed from the data.
codebook() works also with data frames (or "tibbles") imported with the haven package.
codebook() now makes use of the "label" attribute of variables if the attribute is present.
with(Groups()), withGroups(), within(Groups()), withinGroups(), Aggregate(), and genTable() are considerably faster now. They can also make use of certain automatic variables such as n_, i_ that contain group sizes and group indices.
relabel(), rename(), and dimrename() no longer require their arguments to be enclosed in quotation marks.
Operators '$', '[', and '[[' can now be applied to codebook objects to get a codebook of a subset of the variables.
spss.system.file() now uses information contained in SPSS files (if available) to determine the measurement level of the imported variables.
spss.system.file() uses information about the character set encoding if available in the file to translate variable labels and value labels into the coding of the machine on which R is being run.
spss.system.file() also (optionally) uses information about the intended measurement level of variables in the file.
as.item() now drops non-unique labelled values when applied to a "labelled", "haven_labelled", or "haven_labelled_spss" object.
spss.system.file() now takes into account metadata about measurement levels ("nominal", "ordinal", or "scale") to set the measurement() attributes of the items in the resulting "importer" and "data.set" objects.
mtable() now handles objects of class "clmm" (from package "ordinal") and the handling of objects of class "merMod" (from package "lme4") is more consistent with those of class "glm" (e.g. the number of observations is shown).
Variance component estimates of "merMod" and "clmm" objects are reported as distinct statistics.
recode() has a new optional argument copy=. If TRUE, existing codes (and labels) are retained.
recode() can now recode factors into numeric vectors.
If the change in codes done by recode() merely reorders codes, labels are reordered accordingly, unless labels are explicitly given.
subset() is S3-generic again, as this allows for lazy evaluation of its arguments.
cases() handles NAs more sensibly - if a case condition is TRUE this leads to a non-NA result even if other conditions evaluate to FALSE, if cases() is called with na.rm=TRUE (the default).
The result of subset() and of the bracket operator ([]) applied to importer objects has row names that indicate the rows selected from the full data.
A method of format() for data set objects is added.
The row names of subsets of importer objects reflect the row numbers in the original data.
collect.data.frame and collect.data.set gain a use_last and a detailed_warnings option to improve handling of variables with different attributes in different objects being collected.
spss.system.file(), spss.portable.file(), and Stata.file() get an optional negative2missing argument.
recode() keeps NAs as NAs when an otherwise argument is given and NAs are not recoded explicitly.
codebook() now fully supports logical vectors.
HTML output created by format_html etc. now uses '<style>' elements for formatting. This reduces the size of created HTML code.
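The recode() entries above that concern missing values and the otherwise argument can be illustrated with a small numeric vector. This is a hedged sketch with made-up data (x, y), assuming memisc 0.99 or later; it uses the value "copy" for otherwise, which asks recode() to copy codes that are not recoded explicitly.

```r
library(memisc)

# Hypothetical input with one missing value
x <- c(1, 2, 3, NA, 5, 6)

# Codes 1 and 2 become 10; all other codes are copied unchanged.
# According to the entry above, the NA is kept as NA because it
# is not recoded explicitly.
y <- recode(x, 10 <- 1:2, otherwise = "copy")
y
```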

BUGFIXES

str and ls.str are imported from the utils package to prevent a NOTE in R CMD check.
HTML tables and lists are no longer wrapped in HTML paragraphs in format_html.CodebookEntry.
show and codebookEntry methods for the "datetime.item" now work as expected.
cases handles NAs more gracefully.
toLatex.ftable output has been improved: No attempt at showing non-existent variable names, better application of extracolsep.
Duplicate value labels now produce an error if item object is coerced into a factor.
A bug concerning missing values in SPSS files is fixed.
Headlines in vignettes are now coherent.
mtable with empty summary sections can be created (again).
Objects returned by mtable now have class "memisc_mtable" to avoid a name clash with objects created by the function model.tables in package "stats".
Calls to PROTECTION are added to the C-source to prevent protection errors.
toLatex() now handles matrices in data frames.
spss.portable.file() now handles files with weighting variables and empty variable labels.
spss.fixed.file() now handles files with lines that are longer than the number of columns specified in the columns definition file.
spss.system.file() now correctly imports value labels of string variables.
Some PROTECTION issues in the C-source flagged by Tomas Kalibera's rchk utility are fixed.
If "data.set" objects are combined and succeeding objects contain "items" not contained in the preceding ones, the result now will still be a valid "data.set" object.
seekData etc. no longer try to recreate external pointers in order to avoid segmentation faults. Also the deletion of empty pointers is avoided for the same reason.
as.data.set works for "tibbles" also when method dispatch via class inheritance does not work.
codebook() now handles character variables in SPSS system files correctly.
codebook() uses the appropriate logical operator in checking for missings.

USER-VISIBLE CHANGES

All vignettes are now using knitr.
HTML output uses Unicode characters by default instead of ampersand escapes to enhance compatibility with pandoc.
codebook() no longer shows the skewness and kurtosis of numeric variables to save output space.

DEFUNCT

The function UnZip has been removed from the package. unzip in conjunction with system.file does the same job, as can be seen in the example for spss.portable.file.
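The replacement for UnZip described above can be sketched along the lines of the example for spss.portable.file. The sketch assumes that the archive anes/NES1948.ZIP ships with the installed memisc package, as it does in the package's own examples; the object names are illustrative.

```r
library(memisc)

# Extract the SPSS portable file from the zip archive bundled with
# memisc into a temporary directory; unzip() returns the file path
nes1948.por <- unzip(system.file("anes/NES1948.ZIP", package = "memisc"),
                     "NES1948.POR", exdir = tempfile())

# Create an importer object for the extracted file
nes1948 <- spss.portable.file(nes1948.por)
nes1948
```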

Version 0.98

NEW FEATURES

Exporting results of various functions into HTML format is now supported by the function format_html. This should make it easier to import them into HTML or word-processing documents (that support importing HTML). A preview of the HTML is made available by the new (generic) function show_html.

In particular, results of the functions mtable (i.e. tables of model estimates), ftable (i.e. flattened contingency tables etc.), and codebooks can be exported into HTML using format_html. Data frames can also be exported into HTML.
A function dsView is added, which displays data.set objects in a way similar to how View displays data frames.
mtable now handles multi-equation models better, in particular if the model objects supplied as arguments vary in the number and/or names of the equations. There is also a new option to place confidence intervals to the right of coefficient estimates. Further, mtable gains the following optional arguments:
- show.baselevel, to suppress the display of baseline categories of dummy variables when dummy variable coefficients are displayed.
- sdigits, to specify the number of digits of summary statistics.
- gs.options, to pass optional arguments to getSummary, allowing for more flexibility in creating tables.
One can now use a summaryTemplate generic function for formatting model summaries, in addition to setting the template with setSummaryTemplate. Finally, parts of "mtables" can be extracted using the [ operator as with matrices, and "mtables" can now also be concatenated.
There is now an object class for survey items containing dates and times, called "datetime.item".
There is a new function wild.codes to check wild codes (i.e. unlabelled codes of an otherwise labelled item.)
codebook now supports data frames, factors, and numeric vectors.
A toLatex method exists now for data.set objects, data frames and other objects.
A new percentages function is added to allow easy creation of tables of percentages.
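The HTML export described at the start of this list can be sketched as follows. This is an illustrative example only: the models, the built-in mtcars data, and the names lm1, lm2, mt, html_code and out_file are assumptions made for the sketch, not taken from the package documentation.

```r
library(memisc)

# Two small linear models on a built-in data set (illustrative only)
lm1 <- lm(mpg ~ wt, data = mtcars)
lm2 <- lm(mpg ~ wt + hp, data = mtcars)

# A table of model estimates
mt <- mtable("Model 1" = lm1, "Model 2" = lm2)

# HTML code as character data, written to a temporary file
html_code <- format_html(mt)
out_file <- tempfile(fileext = ".html")
writeLines(html_code, out_file)

# In an interactive session, show_html(mt) gives a preview instead
```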

BUGFIXES

spss.fixed.file is now able to handle labelled strings and value labels and missing values statements.
Internal C-code used by spss.fixed.file no longer assumes that arguments are copied; some strange behaviour of objects created by spss.fixed.file is now corrected.
Description of items in external data sources is more complete now - the same information as for items in internal data.sets.
applyTemplate now returns empty strings for undefined quantities.
collect method for data.sets now works as expected.
spss.fixed.file now checks whether there are undefined variables in varlab.file etc.
Stata.file now can import Stata 9 and Stata 10 files.

USER-VISIBLE CHANGES

Argument drop no longer used by function mtable.
Format of file produced by write.mtable can now be specified using a format= argument. But forLaTeX=TRUE still can be used to get LaTeX files.

DEFUNCT

The functions Termplot, Simulate, and panel.errbars are defunct. Graphics similar to those built with panel.errbars can be created with facilities provided by the package "mplot", which is currently available on GitHub.

Version 0.97

NEW FEATURES

spss.system.file and spss.portable.file gain a tolower= argument that defaults to TRUE, which changes all-upper-case variable names to lower case.
New generic function Iconv() that changes the character encoding of variable descriptions and value labels. It has methods for "data.set", "importer", "item", "annotation", and "value.label" objects.
There is now a method of as.character() for "codebook" objects and a convenience function Write() with methods for "codebook" and "description" to make it more convenient to direct the output of codebook() and description() into text files.
A method for "merMod" objects of the getSummary() generic function. mtable() now should be able (again) to handle estimation results produced by lmer() and glmer() from package 'lme4'.
recode() handles character vectors in a more convenient way: They are converted into factors with sorted unique values (after recoding) as levels.

USER-VISIBLE CHANGES

getSummary.expCoef is renamed into getSummary_expCoef.

DEFUNCT

The S3 method aggregate.formula has been removed from the package to avoid a clash with the method of the same name in the base package. The function Aggregate can be used instead.
Removed include, uninclude, and detach.sources as these are flagged as modifying the global namespace.
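The use of Aggregate as a replacement, as mentioned in the entry on aggregate.formula above, might look as follows. This is a minimal sketch with made-up data (df, g, y), assuming a current version of memisc.

```r
library(memisc)

# Hypothetical data: a grouping factor and a numeric variable
df <- data.frame(g = factor(c("a", "a", "b", "b")),
                 y = c(1, 3, 10, 20))

# The left-hand side of the formula is evaluated within each group
# defined by the right-hand side; the result is a data frame
agg <- Aggregate(mean(y) ~ g, data = df)
agg
```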