Changes for R-package ndl ################################################### Changes in version 0.2.18 (Sept 9th, 2018) (since version 0.2.17) * Removed unused openmp imports. * Removed duplications and inconsistancies in DESCRIPTION file and package info. ################################################### Changes in version 0.2.17 (Nov 17th, 2015) (since version 0.2.16) * Fixed error that resulted in wrong estimates when estimateWeights was used under some (all?) circumstances. * Hmisc becomes dependency. ################################################### Changes in version 0.2.16 (Feb 27th, 2013) (since version 0.2.14) * Fixed error that caused R to crash during garbage collection. This may have been due to a mysterious interactions between Rcpp and gc, but I am not sure what it was. Changing the type of returned object from Rcpp::List to SEXP fixed it. * Other minor fixes and error checking. ################################################### Changes in version 0.2.14 (Nov 11th, 2013) (since version 0.2.13) * Fixed a big bug in estimateWeights that reordered rows of the output. * Fixed a bug in orthoCoding that was caused by global variable collisions. * Added more error checking on input to functions. ################################################### Changes in version 0.2.12 and 0.2.13 (Nov 3rd, 2013) (since version 0.2.11) * Fixed problem with print.ndlClassify. * Fixed issue with OpenMP compilation ################################################### Changes in version 0.2.11 (Sept 23rd, 2013) (since version 0.2.10) * Fixed a bug in c++ module that added extra column to output matrix. ################################################### Changes in version 0.2.10 (Sept 23rd, 2013) (since version 0.2.9) * Added a better Serbian Unicode data set that uses Cyrillic instead of Latin letters, and has the proper Unicode encoding set. ################################################### Changes in version 0.2.9 (July 27th, 2013) (since version 0.2.6, skipping alpha unreleased versions) Implementer: Cyrus Shaoul * Fixed bugs and improved performance. * Prepared software for CRAN release. * Due to certain software dependencies, this release of ndl might not compile on Mac OS X 10.6 and earlier. If this is the case, please continue to use ndl version 0.1.6. The results produced are numerically almost identical, with slower performance being the main difference. * new option for estimateWeights : hasUnicode. This must be used when rownames contain Unicode! * other new options: addBackground and trueProbabilities (see the docs for explanation of what they do.) They are turned off by default to maintain continuity with ndl 0.1.6 ################################################### Changes in version 0.2.6 (Apr 30th, 2013) (since version 0.2.1) Implementer: Cyrus Shaoul * Fixed some bugs in estimateWeights.R relating to "addBackground". * Changed the way that duplicate cues were counted when "RemoveDuplicates" was set to FALSE. * Added a helper python script in inst/scripts called "ndl.preproc.py". This script can be used to convert a text corpus into a list of learning events. See the script for help on using it. * Duplicate outcomes in events are now removed by default. ################################################### Changes in version 0.2.1 (Jan 29th, 2013) (since version 0.2.0) Implementer: Cyrus Shaoul * added the ability to use the new Compact Event format in the function estimateWeightsCompact * added a new low-rank approximation of the SVD for the pseudoinverse in the case that there are more than 20000 cues. ################################################### Changes in version 0.2.0 (Dec 20th, 2012) (since version 0.1.6) Implementer: Cyrus Shaoul * Removed the program cooc.c and re-implemented the event processing code in a Rcpp module. * Parallelized where possible using OpenMP ################################################### Changes in version 0.1.6 (Nov 27th, 2012) (since version 0.1.1) Implementer: Antti Arppe * General: general modifications and fixes to several functions to check the integrity of input (mainly argument 'cuesOutcomes'), and otherwise with deal with diverse input formats with less hickups; new function 'predict.ndlClassify'; user-specifiable scalability for dealing with really large input files when using the C code method for the calculation of Cue-Cue and Cue-Outcome co-occurrences; maintainer e-mail address modified to a generic one. * acts2probs.R (acts2probs): code modified internally so that a minimum activation value of 'acts.minimum.correction=1e-10' is used instead when any activation value is exactly equal to 0 in input 'acts' (due to lack of any previously observed cues in input to e.g. 'estimateActivations'). * coocMatrices.c (cooc): C code fixed so that it will work with space characters ' ' in Cues and Outcomes (as input 'cuesOutcomes' to 'estimateWeights', 'cooccurrencesCues' and/or 'cooccurrencesCuesOutcomes') - due to this fix the 'Outcomes', 'Cues' and 'Frequency' columns are now separated by single tabulator characters '\t' (instead of spaces ' ') in the temporary text file: 'ndl_410025912.txt' through which data is passed to the C code from the 'estimateWeights', 'cooccurrencesCues' and 'cooccurencesCuesOutcomes' functions ; C code modified so that the sizes of Cues and Outcomes (and their combinations) can be specified flexibly by the user in 'estimateWeights', 'cooccurrencesCues' and/or 'cooccurrencesCuesOutcomes' beyond the default values (max.cues=20000, max.characters=20000, max.lines=500000; communicated via the temporary parameter file: 'ndl_par_410025912.txt'), which will allow the use of the C code alternative (with 'method="C"') via greater memory allocation for very large data sets on servers with (sufficient) greater capacity; C code modified so that upon encountering errors (not finding the parameter or data text files, or the data file exceeding 'max.lines'), which normally will all be detected by the calling R functions but can occur if the C code is called directly, an appropriate error message will be output to: 'ndl_err_410025912.txt' and the execution will end gracefully with 'return' (instead of 'exit'). * cooccurrencesCues.R (cooccurrencesCues): new arguments 'max.cues', 'max.characters' and 'max.lines' added and code modified so that the sizes of Cues and Outcomes in input 'cuesOutcomes'can be specified by the user beyond the default values (i.e. max.cues=20000, max.characters=20000, max.lines=500000; communicated via the temporary parameter file: 'ndl_par_410025912.txt' to the auxiliary C code alternative); global option "ndl.estimateWeights" indicating call from 'estimateWeights' checked so that the auxiliary functions 'cooccurrenceCues' and 'cooccurrenceCuesOutcomes' will not rewrite the input text file: 'ndl_410025912.txt'. * cooccurrenceCuesOutcomes.R (cooccurrencesCuesOutcomes): new arguments 'max.cues', 'max.characters' and 'max.lines' added and code modified so that the sizes of Cues and Outcomes in input 'cuesOutcomes'can be specified by the user beyond the default values (i.e. max.cues=20000, max.characters=20000, max.lines=500000; communicated via the temporary parameter file: 'ndl_par_410025912.txt' to the auxiliary C code alternative); ; global option "ndl.estimateWeights" indicating call from 'estimateWeights' checked so that the auxiliary functions 'cooccurrenceCues' and 'cooccurrenceCuesOutcomes' will not rewrite the input text file: 'ndl_410025912.txt'. * estimateActivations.R (estimateActivations): code and output values changed so that previously unseen Cues and Outcomes in input 'cuesOutcomes' are included in the output values, in addition to 'activationMatrix', as 'newCues' and 'newOutcomes', instead of these being printed to standard output; in the case of such unseen Cues or Outcomes a warning message is now output; code modified so that a warning message is output if potential accidental 'NA' strings (resulting from 'as.character(NA)') are detected among the input Cues and Outcomes in 'cuesOutcomes'; code will stop with an error message if any actual NA's are encountered among Cues and Outcomes in 'cuesOutcomes'. * estimateWeights.R (estimateWeights): new arguments 'max.cues', 'max.characters' and 'max.lines' added and code modified so that the sizes of Cues and Outcomes in input 'cuesOutcomes' can be specified by the user beyond the default values (i.e. max.cues=20000, max.characters=20000, max.lines=500000; communicated via the temporary parameter file: 'ndl_par_410025912.txt' to the auxiliary C code alternative); code modified so that the function will stop with an error message if accidental empty cues (string-initial or string-final, or multiple string-medial underscores '_'), or NA's are detected in Cues or Outcomes in 'cuesOutcomes'; a warning message is output if potential accidental 'NA' strings (resulting from 'as.character(NA)') are detected among the input Cues and Outcomes in 'cuesOutcomes'; code modified internally so that Cues and Outcomes in 'cuesOutcomes' are coerced to character strings, in case they might be input as factors (which may happen by default when reading data with 'read.table' or 'read.csv' into R from an external plain text or CSV file as a data frame); global option "ndl.estimateWeights" specified as TRUE so that the auxiliary functions 'cooccurrenceCues' and 'cooccurrenceCuesOutcomes' will not rewrite the input text file: 'ndl_410025912.txt'; after successful execution, "ndl.estimateWeights" again specified as NULL. * ndlClassify.R (ndlClassify): code modified internally so that character length of response (Outcome) variable in 'formula' argument is not restricted. * ndlClassify.R (print.ndlClassify): code modified so that setting 'max.print=NA' will print the entire 'weightMatrix' matrix. * ndlCrossvalidate.R (ndlCrossvalidate): code modified internally so that the checks of the 'formula' and 'frequency' arguments do not produce superfluous warnings. * ndlCuesOutcomes.R (ndlCuesOutcomes): code modified internally so that character length of response (Outcome) variable in 'formula' argument is not restricted. * predict.ndlClassify.R (predict.ndlClassify): new function that allows for the estimation of activation-based probabilities and/or the prediction of Outcomes with new, unseen data, using the weights in 'weightMatrix' in an object previously fitted with 'ndlClassify'. * summary.ndlClassify.R (summary.ndlClassify): code modified so that setting 'max.print=NA' will print out entire 'weights' matrix. * Documents: Manual page contents modified to correspond to changes in R code; references in manual pages updated.