Title: Machine Learning in R
Version: 2.19.2
Description: Interface to a large number of classification and regression techniques, including machine-readable parameter descriptions. There is also an experimental extension for survival analysis, clustering and general, example-specific cost-sensitive learning. Generic resampling, including cross-validation, bootstrapping and subsampling. Hyperparameter tuning with modern optimization techniques, for single- and multi-objective problems. Filter and wrapper methods for feature selection. Extension of basic learners with additional operations common in machine learning, also allowing for easy nested resampling. Most operations can be parallelized.
License: BSD_2_clause + file LICENSE
URL: https://mlr.mlr-org.com, https://github.com/mlr-org/mlr
BugReports: https://github.com/mlr-org/mlr/issues
Depends: ParamHelpers (≥ 1.10), R (≥ 3.0.2)
Imports: backports (≥ 1.1.0), BBmisc (≥ 1.11), checkmate (≥ 1.8.2), data.table (≥ 1.12.4), ggplot2, methods, parallelMap (≥ 1.3), stats, stringi, survival, utils, XML
Suggests: ada, adabag, batchtools, bit64, brnn, bst, C50, care, caret (≥ 6.0-57), class, clue, cluster, ClusterR, clusterSim (≥ 0.44-5), cmaes, cowplot, crs, Cubist, deepnet, DiceKriging, e1071, earth, elasticnet, emoa, evtree, fda.usc, FDboost, FNN, forecast (≥ 8.3), fpc, frbs, FSelector, FSelectorRcpp (≥ 0.3.5), gbm, GenSA, ggpubr, glmnet, GPfit, h2o (≥ 3.6.0.8), Hmisc, irace (≥ 2.0), kernlab, kknn, klaR, knitr, laGP, LiblineaR, lintr (≥ 1.0.0.9001), MASS, mboost, mco, mda, memoise, mlbench, mldr, mlrMBO, modeltools, mRMRe, neuralnet, nnet, numDeriv, pamr, pander, party, pec, penalized (≥ 0.9-47), pls, PMCMRplus, praznik (≥ 5.0.0), randomForest, ranger (≥ 0.8.0), rappdirs, refund, rex, rFerns, rgenoud, rmarkdown, Rmpi, ROCR, rotationForest, rpart, RRF, rsm, RSNNS, rucrdtw, RWeka, sda, sf, smoof, sparseLDA, stepPlr, survAUC, svglite, testthat, tgp, TH.data, tidyr, tsfeatures, vdiffr, wavelets, xgboost (≥ 0.7)
VignetteBuilder: knitr
ByteCompile: yes
Config/testthat/edition: 3
Config/testthat/parallel: true
Config/testthat/start-first: featsel_plotFilterValues, base_plotResiduals, base_generateHyperParsEffect, tune_tuneIrace, featsel_filters, learners_all*, regr_h2ogbm
Encoding: UTF-8
LazyData: yes
RoxygenNote: 7.3.1
SystemRequirements: gdal (optional), geos (optional), proj (optional), udunits (optional), gsl (optional), gmp (optional), glu (optional), jags (optional), mpfr (optional), openmpi (optional)
NeedsCompilation: yes
Packaged: 2024-06-11 22:42:57 UTC; user
Author: Bernd Bischl
Maintainer: Martin Binder <mlr.developer@mb706.com>
Repository: CRAN
Date/Publication: 2024-06-12 10:50:02 UTC
mlr: Machine Learning in R
Description
Interface to a large number of classification and regression techniques, including machine-readable parameter descriptions. There is also an experimental extension for survival analysis, clustering and general, example-specific cost-sensitive learning. Generic resampling, including cross-validation, bootstrapping and subsampling. Hyperparameter tuning with modern optimization techniques, for single- and multi-objective problems. Filter and wrapper methods for feature selection. Extension of basic learners with additional operations common in machine learning, also allowing for easy nested resampling. Most operations can be parallelized.
Author(s)
Maintainer: Martin Binder mlr.developer@mb706.com
Authors:
Bernd Bischl bernd_bischl@gmx.net (ORCID)
Michel Lang michellang@gmail.com (ORCID)
Lars Kotthoff larsko@uwyo.edu
Patrick Schratz patrick.schratz@gmail.com (ORCID)
Julia Schiffner schiffner@math.uni-duesseldorf.de
Jakob Richter code@jakob-r.de
Zachary Jones zmj@zmjones.com
Giuseppe Casalicchio giuseppe.casalicchio@stat.uni-muenchen.de (ORCID)
Mason Gallo masonagallo@gmail.com
Other contributors:
Jakob Bossek jakob.bossek@tu-dortmund.de (ORCID) [contributor]
Erich Studerus erich.studerus@upkbs.ch (ORCID) [contributor]
Leonard Judt leonard.judt@tu-dortmund.de [contributor]
Tobias Kuehn tobi.kuehn@gmx.de [contributor]
Pascal Kerschke kerschke@uni-muenster.de (ORCID) [contributor]
Florian Fendt flo_fendt@gmx.de [contributor]
Philipp Probst philipp_probst@gmx.de (ORCID) [contributor]
Xudong Sun xudong.sun@stat.uni-muenchen.de (ORCID) [contributor]
Janek Thomas janek.thomas@stat.uni-muenchen.de (ORCID) [contributor]
Bruno Vieira bruno.hebling.vieira@usp.br [contributor]
Laura Beggel laura.beggel@web.de (ORCID) [contributor]
Quay Au quay.au@stat.uni-muenchen.de (ORCID) [contributor]
Florian Pfisterer pfistererf@googlemail.com [contributor]
Stefan Coors stefan.coors@gmx.net [contributor]
Steve Bronder sab2287@columbia.edu [contributor]
Alexander Engelhardt alexander.w.engelhardt@gmail.com [contributor]
Christoph Molnar christoph.molnar@stat.uni-muenchen.de [contributor]
Annette Spooner a.spooner@unsw.edu.au [contributor]
See Also
Useful links:
Report bugs at https://github.com/mlr-org/mlr/issues
Aggregation object.
Description
An aggregation method reduces the performance values of the test (and possibly the training sets) to a single value. To see all possible implemented aggregations look at aggregations.
The aggregation can access all relevant information of the result after resampling and combine it into a single value, though usually something very simple, like taking the mean of the test-set performances, is done.
Object members:
- id (character(1)): Name of the aggregation method.
- name (character(1)): Long name of the aggregation method.
- properties (character): Properties of the aggregation.
- fun (function(task, perf.test, perf.train, measure, group, pred)): Aggregation function.
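For illustration, a minimal hedged sketch of a custom aggregation built with makeAggregation; the id "my.test.median" and the properties value are illustrative choices, not defaults:
# A custom aggregation that takes the median of the test-set performances.
library(mlr)
my.test.median = makeAggregation(
  id = "my.test.median",
  name = "Median of test-set performances",
  properties = "req.test",
  fun = function(task, perf.test, perf.train, measure, group, pred) {
    median(perf.test)
  }
)
ms = setAggregation(mmce, my.test.median)  # attach it to a measure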
See Also
BenchmarkResult object.
Description
Result of a benchmark experiment conducted by benchmark with the following members:
- results (list of ResampleResult): A nested list of resample results, first ordered by task id, then by learner id.
- measures (list of Measure): The performance measures used in the benchmark experiment.
- learners (list of Learner): The learning algorithms compared in the benchmark experiment.
The print method of this object shows aggregated performance values for all tasks and learners.
It is recommended to retrieve required information via the getBMR* getter functions.
You can also convert the object using as.data.frame.
See Also
Other benchmark: batchmark(), benchmark(), convertBMRToRankMatrix(), friedmanPostHocTestBMR(), friedmanTestBMR(), generateCritDifferencesData(), getBMRAggrPerformances(), getBMRFeatSelResults(), getBMRFilteredFeatures(), getBMRLearnerIds(), getBMRLearnerShortNames(), getBMRLearners(), getBMRMeasureIds(), getBMRMeasures(), getBMRModels(), getBMRPerformances(), getBMRPredictions(), getBMRTaskDescs(), getBMRTaskIds(), getBMRTuneResults(), plotBMRBoxplots(), plotBMRRanksAsBarChart(), plotBMRSummary(), plotCritDifferences(), reduceBatchmarkResults()
Confusion matrix
Description
The result of calculateConfusionMatrix.
Object members:
- result (matrix): Confusion matrix of absolute values and marginals. Can also contain row and column sums of observations.
- task.desc (TaskDesc): Additional information about the task.
- sums (logical(1)): Flag if marginal sums of observations are calculated.
- relative (logical(1)): Flag if the relative confusion matrices are calculated.
- relative.row (matrix): Confusion matrix of relative values and marginals normalized by row.
- relative.col (matrix): Confusion matrix of relative values and marginals normalized by column.
- relative.error (numeric(1)): Relative error overall.
See Also
Other performance: calculateConfusionMatrix(), calculateROCMeasures(), estimateRelativeOverfitting(), makeCostMeasure(), makeCustomResampledMeasure(), makeMeasure(), measures, performance(), setAggregation(), setMeasurePars()
Failure model.
Description
A subclass of WrappedModel. It is created, if you set the respective option in configureMlr, when a model internally crashes during training. The model always predicts NAs.
If the mlr option on.error.dump is TRUE, the FailureModel contains the debug trace of the error. It can be accessed with getFailureModelDump and inspected with debugger.
Its encapsulated learner.model is simply a string: the error message that was generated when the model crashed.
The following code shows how to access the message.
See Also
Other debug: ResampleResult, getPredictionDump(), getRRDump()
Examples
configureMlr(on.learner.error = "warn")
data = iris
data$newfeat = 1 # will make LDA crash
task = makeClassifTask(data = data, target = "Species")
m = train("classif.lda", task) # LDA crashed, but mlr catches this
print(m)
print(m$learner.model) # the error message
p = predict(m, task) # this will predict NAs
print(p)
print(performance(p))
configureMlr(on.learner.error = "stop")
Create control structures for feature selection.
Description
Feature selection method used by selectFeatures.
The methods used here follow a wrapper approach, described in
Kohavi and John (1997) (see references).
The following optimization algorithms are available:
- FeatSelControlExhaustive: Exhaustive search. All feature sets (up to a certain number of features, max.features) are searched.
- FeatSelControlRandom: Random search. Feature vectors are randomly drawn, up to a certain number of features (max.features). A feature is included in the current set with probability prob. So we are basically drawing (0,1)-membership-vectors, where each element is Bernoulli(prob) distributed.
- FeatSelControlSequential: Deterministic forward or backward search, that means extending (forward) or shrinking (backward) a feature set. Depending on the given method, different approaches are taken.
  - sfs: Sequential Forward Search. Starting from an empty model, in each step the feature increasing the performance measure the most is added to the model.
  - sbs: Sequential Backward Search. Starting from a model with all features, in each step the feature decreasing the performance measure the least is removed from the model.
  - sffs: Sequential Floating Forward Search. Starting from an empty model, in each step the algorithm chooses the best model from all models with one additional feature and from all models with one feature less.
  - sfbs: Sequential Floating Backward Search. Similar to sffs but starting with a full model.
- FeatSelControlGA: Search via genetic algorithm. The GA is a simple (mu, lambda) or (mu + lambda) algorithm, depending on the comma setting. A comma strategy selects a new population of size mu out of the lambda > mu offspring. A plus strategy uses the joint pool of mu parents and lambda offspring for selecting mu new candidates. Out of those mu features, the new lambda features are generated by randomly choosing pairs of parents. These are crossed over and crossover.rate represents the probability of choosing a feature from the first parent instead of the second parent. The resulting offspring is mutated, i.e., its bits are flipped with probability mutation.rate. If max.features is set, offspring are repeatedly generated until the setting is satisfied.
Usage
makeFeatSelControlExhaustive(
same.resampling.instance = TRUE,
maxit = NA_integer_,
max.features = NA_integer_,
tune.threshold = FALSE,
tune.threshold.args = list(),
log.fun = "default"
)
makeFeatSelControlGA(
same.resampling.instance = TRUE,
impute.val = NULL,
maxit = NA_integer_,
max.features = NA_integer_,
comma = FALSE,
mu = 10L,
lambda,
crossover.rate = 0.5,
mutation.rate = 0.05,
tune.threshold = FALSE,
tune.threshold.args = list(),
log.fun = "default"
)
makeFeatSelControlRandom(
same.resampling.instance = TRUE,
maxit = 100L,
max.features = NA_integer_,
prob = 0.5,
tune.threshold = FALSE,
tune.threshold.args = list(),
log.fun = "default"
)
makeFeatSelControlSequential(
same.resampling.instance = TRUE,
impute.val = NULL,
method,
alpha = 0.01,
beta = -0.001,
maxit = NA_integer_,
max.features = NA_integer_,
tune.threshold = FALSE,
tune.threshold.args = list(),
log.fun = "default"
)
Arguments
same.resampling.instance |
( |
maxit |
( |
max.features |
( |
tune.threshold |
( |
tune.threshold.args |
(list) |
log.fun |
( |
impute.val |
(numeric) |
comma |
( |
mu |
( |
lambda |
( |
crossover.rate |
( |
mutation.rate |
( |
prob |
( |
method |
( |
alpha |
( |
beta |
( |
Value
(FeatSelControl). The specific subclass is one of FeatSelControlExhaustive, FeatSelControlRandom, FeatSelControlSequential, FeatSelControlGA.
References
Ron Kohavi and George H. John,
Wrappers for feature subset selection, Artificial Intelligence Volume 97, 1997, 273-324.
http://ai.stanford.edu/~ronnyk/wrappersPrint.pdf.
See Also
Other featsel: analyzeFeatSelResult(), getFeatSelResult(), makeFeatSelWrapper(), selectFeatures()
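A short usage sketch (iris-based; the budget and feature limit are illustrative, not defaults):
library(mlr)
task = makeClassifTask(data = iris, target = "Species")
ctrl = makeFeatSelControlRandom(maxit = 5L, max.features = 3L)
rdesc = makeResampleDesc("CV", iters = 2L)
res = selectFeatures("classif.lda", task, rdesc, control = ctrl)
print(res)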
Result of feature selection.
Description
Container for results of feature selection. Contains the obtained features, their performance values and the optimization path which led there. You can visualize it using analyzeFeatSelResult.
Details
Object members:
- learner (Learner): Learner that was optimized.
- control (FeatSelControl): Control object from feature selection.
- x (character): Vector of feature names identified as optimal.
- y (numeric): Performance values for optimal x.
- threshold (numeric): Vector of finally found and used thresholds if tune.threshold was enabled in FeatSelControl, otherwise not present and hence NULL.
- opt.path (ParamHelpers::OptPath): Optimization path which led to x.
Query properties of learners.
Description
Properties can be accessed with getLearnerProperties(learner), which returns a character vector.
The learner properties are defined as follows:
- numerics, factors, ordered: Can numeric, factor or ordered factor features be handled?
- functionals: Can an arbitrary number of functional features be handled?
- single.functional: Can exactly one functional feature be handled?
- missings: Can missing values in features be handled?
- weights: Can observations be weighted during fitting?
- oneclass, twoclass, multiclass: Only for classif: Can one-class, two-class or multi-class classification problems be handled?
- class.weights: Only for classif: Can class weights be handled?
- rcens, lcens, icens: Only for surv: Can right, left, or interval censored data be handled?
- prob: For classif, cluster, multilabel, surv: Can probabilities be predicted?
- se: Only for regr: Can standard errors be predicted?
- oobpreds: Only for classif, regr and surv: Can out-of-bag predictions be extracted from the trained model?
- featimp: For classif, regr, surv: Does the model support extracting information on feature importance?
Usage
getLearnerProperties(learner)
hasLearnerProperties(learner, props)
Arguments
learner |
(Learner | |
props |
(character) |
Value
getLearnerProperties returns a character vector with learner properties.
hasLearnerProperties returns a logical vector of the same length as props.
See Also
Other learner: getClassWeightParam(), getHyperPars(), getLearnerId(), getLearnerNote(), getLearnerPackages(), getLearnerParVals(), getLearnerParamSet(), getLearnerPredictType(), getLearnerShortName(), getLearnerType(), getParamSet(), helpLearner(), helpLearnerParam(), makeLearner(), makeLearners(), removeHyperPars(), setHyperPars(), setId(), setLearnerId(), setPredictThreshold(), setPredictType()
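A brief sketch (output depends on the chosen learner):
library(mlr)
lrn = makeLearner("classif.rpart")
getLearnerProperties(lrn)                       # all properties as a character vector
hasLearnerProperties(lrn, c("prob", "missings")) # logical vector, one entry per property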
Query properties of measures.
Description
Properties can be accessed with getMeasureProperties(measure), which returns a character vector.
The measure properties are defined in Measure.
Usage
getMeasureProperties(measure)
hasMeasureProperties(measure, props)
Arguments
measure |
(Measure) |
props |
(character) |
Value
getMeasureProperties returns a character vector with measure properties.
hasMeasureProperties returns a logical vector of the same length as props.
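A brief sketch, assuming mlr is attached and using the built-in mmce measure:
library(mlr)
getMeasureProperties(mmce)                       # e.g. "classif", "classif.multi", ...
hasMeasureProperties(mmce, c("classif", "regr")) # TRUE FALSE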
Prediction object.
Description
Result from predict.WrappedModel.
Use as.data.frame to access all information in a convenient format. The function getPredictionProbabilities is useful to access predicted probabilities.
The data member of the object always contains the following columns: id, index numbers of predicted cases from the task; response, either a numeric or a factor, the predicted response values; truth, either a numeric or a factor, the true target values. If probabilities were predicted, there are as many numeric columns as there were classes, named prob.classname. If standard errors were predicted, a numeric column named se.
The constructor makePrediction is mainly for internal use.
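A short sketch of typical access patterns, using the built-in iris.task:
library(mlr)
lrn = makeLearner("classif.lda", predict.type = "prob")
mod = train(lrn, iris.task)
pred = predict(mod, iris.task)
head(as.data.frame(pred))                 # id, truth, prob.* and response columns
head(getPredictionProbabilities(pred))    # predicted class probabilities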
Object members:
- predict.type (character(1)): Type set in setPredictType.
- data (data.frame): See details.
- threshold (numeric(1)): Threshold set in predict function.
- task.desc (TaskDesc): Task description object.
- time (numeric(1)): Time the learner needed to generate predictions.
- error (character(1)): Any error messages generated by the learner (default NA_character_).
Internal, do not use!
Usage
makePrediction(
task.desc,
row.names,
id,
truth,
predict.type,
predict.threshold = NULL,
y,
time,
error = NA_character_,
dump = NULL
)
Internal construction / wrapping of learner object.
Description
Wraps an already implemented learning method from R to make it accessible to mlr. Call this method in your constructor. You have to pass an id (name), the required package(s), a description object for all changeable parameters (you do not have to do this for the learner to work, but it is strongly recommended), and use property tags to define features of the learner.
For a general overview on how to integrate a learning algorithm into mlr's system, please read the section in the online tutorial: https://mlr.mlr-org.com/articles/tutorial/create_learner.html
To see all possible properties of a learner, go to: LearnerProperties.
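As a hedged sketch of the pattern, here is a constructor for a purely illustrative learner "classif.mock" (the class name, package and parameter are made up; real integrations also need trainLearner and predictLearner methods):
makeRLearner.classif.mock = function() {
  makeRLearnerClassif(
    cl = "classif.mock",        # illustrative class name, not part of mlr
    package = "stats",          # required package(s)
    par.set = ParamHelpers::makeParamSet(
      ParamHelpers::makeNumericLearnerParam(id = "tol", default = 1e-6, lower = 0)
    ),
    properties = c("twoclass", "multiclass", "numerics", "factors"),
    name = "Mock Classifier",
    short.name = "mock",
    note = "Illustrative sketch only."
  )
}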
Usage
makeRLearner()
makeRLearnerClassif(
cl,
package,
par.set,
par.vals = list(),
properties = character(0L),
name = cl,
short.name = cl,
note = "",
class.weights.param = NULL,
callees = character(0L)
)
makeRLearnerMultilabel(
cl,
package,
par.set,
par.vals = list(),
properties = character(0L),
name = cl,
short.name = cl,
note = "",
callees = character(0L)
)
makeRLearnerRegr(
cl,
package,
par.set,
par.vals = list(),
properties = character(0L),
name = cl,
short.name = cl,
note = "",
callees = character(0L)
)
makeRLearnerSurv(
cl,
package,
par.set,
par.vals = list(),
properties = character(0L),
name = cl,
short.name = cl,
note = "",
callees = character(0L)
)
makeRLearnerCluster(
cl,
package,
par.set,
par.vals = list(),
properties = character(0L),
name = cl,
short.name = cl,
note = "",
callees = character(0L)
)
makeRLearnerCostSens(
cl,
package,
par.set,
par.vals = list(),
properties = character(0L),
name = cl,
short.name = cl,
note = "",
callees = character(0L)
)
Arguments
cl |
( |
package |
(character) |
par.set |
(ParamHelpers::ParamSet) |
par.vals |
(list) |
properties |
(character) |
name |
( |
short.name |
( |
note |
( |
class.weights.param |
( |
callees |
(character) |
Value
(RLearner). The specific subclass is one of RLearnerClassif, RLearnerCluster, RLearnerMultilabel, RLearnerRegr, RLearnerSurv.
Prediction from resampling.
Description
Contains predictions from resampling, returned (among other stuff) by function resample. Can basically be used in the same way as Prediction, its super class.
The main differences are: (a) The internal data.frame (member data) contains an additional column iter, specifying the iteration of the resampling strategy, and an additional column set, specifying whether the prediction was from an observation in the “train” or “test” set. (b) The prediction time is a numeric vector, its length equals the number of iterations.
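A brief sketch:
library(mlr)
r = resample("classif.rpart", iris.task, makeResampleDesc("CV", iters = 2L))
head(as.data.frame(r$pred))  # note the additional iter and set columns
r$pred$time                  # one timing entry per resampling iteration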
See Also
Other resample: ResampleResult, addRRMeasure(), getRRPredictionList(), getRRPredictions(), getRRTaskDesc(), getRRTaskDescription(), makeResampleDesc(), makeResampleInstance(), resample()
ResampleResult object.
Description
A container for resample results.
Details
Resample Result:
A resample result is created by resample and contains the following object members:
- task.id (character(1)): Name of the Task.
- learner.id (character(1)): Name of the Learner.
- measures.test (data.frame): Gives you access to performance measurements on the individual test sets. Rows correspond to sets in resampling iterations, columns to performance measures.
- measures.train (data.frame): Gives you access to performance measurements on the individual training sets. Rows correspond to sets in resampling iterations, columns to performance measures. Usually not available, only if specifically requested, see general description above.
- aggr (numeric): Named vector of aggregated performance values. Names are coded like this: <measure>.<aggregation>.
- err.msgs (data.frame): Number of rows equals resampling iterations and columns are: iter, train, predict. Stores error messages generated during train or predict, if these were caught via configureMlr.
- err.dumps (list of list of dump.frames): List with length equal to the number of resampling iterations. Contains lists of dump.frames objects that can be fed to debugger() to inspect error dumps generated on learner errors. One iteration can generate more than one error dump, depending on which of the training, prediction-on-training-set, or prediction-on-test-set operations fail. Therefore the lists have named slots $train, $predict.train, or $predict.test if relevant. The error dumps are only saved when option on.error.dump is TRUE.
- pred (ResamplePrediction): Container for all predictions during resampling.
- models (list of WrappedModel): List of fitted models or NULL.
- extract (list): List of extracted parts from fitted models or NULL.
- runtime (numeric(1)): Time in seconds it took to execute the resampling.
The print method of this object gives a short overview, including task and learner ids, aggregated measures and runtime for the resampling.
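A brief usage sketch (built-in iris.task; measures chosen for illustration):
library(mlr)
rdesc = makeResampleDesc("CV", iters = 3L)
r = resample("classif.rpart", iris.task, rdesc, measures = list(mmce, acc))
r$aggr           # named vector, e.g. mmce.test.mean, acc.test.mean
r$measures.test  # per-iteration test-set performances
r$runtime        # seconds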
See Also
Other resample: ResamplePrediction, addRRMeasure(), getRRPredictionList(), getRRPredictions(), getRRTaskDesc(), getRRTaskDescription(), makeResampleDesc(), makeResampleInstance(), resample()
Other debug: FailureModel, getPredictionDump(), getRRDump()
Create a classification, regression, survival, cluster, cost-sensitive classification or multilabel task.
Description
The task encapsulates the data and specifies - through its subclasses - the type of the task. It also contains a description object detailing further aspects of the data.
Useful operators are:
Object members:
- env (environment): Environment where data for the task are stored. Use getTaskData in order to access it.
- weights (numeric): See argument. NULL if not present.
- blocking (factor): See argument. NULL if not present.
- task.desc (TaskDesc): Encapsulates further information about the task.
Functional data can be added to a task via matrix columns. For more information refer to makeFunctionalData.
Arguments
id |
( |
data |
(data.frame) |
target |
( |
costs |
(data.frame) |
weights |
(numeric) |
blocking |
(factor) |
positive |
( |
fixup.data |
( |
check.data |
( |
coordinates |
(data.frame) |
Value
Task.
See Also
ClassifTask ClusterTask CostSensTask MultilabelTask RegrTask SurvTask
Examples
if (requireNamespace("mlbench")) {
library(mlbench)
data(BostonHousing)
data(Ionosphere)
makeClassifTask(data = iris, target = "Species")
makeRegrTask(data = BostonHousing, target = "medv")
# an example of a classification task with more than those standard arguments:
blocking = factor(c(rep(1, 51), rep(2, 300)))
makeClassifTask(id = "myIonosphere", data = Ionosphere, target = "Class",
positive = "good", blocking = blocking)
makeClusterTask(data = iris[, -5L])
}
Description object for task.
Description
Description object for task, encapsulates basic properties of the task without having to store the complete data set.
Details
Object members:
- id (character(1)): Id string of task.
- type (character(1)): Type of task, “classif” for classification, “regr” for regression, “surv” for survival and “cluster” for cluster analysis, “costsens” for cost-sensitive classification, and “multilabel” for multilabel classification.
- target (character(0) | character(1) | character(2) | character(n.classes)): Name(s) of the target variable(s). For “surv” these are the names of the survival time and event columns, so it has length 2. For “costsens” it has length 0, as there is no target column, but a cost matrix instead. For “multilabel” these are the names of logical columns that indicate whether a class label is present; the number of target variables corresponds to the number of classes.
- size (integer(1)): Number of cases in data set.
- n.feat (integer(2)): Number of features, named vector with entries: “numerics”, “factors”, “ordered”, “functionals”.
- has.missings (logical(1)): Are missing values present?
- has.weights (logical(1)): Are weights specified for each observation?
- has.blocking (logical(1)): Is a blocking factor for cases available in the task?
- class.levels (character): All possible classes. Only present for “classif”, “costsens”, and “multilabel”.
- positive (character(1)): Positive class label for binary classification. Only present for “classif”, NA for multiclass.
- negative (character(1)): Negative class label for binary classification. Only present for “classif”, NA for multiclass.
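A brief sketch of inspecting these members via getTaskDesc:
library(mlr)
td = getTaskDesc(iris.task)
td$type          # "classif"
td$size          # 150
td$n.feat        # counts of numerics, factors, ordered, functionals
td$class.levels  # "setosa" "versicolor" "virginica"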
Control object for tuning
Description
General tune control object.
Arguments
same.resampling.instance |
( |
impute.val |
(numeric) |
start |
(list) |
tune.threshold |
( |
tune.threshold.args |
(list) |
log.fun |
( |
final.dw.perc |
( |
... |
(any) |
See Also
Other tune: getNestedTuneResultsOptPathDf(), getNestedTuneResultsX(), getResamplingIndices(), getTuneResult(), makeModelMultiplexer(), makeModelMultiplexerParamSet(), makeTuneControlCMAES(), makeTuneControlDesign(), makeTuneControlGenSA(), makeTuneControlGrid(), makeTuneControlIrace(), makeTuneControlMBO(), makeTuneControlRandom(), makeTuneWrapper(), tuneParams(), tuneThreshold()
Create control structures for multi-criteria tuning.
Description
The following tuners are available:
- makeTuneMultiCritControlGrid: Grid search. All kinds of parameter types can be handled. You can either use their correct param type and resolution, or discretize them yourself by always using ParamHelpers::makeDiscreteParam in the par.set passed to tuneParams.
- makeTuneMultiCritControlRandom: Random search. All kinds of parameter types can be handled.
- makeTuneMultiCritControlNSGA2: Evolutionary method mco::nsga2. Can handle numeric(vector) and integer(vector) hyperparameters, but no dependencies. For integers the internally proposed numeric values are automatically rounded.
- makeTuneMultiCritControlMBO: Model-based / Bayesian optimization. All kinds of parameter types can be handled.
Usage
makeTuneMultiCritControlGrid(
same.resampling.instance = TRUE,
resolution = 10L,
log.fun = "default",
final.dw.perc = NULL,
budget = NULL
)
makeTuneMultiCritControlMBO(
n.objectives = mbo.control$n.objectives,
same.resampling.instance = TRUE,
impute.val = NULL,
learner = NULL,
mbo.control = NULL,
tune.threshold = FALSE,
tune.threshold.args = list(),
continue = FALSE,
log.fun = "default",
final.dw.perc = NULL,
budget = NULL,
mbo.design = NULL
)
makeTuneMultiCritControlNSGA2(
same.resampling.instance = TRUE,
impute.val = NULL,
log.fun = "default",
final.dw.perc = NULL,
budget = NULL,
...
)
makeTuneMultiCritControlRandom(
same.resampling.instance = TRUE,
maxit = 100L,
log.fun = "default",
final.dw.perc = NULL,
budget = NULL
)
Arguments
same.resampling.instance |
( |
resolution |
(integer) |
log.fun |
( |
final.dw.perc |
( |
budget |
( |
n.objectives |
( |
impute.val |
(numeric) |
learner |
(Learner | |
mbo.control |
(mlrMBO::MBOControl | |
tune.threshold |
( |
tune.threshold.args |
(list) |
continue |
( |
mbo.design |
(data.frame | |
... |
(any) |
maxit |
( |
Value
(TuneMultiCritControl). The specific subclass is one of TuneMultiCritControlGrid, TuneMultiCritControlRandom, TuneMultiCritControlNSGA2, TuneMultiCritControlMBO.
See Also
Other tune_multicrit: plotTuneMultiCritResult(), tuneParamsMultiCrit()
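A hedged usage sketch (sonar.task with rpart; the parameter range and budget are illustrative):
library(mlr)
ps = makeParamSet(makeIntegerParam("minsplit", lower = 2L, upper = 20L))
ctrl = makeTuneMultiCritControlRandom(maxit = 10L)
rdesc = makeResampleDesc("Holdout")
res = tuneParamsMultiCrit("classif.rpart", sonar.task, rdesc,
  measures = list(tpr, fpr), par.set = ps, control = ctrl)
print(res)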
Result of multi-criteria tuning.
Description
Container for results of hyperparameter tuning. Contains the obtained Pareto set and front and the optimization path which led there.
Object members:
- learner (Learner): Learner that was optimized.
- control (TuneControl): Control object from tuning.
- x (list): List of lists of non-dominated hyperparameter settings in the Pareto set. Note that when you have trafos on some of your params, x will always be on the TRANSFORMED scale, so you can use it directly.
- y (matrix): Pareto front for x.
- threshold: Currently NULL.
- opt.path (ParamHelpers::OptPath): Optimization path which led to x. Note that when you have trafos on some of your params, the opt.path always contains the UNTRANSFORMED values on the original scale. You can simply call trafoOptPath(opt.path) to transform them, or as.data.frame(trafoOptPath(opt.path)).
- ind (integer(n)): Indices of Pareto optimal params in opt.path.
- measures ((list of) Measure): Performance measures.
Result of tuning.
Description
Container for results of hyperparameter tuning. Contains the obtained point in search space, its performance values and the optimization path which led there.
Object members:
- learner (Learner): Learner that was optimized.
- control (TuneControl): Control object from tuning.
- x (list): Named list of hyperparameter values identified as optimal. Note that when you have trafos on some of your params, x will always be on the TRANSFORMED scale, so you can use it directly.
- y (numeric): Performance values for optimal x.
- threshold (numeric): Vector of finally found and used thresholds if tune.threshold was enabled in TuneControl, otherwise not present and hence NULL.
- opt.path (ParamHelpers::OptPath): Optimization path which led to x. Note that when you have trafos on some of your params, the opt.path always contains the UNTRANSFORMED values on the original scale. You can simply call trafoOptPath(opt.path) to transform them, or as.data.frame(trafoOptPath(opt.path)). If mlr option on.error.dump is TRUE, OptPath will have a .dump object in its extra column which contains error dump traces from failed optimization evaluations. It can be accessed by getOptPathEl(opt.path)$extra$.dump.
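A brief sketch showing these members, with a trafo on rpart's cp (range and budget are illustrative):
library(mlr)
ps = makeParamSet(
  makeNumericParam("cp", lower = -4, upper = -1, trafo = function(x) 10^x)
)
ctrl = makeTuneControlRandom(maxit = 5L)
rdesc = makeResampleDesc("CV", iters = 2L)
res = tuneParams("classif.rpart", iris.task, rdesc, par.set = ps, control = ctrl)
res$x  # on the transformed scale
res$y  # e.g. mmce.test.mean
head(as.data.frame(trafoOptPath(res$opt.path)))  # opt.path on the transformed scale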
Compute new measures for existing ResampleResult
Description
Adds new measures to an existing ResampleResult.
Usage
addRRMeasure(res, measures)
Arguments
res |
(ResampleResult) |
measures |
(Measure | list of Measure) |
Value
(ResampleResult).
See Also
Other resample: ResamplePrediction, ResampleResult, getRRPredictionList(), getRRPredictions(), getRRTaskDesc(), getRRTaskDescription(), makeResampleDesc(), makeResampleInstance(), resample()
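A brief sketch:
library(mlr)
r = resample("classif.rpart", iris.task, makeResampleDesc("CV", iters = 2L),
  measures = mmce)
r2 = addRRMeasure(r, list(acc, ber))
r2$aggr  # now also contains acc.test.mean and ber.test.mean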
Aggregation methods.
Description
- test.mean: Mean of performance values on test sets.
- test.sd: Standard deviation of performance values on test sets.
- test.median: Median of performance values on test sets.
- test.min: Minimum of performance values on test sets.
- test.max: Maximum of performance values on test sets.
- test.sum: Sum of performance values on test sets.
- train.mean: Mean of performance values on training sets.
- train.sd: Standard deviation of performance values on training sets.
- train.median: Median of performance values on training sets.
- train.min: Minimum of performance values on training sets.
- train.max: Maximum of performance values on training sets.
- train.sum: Sum of performance values on training sets.
- b632: Aggregation for B632 bootstrap.
- b632plus: Aggregation for B632+ bootstrap.
- testgroup.mean: Performance values on test sets are grouped according to resampling method. The mean for every group is calculated, then the mean of those means. Mainly used for repeated CV.
- testgroup.sd: Similar to testgroup.mean; after the mean for every group is calculated, the standard deviation of those means is obtained. Mainly used for repeated CV.
- test.join: Performance measure on joined test sets. This is especially useful for small sample sizes, where unbalanced group sizes have a significant impact on the aggregation; for cross-validation in particular, test.join may make sense. For repeated CV, the performance is calculated on each repetition and then aggregated with the arithmetic mean.
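A brief sketch of switching a measure's aggregation via setAggregation:
library(mlr)
mmce.median = setAggregation(mmce, test.median)
r = resample("classif.rpart", iris.task, makeResampleDesc("CV", iters = 3L),
  measures = mmce.median)
r$aggr  # named mmce.test.median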
See Also
European Union Agricultural Workforces clustering task.
Description
Contains the task (agri.task).
References
See cluster::agriculture.
Show and visualize the steps of feature selection.
Description
This function prints the steps selectFeatures took to find its optimal set of features and the reason why it stopped. It can also print information about all calculations done in each intermediate step.
Currently only implemented for sequential feature selection.
Usage
analyzeFeatSelResult(res, reduce = TRUE)
Arguments
res |
(FeatSelResult) |
reduce |
( |
Value
(invisible(NULL)).
See Also
Other featsel:
FeatSelControl
,
getFeatSelResult()
,
makeFeatSelWrapper()
,
selectFeatures()
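A brief usage sketch (sequential forward search; the alpha value is illustrative):
library(mlr)
ctrl = makeFeatSelControlSequential(method = "sfs", alpha = 0.02)
rdesc = makeResampleDesc("Holdout")
res = selectFeatures("classif.rpart", iris.task, rdesc, control = ctrl)
analyzeFeatSelResult(res)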
Converts predictions to a format package ROCR can handle.
Description
Converts predictions to a format package ROCR can handle.
Usage
asROCRPrediction(pred)
Arguments
pred |
(Prediction) |
See Also
Other roc: calculateROCMeasures()
Other predict: getPredictionProbabilities(), getPredictionResponse(), getPredictionTaskDesc(), predict.WrappedModel(), setPredictThreshold(), setPredictType()
Run machine learning benchmarks as distributed experiments.
Description
This function is a very parallel version of benchmark using batchtools. Experiments are created in the provided registry for each combination of learners, tasks and resamplings. The experiments are then stored in a registry and the runs can be started via batchtools::submitJobs. A job is one train/test split of the outer resampling. In case of nested resampling (e.g. with makeTuneWrapper), each job is a full run of inner resampling, which can be parallelized in a second step with parallelMap.
For details on the usage and supported backends have a look at the batchtools tutorial page: https://github.com/mllg/batchtools.
The general workflow with batchmark looks like this:
1. Create an ExperimentRegistry using batchtools::makeExperimentRegistry.
2. Call batchmark(...) which defines jobs for all learners and tasks in a base::expand.grid fashion.
3. Submit jobs using batchtools::submitJobs.
4. Babysit the computation, wait for all jobs to finish using batchtools::waitForJobs.
5. Call reduceBatchmarkResults() to reduce results into a BenchmarkResult.
If you want to use this with OpenML datasets you can generate tasks from a vector of dataset IDs easily with tasks = lapply(data.ids, function(x) convertOMLDataSetToMlr(getOMLDataSet(x))).
Usage
batchmark(
learners,
tasks,
resamplings,
measures,
keep.pred = TRUE,
keep.extract = FALSE,
models = FALSE,
reg = batchtools::getDefaultRegistry()
)
Arguments
learners |
(list of Learner | character) |
tasks |
list of Task |
resamplings |
((list of) ResampleDesc) |
measures |
(list of Measure) |
keep.pred |
( |
keep.extract |
( |
models |
( |
reg |
(batchtools::Registry) |
Value
(data.table). Generated job ids are stored in the column “job.id”.
See Also
Other benchmark: BenchmarkResult, benchmark(), convertBMRToRankMatrix(), friedmanPostHocTestBMR(), friedmanTestBMR(), generateCritDifferencesData(), getBMRAggrPerformances(), getBMRFeatSelResults(), getBMRFilteredFeatures(), getBMRLearnerIds(), getBMRLearnerShortNames(), getBMRLearners(), getBMRMeasureIds(), getBMRMeasures(), getBMRModels(), getBMRPerformances(), getBMRPredictions(), getBMRTaskDescs(), getBMRTaskIds(), getBMRTuneResults(), plotBMRBoxplots(), plotBMRRanksAsBarChart(), plotBMRSummary(), plotCritDifferences(), reduceBatchmarkResults()
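A hedged sketch of the workflow described above (not meant to run as-is; it requires batchtools and a writable registry directory):
library(mlr)
library(batchtools)
# 1. create a registry in a temporary directory
reg = makeExperimentRegistry(file.dir = tempfile("registry"), seed = 1)
# 2. define jobs for all learner/task combinations
batchmark(
  learners = list(makeLearner("classif.lda"), makeLearner("classif.rpart")),
  tasks = list(iris.task, sonar.task),
  resamplings = makeResampleDesc("CV", iters = 3L),
  reg = reg)
# 3.-4. submit and wait
submitJobs(reg = reg)
waitForJobs(reg = reg)
# 5. collect everything into a BenchmarkResult
bmr = reduceBatchmarkResults(reg = reg)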
Wisconsin Breast Cancer classification task.
Description
Contains the task (bc.task).
References
See mlbench::BreastCancer.
The column "Id"
and all incomplete cases have been removed from the task.
Benchmark experiment for multiple learners and tasks.
Description
Complete benchmark experiment to compare different learning algorithms across one or more tasks w.r.t. a given resampling strategy. Experiments are paired, meaning always the same training / test sets are used for the different learners. Furthermore, you can of course pass “enhanced” learners via wrappers, e.g., a learner can be automatically tuned using makeTuneWrapper.
Usage
benchmark(
learners,
tasks,
resamplings,
measures,
keep.pred = TRUE,
keep.extract = FALSE,
models = FALSE,
show.info = getMlrOption("show.info")
)
Arguments
learners |
(list of Learner | character) |
tasks |
list of Task |
resamplings |
(list of ResampleDesc | ResampleInstance) |
measures |
(list of Measure) |
keep.pred |
( |
keep.extract |
( |
models |
( |
show.info |
( |
Value
(BenchmarkResult).
See Also
Other benchmark: BenchmarkResult, batchmark(), convertBMRToRankMatrix(), friedmanPostHocTestBMR(), friedmanTestBMR(), generateCritDifferencesData(), getBMRAggrPerformances(), getBMRFeatSelResults(), getBMRFilteredFeatures(), getBMRLearnerIds(), getBMRLearnerShortNames(), getBMRLearners(), getBMRMeasureIds(), getBMRMeasures(), getBMRModels(), getBMRPerformances(), getBMRPredictions(), getBMRTaskDescs(), getBMRTaskIds(), getBMRTuneResults(), plotBMRBoxplots(), plotBMRRanksAsBarChart(), plotBMRSummary(), plotCritDifferences(), reduceBatchmarkResults()
Examples
lrns = list(makeLearner("classif.lda"), makeLearner("classif.rpart"))
tasks = list(iris.task, sonar.task)
rdesc = makeResampleDesc("CV", iters = 2L)
meas = list(acc, ber)
bmr = benchmark(lrns, tasks, rdesc, measures = meas)
rmat = convertBMRToRankMatrix(bmr)
print(rmat)
plotBMRSummary(bmr)
plotBMRBoxplots(bmr, ber, style = "violin")
plotBMRRanksAsBarChart(bmr, pos = "stack")
friedmanTestBMR(bmr)
friedmanPostHocTestBMR(bmr, p.value = 0.05)
Boston Housing regression task.
Description
Contains the task (bh.task).
References
See mlbench::BostonHousing.
Get or delete mlr cache directory
Description
Helper functions to deal with mlr caching.
Usage
getCacheDir()
deleteCacheDir()
Details
getCacheDir() returns the default mlr cache directory.
deleteCacheDir() clears the default mlr cache directory. Custom cache directories must be deleted by hand.
Confusion matrix.
Description
Calculates the confusion matrix for a (possibly resampled) prediction. Rows indicate true classes, columns predicted classes. The marginal elements count the number of classification errors for the respective row or column, i.e., the number of errors when you condition on the corresponding true (rows) or predicted (columns) class. The bottom right element displays the total number of errors.
A list is returned that contains multiple matrices. If relative = TRUE we compute three matrices, one with absolute values and two with relative values. The relative confusion matrices are normalized based on rows and columns respectively; if FALSE we only compute the absolute value matrix.
The print function returns the relative matrices in a compact way so that both row and column marginals can be seen in one matrix.
For details see ConfusionMatrix.
Note that for resampling no further aggregation is currently performed. All predictions on all test sets are joined to a vector yhat, as are all labels joined to a vector y. Then yhat is simply tabulated vs. y, as if both were computed on a single test set. This probably mainly makes sense when cross-validation is used for resampling.
Usage
calculateConfusionMatrix(pred, relative = FALSE, sums = FALSE, set = "both")
## S3 method for class 'ConfusionMatrix'
print(x, both = TRUE, digits = 2, ...)
Arguments
pred |
(Prediction) |
relative |
( |
sums |
( |
set |
( |
x |
(ConfusionMatrix) |
both |
( |
digits |
( |
... |
(any) |
Value
(ConfusionMatrix).
Functions
- print(ConfusionMatrix)
See Also
Other performance: ConfusionMatrix, calculateROCMeasures(), estimateRelativeOverfitting(), makeCostMeasure(), makeCustomResampledMeasure(), makeMeasure(), measures, performance(), setAggregation(), setMeasurePars()
Examples
# get confusion matrix after simple manual prediction
allinds = 1:150
train = sample(allinds, 75)
test = setdiff(allinds, train)
mod = train("classif.lda", iris.task, subset = train)
pred = predict(mod, iris.task, subset = test)
print(calculateConfusionMatrix(pred))
print(calculateConfusionMatrix(pred, sums = TRUE))
print(calculateConfusionMatrix(pred, relative = TRUE))
# now after cross-validation
r = crossval("classif.lda", iris.task, iters = 2L)
print(calculateConfusionMatrix(r$pred))
Calculate receiver operator measures.
Description
Calculate the absolute number of correct/incorrect classifications and the following evaluation measures:
- tpr: True positive rate (Sensitivity, Recall)
- fpr: False positive rate (Fall-out)
- fnr: False negative rate (Miss rate)
- tnr: True negative rate (Specificity)
- ppv: Positive predictive value (Precision)
- for: False omission rate
- lrp: Positive likelihood ratio (LR+)
- fdr: False discovery rate
- npv: Negative predictive value
- acc: Accuracy
- lrm: Negative likelihood ratio (LR-)
- dor: Diagnostic odds ratio
For details on the used measures see measures and also https://en.wikipedia.org/wiki/Receiver_operating_characteristic.
The element for the false omission rate in the resulting object is not called for but fomr, since for should never be used as a variable name in an object.
Usage
calculateROCMeasures(pred)
## S3 method for class 'ROCMeasures'
print(x, abbreviations = TRUE, digits = 2, ...)
Arguments
pred |
(Prediction) |
x |
( |
abbreviations |
( |
digits |
( |
... |
|
Value
(ROCMeasures). A list containing two elements: confusion.matrix, which is the 2x2 confusion matrix of absolute frequencies, and measures, a list of the above-mentioned measures.
Functions
- print(ROCMeasures)
See Also
Other roc: asROCRPrediction()
Other performance: ConfusionMatrix, calculateConfusionMatrix(), estimateRelativeOverfitting(), makeCostMeasure(), makeCustomResampledMeasure(), makeMeasure(), measures, performance(), setAggregation(), setMeasurePars()
Examples
lrn = makeLearner("classif.rpart", predict.type = "prob")
fit = train(lrn, sonar.task)
pred = predict(fit, task = sonar.task)
calculateROCMeasures(pred)
Convert large/infinite numeric values in a data.frame or task.
Description
Convert numeric entries with large/infinite (absolute) values in a data.frame or task. Only numeric/integer columns are affected.
Usage
capLargeValues(
obj,
target = character(0L),
cols = NULL,
threshold = Inf,
impute = threshold,
what = "abs"
)
Arguments
obj |
(data.frame | Task) |
target |
(character) |
cols |
(character) |
threshold |
( |
impute |
( |
what |
( |
Value
See Also
Other eda_and_preprocess: createDummyFeatures(), dropFeatures(), mergeSmallFactorLevels(), normalizeFeatures(), removeConstantFeatures(), summarizeColumns(), summarizeLevels()
Examples
capLargeValues(iris, threshold = 5, impute = 5)
Change Task Data
Description
Mainly for internal use. Changes the data associated with a task, without modifying other task properties.
Usage
changeData(task, data, costs, weights, coordinates)
Arguments
task |
(Task) |
data |
(data.frame) |
costs |
(data.frame) |
weights |
(numeric) |
Exported for internal use only.
Description
Exported for internal use only.
Usage
checkLearner(learner, type = NULL, props = NULL)
Arguments
learner |
(Learner | |
type |
( |
props |
( |
Check output returned by predictLearner.
Description
Check the output coming from a Learner's internal
predictLearner
function.
This function is for internal use.
Usage
checkPredictLearnerOutput(learner, model, p)
Arguments
learner |
(Learner) |
model |
(WrappedModel) |
p |
(any) |
Value
(any). A sanitized version of p
.
Configures the behavior of the package.
Description
Configuration is done by setting custom options.
If you do not set an option here, its current value will be kept.
If you call this function with an empty argument list, everything is set to its defaults.
Usage
configureMlr(
show.info,
on.learner.error,
on.learner.warning,
on.par.without.desc,
on.par.out.of.bounds,
on.measure.not.applicable,
show.learner.output,
on.error.dump
)
Arguments
show.info |
( |
on.learner.error |
( |
on.learner.warning |
( |
on.par.without.desc |
( |
on.par.out.of.bounds |
( |
on.measure.not.applicable |
( |
show.learner.output |
( |
on.error.dump |
( |
Value
(invisible(NULL)).
See Also
Other configure: getMlrOptions()
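A brief sketch:
library(mlr)
# make learner errors non-fatal and keep error dumps for later debugging
configureMlr(on.learner.error = "warn", on.error.dump = TRUE)
configureMlr()  # reset everything to the defaults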
Convert BenchmarkResult to a rank-matrix.
Description
Computes a matrix of all the ranks of different algorithms over different datasets (tasks). Ranks are computed from aggregated measures. Smaller ranks imply better methods, so for measures that are minimized, small ranks imply small scores; for measures that are maximized, small ranks imply large scores.
Usage
convertBMRToRankMatrix(
bmr,
measure = NULL,
ties.method = "average",
aggregation = "default"
)
Arguments
bmr |
(BenchmarkResult) |
measure |
(Measure) |
ties.method |
( |
aggregation |
( |
Value
(matrix) with measure ranks as entries. The matrix has one row for each learner and one column for each task.
See Also
Other benchmark: BenchmarkResult, batchmark(), benchmark(), friedmanPostHocTestBMR(), friedmanTestBMR(), generateCritDifferencesData(), getBMRAggrPerformances(), getBMRFeatSelResults(), getBMRFilteredFeatures(), getBMRLearnerIds(), getBMRLearnerShortNames(), getBMRLearners(), getBMRMeasureIds(), getBMRMeasures(), getBMRModels(), getBMRPerformances(), getBMRPredictions(), getBMRTaskDescs(), getBMRTaskIds(), getBMRTuneResults(), plotBMRBoxplots(), plotBMRRanksAsBarChart(), plotBMRSummary(), plotCritDifferences(), reduceBatchmarkResults()
Examples
# see benchmark
Convert a machine learning benchmark / demo object from package mlbench to a task.
Description
We auto-set the target column, drop any column which is called “Id” and convert logicals to factors.
Usage
convertMLBenchObjToTask(x, n = 100L, ...)
Arguments
x |
( |
n |
( |
... |
(any) |
Examples
print(convertMLBenchObjToTask("Ionosphere"))
print(convertMLBenchObjToTask("mlbench.spirals", n = 100, sd = 0.1))
Iris cost-sensitive classification task.
Description
Contains the task (costiris.task).
References
See datasets::iris. The cost matrix was generated artificially following
Tu, H.-H. and Lin, H.-T. (2010), One-sided support vector regression for multiclass cost-sensitive classification. In ICML, J. Fürnkranz and T. Joachims, Eds., Omnipress, 1095–1102.
Generate dummy variables for factor features.
Description
Replace all factor features with their dummy variables. Internally model.matrix is used. Non factor features will be left untouched and passed to the result.
Usage
createDummyFeatures(
obj,
target = character(0L),
method = "1-of-n",
cols = NULL
)
Arguments
obj |
(data.frame | Task) |
target |
( |
method |
(
Default is “1-of-n”. |
cols |
(character) |
Value
data.frame | Task. Same type as obj.
See Also
Other eda_and_preprocess: capLargeValues(), dropFeatures(), mergeSmallFactorLevels(), normalizeFeatures(), removeConstantFeatures(), summarizeColumns(), summarizeLevels()
Create (spatial) resampling plot objects.
Description
Visualize partitioning of resample objects with spatial information.
Usage
createSpatialResamplingPlots(
task = NULL,
resample = NULL,
crs = NULL,
datum = 4326,
repetitions = 1,
color.train = "#0072B5",
color.test = "#E18727",
point.size = 0.5,
axis.text.size = 14,
x.axis.breaks = waiver(),
y.axis.breaks = waiver()
)
Arguments
task |
Task |
resample |
ResampleResult or named |
crs |
integer |
datum |
integer |
repetitions |
integer |
color.train |
character |
color.test |
character |
point.size |
integer |
axis.text.size |
integer |
x.axis.breaks |
numeric |
y.axis.breaks |
numeric |
Details
If a named list is given to resample, names will appear in the title of each fold. If multiple inputs are given to resample, these must be named.
This function makes a hard cut at five columns of the resulting gridded plot. This means that if the resample object consists of more than 5 folds, the extra folds will be put into a new row.
For file saving, we recommend using cowplot::save_plot.
When viewing the resulting plot in RStudio, margins may appear to be different than they really are. Make sure to save the file to disk and inspect the image.
When modifying axis breaks, negative values need to be used if the area is located in either the western or southern hemisphere. Use positive values for the northern and eastern hemisphere.
Value
(list of 2L) containing (1) multiple gg objects and (2) their corresponding labels.
CRS
The crs has to be suitable for the coordinates stored in the Task. For example, if the coordinates are UTM, crs should be set to a UTM projection.
Due to the limited axis space in the resulting grid (especially on the x-axis), the data will by default be projected into a lat/lon projection, specifically EPSG 4326. If other projections are desired for the resulting map, please set argument datum accordingly. This argument will be passed onto ggplot2::coord_sf.
Author(s)
Patrick Schratz
See Also
Other plot: plotBMRBoxplots(), plotBMRRanksAsBarChart(), plotBMRSummary(), plotCalibration(), plotCritDifferences(), plotLearningCurve(), plotPartialDependence(), plotROCCurves(), plotResiduals(), plotThreshVsPerf()
Examples
rdesc = makeResampleDesc("SpRepCV", folds = 5, reps = 4)
r = resample(makeLearner("classif.qda"), spatial.task, rdesc)
## -------------------------------------------------------------
## single unnamed resample input with 5 folds and 2 repetitions
## -------------------------------------------------------------
plots = createSpatialResamplingPlots(spatial.task, r, crs = 32717,
repetitions = 2, x.axis.breaks = c(-79.065, -79.085),
y.axis.breaks = c(-3.970, -4))
cowplot::plot_grid(plotlist = plots[["Plots"]], ncol = 5, nrow = 2,
labels = plots[["Labels"]])
## --------------------------------------------------------------------------
## single named resample input with 5 folds and 1 repetition and 32717 datum
## --------------------------------------------------------------------------
plots = createSpatialResamplingPlots(spatial.task, list("Resamp" = r),
crs = 32717, datum = 32717, repetitions = 1)
cowplot::plot_grid(plotlist = plots[["Plots"]], ncol = 5, nrow = 1,
labels = plots[["Labels"]])
## -------------------------------------------------------------
## multiple named resample inputs with 5 folds and 1 repetition
## -------------------------------------------------------------
rdesc1 = makeResampleDesc("SpRepCV", folds = 5, reps = 4)
r1 = resample(makeLearner("classif.qda"), spatial.task, rdesc1)
rdesc2 = makeResampleDesc("RepCV", folds = 5, reps = 4)
r2 = resample(makeLearner("classif.qda"), spatial.task, rdesc2)
plots = createSpatialResamplingPlots(spatial.task,
list("SpRepCV" = r1, "RepCV" = r2), crs = 32717, repetitions = 1,
x.axis.breaks = c(-79.055, -79.085), y.axis.breaks = c(-3.975, -4))
cowplot::plot_grid(plotlist = plots[["Plots"]], ncol = 5, nrow = 2,
labels = plots[["Labels"]])
## -------------------------------------------------------------------------------------
## Complex arrangements of multiple named resample inputs with 5 folds and 1 repetition
## -------------------------------------------------------------------------------------
p1 = cowplot::plot_grid(plots[["Plots"]][[1]], plots[["Plots"]][[2]],
plots[["Plots"]][[3]], ncol = 3, nrow = 1, labels = plots[["Labels"]][1:3],
label_size = 18)
p12 = cowplot::plot_grid(plots[["Plots"]][[4]], plots[["Plots"]][[5]],
ncol = 2, nrow = 1, labels = plots[["Labels"]][4:5], label_size = 18)
p2 = cowplot::plot_grid(plots[["Plots"]][[6]], plots[["Plots"]][[7]],
plots[["Plots"]][[8]], ncol = 3, nrow = 1, labels = plots[["Labels"]][6:8],
label_size = 18)
p22 = cowplot::plot_grid(plots[["Plots"]][[9]], plots[["Plots"]][[10]],
ncol = 2, nrow = 1, labels = plots[["Labels"]][9:10], label_size = 18)
cowplot::plot_grid(p1, p12, p2, p22, ncol = 1)
Crossover.
Description
Takes two bit strings and creates a new one of the same size by selecting the items from the first string or the second, based on a given rate (the probability of choosing an element from the first string).
Arguments
x |
(logical) |
y |
(logical) |
rate |
( |
Value
(crossover).
Downsample (subsample) a task or a data.frame.
Description
Decrease the observations in a task or a ResampleInstance to a given percentage of observations.
Usage
downsample(obj, perc = 1, stratify = FALSE)
Arguments
obj |
(Task | ResampleInstance) |
perc |
( |
stratify |
( |
Value
(data.frame | Task | ResampleInstance). Same type as obj.
See Also
Other downsample: makeDownsampleWrapper()
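A brief sketch (stratified downsampling of the built-in iris.task):
library(mlr)
small = downsample(iris.task, perc = 0.3, stratify = TRUE)
getTaskSize(small)  # 45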
Drop some features of task.
Description
Drop some features of task.
Usage
dropFeatures(task, features)
Arguments
task |
(Task) |
features |
(character) |
Value
Task.
See Also
Other eda_and_preprocess: capLargeValues(), createDummyFeatures(), mergeSmallFactorLevels(), normalizeFeatures(), removeConstantFeatures(), summarizeColumns(), summarizeLevels()
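A brief sketch:
library(mlr)
smaller = dropFeatures(iris.task, c("Sepal.Width", "Petal.Width"))
getTaskFeatureNames(smaller)  # "Sepal.Length" "Petal.Length"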
Estimate relative overfitting.
Description
Estimates the relative overfitting of a model as the ratio of the difference in test and train performance to the difference of test performance in the no-information case and train performance. In the no-information case the features carry no information with respect to the prediction. This is simulated by permuting features and predictions.
Usage
estimateRelativeOverfitting(
predish,
measures,
task,
learner = NULL,
pred.train = NULL,
iter = 1
)
Arguments
predish |
(ResampleDesc | ResamplePrediction | Prediction) |
measures |
(Measure | list of Measure) |
task |
(Task) |
learner |
(Learner | |
pred.train |
(Prediction) |
iter |
(integer) |
Details
Currently only support for classification and regression tasks is implemented.
Value
(data.frame). Relative overfitting estimate(s), named by measure(s), for each resampling iteration.
References
Bradley Efron and Robert Tibshirani; Improvements on Cross-Validation: The .632+ Bootstrap Method, Journal of the American Statistical Association, Vol. 92, No. 438. (Jun., 1997), pp. 548-560.
See Also
Other performance: ConfusionMatrix, calculateConfusionMatrix(), calculateROCMeasures(), makeCostMeasure(), makeCustomResampledMeasure(), makeMeasure(), measures, performance(), setAggregation(), setMeasurePars()
Examples
task = makeClassifTask(data = iris, target = "Species")
rdesc = makeResampleDesc("CV", iters = 2)
estimateRelativeOverfitting(rdesc, acc, task, makeLearner("classif.knn"))
estimateRelativeOverfitting(rdesc, acc, task, makeLearner("classif.lda"))
rpred = resample("classif.knn", task, rdesc)$pred
estimateRelativeOverfitting(rpred, acc, task)
Estimate the residual variance.
Description
Estimate the residual variance of a regression model on a given task. If a regression learner is provided instead of a model, the model is trained (see train) first.
Usage
estimateResidualVariance(x, task, data, target)
Arguments
x |
(Learner or WrappedModel) |
task |
(RegrTask) |
data |
(data.frame) |
target |
( |
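A brief usage sketch, using the built-in bh.task:
library(mlr)
mod = train("regr.lm", bh.task)
estimateResidualVariance(mod, task = bh.task)
# or pass a learner, which is then trained first:
estimateResidualVariance(makeLearner("regr.rpart"), task = bh.task)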
Bspline mlq features
Description
The function extracts features from functional data based on the Bspline fit. For more details refer to FDboost::bsignal().
Usage
extractFDABsignal(bsignal.knots = 10L, bsignal.df = 3)
Arguments
bsignal.knots |
( |
bsignal.df |
( |
Value
(data.frame).
See Also
Other fda_featextractor: extractFDADTWKernel(), extractFDAFPCA(), extractFDAFourier(), extractFDAMultiResFeatures(), extractFDATsfeatures(), extractFDAWavelets()
DTW kernel features
Description
The function extracts features from functional data based on the DTW distance with a reference dataframe.
Usage
extractFDADTWKernel(
ref.method = "random",
n.refs = 0.05,
refs = NULL,
dtwwindow = 0.05
)
Arguments
ref.method |
( |
n.refs |
( |
refs |
( |
dtwwindow |
( |
Value
(data.frame).
See Also
Other fda_featextractor: extractFDABsignal(), extractFDAFPCA(), extractFDAFourier(), extractFDAMultiResFeatures(), extractFDATsfeatures(), extractFDAWavelets()
Extract functional principal component analysis features.
Description
The function extracts the functional principal components from a data.frame containing functional features. Uses stats::prcomp.
Usage
extractFDAFPCA(rank. = NULL, center = TRUE, scale. = FALSE)
Arguments
rank. |
( |
center |
( |
scale. |
( |
Value
(data.frame).
See Also
Other fda_featextractor: extractFDABsignal(), extractFDADTWKernel(), extractFDAFourier(), extractFDAMultiResFeatures(), extractFDATsfeatures(), extractFDAWavelets()
Extract features from functional data.
Description
Extract non-functional features from functional features using various methods. The function extractFDAFeatures performs the extraction for all functional features via the methods specified in feat.methods and transforms all mentioned functional (matrix) features into regular data.frame columns. Additionally, an "extractFDAFeatDesc" object, which contains learned coefficients and other helpful data for re-extraction during the predict-phase, is returned. This can be used with reextractFDAFeatures in order to extract features during the prediction phase.
Usage
extractFDAFeatures(obj, target = character(0L), feat.methods = list(), ...)
Arguments
obj |
(Task | data.frame) |
target |
( |
feat.methods |
(named list) |
... |
(any) |
Details
The description object contains these slots:
target (character): See argument.
coln (character): Column names of data.
fd.cols (character): Functional feature names.
extractFDAFeat (list): Contains feature.methods
and relevant parameters for re-extraction.
Value
(list)
data | task (data.frame | Task): Extracted features, same type as obj.
desc (extractFDAFeatDesc): Description object. See description for details.
See Also
Other fda:
makeExtractFDAFeatMethod()
,
makeExtractFDAFeatsWrapper()
Examples
df = data.frame(x = matrix(rnorm(24), ncol = 8), y = factor(c("a", "a", "b")))
fdf = makeFunctionalData(df, fd.features = list(x1 = 1:4, x2 = 5:8), exclude.cols = "y")
task = makeClassifTask(data = fdf, target = "y")
extracted = extractFDAFeatures(task,
feat.methods = list("x1" = extractFDAFourier(), "x2" = extractFDAWavelets(filter = "haar")))
print(extracted$task)
reextractFDAFeatures(task, extracted$desc)
Fast Fourier transform features.
Description
The function extracts features from functional data based on the fast Fourier transform. For more details refer to stats::fft.
Usage
extractFDAFourier(trafo.coeff = "phase")
Arguments
trafo.coeff |
( |
Value
(data.frame).
See Also
Other fda_featextractor:
extractFDABsignal()
,
extractFDADTWKernel()
,
extractFDAFPCA()
,
extractFDAMultiResFeatures()
,
extractFDATsfeatures()
,
extractFDAWavelets()
Multiresolution feature extraction.
Description
The function currently extracts the mean of multiple segments of each curve and stacks these means as features. The segment lengths are set in a hierarchical way, so the features cover different resolution levels.
Usage
extractFDAMultiResFeatures(res.level = 3L, shift = 0.5, seg.lens = NULL)
Arguments
res.level |
( |
shift |
( |
seg.lens |
( |
Value
(data.frame).
See Also
Other fda_featextractor:
extractFDABsignal()
,
extractFDADTWKernel()
,
extractFDAFPCA()
,
extractFDAFourier()
,
extractFDATsfeatures()
,
extractFDAWavelets()
Time-Series Feature Heuristics
Description
The function extracts features from functional data based on known heuristics. Under the hood it uses the tsfeatures package; for more details refer to tsfeatures::tsfeatures().
For more information see Hyndman, Wang and Laptev, Large-Scale Unusual Time Series Detection, ICDM 2015.
Note: Currently computes the following features:
"frequency", "stl_features", "entropy", "acf_features", "arch_stat",
"crossing_points", "flat_spots", "hurst", "holt_parameters", "lumpiness",
"max_kl_shift", "max_var_shift", "max_level_shift", "stability", "nonlinearity"
Usage
extractFDATsfeatures(
scale = TRUE,
trim = FALSE,
trim_amount = 0.1,
parallel = FALSE,
na.action = na.pass,
feats = NULL,
...
)
Arguments
scale |
( |
trim |
( |
trim_amount |
( |
parallel |
( |
na.action |
( |
feats |
( |
... |
(any) |
Value
(data.frame).
References
Hyndman, Wang and Laptev, Large-Scale Unusual Time Series Detection, ICDM 2015.
See Also
Other fda_featextractor:
extractFDABsignal()
,
extractFDADTWKernel()
,
extractFDAFPCA()
,
extractFDAFourier()
,
extractFDAMultiResFeatures()
,
extractFDAWavelets()
Discrete Wavelet transform features.
Description
The function extracts discrete wavelet transform coefficients from the raw functional data. See wavelets::dwt for more information.
Usage
extractFDAWavelets(filter = "la8", boundary = "periodic")
Arguments
filter |
( |
boundary |
( |
Value
(data.frame).
See Also
Other fda_featextractor:
extractFDABsignal()
,
extractFDADTWKernel()
,
extractFDAFPCA()
,
extractFDAFourier()
,
extractFDAMultiResFeatures()
,
extractFDATsfeatures()
Filter features by thresholding filter values.
Description
First, calls generateFilterValuesData. Features are then selected via select and val.
Usage
filterFeatures(
task,
method = "FSelectorRcpp_information.gain",
fval = NULL,
perc = NULL,
abs = NULL,
threshold = NULL,
fun = NULL,
fun.args = NULL,
mandatory.feat = NULL,
select.method = NULL,
base.methods = NULL,
cache = FALSE,
...
)
Arguments
task |
(Task) |
method |
( |
fval |
(FilterValues) |
perc |
( |
abs |
( |
threshold |
( |
fun |
( |
fun.args |
(any) |
mandatory.feat |
(character) |
select.method |
If multiple methods are supplied in argument |
base.methods |
If |
cache |
( |
... |
(any) |
Value
Task.
Caching
If cache = TRUE
, the default mlr cache directory is used to cache
filter values. The directory is operating system dependent and can be
checked with getCacheDir()
.
The default cache can be cleared with deleteCacheDir()
.
Alternatively, a custom directory can be passed to store the cache.
Note that caching is not thread safe. It will work for parallel computation on many systems, but there is no guarantee.
Simple and ensemble filters
Besides passing (multiple) simple filter methods you can also pass an
ensemble filter method (in a list). The ensemble method will use the simple
methods to calculate its ranking. See listFilterEnsembleMethods()
for
available ensemble methods.
See Also
Other filter:
generateFilterValuesData()
,
getFilteredFeatures()
,
listFilterEnsembleMethods()
,
listFilterMethods()
,
makeFilter()
,
makeFilterEnsemble()
,
makeFilterWrapper()
,
plotFilterValues()
Examples
# simple filter
filterFeatures(iris.task, method = "FSelectorRcpp_gain.ratio", abs = 2)
# ensemble filter
filterFeatures(iris.task, method = "E-min",
base.methods = c("FSelectorRcpp_gain.ratio",
"FSelectorRcpp_information.gain"), abs = 2)
Perform a posthoc Friedman-Nemenyi test.
Description
Performs a PMCMRplus::frdAllPairsNemenyiTest for a BenchmarkResult and a selected measure. This means all pairwise comparisons of learners are performed. The null hypothesis of the post hoc test is that each pair of learners is equal. If the null hypothesis of the included stats::friedman.test can be rejected, an object of class pairwise.htest is returned. If not, the function returns the corresponding friedman.test result.
Note that benchmark results for at least two learners on at least two tasks are required.
Usage
friedmanPostHocTestBMR(
bmr,
measure = NULL,
p.value = 0.05,
aggregation = "default"
)
Arguments
bmr |
(BenchmarkResult) |
measure |
(Measure) |
p.value |
( |
aggregation |
( |
Value
(pairwise.htest
): See PMCMRplus::frdAllPairsNemenyiTest for
details.
Additionally two components are added to the list:
f.rejnull (logical(1)): Whether the corresponding friedman.test rejects the null hypothesis at the selected p.value.
crit.difference (list(2)): Minimal difference the mean ranks of two learners need to have in order to be significantly different.
See Also
Other benchmark:
BenchmarkResult
,
batchmark()
,
benchmark()
,
convertBMRToRankMatrix()
,
friedmanTestBMR()
,
generateCritDifferencesData()
,
getBMRAggrPerformances()
,
getBMRFeatSelResults()
,
getBMRFilteredFeatures()
,
getBMRLearnerIds()
,
getBMRLearnerShortNames()
,
getBMRLearners()
,
getBMRMeasureIds()
,
getBMRMeasures()
,
getBMRModels()
,
getBMRPerformances()
,
getBMRPredictions()
,
getBMRTaskDescs()
,
getBMRTaskIds()
,
getBMRTuneResults()
,
plotBMRBoxplots()
,
plotBMRRanksAsBarChart()
,
plotBMRSummary()
,
plotCritDifferences()
,
reduceBatchmarkResults()
Examples
# see benchmark
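# A minimal sketch, assuming the built-in iris.task and sonar.task:
lrns = list(makeLearner("classif.rpart"), makeLearner("classif.lda"))
tasks = list(iris.task, sonar.task)
bmr = benchmark(lrns, tasks, makeResampleDesc("CV", iters = 2), measures = mmce)
friedmanPostHocTestBMR(bmr, measure = mmce)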
Perform overall Friedman test for a BenchmarkResult.
Description
Performs a stats::friedman.test for a selected measure. The null hypothesis is that apart from an effect of the different (Task), the location parameter (aggregated performance measure) is the same for each Learner. Note that benchmark results for at least two learners on at least two tasks are required.
Usage
friedmanTestBMR(bmr, measure = NULL, aggregation = "default")
Arguments
bmr |
(BenchmarkResult) |
measure |
(Measure) |
aggregation |
( |
Value
(htest
): See stats::friedman.test for details.
See Also
Other benchmark:
BenchmarkResult
,
batchmark()
,
benchmark()
,
convertBMRToRankMatrix()
,
friedmanPostHocTestBMR()
,
generateCritDifferencesData()
,
getBMRAggrPerformances()
,
getBMRFeatSelResults()
,
getBMRFilteredFeatures()
,
getBMRLearnerIds()
,
getBMRLearnerShortNames()
,
getBMRLearners()
,
getBMRMeasureIds()
,
getBMRMeasures()
,
getBMRModels()
,
getBMRPerformances()
,
getBMRPredictions()
,
getBMRTaskDescs()
,
getBMRTaskIds()
,
getBMRTuneResults()
,
plotBMRBoxplots()
,
plotBMRRanksAsBarChart()
,
plotBMRSummary()
,
plotCritDifferences()
,
reduceBatchmarkResults()
Examples
# see benchmark
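# A minimal sketch, mirroring the benchmark setup shown for friedmanPostHocTestBMR:
bmr = benchmark(list(makeLearner("classif.rpart"), makeLearner("classif.lda")),
  list(iris.task, sonar.task), makeResampleDesc("CV", iters = 2))
friedmanTestBMR(bmr, measure = mmce)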
FuelSubset functional data regression task.
Description
Contains the task (fuelsubset.task
).
2 functional covariates and 1 scalar covariate. The goal is to predict the heat value of some fuel based on the ultraviolet radiation spectrum, the infrared radiation spectrum, and one scalar column called h2o.
Details
The features and grids are scaled in the same way as in FDboost::FDboost.
References
See Brockhaus, S., Scheipl, F., Hothorn, T., & Greven, S. (2015). The functional linear array model. Statistical Modelling, 15(3), 279–300.
Generate classifier calibration data.
Description
A calibrated classifier is one where the predicted probability of a class closely matches the rate at which that class occurs, e.g. for data points which are assigned a predicted probability of class A of .8, approximately 80 percent of such points should belong to class A if the classifier is well calibrated. This is estimated empirically by grouping data points with similar predicted probabilities for each class, and plotting the rate of each class within each bin against the predicted probability bins.
Usage
generateCalibrationData(obj, breaks = "Sturges", groups = NULL, task.id = NULL)
Arguments
obj |
(list of Prediction | list of ResampleResult | BenchmarkResult) |
breaks |
( |
groups |
( |
task.id |
( |
Value
CalibrationData. A list containing:
proportion |
data.frame with columns:
|
data |
data.frame with columns:
|
task |
(TaskDesc) |
References
Vuk, Miha, and Curk, Tomaz. “ROC Curve, Lift Chart, and Calibration Plot.” Metodoloski zvezki. Vol. 3. No. 1 (2006): 89-108.
See Also
Other generate_plot_data:
generateCritDifferencesData()
,
generateFeatureImportanceData()
,
generateFilterValuesData()
,
generateLearningCurveData()
,
generatePartialDependenceData()
,
generateThreshVsPerfData()
,
plotFilterValues()
Other calibration:
plotCalibration()
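Examples
# A minimal sketch, assuming the built-in binary sonar.task:
lrn = makeLearner("classif.rpart", predict.type = "prob")
mod = train(lrn, sonar.task)
pred = predict(mod, task = sonar.task)
cal = generateCalibrationData(list(rpart = pred))
plotCalibration(cal)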
Generate data for critical-differences plot.
Description
Generates data that can be used to plot a critical differences plot. Computes the critical differences according to either the "Bonferroni-Dunn" test or the "Nemenyi" test.
"Bonferroni-Dunn" usually yields higher power, as it does not compare all algorithms to each other but all algorithms to a baseline instead.
Learners are drawn on the y-axis according to their average rank.
For test = "nemenyi" a bar is drawn, connecting all groups of learners that are not significantly different.
For test = "bd" an interval is drawn around the algorithm selected as the baseline. All learners within this interval are not significantly different from the baseline.
Calculation:
CD = q_{\alpha} \sqrt{\frac{k(k+1)}{6N}}
where q_{\alpha} is based on the studentized range statistic. See references for details.
Usage
generateCritDifferencesData(
bmr,
measure = NULL,
p.value = 0.05,
baseline = NULL,
test = "bd"
)
Arguments
bmr |
(BenchmarkResult) |
measure |
(Measure) |
p.value |
( |
baseline |
( |
test |
( |
Value
(critDifferencesData
). List containing:
data |
(data.frame) containing the info for the descriptive part of the plot |
friedman.nemenyi.test |
(list) of class |
cd.info |
(list) containing info on the critical difference and its positioning |
baseline |
|
p.value |
p.value used for the PMCMRplus::frdAllPairsNemenyiTest and for computation of the critical difference |
See Also
Other generate_plot_data:
generateCalibrationData()
,
generateFeatureImportanceData()
,
generateFilterValuesData()
,
generateLearningCurveData()
,
generatePartialDependenceData()
,
generateThreshVsPerfData()
,
plotFilterValues()
Other benchmark:
BenchmarkResult
,
batchmark()
,
benchmark()
,
convertBMRToRankMatrix()
,
friedmanPostHocTestBMR()
,
friedmanTestBMR()
,
getBMRAggrPerformances()
,
getBMRFeatSelResults()
,
getBMRFilteredFeatures()
,
getBMRLearnerIds()
,
getBMRLearnerShortNames()
,
getBMRLearners()
,
getBMRMeasureIds()
,
getBMRMeasures()
,
getBMRModels()
,
getBMRPerformances()
,
getBMRPredictions()
,
getBMRTaskDescs()
,
getBMRTaskIds()
,
getBMRTuneResults()
,
plotBMRBoxplots()
,
plotBMRRanksAsBarChart()
,
plotBMRSummary()
,
plotCritDifferences()
,
reduceBatchmarkResults()
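Examples
# A minimal sketch; a real critical-differences analysis would use more
# learners and tasks.
bmr = benchmark(list(makeLearner("classif.rpart"), makeLearner("classif.lda")),
  list(iris.task, sonar.task), makeResampleDesc("CV", iters = 2))
cd = generateCritDifferencesData(bmr, p.value = 0.05, test = "nemenyi")
plotCritDifferences(cd)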
Generate feature importance.
Description
Estimate how important individual features or groups of features are by contrasting prediction performances. For method “permutation.importance”, compute the change in performance from permuting the values of a feature (or a group of features) and compare that to the predictions made on the unpermuted data.
Usage
generateFeatureImportanceData(
task,
method = "permutation.importance",
learner,
features = getTaskFeatureNames(task),
interaction = FALSE,
measure,
contrast = function(x, y) x - y,
aggregation = mean,
nmc = 50L,
replace = TRUE,
local = FALSE,
show.info = FALSE
)
Arguments
task |
(Task) |
method |
( |
learner |
(Learner | |
features |
(character) |
interaction |
( |
measure |
(Measure) |
contrast |
( |
aggregation |
( |
nmc |
( |
replace |
( |
local |
( |
show.info |
( |
Value
(FeatureImportance
). A named list which contains the computed feature importance and the input arguments.
Object members:
res |
(data.frame) |
interaction |
( |
measure |
(Measure) |
The measure used to compute performance.
contrast |
( |
aggregation |
( |
replace |
( |
nmc |
( |
local |
( |
References
Jerome Friedman; Greedy Function Approximation: A Gradient Boosting Machine, Annals of Statistics, Vol. 29, No. 5 (Oct., 2001), pp. 1189-1232.
See Also
Other generate_plot_data:
generateCalibrationData()
,
generateCritDifferencesData()
,
generateFilterValuesData()
,
generateLearningCurveData()
,
generatePartialDependenceData()
,
generateThreshVsPerfData()
,
plotFilterValues()
Examples
lrn = makeLearner("classif.rpart", predict.type = "prob")
fit = train(lrn, iris.task)
imp = generateFeatureImportanceData(iris.task, "permutation.importance",
lrn, "Petal.Width", nmc = 10L, local = TRUE)
Calculates feature filter values.
Description
Calculates numerical filter values for features. For a list of available filter methods, see listFilterMethods.
Usage
generateFilterValuesData(
task,
method = "FSelectorRcpp_information.gain",
nselect = getTaskNFeats(task),
...,
more.args = list()
)
Arguments
task |
(Task) |
method |
(character | list) |
nselect |
( |
... |
(any) |
more.args |
(named list) |
Value
(FilterValues). A list
containing:
task.desc |
(TaskDesc) |
data |
( |
Simple and ensemble filters
Besides passing (multiple) simple filter methods you can also pass an
ensemble filter method (in a list). The ensemble method will use the simple
methods to calculate its ranking. See listFilterEnsembleMethods()
for
available ensemble methods.
See Also
Other generate_plot_data:
generateCalibrationData()
,
generateCritDifferencesData()
,
generateFeatureImportanceData()
,
generateLearningCurveData()
,
generatePartialDependenceData()
,
generateThreshVsPerfData()
,
plotFilterValues()
Other filter:
filterFeatures()
,
getFilteredFeatures()
,
listFilterEnsembleMethods()
,
listFilterMethods()
,
makeFilter()
,
makeFilterEnsemble()
,
makeFilterWrapper()
,
plotFilterValues()
Examples
# two simple filter methods
fval = generateFilterValuesData(iris.task,
method = c("FSelectorRcpp_gain.ratio", "FSelectorRcpp_information.gain"))
# using ensemble method "E-mean"
fval = generateFilterValuesData(iris.task,
method = list("E-mean", c("FSelectorRcpp_gain.ratio",
"FSelectorRcpp_information.gain")))
Generate hyperparameter effect data.
Description
Generate cleaned hyperparameter effect data from a tuning result or from a nested cross-validation tuning result. The object returned can be used for custom visualization or passed downstream to an out-of-the-box mlr method, plotHyperParsEffect.
Usage
generateHyperParsEffectData(
tune.result,
include.diagnostics = FALSE,
trafo = FALSE,
partial.dep = FALSE
)
Arguments
tune.result |
(TuneResult | ResampleResult) |
include.diagnostics |
( |
trafo |
( |
partial.dep |
( |
Value
(HyperParsEffectData
)
Object containing the hyperparameter effects dataframe, the tuning
performance measures used, the hyperparameters used, a flag for including
diagnostic info, a flag for whether nested cv was used, a flag for whether
partial dependence should be generated, and the optimization algorithm used.
Examples
## Not run:
# 3-fold cross validation
ps = makeParamSet(makeDiscreteParam("C", values = 2^(-4:4)))
ctrl = makeTuneControlGrid()
rdesc = makeResampleDesc("CV", iters = 3L)
res = tuneParams("classif.ksvm", task = pid.task, resampling = rdesc,
par.set = ps, control = ctrl)
data = generateHyperParsEffectData(res)
plt = plotHyperParsEffect(data, x = "C", y = "mmce.test.mean")
plt + ylab("Misclassification Error")
# nested cross validation
ps = makeParamSet(makeDiscreteParam("C", values = 2^(-4:4)))
ctrl = makeTuneControlGrid()
rdesc = makeResampleDesc("CV", iters = 3L)
lrn = makeTuneWrapper("classif.ksvm", control = ctrl,
resampling = rdesc, par.set = ps)
res = resample(lrn, task = pid.task, resampling = cv2,
extract = getTuneResult)
data = generateHyperParsEffectData(res)
plotHyperParsEffect(data, x = "C", y = "mmce.test.mean", plot.type = "line")
## End(Not run)
Generates a learning curve.
Description
Observe how the performance changes with an increasing number of observations.
Usage
generateLearningCurveData(
learners,
task,
resampling = NULL,
percs = seq(0.1, 1, by = 0.1),
measures,
stratify = FALSE,
show.info = getMlrOption("show.info")
)
Arguments
learners |
((list of) Learner) |
task |
(Task) |
resampling |
(ResampleDesc | ResampleInstance) |
percs |
(numeric) |
measures |
((list of) Measure) |
stratify |
( |
show.info |
( |
Value
(LearningCurveData). A list containing:
task (Task): The task.
measures (list of Measure): Performance measures.
data (data.frame) with columns:
- learner: Names of learners.
- percentage: Percentages drawn from the training split.
- One column for each Measure passed to generateLearningCurveData.
See Also
Other generate_plot_data:
generateCalibrationData()
,
generateCritDifferencesData()
,
generateFeatureImportanceData()
,
generateFilterValuesData()
,
generatePartialDependenceData()
,
generateThreshVsPerfData()
,
plotFilterValues()
Other learning_curve:
plotLearningCurve()
Examples
r = generateLearningCurveData(list("classif.rpart", "classif.knn"),
task = sonar.task, percs = seq(0.2, 1, by = 0.2),
measures = list(tp, fp, tn, fn),
resampling = makeResampleDesc(method = "Subsample", iters = 5),
show.info = FALSE)
plotLearningCurve(r)
Generate partial dependence.
Description
Estimate how the learned prediction function is affected by one or more features. For a learned function f(x) where x is partitioned into x_s and x_c, the partial dependence of f on x_s can be summarized by averaging over x_c and setting x_s to a range of values of interest, estimating E_(x_c)(f(x_s, x_c)). The conditional expectation of f at observation i is estimated similarly. Additionally, partial derivatives of the marginalized function w.r.t. the features can be computed.
This function requires the mmpf
package to be installed. It is currently not on CRAN, but can
be installed through GitHub using devtools::install_github('zmjones/mmpf/pkg')
.
Usage
generatePartialDependenceData(
obj,
input,
features = NULL,
interaction = FALSE,
derivative = FALSE,
individual = FALSE,
fun = mean,
bounds = c(qnorm(0.025), qnorm(0.975)),
uniform = TRUE,
n = c(10, NA),
...
)
Arguments
obj |
(WrappedModel) |
input |
(data.frame | Task) |
features |
character |
interaction |
( |
derivative |
( |
individual |
( |
fun |
A function which operates on the output on the predictions made on the |
bounds |
( |
uniform |
( |
n |
( |
... |
additional arguments to be passed to |
Value
PartialDependenceData. A named list, which contains the partial dependence, input data, target, features, task description, and other arguments controlling the type of partial dependences made.
Object members:
data |
data.frame |
task.desc |
TaskDesc |
target |
Target feature for regression, target feature levels for classification, survival and event indicator for survival. |
features |
character |
interaction |
( |
derivative |
( |
individual |
( |
References
Goldstein, Alex, Adam Kapelner, Justin Bleich, and Emil Pitkin. “Peeking inside the black box: Visualizing statistical learning with plots of individual conditional expectation.” Journal of Computational and Graphical Statistics. Vol. 24, No. 1 (2015): 44-65.
Friedman, Jerome. “Greedy Function Approximation: A Gradient Boosting Machine.” The Annals of Statistics. Vol. 29. No. 5 (2001): 1189-1232.
See Also
Other partial_dependence:
plotPartialDependence()
Other generate_plot_data:
generateCalibrationData()
,
generateCritDifferencesData()
,
generateFeatureImportanceData()
,
generateFilterValuesData()
,
generateLearningCurveData()
,
generateThreshVsPerfData()
,
plotFilterValues()
Examples
lrn = makeLearner("regr.svm")
fit = train(lrn, bh.task)
pd = generatePartialDependenceData(fit, bh.task, "lstat")
plotPartialDependence(pd, data = getTaskData(bh.task))
lrn = makeLearner("classif.rpart", predict.type = "prob")
fit = train(lrn, iris.task)
pd = generatePartialDependenceData(fit, iris.task, "Petal.Width")
plotPartialDependence(pd, data = getTaskData(iris.task))
Generate threshold vs. performance(s) for 2-class classification.
Description
Generates data on threshold vs. performance(s) for 2-class classification that can be used for plotting.
Usage
generateThreshVsPerfData(
obj,
measures,
gridsize = 100L,
aggregate = TRUE,
task.id = NULL
)
Arguments
obj |
(list of Prediction | list of ResampleResult | BenchmarkResult) |
measures |
(Measure | list of Measure) |
gridsize |
( |
aggregate |
( |
task.id |
( |
Value
(ThreshVsPerfData). A named list containing the measured performance across the threshold grid, the measures, and whether the performance estimates were aggregated (only applicable for (list of) ResampleResults).
See Also
Other generate_plot_data:
generateCalibrationData()
,
generateCritDifferencesData()
,
generateFeatureImportanceData()
,
generateFilterValuesData()
,
generateLearningCurveData()
,
generatePartialDependenceData()
,
plotFilterValues()
Other thresh_vs_perf:
plotROCCurves()
,
plotThreshVsPerf()
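Examples
# A minimal sketch, assuming the built-in binary sonar.task:
lrn = makeLearner("classif.lda", predict.type = "prob")
pred = predict(train(lrn, sonar.task), task = sonar.task)
d = generateThreshVsPerfData(pred, measures = list(fpr, tpr, mmce))
plotROCCurves(d)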
Extract the aggregated performance values from a benchmark result.
Description
Returns either a list of lists of “aggr” numeric vectors, as returned by resample, or, if as.df is TRUE, a single data.frame in which these objects are rbind-ed with extra columns “task.id” and “learner.id”.
Usage
getBMRAggrPerformances(
bmr,
task.ids = NULL,
learner.ids = NULL,
as.df = FALSE,
drop = FALSE
)
Arguments
bmr |
(BenchmarkResult) |
task.ids |
( |
learner.ids |
( |
as.df |
( |
drop |
( |
Value
(list | data.frame). See above.
See Also
Other benchmark:
BenchmarkResult
,
batchmark()
,
benchmark()
,
convertBMRToRankMatrix()
,
friedmanPostHocTestBMR()
,
friedmanTestBMR()
,
generateCritDifferencesData()
,
getBMRFeatSelResults()
,
getBMRFilteredFeatures()
,
getBMRLearnerIds()
,
getBMRLearnerShortNames()
,
getBMRLearners()
,
getBMRMeasureIds()
,
getBMRMeasures()
,
getBMRModels()
,
getBMRPerformances()
,
getBMRPredictions()
,
getBMRTaskDescs()
,
getBMRTaskIds()
,
getBMRTuneResults()
,
plotBMRBoxplots()
,
plotBMRRanksAsBarChart()
,
plotBMRSummary()
,
plotCritDifferences()
,
reduceBatchmarkResults()
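Examples
# A minimal sketch: aggregated performances of a small benchmark as a data.frame.
bmr = benchmark(makeLearner("classif.rpart"), iris.task,
  makeResampleDesc("CV", iters = 2))
getBMRAggrPerformances(bmr, as.df = TRUE)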
Extract the feature selection results from a benchmark result.
Description
Returns a nested list of FeatSelResults. The first level of nesting is by data set, the second by learner, the third for the benchmark resampling iterations. If as.df
is TRUE
, a data frame with “task.id”, “learner.id”, the resample iteration and the selected features is returned.
Note that if more than one feature is selected and a data frame is requested, there will be multiple rows for the same dataset-learner-iteration; one for each selected feature.
Usage
getBMRFeatSelResults(
bmr,
task.ids = NULL,
learner.ids = NULL,
as.df = FALSE,
drop = FALSE
)
Arguments
bmr |
(BenchmarkResult) |
task.ids |
( |
learner.ids |
( |
as.df |
( |
drop |
( |
Value
(list | data.frame). See above.
See Also
Other benchmark:
BenchmarkResult
,
batchmark()
,
benchmark()
,
convertBMRToRankMatrix()
,
friedmanPostHocTestBMR()
,
friedmanTestBMR()
,
generateCritDifferencesData()
,
getBMRAggrPerformances()
,
getBMRFilteredFeatures()
,
getBMRLearnerIds()
,
getBMRLearnerShortNames()
,
getBMRLearners()
,
getBMRMeasureIds()
,
getBMRMeasures()
,
getBMRModels()
,
getBMRPerformances()
,
getBMRPredictions()
,
getBMRTaskDescs()
,
getBMRTaskIds()
,
getBMRTuneResults()
,
plotBMRBoxplots()
,
plotBMRRanksAsBarChart()
,
plotBMRSummary()
,
plotCritDifferences()
,
reduceBatchmarkResults()
Extract the filtered features from a benchmark result.
Description
Returns a nested list of characters. The first level of nesting is by data set, the second by learner, the third for the benchmark resampling iterations. The list at the lowest level is the list of selected features. If as.df
is TRUE
, a data frame with “task.id”, “learner.id”, the resample iteration and the selected features is returned.
Note that if more than one feature is selected and a data frame is requested, there will be multiple rows for the same dataset-learner-iteration; one for each selected feature.
Usage
getBMRFilteredFeatures(
bmr,
task.ids = NULL,
learner.ids = NULL,
as.df = FALSE,
drop = FALSE
)
Arguments
bmr |
(BenchmarkResult) |
task.ids |
( |
learner.ids |
( |
as.df |
( |
drop |
( |
Value
(list | data.frame). See above.
See Also
Other benchmark:
BenchmarkResult
,
batchmark()
,
benchmark()
,
convertBMRToRankMatrix()
,
friedmanPostHocTestBMR()
,
friedmanTestBMR()
,
generateCritDifferencesData()
,
getBMRAggrPerformances()
,
getBMRFeatSelResults()
,
getBMRLearnerIds()
,
getBMRLearnerShortNames()
,
getBMRLearners()
,
getBMRMeasureIds()
,
getBMRMeasures()
,
getBMRModels()
,
getBMRPerformances()
,
getBMRPredictions()
,
getBMRTaskDescs()
,
getBMRTaskIds()
,
getBMRTuneResults()
,
plotBMRBoxplots()
,
plotBMRRanksAsBarChart()
,
plotBMRSummary()
,
plotCritDifferences()
,
reduceBatchmarkResults()
Return learner ids used in benchmark.
Description
Gets the IDs of the learners used in a benchmark experiment.
Usage
getBMRLearnerIds(bmr)
Arguments
bmr |
(BenchmarkResult) |
Value
(character).
See Also
Other benchmark:
BenchmarkResult
,
batchmark()
,
benchmark()
,
convertBMRToRankMatrix()
,
friedmanPostHocTestBMR()
,
friedmanTestBMR()
,
generateCritDifferencesData()
,
getBMRAggrPerformances()
,
getBMRFeatSelResults()
,
getBMRFilteredFeatures()
,
getBMRLearnerShortNames()
,
getBMRLearners()
,
getBMRMeasureIds()
,
getBMRMeasures()
,
getBMRModels()
,
getBMRPerformances()
,
getBMRPredictions()
,
getBMRTaskDescs()
,
getBMRTaskIds()
,
getBMRTuneResults()
,
plotBMRBoxplots()
,
plotBMRRanksAsBarChart()
,
plotBMRSummary()
,
plotCritDifferences()
,
reduceBatchmarkResults()
Return learner short.names used in benchmark.
Description
Gets the learner short.names of the learners used in a benchmark experiment.
Usage
getBMRLearnerShortNames(bmr)
Arguments
bmr |
(BenchmarkResult) |
Value
(character).
See Also
Other benchmark:
BenchmarkResult
,
batchmark()
,
benchmark()
,
convertBMRToRankMatrix()
,
friedmanPostHocTestBMR()
,
friedmanTestBMR()
,
generateCritDifferencesData()
,
getBMRAggrPerformances()
,
getBMRFeatSelResults()
,
getBMRFilteredFeatures()
,
getBMRLearnerIds()
,
getBMRLearners()
,
getBMRMeasureIds()
,
getBMRMeasures()
,
getBMRModels()
,
getBMRPerformances()
,
getBMRPredictions()
,
getBMRTaskDescs()
,
getBMRTaskIds()
,
getBMRTuneResults()
,
plotBMRBoxplots()
,
plotBMRRanksAsBarChart()
,
plotBMRSummary()
,
plotCritDifferences()
,
reduceBatchmarkResults()
Return learners used in benchmark.
Description
Gets the learners used in a benchmark experiment.
Usage
getBMRLearners(bmr)
Arguments
bmr |
(BenchmarkResult) |
Value
(list).
See Also
Other benchmark:
BenchmarkResult
,
batchmark()
,
benchmark()
,
convertBMRToRankMatrix()
,
friedmanPostHocTestBMR()
,
friedmanTestBMR()
,
generateCritDifferencesData()
,
getBMRAggrPerformances()
,
getBMRFeatSelResults()
,
getBMRFilteredFeatures()
,
getBMRLearnerIds()
,
getBMRLearnerShortNames()
,
getBMRMeasureIds()
,
getBMRMeasures()
,
getBMRModels()
,
getBMRPerformances()
,
getBMRPredictions()
,
getBMRTaskDescs()
,
getBMRTaskIds()
,
getBMRTuneResults()
,
plotBMRBoxplots()
,
plotBMRRanksAsBarChart()
,
plotBMRSummary()
,
plotCritDifferences()
,
reduceBatchmarkResults()
Return measures IDs used in benchmark.
Description
Gets the IDs of the measures used in a benchmark experiment.
Usage
getBMRMeasureIds(bmr)
Arguments
bmr |
(BenchmarkResult) |
Value
(list). See above.
See Also
Other benchmark:
BenchmarkResult
,
batchmark()
,
benchmark()
,
convertBMRToRankMatrix()
,
friedmanPostHocTestBMR()
,
friedmanTestBMR()
,
generateCritDifferencesData()
,
getBMRAggrPerformances()
,
getBMRFeatSelResults()
,
getBMRFilteredFeatures()
,
getBMRLearnerIds()
,
getBMRLearnerShortNames()
,
getBMRLearners()
,
getBMRMeasures()
,
getBMRModels()
,
getBMRPerformances()
,
getBMRPredictions()
,
getBMRTaskDescs()
,
getBMRTaskIds()
,
getBMRTuneResults()
,
plotBMRBoxplots()
,
plotBMRRanksAsBarChart()
,
plotBMRSummary()
,
plotCritDifferences()
,
reduceBatchmarkResults()
Return measures used in benchmark.
Description
Gets the measures used in a benchmark experiment.
Usage
getBMRMeasures(bmr)
Arguments
bmr |
(BenchmarkResult) |
Value
(list). See above.
See Also
Other benchmark:
BenchmarkResult
,
batchmark()
,
benchmark()
,
convertBMRToRankMatrix()
,
friedmanPostHocTestBMR()
,
friedmanTestBMR()
,
generateCritDifferencesData()
,
getBMRAggrPerformances()
,
getBMRFeatSelResults()
,
getBMRFilteredFeatures()
,
getBMRLearnerIds()
,
getBMRLearnerShortNames()
,
getBMRLearners()
,
getBMRMeasureIds()
,
getBMRModels()
,
getBMRPerformances()
,
getBMRPredictions()
,
getBMRTaskDescs()
,
getBMRTaskIds()
,
getBMRTuneResults()
,
plotBMRBoxplots()
,
plotBMRRanksAsBarChart()
,
plotBMRSummary()
,
plotCritDifferences()
,
reduceBatchmarkResults()
Extract all models from benchmark result.
Description
A list of lists containing all WrappedModels trained in the benchmark experiment.
If models
is FALSE
in the call to benchmark, the function will return NULL
.
Usage
getBMRModels(bmr, task.ids = NULL, learner.ids = NULL, drop = FALSE)
Arguments
bmr |
(BenchmarkResult) |
task.ids |
( |
learner.ids |
( |
drop |
( |
Value
(list).
See Also
Other benchmark:
BenchmarkResult
,
batchmark()
,
benchmark()
,
convertBMRToRankMatrix()
,
friedmanPostHocTestBMR()
,
friedmanTestBMR()
,
generateCritDifferencesData()
,
getBMRAggrPerformances()
,
getBMRFeatSelResults()
,
getBMRFilteredFeatures()
,
getBMRLearnerIds()
,
getBMRLearnerShortNames()
,
getBMRLearners()
,
getBMRMeasureIds()
,
getBMRMeasures()
,
getBMRPerformances()
,
getBMRPredictions()
,
getBMRTaskDescs()
,
getBMRTaskIds()
,
getBMRTuneResults()
,
plotBMRBoxplots()
,
plotBMRRanksAsBarChart()
,
plotBMRSummary()
,
plotCritDifferences()
,
reduceBatchmarkResults()
Extract the test performance values from a benchmark result.
Description
Returns either a list of lists of “measure.test” data.frames, as returned by resample, or, if as.df is TRUE, a single data.frame in which these objects are rbind-ed with extra columns “task.id” and “learner.id”.
Usage
getBMRPerformances(
bmr,
task.ids = NULL,
learner.ids = NULL,
as.df = FALSE,
drop = FALSE
)
Arguments
bmr |
(BenchmarkResult) |
task.ids |
( |
learner.ids |
( |
as.df |
( |
drop |
( |
Value
(list | data.frame). See above.
See Also
Other benchmark:
BenchmarkResult
,
batchmark()
,
benchmark()
,
convertBMRToRankMatrix()
,
friedmanPostHocTestBMR()
,
friedmanTestBMR()
,
generateCritDifferencesData()
,
getBMRAggrPerformances()
,
getBMRFeatSelResults()
,
getBMRFilteredFeatures()
,
getBMRLearnerIds()
,
getBMRLearnerShortNames()
,
getBMRLearners()
,
getBMRMeasureIds()
,
getBMRMeasures()
,
getBMRModels()
,
getBMRPredictions()
,
getBMRTaskDescs()
,
getBMRTaskIds()
,
getBMRTuneResults()
,
plotBMRBoxplots()
,
plotBMRRanksAsBarChart()
,
plotBMRSummary()
,
plotCritDifferences()
,
reduceBatchmarkResults()
Extract the predictions from a benchmark result.
Description
Returns either a list of lists of ResamplePrediction objects, as returned by resample, or, if as.df is TRUE, a single data.frame in which these objects are rbind-ed with extra columns “task.id” and “learner.id”.
If predict.type
is “prob”, the probabilities for each class are returned in addition to the response.
If keep.pred
is FALSE
in the call to benchmark, the function will return NULL
.
Usage
getBMRPredictions(
bmr,
task.ids = NULL,
learner.ids = NULL,
as.df = FALSE,
drop = FALSE
)
Arguments
bmr |
(BenchmarkResult) |
task.ids |
( |
learner.ids |
( |
as.df |
( |
drop |
( |
Value
(list | data.frame). See above.
See Also
Other benchmark:
BenchmarkResult
,
batchmark()
,
benchmark()
,
convertBMRToRankMatrix()
,
friedmanPostHocTestBMR()
,
friedmanTestBMR()
,
generateCritDifferencesData()
,
getBMRAggrPerformances()
,
getBMRFeatSelResults()
,
getBMRFilteredFeatures()
,
getBMRLearnerIds()
,
getBMRLearnerShortNames()
,
getBMRLearners()
,
getBMRMeasureIds()
,
getBMRMeasures()
,
getBMRModels()
,
getBMRPerformances()
,
getBMRTaskDescs()
,
getBMRTaskIds()
,
getBMRTuneResults()
,
plotBMRBoxplots()
,
plotBMRRanksAsBarChart()
,
plotBMRSummary()
,
plotCritDifferences()
,
reduceBatchmarkResults()
Extract all task descriptions from benchmark result (DEPRECATED).
Description
A list containing all TaskDescs for each task contained in the benchmark experiment.
Usage
getBMRTaskDescriptions(bmr)
Arguments
bmr |
(BenchmarkResult) |
Value
(list).
Extract all task descriptions from benchmark result.
Description
A list containing all TaskDescs for each task contained in the benchmark experiment.
Usage
getBMRTaskDescs(bmr)
Arguments
bmr |
(BenchmarkResult) |
Value
(list).
See Also
Other benchmark:
BenchmarkResult
,
batchmark()
,
benchmark()
,
convertBMRToRankMatrix()
,
friedmanPostHocTestBMR()
,
friedmanTestBMR()
,
generateCritDifferencesData()
,
getBMRAggrPerformances()
,
getBMRFeatSelResults()
,
getBMRFilteredFeatures()
,
getBMRLearnerIds()
,
getBMRLearnerShortNames()
,
getBMRLearners()
,
getBMRMeasureIds()
,
getBMRMeasures()
,
getBMRModels()
,
getBMRPerformances()
,
getBMRPredictions()
,
getBMRTaskIds()
,
getBMRTuneResults()
,
plotBMRBoxplots()
,
plotBMRRanksAsBarChart()
,
plotBMRSummary()
,
plotCritDifferences()
,
reduceBatchmarkResults()
Return task ids used in benchmark.
Description
Gets the task IDs used in a benchmark experiment.
Usage
getBMRTaskIds(bmr)
Arguments
bmr |
(BenchmarkResult) |
Value
(character).
See Also
Other benchmark:
BenchmarkResult
,
batchmark()
,
benchmark()
,
convertBMRToRankMatrix()
,
friedmanPostHocTestBMR()
,
friedmanTestBMR()
,
generateCritDifferencesData()
,
getBMRAggrPerformances()
,
getBMRFeatSelResults()
,
getBMRFilteredFeatures()
,
getBMRLearnerIds()
,
getBMRLearnerShortNames()
,
getBMRLearners()
,
getBMRMeasureIds()
,
getBMRMeasures()
,
getBMRModels()
,
getBMRPerformances()
,
getBMRPredictions()
,
getBMRTaskDescs()
,
getBMRTuneResults()
,
plotBMRBoxplots()
,
plotBMRRanksAsBarChart()
,
plotBMRSummary()
,
plotCritDifferences()
,
reduceBatchmarkResults()
Extract the tuning results from a benchmark result.
Description
Returns a nested list of TuneResults. The first level of nesting is by data set, the second by learner, the third for the benchmark resampling iterations. If as.df
is TRUE
, a data frame with the “task.id”, “learner.id”, the resample iteration, the parameter values and the performances is returned.
Usage
getBMRTuneResults(
bmr,
task.ids = NULL,
learner.ids = NULL,
as.df = FALSE,
drop = FALSE
)
Arguments
bmr |
(BenchmarkResult) |
task.ids |
( |
learner.ids |
( |
as.df |
( |
drop |
( |
Value
(list | data.frame). See above.
See Also
Other benchmark:
BenchmarkResult
,
batchmark()
,
benchmark()
,
convertBMRToRankMatrix()
,
friedmanPostHocTestBMR()
,
friedmanTestBMR()
,
generateCritDifferencesData()
,
getBMRAggrPerformances()
,
getBMRFeatSelResults()
,
getBMRFilteredFeatures()
,
getBMRLearnerIds()
,
getBMRLearnerShortNames()
,
getBMRLearners()
,
getBMRMeasureIds()
,
getBMRMeasures()
,
getBMRModels()
,
getBMRPerformances()
,
getBMRPredictions()
,
getBMRTaskDescs()
,
getBMRTaskIds()
,
plotBMRBoxplots()
,
plotBMRRanksAsBarChart()
,
plotBMRSummary()
,
plotCritDifferences()
,
reduceBatchmarkResults()
Get tuning parameters from a learner of the caret R-package.
Description
Constructs a grid of tuning parameters from a learner of the caret
R-package. These values are then converted into a list of non-tunable
parameters (par.vals
) and a tunable
ParamHelpers::ParamSet (par.set
), which can be used by
tuneParams for tuning the learner. Numerical parameters will
either be specified by their lower and upper bounds or they will be
discretized into specific values.
Usage
getCaretParamSet(learner, length = 3L, task, discretize = TRUE)
Arguments
learner |
( |
length |
( |
task |
(Task) |
discretize |
( |
Value
(list(2)
). A list of parameters:
par.vals: contains a list of all constant tuning parameters.
par.set: a ParamHelpers::ParamSet, containing all the configurable tuning parameters.
Examples
if (requireNamespace("caret") && requireNamespace("mlbench")) {
library(caret)
classifTask = makeClassifTask(data = iris, target = "Species")
# (1) classification (random forest) with discretized parameters
getCaretParamSet("rf", length = 9L, task = classifTask, discretize = TRUE)
# (2) regression (gradient boosting machine) without discretized parameters
library(mlbench)
data(BostonHousing)
regrTask = makeRegrTask(data = BostonHousing, target = "medv")
getCaretParamSet("gbm", length = 9L, task = regrTask, discretize = FALSE)
}
Get the class weight parameter of a learner.
Description
Gets the class weight parameter of a learner.
Usage
getClassWeightParam(learner, lrn.id = NULL)
Arguments
learner |
(Learner | |
lrn.id |
(character) |
Value
numeric LearnerParam: A numeric parameter object, containing the class weight parameter of the given learner.
See Also
Other learner:
LearnerProperties
,
getHyperPars()
,
getLearnerId()
,
getLearnerNote()
,
getLearnerPackages()
,
getLearnerParVals()
,
getLearnerParamSet()
,
getLearnerPredictType()
,
getLearnerShortName()
,
getLearnerType()
,
getParamSet()
,
helpLearner()
,
helpLearnerParam()
,
makeLearner()
,
makeLearners()
,
removeHyperPars()
,
setHyperPars()
,
setId()
,
setLearnerId()
,
setPredictThreshold()
,
setPredictType()
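Examples
# A sketch, assuming a learner that supports class weights (here e1071's svm):
getClassWeightParam(makeLearner("classif.svm"))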
Confusion matrix.
Description
getConfMatrix
is deprecated. Please use calculateConfusionMatrix.
Calculates confusion matrix for (possibly resampled) prediction. Rows indicate true classes, columns predicted classes.
The marginal elements count the number of classification errors for the respective row or column, i.e., the number of errors when you condition on the corresponding true (rows) or predicted (columns) class. The last element in the margin diagonal displays the total number of errors.
Note that for resampling no further aggregation is currently performed. All predictions on all test sets are joined to a vector yhat, as are all labels joined to a vector y. Then yhat is simply tabulated vs y, as if both were computed on a single test set. This probably mainly makes sense when cross-validation is used for resampling.
Usage
getConfMatrix(pred, relative = FALSE)
Arguments
pred |
(Prediction) |
relative |
( |
Value
(matrix). A confusion matrix.
See Also
Get default measure.
Description
Get the default measure for a task type, task, task description or a learner.
Currently these are:
classif: mmce
regr: mse
cluster: db
surv: cindex
costsen: mcp
multilabel: multilabel.hamloss
Usage
getDefaultMeasure(x)
Arguments
x |
(character(1) | Task | TaskDesc | Learner) |
Value
(Measure).
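Examples
# A minimal sketch:
getDefaultMeasure(iris.task) # mmce for classification
getDefaultMeasure(makeLearner("regr.lm")) # mse for regression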
Return the error dump of FailureModel.
Description
Returns the error dump that can be used with debugger()
to evaluate errors.
If configureMlr configuration on.error.dump
is FALSE
, this returns
NULL
.
Usage
getFailureModelDump(model)
Arguments
model |
(WrappedModel) |
Value
(last.dump
).
Return error message of FailureModel.
Description
Such a model is created when one sets the corresponding option in configureMlr.
If no failure occurred, NA
is returned.
For complex wrappers this getter returns the first error message encountered in ANY model that failed.
Usage
getFailureModelMsg(model)
Arguments
model |
(WrappedModel) |
Value
(character(1)
).
Returns the selected feature set and optimization path after training.
Description
Returns the selected feature set and optimization path after training.
Usage
getFeatSelResult(object)
Arguments
object |
(WrappedModel) |
Value
See Also
Other featsel:
FeatSelControl
,
analyzeFeatSelResult()
,
makeFeatSelWrapper()
,
selectFeatures()
Calculates feature importance values for trained models.
Description
For some learners it is possible to calculate a feature importance measure.
getFeatureImportance
extracts those values from trained models.
See below for a list of supported learners.
Usage
getFeatureImportance(object, ...)
Arguments
object |
(WrappedModel) |
... |
(any) |
Details
boosting: Measure which accounts for the gain of the Gini index given by a feature in a tree and the weight of that tree.
cforest: Permutation principle of the 'mean decrease in accuracy' principle in randomForest. If auc = TRUE (only for binary classification), the area under the curve is used as measure. The algorithm used for the survival learner is 'extremely slow and experimental; use at your own risk'. See party::varimp() for details and further parameters.
gbm: Estimation of relative influence for each feature. See gbm::relative.influence() for details and further parameters.
h2o: Relative feature importances as returned by h2o::h2o.varimp().
randomForest: For type = 2 (the default) the 'MeanDecreaseGini' is measured, which is based on the Gini impurity index used for the calculation of the nodes. Alternatively, you can set type to 1, in which case the measure is the mean decrease in accuracy calculated on OOB data. Note that in this case the learner's parameter importance needs to be set to be able to compute feature importance values. See randomForest::importance() for details.
RRF: This is identical to randomForest.
ranger: Supports both measures mentioned above for the randomForest learner. Note that you need to specifically set the learner's parameter importance to be able to compute feature importance measures. See ranger::importance() and ranger::ranger() for details.
rpart: Sum of decrease in impurity for each of the surrogate variables at each node.
xgboost: The value implies the relative contribution of the corresponding feature to the model, calculated by taking each feature's contribution for each tree in the model. The exact computation of the importance in xgboost is undocumented.
Value
(FeatureImportance
) An object containing a data.frame
of the
variable importances and further information.
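Examples
# A minimal sketch, using the rpart learner listed above as supported:
mod = train(makeLearner("classif.rpart"), iris.task)
getFeatureImportance(mod)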
Calculates feature importance values for a given learner.
Description
This function is mostly for internal usage. To calculate feature importance use getFeatureImportance.
The return value is a named numeric vector. There does not need to be one value for each feature in the dataset. In getFeatureImportance, missing features will get an importance of zero, and NA values in the vector are also replaced with zero.
Usage
getFeatureImportanceLearner(.learner, .model, ...)
Arguments
.learner |
(Learner | |
.model |
(WrappedModel) |
... |
(any) |
Value
(numeric) A named vector of variable importance.
Returns the filtered features.
Description
Returns the filtered features.
Usage
getFilteredFeatures(model)
Arguments
model |
(WrappedModel) |
Value
(character).
See Also
Other filter:
filterFeatures()
,
generateFilterValuesData()
,
listFilterEnsembleMethods()
,
listFilterMethods()
,
makeFilter()
,
makeFilterEnsemble()
,
makeFilterWrapper()
,
plotFilterValues()
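Examples
# A minimal sketch, assuming the FSelectorRcpp filter backend is installed:
lrn = makeFilterWrapper(makeLearner("classif.rpart"),
  fw.method = "FSelectorRcpp_information.gain", fw.abs = 2)
mod = train(lrn, iris.task)
getFilteredFeatures(mod)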
Get only functional features from a task or a data.frame.
Description
The parameters “subset”, “features”, and “recode.target” are ignored for the data.frame method.
Usage
getFunctionalFeatures(object, subset = NULL, features, recode.target = "no")
## S3 method for class 'Task'
getFunctionalFeatures(object, subset = NULL, features, recode.target = "no")
## S3 method for class 'data.frame'
getFunctionalFeatures(object, subset = NULL, features, recode.target = "no")
Arguments
object |
(Task/data.frame) |
subset |
(integer | logical | |
features |
(character | integer | logical) |
recode.target |
( |
Value
Returns a data.frame
containing only the functional features.
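Examples
# A minimal sketch, mirroring the functional-data setup used for extractFDAFeatures:
df = data.frame(x = matrix(rnorm(24), ncol = 8), y = factor(c("a", "a", "b")))
fdf = makeFunctionalData(df, fd.features = list(x1 = 1:4, x2 = 5:8), exclude.cols = "y")
task = makeClassifTask(data = fdf, target = "y")
getFunctionalFeatures(task)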
Deprecated, use getLearnerModel
instead.
Description
Deprecated, use getLearnerModel
instead.
Usage
getHomogeneousEnsembleModels(model, learner.models = FALSE)
Arguments
model |
Deprecated. |
learner.models |
Deprecated. |
Get current parameter settings for a learner.
Description
Retrieves the current hyperparameter settings of a learner.
Usage
getHyperPars(learner, for.fun = c("train", "predict", "both"))
Arguments
learner |
(Learner) |
for.fun |
( |
Details
This function only shows hyperparameters that differ from the
learner default (because mlr
changed the default) or if the user set
hyperparameters manually during learner creation. If you want to have an
overview of all available hyperparameters use getParamSet()
.
Value
(list). A named list of values.
See Also
Other learner:
LearnerProperties
,
getClassWeightParam()
,
getLearnerId()
,
getLearnerNote()
,
getLearnerPackages()
,
getLearnerParVals()
,
getLearnerParamSet()
,
getLearnerPredictType()
,
getLearnerShortName()
,
getLearnerType()
,
getParamSet()
,
helpLearner()
,
helpLearnerParam()
,
makeLearner()
,
makeLearners()
,
removeHyperPars()
,
setHyperPars()
,
setId()
,
setLearnerId()
,
setPredictThreshold()
,
setPredictType()
Examples
getHyperPars(makeLearner("classif.ranger"))
## set learner hyperparameter `mtry` manually
getHyperPars(makeLearner("classif.ranger", mtry = 100))
Get the ID of the learner.
Description
Get the ID of the learner.
Usage
getLearnerId(learner)
Arguments
learner |
(Learner | |
Value
(character(1)
).
See Also
Other learner:
LearnerProperties
,
getClassWeightParam()
,
getHyperPars()
,
getLearnerNote()
,
getLearnerPackages()
,
getLearnerParVals()
,
getLearnerParamSet()
,
getLearnerPredictType()
,
getLearnerShortName()
,
getLearnerType()
,
getParamSet()
,
helpLearner()
,
helpLearnerParam()
,
makeLearner()
,
makeLearners()
,
removeHyperPars()
,
setHyperPars()
,
setId()
,
setLearnerId()
,
setPredictThreshold()
,
setPredictType()
Get underlying R model of learner integrated into mlr.
Description
Get underlying R model of learner integrated into mlr.
Usage
getLearnerModel(model, more.unwrap = FALSE)
Arguments
model |
(WrappedModel) |
more.unwrap |
( |
Value
(any). A fitted model, depending on the learner / wrapped package. E.g., a model of class rpart::rpart for learner “classif.rpart”.
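Examples
# A minimal sketch: extract the fitted rpart object from the WrappedModel.
mod = train(makeLearner("classif.rpart"), iris.task)
rpart.model = getLearnerModel(mod)
class(rpart.model) # "rpart"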
Get the note for the learner.
Description
Get the note for the learner.
Usage
getLearnerNote(learner)
Arguments
learner |
(Learner | |
Value
(character).
See Also
Other learner:
LearnerProperties
,
getClassWeightParam()
,
getHyperPars()
,
getLearnerId()
,
getLearnerPackages()
,
getLearnerParVals()
,
getLearnerParamSet()
,
getLearnerPredictType()
,
getLearnerShortName()
,
getLearnerType()
,
getParamSet()
,
helpLearner()
,
helpLearnerParam()
,
makeLearner()
,
makeLearners()
,
removeHyperPars()
,
setHyperPars()
,
setId()
,
setLearnerId()
,
setPredictThreshold()
,
setPredictType()
Get the required R packages of the learner.
Description
Get the R packages the learner requires.
Usage
getLearnerPackages(learner)
Arguments
learner |
(Learner | |
Value
(character).
See Also
Other learner:
LearnerProperties
,
getClassWeightParam()
,
getHyperPars()
,
getLearnerId()
,
getLearnerNote()
,
getLearnerParVals()
,
getLearnerParamSet()
,
getLearnerPredictType()
,
getLearnerShortName()
,
getLearnerType()
,
getParamSet()
,
helpLearner()
,
helpLearnerParam()
,
makeLearner()
,
makeLearners()
,
removeHyperPars()
,
setHyperPars()
,
setId()
,
setLearnerId()
,
setPredictThreshold()
,
setPredictType()
Get the parameter values of the learner.
Description
Alias for getHyperPars.
Usage
getLearnerParVals(learner, for.fun = c("train", "predict", "both"))
Arguments
learner |
(Learner | |
for.fun |
( |
Value
(list). A named list of values.
See Also
Other learner:
LearnerProperties
,
getClassWeightParam()
,
getHyperPars()
,
getLearnerId()
,
getLearnerNote()
,
getLearnerPackages()
,
getLearnerParamSet()
,
getLearnerPredictType()
,
getLearnerShortName()
,
getLearnerType()
,
getParamSet()
,
helpLearner()
,
helpLearnerParam()
,
makeLearner()
,
makeLearners()
,
removeHyperPars()
,
setHyperPars()
,
setId()
,
setLearnerId()
,
setPredictThreshold()
,
setPredictType()
Get the parameter set of the learner.
Description
Alias for getParamSet.
Usage
getLearnerParamSet(learner)
Arguments
learner |
(Learner | |
Value
ParamSet.
See Also
Other learner:
LearnerProperties
,
getClassWeightParam()
,
getHyperPars()
,
getLearnerId()
,
getLearnerNote()
,
getLearnerPackages()
,
getLearnerParVals()
,
getLearnerPredictType()
,
getLearnerShortName()
,
getLearnerType()
,
getParamSet()
,
helpLearner()
,
helpLearnerParam()
,
makeLearner()
,
makeLearners()
,
removeHyperPars()
,
setHyperPars()
,
setId()
,
setLearnerId()
,
setPredictThreshold()
,
setPredictType()
Get the predict type of the learner.
Description
Get the predict type of the learner.
Usage
getLearnerPredictType(learner)
Arguments
learner |
(Learner | |
Value
(character(1)
).
See Also
Other learner:
LearnerProperties
,
getClassWeightParam()
,
getHyperPars()
,
getLearnerId()
,
getLearnerNote()
,
getLearnerPackages()
,
getLearnerParVals()
,
getLearnerParamSet()
,
getLearnerShortName()
,
getLearnerType()
,
getParamSet()
,
helpLearner()
,
helpLearnerParam()
,
makeLearner()
,
makeLearners()
,
removeHyperPars()
,
setHyperPars()
,
setId()
,
setLearnerId()
,
setPredictThreshold()
,
setPredictType()
Get the short name of the learner.
Description
For an ordinary learner, simply its short name is returned. For wrapped learners, the wrapper id is successively attached to the short name of the base learner, e.g. “rf.bagged.imputed”.
Usage
getLearnerShortName(learner)
Arguments
learner |
(Learner | |
Value
(character(1)
).
See Also
Other learner:
LearnerProperties
,
getClassWeightParam()
,
getHyperPars()
,
getLearnerId()
,
getLearnerNote()
,
getLearnerPackages()
,
getLearnerParVals()
,
getLearnerParamSet()
,
getLearnerPredictType()
,
getLearnerType()
,
getParamSet()
,
helpLearner()
,
helpLearnerParam()
,
makeLearner()
,
makeLearners()
,
removeHyperPars()
,
setHyperPars()
,
setId()
,
setLearnerId()
,
setPredictThreshold()
,
setPredictType()
Get the type of the learner.
Description
Get the type of the learner.
Usage
getLearnerType(learner)
Arguments
learner |
(Learner | |
Value
(character(1)
).
See Also
Other learner:
LearnerProperties
,
getClassWeightParam()
,
getHyperPars()
,
getLearnerId()
,
getLearnerNote()
,
getLearnerPackages()
,
getLearnerParVals()
,
getLearnerParamSet()
,
getLearnerPredictType()
,
getLearnerShortName()
,
getParamSet()
,
helpLearner()
,
helpLearnerParam()
,
makeLearner()
,
makeLearners()
,
removeHyperPars()
,
setHyperPars()
,
setId()
,
setLearnerId()
,
setPredictThreshold()
,
setPredictType()
Returns a list of mlr's options.
Description
Gets the options for mlr.
Usage
getMlrOptions()
Value
(list).
See Also
Other configure:
configureMlr()
Retrieve binary classification measures for multilabel classification predictions.
Description
Measures the quality of each binary label prediction w.r.t. some binary classification performance measure.
Usage
getMultilabelBinaryPerformances(pred, measures)
Arguments
pred |
(Prediction) |
measures |
(Measure | list of Measure) |
Value
(named matrix
). Performance value(s), column names are measure(s), row names are labels.
See Also
Other multilabel:
makeMultilabelBinaryRelevanceWrapper()
,
makeMultilabelClassifierChainsWrapper()
,
makeMultilabelDBRWrapper()
,
makeMultilabelNestedStackingWrapper()
,
makeMultilabelStackingWrapper()
Examples
# see makeMultilabelBinaryRelevanceWrapper
Get the opt.paths from each tuning step from the outer resampling.
Description
After you resampled a tuning wrapper (see makeTuneWrapper)
with resample(..., extract = getTuneResult)
this helper returns a data.frame with all opt.paths combined by rbind.
An additional column iter
indicates to what resampling iteration the row belongs.
Usage
getNestedTuneResultsOptPathDf(r, trafo = FALSE)
Arguments
r |
(ResampleResult) |
trafo |
( |
Value
(data.frame). See above.
See Also
Other tune:
TuneControl
,
getNestedTuneResultsX()
,
getResamplingIndices()
,
getTuneResult()
,
makeModelMultiplexer()
,
makeModelMultiplexerParamSet()
,
makeTuneControlCMAES()
,
makeTuneControlDesign()
,
makeTuneControlGenSA()
,
makeTuneControlGrid()
,
makeTuneControlIrace()
,
makeTuneControlMBO()
,
makeTuneControlRandom()
,
makeTuneWrapper()
,
tuneParams()
,
tuneThreshold()
Examples
# see example of makeTuneWrapper
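# A compact nested-tuning sketch, assuming the kernlab-based classif.ksvm learner:
ps = makeParamSet(makeDiscreteParam("C", values = 2^(-2:2)))
lrn = makeTuneWrapper("classif.ksvm", resampling = makeResampleDesc("CV", iters = 2),
  par.set = ps, control = makeTuneControlGrid())
r = resample(lrn, iris.task, cv2, extract = getTuneResult)
getNestedTuneResultsOptPathDf(r)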
Get the tuned hyperparameter settings from a nested tuning.
Description
After you resampled a tuning wrapper (see makeTuneWrapper)
with resample(..., extract = getTuneResult)
this helper returns a data.frame
with
the best found hyperparameter settings for each resampling iteration.
Usage
getNestedTuneResultsX(r)
Arguments
r |
(ResampleResult) |
Value
(data.frame). One column for each tuned hyperparameter and one row for each outer resampling iteration.
See Also
Other tune:
TuneControl
,
getNestedTuneResultsOptPathDf()
,
getResamplingIndices()
,
getTuneResult()
,
makeModelMultiplexer()
,
makeModelMultiplexerParamSet()
,
makeTuneControlCMAES()
,
makeTuneControlDesign()
,
makeTuneControlGenSA()
,
makeTuneControlGrid()
,
makeTuneControlIrace()
,
makeTuneControlMBO()
,
makeTuneControlRandom()
,
makeTuneWrapper()
,
tuneParams()
,
tuneThreshold()
Examples
# see example of makeTuneWrapper
Extracts out-of-bag predictions from trained models.
Description
Learners like randomForest
produce out-of-bag predictions.
getOOBPreds
extracts this information from trained models and builds a
prediction object as provided by predict (with prediction time set to NA).
In the classification case:
What is stored exactly in the (Prediction) object depends
on the predict.type
setting of the Learner.
You can call listLearners(properties = "oobpreds")
to get a list of learners
which provide this.
Usage
getOOBPreds(model, task)
Arguments
model |
(WrappedModel) |
task |
(Task) |
Value
(Prediction).
Examples
training.set = sample(1:150, 50)
lrn = makeLearner("classif.ranger", predict.type = "prob", predict.threshold = 0.6)
mod = train(lrn, sonar.task, subset = training.set)
oob = getOOBPreds(mod, sonar.task)
oob
performance(oob, measures = list(auc, mmce))
Provides out-of-bag predictions for a given model and the corresponding learner.
Description
This function is mostly for internal usage. To get out-of-bag predictions use getOOBPreds.
Usage
getOOBPredsLearner(.learner, .model)
Arguments
.learner |
(Learner) |
.model |
(WrappedModel) |
Value
Same output structure as in (predictLearner).
Get a description of all possible parameter settings for a learner.
Description
Returns the ParamHelpers::ParamSet from a Learner.
Value
ParamSet.
See Also
Other learner:
LearnerProperties
,
getClassWeightParam()
,
getHyperPars()
,
getLearnerId()
,
getLearnerNote()
,
getLearnerPackages()
,
getLearnerParVals()
,
getLearnerParamSet()
,
getLearnerPredictType()
,
getLearnerShortName()
,
getLearnerType()
,
helpLearner()
,
helpLearnerParam()
,
makeLearner()
,
makeLearners()
,
removeHyperPars()
,
setHyperPars()
,
setId()
,
setLearnerId()
,
setPredictThreshold()
,
setPredictType()
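Examples
# A minimal sketch:
getParamSet(makeLearner("classif.rpart"))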
Return the error dump of a failed Prediction.
Description
Returns the error dump that can be used with debugger()
to evaluate errors.
If configureMlr configuration on.error.dump
is FALSE
or if the
prediction did not fail, this returns NULL
.
Usage
getPredictionDump(pred)
Arguments
pred |
(Prediction) |
Value
(last.dump
).
See Also
Other debug:
FailureModel
,
ResampleResult
,
getRRDump()
Get probabilities for some classes.
Description
Get probabilities for some classes.
Usage
getPredictionProbabilities(pred, cl)
Arguments
pred |
(Prediction) |
cl |
(character) |
Value
(data.frame) with numerical columns or a numerical vector if length of cl
is 1.
Order of columns is defined by cl
.
See Also
Other predict:
asROCRPrediction()
,
getPredictionResponse()
,
getPredictionTaskDesc()
,
predict.WrappedModel()
,
setPredictThreshold()
,
setPredictType()
Examples
task = makeClassifTask(data = iris, target = "Species")
lrn = makeLearner("classif.lda", predict.type = "prob")
mod = train(lrn, task)
# predict probabilities
pred = predict(mod, newdata = iris)
# Get probabilities for all classes
head(getPredictionProbabilities(pred))
# Get probabilities for a subset of classes
head(getPredictionProbabilities(pred, c("setosa", "virginica")))
Get response / truth from prediction object.
Description
The following types are returned, depending on task type:
classif: factor
regr: numeric
se: numeric
cluster: integer
surv: numeric
multilabel: logical matrix, columns named with labels
Usage
getPredictionResponse(pred)
getPredictionSE(pred)
getPredictionTruth(pred)
Arguments
pred |
(Prediction) |
Value
See above.
See Also
Other predict: asROCRPrediction(), getPredictionProbabilities(), getPredictionTaskDesc(), predict.WrappedModel(), setPredictThreshold(), setPredictType()
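For illustration, a minimal sketch (not part of the original manual):
task = makeClassifTask(data = iris, target = "Species")
mod = train("classif.rpart", task)
pred = predict(mod, task = task)
head(getPredictionResponse(pred)) # factor of predicted classes
head(getPredictionTruth(pred)) # factor of true classes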
Get summarizing task description from prediction.
Description
See title.
Usage
getPredictionTaskDesc(pred)
Arguments
pred |
(Prediction) |
Value
(TaskDesc).
See Also
Other predict: asROCRPrediction(), getPredictionProbabilities(), getPredictionResponse(), predict.WrappedModel(), setPredictThreshold(), setPredictType()
Deprecated, use getPredictionProbabilities instead.
Description
Deprecated, use getPredictionProbabilities instead.
Usage
getProbabilities(pred, cl)
Arguments
pred |
Deprecated. |
cl |
Deprecated. |
Return the error dump of ResampleResult.
Description
Returns the error dumps generated during resampling, which can be used with debugger() to debug errors. These dumps are saved if the configureMlr configuration on.error.dump, or the corresponding learner config, is TRUE.
The returned object is a list with as many entries as the resampling being used has folds. Each of these entries can have a subset of the following slots, depending on which step in the resampling iteration failed: “train” (error during training step), “predict.train” (prediction on training subset), “predict.test” (prediction on test subset).
Usage
getRRDump(res)
Arguments
res |
(ResampleResult) |
Value
list.
See Also
Other debug: FailureModel, ResampleResult, getPredictionDump()
Get list of predictions for train and test set of each single resample iteration.
Description
This function creates a list with two slots train and test where each slot is again a list of Prediction objects for each single resample iteration. In case that predict = "train" was used for the resample description (see makeResampleDesc), the slot test will be NULL and in case that predict = "test" was used, the slot train will be NULL.
Usage
getRRPredictionList(res, ...)
Arguments
res |
(ResampleResult) |
... |
(any) |
Value
list.
See Also
Other resample: ResamplePrediction, ResampleResult, addRRMeasure(), getRRPredictions(), getRRTaskDesc(), getRRTaskDescription(), makeResampleDesc(), makeResampleInstance(), resample()
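For illustration, a minimal sketch (not part of the original manual):
rdesc = makeResampleDesc("CV", iters = 2, predict = "both")
r = resample("classif.rpart", iris.task, rdesc)
p = getRRPredictionList(r)
p$train[[1]] # Prediction on the training subset of fold 1
p$test[[1]] # Prediction on the test subset of fold 1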
Get predictions from resample results.
Description
Very simple getter.
Usage
getRRPredictions(res)
Arguments
res |
(ResampleResult) |
Value
(ResamplePrediction).
See Also
Other resample: ResamplePrediction, ResampleResult, addRRMeasure(), getRRPredictionList(), getRRTaskDesc(), getRRTaskDescription(), makeResampleDesc(), makeResampleInstance(), resample()
Get task description from resample results (DEPRECATED).
Description
Get a summarizing task description.
Usage
getRRTaskDesc(res)
Arguments
res |
(ResampleResult) |
Value
(TaskDesc).
See Also
Other resample: ResamplePrediction, ResampleResult, addRRMeasure(), getRRPredictionList(), getRRPredictions(), getRRTaskDescription(), makeResampleDesc(), makeResampleInstance(), resample()
Get task description from resample results (DEPRECATED).
Description
Get a summarizing task description.
Usage
getRRTaskDescription(res)
Arguments
res |
(ResampleResult) |
Value
(TaskDesc).
See Also
Other resample: ResamplePrediction, ResampleResult, addRRMeasure(), getRRPredictionList(), getRRPredictions(), getRRTaskDesc(), makeResampleDesc(), makeResampleInstance(), resample()
Get the resampling indices from a tuning or feature selection wrapper.
Description
After you resampled a tuning or feature selection wrapper (see makeTuneWrapper) with resample(..., extract = getTuneResult) or resample(..., extract = getFeatSelResult), this helper returns a list with the resampling indices used for the respective method.
Usage
getResamplingIndices(object, inner = FALSE)
Arguments
object |
(ResampleResult) |
inner |
(logical) |
Value
(list). One list for each outer resampling fold.
See Also
Other tune: TuneControl, getNestedTuneResultsOptPathDf(), getNestedTuneResultsX(), getTuneResult(), makeModelMultiplexer(), makeModelMultiplexerParamSet(), makeTuneControlCMAES(), makeTuneControlDesign(), makeTuneControlGenSA(), makeTuneControlGrid(), makeTuneControlIrace(), makeTuneControlMBO(), makeTuneControlRandom(), makeTuneWrapper(), tuneParams(), tuneThreshold()
Examples
task = makeClassifTask(data = iris, target = "Species")
lrn = makeLearner("classif.rpart")
# stupid mini grid
ps = makeParamSet(
makeDiscreteParam("cp", values = c(0.05, 0.1)),
makeDiscreteParam("minsplit", values = c(10, 20))
)
ctrl = makeTuneControlGrid()
inner = makeResampleDesc("Holdout")
outer = makeResampleDesc("CV", iters = 2)
lrn = makeTuneWrapper(lrn, resampling = inner, par.set = ps, control = ctrl)
# nested resampling for evaluation
# we also extract tuned hyper pars in each iteration and by that the resampling indices
r = resample(lrn, task, outer, extract = getTuneResult)
# get tuning indices
getResamplingIndices(r, inner = TRUE)
Returns the predictions for each base learner.
Description
Returns the predictions for each base learner.
Usage
getStackedBaseLearnerPredictions(model, newdata = NULL)
Arguments
model |
(WrappedModel) |
newdata |
(data.frame) |
Details
None.
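A minimal sketch, assuming a stacked learner built with makeStackedLearner and its "average" method:
lrns = lapply(c("classif.rpart", "classif.lda"), makeLearner, predict.type = "prob")
st = makeStackedLearner(base.learners = lrns, predict.type = "prob", method = "average")
mod = train(st, iris.task)
# one prediction per base learner, named by learner id
head(getStackedBaseLearnerPredictions(mod))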
Get the class levels for classification and multilabel tasks.
Description
NB: For multilabel, getTaskTargetNames and getTaskClassLevels actually return the same thing.
Usage
getTaskClassLevels(x)
Arguments
x |
Value
(character).
See Also
Other task: getTaskCosts(), getTaskData(), getTaskDesc(), getTaskFeatureNames(), getTaskFormula(), getTaskId(), getTaskNFeats(), getTaskSize(), getTaskTargetNames(), getTaskTargets(), getTaskType(), subsetTask()
Extract costs in task.
Description
Returns “NULL” if the task is not of type “costsens”.
Usage
getTaskCosts(task, subset = NULL)
Arguments
task |
(CostSensTask) |
subset |
(integer | logical | NULL) |
Value
(matrix | NULL).
See Also
Other task: getTaskClassLevels(), getTaskData(), getTaskDesc(), getTaskFeatureNames(), getTaskFormula(), getTaskId(), getTaskNFeats(), getTaskSize(), getTaskTargetNames(), getTaskTargets(), getTaskType(), subsetTask()
Extract data in task.
Description
Useful in trainLearner when you add a learning machine to the package.
Usage
getTaskData(
task,
subset = NULL,
features,
target.extra = FALSE,
recode.target = "no",
functionals.as = "dfcols"
)
Arguments
task |
(Task) |
subset |
(integer | logical | NULL) |
features |
(character | integer | logical) |
target.extra |
( |
recode.target |
( |
functionals.as |
( |
Value
Either a data.frame or a list with data.frame data and vector target.
See Also
Other task: getTaskClassLevels(), getTaskCosts(), getTaskDesc(), getTaskFeatureNames(), getTaskFormula(), getTaskId(), getTaskNFeats(), getTaskSize(), getTaskTargetNames(), getTaskTargets(), getTaskType(), subsetTask()
Examples
library("mlbench")
data(BreastCancer)
df = BreastCancer
df$Id = NULL
task = makeClassifTask(id = "BreastCancer", data = df, target = "Class", positive = "malignant")
head(getTaskData(task))
head(getTaskData(task, features = c("Cell.size", "Cell.shape"), recode.target = "-1+1"))
head(getTaskData(task, subset = 1:100, recode.target = "01"))
Get a summarizing task description.
Description
See title.
Usage
getTaskDesc(x)
Arguments
x |
Value
(TaskDesc).
See Also
Other task: getTaskClassLevels(), getTaskCosts(), getTaskData(), getTaskFeatureNames(), getTaskFormula(), getTaskId(), getTaskNFeats(), getTaskSize(), getTaskTargetNames(), getTaskTargets(), getTaskType(), subsetTask()
Deprecated, use getTaskDesc instead.
Description
Deprecated, use getTaskDesc instead.
Usage
getTaskDescription(x)
Arguments
x |
Get feature names of task.
Description
Target column name is not included.
Usage
getTaskFeatureNames(task)
Arguments
task |
(Task) |
Value
(character).
See Also
Other task: getTaskClassLevels(), getTaskCosts(), getTaskData(), getTaskDesc(), getTaskFormula(), getTaskId(), getTaskNFeats(), getTaskSize(), getTaskTargetNames(), getTaskTargets(), getTaskType(), subsetTask()
Get formula of a task.
Description
This is usually simply <target> ~ . For multilabel it is <target_1> + ... + <target_k> ~ .
Usage
getTaskFormula(
x,
target = getTaskTargetNames(x),
explicit.features = FALSE,
env = parent.frame()
)
Arguments
x |
|
target |
( |
explicit.features |
( |
env |
(environment) |
Value
(formula).
See Also
Other task: getTaskClassLevels(), getTaskCosts(), getTaskData(), getTaskDesc(), getTaskFeatureNames(), getTaskId(), getTaskNFeats(), getTaskSize(), getTaskTargetNames(), getTaskTargets(), getTaskType(), subsetTask()
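For illustration (a minimal sketch):
getTaskFormula(iris.task)
# Species ~ .
getTaskFormula(iris.task, explicit.features = TRUE)
# Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width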
Get the id of the task.
Description
See title.
Usage
getTaskId(x)
Arguments
x |
Value
(character(1)).
See Also
Other task: getTaskClassLevels(), getTaskCosts(), getTaskData(), getTaskDesc(), getTaskFeatureNames(), getTaskFormula(), getTaskNFeats(), getTaskSize(), getTaskTargetNames(), getTaskTargets(), getTaskType(), subsetTask()
Get number of features in task.
Description
See title.
Usage
getTaskNFeats(x)
Arguments
x |
Value
(integer(1)).
See Also
Other task: getTaskClassLevels(), getTaskCosts(), getTaskData(), getTaskDesc(), getTaskFeatureNames(), getTaskFormula(), getTaskId(), getTaskSize(), getTaskTargetNames(), getTaskTargets(), getTaskType(), subsetTask()
Get number of observations in task.
Description
See title.
Usage
getTaskSize(x)
Arguments
x |
Value
(integer(1)).
See Also
Other task: getTaskClassLevels(), getTaskCosts(), getTaskData(), getTaskDesc(), getTaskFeatureNames(), getTaskFormula(), getTaskId(), getTaskNFeats(), getTaskTargetNames(), getTaskTargets(), getTaskType(), subsetTask()
Get the name(s) of the target column(s).
Description
NB: For multilabel, getTaskTargetNames and getTaskClassLevels actually return the same thing.
Usage
getTaskTargetNames(x)
Arguments
x |
Value
(character).
See Also
Other task: getTaskClassLevels(), getTaskCosts(), getTaskData(), getTaskDesc(), getTaskFeatureNames(), getTaskFormula(), getTaskId(), getTaskNFeats(), getTaskSize(), getTaskTargets(), getTaskType(), subsetTask()
Get target data of task.
Description
Get target data of task.
Usage
getTaskTargets(task, recode.target = "no")
Arguments
task |
(Task) |
recode.target |
( |
Value
A factor for classification or a numeric for regression, a data.frame of logical columns for multilabel.
See Also
Other task: getTaskClassLevels(), getTaskCosts(), getTaskData(), getTaskDesc(), getTaskFeatureNames(), getTaskFormula(), getTaskId(), getTaskNFeats(), getTaskSize(), getTaskTargetNames(), getTaskType(), subsetTask()
Examples
task = makeClassifTask(data = iris, target = "Species")
getTaskTargets(task)
Get the type of the task.
Description
See title.
Usage
getTaskType(x)
Arguments
x |
Value
(character(1)).
See Also
Other task: getTaskClassLevels(), getTaskCosts(), getTaskData(), getTaskDesc(), getTaskFeatureNames(), getTaskFormula(), getTaskId(), getTaskNFeats(), getTaskSize(), getTaskTargetNames(), getTaskTargets(), subsetTask()
Returns the optimal hyperparameters and optimization path after training.
Description
Returns the optimal hyperparameters and optimization path after training.
Usage
getTuneResult(object)
Arguments
object |
(WrappedModel) |
Value
(TuneResult).
See Also
Other tune: TuneControl, getNestedTuneResultsOptPathDf(), getNestedTuneResultsX(), getResamplingIndices(), makeModelMultiplexer(), makeModelMultiplexerParamSet(), makeTuneControlCMAES(), makeTuneControlDesign(), makeTuneControlGenSA(), makeTuneControlGrid(), makeTuneControlIrace(), makeTuneControlMBO(), makeTuneControlRandom(), makeTuneWrapper(), tuneParams(), tuneThreshold()
Get the optimization path of a tuning result.
Description
Returns the opt.path from a (TuneResult) object.
Usage
getTuneResultOptPath(tune.result, as.df = TRUE)
Arguments
tune.result |
(TuneResult) |
as.df |
( |
Value
(ParamHelpers::OptPath) or (data.frame).
Gunpoint functional data classification task.
Description
Contains the task (gunpoint.task). You have to classify whether a person raises a gun or just an empty hand.
References
See Ratanamahatana, C. A. & Keogh, E. (2004). Everything you know about Dynamic Time Warping is Wrong. Proceedings of SIAM International Conference on Data Mining (SDM05), 506-510.
Check whether the object contains functional features.
Description
See title.
Usage
hasFunctionalFeatures(obj)
Arguments
obj |
( |
Value
(logical(1))
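For illustration, a minimal sketch (the functional task is built with makeFunctionalData):
hasFunctionalFeatures(iris.task) # FALSE, only scalar features
d = makeFunctionalData(data.frame(matrix(rnorm(20), nrow = 2), y = 1:2),
  fd.features = list(f1 = 1:10))
hasFunctionalFeatures(makeRegrTask(data = d, target = "y")) # TRUE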
Deprecated, use hasLearnerProperties instead.
Description
Deprecated, use hasLearnerProperties instead.
Usage
hasProperties(learner, props)
Arguments
learner |
Deprecated. |
props |
Deprecated. |
Access help page of learner functions.
Description
Interactive function that gives the user quick access to the help pages associated with various functions involved in the given learner.
Usage
helpLearner(learner)
Arguments
learner |
(Learner | |
See Also
Other learner: LearnerProperties, getClassWeightParam(), getHyperPars(), getLearnerId(), getLearnerNote(), getLearnerPackages(), getLearnerParVals(), getLearnerParamSet(), getLearnerPredictType(), getLearnerShortName(), getLearnerType(), getParamSet(), helpLearnerParam(), makeLearner(), makeLearners(), removeHyperPars(), setHyperPars(), setId(), setLearnerId(), setPredictThreshold(), setPredictType()
Other help: helpLearnerParam()
Get specific help for a learner's parameters.
Description
Print the description of parameters of a given learner. The description is automatically extracted from the help pages of the learner, so it may be incomplete.
Usage
helpLearnerParam(learner, param = NULL)
Arguments
learner |
(Learner | |
param |
( |
See Also
Other learner: LearnerProperties, getClassWeightParam(), getHyperPars(), getLearnerId(), getLearnerNote(), getLearnerPackages(), getLearnerParVals(), getLearnerParamSet(), getLearnerPredictType(), getLearnerShortName(), getLearnerType(), getParamSet(), helpLearner(), makeLearner(), makeLearners(), removeHyperPars(), setHyperPars(), setId(), setLearnerId(), setPredictThreshold(), setPredictType()
Other help: helpLearner()
Built-in imputation methods.
Description
The built-ins are:
- imputeConstant(const) for imputation using a constant value,
- imputeMedian() for imputation using the median,
- imputeMean() for imputation using the mean,
- imputeMode() for imputation using the mode,
- imputeMin(multiplier) for imputing constant values shifted below the minimum using min(x) - multiplier * diff(range(x)),
- imputeMax(multiplier) for imputing constant values shifted above the maximum using max(x) + multiplier * diff(range(x)),
- imputeUniform(min, max) for imputation using uniformly distributed random values. Lower and upper bound will be calculated from the data if not provided,
- imputeNormal(mean, sd) for imputation using normally distributed random values. Mean and standard deviation will be calculated from the data if not provided,
- imputeHist(breaks, use.mids) for imputation using random values with probabilities calculated using table or hist,
- imputeLearner(learner, features = NULL) for imputations using the response of a classification or regression learner.
Usage
imputeConstant(const)
imputeMedian()
imputeMean()
imputeMode()
imputeMin(multiplier = 1)
imputeMax(multiplier = 1)
imputeUniform(min = NA_real_, max = NA_real_)
imputeNormal(mu = NA_real_, sd = NA_real_)
imputeHist(breaks, use.mids = TRUE)
imputeLearner(learner, features = NULL)
Arguments
const |
(any) |
multiplier |
( |
min |
( |
max |
( |
mu |
( |
sd |
( |
breaks |
( |
use.mids |
( |
learner |
(Learner | |
features |
(character) |
See Also
Other impute: impute(), makeImputeMethod(), makeImputeWrapper(), reimpute()
Impute and re-impute data
Description
Allows imputation of missing feature values through various techniques. Note that you have the possibility to re-impute a data set in the same way as the imputation was performed during training. This especially comes in handy during resampling when one wants to perform the same imputation on the test set as on the training set.
The function impute performs the imputation on a data set and returns, alongside with the imputed data set, an “ImputationDesc” object which can contain “learned” coefficients and helpful data. It can then be passed together with a new data set to reimpute.
The imputation techniques can be specified for certain features or for feature classes, see function arguments.
You can either provide an arbitrary object, use a built-in imputation method listed under imputations or create one yourself using makeImputeMethod.
Usage
impute(
obj,
target = character(0L),
classes = list(),
cols = list(),
dummy.classes = character(0L),
dummy.cols = character(0L),
dummy.type = "factor",
force.dummies = FALSE,
impute.new.levels = TRUE,
recode.factor.levels = TRUE
)
Arguments
obj |
(data.frame | Task) |
target |
(character) |
classes |
(named list) |
cols |
(named list) |
dummy.classes |
(character) |
dummy.cols |
(character) |
dummy.type |
( |
force.dummies |
( |
impute.new.levels |
( |
recode.factor.levels |
( |
Details
The description object contains these slots:
- target (character): See argument.
- features (character): Feature names (column names of data).
- classes (character): Feature classes (storage type of data).
- lvls (named list): Mapping of column names of factor features to their levels, including newly created ones during imputation.
- impute (named list): Mapping of column names to imputation functions.
- dummies (named list): Mapping of column names to imputation functions.
- impute.new.levels (logical(1)): See argument.
- recode.factor.levels (logical(1)): See argument.
Value
(list)
- data (data.frame): Imputed data.
- desc (ImputationDesc): Description object.
See Also
Other impute: imputations, makeImputeMethod(), makeImputeWrapper(), reimpute()
Examples
df = data.frame(x = c(1, 1, NA), y = factor(c("a", "a", "b")), z = 1:3)
imputed = impute(df, target = character(0), cols = list(x = 99, y = imputeMode()))
print(imputed$data)
reimpute(data.frame(x = NA_real_), imputed$desc)
Iris classification task.
Description
Contains the task (iris.task).
References
See datasets::iris.
Is the model a FailureModel?
Description
Such a model is created when one sets the corresponding option in configureMlr.
For complex wrappers this getter returns TRUE if ANY model contained in it failed.
Usage
isFailureModel(model)
Arguments
model |
(WrappedModel) |
Value
(logical(1)).
Join some existing class levels to new, larger class levels for classification problems.
Description
Join some existing class levels to new, larger class levels for classification problems.
Usage
joinClassLevels(task, new.levels)
Arguments
task |
(Task) |
new.levels |
( |
Value
Task.
Examples
joinClassLevels(iris.task, new.levels = list(foo = c("setosa", "virginica")))
Convert arguments to control structure.
Description
Find all elements in ... which are not missing and call control on them.
Usage
learnerArgsToControl(control, ...)
Arguments
control |
( |
... |
(any) |
Value
Control structure for learner.
List of supported learning algorithms.
Description
All supported learners can be found by listLearners or as a table in the tutorial appendix: https://mlr.mlr-org.com/articles/tutorial/integrated_learners.html.
List ensemble filter methods.
Description
Returns a subset-able dataframe with filter information.
Usage
listFilterEnsembleMethods(desc = TRUE)
Arguments
desc |
( |
Value
(data.frame).
See Also
Other filter: filterFeatures(), generateFilterValuesData(), getFilteredFeatures(), listFilterMethods(), makeFilter(), makeFilterEnsemble(), makeFilterWrapper(), plotFilterValues()
List filter methods.
Description
Returns a subset-able dataframe with filter information.
Usage
listFilterMethods(
desc = TRUE,
tasks = FALSE,
features = FALSE,
include.deprecated = FALSE
)
Arguments
desc |
( |
tasks |
( |
features |
( |
include.deprecated |
( |
Value
(data.frame).
See Also
Other filter: filterFeatures(), generateFilterValuesData(), getFilteredFeatures(), listFilterEnsembleMethods(), makeFilter(), makeFilterEnsemble(), makeFilterWrapper(), plotFilterValues()
List the supported learner properties
Description
This is useful for determining which learner properties are available.
Usage
listLearnerProperties(type = "any")
Arguments
type |
( |
Value
(character).
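For illustration (a minimal sketch):
listLearnerProperties() # properties any learner can have
listLearnerProperties("classif") # properties specific to classification learners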
Find matching learning algorithms.
Description
Returns learning algorithms which have specific characteristics, e.g. whether they support missing values, case weights, etc.
Note that the packages of all learners are loaded during the search if you create them, which can take a long time. If you do not create them, we only inspect properties of the S3 classes, which is a lot faster.
Note that for general cost-sensitive learning, mlr currently supports mainly “wrapper” approaches like CostSensWeightedPairsWrapper, which are not listed, as they are not basic R learning algorithms. The same applies for many multilabel methods, see, e.g., makeMultilabelBinaryRelevanceWrapper.
Usage
listLearners(
obj = NA_character_,
properties = character(0L),
quiet = TRUE,
warn.missing.packages = TRUE,
check.packages = FALSE,
create = FALSE
)
## Default S3 method:
listLearners(
obj = NA_character_,
properties = character(0L),
quiet = TRUE,
warn.missing.packages = TRUE,
check.packages = FALSE,
create = FALSE
)
## S3 method for class 'character'
listLearners(
obj = NA_character_,
properties = character(0L),
quiet = TRUE,
warn.missing.packages = TRUE,
check.packages = FALSE,
create = FALSE
)
## S3 method for class 'Task'
listLearners(
obj = NA_character_,
properties = character(0L),
quiet = TRUE,
warn.missing.packages = TRUE,
check.packages = TRUE,
create = FALSE
)
Arguments
obj |
( |
properties |
(character) |
quiet |
( |
warn.missing.packages |
( |
check.packages |
( |
create |
( |
Value
(data.frame | list of Learner). Either a descriptive data.frame that allows access to all properties of the learners or a list of created learner objects (named by ids of listed learners).
Examples
## Not run:
listLearners("classif", properties = c("multiclass", "prob"))
data = iris
task = makeClassifTask(data = data, target = "Species")
listLearners(task)
## End(Not run)
List the supported measure properties.
Description
This is useful for determining which measure properties are available.
Usage
listMeasureProperties()
Value
(character).
Find matching measures.
Description
Returns the matching measures which have specific characteristics, e.g. whether they support classification or regression.
Usage
listMeasures(obj, properties = character(0L), create = FALSE)
## Default S3 method:
listMeasures(obj, properties = character(0L), create = FALSE)
## S3 method for class 'character'
listMeasures(obj, properties = character(0L), create = FALSE)
## S3 method for class 'Task'
listMeasures(obj, properties = character(0L), create = FALSE)
Arguments
obj |
( |
properties |
(character) |
create |
( |
Value
(character | list of Measure). Class names of matching measures or instantiated objects.
List the supported task types in mlr
Description
Returns a character vector with each of the supported task types in mlr.
Usage
listTaskTypes()
Value
(character).
NCCTG Lung Cancer survival task.
Description
Contains the task (lung.task).
References
See survival::lung. Incomplete cases have been removed from the task.
Specify your own aggregation of measures.
Description
This is an advanced feature of mlr. It gives access to some inner workings so the result might not be compatible with everything!
Usage
makeAggregation(id, name = id, properties, fun)
Arguments
id |
( |
name |
( |
properties |
(character)
|
fun |
(
|
Value
(Aggregation).
See Also
Examples
# computes the interquartile range on all performance values
test.iqr = makeAggregation(
id = "test.iqr", name = "Test set interquartile range",
properties = "req.test",
fun = function(task, perf.test, perf.train, measure, group, pred) IQR(perf.test)
)
Fuse learner with the bagging technique.
Description
Fuses a learner with the bagging method (i.e., similar to what a randomForest does). Creates a learner object, which can be used like any other learner object. Models can easily be accessed via getLearnerModel.
Bagging is implemented as follows: For each iteration a random data subset is sampled (with or without replacement) and potentially the number of features is also restricted to a random subset. (Note that this is usually handled in a slightly different way in the random forest, where features are sampled at each tree split.)
Prediction works as follows: For classification we do majority voting to create a discrete label and probabilities are predicted by considering the proportions of all predicted labels. For regression the mean value and the standard deviations across predictions is computed.
Note that the passed base learner must always have predict.type = 'response', while the BaggingWrapper can estimate probabilities and standard errors, so it can be set, e.g., to predict.type = 'prob'. For this reason, when you call setPredictType, the type is only set for the BaggingWrapper, not passed down to the inner learner.
Usage
makeBaggingWrapper(
learner,
bw.iters = 10L,
bw.replace = TRUE,
bw.size,
bw.feats = 1
)
Arguments
learner |
(Learner | |
bw.iters |
( |
bw.replace |
( |
bw.size |
( |
bw.feats |
( |
Value
(Learner).
See Also
Other wrapper: makeClassificationViaRegressionWrapper(), makeConstantClassWrapper(), makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeDownsampleWrapper(), makeDummyFeaturesWrapper(), makeExtractFDAFeatsWrapper(), makeFeatSelWrapper(), makeFilterWrapper(), makeImputeWrapper(), makeMulticlassWrapper(), makeMultilabelBinaryRelevanceWrapper(), makeMultilabelClassifierChainsWrapper(), makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(), makeMultilabelStackingWrapper(), makeOverBaggingWrapper(), makePreprocWrapper(), makePreprocWrapperCaret(), makeRemoveConstantFeaturesWrapper(), makeSMOTEWrapper(), makeTuneWrapper(), makeUndersampleWrapper(), makeWeightedClassesWrapper()
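A minimal usage sketch (illustrative, not from the original manual):
lrn = makeLearner("classif.rpart")
bag.lrn = makeBaggingWrapper(lrn, bw.iters = 10, bw.replace = TRUE, bw.feats = 0.75)
# the wrapper can be set to predict probabilities; the type is not passed down
bag.lrn = setPredictType(bag.lrn, "prob")
mod = train(bag.lrn, sonar.task)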
Exported for internal use only.
Description
Exported for internal use only.
Usage
makeBaseWrapper(
id,
type,
next.learner,
package = character(0L),
par.set = makeParamSet(),
par.vals = list(),
learner.subclass,
model.subclass,
cache = FALSE
)
Arguments
id |
( |
type |
( |
next.learner |
(Learner) |
package |
(character) |
par.set |
(ParamSet) |
par.vals |
(list) |
learner.subclass |
(character) |
model.subclass |
(character) |
Only exported for internal use.
Description
Only exported for internal use.
Usage
makeChainModel(next.model, cl)
Arguments
next.model |
(WrappedModel) |
cl |
(character) |
Create a classification task.
Description
Create a classification task.
Usage
makeClassifTask(
id = deparse(substitute(data)),
data,
target,
weights = NULL,
blocking = NULL,
coordinates = NULL,
positive = NA_character_,
fixup.data = "warn",
check.data = TRUE
)
Arguments
id |
( |
data |
(data.frame) |
target |
( |
weights |
(numeric) |
blocking |
(factor) |
coordinates |
(data.frame) |
positive |
( |
fixup.data |
( |
check.data |
( |
See Also
Task CostSensTask ClusterTask MultilabelTask RegrTask SurvTask
Exported for internal use.
Description
Exported for internal use.
Usage
makeClassifTaskDesc(id, data, target, weights, blocking, positive, coordinates)
makeClusterTaskDesc(id, data, weights, blocking, coordinates)
makeCostSensTaskDesc(id, data, target, blocking, costs, coordinates)
makeMultilabelTaskDesc(id, data, target, weights, blocking, coordinates)
makeRegrTaskDesc(id, data, target, weights, blocking, coordinates)
makeSurvTaskDesc(id, data, target, weights, blocking, coordinates)
Arguments
id |
( |
data |
(data.frame) |
target |
(character) |
weights |
(numeric) |
blocking |
(factor) |
coordinates |
(data.frame) |
Classification via regression wrapper.
Description
Builds regression models that predict for the positive class whether a particular example belongs to it (1) or not (-1).
Probabilities are generated by transforming the predictions with a softmax.
Inspired by WEKA's ClassificationViaRegression (http://weka.sourceforge.net/doc.dev/weka/classifiers/meta/ClassificationViaRegression.html).
Usage
makeClassificationViaRegressionWrapper(learner, predict.type = "response")
Arguments
learner |
(Learner | |
predict.type |
( |
Value
(Learner).
See Also
Other wrapper: makeBaggingWrapper(), makeConstantClassWrapper(), makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeDownsampleWrapper(), makeDummyFeaturesWrapper(), makeExtractFDAFeatsWrapper(), makeFeatSelWrapper(), makeFilterWrapper(), makeImputeWrapper(), makeMulticlassWrapper(), makeMultilabelBinaryRelevanceWrapper(), makeMultilabelClassifierChainsWrapper(), makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(), makeMultilabelStackingWrapper(), makeOverBaggingWrapper(), makePreprocWrapper(), makePreprocWrapperCaret(), makeRemoveConstantFeaturesWrapper(), makeSMOTEWrapper(), makeTuneWrapper(), makeUndersampleWrapper(), makeWeightedClassesWrapper()
Examples
lrn = makeLearner("regr.rpart")
lrn = makeClassificationViaRegressionWrapper(lrn)
mod = train(lrn, sonar.task, subset = 1:140)
predictions = predict(mod, newdata = getTaskData(sonar.task)[141:208, 1:60])
Create a cluster task.
Description
Create a cluster task.
Usage
makeClusterTask(
id = deparse(substitute(data)),
data,
weights = NULL,
blocking = NULL,
coordinates = NULL,
fixup.data = "warn",
check.data = TRUE
)
Arguments
id |
( |
data |
(data.frame) |
weights |
(numeric) |
blocking |
(factor) |
coordinates |
(data.frame) |
fixup.data |
( |
check.data |
( |
See Also
Task ClassifTask CostSensTask MultilabelTask RegrTask SurvTask
Wraps a classification learner to support problems where the class label is (almost) constant.
Description
If the training data contains only a single class (or almost only a single class), this wrapper creates a model that always predicts the constant class in the training data. In all other cases, the underlying learner is trained and the resulting model used for predictions.
Probabilities can be predicted and will be 1 or 0 depending on whether the label matches the majority class or not.
Usage
makeConstantClassWrapper(learner, frac = 0)
Arguments
learner |
(Learner | |
frac |
|
Value
(Learner).
See Also
Other wrapper: makeBaggingWrapper(), makeClassificationViaRegressionWrapper(), makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeDownsampleWrapper(), makeDummyFeaturesWrapper(), makeExtractFDAFeatsWrapper(), makeFeatSelWrapper(), makeFilterWrapper(), makeImputeWrapper(), makeMulticlassWrapper(), makeMultilabelBinaryRelevanceWrapper(), makeMultilabelClassifierChainsWrapper(), makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(), makeMultilabelStackingWrapper(), makeOverBaggingWrapper(), makePreprocWrapper(), makePreprocWrapperCaret(), makeRemoveConstantFeaturesWrapper(), makeSMOTEWrapper(), makeTuneWrapper(), makeUndersampleWrapper(), makeWeightedClassesWrapper()
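A minimal sketch (illustrative):
# if at most 5% of the training labels deviate from the majority class,
# a constant model predicting that class is used instead of rpart
lrn = makeConstantClassWrapper("classif.rpart", frac = 0.05)
mod = train(lrn, iris.task)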
Creates a measure for non-standard misclassification costs.
Description
Creates a cost measure for non-standard classification error costs.
Usage
makeCostMeasure(
id = "costs",
minimize = TRUE,
costs,
combine = mean,
best = NULL,
worst = NULL,
name = id,
note = ""
)
Arguments
id |
( |
minimize |
( |
costs |
(matrix) |
combine |
( |
best |
( |
worst |
( |
name |
(character) |
note |
(character) |
Value
(Measure).
See Also
Other performance: ConfusionMatrix, calculateConfusionMatrix(), calculateROCMeasures(), estimateRelativeOverfitting(), makeCustomResampledMeasure(), makeMeasure(), measures, performance(), setAggregation(), setMeasurePars()
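A minimal sketch on iris (illustrative; row and column names of the costs matrix must match the class levels):
costs = matrix(1, 3, 3)
diag(costs) = 0
rownames(costs) = colnames(costs) = getTaskClassLevels(iris.task)
m = makeCostMeasure(id = "my.costs", costs = costs, best = 0, worst = 1)
mod = train("classif.rpart", iris.task)
performance(predict(mod, iris.task), measures = m, task = iris.task)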
Wraps a classification learner for use in cost-sensitive learning.
Description
Creates a wrapper, which can be used like any other learner object. The classification model can easily be accessed via getLearnerModel.
This is a very naive learner, where the costs are transformed into classification labels - the label for each case is the name of the class with minimal costs. (If ties occur, the label which is better on average w.r.t. costs over all training data is preferred.) Then the classifier is fitted to that data and subsequently used for prediction.
Usage
makeCostSensClassifWrapper(learner)
Arguments
learner |
(Learner | |
Value
(Learner).
See Also
Other costsens: makeCostSensRegrWrapper(), makeCostSensTask(), makeCostSensWeightedPairsWrapper()
Other wrapper: makeBaggingWrapper(), makeClassificationViaRegressionWrapper(), makeConstantClassWrapper(), makeCostSensRegrWrapper(), makeDownsampleWrapper(), makeDummyFeaturesWrapper(), makeExtractFDAFeatsWrapper(), makeFeatSelWrapper(), makeFilterWrapper(), makeImputeWrapper(), makeMulticlassWrapper(), makeMultilabelBinaryRelevanceWrapper(), makeMultilabelClassifierChainsWrapper(), makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(), makeMultilabelStackingWrapper(), makeOverBaggingWrapper(), makePreprocWrapper(), makePreprocWrapperCaret(), makeRemoveConstantFeaturesWrapper(), makeSMOTEWrapper(), makeTuneWrapper(), makeUndersampleWrapper(), makeWeightedClassesWrapper()
Wraps a regression learner for use in cost-sensitive learning.
Description
Creates a wrapper, which can be used like any other learner object. Models can easily be accessed via getLearnerModel.
For each class in the task, an individual regression model is fitted for the costs of that class. During prediction, the class with the lowest predicted costs is selected.
Usage
makeCostSensRegrWrapper(learner)
Arguments
learner |
(Learner | |
Value
(Learner).
See Also
Other costsens: makeCostSensClassifWrapper(), makeCostSensTask(), makeCostSensWeightedPairsWrapper()
Other wrapper: makeBaggingWrapper(), makeClassificationViaRegressionWrapper(), makeConstantClassWrapper(), makeCostSensClassifWrapper(), makeDownsampleWrapper(), makeDummyFeaturesWrapper(), makeExtractFDAFeatsWrapper(), makeFeatSelWrapper(), makeFilterWrapper(), makeImputeWrapper(), makeMulticlassWrapper(), makeMultilabelBinaryRelevanceWrapper(), makeMultilabelClassifierChainsWrapper(), makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(), makeMultilabelStackingWrapper(), makeOverBaggingWrapper(), makePreprocWrapper(), makePreprocWrapperCaret(), makeRemoveConstantFeaturesWrapper(), makeSMOTEWrapper(), makeTuneWrapper(), makeUndersampleWrapper(), makeWeightedClassesWrapper()
Create a cost-sensitive classification task.
Description
Create a cost-sensitive classification task.
Usage
makeCostSensTask(
id = deparse(substitute(data)),
data,
costs,
blocking = NULL,
coordinates = NULL,
fixup.data = "warn",
check.data = TRUE
)
Arguments
id |
( |
data |
(data.frame) |
costs |
(data.frame) |
blocking |
(factor) |
coordinates |
(data.frame) |
fixup.data |
( |
check.data |
( |
See Also
Task ClassifTask ClusterTask MultilabelTask RegrTask SurvTask
Other costsens: makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeCostSensWeightedPairsWrapper()
Wraps a classifier for cost-sensitive learning to produce a weighted pairs model.
Description
Creates a wrapper, which can be used like any other learner object. Models can easily be accessed via getLearnerModel.
For each pair of labels, we fit a binary classifier. For each observation we define the label to be the element of the pair with minimal costs. During fitting, we also weight the observation with the absolute difference in costs. Prediction is performed by simple voting.
This approach is sometimes called cost-sensitive one-vs-one (CS-OVO), because it is obviously very similar to the one-vs-one approach where one reduces a normal multi-class problem to multiple binary ones and aggregates by voting.
Usage
makeCostSensWeightedPairsWrapper(learner)
Arguments
learner |
(Learner | |
Value
(Learner).
References
Lin, HT.: Reduction from Cost-sensitive Multiclass Classification to One-versus-one Binary Classification. In: Proceedings of the Sixth Asian Conference on Machine Learning. JMLR Workshop and Conference Proceedings, vol 39, pp. 371-386. JMLR W&CP (2014). https://proceedings.mlr.press/v39/lin14.pdf
See Also
Other costsens: makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeCostSensTask()
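A minimal sketch (illustrative; random example-specific costs on the iris features):
feats = iris[, -5]
costs = as.data.frame(matrix(runif(nrow(feats) * 3), ncol = 3))
colnames(costs) = levels(iris$Species)
task = makeCostSensTask(data = feats, costs = costs)
lrn = makeCostSensWeightedPairsWrapper(makeLearner("classif.rpart"))
mod = train(lrn, task)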
Construct your own resampled performance measure.
Description
Construct your own performance measure, used after resampling. Note that individual training / test set performance values will be set to NA; you only calculate an aggregated value. If you can define a function that makes sense for every single training / test set, implement your own Measure.
Usage
makeCustomResampledMeasure(
measure.id,
aggregation.id,
minimize = TRUE,
properties = character(0L),
fun,
extra.args = list(),
best = NULL,
worst = NULL,
measure.name = measure.id,
aggregation.name = aggregation.id,
note = ""
)
Arguments
measure.id |
( |
aggregation.id |
( |
minimize |
( |
properties |
(character) |
fun |
( |
extra.args |
(list) |
best |
( |
worst |
( |
measure.name |
( |
aggregation.name |
( |
note |
(character) |
Value
(Measure).
See Also
Other performance: ConfusionMatrix, calculateConfusionMatrix(), calculateROCMeasures(), estimateRelativeOverfitting(), makeCostMeasure(), makeMeasure(), measures, performance(), setAggregation(), setMeasurePars()
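A minimal sketch (illustrative; fun receives the pooled resample prediction in pred and returns one aggregated value):
f = function(task, group, pred, extra.args) {
  mean(pred$data$response == pred$data$truth)
}
acc.agg = makeCustomResampledMeasure(measure.id = "pooled.acc",
  aggregation.id = "pool", minimize = FALSE, fun = f)
r = resample("classif.rpart", iris.task, makeResampleDesc("CV", iters = 3),
  measures = acc.agg)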
Fuse learner with simple downsampling (subsampling).
Description
Creates a learner object, which can be used like any other learner object. It will only be trained on a subset of the original data to save computational time.
Usage
makeDownsampleWrapper(learner, dw.perc = 1, dw.stratify = FALSE)
Arguments
learner |
(Learner | |
dw.perc |
( |
dw.stratify |
( |
Value
(Learner).
See Also
Other downsample: downsample()
Other wrapper: makeBaggingWrapper(), makeClassificationViaRegressionWrapper(), makeConstantClassWrapper(), makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeDummyFeaturesWrapper(), makeExtractFDAFeatsWrapper(), makeFeatSelWrapper(), makeFilterWrapper(), makeImputeWrapper(), makeMulticlassWrapper(), makeMultilabelBinaryRelevanceWrapper(), makeMultilabelClassifierChainsWrapper(), makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(), makeMultilabelStackingWrapper(), makeOverBaggingWrapper(), makePreprocWrapper(), makePreprocWrapperCaret(), makeRemoveConstantFeaturesWrapper(), makeSMOTEWrapper(), makeTuneWrapper(), makeUndersampleWrapper(), makeWeightedClassesWrapper()
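A minimal sketch (illustrative):
lrn = makeDownsampleWrapper("classif.rpart", dw.perc = 0.5, dw.stratify = TRUE)
mod = train(lrn, iris.task) # the model is fitted on half of the training data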
Fuse learner with dummy feature creator.
Description
Fuses a base learner with the dummy feature creator (see createDummyFeatures). Returns a learner which can be used like any other learner.
Usage
makeDummyFeaturesWrapper(learner, method = "1-of-n", cols = NULL)
Arguments
learner |
(Learner | |
method |
(character(1)) Default is “1-of-n”. |
cols |
(character) |
Value
(Learner).
See Also
Other wrapper: makeBaggingWrapper(), makeClassificationViaRegressionWrapper(), makeConstantClassWrapper(), makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeDownsampleWrapper(), makeExtractFDAFeatsWrapper(), makeFeatSelWrapper(), makeFilterWrapper(), makeImputeWrapper(), makeMulticlassWrapper(), makeMultilabelBinaryRelevanceWrapper(), makeMultilabelClassifierChainsWrapper(), makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(), makeMultilabelStackingWrapper(), makeOverBaggingWrapper(), makePreprocWrapper(), makePreprocWrapperCaret(), makeRemoveConstantFeaturesWrapper(), makeSMOTEWrapper(), makeTuneWrapper(), makeUndersampleWrapper(), makeWeightedClassesWrapper()
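A minimal sketch on a task with factor features (illustrative; bc.task ships with mlr):
lrn = makeDummyFeaturesWrapper("classif.rpart", method = "reference")
mod = train(lrn, bc.task) # factors are dummy-encoded before each model fit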
Constructor for FDA feature extraction methods.
Description
This can be used to implement custom FDA feature extraction. Takes a learn and a reextract function, along with some optional parameters to those, as arguments.
Usage
makeExtractFDAFeatMethod(learn, reextract, args = list(), par.set = NULL)
Arguments
learn |
(
|
reextract |
( |
args |
(list) |
par.set |
(ParamSet) |
See Also
Other fda: extractFDAFeatures(), makeExtractFDAFeatsWrapper()
Fuse learner with an extractFDAFeatures method.
Description
Fuses a base learner with an extractFDAFeatures method. Creates a learner object, which can be used like any other learner object. Internally uses extractFDAFeatures before training the learner and reextractFDAFeatures before predicting.
Usage
makeExtractFDAFeatsWrapper(learner, feat.methods = list())
Arguments
learner |
(Learner | |
feat.methods |
(named list) |
Value
(Learner).
See Also
Other fda: extractFDAFeatures(), makeExtractFDAFeatMethod()
Other wrapper: makeBaggingWrapper(), makeClassificationViaRegressionWrapper(), makeConstantClassWrapper(), makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeDownsampleWrapper(), makeDummyFeaturesWrapper(), makeFeatSelWrapper(), makeFilterWrapper(), makeImputeWrapper(), makeMulticlassWrapper(), makeMultilabelBinaryRelevanceWrapper(), makeMultilabelClassifierChainsWrapper(), makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(), makeMultilabelStackingWrapper(), makeOverBaggingWrapper(), makePreprocWrapper(), makePreprocWrapperCaret(), makeRemoveConstantFeaturesWrapper(), makeSMOTEWrapper(), makeTuneWrapper(), makeUndersampleWrapper(), makeWeightedClassesWrapper()
Fuse learner with feature selection.
Description
Fuses a base learner with a search strategy to select variables. Creates a learner object, which can be used like any other learner object, but which internally uses selectFeatures. If the train function is called on it, the search strategy and resampling are invoked to select an optimal set of variables. Finally, a model is fitted on the complete training data with these variables and returned. See selectFeatures for more details.
After training, the optimal features (and other related information) can be retrieved with getFeatSelResult.
Usage
makeFeatSelWrapper(
learner,
resampling,
measures,
bit.names,
bits.to.features,
control,
show.info = getMlrOption("show.info")
)
Arguments
learner |
(Learner | |
resampling |
(ResampleInstance | ResampleDesc) |
measures |
(list of Measure | Measure) |
bit.names |
character |
bits.to.features |
( |
control |
(see FeatSelControl) Control object for search method. Also selects the optimization algorithm for feature selection. |
show.info |
( |
Value
(Learner).
See Also
Other featsel: FeatSelControl, analyzeFeatSelResult(), getFeatSelResult(), selectFeatures()
Other wrapper: makeBaggingWrapper(), makeClassificationViaRegressionWrapper(), makeConstantClassWrapper(), makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeDownsampleWrapper(), makeDummyFeaturesWrapper(), makeExtractFDAFeatsWrapper(), makeFilterWrapper(), makeImputeWrapper(), makeMulticlassWrapper(), makeMultilabelBinaryRelevanceWrapper(), makeMultilabelClassifierChainsWrapper(), makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(), makeMultilabelStackingWrapper(), makeOverBaggingWrapper(), makePreprocWrapper(), makePreprocWrapperCaret(), makeRemoveConstantFeaturesWrapper(), makeSMOTEWrapper(), makeTuneWrapper(), makeUndersampleWrapper(), makeWeightedClassesWrapper()
Examples
# nested resampling with feature selection (with a nonsense algorithm for selection)
outer = makeResampleDesc("CV", iters = 2L)
inner = makeResampleDesc("Holdout")
ctrl = makeFeatSelControlRandom(maxit = 1)
lrn = makeFeatSelWrapper("classif.ksvm", resampling = inner, control = ctrl)
# we also extract the selected features for all iterations here
r = resample(lrn, iris.task, outer, extract = getFeatSelResult)
Create a feature filter.
Description
Creates and registers custom feature filters. Implemented filters can be listed with listFilterMethods. Additional documentation for the fun parameter specific to each filter can be found in the description.
Usage
makeFilter(name, desc, pkg, supported.tasks, supported.features, fun)
Arguments
name |
( |
desc |
( |
pkg |
( |
supported.tasks |
(character) |
supported.features |
(character) |
fun |
( |
Value
Object of class “Filter”.
References
Kira, Kenji and Rendell, Larry (1992). The Feature Selection Problem: Traditional Methods and a New Algorithm. AAAI-92 Proceedings.
Kononenko, Igor et al. Overcoming the myopia of inductive learning algorithms with RELIEFF (1997), Applied Intelligence, 7(1), p39-55.
See Also
Other filter: filterFeatures(), generateFilterValuesData(), getFilteredFeatures(), listFilterEnsembleMethods(), listFilterMethods(), makeFilterEnsemble(), makeFilterWrapper(), plotFilterValues()
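A sketch of a custom filter (illustrative; it assumes fun returns a named numeric score per feature):
makeFilter(
  name = "random.score", desc = "Assigns random scores to features",
  pkg = character(0), supported.tasks = c("classif", "regr"),
  supported.features = c("numerics", "factors"),
  fun = function(task, nselect, ...) {
    fns = getTaskFeatureNames(task)
    setNames(runif(length(fns)), fns)
  }
)
fv = generateFilterValuesData(iris.task, method = "random.score")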
Create an ensemble feature filter.
Description
Creates and registers custom ensemble feature filters. Implemented ensemble filters can be listed with listFilterEnsembleMethods. Additional documentation for the fun parameter specific to each filter can be found in the description.
Usage
makeFilterEnsemble(name, base.methods, desc, fun)
Arguments
name |
( |
base.methods |
the base filter methods which the ensemble method will use. |
desc |
( |
fun |
( |
Value
Object of class “FilterEnsemble”.
See Also
Other filter: filterFeatures(), generateFilterValuesData(), getFilteredFeatures(), listFilterEnsembleMethods(), listFilterMethods(), makeFilter(), makeFilterWrapper(), plotFilterValues()
Fuse learner with a feature filter method.
Description
Fuses a base learner with a filter method. Creates a learner object, which can be used like any other learner object. Internally uses filterFeatures before every model fit.
Usage
makeFilterWrapper(
learner,
fw.method = "FSelectorRcpp_information.gain",
fw.base.methods = NULL,
fw.perc = NULL,
fw.abs = NULL,
fw.threshold = NULL,
fw.fun = NULL,
fw.fun.args = NULL,
fw.mandatory.feat = NULL,
cache = FALSE,
...
)
Arguments
learner |
(Learner | |
fw.method |
( |
fw.base.methods |
( |
fw.perc |
( |
fw.abs |
( |
fw.threshold |
( |
fw.fun |
( |
fw.fun.args |
(any) |
fw.mandatory.feat |
(character) |
cache |
( |
... |
(any) |
Details
If ensemble = TRUE, ensemble feature selection using all methods specified in fw.method is performed. At least two methods need to be selected.
After training, the selected features can be retrieved with getFilteredFeatures.
Note that observation weights do not influence the filtering and are simply passed down to the next learner.
Value
(Learner).
Caching
If cache = TRUE, the default mlr cache directory is used to cache filter values. The directory is operating system dependent and can be checked with getCacheDir(). Alternatively a custom directory can be passed to store the cache. The cache can be cleared with deleteCacheDir(). Caching is disabled by default. Care should be taken when operating on large clusters due to possible write conflicts to disk if multiple workers try to write the same cache at the same time.
See Also
Other filter: filterFeatures(), generateFilterValuesData(), getFilteredFeatures(), listFilterEnsembleMethods(), listFilterMethods(), makeFilter(), makeFilterEnsemble(), plotFilterValues()
Other wrapper: makeBaggingWrapper(), makeClassificationViaRegressionWrapper(), makeConstantClassWrapper(), makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeDownsampleWrapper(), makeDummyFeaturesWrapper(), makeExtractFDAFeatsWrapper(), makeFeatSelWrapper(), makeImputeWrapper(), makeMulticlassWrapper(), makeMultilabelBinaryRelevanceWrapper(), makeMultilabelClassifierChainsWrapper(), makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(), makeMultilabelStackingWrapper(), makeOverBaggingWrapper(), makePreprocWrapper(), makePreprocWrapperCaret(), makeRemoveConstantFeaturesWrapper(), makeSMOTEWrapper(), makeTuneWrapper(), makeUndersampleWrapper(), makeWeightedClassesWrapper()
Examples
task = makeClassifTask(data = iris, target = "Species")
lrn = makeLearner("classif.lda")
inner = makeResampleDesc("Holdout")
outer = makeResampleDesc("CV", iters = 2)
lrn = makeFilterWrapper(lrn, fw.perc = 0.5)
mod = train(lrn, task)
print(getFilteredFeatures(mod))
# now nested resampling, where we extract the features that the filter method selected
r = resample(lrn, task, outer, extract = function(model) {
getFilteredFeatures(model)
})
print(r$extract)
# usage of an ensemble filter
lrn = makeLearner("classif.lda")
lrn = makeFilterWrapper(lrn, fw.method = "E-Borda",
fw.base.methods = c("FSelectorRcpp_gain.ratio", "FSelectorRcpp_information.gain"),
fw.perc = 0.5)
r = resample(lrn, task, outer, extract = function(model) {
getFilteredFeatures(model)
})
print(r$extract)
# usage of a custom thresholding function
biggest_gap = function(values, diff) {
gap_size = 0
gap_location = 0
for (i in (diff + 1):length(values)) {
gap = values[[i - diff]] - values[[i]]
if (gap > gap_size) {
gap_size = gap
gap_location = i - 1
}
}
return(gap_location)
}
lrn = makeLearner("classif.lda")
lrn = makeFilterWrapper(lrn, fw.method = "FSelectorRcpp_information.gain",
fw.fun = biggest_gap, fw.fun.args = list("diff" = 1))
r = resample(lrn, task, outer, extract = function(model) {
getFilteredFeatures(model)
})
print(r$extract)
Generate a fixed holdout instance for resampling.
Description
Generate a fixed holdout instance for resampling.
Usage
makeFixedHoldoutInstance(train.inds, test.inds, size)
Arguments
train.inds |
(integer) |
test.inds |
(integer) |
size |
( |
Value
(ResampleInstance).
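A minimal sketch (illustrative):
rin = makeFixedHoldoutInstance(train.inds = 1:100, test.inds = 101:150, size = 150)
r = resample("classif.lda", iris.task, rin)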
Create a data.frame containing functional features from a normal data.frame.
Description
To work with functional features, those features need to be stored as a matrix column in the data.frame, so mlr can automatically recognize them as functional features. This function allows for an easy conversion from a data.frame with numeric columns to the required format. If the data already contains matrix columns, they are left as-is if not specified otherwise in fd.features. See Examples for the structure of the generated output.
Usage
makeFunctionalData(data, fd.features = NULL, exclude.cols = NULL)
Arguments
data |
(data.frame) |
fd.features |
(list) |
exclude.cols |
(character | integer) |
Value
(data.frame).
Examples
# data.frame where columns 1:6 and 8:10 belong to a functional feature
d1 = data.frame(matrix(rnorm(100), nrow = 10), "target" = seq_len(10))
# Transform to functional data
d2 = makeFunctionalData(d1, fd.features = list("fd1" = 1:6, "fd2" = 8:10))
# Create a regression task
makeRegrTask(data = d2, target = "target")
Create a custom imputation method.
Description
This is a constructor to create your own imputation methods.
Usage
makeImputeMethod(learn, impute, args = list())
Arguments
learn |
( |
impute |
( |
args |
(list) |
See Also
Other impute: imputations, impute(), makeImputeWrapper(), reimpute()
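A sketch of a hypothetical zero-imputation method (illustrative; it assumes learn returns a list of learned arguments and impute returns the imputed column):
imputeZero = makeImputeMethod(
  learn = function(data, target, col) list(),
  impute = function(data, target, col, ...) {
    x = data[[col]]
    replace(x, is.na(x), 0)
  }
)
df = data.frame(a = c(1, NA, 3))
impute(df, cols = list(a = imputeZero))$data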
Fuse learner with an imputation method.
Description
Fuses a base learner with an imputation method. Creates a learner object, which can be used like any other learner object. Internally uses impute before training the learner and reimpute before predicting.
Usage
makeImputeWrapper(
learner,
classes = list(),
cols = list(),
dummy.classes = character(0L),
dummy.cols = character(0L),
dummy.type = "factor",
force.dummies = FALSE,
impute.new.levels = TRUE,
recode.factor.levels = TRUE
)
Arguments
learner |
(Learner | |
classes |
(named list) |
cols |
(named list) |
dummy.classes |
(character) |
dummy.cols |
(character) |
dummy.type |
( |
force.dummies |
( |
impute.new.levels |
( |
recode.factor.levels |
( |
Value
(Learner).
See Also
Other impute: imputations, impute(), makeImputeMethod(), reimpute()
Other wrapper: makeBaggingWrapper(), makeClassificationViaRegressionWrapper(), makeConstantClassWrapper(), makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeDownsampleWrapper(), makeDummyFeaturesWrapper(), makeExtractFDAFeatsWrapper(), makeFeatSelWrapper(), makeFilterWrapper(), makeMulticlassWrapper(), makeMultilabelBinaryRelevanceWrapper(), makeMultilabelClassifierChainsWrapper(), makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(), makeMultilabelStackingWrapper(), makeOverBaggingWrapper(), makePreprocWrapper(), makePreprocWrapperCaret(), makeRemoveConstantFeaturesWrapper(), makeSMOTEWrapper(), makeTuneWrapper(), makeUndersampleWrapper(), makeWeightedClassesWrapper()
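A minimal sketch (illustrative):
lrn = makeImputeWrapper("classif.rpart",
  classes = list(numeric = imputeMedian(), factor = imputeMode()))
# missing numerics get the median, missing factors the mode; the same
# imputation is re-applied to new data at predict time
mod = train(lrn, iris.task)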
Create learner object.
Description
For a classification learner the predict.type can be set to “prob” to predict probabilities and the maximum value selects the label. The threshold used to assign the label can later be changed using the setThreshold function.
To see all possible properties of a learner, go to: LearnerProperties.
Usage
makeLearner(
cl,
id = cl,
predict.type = "response",
predict.threshold = NULL,
fix.factors.prediction = FALSE,
...,
par.vals = list(),
config = list()
)
Arguments
cl |
( |
id |
( |
predict.type |
( |
predict.threshold |
(numeric) |
fix.factors.prediction |
( |
... |
(any) |
par.vals |
(list) |
config |
(named list) |
Value
(Learner).
par.vals vs. ...
The former aims at specifying default hyperparameter settings from mlr which differ from the actual defaults in the underlying learner. For example, respect.unordered.factors is set to order in mlr while the default in ranger::ranger depends on the argument splitrule. getHyperPars(<learner>) can be used to query hyperparameter defaults that differ from the underlying learner. This function also shows all hyperparameters set by the user during learner creation (if these differ from the learner defaults).
regr.randomForest
For this learner we added additional uncertainty estimation functionality (predict.type = "se") for the randomForest, which is not provided by the underlying package.
Currently implemented methods are:
- If se.method = "jackknife" the standard error of a prediction is estimated by computing the jackknife-after-bootstrap, the mean-squared difference between the prediction made by only using trees which did not contain said observation and the ensemble prediction.
- If se.method = "bootstrap" the standard error of a prediction is estimated by bootstrapping the random forest, where the number of bootstrap replicates and the number of trees in the ensemble are controlled by se.boot and se.ntree respectively, and then taking the standard deviation of the bootstrap predictions. The "brute force" bootstrap is executed when ntree = se.ntree, the latter of which controls the number of trees in the individual random forests which are bootstrapped. The "noisy bootstrap" is executed when se.ntree < ntree, which is less computationally expensive. A Monte-Carlo bias correction may make the latter option preferable in many cases. Defaults are se.boot = 50 and se.ntree = 100.
- If se.method = "sd", the default, the standard deviation of the predictions across trees is returned as the variance estimate. This can be computed quickly but is also a very naive estimator.
For both “jackknife” and “bootstrap”, a Monte-Carlo bias correction is applied and, in the case that this results in a negative variance estimate, the values are truncated at 0.
Note that when using the “jackknife” procedure for se estimation, using a small number of trees can lead to training data observations that are never out-of-bag. The current implementation ignores these observations, but in the original definition, the resulting se estimation would be undefined.
Please note that all of the mentioned se.method variants do not affect the computation of the posterior mean “response” value. This is always the same as from the underlying randomForest.
regr.featureless
A very basic baseline method which is useful for model comparisons (if you don't beat this, you very likely have a problem). Does not consider any features of the task and only uses the target feature of the training data to make predictions. Using observation weights is currently not supported.
Methods “mean” and “median” always predict a constant value for each new observation which corresponds to the observed mean or median of the target feature in training data, respectively.
The default method is “mean” which corresponds to the ZeroR algorithm from WEKA.
classif.featureless
Method “majority” always predicts the majority class for each new observation. In the case of ties, one randomly sampled, constant class is predicted for all observations in the test set. This method is used as the default. It is very similar to the ZeroR classifier from WEKA. The only difference is that ZeroR always predicts the first class of the tied class values instead of sampling them randomly.
Method “sample-prior” always samples a random class for each individual test observation according to the prior probabilities observed in the training data.
If you opt to predict probabilities, the class probabilities always correspond to the prior probabilities observed in the training data.
See Also
Other learner: LearnerProperties, getClassWeightParam(), getHyperPars(), getLearnerId(), getLearnerNote(), getLearnerPackages(), getLearnerParVals(), getLearnerParamSet(), getLearnerPredictType(), getLearnerShortName(), getLearnerType(), getParamSet(), helpLearner(), helpLearnerParam(), makeLearners(), removeHyperPars(), setHyperPars(), setId(), setLearnerId(), setPredictThreshold(), setPredictType()
Examples
makeLearner("classif.rpart")
makeLearner("classif.lda", predict.type = "prob")
lrn = makeLearner("classif.lda", method = "t", nu = 10)
getHyperPars(lrn)
Create multiple learners at once.
Description
Small helper function that can save some typing when creating multiple learner objects. Calls makeLearner multiple times internally.
Usage
makeLearners(cls, ids = NULL, type = NULL, ...)
Arguments
cls |
(character) |
ids |
(character) |
type |
( |
... |
(any) |
Value
(named list of Learner). Named by ids
.
See Also
Other learner:
LearnerProperties
,
getClassWeightParam()
,
getHyperPars()
,
getLearnerId()
,
getLearnerNote()
,
getLearnerPackages()
,
getLearnerParVals()
,
getLearnerParamSet()
,
getLearnerPredictType()
,
getLearnerShortName()
,
getLearnerType()
,
getParamSet()
,
helpLearner()
,
helpLearnerParam()
,
makeLearner()
,
removeHyperPars()
,
setHyperPars()
,
setId()
,
setLearnerId()
,
setPredictThreshold()
,
setPredictType()
Examples
makeLearners(c("rpart", "lda"), type = "classif", predict.type = "prob")
Construct performance measure.
Description
A measure object encapsulates a function to evaluate the performance of a prediction. Information about already implemented measures can be obtained here: measures.
A learner is trained on a training set d1, results in a model m and predicts another set d2 (which may be a different one or the training set) resulting in the prediction. The performance measure can now be defined using all of the information of the original task, the fitted model and the prediction.
Usage
makeMeasure(
id,
minimize,
properties = character(0L),
fun,
extra.args = list(),
aggr = test.mean,
best = NULL,
worst = NULL,
name = id,
note = ""
)
Arguments
id |
( |
minimize |
( |
properties |
(character) Default is |
fun |
( |
extra.args |
(list) |
aggr |
(Aggregation) |
best |
( |
worst |
( |
name |
(character) |
note |
(character) |
Value
Measure.
See Also
Other performance:
ConfusionMatrix
,
calculateConfusionMatrix()
,
calculateROCMeasures()
,
estimateRelativeOverfitting()
,
makeCostMeasure()
,
makeCustomResampledMeasure()
,
measures
,
performance()
,
setAggregation()
,
setMeasurePars()
Examples
f = function(task, model, pred, extra.args) {
  # sum of squared errors between predicted response and ground truth
  sum((pred$data$response - pred$data$truth)^2)
}
my.sse = makeMeasure(id = "my.sse", minimize = TRUE,
  properties = c("regr", "response"), fun = f)
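# usage sketch (assumes rpart is installed): the custom measure can be
# passed to performance() like any built-in measure
mod = train("regr.rpart", bh.task)
performance(predict(mod, bh.task), measures = my.sse)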
Create model multiplexer for model selection to tune over multiple possible models.
Description
Combines multiple base learners by dispatching on the hyperparameter “selected.learner” to a specific model class. This allows you to tune not only the model class (SVM, random forest, etc.) but also its hyperparameters in one go. Combine this with tuneParams and makeTuneControlIrace for a very powerful approach, see example below.
The parameter set of the multiplexer is the union of the parameter sets of all (unique) base learners. In order to avoid name clashes, all parameter names are prefixed with the base learner id, i.e. learnerId.parameterName.
The predict.type of the Multiplexer is inherited from the predict.type of the base learners.
The getter getLearnerProperties returns the properties of the selected base learner.
Usage
makeModelMultiplexer(base.learners)
Arguments
base.learners |
(list of Learner) |
Value
(ModelMultiplexer). A Learner specialized as ModelMultiplexer
.
Note
Note that logging output during tuning is somewhat shortened to make it more readable. I.e., the artificial prefix before parameter names is suppressed.
See Also
Other multiplexer:
makeModelMultiplexerParamSet()
Other tune:
TuneControl
,
getNestedTuneResultsOptPathDf()
,
getNestedTuneResultsX()
,
getResamplingIndices()
,
getTuneResult()
,
makeModelMultiplexerParamSet()
,
makeTuneControlCMAES()
,
makeTuneControlDesign()
,
makeTuneControlGenSA()
,
makeTuneControlGrid()
,
makeTuneControlIrace()
,
makeTuneControlMBO()
,
makeTuneControlRandom()
,
makeTuneWrapper()
,
tuneParams()
,
tuneThreshold()
Examples
set.seed(123)
library(BBmisc)
bls = list(
makeLearner("classif.ksvm"),
makeLearner("classif.randomForest")
)
lrn = makeModelMultiplexer(bls)
# simple way to construct the param set for tuning
# parameter names are prefixed automatically and the 'requires'
# element is set, too, to make all parameters subordinate to 'selected.learner'
ps = makeModelMultiplexerParamSet(lrn,
makeNumericParam("sigma", lower = -10, upper = 10, trafo = function(x) 2^x),
makeIntegerParam("ntree", lower = 1L, upper = 500L)
)
print(ps)
rdesc = makeResampleDesc("CV", iters = 2L)
# to save some time we use random search. but you probably want something like this:
# ctrl = makeTuneControlIrace(maxExperiments = 500L)
ctrl = makeTuneControlRandom(maxit = 10L)
res = tuneParams(lrn, iris.task, rdesc, par.set = ps, control = ctrl)
print(res)
df = as.data.frame(res$opt.path)
print(head(df[, -ncol(df)]))
# more unique and reliable way to construct the param set
ps = makeModelMultiplexerParamSet(lrn,
classif.ksvm = makeParamSet(
makeNumericParam("sigma", lower = -10, upper = 10, trafo = function(x) 2^x)
),
classif.randomForest = makeParamSet(
makeIntegerParam("ntree", lower = 1L, upper = 500L)
)
)
# this is how you would construct the param set manually, works too
ps = makeParamSet(
makeDiscreteParam("selected.learner", values = extractSubList(bls, "id")),
makeNumericParam("classif.ksvm.sigma", lower = -10, upper = 10, trafo = function(x) 2^x,
requires = quote(selected.learner == "classif.ksvm")),
makeIntegerParam("classif.randomForest.ntree", lower = 1L, upper = 500L,
requires = quote(selected.learner == "classif.randomForest"))
)
# all three ps-objects are exactly the same internally.
Creates a parameter set for model multiplexer tuning.
Description
Handy way to create the param set with less typing.
The following is done automatically:
- The selected.learner param is created.
- Parameter names are prefixed.
- The requires field of each param is set. This makes all parameters subordinate to selected.learner.
Usage
makeModelMultiplexerParamSet(multiplexer, ..., .check = TRUE)
Arguments
multiplexer |
(ModelMultiplexer) |
... |
(ParamHelpers::ParamSet | ParamHelpers::Param) |
.check |
(logical) |
Value
ParamSet.
See Also
Other multiplexer:
makeModelMultiplexer()
Other tune:
TuneControl
,
getNestedTuneResultsOptPathDf()
,
getNestedTuneResultsX()
,
getResamplingIndices()
,
getTuneResult()
,
makeModelMultiplexer()
,
makeTuneControlCMAES()
,
makeTuneControlDesign()
,
makeTuneControlGenSA()
,
makeTuneControlGrid()
,
makeTuneControlIrace()
,
makeTuneControlMBO()
,
makeTuneControlRandom()
,
makeTuneWrapper()
,
tuneParams()
,
tuneThreshold()
Examples
# See makeModelMultiplexer
Fuse learner with multiclass method.
Description
Fuses a base learner with a multi-class method. Creates a learner object, which can be used like any other learner object. This way learners which can only handle binary classification will be able to handle multi-class problems, too.
We use a multiclass-to-binary reduction principle, where multiple binary problems are created from the multiclass task. How these binary problems are generated is defined by an error-correcting-output-code (ECOC) code book. This also allows the simple and well-known one-vs-one and one-vs-rest approaches. Decoding is currently done via Hamming decoding, see e.g. https://jmlr.org/papers/volume11/escalera10a/escalera10a.pdf.
Currently, the approach always operates on the discrete predicted labels of the binary base models (instead of their probabilities) and the created wrapper cannot predict posterior probabilities.
Usage
makeMulticlassWrapper(learner, mcw.method = "onevsrest")
Arguments
learner |
(Learner | |
mcw.method |
( |
Value
See Also
Other wrapper:
makeBaggingWrapper()
,
makeClassificationViaRegressionWrapper()
,
makeConstantClassWrapper()
,
makeCostSensClassifWrapper()
,
makeCostSensRegrWrapper()
,
makeDownsampleWrapper()
,
makeDummyFeaturesWrapper()
,
makeExtractFDAFeatsWrapper()
,
makeFeatSelWrapper()
,
makeFilterWrapper()
,
makeImputeWrapper()
,
makeMultilabelBinaryRelevanceWrapper()
,
makeMultilabelClassifierChainsWrapper()
,
makeMultilabelDBRWrapper()
,
makeMultilabelNestedStackingWrapper()
,
makeMultilabelStackingWrapper()
,
makeOverBaggingWrapper()
,
makePreprocWrapper()
,
makePreprocWrapperCaret()
,
makeRemoveConstantFeaturesWrapper()
,
makeSMOTEWrapper()
,
makeTuneWrapper()
,
makeUndersampleWrapper()
,
makeWeightedClassesWrapper()
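Examples
# a minimal sketch: the one-vs-rest default lifts a binary-only learner
# (logistic regression) to the three-class iris task
lrn = makeLearner("classif.logreg")
lrn = makeMulticlassWrapper(lrn, mcw.method = "onevsrest")
mod = train(lrn, iris.task)
pred = predict(mod, iris.task)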
Use binary relevance method to create a multilabel learner.
Description
Every learner which is implemented in mlr and which supports binary classification can be converted to a wrapped binary relevance multilabel learner. The multilabel classification problem is converted into simple binary classifications for each label/target on which the binary learner is applied.
Models can easily be accessed via getLearnerModel.
Note that it does not make sense to set a threshold in the underlying base learner when predicting probabilities. It can, however, make a lot of sense to call setThreshold on the MultilabelBinaryRelevanceWrapper for each label individually, or to tune these thresholds with tuneThreshold, especially when you face very unbalanced class distributions for each binary label.
Usage
makeMultilabelBinaryRelevanceWrapper(learner)
Arguments
learner |
(Learner | |
Value
References
Tsoumakas, G., & Katakis, I. (2006) Multi-label classification: An overview. Dept. of Informatics, Aristotle University of Thessaloniki, Greece.
See Also
Other wrapper:
makeBaggingWrapper()
,
makeClassificationViaRegressionWrapper()
,
makeConstantClassWrapper()
,
makeCostSensClassifWrapper()
,
makeCostSensRegrWrapper()
,
makeDownsampleWrapper()
,
makeDummyFeaturesWrapper()
,
makeExtractFDAFeatsWrapper()
,
makeFeatSelWrapper()
,
makeFilterWrapper()
,
makeImputeWrapper()
,
makeMulticlassWrapper()
,
makeMultilabelClassifierChainsWrapper()
,
makeMultilabelDBRWrapper()
,
makeMultilabelNestedStackingWrapper()
,
makeMultilabelStackingWrapper()
,
makeOverBaggingWrapper()
,
makePreprocWrapper()
,
makePreprocWrapperCaret()
,
makeRemoveConstantFeaturesWrapper()
,
makeSMOTEWrapper()
,
makeTuneWrapper()
,
makeUndersampleWrapper()
,
makeWeightedClassesWrapper()
Other multilabel:
getMultilabelBinaryPerformances()
,
makeMultilabelClassifierChainsWrapper()
,
makeMultilabelDBRWrapper()
,
makeMultilabelNestedStackingWrapper()
,
makeMultilabelStackingWrapper()
Examples
if (requireNamespace("rpart")) {
d = getTaskData(yeast.task)
# drop some labels so example runs faster
d = d[seq(1, nrow(d), by = 20), c(1:2, 15:17)]
task = makeMultilabelTask(data = d, target = c("label1", "label2"))
lrn = makeLearner("classif.rpart")
lrn = makeMultilabelBinaryRelevanceWrapper(lrn)
lrn = setPredictType(lrn, "prob")
# train, predict and evaluate
mod = train(lrn, task)
pred = predict(mod, task)
performance(pred, measure = list(multilabel.hamloss, multilabel.subset01, multilabel.f1))
# the next call basically has the same structure for any multilabel meta wrapper
getMultilabelBinaryPerformances(pred, measures = list(mmce, auc))
# above works also with predictions from resample!
}
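# a sketch of the per-label thresholding discussed above; assumes setThreshold
# accepts a named vector with one threshold per label for multilabel predictions
pred = setThreshold(pred, c(label1 = 0.4, label2 = 0.6))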
Use classifier chains method (CC) to create a multilabel learner.
Description
Every learner which is implemented in mlr and which supports binary classification can be converted to a wrapped classifier chains multilabel learner. CC trains a binary classifier for each label following a given order. In the training phase, the feature space of each classifier is extended with the true label information of all previous labels in the chain. During the prediction phase, when true labels are not available, they are replaced by predicted labels.
Models can easily be accessed via getLearnerModel.
Usage
makeMultilabelClassifierChainsWrapper(learner, order = NULL)
Arguments
learner |
(Learner | |
order |
(character) |
Value
References
Montanes, E. et al. (2013) Dependent binary relevance models for multi-label classification Artificial Intelligence Center, University of Oviedo at Gijon, Spain.
See Also
Other wrapper:
makeBaggingWrapper()
,
makeClassificationViaRegressionWrapper()
,
makeConstantClassWrapper()
,
makeCostSensClassifWrapper()
,
makeCostSensRegrWrapper()
,
makeDownsampleWrapper()
,
makeDummyFeaturesWrapper()
,
makeExtractFDAFeatsWrapper()
,
makeFeatSelWrapper()
,
makeFilterWrapper()
,
makeImputeWrapper()
,
makeMulticlassWrapper()
,
makeMultilabelBinaryRelevanceWrapper()
,
makeMultilabelDBRWrapper()
,
makeMultilabelNestedStackingWrapper()
,
makeMultilabelStackingWrapper()
,
makeOverBaggingWrapper()
,
makePreprocWrapper()
,
makePreprocWrapperCaret()
,
makeRemoveConstantFeaturesWrapper()
,
makeSMOTEWrapper()
,
makeTuneWrapper()
,
makeUndersampleWrapper()
,
makeWeightedClassesWrapper()
Other multilabel:
getMultilabelBinaryPerformances()
,
makeMultilabelBinaryRelevanceWrapper()
,
makeMultilabelDBRWrapper()
,
makeMultilabelNestedStackingWrapper()
,
makeMultilabelStackingWrapper()
Examples
if (requireNamespace("rpart")) {
d = getTaskData(yeast.task)
# drop some labels so example runs faster
d = d[seq(1, nrow(d), by = 20), c(1:2, 15:17)]
task = makeMultilabelTask(data = d, target = c("label1", "label2"))
lrn = makeLearner("classif.rpart")
lrn = makeMultilabelClassifierChainsWrapper(lrn)
lrn = setPredictType(lrn, "prob")
# train, predict and evaluate
mod = train(lrn, task)
pred = predict(mod, task)
performance(pred, measure = list(multilabel.hamloss, multilabel.subset01, multilabel.f1))
# the next call basically has the same structure for any multilabel meta wrapper
getMultilabelBinaryPerformances(pred, measures = list(mmce, auc))
# above works also with predictions from resample!
}
Use dependent binary relevance method (DBR) to create a multilabel learner.
Description
Every learner which is implemented in mlr and which supports binary classification can be converted to a wrapped DBR multilabel learner. The multilabel classification problem is converted into simple binary classifications for each label/target on which the binary learner is applied. For each target, the true information of all other binary labels is used as additional features. During the prediction phase, these labels are first obtained by the binary relevance method using the same binary learner.
Models can easily be accessed via getLearnerModel.
Usage
makeMultilabelDBRWrapper(learner)
Arguments
learner |
(Learner | |
Value
References
Montanes, E. et al. (2013) Dependent binary relevance models for multi-label classification Artificial Intelligence Center, University of Oviedo at Gijon, Spain.
See Also
Other wrapper:
makeBaggingWrapper()
,
makeClassificationViaRegressionWrapper()
,
makeConstantClassWrapper()
,
makeCostSensClassifWrapper()
,
makeCostSensRegrWrapper()
,
makeDownsampleWrapper()
,
makeDummyFeaturesWrapper()
,
makeExtractFDAFeatsWrapper()
,
makeFeatSelWrapper()
,
makeFilterWrapper()
,
makeImputeWrapper()
,
makeMulticlassWrapper()
,
makeMultilabelBinaryRelevanceWrapper()
,
makeMultilabelClassifierChainsWrapper()
,
makeMultilabelNestedStackingWrapper()
,
makeMultilabelStackingWrapper()
,
makeOverBaggingWrapper()
,
makePreprocWrapper()
,
makePreprocWrapperCaret()
,
makeRemoveConstantFeaturesWrapper()
,
makeSMOTEWrapper()
,
makeTuneWrapper()
,
makeUndersampleWrapper()
,
makeWeightedClassesWrapper()
Other multilabel:
getMultilabelBinaryPerformances()
,
makeMultilabelBinaryRelevanceWrapper()
,
makeMultilabelClassifierChainsWrapper()
,
makeMultilabelNestedStackingWrapper()
,
makeMultilabelStackingWrapper()
Examples
if (requireNamespace("rpart")) {
d = getTaskData(yeast.task)
# drop some labels so example runs faster
d = d[seq(1, nrow(d), by = 20), c(1:2, 15:17)]
task = makeMultilabelTask(data = d, target = c("label1", "label2"))
lrn = makeLearner("classif.rpart")
lrn = makeMultilabelDBRWrapper(lrn)
lrn = setPredictType(lrn, "prob")
# train, predict and evaluate
mod = train(lrn, task)
pred = predict(mod, task)
performance(pred, measure = list(multilabel.hamloss, multilabel.subset01, multilabel.f1))
# the next call basically has the same structure for any multilabel meta wrapper
getMultilabelBinaryPerformances(pred, measures = list(mmce, auc))
# above works also with predictions from resample!
}
Use nested stacking method to create a multilabel learner.
Description
Every learner which is implemented in mlr and which supports binary classification can be converted to a wrapped nested stacking multilabel learner. Nested stacking trains a binary classifier for each label following a given order. In the training phase, the feature space of each classifier is extended with predicted label information (obtained by cross-validation) of all previous labels in the chain. During the prediction phase, predicted labels are obtained by the classifiers, which have been learned on all training data.
Models can easily be accessed via getLearnerModel.
Usage
makeMultilabelNestedStackingWrapper(learner, order = NULL, cv.folds = 2)
Arguments
learner |
(Learner | |
order |
(character) |
cv.folds |
( |
Value
References
Montanes, E. et al. (2013), Dependent binary relevance models for multi-label classification Artificial Intelligence Center, University of Oviedo at Gijon, Spain.
See Also
Other wrapper:
makeBaggingWrapper()
,
makeClassificationViaRegressionWrapper()
,
makeConstantClassWrapper()
,
makeCostSensClassifWrapper()
,
makeCostSensRegrWrapper()
,
makeDownsampleWrapper()
,
makeDummyFeaturesWrapper()
,
makeExtractFDAFeatsWrapper()
,
makeFeatSelWrapper()
,
makeFilterWrapper()
,
makeImputeWrapper()
,
makeMulticlassWrapper()
,
makeMultilabelBinaryRelevanceWrapper()
,
makeMultilabelClassifierChainsWrapper()
,
makeMultilabelDBRWrapper()
,
makeMultilabelStackingWrapper()
,
makeOverBaggingWrapper()
,
makePreprocWrapper()
,
makePreprocWrapperCaret()
,
makeRemoveConstantFeaturesWrapper()
,
makeSMOTEWrapper()
,
makeTuneWrapper()
,
makeUndersampleWrapper()
,
makeWeightedClassesWrapper()
Other multilabel:
getMultilabelBinaryPerformances()
,
makeMultilabelBinaryRelevanceWrapper()
,
makeMultilabelClassifierChainsWrapper()
,
makeMultilabelDBRWrapper()
,
makeMultilabelStackingWrapper()
Examples
if (requireNamespace("rpart")) {
d = getTaskData(yeast.task)
# drop some labels so example runs faster
d = d[seq(1, nrow(d), by = 20), c(1:2, 15:17)]
task = makeMultilabelTask(data = d, target = c("label1", "label2"))
lrn = makeLearner("classif.rpart")
lrn = makeMultilabelNestedStackingWrapper(lrn)
lrn = setPredictType(lrn, "prob")
# train, predict and evaluate
mod = train(lrn, task)
pred = predict(mod, task)
performance(pred, measure = list(multilabel.hamloss, multilabel.subset01, multilabel.f1))
# the next call basically has the same structure for any multilabel meta wrapper
getMultilabelBinaryPerformances(pred, measures = list(mmce, auc))
# above works also with predictions from resample!
}
Use stacking method (stacked generalization) to create a multilabel learner.
Description
Every learner which is implemented in mlr and which supports binary classification can be converted to a wrapped stacking multilabel learner. Stacking trains a binary classifier for each label, using the predicted label information of all labels (including the target label) as additional features (obtained by cross-validation). During the prediction phase, these labels are first obtained by the binary relevance method using the same binary learner.
Models can easily be accessed via getLearnerModel.
Usage
makeMultilabelStackingWrapper(learner, cv.folds = 2)
Arguments
learner |
(Learner | |
cv.folds |
( |
Value
References
Montanes, E. et al. (2013) Dependent binary relevance models for multi-label classification Artificial Intelligence Center, University of Oviedo at Gijon, Spain.
See Also
Other wrapper:
makeBaggingWrapper()
,
makeClassificationViaRegressionWrapper()
,
makeConstantClassWrapper()
,
makeCostSensClassifWrapper()
,
makeCostSensRegrWrapper()
,
makeDownsampleWrapper()
,
makeDummyFeaturesWrapper()
,
makeExtractFDAFeatsWrapper()
,
makeFeatSelWrapper()
,
makeFilterWrapper()
,
makeImputeWrapper()
,
makeMulticlassWrapper()
,
makeMultilabelBinaryRelevanceWrapper()
,
makeMultilabelClassifierChainsWrapper()
,
makeMultilabelDBRWrapper()
,
makeMultilabelNestedStackingWrapper()
,
makeOverBaggingWrapper()
,
makePreprocWrapper()
,
makePreprocWrapperCaret()
,
makeRemoveConstantFeaturesWrapper()
,
makeSMOTEWrapper()
,
makeTuneWrapper()
,
makeUndersampleWrapper()
,
makeWeightedClassesWrapper()
Other multilabel:
getMultilabelBinaryPerformances()
,
makeMultilabelBinaryRelevanceWrapper()
,
makeMultilabelClassifierChainsWrapper()
,
makeMultilabelDBRWrapper()
,
makeMultilabelNestedStackingWrapper()
Examples
if (requireNamespace("rpart")) {
d = getTaskData(yeast.task)
# drop some labels so example runs faster
d = d[seq(1, nrow(d), by = 20), c(1:2, 15:17)]
task = makeMultilabelTask(data = d, target = c("label1", "label2"))
lrn = makeLearner("classif.rpart")
lrn = makeMultilabelStackingWrapper(lrn)
lrn = setPredictType(lrn, "prob")
# train, predict and evaluate
mod = train(lrn, task)
pred = predict(mod, task)
performance(pred, measure = list(multilabel.hamloss, multilabel.subset01, multilabel.f1))
# the next call basically has the same structure for any multilabel meta wrapper
getMultilabelBinaryPerformances(pred, measures = list(mmce, auc))
# above works also with predictions from resample!
}
Create a multilabel task.
Description
Create a multilabel task.
Usage
makeMultilabelTask(
id = deparse(substitute(data)),
data,
target,
weights = NULL,
blocking = NULL,
coordinates = NULL,
fixup.data = "warn",
check.data = TRUE
)
Arguments
id |
( |
data |
(data.frame) |
target |
( |
weights |
(numeric) |
blocking |
(factor) |
coordinates |
(data.frame) |
fixup.data |
( |
check.data |
( |
Details
For multilabel classification we assume that the presence of labels is encoded via logical columns in data. The name of the column specifies the name of the label. target is then a character vector that points to these columns.
See Also
Task ClassifTask ClusterTask CostSensTask RegrTask SurvTask
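Examples
# a minimal construction sketch, mirroring the wrapper examples in this manual;
# the label columns of yeast.task are logical
d = getTaskData(yeast.task)
d = d[, c(1:2, 15:17)] # two labels plus three features
task = makeMultilabelTask(id = "multi", data = d, target = c("label1", "label2"))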
Fuse learner with the bagging technique and oversampling for imbalancy correction.
Description
Fuses a classification learner for binary classification with an over-bagging method for imbalancy correction when we have strongly unequal class sizes. Creates a learner object, which can be used like any other learner object. Models can easily be accessed via getLearnerModel.
OverBagging is implemented as follows: For each iteration a random data subset is sampled. Examples of the class to be oversampled (see obw.cl) are oversampled with replacement at the given rate (obw.rate). Members of the other class are either simply copied into each bag, or bootstrapped with replacement until there are as many majority class examples as in the original training data. Features are currently not changed or sampled.
Prediction works as follows: For classification we do majority voting to create a discrete label and probabilities are predicted by considering the proportions of all predicted labels.
Usage
makeOverBaggingWrapper(
learner,
obw.iters = 10L,
obw.rate = 1,
obw.maxcl = "boot",
obw.cl = NULL
)
Arguments
learner |
(Learner | |
obw.iters |
( |
obw.rate |
( |
obw.maxcl |
( |
obw.cl |
( |
Value
See Also
Other imbalancy:
makeUndersampleWrapper()
,
oversample()
,
smote()
Other wrapper:
makeBaggingWrapper()
,
makeClassificationViaRegressionWrapper()
,
makeConstantClassWrapper()
,
makeCostSensClassifWrapper()
,
makeCostSensRegrWrapper()
,
makeDownsampleWrapper()
,
makeDummyFeaturesWrapper()
,
makeExtractFDAFeatsWrapper()
,
makeFeatSelWrapper()
,
makeFilterWrapper()
,
makeImputeWrapper()
,
makeMulticlassWrapper()
,
makeMultilabelBinaryRelevanceWrapper()
,
makeMultilabelClassifierChainsWrapper()
,
makeMultilabelDBRWrapper()
,
makeMultilabelNestedStackingWrapper()
,
makeMultilabelStackingWrapper()
,
makePreprocWrapper()
,
makePreprocWrapperCaret()
,
makeRemoveConstantFeaturesWrapper()
,
makeSMOTEWrapper()
,
makeTuneWrapper()
,
makeUndersampleWrapper()
,
makeWeightedClassesWrapper()
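Examples
# a minimal sketch with explicit values for the documented arguments
lrn = makeOverBaggingWrapper("classif.rpart", obw.iters = 5L, obw.rate = 2)
mod = train(lrn, sonar.task)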
Fuse learner with preprocessing.
Description
Fuses a base learner with a preprocessing method. Creates a learner object, which can be used like any other learner object, but which internally preprocesses the data as requested. If the train or predict function is called on data / a task, the preprocessing is always performed automatically.
Usage
makePreprocWrapper(
learner,
train,
predict,
par.set = makeParamSet(),
par.vals = list()
)
Arguments
learner |
(Learner | |
train |
( |
predict |
( |
par.set |
(ParamHelpers::ParamSet) |
par.vals |
(list) |
Value
(Learner).
See Also
Other wrapper:
makeBaggingWrapper()
,
makeClassificationViaRegressionWrapper()
,
makeConstantClassWrapper()
,
makeCostSensClassifWrapper()
,
makeCostSensRegrWrapper()
,
makeDownsampleWrapper()
,
makeDummyFeaturesWrapper()
,
makeExtractFDAFeatsWrapper()
,
makeFeatSelWrapper()
,
makeFilterWrapper()
,
makeImputeWrapper()
,
makeMulticlassWrapper()
,
makeMultilabelBinaryRelevanceWrapper()
,
makeMultilabelClassifierChainsWrapper()
,
makeMultilabelDBRWrapper()
,
makeMultilabelNestedStackingWrapper()
,
makeMultilabelStackingWrapper()
,
makeOverBaggingWrapper()
,
makePreprocWrapperCaret()
,
makeRemoveConstantFeaturesWrapper()
,
makeSMOTEWrapper()
,
makeTuneWrapper()
,
makeUndersampleWrapper()
,
makeWeightedClassesWrapper()
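Examples
# a minimal sketch of the expected protocol, assuming (as in the mlr tutorial)
# that 'train' returns list(data, control) and 'predict' returns the data:
# here we center numeric features; the centers learned on the training data
# are stored in 'control' and reused at prediction time
trainfun = function(data, target, args) {
  nums = setdiff(names(data)[sapply(data, is.numeric)], target)
  ctr = colMeans(data[, nums, drop = FALSE])
  data[, nums] = sweep(data[, nums, drop = FALSE], 2, ctr)
  list(data = data, control = list(ctr = ctr))
}
predictfun = function(data, target, args, control) {
  nums = names(control$ctr)
  data[, nums] = sweep(data[, nums, drop = FALSE], 2, control$ctr)
  data
}
lrn = makePreprocWrapper("classif.lda", train = trainfun, predict = predictfun)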
Fuse learner with preprocessing.
Description
Fuses a learner with preprocessing methods provided by caret::preProcess. Before training the preprocessing will be performed and the preprocessing model will be stored. Before prediction the preprocessing model will transform the test data according to the trained model.
After being wrapped, the learner will support missing values, although only if ppc.knnImpute, ppc.bagImpute or ppc.medianImpute is set to TRUE.
Usage
makePreprocWrapperCaret(learner, ...)
Arguments
learner |
(Learner | |
... |
(any) |
Value
See Also
Other wrapper:
makeBaggingWrapper()
,
makeClassificationViaRegressionWrapper()
,
makeConstantClassWrapper()
,
makeCostSensClassifWrapper()
,
makeCostSensRegrWrapper()
,
makeDownsampleWrapper()
,
makeDummyFeaturesWrapper()
,
makeExtractFDAFeatsWrapper()
,
makeFeatSelWrapper()
,
makeFilterWrapper()
,
makeImputeWrapper()
,
makeMulticlassWrapper()
,
makeMultilabelBinaryRelevanceWrapper()
,
makeMultilabelClassifierChainsWrapper()
,
makeMultilabelDBRWrapper()
,
makeMultilabelNestedStackingWrapper()
,
makeMultilabelStackingWrapper()
,
makeOverBaggingWrapper()
,
makePreprocWrapper()
,
makeRemoveConstantFeaturesWrapper()
,
makeSMOTEWrapper()
,
makeTuneWrapper()
,
makeUndersampleWrapper()
,
makeWeightedClassesWrapper()
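Examples
# a minimal sketch, assuming caret is installed and that preProcess options
# are exposed with the 'ppc.' prefix (e.g. ppc.pca, ppc.thresh)
lrn = makePreprocWrapperCaret("classif.lda", ppc.pca = TRUE, ppc.thresh = 0.9)
mod = train(lrn, iris.task)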
Classification of functional data by Generalized Linear Models.
Description
Learner for classification using Generalized Linear Models.
Usage
## S3 method for class 'classif.fdausc.glm'
makeRLearner()
Learner for kernel classification for functional data.
Description
Learner for kernel Classification.
Usage
## S3 method for class 'classif.fdausc.kernel'
makeRLearner()
Learner for nonparametric classification for functional data.
Description
Learner for Nonparametric Supervised Classification.
Usage
## S3 method for class 'classif.fdausc.np'
makeRLearner()
Create a regression task.
Description
Create a regression task.
Usage
makeRegrTask(
id = deparse(substitute(data)),
data,
target,
weights = NULL,
blocking = NULL,
coordinates = NULL,
fixup.data = "warn",
check.data = TRUE
)
Arguments
id |
( |
data |
(data.frame) |
target |
( |
weights |
(numeric) |
blocking |
(factor) |
coordinates |
(data.frame) |
fixup.data |
( |
check.data |
( |
See Also
Task ClassifTask CostSensTask ClusterTask MultilabelTask SurvTask
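Examples
# a minimal sketch using the BostonHousing data from mlbench
data(BostonHousing, package = "mlbench")
makeRegrTask(id = "bh", data = BostonHousing, target = "medv")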
Fuse learner with removal of constant features preprocessing.
Description
Fuses a base learner with the preprocessing implemented in removeConstantFeatures.
Usage
makeRemoveConstantFeaturesWrapper(
learner,
perc = 0,
dont.rm = character(0L),
na.ignore = FALSE,
wrap.tol = .Machine$double.eps^0.5
)
Arguments
learner |
(Learner | |
perc |
( |
dont.rm |
(character) |
na.ignore |
( |
wrap.tol |
( |
Value
See Also
Other wrapper:
makeBaggingWrapper()
,
makeClassificationViaRegressionWrapper()
,
makeConstantClassWrapper()
,
makeCostSensClassifWrapper()
,
makeCostSensRegrWrapper()
,
makeDownsampleWrapper()
,
makeDummyFeaturesWrapper()
,
makeExtractFDAFeatsWrapper()
,
makeFeatSelWrapper()
,
makeFilterWrapper()
,
makeImputeWrapper()
,
makeMulticlassWrapper()
,
makeMultilabelBinaryRelevanceWrapper()
,
makeMultilabelClassifierChainsWrapper()
,
makeMultilabelDBRWrapper()
,
makeMultilabelNestedStackingWrapper()
,
makeMultilabelStackingWrapper()
,
makeOverBaggingWrapper()
,
makePreprocWrapper()
,
makePreprocWrapperCaret()
,
makeSMOTEWrapper()
,
makeTuneWrapper()
,
makeUndersampleWrapper()
,
makeWeightedClassesWrapper()
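Examples
# a minimal sketch; values correspond to the documented arguments
lrn = makeRemoveConstantFeaturesWrapper("classif.rpart", perc = 0.01, na.ignore = TRUE)
mod = train(lrn, iris.task)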
Create a description object for a resampling strategy.
Description
A description of a resampling algorithm contains all necessary information to create a ResampleInstance, when given the size of the data set.
Usage
makeResampleDesc(
method,
predict = "test",
...,
stratify = FALSE,
stratify.cols = NULL,
fixed = FALSE,
blocking.cv = FALSE
)
Arguments
method |
( |
predict |
( |
... |
(any)
|
stratify |
( |
stratify.cols |
(character) |
fixed |
( |
blocking.cv |
( |
Details
Some notes on some special strategies:
- Repeated cross-validation
Use “RepCV”. Then you have to set the aggregation function for your preferred performance measure to “testgroup.mean” via setAggregation.
- B632 bootstrap
Use “Bootstrap” for bootstrap and set predict to “both”. Then you have to set the aggregation function for your preferred performance measure to “b632” via setAggregation.
- B632+ bootstrap
Use “Bootstrap” for bootstrap and set predict to “both”. Then you have to set the aggregation function for your preferred performance measure to “b632plus” via setAggregation.
- Fixed Holdout set
Use makeFixedHoldoutInstance.
Object slots:
- id (
character(1)
) Name of resampling strategy.
- iters (
integer(1)
) Number of iterations. Note that this is always the complete number of generated train/test sets, so for a 10-times repeated 5-fold cross-validation it would be 50.
- predict (
character(1)
) See argument.
- stratify (
logical(1)
) See argument.
- All parameters passed in ... under the respective argument name
See arguments.
Value
(ResampleDesc).
Standard ResampleDesc objects
For common resampling strategies you can save some typing by using the following description objects:
- hout
holdout a.k.a. test sample estimation (two-thirds training set, one-third testing set)
- cv2
2-fold cross-validation
- cv3
3-fold cross-validation
- cv5
5-fold cross-validation
- cv10
10-fold cross-validation
See Also
Other resample:
ResamplePrediction
,
ResampleResult
,
addRRMeasure()
,
getRRPredictionList()
,
getRRPredictions()
,
getRRTaskDesc()
,
getRRTaskDescription()
,
makeResampleInstance()
,
resample()
Examples
# Bootstrapping
makeResampleDesc("Bootstrap", iters = 10)
makeResampleDesc("Bootstrap", iters = 10, predict = "both")
# Subsampling
makeResampleDesc("Subsample", iters = 10, split = 3 / 4)
makeResampleDesc("Subsample", iters = 10)
# Holdout a.k.a. test sample estimation
makeResampleDesc("Holdout")
Instantiates a resampling strategy object.
Description
This class encapsulates training and test sets generated from the data set for a number of iterations. It mainly stores a set of integer vectors indicating the training and test examples for each iteration.
Usage
makeResampleInstance(desc, task, size, ...)
Arguments
desc |
(ResampleDesc | |
task |
(Task) |
size |
(integer) |
... |
(any) |
Details
Object slots:
- desc (ResampleDesc)
See argument.
- size (
integer(1)
) See argument.
- train.inds (list of integer)
List of training indices for all iterations.
- test.inds (list of integer)
List of test indices for all iterations.
- group (factor)
Optional grouping of resampling iterations. This encodes whether specific iterations 'belong together' (e.g. repeated CV), and it can later be used to aggregate performance values accordingly. Default is 'factor()'.
Value
See Also
Other resample:
ResamplePrediction
,
ResampleResult
,
addRRMeasure()
,
getRRPredictionList()
,
getRRPredictions()
,
getRRTaskDesc()
,
getRRTaskDescription()
,
makeResampleDesc()
,
resample()
Examples
rdesc = makeResampleDesc("Bootstrap", iters = 10)
rin = makeResampleInstance(rdesc, task = iris.task)
rdesc = makeResampleDesc("CV", iters = 50)
rin = makeResampleInstance(rdesc, size = nrow(iris))
rin = makeResampleInstance("CV", iters = 10, task = iris.task)
Fuse learner with SMOTE oversampling for imbalancy correction in binary classification.
Description
Creates a learner object, which can be used like any other learner object. Internally uses smote before every model fit.
Note that observation weights do not influence the sampling and are simply passed down to the next learner.
Usage
makeSMOTEWrapper(
learner,
sw.rate = 1,
sw.nn = 5L,
sw.standardize = TRUE,
sw.alt.logic = FALSE
)
Arguments
learner |
(Learner | |
sw.rate |
( |
sw.nn |
( |
sw.standardize |
( |
sw.alt.logic |
( |
Value
See Also
Other wrapper:
makeBaggingWrapper()
,
makeClassificationViaRegressionWrapper()
,
makeConstantClassWrapper()
,
makeCostSensClassifWrapper()
,
makeCostSensRegrWrapper()
,
makeDownsampleWrapper()
,
makeDummyFeaturesWrapper()
,
makeExtractFDAFeatsWrapper()
,
makeFeatSelWrapper()
,
makeFilterWrapper()
,
makeImputeWrapper()
,
makeMulticlassWrapper()
,
makeMultilabelBinaryRelevanceWrapper()
,
makeMultilabelClassifierChainsWrapper()
,
makeMultilabelDBRWrapper()
,
makeMultilabelNestedStackingWrapper()
,
makeMultilabelStackingWrapper()
,
makeOverBaggingWrapper()
,
makePreprocWrapper()
,
makePreprocWrapperCaret()
,
makeRemoveConstantFeaturesWrapper()
,
makeTuneWrapper()
,
makeUndersampleWrapper()
,
makeWeightedClassesWrapper()
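Examples
# a minimal sketch on a binary task
lrn = makeSMOTEWrapper("classif.rpart", sw.rate = 2, sw.nn = 3L)
mod = train(lrn, sonar.task)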
Create a stacked learner object.
Description
A stacked learner uses predictions of several base learners and fits a super learner using these predictions as features in order to predict the outcome. The following stacking methods are available:
- average: Averaging of base learner predictions without weights.
- stack.nocv: Fits the super learner, where in-sample predictions of the base learners are used.
- stack.cv: Fits the super learner, where the base learner predictions are computed by cross-validated predictions (the resampling strategy can be set via the resampling argument).
- hill.climb: Selects a subset of base learner predictions by a hill climbing algorithm.
- compress: Trains a neural network to compress the model from a collection of base learners.
Usage
makeStackedLearner(
base.learners,
super.learner = NULL,
predict.type = NULL,
method = "stack.nocv",
use.feat = FALSE,
resampling = NULL,
parset = list()
)
Arguments
base.learners |
((list of) Learner) |
super.learner |
(Learner | character(1)) |
predict.type |
(
|
method |
( |
use.feat |
( |
resampling |
(ResampleDesc) |
parset |
the parameters for the hill.climb or compress methods |
Examples
# Classification
data(iris)
tsk = makeClassifTask(data = iris, target = "Species")
base = c("classif.rpart", "classif.lda", "classif.svm")
lrns = lapply(base, makeLearner)
lrns = lapply(lrns, setPredictType, "prob")
m = makeStackedLearner(base.learners = lrns,
predict.type = "prob", method = "hill.climb")
tmp = train(m, tsk)
res = predict(tmp, tsk)
# Regression
data(BostonHousing, package = "mlbench")
tsk = makeRegrTask(data = BostonHousing, target = "medv")
base = c("regr.rpart", "regr.svm")
lrns = lapply(base, makeLearner)
m = makeStackedLearner(base.learners = lrns,
predict.type = "response", method = "compress")
tmp = train(m, tsk)
res = predict(tmp, tsk)
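# a sketch of 'stack.cv' with an explicit super learner; the cross-validation
# strategy could also be set via the 'resampling' argument (default used here)
m = makeStackedLearner(base.learners = lrns, super.learner = "regr.lm",
  predict.type = "response", method = "stack.cv")
tmp = train(m, tsk)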
Create a survival task.
Description
Create a survival task.
Usage
makeSurvTask(
id = deparse(substitute(data)),
data,
target,
weights = NULL,
blocking = NULL,
coordinates = NULL,
fixup.data = "warn",
check.data = TRUE
)
Arguments
id |
( |
data |
(data.frame) |
target |
( |
weights |
(numeric) |
blocking |
(factor) |
coordinates |
(data.frame) |
fixup.data |
( |
check.data |
( |
See Also
Task ClassifTask ClusterTask CostSensTask MultilabelTask RegrTask
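Examples
# a minimal sketch using the lung data from the survival package;
# the event indicator must be logical, hence the recoding
lung = survival::lung
lung$status = (lung$status == 2) # convert 1/2 coding to logical
makeSurvTask(data = lung, target = c("time", "status"))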
Exported for internal use.
Description
Exported for internal use.
Usage
makeTaskDescInternal(type, id, data, target, weights, blocking, coordinates)
Arguments
type |
( |
id |
( |
data |
(data.frame) |
target |
(character) |
weights |
(numeric) |
blocking |
(numeric) |
coordinates |
( |
Create control object for hyperparameter tuning with CMAES.
Description
CMA Evolution Strategy with method cmaes::cma_es. Can handle numeric(vector) and integer(vector) hyperparameters, but no dependencies. For integers the internally proposed numeric values are automatically rounded. The sigma variance parameter is initialized to 1/4 of the span of box-constraints per parameter dimension.
Usage
makeTuneControlCMAES(
same.resampling.instance = TRUE,
impute.val = NULL,
start = NULL,
tune.threshold = FALSE,
tune.threshold.args = list(),
log.fun = "default",
final.dw.perc = NULL,
budget = NULL,
...
)
Arguments
same.resampling.instance |
( |
impute.val |
(numeric) |
start |
(list) |
tune.threshold |
( |
tune.threshold.args |
(list) |
log.fun |
( |
final.dw.perc |
( |
budget |
( |
... |
(any) |
Value
See Also
Other tune:
TuneControl
,
getNestedTuneResultsOptPathDf()
,
getNestedTuneResultsX()
,
getResamplingIndices()
,
getTuneResult()
,
makeModelMultiplexer()
,
makeModelMultiplexerParamSet()
,
makeTuneControlDesign()
,
makeTuneControlGenSA()
,
makeTuneControlGrid()
,
makeTuneControlIrace()
,
makeTuneControlMBO()
,
makeTuneControlRandom()
,
makeTuneWrapper()
,
tuneParams()
,
tuneThreshold()
Create control object for hyperparameter tuning with predefined design.
Description
Completely pre-specify a data.frame
of design points to be evaluated
during tuning. All kinds of parameter types can be handled.
Usage
makeTuneControlDesign(
same.resampling.instance = TRUE,
impute.val = NULL,
design = NULL,
tune.threshold = FALSE,
tune.threshold.args = list(),
log.fun = "default"
)
Arguments
same.resampling.instance |
( |
impute.val |
(numeric) |
design |
(data.frame) |
tune.threshold |
( |
tune.threshold.args |
(list) |
log.fun |
( |
Value
See Also
Other tune:
TuneControl
,
getNestedTuneResultsOptPathDf()
,
getNestedTuneResultsX()
,
getResamplingIndices()
,
getTuneResult()
,
makeModelMultiplexer()
,
makeModelMultiplexerParamSet()
,
makeTuneControlCMAES()
,
makeTuneControlGenSA()
,
makeTuneControlGrid()
,
makeTuneControlIrace()
,
makeTuneControlMBO()
,
makeTuneControlRandom()
,
makeTuneWrapper()
,
tuneParams()
,
tuneThreshold()
Create control object for hyperparameter tuning with GenSA.
Description
Generalized simulated annealing with method GenSA::GenSA. Can handle numeric(vector) and integer(vector) hyperparameters, but no dependencies. For integers the internally proposed numeric values are automatically rounded.
Usage
makeTuneControlGenSA(
same.resampling.instance = TRUE,
impute.val = NULL,
start = NULL,
tune.threshold = FALSE,
tune.threshold.args = list(),
log.fun = "default",
final.dw.perc = NULL,
budget = NULL,
...
)
Arguments
same.resampling.instance |
( |
impute.val |
(numeric) |
start |
(list) |
tune.threshold |
( |
tune.threshold.args |
(list) |
log.fun |
( |
final.dw.perc |
( |
budget |
( |
... |
(any) |
Value
See Also
Other tune:
TuneControl
,
getNestedTuneResultsOptPathDf()
,
getNestedTuneResultsX()
,
getResamplingIndices()
,
getTuneResult()
,
makeModelMultiplexer()
,
makeModelMultiplexerParamSet()
,
makeTuneControlCMAES()
,
makeTuneControlDesign()
,
makeTuneControlGrid()
,
makeTuneControlIrace()
,
makeTuneControlMBO()
,
makeTuneControlRandom()
,
makeTuneWrapper()
,
tuneParams()
,
tuneThreshold()
Create control object for hyperparameter tuning with grid search.
Description
A basic grid search can handle all kinds of parameter types. You can either use their correct param type and resolution, or discretize them yourself by always using ParamHelpers::makeDiscreteParam in the par.set passed to tuneParams.
Usage
makeTuneControlGrid(
same.resampling.instance = TRUE,
impute.val = NULL,
resolution = 10L,
tune.threshold = FALSE,
tune.threshold.args = list(),
log.fun = "default",
final.dw.perc = NULL,
budget = NULL
)
Arguments
same.resampling.instance |
( |
impute.val |
(numeric) |
resolution |
(integer) |
tune.threshold |
( |
tune.threshold.args |
(list) |
log.fun |
( |
final.dw.perc |
( |
budget |
( |
Value
See Also
Other tune:
TuneControl
,
getNestedTuneResultsOptPathDf()
,
getNestedTuneResultsX()
,
getResamplingIndices()
,
getTuneResult()
,
makeModelMultiplexer()
,
makeModelMultiplexerParamSet()
,
makeTuneControlCMAES()
,
makeTuneControlDesign()
,
makeTuneControlGenSA()
,
makeTuneControlIrace()
,
makeTuneControlMBO()
,
makeTuneControlRandom()
,
makeTuneWrapper()
,
tuneParams()
,
tuneThreshold()
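Examples
# a minimal sketch: the numeric parameter is discretized automatically
# via 'resolution'; cv3 is a standard 3-fold ResampleDesc
ps = makeParamSet(makeNumericParam("cp", lower = 0.001, upper = 0.1))
ctrl = makeTuneControlGrid(resolution = 5L)
res = tuneParams("classif.rpart", iris.task, cv3, par.set = ps, control = ctrl)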
Create control object for hyperparameter tuning with Irace.
Description
Tuning with iterated F-Racing with method irace::irace. All
kinds of parameter types can be handled. We return the best of the final
elite candidates found by irace in the last race. Its estimated performance
is the mean of all evaluations ever done for that candidate. More information
on irace can be found in package vignette: vignette("irace-package", package = "irace")
For resampling you have to pass a ResampleDesc, not a ResampleInstance.
The resampling strategy is randomly instantiated n.instances
times and
these are the instances in the sense of irace (instances
element of
tunerConfig
in irace::irace). Also note that irace will always store its
tuning results in a file on disk, see the package documentation for details
on this and how to change the file path.
Usage
makeTuneControlIrace(
impute.val = NULL,
n.instances = 100L,
show.irace.output = FALSE,
tune.threshold = FALSE,
tune.threshold.args = list(),
log.fun = "default",
final.dw.perc = NULL,
budget = NULL,
...
)
Arguments
impute.val |
(numeric) |
n.instances |
( |
show.irace.output |
( |
tune.threshold |
( |
tune.threshold.args |
(list) |
log.fun |
( |
final.dw.perc |
( |
budget |
( |
... |
(any) |
Value
See Also
Other tune:
TuneControl
,
getNestedTuneResultsOptPathDf()
,
getNestedTuneResultsX()
,
getResamplingIndices()
,
getTuneResult()
,
makeModelMultiplexer()
,
makeModelMultiplexerParamSet()
,
makeTuneControlCMAES()
,
makeTuneControlDesign()
,
makeTuneControlGenSA()
,
makeTuneControlGrid()
,
makeTuneControlMBO()
,
makeTuneControlRandom()
,
makeTuneWrapper()
,
tuneParams()
,
tuneThreshold()
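Examples
# a minimal sketch, assuming the irace package is installed;
# maxExperiments is passed through to irace as the tuning budget
ctrl = makeTuneControlIrace(maxExperiments = 200L, n.instances = 30L)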
Create control object for hyperparameter tuning with MBO.
Description
Model-based / Bayesian optimization with the function mlrMBO::mbo from the mlrMBO package. Please refer to https://github.com/mlr-org/mlrMBO for further info.
Usage
makeTuneControlMBO(
same.resampling.instance = TRUE,
impute.val = NULL,
learner = NULL,
mbo.control = NULL,
tune.threshold = FALSE,
tune.threshold.args = list(),
continue = FALSE,
log.fun = "default",
final.dw.perc = NULL,
budget = NULL,
mbo.design = NULL
)
Arguments
same.resampling.instance |
( |
impute.val |
(numeric) |
learner |
(Learner | |
mbo.control |
(mlrMBO::MBOControl | |
tune.threshold |
( |
tune.threshold.args |
(list) |
continue |
( |
log.fun |
( |
final.dw.perc |
( |
budget |
( |
mbo.design |
(data.frame | |
Value
References
Bernd Bischl, Jakob Richter, Jakob Bossek, Daniel Horn, Janek Thomas and Michel Lang; mlrMBO: A Modular Framework for Model-Based Optimization of Expensive Black-Box Functions, Preprint: https://arxiv.org/abs/1703.03373 (2017).
See Also
Other tune:
TuneControl
,
getNestedTuneResultsOptPathDf()
,
getNestedTuneResultsX()
,
getResamplingIndices()
,
getTuneResult()
,
makeModelMultiplexer()
,
makeModelMultiplexerParamSet()
,
makeTuneControlCMAES()
,
makeTuneControlDesign()
,
makeTuneControlGenSA()
,
makeTuneControlGrid()
,
makeTuneControlIrace()
,
makeTuneControlRandom()
,
makeTuneWrapper()
,
tuneParams()
,
tuneThreshold()
Create control object for hyperparameter tuning with random search.
Description
Random search. All kinds of parameter types can be handled.
Usage
makeTuneControlRandom(
same.resampling.instance = TRUE,
maxit = NULL,
tune.threshold = FALSE,
tune.threshold.args = list(),
log.fun = "default",
final.dw.perc = NULL,
budget = NULL
)
Arguments
same.resampling.instance |
( |
maxit |
( |
tune.threshold |
( |
tune.threshold.args |
(list) |
log.fun |
( |
final.dw.perc |
( |
budget |
( |
Value
See Also
Other tune:
TuneControl
,
getNestedTuneResultsOptPathDf()
,
getNestedTuneResultsX()
,
getResamplingIndices()
,
getTuneResult()
,
makeModelMultiplexer()
,
makeModelMultiplexerParamSet()
,
makeTuneControlCMAES()
,
makeTuneControlDesign()
,
makeTuneControlGenSA()
,
makeTuneControlGrid()
,
makeTuneControlIrace()
,
makeTuneControlMBO()
,
makeTuneWrapper()
,
tuneParams()
,
tuneThreshold()
Fuse learner with tuning.
Description
Fuses a base learner with a search strategy to select its hyperparameters. Creates a learner object, which can be used like any other learner object, but which internally uses tuneParams. If the train function is called on it, the search strategy and resampling are invoked to select an optimal set of hyperparameter values. Finally, a model is fitted on the complete training data with these optimal hyperparameters and returned. See tuneParams for more details.
After training, the optimal hyperparameters (and other related information) can be retrieved with getTuneResult.
Usage
makeTuneWrapper(
learner,
resampling,
measures,
par.set,
control,
show.info = getMlrOption("show.info")
)
Arguments
learner |
(Learner | |
resampling |
(ResampleInstance | ResampleDesc) |
measures |
(list of Measure | Measure) |
par.set |
(ParamHelpers::ParamSet) |
control |
(TuneControl) |
show.info |
( |
Value
See Also
Other tune:
TuneControl
,
getNestedTuneResultsOptPathDf()
,
getNestedTuneResultsX()
,
getResamplingIndices()
,
getTuneResult()
,
makeModelMultiplexer()
,
makeModelMultiplexerParamSet()
,
makeTuneControlCMAES()
,
makeTuneControlDesign()
,
makeTuneControlGenSA()
,
makeTuneControlGrid()
,
makeTuneControlIrace()
,
makeTuneControlMBO()
,
makeTuneControlRandom()
,
tuneParams()
,
tuneThreshold()
Other wrapper:
makeBaggingWrapper()
,
makeClassificationViaRegressionWrapper()
,
makeConstantClassWrapper()
,
makeCostSensClassifWrapper()
,
makeCostSensRegrWrapper()
,
makeDownsampleWrapper()
,
makeDummyFeaturesWrapper()
,
makeExtractFDAFeatsWrapper()
,
makeFeatSelWrapper()
,
makeFilterWrapper()
,
makeImputeWrapper()
,
makeMulticlassWrapper()
,
makeMultilabelBinaryRelevanceWrapper()
,
makeMultilabelClassifierChainsWrapper()
,
makeMultilabelDBRWrapper()
,
makeMultilabelNestedStackingWrapper()
,
makeMultilabelStackingWrapper()
,
makeOverBaggingWrapper()
,
makePreprocWrapper()
,
makePreprocWrapperCaret()
,
makeRemoveConstantFeaturesWrapper()
,
makeSMOTEWrapper()
,
makeUndersampleWrapper()
,
makeWeightedClassesWrapper()
Examples
task = makeClassifTask(data = iris, target = "Species")
lrn = makeLearner("classif.rpart")
# stupid mini grid
ps = makeParamSet(
makeDiscreteParam("cp", values = c(0.05, 0.1)),
makeDiscreteParam("minsplit", values = c(10, 20))
)
ctrl = makeTuneControlGrid()
inner = makeResampleDesc("Holdout")
outer = makeResampleDesc("CV", iters = 2)
lrn = makeTuneWrapper(lrn, resampling = inner, par.set = ps, control = ctrl)
mod = train(lrn, task)
print(getTuneResult(mod))
# nested resampling for evaluation
# we also extract tuned hyper pars in each iteration
r = resample(lrn, task, outer, extract = getTuneResult)
print(r$extract)
getNestedTuneResultsOptPathDf(r)
getNestedTuneResultsX(r)
Fuse learner with simple over/undersampling for imbalancy correction in binary classification.
Description
Creates a learner object, which can be used like any other learner object. Internally uses oversample or undersample before every model fit.
Note that observation weights do not influence the sampling and are simply passed down to the next learner.
Usage
makeUndersampleWrapper(learner, usw.rate = 1, usw.cl = NULL)
makeOversampleWrapper(learner, osw.rate = 1, osw.cl = NULL)
Arguments
learner |
(Learner | |
usw.rate |
( |
usw.cl |
( |
osw.rate |
( |
osw.cl |
( |
Value
See Also
Other imbalancy:
makeOverBaggingWrapper()
,
oversample()
,
smote()
Other wrapper:
makeBaggingWrapper()
,
makeClassificationViaRegressionWrapper()
,
makeConstantClassWrapper()
,
makeCostSensClassifWrapper()
,
makeCostSensRegrWrapper()
,
makeDownsampleWrapper()
,
makeDummyFeaturesWrapper()
,
makeExtractFDAFeatsWrapper()
,
makeFeatSelWrapper()
,
makeFilterWrapper()
,
makeImputeWrapper()
,
makeMulticlassWrapper()
,
makeMultilabelBinaryRelevanceWrapper()
,
makeMultilabelClassifierChainsWrapper()
,
makeMultilabelDBRWrapper()
,
makeMultilabelNestedStackingWrapper()
,
makeMultilabelStackingWrapper()
,
makeOverBaggingWrapper()
,
makePreprocWrapper()
,
makePreprocWrapperCaret()
,
makeRemoveConstantFeaturesWrapper()
,
makeSMOTEWrapper()
,
makeTuneWrapper()
,
makeWeightedClassesWrapper()
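Examples
# a minimal sketch of both constructors
lrn = makeUndersampleWrapper("classif.rpart", usw.rate = 0.5)
lrn = makeOversampleWrapper("classif.rpart", osw.rate = 2)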
Wraps a classifier for weighted fitting where each class receives a weight.
Description
Creates a wrapper, which can be used like any other learner object.
Fitting is performed in a weighted fashion where each observation receives a weight,
depending on the class it belongs to, see wcw.weight
.
This might help to mitigate problems caused by imbalanced class distributions.
This weighted fitting can be achieved in two ways:
a) The learner already has a parameter for class weighting, so one weight can directly be defined
per class. Example: “classif.ksvm” and parameter class.weights
.
In this case we don't really do anything fancy. We convert wcw.weight
a bit,
but basically simply bind its value to the class weighting param.
The wrapper in this case simply offers a convenient, consistent fashion for class weighting -
and tuning! See example below.
b) The learner does not have a direct parameter to support class weighting, but
supports observation weights, so hasLearnerProperties(learner, 'weights')
is TRUE
.
This means that an individual, arbitrary weight can be set per observation during training.
We set this weight depending on the class internally in the wrapper. Basically we introduce
something like a new “class.weights” parameter for the learner via observation weights.
Usage
makeWeightedClassesWrapper(learner, wcw.param = NULL, wcw.weight = 1)
Arguments
learner |
(Learner | |
wcw.param |
( |
wcw.weight |
(numeric) |
Value
See Also
Other wrapper:
makeBaggingWrapper()
,
makeClassificationViaRegressionWrapper()
,
makeConstantClassWrapper()
,
makeCostSensClassifWrapper()
,
makeCostSensRegrWrapper()
,
makeDownsampleWrapper()
,
makeDummyFeaturesWrapper()
,
makeExtractFDAFeatsWrapper()
,
makeFeatSelWrapper()
,
makeFilterWrapper()
,
makeImputeWrapper()
,
makeMulticlassWrapper()
,
makeMultilabelBinaryRelevanceWrapper()
,
makeMultilabelClassifierChainsWrapper()
,
makeMultilabelDBRWrapper()
,
makeMultilabelNestedStackingWrapper()
,
makeMultilabelStackingWrapper()
,
makeOverBaggingWrapper()
,
makePreprocWrapper()
,
makePreprocWrapperCaret()
,
makeRemoveConstantFeaturesWrapper()
,
makeSMOTEWrapper()
,
makeTuneWrapper()
,
makeUndersampleWrapper()
Examples
set.seed(123)
# using the direct parameter of the SVM (which is already defined in the learner)
lrn = makeWeightedClassesWrapper("classif.ksvm", wcw.weight = 0.01)
res = holdout(lrn, sonar.task)
print(calculateConfusionMatrix(res$pred))
# using the observation weights of logreg
lrn = makeWeightedClassesWrapper("classif.logreg", wcw.weight = 0.01)
res = holdout(lrn, sonar.task)
print(calculateConfusionMatrix(res$pred))
# tuning the imbalancy param and the SVM param in one go
lrn = makeWeightedClassesWrapper("classif.ksvm", wcw.param = "class.weights")
ps = makeParamSet(
makeNumericParam("wcw.weight", lower = 1, upper = 10),
makeNumericParam("C", lower = -12, upper = 12, trafo = function(x) 2^x),
makeNumericParam("sigma", lower = -12, upper = 12, trafo = function(x) 2^x)
)
ctrl = makeTuneControlRandom(maxit = 3L)
rdesc = makeResampleDesc("CV", iters = 2L, stratify = TRUE)
res = tuneParams(lrn, sonar.task, rdesc, par.set = ps, control = ctrl)
print(res)
# print(res$opt.path)
Induced model of learner.
Description
Result from train.
It internally stores the underlying fitted model, the subset used for training, features used for training, levels of factors in the data set and computation time that was spent for training.
Object members: See arguments.
The constructor makeWrappedModel
is mainly for internal use.
Usage
makeWrappedModel(
learner,
learner.model,
task.desc,
subset,
features,
factor.levels,
time
)
Arguments
learner |
(Learner | |
learner.model |
(any) |
task.desc |
TaskDesc |
subset |
(integer | logical | |
features |
(character) |
factor.levels |
(named list of character) |
time |
( |
Value
Performance measures.
Description
A performance measure is evaluated after a single train/predict step and returns a single number to assess the quality of the prediction (or maybe only the model, think AIC). The measure itself knows whether it wants to be minimized or maximized and for what tasks it is applicable.
All supported measures can be found by listMeasures or as a table in the tutorial appendix: https://mlr.mlr-org.com/articles/tutorial/measures.html.
If you want a measure for a misclassification cost matrix, look at makeCostMeasure. If you want to implement your own measure, look at makeMeasure.
Most measures can directly be accessed via the function named after the scheme measureX (e.g. measureSSE).
For clustering measures, we compact the predicted cluster IDs such that they form a continuous series starting with 1. If this is not the case, some of the measures will generate warnings.
Some measures have parameters. Their defaults are set in the constructor makeMeasure and can be overwritten using setMeasurePars.
Usage
measureSSE(truth, response)
measureMSE(truth, response)
measureRMSE(truth, response)
measureMEDSE(truth, response)
measureSAE(truth, response)
measureMAE(truth, response)
measureMEDAE(truth, response)
measureRSQ(truth, response)
measureEXPVAR(truth, response)
measureRRSE(truth, response)
measureRAE(truth, response)
measureMAPE(truth, response)
measureMSLE(truth, response)
measureRMSLE(truth, response)
measureKendallTau(truth, response)
measureSpearmanRho(truth, response)
measureMMCE(truth, response)
measureACC(truth, response)
measureBER(truth, response)
measureAUNU(probabilities, truth)
measureAUNP(probabilities, truth)
measureAU1U(probabilities, truth)
measureAU1P(probabilities, truth)
measureMulticlassBrier(probabilities, truth)
measureLogloss(probabilities, truth)
measureSSR(probabilities, truth)
measureQSR(probabilities, truth)
measureLSR(probabilities, truth)
measureKAPPA(truth, response)
measureWKAPPA(truth, response)
measureAUC(probabilities, truth, negative, positive)
measureBrier(probabilities, truth, negative, positive)
measureBrierScaled(probabilities, truth, negative, positive)
measureBAC(truth, response)
measureTP(truth, response, positive)
measureTN(truth, response, negative)
measureFP(truth, response, positive)
measureFN(truth, response, negative)
measureTPR(truth, response, positive)
measureTNR(truth, response, negative)
measureFPR(truth, response, negative, positive)
measureFNR(truth, response, negative, positive)
measurePPV(truth, response, positive, probabilities = NULL)
measureNPV(truth, response, negative)
measureFDR(truth, response, positive)
measureMCC(truth, response, negative, positive)
measureF1(truth, response, positive)
measureGMEAN(truth, response, negative, positive)
measureGPR(truth, response, positive)
measureMultilabelHamloss(truth, response)
measureMultilabelSubset01(truth, response)
measureMultilabelF1(truth, response)
measureMultilabelACC(truth, response)
measureMultilabelPPV(truth, response)
measureMultilabelTPR(truth, response)
Arguments
truth |
(factor) |
response |
(factor) |
probabilities |
(numeric | matrix) |
negative |
( |
positive |
( |
References
He, H. & Garcia, E. A. (2009) Learning from Imbalanced Data. IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 9. pp. 1263-1284.
H. Uno et al. On the C-statistics for Evaluating Overall Adequacy of Risk Prediction Procedures with Censored Survival Data Statistics in medicine. 2011;30(10):1105-1117. doi:10.1002/sim.4154.
H. Uno et al. Evaluating Prediction Rules for T-Year Survivors with Censored Regression Models Journal of the American Statistical Association 102, no. 478 (2007): 527-37.
See Also
Other performance:
ConfusionMatrix
,
calculateConfusionMatrix()
,
calculateROCMeasures()
,
estimateRelativeOverfitting()
,
makeCostMeasure()
,
makeCustomResampledMeasure()
,
makeMeasure()
,
performance()
,
setAggregation()
,
setMeasurePars()
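Examples
# the measure functions can also be called directly on vectors
truth = factor(c("a", "b", "a", "b"))
response = factor(c("a", "a", "a", "b"))
measureMMCE(truth, response) # 0.25
measureACC(truth, response)  # 0.75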
Merge different BenchmarkResult objects.
Description
The function automatically combines a list of BenchmarkResult objects into a single BenchmarkResult object as long as the full cross-product of all task-learner combinations is available.
Usage
mergeBenchmarkResults(bmrs)
Arguments
bmrs |
(list of BenchmarkResult) |
Details
Note that if you want to merge several BenchmarkResult objects, all possible learner and task combinations must be contained in the returned object. Otherwise, you will be notified which task-learner combinations are missing or duplicated.
When merging BenchmarkResult objects with different measures, all missing measures will automatically be recomputed.
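A minimal sketch of a valid merge, assuming the union of both results covers the full task-learner cross-product (here: two learners on the same task):
b1 = benchmark(makeLearner("classif.rpart"), iris.task)
b2 = benchmark(makeLearner("classif.lda"), iris.task)
merged = mergeBenchmarkResults(list(b1, b2))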
Value
(BenchmarkResult).
Merges small levels of factors into a new level.
Description
Merges factor levels that occur only infrequently into combined levels with a higher frequency.
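A minimal sketch (the rare factor levels constructed here are illustrative only):
d = iris
d$grp = factor(c(rep("a", 140), rep("b", 6), rep("c", 4)))  # "b" (4%) and "c" (2.7%) are rare
task = makeClassifTask(data = d, target = "Species")
task = mergeSmallFactorLevels(task, min.perc = 0.05)
summarizeLevels(task)$grp  # "b" and "c" are now combined into ".merged"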
Usage
mergeSmallFactorLevels(
task,
cols = NULL,
min.perc = 0.01,
new.level = ".merged"
)
Arguments
task |
(Task) |
cols |
(character) Which columns to convert. Default is all factor and character columns. |
min.perc |
(numeric(1)) Frequency threshold (as a proportion) below which a factor level is considered small and gets merged. Default is 0.01. |
new.level |
(character(1)) New name of the merged level. Default is “.merged”. |
Value
Task, where merged levels are combined into a new level of name new.level.
See Also
Other eda_and_preprocess: capLargeValues(), createDummyFeatures(), dropFeatures(), normalizeFeatures(), removeConstantFeatures(), summarizeColumns(), summarizeLevels()
mlr documentation families
Description
List of all mlr documentation families with members.
Arguments
benchmark |
batchmark, reduceBatchmarkResults, benchmark, benchmarkParallel, getBMRTaskIds, getBMRLearners, getBMRLearnerIds, getBMRLearnerShortNames, getBMRMeasures, getBMRMeasureIds, getBMRPredictions, getBMRPerformances, getBMRAggrPerformances, getBMRTuneResults, getBMRFeatSelResults, getBMRFilteredFeatures, getBMRModels, getBMRTaskDescs, convertBMRToRankMatrix, friedmanPostHocTestBMR, friedmanTestBMR, plotBMRBoxplots, plotBMRRanksAsBarChart, generateCritDifferencesData, plotCritDifferences |
calibration |
generateCalibrationData, plotCalibration |
configure |
configureMlr, getMlrOptions |
costsens |
makeCostSensTask, makeCostSensWeightedPairsWrapper |
debug |
predictFailureModel, getPredictionDump, getRRDump, print.ResampleResult |
downsample |
downsample |
eda_and_preprocess |
capLargeValues, createDummyFeatures, dropFeatures, mergeSmallFactorLevels, normalizeFeatures, removeConstantFeatures, summarizeColumns, summarizeLevels |
extractFDAFeatures |
reextractFDAFeatures |
fda_featextractor |
extractFDAFourier, extractFDAWavelets, extractFDAFPCA, extractFDAMultiResFeatures |
fda |
makeExtractFDAFeatMethod, extractFDAFeatures |
featsel |
analyzeFeatSelResult, makeFeatSelControl, getFeatSelResult, selectFeatures |
filter |
filterFeatures, makeFilter, listFilterMethods, getFilteredFeatures, generateFilterValuesData, getFilterValues |
generate_plot_data |
generateFeatureImportanceData, plotFilterValues, generatePartialDependenceData |
help |
helpLearner, helpLearnerParam |
imbalancy |
oversample, smote |
impute |
makeImputeMethod, imputeConstant, impute, reimpute |
learner |
getClassWeightParam, getHyperPars, getParamSet.Learner, getLearnerType, getLearnerId, getLearnerPredictType, getLearnerPackages, getLearnerParamSet, getLearnerParVals, setLearnerId, getLearnerShortName, getLearnerProperties, makeLearner, makeLearners, removeHyperPars, setHyperPars, setId, setPredictThreshold, setPredictType |
learning_curve |
generateLearningCurveData |
multilabel |
getMultilabelBinaryPerformances, makeMultilabelBinaryRelevanceWrapper, makeMultilabelClassifierChainsWrapper, makeMultilabelDBRWrapper, makeMultilabelNestedStackingWrapper, makeMultilabelStackingWrapper |
performance |
calculateConfusionMatrix, calculateROCMeasures, makeCustomResampledMeasure, makeCostMeasure, setMeasurePars, setAggregation, makeMeasure, featperc, performance, estimateRelativeOverfitting |
plot |
createSpatialResamplingPlots, plotLearningCurve, plotPartialDependence, plotBMRSummary, plotResiduals |
predict |
asROCRPrediction, getPredictionProbabilities, getPredictionTaskDesc, getPredictionResponse, predict.WrappedModel |
resample |
makeResampleDesc, makeResampleInstance, makeResamplePrediction, resample, getRRPredictions, getRRTaskDescription, getRRTaskDesc, getRRPredictionList, addRRMeasure |
task |
getTaskDesc, getTaskType, getTaskId, getTaskTargetNames, getTaskClassLevels, getTaskFeatureNames, getTaskNFeats, getTaskSize, getTaskFormula, getTaskTargets, getTaskData, getTaskCosts, subsetTask |
thresh_vs_perf |
generateThreshVsPerfData, plotThreshVsPerf, plotROCCurves |
tune |
getNestedTuneResultsX, getNestedTuneResultsOptPathDf, getResamplingIndices, getTuneResult, makeModelMultiplexerParamSet, makeModelMultiplexer, makeTuneControlCMAES, makeTuneControlDesign, makeTuneControlGenSA, makeTuneControlGrid, makeTuneControlIrace, makeTuneControlMBO, makeTuneControl, makeTuneControlRandom, tuneParams, tuneThreshold |
tune_multicrit |
plotTuneMultiCritResult, makeTuneMultiCritControl, tuneParamsMultiCrit |
wrapper |
makeBaggingWrapper, makeClassificationViaRegressionWrapper, makeConstantClassWrapper, makeCostSensClassifWrapper, makeCostSensRegrWrapper, makeDownsampleWrapper, makeDummyFeaturesWrapper, makeExtractFDAFeatsWrapper, makeFeatSelWrapper, makeFilterWrapper, makeImputeWrapper, makeMulticlassWrapper, makeOverBaggingWrapper, makeUndersampleWrapper, makePreprocWrapperCaret, makePreprocWrapper, makeRemoveConstantFeaturesWrapper, makeSMOTEWrapper, makeTuneWrapper, makeWeightedClassesWrapper |
Motor Trend Car Road Tests clustering task.
Description
Contains the task (mtcars.task).
References
See datasets::mtcars.
Normalize features.
Description
Normalize features by different methods. Internally, BBmisc::normalize is used for every feature column. Non-numerical features are left untouched and passed through to the result. Since most methods fail for constant features, special behaviour for this case is implemented.
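A minimal sketch: rescale all numeric features of iris.task to the unit interval:
task = normalizeFeatures(iris.task, method = "range", range = c(0, 1))
summary(getTaskData(task))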
Usage
normalizeFeatures(
obj,
target = character(0L),
method = "standardize",
cols = NULL,
range = c(0, 1),
on.constant = "quiet"
)
Arguments
obj |
(data.frame | Task) |
target |
(character) Name(s) of the target column(s). Only used when obj is a data.frame. Default is character(0). |
method |
(character(1)) Normalizing method: “center”, “scale”, “standardize” or “range”. Default is “standardize”. |
cols |
(character) Columns to normalize. Default is all numeric columns. |
range |
(numeric(2)) Lower and upper bound for method “range”. Default is c(0, 1). |
on.constant |
(character(1)) How to handle constant features: “quiet”, “warn” or “stop”. Default is “quiet”. |
Value
data.frame | Task. Same type as obj.
See Also
Other eda_and_preprocess: capLargeValues(), createDummyFeatures(), dropFeatures(), mergeSmallFactorLevels(), removeConstantFeatures(), summarizeColumns(), summarizeLevels()
Over- or undersample binary classification task to handle class imbalancy.
Description
Oversampling: For a given class (usually the smaller one), all existing observations are kept and extra observations are added by randomly sampling with replacement from this class.
Undersampling: For a given class (usually the larger one), the number of observations is reduced (downsampled) by randomly sampling without replacement from this class.
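A minimal sketch on the binary sonar.task:
table(getTaskTargets(sonar.task))        # original class sizes
task.over = oversample(sonar.task, rate = 2)
table(getTaskTargets(task.over))         # minority class doubled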
Usage
oversample(task, rate, cl = NULL)
undersample(task, rate, cl = NULL)
Arguments
task |
(Task) |
rate |
(numeric(1)) Factor to up- or downsample the class: at least 1 for oversampling, at most 1 for undersampling. |
cl |
(character(1)) Which class should be over- or undersampled. Default (NULL) selects the smaller class for oversampling and the larger one for undersampling. |
Value
Task.
See Also
Other imbalancy: makeOverBaggingWrapper(), makeUndersampleWrapper(), smote()
Supported parallelization methods
Description
mlr supports different methods to activate parallel computing capabilities through the integration of the parallelMap::parallelMap package, which supports all major parallelization backends for R.
You can start parallelization with parallelStart*, where * should be replaced with the chosen backend. parallelMap::parallelStop is used to stop all parallelization backends.
Parallelization is divided into different levels and will automatically be carried out for the first level that occurs, e.g. if you call resample() after parallelMap::parallelStart, each resampling iteration is a parallel job and possible underlying calls like parameter tuning won't be parallelized further.
The supported levels of parallelization are:
"mlr.resample"
Each resampling iteration (a train/test step) is a parallel job.
"mlr.benchmark"
Each experiment "run this learner on this data set" is a parallel job.
"mlr.tuneParams"
Each evaluation in hyperparameter space "resample with these parameter settings" is a parallel job. How many of these can be run independently in parallel depends on the tuning algorithm. For grid search or random search there is no limit, but for other tuners it depends on how many points to evaluate are produced in each iteration of the optimization. If a tuner works in a purely sequential fashion, we cannot work magic and the hyperparameter evaluation will also run sequentially. But note that you can still parallelize the underlying resampling.
"mlr.selectFeatures"
Each evaluation in feature space "resample with this feature subset" is a parallel job. The same comments as for
"mlr.tuneParams"
apply here."mlr.ensemble"
For all ensemble methods, the training and prediction of each individual learner is a parallel job. Supported ensemble methods are the makeBaggingWrapper, makeCostSensRegrWrapper, makeMulticlassWrapper, makeMultilabelBinaryRelevanceWrapper and the makeOverBaggingWrapper.
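A minimal sketch that restricts parallelization to the resampling level (assuming the parallelMap package is installed):
library(parallelMap)
parallelStartSocket(2, level = "mlr.resample")  # only resampling iterations run in parallel
r = resample("classif.rpart", iris.task, makeResampleDesc("CV", iters = 4))
parallelStop()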
Measure performance of prediction.
Description
Measures the quality of a prediction w.r.t. some performance measure.
Usage
performance(
pred,
measures,
task = NULL,
model = NULL,
feats = NULL,
simpleaggr = FALSE
)
Arguments
pred |
(Prediction) |
measures |
(Measure | list of Measure) |
task |
(Task) |
model |
(WrappedModel) |
feats |
(data.frame) |
simpleaggr |
(logical) |
Value
(named numeric). Performance value(s), named by measure(s).
See Also
Other performance: ConfusionMatrix, calculateConfusionMatrix(), calculateROCMeasures(), estimateRelativeOverfitting(), makeCostMeasure(), makeCustomResampledMeasure(), makeMeasure(), measures, setAggregation(), setMeasurePars()
Examples
training.set = seq(1, nrow(iris), by = 2)
test.set = seq(2, nrow(iris), by = 2)
task = makeClassifTask(data = iris, target = "Species")
lrn = makeLearner("classif.lda")
mod = train(lrn, task, subset = training.set)
pred = predict(mod, newdata = iris[test.set, ])
performance(pred, measures = mmce)
# Compute multiple performance measures at once
ms = list("mmce" = mmce, "acc" = acc, "timetrain" = timetrain)
performance(pred, measures = ms, task, mod)
Phoneme functional data multilabel classification task.
Description
Contains the task (phoneme.task).
The task contains a single functional covariate and 5 equally sized classes (aa, ao, dcl, iy, sh).
The aim is to predict the class of the phoneme from the functional covariate.
The dataset is contained in the package fda.usc.
References
F. Ferraty and P. Vieu (2003) "Curve discrimination: a nonparametric functional approach", Computational Statistics and Data Analysis, 44(1-2), 161-173.
F. Ferraty and P. Vieu (2006) Nonparametric functional data analysis, New York: Springer.
T. Hastie, R. Tibshirani and J. Friedman (2009) The elements of statistical learning: Data mining, inference and prediction, 2nd edn, New York: Springer.
PimaIndiansDiabetes classification task.
Description
Contains the task (pid.task).
References
See mlbench::PimaIndiansDiabetes. Note that this is the uncorrected version from mlbench.
Create box or violin plots for a BenchmarkResult.
Description
Plots box or violin plots for a selected measure across all iterations of the resampling strategy, faceted by the task.id.
Usage
plotBMRBoxplots(
bmr,
measure = NULL,
style = "box",
order.lrns = NULL,
order.tsks = NULL,
pretty.names = TRUE,
facet.wrap.nrow = NULL,
facet.wrap.ncol = NULL
)
Arguments
bmr |
(BenchmarkResult) |
measure |
(Measure) |
style |
( |
order.lrns |
( |
order.tsks |
( |
pretty.names |
( |
facet.wrap.nrow , facet.wrap.ncol |
(integer) |
Value
ggplot2 plot object.
See Also
Other plot: createSpatialResamplingPlots(), plotBMRRanksAsBarChart(), plotBMRSummary(), plotCalibration(), plotCritDifferences(), plotLearningCurve(), plotPartialDependence(), plotROCCurves(), plotResiduals(), plotThreshVsPerf()
Other benchmark: BenchmarkResult, batchmark(), benchmark(), convertBMRToRankMatrix(), friedmanPostHocTestBMR(), friedmanTestBMR(), generateCritDifferencesData(), getBMRAggrPerformances(), getBMRFeatSelResults(), getBMRFilteredFeatures(), getBMRLearnerIds(), getBMRLearnerShortNames(), getBMRLearners(), getBMRMeasureIds(), getBMRMeasures(), getBMRModels(), getBMRPerformances(), getBMRPredictions(), getBMRTaskDescs(), getBMRTaskIds(), getBMRTuneResults(), plotBMRRanksAsBarChart(), plotBMRSummary(), plotCritDifferences(), reduceBatchmarkResults()
Examples
# see benchmark
Create a bar chart for ranks in a BenchmarkResult.
Description
Plots a bar chart from the ranks of algorithms. Alternatively, tiles can be plotted for every rank-task combination, see pos for details. In all plot variants the ranks of the learning algorithms are displayed on the x-axis. Areas are always colored according to the learner.id.
Usage
plotBMRRanksAsBarChart(
bmr,
measure = NULL,
ties.method = "average",
aggregation = "default",
pos = "stack",
order.lrns = NULL,
order.tsks = NULL,
pretty.names = TRUE
)
Arguments
bmr |
(BenchmarkResult) |
measure |
(Measure) |
ties.method |
( |
aggregation |
( |
pos |
( |
order.lrns |
( |
order.tsks |
( |
pretty.names |
( |
Value
ggplot2 plot object.
See Also
Other plot: createSpatialResamplingPlots(), plotBMRBoxplots(), plotBMRSummary(), plotCalibration(), plotCritDifferences(), plotLearningCurve(), plotPartialDependence(), plotROCCurves(), plotResiduals(), plotThreshVsPerf()
Other benchmark: BenchmarkResult, batchmark(), benchmark(), convertBMRToRankMatrix(), friedmanPostHocTestBMR(), friedmanTestBMR(), generateCritDifferencesData(), getBMRAggrPerformances(), getBMRFeatSelResults(), getBMRFilteredFeatures(), getBMRLearnerIds(), getBMRLearnerShortNames(), getBMRLearners(), getBMRMeasureIds(), getBMRMeasures(), getBMRModels(), getBMRPerformances(), getBMRPredictions(), getBMRTaskDescs(), getBMRTaskIds(), getBMRTuneResults(), plotBMRBoxplots(), plotBMRSummary(), plotCritDifferences(), reduceBatchmarkResults()
Examples
# see benchmark
Plot a benchmark summary.
Description
Creates a scatter plot where each line refers to a task; on that line, the aggregated scores of all learners for that task are plotted. Optionally, you can apply a rank transformation or just use one of ggplot2's transformations like ggplot2::scale_x_log10.
Usage
plotBMRSummary(
bmr,
measure = NULL,
trafo = "none",
order.tsks = NULL,
pointsize = 4L,
jitter = 0.05,
pretty.names = TRUE
)
Arguments
bmr |
(BenchmarkResult) |
measure |
(Measure) |
trafo |
( |
order.tsks |
( |
pointsize |
( |
jitter |
( |
pretty.names |
( |
Value
ggplot2 plot object.
See Also
Other benchmark: BenchmarkResult, batchmark(), benchmark(), convertBMRToRankMatrix(), friedmanPostHocTestBMR(), friedmanTestBMR(), generateCritDifferencesData(), getBMRAggrPerformances(), getBMRFeatSelResults(), getBMRFilteredFeatures(), getBMRLearnerIds(), getBMRLearnerShortNames(), getBMRLearners(), getBMRMeasureIds(), getBMRMeasures(), getBMRModels(), getBMRPerformances(), getBMRPredictions(), getBMRTaskDescs(), getBMRTaskIds(), getBMRTuneResults(), plotBMRBoxplots(), plotBMRRanksAsBarChart(), plotCritDifferences(), reduceBatchmarkResults()
Other plot: createSpatialResamplingPlots(), plotBMRBoxplots(), plotBMRRanksAsBarChart(), plotCalibration(), plotCritDifferences(), plotLearningCurve(), plotPartialDependence(), plotROCCurves(), plotResiduals(), plotThreshVsPerf()
Examples
# see benchmark
Plot calibration data using ggplot2.
Description
Plots calibration data from generateCalibrationData.
Usage
plotCalibration(
obj,
smooth = FALSE,
reference = TRUE,
rag = TRUE,
facet.wrap.nrow = NULL,
facet.wrap.ncol = NULL
)
Arguments
obj |
(CalibrationData) |
smooth |
( |
reference |
( |
rag |
( |
facet.wrap.nrow , facet.wrap.ncol |
(integer) |
Value
ggplot2 plot object.
See Also
Other plot: createSpatialResamplingPlots(), plotBMRBoxplots(), plotBMRRanksAsBarChart(), plotBMRSummary(), plotCritDifferences(), plotLearningCurve(), plotPartialDependence(), plotROCCurves(), plotResiduals(), plotThreshVsPerf()
Other calibration: generateCalibrationData()
Examples
## Not run:
lrns = list(makeLearner("classif.rpart", predict.type = "prob"),
makeLearner("classif.nnet", predict.type = "prob"))
fit = lapply(lrns, train, task = iris.task)
pred = lapply(fit, predict, task = iris.task)
names(pred) = c("rpart", "nnet")
out = generateCalibrationData(pred, groups = 3)
plotCalibration(out)
fit = lapply(lrns, train, task = sonar.task)
pred = lapply(fit, predict, task = sonar.task)
names(pred) = c("rpart", "lda")
out = generateCalibrationData(pred)
plotCalibration(out)
## End(Not run)
Plot critical differences for a selected measure.
Description
Plots a critical-differences diagram for all classifiers and a selected measure. If a baseline is selected for the Bonferroni-Dunn test, the critical difference interval will be positioned around the baseline. If not, the best performing algorithm will be chosen as baseline.
The positioning of some descriptive elements can be moved by modifying the generated data.
Usage
plotCritDifferences(obj, baseline = NULL, pretty.names = TRUE)
Arguments
obj |
( |
baseline |
( |
pretty.names |
( |
Value
ggplot2 plot object.
References
Janez Demsar, Statistical Comparisons of Classifiers over Multiple Data Sets, JMLR, 2006
See Also
Other plot: createSpatialResamplingPlots(), plotBMRBoxplots(), plotBMRRanksAsBarChart(), plotBMRSummary(), plotCalibration(), plotLearningCurve(), plotPartialDependence(), plotROCCurves(), plotResiduals(), plotThreshVsPerf()
Other benchmark: BenchmarkResult, batchmark(), benchmark(), convertBMRToRankMatrix(), friedmanPostHocTestBMR(), friedmanTestBMR(), generateCritDifferencesData(), getBMRAggrPerformances(), getBMRFeatSelResults(), getBMRFilteredFeatures(), getBMRLearnerIds(), getBMRLearnerShortNames(), getBMRLearners(), getBMRMeasureIds(), getBMRMeasures(), getBMRModels(), getBMRPerformances(), getBMRPredictions(), getBMRTaskDescs(), getBMRTaskIds(), getBMRTuneResults(), plotBMRBoxplots(), plotBMRRanksAsBarChart(), plotBMRSummary(), reduceBatchmarkResults()
Examples
# see benchmark
Plot filter values using ggplot2.
Description
Plot filter values using ggplot2.
Usage
plotFilterValues(
fvalues,
sort = "dec",
n.show = nrow(fvalues$data),
filter = NULL,
feat.type.cols = FALSE
)
Arguments
fvalues |
(FilterValues) |
sort |
(character(1)) Sort order of the features: “dec” (decreasing), “inc” (increasing) or “none”. Default is “dec” (decreasing). |
n.show |
( |
filter |
( |
feat.type.cols |
( |
Value
ggplot2 plot object.
See Also
Other filter: filterFeatures(), generateFilterValuesData(), getFilteredFeatures(), listFilterEnsembleMethods(), listFilterMethods(), makeFilter(), makeFilterEnsemble(), makeFilterWrapper()
Other generate_plot_data: generateCalibrationData(), generateCritDifferencesData(), generateFeatureImportanceData(), generateFilterValuesData(), generateLearningCurveData(), generatePartialDependenceData(), generateThreshVsPerfData()
Examples
fv = generateFilterValuesData(iris.task, method = "variance")
plotFilterValues(fv)
Plot the hyperparameter effects data
Description
Plot hyperparameter validation path. Automated plotting method for HyperParsEffectData objects. Useful for determining the importance or effect of a particular hyperparameter on some performance measure and/or optimizer.
Usage
plotHyperParsEffect(
hyperpars.effect.data,
x = NULL,
y = NULL,
z = NULL,
plot.type = "scatter",
loess.smooth = FALSE,
facet = NULL,
global.only = TRUE,
interpolate = NULL,
show.experiments = FALSE,
show.interpolated = FALSE,
nested.agg = mean,
partial.dep.learn = NULL
)
Arguments
hyperpars.effect.data |
( |
x |
( |
y |
( |
z |
( |
plot.type |
( |
loess.smooth |
( |
facet |
( |
global.only |
( |
interpolate |
(Learner | |
show.experiments |
( |
show.interpolated |
( |
nested.agg |
( |
partial.dep.learn |
(Learner | |
Value
ggplot2 plot object.
Note
Any NAs incurred from learning algorithm crashes will be indicated in the plot (except in the case of partial dependence), and the NA values will be replaced with the column min/max depending on the optimal values for the respective measure. Execution time will be replaced with the max. Interpolation by its nature will result in predicted values for the performance measure; use interpolation with caution. If “partial.dep” is set to TRUE in generateHyperParsEffectData, only partial dependence will be plotted.
Since a ggplot2 plot object is returned, the user can change the axis labels and other aspects of the plot using the appropriate ggplot2 syntax.
Examples
# see generateHyperParsEffectData
Visualizes a learning algorithm on a 1D or 2D data set.
Description
Trains the model for 1 or 2 selected features, then displays it via ggplot2::ggplot. Good for teaching or exploring models.
For classification and clustering, only 2D plots are supported. The data points, the classification and, potentially through color alpha blending, the posterior probabilities are shown.
For regression, 1D and 2D plots are supported. 1D shows the data, the estimated mean and potentially the estimated standard error. 2D does not show the estimated standard error; only the estimated mean is shown via background color.
The plot title displays the model id, its parameters, the training performance and the cross-validation performance.
Usage
plotLearnerPrediction(
learner,
task,
features = NULL,
measures,
cv = 10L,
...,
gridsize,
pointsize = 2,
prob.alpha = TRUE,
se.band = TRUE,
err.mark = "train",
bg.cols = c("darkblue", "green", "darkred"),
err.col = "white",
err.size = pointsize,
greyscale = FALSE,
pretty.names = TRUE
)
Arguments
learner |
(Learner | |
task |
(Task) |
features |
(character) |
measures |
(Measure | list of Measure) |
cv |
( |
... |
(any) |
gridsize |
( |
pointsize |
( |
prob.alpha |
( |
se.band |
( |
err.mark |
( |
bg.cols |
( |
err.col |
( |
err.size |
( |
greyscale |
( |
pretty.names |
( |
Value
The ggplot2 object.
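Examples
# a minimal sketch: visualize a decision tree on two iris features
plotLearnerPrediction("classif.rpart", iris.task,
  features = c("Petal.Length", "Petal.Width"))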
Plot learning curve data using ggplot2.
Description
Visualizes data size (percentage used for model) vs. performance measure(s).
Usage
plotLearningCurve(
obj,
facet = "measure",
pretty.names = TRUE,
facet.wrap.nrow = NULL,
facet.wrap.ncol = NULL
)
Arguments
obj |
(LearningCurveData) |
facet |
( |
pretty.names |
( |
facet.wrap.nrow , facet.wrap.ncol |
(integer) |
Value
ggplot2 plot object.
See Also
Other learning_curve: generateLearningCurveData()
Other plot: createSpatialResamplingPlots(), plotBMRBoxplots(), plotBMRRanksAsBarChart(), plotBMRSummary(), plotCalibration(), plotCritDifferences(), plotPartialDependence(), plotROCCurves(), plotResiduals(), plotThreshVsPerf()
Plot a partial dependence with ggplot2.
Description
Plot a partial dependence from generatePartialDependenceData using ggplot2.
Usage
plotPartialDependence(
obj,
geom = "line",
facet = NULL,
facet.wrap.nrow = NULL,
facet.wrap.ncol = NULL,
p = 1,
data = NULL
)
Arguments
obj |
PartialDependenceData |
geom |
( |
facet |
( |
facet.wrap.nrow , facet.wrap.ncol |
(integer) |
p |
( |
data |
(data.frame) |
Value
ggplot2 plot object.
See Also
Other partial_dependence: generatePartialDependenceData()
Other plot: createSpatialResamplingPlots(), plotBMRBoxplots(), plotBMRRanksAsBarChart(), plotBMRSummary(), plotCalibration(), plotCritDifferences(), plotLearningCurve(), plotROCCurves(), plotResiduals(), plotThreshVsPerf()
Plots a ROC curve using ggplot2.
Description
Plots a ROC curve from predictions.
Usage
plotROCCurves(
obj,
measures,
diagonal = TRUE,
pretty.names = TRUE,
facet.learner = FALSE
)
Arguments
obj |
(ThreshVsPerfData) |
measures |
(list(2) of Measure) |
diagonal |
( |
pretty.names |
( |
facet.learner |
( |
Value
ggplot2 plot object.
See Also
Other plot: createSpatialResamplingPlots(), plotBMRBoxplots(), plotBMRRanksAsBarChart(), plotBMRSummary(), plotCalibration(), plotCritDifferences(), plotLearningCurve(), plotPartialDependence(), plotResiduals(), plotThreshVsPerf()
Other thresh_vs_perf: generateThreshVsPerfData(), plotThreshVsPerf()
Examples
lrn = makeLearner("classif.rpart", predict.type = "prob")
fit = train(lrn, sonar.task)
pred = predict(fit, task = sonar.task)
roc = generateThreshVsPerfData(pred, list(fpr, tpr))
plotROCCurves(roc)
r = bootstrapB632plus(lrn, sonar.task, iters = 3)
roc_r = generateThreshVsPerfData(r, list(fpr, tpr), aggregate = FALSE)
plotROCCurves(roc_r)
r2 = crossval(lrn, sonar.task, iters = 3)
roc_l = generateThreshVsPerfData(list(boot = r, cv = r2), list(fpr, tpr), aggregate = FALSE)
plotROCCurves(roc_l)
Create residual plots for prediction objects or benchmark results.
Description
Plots for model diagnostics. Provides scatterplots of true vs. predicted values and histograms of the model's residuals.
Usage
plotResiduals(
obj,
type = "scatterplot",
loess.smooth = TRUE,
rug = TRUE,
pretty.names = TRUE
)
Arguments
obj |
(Prediction | BenchmarkResult) |
type |
Type of plot: “scatterplot” (the default) or “hist” for a histogram (or, in case of classification problems, a barplot) displaying the residuals. |
loess.smooth |
( |
rug |
( |
pretty.names |
( |
Value
ggplot2 plot object.
See Also
Other plot: createSpatialResamplingPlots(), plotBMRBoxplots(), plotBMRRanksAsBarChart(), plotBMRSummary(), plotCalibration(), plotCritDifferences(), plotLearningCurve(), plotPartialDependence(), plotROCCurves(), plotThreshVsPerf()
Plot threshold vs. performance(s) for 2-class classification using ggplot2.
Description
Plots threshold vs. performance(s) data that has been generated with generateThreshVsPerfData.
Usage
plotThreshVsPerf(
obj,
measures = obj$measures,
facet = "measure",
mark.th = NA_real_,
pretty.names = TRUE,
facet.wrap.nrow = NULL,
facet.wrap.ncol = NULL
)
Arguments
obj |
(ThreshVsPerfData) |
measures |
(Measure | list of Measure) |
facet |
( |
mark.th |
( |
pretty.names |
( |
facet.wrap.nrow , facet.wrap.ncol |
(integer) |
Value
ggplot2 plot object.
See Also
Other plot: createSpatialResamplingPlots(), plotBMRBoxplots(), plotBMRRanksAsBarChart(), plotBMRSummary(), plotCalibration(), plotCritDifferences(), plotLearningCurve(), plotPartialDependence(), plotROCCurves(), plotResiduals()
Other thresh_vs_perf: generateThreshVsPerfData(), plotROCCurves()
Examples
lrn = makeLearner("classif.rpart", predict.type = "prob")
mod = train(lrn, sonar.task)
pred = predict(mod, sonar.task)
pvs = generateThreshVsPerfData(pred, list(acc, setAggregation(acc, train.mean)))
plotThreshVsPerf(pvs)
Plots multi-criteria results after tuning using ggplot2.
Description
Visualizes the Pareto front and possibly the dominated points.
Usage
plotTuneMultiCritResult(
res,
path = TRUE,
col = NULL,
shape = NULL,
pointsize = 2,
pretty.names = TRUE
)
Arguments
res |
(TuneMultiCritResult) |
path |
( |
col |
( |
shape |
( |
pointsize |
( |
pretty.names |
( |
Value
ggplot2 plot object.
See Also
Other tune_multicrit: TuneMultiCritControl, tuneParamsMultiCrit()
Examples
# see tuneParamsMultiCrit
Predict new data.
Description
Predict the target variable of new data using a fitted model.
What exactly is stored in the (Prediction) object depends on the predict.type setting of the Learner. If predict.type was set to “prob”, probability thresholding can be done by calling the setThreshold function on the prediction object.
The row names of the input task or newdata are preserved in the output.
Usage
## S3 method for class 'WrappedModel'
predict(object, task, newdata, subset = NULL, ...)
Arguments
object |
(WrappedModel) |
task |
(Task) |
newdata |
(data.frame) |
subset |
(integer | logical | |
... |
(any) |
Value
(Prediction).
See Also
Other predict: asROCRPrediction(), getPredictionProbabilities(), getPredictionResponse(), getPredictionTaskDesc(), setPredictThreshold(), setPredictType()
Examples
# train and predict
train.set = seq(1, 150, 2)
test.set = seq(2, 150, 2)
model = train("classif.lda", iris.task, subset = train.set)
p = predict(model, newdata = iris, subset = test.set)
print(p)
predict(model, task = iris.task, subset = test.set)
# predict now probabilities instead of class labels
lrn = makeLearner("classif.lda", predict.type = "prob")
model = train(lrn, iris.task, subset = train.set)
p = predict(model, task = iris.task, subset = test.set)
print(p)
getPredictionProbabilities(p)
Predict new data with an R learner.
Description
Mainly for internal use. Predict new data with a fitted model. You have to implement this method if you want to add another learner to this package.
Usage
predictLearner(.learner, .model, .newdata, ...)
Arguments
.learner |
(RLearner) |
.model |
(WrappedModel) |
.newdata |
(data.frame) |
... |
(any) |
Details
Your implementation must adhere to the following: predictions for the observations in .newdata must be made based on the fitted model (.model$learner.model). All parameters in ... must be passed to the underlying predict function.
Value
For classification: Either a factor with class labels for type “response” or, if the learner supports this, a matrix of class probabilities for type “prob”. In the latter case the columns must be named with the class labels.
For regression: Either a numeric vector for type “response” or, if the learner supports this, a matrix with two columns for type “se”. In the latter case the first column contains the estimated response (mean value) and the second column the estimated standard errors.
For survival: Either a numeric vector with some sort of orderable risk for type “response” or, if supported, a numeric vector with time dependent probabilities for type “prob”.
For clustering: Either an integer with cluster IDs for type “response” or, if supported, a matrix of membership probabilities for type “prob”.
For multilabel: A logical matrix that indicates predicted class labels for type “response” or, if supported, a matrix of class probabilities for type “prob”. The columns must be named with the class labels.
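As a sketch, a predictLearner method for a hypothetical custom classifier could look as follows (the learner name and the underlying model are illustrative only, not part of mlr):
predictLearner.classif.myLearner = function(.learner, .model, .newdata, ...) {
  # predict.type "response": return a factor of predicted class labels
  p = predict(.model$learner.model, newdata = .newdata, ...)
  as.factor(p)
}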
Reduce results of a batch-distributed benchmark.
Description
This creates a BenchmarkResult from a batchtools::ExperimentRegistry. To set up the benchmark, have a look at batchmark.
Usage
reduceBatchmarkResults(
ids = NULL,
keep.pred = TRUE,
keep.extract = FALSE,
show.info = getMlrOption("show.info"),
reg = batchtools::getDefaultRegistry()
)
Arguments
ids |
(data.frame or integer) |
keep.pred |
( |
keep.extract |
( |
show.info |
( |
reg |
(batchtools::ExperimentRegistry) |
Value
(BenchmarkResult).
See Also
Other benchmark: BenchmarkResult, batchmark(), benchmark(), convertBMRToRankMatrix(), friedmanPostHocTestBMR(), friedmanTestBMR(), generateCritDifferencesData(), getBMRAggrPerformances(), getBMRFeatSelResults(), getBMRFilteredFeatures(), getBMRLearnerIds(), getBMRLearnerShortNames(), getBMRLearners(), getBMRMeasureIds(), getBMRMeasures(), getBMRModels(), getBMRPerformances(), getBMRPredictions(), getBMRTaskDescs(), getBMRTaskIds(), getBMRTuneResults(), plotBMRBoxplots(), plotBMRRanksAsBarChart(), plotBMRSummary(), plotCritDifferences()
Re-extract features from a data set
Description
This function accepts a data frame or a task and an extractFDAFeatDesc (an FDA feature extraction description) as returned by extractFDAFeatures to extract features from previously unseen data.
Usage
reextractFDAFeatures(obj, desc, ...)
Arguments
obj |
(Task | data.frame) |
desc |
( |
... |
(any) |
Value
data.frame or Task containing the extracted features.
Re-impute a data set
Description
This function accepts a data frame or a task and an imputation description as returned by impute to perform the following actions:
Restore dropped columns, setting them to NA.
Add dummy variables for columns as specified in impute.
Optionally check factors for new levels to treat them as NAs.
Reorder factor levels to ensure identical integer representation as before.
Impute missing values using previously collected data.
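A minimal sketch: learn an imputation on training data, then re-apply exactly the same imputation to new data:
d = iris
d[1, 1] = NA
imp = impute(d[1:100, ], target = "Species", classes = list(numeric = imputeMean()))
newdata.imputed = reimpute(d[101:150, ], imp$desc)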
Usage
reimpute(obj, desc)
Arguments
obj |
(data.frame | Task) |
desc |
( |
Value
Imputed data.frame or task with imputed data.
See Also
Other impute: imputations, impute(), makeImputeMethod(), makeImputeWrapper()
Remove constant features from a data set.
Description
Constant features can lead to errors in some models and obviously provide no information in the training set that can be learned from. With the argument “perc”, features can also be removed for which less than “perc” percent of the observations differ from the mode value.
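A minimal sketch:
d = iris
d$const = 1  # add a constant feature
task = makeClassifTask(data = d, target = "Species")
task = removeConstantFeatures(task)
getTaskFeatureNames(task)  # "const" has been removed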
Usage
removeConstantFeatures(
obj,
perc = 0,
dont.rm = character(0L),
na.ignore = FALSE,
wrap.tol = .Machine$double.eps^0.5,
show.info = getMlrOption("show.info"),
...
)
Arguments
obj |
(data.frame | Task) |
perc |
( |
dont.rm |
(character) |
na.ignore |
( |
wrap.tol |
( |
show.info |
( |
... |
To ensure backward compatibility with old argument |
Value
data.frame | Task. Same type as obj
.
See Also
Other eda_and_preprocess: capLargeValues(), createDummyFeatures(), dropFeatures(), mergeSmallFactorLevels(), normalizeFeatures(), summarizeColumns(), summarizeLevels()
Remove hyperparameters settings of a learner.
Description
Remove settings (previously set through mlr) for some parameters, which means that the default behavior will be used for those parameters again.
Usage
removeHyperPars(learner, ids = character(0L))
Arguments
learner |
(Learner | |
ids |
(character) |
Value
Learner.
See Also
Other learner: LearnerProperties, getClassWeightParam(), getHyperPars(), getLearnerId(), getLearnerNote(), getLearnerPackages(), getLearnerParVals(), getLearnerParamSet(), getLearnerPredictType(), getLearnerShortName(), getLearnerType(), getParamSet(), helpLearner(), helpLearnerParam(), makeLearner(), makeLearners(), setHyperPars(), setId(), setLearnerId(), setPredictThreshold(), setPredictType()
Fit models according to a resampling strategy.
Description
The function resample fits a model specified by Learner on a Task and calculates predictions and performance measures for all training and all test sets specified by either a resampling description (ResampleDesc) or a resampling instance (ResampleInstance).
You can return all fitted models (parameter models) or extract specific parts of the models (parameter extract), as returning all of them completely might be memory intensive.
The remaining functions on this page are convenience wrappers for the various existing resampling strategies. Note that if you need to work with precomputed training and test splits (i.e., resampling instances), you have to stick with resample.
Usage
resample(
learner,
task,
resampling,
measures,
weights = NULL,
models = FALSE,
extract,
keep.pred = TRUE,
...,
show.info = getMlrOption("show.info")
)
crossval(
learner,
task,
iters = 10L,
stratify = FALSE,
measures,
models = FALSE,
keep.pred = TRUE,
...,
show.info = getMlrOption("show.info")
)
repcv(
learner,
task,
folds = 10L,
reps = 10L,
stratify = FALSE,
measures,
models = FALSE,
keep.pred = TRUE,
...,
show.info = getMlrOption("show.info")
)
holdout(
learner,
task,
split = 2/3,
stratify = FALSE,
measures,
models = FALSE,
keep.pred = TRUE,
...,
show.info = getMlrOption("show.info")
)
subsample(
learner,
task,
iters = 30,
split = 2/3,
stratify = FALSE,
measures,
models = FALSE,
keep.pred = TRUE,
...,
show.info = getMlrOption("show.info")
)
bootstrapOOB(
learner,
task,
iters = 30,
stratify = FALSE,
measures,
models = FALSE,
keep.pred = TRUE,
...,
show.info = getMlrOption("show.info")
)
bootstrapB632(
learner,
task,
iters = 30,
stratify = FALSE,
measures,
models = FALSE,
keep.pred = TRUE,
...,
show.info = getMlrOption("show.info")
)
bootstrapB632plus(
learner,
task,
iters = 30,
stratify = FALSE,
measures,
models = FALSE,
keep.pred = TRUE,
...,
show.info = getMlrOption("show.info")
)
growingcv(
learner,
task,
horizon = 1,
initial.window = 0.5,
skip = 0,
measures,
models = FALSE,
keep.pred = TRUE,
...,
show.info = getMlrOption("show.info")
)
fixedcv(
learner,
task,
horizon = 1L,
initial.window = 0.5,
skip = 0,
measures,
models = FALSE,
keep.pred = TRUE,
...,
show.info = getMlrOption("show.info")
)
Arguments
learner |
(Learner | |
task |
(Task) |
resampling |
(ResampleDesc or ResampleInstance) |
measures |
(Measure | list of Measure) |
weights |
(numeric) |
models |
( |
extract |
( |
keep.pred |
( |
... |
(any) |
show.info |
( |
iters |
( |
stratify |
( |
folds |
( |
reps |
( |
split |
( |
horizon |
( |
initial.window |
( |
skip |
( |
Value
(ResampleResult).
Note
If you would like to include results from the training data set, make sure to appropriately adjust the resampling strategy and the aggregation for the measure. See example code below.
See Also
Other resample: ResamplePrediction, ResampleResult, addRRMeasure(), getRRPredictionList(), getRRPredictions(), getRRTaskDesc(), getRRTaskDescription(), makeResampleDesc(), makeResampleInstance()
Examples
task = makeClassifTask(data = iris, target = "Species")
rdesc = makeResampleDesc("CV", iters = 2)
r = resample(makeLearner("classif.qda"), task, rdesc)
print(r$aggr)
print(r$measures.test)
print(r$pred)
# include the training set performance as well
rdesc = makeResampleDesc("CV", iters = 2, predict = "both")
r = resample(makeLearner("classif.qda"), task, rdesc,
measures = list(mmce, setAggregation(mmce, train.mean)))
print(r$aggr)
Feature selection by wrapper approach.
Description
Optimizes the features for a classification or regression problem by choosing a variable selection wrapper approach. Allows for different optimization methods, such as forward search or a genetic algorithm. You can select such an algorithm (and its settings) by passing a corresponding control object. For a complete list of implemented algorithms look at the subclasses of (FeatSelControl).
All algorithms operate on a 0-1-bit encoding of candidate solutions. Per default a single bit corresponds to a single feature, but you can change this by using the arguments bit.names and bits.to.features, allowing you to switch on whole groups of features with a single bit.
Usage
selectFeatures(
learner,
task,
resampling,
measures,
bit.names,
bits.to.features,
control,
show.info = getMlrOption("show.info")
)
Arguments
learner |
(Learner | |
task |
(Task) |
resampling |
(ResampleInstance | ResampleDesc) |
measures |
(list of Measure | Measure) |
bit.names |
character |
bits.to.features |
( |
control |
(FeatSelControl) Control object for search method. Also selects the optimization algorithm for feature selection. |
show.info |
( |
Value
(FeatSelResult).
See Also
Other featsel: FeatSelControl, analyzeFeatSelResult(), getFeatSelResult(), makeFeatSelWrapper()
Examples
rdesc = makeResampleDesc("Holdout")
ctrl = makeFeatSelControlSequential(method = "sfs", maxit = NA)
res = selectFeatures("classif.rpart", iris.task, rdesc, control = ctrl)
analyzeFeatSelResult(res)
Set aggregation function of measure.
Description
Set how this measure will be aggregated after resampling. To see possible aggregation functions: aggregations.
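For example, to aggregate mmce over the training sets instead of the test sets (a sketch; this requires a resampling description with predict = "both"):
mmce.train = setAggregation(mmce, train.mean)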
Usage
setAggregation(measure, aggr)
Arguments
measure |
(Measure) |
aggr |
(Aggregation) |
Value
(Measure) with changed aggregation behaviour.
See Also
Other performance: ConfusionMatrix, calculateConfusionMatrix(), calculateROCMeasures(), estimateRelativeOverfitting(), makeCostMeasure(), makeCustomResampledMeasure(), makeMeasure(), measures, performance(), setMeasurePars()
Set the hyperparameters of a learner object.
Description
Set the hyperparameters of a learner object.
Usage
setHyperPars(learner, ..., par.vals = list())
Arguments
learner |
(Learner | |
... |
(any) |
par.vals |
(list) |
Value
Learner.
Note
If a named (hyper)parameter can't be found for the given learner, the 3 closest (hyper)parameter names will be output in case the user mistyped.
See Also
Other learner: LearnerProperties, getClassWeightParam(), getHyperPars(), getLearnerId(), getLearnerNote(), getLearnerPackages(), getLearnerParVals(), getLearnerParamSet(), getLearnerPredictType(), getLearnerShortName(), getLearnerType(), getParamSet(), helpLearner(), helpLearnerParam(), makeLearner(), makeLearners(), removeHyperPars(), setId(), setLearnerId(), setPredictThreshold(), setPredictType()
Examples
cl1 = makeLearner("classif.ksvm", sigma = 1)
cl2 = setHyperPars(cl1, sigma = 10, par.vals = list(C = 2))
print(cl1)
# note the now set and altered hyperparameters:
print(cl2)
Only exported for internal use.
Description
Only exported for internal use.
Usage
setHyperPars2(learner, par.vals)
Arguments
learner |
(Learner) |
par.vals |
(list) |
Set the id of a learner object.
Description
Deprecated, use setLearnerId instead.
Usage
setId(learner, id)
Arguments
learner |
(Learner | |
id |
( |
Value
Learner.
See Also
Other learner: LearnerProperties, getClassWeightParam(), getHyperPars(), getLearnerId(), getLearnerNote(), getLearnerPackages(), getLearnerParVals(), getLearnerParamSet(), getLearnerPredictType(), getLearnerShortName(), getLearnerType(), getParamSet(), helpLearner(), helpLearnerParam(), makeLearner(), makeLearners(), removeHyperPars(), setHyperPars(), setLearnerId(), setPredictThreshold(), setPredictType()
Set the ID of a learner object.
Description
Set the ID of the learner.
Usage
setLearnerId(learner, id)
Arguments
learner |
(Learner | |
id |
( |
Value
Learner.
See Also
Other learner: LearnerProperties, getClassWeightParam(), getHyperPars(), getLearnerId(), getLearnerNote(), getLearnerPackages(), getLearnerParVals(), getLearnerParamSet(), getLearnerPredictType(), getLearnerShortName(), getLearnerType(), getParamSet(), helpLearner(), helpLearnerParam(), makeLearner(), makeLearners(), removeHyperPars(), setHyperPars(), setId(), setPredictThreshold(), setPredictType()
Set parameters of performance measures
Description
Sets hyperparameters of measures.
Usage
setMeasurePars(measure, ..., par.vals = list())
Arguments
measure |
(Measure) |
... |
(any) |
par.vals |
(list) |
Value
Measure.
See Also
Other performance: ConfusionMatrix, calculateConfusionMatrix(), calculateROCMeasures(), estimateRelativeOverfitting(), makeCostMeasure(), makeCustomResampledMeasure(), makeMeasure(), measures, performance(), setAggregation()
Set the probability threshold the learner should use.
Description
See predict.threshold in makeLearner and setThreshold.
For complex wrappers, only the top-level predict.type is currently set.
Usage
setPredictThreshold(learner, predict.threshold)
Arguments
learner |
(Learner | |
predict.threshold |
(numeric) |
Value
Learner.
See Also
Other predict: asROCRPrediction(), getPredictionProbabilities(), getPredictionResponse(), getPredictionTaskDesc(), predict.WrappedModel(), setPredictType()
Other learner: LearnerProperties, getClassWeightParam(), getHyperPars(), getLearnerId(), getLearnerNote(), getLearnerPackages(), getLearnerParVals(), getLearnerParamSet(), getLearnerPredictType(), getLearnerShortName(), getLearnerType(), getParamSet(), helpLearner(), helpLearnerParam(), makeLearner(), makeLearners(), removeHyperPars(), setHyperPars(), setId(), setLearnerId(), setPredictType()
Set the type of predictions the learner should return.
Description
Possible prediction types are: Classification: labels or class probabilities (including labels). Regression: numeric response or standard errors (including numeric response). Survival: linear predictor or survival probability.
For complex wrappers, the predict type is usually also passed down to the encapsulated learner in a recursive fashion.
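A minimal sketch:
lrn = makeLearner("classif.rpart")
lrn = setPredictType(lrn, "prob")  # predictions now include class probabilities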
Usage
setPredictType(learner, predict.type)
Arguments
learner |
(Learner | |
predict.type |
( |
Value
Learner.
See Also
Other predict: asROCRPrediction(), getPredictionProbabilities(), getPredictionResponse(), getPredictionTaskDesc(), predict.WrappedModel(), setPredictThreshold()
Other learner: LearnerProperties, getClassWeightParam(), getHyperPars(), getLearnerId(), getLearnerNote(), getLearnerPackages(), getLearnerParVals(), getLearnerParamSet(), getLearnerPredictType(), getLearnerShortName(), getLearnerType(), getParamSet(), helpLearner(), helpLearnerParam(), makeLearner(), makeLearners(), removeHyperPars(), setHyperPars(), setId(), setLearnerId(), setPredictThreshold()
Set threshold of prediction object.
Description
Set threshold of prediction object for classification or multilabel classification. Creates the corresponding discrete class response for the newly set threshold.
For binary classification: the positive class is predicted if the probability value exceeds the threshold.
For multiclass: probabilities are divided by the corresponding thresholds and the class with the maximum resulting value is selected. The results of both are equivalent if, in the multi-threshold case, the values are greater than 0 and sum to 1.
For multilabel classification: a label is predicted (with entry TRUE) if a probability matrix entry exceeds the threshold of the corresponding label.
Usage
setThreshold(pred, threshold)
Arguments
pred |
(Prediction) |
threshold |
(numeric) |
Value
(Prediction) with changed threshold and corresponding response.
See Also
Examples
# create task and train learner (LDA)
task = makeClassifTask(data = iris, target = "Species")
lrn = makeLearner("classif.lda", predict.type = "prob")
mod = train(lrn, task)
# predict probabilities and compute performance
pred = predict(mod, newdata = iris)
performance(pred, measures = mmce)
head(as.data.frame(pred))
# adjust threshold and predict probabilities again
threshold = c(setosa = 0.4, versicolor = 0.3, virginica = 0.3)
pred = setThreshold(pred, threshold = threshold)
performance(pred, measures = mmce)
head(as.data.frame(pred))
Simplify measure names.
Description
Clips aggregation names from a character vector, e.g. 'mmce.test.mean' becomes 'mmce'. Elements that don't contain a measure name are ignored and returned unchanged.
Usage
simplifyMeasureNames(xs)
Arguments
xs |
(character) |
Value
(character).
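Examples
# a minimal sketch: aggregation suffixes are clipped, other names pass through
simplifyMeasureNames(c("mmce.test.mean", "acc.test.mean", "mymeasure"))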
Synthetic Minority Oversampling Technique to handle class imbalancy in binary classification.
Description
In each iteration, samples one minority-class element x1, then one of x1's nearest neighbors, x2. Both points are then interpolated (convex-combined), resulting in a new virtual data point x3 for the minority class.
The method also handles factor features. The Gower distance is used for nearest-neighbor calculation, see cluster::daisy. For interpolation, the new factor level for x3 is sampled from the two given levels of x1 and x2 per feature.
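A minimal sketch on the binary sonar.task:
table(getTaskTargets(sonar.task))        # original class sizes
task.smote = smote(sonar.task, rate = 2, nn = 5)
table(getTaskTargets(task.smote))        # minority class doubled with synthetic points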
Usage
smote(task, rate, nn = 5L, standardize = TRUE, alt.logic = FALSE)
Arguments
task |
(Task) |
rate |
( |
nn |
( |
standardize |
( |
alt.logic |
( |
Value
Task.
References
Chawla, N., Bowyer, K., Hall, L., & Kegelmeyer, P. (2000) SMOTE: Synthetic Minority Over-sampling Technique. In International Conference of Knowledge Based Computer Systems, pp. 46-57. National Center for Software Technology, Mumbai, India, Allied Press.
See Also
Other imbalancy: makeOverBaggingWrapper(), makeUndersampleWrapper(), oversample()
Sonar classification task.
Description
Contains the task (sonar.task).
References
See mlbench::Sonar.
Spam classification task.
Description
Contains the task (spam.task).
References
See kernlab::spam.
J. Muenchow's Ecuador landslide data set
Description
Data set created by Jannes Muenchow, University of Erlangen-Nuremberg, Germany. These data should be cited as Muenchow et al. (2012) (see reference below). This publication also contains additional information on data collection and the geomorphology of the area. The data set provided here is (a subset of) the one from the 'natural' part of the RBSF area and corresponds to the landslide distribution in the year 2000.
Format
a data.frame with point samples of landslide and non-landslide locations in a study area in the Andes of southern Ecuador.
References
Muenchow, J., Brenning, A., Richter, M., 2012. Geomorphic process rates of landslides along a humidity gradient in the tropical Andes. Geomorphology, 139-140: 271-284.
Brenning, A., 2005. Spatial prediction models for landslide hazards: review, comparison and evaluation. Natural Hazards and Earth System Sciences, 5(6): 853-862.
Subset data in task.
Description
See title.
Usage
subsetTask(task, subset = NULL, features)
Arguments
task |
(Task) |
subset |
(integer | logical | |
features |
(character | integer | logical) |
Value
(Task). Task with subsetted data.
See Also
Other task: getTaskClassLevels(), getTaskCosts(), getTaskData(), getTaskDesc(), getTaskFeatureNames(), getTaskFormula(), getTaskId(), getTaskNFeats(), getTaskSize(), getTaskTargetNames(), getTaskTargets(), getTaskType()
Examples
task = makeClassifTask(data = iris, target = "Species")
subsetTask(task, subset = 1:100)
Summarize columns of data.frame or task.
Description
Summarizes a data.frame, somewhat differently than the normal summary function of R. The function is mainly useful as a basic EDA tool on data.frames before they are converted to tasks, but can be used on tasks as well.
Columns can be of type numeric, integer, logical, factor, or character. Characters and logicals will be treated as factors.
Usage
summarizeColumns(obj)
Arguments
obj |
(data.frame | Task) |
Value
(data.frame). With columns:
name |
Name of column. |
type |
Data type of column. |
na |
Number of NAs in column. |
disp |
Measure of dispersion, for numerics and integers sd is used, for categorical columns the qualitative variation. |
mean |
Mean value of column, NA for categorical columns. |
median |
Median value of column, NA for categorical columns. |
mad |
MAD of column, NA for categorical columns. |
min |
Minimal value of column, for categorical columns the size of the smallest category. |
max |
Maximal value of column, for categorical columns the size of the largest category. |
nlevs |
For categorical columns, the number of factor levels, NA else. |
See Also
Other eda_and_preprocess: capLargeValues(), createDummyFeatures(), dropFeatures(), mergeSmallFactorLevels(), normalizeFeatures(), removeConstantFeatures(), summarizeLevels()
Examples
summarizeColumns(iris)
Summarizes factors of a data.frame by tabling them.
Description
Characters and logicals will be treated as factors.
Usage
summarizeLevels(obj, cols = NULL)
Arguments
obj |
(data.frame | Task) |
cols |
(character) |
Value
(list). Named list of tables.
See Also
Other eda_and_preprocess: capLargeValues(), createDummyFeatures(), dropFeatures(), mergeSmallFactorLevels(), normalizeFeatures(), removeConstantFeatures(), summarizeColumns()
Examples
summarizeLevels(iris)
Train a learning algorithm.
Description
Given a Task, creates a model for the learning machine which can be used for predictions on new data.
Usage
train(learner, task, subset = NULL, weights = NULL)
Arguments
learner |
(Learner | |
task |
(Task) |
subset |
(integer | logical | |
weights |
(numeric) |
Value
(WrappedModel).
See Also
Examples
training.set = sample(seq_len(nrow(iris)), nrow(iris) / 2)
## use linear discriminant analysis to classify iris data
task = makeClassifTask(data = iris, target = "Species")
learner = makeLearner("classif.lda", method = "mle")
mod = train(learner, task, subset = training.set)
print(mod)
## use random forest to classify iris data
task = makeClassifTask(data = iris, target = "Species")
learner = makeLearner("classif.rpart", minsplit = 7, predict.type = "prob")
mod = train(learner, task, subset = training.set)
print(mod)
Train an R learner.
Description
Mainly for internal use. Trains a wrapped learner on a given training set. You have to implement this method if you want to add another learner to this package.
Usage
trainLearner(.learner, .task, .subset, .weights = NULL, ...)
Arguments
.learner |
(RLearner) |
.task |
(Task) |
.subset |
(integer) |
.weights |
(numeric) |
... |
(any) |
Details
Your implementation must adhere to the following: the model must be fitted on the subset of .task given by .subset. All parameters in ... must be passed to the underlying training function.
Value
(any). Model of the underlying learner.
Hyperparameter tuning.
Description
Optimizes the hyperparameters of a learner. Allows for different optimization methods, such as grid search, evolutionary strategies, iterated F-race, etc. You can select such an algorithm (and its settings) by passing a corresponding control object. For a complete list of implemented algorithms look at TuneControl.
Multi-criteria tuning can be done with tuneParamsMultiCrit.
Usage
tuneParams(
learner,
task,
resampling,
measures,
par.set,
control,
show.info = getMlrOption("show.info"),
resample.fun = resample
)
Arguments
learner |
(Learner | |
task |
(Task) |
resampling |
(ResampleInstance | ResampleDesc) |
measures |
(list of Measure | Measure) |
par.set |
(ParamHelpers::ParamSet) |
control |
(TuneControl) |
show.info |
( |
resample.fun |
(closure) |
Value
(TuneResult).
Note
If you would like to include results from the training data set, make sure to appropriately adjust the resampling strategy and the aggregation for the measure. See example code below.
See Also
Other tune: TuneControl, getNestedTuneResultsOptPathDf(), getNestedTuneResultsX(), getResamplingIndices(), getTuneResult(), makeModelMultiplexer(), makeModelMultiplexerParamSet(), makeTuneControlCMAES(), makeTuneControlDesign(), makeTuneControlGenSA(), makeTuneControlGrid(), makeTuneControlIrace(), makeTuneControlMBO(), makeTuneControlRandom(), makeTuneWrapper(), tuneThreshold()
Examples
set.seed(123)
# a grid search for an SVM (with a tiny number of points...)
# note how easily we can optimize on a log-scale
ps = makeParamSet(
makeNumericParam("C", lower = -12, upper = 12, trafo = function(x) 2^x),
makeNumericParam("sigma", lower = -12, upper = 12, trafo = function(x) 2^x)
)
ctrl = makeTuneControlGrid(resolution = 2L)
rdesc = makeResampleDesc("CV", iters = 2L)
res = tuneParams("classif.ksvm", iris.task, rdesc, par.set = ps, control = ctrl)
print(res)
# access data for all evaluated points
df = as.data.frame(res$opt.path)
df1 = as.data.frame(res$opt.path, trafo = TRUE)
print(head(df[, -ncol(df)]))
print(head(df1[, -ncol(df)]))
# access data for all evaluated points - alternative
df2 = generateHyperParsEffectData(res)
df3 = generateHyperParsEffectData(res, trafo = TRUE)
print(head(df2$data[, -ncol(df2$data)]))
print(head(df3$data[, -ncol(df3$data)]))
## Not run:
# we optimize the SVM over 3 kernels simultaneously
# note how we use dependent params (requires = ...) and iterated F-racing here
ps = makeParamSet(
makeNumericParam("C", lower = -12, upper = 12, trafo = function(x) 2^x),
makeDiscreteParam("kernel", values = c("vanilladot", "polydot", "rbfdot")),
makeNumericParam("sigma", lower = -12, upper = 12, trafo = function(x) 2^x,
requires = quote(kernel == "rbfdot")),
makeIntegerParam("degree", lower = 2L, upper = 5L,
requires = quote(kernel == "polydot"))
)
print(ps)
ctrl = makeTuneControlIrace(maxExperiments = 5, nbIterations = 1, minNbSurvival = 1)
rdesc = makeResampleDesc("Holdout")
res = tuneParams("classif.ksvm", iris.task, rdesc, par.set = ps, control = ctrl)
print(res)
df = as.data.frame(res$opt.path)
print(head(df[, -ncol(df)]))
# include the training set performance as well
rdesc = makeResampleDesc("Holdout", predict = "both")
res = tuneParams("classif.ksvm", iris.task, rdesc, par.set = ps,
control = ctrl, measures = list(mmce, setAggregation(mmce, train.mean)))
print(res)
df2 = as.data.frame(res$opt.path)
print(head(df2[, -ncol(df2)]))
## End(Not run)
Hyperparameter tuning for multiple measures at once.
Description
Optimizes the hyperparameters of a learner in a multi-criteria fashion. Allows for different optimization methods, such as grid search, evolutionary strategies, etc. You can select such an algorithm (and its settings) by passing a corresponding control object. For a complete list of implemented algorithms look at TuneMultiCritControl.
Usage
tuneParamsMultiCrit(
learner,
task,
resampling,
measures,
par.set,
control,
show.info = getMlrOption("show.info"),
resample.fun = resample
)
Arguments
learner |
(Learner | |
task |
(Task) |
resampling |
(ResampleInstance | ResampleDesc) |
measures |
(list of Measure) |
par.set |
(ParamHelpers::ParamSet) |
control |
(TuneMultiCritControl) |
show.info |
( |
resample.fun |
(closure) |
Value
(TuneMultiCritResult).
See Also
Other tune_multicrit: TuneMultiCritControl, plotTuneMultiCritResult()
Examples
# multi-criteria optimization of (tpr, fpr) with NSGA-II
lrn = makeLearner("classif.ksvm")
rdesc = makeResampleDesc("Holdout")
ps = makeParamSet(
makeNumericParam("C", lower = -12, upper = 12, trafo = function(x) 2^x),
makeNumericParam("sigma", lower = -12, upper = 12, trafo = function(x) 2^x)
)
ctrl = makeTuneMultiCritControlNSGA2(popsize = 4L, generations = 1L)
res = tuneParamsMultiCrit(lrn, sonar.task, rdesc, par.set = ps,
measures = list(tpr, fpr), control = ctrl)
plotTuneMultiCritResult(res, path = TRUE)
Tune prediction threshold.
Description
Optimizes the threshold of predictions based on probabilities. Works for classification and multilabel tasks. Uses BBmisc::optimizeSubInts for normal binary class problems and GenSA::GenSA for multiclass and multilabel problems.
Usage
tuneThreshold(pred, measure, task, model, nsub = 20L, control = list())
Arguments
pred |
(Prediction) |
measure |
(Measure) |
task |
(Task) |
model |
(WrappedModel) |
nsub |
( |
control |
(list) |
Value
(list). A named list with the following components: th is the optimal threshold, perf the performance value.
See Also
Other tune: TuneControl, getNestedTuneResultsOptPathDf(), getNestedTuneResultsX(), getResamplingIndices(), getTuneResult(), makeModelMultiplexer(), makeModelMultiplexerParamSet(), makeTuneControlCMAES(), makeTuneControlDesign(), makeTuneControlGenSA(), makeTuneControlGrid(), makeTuneControlIrace(), makeTuneControlMBO(), makeTuneControlRandom(), makeTuneWrapper(), tuneParams()
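Examples
# a minimal sketch: tune the threshold of probability predictions w.r.t. mmce
lrn = makeLearner("classif.lda", predict.type = "prob")
mod = train(lrn, sonar.task)
pred = predict(mod, task = sonar.task)
tuneThreshold(pred, measure = mmce)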
Wisconsin Prognostic Breast Cancer (WPBC) survival task.
Description
Contains the task (wpbc.task).
References
See TH.data::wpbc. Incomplete cases have been removed from the task.
Yeast multilabel classification task.
Description
Contains the task (yeast.task).
Source
https://archive.ics.uci.edu/ml/datasets/Yeast (In long instead of wide format)
References
Elisseeff, A., & Weston, J. (2001): A kernel method for multi-labelled classification. In Advances in neural information processing systems (pp. 681-687).