version 3.19
This function transforms (calibrates) raw data into either crisp or fuzzy set values, using the direct, indirect or TFR method of calibration.
calibrate(x, type = "fuzzy", method = "direct", thresholds = NA, logistic = TRUE, idm = 0.95, ecdf = FALSE, below = 1, above = 1, ...)
| x | A numerical causal condition. |
| type | Calibration type, either "crisp" or "fuzzy". |
| method | Calibration method, either "direct", "indirect" or "TFR". |
| thresholds | A vector of (named) thresholds. |
| logistic | Calibrate to fuzzy sets using the logistic function. |
| idm | The set inclusion degree of membership for the logistic function. |
| ecdf | Calibrate to fuzzy sets using the empirical cumulative distribution function of the raw data. |
| below | Numeric (non-negative), determines the shape below the crossover. |
| above | Numeric (non-negative), determines the shape above the crossover. |
| ... | Additional parameters, mainly for backwards compatibility. |
Calibration is a transformational process from raw numerical data (interval or ratio level of measurement) to set membership scores, based on a certain number of qualitative anchors.
When type = "crisp", the process is similar to recoding the original
values into a number of categories defined by the number of thresholds. For one
threshold, the calibration produces two categories (intervals): 0 if below, 1 if above.
For two thresholds, it produces three categories: 0 if below the first threshold,
1 if in the interval between the thresholds, and 2 if above the second threshold, and so on.
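Crisp calibration behaves like interval recoding, so base R's `findInterval()` can illustrate it. The function `crisp_sketch()` below is a hypothetical helper, not part of the package. It assumes that a value exactly equal to a threshold goes into the upper category, which is how `findInterval()` treats ties. The package's handling of ties may differ.

```r
# Hypothetical sketch: crisp calibration as recoding into intervals.
# Assumption: a value equal to a threshold falls into the upper category.
crisp_sketch <- function(x, thresholds) {
  findInterval(x, sort(thresholds))  # 0 below the first threshold, 1, 2, ...
}

heights <- c(160, 172, 175, 178, 190)
crisp_sketch(heights, 175)          # 0 0 1 1 1
crisp_sketch(heights, c(170, 180))  # 0 1 1 1 2
```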
When type = "fuzzy", calibration produces fuzzy set membership scores, using
three anchors for the increasing or decreasing s-shaped distributions (including
the logistic function), and six anchors for the increasing or decreasing bell-shaped
distributions.
The argument thresholds can be specified either as a simple numeric vector, or as a
named numeric vector. If used as a named vector, for the first category of s-shaped
distributions, the names of the thresholds should be:
| "e" | for the full set exclusion | 
| "c" | for the set crossover | 
| "i" | for the full set inclusion | 
For the second category of bell-shaped distributions, the names of the thresholds should be:
| "e1" | for the first (left) threshold for full set exclusion | 
| "c1" | for the first (left) threshold for set crossover | 
| "i1" | for the first (left) threshold for full set inclusion | 
| "i2" | for the second (right) threshold for full set inclusion | 
| "c2" | for the second (right) threshold for set crossover | 
| "e2" | for the second (right) threshold for full set exclusion | 
If used as a simple numeric vector, the order of the values matters.
If $e < c < i$, the membership function is increasing from e to i.
If $i < c < e$, the membership function is decreasing from i to e.
The same holds for the bell-shaped distribution: if
$e1 < c1 < i1 \le i2 < c2 < e2$, the membership function first increases
from e1 to i1, is flat between i1 and i2, and then decreases from i2 to e2.
In contrast, if $i1 < c1 < e1 \le e2 < c2 < i2$, the membership function
first decreases from i1 to e1, is flat between e1 and e2, and finally
increases from e2 to i2.
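The ordering rules can be illustrated with linear segments (the case below = above = 1). The helpers `linear_s()`, `bell_sketch()` and `inverse_bell_sketch()` are hypothetical and are not the package's implementation. A bell shape is written as the minimum of a rising s-shape and a falling one. The inverse bell is the complement of a bell in which the exclusion and inclusion anchors trade places.

```r
# Hypothetical sketch: linear increasing s-shape through e (0), c (0.5), i (1).
linear_s <- function(x, e, c, i) {
  ifelse(x <= e, 0,
         ifelse(x < c, 0.5 * (x - e) / (c - e),
                ifelse(x < i, 1 - 0.5 * (i - x) / (i - c), 1)))
}

# Bell shape for e1 < c1 < i1 <= i2 < c2 < e2: rise, plateau, fall.
bell_sketch <- function(x, e1, c1, i1, i2, c2, e2) {
  rising  <- linear_s(x, e1, c1, i1)
  falling <- 1 - linear_s(x, i2, c2, e2)  # mirrored increasing s-shape
  pmin(rising, falling)
}

# Inverse bell for i1 < c1 < e1 <= e2 < c2 < i2: fall, zero plateau, rise.
inverse_bell_sketch <- function(x, i1, c1, e1, e2, c2, i2) {
  1 - bell_sketch(x, e1 = i1, c1 = c1, i1 = e1, i2 = e2, c2 = c2, e2 = i2)
}

pts <- c(155, 160, 165, 175, 185, 195)
bell_sketch(pts, 155, 165, 175, 175, 185, 195)          # 0 0.25 0.5 1 0.5 0
inverse_bell_sketch(pts, 155, 165, 175, 175, 185, 195)  # 1 0.75 0.5 0 0.5 1
```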
When logistic = TRUE (the default), the argument idm specifies the
inclusion degree of membership for the logistic function. If logistic = FALSE, the
function returns linear s-shaped or bell-shaped distributions (curved using the
arguments below and above), unless the argument ecdf is activated.
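Below is a minimal sketch of a logistic s-shape with three anchors. It assumes log-odds scaling in the style of Ragin's direct method: the slope on each side of the crossover is set so the curve passes through 0.5 at c, idm at i, and 1 - idm at e. `logistic_sketch()` is a hypothetical helper, and the package's exact formula may differ. Because the slope is worked out from which side of c the anchor i lies on, the same code handles both increasing (e < c < i) and decreasing (i < c < e) orderings.

```r
# Hypothetical sketch of a logistic calibration anchored on e, c and i.
# Assumption: log-odds scaling so that e -> 1 - idm, c -> 0.5, i -> idm.
logistic_sketch <- function(x, e, c, i, idm = 0.95) {
  k <- qlogis(idm)                       # log-odds of idm (log(19) for 0.95)
  on_i_side <- (x - c) * (i - c) >= 0    # x lies on the same side of c as i
  slope <- ifelse(on_i_side, k / (i - c), k / (c - e))
  plogis((x - c) * slope)                # inverse logit
}

logistic_sketch(c(165, 175, 185), e = 165, c = 175, i = 185)  # about 0.05 0.50 0.95
logistic_sketch(c(165, 175, 185), e = 185, c = 175, i = 165)  # about 0.95 0.50 0.05
```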
If there is no prior knowledge about the shape of the distribution, setting ecdf = TRUE
lets the function determine the underlying distribution of the empirical, observed points,
and the calibrated scores are found along that distribution.
Both logistic and ecdf can be used only for s-shaped
distributions (using 3 thresholds), and they are mutually exclusive.
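To show what "finding scores along the empirical distribution" can mean, the hypothetical sketch below uses base R's `ecdf()`. It maps ECDF values between the anchors piecewise, sending e to 0, c to 0.5 and i to 1. This is an assumed construction for illustration, not the package's algorithm. It only covers increasing anchors (e < c < i). Because the ECDF is a step function, observations lying between two anchors with no data in between can share the same score.

```r
# Hypothetical sketch (not the package's algorithm): rescale the empirical
# CDF so that e -> 0, c -> 0.5 and i -> 1; increasing anchors only.
ecdf_sketch <- function(x, e, c, i) {
  Fn <- stats::ecdf(x)                   # empirical CDF of the observed data
  Fx <- Fn(x); Fe <- Fn(e); Fc <- Fn(c); Fi <- Fn(i)
  m <- ifelse(Fx <= Fc,
              0.5 * (Fx - Fe) / (Fc - Fe),         # between e and c
              0.5 + 0.5 * (Fx - Fc) / (Fi - Fc))   # between c and i
  pmin(pmax(m, 0), 1)                    # clamp values beyond the anchors
}

obs <- c(150, 160, 170, 180, 190, 200)
ecdf_sketch(obs, e = 155, c = 175, i = 195)  # 0 0.25 0.5 0.75 1 1
```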
The parameters below and above (active only when both logistic and ecdf
are deactivated) establish the degree of concentration and dilation (convex or
concave shape) between the thresholds and the crossover:
| 0 < below < 1 | dilates in a concave shape below the crossover | 
| below = 1 | produces a linear shape (neither convex, nor concave) | 
| below > 1 | concentrates in a convex shape below the crossover | 
| 0 < above < 1 | dilates in a concave shape above the crossover | 
| above = 1 | produces a linear shape (neither convex, nor concave) | 
| above > 1 | concentrates in a convex shape above the crossover | 
Usually, below and above have equal values, unless specific reasons
exist to make them different.
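The effect of below and above can be sketched with a piecewise power function: 0.5 * ((x - e)/(c - e))^below under the crossover and 1 - 0.5 * ((i - x)/(i - c))^above over it. `s_power_sketch()` is a hypothetical helper. That the package uses exactly this form is an assumption, although it reproduces the linear default at below = above = 1. Values greater than 1 pull scores below the crossover toward 0 and scores above it toward 1, and values between 0 and 1 do the opposite.

```r
# Hypothetical sketch: increasing s-shape (e < c < i) with power exponents.
# Assumption: this piecewise power form mirrors how below/above act.
s_power_sketch <- function(x, e, c, i, below = 1, above = 1) {
  lower <- pmin(pmax((x - e) / (c - e), 0), 1)  # relative position e..c
  upper <- pmin(pmax((i - x) / (i - c), 0), 1)  # relative distance to i
  ifelse(x < c, 0.5 * lower^below, 1 - 0.5 * upper^above)
}

x0 <- c(165, 170, 175, 180, 185)
s_power_sketch(x0, 165, 175, 185)                          # 0 0.25 0.5 0.75 1
s_power_sketch(c(170, 180), 165, 175, 185, below = 2, above = 2)  # 0.125 0.875
```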
For type = "fuzzy", it is also possible to use the "indirect"
method to calibrate the data, a procedure first introduced by Ragin (2008). The indirect method
uses a vector of thresholds to cut the original data into equal intervals, and then applies
a (quasi)binomial logistic regression with a fractional polynomial equation.
The results are also fuzzy scores between 0 and 1, but the method is entirely different: it has no anchors (these are specific to the direct method), and it does not require a calibration function to compute the scores.
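Below is a rough sketch of the indirect idea in base R, using Ragin's income data from the examples. It makes two assumptions. The intervals get equally spaced target scores (0, 0.2, ..., 1). A cubic orthogonal polynomial replaces the fractional polynomial the package uses. `indirect_sketch()` is a hypothetical helper, so its scores will not match the package output.

```r
# Per capita income data from Ragin (2008), as in the examples
inc <- c(40110, 34400, 25200, 24920, 20060, 17090, 15320, 13680, 11720,
         11290, 10940, 9800, 7470, 4670, 4100, 4070, 3740, 3690, 3590,
         2980, 1000, 650, 450, 110)
thresholds <- c(1000, 4000, 5000, 10000, 20000)

# Hypothetical sketch of indirect calibration.
indirect_sketch <- function(x, thresholds) {
  group <- findInterval(x, sort(thresholds))  # ordered intervals 0..k
  target <- group / length(thresholds)        # assumed scores 0, 0.2, ..., 1
  # fractional-response logistic regression; cubic stands in for the
  # fractional polynomial (assumption)
  fit <- glm(target ~ poly(x, 3), family = quasibinomial(link = "logit"))
  unname(fitted(fit))
}

cinc <- indirect_sketch(inc, thresholds)
```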
The third method for fuzzy calibration is method = "TFR", which calibrates
categorical data (such as Likert-type response scales) to fuzzy values using the Totally Fuzzy and Relative
method (Cheli and Lemmi, 1995).
Thiem, A.; Dusa, A. (2013) Qualitative Comparative Analysis with R: A User's Guide. New York: Springer.
Thiem, A. (2014) Membership Function Sensitivity of Descriptive Statistics in Fuzzy-Set Relations. International Journal of Social Research Methodology vol.17, no.6, pp.625-642.
# generate heights for 100 people
# with an average of 175cm and a standard deviation of 10cm
set.seed(12345)
x <- rnorm(n = 100, mean = 175, sd = 10)
cx <- calibrate(x, type = "crisp", thresholds = 175)
plot(x, cx, main = "Binary crisp set using 1 threshold",
     xlab = "Raw data", ylab = "Calibrated data", yaxt = "n")
axis(2, at = 0:1)

cx <- calibrate(x, type = "crisp", thresholds = c(170, 180))
plot(x, cx, main = "3 value crisp set using 2 thresholds",
     xlab = "Raw data", ylab = "Calibrated data", yaxt = "n")
axis(2, at = 0:2)

# calibrate to an increasing, s-shaped fuzzy-set
cx <- calibrate(x, thresholds = "e=165, c=175, i=185")
plot(x, cx, main = "Membership scores in the set of tall people",
     xlab = "Raw data", ylab = "Calibrated data")

# calibrate to a decreasing, s-shaped fuzzy-set
cx <- calibrate(x, thresholds = "i=165, c=175, e=185")
plot(x, cx, main = "Membership scores in the set of short people",
     xlab = "Raw data", ylab = "Calibrated data")

# when not using the logistic function, linear increase
cx <- calibrate(x, thresholds = "e=165, c=175, i=185", logistic = FALSE)
plot(x, cx, main = "Membership scores in the set of tall people",
     xlab = "Raw data", ylab = "Calibrated data")

# tweaking the parameters "below" and "above" the crossover:
# a value of 3.5 approximates a logistic distribution when e=155 and i=195
cx <- calibrate(x, thresholds = "e=155, c=175, i=195", logistic = FALSE,
                below = 3.5, above = 3.5)
plot(x, cx, main = "Membership scores in the set of tall people",
     xlab = "Raw data", ylab = "Calibrated data")

# calibrate to a bell-shaped fuzzy set
cx <- calibrate(x, thresholds = "e1=155, c1=165, i1=175, i2=175, c2=185, e2=195",
                below = 3, above = 3)
plot(x, cx, main = "Membership scores in the set of average height",
     xlab = "Raw data", ylab = "Calibrated data")

# calibrate to an inverse bell-shaped fuzzy set
cx <- calibrate(x, thresholds = "i1=155, c1=165, e1=175, e2=175, c2=185, i2=195",
                below = 3, above = 3)
plot(x, cx, main = "Membership scores in the set of non-average height",
     xlab = "Raw data", ylab = "Calibrated data")

# the default values of "below" and "above" will produce a triangular shape
cx <- calibrate(x, thresholds = "e1=155, c1=165, i1=175, i2=175, c2=185, e2=195")
plot(x, cx, main = "Membership scores in the set of average height",
     xlab = "Raw data", ylab = "Calibrated data")

# different thresholds to produce a linear trapezoidal shape
cx <- calibrate(x, thresholds = "e1=156, c1=164, i1=172, i2=179, c2=187, e2=195")
plot(x, cx, main = "Membership scores in the set of average height",
     xlab = "Raw data", ylab = "Calibrated data")

# larger values of above and below will increase membership in or out of the set
cx <- calibrate(x, thresholds = "e1=155, c1=165, i1=175, i2=175, c2=185, e2=195",
                below = 10, above = 10)
plot(x, cx, main = "Membership scores in the set of average height",
     xlab = "Raw data", ylab = "Calibrated data")

# while extremely large values will produce virtually crisp results
cx <- calibrate(x, thresholds = "e1=155, c1=165, i1=175, i2=175, c2=185, e2=195",
                below = 10000, above = 10000)
plot(x, cx, main = "Binary crisp scores in the set of average height",
     xlab = "Raw data", ylab = "Calibrated data", yaxt = "n")
axis(2, at = 0:1)
abline(v = c(165, 185), col = "red", lty = 2)
# check if crisp
cx
#  [1] 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 0 1 0 0 0 1 1 1 1 1 0 0 0 1 1 1 0 0 1 0
# [42] 0 0 1 1 0 0 1 1 0 1 0 1 1 1 1 1 1 0 0 1 0 1 0 1 0 1 0 1 0 1 1 0 0 0 1 1 1 1 0 1 0
# [83] 1 0 1 0 1 1 1 1 1 1 0 1 1 1 1 0 1 1

# using the empirical cumulative distribution function
# requires manually setting logistic to FALSE
cx <- calibrate(x, thresholds = "e=155, c=175, i=195", logistic = FALSE,
                ecdf = TRUE)
plot(x, cx, main = "Membership scores in the set of tall people",
     xlab = "Raw data", ylab = "Calibrated data")
## the indirect method, per capita income data from Ragin (2008)
inc <- c(40110, 34400, 25200, 24920, 20060, 17090, 15320, 13680, 11720,
         11290, 10940, 9800, 7470, 4670, 4100, 4070, 3740, 3690, 3590,
         2980, 1000, 650, 450, 110)
cinc <- calibrate(inc, method = "indirect",
                  thresholds = "1000, 4000, 5000, 10000, 20000")
plot(inc, cinc, main = "Membership scores in the set of high income",
     xlab = "Raw data", ylab = "Calibrated data")

set.seed(12345)
values <- sample(1:7, 100, replace = TRUE)
TFR <- calibrate(values, method = "TFR")
table(round(TFR, 3))
#     0 0.151 0.314 0.477 0.605 0.814     1
#    14    13    14    14    11    18    16