## 
## lessR 3.9.7  feedback: gerbing@pdx.edu  web: lessRstats.com/new
## ---------------------------------------------------------------
## > d <- Read("")   Read text, Excel, SPSS, SAS, or R data file
##   d is default data frame, data= in analysis routines optional
## 
## > vignette("topic") for help on the following topics
##    "Read": Read data and variable labels, write data
##    "BarChart", "Histogram", "Plot": Visualizations
##    "Means": Analyze means with t-tests and ANOVA
##    "Regression": Least-squares, logistic regression
##    "Factor Analysis": Exploratory and confirmatory
##    "Customize": Custom color palettes, more customization
##    "Extract": General, simple data frame subsetting
##    "pivot": 1-d and 2-d simply created pivot tables
One of the most frequently encountered visualizations is the bar chart.
Bar chart: Plot a number associated with each category of a categorical variable as the height of the corresponding bar.
To create a bar chart, call the function with the name of the variable that contains the categories to plot. With BarChart(), that variable is the first argument passed to the function and often, as in this example, the only argument. In that case, the numerical value associated with each bar is the count of the occurrences of the corresponding category.
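In base R terms, the count behind each bar is what table() returns. What follows is a minimal sketch, not the lessR implementation. It rebuilds a Dept-like vector from the department frequencies that BarChart() reports for the Employee data in this vignette (ACCT 5, ADMN 6, FINC 4, MKTG 6, SALE 15). It then tabulates that vector and draws the bars with base R's barplot().

```r
# Rebuild a vector of department codes from the reported frequencies;
# this is a stand-in, not the Employee data file itself
dept <- rep(c("ACCT", "ADMN", "FINC", "MKTG", "SALE"),
            times = c(5, 6, 4, 6, 15))

# Tabulate: one count per category, which becomes each bar's height
counts <- table(dept)
print(counts)

# Base R analog of a bar chart of counts (not lessR's BarChart())
barplot(counts)
```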
First read the Employee data included as part of lessR.
## 
## >>> Suggestions
## Details about your data, Enter:  details()  for d, or  details(name)
## 
## Data Types
## ------------------------------------------------------------
## character: Non-numeric data values
## integer: Numeric data values, integers only
## double: Numeric data values with decimal digits
## ------------------------------------------------------------
## 
##     Variable                  Missing  Unique 
##         Name     Type  Values  Values  Values   First and last values
## ------------------------------------------------------------------------------------------
##  1     Years   integer     36       1      16   7  NA  15 ... 1  2  10
##  2    Gender character     37       0       2   M  M  M ... F  F  M
##  3      Dept character     36       1       5   ADMN  SALE  SALE ... MKTG  SALE  FINC
##  4    Salary    double     37       0      37   53788.26  94494.58 ... 56508.32  57562.36
##  5    JobSat character     35       2       3   med  low  low ... high  low  high
##  6      Plan   integer     37       0       3   1  1  3 ... 2  2  1
##  7       Pre   integer     37       0      27   82  62  96 ... 83  59  80
##  8      Post   integer     37       0      22   92  74  97 ... 90  71  87
## ------------------------------------------------------------------------------------------
To illustrate, consider the categorical variable Dept in the Employee data table. Use BarChart() to tabulate and display the number of employees in each department, here relying upon the default data frame (table) named d.
Bar chart of tabulated counts of employees in each department.
## >>> Suggestions
## BarChart(Dept, horiz=TRUE)  # horizontal bar chart
## BarChart(Dept, fill="greens")  # sequential green bars
## PieChart(Dept)  # doughnut (ring) chart
## Plot(Dept)  # bubble plot
## Plot(Dept, stat="count")  # lollipop plot 
## 
## 
## --- Dept ---
## 
## 
## Missing Values of Dept: 1 
## 
## 
##                 ACCT   ADMN   FINC   MKTG   SALE    Total 
## Frequencies:       5      6      4      6     15       36 
## Proportions:   0.139  0.167  0.111  0.167  0.417    1.000 
## 
## 
## Chi-squared test of null hypothesis of equal probabilities 
##   Chisq = 10.944, df = 4, p-value = 0.027
The default color theme, "colors", fills the bars with "hues", the lessR qualitative palette of different hues. All of these hues share the same chroma (65) and luminance (55). The Customize vignette explains these palettes in more detail.
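The idea behind a palette of this kind can be sketched with base R's hcl(): vary only the hue and hold chroma at 65 and luminance at 55. This is an illustration under an assumption. The evenly spaced hues starting at 0 degrees are chosen here for the example and are not necessarily the hue values that lessR uses.

```r
# Qualitative palette sketch: evenly spaced hues, fixed chroma and luminance.
# The starting hue (0) and even spacing are assumptions, not lessR's values.
n <- 5
hues <- seq(0, 360, length.out = n + 1)[1:n]   # drop 360, which repeats hue 0
qual_pal <- hcl(h = hues, c = 65, l = 55)
print(qual_pal)
```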
The BarChart() function provides a default color theme, and labels each bar with the associated numerical value. The function also provides the corresponding frequency distribution, the table that lists the count of each category, from which the bar chart is constructed.
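The frequency distribution, the proportions, and the chi-squared test that BarChart() prints for Dept can be reproduced in base R from the printed counts. This is a minimal sketch that works only from those counts. chisq.test() applied to a vector tests the null hypothesis of equal probabilities by default.

```r
# Frequencies of Dept as printed by BarChart() for the Employee data
freq <- c(ACCT = 5, ADMN = 6, FINC = 4, MKTG = 6, SALE = 15)

# Proportions: each count divided by the 36 non-missing values
prop <- round(freq / sum(freq), 3)
print(prop)

# Goodness-of-fit test against equal probabilities (expected 7.2 per level)
gof <- chisq.test(freq)
print(gof)
```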
Specify a single fill color with the fill parameter, and a horizontal bar chart with base R parameter horiz. Turn off console output with the parameter quiet. Turn off the displayed value on each bar with the parameter values.
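For comparison, base R's barplot() accepts the same horiz parameter. The sketch below is a base R analog, not a lessR call. It uses a single fill color set with col, and las = 1 keeps the category labels horizontal. barplot() invisibly returns the bar midpoints.

```r
# Base R analog: one fill color and horizontal bars
freq <- c(ACCT = 5, ADMN = 6, FINC = 4, MKTG = 6, SALE = 15)
mid <- barplot(freq, horiz = TRUE, col = "darkred", las = 1)

# With the default width (1) and space (0.2), the midpoints are 0.7, 1.9, ...
# With horiz = TRUE they are positions on the vertical axis
print(mid)
```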
Use the theme parameter to change the entire color theme: “colors”, “lightbronze”, “dodgerblue”, “darkred”, “gray”, “gold”, “darkgreen”, “blue”, “red”, “rose”, “green”, “purple”, “sienna”, “brown”, “orange”, “white”, and “light”.
Alternatively, use style() to change the theme for all subsequent visualizations.
## >>> Suggestions
## BarChart(Dept, horiz=TRUE)  # horizontal bar chart
## BarChart(Dept, fill="greens")  # sequential green bars
## PieChart(Dept)  # doughnut (ring) chart
## Plot(Dept)  # bubble plot
## Plot(Dept, stat="count")  # lollipop plot 
## 
## 
## --- Dept ---
## 
## 
## Missing Values of Dept: 1 
## 
## 
##                 ACCT   ADMN   FINC   MKTG   SALE    Total 
## Frequencies:       5      6      4      6     15       36 
## Proportions:   0.139  0.167  0.111  0.167  0.417    1.000 
## 
## 
## Chi-squared test of null hypothesis of equal probabilities 
##   Chisq = 10.944, df = 4, p-value = 0.027
Although Dept is not an ordinal variable, it can illustrate the many sequential palettes available from getColors(): “reds”, “rusts”, “browns”, “olives”, “greens”, “emeralds”, “turquoises”, “aquas”, “blues”, “purples”, “violets”, “magentas”, and “grays”.
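A sequential palette holds the hue fixed and steps the luminance from light to dark, so that it can express order. The sketch below builds one such palette with base R's hcl(). The hue (120, a green) and the luminance range are illustrative assumptions, not the values behind lessR's “greens”.

```r
# Sequential palette sketch: one hue, luminance decreasing from light to dark.
# Hue 120, chroma 50, and luminance 85 to 35 are illustrative assumptions.
seq_pal <- hcl(h = 120, c = 50, l = seq(85, 35, length.out = 5))
print(seq_pal)
```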
## >>> Suggestions
## BarChart(Dept, horiz=TRUE)  # horizontal bar chart
## BarChart(Dept, fill="greens")  # sequential green bars
## PieChart(Dept)  # doughnut (ring) chart
## Plot(Dept)  # bubble plot
## Plot(Dept, stat="count")  # lollipop plot 
## 
## 
## --- Dept ---
## 
## 
## Missing Values of Dept: 1 
## 
## 
##                 ACCT   ADMN   FINC   MKTG   SALE    Total 
## Frequencies:       5      6      4      6     15       36 
## Proportions:   0.139  0.167  0.111  0.167  0.417    1.000 
## 
## 
## Chi-squared test of null hypothesis of equal probabilities 
##   Chisq = 10.944, df = 4, p-value = 0.027
Rotate and offset the axis labels with the rotate_x and offset parameters. Sort the categories in descending order of frequency with the sort parameter.
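A descending sort by frequency reorders the categories so that the tallest bar comes first. Below is a base R sketch of that reordering, using the Dept counts printed in this vignette. order() leaves tied categories in their original order, so ADMN stays ahead of MKTG, as in the sorted BarChart() output. The las = 2 setting is base R's way to turn the axis labels, standing in for rotate_x.

```r
freq <- c(ACCT = 5, ADMN = 6, FINC = 4, MKTG = 6, SALE = 15)

# Descending sort by frequency; ties (ADMN, MKTG) keep their original order
freq_desc <- freq[order(-freq)]
print(freq_desc)

# Base R analog of rotated axis labels: las = 2 sets labels perpendicular
barplot(freq_desc, las = 2)
```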
## >>> Suggestions
## BarChart(Dept, horiz=TRUE)  # horizontal bar chart
## BarChart(Dept, fill="greens")  # sequential green bars
## PieChart(Dept)  # doughnut (ring) chart
## Plot(Dept)  # bubble plot
## Plot(Dept, stat="count")  # lollipop plot 
## 
## 
## --- Dept ---
## 
## 
## Missing Values of Dept: 1 
## 
## 
##                 SALE   ADMN   MKTG   ACCT   FINC    Total 
## Frequencies:      15      6      6      5      4       36 
## Proportions:   0.417  0.167  0.167  0.139  0.111    1.000 
## 
## 
## Chi-squared test of null hypothesis of equal probabilities 
##   Chisq = 10.944, df = 4, p-value = 0.027
Instead of setting the interior color of the bars to a single value with the fill parameter, map the tabulated count to the bar fill. With this mapping, the color of each bar reflects its height: the higher the bar, the darker the color.
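Mapping a count to a fill color amounts to rescaling the counts and then converting each rescaled value to a shade. The following base R sketch assumes a simple linear gray ramp, chosen only for illustration. lessR's own mapping and colors may differ.

```r
freq <- c(ACCT = 5, ADMN = 6, FINC = 4, MKTG = 6, SALE = 15)

# Rescale counts to [0, 1], then map linearly to gray levels:
# 0.85 (light) for the smallest count, 0.25 (dark) for the largest
scaled <- (freq - min(freq)) / (max(freq) - min(freq))
gray_level <- 0.85 - 0.6 * scaled
fills <- gray(gray_level)

barplot(freq, col = fills)
```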
## >>> Suggestions
## BarChart(Dept, horiz=TRUE)  # horizontal bar chart
## BarChart(Dept, fill="greens")  # sequential green bars
## PieChart(Dept)  # doughnut (ring) chart
## Plot(Dept)  # bubble plot
## Plot(Dept, stat="count")  # lollipop plot 
## 
## 
## --- Dept ---
## 
## 
## Missing Values of Dept: 1 
## 
## 
##                 ACCT   ADMN   FINC   MKTG   SALE    Total 
## Frequencies:       5      6      4      6     15       36 
## Proportions:   0.139  0.167  0.111  0.167  0.417    1.000 
## 
## 
## Chi-squared test of null hypothesis of equal probabilities 
##   Chisq = 10.944, df = 4, p-value = 0.027
BarChart() also accommodates long value labels on the horizontal axis by moving to a new line wherever a space occurs in a label. This example also reads variable labels into the l data frame. It then converts the four specified Mach items to new factor variables named m06_f, etc. with the lessR function factors().
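Breaking a label at each space can be sketched as a simple string substitution. The labels below are hypothetical response categories for illustration; the Mach item labels themselves do not appear in this section.

```r
# Hypothetical response labels (not taken from the Mach data)
labels <- c("Strongly Disagree", "Disagree", "Slightly Disagree",
            "Slightly Agree", "Agree", "Strongly Agree")

# Replace every space with a newline so multi-word labels stack vertically
wrapped <- gsub(" ", "\n", labels, fixed = TRUE)
print(wrapped)
```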
Specify both the categorical variable, \(x\), and the numerical variable that determines the height of the bars, \(y\). Then apply a statistical transformation to \(y\) with the stat parameter. Possible values of stat are “sum”, “mean”, “sd”, “dev”, “min”, “median”, and “max”. Here, set the height of each bar to the corresponding mean deviation of \(y\) with “dev”, which further facilitates a comparison among levels.
Here the \(x\) is Dept and \(y\) is Salary.
Display bars for values of dev <= 0 in a different color than values above with the fill_split parameter. Do an ascending sort with the sort parameter.
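The “dev” transformation can be checked by hand. This base R sketch uses only the Salary means by Dept that BarChart() prints for this data. The printed deviations are consistent with each group mean minus the unweighted mean of the five group means (about 72233.55), rather than minus the mean weighted by group size. The sort and the two-color fill are base R stand-ins for the sort and fill_split parameters, and the colors are arbitrary.

```r
# Group means of Salary by Dept, as printed by BarChart() for this data
means <- c(ACCT = 61792.776, ADMN = 81277.117, FINC = 69010.675,
           MKTG = 70257.128, SALE = 78830.065)

# Deviation of each group mean from the unweighted mean of the group means
dev <- means - mean(means)

# Ascending sort, as with the sort parameter
dev <- sort(dev)
print(round(dev, 3))

# fill_split analog: one color for dev <= 0, another for dev > 0
fills <- ifelse(dev <= 0, "gray70", "steelblue")
barplot(dev, col = fills)
```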
## Salary 
##   - by levels of - 
## Dept 
##  
##        n   miss         mean           sd          min          mdn          max 
## ACCT    5      0    61792.776    12774.606    46124.970    69547.600    72502.500 
## ADMN    6      0    81277.117    27585.151    53788.260    71058.595   122563.380 
## FINC    4      0    69010.675    17852.498    57139.900    61937.625    95027.550 
## MKTG    6      0    70257.128    19869.812    51036.850    61658.990    99062.660 
## SALE   15      0    78830.065    23476.839    49188.960    77714.850   134419.230
## >>> Suggestions
## Plot(Salary, Dept) # lollipop plot 
## 
## 
##  Data for:  Salary 
##  ----------------- 
##        ACCT       FINC       MKTG      SALE      ADMN 
##  -10440.776  -3222.877  -1976.424  6596.513  9043.565
Annotate a plot with the add parameter. Here, the message is centered at <3,10> with a rectangle drawn around it. To draw the rectangle, specify two opposite corners, <x1,y1> and <x2,y2>. Specify the position of the text with another <x1,y1> pair. Because the message follows "rect" in the add parameter, the coordinates of the message follow the coordinates of the rectangle. The first values of x1 and y1 locate a corner of the rectangle, and the second values locate the text. First, lighten the fill color of the annotation with the add_fill parameter of the style() function.
style(add_fill="aliceblue")  # lighten the annotation fill color
BarChart(Dept, add=c("rect", "Employees by\nDepartment"),
                     # rectangle corners <1.75,11> and <4.25,9>; text at <3,10>
                     x1=c(1.75,3), y1=c(11, 10), x2=4.25, y2=9)
## >>> Suggestions
## BarChart(Dept, horiz=TRUE)  # horizontal bar chart
## BarChart(Dept, fill="greens")  # sequential green bars
## PieChart(Dept)  # doughnut (ring) chart
## Plot(Dept)  # bubble plot
## Plot(Dept, stat="count")  # lollipop plot 
## 
## 
## --- Dept ---
## 
## 
## Missing Values of Dept: 1 
## 
## 
##                 ACCT   ADMN   FINC   MKTG   SALE    Total 
## Frequencies:       5      6      4      6     15       36 
## Proportions:   0.139  0.167  0.111  0.167  0.417    1.000 
## 
## 
## Chi-squared test of null hypothesis of equal probabilities 
##   Chisq = 10.944, df = 4, p-value = 0.027
An alternative to the bar chart for a single categorical variable is the pie chart.
Pie Chart: Relate each level of a categorical variable to a slice of a circle (pie), with the size of each slice proportional to the value of an associated numerical variable.
Here the presented version of a pie chart is the doughnut or ring chart.
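In a pie chart, each slice gets the same share of the full 360 degrees as its category has of the total count. A worked sketch with the Dept counts printed in this vignette follows. It uses base R's pie() as a stand-in for a traditional pie chart without a hole, not lessR's PieChart().

```r
freq <- c(ACCT = 5, ADMN = 6, FINC = 4, MKTG = 6, SALE = 15)

# Each slice's angle is its share of 360 degrees: 360 * count / total
angles <- 360 * freq / sum(freq)
print(angles)   # SALE: 360 * 15 / 36 = 150 degrees

# Base R analog of the traditional (hole-free) pie chart
pie(freq, labels = names(freq))
```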
## >>> Suggestions
## PieChart(Dept, hole=0)  # traditional pie chart
## PieChart(Dept, values="%")  # display %'s on the chart
## BarChart(Dept)  # bar chart
## Plot(Dept)  # bubble plot
## Plot(Dept, values="count")  # lollipop plot 
## 
## 
## --- Dept ---
## 
## 
##                 ACCT   ADMN   FINC   MKTG   SALE    Total 
## Frequencies:       5      6      4      6     15       36 
## Proportions:   0.139  0.167  0.111  0.167  0.417    1.000 
## 
## 
## Chi-squared test of null hypothesis of equal probabilities 
##   Chisq = 10.944, df = 4, p-value = 0.027
The doughnut or ring chart appears easier to read than a standard pie chart, but the lessR function PieChart() can also create the “old-fashioned” pie chart. Because the summary statistics have already appeared several times, turn off the output to the R console here with the quiet parameter.
Standard pie chart of variable Dept in the d data frame.
Set the size of the hole in the doughnut or ring chart with the parameter hole, which specifies the proportion of the pie occupied by the hole. The default hole size is 0.65. Set that value to 0 to close the hole.
For a bar chart of two categorical variables, specify the second categorical variable, here Gender, with the by parameter.
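The joint frequencies, chi-squared test, and Cramér's V that BarChart() reports for Dept by Gender can be reproduced in base R from the printed table. This is a minimal sketch. The formula for Cramér's V, sqrt(chisq / (n * (min(rows, cols) - 1))), is the standard definition. chisq.test() also warns about small expected counts here, which matches the lessR message, so the sketch suppresses that warning.

```r
# Joint frequencies of Gender by Dept, as printed by BarChart()
tab <- matrix(c(3, 4, 1, 5, 5,
                2, 2, 3, 1, 10),
              nrow = 2, byrow = TRUE,
              dimnames = list(Gender = c("F", "M"),
                              Dept = c("ACCT", "ADMN", "FINC", "MKTG", "SALE")))

# Chi-squared test of independence (expected counts are small, hence the warning)
indep <- suppressWarnings(chisq.test(tab))

# Cramer's V: chi-square scaled by n and the smaller table dimension minus 1
n <- sum(tab)
cramer_v <- sqrt(unname(indep$statistic) / (n * (min(dim(tab)) - 1)))

print(addmargins(tab))
cat("Chisq =", round(indep$statistic, 3), " V =", round(cramer_v, 3), "\n")
```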
## >>> Suggestions
## Plot(Dept, Gender)  # bubble plot
## BarChart(Dept, by=Gender, horiz=TRUE)  # horizontal bar chart
## BarChart(Dept, fill="steelblue")  # steelblue bars 
## 
## 
## Joint and Marginal Frequencies 
## ------------------------------ 
##  
##        Dept 
## Gender   ACCT ADMN FINC MKTG SALE Sum 
##   F         3    4    1    5    5  18 
##   M         2    2    3    1   10  18 
##   Sum       5    6    4    6   15  36 
## 
## 
## Cramer's V: 0.415 
##  
## Chi-square Test:  Chisq = 6.200, df = 4, p-value = 0.185 
## >>> Low cell expected frequencies, chi-squared approximation may not be accurate
The stacked version is the default, but the levels of the second categorical variable can also be displayed as side-by-side bars, which makes their values easier to compare with each other.
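Base R's barplot() draws side-by-side bars from a two-way table with beside = TRUE. The sketch below is a base R analog built from the Gender by Dept counts printed in this vignette, not the lessR call. With beside = TRUE, barplot() returns one midpoint for each cell, arranged in a matrix with the same dimensions as the table.

```r
tab <- matrix(c(3, 4, 1, 5, 5,
                2, 2, 3, 1, 10),
              nrow = 2, byrow = TRUE,
              dimnames = list(Gender = c("F", "M"),
                              Dept = c("ACCT", "ADMN", "FINC", "MKTG", "SALE")))

# One group of bars per Dept, one bar per Gender within each group
mid <- barplot(tab, beside = TRUE, legend.text = rownames(tab))
print(dim(mid))   # rows = Gender levels, columns = Dept levels
```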
## >>> Suggestions
## Plot(Dept, Gender)  # bubble plot
## BarChart(Dept, by=Gender, horiz=TRUE)  # horizontal bar chart
## BarChart(Dept, fill="steelblue")  # steelblue bars 
## 
## 
## Joint and Marginal Frequencies 
## ------------------------------ 
##  
##        Dept 
## Gender   ACCT ADMN FINC MKTG SALE Sum 
##   F         3    4    1    5    5  18 
##   M         2    2    3    1   10  18 
##   Sum       5    6    4    6   15  36 
## 
## 
## Cramer's V: 0.415 
##  
## Chi-square Test:  Chisq = 6.200, df = 4, p-value = 0.185 
## >>> Low cell expected frequencies, chi-squared approximation may not be accurate
Alternatively, create a Trellis chart with the by1 parameter.
## [Trellis graphics from Deepayan Sarkar's lattice package]
## Joint and Marginal Frequencies 
## ------------------------------ 
##  
##        Dept 
## Gender   ACCT ADMN FINC MKTG SALE Sum 
##   F         3    4    1    5    5  18 
##   M         2    2    3    1   10  18 
##   Sum       5    6    4    6   15  36 
## 
## 
## Cramer's V: 0.415 
##  
## Chi-square Test:  Chisq = 6.200, df = 4, p-value = 0.185 
## >>> Low cell expected frequencies, chi-squared approximation may not be accurate
Or, stack the charts vertically by specifying one column with the n_col parameter. Turn off text output to the console with the quiet parameter set to TRUE.
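Because the lessR Trellis output comes from the lattice package, the layout can be sketched directly with lattice::barchart(). This sketch assumes long-format counts rebuilt from the printed Gender by Dept table. It uses lattice's own arguments, not lessR's: layout = c(1, 2) requests 1 column and 2 rows, and origin = 0 starts the bars at zero.

```r
library(lattice)

# Long format: one row per Gender x Dept cell, counts from the printed table
cells <- data.frame(
  Gender = factor(rep(c("F", "M"), each = 5)),
  Dept   = factor(rep(c("ACCT", "ADMN", "FINC", "MKTG", "SALE"), times = 2)),
  Count  = c(3, 4, 1, 5, 5, 2, 2, 3, 1, 10)
)

# One panel per Gender, stacked in a single column; bars start at zero
p <- barchart(Count ~ Dept | Gender, data = cells,
              layout = c(1, 2), origin = 0)
print(p)
```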
Obtain the 100% stacked version with the stack100 parameter. This visualization is most useful for comparing levels of the by variable across levels of the x variable, here Dept, when the frequencies in each level of the x variable differ. The comparisons are done with the percentage in each category instead of the count.
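A 100% stacked bar divides each cell count by its column total, so every Dept bar sums to 1. A base R sketch of that computation follows, using the printed Gender by Dept counts. It reproduces the “Cell Proportions within Each Column” that BarChart() reports and then stacks them with barplot().

```r
tab <- matrix(c(3, 4, 1, 5, 5,
                2, 2, 3, 1, 10),
              nrow = 2, byrow = TRUE,
              dimnames = list(Gender = c("F", "M"),
                              Dept = c("ACCT", "ADMN", "FINC", "MKTG", "SALE")))

# Proportions within each column (margin = 2): each Dept column sums to 1
col_prop <- prop.table(tab, margin = 2)
print(round(col_prop, 3))

# Stacked bars of proportions: every bar has height 1
barplot(col_prop, legend.text = rownames(tab))
```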
## >>> Suggestions
## Plot(Dept, Gender)  # bubble plot
## BarChart(Dept, by=Gender, horiz=TRUE)  # horizontal bar chart
## BarChart(Dept, fill="steelblue")  # steelblue bars 
## 
## 
## Joint and Marginal Frequencies 
## ------------------------------ 
##  
##        Dept 
## Gender   ACCT ADMN FINC MKTG SALE Sum 
##   F         3    4    1    5    5  18 
##   M         2    2    3    1   10  18 
##   Sum       5    6    4    6   15  36 
## 
## 
## Cramer's V: 0.415 
##  
## Chi-square Test:  Chisq = 6.200, df = 4, p-value = 0.185 
## >>> Low cell expected frequencies, chi-squared approximation may not be accurate 
## 
## 
## Cell Proportions within Each Column 
## ----------------------------------- 
##  
##        Dept 
## Gender    ACCT  ADMN  FINC  MKTG  SALE 
##   F      0.600 0.667 0.250 0.833 0.333 
##   M      0.400 0.333 0.750 0.167 0.667 
##   Sum    1.000 1.000 1.000 1.000 1.000
Use the base R help() function to view the full manual for BarChart(). Simply enter a question mark followed by the name of the function.
?BarChart