Overview of Datasets

The {avocado} package consists of three different datasets that summarize the weekly sales of Hass Avocados at different regional levels.

PLU

The product/price lookup code (PLU) uniquely identifies a product (mainly produce). The Hass Avocado Board focuses on six different PLUs:

Bags vs. PLU

Another distinction that the HAB makes is between bags and bulk. Bulk typically means avocados sold as individual pieces, which are easily distinguished by their PLU codes. Hence, a PLU refers to a bulk sale. Bags, on the other hand, are pre-packaged containers holding a variable number of avocados of mixed PLU types. For instance, a bag of six avocados may consist of two PLU 4046, three PLU 4770, and one PLU 4225. In other words, bagged sales cannot be broken down into individual PLU sales.
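To make the bulk/bag split concrete, here is a minimal sketch that adds up the bulk (PLU) columns and the bagged columns for each week. The helper name bulk_vs_bag is hypothetical and not part of the package. It assumes the column naming shown in the dataset listings (bulk columns start with "plu", bagged columns end with "_bag") and that these columns are additive figures in the same unit.

```r
# Hypothetical helper (not part of {avocado}): per-row totals of bulk and
# bagged sales, based on the column naming used in the package datasets.
bulk_vs_bag <- function(df) {
  plu_cols <- grep("^plu", names(df), value = TRUE)   # bulk, one column per PLU
  bag_cols <- grep("_bag$", names(df), value = TRUE)  # bagged, mixed PLUs
  data.frame(
    week_ending = df$week_ending,
    bulk        = rowSums(df[, plu_cols, drop = FALSE], na.rm = TRUE),
    bagged      = rowSums(df[, bag_cols, drop = FALSE], na.rm = TRUE)
  )
}

# Usage with the packaged data, only if {avocado} is installed.
if (requireNamespace("avocado", quietly = TRUE)) {
  data("hass_usa", package = "avocado")
  print(head(bulk_vs_bag(hass_usa)))
}
```

Because bags mix PLU types, the bagged total cannot be split back into per-PLU figures; only the bulk columns carry that detail.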

Region vs. Location

The hass_region and hass datasets share a variable called region, and the hass dataset has an additional variable called location. Regions are defined by the Hass Avocado Board, and Locations are selected cities or sub-regions within a Region. Because only selected Locations are reported, the totals for all Locations within a Region will not equal the total for that Region. For convenience, here is a breakdown of the Regions and Locations:
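The same breakdown can also be recovered from the data itself. The sketch below uses two hypothetical helpers (neither is part of the package): region_location_map lists the distinct region/location pairs in hass, and location_coverage compares the summed Location values with the Region value for one column. It assumes both datasets use the same week_ending values and the same units.

```r
# Hypothetical helper: distinct region/location pairs found in `hass`.
region_location_map <- function(hass) {
  pairs <- unique(as.data.frame(hass[, c("region", "location")]))
  pairs <- pairs[order(pairs$region, pairs$location), ]
  rownames(pairs) <- NULL
  pairs
}

# Hypothetical helper: for one column, the share of each Region's weekly
# value that is accounted for by the Locations listed under that Region.
location_coverage <- function(hass, hass_region, col = "plu4046") {
  loc_sum <- aggregate(
    hass[[col]],
    by = list(week_ending = hass$week_ending, region = hass$region),
    FUN = sum
  )
  names(loc_sum)[3] <- "location_total"

  reg <- as.data.frame(hass_region[, c("week_ending", "region", col)])
  names(reg)[3] <- "region_total"

  out <- merge(loc_sum, reg, by = c("week_ending", "region"))
  out$coverage <- out$location_total / out$region_total
  out
}

# Usage with the packaged data, only if {avocado} is installed.
if (requireNamespace("avocado", quietly = TRUE)) {
  data("hass", package = "avocado")
  data("hass_region", package = "avocado")
  print(region_location_map(hass))
  print(head(location_coverage(hass, hass_region)))
}
```

A coverage value different from 1 shows the mismatch between Location totals and Region totals described above.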

Datasets

hass_usa

The hass_usa dataset focuses on weekly Hass Avocado sales at the country (i.e., contiguous US) level and consists of the following fields:

library(avocado)
data('hass_usa')
dplyr::glimpse(hass_usa)
#> Rows: 197
#> Columns: 15
#> $ week_ending       <dttm> 2017-01-02, 2017-01-08, 2017-01-15, 2017-01-22, 20…
#> $ avg_price_nonorg  <dbl> 0.89, 0.99, 0.98, 0.94, 0.96, 0.77, 0.87, 0.99, 0.9…
#> $ plu4046           <dbl> 12707895, 11809728, 12936858, 14254150, 14034076, 2…
#> $ plu4225           <dbl> 14201201, 13856935, 12625665, 14212882, 11683465, 2…
#> $ plu4770           <dbl> 549844.6, 539068.4, 579346.5, 908616.4, 818727.0, 1…
#> $ small_nonorg_bag  <dbl> 8551134, 9332972, 9445622, 9462854, 9918256, 125671…
#> $ large_nonorg_bag  <dbl> 2802709, 2432259, 2638918, 3231020, 2799961, 361827…
#> $ xlarge_nonorg_bag <dbl> 66933.73, 78840.96, 69077.52, 70871.48, 119095.66, …
#> $ avg_price_org     <dbl> 1.48, 1.43, 1.44, 1.37, 1.43, 1.36, 1.41, 1.31, 1.2…
#> $ plu94046          <dbl> 99820.77, 117721.87, 121132.79, 115700.02, 131808.2…
#> $ plu94225          <dbl> 273329.61, 283532.68, 280287.84, 281786.33, 292584.…
#> $ plu94770          <dbl> 4425.75, 8697.35, 8750.08, 6707.82, 3524.96, 2238.7…
#> $ small_org_bag     <dbl> 273456.00, 379088.19, 400292.46, 401805.90, 346391.…
#> $ large_org_bag     <dbl> 157939.75, 201681.74, 165261.11, 188282.91, 178896.…
#> $ xlarge_org_bag    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
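As an illustration of how these fields combine, the following sketch computes the organic share of weekly sales. The helper organic_share is hypothetical. It assumes that columns starting with "plu9" or ending in "_org_bag" are organic, that columns starting with "plu4" or ending in "_nonorg_bag" are non-organic, and that all of them are additive in the same unit.

```r
# Hypothetical helper: weekly organic share of total (bulk + bagged) sales.
organic_share <- function(df) {
  org_cols    <- grep("^plu9|_org_bag$", names(df), value = TRUE)
  nonorg_cols <- grep("^plu4|_nonorg_bag$", names(df), value = TRUE)

  org    <- rowSums(df[, org_cols, drop = FALSE], na.rm = TRUE)
  nonorg <- rowSums(df[, nonorg_cols, drop = FALSE], na.rm = TRUE)

  data.frame(
    week_ending   = df$week_ending,
    organic       = org,
    nonorganic    = nonorg,
    organic_share = org / (org + nonorg)
  )
}

# Usage with the packaged data, only if {avocado} is installed.
if (requireNamespace("avocado", quietly = TRUE)) {
  data("hass_usa", package = "avocado")
  print(head(organic_share(hass_usa)))
}
```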

hass_region

The hass_region dataset focuses on weekly US Hass Avocado sales at the region level and consists of the following fields:

library(avocado)
data('hass_region')
dplyr::glimpse(hass_region)
#> Rows: 1,576
#> Columns: 16
#> $ week_ending       <dttm> 2017-01-02, 2017-01-02, 2017-01-02, 2017-01-02, 20…
#> $ region            <chr> "California", "Great Lakes", "Midsouth", "Northeast…
#> $ avg_price_nonorg  <dbl> 0.89, 0.88, 1.12, 1.35, 0.83, 0.64, 0.94, 0.79, 1.0…
#> $ plu4046           <dbl> 2266313.4, 636277.7, 653896.0, 174842.6, 1462454.2,…
#> $ plu4225           <dbl> 2877688.3, 2157249.6, 1285364.3, 2589315.7, 509659.…
#> $ plu4770           <dbl> 90899.53, 189356.95, 64703.08, 39606.96, 4780.52, 2…
#> $ small_nonorg_bag  <dbl> 1762033.7, 885769.2, 719379.6, 659611.8, 387098.0, …
#> $ large_nonorg_bag  <dbl> 151333.95, 349032.80, 151226.51, 49532.71, 13008.21…
#> $ xlarge_nonorg_bag <dbl> 27007.76, 7559.14, 4398.04, 478.61, 5742.12, 16334.…
#> $ avg_price_org     <dbl> 1.46, 1.44, 1.72, 2.00, 1.62, 1.23, 1.43, 1.19, 1.5…
#> $ plu94046          <dbl> 31267.65, 4104.86, 3353.64, 9132.13, 4547.17, 22474…
#> $ plu94225          <dbl> 65430.27, 69800.29, 36090.72, 36276.39, 15245.73, 4…
#> $ plu94770          <dbl> 6.16, 0.00, 1813.60, 923.53, 1366.36, 17.37, 262.28…
#> $ small_org_bag     <dbl> 50963.98, 23593.03, 29203.33, 65447.53, 17253.74, 3…
#> $ large_org_bag     <dbl> 16468.98, 6655.73, 1826.50, 3476.51, 8629.21, 1968.…
#> $ xlarge_org_bag    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
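A short sketch of a region-level summary: the mean price difference between organic and non-organic avocados per region, using the two avg_price columns listed above. The helper mean_premium_by_region is hypothetical, and the unweighted mean over weeks is a simplification chosen for illustration.

```r
# Hypothetical helper: unweighted mean of (organic - non-organic) average
# price per region, sorted from largest to smallest difference.
mean_premium_by_region <- function(hass_region) {
  premium <- hass_region$avg_price_org - hass_region$avg_price_nonorg
  means <- tapply(premium, hass_region$region, mean, na.rm = TRUE)
  means[order(means, decreasing = TRUE)]
}

# Usage with the packaged data, only if {avocado} is installed.
if (requireNamespace("avocado", quietly = TRUE)) {
  data("hass_region", package = "avocado")
  print(mean_premium_by_region(hass_region))
}
```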

hass

The hass dataset summarizes weekly Hass Avocado sales within the contiguous US by city or sub-region. These areas are defined by the HAB and make up portions of the region field in the hass_region dataset. The fields are:

library(avocado)
data('hass')
dplyr::glimpse(hass)
#> Rows: 8,865
#> Columns: 17
#> $ week_ending       <dttm> 2017-01-02, 2017-01-02, 2017-01-02, 2017-01-02, 20…
#> $ location          <chr> "Albany", "Atlanta", "Baltimore/Washington", "Boise…
#> $ region            <chr> "Northeast", "Southeast", "Midsouth", "West", "Nort…
#> $ avg_price_nonorg  <dbl> 1.47, 0.93, 1.47, 0.92, 1.29, 1.43, 1.21, 1.15, 0.6…
#> $ plu4046           <dbl> 4845.77, 224073.54, 54530.42, 27845.16, 4119.90, 12…
#> $ plu4225           <dbl> 117027.41, 118926.37, 408952.26, 9408.92, 371223.34…
#> $ plu4770           <dbl> 200.36, 337.48, 14387.01, 11341.75, 3933.72, 102.52…
#> $ small_nonorg_bag  <dbl> 7866.86, 111599.58, 151345.59, 53093.47, 79339.78, …
#> $ large_nonorg_bag  <dbl> 7.83, 92628.91, 2542.41, 2793.61, 213.75, 255.65, 1…
#> $ xlarge_nonorg_bag <dbl> 0.00, 0.00, 3.12, 27.20, 0.00, 18.06, 46.67, 5089.3…
#> $ avg_price_org     <dbl> 1.87, 1.81, 1.92, 1.05, 2.06, 1.64, 1.70, 1.34, 1.2…
#> $ plu94046          <dbl> 71.65, 956.73, 1420.47, 0.00, 14.80, 8.52, 120.83, …
#> $ plu94225          <dbl> 192.63, 2862.95, 6298.07, 368.63, 2181.53, 320.56, …
#> $ plu94770          <dbl> 0.00, 0.00, 325.44, 0.00, 0.00, 0.00, 489.12, 0.00,…
#> $ small_org_bag     <dbl> 1112.42, 5.55, 5857.48, 577.91, 10636.25, 2585.10, …
#> $ large_org_bag     <dbl> 0.00, 1517.62, 0.00, 1877.28, 605.64, 511.31, 353.9…
#> $ xlarge_org_bag    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
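Since hass holds one row per location and week, a common first step is to pull out the series for a single location. The helper location_series below is hypothetical; the location value "Albany" is taken from the listing above.

```r
# Hypothetical helper: weekly values of one column for one location,
# ordered by week_ending.
location_series <- function(hass, loc, col = "avg_price_nonorg") {
  rows <- as.data.frame(hass[hass$location == loc, c("week_ending", col)])
  rows <- rows[order(rows$week_ending), ]
  rownames(rows) <- NULL
  rows
}

# Usage with the packaged data, only if {avocado} is installed.
if (requireNamespace("avocado", quietly = TRUE)) {
  data("hass", package = "avocado")
  print(head(location_series(hass, "Albany")))
}
```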