# "runexp" R package

Authors: Annie Sauer (anniees@vt.edu) and Sierra Merkes (smerkes@vt.edu)

Implements two methods of estimating runs scored in a softball scenario: (1) theoretical expectation using discrete Markov chains and (2) empirical distribution using multinomial random simulation. Scores are based on player-specific input probabilities (out, single, double, triple, walk, and homerun). Optional inputs include the probability of attempting a steal, the probability of succeeding in an attempted steal, and an indicator of whether a player is "fast" (e.g. the player could stretch home). These probabilities may be calculated from common player statistics that are publicly available on team webpages. Scores are evaluated based on a nine-player lineup and may be used to compare lineups, evaluate base scenarios, and compare the offensive potential of individual players. Manuscript forthcoming. See Bukiet & Harold (1997) for implementation of discrete Markov chains.

Functions:

* `chain`: calculates run expectancy using discrete Markov chains
* `sim`: estimates run expectancy using multinomial simulation
* `plot.chain`: S3 method for plotting `chain` output objects
* `prob_calc`: calculates player probabilities from commonly available stats
* `scrape`: scrapes player statistics from a given URL

Data Files:

* `wku_stats`: player statistics for the 2013 Western Kentucky University softball team
* `wku_probs`: calculated player probabilities for the 2013 Western Kentucky University softball team

Examples:

```
# scrape ----------------------------------------------------------------------

url <- "https://wmubroncos.com/sports/softball/stats/2019"
test <- scrape(url)
test_probs <- prob_calc(test)

# prob_calc -------------------------------------------------------------------

probs <- prob_calc(wku_stats) # probs corresponds to wku_probs

# chain -----------------------------------------------------------------------

# Expected score for single batter (termed "offensive potential")
chain1 <- chain("B", wku_probs)
plot(chain1)

# Expected score without cycling
lineup <- wku_probs$name[1:9]
chain2 <- chain(lineup, wku_probs)
plot(chain2)

# Expected score with cycling
chain3 <- chain(lineup, wku_probs, cycle = TRUE)
plot(chain3, type = 1:3)

# sim -------------------------------------------------------------------------

# Short simulation (designed to run in less than 5 seconds)
sim1 <- sim("B", wku_probs, inn = 1, reps = 100, cores = 1)

# Simulation with interactive graphic
lineup <- wku_probs$name[1:9]
sim2 <- sim(lineup, wku_probs, inn = 7, reps = 1, graphic = TRUE)

# Simulation for entire game (recommended to increase cores)
sim3 <- sim(lineup, wku_probs, cores = 1)
boxplot(sim3$score)
points(1, sim3$score_avg_game)

# game situation comparison of chain and sim ----------------------------------

# Select lineup made up of the nine "starters"
lineup <- sample(wku_probs$name[1:9], 9)

# Average chain across lead-off batters
chain_avg <- mean(chain(lineup, wku_probs, cycle = TRUE)$score)

# Simulate full 7 inning game (recommended to increase cores)
sim_score <- sim(lineup, wku_probs, inn = 7, reps = 50000, cores = 1)

# Split into bins in order to plot averages
sim_grouped <- split(sim_score$score, rep(1:100, times = 50000 / 100))

boxplot(sapply(sim_grouped, mean), ylab = 'Expected Score for Game')
points(1, sim_score$score_avg_game, pch = 16, cex = 2, col = 2)
points(1, chain_avg * 7, pch = 18, cex = 2, col = 3)
```
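
To give a feel for the multinomial simulation idea behind `sim`, here is a minimal base-R sketch that does not use the package. It is an illustration under stated assumptions, not the package's implementation: the probability vector `p` is made up, the helper `sim_inning` is hypothetical, every batter shares the same probabilities, runners advance exactly as many bases as the batter on a hit, only forced runners advance on a walk, and steals and "fast" players are ignored.

```r
# Hypothetical outcome probabilities for one batter (illustrative values only).
p <- c(out = 0.65, single = 0.18, double = 0.05, triple = 0.01,
       walk = 0.08, homerun = 0.03)

# Simulate one inning in which every plate appearance is drawn from `p`.
# Simplifying assumptions (not taken from the package):
#   * on a hit, each runner advances as many bases as the batter
#   * on a walk, only forced runners advance
sim_inning <- function(p) {
  bases <- c(FALSE, FALSE, FALSE)  # occupancy of first, second, third
  outs <- 0
  runs <- 0
  while (outs < 3) {
    # One categorical draw, i.e. a multinomial draw of size 1
    event <- sample(names(p), size = 1, prob = p)
    if (event == "out") {
      outs <- outs + 1
    } else if (event == "walk") {
      if (all(bases)) {
        runs <- runs + 1  # bases loaded: the runner on third is forced home
      } else {
        # Forced runners shift up to the first empty base; batter takes first
        first_empty <- which(!bases)[1]
        bases[seq_len(first_empty)] <- TRUE
      }
    } else {
      n <- switch(event, single = 1, double = 2, triple = 3, homerun = 4)
      # Runners (positions 1 to 3) and the batter (position 0) all advance n bases
      pos <- c(which(bases), 0) + n
      runs <- runs + sum(pos >= 4)
      bases <- 1:3 %in% pos
    }
  }
  runs
}

set.seed(1)
runs <- replicate(10000, sim_inning(p))
mean(runs)  # Monte Carlo estimate of expected runs per inning
```

Averaging over many replications plays the same role as `reps` in `sim`: the sample mean approximates the expectation that a Markov chain approach computes directly, so the two can be compared as in the last example above.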