---
title: "Code Review and Testing with gooseR"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Code Review and Testing with gooseR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

```{r setup}
library(gooseR)
```

## Introduction

gooseR provides code review that goes beyond static analysis. Where traditional linters check syntax and style against fixed rules, gooseR reads your code and gives context-aware feedback based on what you are trying to accomplish.

## The Power of goose_honk()

The `goose_honk()` function is your intelligent code reviewer. It offers four severity levels, each with a different personality and focus:

### Severity Levels

```{r}
# Gentle - Encouraging and constructive
goose_honk(severity = "gentle")
# "I notice you're using a loop here. Have you considered using lapply()? 
#  It might be more efficient! Your code structure looks good overall! 🦆"

# Moderate - Balanced and professional
goose_honk(severity = "moderate")
# "Your loop on line 15 could be replaced with vectorized operations for 
#  better performance. Also, consider adding error handling for the mean() 
#  function in case of NA values."

# Harsh - Direct and critical
goose_honk(severity = "harsh")
# "That loop is killing your performance. Use vectorization. No error 
#  handling? That's asking for production failures. Fix the variable 
#  naming - 'x' and 'df' tell me nothing."

# Brutal - No holds barred
goose_honk(severity = "brutal")
# "This code is a disaster. Loops in R? Really? No error handling, 
#  meaningless variable names, and zero documentation. Did you even 
#  test this? Start over and do it right."
```

## Real-World Example: Data Analysis Script

Let's see how `goose_honk()` helps improve a typical analysis script:

```{r}
# A typical analysis script with issues
analyze_sales <- function(sales_data) {
  # Calculate totals
  total = 0
  for(i in 1:nrow(sales_data)) {
    total = total + sales_data$amount[i]
  }
  
  # Get average
  avg = mean(sales_data$amount)
  
  # Find best month
  best = sales_data[sales_data$amount == max(sales_data$amount),]
  
  # Make plot
  plot(sales_data$amount)
  
  return(list(total, avg, best))
}

# Get gentle feedback first
goose_honk(severity = "gentle")
```

### Gentle Review Output:
```
🦆 Hey there! Your analyze_sales function is a great start! Here are some friendly suggestions:

✨ Opportunities for improvement:
• The loop for calculating total could be replaced with sum(sales_data$amount) - much faster!
• Consider checking for NA values in your mean calculation: mean(sales_data$amount, na.rm = TRUE)
• Your plot could use some labels to make it clearer

💡 Quick wins:
• Named list elements would make your return value clearer
• Adding input validation would make your function more robust

Keep up the good work! Your logic is sound! 🌟
```
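
The first two suggestions can be checked in base R without gooseR. The block below is a minimal sketch on made-up data (the `sales_data` values are an assumption for illustration); it compares the loop with `sum()` and shows what a single `NA` does to the results.

```r
# Made-up data for illustration; one amount is missing
sales_data <- data.frame(amount = c(120, 80, NA, 300))

# Loop from the original function: one NA turns the whole total into NA
total <- 0
for (i in seq_len(nrow(sales_data))) {
  total <- total + sales_data$amount[i]
}
total
#> [1] NA

# Vectorized equivalents, with and without NA handling
sum(sales_data$amount)
#> [1] NA
sum(sales_data$amount, na.rm = TRUE)
#> [1] 500
mean(sales_data$amount)
#> [1] NA
mean(sales_data$amount, na.rm = TRUE)
#> [1] 166.6667
```

Note that `mean()` does not raise an error on missing values; it silently returns `NA`, which is why the problem is easy to miss.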

```{r}
# Now let's get more critical feedback
goose_honk(severity = "harsh")
```

### Harsh Review Output:
```
Your analyze_sales function has serious issues:

PERFORMANCE PROBLEMS:
- Lines 3-6: That loop is inexcusable. Use sum(sales_data$amount). You're writing R like it's C.
- This will be 100x slower on large datasets.

BUGS WAITING TO HAPPEN:
- No NA handling in mean() - this WILL break in production
- max() without na.rm = TRUE is a ticking time bomb
- What if multiple months have the same max value? Your subsetting is naive.

POOR PRACTICES:
- Unnamed return list? How is anyone supposed to use this?
- No input validation - what if sales_data is NULL? Empty? Missing columns?
- Base R plotting in 2024? At least use ggplot2.

Fix these issues before this sees production.
```
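
Review comments are worth checking against actual behavior before acting on them. The sketch below uses a small made-up `sales` data frame (an assumption for illustration) to show how the original `==` subsetting behaves with tied maxima and with missing values.

```r
# Made-up data: Feb and Apr tie for the maximum, Mar is missing
sales <- data.frame(
  month  = c("Jan", "Feb", "Mar", "Apr"),
  amount = c(100, 250, NA, 250)
)

# max() without na.rm returns NA, so every comparison is NA and the
# subset comes back as four rows of NA
naive <- sales[sales$amount == max(sales$amount), ]
nrow(naive)
#> [1] 4
all(is.na(naive$month))
#> [1] TRUE

# With na.rm = TRUE both tied months are kept, but the NA comparison
# still contributes a row of NA
partial <- sales[sales$amount == max(sales$amount, na.rm = TRUE), ]
partial$month
#> [1] "Feb" NA    "Apr"

# which() keeps only the positions where the comparison is TRUE
best <- sales[which(sales$amount == max(sales$amount, na.rm = TRUE)), ]
best$month
#> [1] "Feb" "Apr"
```

In this sketch the `==` comparison already returns every tied row, so the part that needs care is the missing value rather than the tie.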

## Improved Version After Review

Based on the feedback, here's an improved version:

```{r}
library(ggplot2)  # attach once at the top level, not inside the function

analyze_sales <- function(sales_data) {
  # Input validation
  if (!is.data.frame(sales_data) || nrow(sales_data) == 0) {
    stop("sales_data must be a non-empty data frame")
  }
  if (!"amount" %in% names(sales_data)) {
    stop("sales_data must contain 'amount' column")
  }
  
  # Calculate metrics with NA handling
  total_sales <- sum(sales_data$amount, na.rm = TRUE)
  avg_sales <- mean(sales_data$amount, na.rm = TRUE)
  
  # Find best months (keeps ties, drops rows where amount is NA)
  max_amount <- max(sales_data$amount, na.rm = TRUE)
  best_months <- sales_data[sales_data$amount == max_amount & 
                           !is.na(sales_data$amount), ]
  
  # Create informative visualization
  p <- ggplot(sales_data, aes(x = seq_along(amount), y = amount)) +
    geom_line() +
    geom_point() +
    theme_brand("block") +
    labs(title = "Sales Trend", x = "Period", y = "Sales Amount")
  
  print(p)
  
  # Return named list
  return(list(
    total = total_sales,
    average = avg_sales,
    best_months = best_months,
    plot = p
  ))
}

# Check our improvements
goose_honk(severity = "moderate")
```
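
One further refinement a review might prompt is separating the calculations from the plotting, so the numbers can be checked without drawing anything. The block below is a sketch under that assumption: `sales_metrics()` is a hypothetical helper, not part of gooseR, that mirrors the metric steps of `analyze_sales()`.

```r
# Hypothetical helper: the metric calculations from analyze_sales(),
# without the plotting step
sales_metrics <- function(sales_data) {
  if (!is.data.frame(sales_data) || nrow(sales_data) == 0) {
    stop("sales_data must be a non-empty data frame")
  }
  if (!"amount" %in% names(sales_data)) {
    stop("sales_data must contain 'amount' column")
  }
  amount <- sales_data$amount
  max_amount <- max(amount, na.rm = TRUE)
  list(
    total = sum(amount, na.rm = TRUE),
    average = mean(amount, na.rm = TRUE),
    # which() keeps ties and drops positions where the comparison is NA
    best_months = sales_data[which(amount == max_amount), , drop = FALSE]
  )
}

# Made-up data for illustration
sales <- data.frame(month = month.abb[1:4], amount = c(120, NA, 300, 300))
metrics <- sales_metrics(sales)
metrics$total
#> [1] 720
metrics$average
#> [1] 240
metrics$best_months$month
#> [1] "Mar" "Apr"

# The validation branch raises an error that tryCatch() can capture
error_msg <- tryCatch(
  sales_metrics(data.frame(price = 1)),
  error = conditionMessage
)
error_msg
#> [1] "sales_data must contain 'amount' column"
```

A plotting function can then take the returned list or the data frame as input, which keeps each piece small enough to review and test on its own.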

## Context-Aware Analysis

`goose_honk()` understands different types of R code:

### Data Manipulation
```{r}
# It recognizes dplyr chains
library(dplyr)  # provides %>%, filter(), group_by(), summarise()

result <- data %>%
  filter(x > 10) %>%
  group_by(category) %>%
  summarise(mean = mean(value))

goose_honk()
# "Good use of dplyr! Consider adding .groups = 'drop' to summarise() 
#  to avoid the grouped data frame warning."
```

### Statistical Models
```{r}
# It understands modeling
model <- lm(mpg ~ wt + cyl, data = mtcars)

goose_honk()
# "Linear model looks good. Have you checked assumptions? 
#  Consider plot(model) for diagnostics. Also, you might want 
#  to check for multicollinearity between wt and cyl."
```
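
The multicollinearity suggestion can be followed up in base R. This is a minimal sketch rather than a full diagnostic workflow: it computes the correlation between the two predictors and the variance inflation factor, which for a model with exactly two predictors reduces to `1 / (1 - r^2)`.

```r
model <- lm(mpg ~ wt + cyl, data = mtcars)

# Correlation between the two predictors
r <- cor(mtcars$wt, mtcars$cyl)

# With exactly two predictors, each predictor's variance inflation
# factor is 1 / (1 - r^2); with more predictors, r^2 is replaced by the
# R-squared from regressing that predictor on all the others
vif <- 1 / (1 - r^2)

round(c(correlation = r, vif = vif), 2)

# Residual and Q-Q plots are available through plot(model)
```

Common rules of thumb treat a VIF above 5 or 10 as a sign of problematic collinearity, but the threshold is a judgment call that depends on the analysis.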

### Visualization
```{r}
# It recognizes ggplot2
library(ggplot2)

p <- ggplot(data, aes(x, y)) + geom_point()

goose_honk()
# "Basic scatter plot. Consider adding labels with labs(), 
#  applying a theme, and perhaps adding a trend line with 
#  geom_smooth() if appropriate."
```

## Generating Tests

gooseR can generate test suites for your functions:

```{r}
# Your function
calculate_bmi <- function(weight_kg, height_m) {
  if (height_m <= 0) stop("Height must be positive")
  if (weight_kg <= 0) stop("Weight must be positive")
  
  bmi <- weight_kg / (height_m ^ 2)
  
  category <- if (bmi < 18.5) "Underweight"
  else if (bmi < 25) "Normal"
  else if (bmi < 30) "Overweight"
  else "Obese"
  
  return(list(bmi = bmi, category = category))
}

# Generate tests
tests <- goose_generate_tests("calculate_bmi")
cat(tests)
```

### Generated Tests:
```r
test_that("calculate_bmi works correctly", {
  # Test normal case
  result <- calculate_bmi(70, 1.75)
  expect_equal(result$bmi, 22.86, tolerance = 0.01)
  expect_equal(result$category, "Normal")
  
  # Test the remaining categories
  expect_equal(calculate_bmi(50, 1.8)$category, "Underweight")
  expect_equal(calculate_bmi(85, 1.75)$category, "Overweight")
  expect_equal(calculate_bmi(100, 1.7)$category, "Obese")
  
  # Test error conditions
  expect_error(calculate_bmi(0, 1.75), "Weight must be positive")
  expect_error(calculate_bmi(70, 0), "Height must be positive")
  expect_error(calculate_bmi(-70, 1.75), "Weight must be positive")
  
  # Test boundary conditions
  result_boundary <- calculate_bmi(56.25, 1.5)  # Exactly BMI = 25
  expect_equal(result_boundary$bmi, 25)
})
```
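
Generated tests are a starting point, and the category thresholds are a natural place to add checks of your own. The sketch below copies `calculate_bmi()` from the example so that it runs on its own, then checks the category on and just below each threshold. It uses base R `stopifnot()` instead of testthat, and a height of 2 m is chosen so that BMI is simply weight / 4 and the division is exact.

```r
# Copied from the example above so this block is self-contained
calculate_bmi <- function(weight_kg, height_m) {
  if (height_m <= 0) stop("Height must be positive")
  if (weight_kg <= 0) stop("Weight must be positive")

  bmi <- weight_kg / (height_m ^ 2)

  category <- if (bmi < 18.5) "Underweight"
  else if (bmi < 25) "Normal"
  else if (bmi < 30) "Overweight"
  else "Obese"

  return(list(bmi = bmi, category = category))
}

# Exactly on each threshold: the comparisons are strict (<), so the
# value falls into the higher category
stopifnot(
  calculate_bmi(74, 2)$category == "Normal",       # BMI 18.5
  calculate_bmi(100, 2)$category == "Overweight",  # BMI 25
  calculate_bmi(120, 2)$category == "Obese"        # BMI 30
)

# Just below each threshold: the value stays in the lower category
stopifnot(
  calculate_bmi(73.9, 2)$category == "Underweight",  # BMI 18.475
  calculate_bmi(99.9, 2)$category == "Normal",       # BMI 24.975
  calculate_bmi(119.9, 2)$category == "Overweight"   # BMI 29.975
)
```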

## Generating Documentation

Let gooseR write your roxygen2 documentation:

```{r}
# Your function
clean_text <- function(text, remove_numbers = FALSE, lowercase = TRUE) {
  if (lowercase) text <- tolower(text)
  text <- gsub("[[:punct:]]", " ", text)
  if (remove_numbers) text <- gsub("[0-9]", "", text)
  text <- gsub("\\s+", " ", text)
  trimws(text)
}

# Generate documentation
docs <- goose_document("clean_text")
cat(docs)
```

### Generated Documentation:
```r
#' Clean and Preprocess Text Data
#'
#' Performs text cleaning operations including punctuation removal,
#' case conversion, number removal, and whitespace normalization.
#'
#' @param text Character vector. The text to be cleaned.
#' @param remove_numbers Logical. If TRUE, removes all numeric characters
#'   from the text. Default is FALSE.
#' @param lowercase Logical. If TRUE, converts all text to lowercase.
#'   Default is TRUE.
#'
#' @return Character vector of the same length as input with cleaned text.
#'   Punctuation is replaced with spaces, multiple spaces are collapsed
#'   to single spaces, and leading/trailing whitespace is removed.
#'
#' @examples
#' clean_text("Hello, World! 123")
#' # [1] "hello world 123"
#' 
#' clean_text("Hello, World! 123", remove_numbers = TRUE)
#' # [1] "hello world"
#' 
#' clean_text("HELLO WORLD", lowercase = FALSE)
#' # [1] "HELLO WORLD"
#'
#' @export
```
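
Generated `@examples` are easy to verify before you commit them. The sketch below repeats the `clean_text()` definition so that it runs on its own and compares each documented result with the actual return value.

```r
# Copied from the example above so this block is self-contained
clean_text <- function(text, remove_numbers = FALSE, lowercase = TRUE) {
  if (lowercase) text <- tolower(text)
  text <- gsub("[[:punct:]]", " ", text)
  if (remove_numbers) text <- gsub("[0-9]", "", text)
  text <- gsub("\\s+", " ", text)
  trimws(text)
}

# Each documented example, checked against the real output
stopifnot(
  clean_text("Hello, World! 123") == "hello world 123",
  clean_text("Hello, World! 123", remove_numbers = TRUE) == "hello world",
  clean_text("HELLO WORLD", lowercase = FALSE) == "HELLO WORLD"
)

# The @return claim about length can be checked on a vector input
length(clean_text(c("One, two", "THREE!!", "")))
#> [1] 3
```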

## Error Explanation and Fixes

When you encounter errors, gooseR can help:

```{r}
# You get an error
data <- read.csv("myfile.csv")
model <- lm(y ~ x1 + x2 + x3, data = data)
# Error in eval(predvars, data, env) : object 'y' not found

# Get help
error_help <- goose_explain_error()
cat(error_help)
```

### Error Explanation:
```
The error "object 'y' not found" means the column 'y' doesn't exist in your data frame.

POSSIBLE CAUSES:
1. Column name mismatch (check names(data))
2. Column was renamed during import
3. CSV has different column names than expected

SOLUTIONS:
# Check column names
names(data)

# If column exists with different name:
model <- lm(actual_column_name ~ x1 + x2 + x3, data = data)

# Or rename the column:
names(data)[names(data) == "old_name"] <- "y"

# Defensive approach:
if (!"y" %in% names(data)) {
  stop("Column 'y' not found in data. Available columns: ", 
       paste(names(data), collapse = ", "))
}
```
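
The same error can be reproduced and inspected in base R without reading a file. The sketch below uses a made-up data frame (`model_data` is an assumption for illustration) and assumes no object named `y` exists in your workspace, since `lm()` also looks for variables in the formula's environment.

```r
# Made-up data: the response column is called 'outcome', not 'y'
model_data <- data.frame(outcome = c(1, 2, 3, 4), x1 = c(2, 4, 5, 9))

# Capture the error message instead of stopping the script
error_message <- tryCatch(
  {
    lm(y ~ x1, data = model_data)
    NA_character_  # only reached if lm() succeeds
  },
  error = function(e) conditionMessage(e)
)
error_message
#> [1] "object 'y' not found"

# Listing the missing columns up front gives a clearer message than
# the one lm() produces
required_cols <- c("y", "x1")
missing_cols <- setdiff(required_cols, names(model_data))
missing_cols
#> [1] "y"
```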

## Best Practices Workflow

Here's a complete development workflow with gooseR:

```{r}
# 1. Write your function
my_function <- function(data) {
  # Initial implementation
  result <- process_data(data)
  return(result)
}

# 2. Get initial review
goose_honk(severity = "gentle")

# 3. Improve based on feedback
my_function <- function(data) {
  # Improved implementation with error handling
  if (is.null(data)) stop("Data cannot be NULL")
  result <- process_data(data)
  return(result)
}

# 4. Get stricter review
goose_honk(severity = "moderate")

# 5. Generate documentation
docs <- goose_document("my_function")

# 6. Generate tests
tests <- goose_generate_tests("my_function")

# 7. Final review
goose_honk(severity = "harsh")

# 8. Save your work
goose_save(my_function, category = "functions", tags = c("reviewed", "tested"))
```
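
Step 6 leaves the generated tests in a variable. A minimal sketch of writing them to a test file follows; it assumes `goose_generate_tests()` returns the test code as a character string (as the earlier `cat(tests)` call suggests), uses a stand-in string in place of real output, and writes to a temporary directory in place of the package's `tests/testthat` folder.

```r
# Stand-in for the text returned by goose_generate_tests(); real output
# will differ
tests <- 'test_that("my_function rejects NULL input", {
  expect_error(my_function(NULL), "Data cannot be NULL")
})'

# A temporary directory stands in for the package's tests/testthat folder
test_dir <- file.path(tempdir(), "tests", "testthat")
dir.create(test_dir, recursive = TRUE, showWarnings = FALSE)

# testthat picks up files whose names start with "test"
test_file <- file.path(test_dir, "test-my_function.R")
writeLines(tests, test_file)

file.exists(test_file)
#> [1] TRUE
```

Reading the file and running it before committing keeps a human in the loop, which matters because generated tests can encode the same mistakes as the code they test.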

## Integration with RStudio/Positron

Use the addins for quick access:

1. **Quick Review**: Select code and use the "Review Code" addin
2. **Generate Docs**: Place cursor in function and use "Document Function" addin
3. **Explain Error**: When you hit an error, use "Explain Last Error" addin

## Tips for Effective Code Review

1. **Start Gentle**: Begin with gentle reviews to build confidence
2. **Progress Gradually**: Move to harsher reviews as code improves
3. **Focus on Patterns**: `goose_honk()` identifies recurring issues
4. **Learn from Feedback**: Each review teaches best practices
5. **Review Often**: Regular reviews during development, not just at the end

## Conclusion

gooseR's code review and testing features help you catch problems earlier and build better R habits. Context-aware feedback, generated tests, and generated documentation save time while improving code quality, provided you verify the output before relying on it.

For more information about gooseR's capabilities, see the other vignettes in the package documentation.