Changes requested during the initial CRAN review:
normality_tests() no longer touches the global random-number state. Large
columns are now reduced with a deterministic, evenly-spaced subsample instead
of set.seed() + sample(); the seed argument has been removed.

New analysis and reporting:
report() renders a complete profile to a self-contained HTML file (requires
pandoc, via ).

categorical_association() and plot_association() add Cramér’s V between
categorical columns (the categorical analogue of the correlation matrix).

analyze_dates() profiles date/datetime columns: range, unique count, and the
largest gap between consecutive timestamps.

compare_groups() summarises numeric columns within the levels of a grouping
column (grouped/comparative profiling).

Pipeline changes:
profile_data() gains group_by (adds a grouped comparison to the diagnostics)
and distributions (set FALSE to skip the eager per-column distribution plots
on wide data). Association and date results are now part of the returned
object, and plot() accepts which = "association".

summary() now also prints date, association and grouped-comparison sections
when present.

profile_data() with type inference, missing-value analysis, summary
statistics (incl. skewness/kurtosis), normality tests, outlier detection
(IQR/z-score/robust), correlation analysis, a data-quality score, and ggplot2
figures, returned as a data_profile S3 object with print(), summary() and
plot() methods.
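
For readers wondering what the "deterministic, evenly-spaced subsample" in the normality_tests() entry might look like, here is a minimal base-R sketch. It is not the package's actual code: the helper name even_subsample() and the max_n argument are hypothetical, and the default cap of 5000 is an assumption chosen because stats::shapiro.test() accepts at most 5000 values.

```r
# Hypothetical helper (not part of the package): pick at most `max_n`
# evenly spaced elements of `x` without using the random-number generator.
even_subsample <- function(x, max_n = 5000L) {
  n <- length(x)
  if (n <= max_n) {
    return(x)  # small enough already, keep every value
  }
  # Evenly spaced positions from the first to the last element. The spacing
  # is greater than 1 here, so rounding cannot produce duplicate indices.
  idx <- as.integer(round(seq(1, n, length.out = max_n)))
  x[idx]
}

# The same input always yields the same subsample.
x <- seq_len(100000)
s <- even_subsample(x)
stopifnot(length(s) == 5000L, identical(s, even_subsample(x)))
```

Because no call to set.seed() or sample() is involved, a sketch like this leaves the caller's .Random.seed untouched, which is the property the CRAN review asked for.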