Upsilon test by example

Xuye Luo and Mingzhou Song, New Mexico State University

December 20, 2025

Here, we illustrate how the Upsilon test promotes dominant function patterns among categorical variables, while most other tests of association favor all function patterns. This property makes the Upsilon test favor robust association patterns.

We explain both the differences and the similarities between the Upsilon test and four other tests of association on the same contingency tables. The four comprise three long-established tests, namely Pearson’s chi-squared test (Pearson, 1900), Woolf’s G-test (Woolf, 1957), and Fisher’s exact test (Fisher, 1935), plus the recent U-statistic permutation (USP) test (Berrett et al., 2020, 2021).

The Upsilon test runs as fast as Pearson’s chi-squared test and the G-test. Fisher’s exact test is slow because it enumerates tables. The USP test may consume many more CPU cycles than the others because of the number of permutations required for p-value precision.
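To make the cost of a permutation test concrete, here is a minimal sketch in Python. It is not the USP test: it swaps in Pearson's chi-squared statistic as a stand-in for the U-statistic, and the function names (`chi2_stat`, `permutation_p_value`) are hypothetical. It uses the lung transplant counts that appear in Table 8 of this document. The point is only that each shuffle costs one full recomputation of the statistic, and that the smallest attainable p-value is 1 / (number of shuffles + 1), so finer precision needs more shuffles.

```python
import random

def chi2_stat(xs, ys):
    """Pearson chi-squared statistic for two equal-length label sequences."""
    n = len(xs)
    row, col, cell = {}, {}, {}
    for x, y in zip(xs, ys):
        row[x] = row.get(x, 0) + 1
        col[y] = col.get(y, 0) + 1
        cell[(x, y)] = cell.get((x, y), 0) + 1
    stat = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n
            stat += (cell.get((x, y), 0) - expected) ** 2 / expected
    return stat

def permutation_p_value(xs, ys, n_perm, seed=0):
    """Permutation p-value: shuffle ys, recompute the statistic each time."""
    rng = random.Random(seed)
    observed = chi2_stat(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so floating-point ties count as "at least as extreme".
        if chi2_stat(xs, shuffled) >= observed - 1e-9:
            hits += 1
    # The +1 terms count the observed table itself as one arrangement.
    return (hits + 1) / (n_perm + 1)

# Expand the Table 8 counts into one (grade, surgery) pair per patient.
table8 = {("G0", "A"): 6, ("G0", "B"): 0,
          ("G1", "A"): 8, ("G1", "B"): 12,
          ("G2", "A"): 8, ("G2", "B"): 15,
          ("G3", "A"): 2, ("G3", "B"): 1}
grades, surgeries = [], []
for (grade, surgery), count in table8.items():
    grades += [grade] * count
    surgeries += [surgery] * count

n_perm = 199
p = permutation_p_value(grades, surgeries, n_perm)
print(f"permutation p-value with {n_perm} shuffles: {p:.3f}")
print(f"smallest attainable p-value: {1 / (n_perm + 1):.3f}")
```

Changing `seed` changes the shuffles and therefore the p-value, which is the run-to-run variation inherent to any permutation test with a finite number of shuffles.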

Promoting dominant function patterns

The Upsilon test promotes non-constant functions that dominate counts in a contingency table, while most other methods promote all mathematical functions.

Here, a contingency table is formed by observed frequencies of two categorical variables: one as the row index and the other as the column index.
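As a minimal sketch of this definition, the Python snippet below tabulates two categorical variables into a contingency table. The helper name `contingency_table` and the six observations are hypothetical and serve only as an illustration.

```python
from collections import Counter

def contingency_table(rows, cols):
    """Cross-tabulate two equal-length sequences of category labels."""
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    counts = Counter(zip(rows, cols))
    # Counter returns 0 for combinations that were never observed.
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, table

# Hypothetical observations: x indexes the rows, y indexes the columns.
x = ["a", "a", "b", "b", "b", "c"]
y = ["u", "v", "u", "u", "v", "v"]

row_levels, col_levels, table = contingency_table(x, y)
for level, counts_in_row in zip(row_levels, table):
    print(level, counts_in_row)
```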

Table 1 contains a perfect function pattern that all methods find significant at \(\alpha=0.05\). The non-constant function pattern covers the entire \(3 \times 3\) table and thus dominates it.

Table 2 also contains a perfect function pattern spanning the entire table. However, the table is dominated by the single entry of 16, which can be considered a constant function. The Upsilon test gives a p-value of 0.4, demoting the table, while in sharp contrast all other tests call the table significant at much smaller p-values.

In the p-value bar plots hereafter, the \(y\)-axis is \(-\log_{10}\) of the p-value, with the original p-value printed at the top of each bar.
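For readers unfamiliar with this scale, the short Python sketch below converts a few p-values to bar heights. The function name `bar_height` is hypothetical. Smaller p-values give taller bars, and the \(\alpha=0.05\) threshold sits at a height of about 1.3.

```python
import math

def bar_height(p_value):
    """Height of a bar on the -log10 scale for a given p-value."""
    return -math.log10(p_value)

for p in (0.4, 0.05, 0.001):
    print(f"p = {p} -> bar height {bar_height(p):.2f}")
```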

Demoting non-dominant function patterns

Table 3 contains a function pattern dominant in the top-left \(2 \times 2\) sub-table. All tests declared this table significant. The function is strong but not perfect; the entries containing 1 can be considered noise.

Table 4 contains the same \(2 \times 2\) sub-table, which is no longer dominant. The bottom row of the table dominates the counts and presents a constant function pattern. The Upsilon test gave an insignificant p-value of 0.4, while all other tests returned substantially lower p-values, calling the pattern significant.

Robust to change in low expected count

Tables 5 and 6 are dominated by the \(1 \times 2\) sub-table on the top-left. Both represent constant function patterns and are declared insignificant by the Upsilon test at similar p-values close to 1.0. The USP test also declared them insignificant. However, the tables differ in the last column, where the bottom-right entry has a low expected count (given the marginals) of 2/63 in Table 5 and 3/63 in Table 6. Although the two tables are highly similar except for the sparse last columns, the remaining tests treat them very differently, declaring Table 5 insignificant and Table 6 significant.

This demonstrates that the Upsilon test is not swayed by the instability of small numbers; a tiny shift in count is insufficient to turn a pattern from non-function to function and vice versa.
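One way to see where the instability of small numbers comes from is to look at a single cell's contribution to Pearson's chi-squared statistic, (observed - expected)^2 / expected. The Python sketch below evaluates that term for the two expected counts quoted above. The observed counts of 0 and 1 are hypothetical and are not taken from Tables 5 and 6; they only show how strongly one unit of count moves the term when the expected count is tiny.

```python
from fractions import Fraction

def cell_contribution(observed, expected):
    """One cell's term in Pearson's chi-squared: (O - E)^2 / E."""
    return (observed - expected) ** 2 / expected

for expected in (Fraction(2, 63), Fraction(3, 63)):
    for observed in (0, 1):  # hypothetical observed counts
        term = cell_contribution(observed, expected)
        print(f"E = {float(expected):.4f}, O = {observed}: "
              f"contribution = {float(term):.2f}")
```

With an expected count this small, the term is nearly zero when the cell is empty and large when it holds a single observation, so one count in a sparse cell can dominate the whole statistic.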

Lung transplant surgery type and outcome

Table 8 is from a clinical study of lung transplant surgeries (Jung et al., 2003). Columns represent two surgery options, A and B; rows represent four possible outcomes, from grade G0 to G3. The Upsilon test gives a p-value of 0.12 (> 0.05), an insignificant result that agrees with the original study. The USP test gave p-values ranging from 0.05 to 0.10 from run to run. However, all remaining tests returned p-values smaller than 0.05, contrary to expert intuition in the original study.

Table 8. Lung transplant data

          Surgery
Grade     A     B
G0        6     0
G1        8    12
G2        8    15
G3        2     1
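The Pearson and G-test results on Table 8 can be checked by hand. The sketch below, in plain Python, computes both statistics from the counts above and converts them to p-values using the closed-form upper tail of the chi-squared distribution with 3 degrees of freedom, (4 - 1) × (2 - 1). The function names are hypothetical, and the Upsilon test itself is not implemented here. Empty cells are skipped in the G statistic, following the usual convention that 0 · log 0 = 0.

```python
import math

# Table 8: rows are grades G0..G3, columns are surgeries A and B.
table8 = [[6, 0], [8, 12], [8, 15], [2, 1]]

def expected_counts(table):
    """Expected counts under independence: row total * column total / n."""
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return [[r * c / n for c in col_totals] for r in row_totals]

def pearson_chi2(table):
    exp = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, exp)
               for o, e in zip(obs_row, exp_row))

def g_statistic(table):
    exp = expected_counts(table)
    return 2 * sum(o * math.log(o / e)
                   for obs_row, exp_row in zip(table, exp)
                   for o, e in zip(obs_row, exp_row)
                   if o > 0)  # empty cells contribute zero

def chi2_upper_tail_df3(x):
    """P(X >= x) for a chi-squared variable with 3 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))

chi2 = pearson_chi2(table8)
g = g_statistic(table8)
p_chi2 = chi2_upper_tail_df3(chi2)
p_g = chi2_upper_tail_df3(g)
print(f"Pearson chi-squared = {chi2:.3f}, p = {p_chi2:.4f}")
print(f"G statistic         = {g:.3f}, p = {p_g:.4f}")
```

Both p-values come out below 0.05, in line with the statement above about the remaining tests.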

A contingency table showing similar testing results

Table 7 presents a cross-classification of party affiliation by gender from the 2018 General Social Survey (Kim & Eom, 2022). All tests declared a significant association between gender and party affiliation, with the Upsilon test giving the most significant result.

Table 7. Party affiliation by gender from 2018 General Social Survey

          Democrat  Independent  Republican
Female         359          133         234
Male           257           96         253
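As a quick check of the Pearson result on Table 7, the sketch below recomputes the statistic in plain Python. With (2 - 1) × (3 - 1) = 2 degrees of freedom, the chi-squared upper tail has the simple closed form exp(-x / 2), so no statistics library is needed. The variable names are hypothetical, and only Pearson's test is shown, not the Upsilon test.

```python
import math

# Table 7: rows are Female and Male; columns are Democrat, Independent, Republican.
table7 = [[359, 133, 234],
          [257, 96, 253]]

n = sum(map(sum, table7))
row_totals = [sum(row) for row in table7]
col_totals = [sum(col) for col in zip(*table7)]

# Pearson's statistic: sum over cells of (observed - expected)^2 / expected.
chi2_table7 = 0.0
for i, row in enumerate(table7):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2_table7 += (observed - expected) ** 2 / expected

# Upper tail of the chi-squared distribution with 2 degrees of freedom.
p_table7 = math.exp(-chi2_table7 / 2)
print(f"n = {n}, chi-squared = {chi2_table7:.3f}, p = {p_table7:.5f}")
```

The p-value is well below 0.05, consistent with the significant association reported above.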

How to cite this document

Luo, Xuye, & Song, Mingzhou. (2025). Upsilon test by example. Vignettes, Upsilon: Another Test of Association for Count Data. R package.

References

Berrett, T. B., Kontoyiannis, I., & Samworth, R. J. (2020). USP: U-statistic permutation tests of independence for all data types. The R Foundation. https://doi.org/10.32614/cran.package.usp
Berrett, T. B., Kontoyiannis, I., & Samworth, R. J. (2021). Optimal rates for independence testing via U-statistic permutation tests. The Annals of Statistics, 49(5), 2457–2490. https://doi.org/10.1214/20-aos2041
Fisher, R. A. (1935). The logic of inductive inference. Journal of the Royal Statistical Society, 98(1), 39–82. https://doi.org/10.2307/2342435
Jung, S.-H., Kang, S.-H., & Ahn, C. (2003). Chi-Square Test for R×C Contingency Tables with Clustered Data. Journal of Biopharmaceutical Statistics, 13(2), 241–251. https://doi.org/10.1081/bip-120019269
Kim, S.-H., & Eom, H. J. (2022). Power and sample size for contingency tables. American Journal of Educational Research and Reviews, 7, 89. https://doi.org/10.28933/ajerr-2021-09-2608sk
Pearson, K. (1900). X. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 50(302), 157–175. https://doi.org/10.1080/14786440009463897
Woolf, B. (1957). The log likelihood ratio test (the G-test); methods and tables for tests of heterogeneity in contingency tables. Annals of Human Genetics, 21(4), 397–409. https://doi.org/10.1111/j.1469-1809.1972.tb00293.x