Cohen's d effect sizes. — Mosaic plot — a cool way to visualize data from a crosstable.

If you are interested in some unsolicited advice about choosing tests and/or a walk-through of this function, keep reading. Otherwise, feel free to skip to the full function at the bottom.

Why You Should Always Use Fisher’s Exact Test Instead

Most students in the sciences learn about the \(\chi^{2}\) test of independence in their introductory statistics course. The idea is that if binary variables x and y are unrelated, then the observed values and expected values will be identical, in which cases \(\chi^{2}\) is equal to zero. The expected values are just the counts of each combination of x and y we would expect if x and y were unrelated in the population, while the observed counts are the counts we actually have.

The \(\chi^{2}\) value, then, is a function of the difference between the expected values and the observed values (the values in from the sample). The greater the difference between expected and observed values, the lower the probability that the difference between them (or a larger difference) is due to chance, assuming that there is actually no difference at the population level. Thus all the \(\chi^{2}\) “test” really means is that we check to see whether the \(p\)-value from the \(\chi^{2}\) we get is less than alpha, which is usually a cutoff \(p\)-value of .05.

One assumption of the \(\chi^{2}\) test violated fairly often is that no expected values are less than 5. This is sort of an arbitrary cutoff - the larger the expected counts, the more accurate the \(\chi^{2}\) value will be (in general). Unlike the \(\chi^{2}\) test, the Fisher’s exact test is accurate even in the presence of expected values less than 5. In cases where the expected values are all well above 5, the \(p\)-values from the tests will be close, if not identical. There isn’t really a drawback to using Fisher’s exact test, so it makes sense to just always use Fisher’s exact test. This argument is analogous to the argument that we should always use Welch’s instead of Student’s \(t\)-test by default .

As an illustration, let’s compare the results of a \(\chi^{2}\) test and Fisher’s exact test. We’re interested in seeing whether transmission type (am, 0 = automatic, 1 = manual) and engine shape (vs, 0 = V-shaped, 1 = straight) are related. These date come from the mtcars dataset, which is built in to R. Here are the observed and expected values:

Table 1: Crosstabulation of am and vs Variables

	Observed		Expected
	vs = 0	vs = 1	vs = 0	vs = 1
am = 0	12	7	10.6875	8.3125
am = 1	6	7	7.3125	5.6875

In R, we can use the built-in stats package to get a \(\chi^{2}\) test, which prints the following output:

Code

stats::chisq.test(x = mtcars$am, y = mtcars$vs)


    Pearson's Chi-squared test with Yates' continuity correction

data:  mtcars$am and mtcars$vs
X-squared = 0.34754, df = 1, p-value = 0.5555

We can also use the stats package to get a Fisher’s exact test. Here is what the exact test gives us:


    Fisher's Exact Test for Count Data

data:  mtcars$am and mtcars$vs
p-value = 0.4727
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
  0.3825342 10.5916087
sample estimates:
odds ratio 
  1.956055

Notice that the \(p\)-values are a bit different. By definition, the Fisher’s exact test is more accurate - it’s an “exact” test, while the \(\chi^{2}\) is an approximation test. In other words, the \(\chi^{2}\) test merely aims to be “close enough”, because back in the day, people had to do these tests by hand, and getting the exact answers would be nearly impossible by hand. Today, computers can solve for these answers for us. Assuming adequate sample size, the \(\chi^2\) test is usually fine, though it will become less accurate when sample sizes become smaller. Not so with Fisher’s exact test.

A Function for Tidy Crosstabulations and Fisher’s Exact Tests

The stats package, which comes preloaded in RStudio, provides chi.sq() and fisher.test() functions. Personally I don’t like the format of the output. I’ve written a function that will conduct Fisher’s exact test and provide output including the odds ratio, \(p\)-value, and confidence intervals in tidy wide or long output, along with pre-formatted cross-tabulation tables with counts, percentages, and totals. I usually want to look at these things to help put my Fisher test results in context. I wrote a function which I’ll call my.fisher() that gives me some crucial information in the output, ready to be looked at without further effort. Here’s the first Fisher’s exact test, but this time using my.fisher():

Code

fisher.results <- my.fisher(mtcars, am, vs)
# Just some formatting for the blog :)
fisher.results$crosstab.all %>% 
  as.data.frame.matrix() %>% 
  rename(am = V1, "vs = 0" = vs,  "vs = 1" = V3,  Total = V4) %>% 
  slice(-1) %>% 
  huxtable_print() %>% 
  set_align(col = 2:ncol(.), value = "center")

Table 2: Observed Percentages, Row Totals, Column Totals, and Overall Totals

am	vs = 0	vs = 1	Total
0	37.50% (12)	21.88% (7)	59.38% (19)
1	18.75% (6)	21.88% (7)	40.62% (13)
Total	56.25% (18)	43.75% (14)	100.00% (32)

Code

fisher.results$fish.test %>% 
  huxtable_print()

Table 3: Fisher’s Exact Test Output

Odds Ratio	p	CI_Lower	CI_Higher	Test	Direction
1.96	0.473	0.38	10.59	Fisher's Exact Test for Count Data	two.sided

Ah, that’s easier on the eyes. We can also get that in long format too:

Code

fisher.results <- my.fisher(mtcars, am, vs)
fisher.results$fish.test.long %>% 
  huxtable_print()

Table 4: Fisher’s Exact Test Output in Long Format

Statistic	Value
Odds Ratio	1.96
p	0.473
CI_Lower	0.38
CI_Higher	10.59
Test	Fisher's Exact Test for Count Data
Direction	two.sided

You can also pass arguments to janitor::fisher.test(). For example, say you wanted to do a leff-sided single-tailed test. Just add alternative = "less" to my.fisher() (note that the default is a two-sided test):

Code

fisher.results <- my.fisher(mtcars, am, vs, alternative = "less")
fisher.results$fish.test %>%
  huxtable_print()

Table 5: Fisher’s Exact Test With One-Sided Test

Odds Ratio	p	CI_Lower	CI_Higher	Test	Direction
1.96	0.906	0.00	8.34	Fisher's Exact Test for Count Data	less

Notice the \(p\)-value is smaller; that is because one-tailed tests are more powerful than two-tailed tests, assuming there is a real relationship at the population level, specifically in the direction of the test.

An added benefit of Fisher’s exact test is that unlike \(\chi^{2}\), it can give you a \(p\)-value in cases where x, y, or both have greater than 2 categories (resulting in a 2 \(\times\) 3 table, for example). This function can also accommodate nominal variables with greater than 2 categories. For example, a 2 \(\times\) 3 table between transmission type and number of cylinders (cyl, 4, 6, or 8):

Code

fisher.results <- my.fisher(mtcars, am, cyl)
fisher.results$fish.test %>% 
  huxtable_print()

Table 6: Fisher’s Exact Test, Transmission Type and Number of Cylinders

p.value	method	alternative
0.009	Fisher's Exact Test for Count Data	two.sided

The odds ratio left out of the output in these cases, because it is not applicable to exact tests one or more variables has greater than 2 categories. In this case, you’d want to look at a crosstabulation of x and y to contextualize the \(p\)-value. This function provides them for you in a nice format with the help of the janitor package.

Getting Crosstabs

Code

fisher.results$crosstab.all %>% 
  # formatting for blog :)
  huxtable_print(add_colnames = FALSE) %>% 
  insert_row(after = 0, NA, rep("cyl", 3), NA) %>% 
  merge_cells(row = 1, col = 2:4) %>% 
  set_align(row = 1, col = 2:4, value = "center") %>% 
  set_bottom_border(row = 1, col = 3)

Table 7: Crosstab With Sample Total As Percentage Denominator

	cyl
am	4	6	8	Total
0	9.38% (3)	12.50% (4)	37.50% (12)	59.38% (19)
1	25.00% (8)	9.38% (3)	6.25% (2)	40.62% (13)
Total	34.38% (11)	21.88% (7)	43.75% (14)	100.00% (32)

Table 7 shows the cross-tabulation between transmission type and number of cylinders, with percentages and counts (the latter in parentheses). You also get row and column totals, along with the sample total in the right-most bottom corner. You also get two other cross-tabulation, one with percentages calculated relative the row total; the other with percentages calculated relative to the column total. Here’s an example with row totals:

Code

fisher.results$crosstab.row %>% 
  # formatting for blog :)
  huxtable_print(add_colnames = FALSE) %>% 
  insert_row(after = 0, NA, rep("cyl", 3), NA) %>% 
  merge_cells(row = 1, col = 2:4) %>% 
  set_align(row = 1, col = 2:4, value = "center") %>% 
  set_bottom_border(row = 1, col = 3)

Table 8: Crosstab With Row Total As Denominator For Percentage

	cyl
am	4	6	8	Total
0	15.79% (3)	21.05% (4)	63.16% (12)	100.00% (19)
1	61.54% (8)	23.08% (3)	15.38% (2)	100.00% (13)
Total	34.38% (11)	21.88% (7)	43.75% (14)	100.00% (32)

Ah - with the row-wise percentages, you can see that the likely driver of statistical significance here is the inverse relationship between transmission type and cylinder. More than half of automatics have 8 cylinders, while more than half of manuals have only 4 cylinders. Looks like an interaction effect!

Full Function

Copy-and-paste this function to use it in your own R projects. You will be prompted to install the tidyverse broom, and janitor packages if you do not already have them. Click the Code drop-down below to show the full function:

Code

library(tidyverse)
library(janitor)
library(broom)
my.fisher <-
  # ... allows you to pass arguments to functions where ... are present
  function(data, ..., alternative = "two.sided") {
    # load required packages
    require(janitor)
    require(dplyr)
    require(broom)
    # make.pretty is a function-within-a-function that
    # adds totals and formatting using janitor package
    make.pretty <- function(tab, ...){
      tab %>%
        janitor::adorn_totals(where = c("row", "col")) %>%
        janitor::adorn_percentages(...) %>%
        janitor::adorn_pct_formatting(digits = 2) %>%
        janitor::adorn_ns() %>%
        janitor::adorn_title()
    }
    # this gets the n x n crosstab
    my.tab <- janitor::tabyl(
      data,
      ...,
      show_na = FALSE,
      show_missing_levels = FALSE)
    # use make.pretty to add totals and formatting
    my.tab.all <- make.pretty(tab = my.tab, denominator = "all")
    my.tab.row <- make.pretty(tab = my.tab, denominator = "row")
    my.tab.col <- make.pretty(tab = my.tab, denominator = "col")
    # get the fisher's exact test
    # can pass arguments to stats::fisher.test via ...
    # this determines which type of test to run, depending
    # on alternative = argument
    # default is "two.sided"
    if(alternative == "greater"){
      fish <- my.tab %>%
        janitor::fisher.test(., alternative = "greater")
    } else {
      if (alternative == "less"){
        fish <- my.tab %>%
          janitor::fisher.test(., alternative = "less")
      }
      fish <- my.tab %>%
        janitor::fisher.test(., alternative = "two.sided")
    }
    fish <- fish %>%
      # organizes output into a table
      broom::tidy() %>%
      mutate(p.value = scales::pvalue(p.value)) %>% 
      # rounds values
      dplyr::mutate_if(is.numeric, round, 2) %>%
      # makes them into character format with exactly 2 decimal points
      # so that numeric and character columns can be combined (for long format)
      dplyr::mutate_if(is.numeric, format, nsmall = 2)
    # if estimate column is present (in case of 2x2's), rename as "odds.ratio"
    if ("estimate" %in% names(fish)) {
      fish <- fish %>% 
        dplyr::rename(
          "Odds Ratio" = estimate,
          p = p.value, 
          CI_Lower = conf.low,
          CI_Higher = conf.high,
          Test = method,
          Direction = alternative
        )
      fish.long <- fish %>% 
        # reshape to long format
        tidyr::pivot_longer(
          cols = everything(), 
          # name of column that gets the row values
          values_to = "Value", 
          # name of column that gets the column names
          names_to = "Statistic")
    }
    # this is the same thing as above, except this gets done when
    # estimate column is not present
    fish.long <- fish %>% 
      tidyr::pivot_longer(
        cols = everything(), 
        values_to = "Value", 
        names_to = "Statistic")
    # what does the function give as output?
    # left side of = is what you'll see with $
    # right side is object saved somewhere above
    return(list(
      crosstab.all = my.tab.all,
      crosstab.row = my.tab.row,
      crosstab.column = my.tab.col,
      fish.test = fish, 
      fish.test.long = fish.long))
  }

Citation

BibTeX citation:

@online{e._vanaman2021,
  author = {E. Vanaman, Matthew},
  title = {Tidy {Fisher’s} {Exact} {Test} {In} {R} {(And} {Why} {To}
    {Always} {Use} {It)}},
  date = {2021-02-06},
  url = {https://matthewvanaman.com/blog/tidy-fisher-test/},
  langid = {en}
}

For attribution, please cite this work as:

E. Vanaman, M. (2021, February 6). Tidy Fisher’s Exact Test In R (And Why To Always Use It).