Source: R/summaryfactorlist.R, summary_factorlist.Rd
A function that takes a single dependent variable and a vector of explanatory variable names (continuous or categorical variables) and produces a summary table.
summary_factorlist(
.data,
dependent = NULL,
explanatory = NULL,
formula = NULL,
cont = "mean",
cont_nonpara = NULL,
cont_cut = 5,
cont_range = TRUE,
p = FALSE,
p_cont_para = "aov",
p_cat = "chisq",
column = TRUE,
total_col = FALSE,
orderbytotal = FALSE,
digits = c(1, 1, 3, 1, 0),
na_include = FALSE,
na_include_dependent = FALSE,
na_complete_cases = FALSE,
na_to_p = FALSE,
na_to_prop = TRUE,
fit_id = FALSE,
add_dependent_label = FALSE,
dependent_label_prefix = "Dependent: ",
dependent_label_suffix = "",
add_col_totals = FALSE,
include_col_totals_percent = TRUE,
col_totals_rowname = NULL,
col_totals_prefix = "",
add_row_totals = FALSE,
include_row_totals_percent = TRUE,
include_row_missing_col = TRUE,
row_totals_colname = "Total N",
row_missing_colname = "Missing N",
catTest = NULL,
weights = NULL
)
.data
Data frame.
dependent
Character vector of length 1: name of the dependent variable (if categorical, 2 to 5 factor levels).
explanatory
Character vector of any length: name(s) of explanatory variables.
formula
An object of class "formula" (or one that can be coerced to that class). Optional alternative to the standard dependent/explanatory format; do not supply it if using dependent and explanatory.
cont
Summary for continuous explanatory variables: "mean" (standard deviation) or "median" (interquartile range). If "median", a non-parametric hypothesis test is performed (see below).
cont_nonpara
Numeric vector, e.g. c(1, 2), specifying which variables (by position in explanatory) to perform non-parametric hypothesis tests on and summarise with "median".
cont_cut
Numeric: number of unique values in a continuous variable at which to consider it a factor.
cont_range
Logical. If TRUE, the median is shown with the 1st and 3rd quartiles.
p
Logical: include a null hypothesis statistical test.
p_cont_para
Character. Parametric test for continuous variables. One of "aov" (analysis of variance) or "t.test" (Welch two-sample t-test). Note that the non-parametric test for continuous variables is always Kruskal-Wallis (kruskal.test), which in the two-group setting is equivalent to the Mann-Whitney U / Wilcoxon rank-sum test.
For a continuous dependent and a continuous explanatory variable, the parametric p-value returned is for the Pearson correlation coefficient; the non-parametric equivalent is the p-value for the Spearman correlation coefficient.
p_cat
Character. Test for categorical variables. One of "chisq" or "fisher".
column
Logical: compute margins by column rather than row.
total_col
Logical: include a total column summing across factor levels.
orderbytotal
Logical: order the final table by the total column, high to low.
digits
Number of digits to round to for (1) mean/median, (2) standard deviation/interquartile range, (3) p-value, (4) count percentage, (5) weighted count.
na_include
Logical: make missing data in explanatory variables explicit (NA).
na_include_dependent
Logical: make missing data in the dependent variable explicit.
na_complete_cases
Logical: include only rows with complete data.
na_to_p
Logical: include missing data as a group in the statistical test.
na_to_prop
Logical: include missing data in the calculation of column proportions.
fit_id
Logical: allows merging via finalfit_merge.
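add_dependent_label
Logical. Add the name of the dependent variable label to the top left of the table.
dependent_label_prefix
Character. Text added before the dependent label.
dependent_label_suffix
Character. Text added after the dependent label.
add_col_totals
Logical. Include column total n.
include_col_totals_percent
Logical. Include column percentage of total.
col_totals_rowname
Character. Row name for column totals.
col_totals_prefix
Character. Prefix to column totals, e.g. "N=".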
add_row_totals
Logical. Include row totals. Note this differs from total_col above, particularly for continuous explanatory variables.
include_row_totals_percent
Logical. Include row percentage of total.
include_row_missing_col
Logical. Include a missing data total for each row. Only used when add_row_totals is TRUE.
row_totals_colname
Character. Column name for row totals.
row_missing_colname
Character. Column name for missing data totals for each row.
catTest
Deprecated. See p_cat above.
weights
Character vector of length 1: name of the column to use for weights. Continuous explanatory variables are multiplied by the weights. Categorical explanatory variables are counted with a frequency weight (sum(weights)).
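The tests named under p_cont_para are standard base R functions. As a rough sketch (not finalfit's internal code), the p-values for one continuous variable split by a two-level factor, and for two continuous variables, can be computed directly; the built-in ToothGrowth and mtcars datasets are used here purely for illustration. The sketch also checks the two-group equivalence of Kruskal-Wallis and the Wilcoxon rank-sum test, using the normal approximation without continuity correction.

```r
# Sketch only: base R versions of the tests named under p_cont_para.
# ToothGrowth$len is continuous; ToothGrowth$supp is a two-level factor.
data(ToothGrowth)

# "aov": one-way analysis of variance
p_aov <- summary(aov(len ~ supp, data = ToothGrowth))[[1]][["Pr(>F)"]][1]
# "t.test": Welch two-sample t-test (var.equal = FALSE is the default)
p_welch <- t.test(len ~ supp, data = ToothGrowth)$p.value
# Non-parametric alternative: Kruskal-Wallis
p_kruskal <- kruskal.test(len ~ supp, data = ToothGrowth)$p.value

# With two groups, Kruskal-Wallis gives the same p-value as the Wilcoxon
# rank-sum test using the normal approximation without continuity correction
p_wilcox <- wilcox.test(len ~ supp, data = ToothGrowth,
                        exact = FALSE, correct = FALSE)$p.value

# Continuous dependent and continuous explanatory: correlation tests
p_pearson  <- cor.test(mtcars$mpg, mtcars$wt, method = "pearson")$p.value
p_spearman <- cor.test(mtcars$mpg, mtcars$wt, method = "spearman",
                       exact = FALSE)$p.value

print(signif(c(aov = p_aov, welch = p_welch, kruskal = p_kruskal,
               wilcoxon = p_wilcox, pearson = p_pearson,
               spearman = p_spearman), 3))
```

Note that with two groups "aov" corresponds to a pooled-variance t-test, so its p-value can differ slightly from the Welch "t.test" result.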
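For p_cat, "chisq" and "fisher" correspond to base R's chisq.test() and fisher.test(). The sketch below uses the built-in mtcars data as an assumed example to show why "fisher" can be preferable with sparse tables: when expected cell counts fall below 5, chisq.test() warns that the approximation may be incorrect, which is the warning seen in the first example below.

```r
# Sketch only: the two categorical tests named under p_cat.
tab <- table(cylinders = mtcars$cyl, manual = mtcars$am)

# Expected counts under independence; some are below 5 here
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
print(round(expected, 2))

# "chisq": chisq.test() warns about the small expected counts,
# so the warning is captured and reported as a message instead
p_chisq <- withCallingHandlers(
  chisq.test(tab)$p.value,
  warning = function(w) {
    message("chisq.test warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# "fisher": Fisher's exact test also handles this 3 x 2 table
p_fisher <- fisher.test(tab)$p.value
print(c(chisq = p_chisq, fisher = p_fisher))
```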
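The effect of na_to_prop on column percentages can be illustrated in base R. This is a hypothetical sketch on a made-up factor, not finalfit's implementation; it assumes missing values have already been made explicit (as with na_include = TRUE) and simply contrasts the two possible denominators.

```r
# Hypothetical data: 4 "No", 2 "Yes", 2 missing
x <- factor(c("No", "No", "Yes", NA, "Yes", "No", NA, "No"))

# Missing values counted as a level and included in the denominator (8)
with_na <- round(100 * prop.table(table(x, useNA = "ifany")), 1)
# Missing values excluded from the denominator (6)
without_na <- round(100 * prop.table(table(x)), 1)

print(with_na)
print(without_na)
```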
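The quantities behind add_col_totals, col_totals_prefix, add_row_totals and include_row_missing_col are simple counts. The sketch below computes them in base R for one hypothetical explanatory variable. The label format (prefix, count, percentage in brackets) and the reading of the row total as the number of non-missing observations are assumptions for illustration and may not match finalfit's output exactly.

```r
# Hypothetical data: a dependent factor and one continuous explanatory
# variable with two missing values. Not finalfit's code.
dep  <- factor(c("Alive", "Died", "Alive", "Alive", "Died", "Alive"))
expl <- c(61, NA, 55, 70, 48, NA)

# Column totals (add_col_totals), with an assumed label format
col_totals_prefix <- "N="
col_n   <- table(dep)
col_pct <- round(100 * prop.table(col_n), 1)   # cf. include_col_totals_percent
col_labels <- paste0(col_totals_prefix, col_n, " (", col_pct, "%)")

# Row totals for this variable (add_row_totals, include_row_missing_col)
total_n   <- sum(!is.na(expl))   # assumed: non-missing observations
missing_n <- sum(is.na(expl))    # missing observations

print(col_labels)
print(c(total_n = total_n, missing_n = missing_n))
```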
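For categorical explanatory variables, the weights argument replaces plain counts with frequency-weighted counts, i.e. the sum of the weights in each level. A minimal base R sketch on hypothetical data shows the difference; it also shows why weighted counts need not be whole numbers, which is what the fifth element of digits rounds.

```r
# Hypothetical data: each row carries a frequency weight w.
df <- data.frame(
  sex = factor(c("Female", "Male", "Female", "Male")),
  w   = c(2.5, 1, 1.5, 2)
)

unweighted <- table(df$sex)              # plain row counts per level
weighted   <- xtabs(w ~ sex, data = df)  # sum(weights) per level

print(unweighted)
print(weighted)
```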
Returns a factorlist dataframe.
This function aims to produce publication-ready summary tables for categorical or continuous dependent variables. It usually takes a categorical dependent variable and produces a cross table of counts and proportions (expressed as percentages) for categorical explanatory variables, or summaries of continuous explanatory variables. However, it also accepts a continuous dependent variable, in which case it produces mean (standard deviation) or median (interquartile range) for use with linear regression models.
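The continuous summaries selected by cont and cont_range can be sketched in base R. The helper summarise_cont() below is hypothetical, not part of finalfit, and the "Q1 to Q3" text format is an assumption; it only shows which statistics each option reports, rounded with one decimal place as in the default digits.

```r
# Hypothetical helper, not finalfit's code: format one continuous variable
# as "mean (SD)", "median (IQR)", or "median (Q1 to Q3)".
summarise_cont <- function(x, cont = "mean", cont_range = TRUE, digits = 1) {
  x <- x[!is.na(x)]
  fmt <- function(v) formatC(v, format = "f", digits = digits)
  if (cont == "mean") {
    return(paste0(fmt(mean(x)), " (", fmt(sd(x)), ")"))
  }
  q <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)
  if (cont_range) {
    # Assumed display: median with 1st and 3rd quartiles
    paste0(fmt(q[2]), " (", fmt(q[1]), " to ", fmt(q[3]), ")")
  } else {
    # Median with the interquartile range (Q3 - Q1)
    paste0(fmt(q[2]), " (", fmt(q[3] - q[1]), ")")
  }
}

x <- c(2, 4, 4, 5, 7, 9)
summarise_cont(x)
summarise_cont(x, cont = "median")
summarise_cont(x, cont = "median", cont_range = FALSE)
```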
See also: fit2df, ff_column_totals, ff_row_totals, ff_label, ff_glimpse, ff_percent_only. For lots of examples, see https://finalfit.org/
library(finalfit)
library(dplyr)
# Load example dataset, modified version of survival::colon
data(colon_s)
# Table 1 - Patient demographics ----
explanatory = c("age", "age.factor", "sex.factor", "obstruct.factor")
dependent = "perfor.factor"
colon_s %>%
summary_factorlist(dependent, explanatory, p=TRUE)
#> Warning: There was 1 warning in `dplyr::summarise()`.
#> ℹ In argument: `chisq.test(age.factor, perfor.factor)$p.value`.
#> Caused by warning in `chisq.test()`:
#> ! Chi-squared approximation may be incorrect
#> label levels No Yes p
#> Age (years) Mean (SD) 59.8 (11.9) 58.4 (13.3) 0.542
#> Age <40 years 68 (7.5) 2 (7.4) 1.000
#> 40-59 years 334 (37.0) 10 (37.0)
#> 60+ years 500 (55.4) 15 (55.6)
#> Sex Female 432 (47.9) 13 (48.1) 1.000
#> Male 470 (52.1) 14 (51.9)
#> Obstruction No 715 (81.2) 17 (63.0) 0.035
#> Yes 166 (18.8) 10 (37.0)
# summary_factorlist() is also commonly used to summarise any number of
# variables by an outcome variable (say, dead yes/no).
# Table 2 - 5 yr mortality ----
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "mort_5yr"
colon_s %>%
summary_factorlist(dependent, explanatory)
#> Note: dependent includes missing data. These are dropped.
#> label levels Alive Died
#> Age <40 years 31 (6.1) 36 (8.9)
#> 40-59 years 208 (40.7) 131 (32.4)
#> 60+ years 272 (53.2) 237 (58.7)
#> Sex Female 243 (47.6) 194 (48.0)
#> Male 268 (52.4) 210 (52.0)
#> Obstruction No 408 (82.1) 312 (78.6)
#> Yes 89 (17.9) 85 (21.4)
#> Perforation No 497 (97.3) 391 (96.8)
#> Yes 14 (2.7) 13 (3.2)