R/summaryfactorlist.R
summary_factorlist.Rd
A function that takes a single dependent variable and a vector of explanatory variable names (continuous or categorical variables) and produces a summary table.
summary_factorlist(
  .data,
  dependent = NULL,
  explanatory,
  cont = "mean",
  cont_nonpara = NULL,
  cont_cut = 5,
  cont_range = TRUE,
  p = FALSE,
  p_cont_para = "aov",
  p_cat = "chisq",
  column = TRUE,
  total_col = FALSE,
  orderbytotal = FALSE,
  digits = c(1, 1, 3, 1),
  na_include = FALSE,
  na_include_dependent = FALSE,
  na_complete_cases = FALSE,
  na_to_p = FALSE,
  fit_id = FALSE,
  add_dependent_label = FALSE,
  dependent_label_prefix = "Dependent: ",
  dependent_label_suffix = "",
  add_col_totals = FALSE,
  include_col_totals_percent = TRUE,
  col_totals_rowname = NULL,
  col_totals_prefix = "",
  add_row_totals = FALSE,
  include_row_totals_percent = TRUE,
  include_row_missing_col = TRUE,
  row_totals_colname = "Total N",
  row_missing_colname = "Missing N",
  catTest = NULL
)
.data  Dataframe. 

dependent  Character vector of length 1: name of dependent variable (2 to 5 factor levels). 
explanatory  Character vector of any length: name(s) of explanatory variables. 
cont  Summary for continuous explanatory variables: "mean" (standard deviation) or "median" (interquartile range). If "median", a nonparametric hypothesis test is performed (see p_cont_para below). 
cont_nonpara  Numeric vector of form e.g. 
cont_cut  Numeric: number of unique values in continuous variable at which to consider it a factor. 
cont_range  Logical. The median is shown with the 1st and 3rd quartiles. 
p  Logical: Include null hypothesis statistical test. 
p_cont_para  Character. Continuous variable parametric test. One of either "aov" (analysis of variance) or "t.test" for the Welch two-sample t-test. Note that the continuous nonparametric test is always Kruskal-Wallis (kruskal.test), which in the two-group setting is equivalent to the Mann-Whitney U / Wilcoxon rank sum test. For a continuous dependent and a continuous explanatory variable, the parametric test p-value returned is for the Pearson correlation coefficient; the nonparametric equivalent is the p-value for the Spearman correlation coefficient. 
p_cat  Character. Categorical variable test. One of either "chisq" or "fisher". 
column  Logical: Compute margins by column rather than row. 
total_col  Logical: include a total column summing across factor levels. 
orderbytotal  Logical: order final table by total column high to low. 
digits  Number of digits to round to (1) mean/median, (2) standard deviation / interquartile range, (3) pvalue, (4) count percentage. 
na_include  Logical: make missing data in explanatory variables explicit. 
na_include_dependent  Logical: make dependent variable missing data explicit. 
na_complete_cases  Logical: include only rows with complete data. 
na_to_p  Logical: include missing as group in statistical test. 
fit_id  Logical: allows merging via 
add_dependent_label  Add the dependent variable label to the top left of the table. 
dependent_label_prefix  Add text before dependent label. 
dependent_label_suffix  Add text after dependent label. 
add_col_totals  Logical. Include column total n. 
include_col_totals_percent  Include column percentage of total. 
col_totals_rowname  Character. Row name for column totals. 
col_totals_prefix  Character. Prefix to column totals, e.g. "N=". 
add_row_totals  Logical. Include row totals. Note this differs from

include_row_totals_percent  Include row percentage of total. 
include_row_missing_col  Logical. Include missing data total for each row. Only used when 
row_totals_colname  Character. Column name for row totals. 
row_missing_colname  Character. Column name for missing data totals for each row. 
catTest  Deprecated. See 
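The tests named under p and p_cont_para can be sketched with base R's stats functions. The example below is a minimal illustration on simulated data, assuming only base R; the objects df, x and y are hypothetical, and this is not finalfit's internal code.

```r
set.seed(1)
# Hypothetical data: a continuous variable measured in two groups
df <- data.frame(
  value = c(rnorm(30, mean = 60, sd = 12), rnorm(25, mean = 57, sd = 14)),
  group = factor(rep(c("No", "Yes"), times = c(30, 25)))
)

# p_cont_para = "aov": one-way analysis of variance
p_aov <- summary(aov(value ~ group, data = df))[[1]][["Pr(>F)"]][1]
# p_cont_para = "t.test": Welch two-sample t-test (var.equal = FALSE is the default)
p_welch <- t.test(value ~ group, data = df)$p.value

# Nonparametric: Kruskal-Wallis test
p_kw <- kruskal.test(value ~ group, data = df)$p.value
# With two groups this equals the Wilcoxon rank sum p-value computed with the
# normal approximation and no continuity correction
p_wrs <- wilcox.test(value ~ group, data = df,
                     exact = FALSE, correct = FALSE)$p.value

# Continuous dependent with continuous explanatory: correlation tests
x <- rnorm(40)
y <- x + rnorm(40)
p_pearson  <- cor.test(x, y, method = "pearson")$p.value
p_spearman <- cor.test(x, y, method = "spearman")$p.value

round(c(aov = p_aov, welch = p_welch, kruskal = p_kw, wilcoxon = p_wrs,
        pearson = p_pearson, spearman = p_spearman), 3)
```

With two groups, "aov" gives the same p-value as t.test(var.equal = TRUE); the Welch test drops the equal-variance assumption, so its p-value can differ from the ANOVA one. R's default wilcox.test() settings (continuity correction, exact p-values for small untied samples) can also give a slightly different value from kruskal.test().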
Returns a factorlist dataframe.
This function aims to produce publication-ready summary tables for categorical or continuous dependent variables. It usually takes a categorical dependent variable and produces a cross table of counts and proportions (expressed as percentages) for categorical explanatory variables, or summary statistics for continuous explanatory variables. However, it will also take a continuous dependent variable and produce the mean (standard deviation) or median (interquartile range) for use with linear regression models.
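The cell formats described above can be sketched in base R. The minimal example below, assuming only base R, rebuilds the Sex rows of Table 1 in the examples as counts with column percentages (the column = TRUE case) and formats a continuous variable as mean (SD) or median (IQR) to one decimal place, mirroring the default digits = c(1, 1, 3, 1). The helpers fmt_count_pct(), fmt_mean_sd() and fmt_median_iqr() are hypothetical, not finalfit functions, and the "Q1 to Q3" layout used when range = TRUE is an assumption about how cont_range is displayed.

```r
# Counts copied from the Sex rows of Table 1 (dependent = perfor.factor)
sex_tab <- matrix(c(432, 470, 13, 14), nrow = 2,
                  dimnames = list(sex = c("Female", "Male"),
                                  perfor = c("No", "Yes")))

# Column percentages: each column of col_pct sums to 100
col_pct <- 100 * prop.table(sex_tab, margin = 2)

# Hypothetical helper: format "n (pct)" with pct rounded to `digits` places
fmt_count_pct <- function(n, pct, digits = 1) {
  sprintf(paste0("%d (%.", digits, "f)"), as.integer(n), pct)
}
cells <- matrix(fmt_count_pct(sex_tab, col_pct), nrow = 2,
                dimnames = dimnames(sex_tab))
print(cells)

# Hypothetical helper: mean (SD)
fmt_mean_sd <- function(x, digits = c(1, 1)) {
  sprintf(paste0("%.", digits[1], "f (%.", digits[2], "f)"), mean(x), sd(x))
}

# Hypothetical helper: median (IQR), or median (Q1 to Q3) when range = TRUE
fmt_median_iqr <- function(x, digits = c(1, 1), range = FALSE) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
  if (range) {
    # Assumed layout: median (Q1 to Q3)
    sprintf(paste0("%.", digits[1], "f (%.", digits[2], "f to %.",
                   digits[2], "f)"), q[2], q[1], q[3])
  } else {
    # IQR reported as a single number, Q3 - Q1
    sprintf(paste0("%.", digits[1], "f (%.", digits[2], "f)"),
            q[2], q[3] - q[1])
  }
}

x <- c(1, 2, 3, 4, 5)
fmt_mean_sd(x)                    # "3.0 (1.6)"
fmt_median_iqr(x)                 # "3.0 (2.0)"
fmt_median_iqr(x, range = TRUE)   # "3.0 (2.0 to 4.0)"
```

The formatted cells match the Sex rows of Table 1, which is consistent with percentages being computed within each level of the dependent variable.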
library(finalfit)
library(dplyr)

# Load example dataset, modified version of survival::colon
data(colon_s)

# Table 1 - Patient demographics
explanatory = c("age", "age.factor", "sex.factor", "obstruct.factor")
dependent = "perfor.factor"
colon_s %>%
  summary_factorlist(dependent, explanatory, p = TRUE)
#> Warning: Chi-squared approximation may be incorrect
#>        label      levels          No         Yes     p
#>  Age (years)   Mean (SD) 59.8 (11.9) 58.4 (13.3) 0.542
#>          Age   <40 years    68 (7.5)     2 (7.4) 1.000
#>              40-59 years  334 (37.0)   10 (37.0)
#>                60+ years  500 (55.4)   15 (55.6)
#>          Sex      Female  432 (47.9)   13 (48.1) 1.000
#>                     Male  470 (52.1)   14 (51.9)
#>  Obstruction          No  715 (81.2)   17 (63.0) 0.035
#>                      Yes  166 (18.8)   10 (37.0)

# summary_factorlist() is also commonly used to summarise any number of
# variables by an outcome variable (say dead yes/no).

# Table 2 - 5 yr mortality
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "mort_5yr"
colon_s %>%
  summary_factorlist(dependent, explanatory)
#>        label      levels      Alive       Died
#>          Age   <40 years   31 (6.1)   36 (8.9)
#>              40-59 years 208 (40.7) 131 (32.4)
#>                60+ years 272 (53.2) 237 (58.7)
#>          Sex      Female 243 (47.6) 194 (48.0)
#>                     Male 268 (52.4) 210 (52.0)
#>  Obstruction          No 408 (82.1) 312 (78.6)
#>                      Yes  89 (17.9)  85 (21.4)
#>  Perforation          No 497 (97.3) 391 (96.8)
#>                      Yes  14 (2.7)   13 (3.2)
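The Obstruction p-value in Table 1 can be checked with base R alone. The sketch below, assuming only base R, rebuilds the obstruction-by-perforation counts from the output. Patients with missing obstruction data are not in these counts, which is why the No column totals 881 here rather than the 902 seen for age. R's chisq.test() applies Yates' continuity correction to 2 x 2 tables by default, and with that default the p-value rounds to the 0.035 shown; fisher.test() is the base R test corresponding to p_cat = "fisher". The age table reproduces the "Chi-squared approximation may be incorrect" warning, which chisq.test() issues when any expected count is below 5.

```r
# Obstruction x perforation counts copied from Table 1
obstruct_tab <- matrix(c(715, 166, 17, 10), nrow = 2,
                       dimnames = list(obstruct = c("No", "Yes"),
                                       perfor = c("No", "Yes")))

# p_cat = "chisq": Yates' continuity correction is applied to 2 x 2 tables by default
p_chisq <- chisq.test(obstruct_tab)$p.value
# p_cat = "fisher": Fisher's exact test on the same table
p_fisher <- fisher.test(obstruct_tab)$p.value

sprintf("%.3f", p_chisq)                      # "0.035", as in Table 1
round(100 * prop.table(obstruct_tab, 2), 1)   # 81.2 / 18.8 and 63.0 / 37.0

# Age x perforation: an expected count below 5 triggers the warning seen above
age_tab <- matrix(c(68, 334, 500, 2, 10, 15), nrow = 3)
age_warning <- tryCatch(chisq.test(age_tab),
                        warning = function(w) conditionMessage(w))
age_warning   # "Chi-squared approximation may be incorrect"
```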