R/summaryfactorlist.R
summary_factorlist.Rd
A function that takes a single dependent variable with a vector of explanatory variable names (continuous or categorical variables) to produce a summary table.
summary_factorlist(.data, dependent = NULL, explanatory, cont = "mean",
  cont_cut = 5, p = FALSE, na_include = FALSE, column = FALSE,
  total_col = FALSE, orderbytotal = FALSE, fit_id = FALSE,
  na_to_missing = TRUE, add_dependent_label = FALSE,
  dependent_label_prefix = "Dependent: ", dependent_label_suffix = "")
.data  Dataframe. 

dependent  Character vector of length 1: name of dependent variable (2 to 5 factor levels). 
explanatory  Character vector of any length: name(s) of explanatory variables. 
cont  Summary for continuous variables: "mean" (standard deviation) or "median" (interquartile range). 
cont_cut  Numeric: number of unique values in continuous variable at which to consider it a factor. 
p  Logical: Include statistical test (see

na_include  Logical: include missing data (NA) in summary. 
column  Logical: Compute margins by column rather than row. 
total_col  Logical: include a total column summing across factor levels. 
orderbytotal  Logical: order final table by total column high to low. 
fit_id  Logical: not used directly, allows merging via

na_to_missing  Logical: convert NA values to a "Missing" level. 
add_dependent_label  Logical: add the dependent variable's label to the top-left cell of the table. 
dependent_label_prefix  Character: text to add before the dependent label. 
dependent_label_suffix  Character: text to add after the dependent label. 
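The interplay between cont and cont_cut can be sketched in base R. This is not finalfit's code: summarise_numeric() is a hypothetical helper, the toy vectors are invented, and the rule "fewer than cont_cut unique values means treat as a factor" is an assumption about where exactly the threshold falls.

```r
# Hypothetical helper, NOT part of finalfit: summarise one numeric vector
# the way the `cont` and `cont_cut` arguments are described above.
summarise_numeric <- function(x, cont = "mean", cont_cut = 5) {
  x <- x[!is.na(x)]
  # Assumption: fewer than `cont_cut` unique values => treat as a factor
  if (length(unique(x)) < cont_cut) {
    counts <- table(x)
    n <- as.integer(counts)
    return(sprintf("%s: %d (%.1f)", names(counts), n, 100 * n / length(x)))
  }
  if (cont == "mean") {
    # Mean with standard deviation in brackets
    sprintf("Mean (SD): %.1f (%.1f)", mean(x), sd(x))
  } else {
    # Median with the interquartile range in brackets
    q <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)
    sprintf("Median (IQR): %.1f (%.1f to %.1f)", q[2], q[1], q[3])
  }
}

# Invented toy data
age   <- c(34, 51, 58, 62, 67, 70, 73)  # 7 unique values: continuous
grade <- c(1, 2, 2, 3, 1, 2, 3)         # 3 unique values: treated as a factor

summarise_numeric(age)
summarise_numeric(age, cont = "median")
summarise_numeric(grade)
```

Raising or lowering cont_cut in the last call moves grade between the two kinds of summary.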
Returns a factorlist dataframe.
This function is mostly a wrapper for Hmisc:::summary.formula(..., method = "reverse")
but produces a publication-ready table the way we like
them. It usually takes a categorical dependent variable (with two to five
levels) to produce a cross table of counts and proportions expressed as
percentages. However, it will take a continuous dependent variable to produce
mean (standard deviation) or median (interquartile range) for use with linear
regression models.
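As a rough illustration of the cross table described above, the following base R sketch builds "count (percentage)" cells and switches between row and column percentages, in the spirit of the column argument. It does not reproduce the Hmisc or finalfit internals; count_percent() and the toy data are hypothetical.

```r
# Toy data, invented for illustration only
sex <- factor(c(rep("Female", 4), rep("Male", 6)))
perforation <- factor(c("No", "No", "No", "Yes",
                        "No", "No", "No", "No", "Yes", "Yes"))

# Hypothetical helper, NOT part of finalfit: counts with percentages
count_percent <- function(explanatory, dependent, column = FALSE) {
  counts <- table(explanatory, dependent)
  # margin = 1 gives row percentages, margin = 2 gives column percentages
  pct <- 100 * prop.table(counts, margin = if (column) 2 else 1)
  matrix(sprintf("%d (%.1f)", as.integer(counts), as.vector(pct)),
         nrow = nrow(counts), dimnames = dimnames(counts))
}

row_tab <- count_percent(sex, perforation)                 # percentages sum to 100 across each row
col_tab <- count_percent(sex, perforation, column = TRUE)  # percentages sum to 100 down each column

noquote(row_tab)
noquote(col_tab)
```

Row percentages answer "what share of each explanatory level had the outcome", while column percentages answer "how is each outcome group made up".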
library(finalfit)
library(dplyr)

# Load example dataset, modified version of survival::colon
data(colon_s)

# Table 1 - Patient demographics
explanatory = c("age", "age.factor", "sex.factor", "obstruct.factor")
dependent = "perfor.factor"
colon_s %>%
  summary_factorlist(dependent, explanatory, p=TRUE)
#> Warning: Chi-squared approximation may be incorrect
#>         label      levels          No         Yes     p
#> 1 Age (years)   Mean (SD) 59.8 (11.9) 58.4 (13.3) 0.578
#> 2         Age   <40 years   68 (97.1)     2 (2.9) 1.000
#> 3             40-59 years  334 (97.1)    10 (2.9)
#> 4               60+ years  500 (97.1)    15 (2.9)
#> 7         Sex      Female  432 (97.1)    13 (2.9) 0.979
#> 8                    Male  470 (97.1)    14 (2.9)
#> 5 Obstruction          No  715 (97.7)    17 (2.3) 0.018
#> 6                     Yes  166 (94.3)    10 (5.7)

# summary_factorlist() is also commonly used to summarise any number of
# variables by an outcome variable (say dead yes/no).

# Table 2 - 5 yr mortality
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "mort_5yr"
colon_s %>%
  summary_factorlist(dependent, explanatory)
#>         label      levels      Alive       Died
#> 1         Age   <40 years  31 (46.3)  36 (53.7)
#> 2             40-59 years 208 (61.4) 131 (38.6)
#> 3               60+ years 272 (53.4) 237 (46.6)
#> 8         Sex      Female 243 (55.6) 194 (44.4)
#> 9                    Male 268 (56.1) 210 (43.9)
#> 4 Obstruction          No 408 (56.7) 312 (43.3)
#> 5                     Yes  89 (51.1)  85 (48.9)
#> 6 Perforation          No 497 (56.0) 391 (44.0)
#> 7                     Yes  14 (51.9)  13 (48.1)
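The examples above leave most arguments at their defaults. The sketch below exercises a few more of the documented ones (cont, column, total_col, na_include) and a continuous dependent variable. It assumes the finalfit package is installed and that colon_s contains a numeric nodes column; that column name is an assumption, not something stated on this page. No output is shown because it depends on the installed version.

```r
# Sketch only: runs when finalfit is available, otherwise does nothing.
if (requireNamespace("finalfit", quietly = TRUE)) {
  library(finalfit)
  data(colon_s)

  explanatory <- c("age", "sex.factor", "obstruct.factor")

  # Categorical dependent: median (IQR) for age, column percentages,
  # a total column, and missing data included in the summary
  tab_cat <- summary_factorlist(colon_s, "mort_5yr", explanatory,
                                cont = "median", column = TRUE,
                                total_col = TRUE, na_include = TRUE)
  print(tab_cat)

  # Continuous dependent (assumed column: `nodes`), summarised across
  # the levels of the categorical explanatory variables only
  tab_cont <- summary_factorlist(colon_s, "nodes", explanatory[-1])
  print(tab_cont)
}
```

Arguments are passed by name here so the call stays readable as the list of options grows.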