Takes a single dependent variable and a vector of explanatory variable names (continuous or categorical) and produces a summary table.

summary_factorlist(.data, dependent = NULL, explanatory, cont = "mean",
  cont_cut = 5, p = FALSE, na_include = FALSE, column = FALSE,
  total_col = FALSE, orderbytotal = FALSE, fit_id = FALSE,
  na_to_missing = TRUE, add_dependent_label = FALSE,
  dependent_label_prefix = "Dependent: ", dependent_label_suffix = "",
  ...)

Arguments

.data

Dataframe.

dependent

Character vector of length 1: name of dependent variable (usually a factor with 2 to 5 levels; a continuous variable is also accepted, see Details).

explanatory

Character vector of any length: name(s) of explanatory variables.

cont

Summary for continuous variables: "mean" (standard deviation) or "median" (interquartile range).

cont_cut

Numeric: number of unique values in continuous variable at which to consider it a factor.

p

Logical: Include statistical test (see summary.formula).

na_include

Logical: include missing data in summary (NA).

column

Logical: Compute margins by column rather than row.

total_col

Logical: include a total column summing across factor levels.

orderbytotal

Logical: order final table by total column high to low.

fit_id

Logical: not used directly, allows merging via finalfit_merge.

na_to_missing

Logical: convert NA to 'Missing' when na_include=TRUE.

add_dependent_label

Logical: add the dependent variable label to the top left of the table.

dependent_label_prefix

Add text before dependent label.

dependent_label_suffix

Add text after dependent label.

...

Pass other arguments to summary.formula, e.g. catTest = catTestfisher.

Value

Returns a factorlist dataframe.

Details

This function is mostly a wrapper for Hmisc:::summary.formula(..., method = "reverse") but produces a publication-ready table the way we like them. It usually takes a categorical dependent variable (with two to five levels) to produce a cross table of counts and proportions expressed as percentages. However, it will take a continuous dependent variable to produce mean (standard deviation) or median (interquartile range) for use with linear regression models.
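As a rough sketch of the continuous case described above, the call below uses nodes as the dependent variable. This assumes nodes is a continuous column in the bundled colon_s data; cont = "median" asks for median (interquartile range) instead of the default mean (standard deviation). Output is not shown here.

```r
library(finalfit)
library(dplyr)

# Example dataset shipped with finalfit (modified survival::colon)
data(colon_s)

# Assumption: `nodes` is a continuous column in colon_s
explanatory = c("age.factor", "sex.factor", "obstruct.factor")
dependent = "nodes"

# Summarise a continuous outcome across the levels of each factor
table_cont = colon_s %>%
  summary_factorlist(dependent, explanatory, cont = "median")
table_cont
```

A table like this is intended to sit alongside a linear regression model of the same outcome.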

Examples

library(finalfit)
library(dplyr)

# Load example dataset, modified version of survival::colon
data(colon_s)

# Table 1 - Patient demographics ----
explanatory = c("age", "age.factor", "sex.factor", "obstruct.factor")
dependent = "perfor.factor"
colon_s %>%
  summary_factorlist(dependent, explanatory, p=TRUE)
#> Warning: Chi-squared approximation may be incorrect
#>         label      levels          No         Yes     p
#> 1 Age (years)   Mean (SD) 59.8 (11.9) 58.4 (13.3) 0.578
#> 2         Age   <40 years   68 (97.1)     2 (2.9) 1.000
#> 3             40-59 years  334 (97.1)    10 (2.9)
#> 4               60+ years  500 (97.1)    15 (2.9)
#> 7         Sex      Female  432 (97.1)    13 (2.9) 0.979
#> 8                    Male  470 (97.1)    14 (2.9)
#> 5 Obstruction          No  715 (97.7)    17 (2.3) 0.018
#> 6                     Yes  166 (94.3)    10 (5.7)
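The chi-squared warning above is typical when some cells have small counts, as in the Yes column here. One possible response, following the catTest = catTestfisher example given for the ... argument, is to request Fisher's exact test for the categorical variables. This is a sketch for the signature documented on this page; output is not shown.

```r
library(finalfit)
library(dplyr)
data(colon_s)

explanatory = c("age", "age.factor", "sex.factor", "obstruct.factor")
dependent = "perfor.factor"

# catTest is forwarded through `...` to summary.formula;
# catTestfisher swaps the chi-squared test for Fisher's exact test
table_fisher = colon_s %>%
  summary_factorlist(dependent, explanatory, p = TRUE,
                     catTest = catTestfisher)
table_fisher
```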
# summary_factorlist() is also commonly used to summarise any number of
# variables by an outcome variable (say dead yes/no).

# Table 2 - 5 yr mortality ----
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "mort_5yr"
colon_s %>%
  summary_factorlist(dependent, explanatory)
#>         label      levels      Alive       Died
#> 1         Age   <40 years  31 (46.3)  36 (53.7)
#> 2             40-59 years 208 (61.4) 131 (38.6)
#> 3               60+ years 272 (53.4) 237 (46.6)
#> 8         Sex      Female 243 (55.6) 194 (44.4)
#> 9                    Male 268 (56.1) 210 (43.9)
#> 4 Obstruction          No 408 (56.7) 312 (43.3)
#> 5                     Yes  89 (51.1)  85 (48.9)
#> 6 Perforation          No 497 (56.0) 391 (44.0)
#> 7                     Yes  14 (51.9)  13 (48.1)
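The tables above report row percentages, the default. As a sketch of the layout arguments described under Arguments, the call below switches to column percentages, adds a total column, and keeps missing values as their own level. It uses only arguments from the signature on this page; output is not shown.

```r
library(finalfit)
library(dplyr)
data(colon_s)

explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "mort_5yr"

# column = TRUE:     percentages computed within each outcome column
# total_col = TRUE:  append a column totalling across outcome levels
# na_include = TRUE: report missing values (shown as 'Missing' by default)
table_col = colon_s %>%
  summary_factorlist(dependent, explanatory,
                     column = TRUE, total_col = TRUE, na_include = TRUE)
table_col
```

Column percentages are often the more natural choice for a demographics table, where the question is how each group is composed rather than how each level splits by outcome.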