`R/summaryfactorlist.R`

`summary_factorlist.Rd`

A function that takes a single dependent variable with a vector of explanatory variable names (continuous or categorical variables) to produce a summary table.

```
summary_factorlist(
  .data,
  dependent = NULL,
  explanatory = NULL,
  formula = NULL,
  cont = "mean",
  cont_nonpara = NULL,
  cont_cut = 5,
  cont_range = TRUE,
  p = FALSE,
  p_cont_para = "aov",
  p_cat = "chisq",
  column = TRUE,
  total_col = FALSE,
  orderbytotal = FALSE,
  digits = c(1, 1, 3, 1, 0),
  na_include = FALSE,
  na_include_dependent = FALSE,
  na_complete_cases = FALSE,
  na_to_p = FALSE,
  na_to_prop = TRUE,
  fit_id = FALSE,
  add_dependent_label = FALSE,
  dependent_label_prefix = "Dependent: ",
  dependent_label_suffix = "",
  add_col_totals = FALSE,
  include_col_totals_percent = TRUE,
  col_totals_rowname = NULL,
  col_totals_prefix = "",
  add_row_totals = FALSE,
  include_row_totals_percent = TRUE,
  include_row_missing_col = TRUE,
  row_totals_colname = "Total N",
  row_missing_colname = "Missing N",
  catTest = NULL,
  weights = NULL
)
```

- .data
Dataframe.

- dependent
Character vector of length 1: name of dependent variable (2 to 5 factor levels).

- explanatory
Character vector of any length: name(s) of explanatory variables.

- formula
An object of class "formula" (or one that can be coerced to that class). An optional alternative to the standard dependent/explanatory format; do not supply it if using dependent/explanatory.

- cont
Summary for continuous explanatory variables: "mean" (standard deviation) or "median" (interquartile range). If "median", a non-parametric hypothesis test is performed (see below).

- cont_nonpara
Numeric vector of the form e.g. `c(1,2)`. Specify which variables to perform non-parametric hypothesis tests on and summarise with "median".

- cont_cut
Numeric: number of unique values in continuous variable at which to consider it a factor.

- cont_range
Logical. Median is shown with 1st and 3rd quartiles.

- p
Logical: Include null hypothesis statistical test.

- p_cont_para
Character. Continuous variable parametric test. One of either "aov" (analysis of variance) or "t.test" for Welch two sample t-test. Note that the continuous non-parametric test is always Kruskal-Wallis (kruskal.test), which in the two-group setting is equivalent to the Mann-Whitney U / Wilcoxon rank sum test.

For a continuous dependent and continuous explanatory variable, the parametric test p-value returned is for the Pearson correlation coefficient. The non-parametric equivalent is the p-value for the Spearman correlation coefficient.

- p_cat
Character. Categorical variable test. One of either "chisq" or "fisher".

- column
Logical: Compute margins by column rather than row.

- total_col
Logical: include a total column summing across factor levels.

- orderbytotal
Logical: order final table by total column high to low.

- digits
Number of digits to round to (1) mean/median, (2) standard deviation / interquartile range, (3) p-value, (4) count percentage, (5) weighted count.

- na_include
Logical: make explanatory variables missing data explicit (`NA`).

- na_include_dependent
Logical: make dependent variable missing data explicit.

- na_complete_cases
Logical: include only rows with complete data.

- na_to_p
Logical: include missing as group in statistical test.

- na_to_prop
Logical: include missing in calculation of column proportions.

- fit_id
Logical: allows merging via `finalfit_merge`.

- add_dependent_label
Add the name of the dependent label to the top left of table.

- dependent_label_prefix
Add text before dependent label.

- dependent_label_suffix
Add text after dependent label.

- add_col_totals
Logical. Include column total n.

- include_col_totals_percent
Include column percentage of total.

- col_totals_rowname
Character. Row name for column totals.

- col_totals_prefix
Character. Prefix to column totals, e.g. "N=".

- add_row_totals
Logical. Include row totals. Note this differs from `total_col` above, particularly for continuous explanatory variables.

- include_row_totals_percent
Include row percentage of total.

- include_row_missing_col
Logical. Include missing data total for each row. Only used when `add_row_totals` is `TRUE`.

- row_totals_colname
Character. Column name for row totals.

- row_missing_colname
Character. Column name for missing data totals for each row.

- catTest
Deprecated. See `p_cat` above.

- weights
Character vector of length 1: name of column to use for weights. Explanatory continuous variables are multiplied by weights. Explanatory categorical variables are counted with a frequency weight (sum(weights)).
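
Several of the arguments above can be combined in a single call. The following is a minimal sketch rather than a recommended analysis. It assumes the `colon_s` example dataset bundled with finalfit and the variable names used elsewhere on this page, and it uses only arguments listed in the usage block. The object names `tab_formula` and `tab_options` are illustrative.

```r
library(finalfit)

# Example dataset shipped with finalfit (modified version of survival::colon)
data(colon_s)

# Formula interface: an alternative to supplying dependent/explanatory
tab_formula <- summary_factorlist(
  colon_s,
  formula = perfor.factor ~ age + sex.factor + obstruct.factor
)

# Median (IQR) for continuous variables, Fisher's exact test for
# categorical variables, explicit missing data, and row/column totals
tab_options <- summary_factorlist(
  colon_s,
  dependent = "perfor.factor",
  explanatory = c("age", "sex.factor", "obstruct.factor"),
  cont = "median",
  p = TRUE,
  p_cat = "fisher",
  na_include = TRUE,
  add_col_totals = TRUE,
  add_row_totals = TRUE
)

print(tab_formula)
print(tab_options)
```

Setting `cont = "median"` switches the continuous-variable test to the non-parametric one, as described under `cont` and `p_cont_para`.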

Returns a `factorlist` dataframe.

This function aims to produce publication-ready summary tables for categorical or continuous dependent variables. It usually takes a categorical dependent variable to produce a cross table of counts and proportions expressed as percentages or summarised continuous explanatory variables. However, it will take a continuous dependent variable to produce mean (standard deviation) or median (interquartile range) for use with linear regression models.
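
The continuous-dependent case described above can be sketched as follows. This assumes the `colon_s` dataset bundled with finalfit, with `age` as a numeric column as in the examples on this page; the object name `tab_cont` is illustrative.

```r
library(finalfit)

# Example dataset shipped with finalfit (modified version of survival::colon)
data(colon_s)

# Continuous dependent variable: each level of the explanatory variables
# is summarised by the dependent variable rather than by counts
tab_cont <- summary_factorlist(
  colon_s,
  dependent = "age",
  explanatory = c("sex.factor", "obstruct.factor", "perfor.factor")
)

print(tab_cont)
```

A table of this kind is typically paired with a linear regression of the same dependent variable on the same explanatory variables.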

See also `fit2df`, `ff_column_totals`, `ff_row_totals`, `ff_label`, `ff_glimpse`, `ff_percent_only`. For lots of examples, see https://finalfit.org/

```
library(finalfit)
library(dplyr)

# Load example dataset, modified version of survival::colon
data(colon_s)

# Table 1 - Patient demographics ----
explanatory = c("age", "age.factor", "sex.factor", "obstruct.factor")
dependent = "perfor.factor"
colon_s %>%
  summary_factorlist(dependent, explanatory, p=TRUE)
#> Warning: Chi-squared approximation may be incorrect
#>        label      levels          No         Yes     p
#>  Age (years)   Mean (SD) 59.8 (11.9) 58.4 (13.3) 0.542
#>          Age   <40 years    68 (7.5)     2 (7.4) 1.000
#>              40-59 years  334 (37.0)   10 (37.0)
#>                60+ years  500 (55.4)   15 (55.6)
#>          Sex      Female  432 (47.9)   13 (48.1) 1.000
#>                     Male  470 (52.1)   14 (51.9)
#>  Obstruction          No  715 (81.2)   17 (63.0) 0.035
#>                      Yes  166 (18.8)   10 (37.0)

# summary_factorlist() is also commonly used to summarise any number of
# variables by an outcome variable (say dead yes/no).
# Table 2 - 5 yr mortality ----
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "mort_5yr"
colon_s %>%
  summary_factorlist(dependent, explanatory)
#> Note: dependent includes missing data. These are dropped.
#>        label      levels      Alive       Died
#>          Age   <40 years   31 (6.1)   36 (8.9)
#>              40-59 years 208 (40.7) 131 (32.4)
#>                60+ years 272 (53.2) 237 (58.7)
#>          Sex      Female 243 (47.6) 194 (48.0)
#>                     Male 268 (52.4) 210 (52.0)
#>  Obstruction          No 408 (82.1) 312 (78.6)
#>                      Yes  89 (17.9)  85 (21.4)
#>  Perforation          No 497 (97.3) 391 (96.8)
#>                      Yes   14 (2.7)   13 (3.2)
```