Lists and Inline Reporting

how-to
R Markdown

Reporting values inline is one of the strong suits of R Markdown. Using lists can make storing variables more organized, and with the right functions lists can be generated automatically. This makes writing with inline values in R Markdown easier.

Author

Nathan Craig

Published

March 13, 2022

Introduction

With great interest I read Tristan Mahr’s post on lists as a secret weapon for reporting values inline using R Markdown with knitr. I find that the ability to report values inline is one of R Markdown’s strengths. Adopting Tristan’s approach to using lists for inline reporting has helped improve my craft, so I’m writing up my take on what I found useful.

The Problems

When working on longer documents, ones with several sections or chapters, I wind up with a very full (i.e., cluttered) R environment. When things get complicated, I like to keep the analysis in a separate document and then load the results into memory. That way I have access to all of the values and tables for the project regardless of which file I have open. In such a situation, there might be 10-15 data frames, each representing a specific aspect of the analysis. If I also want to report inline values in the text (which is a big part of the point of using R Markdown), I can easily wind up with 30 variables. Naming all these things is tricky.

In an effort to make writing easier, I try to name things very carefully, using names that go from general to specific. The aim is to have names that group together so that they sort together in the Environment tab of RStudio and can be reached with auto-completion.

Doing things this way, the variable names can get very long. Projects often start to feel both brittle and clumsy as they grow, and keeping track of things gets difficult. Finding a better way of grouping variables, so that they are associated, easy to call, and created with less code, seems really useful, not only for reporting values inline but generally. Tristan’s approach to using lists addresses both issues at once: writing thrifty code and keeping variables organized. Turns out, using lists is just good R style.

Tip

If you find yourself attempting to cram data into variable names (e.g. model_2018, model_2019, model_2020), consider using a list or data frame instead. — The Tidyverse Style Guide
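
As a minimal sketch of the style guide's advice, the block below keeps one model per group in a single named list instead of separate variables such as model_4, model_6, and model_8. The grouping by cyl, the formula mpg ~ wt, and the name models_by_cyl are illustrative assumptions, not part of the analysis in this post.

```r
# Illustrative only: one linear model per cylinder group, kept in one list.
# The formula and the list name are assumptions made for this sketch.
models_by_cyl <- lapply(
  split(mtcars, mtcars$cyl),
  function(d) lm(mpg ~ wt, data = d)
)

names(models_by_cyl)          # one element per cylinder count
coef(models_by_cyl[["4"]])    # coefficients of the 4-cylinder model
```

Only one object lands in the environment, and each model is reached by name.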

Organizing Summary Variables in a List

Manual Approach

A list can be created manually and the list items named.

weight <- list(
  n = length(mtcars$wt),
  min = min(mtcars$wt),
  max = max(mtcars$wt),
  mean = mean(mtcars$wt),
  sd = sd(mtcars$wt),
  mode = mode(mtcars$wt)  # note: base R mode() returns the storage mode ("numeric"), not the statistical mode
)
str(weight)
List of 6
 $ n   : int 32
 $ min : num 1.51
 $ max : num 5.42
 $ mean: num 3.22
 $ sd  : num 0.978
 $ mode: chr "numeric"

Now the values are grouped in the weight list, can be accessed with $, and remain readable in the Markdown source. For example, the mean weight of vehicles in mtcars is 3.2 (SD = 0.98). To me, this is better than having variables like weight_mean and weight_sd, which is what I was doing. Now both values are listed under weight. This is an improvement, but it gets better.

Tip

Both [[ and $ select a single element of the list. The main difference is that $ does not allow computed indices, whereas [[ does. x$name is equivalent to x[["name", exact = FALSE]]. Also, the partial matching behavior of [[ can be controlled using the exact argument. — Base R Documentation
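
The difference described in the tip can be tried out in the console. This is a small sketch using a cut-down version of the weight list. The variable stat is a hypothetical name introduced only for the example.

```r
weight <- list(
  n    = length(mtcars$wt),
  mean = mean(mtcars$wt),
  sd   = sd(mtcars$wt)
)

# `$` takes a literal name and allows partial matching
weight$mean
weight$me                       # partial match on "mean"

# `[[` accepts a computed index and matches exactly by default
stat <- "sd"                    # hypothetical variable holding the name
weight[[stat]]
weight[["me"]]                  # NULL: no element is named exactly "me"
weight[["me", exact = FALSE]]   # partial matching switched on
```

For inline reporting, $ is usually the more readable choice, while [[ is the one to reach for when the element name is stored in a variable.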

Written:

  • Mean = `r format(weight$mean, digits = 2)`
  • SD = `r format(weight$sd, digits = 2)`
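
The inline expressions above are ordinary R calls, so they can be checked in the console before knitting. The helper fmt2() below is a hypothetical convenience wrapper, not something from the original workflow.

```r
weight <- list(mean = mean(mtcars$wt), sd = sd(mtcars$wt))

format(weight$mean, digits = 2)   # "3.2"
format(weight$sd, digits = 2)     # "0.98"

# Hypothetical helper to avoid repeating digits = 2 in every inline call
fmt2 <- function(x) format(x, digits = 2)
paste0("M = ", fmt2(weight$mean), ", SD = ", fmt2(weight$sd))
```

Note that format() with digits = 2 keeps two significant digits, not two decimal places, which is why the mean prints as 3.2.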

Automated Lists of Summary Stats (shwing!)

The really exciting thing about using lists is making them automatically with existing functions. Here a list of several summary values is created using the psych package. One could do something similar with summary or fivenum.

weight2 <- as.list(psych::describe(mtcars$wt))
str(weight2)
List of 13
 $ vars    : num 1
 $ n       : num 32
 $ mean    : num 3.22
 $ sd      : num 0.978
 $ median  : num 3.33
 $ trimmed : num 3.15
 $ mad     : num 0.767
 $ min     : num 1.51
 $ max     : num 5.42
 $ range   : num 3.91
 $ skew    : num 0.423
 $ kurtosis: num -0.0227
 $ se      : num 0.173

With far less code, it was possible to generate all of the same summary attributes along with additional ones like skew (0.42) and range (3.9). Normally, describe() returns a data frame, but it can be coerced to a list.

Written:

  • `r format(weight2$skew, digits = 2)`
  • `r format(weight2$range, digits = 2)`
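
For the fivenum route mentioned earlier, a base R sketch might look like the following. fivenum() returns an unnamed vector, so the element names here are my own labels (an assumption), chosen to match what the function documents as minimum, lower hinge, median, upper hinge, and maximum.

```r
# Tukey's five-number summary as a named list (names are assumed labels)
wt5 <- setNames(
  as.list(fivenum(mtcars$wt)),
  c("min", "lower_hinge", "median", "upper_hinge", "max")
)
str(wt5)
```

The result can be used inline in the same way, for example `r format(wt5$median, digits = 2)`.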

Adding Items to Lists

Lists can be appended to, which is really handy. Lists can also contain lots of different kinds of things: lists, lists of lists, and data frames. Lists are very flexible containers. Here, we run a Shapiro-Wilk normality test, count the outliers, collect their row names and a data frame of the outlying rows, and add all of these to the list.

weight2 <- append(weight2,
                  list(s_test = shapiro.test(mtcars$wt),
                       outliers_n = nrow(rstatix::identify_outliers(mtcars, wt)),
                       outliers = row.names(rstatix::identify_outliers(mtcars, wt)),
                       outliers_df = rstatix::identify_outliers(mtcars, wt)))
str(weight2)
List of 17
 $ vars       : num 1
 $ n          : num 32
 $ mean       : num 3.22
 $ sd         : num 0.978
 $ median     : num 3.33
 $ trimmed    : num 3.15
 $ mad        : num 0.767
 $ min        : num 1.51
 $ max        : num 5.42
 $ range      : num 3.91
 $ skew       : num 0.423
 $ kurtosis   : num -0.0227
 $ se         : num 0.173
 $ s_test     :List of 4
  ..$ statistic: Named num 0.943
  .. ..- attr(*, "names")= chr "W"
  ..$ p.value  : num 0.0927
  ..$ method   : chr "Shapiro-Wilk normality test"
  ..$ data.name: chr "mtcars$wt"
  ..- attr(*, "class")= chr "htest"
 $ outliers_n : int 3
 $ outliers   : chr [1:3] "Cadillac Fleetwood" "Lincoln Continental" "Chrysler Imperial"
 $ outliers_df:'data.frame':    3 obs. of  13 variables:
  ..$ mpg       : num [1:3] 10.4 10.4 14.7
  ..$ cyl       : num [1:3] 8 8 8
  ..$ disp      : num [1:3] 472 460 440
  ..$ hp        : num [1:3] 205 215 230
  ..$ drat      : num [1:3] 2.93 3 3.23
  ..$ wt        : num [1:3] 5.25 5.42 5.34
  ..$ qsec      : num [1:3] 18 17.8 17.4
  ..$ vs        : num [1:3] 0 0 0
  ..$ am        : num [1:3] 0 0 0
  ..$ gear      : num [1:3] 3 3 3
  ..$ carb      : num [1:3] 4 4 4
  ..$ is.outlier: logi [1:3] TRUE TRUE TRUE
  ..$ is.extreme: logi [1:3] FALSE FALSE FALSE
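
The append step calls identify_outliers() three times. One possible variation, sketched here under the assumption that psych and rstatix are installed, runs the search once inside local() so that the temporary object does not end up in the environment. The name out is hypothetical.

```r
# Starting point: the automated summary list from psych::describe()
weight2 <- as.list(psych::describe(mtcars$wt))

weight2 <- local({
  # Run the outlier search once and reuse the result (temporary object)
  out <- rstatix::identify_outliers(mtcars, wt)
  append(
    weight2,
    list(
      s_test      = shapiro.test(mtcars$wt),
      outliers_n  = nrow(out),
      outliers    = row.names(out),
      outliers_df = out
    )
  )
})

length(weight2)   # 13 summary values plus 4 appended items
```

The resulting list has the same elements as before. Only the way the outlier data frame is computed changes.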

Now, using this same list, which is still a single (compound) object in the environment, we have many kinds of handy information about the variable, such as everything needed to report a Shapiro-Wilk test (W = 0.94, p = 0.093) or to note that there are 3 outliers in this variable. They are: Cadillac Fleetwood, Lincoln Continental, and Chrysler Imperial.

Written:

  • W = `r format(weight2$s_test$statistic, digits = 2)`
  • p = `r scales::pvalue(weight2$s_test$p.value)`
  • `r weight2$outliers_n`
  • `r knitr::combine_words(weight2$outliers)`

With relatively few lines of code, everything is tucked away neatly in a single list. Each item in the list, including items nested inside other items, can be reached by chaining $.

Unfortunately, I can’t get tab completion to work properly when writing inline, but Ctrl + 2 gets me to the console, where the statement can be built, checked, and copied back into the editor window when inline completion is being glitchy.
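
Building the statements in the console might look like the sketch below. The list wt_report is a hypothetical, cut-down stand-in for weight2 so that the example runs on its own. It assumes the scales and knitr packages are installed.

```r
# Hypothetical stand-in for weight2 with just the nested pieces needed here
wt_report <- list(
  s_test   = shapiro.test(mtcars$wt),
  outliers = c("Cadillac Fleetwood", "Lincoln Continental", "Chrysler Imperial")
)

# Chained $ reaches into the nested htest object
format(wt_report$s_test$statistic, digits = 2)   # W, two significant digits
scales::pvalue(wt_report$s_test$p.value)         # p-value as a formatted string
knitr::combine_words(wt_report$outliers)         # comma-separated, with "and"
```

Once each call prints what is expected, the expression can be pasted between the inline backticks.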

Extending the Idea Beyond Inline Values (maybe?)

Tristan’s examples involved reporting linear mixed models, and he used the split() function in some interesting ways. I haven’t dug into that just yet, but I’m wondering about keeping related data frames in a list as another approach to simplifying both the environment and variable naming.

There are some compelling reasons to keep data frames in lists.
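
One such reason is that base R already produces lists of data frames. The sketch below shows split() on its own. It is not a reconstruction of how Tristan uses the function, and the name by_cyl is hypothetical.

```r
# split() returns a named list with one data frame per group
by_cyl <- split(mtcars, mtcars$cyl)

names(by_cyl)                              # "4" "6" "8"
sapply(by_cyl, nrow)                       # rows per group
sapply(by_cyl, function(d) mean(d$wt))     # mean weight per group
```

Because the groups live in one list, functions such as sapply() or lapply() can be run over all of them at once.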

Continuing to work with the same list on weight, one might want to group by cylinders and then get the mean weight of the group. This involves making another data frame, one that would occupy another spot in the environment. Rather than assigning it to a variable, it can be added to a list.

library(dplyr)  # provides %>%, group_by(), summarise(), and n()

weight2 <- append(weight2, list(df_mean_wt_by_cyl = mtcars %>%
                       group_by(cyl) %>%
                       summarise(n = n(),
                                 mean_wt = mean(wt),
                                 sd_wt = sd(wt))))

Now that summary table is created, but within the list, and it can be called in a code chunk just like any other variable.

knitr::kable(weight2$df_mean_wt_by_cyl,
             caption = "A deeply insightful look at weight by cylinder")

A deeply insightful look at weight by cylinder

cyl    n   mean_wt      sd_wt
  4   11  2.285727  0.5695637
  6    7  3.117143  0.3563455
  8   14  3.999214  0.7594047
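
If seven decimal places are more than the table needs, kable() has a digits argument. The sketch below rebuilds the summary on its own so that it runs standalone. The names wt_by_cyl and tbl, and the caption text, are assumptions made for the example.

```r
library(dplyr)

# Same grouped summary as above, stored under a hypothetical name
wt_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(n = n(), mean_wt = mean(wt), sd_wt = sd(wt))

# digits = 2 rounds the numeric columns in the printed table only
tbl <- knitr::kable(wt_by_cyl, digits = 2,
                    caption = "Weight by cylinder, rounded to two decimals")
tbl
```

The underlying data frame keeps full precision, so inline values can still be formatted separately.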

But I’m Used to Assigning Data Frames to Individual Variables

I’m accustomed to assigning data frames to variables, so putting them into lists is new to me. I generally do something like the following:

df1 <- tibble(id = 1:100,
              group = stringi::stri_rand_strings(100, 1, pattern = "[A-E]"),
              val = rnorm(100))
df2 <- df1 %>% group_by(group) %>% 
  summarise(mean = mean(val),
            sd = sd(val))

These could be manually placed in a list.

mylist <- list(df1 = df1, df2 = df2)  # name the elements so they can be reached with $

That is useful and requires just one additional line. However, we are still making items in the environment, and if we want a clean environment we need to remove the temporary items afterwards. It would be better to create the objects directly in a list.

remove(df1, df2)

Fortunately, creating items directly in a list is nearly identical to assigning them to a standalone variable.

Rather than assigning each new data frame to its own standalone variable, place it in the list with $ (or [[ ]]). If some of the data frames depend on or are built from another, those data can be reached through the list with $ as well.

Note that the first step is to create an empty list, because we cannot add items to a list that does not exist. Below we create an empty list and create the new data frames directly in it. The data frames aren’t added to the environment and don’t need to be removed.

mylist <- list()
mylist$df1 <- tibble(id = 1:100,
              group = stringi::stri_rand_strings(100, 1, pattern = "[A-E]"),
              val = rnorm(100))
mylist$df2 <- mylist$df1 %>% group_by(group) %>% 
  summarise(mean = mean(val),
            sd = sd(val))
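
The [[ ]] form becomes useful when the element name is computed rather than typed. A minimal sketch, using mtcars and a hypothetical cyl_ naming scheme:

```r
mylist <- list()

# `[[<-` works like `$<-` but accepts a computed name
for (g in c(4, 6, 8)) {
  nm <- paste0("cyl_", g)                    # hypothetical naming scheme
  mylist[[nm]] <- mtcars[mtcars$cyl == g, ]  # subset stored directly in the list
}

names(mylist)          # "cyl_4" "cyl_6" "cyl_8"
sapply(mylist, nrow)   # rows in each stored data frame
```

As with $, nothing besides mylist (and the loop variables) is created in the environment.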

sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_1.1.4

loaded via a namespace (and not attached):
 [1] pillar_1.9.0      compiler_4.2.1    tools_4.2.1       digest_0.6.35    
 [5] jsonlite_1.8.8    evaluate_0.23     lifecycle_1.0.4   tibble_3.2.1     
 [9] nlme_3.1-157      lattice_0.20-45   pkgconfig_2.0.3   rlang_1.1.1      
[13] psych_2.3.3       cli_3.6.2         rstudioapi_0.14   yaml_2.3.8       
[17] parallel_4.2.1    xfun_0.42         fastmap_1.1.1     withr_3.0.0      
[21] knitr_1.45        generics_0.1.3    vctrs_0.6.5       htmlwidgets_1.6.4
[25] grid_4.2.1        tidyselect_1.2.1  glue_1.7.0        R6_2.5.1         
[29] rstatix_0.7.2     fansi_1.0.6       rmarkdown_2.26    carData_3.0-5    
[33] purrr_1.0.2       tidyr_1.3.1       car_3.1-2         magrittr_2.0.3   
[37] scales_1.3.0      backports_1.4.1   htmltools_0.5.7   abind_1.4-5      
[41] mnormt_2.1.1      colorspace_2.1-0  utf8_1.2.4        stringi_1.8.3    
[45] munsell_0.5.0     broom_1.0.1      

Citation

BibTeX citation:
@online{craig2022,
  author = {Craig, Nathan},
  title = {Lists and {Inline} {Reporting}},
  date = {2022-03-13},
  url = {https://nmc.quarto.pub/nmc/posts/2022-03-13-lists-and-inline-reporting},
  langid = {en}
}
For attribution, please cite this work as:
Craig, Nathan. 2022. “Lists and Inline Reporting.” March 13, 2022. https://nmc.quarto.pub/nmc/posts/2022-03-13-lists-and-inline-reporting.