The purpose of this tutorial for our open lab is to show you some options for working with and efficiently presenting output from Bayesian models in article manuscripts: regression tables, regression plots, predicted probabilities, marginal effects, and others. While a number of dedicated R packages generate similar output,1 this tutorial aims to give you the tools to produce and customize this output on your own, with the hope that you can use it in your own applied work.

In the lab, please feel encouraged to work in groups! You can focus on whichever part of the lab seems the most useful to you, but we recommend starting with section 1 and attempting no more than two sections during one lab. We’ll be happy to assist with whatever question or problem you might have. When you leave this lab, you should be comfortable with fitting a regression model, perhaps with an interaction or with a non-continuous outcome variable, and presenting the results in a clear manner.

You’re also welcome to work on Homeworks 5, 6, or 7 during this lab; pick the exercises that correspond to your specific homework.

R code for each of the exercises is provided in each section; you can also refer to slides and code from the lecture for additional guidance. Ask us if you have questions about specific code.

This tutorial focuses on using JAGS and WinBUGS/OpenBUGS for fitting Bayesian models via R. However, everything in this tutorial can be generalized to MCMC output that you obtained from other software, including Stan, rstanarm, etc.

There is also a PDF version of this tutorial and the underlying R script containing all code.

If you find any errors in this document or have suggestions for improvement, please email me.

Working with BUGS/JAGS output

For most of the exercises in this lab (and your own analyses), you will want to work with BUGS/JAGS output directly. One way to get this output into a form you can use for the applications below is to follow these steps.

  1. Estimate the model, resulting in the JAGS/BUGS object angell.fit.
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 43
##    Unobserved stochastic nodes: 4
##    Total graph size: 266
## 
## Initializing model
  2. Convert your JAGS/BUGS output into an MCMC list. If you work with R2jags/R2WinBUGS/R2OpenBUGS, you can convert your jags/bugs object mymodel.fit into an MCMC list using as.mcmc().
  3. Convert the MCMC list into a matrix and then a data frame. In this matrix/data frame, each row contains one iteration from your samples, and each column identifies one parameter (such as a regression coefficient):
  4. If necessary, use the grep() function to easily extract the columns you will need for post-processing. For instance, if you monitored a variety of parameters but need only the regression coefficients, which you named beta[1], beta[2], etc., you can do the following:
  • The brackets let you put conditions on which [rows, columns] you wish to select from the mymodel.dat data frame.
  • The grep() function creates a vector that lists all parts of the object x that match the specified pattern. The fixed argument ensures that the pattern is matched as-is.
##       beta[1]    beta[2]
## 1 -0.11474579 -0.1776192
## 2 -0.10589442 -0.1537442
## 3 -0.09147779 -0.1755540
## 4 -0.08662682 -0.1418562
## 5 -0.10034898 -0.2027007
## 6 -0.09993862 -0.2188148
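The steps above can be sketched in a few lines. In this sketch, a simulated matrix stands in for the object you would actually get from as.matrix(as.mcmc(mymodel.fit)); the object names are hypothetical:

```r
set.seed(42)
# hypothetical stand-in for the matrix created via as.matrix(as.mcmc(mymodel.fit)):
# 100 draws of three monitored parameters
mymodel.mat <- matrix(rnorm(300), ncol = 3,
                      dimnames = list(NULL, c("alpha", "beta[1]", "beta[2]")))
mymodel.dat <- as.data.frame(mymodel.mat)      # step 3: matrix, then data frame
# step 4: keep only the columns whose names contain the literal string "beta["
beta.dat <- mymodel.dat[, grep("beta[", colnames(mymodel.dat), fixed = TRUE)]
head(beta.dat)
```

With fixed = TRUE, the square bracket is matched literally rather than being treated as a regular-expression character.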

Regression tables

Bayesian variants of the regression model don’t differ much in their interpretation from frequentist models, aside from how you interpret uncertainty around your estimates. Hence, if you want to present a regression table in your paper, a “Bayesian” regression table will look very similar to the “frequentist” regression tables you are all familiar with. For Bayesian models, you want to display the standard deviation of the posterior estimates in lieu of coefficient standard errors. Alternatively, you may choose to display the 95% credible interval beside (or below) the posterior mean.

As before, you can use the mcmctab() function to create a table:

##   Variable  Median    SD   Lower   Upper
## 1    alpha  19.628 1.182  17.263  21.934
## 2  beta[1]  -0.107 0.017  -0.141  -0.072
## 3  beta[2]  -0.185 0.035  -0.252  -0.119
## 4 deviance 191.917 2.871 188.966 199.862

The function takes the following arguments:

  • sims: the posterior output from your model. mcmctab() automatically recognizes posterior distributions that were produced by R2jags, rjags, R2WinBUGS, R2OpenBUGS, MCMCpack, rstan, and rstanarm.
  • ci: desired level for credible intervals; defaults to 0.95, i.e. a 95% credible interval
  • pars: character vector of parameters to be printed; defaults to NULL (all parameters are printed)
  • Pr: print the percent of posterior draws that have the same sign as the median of the posterior distribution; defaults to FALSE
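The quantities mcmctab() reports can also be computed by hand from the posterior draws. The following sketch uses simulated draws in place of your converted MCMC data frame:

```r
set.seed(1)
# hypothetical posterior draws standing in for your converted MCMC data frame
draws <- data.frame("beta[1]" = rnorm(2000, -0.107, 0.017),
                    "beta[2]" = rnorm(2000, -0.185, 0.035),
                    check.names = FALSE)
post.sum <- data.frame(
  Variable = colnames(draws),
  Median   = apply(draws, 2, median),
  SD       = apply(draws, 2, sd),
  Lower    = apply(draws, 2, quantile, probs = 0.025),  # 95% credible interval
  Upper    = apply(draws, 2, quantile, probs = 0.975),
  row.names = NULL
)
post.sum
```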

You can include the table in an RMarkdown document:

Summary of posterior estimates.

Variable        Median   Std. Dev.   Lower 95% CI   Upper 95% CI
Heterogeneity   -0.11    0.02        -0.14          -0.07
Mobility        -0.18    0.04        -0.25          -0.12

Alternatively, you can create a Word version using flextable::flextable(). Install the “flextable” and “officer” packages first. The flextable vignette offers guidance on customization options. The following command will create the Word document angell_table.docx in your working directory:

## [1] "/Users/johanneskarreth/Documents/Dropbox/Uni/9 - ICPSR/2019/Applied Bayes/Course materials/Labs/5 - Model presentation/angell_table.docx"

You can also export your table to LaTeX using the xtable package. The following command will create the LaTeX table angell_table.tex in your working directory:

##        Variable Median    SD  Lower  Upper
## 1 Heterogeneity -0.107 0.017 -0.141 -0.072
## 2      Mobility -0.185 0.035 -0.252 -0.119

Regression dot plots

Coefficient dot plots (or caterpillar plots) are a more intuitive way to present regression results. The reader can evaluate the size and significance of relationships more easily, and if the coefficients are on a comparable scale, dot plots make it more convenient to compare the size of relationships or effects.

You have a variety of options to generate these plots, some of which you’ve seen in Lab 3/4. Here are a few:

Because the resulting plot is a ggplot2 object, you can modify it as you see fit:

You can also manually create this plot or a similar version, working with the data frame angell.fit.dat that you’ve created from the posterior (MCMC) output of your model object. Note that I’m using the dplyr::select() function here and refer to beta[1] and beta[2] within backticks to accommodate the square brackets in these non-syntactic column names.

Next, I convert the posterior draws into a long dataframe for plotting:

##             key       value
## 1 Heterogeneity -0.11474579
## 2 Heterogeneity -0.10589442
## 3 Heterogeneity -0.09147779
## 4 Heterogeneity -0.08662682
## 5 Heterogeneity -0.10034898
## 6 Heterogeneity -0.09993862

I can use the ggridges package to create density plots of each posterior coefficient:

Alternatively, I could summarize the data and recreate the dot-and-whisker plot from above:
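A minimal sketch of that summary step, with hypothetical posterior draws standing in for the long data frame created above; the resulting one-row-per-coefficient data frame is what a dot-and-whisker plot needs:

```r
set.seed(2)
# hypothetical long data frame like the one shown above
post.long <- data.frame(
  key   = rep(c("Heterogeneity", "Mobility"), each = 2000),
  value = c(rnorm(2000, -0.107, 0.017), rnorm(2000, -0.185, 0.035))
)
by.coef <- split(post.long$value, post.long$key)
# one row per coefficient: point estimate plus 95% credible interval
plot.dat <- data.frame(
  key    = names(by.coef),
  median = sapply(by.coef, median),
  lower  = sapply(by.coef, quantile, probs = 0.025),
  upper  = sapply(by.coef, quantile, probs = 0.975),
  row.names = NULL
)
plot.dat
```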

Conditional effects: interactions

Conditional effects from interaction terms are best presented graphically (Brambor, Clark, and Golder 2006). In a frequentist regression model, you can calculate the conditional effect of \(x_1\) across the range of a moderator \(x_2\) by using the following equations for the effect and its variance

\[ \frac{\partial y}{\partial x_1} = \widehat{\beta_1} + \widehat{\beta_3} \times x_2 \]

\[ \widehat{\sigma}^2_{\frac{\partial y}{\partial x_1}} = \text{var}(\widehat{\beta_1}) + x_2^2 \times \text{var}(\widehat{\beta_3}) + 2 \times x_2 \times \text{cov}(\widehat{\beta_1}, \widehat{\beta_3}) \]

In a Bayesian model, you can take advantage of the posterior variance to calculate the variance of the effect. The results should be virtually identical.

Below, I use example data from Dave Armstrong’s DAMisc package. The code below is also on my Github page.

##           y         x1          x2           z
## 1 -4.010072  0.2027309  1.41113326 -0.01256869
## 2  2.011219 -0.4000385 -0.76709469  1.05371939
## 3 -4.188976 -1.4077535  0.78765992 -0.74397566
## 4  3.780981  1.6701312  1.42516111 -0.35803597
## 5  5.882727  1.1879621 -0.01225183 -0.52016632
## 6 -6.129506  1.0175018  2.21588900  0.56806248

Estimate the frequentist model with an interaction:

## 
## Call:
## lm(formula = y ~ x1 + x2 + x1 * x2, data = InteractionEx)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.7506 -1.4777 -0.0852  1.4432  6.5931 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.03030    0.10790   9.549  < 2e-16 ***
## x1           1.92826    0.11812  16.324  < 2e-16 ***
## x2          -2.95627    0.11740 -25.181  < 2e-16 ***
## x1:x2        0.41002    0.08386   4.889 1.37e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.137 on 496 degrees of freedom
## Multiple R-squared:  0.5703, Adjusted R-squared:  0.5677 
## F-statistic: 219.5 on 3 and 496 DF,  p-value: < 2.2e-16

Look at var-cov matrix (for later):

##               (Intercept)           x1            x2        x1:x2
## (Intercept)  0.0116417541 -0.000137230  0.0007508126 -0.004174858
## x1          -0.0001372300  0.013953237 -0.0081416754 -0.000183120
## x2           0.0007508126 -0.008141675  0.0137828625 -0.000161990
## x1:x2       -0.0041748579 -0.000183120 -0.0001619900  0.007033174

Estimate the Bayesian model with an interaction:

## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 500
##    Unobserved stochastic nodes: 5
##    Total graph size: 3508
## 
## Initializing model

Process the MCMC output:

Simulate the range of the moderating variable:

Calculate conditional effect of X1 across the range of X2, first using the Bayesian estimates:

Note: the variance now comes from the posterior, not the vcov matrix.
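A sketch of this calculation, using simulated draws in place of the model’s actual posterior output (the object names are hypothetical): for each value of the moderator, the full posterior of \(\beta_1 + \beta_3 x_2\) is computed, then summarized.

```r
set.seed(7)
# hypothetical posterior draws for beta1 (x1) and beta3 (x1:x2);
# in practice these come from your processed MCMC output
b1 <- rnorm(2000, 1.93, 0.12)
b3 <- rnorm(2000, 0.41, 0.08)
x2.sim <- seq(-2, 2, length.out = 50)        # simulated range of the moderator
# one column per value of x2: the full posterior of beta1 + beta3 * x2
cond.eff <- sapply(x2.sim, function(x2) b1 + b3 * x2)
eff.median <- apply(cond.eff, 2, median)
eff.lower  <- apply(cond.eff, 2, quantile, probs = 0.025)
eff.upper  <- apply(cond.eff, 2, quantile, probs = 0.975)
```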

Calculate conditional effect and SD for the frequentist estimates (cf. Brambor, Clark, and Golder (2006)):
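A sketch of the frequentist calculation via the Brambor/Clark/Golder formulas, plugging in the coefficient and variance-covariance values from the lm() and vcov() output printed above:

```r
# values taken from the lm() and vcov() output shown earlier
b1  <- 1.92826          # coefficient on x1
b3  <- 0.41002          # coefficient on x1:x2
v11 <- 0.013953237      # var(b1)
v33 <- 0.007033174      # var(b3)
v13 <- -0.000183120     # cov(b1, b3)
x2.sim <- seq(-2, 2, length.out = 50)
cond.eff.freq <- b1 + b3 * x2.sim
# variance of the conditional effect: var(b1) + x2^2 var(b3) + 2 x2 cov(b1, b3)
cond.se.freq  <- sqrt(v11 + x2.sim^2 * v33 + 2 * x2.sim * v13)
```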

Combine both estimates into one dataframe for plotting:

Plot both conditional effects:

Predicted probabilities

For nonlinear models, predicted probabilities are usually an intuitive option to present interpretable results to readers. The most straightforward way to use them is to plot the predicted probability of the outcome across the simulated range of a predictor while holding all other covariates constant at typical values (the average-case approach) or across all observations (the observed-values approach). To do this using the output of a Bayesian model for binary outcomes, follow the steps below, which are also documented in the slides for Day 9 (available on MBox).

First, I re-estimate the logistic regression of voting that you saw on Day 9:

## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 2713
##    Unobserved stochastic nodes: 7
##    Total graph size: 23384
## 
## Initializing model

Predicted probabilities “by hand”

##      constant age female education income demsatis information
## [1,]        1  51      1         6      3        0           2
## [2,]        1  51      1         6      3        1           2
## [3,]        1  51      1         6      3        2           2
## [4,]        1  51      1         6      3        3           2
## [5,]        1  51      1         6      3        4           2
##           b[1]       b[2]        b[3]      b[4]      b[5]       b[6]
## [1,] -3.578523 0.03540421 -0.31104627 0.2599117 0.2211951 -0.1744231
## [2,] -2.784095 0.03368484 -0.23428892 0.1734164 0.2019026 -0.2019965
## [3,] -2.731465 0.03244011 -0.12673791 0.1209164 0.2097966 -0.1848315
## [4,] -2.770216 0.03274643 -0.07796306 0.1280757 0.2103883 -0.1803071
## [5,] -2.537810 0.03231998 -0.30074858 0.1521294 0.1884328 -0.1923877
## [6,] -2.625810 0.03627861 -0.21035067 0.1092436 0.2503272 -0.2332926
##           b[7]
## [1,] 0.5482789
## [2,] 0.4652988
## [3,] 0.5642639
## [4,] 0.5195990
## [5,] 0.5032192
## [6,] 0.4573557
## [1] 2000    5
## # A tibble: 5 x 4
##   Dissatisfaction median_pp lower_pp upper_pp
##   <chr>               <dbl>    <dbl>    <dbl>
## 1 0                   0.780    0.751    0.806
## 2 1                   0.747    0.724    0.770
## 3 2                   0.712    0.685    0.737
## 4 3                   0.673    0.633    0.712
## 5 4                   0.632    0.575    0.687
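The core of the “by hand” calculation is a matrix product of scenarios and posterior draws, pushed through the inverse link. A reduced sketch with two hypothetical coefficients (the real model above has seven, b[1] through b[7]):

```r
set.seed(3)
# hypothetical posterior draws: intercept plus one focal predictor
B <- cbind(rnorm(2000, 0.9, 0.15), rnorm(2000, -0.18, 0.04))
# scenario matrix: focal predictor varied over 0:4, intercept fixed at 1
X <- cbind(constant = 1, demsatis = 0:4)
pp <- plogis(X %*% t(B))     # 5 scenarios x 2000 draws of Pr(y = 1)
data.frame(Dissatisfaction = 0:4,
           median_pp = apply(pp, 1, median),
           lower_pp  = apply(pp, 1, quantile, probs = 0.05),
           upper_pp  = apply(pp, 1, quantile, probs = 0.95))
```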

Instead of implementing these steps line-by-line, you could also use one of two functions I wrote. Both are available on my Github page.

Generate predicted probabilities for a simulated “average” case

The following approach (which you also saw on Day 9) follows the practice in King, Tomz, and Wittenberg (2000) and calculates the predicted probability for each value of a focal predictor with all other predictors set to one “typical” value (usually a mean or a median). This is similar to the effect() function in R or clarify or margins in Stata.

The relevant function is called MCMC_simcase_probs(). It can be read into R:

The function requires a few arguments:

  • model_matrix: model matrix, including intercept. Create with model.matrix(formula, data)
  • mcmc_out: posterior distributions of all coefficients in matrix form - can easily be created from rstan, MCMCpack, R2jags, R2OpenBUGS, etc.
  • x_col: column number of the explanatory variable for which to calculate associated Pr(y = 1)
  • x_range_vec: name of the vector with the range of relevant values of the explanatory variable for which to calculate associated Pr(y = 1)
  • link: link function, character vector set to “logit” (default) or “probit”
  • lower: lower percentile (default: 5th) for credible interval of predicted probabilities
  • upper: upper percentile (default: 95th) for credible interval of predicted probabilities

Note: as with every R function, you can type the name of the function into the console to see what the function actually does. If you take a close look at MCMC_simcase_probs, you will see that the function mirrors the steps I just took above “by hand”. Give it a try!

Some of the objects to feed to these arguments are best created before using the function:

Now, you can call the function and generate predicted probabilities across the range of the information variable:

The resulting object is a data frame:

## # A tibble: 5 x 4
##   predictor median_pp lower_pp upper_pp
##       <int>     <dbl>    <dbl>    <dbl>
## 1         0     0.780    0.746    0.811
## 2         1     0.747    0.720    0.773
## 3         2     0.712    0.679    0.742
## 4         3     0.673    0.625    0.719
## 5         4     0.632    0.563    0.699

You may then plot the predicted probabilities for easier communication:

Generate average predicted probabilities for all observed cases

The following approach (which you haven’t seen yet) follows the points raised by Hanmer and Kalkan (2013) that the previous approach is somewhat limited in exploring the full range of cases. In the words of the authors, “this approach creates a weaker connection between the results and the larger goals of the research enterprise … rather than seeking to understand the effect for the average case, the goal is to obtain an estimate of the average effect in the population.”

For a very clear exposition of the intuition behind this approach, please read Hanmer and Kalkan (2013).

I wrote a function implementing this approach. It is called MCMC_observed_probs() and can be read into R:

The function requires the same arguments as MCMC_simcase_probs():

  • model_matrix: model matrix, including intercept. Create with model.matrix(formula, data)
  • mcmc_out: posterior distributions of all coefficients in matrix form - can easily be created from rstan, MCMCpack, R2jags, R2OpenBUGS, etc.
  • x_col: column number of the explanatory variable for which to calculate associated Pr(y = 1)
  • x_range_vec: name of the vector with the range of relevant values of the explanatory variable for which to calculate associated Pr(y = 1)
  • link: link function, character vector set to “logit” (default) or “probit”
  • lower: lower percentile (default: 5th) for credible interval of predicted probabilities
  • upper: upper percentile (default: 95th) for credible interval of predicted probabilities

Because we already created these objects above, we can move directly to calling the function. Note that this function can be quite slow as it calculates probabilities for \(n \times k \times z\) cases, where \(n\) is the number of observations in the data, \(k\) is the number of posterior draws, and \(z\) is the number of distinct values of the simulated variable.
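The observed-values logic can be sketched as follows, with a simulated model matrix and posterior draws standing in for the real objects: for each focal value, set that value for all observed cases, compute each case’s probability under each draw, and average over cases.

```r
set.seed(11)
# hypothetical model matrix (100 cases: intercept + 2 predictors)
# and hypothetical posterior draws (500 draws x 3 coefficients)
X <- cbind(1, matrix(rnorm(200), ncol = 2))
B <- cbind(rnorm(500, -0.2, 0.1), rnorm(500, 0.5, 0.1), rnorm(500, -0.3, 0.1))
x.col <- 2                             # column of the focal predictor
pp.obs <- sapply(0:4, function(val) {
  X.sim <- X
  X.sim[, x.col] <- val                # set the focal predictor for ALL cases
  colMeans(plogis(X.sim %*% t(B)))     # average over observed cases, per draw
})
apply(pp.obs, 2, median)               # one summary per focal value
```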

The resulting object is again a data frame:

## # A tibble: 5 x 4
##   predictor median_pp lower_pp upper_pp
##       <dbl>     <dbl>    <dbl>    <dbl>
## 1         0     0.735    0.714    0.756
## 2         1     0.704    0.690    0.719
## 3         2     0.671    0.653    0.689
## 4         3     0.637    0.605    0.667
## 5         4     0.601    0.553    0.645

You may then plot the predicted probabilities for easier communication:

Comparing the two side by side is useful to adjudicate the sensitivity of the results to either approach:

First differences

As you saw in the slides for Day 9, you can also easily use the posterior distribution to calculate first differences, i.e. the difference in predicted probabilities for two types of cases - one with low and one with high values in a predictor of interest.

First differences “by hand”

First, here is the step-by-step code similar to what you saw in the slides. I calculate here the first difference in Pr(Voted) between men and women.
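A reduced sketch of those steps, with hypothetical posterior draws (the linear predictor at “typical” values of the other covariates is absorbed into the intercept here for brevity):

```r
set.seed(5)
# hypothetical posterior draws; b.female is the coefficient on the female dummy
b0 <- rnorm(2000, 0.9, 0.1)
b.female <- rnorm(2000, -0.3, 0.08)
pp.men   <- plogis(b0)                # scenario with female = 0
pp.women <- plogis(b0 + b.female)     # scenario with female = 1
fd <- pp.women - pp.men               # full posterior of the first difference
quantile(fd, probs = c(0.05, 0.5, 0.95))
```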

You can now plot the posterior distribution of the difference in voting propensity between men and women:

Using a function to calculate first differences

To facilitate the calculation and plotting of first differences for all explanatory variables, I’ve written two functions. You can access them here:

Both are convenience functions and not fully tested, so be careful when using them. Here is how MCMClogit.fd.mat() works - it requires the following arguments:

  • model_matrix: model matrix, including intercept. Create with model.matrix(formula, data)
  • mcmc_out: posterior distributions of all coefficients in matrix form - can easily be created from rstan, MCMCpack, R2jags, R2OpenBUGS, etc.
  • credint: the desired credible interval for the first differences; defaults to c(0.05, 0.95)
  • percentiles: the percentiles of each explanatory variable to compare; defaults to c(0.25, 0.75) for continuous variables. Binary variables are automatically set to 0 and 1. Note: for explanatory variables that are categorical with few values (e.g., 1, 2, 3, 4) and do not vary much across categories, the default c(0.25, 0.75) may return a first difference estimate of 0. This is because in that case, the 25th and 75th percentile of the variable are identical. You need to expand the range of percentiles in that case, e.g. to c(0.1, 0.9). I will eventually update the function to handle this case better. (And of course, you’re always welcome to fork the repository on Github and rework the function yourself!)
  • full_sims: should the function return the full distribution of first differences or only the median and credible interval? Defaults to FALSE.

MCMCprobit.fd.mat() works the same, but uses the probit link function and should therefore be used for Bayesian probit models.

The resulting first differences can then be plotted:

Alternatively, you can plot the full density of the distribution of each first difference:

Predicted probabilities after ordered or multinomial logit/probit models

To generate similar figures, please refer to the slides for Day 10. Here are some examples of useful presentations of predicted probabilities across the range of predictors.

Probability of each outcome in an ordered logistic regression.

Differences in the probability of each choice in a multinomial logistic regression.

Proportional reduction of error

One (imperfect) measure of model fit for categorical outcomes (logit, ordered logit, multinomial logit) compares the share of observations the model predicts correctly to that of a null model that predicts, for every observation, the modal outcome in the observed data. When you fit a Bayesian model, you can obtain a posterior distribution for this measure of proportional reduction of error (PRE). On how to calculate the PRE, read (for instance) section 4 in Herron (1999). Comparing the distributions of the PREs of nested models then allows you to attach (un)certainty to statements comparing model fits.

Per the slides from Day 9, you can first calculate the percent of observations that the model predicts correctly:

  • If \(\widehat{\text{Pr(y)}} > 0.5\), score as \(\widehat{\text{y}} = 1\)
  • If \(\widehat{\text{Pr(y)}} \leq 0.5\), score as \(\widehat{\text{y}} = 0\)
  • For each posterior draw, calculate the share of observations for which \(\widehat{\text{y}} = \text{y}\) (the percent correctly predicted, PCP)

Recall that for this procedure, you need to monitor each observation’s p[i], the individual predicted probability from the model.

## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 2713
##    Unobserved stochastic nodes: 7
##    Total graph size: 23384
## 
## Initializing model
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.7221  0.7298  0.7309  0.7309  0.7324  0.7372

To calculate the proportional reduction in error, refer to the PRE’s definition:

\[ \text{PRE} = \frac{\text{PCP} - \text{PMC}}{\text{1 - PMC}} \]

where PMC is the percent correctly predicted by the modal outcome, here 0.6951714:

## 
##         0         1 
## 0.3048286 0.6951714

The PRE (note: now a full distribution!) is then:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.08827 0.11366 0.11729 0.11729 0.12213 0.13785
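These steps can be sketched with simulated outcomes and monitored p[i] draws standing in for the real model output (all values here are hypothetical):

```r
set.seed(9)
n <- 200
y <- rbinom(n, 1, 0.7)                          # hypothetical observed outcomes
# hypothetical monitored p[i]: 1000 posterior draws x n observations
p.draws <- matrix(plogis(rnorm(1000 * n, qlogis(0.7), 0.5)), nrow = 1000)
y.hat <- p.draws > 0.5                          # score predictions draw by draw
pcp <- rowMeans(sweep(y.hat, 2, y, FUN = "=="))  # PCP per posterior draw
pmc <- max(table(y)) / n                        # percent correct under modal prediction
pre <- (pcp - pmc) / (1 - pmc)                  # full distribution of the PRE
summary(pre)
```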

References

Brambor, Thomas, William R. Clark, and Matt Golder. 2006. “Understanding Interaction Models: Improving Empirical Analyses.” Political Analysis 14 (1): 63–82. https://doi.org/10.1093/pan/mpi014.

Hanmer, Michael J., and Kerem Ozan Kalkan. 2013. “Behind the Curve: Clarifying the Best Approach to Calculating Predicted Probabilities and Marginal Effects from Limited Dependent Variable Models.” American Journal of Political Science 57 (1): 263–77. https://doi.org/10.1111/j.1540-5907.2012.00602.x.

Herron, Michael C. 1999. “Postestimation Uncertainty in Limited Dependent Variable Models.” Political Analysis 8 (1): 83–98. https://doi.org/https://doi.org/10.1093/oxfordjournals.pan.a029806.

King, Gary, Michael Tomz, and Jason Wittenberg. 2000. “Making the Most of Statistical Analyses: Improving Interpretation and Presentation.” American Journal of Political Science 44 (2): 347–61. http://www.jstor.org/stable/2669316.


  1. Examples: tidybayes (the most comprehensive so far), bayesplot, bayestable, ggmcmc, sjstats, and sjPlot.