How to empirically evaluate the performance of MCMC methods

How can the performance of MCMC methods be investigated empirically? Typical purposes include:
- Comparing two algorithms
- Understanding how algorithms scale with the number of data points, the number of parameters, or the degree of parallelization
Most common measure of sampling quality: Effective Sample Size (ESS)
- Intuition: the number of independent samples that would give a Monte Carlo approximation error similar to that of your MCMC output
- Based on Central Limit Theorems for Markov chains
- To estimate ESS from MCMC output: a recommended approach is the batch means method
  - https://projecteuclid.org/euclid.aos/1266586622
  - R package
- To get the full picture:
  - Use ESS per unit of time (e.g. the time to run the software, or the number of target evaluations if that is comparable across methods)
  - Use a log-log plot to approximate scaling trends (as in https://www.stat.ubc.ca/~bouchard/courses/stat520-sp2020-21/T9-consistency.html#(7))
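The ESS workflow above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation from the linked paper or R package: the batch size is taken to be about sqrt(n), the chain is scalar, and the function names (`batch_means_ess`, `loglog_slope`, `ar1_chain`) are hypothetical. A toy AR(1) process stands in for real MCMC output.

```python
import math
import random
import statistics
import time


def batch_means_ess(chain):
    """Batch means ESS estimate for a scalar chain (list of floats)."""
    n = len(chain)
    if n < 4:
        raise ValueError("need at least 4 samples")
    batch_size = math.isqrt(n)              # assumed default: about sqrt(n)
    n_batches = n // batch_size
    used = chain[: n_batches * batch_size]  # drop the incomplete last batch
    means = [
        statistics.fmean(used[k * batch_size:(k + 1) * batch_size])
        for k in range(n_batches)
    ]
    # Batch means estimate of the asymptotic variance in the Markov chain CLT.
    asymptotic_var = batch_size * statistics.variance(means)
    marginal_var = statistics.variance(used)
    if asymptotic_var == 0.0:
        raise ValueError("degenerate chain: batch means are all equal")
    # ESS = n * (marginal variance) / (asymptotic variance).
    return len(used) * marginal_var / asymptotic_var


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x), i.e. the scaling exponent."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = statistics.fmean(lx), statistics.fmean(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx


def ar1_chain(n, rho, rng):
    """Toy stand-in for MCMC output: an AR(1) process with unit stationary variance."""
    x, out = 0.0, []
    scale = math.sqrt(1.0 - rho * rho)
    for _ in range(n):
        x = rho * x + scale * rng.gauss(0.0, 1.0)
        out.append(x)
    return out


rng = random.Random(1)
start = time.perf_counter()
chain = ar1_chain(100_000, 0.9, rng)
elapsed = time.perf_counter() - start
ess = batch_means_ess(chain)
print(f"ESS: {ess:.0f}, ESS per second: {ess / elapsed:.0f}")
```

For this toy chain the asymptotic ESS of the mean is n(1 - rho)/(1 + rho), roughly 5,300 here, so the estimate should land in that vicinity. For a scaling study, one would compute ESS per second at several problem sizes and pass the sizes and the measured values to `loglog_slope`; the slope approximates the exponent that a log-log plot would show.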
Other methods:
- Looking at convergence of Monte Carlo averages
- https://arxiv.org/abs/1703.01717 and https://arxiv.org/abs/1611.06972
- https://arxiv.org/pdf/1712.06006.pdf
- In the context of parallel tempering, round trip rates (see e.g. https://arxiv.org/pdf/1905.02939.pdf)
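For the round trip rate mentioned above, a small sketch of the bookkeeping may help. It assumes that, for a single replica, we have recorded which chain of the temperature ladder it occupies at each iteration (0 for the reference chain, `top` for the target chain), and counts a round trip each time the replica travels from 0 up to `top` and back to 0. The function name `count_round_trips` and the example path are hypothetical, used only for illustration.

```python
def count_round_trips(index_path, top):
    """Count reference -> target -> reference excursions of one replica.

    index_path: chain index occupied by the replica at each iteration.
    top: index of the target chain (the reference chain has index 0).
    """
    trips = 0
    seen_bottom = False   # has the replica visited the reference chain yet?
    reached_top = False   # has it reached the target since its last visit to 0?
    for i in index_path:
        if i == 0:
            if reached_top:
                trips += 1        # completed 0 -> top -> 0
            reached_top = False
            seen_bottom = True
        elif i == top and seen_bottom:
            reached_top = True
    return trips


# Hypothetical index trajectory for one replica on a ladder with 3 chains.
path = [0, 1, 2, 1, 0, 1, 2, 2, 1, 0, 1]
trips = count_round_trips(path, top=2)
rate = trips / len(path)  # round trips per iteration for this replica
print(trips, rate)
```

Summing the counts over all replicas and dividing by the number of iterations gives an overall round trip rate that can be compared across tempering schedules.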
