Lab 1
Logistics
Setup (please do before the tutorial if possible)
In preparation for the tutorial, it would be great if you could install the following on your computer:
If you encounter problems, feel free to come by the office hour (4pm) or to post the issue on Piazza.
References on Java
- Learning the Java Language (you can skip the following topics on first reading: nested classes, annotations, generics)
- Collections (you can skip the following topics on first reading: Aggregate Operations, custom implementation, Interoperability)
- Notes from last year
But be assured that we are there to answer any questions you may have!
Setup at the beginning of the tutorial
At the beginning of the tutorial, you will be able to clone the repository simplesmc-scaffold using
git clone git@github.com:alexandrebouchard/simplesmc-scaffold.git
Then, follow these steps:
- Type gradle eclipse from the root of the repository
- From Eclipse: Import in the File menu, then Import existing projects into workspace
- Select the root
- Deselect Copy projects into workspace to avoid having duplicates
High performance posterior inference with SMC
We will cover a prefix of the following (another lab can be organized to cover the rest if there is interest):
- Efficient resampling
- Modular design of proposal distributions
- Parallel and correct SMC implementation
- PMCMC
Efficient resampling
See src/test/java/simplesmc/resampling/TestPopulationResampling.java for instructions.
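The test file remains the reference for what to implement. For orientation only, here is a minimal sketch of one standard way to do multinomial resampling in O(N) without sorting: the sorted uniforms are generated directly from normalized exponential spacings. The class and method names are hypothetical and are not part of the scaffold, and the scheme requested in the test file may differ.

```java
import java.util.Random;

class ResamplingSketch {
  /**
   * Returns how many copies each particle receives under multinomial resampling.
   * Assumes normalizedWeights sums to one. Hypothetical helper, not the scaffold's API.
   */
  static int[] multinomialCounts(double[] normalizedWeights, int nSamples, Random random) {
    // Partial sums of nSamples + 1 Exp(1) draws; dividing the first nSamples
    // partial sums by the total yields nSamples sorted uniforms.
    double[] partialSums = new double[nSamples + 1];
    double total = 0.0;
    for (int i = 0; i <= nSamples; i++) {
      total += -Math.log(1.0 - random.nextDouble()); // 1 - U lies in (0, 1]
      partialSums[i] = total;
    }
    int[] counts = new int[normalizedWeights.length];
    int particle = 0;
    double cdf = normalizedWeights[0];
    // Single pass: both the sorted uniforms and the CDF only move forward.
    for (int i = 0; i < nSamples; i++) {
      double u = partialSums[i] / total;
      while (u > cdf && particle < normalizedWeights.length - 1) {
        particle++;
        cdf += normalizedWeights[particle];
      }
      counts[particle]++;
    }
    return counts;
  }
}
```

The counts always add up to nSamples, and a particle with zero weight is never selected.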
Modular design of proposal distributions
Eventually, we would like to apply our SMC algorithm to complex problems. However, to ensure correctness of the implementation, it is useful to first attack an easy problem (a fully discrete HMM), where we can compute the data likelihood Z analytically (to check that SMC's estimate of Z converges to it). But how do we make sure we do not introduce bugs when we then adapt the code to a complex model?
The answer is to use interfaces, which will allow us to decouple the SMC algorithm from the details of the problems we are trying to solve.
Look at src/main/java/simplesmc/ProblemSpecification.java for the interface we will be using in our algorithm. Read the description of each method (a function that is associated with objects).
Now make src/main/java/simplesmc/hmm/HMMProblemSpecification.java implement this interface (using the keyword implements after the class name, followed by the name of the interface, ProblemSpecification). This will enforce the implementation of the correct methods (right click, select Source - Override/Implement methods). Fill in the implementation by using the HMM's transition as the proposal, and the HMM's emission as your weight update.
After the assignment, if you want to use your algorithm for your own problems/project, you will just have to repeat these steps for your new problem.
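To make the pattern concrete, here is a minimal, self-contained sketch of an interface and a discrete HMM implementing it. The interface ProblemSketch, the class WeightedParticle, and all method names are hypothetical stand-ins. The methods actually required are the ones documented in ProblemSpecification.java, and they may differ from these.

```java
import java.util.Random;

/** Hypothetical stand-in for the scaffold's interface; the real method names may differ. */
interface ProblemSketch<P> {
  WeightedParticle<P> proposeInitial(Random random);
  WeightedParticle<P> proposeNext(int iteration, P current, Random random);
  int nIterations();
}

/** A proposed particle together with its incremental log weight. */
final class WeightedParticle<P> {
  final double logWeight;
  final P particle;
  WeightedParticle(double logWeight, P particle) {
    this.logWeight = logWeight;
    this.particle = particle;
  }
}

/** Toy discrete HMM: the transition is the proposal, the emission is the weight update. */
final class ToyHmmProblem implements ProblemSketch<Integer> {
  private final double[] initial;
  private final double[][] transition;
  private final double[][] emission;
  private final int[] observations;

  ToyHmmProblem(double[] initial, double[][] transition, double[][] emission, int[] observations) {
    this.initial = initial;
    this.transition = transition;
    this.emission = emission;
    this.observations = observations;
  }

  public WeightedParticle<Integer> proposeInitial(Random random) {
    return weighted(0, sample(initial, random));
  }

  public WeightedParticle<Integer> proposeNext(int iteration, Integer current, Random random) {
    return weighted(iteration, sample(transition[current], random));
  }

  public int nIterations() {
    return observations.length;
  }

  // Since the proposal equals the prior dynamics, the weight update is the emission probability.
  private WeightedParticle<Integer> weighted(int iteration, int state) {
    return new WeightedParticle<>(Math.log(emission[state][observations[iteration]]), state);
  }

  // Draws an index from a normalized probability vector by inverting the CDF.
  private static int sample(double[] probabilities, Random random) {
    double u = random.nextDouble();
    double cdf = 0.0;
    for (int i = 0; i < probabilities.length - 1; i++) {
      cdf += probabilities[i];
      if (u < cdf) return i;
    }
    return probabilities.length - 1;
  }
}
```

An SMC loop written against ProblemSketch never mentions the HMM, which is why swapping in a harder model later does not require touching the algorithm itself.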
Parallel and correct SMC implementation
We will now complete the implementation of SMC, in src/main/java/simplesmc/SMCAlgorithm.java.
Start by implementing a serial version of the proposal. To make it parallel, use BriefParallel.process(..). See an example at http://github.com/alexandrebouchard/briefj#briefparallel. To take into account a required library dependency update pushed to the scaffold on April 10, you may need to pull the latest version of the git repo, then download the dependencies with gradle eclipse, and finally refresh Eclipse.
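The main pitfall when parallelizing is sharing a single random generator across threads, which makes the output depend on thread scheduling. One way around this, sketched below, is to draw one seed per particle serially and give each particle its own generator. The sketch uses standard Java parallel streams as a stand-in for BriefParallel.process(..), whose exact signature is not reproduced here. The class name is hypothetical and the Gaussian draw is only a placeholder for a call to the problem's proposal.

```java
import java.util.Random;
import java.util.stream.IntStream;

class ParallelProposalSketch {
  /** Fills an array of proposals; the result depends only on the seed, not on thread scheduling. */
  static double[] proposeAll(int nParticles, long seed, boolean parallel) {
    // Draw one seed per particle serially, so that each particle owns its random stream.
    Random master = new Random(seed);
    long[] seeds = new long[nParticles];
    for (int i = 0; i < nParticles; i++) {
      seeds[i] = master.nextLong();
    }

    double[] proposals = new double[nParticles];
    IntStream indices = IntStream.range(0, nParticles);
    if (parallel) {
      indices = indices.parallel();
    }
    // Each task writes only to its own slot, so no synchronization is needed.
    // Placeholder proposal: a real implementation would call the problem specification here.
    indices.forEach(i -> proposals[i] = new Random(seeds[i]).nextGaussian());
    return proposals;
  }
}
```

With this design the serial and parallel versions produce identical arrays for the same seed, which makes the serial version a convenient oracle in tests.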
To test your implementation, run the test in src/test/java/simplesmc/TestSMC.java. Your implementation should agree with the analytic version.
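For a fully discrete HMM, the analytic value of log Z can be obtained with the forward algorithm. The sketch below is a generic textbook version with hypothetical names, assuming row-stochastic transition and emission matrices. It is not the scaffold's own implementation, which the provided test already relies on.

```java
class ForwardAlgorithmSketch {
  /**
   * Exact log marginal likelihood of a discrete HMM.
   * transition[r][s] = P(next = s | current = r), emission[s][y] = P(obs = y | state = s).
   */
  static double logZ(double[] initial, double[][] transition, double[][] emission, int[] observations) {
    int nStates = initial.length;
    double[] alpha = new double[nStates]; // filtering distribution at the previous step
    double logZ = 0.0;
    for (int t = 0; t < observations.length; t++) {
      double[] next = new double[nStates];
      double norm = 0.0;
      for (int s = 0; s < nStates; s++) {
        // Predictive probability of state s before seeing observation t.
        double prior = initial[s];
        if (t > 0) {
          prior = 0.0;
          for (int r = 0; r < nStates; r++) {
            prior += alpha[r] * transition[r][s];
          }
        }
        next[s] = prior * emission[s][observations[t]];
        norm += next[s];
      }
      // Normalizing at every step avoids underflow; the log normalizers add up to log Z.
      for (int s = 0; s < nStates; s++) {
        next[s] /= norm;
      }
      logZ += Math.log(norm);
      alpha = next;
    }
    return logZ;
  }
}
```

As the number of particles grows, the SMC estimate of log Z should approach the value returned by such a routine.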
Also create a new test to ensure that the code outputs the same thing when run twice with the same seed. This is especially important in the concurrent version of the algorithm.
PMCMC
Note: make sure to use git pull to pick up an additional scaffold file created on April 9.
We will now use PMCMC to resample the value of the global parameter selfTransitionProbability. Let us place a uniform prior on this parameter.
We will use artificial data in this question. You can find in the class HMMUtils a function generate(..), which creates synthetic data. We will be using a held-out value of 0.8 for the parameter selfTransitionPr, and a two-state HMM of length 100.
You have two possible avenues here. The first is to write your own Metropolis-Hastings proposal that directly calls the SMC implementation you completed in the previous part.
Alternatively, you can use our home-brewed probabilistic programming language, blang. The main file, TestPMCMC, has been written for you. It creates a model declaratively, and provides some explanations on blang. The main thing left to implement is PMCMCFactor, which in logDensity() should delegate likelihood approximation to the existing SMC class. The main tricky part is to ensure that the SMC is not rerun when a parameter is rejected. To do this, we need some way to detect changes in the parameters. This is mediated by the cache object, which indexes configurations by calling the signature() method on the parameters (you will have to make HMMProblemSpecification implement WithSignature).
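For the hand-written Metropolis-Hastings avenue, the following is a minimal sketch of a particle marginal Metropolis-Hastings loop for a parameter in (0, 1) with a uniform prior. The class name, the random walk proposal, the starting value 0.5, and the logLikelihoodEstimator hook (standing in for a call to your SMC code) are all assumptions made for illustration. The sketch does not use blang or its cache mechanism. It achieves the same goal of not rerunning SMC after a rejection by storing the estimate attached to the current value.

```java
import java.util.Random;
import java.util.function.DoubleUnaryOperator;

class PmmhSketch {
  /**
   * Particle marginal Metropolis-Hastings with a uniform prior on (0, 1).
   * logLikelihoodEstimator is a hypothetical hook returning an SMC estimate of log Z.
   */
  static double[] run(DoubleUnaryOperator logLikelihoodEstimator, int nIterations,
                      double stepSize, Random random) {
    double current = 0.5; // arbitrary starting value inside the support
    double currentLogZ = logLikelihoodEstimator.applyAsDouble(current);
    double[] samples = new double[nIterations];
    for (int i = 0; i < nIterations; i++) {
      double proposed = current + stepSize * random.nextGaussian();
      // Proposals outside the prior's support are rejected without running SMC.
      if (proposed > 0.0 && proposed < 1.0) {
        double proposedLogZ = logLikelihoodEstimator.applyAsDouble(proposed);
        // Symmetric proposal and flat prior: the ratio reduces to the likelihood estimates.
        if (Math.log(random.nextDouble()) < proposedLogZ - currentLogZ) {
          current = proposed;
          currentLogZ = proposedLogZ;
        }
      }
      // On rejection the stored estimate for the current value is reused, never recomputed.
      samples[i] = current;
    }
    return samples;
  }
}
```

Reusing the stored estimate matters for correctness as well as speed: recomputing the estimate for the current value at each iteration would change the target of the chain.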
Going further
Some ideas for additional things to try if you are curious:
- Create a new command line option to set an ESS (effective sample size) threshold, and only perform resampling when the ESS goes below this threshold.
- Add a command line option for the type of resampling scheme as well.
- Create a new ProblemSpecification containing a more challenging domain.
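For the first idea, a common estimate of the effective sample size is ESS = (sum of weights)^2 / (sum of squared weights), which ranges from 1 (one particle carries all the weight) to N (uniform weights). The sketch below computes it from log weights. The names are hypothetical, and expressing the threshold as a fraction of the number of particles is an assumption, not something the scaffold prescribes.

```java
class EssSketch {
  /** ESS = (sum w)^2 / (sum w^2), computed from unnormalized log weights. */
  static double ess(double[] logWeights) {
    // Subtracting the maximum before exponentiating avoids underflow.
    double max = Double.NEGATIVE_INFINITY;
    for (double logWeight : logWeights) {
      max = Math.max(max, logWeight);
    }
    double sum = 0.0;
    double sumSquares = 0.0;
    for (double logWeight : logWeights) {
      double w = Math.exp(logWeight - max);
      sum += w;
      sumSquares += w * w;
    }
    return sum * sum / sumSquares;
  }

  /** Resample only when the ESS falls below a fraction of the number of particles. */
  static boolean shouldResample(double[] logWeights, double relativeThreshold) {
    return ess(logWeights) < relativeThreshold * logWeights.length;
  }
}
```

When resampling is skipped at an iteration, the weights must be carried over and multiplied by the next weight update rather than reset to uniform.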